Monitoring EC2 network utilization with CloudWatch metrics and alarms

Andreas Wittig – 02 May 2019

Are you monitoring the network utilization of your EC2 instances? Why not? The network is one of the few resources that can limit your workload’s maximum throughput:

  1. CPU
  2. Memory
  3. Network
  4. Disk
  5. GPU

I’ve debugged performance problems in a lot of infrastructures during the last 12 months. In most of the scenarios, the network capabilities of EC2 or RDS instances were the bottleneck causing trouble. That is why I want to share with you how to monitor the network utilization of EC2 instances.

Monitoring the Network Utilization of EC2

To monitor the network utilization of an EC2 instance, we need to solve two challenges.

Challenge #1: What’s the network performance of my EC2 instance?

To monitor the network utilization of your EC2 instance, you need to be able to answer the following question: What are the baseline and maximum network throughput of your EC2 instance? Unfortunately, AWS does not provide accurate information about the network performance for most instance types. For example, AWS promises Moderate network performance for a t2.xlarge instance or Up to 10 Gbps for an m5.large instance.

This information is not satisfactory. That is why I ran a network performance benchmark and published the results in the EC2 Network Performance Cheat Sheet. The results are astonishing.

An m5.large instance provides 10.04 Gbit/s for a few minutes only. Afterward, the baseline network performance for an m5.large instance is around 0.74 Gbit/s. The results for other instance types look similar.

The EC2 Network Performance Cheat Sheet gives you an estimation for the baseline and maximum network throughput of your EC2 instance which allows you to define a threshold for monitoring.
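To make the gap between burst and baseline concrete, here is a minimal Python sketch. It uses the two m5.large figures quoted above and assumes the 80% rule of thumb suggested later in this post; the function name `alarm_threshold` is made up for illustration.

```python
# Values measured for an m5.large, taken from the text above.
BURST_GBIT_S = 10.04
BASELINE_GBIT_S = 0.74


def alarm_threshold(baseline_gbit_s: float, fraction: float = 0.8) -> float:
    """Return an alarm threshold in Gbit/s as a fraction of the baseline.

    The default of 0.8 is the 80% rule of thumb, not an AWS recommendation.
    """
    return baseline_gbit_s * fraction


threshold = alarm_threshold(BASELINE_GBIT_S)
burst_factor = BURST_GBIT_S / BASELINE_GBIT_S

print(f"alarm threshold: {threshold:.3f} Gbit/s")
print(f"burst is {burst_factor:.1f}x the baseline")
```

For other instance types, replace the two constants with the values from the cheat sheet.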

Fine, we have solved challenge #1.

Challenge #2: How to combine multiple CloudWatch metrics?

Each EC2 instance reports various metrics to CloudWatch. The metrics NetworkIn and NetworkOut collect the number of bytes received and sent on all network interfaces by the instance. However, to calculate the network utilization of your EC2 instance, you need to add up both metrics.

Pick one of the following options to create a CloudWatch alarm monitoring the total network utilization of your EC2 instance:

  1. Use the AWS Management Console to create the CloudWatch alarm manually.
  2. Use CloudFormation to create the CloudWatch alarm with Infrastructure as Code.
  3. Let marbot create the CloudWatch alarm for you.

AWS Management Console

Log into the AWS Management Console and go to CloudWatch. Select Alarms from the sub-navigation and click the Create Alarm button. The wizard shown in the following screenshot appears. Click the Select metric button.

Step 1: Creating CloudWatch Alarm monitoring Network Utilization

Search for the NetworkIn and NetworkOut metrics of your EC2 instance and select them both. After doing so, select the Graphed metrics tab.

  1. Click Add a math expression.
  2. Set the id to out for the NetworkOut metric and to in for the NetworkIn metric.
  3. Type in the expression (in+out)/300/1000/1000/1000*8.

Let me quickly explain the math expression (in+out)/300/1000/1000/1000*8:

  • Add up in and out.
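  • Divide by 300 to convert from bytes per 5 minutes to bytes per second. This assumes both metrics use the Sum statistic with a period of 5 minutes.
  • Divide by 1000 three times and multiply by 8 to convert bytes to Gbit.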
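The conversion can be checked outside of CloudWatch with a few lines of Python. This is only a sketch of the arithmetic in the math expression; the function name and the sample byte counts are hypothetical.

```python
def network_gbit_per_second(bytes_in: float, bytes_out: float,
                            period_seconds: int = 300) -> float:
    """Mirror the expression (in+out)/300/1000/1000/1000*8.

    bytes_in and bytes_out are the Sum of NetworkIn and NetworkOut
    over one period (5 minutes by default).
    """
    total_bytes = bytes_in + bytes_out                  # add up in and out
    bytes_per_second = total_bytes / period_seconds     # 5 minutes -> 1 second
    gigabytes_per_second = bytes_per_second / 1000 / 1000 / 1000
    return gigabytes_per_second * 8                     # bytes -> bits


# Hypothetical datapoint: 13.875 GB received and 13.875 GB sent in 5 minutes.
example = network_gbit_per_second(13_875_000_000, 13_875_000_000)
print(f"{example:.2f} Gbit/s")
```

With these sample values, the result matches the 0.74 Gbit/s baseline measured for an m5.large.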

Make sure you have only selected the math expression before you click the Select metric button.

Step 2: Creating CloudWatch Alarm monitoring Network Utilization

Finally, set up the alarm.

  1. Type in a name and description.
  2. Define the threshold. For example, 80% of the baseline network performance listed in the EC2 Network Performance Cheat Sheet.
  3. To avoid alarms from short network utilization spikes, configure 8 out of 12 datapoints, which translates to 40 minutes within an hour.
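To illustrate why the "8 out of 12 datapoints" setting filters out short spikes, here is a simplified Python model of the evaluation. It is an assumption-laden sketch, not CloudWatch's actual implementation: it ignores missing data handling and assumes a greater-than-or-equal comparison. The function name and the sample series are hypothetical.

```python
from typing import Sequence


def m_of_n_breaching(datapoints: Sequence[float], threshold: float,
                     m: int = 8, n: int = 12) -> bool:
    """Return True if at least m of the last n datapoints reach the threshold."""
    window = datapoints[-n:]  # the most recent n five-minute datapoints
    breaching = sum(1 for value in window if value >= threshold)
    return breaching >= m


# Utilization in Gbit/s, one value per 5 minutes (hypothetical data).
short_spike = [0.1] * 10 + [0.9] * 2      # 10 minutes above the threshold
sustained_load = [0.1] * 4 + [0.7] * 8    # 8 datapoints above the threshold

print(m_of_n_breaching(short_spike, threshold=0.592))     # False
print(m_of_n_breaching(sustained_load, threshold=0.592))  # True
```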

Click the Create Alarm button.

Step 3: Creating CloudWatch Alarm monitoring Network Utilization

Fine, you have set up a CloudWatch alarm monitoring the network utilization of your EC2 instance.
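For reference, the alarm configured in the console can also be written down as a request for the CloudWatch PutMetricAlarm API. The following Python sketch only builds the request as a dictionary and does not call AWS. The function name, the alarm name, and the instance ID are hypothetical, and passing the result to an SDK call such as boto3's `put_metric_alarm` is an assumption that is not part of the steps above.

```python
def build_alarm_request(instance_id: str, threshold_gbit_s: float) -> dict:
    """Build a PutMetricAlarm-style request for total network utilization."""

    def metric(metric_id: str, metric_name: str) -> dict:
        # Sum over 5 minutes, as the math expression divides by 300.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"network-utilization-{instance_id}",  # hypothetical name
        "Metrics": [
            {
                "Id": "total",
                "Label": "Network Utilization (Gbit/s)",
                "Expression": "(in+out)/300/1000/1000/1000*8",
                "ReturnData": True,  # only the expression is evaluated
            },
            metric("in", "NetworkIn"),
            metric("out", "NetworkOut"),
        ],
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": threshold_gbit_s,
        "EvaluationPeriods": 12,
        "DatapointsToAlarm": 8,
    }


# Hypothetical instance ID; 0.592 is 80% of the m5.large baseline of 0.74 Gbit/s.
request = build_alarm_request("i-0123456789abcdef0", 0.592)
print(request["Metrics"][0]["Expression"])
```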

Instead of going through this process manually, you could create CloudWatch alarms in an automated way with the help of CloudFormation as well.

CloudFormation

The following snippet shows a CloudFormation template setting up a CloudWatch alarm for the EC2 instances of an Auto Scaling group. The alarm fires when the CPU, memory, or network utilization exceeds 80%.

You need to modify the network baseline, which is set to 0.75 Gbit/s in the network expression. Use the baseline performance of your instance type as listed in the EC2 Network Performance Cheat Sheet.

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  # Declared but not referenced by the alarm below.
  Topic:
    Type: String
  AutoScalingGroupName:
    Type: String
Resources:
  NetworkUtilizationTooHighAlarm:
    Type: 'AWS::CloudWatch::Alarm'
    Properties:
      AlarmDescription: 'EC2 High Network Utilization'
      Metrics:
        # Evaluates to 1 if CPU, memory, or network utilization exceeds 80%.
        - Id: summary
          Label: EC2 Utilization
          Expression: 'IF(cpu > 80, 1, 0) OR IF(memory > 80, 1, 0) OR IF(network > 80, 1, 0)'
          ReturnData: true
        - Id: cpu
          MetricStat:
            Metric:
              Namespace: AWS/EC2
              MetricName: CPUUtilization
              Dimensions:
                - Name: AutoScalingGroupName
                  Value: !Ref AutoScalingGroupName
            Stat: Maximum
            Period: 300
          ReturnData: false
        - Id: memory
          MetricStat:
            Metric:
              Namespace: CWAgent
              MetricName: mem_used_percent
              Dimensions:
                - Name: AutoScalingGroupName
                  Value: !Ref AutoScalingGroupName
            Stat: Maximum
            Period: 300
          ReturnData: false
        # Network throughput in Gbit/s as a percentage of the 0.75 Gbit/s baseline.
        - Id: network
          Label: Network Utilization
          Expression: '((network_in+network_out)/300/1000/1000/1000*8)/0.75*100'
          ReturnData: false
        - Id: network_in
          MetricStat:
            Metric:
              Namespace: AWS/EC2
              MetricName: NetworkIn
              Dimensions:
                - Name: AutoScalingGroupName
                  Value: !Ref AutoScalingGroupName
            Stat: Sum
            Period: 300
          ReturnData: false
        - Id: network_out
          MetricStat:
            Metric:
              Namespace: AWS/EC2
              MetricName: NetworkOut
              Dimensions:
                - Name: AutoScalingGroupName
                  Value: !Ref AutoScalingGroupName
            Stat: Sum
            Period: 300
          ReturnData: false
      ComparisonOperator: GreaterThanOrEqualToThreshold
      EvaluationPeriods: 1
      DatapointsToAlarm: 1
      Threshold: 1
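To see how the expressions in the template fit together, the following Python sketch reproduces them for a single 5-minute datapoint. It is a model of the template's logic under the template's own constants (0.75 Gbit/s baseline, 80% limits, threshold 1), not CloudWatch code. The function names and the sample values are hypothetical.

```python
BASELINE_GBIT_S = 0.75  # baseline hard-coded in the template's network expression


def network_percent(network_in: float, network_out: float,
                    period_seconds: int = 300,
                    baseline_gbit_s: float = BASELINE_GBIT_S) -> float:
    """Mirror the 'network' expression: throughput as a percentage of the baseline."""
    gbit_s = (network_in + network_out) / period_seconds / 1000 / 1000 / 1000 * 8
    return gbit_s / baseline_gbit_s * 100


def summary(cpu: float, memory: float, network: float) -> int:
    """Mirror the 'summary' expression: 1 if any utilization exceeds 80%."""
    return 1 if (cpu > 80 or memory > 80 or network > 80) else 0


def in_alarm(cpu: float, memory: float, network_in: float, network_out: float) -> bool:
    """The alarm compares summary >= 1 for a single datapoint."""
    return summary(cpu, memory, network_percent(network_in, network_out)) >= 1


# Hypothetical datapoints: CPU and memory are fine, only the network differs.
print(in_alarm(cpu=35, memory=50, network_in=12e9, network_out=12e9))  # True
print(in_alarm(cpu=35, memory=50, network_in=6e9, network_out=6e9))    # False
```

In the first call, 24 GB in 5 minutes equals 0.64 Gbit/s, which is about 85% of the 0.75 Gbit/s baseline, so the alarm fires even though CPU and memory are far below their limits.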

ChatOps

Are you looking for an even simpler way to monitor the network utilization of your EC2 instance?

Monitoring Assistant
Ask marbot to monitor EC2 instances for you and receive alerts in Slack or Microsoft Teams.

  1. Add marbot to Slack or Microsoft Teams.
  2. Invite marbot to a channel.
  3. Follow the setup wizard.
It couldn't be easier!

Summary

Monitoring the network utilization of your EC2 instance is essential, as the network is a limited resource. The instance type affects maximum and baseline performance. Your EC2 instance might not be able to provide the maximum network performance for more than 5 to 30 minutes. Therefore, use the baseline performance to define the alarm threshold. Use the EC2 Network Performance Cheat Sheet to estimate the network performance of your EC2 instance.

Andreas Wittig

Consultant focusing on Amazon Web Services (AWS). Entrepreneur building marbot.io. Author of Amazon Web Services in Action, Rapid Docker on AWS, and cloudonaut.io.
