Monitor VPC NAT gateways with CloudWatch metrics and alarms
Michael Wittig – 15 Aug 2022
Many VPC designs make use of public and private subnets. Resources in a private subnet need a NAT gateway to communicate with the Internet.
A VPC NAT gateway is a finite resource that can be exhausted. That's why you need monitoring that alerts you when the NAT gateway becomes a bottleneck.
Each NAT gateway sends metrics to CloudWatch that we can monitor with CloudWatch alarms. We recommend creating alarms for the following metrics:
ErrorPortAllocation: The number of times the NAT gateway could not allocate a source port.
PacketsDropCount: The number of packets dropped by the NAT gateway.
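The two alarms could look like the following minimal sketch, written as parameter dictionaries for boto3's `CloudWatch.Client.put_metric_alarm`. The metrics live in the `AWS/NATGateway` namespace with the `NatGatewayId` dimension. The alarm naming scheme, the one-minute period, and the threshold of 0 (alert on any occurrence) are assumptions for illustration, not values from this article.

```python
"""Sketch: CloudWatch alarm definitions for NAT gateway error metrics.

Keys follow boto3's CloudWatch.Client.put_metric_alarm parameters.
The naming scheme, period, and thresholds are assumptions.
"""
from typing import Any, Dict, List

NAMESPACE = "AWS/NATGateway"
ERROR_METRICS = ("ErrorPortAllocation", "PacketsDropCount")


def nat_gateway_error_alarms(nat_gateway_id: str, topic_arn: str) -> List[Dict[str, Any]]:
    """Return one alarm definition per error metric of a NAT gateway."""
    alarms = []
    for metric_name in ERROR_METRICS:
        alarms.append({
            "AlarmName": f"{nat_gateway_id}-{metric_name}",  # hypothetical naming scheme
            "Namespace": NAMESPACE,
            "MetricName": metric_name,
            "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
            "Statistic": "Sum",
            "Period": 60,  # assumed: evaluate one-minute sums
            "EvaluationPeriods": 1,
            "Threshold": 0,  # assumed: alert on any occurrence
            "ComparisonOperator": "GreaterThanThreshold",
            "TreatMissingData": "notBreaching",  # no data means no errors
            "AlarmActions": [topic_arn],  # e.g. an SNS topic that notifies you
        })
    return alarms
```

With boto3 installed and credentials configured, each definition could be created with `boto3.client("cloudwatch").put_metric_alarm(**params)`. The NAT gateway ID and SNS topic ARN you pass in are placeholders for your own resources.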
Unfortunately, NAT gateways do not report a metric for their bandwidth or packet-rate utilization. A NAT gateway scales up to a maximum of 100 Gbit/s and 10,000,000 packets/s. Luckily, we can calculate the utilization by using CloudWatch metric math.
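To calculate the bandwidth utilization, we use the following metrics from the AWS/NATGateway namespace:

| Id | Metric | Statistic | Period |
|---|---|---|---|
| in1 | BytesInFromDestination | Sum | 60 seconds |
| in2 | BytesInFromSource | Sum | 60 seconds |
| out1 | BytesOutToDestination | Sum | 60 seconds |
| out2 | BytesOutToSource | Sum | 60 seconds |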
And the following expressions:
| Id | Expression | Description |
|---|---|---|
| bandwidth | (in1+in2+out1+out2)/60*8/1000/1000/1000 | Bytes/min to Gbit/s |
| utilization | bandwidth/100*100 | to %; 100 Gbit/s is the hard limit |
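Putting the metrics and expressions together, an alarm on the `utilization` expression could be defined as in the following sketch. It uses the `Metrics` parameter of boto3's `CloudWatch.Client.put_metric_alarm`, where exactly one query (`utilization`) returns data for the alarm to evaluate. The 80 % threshold, the alarm name, and the assignment of the four `BytesIn*`/`BytesOut*` metrics to the ids are assumptions; the sum is the same regardless of the assignment. The helper `utilization_percent` evaluates the same expressions locally to make the unit conversion explicit.

```python
"""Sketch: NAT gateway bandwidth-utilization alarm via CloudWatch metric math.

Keys follow boto3's put_metric_alarm with a Metrics list of
MetricDataQuery entries. Threshold, alarm name, and id mapping are assumptions.
"""
from typing import Any, Dict

NAMESPACE = "AWS/NATGateway"
# Assumed id-to-metric mapping; the bandwidth sum does not depend on it.
BYTE_METRICS = {
    "in1": "BytesInFromDestination",
    "in2": "BytesInFromSource",
    "out1": "BytesOutToDestination",
    "out2": "BytesOutToSource",
}
BANDWIDTH_EXPR = "(in1+in2+out1+out2)/60*8/1000/1000/1000"  # Bytes/min -> Gbit/s
UTILIZATION_EXPR = "bandwidth/100*100"  # Gbit/s -> % of the 100 Gbit/s limit


def bandwidth_utilization_alarm(nat_gateway_id: str, topic_arn: str,
                                threshold_percent: float = 80.0) -> Dict[str, Any]:
    """Build put_metric_alarm parameters for the utilization expression."""
    metrics = [
        {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
                },
                "Period": 60,  # one-minute sums, i.e. Bytes/min
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only, not evaluated by the alarm
        }
        for metric_id, metric_name in BYTE_METRICS.items()
    ]
    metrics.append({"Id": "bandwidth", "Expression": BANDWIDTH_EXPR, "ReturnData": False})
    # The single query with ReturnData=True is the value the alarm compares.
    metrics.append({"Id": "utilization", "Expression": UTILIZATION_EXPR,
                    "Label": "Bandwidth utilization (%)", "ReturnData": True})
    return {
        "AlarmName": f"{nat_gateway_id}-bandwidth-utilization",  # hypothetical name
        "Metrics": metrics,
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,  # assumed: alert above 80 %
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def utilization_percent(in1: float, in2: float, out1: float, out2: float) -> float:
    """Evaluate the same expressions locally for per-minute byte sums."""
    bandwidth = (in1 + in2 + out1 + out2) / 60 * 8 / 1000 / 1000 / 1000  # Gbit/s
    return bandwidth / 100 * 100  # percent of 100 Gbit/s
```

As a worked example, 750,000,000,000 bytes summed over one minute are 12,500,000,000 bytes/s, or 100,000,000,000 bit/s = 100 Gbit/s, so `utilization_percent` returns 100 for four inputs of 187,500,000,000 bytes each.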
CloudWatch metric math sounds complicated? We have you covered! Monitor NAT gateways and receive alerts in Slack or Microsoft Teams!