Load testing the Amazon API Gateway

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.

This post helps Flood IO customers get up and running with load testing for their own API gateway.

Please note: the default limit for API Gateway is 500 requests per second (RPS), as we soon discover in this post. For more information about rate limits, including how to raise them, see here.

Load Test API

After creating a gateway, you add resources to it in your AWS console. We've set up three methods and deployed them here.

The HTTP DELETE, POST and GET methods act as a simple proxy to our backend load testing service, which we host for the purpose of demonstration. Each call returns a JSON payload with some information about the service.

~ » curl -s -X GET https://7wam4h0629.execute-api.us-west-2.amazonaws.com/production/ | jq -r .
{
  "elapsed": "1.001 seconds",
  "status": "OK",
  "connections_active": 4,
  "connections_waiting": 3
}

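As a quick sanity check before writing the load test, the payload can be parsed and inspected with Ruby's standard library. This is only a sketch: the helper name `healthy?` is our own invention for illustration, and the sample body is copied from the response shown above.

```ruby
require 'json'

# Hypothetical helper (not part of any library): returns true when the
# JSON payload reports "OK" for the status key, false for anything else,
# including a body that is not valid JSON.
def healthy?(body)
  JSON.parse(body)['status'] == 'OK'
rescue JSON::ParserError
  false
end

# Sample body copied from the curl output.
sample_body = '{"elapsed": "1.001 seconds", "status": "OK", ' \
              '"connections_active": 4, "connections_waiting": 3}'

payload = JSON.parse(sample_body)
puts "healthy=#{healthy?(sample_body)} active=#{payload['connections_active']}"
```

The same `status` key is what the load test asserts on later.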
Creating a Load Test

We're using our favoured Ruby-JMeter tool to write our tests in an expressive DSL which simplifies the process of crafting a load test.

You can see the full script here; the takeaway is that we exercise all three resource calls on the API.

As our API is protected by an API key, we specify the key as an additional header in each request. We also set the Accept header to application/json.

header [
  { name: 'x-api-key', value: 'BguwPvwIHD4vTTq2MHL3HTWKibe4LHC9GZYV4FUi' },
  { name: 'Accept', value: 'application/json' }
]
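If you want to check the same two headers outside the DSL before running a full test, a plain Net::HTTP request carries them in the same way. This is a rough sketch rather than part of our test script: `build_request` is a hypothetical helper, and the base URL and key are passed in as arguments, so nothing is sent until you hand the request to a connection yourself.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a GET request for the
# /production/ entry point with the API key and Accept headers set.
def build_request(base_url, api_key)
  uri = URI.join(base_url, '/production/')
  request = Net::HTTP::Get.new(uri)
  request['x-api-key'] = api_key
  request['Accept'] = 'application/json'
  request
end

# To actually send it (this performs a network call), something like:
#   uri = request.uri
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```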

We can also simulate request User-Agent headers as if they're coming from an iPhone with a handy helper like this.

with_user_agent :iphone  

Our first request is an HTTP GET to the entry point of our API. We make that request and also assert that the response payload contains the JSON value OK for the status key.

get name: 'entry point', url: '/production' do
  assert json: '.status', value: 'OK'
end

Our second request is an HTTP POST to the create session endpoint. We pass in username and password parameters as part of the request. From the response we extract the value of the connections_active key, which we'll use in our third request.

post name: 'create session', url: '/production',
  fill_in: {
    username: 'MrRobot',
    password: 4141414141
  } do
    extract json: '.connections_active', name: 'connections_active'
end

In our third request we make an HTTP DELETE call to the destroy session endpoint. We pass it the connections_active variable we extracted from the prior response, and we also specify a duration assertion of 5 seconds (5,000 milliseconds), so the request is marked as failed if the response takes longer than that.

delete name: 'destroy session', url: '/production?connections=${connections_active_1}' do
  duration_assertion duration: 5_000
end

We use the flood method to trigger the load test on Flood IO, specifying the ID of a 10 node grid we already had running for the demonstration.

flood ENV['FLOOD_API_TOKEN'], {
  privacy: 'public',
  name: 'Shakeout Loadtest API',
  grid: 'Wvf78fVsSWvTekNftt3ZsQ'
}

Analysing the Results

In just a couple of minutes you'll be analysing your results on Flood IO.

So why the big spike in Response Time (blue line) before it drops back down?

We can see response time climb to around 4 seconds before it backs off. We can also see an increase in overall transaction throughput (green line).

Looking at individual transaction statistics and traces, the answer is evident.

The API has a high percentage of errors at about the same time. Most of these errors are HTTP 429 response codes, and we capture sample response headers for them to help you debug.

We can see that this is a rate limit type of error from the header x-amzn-ErrorType: TooManyRequestsException. Our API has been throttled to an arbitrary 500 requests per second, and we soon receive an email notification from AWS warning us that this has occurred on our account.

Our CloudWatch dashboard is also helpful in tracking down issues. We notice some HTTP 5XX response codes are present in the test.

In Flood IO, these look like HTTP 504 Gateway Timeouts on CloudFront.
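To make that triage concrete, here is a small sketch that sorts sampled responses into the buckets discussed above: throttled (HTTP 429 or the TooManyRequestsException error type), gateway timeouts (HTTP 504), other 5XX errors, and successes. The helper names `classify` and `tally_outcomes` are our own for illustration, and the sample data in the usage lines is made up rather than taken from the test run.

```ruby
# Hypothetical helper: maps a status code and response headers to an outcome.
def classify(status, headers = {})
  # Header names are case-insensitive, so match without regard to case.
  pair = headers.find { |name, _| name.to_s.casecmp?('x-amzn-ErrorType') }
  error_type = pair ? pair.last.to_s : ''

  return :throttled if status == 429 || error_type.include?('TooManyRequestsException')
  return :gateway_timeout if status == 504
  return :server_error if (500..599).cover?(status)
  return :ok if (200..299).cover?(status)

  :other
end

# Counts outcomes for a list of [status, headers] samples.
def tally_outcomes(samples)
  samples.each_with_object(Hash.new(0)) do |(status, headers), counts|
    counts[classify(status, headers || {})] += 1
  end
end

# Made-up samples, for illustration only.
samples = [
  [200, {}],
  [429, { 'x-amzn-ErrorType' => 'TooManyRequestsException' }],
  [504, {}]
]
puts tally_outcomes(samples).inspect
```

Separating throttling from timeouts matters because the fixes differ: the first is a limit to raise, while the second points at the backend or the path to it.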

Next Steps

Following this first round of testing, it's sensible to request larger rate limits if your production throughput is going to be an order of magnitude larger than 500 requests per second.

We used a grid of 10 nodes, but you can set up multiple grids in multiple regions to have more sessions coming from unique IPs, which reduces the likelihood of rate limiting from single load generation origins.

If you'd like to conduct higher volume load testing with Flood IO, please contact support@flood.io and we can help, via our partnership with AWS, to prepare and scale your load tests.
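A back-of-the-envelope estimate can tell you whether a planned test is likely to hit that limit. The sketch below assumes a simple closed model in which every user loops through the scenario continuously. The helper names and the user and timing figures are hypothetical; only the three requests per iteration and the 500 requests per second limit come from this post.

```ruby
# Hypothetical helper: rough requests per second for users looping
# continuously through a scenario that takes iteration_seconds per pass.
def estimated_rps(users:, requests_per_iteration:, iteration_seconds:)
  raise ArgumentError, 'iteration_seconds must be positive' unless iteration_seconds.positive?

  users * requests_per_iteration / iteration_seconds.to_f
end

# True when the estimate is above the rate limit (500 RPS by default).
def exceeds_limit?(rps, limit: 500)
  rps > limit
end

# Illustrative numbers only: 1,000 users, 3 requests, 2 seconds per pass.
rps = estimated_rps(users: 1_000, requests_per_iteration: 3, iteration_seconds: 2)
puts "estimated #{rps} RPS, exceeds limit: #{exceeds_limit?(rps)}"
```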
