Wednesday, 11 March 2015

Benchmark Python Web production stack: Nginx with uWSGI, Meinheld and API-Hour

Disclaimer: If you have some bias and/or dislike AsyncIO, please read my previous blog post before starting a war.


Tip: If you don't have the time to read the text, scroll down to see graphics.


Summary of previous episodes



After the publication of “Macro-benchmark with Django, Flask and AsyncIO (aiohttp.web+API-Hour)”, I received a lot of remarks; here is a synthesis:


  1. It’s impossible, you changed the numbers/you don’t measure the right values/…: Come on people, if you don’t believe me, test it yourself: I’ve published as much information as possible in the API-Hour repository so that the results can be reproduced by others. Don’t hesitate to ask me if you have trouble reproducing them.


    Nginx is configured to avoid being a bottleneck for Python daemons.
  2. Changing kernel parameters is cheating: No, it isn’t cheating; most production applications recommend doing this, not only for benchmarks. Examples:
    1. PostgreSQL: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, the shared_buffers setting (BTW, I forgot to push my kernel settings for PostgreSQL; they’re now in the repository)
    2. Nginx: http://wiki.nginx.org/SSL-Offloader, the “Preparation” section
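For reference, the kind of kernel tuning being discussed is a handful of sysctl settings. Here is a minimal illustrative sketch; the keys are real Linux sysctls, but the values are examples, not the exact settings from the API-Hour repository:

```shell
# Illustrative kernel tuning for a busy HTTP benchmark box (run as root).
# Values are examples only, not the exact ones used in these benchmarks.
sysctl -w net.core.somaxconn=4096                     # larger listen backlog
sysctl -w net.ipv4.ip_local_port_range="1024 65535"   # more ephemeral ports
sysctl -w net.ipv4.tcp_tw_reuse=1                     # reuse TIME_WAIT sockets for outgoing connections
sysctl -w fs.file-max=200000                          # raise the global file-descriptor limit
```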


  3. Daemons were all in debug mode: All daemons were affected; I’ve disabled it. I relaunched the localhost benchmark on the /agents endpoint and got almost the same values, most likely because I had already disabled logging globally in my benchmarks.
  4. You should disable middlewares in Django: In production you would keep them, but I’ve nevertheless disabled them to be fair with the other frameworks, where I don’t use middlewares.
  5. wrk/wrk2 aren’t good tools to benchmark HTTP, they can hit too hard: Hitting as hard as possible is the goal of a benchmark, isn’t it? FYI, almost all serious benchmark reports on the Web use wrk. As framework performance in general keeps increasing, the tools used to challenge frameworks have to hit harder to bring out the differences.
  6. Keep-alive isn’t enabled for Flask or Django / Nobody uses Flask or Django alone in production, you must use Nginx and uWSGI/Meinheld: No problem: you’ll find below a new series of benchmarks based on these remarks.


In this article, I’ll test three scenarios:
  1. A resource limit use case: 4000 requests/s with wrk2 during 5 minutes
  2. A standard use case: 50 simultaneous connections with wrk during 5 minutes
  3. A slow use case: 10 requests/s with wrk2 during 30 seconds


To be closer to a production scenario, I’ll test only over the network, and only with the agents list endpoint, which uses a database connection, as described in my previous article.


I test 6 architectures:
  1. Django+Meinheld+Nginx
  2. Django+uWSGI+Nginx
  3. Flask+Meinheld+Nginx
  4. Flask+uWSGI+Nginx
  5. API-Hour+Nginx
  6. API-Hour without Nginx


As you can see, all of them run behind an Nginx server except the last one, which serves as a control.


As usual, you can find config files in API-Hour repository: https://github.com/Eyepea/API-Hour/tree/master/benchmarks
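For context, the reverse-proxy part of those configurations boils down to something like the sketch below. This is a simplified illustration, not the exact file from the repository; the upstream address and keepalive value are assumptions:

```nginx
upstream python_backend {
    server 127.0.0.1:8000;   # uWSGI / Meinheld / API-Hour daemon (address is illustrative)
    keepalive 64;            # keep connections to the backend open between requests
}

server {
    listen 80;

    location / {
        proxy_pass http://python_backend;
        proxy_http_version 1.1;           # required for backend keep-alive
        proxy_set_header Connection "";   # don't forward "Connection: close" to the backend
    }
}
```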


Round 4: 4000 requests/s with wrk2



Requests per second
(Higher is better)
Latency (s)
(Lower is better)
As you can see, API-Hour without Nginx handles fewer requests per second; moreover:
Errors
(Lower is better)
API-Hour without Nginx has much higher latency than the other solutions, but see the explanation below.
At first sight, you seem to handle more requests with Nginx, and even with less latency. This last point is intriguing: how could you get lower latency with API-Hour+Nginx than with API-Hour alone?
After careful examination, I saw a lot of 404 responses on the wire. My understanding is that Nginx returns a 404 if the framework behind it doesn’t answer fast enough. A 404 is a response for wrk, and one that arrives quickly (the Nginx timeout to the backend is short). wrk can therefore immediately launch another request. Hence you see more requests and less latency with Nginx, but (contrary to the control, API-Hour without Nginx) many requests have in fact not been answered properly.
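The knobs governing how quickly Nginx gives up on a slow backend are its upstream timeout directives. A sketch of the directives involved (the values are illustrative, not the ones used in these benchmarks):

```nginx
location / {
    proxy_pass http://python_backend;     # backend name is illustrative
    proxy_connect_timeout 5s;             # max time to establish the backend connection
    proxy_read_timeout 10s;               # max time to wait for the backend's response
}
```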


Results table




Stack                 Requests/s   Errors    Avg latency (s)
Django+Meinheld          3992.68   1031238   0.121
Django+uWSGI             3991.96   1029213   0.072
Flask+Meinheld           3991.43   1024192   0.111
Flask+uWSGI              3994.09   1021953   0.066
API-Hour                 3994.96    312600   0.043
API-Hour w/o Nginx       3646.15         0   9.74


To avoid these artefacts, round 5 doesn’t try to “force-feed” the frameworks with requests. Instead, it makes as many requests as possible (launching one as soon as the previous one is answered) over 50 parallel connections. Now the frameworks handle each request properly, so the error rate is zero for all of them. Again, to be fair, I decreased the number of simultaneous connections until every framework had a zero error rate, and went no lower (so this is the maximum number of connections for which all error rates are zero).

Round 5: 50 simultaneous connections with wrk



Requests per second
(Higher is better)
Errors
(Lower is better)

Latency (s)
(Lower is better)


Results table

Stack                 Requests/s   Errors   Avg latency (s)
Django+Meinheld           603.07        0   0.07977
Django+uWSGI              603.38        0   0.07958
Flask+Meinheld            623.85        0   0.07705
Flask+uWSGI               628.58        0   0.07655
API-Hour                 3033.17        0   0.0161
API-Hour w/o Nginx       3610.96        0   0.01398


(Bonus) Round 6: 10 requests/s with wrk2 during 30 seconds



Not really representative of a production environment, this test only validates that AsyncIO remains interesting even under a small load.
In this round, I got no errors and all frameworks handled 10 requests/s.


Latency (s)
(Lower is better)
Results table

Stack                 Requests/s   Errors   Avg latency (s)
Django+Meinheld               10        0   0.02142
Django+uWSGI                  10        0   0.02083
Flask+Meinheld                10        0   0.01912
Flask+uWSGI                   10        0   0.01896
API-Hour                      10        0   0.00783
API-Hour w/o Nginx            10        0   0.00855


Conclusion

As demonstrated in my previous benchmark, API-Hour rocks.
Meinheld/uWSGI+Nginx help increase performance and reduce the error rate for Python sync frameworks, but the internal architecture of your application has more impact on performance than swapping an external component.

For API-Hour, it isn’t a good idea to put Nginx in front as a reverse proxy, because you add latency and reduce performance, as you can see in round 5.
With API-Hour, you can use a subdomain to serve your static files with Nginx.
If a subdomain is not an option, you can also route static traffic with HAProxy and a specific URL prefix (a folder), but this would probably impact latency a bit, since HAProxy has to inspect each request to extract the URL and apply a regexp to decide where to route it.
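The subdomain option amounts to a dedicated Nginx server block for static files, completely out of the application’s path. A minimal sketch, with a hypothetical subdomain and an illustrative document root:

```nginx
# static.example.com serves files directly; the application domain
# talks to API-Hour without any proxy in between.
server {
    listen 80;
    server_name static.example.com;   # hypothetical subdomain

    location / {
        root /var/www/static;         # illustrative path to the static assets
        expires 1h;                   # let clients cache static files
    }
}
```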


As a side note, I’d say that making technically sound and fair benchmarks is not easy: you don’t just point some tool at all the tested frameworks and present the figures.
Moreover, interpreting the results requires that you precisely understand what you’re measuring. As you have seen above, nasty side effects can creep in and produce weird artifacts, or even create unfair results for some frameworks. On many occasions, I had to pull out Wireshark and dissect what was on the wire to understand what was really going on.


I tried my best in these two articles to make these tests as error-free, honest and fair as possible, while also integrating some clever (non-trolling) remarks I received from some of you. Again, the method is explained, and all sources (as well as my help if needed) are publicly available if you think other factors or settings should be taken into account.
I’d welcome any relevant remarks or substantiated claims, just as I’ll dismiss any ungrounded, troll-style, non-constructive ones.