Tuesday, 26 January 2016

Python-FOSDEM devroom+dinner 2016

If you are coming to Brussels for FOSDEM, join us this Saturday.

FOSDEM is a free event for software developers to meet, share ideas and collaborate.
Every year, 6500+ developers of free and open source software from all over the world gather at the event in Brussels.

We also have a stand with special T-shirts in the H building.

And to finish this great day, we are organizing a dinner.
If you are interested, please sign up here:

Sunday, 19 July 2015

EuroPython 2015 schedule for AsyncIO curious/newcomers

Hello everybody,

If you are at EuroPython 2015 and you plan to use AsyncIO, or you are just curious, you might be interested in:

Moreover, depending on who is present during the code sprints, we may sprint on AsyncIO and/or aiohttp. A code sprint on Panoramisk (an Asterisk binding for AsyncIO) is also planned.

Have a nice EuroPython 2015 ;-)

Monday, 27 April 2015

TechEmpower FrameworkBenchmarks round 10: The Python results

TechEmpower FrameworkBenchmarks round 10 results are available.

What is TechEmpower FrameworkBenchmarks?

As explained on their website:
This is a performance comparison of many web application frameworks executing fundamental tasks such as JSON serialization, database access, and server-side template composition. Each framework is operating in a realistic production configuration. Results are captured on Amazon EC2 and on physical hardware. The test implementations are largely community-contributed and all source is available at the GitHub repository.
You can read the technical description of each test on their website.

Goal of this blogpost

I'm focusing on the results for the Python ecosystem, and especially for AsyncIO+aiohttp.web+API-Hour, because it's the first time an independent organization has publicly benchmarked AsyncIO on several use cases with a macro-benchmark that involves a complete system stack (network+Linux+databases), closer to a production environment.

/!\ /!\ /!\ Warning - Warning - Warning /!\ /!\ /!\

As explained several times, do not rely on benchmark results alone to decide on the architecture of your applications, especially micro-benchmarks:
  1. You must understand at least roughly what is being tested, to be sure you are in the same use case as the benchmark.
  2. Benchmark by yourself, with your own needs: it will help you better understand the interactions between the layers, and maybe discover that your assumptions about efficient architectures are false.
  3. Benchmark your complete stack: sometimes the biggest bottlenecks in your complete stack are not in your Python code. You can optimize a big loop with Cython, but if your database schema contains a lot of data and is badly designed, you'll spend more time waiting for your database than in that loop (see the sketch just after this list).
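To make point 3 concrete, here is a minimal sketch (not from the API-Hour repository; the connection parameters, table name and loop are made-up examples) of how you can check where the time really goes before optimizing anything:

    # Hypothetical timing check: compare the time spent waiting for the
    # database with the time spent in a pure-Python loop, before deciding
    # what to optimize.
    import time

    import psycopg2  # assumes a reachable PostgreSQL instance


    def timed(label, func):
        start = time.perf_counter()
        result = func()
        print(f"{label}: {time.perf_counter() - start:.3f}s")
        return result


    conn = psycopg2.connect(dbname="benchmark", user="benchmark")  # made-up DSN


    def run_query():
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM agents")  # hypothetical table
            return cur.fetchall()


    def run_loop():
        return sum(i * i for i in range(10_000_000))


    timed("database query", run_query)
    timed("pure-Python loop", run_loop)

If the first number dwarfs the second, rewriting the loop in Cython won't change much.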

The good results

Note: Click on the images to see the results directly on the FrameworkBenchmarks website.

IMPORTANT: FrameworkBenchmarks isn't only for Python; you can compare web frameworks in other programming languages. Check the main website: https://www.techempower.com/benchmarks/

Unsurprisingly, the database benchmarks are good for AsyncIO+aiohttp.web+API-Hour:

Data updates


Fortunes


Multiple queries



The average result

With fewer database interactions, API-Hour's results are in line with the average of the other web frameworks.

Single query


The bad results

The benchmarks without database interactions are bad.

JSON serialization


Plaintext

Why are these benchmarks worse than the ones published on my blog?

The difference between my DB benchmarks and the DB benchmarks in FrameworkBenchmarks can be explained by the fact that, in my benchmarks, I use a bigger payload and a more complex SQL query than FrameworkBenchmarks does.
I did this because, as explained above, it is important to me to be as close as possible to a production scenario when benchmarking. A realistic SQL query takes more time than a trivial SELECT, and a framework therefore has a serious advantage if it can handle several other things while waiting for the query result. Hence AsyncIO can better show its power there.
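To illustrate why, here is roughly what an AsyncIO database endpoint looks like: while one request is suspended on its query, the event loop keeps serving other connections. This is a minimal sketch, not the actual FrameworkBenchmarks or API-Hour code; it is written against current aiohttp/aiopg APIs rather than the 2015-era "yield from" style, and the DSN, table and port are made up.

    import aiopg
    from aiohttp import web

    async def agents(request):
        # The coroutine is suspended during the round-trip to PostgreSQL,
        # which frees the event loop to handle other requests meanwhile.
        async with request.app["db"].acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT id, name FROM agents")  # hypothetical table
                rows = await cur.fetchall()
        return web.json_response([{"id": r[0], "name": r[1]} for r in rows])

    async def init_app():
        app = web.Application()
        app["db"] = await aiopg.create_pool("dbname=benchmark user=benchmark")  # made-up DSN
        app.router.add_get("/agents", agents)
        return app

    if __name__ == "__main__":
        web.run_app(init_app(), port=8000)

The longer the query takes, the more useful that suspension becomes, which is why a realistic query favours AsyncIO more than a trivial SELECT does.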
For round 11, TechEmpower should add a more realistic test, with a bigger payload and more DB interactions.

Conclusion

Nevertheless, for a first participation, I'm really happy with the results. We now have one month to improve the results for round 11; if you want to help, send me an e-mail.

My goal with these benchmarks isn't only to find bottlenecks in the Python stack I use every day, but also to make sure that this stack stays efficient over time.
Most stacks become bloatware over time.

Future improvements

To increase performance a bit in the slowest tests, HTTP pipelining in aiohttp should be improved, but as it stands today this might have positive effects in some cases (such as multiplexed HTTP requests) while bringing negative side effects in others (such as old-fashioned HTTP requests or keep-alive requests).

But even with better HTTP pipelining support, this test probably won't give very good results for a pure JSON provider.
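For reference, the TechEmpower JSON test boils down to serializing a tiny object per request, so there is no I/O for the event loop to overlap with. A minimal aiohttp handler for it looks roughly like this (a sketch, not the submitted test implementation):

    from aiohttp import web

    async def json_test(request):
        # Pure CPU work, no database or network wait: exactly the situation
        # where AsyncIO has no concurrency to exploit.
        return web.json_response({"message": "Hello, World!"})

    app = web.Application()
    app.router.add_get("/json", json_test)

    if __name__ == "__main__":
        web.run_app(app, port=8000)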

I'll also reorganize the tests to include more variants, such as several event loops and more databases.

For the next round, several improvements are on the way:
  1. The new aiohttp release has several speed improvements.
  2. An alternative event loop like aiouv might help, even if it isn't really the case for now; some improvements are still necessary.
  3. A potential AsyncIO booster written in C is being discussed on the Python mailing list.
  4. AsyncIO support in the next Cython release.
  5. During the Python Language Summit 2015, several presentations touched on improving the efficiency of Python:
    1. Making Python 3 more attractive
    2. Python 3 adoption
    3. Atomicity of operations
    4. PyParallel
  6. PyPy3: with a patched version of AsyncIO and the Py3.3 branch of PyPy3, it's possible to execute some FrameworkBenchmarks tests. For now, PyPy3 is slower than CPython; nevertheless, you must understand that the current goal of the PyPy developers is to have a working version of PyPy3 that supports Python 3.3, not to improve efficiency. You can donate to the PyPy3 project.
However, don't expect a magic change in the results for the next round; the efficiency problem in the Python ecosystem isn't new.
Nevertheless, to my knowledge, no other interpreted language has as many solutions for boosting code as Python does, and I hope that more solutions will emerge in the future.

Special thanks

Even if I was pleased to get some attention for these benchmarks, and even if I sometimes had heated exchanges/ping-pong "fights" (another blog post should be published about that) on mailing lists or during PyCons, I won't forget that these results wouldn't have been possible without the Python community.

Thanks a lot to everybody for your help, especially:
  1. The Ukrainian (+Russian?) dream team: sorry to reduce you to your nationality/language, but you're over-represented in the AsyncIO community, especially in aiohttp ;-): aiohttp/aio-libs helped a lot to boost performance, and I received a lot of excellent advice from Andrew Svetlov, Nikolay Kim, Alexander Shorin, Alexey Popravka, Yury Selivanov and all the others.
  2. Victor Stinner for AsyncIO improvements, help, and benchmark tests.
  3. Antoine Pitrou for CPython optimizations like --with-computed-gotos and for having accepted the AsyncIO PEP.
  4. Inada Naoki for all the tips and for challenging me on my benchmarks.
  5. Benoit Chesneau for his help integrating Gunicorn into API-Hour.
  6. Stéphane Wirtel and Fabien Benetou for helping me promote Python in Belgium and for their support.
  7. All the people I've forgotten in this list, especially AsyncIO library maintainers like Saúl Ibarra Corretgé (aiodns, aiouv), Jonathan Slenders (asyncio_redis), Gael Pasgrimaud (Panoramisk, irc3), Ron Frederick (asyncssh) and Tin Tvrtković (aiofiles, pytest-asyncio).
  8. Guido van Rossum, because maintaining a community the size of Python's ecosystem requires political skills as much as technical skills, and also for having implemented and pushed AsyncIO onto the Python scene.
  9. My tech team at ALLOcloud/Eyepea, especially Nicolas Stein and Sylvain Francis, for challenging me every day and for being the right place for technical innovation in Europe: almost no management, no "reserved areas" in our tech team (everybody must have some of the others' skills: the dev team must have their hands in the sysadmin infrastructure, the telco team must be able to edit the dev team's source code), and the tech team's trust in me: I changed their toolbox for AsyncIO, and even though we are a small team, in retrospect they were very open to changing their habits and improving the toolbox, instead of blocking innovations just to stay in their comfort zone. A working environment like that is precious and should be more mainstream.
  10. Last but not least, my family, for their unconditional support.

I'm proud to be an atom inside the Python community ;-)

And don't forget

The IT world has a bias that Python is slow because you (as a member of the Python community) believe it is slow: in absolute values on micro-benchmarks, it might be true. But with some tips and tricks, it can easily be more than fast enough for most production use cases in companies, AND with Python you keep the simplicity of coding.
Be honest with yourself: does it make sense to compare yourself with very large players such as Google or Facebook? Most IT companies are far from having the same constraints as them, or the same human resources to address them. The main thing everyone has in common with them is "time to market". Therefore, the simplicity of coding your business logic carries much more weight in your case.
Developer speed really is more important than program performance, but unfortunately there are no benchmarks to measure that ;-).

Wednesday, 11 March 2015

Benchmark Python Web production stack: Nginx with uWSGI, Meinheld and API-Hour

Disclaimer: If you have some biases against and/or dislike AsyncIO, please read my previous blog post before starting a war.


Tip: If you don't have time to read the text, scroll down to see the graphs.


Summary of previous episodes



After the publication of “Macro-benchmark with Django, Flask and AsyncIO (aiohttp.web+API-Hour)”, I received a lot of remarks; here is a synthesis:


  1. "It's impossible, you changed the numbers / you didn't measure the right values / …": Come on, people: if you don't believe me, test it yourself. I've published as much information as possible in the API-Hour repository so that the results are reproducible by others. Don't hesitate to ask me if you have trouble reproducing them.
  2. "Changing kernel parameters is a cheat": No, it isn't a cheat; most production applications recommend doing this, not only for benchmarks. Examples:
    1. PostgreSQL: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, shared_buffers config (BTW, I forgot to push my kernel settings for PostgreSQL, they are now in the repository)
    2. Nginx: http://wiki.nginx.org/SSL-Offloader, "preparation" section
  3. "Daemons were all in debug mode": All daemons were impacted; I've disabled it. I relaunched the localhost benchmark on the /agents endpoint and got almost the same values, certainly because I had already disabled logging globally in my benchmarks.
  4. "You should disable middlewares in Django": In production you would keep them, but nevertheless I've disabled them to be fair with the other frameworks, where I don't use middlewares.
  5. "wrk/wrk2 aren't good tools to benchmark HTTP, they can hit too hard": It's the goal of a benchmark to hit as hard as possible, isn't it? FYI, almost all serious benchmark reports on the Web use wrk. As framework performance in general is increasing, the tools used to challenge them have to hit harder to bring out the differences.
  6. "Keep-alive isn't enabled for Flask or Django" / "Nobody uses Flask or Django alone in prod, you must use Nginx and uWSGI/Meinheld": No problem: you'll find below a new series of benchmarks based on these remarks. Nginx is configured to avoid being a bottleneck for the Python daemons.


In this article, I'll test three scenarios:
  1. A resource-limit use case: 4000 requests/s with wrk2 for 5 minutes
  2. A standard use case: 50 simultaneous connections with wrk for 5 minutes
  3. A slow use case: 10 requests/s with wrk2 for 30 seconds


To be closer to a production scenario, I'll test only over the network, and only the agents list endpoint, which uses a database connection, as described in my previous article.


I test 6 architectures:
  1. Django+Meinheld+Nginx
  2. Django+uWSGI+Nginx
  3. Flask+Meinheld+Nginx
  4. Flask+uWSGI+Nginx
  5. API-Hour+Nginx
  6. API-Hour without Nginx


As you can see, all of them are behind an Nginx server, except the last one, which serves as a control.


As usual, you can find config files in API-Hour repository: https://github.com/Eyepea/API-Hour/tree/master/benchmarks


Round 4: 4000 requests/s with wrk2



[Chart: Requests per second (higher is better)]
[Chart: Latency in seconds (lower is better)]

As you can see, API-Hour without Nginx handles fewer requests per second; moreover:

[Chart: Errors (lower is better)]

API-Hour without Nginx shows a lot of latency compared to the other solutions, but see the explanation below.
At first sight, you seem to handle more requests with Nginx, and you even get less latency. This last point is intriguing: how could you have less latency with API-Hour+Nginx than with API-Hour alone?
After careful examination, I saw a lot of 404 responses on the wire. My understanding is that Nginx returns a 404 if the framework behind it doesn't answer fast enough. A 404 is a response for wrk, and one that comes quickly (the Nginx timeout to the backend is short), so wrk can immediately launch another request. Hence you see more requests and less latency with Nginx, but (contrary to the control, API-Hour w/o Nginx) many requests have in fact not been answered properly.


Results table

                     Requests/s   Errors    Avg Latency (s)
Django+Meinheld      3992.68      1031238   0.121
Django+uWSGI         3991.96      1029213   0.072
Flask+Meinheld       3991.43      1024192   0.111
Flask+uWSGI          3994.09      1021953   0.066
API-Hour             3994.96      312600    0.043
API-Hour w/o Nginx   3646.15      0         9.74


To avoid these artefacts, round 5 doesn't try to "force-feed" the frameworks with requests. Instead, it makes as many requests as possible (launching one as soon as the previous one was answered) over 50 parallel connections. Now the frameworks work properly on each request, so the error rate is zero for all of them. Again, to be fair, I decreased the number of simultaneous connections until all frameworks had a zero error rate, and did not go any lower (thus this is the maximum number of connections for which all error rates are zero).
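For the curious, here is a rough Python illustration of that closed-loop pattern (this is not how wrk works internally; the host, port and path are hypothetical, and wrk ran for 5 minutes, not 10 seconds):

    import asyncio
    import time

    import aiohttp

    URL = "http://benchmark-host:8000/agents"  # hypothetical target
    CONNECTIONS = 50
    DURATION = 10  # seconds, shortened for the example

    async def worker(session, stats, deadline):
        # Closed loop: the next request is sent only after the previous
        # one has been fully answered on this connection.
        while time.monotonic() < deadline:
            async with session.get(URL) as resp:
                await resp.read()
                stats[resp.status] = stats.get(resp.status, 0) + 1

    async def main():
        stats = {}
        deadline = time.monotonic() + DURATION
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(*(worker(session, stats, deadline)
                                   for _ in range(CONNECTIONS)))
        print(stats)  # status code -> count; only 200s means a zero error rate

    asyncio.run(main())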

Round 5: 50 simultaneous connections with wrk



[Chart: Requests per second (higher is better)]
[Chart: Errors (lower is better)]
[Chart: Latency in seconds (lower is better)]


Results table

                     Requests/s   Errors   Avg Latency (s)
Django+Meinheld      603.07       0        0.07977
Django+uWSGI         603.38       0        0.07958
Flask+Meinheld       623.85       0        0.07705
Flask+uWSGI          628.58       0        0.07655
API-Hour             3033.17      0        0.0161
API-Hour w/o Nginx   3610.96      0        0.01398


(Bonus) Round 6: 10 requests/s with wrk2 for 30 seconds



Not really representative of a production environment, this test only validates that AsyncIO is interesting even under a small load.
In this round, I got no errors and all frameworks handled 10 requests/s.


[Chart: Latency in seconds (lower is better)]
Results table

                     Requests/s   Errors   Avg Latency (s)
Django+Meinheld      10           0        0.02142
Django+uWSGI         10           0        0.02083
Flask+Meinheld       10           0        0.01912
Flask+uWSGI          10           0        0.01896
API-Hour             10           0        0.00783
API-Hour w/o Nginx   10           0        0.00855


Conclusion

As demonstrated in my previous benchmark, API-Hour rocks.
Meinheld/uWSGI+Nginx help to increase performance and reduce the error rate for synchronous Python frameworks, but the internal architecture of your application has more impact on performance than changing an external component.

For API-Hour, it isn't a good idea to put Nginx in front as a reverse proxy, because you add latency and reduce performance, as you can see in round 5.
With API-Hour, you can use a subdomain to serve your static files with Nginx.
If a subdomain is not an option, you can also route the static traffic with HAProxy and a specific URL (a folder), but this would probably impact latency a bit, as HAProxy has to open each request to get the URL and apply a regexp to know where to route it.


As a side note, I'd say that making technically sound and fair benchmarks is not easy: you don't just point some tool at all the tested frameworks and present the figures.
Moreover, interpreting the results requires that you precisely understand what you're measuring. As you have seen above, some nasty side effects can creep in and produce weird artefacts, or even create unfair results for some frameworks. On many occasions, I had to pull out Wireshark and dissect what was on the wire to understand what was really going on.


I tried my best in these two articles to make these tests as error-free, honest and fair as possible, while also integrating some of the clever (non-trolling) remarks I received from some of you. Again, the method is explained and all sources (as well as my help if needed) are publicly available if you think other factors or settings should be taken into account.
I'd welcome any relevant remarks or substantiated claims, just as I'll dismiss any ungrounded, troll-style, non-constructive ones.