Lifting The Curse of Apollo For Benchmarking
Aug 9, 2016 | 5 MIN READ
Here at Azul, a large part of our business is improving the performance of Java applications through improvements to the JVM. The key to determining where improvements can be made is accurate measurement of how a system performs, and that is far from simple.
Recently one of my colleagues, Nitsan Wakart, contributed some changes to the Apache Cassandra Stress tests, which he documents in detail here. These changes address an issue in the way the stress tests measure performance.
The need for these changes demonstrates how difficult it is to get benchmarking right. Let’s look at the key points here:
Response time = wait time + service time:
On a system that is being swamped with requests, service time levels off as the system processes as many concurrent requests as it can. If the rate of incoming requests exceeds the maximum rate at which the system can process them, response time will continue to increase as the queue of new requests waiting to be processed grows.
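To make this concrete, here is a minimal sketch (with made-up arrival and service times, not measurements from any real system) of a single-server queue: when requests arrive faster than they can be served, service time stays flat while wait time, and therefore response time, grows with every request.

```java
public class ResponseTimeSketch {
    // Single-server queue: requests arrive every arrivalMs and each takes
    // serviceMs to process. Returns the response time (wait + service)
    // for each of the first n requests.
    static double[] responseTimes(int n, double arrivalMs, double serviceMs) {
        double[] out = new double[n];
        double serverFreeAt = 0.0; // the time at which the server next becomes idle
        for (int i = 0; i < n; i++) {
            double arrival = i * arrivalMs;
            double start = Math.max(arrival, serverFreeAt); // queue if the server is busy
            out[i] = (start - arrival) + serviceMs;         // wait time + service time
            serverFreeAt = start + serviceMs;
        }
        return out;
    }

    public static void main(String[] args) {
        // 1,000 requests/second arriving, but each takes 2 ms to serve:
        // response time climbs by 1 ms per request while service time stays at 2 ms.
        double[] r = responseTimes(5, 1.0, 2.0);
        for (int i = 0; i < r.length; i++) {
            System.out.printf("request %d: response=%.1f ms%n", i, r[i]);
        }
    }
}
```

With these numbers the response times come out as 2, 3, 4, 5, 6 ms: the 2 ms of service time is constant, and the growing remainder is pure wait time.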
The way the Cassandra Stress test worked in rate mode is common to many benchmarking frameworks: it measured how long it took M threads to make N calls per second. The problem with this is that each thread sends requests synchronously. If a thread is blocked waiting for a response, it cannot send another request. This is not how the vast majority of systems work in real life; users don’t wait for other users to complete their requests before sending their own. Gil Tene, our CTO, has given this type of benchmarking problem its own name: coordinated omission (you can find a lot more detail about this in Gil’s talk on “How Not To Measure Latency”).
To put the problem another way: when testing in rate mode at, say, 1,000 requests per second, what you want is one request submitted every millisecond. If the system under test takes longer than 1 ms to respond to a request, the rate at which requests are submitted will clearly not be what was intended.
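A small simulation makes the omission visible. The sketch below (with a made-up workload: 1 ms responses and a single 100 ms stall, not data from any real Cassandra run) compares the latencies a synchronous load-generator thread actually records against latencies measured from the time each request *should* have been sent at the intended rate:

```java
import java.util.Arrays;

public class CoordinatedOmissionSketch {
    // One synchronous load-generator thread intends to send a request every
    // intervalMs; serviceMs[i] is the time the system under test took to
    // serve request i. Returns { naive latencies, corrected latencies }:
    // "naive" is what the blocked thread records (service time only);
    // "corrected" measures from the intended send time.
    static double[][] latencies(double[] serviceMs, double intervalMs) {
        int n = serviceMs.length;
        double[] naive = new double[n];
        double[] corrected = new double[n];
        double prevDone = 0.0;
        for (int i = 0; i < n; i++) {
            double intended = i * intervalMs;
            double actualSend = Math.max(intended, prevDone); // thread may still be blocked
            double done = actualSend + serviceMs[i];
            naive[i] = serviceMs[i];
            corrected[i] = done - intended; // includes the time spent waiting to be sent
            prevDone = done;
        }
        return new double[][] { naive, corrected };
    }

    public static void main(String[] args) {
        // Twenty requests at 1 request/ms: nineteen take 1 ms, one stalls for 100 ms.
        double[] service = new double[20];
        Arrays.fill(service, 1.0);
        service[10] = 100.0;
        double[][] r = latencies(service, 1.0);
        System.out.printf("naive mean:     %.2f ms%n", Arrays.stream(r[0]).average().getAsDouble());
        System.out.printf("corrected mean: %.2f ms%n", Arrays.stream(r[1]).average().getAsDouble());
    }
}
```

In this run the naive mean is 5.95 ms while the corrected mean is 50.5 ms: the synchronous thread silently omits the requests that queued up behind the stall, which is exactly the coordination Gil’s term describes.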
In addition to modifying how the Cassandra stress tests apply a constant rate of requests, Nitsan has also added the ability to generate output files suitable for use with HdrHistogram (a High Dynamic Range histogram, which is especially well suited to representing response time data). They say, “A picture is worth a thousand words”; in this case, an HdrHistogram is worth a thousand lines of code when analysing performance.
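Why percentiles rather than a single average? A toy stand-in illustrates the point. This is not HdrHistogram’s API — HdrHistogram records values into compact buckets with constant-time recording — but a simple nearest-rank percentile over a sorted, made-up sample shows what a percentile distribution reveals that a mean conceals:

```java
import java.util.Arrays;

public class PercentileSketch {
    // Nearest-rank percentile over a sorted sample. HdrHistogram produces
    // the same kind of figure with far less memory; this is just the idea
    // in miniature.
    static double percentile(double[] sortedMs, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMs.length);
        return sortedMs[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        // Hypothetical sample: 98% of responses take 1 ms, 2% stall for 250 ms.
        double[] latencies = new double[1000];
        Arrays.fill(latencies, 1.0);
        for (int i = 980; i < 1000; i++) latencies[i] = 250.0;
        Arrays.sort(latencies);
        double mean = Arrays.stream(latencies).average().getAsDouble();
        // The mean hides the tail; the high percentiles expose it.
        System.out.printf("mean=%.2f ms, p50=%.1f ms, p99=%.1f ms, p99.9=%.1f ms%n",
                mean, percentile(latencies, 50),
                percentile(latencies, 99), percentile(latencies, 99.9));
    }
}
```

Here the mean is under 6 ms while the 99th percentile is 250 ms, which is the difference between a benchmark that looks healthy and one that shows the stalls your users actually feel.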
The mythological Cassandra’s curse was to never be believed. With Nitsan’s changes, this curse has been lifted from the benchmarks generated by the non-mythological Cassandra load generator.