

Lifting The Curse of Apollo For Benchmarking

Aug 9, 2016 | 5 MIN READ

Here at Azul, a large part of our business is improving the performance of Java applications through improvements to the JVM. The key to determining where improvements can be made is through accurate measurement of how a system performs. This is far from simple due to at least two major factors:

  1. Having repeatable tests so that the effect of changes can be measured accurately. For this we use benchmarks. Much has been written about the merits of benchmarks in different forms: micro, synthetic, kernel and so on, especially in terms of what constitutes a real-world test of an application.
  2. Ensuring that the benchmarking tests we use are measuring the right thing. This may seem like an obvious requirement, but is something that often gets overlooked as a result of subtle effects in a benchmark.

Recently one of my colleagues, Nitsan Wakart, contributed some changes to the Apache Cassandra Stress tests, which he documents in detail here. These changes address an issue in the way the stress tests measure performance.

The need for these changes demonstrates how difficult it is to get benchmarking right. Let’s look at the key points here:

Response time = wait time + service time:

On a system that is being swamped with requests, service time should reach a plateau as the system processes as many concurrent requests as it can. If the rate at which requests arrive exceeds the maximum rate at which the system can process them, response time will keep growing as the queue of requests waiting to be processed lengthens.
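To make the decomposition concrete, here is a minimal sketch (illustrative only, not code from the Cassandra stress tool) of a saturated single-server queue. Requests arrive faster than they can be serviced, so wait time, and with it response time, grows without bound while service time stays fixed:

```java
// Hypothetical sketch: response time = wait time + service time on a
// saturated single-server queue. All numbers are illustrative.
public class ResponseTimeDecomposition {
    // Wait time for each of n requests when requests arrive every
    // intervalMs but each one needs serviceMs (> intervalMs) to process.
    static long[] waits(int n, long intervalMs, long serviceMs) {
        long[] waitMs = new long[n];
        long serverFreeAt = 0;                       // when the server next goes idle
        for (int i = 0; i < n; i++) {
            long arrival = i * intervalMs;           // when request i arrives
            long start = Math.max(arrival, serverFreeAt); // when it actually starts
            waitMs[i] = start - arrival;             // time spent queued
            serverFreeAt = start + serviceMs;
        }
        return waitMs;
    }

    public static void main(String[] args) {
        // Arrivals every 1 ms, service takes 2 ms: wait time grows steadily,
        // so response time (wait + 2 ms of service) grows with it.
        for (long w : waits(5, 1, 2)) {
            System.out.println("wait=" + w + "ms response=" + (w + 2) + "ms");
        }
    }
}
```

Each request queues one millisecond longer than the one before it; the service time never changes, but the response time climbs anyway.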

The way the Cassandra Stress test worked in rate mode is common to many benchmarking frameworks: it measured how long it took M threads to make N calls per second. The problem with this is that each thread sends requests synchronously. If a thread is blocked waiting for a response, it cannot send another request. This is not how the vast majority of systems work in real life; users don’t wait for other users to complete their requests before sending their own. Gil Tene, our CTO, has given this type of benchmarking problem its own name: coordinated omission (you can find a lot more detail about this in Gil’s talk on “How Not To Measure Latency”).

To put the problem another way: when testing in rate mode at, say, 1,000 requests per second, what you want is one request submitted every millisecond. If the system under test takes longer than 1 ms to respond to a request, the rate at which requests are submitted will clearly not be what is desired.
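A coordinated-omission-corrected load generator fixes this by measuring latency from each request's *intended* send time on the fixed schedule, rather than from whenever the blocked thread finally got around to sending. The sketch below (a hypothetical illustration, with a simulated 3 ms call, not the actual stress-tool code) shows how the two measurements diverge once the generator falls behind schedule:

```java
// Hypothetical sketch of coordinated omission: a single synchronous
// thread trying to hold a 1,000 req/s schedule against 3 ms responses.
public class ConstantRateSketch {
    // Corrected latency: measured from the intended send time on the
    // fixed schedule, not from the actual (late) send time.
    static long correctedLatencyNanos(long intendedSendNanos, long doneNanos) {
        return doneNanos - intendedSendNanos;
    }

    public static void main(String[] args) {
        long intervalNanos = 1_000_000;   // 1,000 req/s => one request per ms
        long serviceNanos  = 3_000_000;   // simulated 3 ms per synchronous call
        long actualSend = 0;
        for (int i = 0; i < 3; i++) {
            long intended = i * intervalNanos;          // scheduled send time
            actualSend = Math.max(intended, actualSend); // thread was blocked
            long done = actualSend + serviceNanos;
            long naive = done - actualSend;             // always reports 3 ms
            long corrected = correctedLatencyNanos(intended, done);
            System.out.printf("req %d: naive=%dms corrected=%dms%n",
                    i, naive / 1_000_000, corrected / 1_000_000);
            actualSend = done;                          // can't send until done
        }
    }
}
```

The naive measurement happily reports 3 ms for every request, while the corrected latency climbs (3 ms, 5 ms, 7 ms, …) because each request is sent further and further behind schedule; that gap is exactly what coordinated omission hides.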

In addition to changing how the Cassandra stress tests apply a constant rate of requests, Nitsan has also added the ability to generate output files that are suitable for use with HdrHistogram (a high-definition histogram, which is especially well suited to representing response-time data). They say, “A picture is worth a thousand words”; in this case, an HdrHistogram is worth a thousand lines of code when analysing performance.
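To show why percentile views matter for latency data, here is a simplified, self-contained stand-in for what a histogram reports (the real HdrHistogram library keeps a compact bucketed histogram with fixed precision rather than a sorted array; the latency numbers here are made up):

```java
import java.util.Arrays;

// Simplified illustration of percentile reporting on latency samples.
// HdrHistogram does this with bounded memory and configurable precision;
// this sketch just sorts the raw samples.
public class PercentileSketch {
    // Value at the given percentile of the recorded latencies.
    static long valueAtPercentile(long[] latencies, double percentile) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil((percentile / 100.0) * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Nine "normal" responses around 300 us and one 5 ms outlier.
        long[] latenciesMicros = {250, 300, 275, 310, 5_000, 290, 280, 305, 295, 285};
        System.out.println("p50=" + valueAtPercentile(latenciesMicros, 50.0) + "us");
        System.out.println("p99=" + valueAtPercentile(latenciesMicros, 99.0) + "us");
    }
}
```

The median looks healthy (290 µs) while the 99th percentile exposes the 5 ms outlier, which is exactly the kind of tail behaviour that averages bury and a percentile plot makes obvious.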

The mythological Cassandra’s curse was to never be believed. With Nitsan’s changes, this curse has been lifted from the benchmarks generated by the non-mythological Cassandra load generator.

© Azul 2021 All rights reserved.