Azul’s Platform Prime is based on OpenJDK but uses a modified JVM to deliver lower latency and higher throughput. This video will explain what changes have been made to improve the performance of the garbage collector and Just-in-time (JIT) compiler. This includes how ReadyNow warmup elimination technology can make applications perform at peak performance right from the start.
Video Transcript
Hello and welcome to this overview of Azul Platform Prime. My name is Simon Ritter. I’m the Deputy CTO at Azul. If we look at the way that the Java virtual machine works, we’ll find that from a performance perspective, we typically see a graph like this.
A JVM-based application, whether it’s written in Java or Scala or Kotlin, is compiled into bytecodes. And those bytecodes are the instructions for a virtual machine, hence the Java virtual machine. What that means is that in order for them to execute on a particular platform, whether it’s Windows or Linux, Intel or ARM, those virtual instructions need to be converted into the physical instructions for that machine. And that can take time. As we can see from the graph here, when you start up your application, we run in interpreted mode. Each bytecode is converted individually into the instructions for the platform that we’re running on. That’s quite slow. In order to speed things up, we count how many times particular pieces of code, called methods, are executed. We keep track of that so that when we find particular methods that are being used very frequently, rather than executing the bytecodes in them individually, we take that whole method and we pass it to a compiler that’s running at the same time as the application. The bytecodes within the method can then be compiled into native instructions and run without having to use the interpreter. That delivers much better performance.
We do that in two stages. There are two different, what are called, just-in-time compilers or JIT compilers. The first of those will compile bytecodes very quickly into native instructions, but it won’t apply much in the way of optimization. The second compiler takes longer to compile code, but uses profiling information that we’ve collected as the code is running to understand how it’s being used. And then it can generate much more heavily optimized code, which gives us even better performance.
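The warm-up curve described above can be observed with a small experiment. The following is a minimal sketch, not part of the original talk: the class name `WarmupDemo`, the method names, and the batch sizes are all illustrative assumptions. It times repeated batches of calls to one small method; on a typical JIT-enabled JVM the early batches tend to be slower than the later ones, although the exact numbers depend on the JVM, its flags, and the hardware.

```java
public class WarmupDemo {
    // A small, frequently called method: a candidate for JIT compilation
    // once the JVM has seen it executed often enough.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Runs one batch of calls and returns the elapsed wall-clock time in nanoseconds.
    static long timeBatch(int calls, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the work cannot be discarded as dead code.
        if (sink == 42) {
            System.out.println("unexpected checksum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Batch count and sizes are arbitrary choices for illustration.
        for (int batch = 1; batch <= 10; batch++) {
            long nanos = timeBatch(20_000, 1_000);
            System.out.printf("batch %2d: %d us%n", batch, nanos / 1_000);
        }
    }
}
```

On HotSpot-based JVMs, running this with the `-XX:+PrintCompilation` flag should also list the methods as they are compiled, which makes the transition from the interpreter to compiled code visible.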
As we can see from the graph, over time we’ll compile more and more of the methods that are being used frequently until we get to a point where all the frequently used methods have been compiled and recompiled with the maximum level of optimization. That takes time and obviously we want to try and improve that.
If we look at Platform Prime, which we’ve developed at Azul, there are a number of things that we can say about this. The first is that it is a drop-in replacement for other JVMs, meaning that you don’t have to change any of the code that you have in your application. You don’t even have to recompile it in order to use it. It’s just a replacement for the JVM, but delivering better performance. So no code changes, no recompilation required. That’s very important because it means that you don’t have to do any additional work to take advantage of these improvements. The big question then is how can we do this without you changing anything? And the answer is in several different ways.
The first of those relates to the way that memory management happens for your application. The way that the Java Virtual Machine works is to automate memory management. When objects are created, the space is allocated by the JVM in the heap, and the JVM will keep track of those objects so that when they’re no longer being used by your application, it will know that it can reclaim that space so it can be reused by other objects. That’s what’s called garbage collection. And what people often do in terms of tuning the way that the JVM works, especially from a performance perspective, is to minimize the pause times associated with garbage collection. Most garbage collection algorithms will stop the application in order to do the work of reclaiming that space in a safe way. So we need to mark all the objects that are still in use and then we may need to move objects around within the heap in order to eliminate fragmentation.
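To make garbage collection activity visible, the standard `java.lang.management` API can report how often each collector has run. The sketch below is an illustration only: the class name `GcObserver`, its method names, and the allocation sizes are assumptions chosen for the example. It creates a large number of short-lived objects and then prints the collection counts. Note that `getCollectionTime()` reports approximate accumulated collection time, which is not necessarily the same as the time the application was paused, particularly for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcObserver {
    // Allocates many short-lived arrays that become garbage immediately.
    // Returns a checksum so the allocations are actually used.
    static long churn(int rounds, int size) {
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            byte[] block = new byte[size];
            block[r % size] = 1;
            checksum += block[r % size];
        }
        return checksum;
    }

    // Sums the collection counts reported by all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long checksum = churn(50_000, 64 * 1024); // sizes are arbitrary for illustration
        long after = totalCollections();

        System.out.println("checksum = " + checksum);
        System.out.println("collections during run = " + (after - before));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, approx. time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The collector names and counts printed will differ from one JVM and one set of flags to another, so the output is best read as a relative comparison between configurations rather than as an absolute measure.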
To do that safely, as I said, most algorithms will actually pause the application. And so we end up with garbage collection pauses that interfere with the way that the application works. If we look at the Azul way of doing things, with what’s called the C4 collector, the Continuously Concurrent Compacting Collector, what we do is slightly different. We have a truly pauseless collector. We can do all of the work of the garbage collector, all of the marking, all of the relocating of objects to eliminate fragmentation, whilst the application is running, and do that completely safely so that any changes that are being made to the data in the application do not get lost or corrupted as the work of the garbage collector happens. So it is a truly pauseless garbage collector and this can lead to delivering very low latency times associated with your application. That’s a big performance improvement.
The second area where Platform Prime changes the way things work internally is the just-in-time compilation system. As I said, there are two different compilers, one that works very quickly and then one that’s doing much more heavy optimization. What we’ve done is to look at the way that works and to modify that to improve the code that’s generated by the second of these. What we do is we replace the C2 JIT compiler with one that we call Falcon. This is based on the LLVM open source project. It’s a project designed to create compiler technology, supported by many large companies and individuals. We’ve integrated that into the JVM, the Prime JVM, so that when we compile the code with heavy optimization, we can deliver even better performing code, giving you higher throughput for your application, as well as the lower latency associated with a different garbage collector. Another thing we can do is take advantage of the more modern architecture that we have in terms of the CPU that’s being used to run your application.
So if you’re using the most up-to-date CPU, then we can take advantage of the latest features and use those to deliver better performance and better throughput.
The last thing that we’ve done in terms of changing things is to look at that application warm-up time and determine how we can change that. The problem is that when you run your application you go through this graph and you analyse the methods that are being used frequently, you compile them with C1, then as they get used more frequently we recompile them with C2 or Falcon. When you restart the same application, there’s no memory of what happened before. So the JVM will have to go through exactly the same process again, identifying which code needs to be compiled, compiling it with C1, then recompiling it with C2 or Falcon and moving on from there.
What we provide as an additional part of Platform Prime is ReadyNow. And the idea here is that the first time you use your application in production, you let it run with a representative workload until you’ve got to the point where the performance has reached its optimum level. So you’ve warmed up the application, and the JVM has compiled all the necessary code for the frequently used methods. At that point we take a profile of the application and we record certain pieces of information about what’s internal to the JVM in terms of the code that’s being compiled, in terms of the classes, in terms of the methods that need to be compiled and so on. When you restart the application, you can use that profile as a way of telling the JVM what it needs to do. That way it immediately knows the methods that need to be compiled. It immediately knows that it can potentially use the code that was compiled in the previous run, because we store that as well. And that way you don’t have to go through this process of warming the application up in the same way. Suddenly, you can get much, much better performance very quickly by reusing the information about the previous run.
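The record-then-reuse workflow just described might be wired into a launcher along the following lines. This is a hedged sketch rather than official guidance: the option names `-XX:ProfileLogOut` and `-XX:ProfileLogIn` are an assumption here, based on our understanding of the ReadyNow documentation, and should be checked against the current Azul Platform Prime documentation before use. The class name `ReadyNowLaunch`, the profile file name, and the jar name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadyNowLaunch {
    // Builds a java command line. Pass null for a profile argument to omit that option.
    // ASSUMPTION: -XX:ProfileLogIn / -XX:ProfileLogOut are the ReadyNow options
    // for reading and writing a profile; verify against the Azul documentation.
    static List<String> command(String profileIn, String profileOut, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (profileIn != null) {
            cmd.add("-XX:ProfileLogIn=" + profileIn);   // reuse a profile from an earlier run
        }
        if (profileOut != null) {
            cmd.add("-XX:ProfileLogOut=" + profileOut); // record a profile during this run
        }
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        // First production run: record a profile while the application warms up.
        System.out.println(String.join(" ", command(null, "app.profile", "app.jar")));
        // Later runs: feed the recorded profile back to the JVM at startup.
        System.out.println(String.join(" ", command("app.profile", null, "app.jar")));
    }
}
```

The sketch only prints the two command lines. In a real deployment the same options would more likely be set in a startup script or service definition than built in code.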
We have a memory of what happened in earlier runs of the application. If we look at our performance graph again, we can see that if we overlay the idea of Platform Prime on top of this, we now have the dark blue graph, which is different to the one that we saw before. And it’s different in a number of ways. The first of those is that the dips in performance we saw even after the application had warmed up, which were due to garbage collection, we’ve eliminated by using the C4 algorithm. The second thing we’ve done is to improve the performance overall by saying, let’s make use of the more heavily optimized code, make use of the up-to-date features of the CPU that we’re using, and deliver better performance for that compiled code. So we raise that overall level of performance. And then the third thing we do is to eliminate a lot of this warm-up slope and make it much steeper, so we get to that optimal level of performance much more quickly by using the recorded profile of a previous run of the application and reusing it to immediately compile methods if we need to, or reusing compiled code from a previous run. So we get much, much faster warm-up of our application.
So the idea behind Azul Platform Prime is better application performance for your code without you having to make any changes, without having to recompile your code. It’s simply a drop-in replacement that delivers better performance.