Gartner Report: Which Java Runtime is for you? | DOWNLOAD NOW

Support
Blog chevron_right Performance

Understanding Java Compilation: From Bytecodes to Machine Code in the JVM

When I was at university, one of my favourite Computer Science courses was compiler theory.  Something about how you convert from a human-readable programming language to machine and operating system specific instructions seems particularly intriguing.

For the Java platform, compilation is different to many other languages because of the Java Virtual Machine (JVM).  To run an application with the JVM, Java code is compiled into a set of class files that contain instructions for the JVM, not the operating system and hardware on which the JVM is installed.  This provides the Write Once, Run Anywhere capability for which Java has been famous. 

How does this conversion from virtual machine instructions to native instructions happen?

This is not a simple question to answer, so I’ve decided to write a series of blog posts exploring the different aspects of interpreting and adaptive compilation within the JVM.

At Azul, we’re constantly looking at ways to improve the performance of JVM-based applications, and I’ll include plenty of information about the work we’ve done in the following areas:

Since we’re not the only people with ideas in this space, I’ll also look at alternative approaches like the Graal compiler (not to be confused with the Graal VM).

Let’s start with some fundamental concepts that we’ll build on in the rest of the blog series.

Source Code 

What is Source Code?

Source code is high-level statements and expressions developers write to define the application’s instructions.  We call this high-level because these types of programming languages provide strong abstractions of the operating system and hardware used to run the application. 

Source Code Example

As a simple example, if we want to sum the numbers from one to ten, we could write this in Java using a loop, one of the fundamental constructs in many languages:

public class Sum {
public static void main(String[] args) {
int sum = 0;

for (int i = 1; i <= 10; i++) {
sum += i;
}

System.out.println(sum);
}
}

This hides the complexity of how an operating system and processor work for developers.  For example, we can declare a local integer variable and give it a meaningful name, sum.  This is simpler for us to work with than using an explicit memory address.  Similarly, we can call a method in the PrintStream core library class via a reference through the System class that will print a string on whatever the standard output is for our application.  How this magically appears as characters in a terminal, which is controlled by a window manager and gets drawn on the screen via a graphics card, is not our concern.

However, our high-level code needs to be converted into a set of numeric instructions and operands that can be understood by the machine on which we run the application.

To better understand what’s involved in that conversion, we could rewrite our Sum.java example in a low-level language.  Unlike a high-level language, this does not provide abstractions but allows us to control the operating system and processor directly using instructions they understand.

For this example, we’ll assume we’re going to run our application on a Linux machine with an x64 processor.

One way we could write the loop part of our application in assembly language is shown below.  (As we’ll see later, just like in Java, there are multiple ways we could write this code to do the same thing).

section .text
global _start
_start:
mov eax, 1
mov ecx, 10
xor edx, edx
L:
add edx, eax
inc eax
dec ecx
jnz L
EL:
mov eax, 1
mov ebx, 0
int 0x80

In this code, I’ve left out the part that prints the result at the end; doing this in assembler requires a lot more code than for the loop.

As you can see, this is considerably less readable than it is in Java.  But even this is still somewhat human-readable.  If you understand basic computer architecture and the instruction set being used, you can see that most of the work involves manipulating registers and performing basic calculations.  More complex tasks can be achieved through interrupt calls such as the one at the end where we use the Linux interrupt 80H to invoke a system call to terminate the application (without which, as I learnt in writing this article, you get a segmentation fault).

Even this is too high-level for the computer hardware.  The computer just needs a stream of multi-byte words to understand which instruction to execute with which operands.  

Using an assembler and linker, we can convert the assembly code to object code and an executable.  This is generated chiefly by mapping from textual instructions like JNZ to the appropriate value (in this case, 0x75). Finally, we end up with a file that the operating system can execute.

Our executable file looks like this when dumped as a series of hexadecimal values:


0000160 0000 0020 0000 0000 0000 0000 0000 0000
0000200 01b8 0000 b900 000a 0000 d231 c201 c0ff
0000220 c9ff f875 01b8 0000 bb00 0000 0000 80cd
0000240 2e00 6873 7473 7472 6261 2e00 6574 7478
0000260 0000 0000 0000 0000 0000 0000 0000 0000

Note: this is not the complete file, just the loop execution part.

However, for our high-level Java code, we cannot map directly from the statements and expressions we use to machine instructions.

For this, we must use compilation.

Java Compilation

Generically, compilation is the process of translating source code into target code using a compiler.

As we know, the Java platform uses the JVM to run Java applications. However, the JVM is an abstract computer.  The JVM specification, which is part of the Java SE specification, defines features every JVM must have (what the JVM must do).  However, it does not specify details of the implementation of these features (how the JVM does those things).  This is the reason, for example, why there are such a variety of garbage collection algorithms available in different JVM implementations.

Part of the JVM specification is a list of bytecodes that define the instruction set of our virtual (abstract) machine.  The name bytecode comes from the fact that each operand is only one byte in size.   Of the 256 possible bytecodes, only 202 are used (with a further three reserved for JVM implementation use).  This is quite incredible when you think that the x86-64 instruction set, which appears to be very difficult to give a precise count of, is approximately a thousand.

One reason for the significant difference in instruction set size is that some of the JVM instructions perform complex tasks.  For example, invokevirtual, which invokes an instance method.  The description of this instruction in the JVM specification is five pages long.  Another reason is that the JVM does not have explicit registers and uses the stack for almost all operations.

As an aside, I’ll add a couple of interesting things I learnt while researching this post.  The first is that the Sun implementation of the JVM (which became OpenJDK) used to have an additional 25 _quick bytecodes.  These were only used internally as replacements for bytecodes that referred to constant pool entries.  The other is that the early JVM developers had a premonition about the invokedynamic bytecode added in JDK 7.  Bytecode number 186 was the only value that wasn’t initially used, and it’s precisely where invokedynamic needed to go.

The JDK includes a tool, javac, that compiles from Java source code to a target of Java bytecodes.  The bytecodes are packaged in class files (also defined by the JVM specification).  To run an application, the JVM loads class files and executes bytecodes.

Again, using our Sum.java example, we can compile this with javac Sum.java.  The JDK also includes a helpful tool, javap, a class file disassembler.  Using the -c option of this, we can print out the bytecodes in our freshly compiled class file

public static void main(java.lang.String[]);
    Code:
       0: iconst_0  
       1: istore_1  
       2: iconst_1
       3: istore_2
       4: iload_2
       5: bipush        10
       7: if_icmpgt     20
       10: iload_1
       11: iload_2
       12: iadd
       13: istore_1
       14: iinc          2, 1
       17: goto          4 
       20: getstatic     #7 // Field System.out:Ljava/io/PrintStream; 
       23: iload_1
       24: invokevirtual #13 // Method PrintStream.println:(I)V
       27: return

As mentioned earlier, the JVM is stack-based, so in bytecodes, we need 13 instructions compared to 7 in x64 assembler. Bytecodes spend a lot more time pushing and popping.

In the next post in this series, we’ll look at how bytecodes from our virtual instruction set get converted to the native instructions of the underlying computing platform, which is where the real fun starts.