Java’s Magic Sauce
Jun 28, 2018 | 8 MIN READ
Or should I say, Java’s Magic Source?
If we go all the way back to JDK 1.0, there were 211 classes in the public API. Out of interest, I created a graph showing the growth in public classes over time. To extract my data, I used the API documentation and copied the list of all classes.
I stopped at JDK 8; both JDK 9 and 10 have the same number of public classes as JDK 8 (there are a few changes in available methods but not classes).
Next, I took the rt.jar file for each JDK and extracted a list of all the classes it contained. Since rt.jar was only introduced in JDK 1.2, I used the classes.zip file from JDK 1.0 and JDK 1.1. After removing all inner classes to simplify things a bit, I took the difference between the total number of classes and those in the public API. The graph below shows this data.
As you can see, back in JDK 1.0 there were only thirteen internal classes; this peaked in JDK 8 at a whopping 8709. I did start trying to get numbers for JDK 9 and 10, but this proved too complicated because the rt.jar and tools.jar files were eliminated and replaced by 97 modules.
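The counting method described above (list every class in rt.jar or classes.zip, drop the inner classes, then subtract the public API) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the tool actually used for the graphs: it treats any .class entry whose name contains `$` as an inner class, and the class name `ClassCounter` and the one-name-per-line public API file are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ClassCounter {

    /** Fully qualified names of top-level classes in a zip or jar (e.g. classes.zip or rt.jar). */
    static Set<String> topLevelClasses(Path archive) {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName();
                // Skip resources and inner/nested classes (Outer$Inner.class).
                if (!entry.endsWith(".class") || entry.contains("$")) {
                    continue;
                }
                String binaryName = entry.substring(0, entry.length() - ".class".length());
                names.add(binaryName.replace('/', '.'));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Internal classes = everything in the archive minus the documented public API. */
    static Set<String> internalClasses(Set<String> all, Set<String> publicApi) {
        Set<String> internal = new TreeSet<>(all);
        internal.removeAll(publicApi);
        return internal;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ClassCounter <rt.jar|classes.zip> <public-api.txt>");
            return;
        }
        Set<String> all = topLevelClasses(Paths.get(args[0]));
        // Hypothetical input: one fully qualified public class name per line.
        Set<String> publicApi = new TreeSet<>(Files.readAllLines(Paths.get(args[1])));
        System.out.println("top-level classes: " + all.size());
        System.out.println("internal classes:  " + internalClasses(all, publicApi).size());
    }
}
```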
The interesting thing about this is that from JDK 1.1 onwards there was a lot of functionality buried in the JDK. Right from the very start, developers have been warned by Sun and then Oracle that the internal APIs are not intended for use in application development, are not documented and may be removed from the JDK without notice.
Despite these warnings, there are plenty of developers who have used these classes. Oracle conducted an analysis of a large amount of their own code written in Java and found that the top three internal classes used were:
Apparently, their code does more encoding than decoding. The use of these two classes was driven primarily by the lack of a public BASE64 API until JDK 8 introduced java.util.Base64.
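For code that relied on the internal encoder and decoder, java.util.Base64 (JDK 8 and later) is the public replacement. A minimal sketch follows; the class name `Base64Demo` is hypothetical, and UTF-8 is assumed as the text encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Demo {

    /** Public replacement for what code previously did with the internal BASE64 encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Public replacement for the internal BASE64 decoder. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Magic Sauce");
        System.out.println(encoded);         // prints TWFnaWMgU2F1Y2U=
        System.out.println(decode(encoded)); // prints Magic Sauce
    }
}
```

One design note: as far as I recall, the old internal encoder wrapped its output into 76-character lines, so Base64.getMimeEncoder() may be the closer match where that matters; the basic encoder used here never inserts line breaks.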
Which brings us to the most notorious internal API in the JDK: sun.misc.Unsafe. The clue is most definitely in the name: this is a class that lets you do things that are well outside the defined boundaries of regular Java code.
I was surprised to find that the Unsafe class was only introduced in JDK 1.4. According to a presentation given by Mark Reinhold at the JVM Language Summit, Unsafe was introduced as part of a major rewrite of Reflection and Serialisation, and it was also required to support direct buffers and NIO.
Despite being undocumented, the source code for Unsafe is readily available thanks to the OpenJDK project. You can run javadoc on Unsafe.java to get at least a minimal set of documentation for all 113 methods available in JDK 8.
The theme of this post is not to discuss the details of Unsafe functionality. Suffice it to say that obtaining an instance of Unsafe requires more work than for a typical class. The constructor is private and, although there is a static method, getUnsafe(), it throws a SecurityException unless the calling class was loaded from the bootclasspath. Probably the most common way to access Unsafe is therefore through reflection, reading the private static field that holds the internal instance.
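A minimal sketch of the reflective approach, assuming an OpenJDK-based runtime in which the singleton is held in a private static field named `theUnsafe` (true of the OpenJDK sources for JDK 8 and later, as far as I know). The class name `UnsafeAccess` is hypothetical, and javac will warn that sun.misc.Unsafe is an internal proprietary API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeAccess {

    private UnsafeAccess() {
    }

    /**
     * Calling Unsafe.getUnsafe() from application code throws SecurityException,
     * so read the private static field that holds the singleton instead.
     * Assumption: the field is named "theUnsafe", as in the OpenJDK sources.
     */
    static Unsafe get() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true); // permitted because jdk.unsupported opens sun.misc (JDK 9+)
            return (Unsafe) field.get(null); // static field, so no receiver object
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalStateException("Unable to obtain sun.misc.Unsafe", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = get();
        System.out.println("address size: " + unsafe.addressSize() + " bytes");
        System.out.println("page size:    " + unsafe.pageSize() + " bytes");
    }
}
```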
Once you have access to Unsafe, the gloves are off, so you had better know what you are doing and be careful. As the name of the class suggests, many of the things you can do are inherently unsafe. For example, you can allocate and free native memory (analogous to malloc and free in C). You can also manipulate memory using addresses, as you would using pointers in C. It’s not just memory that you get access to: you can do things like allocate a new instance of an object without running its constructor. Think what might happen if you then use that object. One phrase crops up many times in the documentation: “results are undefined”. There are many ways to use these methods that will return a meaningless result (if you’re lucky) or cause the JVM to stop abruptly (if you’re not).
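Two of the operations mentioned above can be sketched as follows: a malloc/free-style round trip through native memory, and creating an object whose constructor never runs. This is an illustration under assumptions, not recommended practice; it obtains Unsafe via the `theUnsafe` field, the `Account` class and all method names are hypothetical, and newer JDKs may print deprecation warnings for Unsafe's memory-access methods.

```java
import java.lang.reflect.Field;
import java.util.Objects;
import sun.misc.Unsafe;

final class UnsafeTricks {

    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe"); // assumed field name
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical class whose constructor is supposed to guarantee a non-null owner. */
    static final class Account {
        final String owner;

        Account(String owner) {
            this.owner = Objects.requireNonNull(owner);
        }
    }

    /** Writes a long to off-heap memory and reads it back, like malloc/free in C. */
    static long roundTripThroughNativeMemory(long value) {
        long address = UNSAFE.allocateMemory(Long.BYTES); // like malloc(8): contents uninitialised
        try {
            UNSAFE.putLong(address, value);  // raw write to an absolute address
            return UNSAFE.getLong(address);  // raw read from the same address
        } finally {
            UNSAFE.freeMemory(address);      // like free(): using address after this is undefined
        }
    }

    /** Allocates an Account without running its constructor, breaking its invariant. */
    static Account accountWithoutConstructor() {
        try {
            return (Account) UNSAFE.allocateInstance(Account.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughNativeMemory(0xCAFEBABEL));
        System.out.println("owner = " + accountWithoutConstructor().owner); // prints "owner = null"
    }
}
```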
The Java language was designed to be safe. This is the whole rationale behind eliminating the use of explicit pointers and manual memory management, as well as numerous other basic features. Java was never intended to be used as a systems programming language, which is where C initially started life (good old UNIX). To write systems code, you need these types of low-level, potentially dangerous interfaces to let you implement what’s required. Java was always intended to be used for developing application code that would not need this kind of access directly. The idea is that application developers trust the developers of the JVM and the core libraries to guarantee the safety of their code. The Unsafe class was introduced to allow the developers of the public classes to use it to deliver better performance or to make use of low-level features (like memory fencing), which would not be possible with standard Java code.
The issue, which quickly became apparent when it was proposed to encapsulate all internal APIs in JDK 9, was that many people had used these classes. There was a fascinating study that analysed 74 GB of compiled Java code from Maven Central, looking only at the use of sun.misc.Unsafe. The results showed that 25% of the code relied in some way on this internal class.
This heavy reliance, especially from popular open-source libraries and frameworks, is a large part of the reason that JEP 260 was included in JDK 9. It provides a module, jdk.unsupported, which exports the internal packages that are deemed critical. (Interestingly, the module API documentation still does not provide any information on this.) This module exports the com.sun.nio.file package and both exports and opens (for reflective access) the sun.misc and sun.reflect packages.
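To check the exports and opens described above on your own JDK, you can ask the module system directly. This is a small sketch assuming JDK 9 or later and a runtime image that includes jdk.unsupported (a standard JDK does; a trimmed jlink image might not); the class name `JdkUnsupportedInfo` is hypothetical.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

final class JdkUnsupportedInfo {

    /** Looks up jdk.unsupported in the boot layer and returns its module descriptor. */
    private static ModuleDescriptor descriptor() {
        return ModuleLayer.boot()
                .findModule("jdk.unsupported")
                .orElseThrow(() -> new IllegalStateException("jdk.unsupported is not in the boot layer"))
                .getDescriptor();
    }

    /** Packages the module exports (usable at compile time and run time). */
    static Set<String> exportedPackages() {
        Set<String> packages = new TreeSet<>();
        for (ModuleDescriptor.Exports e : descriptor().exports()) {
            packages.add(e.source());
        }
        return packages;
    }

    /** Packages the module opens (deep reflection, e.g. setAccessible on private fields). */
    static Set<String> openedPackages() {
        Set<String> packages = new TreeSet<>();
        for (ModuleDescriptor.Opens o : descriptor().opens()) {
            packages.add(o.source());
        }
        return packages;
    }

    public static void main(String[] args) {
        System.out.println("exports: " + exportedPackages());
        System.out.println("opens:   " + openedPackages());
    }
}
```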
Oracle, to its credit, has performed a cleanup of the Unsafe class in JDK 9. I particularly liked the comment about the link between the over-use of extern and the increased mortality rate of kittens.
The reason Unsafe has been heavily used is that library developers need the same enhanced level of performance and extended capabilities that the core Java API developers use. Without it, many enterprise applications would run a lot slower than they currently do.
This brings me to the crux of my post.
Java is a hugely successful application development platform and the reasons for this are many. The very safety that the language provides is core to why developers find it so appealing. Not having to worry about incorrect pointer manipulation, forgetting to free memory or memory leaks (although you can still have leaks in Java) makes reliable, fast code much easier to write. However, having the secret sauce has also been vitally important to the success of Java, as it has allowed powerful, high-performing libraries and frameworks to be developed, providing developers with a wealth of functionality to build on.
Already we’ve seen parts of Unsafe being implemented in a way that makes them accessible through a public API. Variable handles, introduced in JDK 9, are an excellent example of this. Developers can use these to fence memory access operations and perform atomic operations directly on variables without the need to create instances of classes in the java.util.concurrent.atomic package.
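A minimal sketch of the VarHandle approach, assuming JDK 9 or later: a hypothetical counter class updates its own volatile int field atomically, with no AtomicInteger wrapper object per counter. VarHandle also provides static fence methods (for example fullFence(), acquireFence() and releaseFence()), which serve as public counterparts to the fences Unsafe offered; they are not used here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class VarHandleCounter {

    private volatile int count; // hypothetical field, updated in place

    private static final VarHandle COUNT;

    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(VarHandleCounter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Atomically adds one and returns the new value. */
    int increment() {
        return (int) COUNT.getAndAdd(this, 1) + 1;
    }

    /** Atomically sets count to newValue if it currently equals expected. */
    boolean compareAndSet(int expected, int newValue) {
        return COUNT.compareAndSet(this, expected, newValue);
    }

    /** Reads count with volatile semantics. */
    int get() {
        return (int) COUNT.getVolatile(this);
    }

    public static void main(String[] args) throws InterruptedException {
        VarHandleCounter counter = new VarHandleCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.get()); // prints 20000: no increments are lost
    }
}
```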
I’ll leave you with a question to ponder: would Java have been as successful as it has been had sun.misc.Unsafe not been hidden in the library code yet still accessible?