Common Garbage Collection Techniques
Garbage collectors are divided into several types. For each type some collectors are categorized as ‘mostly’, as in ‘mostly concurrent’. This means that sometimes it doesn’t operate according to that classification and has a fallback mechanism for when that occurs. So, a ‘mostly concurrent’ collector may operate concurrently with application execution and only occasionally stop-the-world if needed.
Various garbage collectors are classified below, for detailed description, refer this white paper.
1. Concurrent collector – performs garbage collection concurrently while application execution continues.
2. Parallel collector – uses multiple threads. A collector can be concurrent but not parallel, and it can be concurrent AND parallel. (Side note – be cautious when researching older literature on garbage collection, since what we used to call parallel is now called concurrent.)
3. Stop-the-world (STW) – is the opposite of concurrent. It performs garbage collection while the application is completely stopped.
4. Incremental – performs garbage collection as a series of smaller increments with potentially long gaps in between. The application is stopped during garbage collection but runs in between increments.
5. Moving – the collector moves objects during garbage collection and has to update references to those live objects.
6.Conservative – most non-managed runtimes are conservative. In this model, the collector is unsure of whether a eld is a reference or not, so it assumes that it is. This is in contrast to a Precise Collector.
7. Precise – a precise collector knows exactly where every possible object reference is. A collector cannot be a moving collector without also being precise, because you have to know which references to x when you move the live objects. Precise collectors identify the live objects in the memory heap, reclaim resources held by dead objects and periodically relocate live objects.
Most of the work the virtual machine does to be precise, is actually in the compiler, not the collector itself. All commercial JVMs today are moving and precise.
Steps in Garbage Collection
Before the garbage collector can reclaim memory, it must ensure the application is at a ‘GC safepoint’. A GC safepoint is a point or range in a thread’s execution where the collector can identify all the references in the thread’s execution stack. The terms ‘safepoint’ and ‘GC safepoint’ are often used interchangeably, however many types of safepoints exist, some of which require more information than a GC safepoint. A ‘Global Safepoint’ is when all application threads are at a safepoint.
Safepoint opportunities in your code should be frequent. If the garbage collector has to wait for a safepoint that is minutes (or longer) away, your application could run out 3 Understanding Java Garbage Collection of memory and crash before garbage can be collected. Once the GC safepoint is reached, garbage collection can begin.
This phase, also known as ‘trace’, finds all the live objects in the heap. The process starts from the ‘roots’, which includes thread stacks, static variables, special references from JNI code and other areas where live objects are likely to be found. A reference to an object can only prevent the object from being garbage collected, if the reference chains from a GC root.
The garbage collector ‘paints’ any objects it can reach as ‘live’. Any objects left at the end of this step are ‘dead’. If any objects are still reachable that the developer thought were dead, it’s an object leak, a form of memory leak.The work of a marker is linear to the amount of live objects and references, regardless of the size of the objects. In other words it takes the marker the same amount of time to paint 1,000 10KB objects as 1,000 1MB objects.
In concurrent marking all reachable objects are being marked as live, but the object graph is mutating (i.e. changing) as the marker works. This can lead to a classic concurrent marking race. The application can move a reference that has not yet been seen by the marker into an object that has already been visited. If this change is not intercepted or prevented in some way, it can corrupt the heap. The object would be collected, even though a reference to it still exists.
Usually a ‘write barrier’ is used to prevent this condition. The write barrier captures changes to object references (e.g. in a card table) that might otherwise be missed by the marker. With this information, the marker can revisit all mutated references and track new mutations. When the set is small enough, a stop-the-world pause can be used to catch up, making the collector ‘mostly’ concurrent. But it is important to note that the collector is sensitive to the mutation rate and the amount of work done grows with the mutation rate and may fail to finish.
In this phase the garbage collector scans through the heap to identify the locations of ‘dead’ objects and tracks their location, usually in some sort of ‘free list’. Unlike the Mark phase, the work done by the Sweep phase is linear to the size of the heap, not the size of the live set. If your application is using a very large heap with very little left alive, Sweep still has to scan on the entire heap.
Over time, the Java memory heap gets ‘fragmented’, where the dead space between objects is no longer large enough to hold new objects, making new object allocation slower. If your application creates objects of variable sizes, fragmentation will happen more quickly. XML is a great example of this. The format is dened but the size of the information in the object is not controlled, often leading to objects with great variations in sizes and a fragmented heap.
In the Compact phase the garbage collector relocates live objects to free up contiguous empty space. As these objects are moved, the collector must x all references in the threads to these live objects, called ‘remapping’. Remap has to cover all references that could point to an object, so it usually scans everything. The amount of work done in this phase is generally linear to the size of the live set.
Incremental compaction is used in a couple of commercial collectors (Oracle G1 and the Balanced Collector from IBM). This technique assumes that some regions of memory are more popular than others, although this is not always the case depending on the application. The GC algorithm tracks cross-region remembered sets (i.e. which region points to which). This allows the collector to compact a single region at a time and only scan regions pointing into it when remapping all potential references. The collector identifies region sets that t into limited pause times, allowing the maximum time for application interruption to be controlled. Large heaps have fewer non-popular regions; the number of regions pointing into a single region tends to be linear to the size of the heap. Because of this, the work for this type of compaction can grow with the square of the heap size.
Types of Garbage Collectors
1.Mark/Sweep/Compact Collector – performs the three phases as three separate steps.
2.Mark/Compact Collector – skips the sweep and moves live objects to a contiguous area of the heap.
3. Copying Collector – performs all three phases in one pass. A copying collector is pretty aggressive. It uses a ‘from’ and ‘to’ space and moves all the live objects and then fixes the references all in one pass. When the ‘from’ space is empty the collection is complete. Work done in a copying collector is linear to the size of the live set.
Drawback of Major Garbage Collectors
The main drawback of most garbage collectors is the need for long application pauses. These pauses are the result of an inevitable requirement to compact the heap to free up space. Collectors use different strategies to delay these events, but compaction is inevitable for all commercial available collectors except Azul C4, which employs a Continuously Concurrent Compacting Collector that avoids pauses altogether.