Collecting Garbage in Java

Java is one of the most widely used languages in the world, and probably also the one people most often criticize for its performance, largely because of its GC.

GC stands for garbage collector: the system that, at runtime, scans the heap for space that is no longer used and reclaims it, so that the application does not leak memory or run out of it. Besides the runtime overhead of an extra algorithm running alongside your application, the most penalizing characteristic of a GC is the pauses it imposes on the VM.
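To get a feel for this overhead on your own machine, here is a minimal sketch that asks the JVM how many collections ran and how much time they took, using the standard java.lang.management API. The class name GcStats and the allocation loop are only my own illustration. Keep in mind that the reported time is an approximate accumulated collection time, which for concurrent collectors is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {

    // Builds a one-line summary for each collector bean registered in this JVM.
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values may be -1 when the collector does not report them.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", accumulated time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage so the collector has something to do.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            chunk[0] = (byte) i;
            checksum += chunk[0];
        }
        System.out.println("checksum=" + checksum);
        summarize().forEach(System.out::println);
    }
}
```

The names and numbers printed depend on the JDK, the collector in use and the heap size, so treat them as a rough indication rather than a benchmark.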

As Java evolved, many GC algorithms have been implemented, each with its own pros and cons.

As a Java developer or architect, you should care about this aspect of the language, yet I don't see people paying attention to it anywhere in the process of designing Java enterprise applications.

Here are some very short notes about it.

Serial GC

This is the most basic and simplest one. It pauses the whole application while a single-threaded GC process cleans the heap. It is not suitable for most recent applications, since multi-core processors are everywhere.

Parallel GC

Same as the Serial one, but it uses multiple threads to free memory. Again, it freezes the whole application to do the work.

CMS

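This is where it starts to get interesting. CMS (Concurrent Mark Sweep) separates the process into two phases: mark (finding the instances that are still reachable) and sweep (freeing the memory of everything else). Pauses are still there, but they are minimal compared to the two collectors above, since the application is only stopped during short parts of the mark phase while the rest of the work runs concurrently. Note that CMS was deprecated in JDK 9 and removed in JDK 14.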
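To make the two phases concrete, here is a toy mark and sweep over a tiny object graph. This is only a sketch of the general idea, not how CMS is implemented: it is single-threaded, and the Node class, the list standing in for the heap and the single root are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyMarkSweep {

    // A hypothetical heap object that only knows its name and outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: collect everything reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {
                // First visit: follow its references too.
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Sweep phase: drop every object that was not marked and return how many were freed.
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(node -> !marked.contains(node));
        return before - heap.size();
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b); // a -> b, reachable from the root
        c.refs.add(d); // c <-> d is a cycle that nothing else points to
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));

        int freed = sweep(heap, mark(List.of(a)));
        System.out.println("freed=" + freed + ", survivors=" + heap.size());
        // prints "freed=2, survivors=2"
    }
}
```

The cycle between c and d is freed even though the two objects reference each other, because nothing reachable from the root points to them.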

G1

Designed to work on multi-core machines with large memory (the case for most servers nowadays), which is why it has been the default garbage collector of the JVM since JDK 9. Like CMS, it marks live objects concurrently with the application. In addition, it divides the heap into regions grouped into generations and applies heuristics to decide where reclaiming memory first pays off the most (hence the name, Garbage First), which again reduces the pause times needed.
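The "most garbage first" heuristic can be sketched as a simple greedy selection. This is a deliberately naive illustration and not G1's real algorithm: the Region class, its byte counts and the copy budget (a stand-in for a pause-time goal such as the one set with -XX:MaxGCPauseMillis) are all assumptions of mine.

```java
import java.util.ArrayList;
import java.util.List;

class GarbageFirstSketch {

    // Hypothetical region summary: how much is garbage and how much is still live.
    static final class Region {
        final String id;
        final long garbageBytes;
        final long liveBytes;

        Region(String id, long garbageBytes, long liveBytes) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.liveBytes = liveBytes;
        }
    }

    // Picks the regions holding the most garbage first, as long as the live data
    // that would have to be copied out of them stays within the budget.
    static List<Region> chooseRegions(List<Region> regions, long copyBudgetBytes) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> Long.compare(y.garbageBytes, x.garbageBytes));

        List<Region> chosen = new ArrayList<>();
        long remaining = copyBudgetBytes;
        for (Region region : sorted) {
            if (region.liveBytes <= remaining) {
                chosen.add(region);
                remaining -= region.liveBytes;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("r1", 900, 100),
                new Region("r2", 200, 800),
                new Region("r3", 700, 300),
                new Region("r4", 50, 950));
        // With a budget of 500 bytes of copying, r1 and then r3 are selected.
        chooseRegions(regions, 500).forEach(r -> System.out.println(r.id));
    }
}
```

Regions that are almost full of live data (r2 and r4 here) are left alone: collecting them would cost a lot of copying for very little reclaimed memory.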

Epsilon

The do-nothing collector. It simply does not free memory, so it is like running a Java program without a GC. I find this very interesting, because many tools developed in Java, batch jobs for example, use G1 by default and are very slow to run. Whenever the memory footprint is known in advance and fits in the heap, I think it would be advisable to use Epsilon instead and gain the performance boost of not running a GC.
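One thing to watch when estimating that footprint: since nothing is ever reclaimed, the heap must hold every byte the program allocates over its whole run, not only what is live at a given moment, and the JVM stops with an out-of-memory error once the heap is full. Here is a back-of-the-envelope check, where the record count and the bytes per record are hypothetical numbers you would have to measure for your own batch.

```java
class EpsilonBudget {

    // Without a collector the heap has to hold the total of all allocations,
    // so compare that total (not the live set) against the heap size.
    static boolean fitsWithoutCollection(long objectCount, long bytesPerObject, long heapBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long totalAllocated = Math.multiplyExact(objectCount, bytesPerObject);
        return totalAllocated <= heapBytes;
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will try to use, roughly the -Xmx value.
        long heapBytes = Runtime.getRuntime().maxMemory();
        // Hypothetical batch: 1 million records, about 200 bytes allocated for each.
        boolean ok = fitsWithoutCollection(1_000_000L, 200L, heapBytes);
        System.out.println("max heap=" + heapBytes + " bytes, fits without GC: " + ok);
    }
}
```

If the answer is no, either the heap has to grow or the job still needs a real collector.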

Z

ZGC does all the heavy work concurrently, without stopping the application for long. On average it pauses the VM for around 1 ms (compared with around 200 ms for the Parallel collector).

Shenandoah: The new star

The ultra-low pause time collector. It does most of the work concurrently, including the mark phases. It still needs short pauses at the start and at the end of each phase, to detect application state that changed while it was concurrently marking or sweeping memory. It promises sub-millisecond pause times per cycle, the first GC to reach this level of performance. It also divides the heap into regions and applies heuristics to select the best-fit strategy. It is worth mentioning that this GC is not available in every JDK distribution: it is not available in Oracle’s JDK, for example.
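Since availability differs between distributions and versions, it helps to check which collector a given JVM actually ended up with. The sketch below prints the JVM arguments and the names of the registered collector beans. The class name ActiveCollector is my own, and the flags in the comment are the HotSpot selection switches as far as I know them: a build that does not ship a collector will refuse its flag, and on older JDKs some of them also need -XX:+UnlockExperimentalVMOptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class ActiveCollector {

    // Selection flags for the collectors covered in these notes (HotSpot-based JDKs):
    //   -XX:+UseSerialGC
    //   -XX:+UseParallelGC
    //   -XX:+UseConcMarkSweepGC   (CMS, only before JDK 14)
    //   -XX:+UseG1GC
    //   -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
    //   -XX:+UseZGC
    //   -XX:+UseShenandoahGC      (only in builds that ship Shenandoah)
    // Example: java -XX:+UseParallelGC ActiveCollector.java

    // Names of the collector beans this JVM registered; they vary by collector and JDK.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // Arguments the JVM was started with, such as the -XX flags above.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Java " + System.getProperty("java.version"));
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Running it once without any flag shows the default the JVM picked for your machine, and running it again with one of the flags confirms whether your distribution supports that collector.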