About JVM Warm-up
Class Loading
When a JVM process starts, the class loader brings required classes into memory through three stages. Loading is lazy: a class is loaded the first time it is referenced, not all at once.
- Bootstrap Class Loading: The Bootstrap Class Loader loads the core Java classes, such as java.lang.Object, from JRE/lib/rt.jar (prior to Java 9; later releases ship these classes as modules instead).
- Extension Class Loading: The ExtClassLoader loads the JAR files on the java.ext.dirs path. These are JARs installed into the JRE itself by hand, not dependencies declared in a Gradle or Maven build. (The extension mechanism was removed in Java 9, where the Platform Class Loader takes over this role.)
- Application Class Loading: The AppClassLoader loads all classes in the application class path.
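The hierarchy above can be observed directly from running code. A minimal sketch (note that the bootstrap loader is implemented natively, so the JDK reports it as null):

```java
// Prints which loader is responsible for each kind of class.
public class ClassLoaderDemo {
    public static void main(String[] args) {
        // Core classes come from the bootstrap loader, reported as null.
        System.out.println("java.lang.String -> " + String.class.getClassLoader());

        // Application classes are loaded by the app (system) class loader.
        System.out.println("ClassLoaderDemo  -> " + ClassLoaderDemo.class.getClassLoader());

        // Walk the parent chain upward from the system class loader.
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        while (cl != null) {
            System.out.println("loader: " + cl);
            cl = cl.getParent();
        }
    }
}
```

On Java 9+ the chain printed is the application loader followed by the platform loader; on Java 8 the second entry is the ExtClassLoader.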
Execution Engine
Java is a hybrid language that uses both an interpreter and a compiler. When you compile your code with javac, it generates platform-independent bytecode. The JVM then interprets this bytecode at runtime. Frequently executed code ("hot spots") is compiled to native code by the JIT (Just-In-Time) compiler to improve performance, and subsequent executions use that compiled code directly; this is what gives the HotSpot JVM its name.
C1 - Client Compiler
Prioritizes fast startup and responsiveness. While it has a lower level of optimization compared to C2, it has a shorter compilation time, resulting in faster initial execution and better user experience. It applies simple optimizations to quickly convert code to native.
C2 - Server Compiler
Focuses on performance optimization rather than speed. It takes longer to compile than C1 but applies deeper optimizations to maximize long-term execution speed. It analyzes code execution patterns at runtime to apply advanced optimizations.
Tiered Compilation
Tiered compilation was first introduced in Java 7 and has been enabled by default since Java 8; using the default settings is recommended. The C2 compiler requires more memory and time than C1 but generates more optimized native code, so tiered compilation uses both to achieve fast startup and long-term performance improvement.
The interpreter collects profiling information about methods and provides it to the compiler, and C1 generates a compiled version based on this information. Methods that are frequently used during the application lifecycle are loaded into the native cache.
When the application starts, the JVM interprets all bytecode and profiles it. The JIT compiler uses this profiling data to identify hotspots.
First, the JIT compiler quickly converts frequently executed code to native code using C1, and then C2 applies additional optimizations based on the profiling information generated by the interpreter and C1. This process takes longer than C1.
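The effect of this interpret-then-compile pipeline can be made visible by timing the same method before and after it becomes hot. A rough sketch (iteration counts are arbitrary, and actual timings vary by machine and JVM):

```java
public class JitWarmupDemo {
    // A small, compute-heavy method the JIT can optimize once it is hot.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    static long timeBatch(int calls) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += sumOfSquares(1_000);
        }
        // Use the result so the loop cannot be optimized away entirely.
        System.out.println("sink=" + sink);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long cold = timeBatch(20_000); // mostly interpreted / C1 at first
        long warm = timeBatch(20_000); // typically C2-compiled by now
        System.out.printf("cold: %d ms, warm: %d ms%n",
                cold / 1_000_000, warm / 1_000_000);
    }
}
```

The second batch is usually faster because the hot method has been promoted through C1 and then C2 by the time it runs.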
Code Cache
This is a memory area where the JVM stores all bytecode compiled into native code. Because tiered compilation keeps versions from both C1 and C2, it can increase the amount of code in the code cache by up to four times.
Since Java 9, the code cache has been divided into three segments to improve locality and reduce memory fragmentation:
- The non-method segment - JVM-internal code such as the bytecode interpreter (about 5MB, adjustable via -XX:NonNMethodCodeHeapSize)
- The profiled-code segment - Code compiled by C1, which may have a short lifespan (default ~122MB, adjustable via -XX:ProfiledCodeHeapSize)
- The non-profiled segment - Code compiled by C2, which may have a longer lifespan (default ~122MB, adjustable via -XX:NonProfiledCodeHeapSize)
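On a JVM with the segmented code cache, these three areas show up as distinct memory pools and can be listed through the standard management API. A small sketch (the pool names shown are those reported by HotSpot and may differ on other JVMs):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheDemo {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot reports the segments as "CodeHeap 'non-nmethods'",
            // "CodeHeap 'profiled nmethods'" and "CodeHeap 'non-profiled nmethods'".
            if (pool.getName().startsWith("CodeHeap")) {
                System.out.printf("%s: used=%d KB, max=%d KB%n",
                        pool.getName(),
                        pool.getUsage().getUsed() / 1024,
                        pool.getUsage().getMax() / 1024);
            }
        }
    }
}
```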
Deoptimization
Code compiled by C2 may rest on speculative assumptions that later turn out to be wrong, for example when the collected profile no longer matches the method's actual behavior. In such cases the JVM discards the compiled version and temporarily rolls back to interpretation mode.
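Deoptimization can be provoked deliberately: let the JIT compile a method under one observed behavior, then violate that assumption. A hedged sketch (run with -XX:+PrintCompilation to look for "made not entrant" lines; that log format is HotSpot-specific, and whether speculation kicks in depends on the JVM version):

```java
public class DeoptDemo {
    // While only Integer values are seen at this call site, C2 may
    // speculate on the receiver type of the virtual hashCode() call.
    static int describe(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        int sink = 0;
        // Train the JIT with a single concrete type.
        for (int i = 0; i < 200_000; i++) {
            sink += describe(i); // autoboxed Integer
        }
        // Now break the assumption: a type never seen during profiling.
        sink += describe("surprise");
        System.out.println("sink=" + sink);
    }
}
```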
Compilation Levels
Together, the interpreter and the two JIT compilers provide five compilation levels:
Level 0 - Interpreted Code
At this stage, the JVM interprets all Java code. It reads and executes bytecode instruction by instruction, resulting in lower performance compared to compiled code.
Level 1 - Simple C1 Compiled Code
The JVM compiles methods deemed non-critical using C1 without collecting profiling information. This typically applies to very simple or low-complexity methods. These methods aren't expected to show significant performance improvements even with further optimization by C2. The main purpose is to speed up execution, allowing code to run with minimal overhead. Since profiling information isn't collected, the JVM doesn't decide on additional optimization for code running at this level. This reduces system resource usage and ensures fast execution for simple methods.
Level 2 - Limited C1 Compiled Code
C1 analyzes code through lightweight profiling. The JVM uses this stage when the C2 Queue is full. Since C2 performs extensive optimizations requiring significant time and resources, it temporarily uses C1 with lightweight profiling to improve performance without waiting.
Level 3 - Full C1 Compiled Code
After running code compiled at level 2 for some time, the JVM collects more runtime data and compiles it with full profiling through C1 at this stage. This includes more comprehensive data collection than lightweight profiling, allowing identification of complex patterns and optimization opportunities. It collects detailed execution metrics for more complex optimizations that C2 will perform.
Level 4 - C2 Compiled Code
When the C2 Queue is available and important hotspots are identified based on full profiling from level 3, this stage proceeds. C2 applies optimization techniques to generate native code. This is the final stage and aims to maximize execution efficiency based on insights gained from extensive profiling data.
The JVM continues with interpretation until reaching the Tier3CompileThreshold. After that, C1 compiles the method and continues profiling. Finally, C2 compiles when reaching the Tier4CompileThreshold. The JVM may decide to deoptimize C2-compiled code, in which case the process starts again from the beginning.
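These level transitions can be watched with the -XX:+PrintCompilation flag, which logs each compilation together with its tier number. A minimal target program (the log format is HotSpot-specific and the exact tiers reached may vary):

```java
// Run with: java -XX:+PrintCompilation HotMethodDemo
// In the log, the tier appears as a small number next to the method name;
// expect square to show up at tier 3 (full C1) and later tier 4 (C2).
public class HotMethodDemo {
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Enough invocations to cross the Tier3/Tier4 compile thresholds.
        for (int i = 0; i < 1_000_000; i++) {
            sink += square(i);
        }
        System.out.println("sink=" + sink);
    }
}
```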
JVM Warming Up
After class loading completes, the important classes used during process startup sit in the JVM's caches (loaded-class metadata and, for hot methods, the code cache), so they run faster at runtime. Other classes are cached on demand, when a request first touches them.
Due to lazy class loading and Just-In-Time compilation, the first requests to a Java web application have a noticeably slower average response time than later ones.
To improve this, the classes involved need to be loaded, and ideally their hot code paths exercised, before real traffic arrives. This process is called JVM warm-up.
Manual Implementation
This involves writing startup code that exercises the classes used when the application boots. For web applications, the application can send API requests to itself. In Spring Boot applications, you can use CommandLineRunner or ApplicationRunner to make these internal calls during the Spring startup lifecycle, whose event order is:
- ApplicationStartingEvent
- ApplicationEnvironmentPreparedEvent
- ApplicationContextInitializedEvent
- ApplicationPreparedEvent
- ApplicationStartedEvent
- AvailabilityChangeEvent(LivenessState.CORRECT)
- ApplicationRunner, CommandLineRunner execution
- At this point, internal calls can preload the classes the application uses, pulling their hot paths into the native cache.
- ApplicationReadyEvent, then AvailabilityChangeEvent(ReadinessState.ACCEPTING_TRAFFIC)
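Outside Spring, the same idea can be sketched in plain Java: force the classes you care about through class loading before serving real traffic. The class list below is purely illustrative; a real application would list its own critical classes:

```java
public class Warmup {
    // Classes known to be needed by the first requests (illustrative names).
    private static final String[] CRITICAL_CLASSES = {
            "java.time.format.DateTimeFormatter",
            "java.util.concurrent.ConcurrentHashMap",
    };

    // Loads each class eagerly and returns how many were loaded.
    public static int preload() {
        int loaded = 0;
        for (String name : CRITICAL_CLASSES) {
            try {
                // initialize=true also runs static initializers, so the
                // class is fully ready before its first real use.
                Class.forName(name, true, Warmup.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("Could not preload " + name + ": " + e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println("preloaded " + preload() + " classes");
    }
}
```

This only covers class loading; to also warm the JIT, the preloaded code paths would additionally need to be invoked enough times to cross the compile thresholds.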
References
- https://www.baeldung.com/java-classloaders
- https://www.baeldung.com/java-compiled-interpreted
- https://www.baeldung.com/jvm-code-cache
- https://www.baeldung.com/java-jvm-warmup
- https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ClassLoader.html
- https://www.ibm.com/docs/en/was-nd/8.5.5?topic=offload-java-virtual-machine-cache-custom-properties
- https://www.baeldung.com/jvm-tiered-compilation