A new way to test multi-threaded and concurrent Java
Testing multi-threaded, concurrent Java is hard since we can only test one thread interleaving at a time. An obvious solution is to repeat the test multiple times, hoping that we execute all possible interleavings.
The problem with this solution is that every field access is a point at which the threads can interleave differently, so the number of interleavings grows far too quickly. Repeating the test therefore only works for small unit tests. This led me to develop the open-source tool VMLens.
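To make the limitation concrete, here is a minimal sketch of the "repeat and hope" approach using only the JDK. The class and method names are made up for this illustration. It runs a two-thread increment many times and collects the distinct results it happened to see.

```java
import java.util.Set;
import java.util.TreeSet;

public class RepeatedRunDemo {
    // volatile makes each read and write visible, but j++ is still a read followed by a write
    private static volatile int j;

    /** Runs the two-thread increment once and returns the final counter value. */
    static int runOnce() {
        j = 0;
        Thread first = new Thread(() -> j++);
        first.start();
        j++;
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return j;
    }

    /** Repeats the run and collects every distinct final value that was observed. */
    static Set<Integer> observedResults(int runs) {
        Set<Integer> results = new TreeSet<>();
        for (int i = 0; i < runs; i++) {
            results.add(runOnce());
        }
        return results;
    }

    public static void main(String[] args) {
        // The lost update (final value 1) is possible but may never show up,
        // no matter how often we repeat the run.
        System.out.println(observedResults(10_000));
    }
}
```

Whether the value 1 ever appears depends on the scheduler, the hardware and the timing of thread start-up, which is exactly why repetition alone gives no guarantee.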
Separate the Problem into two parts
VMLens breaks the problem into two separate parts:
- Execute all interleavings based on the synchronization actions.
- Use data race detection to see if one of the interleavings contains a data race.
A data race happens when two threads access the same field, at least one of them writing to it, without proper synchronization between the two accesses. Synchronization actions include operations such as accessing a volatile field or using a synchronized block. Synchronization actions and data races are formally defined in the Java Memory Model.
When a data race happens, there is no guarantee that a reading thread will see the most recently written value. This is because the compiler may reorder instructions, and CPU cores may cache field values. Only with a synchronization action in place can we ensure that a thread reads the most recent value written by other threads. The reasoning behind this behavior, as well as the underlying memory model mechanisms, is explained in more detail in the JSR 133 (Java Memory Model) FAQ.
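A small JDK-only sketch of this visibility guarantee follows. The class and method names are hypothetical. A worker thread spins until another thread sets a flag. Because the flag is volatile, the write is a synchronization action and the worker is guaranteed to see it.

```java
public class VisibilityDemo {
    // volatile: the write in requestStop() is a synchronization action,
    // so a later read of this field by the worker sees the new value.
    // Without volatile, the loop in work() would contain a data race
    // and would be allowed to spin forever.
    private volatile boolean stopRequested = false;

    void work() {
        while (!stopRequested) {
            // busy wait until another thread requests a stop
        }
    }

    void requestStop() {
        stopRequested = true;
    }

    /** Starts a worker, asks it to stop, and reports whether it terminated in time. */
    static boolean workerStops(long timeoutMillis) {
        VisibilityDemo demo = new VisibilityDemo();
        Thread worker = new Thread(demo::work);
        worker.setDaemon(true); // do not keep the JVM alive if the worker never stops
        worker.start();
        demo.requestStop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```

Removing the volatile modifier does not necessarily make the program hang on a given machine. It only removes the guarantee, which is what makes data races so hard to find by running the code.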
The idea behind VMLens is to use synchronization actions to define the possible thread interleavings, and then to check whether any of those interleavings contains a data race. This approach is based on one of the key insights from the paper Memory models: a case for rethinking parallel languages and hardware:
After much prior confusion, major programming languages are converging on a model that guarantees simple interleaving-based semantics for “data-race-free” programs and most hardware vendors have committed to support this model.
Implementation
VMLens operates as a Java agent, transforming bytecode at runtime to trace all field accesses and synchronization actions. The detection of data races is based on happens-before relations similar to the algorithm used in Efficient on-the-fly data race detection in multithreaded C++ programs. However, unlike the on-the-fly approach in that paper, VMLens does not detect data races during execution. Instead, it logs all relevant access and synchronization events and analyzes them asynchronously to identify data races.
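The following is a simplified sketch of what a happens-before analysis over a recorded event log can look like. It is not the VMLens implementation. It assumes a textbook vector-clock scheme, models only volatile reads and writes as synchronization actions (thread start, join and locks are left out), and every name in it is invented for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OfflineRaceDetector {
    enum Kind { READ, WRITE, VOLATILE_READ, VOLATILE_WRITE }

    /** One logged event: which thread did what to which field. */
    static final class Event {
        final int thread; final Kind kind; final String field;
        Event(int thread, Kind kind, String field) {
            this.thread = thread; this.kind = kind; this.field = field;
        }
    }

    /** A remembered plain access: the thread, its own clock value, and read or write. */
    static final class Access {
        final int thread; final int time; final boolean write;
        Access(int thread, int time, boolean write) {
            this.thread = thread; this.time = time; this.write = write;
        }
    }

    /** Returns the plain fields that have at least one unordered pair of conflicting accesses. */
    static List<String> findRaces(List<Event> log, int threadCount) {
        int[][] clocks = new int[threadCount][threadCount];
        for (int t = 0; t < threadCount; t++) clocks[t][t] = 1;
        Map<String, int[]> volatileClocks = new HashMap<>();
        Map<String, List<Access>> history = new HashMap<>();
        List<String> races = new ArrayList<>();
        for (Event e : log) {
            int[] mine = clocks[e.thread];
            switch (e.kind) {
                case VOLATILE_WRITE: {
                    // Publish everything this thread has done so far, then start a new epoch.
                    int[] published = volatileClocks.computeIfAbsent(e.field, f -> new int[threadCount]);
                    join(published, mine);
                    mine[e.thread]++;
                    break;
                }
                case VOLATILE_READ: {
                    // Everything published through this volatile field now happens-before us.
                    int[] published = volatileClocks.get(e.field);
                    if (published != null) join(mine, published);
                    break;
                }
                default: {
                    boolean write = e.kind == Kind.WRITE;
                    List<Access> earlier = history.computeIfAbsent(e.field, f -> new ArrayList<>());
                    for (Access a : earlier) {
                        boolean conflicting = a.thread != e.thread && (a.write || write);
                        boolean ordered = a.time <= mine[a.thread];
                        if (conflicting && !ordered && !races.contains(e.field)) races.add(e.field);
                    }
                    earlier.add(new Access(e.thread, mine[e.thread], write));
                }
            }
        }
        return races;
    }

    /** Component-wise maximum, stored in target. */
    static void join(int[] target, int[] other) {
        for (int i = 0; i < target.length; i++) target[i] = Math.max(target[i], other[i]);
    }
}
```

Because the analysis only needs the log, it can run after the instrumented code has finished, or on another thread, instead of slowing down every field access with an immediate check.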
Executing all interleavings based on the synchronization actions works as follows: For each pair of non-commutative synchronization actions from different threads, we have two potential orderings to execute. For example, a volatile read from thread A and a volatile write from thread B lead to the following two potential thread interleavings:
- A read from Thread A followed by a write from Thread B
- A write from Thread B followed by a read from Thread A
The complete set of thread interleavings is then the combination of all those potential orderings.
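As an illustration of how these orderings combine, the sketch below enumerates every way to merge the synchronization actions of two threads while keeping each thread's own order. It is a naive enumeration with invented names: it does not skip commutative actions such as two reads, and it says nothing about how VMLens actually schedules threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterleavingEnumerator {
    /** Returns every merge of a and b that keeps each thread's own order intact. */
    static List<List<String>> all(List<String> a, List<String> b) {
        List<List<String>> result = new ArrayList<>();
        merge(a, 0, b, 0, new ArrayList<>(), result);
        return result;
    }

    private static void merge(List<String> a, int i, List<String> b, int j,
                              List<String> prefix, List<List<String>> result) {
        if (i == a.size() && j == b.size()) {
            result.add(new ArrayList<>(prefix)); // one complete interleaving
            return;
        }
        if (i < a.size()) { // let thread A take the next step
            prefix.add(a.get(i));
            merge(a, i + 1, b, j, prefix, result);
            prefix.remove(prefix.size() - 1);
        }
        if (j < b.size()) { // let thread B take the next step
            prefix.add(b.get(j));
            merge(a, i, b, j + 1, prefix, result);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<String> threadA = Arrays.asList("A: volatile read");
        List<String> threadB = Arrays.asList("B: volatile write");
        for (List<String> order : all(threadA, threadB)) {
            System.out.println(order);
        }
    }
}
```

With one action per thread this yields the two interleavings listed above. With two actions per thread it already yields six, which is why it matters to count only synchronization actions and not every field access.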
The following shows a test consisting of a class with a volatile field. You surround the multi-threaded part of your test with a while loop that iterates over all thread interleavings, while (allInterleavings.hasNext()), and for each interleaving you check that the code behaves correctly with assertThat(j,is(2)):
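import com.vmlens.api.AllInterleavings;
// Assumed test dependencies (not shown in the original listing): JUnit 5 and Hamcrest.
import org.junit.jupiter.api.Test;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class TestVolatileField {
    private volatile int j = 0;

    @Test
    public void testIncrement() throws InterruptedException {
        try (AllInterleavings allInterleavings = new AllInterleavings("testVolatileField")) {
            // Each loop iteration executes one thread interleaving.
            while (allInterleavings.hasNext()) {
                j = 0;
                Thread first = new Thread() {
                    @Override
                    public void run() {
                        j++; // a volatile read followed by a volatile write, not atomic
                    }
                };
                first.start();
                j++;
                first.join();
                assertThat(j, is(2));
            }
        }
    }
}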
As expected, the test fails. VMLens shows us the thread interleaving that led to the failing assertion:

The trace shows that both threads first read and then both write to the volatile field. This leads to 1 instead of the expected 2 for the variable j.
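The lost update can be replayed by hand. The sketch below is a plain simulation with invented names and no real threads. It splits j++ into its read step and its write step and executes them in a given order.

```java
public class LostUpdateReplay {
    /**
     * Replays j++ from two threads as separate read and write steps.
     * Each entry of schedule names the thread (0 or 1) that performs its next step.
     */
    static int replay(int[] schedule) {
        int j = 0;                          // stands in for the volatile field
        int[] local = new int[2];           // the value each thread has read
        boolean[] hasRead = new boolean[2];
        for (int thread : schedule) {
            if (!hasRead[thread]) {
                local[thread] = j;          // step 1: volatile read
                hasRead[thread] = true;
            } else {
                j = local[thread] + 1;      // step 2: volatile write
            }
        }
        return j;
    }

    public static void main(String[] args) {
        System.out.println(replay(new int[] {0, 1, 0, 1})); // both read, then both write
        System.out.println(replay(new int[] {0, 0, 1, 1})); // one thread after the other
    }
}
```

The first schedule corresponds to the trace above and ends with j equal to 1. The second schedule, in which one thread finishes before the other starts, ends with the expected 2.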
Choosing a high level of abstraction
Using VMLens you do not test the behaviour of the JVM directly. Instead, VMLens works at a higher level of abstraction: it treats volatile fields and synchronized blocks as correctly implemented. There already exists an OpenJDK tool, jcstress, which tests whether those abstractions are implemented correctly. Using VMLens you test whether your code uses those abstractions correctly.
We do not stop here. Instead of testing the classes of java.util.concurrent, we also treat them as correctly implemented and test whether our code uses them correctly. This further reduces the number of necessary interleavings. For example, instead of testing the mechanics of java.util.concurrent.locks.ReentrantLock, we treat its methods as atomic:
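import com.vmlens.api.AllInterleavings;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
// Assumed test dependencies (not shown in the original listing): JUnit 5 and Hamcrest.
import org.junit.jupiter.api.Test;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class TestWithReentrantLock {
    private int j = 0;
    private final Lock lock = new ReentrantLock();

    @Test
    public void testIncrement() throws InterruptedException {
        try (AllInterleavings allInterleavings = new AllInterleavings("testWithReentrantLock")) {
            // Each loop iteration executes one thread interleaving.
            while (allInterleavings.hasNext()) {
                j = 0;
                Thread first = new Thread() {
                    @Override
                    public void run() {
                        increment();
                    }
                };
                first.start();
                increment();
                first.join();
                assertThat(j, is(2));
            }
        }
    }

    // The plain field j is only accessed while holding the lock.
    private void increment() {
        lock.lock();
        try {
            j++;
        } finally {
            lock.unlock();
        }
    }
}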
Instead of tracing the internals of ReentrantLock, we only trace the method calls of lock and unlock:

You can download all examples from this git repository.
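To see why treating lock and unlock as atomic keeps the search small, consider the following toy model. It is not part of the examples repository and not how VMLens is implemented. All names are invented. Each of two threads performs exactly two steps, lock and then unlock, and the model counts the schedules in which no thread would have to acquire a lock that is already held.

```java
import java.util.ArrayList;
import java.util.List;

public class AtomicLockInterleavings {
    /** Lists the schedules of two threads, each doing lock then unlock, that can actually run. */
    static List<int[]> feasibleSchedules() {
        List<int[]> feasible = new ArrayList<>();
        // A schedule has 4 steps; bit k of mask says which thread performs step k.
        for (int mask = 0; mask < 16; mask++) {
            int[] schedule = new int[4];
            for (int k = 0; k < 4; k++) schedule[k] = (mask >> k) & 1;
            if (isFeasible(schedule)) feasible.add(schedule);
        }
        return feasible;
    }

    static boolean isFeasible(int[] schedule) {
        int owner = -1;                   // -1 means the lock is free
        int[] stepsTaken = new int[2];
        for (int thread : schedule) {
            if (stepsTaken[thread] == 2) return false; // this thread has no steps left
            if (stepsTaken[thread] == 0) {             // step 1: lock()
                if (owner != -1) return false;         // held by the other thread, would block
                owner = thread;
            } else {                                   // step 2: unlock()
                owner = -1;
            }
            stepsTaken[thread]++;
        }
        return stepsTaken[0] == 2 && stepsTaken[1] == 2;
    }
}
```

Of the six ways to order the four calls, only two remain: one thread completes its critical section, then the other. Tracing every field access and synchronization action inside ReentrantLock would add many more steps per thread and correspondingly more interleavings to execute.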
Unit tests are a missing piece to use the cores efficiently
The number of CPU cores is continuously increasing. In 2020, the processor with the highest core count was the AMD EPYC 7H12, with 64 cores and 128 hardware threads. Today, in June 2025, the processor with the highest core count is the Intel Xeon 6 6900E, with 288 efficiency cores. AMD has increased its core count to 128 cores and 256 hardware threads with the AMD EPYC 9754.
Java, with its clearly defined semantics through the Java Memory Model and the powerful concurrency utilities in java.util.concurrent, allows us to use all those cores efficiently. Project Loom, with its virtual threads and structured concurrency, will further improve this.
What is still missing is a way to test that we are using all those techniques correctly.
I hope VMLens will fill this gap.