A new way to unit test multi-threaded Java

August 17, 2020

Unit testing multi-threaded Java seems impossible. Bugs depend on the specific timing of the threads and sometimes even on the specific processor type or JVM. But by using byte-code transformations, it is possible to test all thread interleavings for a given unit test.

I have implemented those transformations in the open-source tool vmlens.

Why multi-threaded?

The number of cores per CPU is growing exponentially. While it took four years to go from two cores to eight by 2009, it took only one year to go from 32 to 64 in 2018. The following figure shows the growth of cores for server CPUs. To utilize all those cores, we need scalable, multi-threaded software.

Why Java?

The JVM offers numerous techniques, frameworks, and open-source libraries for multi-threaded programming: threads, Executors, the ForkJoin framework, parallel streams, and actors, to name a few. But until now, it was not possible to write a unit test for multi-threaded software. So it was not possible to test early, test often, and test automatically, nor to use techniques that require unit tests, like refactoring or test-driven design.

An example of a unit test

But by using byte-code transformations, it is now possible to test all thread interleavings. The idea is to use an automatic test and re-run the test for each possible thread interleaving using vmlens. The following example shows this for a unit test using java.util.concurrent.ConcurrentHashMap to collect statistics. You can download the example from GitHub here.

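import static org.junit.Assert.assertEquals;
import java.util.concurrent.ConcurrentHashMap;
import org.junit.Test;
import com.vmlens.api.AllInterleavings;
public class TestUpdateWrong {
	// Not atomic: another thread can run between get and put.
	public void update(ConcurrentHashMap<Integer, Integer> map) {
		Integer result = map.get(1);
		if (result == null) {
			map.put(1, 1);
		} else {
			map.put(1, result + 1);
		}
	}
	@Test
	public void testUpdate() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
				new AllInterleavings("TestUpdateWrong");) {
			while (allInterleavings.hasNext()) {
				final ConcurrentHashMap<Integer, Integer> map = 
						new ConcurrentHashMap<Integer, Integer>();
				Thread first = new Thread(() -> { update(map); });
				Thread second = new Thread(() -> { update(map); });
				first.start();
				second.start();
				first.join();
				second.join();
				assertEquals(2, map.get(1).intValue());
			}
		}
	}
}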

The test uses two threads to update the ConcurrentHashMap. Each test run uses a new map and new threads. After a test run, we check if both threads incremented the counter inside the map. To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings.

The test fails with the error message:

  TestUpdateWrong.testUpdate:45 expected:<2> but was:<1>

The test fails because, for one interleaving, both threads first get null out of the map. Both threads then put the value 1 into the map, so one of the two increments is lost.
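One possible way to repair the update method, shown here only as a sketch and not taken from the original example, is to replace the separate get and put calls with a single atomic call. The class name AtomicUpdateSketch and the helper runOnce are hypothetical names chosen for illustration; ConcurrentHashMap.merge is a standard JDK method that performs the read-modify-write atomically for the given key.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicUpdateSketch {
    // Hypothetical replacement for update(): merge inserts 1 if the key is
    // absent, otherwise adds 1 to the current value, as one atomic operation.
    static void update(ConcurrentHashMap<Integer, Integer> map) {
        map.merge(1, 1, Integer::sum);
    }

    // Runs two threads that each call update once and returns the final count.
    static int runOnce() {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread first = new Thread(() -> update(map));
        Thread second = new Thread(() -> update(map));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get(1);
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 2
    }
}
```

Because there is no longer a window between reading and writing the counter, no interleaving of the two threads can lose an increment.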

When running the test, you need to make sure that vmlens is added as a Java agent to the JVM. You can do this either by using Maven, as described here, or by using Eclipse, as described here.

How does it work?

How can we calculate all thread interleavings? The idea is to identify all atomic and instantly visible operations and method calls, and to execute all combinations of those operations and method calls. Well, not all, only those which can lead to a different outcome. In our example test, the methods get and put of ConcurrentHashMap are atomic and instantly visible. Calculating all possible thread interleavings is possible as long as our test is data race free. Therefore, we check for each interleaving whether the test run is data race free.
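To make the idea concrete, here is a minimal sketch that enumerates every order in which two threads can each run a get followed by a put, and replays each order single-threaded. This is not how vmlens is implemented; it is a naive enumeration without the pruning of equivalent interleavings mentioned above. The class InterleavingSketch and its methods are hypothetical, and a plain HashMap stands in for the ConcurrentHashMap because the replay runs on one thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InterleavingSketch {
    // Recursively builds every order in which thread 0 and thread 1
    // can run their remaining steps, encoded as a string of thread ids.
    static void schedules(int left0, int left1, String prefix, List<String> out) {
        if (left0 == 0 && left1 == 0) {
            out.add(prefix);
            return;
        }
        if (left0 > 0) schedules(left0 - 1, left1, prefix + "0", out);
        if (left1 > 0) schedules(left0, left1 - 1, prefix + "1", out);
    }

    // Replays one schedule: each thread's first step is get, its second is put.
    static int replay(String schedule) {
        Map<Integer, Integer> map = new HashMap<>();
        Integer[] read = new Integer[2]; // value each thread saw in its get
        int[] step = new int[2];         // index of each thread's next step
        for (char c : schedule.toCharArray()) {
            int t = c - '0';
            if (step[t]++ == 0) {
                read[t] = map.get(1);
            } else {
                map.put(1, read[t] == null ? 1 : read[t] + 1);
            }
        }
        return map.get(1);
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        schedules(2, 2, "", all);
        for (String s : all) {
            System.out.println(s + " -> " + replay(s));
        }
    }
}
```

With two steps per thread there are six schedules. Only the two in which one thread finishes before the other starts ("0011" and "1100") end with the value 2; every schedule in which both gets run before the first put ends with 1.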

Data races are incorrectly synchronized reads and writes to the same memory location from different threads. The Java Memory Model defines when a data race happens and when an application is correctly synchronized.
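As an illustration of the difference, the following sketch contrasts a counter with a data race and a correctly synchronized one. The class and method names are hypothetical and not part of vmlens. The first counter uses a plain field that two threads write without synchronization; the second guards every access with the same lock.

```java
class DataRaceSketch {
    // Plain field: unsynchronized writes from two threads form a data race.
    static class RacyCounter {
        int value;
        void increment() { value++; }
    }

    // Every access holds the same lock, so all accesses are ordered.
    static class SynchronizedCounter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // Runs the action 1000 times on each of two threads and waits for both.
    static void runTwice(Runnable action) {
        Runnable loop = () -> {
            for (int i = 0; i < 1000; i++) action.run();
        };
        Thread first = new Thread(loop);
        Thread second = new Thread(loop);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RacyCounter racy = new RacyCounter();
        runTwice(racy::increment);
        System.out.println("racy: " + racy.value); // may be less than 2000

        SynchronizedCounter safe = new SynchronizedCounter();
        runTwice(safe::increment);
        System.out.println("synchronized: " + safe.get()); // always 2000
    }
}
```

For the racy counter the set of possible results cannot be derived by interleaving atomic steps alone, which is why the data race check has to come first.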

That we can use such a two-step approach for testing is no surprise but rather a consequence of the Java Memory Model.

Other tools

Testing multi-threaded software by executing all thread interleavings is not new. The tool Concuerror implements this approach for Erlang, a programming language without shared memory. Finding data races at runtime is also not new. A prominent example is ThreadSanitizer, which detects data races in C++, Go, and Java programs.

What is new is the combination of those two techniques to test Java, a language with shared memory.

Debugging is free

OK, so we calculated all thread interleavings and know for which interleaving a test failed. We can show this interleaving in a report. This makes debugging the failed test possible, almost for free. Here is the interleaving which led to the failure of the example test:

In case of the failure, both threads first call get. And then both threads call put. So the second thread overwrites the value of the first thread.


By using byte-code transformations, vmlens makes it possible to test all thread interleavings for a given unit test. Calculating all potential thread interleavings is possible because we check that the given unit test is data race free.

We can take an automatic test and surround it with a while loop iterating over all thread interleavings. Unit testing makes it possible to test early, test often, and test automatically, even for multi-threaded code, and to apply techniques like refactoring and test-driven design to multi-threaded code.
