vmlens Blog

How to write JUnit tests for multi-threaded Java code

Mon, 07 Sep 2020 22:00:00 GMT

Currently, when we test multi-threaded Java, we call the class under test from as many threads as possible. And since the test is not deterministic, we repeat this test as often as possible.

This approach has the disadvantage that most of the time a test of faulty code still succeeds, which makes debugging multi-threaded bugs a nightmare. I therefore developed an open-source tool, vmlens, to make JUnit tests of multi-threaded Java deterministic and to make debugging easier.

The idea is to execute all possible thread interleavings for a given test. And to report the failed thread interleaving which makes debugging possible.

A test for a concurrent counter

The following example shows how to use vmlens to write a test for a concurrent counter. All tests are in the GitHub project vmlens-examples in the package com.vmlens.examples.tutorial.counter.

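// JUnit 4 imports are an assumption; the original listing omits them.
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import com.vmlens.api.AllInterleavings;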
public class TestCounterNonVolatile {
 int i = 0;
 @Test
 public void test() throws InterruptedException {
   try (AllInterleavings allInterleavings = 
     new AllInterleavings
        ("tutorial.counter.TestCounterNonVolatile");) {
	 while (allInterleavings.hasNext()) {
	  i = 0;
	  Thread first = new Thread(() -> {
		i++;
	  });
	  Thread second = new Thread(() -> {
		i++;
	  });
	  first.start();
	  second.start();

	  first.join();
	  second.join();
				
	  assertEquals(2,i);
     }
   }
 }
}

We increment the field i from two threads. After both threads have finished, we check that the count is 2. The trick is to surround the complete test with a while loop iterating over all thread interleavings using the class AllInterleavings.

vmlens uses byte code transformation to calculate all thread interleavings. Therefore you need to configure vmlens in the Maven pom as described here. After running the test, we can see the result of all test runs in the interleave report in the file target/interleave/elements.html.

Our test, test number 5 with the name tutorial.counter.TestCounterNonVolatile, failed with a data race. A data race means that the reads and writes to a shared field are not correctly synchronized. Incorrectly synchronized reads and writes can be reordered by the JIT compiler or the CPU. The word "can" is important here: typically, incorrectly synchronized reads and writes still return the correct result. Only under very specific circumstances, often a combination of a specific CPU architecture, a specific JVM, and a specific thread interleaving, do they lead to incorrect values.

vmlens checks for every field access if it is correctly synchronized to detect data races.

A test for a concurrent volatile counter

To fix the data race we declare the field as volatile:

public class TestCounterVolatile {
 volatile int i = 0;
 @Test
 public void test() throws InterruptedException {
   try (AllInterleavings allInterleavings = 
	  new AllInterleavings
	    ("tutorial.counter.TestCounterVolatile");) {
	  while (allInterleavings.hasNext()) {
	   i = 0;
	   Thread first = new Thread(() -> {
		 i++;
	   });
	   Thread second = new Thread(() -> {
		 i++;
	   });
	   first.start();
	   second.start();

	   first.join();
	   second.join();
				
	   assertEquals(2,i);
	  }
   }
 }
}

This fixes the data race but now the assertion fails:

TestCounterVolatile.test:30 expected:<2> but was:<1>

To see what went wrong we click on the test tutorial.counter.TestCounterVolatile in the interleave report. This shows us the interleaving which went wrong:

The bug is that both threads first read the variable i and only after that both update the variable. So the second thread overwrites the value written by the first one.
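The lost update can also be reproduced without real threads. The following is a minimal sketch, not part of vmlens: the class LostUpdateSimulation and its methods are hypothetical names. It models each thread as a read step followed by a write step and tries every order of those steps.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation, not part of vmlens: each "thread" performs
// a read step followed by a write step on a shared counter.
class LostUpdateSimulation {

    // Returns every final counter value reachable by some interleaving.
    static Set<Integer> finalValues() {
        Set<Integer> results = new TreeSet<>();
        explore(0, new int[2], new int[2], results);
        return results;
    }

    // step[t]: 0 = nothing done, 1 = read done, 2 = write done.
    // local[t]: the value thread t read from the shared counter.
    private static void explore(int shared, int[] step, int[] local,
                                Set<Integer> results) {
        if (step[0] == 2 && step[1] == 2) {
            results.add(shared);
            return;
        }
        for (int t = 0; t < 2; t++) {
            if (step[t] == 2) {
                continue; // thread t has already finished
            }
            int[] nextStep = step.clone();
            int[] nextLocal = local.clone();
            int nextShared = shared;
            if (step[t] == 0) {
                nextLocal[t] = shared;      // read i
            } else {
                nextShared = local[t] + 1;  // write i = value read + 1
            }
            nextStep[t]++;
            explore(nextShared, nextStep, nextLocal, results);
        }
    }

    public static void main(String[] args) {
        System.out.println(finalValues()); // prints [1, 2]
    }
}
```

The value 1 appears exactly for the interleavings in which both reads happen before the first write, which is the case described above.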

A test with an atomic counter

To write a correct concurrent counter we use the class AtomicInteger:

public class TestCounterAtomic {
 AtomicInteger i = new AtomicInteger();
 @Test
 public void test() throws InterruptedException {
   try (AllInterleavings allInterleavings = 
	  new AllInterleavings
         ("tutorial.counter.TestCounterAtomic");) {
	  while (allInterleavings.hasNext()) {
	   i.set(0);
	   Thread first = new Thread(() -> {
		i.incrementAndGet();
	   });
	   Thread second = new Thread(() -> {
	    i.incrementAndGet();
	   });
	   first.start();
	   second.start();

	   first.join();
	   second.join();
	   assertEquals(2,i.get());
	  }
   }
  }
}

Now the increment of our counter is atomic and our test finally succeeds.
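To see why an atomic increment cannot lose an update, it helps to write one by hand. The following sketch builds an increment from a compareAndSet retry loop. The class CasCounter is a hypothetical name, and the JDK's own incrementAndGet may be implemented differently, so take this as an illustration of the idea only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter that increments with an explicit retry loop.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        int prev;
        int next;
        do {
            prev = value.get();
            next = prev + 1;
            // compareAndSet fails if another thread changed the value
            // after our read; in that case we read again and retry.
        } while (!value.compareAndSet(prev, next));
        return next;
    }

    int get() {
        return value.get();
    }

    // Runs two threads that each increment the same counter.
    static int countWithTwoThreads(int incrementsPerThread) {
        CasCounter counter = new CasCounter();
        Runnable work = () -> {
            for (int n = 0; n < incrementsPerThread; n++) {
                counter.increment();
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(10_000)); // prints 20000
    }
}
```

In contrast to i++ on a volatile field, a write here only succeeds if the value is still the one that was read, so the read and the write can no longer be separated by another thread's update.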

Conclusion

As we have seen, executing all thread interleavings for multi-threaded tests makes those tests deterministic. And it makes debugging of failed tests possible. To test all thread interleavings we have surrounded our test with a while loop iterating over all thread interleavings using the class AllInterleavings. vmlens uses byte code transformation to calculate all thread interleavings. Therefore you also need to configure vmlens in the Maven pom as described here. And if a test fails, you can look at the failing thread interleaving to debug the test.

A new way to unit test multi-threaded Java

Sun, 16 Aug 2020 22:00:00 GMT

Unit-testing multi-threaded Java seems impossible. Bugs depend on the specific timing of the threads and sometimes even on the specific processor type or JVM. But by using byte-code transformations it is possible to test all thread interleavings for a given unit test.

I have implemented those transformations in the open-source tool vmlens.

Why multi-threaded?

The number of current CPU cores is growing exponentially. While it took four years to go from two cores to eight in the year 2009, it took only one year to go from 32 to 64 in the year 2018. The following figure shows the growth of cores for server CPUs. To utilize all those cores we need scalable, multi-threaded software.

Why Java?

The JVM offers numerous techniques, frameworks, and open source libraries for multi-threaded programming. You can use threads, Executors, the ForkJoin framework, parallel streams, and actors, to name a few of them. But until now, it was not possible to write a unit test for multi-threaded software. So to test early, test often, test automatically was not possible. Neither were techniques that require unit tests, like refactoring or test-driven design.

An example of a unit test

But by using byte-code transformations, it is now possible to test all thread interleavings. The idea is to use an automatic test and re-run the test for each possible thread interleaving using vmlens. The following example shows this for a unit test using java.util.concurrent.ConcurrentHashMap to collect statistics. You can download the example from GitHub here.

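// JUnit 4 imports are an assumption; the original listing omits them.
import static org.junit.Assert.assertEquals;
import java.util.concurrent.ConcurrentHashMap;
import org.junit.Test;
import com.vmlens.api.AllInterleavings;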
public class TestUpdateWrong {
	public void update(ConcurrentHashMap<Integer, Integer> map) {
		Integer result = map.get(1);
		if (result == null) {
			map.put(1, 1);
		} else {
			map.put(1, result + 1);
		}
	}
	@Test
	public void testUpdate() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
				new AllInterleavings("TestUpdateWrong");) {
			while (allInterleavings.hasNext()) {
				final ConcurrentHashMap<Integer, Integer> map = 
						new ConcurrentHashMap<Integer, Integer>();
				Thread first = new Thread(() -> {
					update(map);
				});
				Thread second = new Thread(() -> {
					update(map);
				});
				first.start();
				second.start();
				first.join();
				second.join();
				assertEquals(2, 
				  map.get(1).intValue());
			}
		}
	}
}

The test uses two threads to update the ConcurrentHashMap. Each test run uses a new map and new threads. After a test run, we check if both threads incremented the counter inside the map. To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings.

The test fails with the error message:

Failures: 
  TestUpdateWrong.testUpdate:45 expected:<2> but was:<1>

The test fails because for one interleaving the two threads first both get null out of the map. And then both threads insert one into the map.
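One possible way to repair the update method is to replace the separate get and put calls with a single atomic call. The sketch below uses ConcurrentHashMap.merge, which the JDK documents as being performed atomically. The class name UpdateWithMerge and the helper updateFromTwoThreads are hypothetical, and this is not necessarily the fix used in the vmlens examples.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical variant of the update method using one atomic call.
class UpdateWithMerge {

    // Inserts 1 if the key is absent, otherwise adds 1 to the old value.
    static void update(ConcurrentHashMap<Integer, Integer> map) {
        map.merge(1, 1, Integer::sum);
    }

    // Runs update from two threads and returns the resulting count.
    static int updateFromTwoThreads() {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread first = new Thread(() -> update(map));
        Thread second = new Thread(() -> update(map));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get(1);
    }

    public static void main(String[] args) {
        System.out.println(updateFromTwoThreads()); // prints 2
    }
}
```

Because the read and the write now happen inside one atomic method call, there is no interleaving in which both threads see null.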

When running the test, you need to make sure that vmlens is added as a Java agent to the JVM. You can do this either by using Maven, as described here, or by using Eclipse, as described here.

How does it work?

How can we calculate all thread interleavings? The idea is to identify all atomic and instantly visible operations and method calls, and to execute all combinations of those operations and method calls. Well, not all of them, only those which can lead to a different outcome. In our example test, the methods get and put of ConcurrentHashMap are atomic and instantly visible. Calculating all possible thread interleavings is only possible as long as our test is data race free. Therefore we check for each interleaving if the test run is data race free.
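To get a feeling for why it matters to skip interleavings that cannot change the outcome, here is a small sketch that counts how many orders exist in total. It uses the standard combinatorial formula (the factorial of the total number of steps, divided by the factorial of each thread's step count). The class InterleavingCount is a hypothetical helper and says nothing about how vmlens itself enumerates or prunes interleavings.

```java
import java.math.BigInteger;

// Hypothetical helper: counts the orders in which the atomic steps of
// several threads can be interleaved while each thread keeps its own order.
class InterleavingCount {

    static BigInteger count(int... stepsPerThread) {
        BigInteger result = BigInteger.ONE;
        int placed = 0;
        for (int steps : stepsPerThread) {
            // Multiply by the binomial coefficient C(placed + steps, steps),
            // built up one factor at a time so every division is exact.
            for (int i = 1; i <= steps; i++) {
                placed++;
                result = result.multiply(BigInteger.valueOf(placed))
                               .divide(BigInteger.valueOf(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(2, 2));   // prints 6
        System.out.println(count(2, 2, 2)); // prints 90
        System.out.println(count(10, 10)); // prints 184756
    }
}
```

Two threads with one get and one put each give only 6 orders, but the number grows quickly with more steps or more threads.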

Data races are incorrectly synchronized reads and writes to the same memory location from different threads. The Java Memory Model defines when a data race happens and when an application is correctly synchronized.

That we can use such a two-step approach for testing is no surprise but rather a consequence of the Java Memory Model.

Other tools

Testing multi-threaded software by executing all thread interleavings is not new. The tool Concuerror implements this approach for Erlang, a programming language without shared memory. Finding data races at runtime is also not new. A prominent example is ThreadSanitizer, which detects data races in C++, Go, and Java programs.

New is the combination of those two techniques to test Java, a language with shared memory.

Debugging is free

OK, so we have calculated all thread interleavings and know for which interleaving a test failed. We can show this interleaving in a report. This makes debugging of the failed test possible, almost for free. Here is the interleaving which led to the failure of the example test:

In case of the failure, both threads first call get. And then both threads call put. So the second thread overwrites the value of the first thread.

Conclusion

By using byte-code transformations, vmlens makes it possible to test all thread interleavings for a given unit test. Calculating all potential thread interleavings is possible because we check that the given unit test is data race free.

We can take an automatic test and surround it with a while loop iterating over all thread interleavings. Unit testing makes it possible to test early, test often, and test automatically, even for multi-threaded code. And it lets us apply techniques like refactoring and test-driven design to multi-threaded code.

The difference between ARM and x86 for Java

Tue, 04 Aug 2020 22:00:00 GMT

ARM CPUs are coming to Java. Amazon offers cloud instances based on ARM-compatible processors. There is now a new JEP to create an OpenJDK port for Windows on ARM. And Apple plans to use ARM processors for its Macs and MacBooks.

What is the difference between ARM and x86 when we program in Java? As long as you follow the rules: none. So let us break the rules.

Reordering on ARM vs. x86

The following test leads to different results on the two processor types. We use a dedicated tool for those types of tests, the OpenJDK tool jcstress. The test consists of two methods which are annotated with the annotation @Actor. The annotated methods get called from different threads:

public class TestReorderWriteWrite {
 SingeltonWithDataRace singelton = new SingeltonWithDataRace();
 @Actor
 public void actor1(II_Result r) {
  if (singelton.instance().initialized) {
   r.r1 = 1;
  } else {
   r.r1 = 0;
  }
 }
 @Actor
 public void actor2(II_Result r) {
  if (singelton.instance().initialized) {
   r.r2 = 1;
  } else {
   r.r2 = 0;
  }
 }
}

The class SingeltonWithDataRace checks if the variable instance is null and, if so, creates a new SingeltonValue:

public class SingeltonWithDataRace {
 SingeltonValue instance;
 public SingeltonValue instance() {
  if (instance == null) {
   instance = new SingeltonValue();
  }
  return instance;
 }
}

SingeltonValue sets the variable initialized to true in the constructor:

public class SingeltonValue {
 boolean initialized;
 public SingeltonValue() {
  initialized = true;
 }
}

Since we always first write to the variable initialized and then to instance, the variable initialized should always be true. If I run this test on my development machine, an Intel i5 4 core CPU, I see the following results.

  Observed state   Occurrences              Expectation
            1, 1    52,368,551               ACCEPTABLE

So, as expected, the variable initialized is always true. Running the same test on an ARM AWS Graviton processor with 2 vCPUs gives the following results:

  Observed state   Occurrences              Expectation
            0, 0             0                FORBIDDEN
            0, 1             7   ACCEPTABLE_INTERESTING
            1, 0            14   ACCEPTABLE_INTERESTING
            1, 1    57,117,820               ACCEPTABLE

On ARM the variable initialized is sometimes false: these are the states 0, 1 and 1, 0. So on ARM the writes to instance and to initialized can be reordered when we read the variables from a different thread. Why?

The processor memory model

CPU Cores cache the values from the main memory in caches. This bridges the gap between the fast core and the slower memory system. A read from the level 1 cache is about 200 times faster than a read from the main memory.

L1 cache reference      0.5 ns
Branch mispredict         5 ns
L2 cache reference        7 ns    14x L1 cache
Main memory reference   100 ns    20x L2 cache, 200x L1 cache

From jboner/latency.txt

The result of the test is the effect of this cache system. The behavior of the cache system is specified in a memory model. A memory model answers the question: What happens when multiple threads access the same memory location?

The two processor types have different memory models. The ARM memory model allows the reordering of two writes to different memory locations. And the x86 memory model forbids this. This is the reason why the test leads to different results on the different processor architectures.

Other reorderings like read and write to different memory locations are allowed by both memory models.

Reordering on ARM and x86

The following test shows this. Again we use two methods annotated with @Actor. The two methods run in different threads during the test. The first method writes to the field first and reads from the field second. The second method writes to the field second and reads from the field first:

public class TestReorderReadWrite {
 private int first;
 private int second;
 @Actor
 public void actor1(II_Result r) {
  first = 1;
  r.r1 = second;
 }
 @Actor
 public void actor2(II_Result r) {
  second = 1;
  r.r2 = first;
 }
}

And here are the results from a test run on my development machine, an Intel i5 4 core CPU:

  Observed state   Occurrences              Expectation
            0, 0     5,688,756   ACCEPTABLE_INTERESTING
            0, 1    46,185,263               ACCEPTABLE
            1, 0    26,244,626               ACCEPTABLE
            1, 1            86               ACCEPTABLE

Here is the result from the ARM AWS Graviton processor with 2 vCPUs:

  Observed state   Occurrences              Expectation                                  
            0, 0     5,361,697   ACCEPTABLE_INTERESTING                      
            0, 1    55,586,568               ACCEPTABLE           
            1, 0    25,740,292               ACCEPTABLE          
            1, 1             4               ACCEPTABLE

Sometimes both methods read the default value zero. This means that the read and the write to the variables were reordered.

Memory barriers stop reordering

If we want to write meaningful multi-threaded programs, we need a way to tell the processor that it should stop reordering, at least at specific points. The processor provides memory barriers for that. If we declare a field as volatile, the JVM generates memory barriers. Here is the assembler code from my development machine, an Intel i5 4 core CPU:

movl    $0x1,0xc(%r10)    ;*putfield first
lock addl $0x0,(%rsp)     ; Memory Barrier
mov     0x10(%r10),%edx   ;*getfield second

The JVM inserts the statement lock addl. This makes sure that reads and writes do not get reordered.

The Java Memory Model

When we write Java we do not write for a specific processor architecture. So Java also needs a memory model, one which answers the same question for Java: what happens when multiple threads access the same memory location?

The answer is the following:

If a program has no data races, then all executions of the program will appear to be sequentially consistent.

Sequential consistency means that reads and writes are not reordered. A run of a multi-threaded program is simply one specific interleaving of the source code statements of the different threads.

A Java program is only sequentially consistent when it does not contain data races. A data race is a read and a write, or two writes, to the same memory location which are not ordered by synchronization actions. Synchronization actions like the read and write of a volatile field create an order between multiple threads, the happens-before order. For example, the write to a volatile variable happens-before all subsequent volatile reads of this variable. And if all memory accesses can be ordered through this happens-before relation, our program is data race free.
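Applied to the singleton example from the beginning, the happens-before rule suggests one possible repair: declare the field instance as volatile. The following is a sketch under that assumption. The class VolatilePublication and its nested classes are hypothetical names, and the check-then-act in instance() can still create more than one Value; the volatile field only guarantees that a thread which sees an instance also sees initialized as true.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of the singleton example with a volatile field.
class VolatilePublication {

    static class Value {
        boolean initialized;
        Value() {
            initialized = true;
        }
    }

    static class Holder {
        // The volatile write below happens-before every later volatile
        // read that sees the written instance.
        private volatile Value instance;

        Value instance() {
            Value local = instance;
            if (local == null) {
                local = new Value();
                instance = local;
            }
            return local;
        }
    }

    // Runs two reader threads per round and counts how often a thread
    // observed initialized == false.
    static int countUninitialized(int rounds) {
        AtomicInteger failures = new AtomicInteger();
        try {
            for (int round = 0; round < rounds; round++) {
                Holder holder = new Holder();
                Runnable actor = () -> {
                    if (!holder.instance().initialized) {
                        failures.incrementAndGet();
                    }
                };
                Thread first = new Thread(actor);
                Thread second = new Thread(actor);
                first.start();
                second.start();
                first.join();
                second.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println(countUninitialized(200)); // prints 0
    }
}
```

With the volatile field the result no longer depends on the processor: the Java Memory Model forbids the states 0, 1 and 1, 0 on ARM as well as on x86.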

Who has reordered my program?

The processor core is not the only system that reorders statements. The compiler also reorders statements to improve performance. Therefore we need a memory model at the language level. Only we, the programmers, can tell how our program needs to be ordered.
But we do not control on which hardware or JVM our program runs, so we need a way to specify this order in the program code. We do this through the synchronization actions of the Java Memory Model.

Typical suspects for reordering our program are the processor and the JVM compiler, which compiles the bytecode to machine code.

Conclusion

As we have seen, two writes to different variables can be reordered on ARM CPUs, while this is forbidden on x86 CPUs. A read and a write to different variables can be reordered on both CPU types.

When I first read about the Java Memory Model in the book Java Concurrency in Practice by Brian Goetz et al., I did not understand it. It took me a long time to accept that we need to tell the JVM how memory reads and writes should be ordered. So I am happy that we now have another system, ARM, which reorders reads and writes. And that we can write small Java programs that demonstrate that we need the Java Memory Model.

Scalability of SynchronizedMap vs. ConcurrentHashMap vs. NonBlockingHashMap

Thu, 09 Jul 2020 22:00:00 GMT

The number of CPU cores is growing exponentially; see the following figure for the development of current server CPUs. Therefore I am looking at the scalability of different data structures. Last time we looked at concurrent queues; this time we look at concurrent hash maps.

The Concurrent hash maps

We look at three different hash maps, two from the JDK and one from the open-source library JCTools.

SynchronizedMap: A thread-safe hash map from the JDK. It can be created by calling Collections.synchronizedMap on a java.util.HashMap. It is simply the non-thread-safe HashMap guarded by a single lock.
java.util.concurrent.ConcurrentHashMap: A dedicated implementation of a thread-safe hash map from the JDK. It uses one lock per array element when updating the map. For read-only operations, this map does not need any locks, only volatile reads.
org.jctools.maps.NonBlockingHashMap: A hash map from the open-source library JCTools that does not use any locks. For updates, it uses the atomic compare-and-set operation instead. For read-only operations it uses, like ConcurrentHashMap, only volatile reads. This map supports the java.util.concurrent.ConcurrentMap API only up to JDK 1.6.

The read-only benchmark

The hash maps have two basic operations, one to put a key-value pair into the map and one to get the value for a key back. Therefore I use two benchmarks: a read-only benchmark using get and a write-only benchmark using put. Let us first look at the benchmark for get. This benchmark uses a prefilled map and always gets the same element out of the map. I use the OpenJDK JMH project to implement the benchmark:

@State(Scope.Benchmark)
public abstract class AbstractBenchmark {
 private final Map forGet;
 Integer element = 1;
 // Implemented by one subclass per map type under test.
 protected abstract Map create();
 public AbstractBenchmark() {
	forGet = create();
	Random random = new Random();
	final int maxKey = 10000;
	for (int i = 0; i < 1000; i++) {
	  forGet.put(random.nextInt(maxKey), element);
	}
	forGet.put(100, element);
 }
 @Benchmark
 public Object get() {
	return forGet.get(100);
 }
}

Here are the results for an Intel Xeon Platinum 8124M CPU @ 3.00GHz with two sockets with eighteen cores per socket and two hardware threads per core. I used Amazon Corretto 11 as JVM.

The read performance of ConcurrentHashMap and NonBlockingHashMap is impressive. To see how good the performance really is, we compare it with a baseline benchmark, that is, a benchmark which only returns a constant:

@State(Scope.Benchmark)
public class BaselineBenchmark {
 int x = 923;
 @Benchmark
 public int get() {
   return x;
 }
}

If we run it on the same Intel Xeon machine again, we see that the read performance is almost as fast as the baseline:

The write-only benchmark

Now let us look at the write-only benchmark. This time we put random keys with a constant value into the map:

@State(Scope.Benchmark)
public abstract class AbstractBenchmark {
 private Map forPut;
 public static final int MAX_KEY = 10000000;
 Integer element = 1;
 protected abstract Map create();
 @Setup(Level.Iteration)
 public void setup() {
	forPut = create();
 }
 @Benchmark
 public Object put() {
	int key = 
	ThreadLocalRandom.current().
	  nextInt(MAX_KEY);
	return forPut.put(key, element);
 }
}

If we run this benchmark on the Intel Xeon, we see the advantage of lock-free algorithms. While ConcurrentHashMap only scales up to 16 threads, NonBlockingHashMap scales up to the maximum thread count of 72 threads.

You can download the source code for all the benchmarks from GitHub here.

concurrencyLevel of ConcurrentHashMap

Disqus and Reddit commentators suggested testing with a higher concurrencyLevel. So here is the same benchmark with a concurrencyLevel of 72 compared to the default concurrencyLevel of 12:

The concurrencyLevel does not affect the scalability of the ConcurrentHashMap. Why?

The meaning of the concurrencyLevel changed between JDK 7 and JDK 8. In JDK 7 the hash map consisted of an array of segments. Each segment consisted of an array of lists of key-value pairs and extended the class java.util.concurrent.locks.ReentrantLock. The size of this array of segments was fixed and was specified by the concurrencyLevel. The idea was that if many threads are accessing the hash map it is better to use multiple locks and smaller segments. And if only a small number of threads are accessing the hash map a lower number of locks fits better.

But with JDK 8 the ConcurrentHashMap uses only one array of lists of key-value pairs. And each array element is used as a lock. The concurrencyLevel is now just another way to specify the capacity of the map, as we can see in the constructor of ConcurrentHashMap:

public ConcurrentHashMap(int initialCapacity,
	float loadFactor, int concurrencyLevel) {
 if (!(loadFactor > 0.0f) || initialCapacity < 0
 	|| concurrencyLevel <= 0)
  throw new IllegalArgumentException();
 if (initialCapacity < concurrencyLevel) // Use at least as many bins
  initialCapacity = concurrencyLevel;    // as estimated threads
 long size = (long)(1.0 +
 	(long)initialCapacity / loadFactor);
 int cap = (size >= (long)MAXIMUM_CAPACITY) ?
  MAXIMUM_CAPACITY : tableSizeFor((int)size);
 this.sizeCtl = cap;
}
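To make the arithmetic of this constructor visible, the following sketch repeats the calculation outside of ConcurrentHashMap. The class ConcurrencyLevelCapacity is a hypothetical stand-in, and its tableSizeFor is my own re-implementation of "round up to the next power of two", so treat the numbers as an illustration of the constructor shown above rather than as a guarantee for every JDK version.

```java
// Hypothetical re-implementation of the capacity calculation above.
class ConcurrencyLevelCapacity {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is greater than or equal to c.
    static int tableSizeFor(int c) {
        int n = -1 >>> Integer.numberOfLeadingZeros(c - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Mirrors the constructor: concurrencyLevel only acts as a lower
    // bound for initialCapacity.
    static int capacity(int initialCapacity, float loadFactor,
                        int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0
                || concurrencyLevel <= 0) {
            throw new IllegalArgumentException();
        }
        if (initialCapacity < concurrencyLevel) {
            initialCapacity = concurrencyLevel;
        }
        long size = (long) (1.0 + (long) initialCapacity / loadFactor);
        return (size >= (long) MAXIMUM_CAPACITY)
                ? MAXIMUM_CAPACITY : tableSizeFor((int) size);
    }

    public static void main(String[] args) {
        System.out.println(capacity(16, 0.75f, 12));   // prints 32
        System.out.println(capacity(16, 0.75f, 72));   // prints 128
        System.out.println(capacity(1000, 0.75f, 12)); // prints 2048
        System.out.println(capacity(1000, 0.75f, 72)); // prints 2048
    }
}
```

As soon as the initial capacity is larger than the concurrencyLevel, the concurrencyLevel has no influence at all.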

Conclusion

The benchmark for the concurrent hash maps shows what is possible in Java. Read-only performance is almost as good as the baseline since it consists only of volatile reads. And by using a non-blocking algorithm based on the atomic compare-and-set operation, it is possible to implement write operations which scale.

Scalability of concurrent queues from java.util.concurrent and org.jctools

Thu, 02 Jul 2020 22:00:00 GMT

The core count of current CPUs is growing exponentially. See the following figure for the development of the core count of current server CPUs. So the scalability of concurrent data structures becomes more and more important. Today we look at the scalability of one such data structure: Concurrent Queues.

Every time two threads communicate asynchronously, chances are high that a queue is involved. Web servers like Tomcat use queues to dispatch incoming requests to worker threads. Frameworks like Akka use queues for the communication of their actor classes.

The Concurrent Queues

We will look at the scalability of the following five queues, three bounded queues and two unbounded queues:

java.util.concurrent.ArrayBlockingQueue: A bounded queue from the JDK. The queue uses an array to store its elements and a single lock for both reading and writing.
org.jctools.queues.MpmcArrayQueue: A bounded queue from the open-source library JCTools. It also uses an array to store its elements. But instead of a lock, it uses the atomic compare and swap operation for the concurrent update of the array.
java.util.concurrent.LinkedBlockingQueue: A bounded queue from the JDK. The queue uses a linked list to store its elements and two locks, one lock for reading and one lock for writing.
java.util.concurrent.ConcurrentLinkedQueue: An unbounded queue from the JDK. It also uses a linked list to store its elements. But instead of a lock, it uses the atomic compare and swap operation for the concurrent update of the list.
org.jctools.queues.MpmcUnboundedXaddArrayQueue: An unbounded queue from the open-source library JCTools. The queue uses multiple arrays or chunks to store its elements. It uses atomic operations to calculate the index in those chunks for reading and writing.
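The difference between the bounded and the unbounded queues in this list shows up in the return value of offer. The following sketch uses only the two JDK queues java.util.concurrent.ArrayBlockingQueue and java.util.concurrent.ConcurrentLinkedQueue; the class QueueBounds and its methods are hypothetical helper names.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper showing how offer behaves on a full queue.
class QueueBounds {

    // Offers the given number of elements and counts the accepted ones.
    static int acceptedOffers(Queue<Integer> queue, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    static int acceptedByBounded(int capacity, int attempts) {
        return acceptedOffers(new ArrayBlockingQueue<Integer>(capacity), attempts);
    }

    static int acceptedByUnbounded(int attempts) {
        return acceptedOffers(new ConcurrentLinkedQueue<Integer>(), attempts);
    }

    public static void main(String[] args) {
        System.out.println(acceptedByBounded(2, 5)); // prints 2
        System.out.println(acceptedByUnbounded(5));  // prints 5
    }
}
```

A bounded queue rejects elements once it is full, so a producer has to retry offer until a consumer has removed an element.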

The Benchmark

When testing, we need to decide in which state we want to test the queues. I opted for a test in a steady state, with queues that are only slightly filled. This keeps garbage collection from distorting the measured times. I use the following JMH benchmark:

@State(Scope.Benchmark)
public abstract class AbstractBenchmark {
 public static final int QUEUE_SIZE = 1024;
 private static long DELAY_PRODUCER = 1024;
 public AbstractBenchmark(Queue queue) {
   this.queue = queue;
 }
 private final Queue queue;
 Integer element = 1;
 @Setup(Level.Iteration)
 public void setup() {
   synchronized (queue) {
		queue.clear();
	}
 }
 @Benchmark
 @Group("g")
 @GroupThreads(1)
 public void put(Blackhole bh, Control control) {
   bh.consumeCPU(DELAY_PRODUCER);
   while (!queue.offer(element) && 
             !control.stopMeasurement) {
   }
 }
 @Benchmark
 @Group("g")
 @GroupThreads(1)
 public Object take(Blackhole bh, Control control) {
   Object result = queue.poll();
   while (result == null && 
             !control.stopMeasurement) {
	  result = queue.poll();
   }
   return result;
 }
}

Here are the results for an Intel Xeon Platinum 8124M CPU @ 3.00GHz with two sockets with eighteen cores per socket and two hardware threads per core. I used Amazon Corretto 11 as JVM. I use a high delay for the producer threads, 1024.

Now we use a much lower delay, 32, and repeat the benchmark.

The first benchmark is more realistic. The producer threads normally do not only publish to the queue but also do some other work. In this benchmark, the lock-free queues outperform the lock-based queues.

You can download the source code from GitHub here and test the scalability in your production environment.

Conclusion

Scaling queues is hard. Since they are first in, first out, all writing threads compete for the next position to write to. And all reading threads compete for the next element to read. But as long as the queue is not overloaded, the lock-free queues using compare and swap outperform the lock-based queues. And the unbounded queue org.jctools.queues.MpmcUnboundedXaddArrayQueue outperforms all other queues, independent of the workload.

How I solved an OutOfMemoryError using a concurrent state machine

Sun, 28 Jun 2020 22:00:00 GMT

Many concurrent problems can be solved through a concurrent state machine. This leads to better scalability and in my case better memory utilization.

I learned this through an OutOfMemoryError exception and this video from Dr. Cliff Click.

The problem: An OutOfMemoryError exception

Running my tool with the neo4j-analytics benchmark from the Renaissance benchmark suite led to an OutOfMemoryError. Using the JVM argument -XX:+HeapDumpOnOutOfMemoryError, I generated a heap dump. The Eclipse Memory Analyzer showed that I was creating too many instances of the class com.vmlens.trace.agent.bootstrap.callback.state.ObjectState: more than forty-five thousand, or two gigabytes of RAM.

I added a field to each class to log the threads accessing the fields of this class. I created a new instance of com.vmlens.trace.agent.bootstrap.callback.state.ObjectState in the constructor of each class. And I updated the values from com.vmlens.trace.agent.bootstrap.callback.state.ObjectState using a synchronized block.

The solution: A concurrent state machine

A friend of mine told me about a video from Dr. Cliff Click. In it, Dr. Cliff Click shows how we can use state machines to implement a concurrent hash map. This was the solution: I can also use a concurrent state machine. The state machine starts with null, goes to SingleThreaded, and from there to MultiThreaded.

To implement the state machine I use the method compareAndSet from the class java.util.concurrent.atomic.AtomicReferenceFieldUpdater. This method lets you execute the two operations compare and set atomically. The method typically maps directly to a machine code instruction. Starting with JDK 9 you can also use java.lang.invoke.VarHandle instead of java.util.concurrent.atomic.AtomicReferenceFieldUpdater.

We need to retry the method compareAndSet until it succeeds. This retry loop is already implemented in the class java.util.concurrent.atomic.AtomicReferenceFieldUpdater in the method updateAndGet:

public final V updateAndGet(T obj, UnaryOperator<V> updateFunction) {
  V prev, next;
  do {
     prev = get(obj);
     next = updateFunction.apply(prev);
  } while (!compareAndSet(obj, prev, next));
  return next;
}

Using this method the source code of our state machine looks like this:

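volatile State state = null;
// Type parameters let the lambda below see current as a State.
private static final AtomicReferenceFieldUpdater<ObjectWithState, State> STATE = 
	AtomicReferenceFieldUpdater.newUpdater(
	ObjectWithState.class, State.class, "state");
public void update(ObjectWithState object) {
  STATE.updateAndGet(object, (current) -> {
    // First access: the state machine leaves its initial state null.
    if (current == null) {
      return new SingleThreaded();
    }
    // Otherwise wrap the current state in a MultiThreaded state.
    return new MultiThreaded(current);
  });
}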

Using the new way to store the data I now need three hundred megabytes instead of two gigabytes.
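For JDK 9 and later, the same state machine can be sketched with java.lang.invoke.VarHandle. VarHandle has no updateAndGet method, so the retry loop is written out by hand. The class ObjectWithStateSketch and its nested state classes are simplified, hypothetical stand-ins and not the classes used in vmlens.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical, simplified state machine: null -> SingleThreaded -> MultiThreaded.
class ObjectWithStateSketch {

    interface State {}

    static final class SingleThreaded implements State {}

    static final class MultiThreaded implements State {
        final State previous;
        MultiThreaded(State previous) {
            this.previous = previous;
        }
    }

    private volatile State state; // starts as null

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup().findVarHandle(
                    ObjectWithStateSketch.class, "state", State.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Moves the state machine one step forward and returns the new state.
    State update() {
        State prev;
        State next;
        do {
            prev = state;
            next = (prev == null) ? new SingleThreaded()
                                  : new MultiThreaded(prev);
            // Retry if another thread changed the state in between.
        } while (!STATE.compareAndSet(this, prev, next));
        return next;
    }

    State current() {
        return state;
    }
}
```

A short usage example: the first call to update() on a new ObjectWithStateSketch returns a SingleThreaded state, and the second call returns a MultiThreaded state that wraps it.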

Updating multiple variables

There is currently no method to update multiple variables atomically. But often we can still use compareAndSet for multiple variables. When our state machine does not contain cycles and we access the next variable only when we reached a specific state, we still can use compareAndSet, even for multiple variables.

An example of this technique is the class java.util.concurrent.FutureTask. This class uses a state machine stored in the variable state. The dependent variable outcome is only read or written when we reached a specific state.

The states are documented in the JavaDoc of the class:

/*
* Possible state transitions:
* NEW -> COMPLETING -> NORMAL
* NEW -> COMPLETING -> EXCEPTIONAL
* NEW -> CANCELLED
* NEW -> INTERRUPTING -> INTERRUPTED
*/

The following shows the source code to update the variable outcome. We only update the variable outcome when we managed to set the state COMPLETING:

protected void set(V v) {
   if (STATE.compareAndSet(this, 
         NEW, COMPLETING)) {
     outcome = v;
     STATE.setRelease(this, 
        NORMAL); // final state
     finishCompletion();
   }
}

The static field STATE is a java.lang.invoke.VarHandle to call compareAndSet for the variable state.
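The pattern can be reduced to a small stand-alone sketch. The class OnceResult below is hypothetical and only loosely modelled on FutureTask: it uses an AtomicInteger instead of a VarHandle and leaves out cancellation and waiting threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class illustrating a state machine with one dependent variable.
class OnceResult<V> {
    private static final int NEW = 0, COMPLETING = 1, NORMAL = 2;

    private final AtomicInteger state = new AtomicInteger(NEW);
    private V outcome; // plain field, only accessed in specific states

    // Returns true if this call set the value; later calls return false.
    boolean set(V v) {
        if (state.compareAndSet(NEW, COMPLETING)) {
            outcome = v;        // only the winner of the compareAndSet writes outcome
            state.set(NORMAL);  // volatile write publishes outcome to readers
            return true;
        }
        return false;
    }

    // Returns the value once the final state is reached, otherwise null.
    V getIfDone() {
        return state.get() == NORMAL ? outcome : null;
    }
}
```

Because the state never moves backwards, a reader that observes NORMAL can read outcome without any further synchronization.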

Conclusion

After realizing that I was implementing a concurrent state machine, I was able to reduce the size from two gigabytes to three hundred megabytes. An algorithm using compareAndSet, the method used to implement concurrent state machines, typically scales better than a lock-based algorithm. For example, for write operations, the concurrent hash map by Dr. Cliff Click scales better than the lock-based java.util.concurrent.ConcurrentHashMap.

A new concurrent hash map

Mon, 15 Jun 2020 22:00:00 GMT

The following shows an algorithm for a simple concurrent hash map which, for writes, scales better than java.util.concurrent.ConcurrentHashMap. java.util.concurrent.ConcurrentHashMap is fast, sophisticated, and rather complicated. I needed a similar data structure for vmlens, a tool to test concurrent Java. I cannot use java.util.concurrent.ConcurrentHashMap, since I also want to trace the internals of java.util.concurrent.ConcurrentHashMap with this tool.

So I decided to implement only the bare minimum, the method computeIfAbsent. This led to a simple yet scalable concurrent hash map.

You can download the source code from GitHub here.

The Algorithm

The algorithm uses open addressing with linear probing. It is based on work from Jeff Preshing, which in turn is based on work from Dr. Cliff Click. This GitHub repository contains the source code of the hash map from Dr. Cliff Click.

Open addressing means that the key-value pairs are stored in a single array. The index to store a key-value pair is given by the hash code modulo the array size.

Linear probing means that if this array element is already occupied, we use the next array element. And so on, until we have found an empty slot. Here is an example for an insert after two filled slots:
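In code, a single-threaded sketch of open addressing with linear probing could look like this. The class LinearProbingDemo and its insert method are made up for illustration, and the hash code is passed in explicitly so that the collisions are easy to follow.

```java
// Hypothetical single-threaded illustration of linear probing.
class LinearProbingDemo {
    // Stores key in the table and returns the slot index that was used.
    // Assumes the table length is a power of two and the table is not full.
    static int insert(Object[] table, Object key, int hashCode) {
        int index = (table.length - 1) & hashCode; // hashCode modulo table length
        while (table[index] != null) {
            index++;                                // slot occupied: try the next one
            if (index == table.length) {
                index = 0;                          // wrap around at the end
            }
        }
        table[index] = key;
        return index;
    }

    public static void main(String[] args) {
        Object[] table = new Object[8];
        System.out.println(insert(table, "a", 1)); // 1
        System.out.println(insert(table, "b", 9)); // 2, since 9 & 7 = 1 is occupied
        System.out.println(insert(table, "c", 1)); // 3, since slots 1 and 2 are occupied
    }
}
```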

To make the algorithm concurrent, we need to fill the empty array element using compareAndSet. By using compareAndSet we make the check for null and the setting of the new element atomic. If compareAndSet succeeds, we have successfully inserted a new element. If it fails, we need to check if the key inserted by the other thread equals our key. If yes, we are done and can return the value of this element. If not, we need to find a new empty slot and try again.

Here is the source code for the algorithm:

private static final VarHandle ARRAY = 
	MethodHandles.arrayElementVarHandle(KeyValue[].class);
volatile KeyValue[] currentArray;
public V computeIfAbsent(K key, Function<? super K, ? extends V> compute) {
	KeyValue[] local = currentArray;
	int hashCode = hashAndEquals.hashForKey(key);
	// the array position is given by hashCode modulo array size. Since
	// the array  size is a power of two, we can use & instead of %.
	int index = (local.length - 1) & hashCode;
	int iterations = 0;
	KeyValue created = null;
	KeyValue current = tabAt(local, index);
	// fast path for reading
	if (current != null) {
		if (hashAndEquals.keyEquals(current.key, key)) {
			return (V) current.value;
		} else if (current.key == MOVED_NULL_KEY) {
			return (V) insertDuringResize(key, compute);
		}
	}
	while (true) {
		if (current == null) {
			if (created == null) {
				created = new KeyValue(key, compute.apply(key));
			}
			// use compareAndSet to set the array element if it is null
			if (casTabAt(local, index, created)) {
				if (((iterations) << resizeAtPowerOfTwo) > local.length) {
					resize(local.length, iterations);
				}
				// if successful we have inserted a new value
				return (V) created.value;
			}
			// if not we need to check if the other key is the same
			// as our key
			current = tabAt(local, index);
			if (hashAndEquals.keyEquals(current.key, key)) {
				return (V) current.value;
			} else if (current.key == MOVED_NULL_KEY) {
				return (V) insertDuringResize(key, compute);
			}
		}
		index++;
		iterations++;
		if (index == local.length) {
			index = 0;
		}
		if ((iterations << resizeAtPowerOfTwo) > local.length) {
			resize(local.length, iterations);
			return computeIfAbsent(key, compute);
		}
		current = tabAt(local, index);
		if (current != null) {
			if (hashAndEquals.keyEquals(current.key, key)) {
				return (V) current.value;
			} else if (current.key == MOVED_NULL_KEY) {
				return (V) insertDuringResize(key, compute);
			}
		}
	}
}
private static final KeyValue tabAt(KeyValue[] tab, int i) {
	return (KeyValue) ARRAY.getVolatile(tab, i);
}
private static final boolean casTabAt(KeyValue[] tab, int i, 
		KeyValue newValue) {
	return ARRAY.compareAndSet(tab, i, null, newValue); 
}

Why does it work?

The trick is that if a thread has read a filled array element, the array element will stay filled and the key will stay the same. So the only time we need to check if another thread has modified an already read value is when we read null. We do this by using compareAndSet as outlined above.

So our algorithm only works because we do not allow keys to be removed and because the keys are immutable.
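The insert-only idea can be condensed into a much smaller sketch. The class InsertOnlyMap below is a simplification and not the vmlens implementation: it assumes a fixed power-of-two capacity, never resizes, uses the key's own hashCode and equals, and uses java.util.concurrent.atomic.AtomicReferenceArray instead of a VarHandle.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Function;

// Simplified sketch: fixed capacity, no resize, keys are never removed.
class InsertOnlyMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Entry<K, V>> slots;

    InsertOnlyMap(int powerOfTwoCapacity) {
        slots = new AtomicReferenceArray<>(powerOfTwoCapacity);
    }

    V computeIfAbsent(K key, Function<? super K, ? extends V> compute) {
        int index = (slots.length() - 1) & key.hashCode();
        Entry<K, V> created = null;
        // Assumes the table never becomes full; otherwise this loop would not terminate.
        while (true) {
            Entry<K, V> current = slots.get(index);
            if (current == null) {
                if (created == null) {
                    created = new Entry<>(key, compute.apply(key));
                }
                if (slots.compareAndSet(index, null, created)) {
                    return created.value;      // we filled the empty slot
                }
                current = slots.get(index);    // another thread filled it first
            }
            if (current.key.equals(key)) {
                return current.value;          // a filled slot never changes again
            }
            index = (index + 1) & (slots.length() - 1); // linear probing
        }
    }
}
```

Since a filled slot is never emptied or overwritten, the only compareAndSet needed is the one that turns null into an entry.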

Resizing

During resizing, we need to make sure that newly added elements do not get lost. So we need to block the current array for updates during resizing. We do this by setting each empty array element to a special value MOVED_NULL_KEY. If a thread sees such a value, it knows that a resize is running and waits with its insert until the resize is done.

The resize of the hash map consists of the following steps:

Set all null values in the current array to the special value MOVED_NULL_KEY.
Create a new array.
Copy all values from the current array to the new array.
Set the current array to the new array.

Here is the source code:

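private static final Object MOVED_NULL_KEY = new Object();
// marker entry written into empty slots to block inserts during a resize
private static final KeyValue MOVED_KEY_VALUE = new KeyValue(MOVED_NULL_KEY, null);
private final Object resizeLock = new Object();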
private void resize(int checkedLength, int intervall) {
	synchronized (resizeLock) {
		if (currentArray.length > checkedLength) {	
			return;
		}
		resizeRunning = true;
		// Set all null values in the current array to the special value
		for (int i = 0; i < currentArray.length; i++) {
			if (tabAt(currentArray, i) == null) {
				casTabAt(currentArray, i, MOVED_KEY_VALUE);
			}
		}
		int arrayLength = Math.max(currentArray.length * 2,
			tableSizeFor(intervall * newMinLength + 2));
		// Create a new array
		KeyValue[] newArray = new KeyValue[arrayLength];
		// Copy all values from the current array to the new array
		for (int i = 0; i < currentArray.length; i++) {
			KeyValue current = tabAt(currentArray, i);
			if (current != MOVED_KEY_VALUE) {
				int hashCode = hashAndEquals.hashForKey(current.key);
				int index = (newArray.length - 1) & hashCode;
				while (newArray[index] != null) {
					index++;
					if (index == newArray.length) {
						index = 0;
					}
				}
				newArray[index] = current;
			}
		}
		// Set the current array to the new array
		currentArray = newArray;
		resizeRunning = false;
		resizeLock.notifyAll();
	}
}

Benchmark results

The performance of the different hash maps depends on the specific workload and the quality of the hash function. So take the following results with a grain of salt. To measure the performance I call the method computeIfAbsent with a random key using JMH. Here is the source code for the benchmark method:

public static final int MAX_KEY  = 10000000;
public static final Function COMPUTE = new Function() {
	public Object apply(Object t) {
		return new Object();
	}
};
@Benchmark
public Object computeIfAbsentHashMap() {
	int key = ThreadLocalRandom.current().nextInt(MAX_KEY);
	return computeIfAbsentHashMap.computeIfAbsent(key, COMPUTE);
}

You can download the source code of the benchmark from GitHub here. Here are the results for an Intel Xeon Platinum 8124M CPU @ 3.00GHz with two sockets, eighteen cores per socket, and two hardware threads per core. I used Amazon Corretto 11 as JVM.

The size of the map is similar to the size of ConcurrentHashMap. For five million random keys java.util.concurrent.ConcurrentHashMap needs 285 megabytes and the new map 253 megabytes.

Differences in behavior

The new map retries to insert an element if another thread has updated an empty slot. java.util.concurrent.ConcurrentHashMap uses the array bins as monitors for a synchronized block. So only one thread at a time can modify the content of those bins.

This leads to the following difference in behavior: The new map might call the callback method compute multiple times. Every failed update leads to a method call. java.util.concurrent.ConcurrentHashMap, on the other hand, calls compute at most once.
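The ConcurrentHashMap side of this difference can be observed with a small experiment. The class ComputeCallCount is a hypothetical helper that counts how often the mapping function runs when many parallel tasks ask for the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical helper, used only to observe the behavior of ConcurrentHashMap.
class ComputeCallCount {
    // Calls computeIfAbsent for the same key from many parallel tasks and
    // returns how often the mapping function was invoked.
    static int countCalls() {
        ConcurrentHashMap<Integer, Object> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        IntStream.range(0, 1_000).parallel().forEach(i ->
            map.computeIfAbsent(42, key -> {
                calls.incrementAndGet(); // side effect only for counting
                return new Object();
            }));
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countCalls()); // 1 for ConcurrentHashMap
    }
}
```

With a map that retries after a failed compareAndSet, the same experiment may count more than one call, so the callback should not rely on being executed exactly once.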

Conclusion

By implementing only the computeIfAbsent method, we can write a concurrent hash map which, for writes, scales better than java.util.concurrent.ConcurrentHashMap. The algorithm can be this simple because we avoid the deletion and the change of keys. I use this map to store classes together with related metadata.

How to define thread-safety in Java?

Tue, 19 May 2020 22:00:00 GMT

Thread-safety is typically defined as "works correctly even when used by multiple threads".

In Java Concurrency in Practice, for example, thread-safety is defined as:

A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.

and in Effective Java:

Unconditionally thread-safe: Instances of this class are mutable, but the class has sufficient internal synchronization that its instances can be used concurrently without the need for any external synchronization. Examples include Random and ConcurrentHashMap.

But all those definitions have the disadvantage that they do not tell us how. They do not tell us how the methods behave when called from multiple threads and they do not tell us how to use the methods in a thread-safe way.

The latter point is especially problematic because of classes like java.util.concurrent.atomic.LongAccumulator.

So I prefer to follow The Art of Multiprocessor Programming and use a correctness property to define thread-safety. And since there are two correctness properties used in Java, namely linearizable (atomic) methods and quiescent methods, we can define thread-safety as:

A class is thread-safe if all its public methods are either atomic or quiescent.

When is a method atomic?

A method is atomic when the method call appears to take effect instantaneously, at some time between the start and the end of the method call. So other threads either see the state before or after the method call but no intermediate state. Threads reading the state see the most recently written state. Or more formally, reading method calls create a happens-before relation to previous writing method calls.

Atomic methods lead to a correctness property called linearizability. Maurice Herlihy and Jeannette Wing introduced the notion of linearizability in 1990 in the paper Linearizability: a correctness condition for concurrent objects.

Types of atomic classes

Stateless

Classes without state are automatically atomic. Since they have no state, different threads can never see any intermediate state. Examples for stateless classes include java.lang.Math or java.util.Arrays. A more complicated example is the class com.google.gson.Gson, see this blog post.

Immutable

Immutable classes are, like classes without state, automatically atomic. Since their state does not change, different threads can never see an inconsistent intermediate state. To make sure that the threads always see a completely initialized immutable object you should use the final modifier for all fields. From the Java memory model:

final fields also allow programmers to implement thread-safe immutable objects without synchronization. A thread-safe immutable object is seen as immutable by all threads, even if a data race is used to pass references to the immutable object between threads.

Examples of immutable classes are java.lang.String or java.lang.Integer.
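A minimal sketch of such a class, using a hypothetical value type Range: all fields are final, and a "modification" returns a new instance instead of changing the existing one.

```java
// Hypothetical value class used only for illustration.
final class Range {
    private final int lower;
    private final int upper;

    Range(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        this.lower = lower;
        this.upper = upper;
    }

    int lower() { return lower; }
    int upper() { return upper; }

    // Returns a new instance; this instance is never changed.
    Range withUpper(int newUpper) {
        return new Range(lower, newUpper);
    }
}
```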

Using locks

The next technique to make methods atomic is to use locks. You can either use the Java intrinsic locks, i.e. synchronized blocks, or the locks from the package java.util.concurrent.locks.

The easiest way is to use a single monitor and surround each public method of the class with a synchronized block using this monitor. The methods synchronizedSet, synchronizedList and so on from the class java.util.Collections, for example, use this technique to make the methods of an underlying collection atomic.

The following shows part of the source code of the class SynchronizedList. This class is used by the method synchronizedList to create a class with atomic methods:

static class SynchronizedList<E>
        extends SynchronizedCollection<E>
        implements List<E> {
  final List<E> list;
  public E get(int index) {
     synchronized (mutex) {return list.get(index);}
  }
  public E set(int index, E element) {
     synchronized (mutex) {return list.set(index, element);}
  }
  // other methods omitted
}

Besides those intrinsic locks, Java provides reentrant exclusive, read-write, and stamped locks in the package java.util.concurrent.locks.
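As an illustration of the explicit locks, here is a sketch of a counter guarded by a java.util.concurrent.locks.ReentrantReadWriteLock. The class ReadWriteCounter is hypothetical; for a plain counter an AtomicLong would be simpler, so the example only shows the locking idiom.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical counter whose methods are atomic through a read-write lock.
class ReadWriteCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long count;

    void increment() {
        lock.writeLock().lock();   // exclusive: no other reader or writer
        try {
            count++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long get() {
        lock.readLock().lock();    // shared: several readers may hold it at once
        try {
            return count;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Unlike a synchronized block, these locks are not released automatically, hence the unlock in the finally block.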

Lock-free

Current processors provide machine code instructions to atomically compare and swap or fetch and add a variable. Those instructions are used by the JVM to implement locks. And you can use those instructions directly through the classes in the package java.util.concurrent.atomic and, starting with JDK 9, through the class java.lang.invoke.VarHandle.

The class java.util.concurrent.ConcurrentSkipListMap is an example for a class using those operations to implement atomic methods. And the class java.util.concurrent.ConcurrentHashMap is an example of the combination of those machine code instructions with locks.
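A small sketch of a lock-free atomic method built on compareAndSet: a counter that never exceeds a maximum. The class BoundedCounter is made up for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free counter with an upper bound.
class BoundedCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final int max;

    BoundedCounter(int max) {
        this.max = max;
    }

    // Increments the counter unless the maximum is reached; returns true on success.
    boolean tryIncrement() {
        while (true) {
            int current = value.get();
            if (current >= max) {
                return false;
            }
            // Check and increment become one atomic step; retry if another thread interfered.
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    int get() {
        return value.get();
    }
}
```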

Other meanings of atomic

Atomic is an overloaded term. It is used to describe database transactions through the ACID properties, i.e. Atomicity, Consistency, Isolation, Durability. And it is used to describe the effect of machine code instructions. The meanings are similar but with subtle differences. In our context an atomic method means the following:

all or nothing: other threads either see the state before or after the method call but no intermediate state
visibility: Threads reading the state see the most recently written state. Or more formally, reading method calls create a happens-before relation to previous writing method calls.

When is a method quiescent?

The second type of method, quiescent methods, is typically used to collect statistics. Suppose you want to track how often a specific method was called. You have multiple threads incrementing a counter every time they execute a specific method. And when all threads are stopped you collect the result.

This is how a quiescent method works.

When a quiescent method gets called at a time of quiescence, i.e. when no other method calls are pending, it sees the result of all previous method calls.

The classes java.util.concurrent.atomic.DoubleAccumulator, java.util.concurrent.atomic.DoubleAdder, java.util.concurrent.atomic.LongAccumulator and java.util.concurrent.atomic.LongAdder are examples of classes with quiescent methods.

Quiescent consistency was first introduced implicitly in 1994 by James Aspnes, Maurice Herlihy, and Nir Shavit in the paper Counting networks.

How do those methods behave and how to use them correctly?

In the beginning, I complained that the typical definition of thread-safety does not tell us how thread-safe classes behave and how we can use them. So what does our new definition tell us about how our class works when called from multiple threads?

Both types, atomic and quiescent, allow us to map the concurrent method calls to an equivalent sequential flow. And instead of reasoning about the concurrent method calls, we can reason about the equivalent sequential method flow.

Atomic methods always map to a sequential method flow. But the usage of atomic methods can still lead to a race condition. Typically errors happen when we call multiple non-commutative atomic methods from the same thread. The following method leads to a race condition since the methods get and put of java.util.concurrent.ConcurrentHashMap are not commutative:

public void update(ConcurrentHashMap<Integer, Integer> map) {
  Integer result = map.get(1);
  if (result == null) {
    map.put(1, 1);
  } else {
    map.put(1, result + 1);
  }
}

To implement an update without race condition you should use the method compute:

public void update(ConcurrentHashMap<Integer, Integer> map) {
  map.compute(1, (key, value) -> {
    if (value == null) {
      return 1;
    }
    return value + 1;
  });
}

Quiescent methods, on the other hand, only map to a sequential flow when they are called at a time of quiescence, i.e. when no other method calls are pending. So a typical scenario for quiescent methods looks like this:

LongAdder longAdder = new LongAdder();
ExecutorService service = Executors.newCachedThreadPool();
service.submit( () -> { longAdder.increment(); } );
service.submit( () -> { longAdder.increment(); } ); 
service.shutdown();
service.awaitTermination(10, TimeUnit.SECONDS);
longAdder.longValue();

Summary

A class is thread-safe if all its public methods are either atomic or quiescent.

A method is quiescent when it sees the result of all previous method calls at a time of quiescence, i.e. when no other method calls are pending.

The Java Memory Model enables testing of multithreaded Java

Tue, 05 May 2020 22:00:00 GMT

Testing multithreaded Java seems impossible. Bugs depend on the specific timing and sometimes even on a specific processor type or JVM. But Java has a specification that enables us to test multithreaded software: The Java memory model.

The Java memory model enables us to execute all thread interleavings of a multithreaded program as long as the program is data race free. And it defines rules to automatically check if a program contains a data race. So we can use a two-step approach for testing: In the first step, we execute all thread interleavings. In the second step, we check if the test contained data races. I have implemented those two steps in a tool called vmlens.

What is the Java memory model?

The Java memory model is a specification that defines what happens when multiple threads read and write the same memory location. As Hans Boehm, one of the authors of the Java memory model, writes:

We have been repeatedly surprised at how difficult it is to formalize the seemingly simple and fundamental property of “what value a read should return in a multithreaded program.”

The Java memory model answers this question in the following way:

If a program has no data races, then all executions of the program will appear to be sequentially consistent.

Sequential consistency means that a run of a multi-threaded program is one specific interleaving of the source code statements of the different threads. So to execute a sequentially consistent program we can use the following algorithm: Select one thread and execute the current statement of this thread. Repeat this until all threads are terminated.
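To get a feeling for how many runs this algorithm can produce, the following sketch enumerates all interleavings of the statements of two threads while keeping the order inside each thread. The class Interleavings and the statement labels are hypothetical; the sketch says nothing about how vmlens computes interleavings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists all interleavings of two statement sequences.
class Interleavings {
    // Returns every interleaving of a and b that preserves the order within a and within b.
    static List<List<String>> all(List<String> a, List<String> b) {
        List<List<String>> result = new ArrayList<>();
        collect(a, 0, b, 0, new ArrayList<>(), result);
        return result;
    }

    private static void collect(List<String> a, int i, List<String> b, int j,
                                List<String> prefix, List<List<String>> result) {
        if (i == a.size() && j == b.size()) {
            result.add(new ArrayList<>(prefix)); // both threads are terminated
            return;
        }
        if (i < a.size()) {                      // select thread A for the next step
            prefix.add(a.get(i));
            collect(a, i + 1, b, j, prefix, result);
            prefix.remove(prefix.size() - 1);
        }
        if (j < b.size()) {                      // select thread B for the next step
            prefix.add(b.get(j));
            collect(a, i, b, j + 1, prefix, result);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Two threads with two statements each yield 6 interleavings.
        System.out.println(all(List.of("A1", "A2"), List.of("B1", "B2")).size()); // 6
    }
}
```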

But Java programs are only sequentially consistent when they are data race free. A program contains a data race when it contains a read and a write or two writes to the same memory location which are not ordered by the so-called happens-before order. Synchronization actions like the read and write of a volatile field generate an order between multiple threads, the happens-before order. For example, the write to a volatile variable happens-before all subsequent volatile reads from this variable. And if all memory accesses can be ordered through this happens-before relation, our program is data race free.

In the following example, the read and write to the field i are ordered through a happens-before relation. In this interleaving the write to the volatile variable v in thread A comes before the read from this variable in thread B. So the write to the normal field i in the thread A can be ordered with the reading from this field in thread B. And so the following interleaving is data race free:

Operation	Thread
write normal field i	A
write volatile field v	A
read volatile field v	B
read normal field i	B

Compare this to the following interleaving of the same program. Here the read from the field i in thread B can not be ordered with the write to this field in thread A. And so the following interleaving contains a data race.

Operation	Thread
read volatile field v	B
read normal field i	B
write normal field i	A
write volatile field v	A
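A common way to avoid the second interleaving is to read the normal field only after the volatile read has seen the written value. The following sketch shows this variant; the class VolatilePublication is hypothetical, and unlike the interleavings above, its reader skips the read of i when it does not see v == 7.

```java
// Hypothetical example of publishing a normal field through a volatile field.
class VolatilePublication {
    int i = 0;           // normal field
    volatile int v = 0;  // volatile field

    // Runs one writer and one reader thread. Returns false only if the reader
    // saw v == 7 but not i == 5, which the happens-before order rules out.
    static boolean runOnce() {
        VolatilePublication shared = new VolatilePublication();
        boolean[] ok = { true };
        Thread writer = new Thread(() -> {
            shared.i = 5;   // write normal field
            shared.v = 7;   // write volatile field
        });
        Thread reader = new Thread(() -> {
            if (shared.v == 7) {            // read volatile field
                ok[0] = (shared.i == 5);    // ordered after the write to i
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok[0];
    }
}
```

In this variant every read of i is ordered after the write to i, so both interleavings are data race free.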

The weaker form of sequential consistency of the Java memory model has the advantage that it allows the JVM to apply optimizations of the generated machine code. And it makes caching in the processor possible.

This blog post from Martin Thompson shows how the caching in the CPU works and how caching is related to the Java memory model. And this post from Aleksey Shipilev gives a comprehensive overview of the Java memory model.

And this weaker form of sequential consistency means that we only need to consider synchronization actions when we want to test all thread interleavings.

How does the Java memory model enable the testing of multithreaded Java?

When testing we only need to consider thread interleavings which lead to a different happens-before order. So we calculate all orders the synchronization actions of the test run can create. And for each order, we execute one thread interleaving which leads to this order.

After we executed all those thread interleavings we need to check that the test is data race free. Therefore we check for each multi-threaded memory access if this access can be ordered according to the happens-before order.

What is new is the combination of those two techniques in one tool to systematically test multithreaded Java programs.

A multithreaded unit test

The following JUnit test implements the examples from above in a real test. One thread reads the variable i and the volatile variable v. The other thread writes to those two variables. To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class com.vmlens.api.AllInterleavings.

You can download the source code of the example from GitHub here.

import org.junit.Test;
import com.vmlens.api.AllInterleavings;
public class TestReadWrite {
 volatile int v = 0;
 int i = 0;
 @Test
 public void testUpdate() throws InterruptedException {
   try (AllInterleavings allInterleavings =
     new AllInterleavings(TestReadWrite.class.getName());) {
	 while (allInterleavings.hasNext()) {
	   Thread first = new Thread(() -> {
		 i = 5;
		 v = 7;
	   });
	   Thread second = new Thread(() -> {
		 int x = v;
		 int y = i;
	   });
	   first.start();
	   second.start();

	   first.join();
	   second.join();
	   }
    }
 }
}

vmlens runs as a Java agent and uses byte code transformation to trace all multi-threaded memory accesses and synchronization actions. Using this trace, vmlens calculates all potential thread interleavings and checks for data races. Running the above test, vmlens reports a data race for the accesses to the field i.

Performance of the multithreaded unit tests

Using this two-step approach it is possible to test even complicated data structures like java.util.concurrent.ConcurrentHashMap. For example, testing put using two threads takes 353 iterations and less than 3 seconds on my Intel i5 3.40 GHz 4 core CPU, see the test com.vmlens.examples.javaMemoryModel.TestConcurrentHashMapWithoutAtomicTwoPuts. Since we always run one thread at a time, the performance depends on the CPU clock speed and not on the number of available cores.

But typically we do not write new concurrent data structures. We use existing data structures. Here we can use the fact that most of the methods of concurrent data structures are atomic. And for two atomic methods a and b we only need to test two combinations. The combination a before b and the combination b before a. So we can drastically reduce the iteration count when testing the usage of atomic methods.

Summary

The Java memory model guarantees that all data race free programs are sequentially consistent. This allows us to test multithreaded programs in a two-step approach. In the first step, we execute all thread interleavings and in the second step, we check for data races. This allows us to test multithreaded Java in a systematic, reproducible way.

Lambdas for concurrent maps

Wed, 15 Apr 2020 22:00:00 GMT

The package java.util.concurrent contains two concurrent maps, the class ConcurrentHashMap and the class ConcurrentSkipListMap. Both classes are thread-safe and perform well. But using them is error-prone because of read modify write race conditions. This is where lambda expressions come into play. Lambda expressions help us to avoid these race conditions elegantly.

Let us see how.

Read modify write race condition

The race condition happens when we read an element from the map, modify this element and write the element back into the map. Like in the following example:

import com.vmlens.api.AllInterleavings;
public class TestUpdateWrong {
	public void update(ConcurrentHashMap<Integer, Integer> map) {
		Integer result = map.get(1);
		if (result == null) {
			map.put(1, 1);
		} else {
			map.put(1, result + 1);
		}
	}
	@Test
	public void testUpdate() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
			new AllInterleavings("TestUpdateWrong");) {
			while (allInterleavings.hasNext()) {
				final ConcurrentHashMap<Integer, Integer> map = 
						new ConcurrentHashMap<Integer, Integer>();
				Thread first = new Thread(() -> {
					update(map);
				});
				Thread second = new Thread(() -> {
					update(map);
				});
				first.start();
				second.start();
				first.join();
				second.join();
				assertEquals(2, map.get(1).intValue());
			}
		}
	}

}

You can download the source code of the example from GitHub here.

Here we implement a per-key counter. In the update method, we initialize the count to 1 if no mapping exists; otherwise we increment the count by one. To reproduce the race condition we update the map from two different threads. After both threads have finished we check if the value is indeed two.

To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings from vmlens, line 15. Running the test we see the following error:

java.lang.AssertionError: expected:<2> but was:<1>

To see why the result is one and not two as expected we can look at the report vmlens generated:

The problem is that the update method is not atomic. Both threads can read the same value. The thread that writes last then overwrites the result of the first thread.

Avoiding read modify write race condition with lambda expressions

To avoid this race condition we need a way to execute all three operations, the read, the modification and the write in one atomic method call. The method compute does exactly this using a lambda expression:

public void update(
	ConcurrentHashMap<Integer, Integer> map) {
	map.compute(1, (key, value) -> {
		if (value == null) {
			return 1;
		}
		return value + 1;
	});
}

Now the read modify write operations happen in one atomic method and the race disappears.
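For this particular counter, the method merge of ConcurrentHashMap is an alternative to compute. The following sketch uses a hypothetical class MergeCounter; it is not part of the example project.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical alternative to the compute-based update method.
class MergeCounter {
    static void update(ConcurrentHashMap<Integer, Integer> map) {
        // Stores 1 if the key is absent, otherwise stores the sum of the old value and 1.
        map.merge(1, 1, Integer::sum);
    }
}
```

Like compute, merge performs the read, the modification and the write in one atomic method call.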

Lambdas should be pure

What properties should a lambda function have?

The lambda expressions in the ConcurrentHashMap get executed under a synchronized block on a bin entry of the hash map. Therefore you must not call another writing operation of this ConcurrentHashMap instance. Otherwise, this can lead to deadlocks, as shown in this blog post.

When we use a ConcurrentSkipListMap, on the other hand, our lambda expressions might be called multiple times, since this map uses an optimistic concurrency scheme. To not depend on implementation details, the general advice is to make lambda expressions pure. This means they should have no side effects and simply calculate a new immutable value from a given immutable value.

Conclusion

Lambda expressions help to avoid read modify write race conditions elegantly. When using lambda expressions the elements in the collection should be immutable. And the lambda function itself should be pure.

Gson an example for a stateless thread-safe utility class

Fri, 10 Apr 2020 22:00:00 GMT

Stateless classes are perfect for multi-threaded programming. They can be called from multiple threads without ever worrying about synchronization of their state since there is none. But how can I write a performant, thread-safe stateless class?

Gson, an open-source library from Google to convert Java objects into JSON and back, is a good example of such a class. Gson uses the following three techniques to implement a performant, thread-safe stateless class:

Stack confinement or create a new instance for every method call
Thread-safe caching to reduce the creation time
Configuration of the instance at construction time

Let us look at those three techniques in more detail.

Stack confinement or create a new instance for every method call

Stack confinement is described in the book Java Concurrency in Practice, by Brian Goetz et al.:

If data is only accessed from a single thread, no synchronization is needed. This technique, thread confinement, is one of the simplest ways to achieve thread safety. [...] Stack confinement is a special case of thread confinement in which an object can only be reached through local variables.

So the idea is to only use local variables to achieve thread-safety. To do this we create a new instance at the beginning of our method and store this instance in a local variable.

The following shows how this is done in Gson:

public void toJson(Object src, Type typeOfSrc, 
		Appendable writer) throws JsonIOException {
    try {
      JsonWriter jsonWriter = newJsonWriter(
      	Streams.writerForAppendable(writer));
      toJson(src, typeOfSrc, jsonWriter);
    } catch (IOException e) {
      throw new JsonIOException(e);
    }
  }

We create a new instance of JsonWriter in the method newJsonWriter for every method call. This instance is only reachable through local variables. So our data is stack confined and therefore thread-safe.
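The same technique in a stand-alone sketch, using a hypothetical class CsvFormatter that has nothing to do with Gson: the class has no fields, and the mutable StringBuilder is only reachable through a local variable.

```java
import java.util.List;

// Hypothetical stateless class; all working state is stack confined.
class CsvFormatter {
    String format(List<String> values) {
        StringBuilder builder = new StringBuilder(); // new instance for every method call
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                builder.append(',');
            }
            builder.append(values.get(i));
        }
        return builder.toString(); // only the immutable String leaves the method
    }
}
```

StringBuilder itself is not thread-safe, but that does not matter here since no other thread can ever reach the instance.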

Thread-safe caching using ConcurrentHashMap

Now since we create a new instance for each method call, the instance creation time becomes performance-critical. To improve this time we use thread-safe caching.

We need two things for this: first a thread-safe hash map and second thread-safe cached elements. For the hash map, we use the class java.util.concurrent.ConcurrentHashMap. This class provides a fast, thread-safe hash map. For the elements, we either use immutable classes or we need to synchronize their methods.

In Gson the type adapters are cached in a ConcurrentHashMap:

private final Map<TypeToken<?>, TypeAdapter<?>> typeTokenCache 
	= new ConcurrentHashMap<TypeToken<?>, TypeAdapter<?>>();

Gson uses both immutable elements like for example com.google.gson.internal.bind.ArrayTypeAdapter and synchronized elements like in com.google.gson.internal.bind.DateTypeAdapter.
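A sketch of such a cache with immutable elements, not taken from Gson: the hypothetical class PatternCache caches compiled java.util.regex.Pattern instances, which are documented as immutable and safe for use by multiple threads. Gson's own lookup logic is more involved than this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical thread-safe cache with immutable cached elements.
class PatternCache {
    private final Map<String, Pattern> cache = new ConcurrentHashMap<>();

    // Compiles the pattern on first use and returns the cached instance afterwards.
    Pattern get(String regex) {
        return cache.computeIfAbsent(regex, Pattern::compile);
    }
}
```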

Configure the instance at creation time

Gson has multiple configuration options. To make the configuration thread-safe, a Gson instance can only be configured at construction time. And to keep it easy to use, the constructor creates only the default configuration. For all other configurations, Gson uses the builder pattern. The builder pattern encapsulates the creation of a complex object in a separate Builder object.

The following example shows how the GsonBuilder can be used to create a Gson instance with a specific configuration:

 Gson gson = new GsonBuilder()
    .registerTypeAdapter(Id.class, 
    	new IdTypeAdapter())
    .enableComplexMapKeySerialization()
    .serializeNulls()
    .create();
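How a builder leads to an instance that cannot be reconfigured later can be sketched with a small made-up class. Settings and Settings.Builder are hypothetical names chosen for this illustration and are unrelated to the Gson source; the sketch assumes the builder itself is only used by one thread.

```java
// Hypothetical classes for illustration, unrelated to the Gson source.
final class Settings {
    // All fields are final, so a created Settings instance never changes.
    private final boolean serializeNulls;
    private final String indent;

    private Settings(Builder builder) {
        this.serializeNulls = builder.serializeNulls;
        this.indent = builder.indent;
    }

    boolean serializeNulls() {
        return serializeNulls;
    }

    String indent() {
        return indent;
    }

    // The mutable part lives in the builder, which is used by one thread
    // and thrown away after create().
    static final class Builder {
        private boolean serializeNulls = false;
        private String indent = "";

        Builder serializeNulls() {
            this.serializeNulls = true;
            return this;
        }

        Builder indent(String indent) {
            this.indent = indent;
            return this;
        }

        Settings create() {
            return new Settings(this);
        }
    }

    public static void main(String[] args) {
        Settings settings = new Settings.Builder()
                .serializeNulls()
                .indent("  ")
                .create();
        System.out.println(settings.serializeNulls()); // prints true
    }
}
```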

Conclusion

For me, Gson is a good example of how to implement a stateless thread-safe utility class. Gson shows how to implement a thread-safe class by creating a new instance at each method call, thereby using stack confinement to achieve thread-safety. It shows how to use caching to reduce the object creation time. And it shows how to configure the instance only at construction time using the builder pattern.

ConcurrentHashMap: Call only one method per key

Thu, 16 Jan 2020 23:00:00 GMT

Each method of ConcurrentHashMap is thread-safe. But calling multiple methods from ConcurrentHashMap for the same key leads to race conditions. And calling the same method from ConcurrentHashMap recursively for different keys leads to deadlocks.

Let us look at an example to see why this happens:

Calling multiple methods

In the following test, I use two methods from ConcurrentHashMap for the same key 1. The method update, lines 3 till 10, first gets the value from the ConcurrentHashMap using the method get. Then update increments the value and puts it back using the method put, lines 6 and 8:

import com.vmlens.api.AllInterleavings;
public class TestUpdateWrong {
	public void update(ConcurrentHashMap<Integer, Integer> map) {
		Integer result = map.get(1);
		if (result == null) {
			map.put(1, 1);
		} else {
			map.put(1, result + 1);
		}
	}
	@Test
	public void testUpdate() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
			new AllInterleavings("TestUpdateWrong");) {
			while (allInterleavings.hasNext()) {
				final ConcurrentHashMap<Integer, Integer> map = 
						new ConcurrentHashMap<Integer, Integer>();
				Thread first = new Thread(() -> {
					update(map);
				});
				Thread second = new Thread(() -> {
					update(map);
				});
				first.start();
				second.start();
				first.join();
				second.join();
				assertEquals(2, map.get(1).intValue());
			}
		}
	}

}

You can download the source code of all examples from GitHub here.

To test what happens I use two threads, lines 18 and 21. I start those two threads, lines 24 and 25. And then wait till both are ended using thread join, lines 26 and 27. After both threads are stopped I check if the value is indeed two, line 28.

java.lang.AssertionError: expected:<2> but was:<1>

To see why the result is one, not two as expected we can look at the report vmlens generated:

So the problem is that first both threads call get and after that both threads call put. So both threads see an empty value and update the value to one. Which leads to a result of one and not as expected two. The trick to solving this race condition is to use only one method instead of two methods to update the value. Using the method compute we can do this. So the correct version looks like this:

public void update(
	ConcurrentHashMap<Integer,Integer> map) {
	map.compute(1, (key, value) -> {
		if (value == null) {
			return 1;
		}
		return value + 1;
	});
}
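Another method of ConcurrentHashMap that updates a value in a single call is merge. The following is a minimal sketch and not part of the vmlens examples: the class name UpdateWithMerge and the helper runTwoThreads are made up for illustration, and the sketch uses plain threads instead of AllInterleavings, so it does not explore all thread interleavings.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class, not part of the vmlens examples.
final class UpdateWithMerge {

    // merge stores 1 if the key is absent, otherwise it stores the sum of
    // the old value and 1. Both cases happen inside one atomic call.
    static void update(ConcurrentHashMap<Integer, Integer> map) {
        map.merge(1, 1, Integer::sum);
    }

    // Runs update from two threads and returns the final value for key 1.
    static int runTwoThreads() {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread first = new Thread(() -> update(map));
        Thread second = new Thread(() -> update(map));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get(1);
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // prints 2
    }
}
```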

Calling the same method recursively

Now let us look at an example for calling the same method from ConcurrentHashMap recursively:

public class TestUpdateRecursive {
	private final ConcurrentHashMap<Integer, Integer> map = 
			new ConcurrentHashMap<Integer, Integer>();
	public TestUpdateRecursive() {
		map.put(1, 1);
		map.put(2, 2);
	}
	public void update12() {
		map.compute(1, (key, value) -> {
			map.compute(2, (k, v) -> { return 2; });
			return 2;
		});
	}
	public void update21() {
		map.compute(2, (key, value) -> {
			map.compute(1, (k, v) -> { return 2; });
			return 2;
		});
	}
	@Test
	public void testUpdate() throws InterruptedException	{
		Thread first = new Thread(() ->  { update12(); 	});
		Thread second = new Thread(() -> { update21();  	});
		first.start();
		second.start();
		first.join();
		second.join();
	}	
}

Here we call the compute method inside of the compute method for different keys. Once for the key one then two, and once for the key two then one. If we run the test we see the following deadlock:

To understand why this deadlock happens, we have to look at the internals of the ConcurrentHashMap. ConcurrentHashMap uses an array to store the mapping between the keys and the values. Every time we update such a mapping, the ConcurrentHashMap locks the array element in which the mapping is stored. So in our test, the call to compute for the key one locks the array element for the key one. And then it tries to lock the array element for the key two. But this array element is already locked by the other thread, which called compute for the key two and now tries to lock the array element for the key one. A deadlock.

Note that only updates need a lock on an array element. Methods that only read, like for example get, do not use locks. So it is no problem to use a get method inside a compute call.
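A variant of the test that only reads the other key inside compute can be sketched like this. The class ReadInsideCompute and the use of Math.max are made up for this illustration. It relies on the behavior described above, that get does not take a lock, and it uses plain threads, so it does not check all thread interleavings.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class, a variation of TestUpdateRecursive.
final class ReadInsideCompute {
    private final ConcurrentHashMap<Integer, Integer> map =
            new ConcurrentHashMap<>();

    ReadInsideCompute() {
        map.put(1, 1);
        map.put(2, 2);
    }

    // Updates key 1 and only reads key 2 inside the compute call.
    void update12() {
        map.compute(1, (key, value) -> Math.max(value, map.get(2)));
    }

    // Updates key 2 and only reads key 1 inside the compute call.
    void update21() {
        map.compute(2, (key, value) -> Math.max(value, map.get(1)));
    }

    // Runs both updates in parallel and returns the values of key 1 and 2.
    static int[] runTwoThreads() {
        ReadInsideCompute example = new ReadInsideCompute();
        Thread first = new Thread(example::update12);
        Thread second = new Thread(example::update21);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { example.map.get(1), example.map.get(2) };
    }

    public static void main(String[] args) {
        int[] values = runTwoThreads();
        System.out.println(values[0] + " " + values[1]);
    }
}
```

Each compute call updates only its own key, so neither thread waits for an array element held by the other one.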

Conclusion

Using the ConcurrentHashMap in a thread-safe way is easy. Select the one method which fits your need. And use it exactly once per key.

How to write thread-safe yet scalable classes?

Thu, 16 Jan 2020 23:00:00 GMT

When writing thread-safe classes the main issue is to separate the data into multiple independent parts. And to choose the right size for those parts. If the part is too small our class is not thread-safe. If the part is too large the class is not scalable.

Let us look at an example which illustrates that point:

An example

Suppose we want to track how many people live in a city. We want to support two methods, one to get the current count of people living in a city and one to move a person from one city to another. So we have the following interface:

public interface CityToCount {
	static final String[] ALL_CITIES = 
		new String[] {  "Springfield" , "South Park"  };
	static final int POPULATION_COUNT = 1000000;
	void move( String from, String to );
	int count(String name);
}

You can download the source code of all examples from GitHub here.

Since we want to use this interface from multiple threads in parallel we have two options to implement this interface. Either use the class java.util.concurrent.ConcurrentHashMap or use the class java.util.HashMap and a single lock. Here is the implementation using the class java.util.concurrent.ConcurrentHashMap:

public class CityToCountUsingConcurrentHashMap 
	implements CityToCount {
	private ConcurrentHashMap<String, Integer> map = 
		new ConcurrentHashMap<String, Integer>();
	public CityToCountUsingConcurrentHashMap() {
		for (String city : ALL_CITIES) {
			map.put(city, POPULATION_COUNT);
		}
	}
	public void move(String from, String to) {
		map.compute(from, (key, value) -> {
			if (value == null) {
				return POPULATION_COUNT - 1;
			}
			return value - 1;
		});
		map.compute(to, (key, value) -> {
			if (value == null) {
				return POPULATION_COUNT + 1;
			}

			return value + 1;
		});
	}
	public int count(String name) {
		return map.get(name);
	}
}

The method move uses the thread-safe method compute to decrement the count in the source city. Then compute is used to increment the count in the target city. The count method uses the thread-safe method get.

And here is the implementation using the class java.util.HashMap:

public class CityToCountUsingSynchronizedHashMap 
	implements CityToCount {
	private HashMap<String, Integer> map = 
		new HashMap<String, Integer>();
	private Object lock = new Object();
	public CityToCountUsingSynchronizedHashMap() {
		for (String city : ALL_CITIES) {
			map.put(city, POPULATION_COUNT);
		}
	}
	public void move(String from, String to) {
		synchronized (lock) {
			map.compute(from, (key, value) -> {
				if (value == null) {
					return POPULATION_COUNT - 1;
				}
				return value - 1;
			});
			map.compute(to, (key, value) -> {
				if (value == null) {
					return POPULATION_COUNT + 1;
				}
				return value + 1;
			});
		}
	}
	public int count(String name) {
		synchronized (lock) {
			return map.get(name);
		}
	}
}

The method move also uses the method compute to increment and decrement the count in the source and target city. Only this time, since the compute method is not thread-safe, both methods are surrounded by a synchronized block. The count method uses the get method again surrounded by a synchronized block.

Both solutions are thread-safe.

But in the solution using ConcurrentHashMap multiple cities can be updated from different threads in parallel. And in the solution using a HashMap, since the lock is around the complete HashMap, only one thread can update the HashMap at a given time. So the solution using ConcurrentHashMap should scale better. Let us see.

Too large means not scalable

To compare the scalability of the two implementations I use the following benchmark:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Scope;
@State(Scope.Benchmark)
public class CityToCountBenchmark {
	public CityToCount cityToCountUsingSynchronizedHashMap 
		= new CityToCountUsingSynchronizedHashMap();
	public CityToCount cityToCountUsingConcurrentHashMap 
		= new CityToCountUsingConcurrentHashMap();
	@Benchmark
	public void synchronizedHashMap() {
		String name = Thread.currentThread().getName();
		cityToCountUsingSynchronizedHashMap.move(name, name + "2");

	}
	@Benchmark
	public void concurrentHashMap() {
		String name = Thread.currentThread().getName();
		cityToCountUsingConcurrentHashMap.move(name, name + "2");

	}

}

The benchmark uses jmh, an OpenJDK framework for micro-benchmarks. In the benchmark, I move people from one city to another. Each worker thread updates different cities. The name of the source city is simply the thread name, and the name of the target city is the thread name with "2" appended. I ran the benchmark on an Intel i5 4 core CPU with these results:

As we see, the solution using ConcurrentHashMap scales better. Starting with two threads it performs better than the solution using a single lock.

Too small means not thread-safe

Now I want an additional method to get the complete count over all cities. Here is this method for the implementation using the class ConcurrentHashMap:

public int completeCount() {
	int completeCount = 0;
	for (Integer value : map.values()) {
		completeCount += value;
	}
	return completeCount;
}

To see if this solution is thread-safe I use the following test:

import com.vmlens.api.AllInterleavings;
public class TestCompleteCountConcurrentHashMap {
	@Test
	public void test() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
				new AllInterleavings("TestCompleteCountConcurrentHashMap");) {
			while (allInterleavings.hasNext()) {
				CityToCountUsingConcurrentHashMap cityToCount = 
					new CityToCountUsingConcurrentHashMap();
				Thread first = new Thread(() -> {
					cityToCount.move("Springfield", "South Park");
				});

				first.start();
				assertEquals(2 * CityToCount.POPULATION_COUNT, 
						cityToCount.completeCount());
				first.join();

			}
		}
	}

}

I need two threads to test if the method completeCount is thread-safe. In one thread, I move one person from Springfield to South Park. In the other thread I get the completeCount and check if the result equals the expected result.

To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings from vmlens, line 7. Running the test I see the following error:

expected:<2000000> but was:<1999999>

The vmlens report shows what went wrong:

As we see the problem is that the calculation of the complete count is done while the other thread still moves a person from Springfield to South Park. The decrement for Springfield was already executed but not the increment for South Park.

By allowing the parallel update of different cities, the combination of completeCount and move leads to wrong results. If we have methods which operate over all cities we need to lock all cities during this method. So to support such a method we need to use the second solution using a single lock. For this solution we can implement a thread-safe completeCount method as shown below:

public int completeCount() {
	synchronized (lock) {
		int completeCount = 0;
		for (Integer value : map.values()) {
			completeCount += value;
		}
		return completeCount;
	}
}

Conclusion

The example surely does not reflect the complexity of your data structure. But what is true in the example is also true in the real world. There is no way to update multiple dependent fields in a thread-safe way except to update them one thread after the other. So the only way to achieve scalability and thread safety is to find independent parts in your data. And then update them in parallel from multiple threads.

Why are there so many concurrent queues implementations in Java?

Thu, 09 Jan 2020 23:00:00 GMT

There exist multiple queue implementations in Java. There are six implementations in the package java.util.concurrent alone. Why are there so many implementations for a data structure that, in the single-threaded case, is implemented as a linked list?

Concurrent queues let two threads communicate asynchronously. And since this communication is often performance-critical we have multiple implementations optimized for a specific communication pattern. Let us look at those communication patterns in more detail and see which queue is optimized for which pattern.

Unbounded

The queue implementation for the first communication pattern, the unbounded multi-producer multi-consumer pattern, is similar to the single-threaded linked list implementation. It is implemented through the class java.util.concurrent.ConcurrentLinkedQueue. The implementation is based on an algorithm from Michael and Scott from 1996 described in the paper Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms.

It uses a linked list data structure and compare and swap operations to update this data structure. I am not aware of any other implementation for an unbounded queue in Java.

Here is an example for its usage offering and polling a message from the queue:

Queue<String> queue = new ConcurrentLinkedQueue<String>();
queue.offer("hello world");
String message = queue.poll();
System.out.println(message);

The problem of the unbounded queue is that if the producer side is faster than the consumer the queue grows without limit. Or till the Java heap is exhausted and we receive an OutOfMemoryError. This leads us to the second type of concurrent queue, the bounded concurrent queue.

Bounded

The package java.util.concurrent contains two implementations for a bounded multi-consumer, multi-producer queue, the class java.util.concurrent.ArrayBlockingQueue and the class java.util.concurrent.LinkedBlockingQueue. The class LinkedBlockingQueue uses a linked list data structure similar to the class ConcurrentLinkedQueue. But instead of using compare and swap operations it uses locks. The class ArrayBlockingQueue on the other hand uses an array to store the messages. And locks to update the array.

Here is an example for its usage offering and polling a message from the queue:

Queue<String> queue = new ArrayBlockingQueue<String>(10);
queue.offer("hello world");
String message = queue.poll();
System.out.println(message);

Both queues implement the same interface, so you can easily switch between the implementations to see which queue performs better in your application.
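What bounded means for the caller can be seen in a small sketch. The class BoundedQueueExample and its method are made up for this illustration. It shows that offer returns false instead of growing the queue once the capacity is reached, and that the same code runs with both implementations when the capacity is passed to their constructors.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class for illustration.
final class BoundedQueueExample {

    // Expects an empty queue with capacity 2. The first two offers fit,
    // the third one is rejected because the queue is full.
    static boolean thirdOfferAccepted(BlockingQueue<String> queue) {
        queue.offer("first");
        queue.offer("second");
        return queue.offer("third");
    }

    public static void main(String[] args) {
        // Both lines print false
        System.out.println(
                thirdOfferAccepted(new ArrayBlockingQueue<String>(2)));
        System.out.println(
                thirdOfferAccepted(new LinkedBlockingQueue<String>(2)));
    }
}
```

If the producer should wait for free space instead of receiving false, the BlockingQueue interface offers the method put, which blocks until the element can be added.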

Extra bit of performance

The open-source library JCTools also contains a bounded multi-consumer, multi-producer queue, the class org.jctools.queues.MpmcArrayQueue. This queue is based on an array similar to the class java.util.concurrent.ArrayBlockingQueue. But instead of using locks it uses compare and swap operations. The class MpmcArrayQueue implements the Queue interface so you can switch implementations to see if this implementation improves the performance in your application.

Single consumer and single producer queues

When only one thread writes to the queue we can avoid the expensive lock or compare and swap operations for writing. The same is true for reading. This is used by JCTools. JCTools provides an implementation for each combination.

Here is an example for creating a multi-producer, single-consumer queue using the JCTools factory:

Queue<String> queue = QueueFactory.newQueue(
	 ConcurrentQueueSpec.createBoundedMpsc(10));

Again all those queue implementations implement the Queue interface. So you can switch implementations to see if those queues improve the performance in your application.

Avoiding memory creation

If we use a bounded queue backed by an array we can reuse the events after they were consumed. This idea is implemented by the LMAX Disruptor. Here you use an array pre-initialized with events. Instead of creating a new event you use one of the already created events. The LMAX Disruptor does not implement the Queue interface so switching implementations is harder. But if you need to avoid garbage collections it is worth a look.

Still more queues

So far we have seen three queue implementations from the package java.util.concurrent. Here are the three still missing implementations:

java.util.concurrent.LinkedTransferQueue: LinkedTransferQueue lets the producer optionally wait till a consumer has consumed its element. A TransferQueue may be useful for example in message passing applications in which producers sometimes (using method transfer(E)) await receipt of elements by consumers invoking take or poll, while at other times enqueue elements (via method put) without waiting for receipt.
java.util.concurrent.SynchronousQueue: A queue where every producer waits until its element was consumed by a corresponding consumer.
java.util.concurrent.DelayQueue: A queue whose elements can only be consumed once their delay has expired.
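The difference between handing an element over and enqueuing it can be sketched with a few calls that do not need a second thread. The class HandoffQueues is made up for this illustration; the methods offer and tryTransfer are part of the JDK classes.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical example class for illustration.
final class HandoffQueues {

    // A SynchronousQueue has no capacity. Without a waiting consumer,
    // offer cannot hand over the element and returns false.
    static boolean synchronousOfferWithoutConsumer() {
        return new SynchronousQueue<String>().offer("hello");
    }

    // tryTransfer only succeeds if a consumer is already waiting.
    static boolean tryTransferWithoutConsumer() {
        return new LinkedTransferQueue<String>().tryTransfer("hello");
    }

    // offer on a LinkedTransferQueue enqueues the element without waiting
    // for a consumer.
    static boolean transferQueueOffer() {
        return new LinkedTransferQueue<String>().offer("hello");
    }

    public static void main(String[] args) {
        System.out.println(synchronousOfferWithoutConsumer()); // prints false
        System.out.println(tryTransferWithoutConsumer());      // prints false
        System.out.println(transferQueueOffer());              // prints true
    }
}
```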

Conclusion

So why are there so many concurrent queue implementations in Java? First, Java is a pretty mature programming language. And second, there are many thread-to-thread communication patterns. And to achieve the maximum performance you need a data structure optimized for the specific communication pattern.

Java Concurrency: AtomicReference

Mon, 06 Jan 2020 23:00:00 GMT

java.util.concurrent.atomic.AtomicReference is a class designed to update variables in a thread-safe way. Why do we need the class AtomicReference? Why can we not simply use a volatile variable? And how to use it correctly?

Why AtomicReference

For the tool I am writing I need to detect if an object was called from multiple threads. I use the following immutable class for this:

public class State {
	private final Thread thread;
	private final boolean accessedByMultipleThreads;
	public State(Thread thread, boolean accessedByMultipleThreads) {
		super();
		this.thread = thread;
		this.accessedByMultipleThreads = accessedByMultipleThreads;
	}
	public State() {
		super();
		this.thread = null;
		this.accessedByMultipleThreads = false;
	}
	public State update() {
		if(accessedByMultipleThreads) 	{
			return this;
		}
		if( thread == null  ) {
			return new  State(Thread.currentThread()
			, accessedByMultipleThreads);
		} 
		if(thread != Thread.currentThread()) {
			return new  State(null,true);
		}	
		return this;
	}
	public boolean isAccessedByMultipleThreads() {
		return accessedByMultipleThreads;
	}
}

You can download the source code of all examples from GitHub here.

I store the first thread accessing an object in the variable thread, line 2. When another thread accesses the object I set the variable accessedByMultipleThreads to true and the variable thread to null, line 23. When the variable accessedByMultipleThreads is true I do not change the state, line 15 till 17.

I use this class in every object to detect if it was accessed by multiple threads. The following example uses the state in the class UpdateStateNotThreadSafe:

public class UpdateStateNotThreadSafe {
	private volatile  State state = new State();
	public void update() {
		state = state.update();
	}
	public State getState() {
		return state;
	}	
}

I store the state in the volatile variable state, line 2. I need the volatile keyword to make sure that the threads always see the current values, as explained in greater detail here.

To check if using a volatile variable is thread-safe I use the following test:

import com.vmlens.api.AllInterleavings;
public class TestNotThreadSafe {
	@Test
	public void test() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
			new AllInterleavings("TestNotThreadSafe");) {
			while (allInterleavings.hasNext()) {	
				final UpdateStateNotThreadSafe object = new UpdateStateNotThreadSafe();
				Thread first = new Thread(() -> { object.update(); });
				Thread second = new Thread(() -> { object.update(); });
				first.start();
				second.start();
				first.join();
				second.join();
				assertTrue(object.getState().isAccessedByMultipleThreads());
			}
		}
	}
}

I need two threads to test if using a volatile variable is thread-safe, created in line 9 and 10. I start those two threads, line 11 and 12. And then wait till both are ended using thread join, line 13 and 14. After both threads are stopped I check if the flag accessedByMultipleThreads is true, line 15.

java.lang.AssertionError: 
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)

The vmlens report shows what went wrong:

The problem is that for a specific thread interleaving both threads first read the state. So one thread overwrites the result of the other thread.

How to use AtomicReference?

To solve this race condition I use the compareAndSet method from AtomicReference.

The compareAndSet method takes two parameters: the expected current value and the new value. The method atomically checks if the current value is the expected value, comparing the references with ==. If yes, the method updates the value to the new value and returns true. If not, the method leaves the current value unchanged and returns false.

The idea to use this method is to let compareAndSet check if the current value was changed by another thread while we calculated the new value. If not we can safely update the current value. Otherwise, we need to recalculate the new value with the changed current value.
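The two possible outcomes of compareAndSet can be seen in a single-threaded sketch. The class CompareAndSetExample is made up for this illustration; the second call plays the role of a thread that still holds an outdated expected value.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class for illustration.
final class CompareAndSetExample {

    // Returns { result of first call, result of second call,
    //           whether the value of the first call is still stored }.
    static boolean[] demo() {
        String initial = "initial";
        AtomicReference<String> reference = new AtomicReference<>(initial);

        // The current value is 'initial', so the update succeeds.
        boolean first = reference.compareAndSet(initial, "updated");

        // The current value is no longer 'initial', so nothing changes.
        boolean second = reference.compareAndSet(initial, "ignored");

        return new boolean[] {
            first, second, "updated".equals(reference.get())
        };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        // prints true false true
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```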

The following shows how to use the compareAndSet method to atomically update the state:

public class UpdateStateWithCompareAndSet {
	private final AtomicReference<State> state = 
			new AtomicReference<State>(new State());
	public  void update() {
		State current = state.get();
		State newValue = current.update();
		while( ! state.compareAndSet( current , newValue ) ) {
			current = state.get();
			newValue = current.update();
		}
	}
	public State getState() {
		return state.get();
	}	
}

I now use an AtomicReference for the state, line 2. To update the state I first need to get the current value, line 5. Then I calculate the new value, line 6 and try to update the AtomicReference using compareAndSet, line 7. If the update succeeds I am done. If not I need to get the current value again, line 8 and recalculate the new value, line 9. Then I can try again to update the AtomicReference using compareAndSet. I need a while loop since the compareAndSet might fail multiple times.

As Grzegorz Borczuch pointed out in a comment to this article, since JDK 1.8 there is an easier-to-use method in AtomicReference which achieves the same result: updateAndGet. This method internally uses compareAndSet in a while loop to update the AtomicReference.
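A version using updateAndGet might look like the following sketch. The class name UpdateStateWithUpdateAndGet and the helper runTwoThreads are made up for this illustration, and the nested State class is a reduced copy of the State class from above so that the sketch is self-contained. It uses plain threads, so it does not check all thread interleavings.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class for illustration.
final class UpdateStateWithUpdateAndGet {

    // Reduced copy of the immutable State class shown above.
    static final class State {
        private final Thread thread;
        private final boolean accessedByMultipleThreads;

        State(Thread thread, boolean accessedByMultipleThreads) {
            this.thread = thread;
            this.accessedByMultipleThreads = accessedByMultipleThreads;
        }

        State update() {
            if (accessedByMultipleThreads) {
                return this;
            }
            if (thread == null) {
                return new State(Thread.currentThread(), false);
            }
            if (thread != Thread.currentThread()) {
                return new State(null, true);
            }
            return this;
        }

        boolean isAccessedByMultipleThreads() {
            return accessedByMultipleThreads;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(new State(null, false));

    // updateAndGet applies State::update to the current value and retries
    // if another thread changed the value in the meantime.
    void update() {
        state.updateAndGet(State::update);
    }

    State getState() {
        return state.get();
    }

    // Calls update from two threads and returns the resulting flag.
    static boolean runTwoThreads() {
        UpdateStateWithUpdateAndGet object = new UpdateStateWithUpdateAndGet();
        Thread first = new Thread(object::update);
        Thread second = new Thread(object::update);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return object.getState().isAccessedByMultipleThreads();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // prints true
    }
}
```

Since the function passed to updateAndGet may be applied more than once, it should not have side effects. State.update only reads the current thread and creates new immutable objects, so it fits this requirement.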

Conclusion

Using volatile variables leads to race conditions since for specific thread interleavings a thread overwrites the computation of the other threads. By using the compareAndSet method from the class AtomicReference we can circumvent this race condition. We atomically check if the current value is still the same as when we started the computation. If yes we can safely update the current value. Otherwise, we need to recalculate the new value with the changed current value.

How to test if a class is thread-safe in Java?

Thu, 12 Dec 2019 23:00:00 GMT

Tests for thread safety differ from typical single-threaded tests. To test if a method is thread-safe we need to call the method in parallel from multiple threads. We need to do this for all potential thread interleavings. And afterward, we need to check if the result is correct.

Those three requirements for our test lead to a special type of tests for thread safety which differ from typical single-threaded tests. Since we want to test all thread interleavings our test must be repeatable and run automatically. And since the methods run in parallel the potential result is a combination of different outcomes.

Let us look at an example to see how this looks in practice.

A test for thread safety

Suppose we want to test if the following class representing an Address is thread-safe. It offers one method to update the street and city, the method update and one method to read the complete Address, the method toString:

public class MutableAddress {
	private volatile String street;
	private volatile String city;
	private volatile String phoneNumber;
	public MutableAddress(String street, String city, 
		String phoneNumber) {
		this.street = street;
		this.city = city;
		this.phoneNumber = phoneNumber;
	}
	public void update(String street ,String city ) {
		this.street = street;
		this.city = city;
	}
	public String toString() {
		return "street=" + street + ",city=" + city 
			+ ",phoneNumber=" + phoneNumber;
	}
}

I use volatile fields, line 2 till 4, to make sure that the threads always see the current values, as explained in greater detail here. You can download the source code of all examples from GitHub here.

Now let us first see if the combination of toString and update is thread-safe. Here is the test:

import com.vmlens.api.AllInterleavings;
public class TestToStringAndUpdate {
	@Test
	public void testMutableAddress() throws InterruptedException {
		try (AllInterleavings allInterleavings = 
			new AllInterleavings("TestToStringAndUpdate_Not_Thread_Safe");) {
			while (allInterleavings.hasNext()) {
				MutableAddress address = new MutableAddress("E. Bonanza St.",
					 "South Park", "456 77 99");
				String readAddress = null;
				Thread first = new Thread(() -> {
					address.update("Evergreen Terrace", "Springfield");
				});
				first.start();
				readAddress = address.toString();
				first.join();
				assertTrue("readAddress:" + readAddress,readAddress.equals(
		"street=E. Bonanza St.,city=South Park,phoneNumber=456 77 99") 
					|| readAddress.equals(
		"street=Evergreen Terrace,city=Springfield,phoneNumber=456 77 99"));
			}
		}
	}
}

The test executes the two methods in parallel from two threads. To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings from vmlens, line 7. To see if the class is thread-safe we compare the result against the two potential outcomes, the value before the update and after the update, lines 17 till 20.

Running the test leads to the following error:

java.lang.AssertionError: readAddress:street=Evergreen Terrace
		,city=South Park,phoneNumber=456 77 99
	at com.vmlens.tutorialCopyOnWrite.TestToStringAndUpdate.
		testMutableAddress(TestToStringAndUpdate.java:22)

To see what went wrong we look at the report vmlens generated.

The problem is that for one thread interleaving the thread with thread id 30 first updates the street name and then the main thread, thread id 1, reads the street and city name. So the main thread reads a partially updated address which leads to the error.

To make the address class thread-safe we copy the address value every time we update the address. Here is a thread-safe implementation using this technique. It consists of two classes, an immutable value, and a mutable container.

First the immutable value class:

public class AddressValue {
	private final String street;
	private final String city;
	private final String phoneNumber;
	public AddressValue(String street, String city, 
				String phoneNumber) {
		super();
		this.street = street;
		this.city = city;
		this.phoneNumber = phoneNumber;
	}
	public String getStreet() {
		return street;
	}
	public String getCity() {
		return city;
	}
	public String getPhoneNumber() {
		return phoneNumber;
	}
}

Second the mutable container class:

public class AddressUsingCopyOnWrite {
	private volatile AddressValue addressValue;
	private final Object LOCK = new Object();
	public AddressUsingCopyOnWrite(String street, 
			String city, String phone) {
		this.addressValue = new AddressValue( street, 
				city,  phone);
	}
	public void update(String street ,String city ) {
		synchronized(LOCK){
			addressValue = new AddressValue(  street,  city,  
					addressValue.getPhoneNumber() );
		}
	}
	public String toString() {
		AddressValue local = addressValue;
		return "street=" + local.getStreet()
		+ ",city=" + 	local.getCity() + 
		",phoneNumber=" + local.getPhoneNumber();
	}
}

The class AddressUsingCopyOnWrite creates a new address value every time it updates the variable addressValue. This makes sure that we always read a consistent address, either the value before or after the update.

If we run the test with those two classes, the test succeeds.

What do we need to test?

So far we tested the combination of toString and update for thread safety. To test if a class is thread-safe we need to test all combinations of modifying methods and all combinations of read-only methods together with modifying methods. So for our example class, we need to test the following two combinations:

update and update
toString and update

Since the combinations of read-only methods are automatically thread-safe we do not need to test the combination of the method toString with itself.

Data Races

So far we used volatile fields to avoid data races. Let us see what happens when we use normal fields instead. So in our thread-safe class AddressUsingCopyOnWrite we remove the volatile modifier and re-run our test. Now, vmlens reports a data race in the file target/interleave/issues.html.

A data race is an access to a field where a thread might read a stale value. Whether the thread indeed reads a stale value depends on external factors like which optimizations the compiler is using, on which hardware architecture the JVM is running, and on which cores the threads are running. To make it possible to always detect such a data race independent of those external factors, vmlens searches for data races in the execution trace of the test run. And if vmlens has found one, as in the example, it reports it in the issue report.

Summary

Tests for thread safety differ from typical single-threaded tests. To test if the combination of two methods, a and b, is thread-safe call them from two different threads. Put the complete test in a while loop iterating over all thread interleavings with the help from the class AllInterleavings from vmlens. Test if the result is either a after b or b after a. And to test if a class is thread-safe, test all combinations of modifying methods and all combinations of read-only methods together with modifying methods.

What does thread safety mean in Java?

Tue, 26 Nov 2019 23:00:00 GMT

Thread safety in Java means that the methods of a class are either atomic or quiescent. So what does atomic mean, and what does quiescent mean? And why are there no other types of thread-safe methods in Java?

Meaning of Atomic

A method is atomic when the method call appears to take effect instantaneously. So other threads either see the state before or after the method call but no intermediate state. Let us look at a non-atomic method to see how an atomic method makes a class thread-safe. You can download the source code of all examples from GitHub here.

public class UniqueIdNotAtomic {
	private volatile long counter = 0;
	public  long nextId() {	
		return counter++;	
	}	
}

The class UniqueIdNotAtomic creates unique ids by using the volatile variable counter. I use a volatile field, line 2, to make sure that the threads always see the current values, as explained in greater detail here. To see if this class is thread-safe we use the following test:

public class TestUniqueIdNotAtomic {
	private final UniqueIdNotAtomic uniqueId = new UniqueIdNotAtomic();
	private long firstId;
	private long secondId;
	private void updateFirstId() {
		firstId  = uniqueId.nextId();
	}
	private void updateSecondId() {
		secondId = uniqueId.nextId();
	}
	@Test
	public void testUniqueId() throws InterruptedException {	
		try (AllInterleavings allInterleavings = 
			    new AllInterleavings("TestUniqueIdNotAtomic");) {
		while(allInterleavings.hasNext()) {	
		Thread first = new Thread( () ->   { updateFirstId();  } ) ;
		Thread second = new Thread( () ->  { updateSecondId();  } ) ;
		first.start();
		second.start();
		first.join();
		second.join();	
		assertTrue(  firstId != secondId );
		}
		}
	}

}

To test if the counter is thread-safe we need two threads, created in lines 16 and 17. We start those two threads, lines 18 and 19, and then wait until both have finished using thread join, lines 20 and 21. After both threads have stopped we check that the two ids differ, line 22. To test all thread interleavings we put the complete test in a while loop iterating over all thread interleavings using the class AllInterleavings from vmlens, line 15.

Running the test we see the following error:

java.lang.AssertionError: 
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.assertTrue(Assert.java:43)

The reason for the error is that the ++ operation is not atomic, so the two threads can overwrite each other's result. We can see this in the report from vmlens:

In the case of the error, both threads first read the variable counter in parallel. And then both create the same id. To solve this problem we make the method atomic by using a synchronized block:

private final Object LOCK = new Object();
public  long nextId() {
  synchronized(LOCK) {
    return counter++;	
  }	
}

Now the method is atomic. The synchronized block makes sure that other threads cannot see the intermediate state of the method.
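The same effect can probably be reached without a lock by using java.util.concurrent.atomic.AtomicLong. The following is only a minimal sketch and not part of the original examples; the class name UniqueIdAtomicLong is made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class name, not taken from the vmlens examples.
class UniqueIdAtomicLong {
    private final AtomicLong counter = new AtomicLong();

    // getAndIncrement reads and increments the value in one atomic step,
    // so two threads never receive the same id.
    long nextId() {
        return counter.getAndIncrement();
    }
}
```

Like counter++ in the synchronized version, getAndIncrement returns the value before the increment, so the first id is 0.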

Methods which do not access shared state are automatically atomic. The same is true for classes with read-only state. Therefore stateless and immutable classes are an easy way to implement thread-safe classes. All their methods are automatically atomic.

Not all usages of atomic methods are automatically thread-safe. Combining multiple atomic methods for the same values typically leads to race conditions. Let us look at the atomic methods get and put from ConcurrentHashMap to see why. Let us use those methods to insert a value into the map when no previous mapping exists, and to increment the value otherwise:

public class TestUpdateTwoAtomicMethods {
	public void update(ConcurrentHashMap<Integer,Integer>  map)  {
			Integer result = map.get(1);		
			if( result == null )  {
				map.put(1, 1);
			}
			else	{
				map.put(1, result + 1 );
			}	
	}
	@Test
	public void testUpdate() throws InterruptedException	{
		try (AllInterleavings allInterleavings = 
		   new AllInterleavings("TestUpdateTwoAtomicMethods");) {
		while(allInterleavings.hasNext()) {	
		final ConcurrentHashMap<Integer,Integer>  map = 
		   new  ConcurrentHashMap<Integer,Integer>();	
		Thread first = new Thread( () ->   { update(map);  } ) ;
		Thread second = new Thread( () ->  { update(map);  } ) ;
		first.start();
		second.start();
		first.join();
		second.join();	
		assertEquals( 2 , map.get(1).intValue() );
		}
		}
	}	
}

The test is similar to the previous test. Again we use two threads to test if our method is thread-safe, lines 18 and 19. And again we test after both threads have finished if the result is correct, line 24. Running the test we see the following error:

java.lang.AssertionError: expected:<2> but was:<1>
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.failNotEquals(Assert.java:645)

The reason for the error is that the combination of the two atomic methods, get and put, is not atomic. So the two threads can overwrite each other's result. We can see this in the report from vmlens:

In the case of the error, both threads first get the value in parallel. And then both create the same value and put it into the map. To solve this race condition we need to use one method instead of two. In our case we can use the single method compute instead of the two methods get and put:

public void update(ConcurrentHashMap<Integer,Integer> map) {
  map.compute(1, (key, value) -> {
	if (value == null) {
		return 1;
	} 
	return value + 1;
  });
}

This solves the race condition since the method compute is atomic. While all operations which operate on the same element of ConcurrentHashMap are atomic, operations that operate on the complete map like size are quiescent. So let us see what quiescent means.
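For this particular insert-or-increment pattern, the method merge of ConcurrentHashMap should work as well, since its documentation also describes the whole invocation as atomic. A minimal sketch, assuming the same key 1 as above; the class name UpdateUsingMerge is made up for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, not taken from the vmlens examples.
class UpdateUsingMerge {
    // Stores 1 when no mapping exists for the key 1,
    // otherwise adds 1 to the existing value, all in one atomic call.
    static void update(ConcurrentHashMap<Integer, Integer> map) {
        map.merge(1, 1, Integer::sum);
    }
}
```

Whether compute or merge reads better is a matter of taste; both replace the two separate calls get and put with a single atomic method.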

Meaning of Quiescent

Quiescent means that we need to make sure that no other methods are currently running when we call the quiescent method. The following example shows how to use the quiescent method size of the ConcurrentHashMap:

ConcurrentHashMap<Integer,Integer>  map = 
	new  ConcurrentHashMap<Integer,Integer>();
Thread first  = new Thread(() -> { map.put(1,1);});
Thread second = new Thread(() -> { map.put(2,2);});
first.start();
second.start();
first.join();
second.join();	
assertEquals( 2 ,  map.size());

By waiting till all threads are finished using thread join, we make sure that no other threads are accessing the ConcurrentHashMap when we call the method size.

The method size uses a mechanism also used in the classes java.util.concurrent.atomic.LongAdder, LongAccumulator, DoubleAdder, and DoubleAccumulator to avoid contention. Instead of using a single variable for storing the current size it uses an array. Different threads update different parts of the array, thereby avoiding contention. The algorithm is explained in more detail in the Javadoc of Striped64.

The quiescent classes and methods are useful for collecting statistics under high contention. After you collected the data you can use a single thread to evaluate the collected statistics.
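The following sketch shows how such a statistic could be collected with LongAdder. It is an illustration under assumptions, not code from the original examples; the class RequestStatistics and its methods are made-up names:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics class, used only to illustrate a quiescent read.
class RequestStatistics {
    private final LongAdder requests = new LongAdder();

    // Called concurrently by the worker threads.
    void recordRequest() {
        requests.increment();
    }

    // Only reliable once no writer is running any more.
    long total() {
        return requests.sum();
    }

    // Starts the writers, waits for all of them, and only then reads the sum.
    static long countWith(int threads, int incrementsPerThread) {
        RequestStatistics stats = new RequestStatistics();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    stats.recordRequest();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats.total();
    }
}
```

The join calls play the same role as in the size example above: they make sure that the sum is read in a quiescent state.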

Why no other types of thread safe methods in java?

In theoretical computer science, thread safety means that a data structure fulfills a correctness criterion. The most commonly used correctness criterion is linearizability, which means that the methods of the data structure are atomic. For common data structures, provably linearizable concurrent implementations exist, see the book The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. But to make a data structure linearizable, an expensive synchronization mechanism like compare and swap is needed, see the paper Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated.

Therefore other correctness criteria like quiescent consistency are being investigated. So I think the question is not why there are no other types of thread-safe methods in Java, but rather when other types of thread safety will be available in Java.

Conclusion

Thread safety in java means that the methods of a class are either atomic or quiescent. A method is atomic when the method call appears to take effect instantaneously. Quiescent means that we need to make sure that no other methods are currently running when we call the quiescent method.

Currently, quiescent methods are only used to collect statistics like the size of the ConcurrentHashMap. For all other use cases atomic methods are used. Let us see if the future brings other types of thread-safe methods.

Java Concurrency: Copy On Write

Sat, 31 Aug 2019 22:00:00 GMT

Copy on write is a technique which allows you to update a data structure in a thread-safe way. The main advantage of copy on write is that reading threads never get blocked.

Why do we need this technique? And how do we use it correctly?

Why copy on write?

In the following, I want to implement a thread-safe class representing an address. To make the example short the address consists only of the street, the city, and the phone number:

You can download the source code of all examples from GitHub here.

public class MutableAddress {
	private volatile String street;
	private volatile String city;
	private volatile String phoneNumber;
	public MutableAddress(String street, String city, String phoneNumber) {
		this.street = street;
		this.city = city;
		this.phoneNumber = phoneNumber;
	}
	public String getStreet() {
		return street;
	}
	public String getCity() {
		return city;
	}
	public void updatePostalAddress(String street ,String city ) {
		this.street = street;
		this.city = city;
	}
	@Override
	public String toString() {
		return "street=" + street + 
		",city=" + city + 
		",phoneNumber=" + phoneNumber;
	}
}

I use volatile fields, lines 2 to 4, to make sure that the threads always see the current values, as explained in greater detail here.

To check if this class is thread-safe I use the following test:

public class ConcurrencyTestReadWrite {
  private final MutableAddress address = new MutableAddress("E. Bonanza St." 
	, "South Park" , "456 77 99");
  private String readAddress;
  @Interleave(ConcurrencyTestReadWrite.class)
  private void updatePostalAddress() {
  	address.updatePostalAddress("Evergreen Terrace" , "Springfield");
   }
  @Interleave(ConcurrencyTestReadWrite.class)
  private void read() {
	readAddress = address.toString();
  }	
  @Test
  public void test() throws InterruptedException {
   Thread first  = new Thread( () ->    {  updatePostalAddress();  } ) ;
   Thread second = new Thread( () ->   {  read();  } ) ;
   first.start();
   second.start();
   first.join();
   second.join();	
   assertTrue(  "readAddress:" + readAddress  ,  
	readAddress.equals(
	"street=E. Bonanza St.,city=South Park,phoneNumber=456 77 99")  || 
	readAddress.equals(
	"street=Evergreen Terrace,city=Springfield,phoneNumber=456 77 99") );	
  }
}

I need two threads to test if the class is thread-safe, created in lines 15 and 16. I start those two threads, lines 17 and 18, and then wait until both have finished using thread join, lines 19 and 20. After both threads have stopped I check if the read address equals either the value before or after the update, lines 21 to 25.

To test all thread interleavings I use the annotation Interleave, line 5 and 9, from vmlens. The Interleave annotation tells vmlens to test all thread interleavings for the annotated method. Running the test we see the following error:

java.lang.AssertionError: readAddress:
	street=Evergreen Terrace,city=South Park,phoneNumber=456 77 99

We read a mixture of the initial address, namely the city South Park, and the updated address, namely the street Evergreen Terrace. To see what went wrong let us look at the vmlens report:

So first the writing thread, thread id 13, updates the street. Then the reading thread, thread id 14, reads the street, city and phone number, thereby reading the already updated street but the initial city.

Copy on write

To solve this bug I use the copy on write technique. The idea is to create a new copy of the object when writing. Then change the values in the newly created object and publish the copied object. Since I need to copy the object I can make it immutable. The address using the copy on write technique then consists of the following two classes:

First, the immutable class to represent the current address:

public class AddressValue {
	private final String street;
	private final String city;
	private final String phoneNumber;
	public AddressValue(String street, String city, 
				String phoneNumber) {
		super();
		this.street = street;
		this.city = city;
		this.phoneNumber = phoneNumber;
	}
	public String getStreet() {
		return street;
	}
	public String getCity() {
		return city;
	}
	public String getPhoneNumber() {
		return phoneNumber;
	}
}

Second, the mutable class to implement the copy on write technique:

public class AddressUsingCopyOnWrite {
	private volatile AddressValue addressValue;
	private final Object LOCK = new Object();
	@Override
	public String toString() {
		AddressValue local = addressValue;
		return "street=" + local.getStreet() +
		",city=" + local.getCity() + 
		",phoneNumber=" + local.getPhoneNumber();
	}
	public AddressUsingCopyOnWrite(String street, String city, String phone) {
		this.addressValue = new AddressValue( street,  city,  phone);
	}
	public void updatePostalAddress(String street ,String city ) {
		synchronized(LOCK){
			addressValue = new AddressValue(  
			street,  city,  addressValue.getPhoneNumber() );
		}
	}
	public void updatePhoneNumber( String phoneNumber) {
		synchronized(LOCK){
			addressValue = new AddressValue(  
			addressValue.getStreet(), addressValue.getCity(),  phoneNumber );
		}	
	}
}

An update now consists of creating a new copy of AddressValue, lines 16 and 17 for updating the postal address and lines 22 and 23 for updating the phone number.

Using those two classes the test succeeds, making the address thread-safe.

Why use a local variable when reading

As you see in the toString method I store the addressValue variable in the local variable local, line 6. Why?

Let us see what happens when we directly access the variable addressValue instead of using a local variable:

public String toStringNotThreadSafe() {
	return "street=" + addressValue.getStreet() + 
	",city=" + addressValue.getCity() + 
	",phoneNumber=" + addressValue.getPhoneNumber();
}

Running the test we see the following error:

java.lang.AssertionError: readAddress:
	street=E. Bonanza St.,city=Springfield,phoneNumber=456 77 99

So we again read an inconsistent address. We can again see in the vmlens report what went wrong:

The reading thread, thread id 14, first reads the variable addressValue to get the street. Then the writing thread, thread id 13, updates the variable addressValue. Now the reading thread reads the variable addressValue again to get the city and phone number. So the reading thread reads partially the initial and partially the updated address.

Why use a synchronized block when writing

The second part to make the copy on write technique thread-safe is a synchronized block when we write to the variable addressValue. Why?

Let us see what happens when we remove the synchronized block:

public void updatePostalAddress(String street, String city) {
	addressValue = new AddressValue(street, city,
		addressValue.getPhoneNumber());
}
public void updatePhoneNumber(String phone) {
	addressValue = new AddressValue(addressValue.getStreet(),
		addressValue.getCity(), phone);
}

Running the test we see the following:

[INFO] BUILD SUCCESS

No error. The test still succeeds.

To see why we need the synchronized block we need a different test. We need to test what happens when we update different parts of our address from different threads. So we use the following test:

public class ConcurrencyTestTwoWrites {
   private final AddressUsingCopyOnWriteWithoutSynchronized address = 
    new AddressUsingCopyOnWriteWithoutSynchronized("E. Bonanza St." 
    , "South Park" , "456 77 99"); 
  @Interleave(ConcurrencyTestTwoWrites.class)
  private void updatePostalAddress() {
   address.updatePostalAddress("Evergreen Terrace" , "Springfield");
  }
  @Interleave(ConcurrencyTestTwoWrites.class)
  private void updatePhoneNumber() {
   address.updatePhoneNumber("99 55 2222");
  } 
  @Test
  public void test() throws InterruptedException {
   Thread first  = new Thread( () -> {  updatePostalAddress();} ) ;
   Thread second = new Thread( () -> {  updatePhoneNumber();  } ) ;
   first.start();
   second.start();
   first.join();
   second.join(); 
   assertEquals(
    "street=Evergreen Terrace,city=Springfield,phoneNumber=99 55 2222",
    address.toString());
  }
}

In this test, the first thread updates the postal address, line 15, and the second thread updates the phone number, line 16. After both threads have stopped I check if the read address contains the new phone number and postal address, lines 21 to 23.

If we run this test we see the following error:

org.junit.ComparisonFailure: 
	expected:<...ngfield,phoneNumber=[99 55 2222]> 
	but was:<...ngfield,phoneNumber=[456 77 99]>

The problem is that without synchronization one thread overwrites the update of another thread, leading to a race condition. By surrounding every write to the variable addressValue with a synchronized block we avoid this race and this test also succeeds.
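A lock is not the only way to protect the read-copy-write sequence of the writers. As a sketch of an alternative that is not used in the original examples, the current value can be held in a java.util.concurrent.atomic.AtomicReference and updated with updateAndGet, which retries when another thread has published a new value in between. The class names AddressSnapshot and AddressUsingAtomicReference are made up for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable value, playing the role of AddressValue from the text.
final class AddressSnapshot {
    final String street;
    final String city;
    final String phoneNumber;

    AddressSnapshot(String street, String city, String phoneNumber) {
        this.street = street;
        this.city = city;
        this.phoneNumber = phoneNumber;
    }
}

// Hypothetical variant of AddressUsingCopyOnWrite without a synchronized block.
class AddressUsingAtomicReference {
    private final AtomicReference<AddressSnapshot> current;

    AddressUsingAtomicReference(String street, String city, String phone) {
        current = new AtomicReference<>(new AddressSnapshot(street, city, phone));
    }

    void updatePostalAddress(String street, String city) {
        // The function is re-applied to the latest snapshot if the
        // internal compare-and-set fails because of a concurrent update.
        current.updateAndGet(old -> new AddressSnapshot(street, city, old.phoneNumber));
    }

    void updatePhoneNumber(String phoneNumber) {
        current.updateAndGet(old -> new AddressSnapshot(old.street, old.city, phoneNumber));
    }

    @Override
    public String toString() {
        AddressSnapshot local = current.get(); // read the reference only once
        return "street=" + local.street
            + ",city=" + local.city
            + ",phoneNumber=" + local.phoneNumber;
    }
}
```

With this variant writers are not blocked either, but a writer may have to create its copy more than once under contention, so the synchronized block can still be the simpler choice.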

Comparison to read-write locks

Using copy on write, only writing threads get blocked by other writing threads. All other combinations are non-blocking. So reading threads never get blocked and writing threads are not blocked by reading threads.

Compare this to read-write locks where reading threads get blocked by writing threads. And where writing threads not only get blocked by other writing threads but also by reading threads.

Conclusion

Copy on write lets us update a class in a thread-safe way. The main advantage of this technique is that reading threads never block and that writing threads only get blocked by other writing threads. When you use this technique make sure that you always use a local variable when reading and a synchronized block when writing.

Java Concurrency: AtomicInteger

Wed, 24 Jul 2019 22:00:00 GMT

AtomicInteger is a class specially designed to update integers in a thread-safe way. Why do we need this class? Why can we not simply use a volatile int? And how do we use AtomicInteger?

Why AtomicInteger?

The following shows an example of a counter that is not thread-safe, using a volatile int:

public class CounterNotThreadSafe {
	private volatile int count = 0;
	public void increment() {
		count++;
	}
	public int getCount() {
		return count;
	}	
}

You can download the source code of all examples from GitHub here.

We store the count in the volatile int count, line 2. We need the volatile keyword to make sure that the threads always see the current values, as explained in greater detail here. We increment the counter by using the ++ operation, line 4. To check if the class is thread-safe we use the following test:

public class ConcurrencyTestCounter {
	private final CounterNotThreadSafe counter = new CounterNotThreadSafe();
	@Interleave
	private void increment() {
		counter.increment();
	}
	@Test
	public void testCounter() throws InterruptedException {
		Thread first = new Thread( () ->    {  increment();  } ) ;
		Thread second = new Thread( () ->   {  increment();  } ) ;
		first.start();
		second.start();
		first.join();
		second.join();	
		assertEquals( 2 , counter.getCount());
	}
}

To test if the counter is thread-safe we need two threads, created in lines 9 and 10. We start those two threads, lines 11 and 12, and then wait until both have finished using thread join, lines 13 and 14. After both threads have stopped we check if the count is two, line 15.

To test all thread interleavings we use the annotation Interleave, line 3, from vmlens. The Interleave annotation tells vmlens to test all thread interleavings for the annotated method. Running the test we see the following error:

ConcurrencyTestCounter.testCounter:22 expected:<2> but was:<1>

The reason for the error is that the ++ operation is not atomic, so the two threads can overwrite each other's result. We can see this in the report from vmlens:

In the case of the error, both threads first read the variable count in parallel. And then both write to the variable. This leads to the wrong value 1.

To fix this bug we use the class AtomicInteger:

public class CounterUsingIncrement {
	private final AtomicInteger count = new AtomicInteger();
	public  void increment() {
		count.incrementAndGet();
	}
	public int getCount() {
		return count.get();
	}	
}

Instead of using an int we use AtomicInteger for the variable count, line 2. And instead of using the operation ++ we use the method incrementAndGet, line 4.

Now since the method incrementAndGet is atomic, i.e. the other thread always sees the value either before or after the method call, the threads cannot overwrite each other's result. So the count is now always 2, for all thread interleavings.

How to use AtomicInteger

The class AtomicInteger has multiple methods which allow us to update the AtomicInteger atomically. For example, the method incrementAndGet atomically increments the AtomicInteger and decrementAndGet atomically decrements it.

But the method compareAndSet is special. This method allows us to implement arbitrary calculations atomically. The compareAndSet method takes two parameters, the expected current value and the new value. The method atomically checks if the current value equals the expected value. If yes, the method updates the value to the new value and returns true. If not, the method leaves the current value unchanged and returns false.

The following example shows how to use the compareAndSet to implement our counter:

public  void increment() {
	int current = count.get();
	int newValue = current + 1;
	while( ! count.compareAndSet( current , newValue ) ) {
		current = count.get();
		newValue = current + 1;
	}
}

We first read the current value, line 2. Then we calculate the new value, line 3. And then we check using compareAndSet if another thread changed the current value, line 4. compareAndSet will update the current value if the current value is unchanged and return true. Otherwise, if the value was changed, compareAndSet will return false. Since this check might fail multiple times, we need to use a while loop. If the value was changed by another thread, we need to get the current changed value, line 5, then recalculate the new value, line 6, and try to update again.
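The same retry loop works for calculations that have no dedicated method. As an illustration, the following sketch keeps track of the largest value seen so far. It is not part of the original examples, and the class name MaxTracker is made up:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of an arbitrary calculation built on compareAndSet.
class MaxTracker {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void update(int candidate) {
        int current = max.get();
        // Retry until our candidate is stored, or until the stored
        // value is already at least as large as the candidate.
        while (candidate > current && !max.compareAndSet(current, candidate)) {
            current = max.get();
        }
    }

    int get() {
        return max.get();
    }
}
```

Since Java 8, AtomicInteger also offers updateAndGet and accumulateAndGet, which wrap such a loop for side-effect-free functions.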

Conclusion

AtomicInteger lets us update integers in a thread-safe way. Use atomic methods like incrementAndGet or decrementAndGet for simple types of calculations. And use the methods get and compareAndSet for all other types of calculations.

Why is combining thread-safe methods an error?

Thu, 20 Jun 2019 22:00:00 GMT

Combining dependent thread-safe methods leads to race conditions. Only when the methods do not depend on each other, we can combine them in a thread-safe way.

Why is combining thread-safe methods an error? And what does this tell us about how to use thread-safe classes?

Thread safe methods must be atomic

A method is thread-safe if it can be called from multiple threads without external synchronization. To make this possible the thread-safe method must be atomic, i.e. other threads only see the state before or after the method call, nothing in between. The following example shows why it is necessary that a thread-safe method is atomic:

public class TestCounter {
	private volatile int i = 0;
	@Interleave
	public void increment() {
	 i++;	
	}
	@Test
	public void testUpdate() throws InterruptedException	{
		Thread first = new Thread( () ->   {increment();} ) ;
		Thread second = new Thread( () ->   {increment();} ) ;
		first.start();
		second.start();
		first.join();
		second.join();
		
	}	
	@After
	public void checkResult() {
		assertEquals( 2 , i );
	}	
}

You can download the source code of all examples from GitHub here.

To test this I use a method which increments a counter, line 4. I use two threads which call the increment method, lines 9 and 10. To test all thread interleavings I use the annotation Interleave, line 3, from vmlens. vmlens is a tool I have written to test multi-threaded Java software. The Interleave annotation tells vmlens to test all thread interleavings for the annotated method. Running the test we see the following error:

java.lang.AssertionError: expected:<2> but was:<1>

The reason for the error is that the operation i++ is not atomic, so the two threads overwrite each other's result. We can see this in the report from vmlens:

So to make methods thread safe we must make them atomic.

Combining two dependent thread safe methods leads to race conditions

Now let us see what happens when we combine two atomic methods:

public class TestTwoAtomicMethods {
	private final ConcurrentHashMap<Integer,Integer> map =
	 	new ConcurrentHashMap<Integer,Integer>();
	@Interleave
	public void update()  {
			Integer result = map.get(1);		
			if( result == null )  {
				map.put(1, 1);
			}
			else	{
				map.put(1, result + 1 );
			}	
	}
	@Test
	public void testUpdate() throws InterruptedException	{
		Thread first  = new Thread( () -> { update();   }  );
		Thread second = new Thread( () -> { update();   }  );
		first.start();
		second.start();
		first.join();
		second.join();
		
	}	
	@After
	public void checkResult() {
		assertEquals( 2 , map.get(1).intValue() );
	}	
}

I use two methods from ConcurrentHashMap for the same key 1. The method update, lines 6 to 12, first gets the value from the ConcurrentHashMap using the method get, line 6. Then update increments the value and puts it back using the method put, lines 8 and 11.

Running the test we see the following error:

java.lang.AssertionError: expected:<2> but was:<1>

The reason for the error is that the combination of two atomic methods is not atomic. So for specific thread interleavings, one thread overwrites the result of the other thread. We can see this in the report from vmlens:

Only use one atomic method

To fix this race condition we must replace the two methods with one atomic method. For our example, we can use the method compute, which executes the get and put in one atomic method:

public void update() {
	map.compute(1, (key, value) -> {
		if (value == null) {
			return 1;
		} 
		return value + 1;
	});
}

Conclusion

The thread-safe methods from the classes in the package java.util.concurrent only update a small amount of state atomically. This allows multiple threads to update different parts of the data structure simultaneously. They are carefully designed to allow simultaneous reads and writes to the same element without blocking. To use them correctly we must find the one atomic method which fits our need.

Concurrent programming: Two techniques to avoid shared state

Thu, 30 May 2019 22:00:00 GMT

The more I learn about multi-threaded programming, the harder it gets. Coordinating multiple threads modifying the same data is complicated. The java.util.concurrent.ConcurrentHashMap, for example, needs 4264 lines of code, more than twice as many as its single-threaded counterpart, the java.util.HashMap, with 1617 lines of code.

So I think the best solution for concurrent programming is to avoid shared state.

The first technique to avoid shared state is to copy the data before each modification. This technique relies on the fact that updating a single variable is easy. Only when we need to update multiple variables things get complicated.

Copy before modification

The following shows this technique using a java.util.concurrent.locks.ReentrantLock for writing and a volatile field for reading:

final transient ReentrantLock lock = new ReentrantLock();
private transient volatile Object[] array;
public E get(int index) {
    return (E) array[index];
}
public E set(int index, E element) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = array;
        E oldValue = (E) elements[index];
        if (oldValue != element) {
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len);
            newElements[index] = element;
            array = newElements;
        } else {
            // Not quite a no-op; ensures volatile write semantics
            array = elements;
        }
        return oldValue;
    } finally {
        lock.unlock();
    }
}

The example is based on the class java.util.concurrent.CopyOnWriteArrayList.

Reading consists of reading the volatile variable array and then the ith array element, line 4. Writing consists of locking the ReentrantLock, line 7. Then, if the current element and the new array element are not the same, creating a local copy of the array, line 14, modifying the local copy, line 15, and finally setting the array variable to the newly created array, line 16. Why we need a volatile variable instead of a normal variable is explained in this blog post.

This technique requires copying the data structure each time we want to modify it. Using so-called persistent data structures we can avoid copying the complete data structure. Persistent data structures keep the old state together with the modification. This post by the bifurcan project compares different libraries which implement persistent data structures.

Copying before modifying works best for workloads consisting of many reads and few writes. The second technique works best for the opposite type of workload, many writes and only a few reads:
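One visible consequence of copying before modification is that a reader keeps working on the array it started with. The following sketch illustrates this with the iterator of java.util.concurrent.CopyOnWriteArrayList; the class name SnapshotIteration is made up for illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class showing the snapshot behavior of the iterator.
class SnapshotIteration {
    static List<String> readWhileWriting() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        // The iterator refers to the array that is current at this point.
        Iterator<String> iterator = list.iterator();

        // This write copies the array; the iterator keeps the old one.
        list.add("c");

        List<String> seen = new ArrayList<>();
        while (iterator.hasNext()) {
            seen.add(iterator.next());
        }
        return seen; // contains "a" and "b" but not "c"
    }
}
```

The reader never sees a half-modified array and never throws a ConcurrentModificationException, at the price of one array copy per write.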

Asynchronous modification using a single thread

The idea of this technique is to use a single thread which owns the state and a messaging queue to let other threads modify the state through this thread. The following example shows how to implement this technique using the ExecutorService:

ExecutorService executor = Executors.newFixedThreadPool(1);
executor.execute( () -> modifyState() );
executor.execute( () -> modifyState() );
executor.shutdown();
executor.awaitTermination(10, TimeUnit.MINUTES);

To implement this technique with a single-threaded ExecutorService we first need to create one, line 1. Then we can modify the state asynchronously using the execute method, lines 2 and 3. When we are done we need to stop the ExecutorService, line 4, and wait until the service is terminated, line 5.

Martin Thompson recommends this technique in this blog post for highly contended data structures. The single event processing thread of UI libraries like Swing or SWT is another example of this technique. And the idea behind the Akka framework, to use messaging between actors instead of shared state, is similar to this technique.
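To make the ownership idea more concrete, here is a minimal sketch in which a plain int field is only ever touched by the single thread of the executor. It is an illustration under assumptions, not code from the original post; the class SingleThreadOwnedCounter and its methods are made-up names:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example: the state is owned by the executor's single thread.
class SingleThreadOwnedCounter {
    // Plain field, neither volatile nor synchronized.
    private int count = 0;
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    // Any thread may call this; the modification itself runs on the owner thread.
    void incrementAsync() {
        owner.execute(() -> count++);
    }

    // Reading also goes through the owner thread. Tasks run in submission
    // order, so all increments submitted before this call are included.
    int get() {
        try {
            return owner.submit(() -> count).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        owner.shutdown();
    }
}
```

Routing the read through the same queue avoids sharing the field between threads at all; the result travels back through the Future.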

What is next?

The two techniques presented here are described in further detail in the book Java Concurrency in Practice by Brian Goetz et al. This book is the best resource to learn concurrent programming.

To start avoiding shared state we need to know which parts of our application use shared state. I added a report to vmlens, a tool I developed to test multi-threaded Java, which shows which methods use shared state. This report is described here.

And if you are interested in writing concurrent collections like the mentioned ConcurrentHashMap, read the book The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. This book explains how commonly used data structures can be implemented for multi-threaded access.

Why do we need the volatile keyword?

Wed, 15 May 2019 22:00:00 GMT

What fascinates me about the volatile keyword is that it is necessary because my software still runs on a silicon chip, even if my application runs in the cloud on a virtual machine inside the Java virtual machine. Despite all of those software layers abstracting away the underlying hardware, the volatile keyword is still needed because of the cache of the processor my software runs on.

The volatile keyword and the cache of modern processors

Processors cache the values from the main memory in per-core caches to improve the memory access performance. While a read from a CPU register takes approximately 300 picoseconds, a read from the main memory takes 50 to 100 nanoseconds. By using a cache this time can be reduced to approximately one nanosecond. Numbers are taken from Computer Architecture, A Quantitative Approach, JL Hennessy, DA Patterson, 5th edition, page 72.

As pointed out by Reddit commenters, this level 1 cache was already used in the i486 processor family.

Now the question is when a core should check if the cached value was modified in the cache of another core. This is controlled by the volatile field modifier. By declaring a field as volatile we tell the JVM that when a thread reads the volatile field we want to see the latest written value. The JVM then uses special instructions to tell the CPU that it should synchronize its caches. For the x86 processor family those instructions are called memory fences as described here.

The processor not only synchronizes the value of the volatile field but the complete cache. So if we read from a volatile field we see all writes on other cores to this variable and also the values which were written on those cores before the write to the volatile variable.
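Here is a minimal sketch of this guarantee. The class VolatilePublication and its fields data and ready are hypothetical names I use only for illustration. The writer first stores a plain field and then sets a volatile flag. A reader that sees the flag also sees the plain field:

```java
class VolatilePublication {
    private int data;                 // plain field, written before the volatile write
    private volatile boolean ready;   // volatile flag used to publish data

    // Runs one writer and one reader thread and returns the value the reader saw.
    static int runOnce() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!p.ready) {
                // spin until the volatile write becomes visible
            }
            seen[0] = p.data;         // sees the write that preceded the volatile write
        });
        Thread writer = new Thread(() -> {
            p.data = 42;              // ordinary write
            p.ready = true;           // volatile write publishes everything written before it
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();            // join makes seen[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

If ready were not volatile, the reader could spin forever or read a stale value of data.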

The volatile field in action

Now let us look at how this works in practice. Let us see if we read stale values when we use a field without volatile annotation:

public class Termination {
   private int v;
   public void runTest() throws InterruptedException   {
	   Thread workerThread = new Thread( () -> { 
		   while(v == 0) {
			   // spin
		   }
	   });
	   workerThread.start();
	   v = 1;
	   workerThread.join();  // test might hang up here 
   }
 public static void main(String[] args)  throws InterruptedException {
	   for(int i = 0 ; i < 1000 ; i++) {
		   new Termination().runTest();
	   }
   }	
}

When the writing thread updates the field v on one core and the reading thread reads the field v on another core, we would expect the test to hang up and run forever. But at least when I run the test on my machine, the test never hangs up. The reason is that the test needs so few CPU cycles that both threads typically run on the same core. And when both threads run on the same core they read and write to the same cache.

Luckily the OpenJDK provides a tool, jcstress, which helps with this type of test. jcstress uses multiple tricks to make sure that the threads of the tests run on different cores. Here the above example is rewritten as a jcstress test:

@JCStressTest(Mode.Termination)
@Outcome(id = "TERMINATED", expect = Expect.ACCEPTABLE, desc = "Gracefully finished.")
@Outcome(id = "STALE", expect = Expect.ACCEPTABLE_INTERESTING, desc = "Test hung up.")
@State
public class APISample_03_Termination {
    int v;
    @Actor
    public void actor1() {
        while (v == 0) {
            // spin
        }
    }
    @Signal
    public void signal() {
        v = 1;
    }
}

This test is from the jcstress examples. By annotating the class with the annotation @JCStressTest we tell jcstress that this class is a jcstress test. jcstress runs the methods annotated with @Actor and @Signal in a separate thread. jcstress first starts the actor thread and then runs the signal thread. If the test exits in a reasonable time, jcstress records the "TERMINATED" result, otherwise the result "STALE".

I have run this test on my development machine, once with a normal and once with a volatile field v. The test for the volatile field looked like this:

public class APISample_03_Termination {
   volatile int v;
   // methods omitted
}

jcstress runs the test case multiple times with different JVM parameters. Here are the results of this test on my development machine, an Intel i5 4 core CPU, using the test mode stress:

JVM options	Observed state	Occurrence non volatile	Occurrence volatile
-client	TERMINATED	10	8980294
-client	STALE	10	0
-server	TERMINATED	11	9040080
-server	STALE	10	0
-XX:TieredStopAtLevel=1	TERMINATED	8858074	9052777
-Xint	TERMINATED	8035685	8454639
-server, -XX:-TieredCompilation	TERMINATED	0	8563250
-server, -XX:-TieredCompilation	STALE	10	0
-client, -XX:-TieredCompilation	TERMINATED	3	8719757
-client, -XX:-TieredCompilation	STALE	10	0

As we see, using fields without the volatile annotation indeed leads to hung threads. The percentage of hung threads depends on the JVM flags and the environment, JDK version and so on. Please run this on your PC, you should see a different distribution between hung and completed runs.

When to use volatile fields

The volatile field is most often used as a flag to signal a specific condition like in the test above. Another usage of volatile fields is to use the volatile field for reading and locks for writing. Or you can use them with the JDK 9 VarHandle to achieve atomic operations. How to implement those techniques is described here.
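A minimal sketch of the second technique, a volatile field for reading and a lock for writing. The class VolatileReadCounter is a hypothetical example of mine, not code from the linked article:

```java
class VolatileReadCounter {
    private volatile int value;       // volatile so readers need no lock

    // Writers take the lock, because value++ is a read-modify-write and not atomic.
    synchronized void increment() {
        value++;
    }

    // Readers only do a volatile read and see the latest written value.
    int get() {
        return value;
    }

    // Increments the counter from several threads and returns the final value.
    static int runDemo(int threads, int incrementsPerThread) {
        VolatileReadCounter counter = new VolatileReadCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

This pays off when reads are much more frequent than writes, since readers never block.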

The volatile field as an example of a happens-before relation

But typically I do not use volatile fields directly. I rather use data structures from the java.util.concurrent package for concurrent programming, which internally use volatile fields.

In the documentation of those classes we often read something about memory consistency effects and happens-before relation like in the following from the interface Future:

Memory consistency effects: Actions taken by the asynchronous computation happen-before actions following the corresponding Future.get() in another thread.

Now with our knowledge about the volatile field, we can decode this documentation. If we read from a volatile field we see all writes on other cores to this variable. In the words of the java.util.concurrent documentation we would say the read to a volatile variable creates a happen-before relation to the write to this variable. The term happen-before comes from the mathematical model which formalizes the effect of the volatile field. This model is described here.

So the above statement means that a Thread which calls Future.get() always sees the latest written values which were written by other Threads before calling another method of the interface Future.

Let us use the class FutureTask to transfer data between two threads as an example. FutureTask implements the interface Future so calling the method FutureTask.get() always sees the latest written value by another method, for example, FutureTask.set().

Here is a potential program flow to explain this: Thread A sets the variables x and y of object OA to one and calls FutureTask.set(OA). Now Thread B reads this object into the variable OB by calling FutureTask.get(). To make the example more interesting, Thread A now sets variable y to two. If Thread B reads variable x it surely sees the value one, since the cache was synchronized between the call to FutureTask.set(OA) and FutureTask.get(). But for variable y Thread B reads one or two, depending on which cores the two threads were running on.

In pseudo code this looks like this:

Thread A	Thread B
OA.x = 1
OA.y = 1
FutureTask.set(OA)
	OB = FutureTask.get()
OA.y = 2	OB.x == 1
	OB.y == 1 or OB.y == 2
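The safe part of this flow can be sketched in runnable Java. This is only an illustration under some assumptions: the class Point stands in for the object OA, and since FutureTask.set is protected I let the task return the object from a Callable instead of calling set directly. The later unsynchronized write to y is left out, because its outcome is not deterministic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FutureTaskHandOff {
    // Hypothetical stand-in for the object OA from the text.
    static class Point {
        int x;
        int y;
    }

    // Thread A fills the object inside the task, the calling thread plays Thread B.
    static Point transfer() {
        FutureTask<Point> task = new FutureTask<>(() -> {
            Point oa = new Point();
            oa.x = 1;
            oa.y = 1;
            return oa;              // FutureTask stores this result internally
        });
        new Thread(task).start();   // Thread A runs the task
        try {
            return task.get();      // Thread B: writes made before completion are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```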

Tools to detect missing volatile annotations

If you forget to declare a field as volatile a thread might read a stale value. But the chance to see this during tests is rather low. Since the read and the write must happen at almost the same time and on different cores to read a stale value, this happens only under heavy load and after a long run time, e.g. in production.

So it is no surprise that there exist tools to detect such a problem in test runs:

ThreadSanitizer: ThreadSanitizer can detect missing volatile annotations in C++ programs. There is a draft for a Java enhancement proposal, JEP draft: Java Thread Sanitizer, to include ThreadSanitizer in the OpenJDK JVM. This would allow us to find missing volatile annotations in the JVM and also in the Java application executed by the JVM.
vmlens: vmlens, a tool I have written to test concurrent Java, can detect missing volatile annotations in Java test runs.

Conclusion

The volatile field is needed to make sure that multiple threads always see the newest value, even when the cache system or compiler optimizations are at work. Reading from a volatile variable always returns the latest written value of this variable. The methods of most classes in the java.util.concurrent package also have this property, often by using volatile fields internally.

Detecting visibility bugs in concurrent Java

Tue, 21 Aug 2018 22:00:00 GMT

Chances to detect visibility bugs vary. The following visibility bug can, in the best case, be detected in 90 percent of all cases. In the worst case, the chance to detect the bug is lower than one in a million.
But first what are visibility bugs?

What are visibility bugs?

A visibility bug happens when a thread reads a stale value. In the following example a thread signals another thread to stop the processing of its while loop:

public class Termination {
   private int v;
   public void runTest() throws InterruptedException   {
	   Thread workerThread = new Thread( () -> { 
		   while(v == 0) {
			   // spin
		   }
	   });
	   workerThread.start();
	   v = 1;
	   workerThread.join();  // test might hang up here 
   }
 public static void main(String[] args)  throws InterruptedException {
	   for(int i = 0 ; i < 1000 ; i++) {
		   new Termination().runTest();
	   }
   }	
}

The bug is that the worker thread might never see the update of the variable v and therefore runs forever.

One reason for reading stale values is the cache of the CPU cores. Each core of a modern CPU has its own cache. So if the reading and the writing thread run on different cores, the reading thread sees a cached value and not the value written by the writing thread. The following shows the cores and caches inside an Intel Pentium 4 CPU, from this superuser answer:

Each core of an Intel Pentium 4 CPU has its own level 1 and level 2 cache. All cores share a large level 3 cache. The reason for those caches is performance. The following numbers show the time needed to access the memory, from Computer Architecture, A Quantitative Approach, JL Hennessy, DA Patterson, 5th edition, page 72:

CPU Register ~ 300 picosecond
Level 1 Cache ~ 1 nanosecond
Main Memory ~ 50 - 100 nanosecond

Reading and writing to a normal field does not invalidate the cache. So if two threads on different cores read and write to the same variable they see stale values. Let us see if we can reproduce this bug.

How to reproduce a visibility bug

If you have run the above example, chances are high that the test does not hang up. The test needs so few CPU cycles that both threads typically run on the same core. And when both threads run on the same core they read and write to the same cache. Luckily the OpenJDK provides a tool, jcstress, which helps with this type of test. jcstress uses multiple tricks to make sure that the threads of the tests run on different cores. Here the above example is rewritten as a jcstress test:

@JCStressTest(Mode.Termination)
@Outcome(id = "TERMINATED", expect = Expect.ACCEPTABLE, desc = "Gracefully finished.")
@Outcome(id = "STALE", expect = Expect.ACCEPTABLE_INTERESTING, desc = "Test hung up.")
@State
public class APISample_03_Termination {
    int v;
    @Actor
    public void actor1() {
        while (v == 0) {
            // spin
        }
    }
    @Signal
    public void signal() {
        v = 1;
    }
}

jcstress runs the test case multiple times with different JVM parameters. Here are the results of this test on my development machine, an Intel i5 4 core CPU, using the test mode stress.

JVM options	Observed state	Occurrence
None	TERMINATED	16
None	STALE	10
-XX:-TieredCompilation	TERMINATED	1
-XX:-TieredCompilation	STALE	10
-XX:TieredStopAtLevel=1	TERMINATED	8776026
-Xint	TERMINATED	9058042

As we see, for the JVM parameter -XX:-TieredCompilation the thread hangs up in about 90 percent of all cases. But for the JVM flags -XX:TieredStopAtLevel=1 and -Xint the thread terminated in all runs.

After confirming that indeed our example contains a bug, how can we fix it?

How to avoid visibility bugs?

Java has specialized instructions which guarantee that a thread always sees the latest written value. One such instruction is the volatile field modifier. When reading a volatile field a thread is guaranteed to see the last written value. The guarantee not only applies to the value of the field but to all values written by the writing thread before the write to the volatile variable. Adding the field modifier volatile to the field v from the above example makes sure that the while loop always terminates, even if we run it in a test with jcstress.

public class Termination {
   volatile int v;
   // methods omitted
}

The volatile field modifier is not the only instruction which gives such visibility guarantees. For example, the synchronized statement and the classes in the package java.util.concurrent give the same guarantees. A good read to learn about techniques to avoid visibility bugs is the book Java Concurrency in Practice by Brian Goetz et al.
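As a sketch of the synchronized alternative, here is the termination example with a plain field that is only accessed inside synchronized methods. The class name SynchronizedTermination and the methods read and write are hypothetical names I chose for this illustration:

```java
class SynchronizedTermination {
    private int v;   // not volatile, but only accessed while holding the lock

    private synchronized int read() {
        return v;
    }

    private synchronized void write(int newValue) {
        v = newValue;
    }

    // Returns true when the worker thread has seen the update and stopped.
    static boolean runTest() {
        SynchronizedTermination t = new SynchronizedTermination();
        Thread worker = new Thread(() -> {
            while (t.read() == 0) {
                // spin
            }
        });
        worker.start();
        t.write(1);   // releasing the lock makes the write visible to the next lock holder
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }
}
```

For a simple flag like this the volatile modifier is the lighter choice. The lock becomes interesting when several fields must change together.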

After seeing why visibility bugs happen and how to reproduce and avoid them let us look at how to find them.

How to find visibility bugs?

The Java Language Specification, Chapter 17. Threads and Locks, defines the visibility guarantees of the Java instructions formally. This specification defines a so-called happens-before relation to define the visibility guarantees:

Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.

And the reading from and writing to a volatile field creates such a happens-before relation:

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

Using this specification we can check if a program contains visibility bugs, called data race in the specification.

When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race. Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write.

Looking at our example we see that there is no happens-before relation between the read and the write to the shared variable v. So this example contains a data race according to the specification.
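For contrast, here is a sketch of a variant without a data race. The same chapter of the specification also lists Thread.start and Thread.join as happens-before edges, so a plain field is enough when the read happens only after join. The class JoinVisibility is a hypothetical example and not part of the original test:

```java
class JoinVisibility {
    private int v;   // plain field, no volatile

    // The worker writes v, the caller reads it only after join has returned.
    static int runTest() {
        JoinVisibility j = new JoinVisibility();
        Thread worker = new Thread(() -> j.v = 1);
        worker.start();               // start happens-before the actions of the worker
        try {
            worker.join();            // the actions of the worker happen-before join returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return j.v;                   // ordered after the write, so no data race
    }
}
```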

Of course, this reasoning can be automated. The following two tools use these rules to automatically detect visibility bugs:

ThreadSanitizer uses the rules of the C++ memory model to find visibility bugs in C++ applications. The C++ memory model consists of formal rules to specify the visibility guarantees of C++ instructions, similar to what the Java Language Specification does for the Java instructions. There is a draft for a Java enhancement proposal, JEP draft: Java Thread Sanitizer, to include ThreadSanitizer in the OpenJDK JVM. The use of ThreadSanitizer should be enabled through a command line flag.
vmlens, a tool I have written to test concurrent Java, uses the Java Language Specification to automatically check if a Java test run contains visibility bugs.

AtomicReference, a sometimes easier alternative to synchronized blocks

Wed, 25 Apr 2018 22:00:00 GMT

Brian Goetz lists AtomicReference in his book Java Concurrency in Practice in the section advanced topics. Yet we will see that AtomicReference is, for specific use cases, easier to use than synchronized blocks. And the new JDK 8 getAndUpdate and updateAndGet methods make AtomicReference even easier to use.

Let us start with the first topic, a use case which can be implemented more easily with an AtomicReference than with a synchronized block: A concurrent state machine.

How to use compareAndSet: A concurrent state machine

The class CommandReader from the maven surefire plugin uses the compareAndSet method to implement a concurrent state machine:

public final class CommandReader {
  private static final CommandReader READER = new CommandReader();
  private final Thread commandThread = 
    newDaemonThread( new CommandRunnable(), "surefire-forkedjvm-command-thread" );
  private final AtomicReference<Thread.State> state =
   new AtomicReference<Thread.State>( NEW );
  public static CommandReader getReader() {
     final CommandReader reader = READER;
     if ( reader.state.compareAndSet( NEW, RUNNABLE ) ) {
         reader.commandThread.start();
     }
   return reader;
  }
}

The method getReader must start the commandThread when the current state is NEW and update its value to RUNNABLE. Since the method can be called by multiple threads in parallel, setting and checking must be done atomically. This is done by the method compareAndSet, line 9. The compareAndSet method only updates its value to the new value when the current value is the same as the expected one. In the example, it only updates the variable to RUNNABLE when the current value is NEW. If the update succeeds the method returns true and the thread gets started, otherwise it returns false and nothing happens. The check and update are done atomically.
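To try this out without the surefire classes, here is a minimal self-contained sketch. The class StartOnce and its counter starts are hypothetical names. The counter stands in for starting the command thread:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class StartOnce {
    private final AtomicReference<Thread.State> state =
        new AtomicReference<>(Thread.State.NEW);
    private final AtomicInteger starts = new AtomicInteger();   // counts winning calls

    // Only the caller that moves the state from NEW to RUNNABLE performs the start action.
    boolean start() {
        if (state.compareAndSet(Thread.State.NEW, Thread.State.RUNNABLE)) {
            starts.incrementAndGet();
            return true;
        }
        return false;
    }

    // Calls start from several threads and returns how often the start action ran.
    static int runDemo(int threads) {
        StartOnce machine = new StartOnce();
        Thread[] callers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            callers[i] = new Thread(() -> machine.start());
            callers[i].start();
        }
        for (Thread caller : callers) {
            try {
                caller.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return machine.starts.get();
    }
}
```

No matter how many threads call start, only one compareAndSet can succeed, so the start action runs once.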

Here is, as a comparison, the same functionality implemented with a synchronized block.

public final class CommandReader {
 private static final CommandReader READER = new CommandReader();
 private final Thread commandThread = 
   newDaemonThread( new CommandRunnable(), "surefire-forkedjvm-command-thread" );
 private Thread.State state = NEW;
 private final Object LOCK = new Object();
 public static CommandReader getReader()  {
    final CommandReader reader = READER;
    synchronized(reader.LOCK) {
      if(reader.state == NEW) {
     	 reader.commandThread.start();
     	 reader.state = RUNNABLE;
      }
    }
    return reader;
 }
}

We use a synchronized block around the check and update of the variable state, line 10. This example shows that the check and update must be atomic. Without synchronization, two threads might both read the state NEW and call the start method of the commandThread multiple times.

As we see we can replace the synchronized block, the if statement and the write to the state by one method call to compareAndSet. In the next example, we see how to use the compareAndSet method to update values.

Updating values: Retry till success

The idea behind using compareAndSet for updates is to retry till the update succeeds. The class AsyncProcessor from RxJava uses this technique to update an array of subscribers in the method add:

final AtomicReference<AsyncSubscription<T>[]> subscribers;
boolean add(AsyncSubscription<T> ps) {
 for (;;) {
   AsyncSubscription<T>[] a = subscribers.get();
   if (a == TERMINATED) {
     return false;
   }
   int n = a.length;
   @SuppressWarnings("unchecked")
   AsyncSubscription<T>[] b = new AsyncSubscription[n + 1];
   System.arraycopy(a, 0, b, 0, n);
   b[n] = ps;
   if (subscribers.compareAndSet(a, b)) {
     return true;
    }
  }
}

The update is retried using a for loop, line 3. The loop is only terminated if either the subscriber array is in the state terminated, line 6, or the compareAndSet operation succeeds, line 14. In all other cases, the update is repeated on a copy of the array.

Starting with JDK 8 the class AtomicReference provides this functionality in the two utility methods getAndUpdate and updateAndGet. The following shows the implementation of the getAndUpdate method in JDK 8:

public final V getAndUpdate(UnaryOperator<V> updateFunction) {
 V prev, next;
 do {
   prev = get();
   next = updateFunction.apply(prev);
 } while (!compareAndSet(prev, next));
 return prev;
}

The method uses the same technique as the add method from the class AsyncProcessor. It retries the compareAndSet method in a loop, line 6. The updateFunction will be called multiple times when the update fails. So this function must be side-effect free.
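A short sketch of the difference between the two methods, using a hypothetical class UpdateMethods. Both lambdas are pure functions, so it does not matter how often they are retried:

```java
import java.util.concurrent.atomic.AtomicReference;

class UpdateMethods {
    // Returns the previous value, the updated value and the final value as one string.
    static String demo() {
        AtomicReference<Integer> ref = new AtomicReference<>(1);
        int previous = ref.getAndUpdate(x -> x + 1);   // returns the old value, ref is now 2
        int updated = ref.updateAndGet(x -> x * 10);   // returns the new value, ref is now 20
        return previous + " " + updated + " " + ref.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "1 20 20"
    }
}
```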

And here is the add method from above implemented with the new updateAndGet method:

boolean add(AsyncSubscription<T> ps) {
AsyncSubscription<T>[] result = subscribers.updateAndGet(  ( a ) ->  {  
  if (a != TERMINATED) {	   
    int n = a.length;
    @SuppressWarnings("unchecked")
    AsyncSubscription<T>[] b = new AsyncSubscription[n + 1];
    System.arraycopy(a, 0, b, 0, n);
    b[n] = ps;
    return b;
  }
  else {
    return a;
  }
});
return result != TERMINATED;	
}

As we see the while loop is hidden in the updateAndGet method. We only need to implement a function calculating a new value from an old one.
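The same pattern works for any immutable value. Here is a self-contained sketch with a list instead of the RxJava subscriber array. The class CopyOnWriteNames is a hypothetical example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteNames {
    private final AtomicReference<List<String>> names =
        new AtomicReference<>(Collections.emptyList());

    // Builds a new list on every attempt, so a retry never sees a half-updated list.
    void add(String name) {
        names.updateAndGet(old -> {
            List<String> copy = new ArrayList<>(old);   // never modify the old list
            copy.add(name);
            return Collections.unmodifiableList(copy);
        });
    }

    int size() {
        return names.get().size();
    }

    // Adds elements from several threads and returns the final size.
    static int runDemo(int threads, int addsPerThread) {
        CopyOnWriteNames list = new CopyOnWriteNames();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    list.add("subscriber-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }
}
```

The update function only creates a new list and touches no shared state, so repeated calls after a failed compareAndSet are harmless.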

Conclusion and next steps

We have seen two examples of compareAndSet. If you are interested in more examples take a look at the book The Art of Multiprocessor Programming. It shows you how to implement typical concurrent data structures using compareAndSet. And this article shows how to test atomic updates.

I would be glad to hear from you about how you use AtomicReference in your application.

Why do we need 4 classes for atomic updates in Java?

Mon, 23 Apr 2018 22:00:00 GMT

Atomic updates are an advanced technique, typically used in high-performance concurrent data structures. Atomic updates are, for example, heavily used in the package java.util.concurrent. So why do we need 4 classes for atomic updates in Java?

By looking at each class we will see which class should be used for what. And by looking at the reason for those 4 classes we see what to look for when implementing a high and low-level API. But let us start with the easiest of the four classes: AtomicReference

First Class: AtomicReference

The easiest way to call compare and set is to use the method compareAndSet from the class AtomicReference. The class CommandReader from the maven surefire plugin uses the compareAndSet method of AtomicReference to implement a concurrent state machine:

public final class CommandReader {
  private static final CommandReader READER = new CommandReader();
  private final Thread commandThread = 
    newDaemonThread( new CommandRunnable(), "surefire-forkedjvm-command-thread" );
  private final AtomicReference<Thread.State> state =
   new AtomicReference<Thread.State>( NEW );
  public static CommandReader getReader() {
     final CommandReader reader = READER;
     if ( reader.state.compareAndSet( NEW, RUNNABLE ) ) {
         reader.commandThread.start();
     }
   return reader;
  }
}

The class AtomicReference wraps another class to enrich a variable with atomic update functionality. In line 5 the AtomicReference represents an atomic variable of the enum type Thread.State. The AtomicReference gets initialized in line 6 to the value NEW. The compareAndSet method only updates its value to the new value when the current value is the same as the expected one. In the example, it only updates the variable to RUNNABLE when the current value is NEW. If the update succeeds the method returns true and the thread gets started, otherwise it returns false and nothing happens.

The disadvantage of AtomicReference is that for each Object we want to update atomically we need a separate AtomicReference instance. With the OpenJDK tool jol, we see that for our example the AtomicReference costs two-thirds of the object Thread.State:

public static void main(String[] args) {
  out.println(VM.current().details());
  out.println(ClassLayout.parseClass(AtomicReference.class).toPrintable());
  out.println(ClassLayout.parseClass(Thread.State.class).toPrintable());
}

And here the output:

# Running 64-bit HotSpot VM.
java.util.concurrent.atomic.AtomicReference object internals:
Instance size: 16 bytes
java.lang.Thread$State object internals:
Instance size: 24 bytes

Therefore Java provides a second class, AtomicReferenceFieldUpdater, to call compareAndSet using reflection.

Second Class: AtomicReferenceFieldUpdater

The AtomicReferenceFieldUpdater uses reflection to access the field to update atomically:

private volatile Thread.State state = NEW;
private static final AtomicReferenceFieldUpdater<AtomicReferenceExample, Thread.State> 
 ATOMIC_STATE_UPDATER =  AtomicReferenceFieldUpdater.
    newUpdater(AtomicReferenceExample.class, Thread.State.class, "state");
public void update() {
	ATOMIC_STATE_UPDATER.compareAndSet(this, NEW, RUNNABLE);
}

The volatile field which should be updated atomically is declared in line 1. A new AtomicReferenceFieldUpdater is created with the class containing the field, the class of the field and the name of the field, line 3. The field holding the AtomicReferenceFieldUpdater is static final since we need only one AtomicReferenceFieldUpdater for all objects. To update a field we call compareAndSet with the object we want to update, the expected and the new value, line 5.
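Here is the snippet wrapped into a complete class, as a sketch. The class name FieldUpdaterExample and the methods tryStart and current are hypothetical additions to make the result observable:

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

class FieldUpdaterExample {
    // The field must be volatile, otherwise newUpdater throws an exception.
    private volatile Thread.State state = Thread.State.NEW;

    // One updater is shared by all instances of the class.
    private static final AtomicReferenceFieldUpdater<FieldUpdaterExample, Thread.State>
        STATE_UPDATER = AtomicReferenceFieldUpdater.newUpdater(
            FieldUpdaterExample.class, Thread.State.class, "state");

    // Returns true only for the call that moves the state from NEW to RUNNABLE.
    boolean tryStart() {
        return STATE_UPDATER.compareAndSet(this, Thread.State.NEW, Thread.State.RUNNABLE);
    }

    Thread.State current() {
        return state;
    }
}
```

Compared to AtomicReference, each instance now carries only the plain reference field and no extra wrapper object.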

Third Class: sun.misc.Unsafe

To see the third way let us look at the implementation of AtomicReferenceFieldUpdater.

 private static final class AtomicReferenceFieldUpdaterImpl<T,V>
      extends AtomicReferenceFieldUpdater<T,V> {
  private static final Unsafe unsafe = Unsafe.getUnsafe();
  private final long offset;
  AtomicReferenceFieldUpdaterImpl(final Class<T> tclass,
                                        final Class<V> vclass,
                                        final String fieldName,
                                        final Class<?> caller) {
    final Field field;
    field = AccessController.doPrivileged(
       new PrivilegedExceptionAction<Field>() {
          public Field run() throws NoSuchFieldException {
             return tclass.getDeclaredField(fieldName);
          }
    });
    // parameter checks and exception handling omitted 
    offset = unsafe.objectFieldOffset(field);
  }
  public boolean compareAndSet(T obj, V expect, V update) {
    if (obj == null || obj.getClass() != tclass || cclass != null ||
       (update != null && vclass != null &&
    vclass != update.getClass()))
    updateCheck(obj, update);
    return unsafe.compareAndSwapObject(obj, offset, expect, update);
  }
}

AtomicReferenceFieldUpdater is a wrapper around the class sun.misc.Unsafe, adding security checks and making the API easier. To use the method compareAndSwapObject from sun.misc.Unsafe we need the offset of the field we want to update. The offset is calculated in the constructor line 17. To update an object we call compareAndSwapObject with the object we want to update, the offset, the expected state and the new state, line 24.

Usage Count

Many people do what we did. They look at the implementation of AtomicReference and AtomicReferenceFieldUpdater. And sometimes they end up using sun.misc.Unsafe instead of AtomicReference or AtomicReferenceFieldUpdater. To see how often, let us look at the GitHub open source projects using Google BigQuery.
Google BigQuery allows you to query GitHub projects using SQL. We use the following query to get the usage count:

SELECT count(*) FROM (
      SELECT SPLIT(repo.content, '\n') line
      FROM [fh-bigquery:github_extracts.contents_java] as repo
      HAVING REGEXP_MATCH(line, [Search Key]) 
)

The used table fh-bigquery:github_extracts.contents_java was last updated Jan 19, 2017. Here are the results for the three search keys:

Search Key	Count
java.util.concurrent.atomic.AtomicReference;	20746
compareAndSwapObject	3454
java.util.concurrent.atomic.AtomicReferenceFieldUpdater;	896

AtomicReference is most often used, followed by the usage of Unsafe. AtomicReferenceFieldUpdater, on the other hand, is only seldom used.

I use compareAndSwapObject as the search key for sun.misc.Unsafe since sun.misc.Unsafe provides methods for multiple use cases. And compareAndSwapObject uniquely identifies the usage of sun.misc.Unsafe for an atomic update of a reference variable.

Fourth Class: VarHandle

But with the release of JDK 9, Oracle wanted to clean up. The first idea, to simply forbid the usage of sun.misc.Unsafe using the new module system, led to a public outcry in the community. This outcry led to an update to the JDK Enhancement Proposal 260: Encapsulate Most Internal APIs. It now states that all methods which have a replacement in JDK 9 will be deprecated and removed in a later release. The replacement in JDK 9 for objectFieldOffset of sun.misc.Unsafe is the class VarHandle.

The JDK Enhancement Proposal for the class VarHandle is JEP 193: Variable Handles. I think the most important sentence for a higher usage of the class VarHandle instead of sun.misc.Unsafe by the community is the penultimate sentence of this JEP:

The classes in java.util.concurrent (and other areas identified in the JDK) will be migrated from sun.misc.Unsafe to VarHandle.

The following shows the usage of a VarHandle to update our state variable:

private volatile Thread.State state = NEW;
private static final VarHandle ATOMIC_STATE_UPDATER;
static {
    try {
    MethodHandles.Lookup l = MethodHandles.lookup();
    ATOMIC_STATE_UPDATER = 
        l.findVarHandle(VarHandleExample.class, "state", Thread.State.class);
    } 
    catch (ReflectiveOperationException e) {
         throw new Error(e);
    }
}
public void update() {
    ATOMIC_STATE_UPDATER.compareAndSet(this, NEW, RUNNABLE);
}

A VarHandle works almost the same as AtomicReferenceFieldUpdater. The volatile field which should be updated atomically is declared in line 1. We create a new VarHandle with reflection by calling findVarHandle of the class MethodHandles.Lookup, line 7. We need only one VarHandle for all instances of VarHandleExample, so we can declare the variable ATOMIC_STATE_UPDATER as static final, line 2. To update a field we call compareAndSet with the object we want to update, the expected and the new value, line 14.
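To check the atomicity, here is a sketch of a complete class that calls compareAndSet through a VarHandle from several threads. It needs JDK 9 or later. The class VarHandleStartOnce and the method winners are hypothetical names:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

class VarHandleStartOnce {
    private volatile Thread.State state = Thread.State.NEW;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                .findVarHandle(VarHandleStartOnce.class, "state", Thread.State.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns true only for the call that moves the state from NEW to RUNNABLE.
    boolean tryStart() {
        return STATE.compareAndSet(this, Thread.State.NEW, Thread.State.RUNNABLE);
    }

    // Calls tryStart from several threads and counts the successful calls.
    static int winners(int threads) {
        VarHandleStartOnce target = new VarHandleStartOnce();
        AtomicInteger wins = new AtomicInteger();
        Thread[] callers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            callers[i] = new Thread(() -> {
                if (target.tryStart()) {
                    wins.incrementAndGet();
                }
            });
            callers[i].start();
        }
        for (Thread caller : callers) {
            try {
                caller.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wins.get();
    }
}
```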

Summary and Next Steps

If you want to use compare and set in your application use AtomicReference. It is the easiest class of the four. This post shows that AtomicReference is sometimes even easier to use than a synchronized block. And if you want to learn how to implement high-performance concurrent data structures using AtomicReference read the book The Art of Multiprocessor Programming.

Our 4 classes show that if you want to provide a low-level and a high-level API, you should not offer an API with medium complexity like AtomicReferenceFieldUpdater. And the low-level API should be so useful that it is used in your own classes, like VarHandle in the package java.util.concurrent. But I think the main lesson is, nobody gets all classes right all the time.

Waiting for tasks with Phaser

Tue, 03 Apr 2018 22:00:00 GMT

The class Phaser lets you wait for a flexible number of tasks executed in other threads. Use the method register to add a task you want to wait for. Call arrive to signal that a registered task is finished. And call awaitAdvance to wait till all registered tasks are finished. We will see how to use it by looking at a real-life example. But first, how does it work?

How does Phaser work?

The class Phaser works in phases. You register tasks for all phases by calling the method register. You signal that a task is finished for this phase by calling arrive. When all registered tasks have arrived for this phase the Phaser starts a new phase and you can start over again. The following shows this for the first phase, phase zero:

		Phaser phaser = new Phaser();
		assertEquals(  0 , phaser.register() );
		assertEquals(  0 , phaser.arrive()   );

Both methods return the current phase, phase 0 in the example. By calling arrive, line 3 in the example above, the Phaser starts a new phase 1:

		assertEquals(  1 , phaser.getPhase() );

And we can start over again calling arrive to start a new phase:

		assertEquals(  1 , phaser.arrive()   );		
		assertEquals(  2 , phaser.getPhase() );

The Phaser lets us wait for other threads with the method awaitAdvance. awaitAdvance lets a thread wait till the Phaser reaches a new phase. You call the awaitAdvance method with the current phase to wait for other threads:

phaser.awaitAdvance( phaser.getPhase() );  // waits for phaser.arrive() in other threads

awaitAdvance returns immediately if the current phase is not equal to the given phase value:

phaser.awaitAdvance( phaser.getPhase()  + 1); // returns immediately
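Putting register, arrive and awaitAdvance together, a minimal sketch of waiting for tasks in other threads could look like this. The class PhaserWait is a hypothetical example. It assumes at least one task, since a Phaser without registered tasks never advances:

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

class PhaserWait {
    // Starts the given number of tasks (at least one) and waits until all have arrived.
    static int runTasks(int tasks) {
        Phaser phaser = new Phaser();
        AtomicInteger finished = new AtomicInteger();
        // Register every task before the first one can arrive. Otherwise the
        // phase could advance while later tasks are still unregistered.
        for (int i = 0; i < tasks; i++) {
            phaser.register();
        }
        int phase = phaser.getPhase();        // remember the current phase
        for (int i = 0; i < tasks; i++) {
            new Thread(() -> {
                finished.incrementAndGet();   // stands in for the actual work
                phaser.arrive();              // signal that this task is done
            }).start();
        }
        phaser.awaitAdvance(phase);           // blocks until all registered tasks arrived
        return finished.get();
    }
}
```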

Deregistering Tasks and the terminal state

The Phaser lets you deregister tasks by calling arriveAndDeregister:

phaser.arriveAndDeregister();

When all registered tasks are deregistered the Phaser gets terminated:

assertEquals(  true , phaser.isTerminated() );

When the Phaser is terminated, arrive and register have no effect and return a negative number. And the method awaitAdvance returns immediately. You can change this behavior by overriding the method onAdvance:

 Phaser phaser = new Phaser() {
   protected boolean onAdvance(int phase, int parties) { return false; }
 };

By always returning false as in the example above, the Phaser can only be terminated by calling the method forceTermination explicitly.
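The three cases can be compared side by side in a small sketch. The class PhaserTermination and its method names are hypothetical:

```java
import java.util.concurrent.Phaser;

class PhaserTermination {
    // A Phaser whose onAdvance always returns false, as in the example above.
    static Phaser reusablePhaser() {
        return new Phaser() {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                return false;
            }
        };
    }

    // Default Phaser: terminates once the last registered task deregisters.
    static boolean defaultTerminates() {
        Phaser phaser = new Phaser();
        phaser.register();
        phaser.arriveAndDeregister();
        return phaser.isTerminated();
    }

    // Overridden onAdvance: the Phaser stays usable after the last deregistration.
    static boolean overriddenTerminates() {
        Phaser phaser = reusablePhaser();
        phaser.register();
        phaser.arriveAndDeregister();
        return phaser.isTerminated();
    }

    // forceTermination ends the Phaser explicitly; register then returns a negative number.
    static boolean forcedTerminates() {
        Phaser phaser = reusablePhaser();
        phaser.forceTermination();
        return phaser.isTerminated() && phaser.register() < 0;
    }
}
```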

Example: Waiting for other threads

The class ChangedFilesCollector from the IntelliJ community edition uses the Phaser to wait till all threads have reached a specific state. The used Phaser overrides the onAdvance method to allow the Phaser to be reused when all tasks are deregistered:

 private final Phaser myWorkersFinishedSync = new Phaser() {
      @Override
      protected boolean onAdvance(int phase, int registeredParties) {
        return false;
      }
    };

In the method processFilesInReadAction this Phaser is used to wait till all threads have finished their tasks:

   private void processFilesInReadAction() {
      assert ApplicationManager.getApplication().isReadAccessAllowed();
      myWorkersFinishedSync.register();
      int phase = myWorkersFinishedSync.getPhase();
      try {
        // other statements omitted
      }
      finally {
        myWorkersFinishedSync.arriveAndDeregister();
      }
      myWorkersFinishedSync.awaitAdvance(phase);
    }

In line 3 the method register registers a new task. The variable phase remembers the current phase in line 4. In line 9 the method arriveAndDeregister signals that the task is done and deregisters the task. And in line 11 we wait for the other threads using the method awaitAdvance.

Other classes to wait for threads

Java provides three classes to wait for other threads: Phaser, CountDownLatch, and CyclicBarrier. Use Phaser when the number of threads you need to wait for is flexible. When you need to wait for a fixed number of tasks done in other threads, use CountDownLatch instead. And use CyclicBarrier when a fixed number of threads do the work and need to wait for each other in those same threads.

Next Steps

This was the last java.util.concurrent class to wait for other threads. In the next blog post, we will look at the classes in the package java.util.concurrent.atomic.

I would be glad to hear from you about how you use Phaser in your application.

Waiting for another thread with CyclicBarrier: 2 Real-life examples

Wed, 28 Mar 2018 22:00:00 GMT

The CyclicBarrier lets a set of threads wait until they have all reached a specific state. Initialize the CyclicBarrier with the number of threads which need to wait for each other. Call await to signal that you are ready to proceed and wait for the other threads. We will see how to use it by looking at two real-life examples.

How to use CyclicBarrier

The first use case of CyclicBarrier is to signal that we are ready to proceed and wait for the other threads. The class DistributedTxCommitTask from the Blazegraph open source graph database uses the CyclicBarrier to coordinate multiple threads. The following shows the creation of the CyclicBarrier:

preparedBarrier = new CyclicBarrier(nservices,
            new Runnable() {
                public void run() {
			// Statements omitted
                }
            });

We need to wait for nservices threads, so we initialize the CyclicBarrier with this variable, line 1. To execute an action after all threads have called await, we initialize the CyclicBarrier with a Runnable, line 2. The run method of the Runnable will be called by the last thread calling await. Only after the run method has been executed can the threads continue.
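
The following self-contained sketch shows the same construction outside of Blazegraph. The class BarrierActionSketch and its counter are assumptions made for this illustration. The barrier action only counts how often it runs:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierActionSketch {

    // Lets the given number of threads meet at a barrier once and
    // returns how often the barrier action was executed.
    static int countActionRuns(int parties) {
        AtomicInteger actionRuns = new AtomicInteger();
        // The Runnable is executed by the last thread that calls await.
        CyclicBarrier barrier = new CyclicBarrier(parties, actionRuns::incrementAndGet);

        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                try {
                    barrier.await();        // ready, wait for the other threads
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return actionRuns.get();
    }
}
```

Independent of the number of threads, the action runs exactly once per cycle.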

When we are ready we call await, signaling that we are ready and waiting for the other threads:

try {
		// Statements omitted
		preparedBarrier.await();
		// Statements omitted
} finally {               
       if (preparedBarrier != null)
             preparedBarrier.reset();
}

As we see, we execute the task of the thread in a try block, lines 1 to 4. And we call the reset method of the CyclicBarrier in a finally block, line 7. This makes sure that in the case of an exception other threads do not wait forever but receive a BrokenBarrierException.

Handling of exceptions

The BrokenBarrierException allows us to signal to the other waiting threads that we cannot finish our task and that it does not make sense to wait any longer. The BrokenBarrierException gets thrown when:

Another thread was interrupted by calling Thread.interrupt while waiting
CyclicBarrier reset was called while other threads were still waiting at the barrier.

In the case of Thread.interrupt, the thread that gets interrupted receives an InterruptedException while waiting, and all other waiting threads receive a BrokenBarrierException. So both exceptions signal that we should give up our task.
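
The effect of reset on a waiting thread can be sketched as follows. The class BarrierResetSketch is hypothetical. It uses getNumberWaiting only to make sure that the other thread is already waiting before reset is called:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;

class BarrierResetSketch {

    // Returns true if the waiting thread received a BrokenBarrierException.
    static boolean waiterSeesBrokenBarrier() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicBoolean broken = new AtomicBoolean();

        Thread waiter = new Thread(() -> {
            try {
                barrier.await();            // the second thread never arrives
            } catch (BrokenBarrierException e) {
                broken.set(true);           // woken up by reset
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();

        // Wait until the other thread is really waiting at the barrier.
        while (barrier.getNumberWaiting() == 0) {
            Thread.yield();
        }
        barrier.reset();                    // gives up, breaks the barrier for the waiter

        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return broken.get();
    }
}
```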

Waiting in cycles

The second use case of CyclicBarrier is to wait in cycles. The MoreExecutorsTest test from guava, the Google core libraries for Java, shows how to do this. The following shows the first two cycles of this test:

 public void testDirectExecutorServiceServiceTermination() throws Exception {
    final ExecutorService executor = newDirectExecutorService();
    final CyclicBarrier barrier = new CyclicBarrier(2);
    Thread otherThread =
        new Thread(
            new Runnable() {
              public void run() {
                try {
                  Future<?> future =
                      executor.submit(
                          new Callable<Void>() {
                            public Void call() throws Exception {
                              // WAIT #1
                              barrier.await(1, TimeUnit.SECONDS);
                              // WAIT #2
                              barrier.await(1, TimeUnit.SECONDS);
                              assertTrue(executor.isShutdown());
                              assertFalse(executor.isTerminated());
                              // Next cycle omitted
                            }
                          });
				  // checks omitted
                } catch (Throwable t) {
                  throwableFromOtherThread.set(t);
                }
              }
            });
    otherThread.start();
    // WAIT #1
    barrier.await(1, TimeUnit.SECONDS);
    assertFalse(executor.isShutdown());
    assertFalse(executor.isTerminated());
    executor.shutdown();
    assertTrue(executor.isShutdown());
    assertFalse(executor.isTerminated());
    // WAIT #2
    barrier.await(1, TimeUnit.SECONDS);
    // Next cycle omitted
  }

We initialize the CyclicBarrier for two threads, line 3. The first cycle ends when both threads call await, lines 14 and 30. This also starts the second cycle. The second cycle ends when both threads call await again, lines 16 and 37. This allows you to test a process in multiple steps.

Other classes to wait for threads

Java provides three classes to wait for other threads: CyclicBarrier, CountDownLatch, and Phaser. Use CyclicBarrier when you do the work and need to wait in the same threads. When you need to wait for tasks done in other threads use CountDownLatch instead. To use CyclicBarrier or CountDownLatch you need to know the number of threads when you call the constructor. If you need to add threads after construction time, use the class Phaser.

Summary and next steps

Use the following steps to wait till a set of threads have reached a specific state with CyclicBarrier:

Initialize the CyclicBarrier with the number of threads which need to wait for each other.
Call await to signal that you are ready and wait for the other threads.
Call reset in the finally block. This makes sure that other threads receive a BrokenBarrierException instead of waiting forever when this thread cannot finish its task.
BrokenBarrierException and InterruptedException both signal that we should give up our task.

To use CyclicBarrier you need to know the number of threads working on the task in advance. We will look at the class Phaser next, which allows us to register threads after construction.

I would be glad to hear from you about how you use CyclicBarrier in your application.

Waiting for another thread with CountDownLatch: 2 Real-life examples

Tue, 20 Mar 2018 23:00:00 GMT

CountDownLatch is an easy way to wait until another thread has finished a task. The CountDownLatch is initialized with the number of threads we need to wait for. After a thread has finished its task it calls countDown, which decrements the count. A thread can wait until the count reaches zero by calling await. We will see how to use it by looking at two real-life examples.

How to use CountDownLatch: Waiting for initialization

The Apache Ignite CacheDataStructuresManager needs to wait until it is initialized. One thread initializes the CacheDataStructuresManager, so we need a CountDownLatch with count one:

private final CountDownLatch initLatch = new CountDownLatch(1);

The initialization is done in the onKernalStart0() method:

    @Override protected void onKernalStart0() 
                        throws IgniteCheckedException {
        try {
            queueHdrView = cctx.cache();
            initFlag = true;
        }
        finally {
            initLatch.countDown();
        }
    }

The countDown method is called in the finally block. So we guarantee that the countdown eventually reaches zero and we do not wait forever. To signal that the initialization was successful we use the boolean flag initFlag.

The method waitInitialization is used to check if the initialization was successful:

    private void waitInitialization() 
             throws IgniteCheckedException {
        if (initLatch.getCount() > 0)
            U.await(initLatch);
        if (!initFlag)
            throw new IgniteCheckedException(
               "DataStructures manager was not properly initialized.");
    }

We check the count of the CountDownLatch and wait, if necessary, in the await method from IgniteUtils. After that, we check initFlag to see whether the initialization was successful.

The method await from IgniteUtils handles the InterruptedException from the CountDownLatch await method:

    public static void await(CountDownLatch latch) 
                  throws IgniteInterruptedCheckedException {
        try {
            if (latch.getCount() > 0)
                latch.await();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IgniteInterruptedCheckedException(e);
        }
    }

In the example the interrupt flag is restored by calling Thread.currentThread().interrupt() and a new checked Exception is thrown.
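
Put together, the pattern of the three Ignite snippets looks roughly like the following sketch. It is not the Ignite code. The class InitLatchSketch, the parameter fail and the demo method are assumptions to make the example runnable on its own:

```java
import java.util.concurrent.CountDownLatch;

class InitLatchSketch {
    private final CountDownLatch initLatch = new CountDownLatch(1);
    private volatile boolean initFlag;

    // Called by the initializing thread. 'fail' simulates a failing initialization.
    void start(boolean fail) {
        try {
            if (fail) {
                throw new IllegalStateException("initialization failed");
            }
            initFlag = true;                // signals success
        } finally {
            initLatch.countDown();          // always release the waiting threads
        }
    }

    // Waits for the initialization and reports whether it was successful.
    boolean waitInitialization() {
        try {
            initLatch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // restore the interrupt flag
            return false;
        }
        return initFlag;
    }

    static boolean demo(boolean fail) {
        InitLatchSketch manager = new InitLatchSketch();
        Thread initializer = new Thread(() -> {
            try {
                manager.start(fail);
            } catch (IllegalStateException expected) {
                // the failure is reported through initFlag
            }
        });
        initializer.start();
        return manager.waitInitialization();
    }
}
```

Because countDown is called in the finally block, the waiting thread returns in both cases. Only the flag tells it whether the initialization succeeded.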

Handling of InterruptedException

The await method of the CountDownLatch throws an InterruptedException when the waiting thread is interrupted through Thread.interrupt. The CountDownLatch clears the interrupt flag when it throws the InterruptedException. Therefore the book Java Concurrency in Practice - Brian Goetz, et al. recommends either propagating the InterruptedException or restoring the interrupt flag.

The guava library provides the utility method awaitUninterruptibly in the class Uninterruptibles which implements the second recommendation. It restores the interrupt flag. Using this method you make sure that another blocking method will again throw an InterruptedException.
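
The idea behind such a utility method can be sketched in plain Java. This is an illustration of the technique under the assumption that we only have a CountDownLatch. It is not a copy of the guava source:

```java
import java.util.concurrent.CountDownLatch;

class UninterruptibleAwaitSketch {

    // Waits until the latch reaches zero, even if the thread gets interrupted.
    // The interrupt flag is restored before the method returns.
    static void awaitUninterruptibly(CountDownLatch latch) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    latch.await();
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;     // remember the interrupt and wait again
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns true if the interrupt flag is still set after waiting.
    static boolean flagRestoredAfterInterrupt() {
        CountDownLatch latch = new CountDownLatch(0);   // already at zero
        Thread.currentThread().interrupt();             // simulate an interrupt
        awaitUninterruptibly(latch);
        return Thread.interrupted();                    // reads and clears the flag
    }
}
```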

How to use await with a timeout: Testing multithreaded software

The next example shows the second method to wait for other threads, await with a timeout. This method is typically used in tests, as in this example from the PostServletTest from jetty:

    @Test
    public void testBadPost() throws Exception
    {
        StringBuilder req = new StringBuilder(16*1024);
        // creation of the request String omitted
        String resp = connector.getResponse(req.toString());
        assertThat(resp,startsWith("HTTP/1.1 200 OK")); 
        assertTrue(complete.await(5,TimeUnit.SECONDS));
        assertThat(ex0.get(),not(nullValue()));
        assertThat(ex1.get(),not(nullValue()));
    }

The await method gets called with the amount we want to wait together with the unit for the amount, line 8. In our example, we want to wait no more than 5 seconds. The test fails when the await method returns false because of the timeout.
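
The two possible results of await with a timeout can be seen in a small sketch. The class LatchTimeoutSketch and the timeout of 500 milliseconds are chosen only for this illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchTimeoutSketch {

    // Returns true if the latch reached zero before the timeout elapsed.
    static boolean completesInTime(boolean workerCountsDown) {
        CountDownLatch complete = new CountDownLatch(1);
        if (workerCountsDown) {
            new Thread(complete::countDown).start();    // the task finishes
        }
        try {
            // false means: the timeout elapsed and the count is still above zero
            return complete.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a test we would wrap the call in assertTrue, as in the jetty example, so that a missing countDown fails the test instead of blocking it forever.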

Other classes to wait for threads

Java provides three classes to wait for other threads: CountDownLatch, CyclicBarrier, and Phaser. When you need to wait for tasks done in other threads use CountDownLatch. Use CyclicBarrier when you do the work and need to wait in the same threads. To use CyclicBarrier or CountDownLatch you need to know the number of threads when you call the constructor. If you need to add threads after construction time, use the class Phaser.

Summary and next steps

When you want to wait for another thread with CountDownLatch follow these steps:

Initialize the CountDownLatch with the number of threads you are waiting for.
Call the countDown method in the finally block when a thread has finished its task.
Use a flag or counter to signal that the tasks were successful.
Wait for the threads with the method await. Propagate the InterruptedException thrown by await or restore the interrupted flag when you catch the InterruptedException.

The CountDownLatch can only be used once. We will look at the CyclicBarrier next, which can be used multiple times. I would be glad to hear from you about how you use CountDownLatch in your application.

Concurrent Java: Low scalability, the risk of deadlocks or garbage creation, you can not avoid all

Mon, 12 Mar 2018 23:00:00 GMT

Using three concurrent map implementations as examples, we will see that we have either low scalability, the risk of deadlocks, or unnecessary garbage creation. Locks lead to low scalability when we use only one, or to the risk of deadlocks when we use multiple locks. Compare and swap operations and immutable data structures, on the other hand, lead to unnecessary object creation, which causes more garbage collection cycles.

So which technique should we choose? If we have a small data structure we can use an immutable class. And if our data structure is not performance critical, we can use a single lock. In all other cases, I would start with multiple locks since they are easier to implement than compare and swap operations. But let us first start with the easiest case, a hash map synchronized with a single lock.

A single lock: Low scalability

As the first example we look at a concurrent map created by Collections.synchronizedMap:

private Map<Integer,Integer> map = 
		Collections.synchronizedMap(new HashMap<Integer,Integer>());

In our examples we use the compute and put method to update our concurrent maps:

public void update12()
{
	map.compute( 1, ( key ,value ) ->  {  
		map.put( 2 , 1); 
		return value; 
	});
}

When we look at the implementation of the compute method we see it is a wrapper which puts a synchronization block around an underlying map:

public V compute(K key,
                BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
      synchronized (mutex) {return m.compute(key, remappingFunction);}
}

The synchronization block acts as a lock. Since we are only using one monitor, the object in the mutex variable, we have only one lock. The lock makes sure that only one thread at a time accesses the underlying map. This makes sure that threads do not see an inconsistent state and that only one thread at a time modifies the data structure. But it also leads to low scalability, since the threads must wait for each other.

Multiple locks: Risk of deadlocks

As the next example, let us look at a map implementation using one lock per hash array element, the ConcurrentHashMap.

private Map<Integer,Integer> map = 
		new ConcurrentHashMap<Integer,Integer>();

The ConcurrentHashMap stores all its data in a hash array. The ConcurrentHashMap uses those array elements as locks when we update the map. Both put and compute methods use locks. Let us again look at the compute method to see how this is done:

    public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        // Parameter checks omitted
        int h = spread(key.hashCode()); // Spreads higher bits of hash to lower
        V val = null;
        int delta = 0;
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
               // Handling of missing elements omitted
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                synchronized (f) {
                    // Statements omitted
                    // The remappingFunction method gets called in this block
                }
            }
        }
		// Incrementing of hash map size omitted
        return val;
    }

Here again a synchronized block, in line 18, acts as the lock. But this time multiple locks are used, since the synchronization happens on multiple objects held by the variable f. The variable f contains the return value from the method tabAt, line 12. This method returns the hash array element at index (n - 1) & h. Since the size n of the hash array is a power of two, the operation (n - 1) & h gives you the spread hash code of the key modulo the size of the hash array.

So which lock is used depends on the key we use. If we add a new method which updates the values in reverse order, like the following update21 method, we create a deadlock.

public void update21()
{
	map.compute( 2, ( key ,value ) ->  {  
		map.put( 1 , 1); 
		return value; 
	});
}

update21 locks array element 2, for the key of the compute call, and requests a lock on array element 1, for the key used in the put call. But update12 holds a lock on array element 1, through the compute call, and requests a lock on array element 2, for the put call. If we call both methods in parallel, each of the two threads requests a lock held by the other thread: a deadlock.

How to avoid deadlocks

The example gives us a hint about which rules to follow to avoid deadlocks. The first rule:

try to minimize the number of potential locking interactions, and follow and document a lock ordering protocol for locks that may be acquired together. — Java Concurrency in Practice - Brian Goetz, et al.

We violated this rule in our example. Our update21 method even accessed the locks in reverse order on purpose. And the second:

Release all locks before invoking unknown code. — Is Parallel Programming Hard, And, If So, What Can You Do About It? - Paul E. McKenney

That the compute method of the ConcurrentHashMap does not release all its locks before calling the callback function is a clear disadvantage of the compute method. At least it is documented in the javadoc of the method.
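
One way to follow both rules in our example is to not call put from inside the callback at all. The following sketch is an assumption about how update12 and update21 could be rewritten. It counts the calls for one key and sets the other key afterwards. Note that the two updates are then no longer executed as one atomic step:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NoNestedUpdateSketch {
    private final Map<Integer, Integer> map = new ConcurrentHashMap<>();

    public void update12() {
        // The lock on the array element for key 1 is released when compute returns.
        map.compute(1, (key, value) -> value == null ? 1 : value + 1);
        map.put(2, 1);      // acquires the second lock without holding the first
    }

    public void update21() {
        map.compute(2, (key, value) -> value == null ? 1 : value + 1);
        map.put(1, 1);
    }

    // Calls both methods in parallel and returns the number of keys in the map.
    static int runInParallel() {
        NoNestedUpdateSketch sketch = new NoNestedUpdateSketch();
        Thread first = new Thread(() -> {
            for (int i = 0; i < 1000; i++) sketch.update12();
        });
        Thread second = new Thread(() -> {
            for (int i = 0; i < 1000; i++) sketch.update21();
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.map.size();
    }
}
```

Since no thread ever holds two locks at the same time, both threads always finish.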

Compare and swap operation: Garbage creation

As the third example we look at a map implementation using a compare and swap operation for updating, the ConcurrentSkipListMap:

private Map<Integer,Integer> map = 
		new ConcurrentSkipListMap<Integer,Integer>();

All modern CPUs provide instructions to atomically compare and swap or increment and get values. Those operations are used internally by the JVM to implement synchronized locks.

The compare and swap method checks the value of a variable and only updates it when the value is the same as the expected value. Compare and swap operations can be called in Java by using the classes in java.util.concurrent.atomic, sun.misc.Unsafe, or, since JDK 9, VarHandles. Let us look at the compute method to see how this operation is used.

 public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        // Parameter checks omitted
        for (;;) {
            Node<K,V> n; Object v; V r;
            if ((n = findNode(key)) == null) {
                // Handling of missing elements omitted
            }
            else if ((v = n.value) != null) {
                @SuppressWarnings("unchecked") V vv = (V) v;
                if ((r = remappingFunction.apply(key, vv)) != null) {
                    if (n.casValue(vv, r))
                        return r;
                }
                else if (doRemove(key, vv) != null)
                    break;
            }
        }
        return null;
    }

In line 12 the compare and swap operation is called through the method casValue. This method only updates the value of the variable n to the new value r when its current value is still vv. The variable n holds the node for our key, set in line 6. The new value r is calculated using the remappingFunction in line 11. And the variable vv holds the old value of the node n. If the method casValue successfully updates the value of n it returns true otherwise false.

So we first remember the old value of the node we want to update. Then we create the new value we want to set. And then we update the node only when it still contains the old value. So we make sure that no other thread has updated our node in between. We repeat this, through the for loop in line 4, until we successfully update our node.
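
The same three steps, remember the old value, create the new value, update only if unchanged, can be written with an AtomicReference from java.util.concurrent.atomic. The class CasRetrySketch is a hypothetical example, not the code of the ConcurrentSkipListMap:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class CasRetrySketch {
    private final AtomicReference<Integer> value = new AtomicReference<>(0);

    Integer update(UnaryOperator<Integer> remappingFunction) {
        for (;;) {
            Integer oldValue = value.get();                         // remember the old value
            Integer newValue = remappingFunction.apply(oldValue);   // create the new value
            if (value.compareAndSet(oldValue, newValue)) {          // update only if unchanged
                return newValue;
            }
            // Another thread was faster: newValue becomes garbage and we retry.
        }
    }

    // Increments the value from several threads and returns the final result.
    static int incrementInParallel(int threadCount, int incrementsPerThread) {
        CasRetrySketch counter = new CasRetrySketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.update(old -> old + 1);
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.value.get();
    }
}
```

No increment gets lost, since a failed compareAndSet simply repeats the three steps with the value written by the other thread.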

The casValue method uses the sun.misc.Unsafe method internally to call the compare and swap operation:

boolean casValue(Object cmp, Object val) {
      return UNSAFE.compareAndSwapObject(this, valueOffset, cmp, val);
}

The sun.misc.Unsafe compareAndSwapObject method is a very low-level method. It takes as parameters the object which contains the variable to update, a memory offset of the variable to update and the expected value cmp and the new value val.

This is a very performant way to make sure that we never overwrite a value written by another thread. But it leads to unnecessary object creation when the update fails. The next technique, immutable classes, creates even more garbage.

Immutable Data: Even more garbage creation

Immutable classes make sure that values do not change in the middle of an operation, without using synchronized blocks. By avoiding synchronization blocks you avoid deadlocks. And since you are always working with an unchangeable consistent state you avoid race conditions. But modifying immutable state consists of copying the immutable object. So each modification leads to the creation of garbage. Read more about this in the post How to use immutable classes for concurrent programming in Java.

It is your choice

Immutable data is easy to implement and use. But each modification requires a copy of the data structure. So use it only when your data structure is small enough or the reads vastly outnumber the writes. In all other cases, you need to choose between compare and swap operations and locks. I typically prefer locks since they are easier to use.

What is next?

All maps we looked at are synchronous. When we call a compute method, the method is executed in the calling thread. Therefore we need to coordinate the different threads calling the method in parallel. Each of those mechanisms, like locks or compare and swap operations, has its specific tradeoffs.

In the next blog post, we will look at the tradeoffs for a map which can be updated asynchronously. I would be glad to hear from you about which synchronization mechanism you use in your application.

How to implement values with immutable classes

Sun, 04 Mar 2018 23:00:00 GMT

Classes representing values should have the same properties as primitive types. Primitive types represent basic values like int or char in Java. Primitive types have no identity and they are immutable.

In the following, we will see that those two properties are also useful for classes representing values.

No identity

Two values are equal if they have the same external visible state: For example, if you have two variables a and b with the value 5, they are equal:

 int a = 5;
 int b = 5;
 a == b // true

Immutable

Values are immutable. If you modify a value it becomes a new value. In the following example, we modify the variable a, which leads to a new value b. a and b are not equal.

 int a = 5;
 int b  = a + 2;
 a == b // false

A value class

Now let us look at a class representing a value with those two properties, the java.time.Instant class. This class represents an instant in time. Let us first look at the field declaration of this class:

package java.time;

public final class Instant
{
    private final long seconds;
    private final int nanos;    
    // static fields and methods omitted 
}

Declaring the fields final makes this class immutable. Declaring a field as final, like the two fields in the example, lines 5 and 6, lets the compiler check that the field is not modified after the constructor of the class was called. Note that final is a field modifier. It makes the field itself immutable, not the object the field refers to. So the type of a final field must also be immutable or a primitive type, as in the example.

Next, let us look at the equals method to see how equality is implemented for this class:

public boolean equals(Object otherInstant) {
        if (this == otherInstant) {
            return true;
        }
        if (otherInstant instanceof Instant) {
            Instant other = (Instant) otherInstant;
            return this.seconds == other.seconds &&
                   this.nanos == other.nanos;
        }
        return false;
    }

As we see two Instant objects are equal if their external visible state is equal.
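
When you write your own value class you follow the same pattern. Since equal objects must have equal hash codes, you override hashCode together with equals. The following class Amount is a made-up example of such a value class:

```java
import java.util.Objects;

final class Amount {
    private final long cents;
    private final String currency;

    Amount(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (other instanceof Amount) {
            Amount otherAmount = (Amount) other;
            // equality is based only on the externally visible state
            return cents == otherAmount.cents
                && Objects.equals(currency, otherAmount.currency);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hash(cents, currency);
    }
}
```

Two Amount objects created with the same cents and currency are equal and have the same hash code, although they are two different objects.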

Immutability but an identity

Now let us look at a class which is immutable but uses its identity for equality, the Object class. The Object class is useful when you only need an identity but no state, as in the following example from the JDK 9 java.util.concurrent.CopyOnWriteArrayList, where we need the identity as a monitor for synchronization:

public class CopyOnWriteArrayList<E>
{
  final transient Object lock = new Object();
   public boolean add(E e) {
        synchronized (lock) {
            // other statements omitted
        }
    }
}

To represent a value a class with identity is not useful.

No identity but mutable

The following shows a mutable value class, the java.util.Date class:

public class Date
{
     private transient BaseCalendar.Date cdate;
	 private transient long fastTime;
	 public void setTime(long time) {
        fastTime = time;
        cdate = null;
    }
    public boolean equals(Object obj) {
        return obj instanceof Date && getTime() == ((Date) obj).getTime();
    } 
	public long getTime() {
        return getTimeImpl();
    }
    private final long getTimeImpl() {
        if (cdate != null && !cdate.isNormalized()) {
            normalize();
        }
        return fastTime;
    } 
    // static fields and other methods omitted 
}

The two fields cdate and fastTime are not final and can be changed by the setTime method making the class mutable. The equals method checks the externally visible state for equality. While it is possible to implement values with mutable classes, immutable classes are easier to use.

Advantages of immutable value classes

Immutable classes cannot change their state. This property is useful in specific scenarios in single-threaded programs, for example when you use them as keys in hash maps. And it makes writing multi-threaded programs much easier.
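
The hash map scenario can be made concrete with a small sketch. The class MutableKey is a hypothetical mutable value class. After its state changes, the HashMap no longer finds the entry:

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeySketch {

    // A mutable value class: equals and hashCode depend on a changeable field.
    static final class MutableKey {
        int id;

        MutableKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof MutableKey && ((MutableKey) other).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns what the map finds for the key after the key was modified.
    static String lookupAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;             // changes the hash code of a key already in the map
        return map.get(key);    // the entry was stored under the old hash code
    }
}
```

The lookup returns null although the entry is still in the map. With an immutable key class this cannot happen.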

Usage of identity

While it is still possible to access the identity of a value object it is probably an error. For example, the use of == instead of equals is probably incorrect:

Integer a = new Integer(5);
Integer b = new Integer(5);
a.equals(b) // true
a == b      // false

You probably expect true when you compare two Integer objects of value 5, so the use of == is incorrect. The following operations use the identity of the object and should be avoided when using value classes:

synchronized statement
System.identityHashCode
Object.notify and wait

The future: JEP 169, Value Objects

Implementing values using classes requires more memory than their primitive counterparts. A solution to this problem is proposed by the Java Enhancement Proposal (JEP) 169, Value Objects. It would allow you to create value classes with memory characteristics similar to primitive types.
The idea is to introduce a new operator lockPermanently which converts an object into a new state with memory consumption similar to a primitive type. Using an operation which requires the identity of the value object, like == or synchronized, on such a locked object would be forbidden.

Conclusion and what is next?

Primitive types represent basic values. Primitive types are immutable and have no identity. We have seen how to implement classes with the same properties. The usage of the identity of a value class, while still possible, is probably an error. Making value classes immutable makes them easier to use, especially for thread safe software.

In the next blog post, we will look at one type of value classes, messages, to write thread safe software. I would be glad to hear from you if you use value classes in your application.

How to use immutable classes for concurrent programming in Java

Mon, 12 Feb 2018 23:00:00 GMT

Immutable classes make concurrent programming easier. Immutable classes make sure that the values are not changed in the middle of an operation without using synchronized blocks. By avoiding synchronization blocks you avoid deadlocks. And since you are always working with an unchangeable consistent state you avoid race conditions. In the following, we will look at how to use immutable classes for concurrent programming in Java.

How to write an immutable class?

As an example for an immutable class we implement a class storing the login credentials of a user:

public class ImmutableLogin {
	private final String userName;
	private final String password;
	public ImmutableLogin(String userName, String password) {
		super();
		this.userName = userName;
		this.password = password;
	}
	public String getUserName() {
		return userName;
	}
	public String getPassword() {
		return password;
	}
	
}

When you implement an immutable class you declare its fields as final, like the two fields in lines 2 and 3 of the example. This lets the compiler check that the fields are not modified after the constructor of the class was called. Note that final is a field modifier. It makes the field itself immutable, not the object the field refers to. So the type of the final field must also be immutable, like the class String in the example.

The following shows a mutable class storing the same information:

public class MutableLogin {
	private String userName;
	private String password;
	public String getUserName() {
		return userName;
	}
	public void setUserName(String userName) {
		this.userName = userName;
	}
	public String getPassword() {
		return password;
	}
	public void setPassword(String password) {
		this.password = password;
	}
	  
}

How to use immutable classes?

First, let us see how we can share the immutable login data between multiple threads using a java.util.concurrent.ConcurrentHashMap. To change the login data we use the method compute:

private final ConcurrentHashMap<String,ImmutableLogin> mapImmutableLogin = new ConcurrentHashMap<String,ImmutableLogin>();	
public  void changeImmutableLogin()
{
	mapImmutableLogin.compute("loginA", (String key , ImmutableLogin login ) -> {
		return new ImmutableLogin(login.getUserName() , "newPassword");
	});
}

As you probably expected, we need to copy the ImmutableLogin to change it. The compute method uses a synchronized block internally to make sure that a value for a given key is not changed in parallel by multiple threads.

The following shows an example for reading the changed login data from the ConcurrentHashMap using get:

	public void readImmutableLogin()
	{
		ImmutableLogin immutableLogin = mapImmutableLogin.get("loginA");
		// read from the object immutableLogin
	}

Reading the data can directly operate on the ImmutableLogin class without synchronization block.

Now we look at how we can achieve the same using the mutable login class. Again we change the password in the ConcurrentHashMap:

private final ConcurrentHashMap<String,MutableLogin> mapMutableLogin = new ConcurrentHashMap<String,MutableLogin>();
public void changeMutableLogin()
{
	MutableLogin mutableLogin = mapMutableLogin.get("loginA");
	synchronized(mutableLogin)
	{
		mutableLogin.setPassword("newPassword");
	}
}

and reading the data:

	public void readMutableLogin()
	{
		MutableLogin mutableLogin = mapMutableLogin.get("loginA");
		synchronized(mutableLogin)
		{
		   // read from the object mutableLogin 
		}
	}

To make sure that the MutableLogin object does not get changed while reading we need to synchronize the reading and writing using the same monitor. In the examples, we use the MutableLogin object as the monitor. To avoid a nested synchronized block we use the get method for modifying the MutableLogin instead of the compute method.

Separating identity and state

In the above examples, the keys of the ConcurrentHashMap defined the identity of the different logins and the values the current state of the login. In the case of the MutableLogin class, each key has exactly one MutableLogin object. In the case of the ImmutableLogin, each key has different ImmutableLogin objects at different points in time. Mutable classes represent both identity and state, while immutable classes represent only the state, so we need a separate class to represent the identity. Using immutable classes therefore leads to a separation of identity and state.

The following shows how to encode the identity of the login in the class Login and the state in the class ImmutableLogin:

public class Login {
	private volatile ImmutableLogin state;
	public Login(ImmutableLogin state) {
		super();
		this.state = state;
	}
	public synchronized void change(Function<ImmutableLogin,ImmutableLogin> update )
	{
		state = update.apply(state);
	}
	public ImmutableLogin getState() {
		return state;
	}
}

The change method is synchronized to make sure that only one thread is updating the Login object at a given time. The field state must be declared as volatile to make sure that you always read the latest written value.
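To see the identity and state split at work, here is a minimal sketch under stated assumptions. LoginState, LoginIdentity, withFailedAttempt and demo are hypothetical names introduced for this example. They mirror the Login and ImmutableLogin pair above but are not the original classes.

```java
import java.util.function.UnaryOperator;

// Hypothetical immutable state: every change creates a new object.
final class LoginState {
    final String userName;
    final int failedAttempts;

    LoginState(String userName, int failedAttempts) {
        this.userName = userName;
        this.failedAttempts = failedAttempts;
    }

    LoginState withFailedAttempt() {
        return new LoginState(userName, failedAttempts + 1);
    }
}

// Hypothetical identity: one object per login, pointing at the current state.
class LoginIdentity {
    private volatile LoginState state;

    LoginIdentity(LoginState state) {
        this.state = state;
    }

    // Only one thread at a time may replace the state.
    synchronized void change(UnaryOperator<LoginState> update) {
        state = update.apply(state);
    }

    // Lock-free read of the latest published state.
    LoginState getState() {
        return state;
    }

    // Two threads record 1000 failed attempts each; returns the final count.
    static int demo() {
        LoginIdentity login = new LoginIdentity(new LoginState("alice", 0));
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                login.change(LoginState::withFailedAttempt);
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return login.getState().failedAttempts;
    }
}
```

Because change is synchronized, no update is lost, while getState never blocks.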

When to use immutable classes?

Modifying immutable state consists of copying the immutable object. This allows us to read the state without the need for synchronized blocks. So you should use immutable classes when it is feasible to copy the object for every modification and when most of your accesses only read the data.

What is next?

So far all examples we have looked at used synchronized blocks for changing the state of the class holding the immutable object. In the next blog post, we will see how to implement a concurrent hash map with immutable classes for the hash array elements using compare-and-swap operations instead.

In the meantime, if you want to test whether your application is thread-safe, try vmlens for free. I would be glad to hear from you about how you use immutable classes.

A specialized high performant concurrent queue

Tue, 30 Jan 2018 23:00:00 GMT

Using a specialized algorithm it is possible to achieve up to four times better performance than java.util.concurrent.BlockingQueue for multiple writing threads and a single reading thread. Such a blocking queue supporting multiple writers but only one reader is useful if you have to access a single resource from multiple threads. Instead of writing directly to the resource, you put the data into a queue and let a single thread write the data asynchronously to the resource.

In vmlens, this queue is used to write trace events to the file system.

The Queue

The main idea is to use not one single queue but many. We use one queue per writing thread, stored in a thread-local field. A thread-local queue is then a simple linked list using a volatile field for the next element and a final field for the value. The reading thread iterates over the queues of all writing threads to collect the written data. The algorithm is explained in more detail here.
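The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the vmlens source: the class PerThreadQueue and all its member names are hypothetical, the sketch is unbounded and non-blocking, and it never removes the lists of finished threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class PerThreadQueue<T> {
    private static final class Node<T> {
        final T value;              // final: safely visible once the node is published
        volatile Node<T> next;      // volatile: publishes the following node
        Node(T value) { this.value = value; }
    }

    private static final class ThreadList<T> {
        Node<T> tail;       // touched only by the owning writer thread
        Node<T> lastRead;   // touched only by the single reader thread
        ThreadList() {
            Node<T> sentinel = new Node<>(null);
            tail = sentinel;
            lastRead = sentinel;
        }
    }

    private final List<ThreadList<T>> allLists = new CopyOnWriteArrayList<>();
    private final ThreadLocal<ThreadList<T>> local = ThreadLocal.withInitial(() -> {
        ThreadList<T> list = new ThreadList<>();
        allLists.add(list);         // register the new list for the reader
        return list;
    });

    // Called by any writer thread; writers never contend with each other.
    void offer(T value) {
        ThreadList<T> list = local.get();
        Node<T> node = new Node<>(value);
        list.tail.next = node;      // volatile write publishes the node
        list.tail = node;
    }

    // Must be called by one reader thread only.
    List<T> drain() {
        List<T> result = new ArrayList<>();
        for (ThreadList<T> list : allLists) {
            Node<T> node = list.lastRead.next;   // volatile read
            while (node != null) {
                result.add(node.value);
                list.lastRead = node;
                node = node.next;
            }
        }
        return result;
    }

    // Three writers offer 1000 elements each; returns how many were drained.
    static int demo() {
        PerThreadQueue<Integer> queue = new PerThreadQueue<>();
        Thread[] writers = new Thread[3];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    queue.offer(i);
                }
            });
            writers[t].start();
        }
        try {
            for (Thread writer : writers) {
                writer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return queue.drain().size();
    }
}
```

Each writer only appends to its own list, so the only cross-thread communication is the volatile next field between one writer and the reader.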

You can download the source code for the queue from GitHub here.

The Benchmark

Here is the source code of the benchmark:

@State(Scope.Group)
public class BlockingQueueBenchmark {
	private static final int WRITING_THREAD_COUNT = 5;
	private static final int VMLENS_QUEUE_LENGTH = 1000;
	private static final int JDK_QUEUE_LENGTH    = 4000;
	
	EventBusImpl eventBus;
	Consumer consumer;
	ProzessAllListsRunnable prozess;
	TLongObjectHashMap<ProzessOneList> threadId2ProzessOneRing;
	LinkedBlockingQueue jdkQueue;
	private long jdkCount = 0;
	private long vmlensCount = 0;
	
	@Setup()
	public void setup() {
		eventBus = new EventBusImpl(VMLENS_QUEUE_LENGTH);
		consumer = eventBus.newConsumer();
		prozess = new ProzessAllListsRunnable(new EventSink() {
			public void execute(Object event) {
				vmlensCount++;
			}
			public void close() {
			}
			public void onWait() {
			}
		}, eventBus);
		threadId2ProzessOneRing = new TLongObjectHashMap<ProzessOneList>();
		jdkQueue = new LinkedBlockingQueue(JDK_QUEUE_LENGTH);
	}
	@Benchmark
	@Group("vmlens")
	@GroupThreads(WRITING_THREAD_COUNT)
	public void offerVMLens() {
		consumer.accept("event");
	}
	@Benchmark
	@Group("vmlens")
	@GroupThreads(1)
	public void takeVMLens() {
		prozess.oneIteration(threadId2ProzessOneRing);
	}
	@Benchmark
	@Group("jdk")
	@GroupThreads(WRITING_THREAD_COUNT)
	public void offerJDK() {
		try {
			jdkQueue.put("event");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
	@Benchmark
	@Group("jdk")
	@GroupThreads(1)
	public void takeJDK() {
		try {
			jdkQueue.poll(100, TimeUnit.SECONDS);
			jdkCount++;
		} catch (Exception e) {
			e.printStackTrace();
		}

	}
	 @TearDown(Level.Trial)
	 public void printCounts() {
        System.out.println("jdkCount " + jdkCount);
        System.out.println("vmlensCount " + vmlensCount);
	  }
}

The benchmark uses jmh, an OpenJDK framework for micro-benchmarks. WRITING_THREAD_COUNT threads publish events to the queues, in the method offerVMLens for the vmlens queue and in offerJDK for the JDK queue. A single thread reads the events, in takeVMLens for vmlens and in takeJDK for the JDK queue. The vmlens queue reads all currently available events in one call and invokes the callback method execute of the EventSink for each event.

You can download the source code of the benchmark from GitHub here.

Results

The benchmark was run on an Intel i5 4 core CPU. All tests were run with the following jmh parameters: -wi 10 -i 50 -f 5 -tu ms. The following graph shows the throughput in operations per millisecond for one to eight writing threads:

Conclusion and next steps

As we could see it is possible to achieve better throughput using a specialized queue than the generic java.util.concurrent.BlockingQueue for a blocking multiple writer single reader queue. When we use this queue for writing to the file system the limiting factor is the reading thread. This thread not only needs to collect all the data but also write it to the file system. So to improve the performance further I plan to collect the data and probably zip it in the blocked writing threads.

I would be glad to hear from you about the techniques you use to write to the file system.

5 Tips for Performant, Thread-Safe Java From ConcurrentHashMap

Thu, 18 Jan 2018 23:00:00 GMT

java.util.concurrent.ConcurrentHashMap is a highly optimized concurrent hash map implementation. Here are 5 tips we can learn from its implementation:

Disclaimer: The techniques described here increase the complexity of the code, making it harder to reason about and to test. So please apply them only when you have seen through profiling that your code is on the hot path.

Use bit operations instead of expensive mathematical operations

An example of this technique is the use of the and operation instead of modulo in ConcurrentHashMap. ConcurrentHashMap stores all values in an array. To calculate the position where an entry should be stored we need to calculate the hash code modulo the array size. When the array size is a power of two, as in the case of ConcurrentHashMap, we can write hash code and (array size - 1) instead of hash code modulo array size. The following shows this for a size of 8 and a hash code of 23:

		int sizeMinusOne               = 0b00000111;        //  7
		int hashCode                   = 0b00010111;        // 23
		int positionInArray  = hashCode & sizeMinusOne;     //  7

This is for example done in the get method:

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
           // other code omitted
        }
        return null;
    }

In the call to tabAt the array position is calculated using (n - 1) & h. The and operation is about 20 percent faster than the modulo operation. So if you have an expensive mathematical operation inside your critical path, it is a good idea to see if you can replace it with a bit operation.
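A small sketch of the equivalence, assuming a table size that is a power of two. The class PowerOfTwoIndex and the method indexFor are hypothetical names used only for illustration.

```java
class PowerOfTwoIndex {
    // Valid only when tableSize is a power of two.
    static int indexFor(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(23, 8));   // prints 7, same as 23 % 8
        System.out.println(indexFor(-1, 8));   // prints 7, same as Math.floorMod(-1, 8)
        System.out.println(-1 % 8);            // prints -1, not usable as an array index
    }
}
```

Besides being cheaper, the mask never produces a negative index, which the % operator does for negative hash codes.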

Use fine-grained locks

If multiple threads are accessing the same lock it becomes a bottleneck. A solution to this problem is to use fine-grained locking. As can be seen in the putVal method ConcurrentHashMap uses the array elements as monitors for synchronized locks:

    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    // other code omitted
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

If we need to change a non-empty array element, it is locked using a synchronized block with the array element as the monitor: synchronized (f) locks the element f that was read before with tabAt. If the hash function disperses the entries properly among the array elements, threads access different array elements and therefore synchronize on different monitors.

So to improve the scalability use the smallest independent value as the lock. Using this technique leads to many lock objects. Since reentrant locks consume more memory than the usage of synchronized blocks, this technique should be used with synchronized blocks instead of reentrant locks.
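The same idea can be applied outside of ConcurrentHashMap. The following is a minimal sketch of lock striping, assuming a fixed, power-of-two number of stripes. StripedCounter and its methods are hypothetical names, not part of the JDK.

```java
class StripedCounter {
    private final Object[] locks;   // one monitor per stripe
    private final long[] counts;    // counts[i] is guarded by locks[i]

    StripedCounter(int stripes) {
        if (stripes <= 0 || Integer.bitCount(stripes) != 1) {
            throw new IllegalArgumentException("stripes must be a power of two");
        }
        locks = new Object[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    // Threads working on keys in different stripes do not block each other.
    void increment(Object key) {
        int h = key.hashCode();
        int stripe = (h ^ (h >>> 16)) & (locks.length - 1);
        synchronized (locks[stripe]) {
            counts[stripe]++;
        }
    }

    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }

    // Four threads increment 10000 different keys each; returns the total.
    static long demo() {
        StripedCounter counter = new StripedCounter(16);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counter.increment(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.total();
    }
}
```

Plain Object monitors are used here for the reason given above: with many stripes, a synchronized block per stripe is cheaper in memory than one reentrant lock per stripe.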

Use volatile fields for reading and locks for writing

As we have seen, writing in the putVal method uses a lock on the array element. As can be seen in the get method, reading does not use locks but only consists of reading from a volatile field:

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            if ((eh = e.hash) == h) {
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }

The array element is read using the method tabAt. The volatile read is done inside tabAt, as can be seen here, using the method getObjectVolatile from sun.misc.Unsafe:

   static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
        return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
    }

When you have a class with many reads and some writes, use this technique. Reading simply consists of reading from a volatile field while writing uses a lock. In ConcurrentHashMap the values of the Node are directly modified. This makes it necessary to declare the mutable fields of this class as volatile, too:

  static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
 // methods omitted
    }

This has the disadvantage that the values can change while you read them. An easier variation of this technique is used in java.util.concurrent.CopyOnWriteArrayList, which copies the data on every write.

Lazily create immutable data

ConcurrentHashMap has multiple methods to create a view on this collection, like the following:

private transient KeySetView<K,V> keySet;

public KeySetView<K,V> keySet() {
     KeySetView<K,V> ks;
     return (ks = keySet) != null ? ks : (keySet = new KeySetView<K,V>(this, null));
}

public static class KeySetView<K,V> extends CollectionView<K,V,K>
    implements Set<K>, java.io.Serializable {
     private static final long serialVersionUID = 7249069246763182397L;
     private final V value;
     KeySetView(ConcurrentHashMap<K,V> map, V value) {  // non-public
          super(map);
          this.value = value;
     }
   // methods omitted
}

As we can see, the field keySet is not volatile and the method keySet() is not synchronized. This leads to two problems: first, the KeySetView object is not correctly published, and second, multiple KeySetView objects might be created. Through the usage of the final field value in KeySetView, we make sure that the object is correctly initialized even when it is incorrectly published. And the creation of multiple objects does not lead to inconsistent state since they are immutable.
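The pattern can be sketched for your own classes as follows. Circle and Summary are hypothetical names. The sketch assumes that the lazily created object has only final fields and that creating it twice is harmless.

```java
final class Circle {
    // Immutable helper: all fields are final, so the object is
    // correctly initialized even when published through a data race.
    static final class Summary {
        final double area;
        final double circumference;

        Summary(double radius) {
            this.area = Math.PI * radius * radius;
            this.circumference = 2 * Math.PI * radius;
        }
    }

    private final double radius;
    private Summary summary;   // deliberately neither volatile nor guarded by a lock

    Circle(double radius) {
        this.radius = radius;
    }

    Summary summary() {
        Summary s = summary;   // read the field exactly once
        if (s == null) {
            s = new Summary(radius);
            summary = s;       // racy write: another thread may create its own equal copy
        }
        return s;
    }
}
```

Reading the field into a local variable first, as the JDK code does with ks, matters: a second read of the non-volatile field could otherwise observe a different value than the first.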

Use java.util.concurrent.atomic.LongAdder for counting

Every writing and deleting thread needs to update the size counter of the collection. So the modification of the size becomes a bottleneck even if we use the atomic methods from java.util.concurrent.atomic.AtomicLong. To solve this problem ConcurrentHashMap uses the class CounterCell:

    /**
     * A padded cell for distributing counts.  Adapted from LongAdder
     * and Striped64.  See their internal docs for explanation.
     */
    @sun.misc.Contended static final class CounterCell {
        volatile long value;
        CounterCell(long x) { value = x; }
    }

To implement this mechanism yourself is probably not a good idea since it is rather complicated. Luckily the JDK provides an implementation of the technique used, the class java.util.concurrent.atomic.LongAdder. The idea behind the counter cells is described in the java doc of java.util.concurrent.atomic.LongAdder:

One or more variables that together maintain an initially zero long sum. When updates (method add(long)) are contended across threads, the set of variables may grow dynamically to reduce contention.
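Using LongAdder directly is straightforward. The following sketch counts from several threads. The class LongAdderDemo and the method countInParallel are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.LongAdder;

class LongAdderDemo {
    // Starts the given number of threads, each incrementing the shared counter.
    static long countInParallel(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();   // contended updates are spread over cells
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.sum();              // adds up all cells
    }
}
```

Note that sum() is not an atomic snapshot while updates are still running, so LongAdder fits counters and statistics better than values that control program flow.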

What is next?

As we could see, ConcurrentHashMap is full of tricks to write high-performance yet still thread-safe Java. Next time we will look at java.util.concurrent.ConcurrentSkipListMap. I would be glad to hear from you about the techniques you use to achieve high-performance thread-safe classes.

7 Techniques for thread-safe classes

Wed, 10 Jan 2018 23:00:00 GMT

Almost every Java application uses threads. A web server like Tomcat processes each request in a separate worker thread, fat clients process long-running requests in dedicated worker threads, and even batch processes use the java.util.concurrent.ForkJoinPool to improve performance.

It is, therefore, necessary to write classes in a thread-safe way, which can be achieved by one of the following techniques:

No state

When multiple threads access the same instance or static variable, you must somehow coordinate the access to this variable. The easiest way to do this is simply to avoid instance and static variables. Methods in classes without instance variables use only local variables and method arguments. The following example shows such a method, which is part of the class java.lang.Math:

public static int subtractExact(int x, int y) {
  int r = x - y;
  if (((x ^ y) & (x ^ r)) < 0) {
      throw new ArithmeticException("integer overflow");
  }
  return r;
}

No shared state

If you cannot avoid state, do not share the state. The state should only be owned by a single thread. An example of this technique is the event processing thread of the SWT or Swing graphical user interface frameworks.

You can achieve thread-local instance variables by extending the thread class and adding an instance variable. In the following example, the fields pool and workQueue are local to a single worker thread.

package java.util.concurrent;
public class ForkJoinWorkerThread extends Thread {
    final ForkJoinPool pool;                
    final ForkJoinPool.WorkQueue workQueue; 
}

The other way to achieve thread-local variables is to use the class java.lang.ThreadLocal for the fields you want to make thread-local. Here is an example of an instance variable using java.lang.ThreadLocal:

public class CallbackState {
public static final ThreadLocal<CallbackStatePerThread> callbackStatePerThread = 
    new ThreadLocal<CallbackStatePerThread>()
   {
      @Override
      	protected CallbackStatePerThread  initialValue()
      { 
   	   return getOrCreateCallbackStatePerThread();
      }
   };
}

You wrap the type of your instance variable inside the java.lang.ThreadLocal. You can provide an initial value for your java.lang.ThreadLocal through the method initialValue().

The following shows how to use the instance variable:

CallbackStatePerThread callbackStatePerThread = CallbackState.callbackStatePerThread.get();

Through calling the method get() you receive the object associated with the current thread.

Since in application servers a pool of many threads is used to process requests, java.lang.ThreadLocal leads to a high memory consumption in this environment. java.lang.ThreadLocal is therefore not recommended for classes executed by the request processing threads of an application server.
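If you use java.lang.ThreadLocal with pooled threads anyway, one way to limit the memory held per thread is to call remove() when the work is done. A minimal sketch, assuming a reusable StringBuilder per thread; PerThreadBuffer, format and release are hypothetical names.

```java
class PerThreadBuffer {
    // Each thread lazily gets its own StringBuilder.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    static String format(String name, int value) {
        StringBuilder sb = BUFFER.get();   // the instance of the current thread
        sb.setLength(0);                   // reuse the buffer
        return sb.append(name).append('=').append(value).toString();
    }

    // Call at the end of a request so that a pooled thread
    // does not keep the buffer alive.
    static void release() {
        BUFFER.remove();
    }
}
```

After release() the next call to format on the same thread simply creates a fresh buffer through the initial value supplier.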

Message passing

If you do not share state using the above techniques you need a way for the threads to communicate. A technique to do this is by passing messages between threads. You can implement message passing using a concurrent queue from the package java.util.concurrent. Or, better yet, use a framework like Akka, a framework for actor style concurrency. The following example shows how to send a message with Akka:

target.tell(message, getSelf());

and receive a message:

@Override
public Receive createReceive() {
   return receiveBuilder()
      .match(String.class, s -> System.out.println(s.toLowerCase()))
      .build();
}
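Without a framework, the same idea can be sketched with a queue from java.util.concurrent. The class QueueMessagePassing, the STOP marker message and the method demo are assumptions made for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueMessagePassing {
    // Special message that tells the consumer to stop.
    private static final String STOP = "STOP";

    // Sends 100 messages to a consumer thread; returns how many it received.
    static int demo() {
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        int[] received = new int[1];   // written only by the consumer thread

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String message = mailbox.take();   // blocks until a message arrives
                    if (STOP.equals(message)) {
                        return;
                    }
                    received[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < 100; i++) {
                mailbox.put("message " + i);           // String messages are immutable
            }
            mailbox.put(STOP);
            consumer.join();                           // makes received[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }
}
```

The two threads share only the queue. All other state, like the counter, is owned by exactly one thread.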

Immutable state

To avoid the problem that the sending thread changes the message while it is read by another thread, messages should be immutable. The Akka framework, therefore, has the convention that all messages have to be immutable.

When you implement an immutable class you should declare its fields as final. This not only makes sure that the compiler can check that the fields are in fact immutable but also makes them correctly initialized even when they are incorrectly published. Here is an example of a final instance variable:

public class ExampleFinalField
{
  private final int finalField;
  public ExampleFinalField(int value)
  {
   this.finalField = value;
  }
}

final is a field modifier. It makes the field immutable, not the object the field refers to. So the type of the final field should be a primitive type, like in the example, or an immutable class.
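When the field has to refer to a mutable type such as a list, a common way to keep the class immutable is a defensive copy wrapped in an unmodifiable view. A minimal sketch; UserRoles is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UserRoles {
    private final List<String> roles;

    UserRoles(List<String> roles) {
        // Copy first, so later changes to the caller's list are not visible here,
        // then wrap, so nobody can modify the copy through getRoles().
        this.roles = Collections.unmodifiableList(new ArrayList<>(roles));
    }

    List<String> getRoles() {
        return roles;
    }
}
```

The final modifier alone would not be enough here, since the caller could still change the list it passed to the constructor.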

Use the data structures from java.util.concurrent

Message passing uses concurrent queues for the communication between threads. Concurrent queues are one of the data structures provided in the package java.util.concurrent. This package provides classes for concurrent maps, queues, deques, sets and lists. Those data structures are highly optimized and tested for thread safety.

Synchronized blocks

If you cannot use one of the above techniques, use synchronized blocks. By putting code inside a synchronized block you make sure that only one thread at a time can execute this section.

synchronized(lock)
{
 i++;
}

Beware that when you use multiple nested synchronized blocks you risk deadlocks. A deadlock happens when two threads each try to acquire a lock held by the other thread.
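One common way to avoid such a deadlock is to always acquire nested locks in the same global order. The following sketch assumes that every account has a unique id. Account, transfer and demo are hypothetical names.

```java
final class Account {
    private final int id;      // assumed to be unique per account
    private long balance;      // guarded by the monitor of this account

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    // Always locks the account with the smaller id first, regardless of
    // the transfer direction, so two opposite transfers cannot deadlock.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    synchronized long getBalance() {
        return balance;
    }

    // Two threads transfer in opposite directions; returns the total balance.
    static long demo() {
        Account a = new Account(1, 100);
        Account b = new Account(2, 100);
        Thread first = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(a, b, 1);
        });
        Thread second = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(b, a, 1);
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.getBalance() + b.getBalance();
    }
}
```

If transfer locked from before to instead, the two threads in demo could each hold one monitor and wait forever for the other.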

Volatile fields

Normal, non-volatile fields can be cached in registers or caches. By declaring a variable as volatile, you tell the JVM and the compiler to always return the latest written value. This applies not only to the variable itself but to all values written by that thread before it wrote to the volatile field. The following shows an example of a volatile instance variable:

public class ExampleVolatileField
{
  private volatile int  volatileField;
}

You can use volatile fields if the writes do not depend on the current value. Or if you can make sure that only one thread at a time can update the field.

volatile is a field modifier. It makes the field itself volatile not the object it references. In case of an array you need to use java.util.concurrent.atomic.AtomicReferenceArray to access the array elements in a volatile way. See the race condition in org.springframework.util.ConcurrentReferenceHashMap as an example of this error.

Even more techniques

I excluded the following more advanced techniques from this list: atomic updates, a technique in which you call atomic instructions like compare and set provided by the CPU; java.util.concurrent.locks.ReentrantLock, a lock implementation which provides more flexibility than synchronized blocks; java.util.concurrent.locks.ReentrantReadWriteLock, a lock implementation in which reads do not block reads; and java.util.concurrent.locks.StampedLock, a non-reentrant read-write lock with the possibility to optimistically read values.
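To give an impression of the last of these techniques, here is a minimal sketch of an optimistic read with StampedLock, following the usage pattern from its Javadoc. OptimisticPoint is a hypothetical class.

```java
import java.util.concurrent.locks.StampedLock;

class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // no blocking, no lock held
        double currentX = x;
        double currentY = y;
        if (!lock.validate(stamp)) {             // a write happened in between
            stamp = lock.readLock();             // fall back to a real read lock
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }
}
```

The values read optimistically may be inconsistent, so they must only be used after validate has returned true.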

Conclusion

The best way to achieve thread safety is to avoid shared state. For the state you need to share, you can either use message passing together with immutable classes or the concurrent data structures together with synchronized blocks and volatile fields.

I would be glad to hear from you about the techniques you use to achieve thread-safe classes.

3 Tips for volatile fields in java

Mon, 11 Dec 2017 23:00:00 GMT

Volatile fields are one of the built-in mechanisms to write multi-threaded Java.

Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread. ... The visibility effects of volatile variables extend beyond the value of the volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the variable become visible to B after reading the volatile variable.
— Java Concurrency in Practice - Brian Goetz, et al.

In the following I collected three tips on when and how to use volatile fields in practice:

1) Use volatile fields when writes do not depend on its current value.

An example is a flag to stop a worker thread from another thread:

public class WorkerThread extends Thread {
	private volatile boolean isRunning = true;
	@Override
	public void run() {
		while(isRunning)
		{
			// execute a task
		}
	}
    public void stopWorker()
    {
    	isRunning = false;
    }
}

The WorkerThread executes its tasks in a while loop inside the method run. It checks the volatile field isRunning in each iteration and stops processing if the field is false. This allows other threads to stop the WorkerThread by calling the method stopWorker, which sets the value of the field to false. Since a thread can call the method stopWorker even if the WorkerThread is already stopped, the write to the field can be executed independently of its current value.

By declaring the field volatile we make sure that the WorkerThread sees the update done in another Thread and does not run forever.
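The following sketch shows how such a worker might be started and stopped from another thread. StoppableWorker is a hypothetical variant of the class above with an added iteration counter, and demo is an assumed helper.

```java
class StoppableWorker extends Thread {
    private volatile boolean isRunning = true;
    private long iterations;   // touched only by the worker thread itself

    @Override
    public void run() {
        while (isRunning) {    // volatile read in every iteration
            iterations++;      // stands in for executing a task
        }
    }

    void stopWorker() {
        isRunning = false;     // write does not depend on the current value
    }

    // Starts a worker, stops it from the calling thread and
    // reports whether it terminated within five seconds.
    static boolean demo() {
        StoppableWorker worker = new StoppableWorker();
        worker.start();
        try {
            Thread.sleep(10);          // let the worker run for a moment
            worker.stopWorker();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }
}
```

Without the volatile modifier the JIT compiler would be allowed to hoist the read of isRunning out of the loop, and the worker might never stop.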

2) Use volatile fields for reading and locks for writing

The java.util.concurrent.CopyOnWriteArrayList get and set methods are an example of this tip:

public class CopyOnWriteArrayList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
  private transient volatile Object[] array;  
  final Object[] getArray() {
        return array;
   }
   final void setArray(Object[] a) {
        array = a;
   }
   private E get(Object[] a, int index) {
        return (E) a[index];
   }
   public E get(int index) {
        return get(getArray(), index);
   }   
   public E set(int index, E element) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            E oldValue = get(elements, index);
            if (oldValue != element) {
                int len = elements.length;
                Object[] newElements = Arrays.copyOf(elements, len);
                newElements[index] = element;
                setArray(newElements);
            } else {
                // Not quite a no-op; ensures volatile write semantics
                setArray(elements);
            }
            return oldValue;
        } finally {
            lock.unlock();
        }
    }
    // Other fields and methods omitted
}

The public get method simply reads the volatile field array and returns the value at the position index. Writing uses a lock to ensure that only one thread can modify the array at a given time: set acquires the lock with lock.lock() and releases it in the finally block with lock.unlock(). Writing requires copying the array with Arrays.copyOf when an element is changed, so that the reading threads do not see an inconsistent state. The writing thread then updates the copy and assigns the new array to the volatile field array by calling setArray.

With this technique only writes block other writes. Compare this to using synchronized set and get methods, where each operation blocks all other operations, or to java.util.concurrent.locks.ReentrantReadWriteLock, where too many readers can lead to starvation of writers.

This is especially a problem for older JDKs. Here are the numbers from Akhil Mittal in a DZone comment to this article:

Java 6
RO= 4, RW= 4, fair=false 4,699,560   584,991
Java 9
RO= 4, RW= 4, fair=false 2,133,904   3,289,220

3) Use JDK 9 VarHandle for atomic operations.

All modern CPUs provide instructions to atomically compare and set or increment and get values. Those operations are used internally by the JVM to implement synchronized locks. Prior to JDK 9, they were available for Java applications only through classes in the java.util.concurrent.atomic package or by using the private Java API sun.misc.Unsafe. With the new JDK 9 VarHandle, it is now possible to execute such operations directly on volatile fields. The following shows the AtomicBoolean compareAndSet method implemented using VarHandles:

public class AtomicBoolean implements java.io.Serializable {
  private static final VarHandle VALUE;
  static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            VALUE = l.findVarHandle(AtomicBoolean.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }
    private volatile int value;
    public final boolean compareAndSet(boolean expectedValue, boolean newValue) {
        return VALUE.compareAndSet(this,
                                   (expectedValue ? 1 : 0),
                                   (newValue ? 1 : 0));
    }
    // Other fields and methods omitted
}

The VarHandle works similarly to the class java.lang.reflect.Field. You need to look up a VarHandle from the class which contains the field using the name of the field, as in the call to findVarHandle. To execute a compareAndSet operation on the field, we call the VarHandle with the object containing the field, the expected value and the new value, as in VALUE.compareAndSet.
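The same steps can be applied to a field of your own class. The following is a minimal sketch for JDK 9 or later. StartOnce, tryStart and isStarted are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class StartOnce {
    private static final VarHandle STARTED;

    static {
        try {
            // Look up the handle by declaring class, field name and field type.
            STARTED = MethodHandles.lookup()
                    .findVarHandle(StartOnce.class, "started", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile boolean started;

    // Returns true for exactly one caller, even under contention.
    boolean tryStart() {
        return STARTED.compareAndSet(this, false, true);
    }

    boolean isStarted() {
        return started;   // plain volatile read
    }
}
```

Compared to an AtomicBoolean field, this saves one object per instance, at the price of more complicated code.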

Conclusion

You can use volatile fields if the writes do not depend on the current value as in tip one. Or if you can make sure that only one thread at a time can update the field as in tip two and three. I think that those three tips cover the more common ways to use volatile fields. Read here about a more exotic way to use volatile fields to implement a concurrent queue.

I would be glad to hear from you about other ways to use volatile fields to achieve thread-safety.

org.springframework.util.ConcurrentReferenceHashMap is not thread-safe

Mon, 27 Nov 2017 23:00:00 GMT

The class org.springframework.util.ConcurrentReferenceHashMap which is part of the spring framework is not thread-safe. We see this in the following junit test:

@RunWith(ConcurrentTestRunner.class)
public class TestSpringConcurrentReferenceHashMap {
	private ConcurrentReferenceHashMap map = new ConcurrentReferenceHashMap();
	private KeyAndValue[] keyAndValue;
	private static final int SIZE = 100;
	public TestSpringConcurrentReferenceHashMap()
	{
		keyAndValue = new KeyAndValue[SIZE];	
		for(int i = 0 ; i< SIZE ; i++)
		{
			keyAndValue[i] = new KeyAndValue(i);
		}
	}
	@Test
	public void test()
	{
		for(int i = 0 ; i < SIZE ; i++)
		{
			AtomicInteger result =  (AtomicInteger)map.get(i);	
			if(  result == null )
			{
				 result = (AtomicInteger) map.putIfAbsent(  keyAndValue[i].key  , keyAndValue[i].value  );
			}	
			if( result != null )
			{
				result.addAndGet(1);
			}
		}
	}
	class KeyAndValue
	{
		KeyAndValue(int key)
		{
			this.key = key;
			value= new AtomicInteger();
		}
		Integer key;
		AtomicInteger value;
	}
}

The ConcurrentTestRunner runs the test with 4 threads in parallel. vmlens, a tool to detect deadlocks and race conditions during the test run, shows the following race conditions:

For example, the first race is the access to a value of an array without correct synchronization. In the method getReference the value of the array is read in the statement Reference<K, V> head = references[index];

@Nullable
		public Reference<K, V> getReference(@Nullable Object key, int hash, Restructure restructure) {
			if (restructure == Restructure.WHEN_NECESSARY) {
				restructureIfNecessary(false);
			}
			if (this.count == 0) {
				return null;
			}
			// Use a local copy to protect against other threads writing
			Reference<K, V>[] references = this.references;
			int index = getIndex(hash, references);
			Reference<K, V> head = references[index];
			return findInChain(head, key, hash);
		}

And in the method add the value of the array is written in the statement Segment.this.references[index] = newReference;

		@Nullable
		public <T> T doTask(final int hash, final Object key, final Task<T> task) {
			boolean resize = task.hasOption(TaskOption.RESIZE);
			if (task.hasOption(TaskOption.RESTRUCTURE_BEFORE)) {
				restructureIfNecessary(resize);
			}
			if (task.hasOption(TaskOption.SKIP_IF_EMPTY) && this.count == 0) {
				return task.execute(null, null, null);
			}
			lock();
			try {
				final int index = getIndex(hash, this.references);
				final Reference<K, V> head = this.references[index];
				Reference<K, V> reference = findInChain(head, key, hash);
				Entry<K, V> entry = (reference != null ? reference.get() : null);
				Entries entries = new Entries() {
					@Override
					public void add(V value) {
						@SuppressWarnings("unchecked")
						Entry<K, V> newEntry = new Entry<>((K) key, value);
						Reference<K, V> newReference = Segment.this.referenceManager.createReference(newEntry, hash, head);
						Segment.this.references[index] = newReference;
						Segment.this.count++;
					}
				};
				return task.execute(reference, entry, entries);
			}
			finally {
				unlock();
				if (task.hasOption(TaskOption.RESTRUCTURE_AFTER)) {
					restructureIfNecessary(resize);
				}
			}
		}

The field references is declared as volatile. But the access to the value of the array references[index] is not volatile. To make the access to an array volatile we need to use java.util.concurrent.atomic.AtomicReferenceArray.
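A minimal sketch of volatile element access with AtomicReferenceArray follows. VolatileSlots and its methods are hypothetical names. The sketch shows the API only and is not a patch for the Spring class.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class VolatileSlots {
    private final AtomicReferenceArray<String> slots;

    VolatileSlots(int size) {
        slots = new AtomicReferenceArray<>(size);
    }

    // Volatile read of a single array element.
    String read(int index) {
        return slots.get(index);
    }

    // Volatile write of a single array element.
    void write(int index, String value) {
        slots.set(index, value);
    }

    // Atomically fills the slot only when it is still empty.
    boolean writeIfEmpty(int index, String value) {
        return slots.compareAndSet(index, null, value);
    }
}
```

With a plain array stored in a volatile field, only reading and writing the array reference itself is volatile, which is exactly the gap described above.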

The consequence of the race conditions

To see the consequence of the race conditions we use jcstress, an OpenJDK code tool: The Java Concurrency Stress tests (jcstress) is an experimental harness and a suite of tests to aid the research in the correctness of concurrency support in the JVM, class libraries, and hardware.

I use the following test class:

@JCStressTest
@Outcome(id = "10", expect = Expect.ACCEPTABLE, desc = "Default outcome.")
@State
public class SpringConcurrentHashMap {
	private final ConcurrentReferenceHashMap map = new ConcurrentReferenceHashMap();
	private final KeyAndValue[] keyAndValue;
	private final int SIZE = 10;
	class KeyAndValue {
		KeyAndValue(int key) {
			this.key = key;
			value = new AtomicInteger();
		}
		final Integer key;
		final AtomicInteger value;
	}
	public SpringConcurrentHashMap() {
		keyAndValue = new KeyAndValue[SIZE];
		for (int i = 0; i < SIZE; i++) {
			keyAndValue[i] = new KeyAndValue(i);
		}
	}
	public void test() {
		for (int i = 0; i < SIZE; i++) {
			AtomicInteger result = (AtomicInteger) map.get(i);
			if (result == null) {
				result = (AtomicInteger) map.putIfAbsent(keyAndValue[i].key, keyAndValue[i].value);
			}
			if (result != null) {
				result.addAndGet(1);
			}
		}
	}
	@Actor
	public void actor1() {
		test();
	}
	@Actor
	public void actor2() {
		test();
	}
	@Arbiter
	public void arbiter(IntResult1 r) {
		int sum = 0;
		for (int i = 0; i < SIZE; i++) {
			sum += ((AtomicInteger) map.get(i)).get();
		}
		r.r1 = sum;
	}
}

The jcstress tool runs each actor in a separate thread, repeating the test many times. We have two actor methods. Each one puts a new AtomicInteger into the hash map if no value is present, or increments the AtomicInteger if a value already exists. Therefore we expect the sum of all values in the hash map to be 10 after both threads have run. If I run this test in stress mode on my development machine, an Intel i5 4-core CPU with Oracle JDK 8, I see the following result:

   2 matching test results. 
  [FAILED] com.vmlens.stressTest.tests.SpringConcurrentHashMap
    (JVM args: [-client])
  Observed state   Occurrences   Expectation  Interpretation                                              
              10    92,597,143    ACCEPTABLE  Default outcome.                                            
               9             7     FORBIDDEN  No default case provided, assume FORBIDDEN                  
  [FAILED] com.vmlens.stressTest.tests.SpringConcurrentHashMap
    (JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation                                              
              10    98,517,611    ACCEPTABLE  Default outcome.                                            
               9             9     FORBIDDEN  No default case provided, assume FORBIDDEN

In a very small fraction of all runs, 7 runs for -client and 9 runs for -server, the sum is 9 instead of 10. So in some cases, a value in the map is overwritten by a new value.

How to test if your spring application is thread-safe

Wed, 22 Nov 2017 23:00:00 GMT

In the following, I want to show you how to test if your Spring application is thread-safe. As an example application, I use the Spring PetClinic project.

To detect concurrency bugs during our tests we use vmlens. vmlens traces the test execution and analyzes the trace afterward. It detects deadlocks and race conditions during the test run.

Testing

To test the Spring project we parallelize the existing unit tests. The following test method runs the existing test shouldFindAllPetTypes of the ClinicServiceTests class in parallel:

public class ClinicServiceTests {
    @Test
    public void testMultithreaded() throws InterruptedException 
    {
    	TestUtil.runMultithreaded( new Runnable() {
			public void run() {
				try{
					shouldFindAllPetTypes();
				}
				catch(Exception e)
				{
					e.printStackTrace();
				}
			}
    	}
    	, 5);
    }

The TestUtil.runMultithreaded method runs the runnable with n threads in parallel:

	public static void runMultithreaded(Runnable  runnable, int threadCount) throws InterruptedException
	{
		List<Thread>  threadList = new LinkedList<Thread>();	
		for(int i = 0 ; i < threadCount; i++)
		{
			threadList.add(new Thread(runnable));
		}
		for( Thread t :  threadList)
		{
			t.start();
		}
		for( Thread t :  threadList)
		{
			t.join();
		}
	}
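Note that the test above only prints exceptions thrown in the worker threads, so the JUnit test itself still passes. If you want the test to fail in that case, one option is to collect the exceptions and check them afterward. The following is a sketch under my own assumptions: the class FailureCollectingTestUtil is a hypothetical variant and not part of the TestUtil class on GitHub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class FailureCollectingTestUtil {
    // Hypothetical variant of runMultithreaded: returns everything the worker threads threw.
    public static List<Throwable> runMultithreaded(Runnable runnable, int threadCount) {
        ConcurrentLinkedQueue<Throwable> failures = new ConcurrentLinkedQueue<>();
        List<Thread> threadList = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            threadList.add(new Thread(() -> {
                try {
                    runnable.run();
                } catch (Throwable t) {
                    failures.add(t); // remember the failure instead of only printing it
                }
            }));
        }
        for (Thread t : threadList) {
            t.start();
        }
        for (Thread t : threadList) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(failures);
    }
}
```

A test can then assert that the returned list is empty.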

You can find the source of the class TestUtil on GitHub here. After running the JUnit test we see the following report in vmlens:

Analyzing

Let us look at one of the races found, the race on the field org.hsqldb.HsqlNameManager.sysNumber.

The access to the field is locked in the methods org.hsqldb.StatementManager.compile, org.hsqldb.Session.execute and org.hsqldb.jdbc.JDBCConnection.prepareStatement, but each thread uses a different monitor. As an example, here is the JDBCConnection.prepareStatement method:

  public synchronized PreparedStatement prepareStatement(
            String sql) throws SQLException {
        checkClosed();
        try {
            return new JDBCPreparedStatement(this, sql,
                    JDBCResultSet.TYPE_FORWARD_ONLY,
                    JDBCResultSet.CONCUR_READ_ONLY, rsHoldability,
                    ResultConstants.RETURN_NO_GENERATED_KEYS, null, null);
        } catch (HsqlException e) {
            throw JDBCUtil.sqlException(e);
        }
    }

The problem is that the synchronization happens on objects such as the JDBCConnection and the Session, which are created for each thread, while the HsqlNameManager is shared between all threads. That the HsqlNameManager is shared between all threads can be seen in the method org.hsqldb.Table.createPrimaryKey:

 public void createPrimaryKey(HsqlName indexName, int[] columns,
                                 boolean columnsNotNull) {
        if (primaryKeyCols != null) {
            throw Error.runtimeError(ErrorCode.U_S0500, "Table");
        }
        if (columns == null) {
            columns = ValuePool.emptyIntArray;
        }
        for (int i = 0; i < columns.length; i++) {
            getColumn(columns[i]).setPrimaryKey(true);
        }
        primaryKeyCols = columns;
        setColumnStructures();
        primaryKeyTypes = new Type[primaryKeyCols.length];
        ArrayUtil.projectRow(colTypes, primaryKeyCols, primaryKeyTypes);
        primaryKeyColsSequence = new int[primaryKeyCols.length];
        ArrayUtil.fillSequence(primaryKeyColsSequence);
        HsqlName name = indexName;
        if (name == null) {
            name = database.nameManager.newAutoName("IDX", getSchemaName(),
                    getName(), SchemaObject.INDEX);
        }
        createPrimaryIndex(primaryKeyCols, primaryKeyTypes, name);
        setBestRowIdentifiers();
    }

Line 20 shows that the nameManager is part of the database object which is shared between all threads. The race happens in the method org.hsqldb.HsqlNameManager.newAutoName where the field sysNumber is read and written by the two threads:

    public HsqlName newAutoName(String prefix, String namepart,
                                HsqlName schema, HsqlName parent, int type) {
        StringBuffer sb = new StringBuffer();
        if (prefix != null) {
            if (prefix.length() != 0) {
                sb.append("SYS_");
                sb.append(prefix);
                sb.append('_');
                if (namepart != null) {
                    sb.append(namepart);
                    sb.append('_');
                }
                sb.append(++sysNumber);
            }
        } else {
            sb.append(namepart);
        }
        HsqlName name = new HsqlName(this, sb.toString(), type, false);
        name.schema = schema;
        name.parent = parent;
        return name;
    }

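In line 13 the field sysNumber is incremented by ++. The operation ++ is not a single atomic operation but consists of 6 bytecode operations, shown here for a field count of a class Counter (the sequence for sysNumber is analogous):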

ALOAD 0: this
DUP
GETFIELD Counter.count : int
ICONST_1
IADD
PUTFIELD Counter.count : int

If two threads execute this in parallel, both threads might read the same value and then both write the same incremented value, so one increment is lost and the same number is used twice. And since the field sysNumber is used to generate the name of the primary key index, the race condition can lead to duplicate names.
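One way to make such an increment atomic is java.util.concurrent.atomic.AtomicInteger. The following is a minimal sketch of the idea, not the HSQLDB code or its actual fix: the class AutoNameCounter and its methods are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AutoNameCounter {
    // Hypothetical replacement for a plain int counter such as sysNumber.
    private final AtomicInteger sysNumber = new AtomicInteger();

    public String newAutoName(String prefix) {
        // incrementAndGet performs the read, the add and the write as one atomic step.
        return "SYS_" + prefix + "_" + sysNumber.incrementAndGet();
    }

    // Generates names from several threads and returns the number of distinct names.
    public static int distinctNames(int threadCount, int namesPerThread) {
        AutoNameCounter counter = new AutoNameCounter();
        Set<String> names = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < namesPerThread; i++) {
                    names.add(counter.newAutoName("IDX"));
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return names.size();
    }
}
```

Since no increment can be lost, every generated name is unique, regardless of the thread interleaving.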

An algorithm for a concurrent queue using only thread local and volatile fields

Wed, 01 Nov 2017 23:00:00 GMT

In the following, I want to show you an algorithm for a concurrent queue which supports one reading and multiple writing threads. The writing threads need only a read from a thread local field and a write to a volatile field to publish an event to the queue. Writing does not need "compare and swap" operations like the standard JDK concurrent queues, leading to an easier and potentially faster algorithm. A usage example is a background thread writing log events asynchronously to a file.

Writing

The main idea is to use not one single queue but many. We use one queue per writing thread, stored in a thread local field. Each queue is then a simple linked list with a volatile field for the next element and a final field for the value:

public class LinkedList<T> implements Consumer<T>  {
	volatile ListElementPointer<T> lastRead;
	private LinkedListElement<T> lastWritten;
	// Constructor omitted 
	@Override
	public void accept(T event) {	
	// Queue stopped logic omitted
		LinkedListElement<T> linkedListElement= new LinkedListElement<T>(event);
		if( lastWritten == null )
		{
			lastWritten = linkedListElement;
			lastRead= new ListElementPointer<T>(lastWritten);
		}
		else
		{
			lastWritten.next = linkedListElement;
			lastWritten = lastWritten.next;
		}
	}
}
class LinkedListElement<T> {
    volatile LinkedListElement<T> next;
    final T event;	
    // Constructor omitted 
}

Writing an element to the queue is implemented in the accept method, line 6. When it is the first element written, lastWritten and lastRead are set to the new LinkedListElement, lines 10 to 13. Otherwise, the list is extended by the new LinkedListElement and lastWritten is moved to the end of the list, lines 16 and 17.

And here is the class storing each queue, called Consumer, in a thread local field:

public class ThreadLocalConsumer<T> implements Consumer<T> {
	private final EventBus<T> theBus;
	private ThreadLocal<Consumer<T>> threadLocal = new ThreadLocal<Consumer<T>>();
	// Constructor omitted 
	public void accept(T event) {	
		Consumer<T>  consumer = threadLocal.get();
		if( consumer == null )
		{
			consumer = theBus.newConsumerForThreadLocalStorage(Thread.currentThread());
			threadLocal.set( consumer );
		}
		consumer.accept(event);
	}
}

Reading

When reading the elements we must remember the last element we read. We do this in the field lastRead in the LinkedList. The following shows the method used for reading elements from one queue:

public void prozessWithoutReadCount(EventSink<T> eventSink) {
		if(list.lastRead  != null  )
		{
			// First element read			
			if(  ! list.lastRead.isRead )
			{
				eventSink.execute( list.lastRead.element.event  );
				list.lastRead.isRead = true;
			}
			LinkedListElement<T> current = list.lastRead.element;
			while(  current.next != null )
			{
				// current was already processed, so advance before executing
				current = current.next;
				eventSink.execute( current.event  );
			}
			list.lastRead.element = current;	
		}	
	}

And here is the class ListElementPointer used to store the last read element:

public class ListElementPointer<T> {
	LinkedListElement<T> element;
	boolean isRead;
    // Constructor omitted 
}

The field lastRead is initialized by the writing thread and afterward only modified by the reading thread.
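To make the interplay of the writing and the reading side easier to follow, here is a condensed sketch of one such queue. It is a simplification under my own assumptions and not the code from the repository: the class SingleWriterQueue is hypothetical, it uses a dummy head node instead of the ListElementPointer, and it assumes that the queue instance is safely published to exactly one writing and one reading thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleWriterQueue<T> {
    private static final class Node<T> {
        final T event;
        volatile Node<T> next; // the volatile write publishes the new node to the reader

        Node(T event) {
            this.event = event;
        }
    }

    // A dummy head node avoids the special case for the first element.
    private final Node<T> head = new Node<>(null);
    private Node<T> lastWritten = head; // only accessed by the writing thread
    private Node<T> lastRead = head;    // only accessed by the reading thread

    // Called by the single writing thread: no lock and no compare and swap needed.
    public void accept(T event) {
        Node<T> node = new Node<>(event);
        lastWritten.next = node;
        lastWritten = node;
    }

    // Called by the single reading thread: returns all events published since the last call.
    public List<T> drain() {
        List<T> result = new ArrayList<>();
        Node<T> next;
        while ((next = lastRead.next) != null) {
            result.add(next.event);
            lastRead = next;
        }
        return result;
    }
}
```

The reader always remembers the last node it has already processed and only follows the volatile next field, so it never touches a field that the writer modifies without synchronization.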

I skip the logic for creating new LinkedLists and reading multiple LinkedLists in the reading thread. You can see the complete source code here.

Usage

The queue is open source and the source code is available on GitHub here. We use this queue to write events asynchronously to a file for later analysis.

How to test if your tomcat web application is thread safe

Tue, 24 Oct 2017 22:00:00 GMT

In the following, I want to show you how to test if your Tomcat web application is thread-safe. As an example application, I use Jenkins deployed on Apache Tomcat 9.0.

To detect concurrency bugs during our tests we use vmlens. vmlens traces the test execution and analyzes the trace afterward. It detects deadlocks and race conditions during the test run.

Testing

To enable vmlens we add it as a Java agent to the CATALINA_OPTS in catalina.sh on Linux or catalina.bat on Windows:

CATALINA_OPTS="-javaagent:<Path to agent> -Xmx8g"

We also set a high enough heap size. After running Jenkins and executing some build jobs we see the following report in vmlens:

Analyzing

Let us look at one of the races found, the race at accessing the field hudson.UDPBroadcastThread.shutdown.

The thread "Jenkins UDP 33848 monitoring thread" reads the field in the race and the thread "localhost-startStop-2" writes it. Let us look at the class and the reading method run() and writing method shutdown().

public class UDPBroadcastThread extends Thread {
    private boolean shutdown;
public void run() {
        try {
            mcs.joinGroup(MULTICAST);
            ready.signal();
            while(true) {
                byte[] buf = new byte[2048];
                DatagramPacket p = new DatagramPacket(buf,buf.length);
                mcs.receive(p);
                SocketAddress sender = p.getSocketAddress();
                // prepare a response
                TcpSlaveAgentListener tal = jenkins.getTcpSlaveAgentListener();
                StringBuilder rsp = new StringBuilder("");
                tag(rsp,"version", Jenkins.VERSION);
                tag(rsp,"url", jenkins.getRootUrl());
                tag(rsp,"server-id", jenkins.getLegacyInstanceId());
                tag(rsp,"slave-port",tal==null?null:tal.getPort());
                for (UDPBroadcastFragment f : UDPBroadcastFragment.all())
                    f.buildFragment(rsp,sender);
                rsp.append("");
                byte[] response = rsp.toString().getBytes("UTF-8");
                mcs.send(new DatagramPacket(response,response.length,sender));
            }
        } catch (ClosedByInterruptException e) {
            // shut down
        } catch (SocketException e) {
            if (shutdown) { // forcibly closed
                return;
            }            // if we failed to listen to UDP, just silently abandon it, as a stack trace
            // makes people unnecessarily concerned, for a feature that currently does no good.
            LOGGER.log(Level.INFO, "Cannot listen to UDP port {0}, skipping: {1}", new Object[] {PORT, e});
            LOGGER.log(Level.FINE, null, e);
        } catch (IOException e) {
            if (shutdown)   return; // forcibly closed
            LOGGER.log(Level.WARNING, "UDP handling problem",e);
            udpHandlingProblem = true;
        }
    }
      public void shutdown() {
        shutdown = true;
        mcs.close();
        interrupt();
    }
}

The field shutdown is a non-volatile field. It is read in lines 28 and 35 of the method run and written in line 41 of the method shutdown.

Since the field hudson.UDPBroadcastThread.shutdown is not volatile, it is not guaranteed that the "Jenkins UDP 33848 monitoring thread" sees the values set by the "localhost-startStop-2" thread.

The "Jenkins UDP 33848 monitoring thread" might for example run on the first core while "localhost-startStop-2" runs on the second core of a multi-core CPU. For a normal field, the Java memory model does not require the write to become visible to other threads: the value may, for example, be kept in a register or in a core-local buffer. Therefore the "Jenkins UDP 33848 monitoring thread" might still see the old value.
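Declaring the flag as volatile gives the visibility guarantee that is missing here. The following is a minimal sketch of that pattern, not a patch for Jenkins: the class StoppableWorker is a hypothetical stand-in for a thread with a shutdown flag.

```java
public class StoppableWorker extends Thread {
    // volatile guarantees that the write in shutdown() becomes visible to the loop in run().
    private volatile boolean shutdown;

    @Override
    public void run() {
        while (!shutdown) {
            Thread.yield(); // placeholder for the real work of the thread
        }
    }

    public void shutdown() {
        shutdown = true;
    }

    // Starts a worker, requests the shutdown and reports whether the worker terminated.
    public static boolean startAndStop() {
        StoppableWorker worker = new StoppableWorker();
        worker.setDaemon(true);
        worker.start();
        worker.shutdown();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Without the volatile modifier the loop in run would be allowed to never observe the write and to spin forever.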

A new way to detect deadlocks during tests

Wed, 18 Oct 2017 22:00:00 GMT

In the following, I want to show you a new way to detect deadlocks during tests. A deadlock happens when two threads each try to acquire a lock held by the other thread. To detect a deadlock you need to reach the exact point in time when both threads are waiting for the other lock. Let us look at an example:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import org.junit.Test;
import org.junit.runner.RunWith;
import com.anarsoft.vmlens.concurrent.junit.ConcurrentTestRunner;
@RunWith(ConcurrentTestRunner.class)
public class TestConcurrentHashMapCompute {
	private final ConcurrentHashMap<Integer,Integer> map = 
	    new ConcurrentHashMap<Integer,Integer>();
	public TestConcurrentHashMapCompute()
	{
		map.put(1, 1);
		map.put(2, 2);	
	}
	@Test
	public void update12()
	{
		map.compute( 1 ,   			
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 2 , 1);
						return v;
					}
				}
				);
	}
	@Test
	public void update21()
	{
		map.compute( 2 ,
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 1 , 1);
						return v;
					}
				}
				);
	}
}

The ConcurrentTestRunner runs each JUnit test method with 4 threads in parallel. The test succeeds almost all the time.

One way to see the deadlock hidden in this test is to run the test many times with more threads on a machine with many cores. The other way is to trace the test execution and to analyze the lock order afterward.

We can do this by adding the vmlens agent to the VM arguments of our test. After analyzing the test run vmlens reports the following deadlock:

- deadlock: Monitor@java.util.concurrent.ConcurrentHashMap.putVal()<->Monitor@java.util.concurrent.ConcurrentHashMap.putVal()
  parent2Child:
    thread: Thread-4
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update21
  child2Parent:
    thread: Thread-1
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update12

Here is the putVal method. The synchronized statement leading to the deadlock is in line 18:

  final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    // ... omitted    
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

And here is the compute method. The synchronized statement leading to the deadlock is in line 19:

    public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        if (key == null || remappingFunction == null)
            throw new NullPointerException();
        int h = spread(key.hashCode());
        V val = null;
        int delta = 0;
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
                 // ... omitted contains call to remappingFunction function   
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                synchronized (f) {
                    // ... omitted contains call to remappingFunction function   
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    break;
                }
            }
        }
        if (delta != 0)
            addCount((long)delta, binCount);
        return val;
    }
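For comparison, the JDK itself can only report a deadlock after it has actually happened, using java.lang.management.ThreadMXBean.findDeadlockedThreads. The following sketch illustrates this. It is my own example and has nothing to do with how vmlens works: the class DeadlockProbe is hypothetical, and it forces the problematic timing with a CountDownLatch, which a normal test run would only hit by chance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockProbe {

    // Takes the first lock, waits until the other thread holds its lock, then requests the second lock.
    private static void lockInOrder(ReentrantLock first, ReentrantLock second, CountDownLatch bothHold) {
        first.lock();
        try {
            bothHold.countDown();
            bothHold.await();
            second.lockInterruptibly(); // blocks until interrupted, since the other thread holds it
            second.unlock();
        } catch (InterruptedException e) {
            // interrupted by the probe to resolve the deadlock
        } finally {
            first.unlock();
        }
    }

    // Forces a deadlock, polls the JVM for it and resolves it again.
    public static boolean detectsForcedDeadlock() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHold = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHold));
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHold));
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        boolean found = false;
        try {
            for (int i = 0; i < 500 && !found; i++) {
                found = threadBean.findDeadlockedThreads() != null;
                if (!found) {
                    Thread.sleep(10);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            t1.interrupt();
            t2.interrupt();
        }
        return found;
    }
}
```

The probe only finds the deadlock because the latch guarantees that both threads hold their first lock before requesting the second one. Analyzing the lock order of a trace does not need this timing.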

How to use atomic methods to write thread-safe classes

Wed, 20 Sep 2017 22:00:00 GMT

The classes in the package java.util.concurrent use atomic methods to update their internal state in a thread-safe way. In the following, you will see how to write and use such atomic methods to create thread-safe classes.

Atomic methods

A method is atomic if it is "all or nothing". If a thread reads the data the thread can only see the state before or after the execution of the atomic method, no intermediate state. After the atomic method was executed successfully the changes are visible to all threads. The atomic method only modifies data of its own object without side effects.

Here are some examples of atomic methods:

java.util.concurrent.atomic.AtomicInteger

int get()

Gets the current value.

void set(int newValue)

Sets to the given value.

int addAndGet(int delta)

Atomically adds the given value to the current value.

java.util.concurrent.ConcurrentHashMap

V compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction)

Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

Writing atomic methods

The easiest way to implement atomic methods is to use synchronized blocks. For example, the backport for AtomicInteger for JDKs prior to version 1.5 uses synchronized blocks and volatile fields:

package edu.emory.mathcs.backport.java.util.concurrent.atomic;
public class AtomicInteger extends Number implements java.io.Serializable {
  private volatile int value;
  public final int get() {
        return value;
    }
    public final synchronized void set(int newValue) {
        value = newValue;
    }
    public final synchronized int addAndGet(int delta) {
        return value += delta;
    }
}

To write lock-free atomic methods you can use the atomic operations of the CPU. In Java, atomic operations can be accessed using sun.misc.Unsafe, as in the implementation of AtomicInteger below. Or you can use the classes of the java.util.concurrent.atomic package, which provides atomic operations for each field type.

package java.util.concurrent.atomic;
import sun.misc.Unsafe;
public class AtomicInteger extends Number implements java.io.Serializable {
 private static final Unsafe unsafe = Unsafe.getUnsafe();
 private static final long valueOffset;
    static {
        try {
            valueOffset = unsafe.objectFieldOffset
                (AtomicInteger.class.getDeclaredField("value"));
        } catch (Exception ex) { throw new Error(ex); }
    }
    private volatile int value;
    public final int get() {
        return value;
    }
    public final void set(int newValue) {
        value = newValue;
    }
   public final int addAndGet(int delta) {
        return unsafe.getAndAddInt(this, valueOffset, delta) + delta;
    }
}
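For your own classes, the second option usually means a retry loop around compareAndSet. The following is a minimal sketch of such a lock-free atomic method. The class BoundedCounter and its rule that the value must not become negative are assumptions made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Adds delta atomically, but rejects updates that would make the value negative.
    public int update(int delta) {
        while (true) {
            int current = value.get();
            int next = current + delta;
            if (next < 0) {
                throw new IllegalArgumentException("value is negative");
            }
            // Succeeds only if no other thread changed the value in the meantime, otherwise retry.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }
}
```

The check and the write form one atomic step: either compareAndSet installs a value computed from the current state, or the loop starts again with the new state.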

Using atomic methods

To see how to use atomic methods let us update our AtomicInteger from two threads. We can simulate this using the following two methods. Calling AtomicInteger sequentially:

public void testSetAndGetSequential() throws Exception
	{
		AtomicInteger atomicInteger= new AtomicInteger(0);
		int threadA = atomicInteger.get();
		atomicInteger.set(threadA + 5);
		int threadB = atomicInteger.get();
		atomicInteger.set(threadB + 5);
		assertEquals( 10 , atomicInteger.get() );
	}

and calling AtomicInteger in parallel:

	public void testSetAndGetParallel() throws Exception
	{
		AtomicInteger atomicInteger= new AtomicInteger(0);
		int threadA = atomicInteger.get();
		int threadB = atomicInteger.get();
		atomicInteger.set(threadA + 5);
		atomicInteger.set(threadB + 5);
		assertEquals( 5 , atomicInteger.get() );
	}

Each method simulates a different thread interleaving. As we can see, for a specific thread interleaving the use of get and set leads to an incorrect result. To update the AtomicInteger correctly we need to use a method which combines the get and set operations in one atomic method, the addAndGet method. Now the result is independent of the thread interleaving:

	public void testUpdate() throws Exception
	{
		AtomicInteger atomicInteger= new AtomicInteger(0);
		atomicInteger.addAndGet(5); // Thread A
		atomicInteger.addAndGet(5); // Thread B
		assertEquals( 10 , atomicInteger.get() );
	}

Care must be taken if we want to use callback functions inside atomic methods, as in the compute method of ConcurrentHashMap. Those callback functions should not modify the same object. The usage in the test below leads to a deadlock. To test this we need a multi-threaded unit test. This JUnit test uses concurrent-junit to run the test in parallel. The ConcurrentTestRunner runs each test method with 4 threads in parallel.

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import org.junit.Test;
import org.junit.runner.RunWith;
import com.anarsoft.vmlens.concurrent.junit.ConcurrentTestRunner;
@RunWith(ConcurrentTestRunner.class)
public class TestConcurrentHashMapCompute {
	private final ConcurrentHashMap<Integer,Integer> map = new ConcurrentHashMap<Integer,Integer>();
	public TestConcurrentHashMapCompute()
	{
		map.put(1, 1);
		map.put(2, 2);	
	}
	@Test
	public void update12()
	{
		map.compute( 1 ,   			
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 2 , 1);
						return v;
					}
				}
				);
	}
	@Test
	public void update21()
	{
		map.compute( 2 ,
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 1 , 1);
						return v;
					}
				}
				);
	}
}

If we run the test we see the following deadlocks. The trace was generated by vmlens.com:

- deadlock: Monitor@java.util.concurrent.ConcurrentHashMap.putVal()<->Monitor@java.util.concurrent.ConcurrentHashMap.putVal()
  parent2Child:
    thread: Thread-4
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update21
  child2Parent:
    thread: Thread-1
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update12

Conclusion

Atomic methods let you use classes in a thread-safe way without knowing the implementation details. To test a specific thread interleaving we can simply order the calls accordingly. Callback functions inside atomic methods should not modify the same object since this might lead to deadlocks.
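One way to follow the last rule is to move the second update out of the callback. The following is a sketch under my own assumptions and not a general recipe: the class ComputeWithoutNestedUpdate is hypothetical, and the two updates are then no longer one atomic step, which must be acceptable for the use case.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComputeWithoutNestedUpdate {
    private final ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();

    public ComputeWithoutNestedUpdate() {
        map.put(1, 1);
        map.put(2, 2);
    }

    public void update12() {
        // The callback only computes the value for its own key.
        map.compute(1, (k, v) -> v);
        // The other key is updated after compute has released its lock.
        map.put(2, 1);
    }

    public void update21() {
        map.compute(2, (k, v) -> v);
        map.put(1, 1);
    }

    // Runs both updates from 4 threads and reports whether all threads finished in time.
    public static boolean finishesWithoutDeadlock() {
        ComputeWithoutNestedUpdate target = new ComputeWithoutNestedUpdate();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            boolean even = t % 2 == 0;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    if (even) {
                        target.update12();
                    } else {
                        target.update21();
                    }
                }
            });
            threads[t].setDaemon(true);
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(10_000);
                if (thread.isAlive()) {
                    return false;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }
}
```

Since no thread requests a second lock while it is still holding the first one, the cyclic wait from the trace above cannot occur.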

Opinion: Use atomic methods for thread safety

Tue, 12 Sep 2017 22:00:00 GMT

When we update rows in our database we use transactions. By using transactions we can safely modify our data even when multiple clients are accessing the database concurrently. In Java, we have many techniques for modifying data concurrently: volatile fields, synchronized blocks, and the classes from java.util.concurrent.

But we are missing a high-level abstraction like transactions in a database to concurrently modify data in a safe way. I think atomic methods are a good fit for such a high-level abstraction.

Atomic methods

A method is atomic if it is "all or nothing". If another thread reads the data the other thread can only see the state before or after the execution of the atomic block, no intermediate state. After the atomic method was successful the changes are visible to all other threads. The atomic method only modifies data of its own object without side effects.

Here is an example of a class with atomic methods implemented by the synchronized statement:

public class AtomicPositiveValue {
   private int value;
   public AtomicPositiveValue(int newValue) throws Exception 
   {
	   if( newValue < 0 ) 
	   {
		   throw new Exception("value is negative");
	   }
	this.value = newValue;
   }
   public synchronized int get()
   {
	return value;
   }
   public synchronized void set(int newValue) throws Exception
   {
	   if( newValue < 0 ) 
	   {
		   throw new Exception("value is negative");
	   }
	   value = newValue;
   }
}

Using atomic methods

Let us modify an instance of our class from multiple threads. The main advantage of atomic methods is that we can simulate this with single-threaded methods like the following:

	public void testSetAndGetParallel() throws Exception
	{
		AtomicPositiveValue atomicPositiveValue= new AtomicPositiveValue(0);
		int threadA = atomicPositiveValue.get();
		int threadB = atomicPositiveValue.get();
		atomicPositiveValue.set(threadA + 5);
		atomicPositiveValue.set(threadB + 5);
		assertEquals( 5 , atomicPositiveValue.get() );
	}
	public void testSetAndGetSequentiell() throws Exception
	{
		AtomicPositiveValue atomicPositiveValue= new AtomicPositiveValue(0);
		int threadA = atomicPositiveValue.get();
		atomicPositiveValue.set(threadA + 5);
		int threadB = atomicPositiveValue.get();
		atomicPositiveValue.set(threadB + 5);
		assertEquals( 10 , atomicPositiveValue.get() );
	}

This is not the result we wanted. The chaining of the get and set methods leads to a non-atomic update, and the result depends on the order of the set and get calls from the different threads. We need an atomic update method:

 public synchronized int update(int delta) throws Exception
   {
	   int temp = value + delta;
	   if( temp < 0 ) 
	   {
		   throw new Exception("value is negative");
	   }
	   value = temp;
	   return value;   
   }

Now we always achieve the same result:

	public void testUpdate() throws Exception
	{
		AtomicPositiveValue atomicPositiveValue= new AtomicPositiveValue(0);
		atomicPositiveValue.update(5); // Thread A
		atomicPositiveValue.update(5); // Thread B
		assertEquals( 10 , atomicPositiveValue.get() );
	}

As we have seen, chaining atomic methods of the same object typically leads to a non-atomic operation. Therefore we need to provide atomic methods for all use cases.

Composing atomic methods

Now let us see what happens when we compose atomic methods. For example let us create a method which transfers an amount from one instance to another.

public synchronized void transfer(AtomicPositiveValue other, int amount) throws Exception
{
	other.update( -1 * amount );
    update(amount); 
}

To test this method we need a multi-threaded unit test. The ConcurrentTestRunner runs each test method in parallel with 4 threads.

@RunWith(ConcurrentTestRunner.class)
public class TestDeadlockAtomicValue {
	private final AtomicPositiveValue first;
	private final AtomicPositiveValue second;
	public TestDeadlockAtomicValue() throws Exception
	{
		first  = new AtomicPositiveValue(1000);
		second = new AtomicPositiveValue(1000);
	}
	@Test
	public void testTransferFirstToSecond() throws Exception
	{
		second.transfer( first , 1);
	}
	@Test
	public void testTransferSecondToFirst() throws Exception
	{
		first.transfer( second , 1);
	}
}

If we run the test we see deadlocks:

- deadlock: Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()<->Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()
  parent2Child:
    thread: Thread-4
    stack:
      - com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update <<Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()>>
      - com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.transfer <<Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()>>
      - com.anarsoft.vmlens.concurrent.example.TestDeadlockAtomicValue.testTransferSecondToFirst
      - sun.reflect.NativeMethodAccessorImpl.invoke
      - sun.reflect.DelegatingMethodAccessorImpl.invoke
      - java.lang.reflect.Method.invoke
      - org.junit.runners.model.FrameworkMethod$1.runReflectiveCall
      - org.junit.internal.runners.model.ReflectiveCallable.run
      - org.junit.runners.model.FrameworkMethod.invokeExplosively
      - org.junit.internal.runners.statements.InvokeMethod.evaluate
      - com.anarsoft.vmlens.concurrent.junit.internal.ConcurrentStatement.evaluateStatement
      - com.anarsoft.vmlens.concurrent.junit.internal.ConcurrentStatement.evaluate
      - com.anarsoft.vmlens.concurrent.junit.internal.ParallelExecutorThread.run
  child2Parent:
    thread: Thread-1
    stack:
      - com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update <<Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()>>
      - com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.transfer <<Monitor@com.anarsoft.vmlens.concurrent.example.AtomicPositiveValue.update()>>
      - com.anarsoft.vmlens.concurrent.example.TestDeadlockAtomicValue.testTransferFirstToSecond
      - sun.reflect.NativeMethodAccessorImpl.invoke
      - sun.reflect.DelegatingMethodAccessorImpl.invoke
      - java.lang.reflect.Method.invoke
      - org.junit.runners.model.FrameworkMethod$1.runReflectiveCall
      - org.junit.internal.runners.model.ReflectiveCallable.run
      - org.junit.runners.model.FrameworkMethod.invokeExplosively
      - org.junit.internal.runners.statements.InvokeMethod.evaluate
      - com.anarsoft.vmlens.concurrent.junit.internal.ConcurrentStatement.evaluateStatement
      - com.anarsoft.vmlens.concurrent.junit.internal.ConcurrentStatement.evaluate
      - com.anarsoft.vmlens.concurrent.junit.internal.ParallelExecutorThread.run

The trace was generated by vmlens.

Composing atomic methods of different objects leads to deadlocks, at least when they are implemented with synchronized statements. And since the implementation of an atomic method should be hidden, we need to avoid composing atomic methods.
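One common way to avoid this kind of deadlock, which is not the approach taken in this post, is to acquire the monitors in a fixed global order. The following is only a sketch under that assumption. The class OrderedValue, its id field and the helper runOpposingTransfers are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical variant of AtomicPositiveValue: every instance gets a unique id
// that defines a global lock order.
class OrderedValue {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    private final long id = NEXT_ID.getAndIncrement();
    private int value;

    OrderedValue(int initial) {
        value = initial;
    }

    synchronized int get() {
        return value;
    }

    // Moves amount from `from` to `to`. The two monitors are always taken in
    // ascending id order, so two opposing transfers cannot wait for each other.
    static void transfer(OrderedValue from, OrderedValue to, int amount) {
        OrderedValue firstLock = from.id < to.id ? from : to;
        OrderedValue secondLock = from.id < to.id ? to : from;
        synchronized (firstLock) {
            synchronized (secondLock) {
                if (from.value - amount < 0) {
                    throw new IllegalStateException("value is negative");
                }
                from.value -= amount;
                to.value += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the sum of both values.
    static int runOpposingTransfers(int transfersPerThread) throws InterruptedException {
        OrderedValue first = new OrderedValue(1000);
        OrderedValue second = new OrderedValue(1000);
        Thread a = new Thread(() -> {
            for (int i = 0; i < transfersPerThread; i++) {
                transfer(first, second, 1);
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < transfersPerThread; i++) {
                transfer(second, first, 1);
            }
        });
        a.start();
        b.start();
        a.join();
        b.join();
        return first.get() + second.get();
    }
}
```

The price of this design is that the lock order becomes part of the class contract, which is exactly the kind of implementation detail an atomic method would normally hide.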

Implementing with compareAndSet

To see that we can easily change the implementation of our AtomicPositiveValue, let us see how it can be implemented with compareAndSet. Suppose in our performance test we see a bottleneck at the get method. And we decide to use the following faster implementation using AtomicInteger with compareAndSet:

public class AtomicPositiveValueUsingAtomicInteger {
	private final AtomicInteger value;
	public AtomicPositiveValueUsingAtomicInteger(int newValue) throws Exception {
		if (newValue < 0) {
			throw new Exception("value is negative");
		}
		value = new AtomicInteger(newValue);
	}
	public int get() {
		return value.get();
	}
	public int update(int delta) throws Exception {
		int current = value.get();
		int update = current + delta;
		if (update < 0) {
			throw new Exception("value is negative");
		}
		while (!value.compareAndSet(current, update)) {
			// another thread changed the value: re-read it and retry
			current = value.get();
			update = current + delta;
			if (update < 0) {
				throw new Exception("value is negative");
			}
		}
		return update;
	}
}

The behaviour of our class is the same as the synchronized implementation. It is only faster.
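Since Java 8 the retry loop can also be delegated to AtomicInteger.updateAndGet. The sketch below shows that variant. The class name PositiveValueWithUpdateAndGet is hypothetical, and the sketch throws unchecked exceptions because the lambda passed to updateAndGet cannot throw a checked one.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant that lets AtomicInteger.updateAndGet do the compareAndSet retries.
class PositiveValueWithUpdateAndGet {
    private final AtomicInteger value;

    PositiveValueWithUpdateAndGet(int initial) {
        if (initial < 0) {
            throw new IllegalArgumentException("value is negative");
        }
        value = new AtomicInteger(initial);
    }

    int get() {
        return value.get();
    }

    // The lambda may be evaluated several times under contention,
    // so it must not have side effects.
    int update(int delta) {
        return value.updateAndGet(current -> {
            int next = current + delta;
            if (next < 0) {
                throw new IllegalStateException("value is negative");
            }
            return next;
        });
    }
}
```

If the lambda throws, the stored value stays unchanged, which matches the behaviour of the explicit loop.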

Conclusion

Atomic methods let us use classes in a thread safe way without knowing the implementation details, similar to database transactions. If we want to test if our usage is correct we can simply chain the method calls. To test a specific thread interleaving we can simply order the calls accordingly. In contrast to database transactions, which have automatic deadlock detection, we cannot safely compose atomic methods of different objects.

java.math.BigDecimal toString is not thread safe

Mon, 28 Aug 2017 22:00:00 GMT

BigDecimal is an immutable data type. So every method of this class should be thread safe. But this is not the case for the method toString. Calling it from multiple threads leads to strange results.

To see this, let us look at the source code:

@Override
    public String toString() {
        String sc = stringCache;
        if (sc == null)
            stringCache = sc = layoutChars(true);
        return sc;
    }
    /**
     * Used to store the canonical string representation, if computed.
     */
    private transient String stringCache;

As we see, the non-volatile field stringCache is used to cache the String computed in the method layoutChars. The method layoutChars uses a thread-local StringBuilder to compute a String representation of this BigDecimal. In line 3 the instance variable stringCache is read, and in line 5 it is written. This makes the class BigDecimal mutable and the method toString not thread safe.

The race condition

If the code is executed in the given order everything is o.k. But if some component reorders the statements, the cached String is not completely initialized. In pseudo code the method toString together with layoutChars looks like this:

store stringCache in local Variable sc
if sc is null
{
	call layoutChars
	{
		compute String with thread local StringBuilder
		call StringBuilder toString
		{
			create String
			initialize String in Constructor of class String
	    }
	}
	store result in instance Variable stringCache
}

If the statements get reordered a thread sees an uninitialized String:

Thread A	store stringCache in local Variable sc
Thread A	if sc is null
Thread A	create String
Thread A	store result in instance Variable stringCache
Thread B	store stringCache in local Variable sc
Thread B	Thread B sees an uninitialized String
Thread A	compute String with thread local StringBuilder
Thread A	initialize String in Constructor of class String

One component which reorders statements is the cache system of the CPU. ARM compatible processors, such as those in smartphones or the Raspberry Pi, reorder reads and writes to improve performance, leading to a scenario as described above.

Reproducing the error

To reproduce the error I use jcstress, an OpenJDK Code Tools project, which describes itself as follows: The Java Concurrency Stress tests (jcstress) is an experimental harness and a suite of tests to aid the research in the correctness of concurrency support in the JVM, class libraries, and hardware.

I use the following test class:

package com.vmlens.stressTest.tests;
import java.math.BigDecimal;
import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.IntResult1;
@JCStressTest
@Outcome(id = "0", expect = Expect.ACCEPTABLE, desc = "Default outcome.")
@State
public class BigDecimalToString {
	private final  BigDecimal testBigDecimal = new BigDecimal("0.56");
	@Actor
	public void actor1(IntResult1 r) {
        testBigDecimal.toString().length();	
	}
	@Actor
	public void actor2(IntResult1 r) {
		testBigDecimal.toString().length();
	}	
}

Jcstress runs this test multiple times, always calling the methods actor1 and actor2 from separate threads. When I run this test on a Raspberry Pi, I see the following NullPointerException:

        java.lang.NullPointerException
        at java.lang.String.length(String.java:623)
        at com.vmlens.stressTest.tests.BigDecimalToString.actor1(BigDecimalToString.java:12)
        at com.vmlens.stressTest.tests.BigDecimalToString_jcstress.actor1(BigDecimalToString_jcstress.java:145)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

If we look at line 623 of class String we see that the String was indeed not completely initialized. The instance variable value storing the character array is null:

return value.length;

Where did I find the race condition?

I found the race condition while testing a Geronimo web service with vmlens. vmlens detects race conditions during test runs. vmlens created the following trace during my tests:

- variable: java.math.BigDecimal.stringCache
  reading:
    thread: DefaultThreadPool 197
    stack:
      - java.math.BigDecimal.toString
      - org.apache.axis.encoding.ser.SimpleSerializer.getValueAsString
      - org.apache.axis.encoding.ser.SimpleSerializer.serialize
      - org.apache.axis.encoding.SerializationContext.serializeActual
      - org.apache.axis.encoding.SerializationContext.serialize
      - org.apache.axis.encoding.SerializationContext.serialize
      - org.apache.axis.encoding.ser.BeanSerializer.serialize
      - org.apache.axis.encoding.SerializationContext.serializeActual
      - org.apache.axis.encoding.SerializationContext.serialize
      - org.apache.axis.encoding.SerializationContext.serialize
      - org.apache.axis.message.RPCParam.serialize
      - org.apache.axis.message.RPCElement.outputImpl
      - org.apache.axis.message.MessageElement.output
      - org.apache.axis.message.SOAPBody.outputImpl
      - org.apache.axis.message.SOAPEnvelope.outputImpl
      - org.apache.axis.message.MessageElement.output
      - org.apache.axis.SOAPPart.writeTo
      ---- Stack Trace shortened ---- 
  writing:
    thread: DefaultThreadPool 196
    stack:
      - java.math.BigDecimal.toString
      - org.apache.axis.encoding.ser.SimpleSerializer.getValueAsString
      - org.apache.axis.encoding.ser.SimpleSerializer.serialize      
      ---- Stack Trace shortened ----

This shows that toString is used for serializing BigDecimals to SOAP messages.

Conclusion

The caching of the computed String in an instance variable in the method toString makes the class BigDecimal mutable and the method toString not thread safe. This leads to NullPointerExceptions on ARM compatible processors. One usage of the toString method is the serialization of BigDecimals to SOAP messages.

java.lang.reflect.TypeVariable getBounds is not thread safe

Sun, 19 Mar 2017 23:00:00 GMT

java.lang.reflect.TypeVariable getBounds is not thread safe. Calling it from multiple threads might even crash your JVM.

The following method shows you the use of getBounds(). The method getBounds is used to get the upper bound(s) of a generic type:

public void testGetBounds() {
		Class cl = GenericInterface.class;
		TypeVariable typeVariable  = cl.getTypeParameters()[0];
		typeVariable.getBounds()[0].getTypeName();
	}

To make the examples and tests shorter, I do not iterate over the returned array but simply use the first element. Here is the generic interface used in the example:

package com.vmlens.stressTest.util;
public interface GenericInterface> {
}
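As a self-contained illustration of what getBounds returns, here is a minimal sketch. The interface BoundedInterface and its bound Number are assumptions made only for this example; they are not the interface used in the tests of this post.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical generic interface; the upper bound Number is an assumption for this sketch.
interface BoundedInterface<T extends Number> {
}

class GetBoundsExample {
    // Returns the type name of the first upper bound of the first type parameter.
    static String firstBoundName(Class<?> cl) {
        TypeVariable<?> typeVariable = cl.getTypeParameters()[0];
        Type[] bounds = typeVariable.getBounds();
        return bounds[0].getTypeName();
    }

    public static void main(String[] args) {
        // prints java.lang.Number
        System.out.println(firstBoundName(BoundedInterface.class));
    }
}
```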

If you call testGetBounds from multiple threads, calling getBounds leads to a race condition.

The race condition

Since the array of TypeVariables returned by getTypeParameters is cached in the volatile field genericInfo, each thread works on the same TypeVariable instance. And the class TypeVariableImpl implementing the TypeVariable interface modifies the non-volatile field bounds without synchronization:

package sun.reflect.generics.reflectiveObjects;
// import statements omitted
public class TypeVariableImpl
    extends LazyReflectiveObjectGenerator implements TypeVariable {
    // upper bounds - evaluated lazily
    private Type[] bounds;
    public Type[] getBounds() {
        // lazily initialize bounds if necessary
        if (bounds == null) {
            FieldTypeSignature[] fts = getBoundASTs(); // get AST
            // allocate result array; note that
            // keeping ts and bounds separate helps with threads
            Type[] ts = new Type[fts.length];
            // iterate over bound trees, reifying each in turn
            for ( int j = 0; j  < fts.length; j++) {
                Reifier r = getReifier();
                fts[j].accept(r);
                ts[j] = r.getResult();
            }
            // cache result
            bounds = ts;
            // could throw away bound ASTs here; thread safety?
        }
        return bounds.clone(); // return cached bounds
    }
    // other fields and methods omitted
}

The field bounds is read in the null check and in the return statement, and it is written in the assignment bounds = ts. If the code is executed in the given order everything is o.k. But if some component reorders the statements, another thread can see an array that is not completely initialized. In pseudo code the method getBounds looks like this:

if instance variable bounds is  null
{
    set local variable ts to new Array
    initialize the array
    set instance variable bounds to the local variable ts 
}
return instance variable bounds.clone

If the statements get reordered, another thread sees an uninitialized array:

Thread A	set local variable ts to new Array
Thread A	set instance variable bounds to the local variable ts
Thread B	if instance variable bounds is null
Thread B	return instance variable bounds.clone // the array is not yet completely initialized

One such component is the cache system of the CPU. ARM compatible processors, such as those in smartphones or the Raspberry Pi, reorder reads and writes to improve performance, leading to a scenario as described above.
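For comparison, a lazily initialized array can be published safely by declaring the field volatile and reading it only once per call. This is a sketch of that general technique with a hypothetical class LazyBounds; it is not a patch of the JDK class TypeVariableImpl.

```java
import java.lang.reflect.Type;

// Hypothetical class, not the JDK's TypeVariableImpl.
class LazyBounds {
    // volatile: the write of the array reference happens-before every read that sees it,
    // so a reader that sees the reference also sees the filled array.
    private volatile Type[] bounds;

    Type[] getBounds() {
        Type[] result = bounds;      // read the shared field only once
        if (result == null) {
            Type[] ts = new Type[1];
            ts[0] = Object.class;    // fill the array completely before publishing it
            bounds = ts;
            result = ts;
        }
        return result.clone();       // callers never get the cached array itself
    }
}
```

Two threads may still both compute the array, but each of them only ever returns a completely initialized one.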

Reproducing the error

I use the following test class:

@JCStressTest
@Outcome(id = "0, 0", expect = Expect.ACCEPTABLE, desc = "Default outcome.")
@State
public class TypeVariableGetBounds {
	private final Class cl;
	public TypeVariableGetBounds() {
		try {
			cl = (new StressTestClassLoader(TypeVariableGetBounds.class.getClassLoader()))
					.loadClass("com.vmlens.stressTest.util.GenericInterface");
		} catch (Exception e) {
			throw new RuntimeException("Test setup incorrect", e);
		}
	}
	public void callContainsDataRace() {
		TypeVariable typeVariable = cl.getTypeParameters()[0];
		typeVariable.getBounds()[0].getTypeName();
	}
	@Actor
	public void actor1(IntResult2 r) {
		callContainsDataRace();
	}
	@Actor
	public void actor2(IntResult2 r) {
		callContainsDataRace();
	}
}

Jcstress runs this test multiple times, always calling the methods actor1 and actor2 from separate threads. To separate the tests, I use a special classloader, which always reloads a class.

When I run this test on a Raspberry Pi using the test mode tough or stress, I see a crash of the JVM:

     
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7665b4c0, pid=16112, tid=1680598112
#
# JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: Java HotSpot(TM) Client VM (25.65-b01 mixed mode linux-arm )
# Problematic frame:
# V  [libjvm.so+0x27a4c0]

The JVM error log shows the following:

Stack: [0x6426f000,0x642bf000],  sp=0x642bd8b8,  free space=314k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x27a4c0]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 1303  java.lang.Object.clone()Ljava/lang/Object; (0 bytes) @ 0x742ba71c [0x742ba6e0+0x3c]
J 1313 C1 sun.reflect.generics.repository.GenericDeclRepository.getTypeParameters()[Ljava/lang/reflect/TypeVariable; (80 bytes) @ 0x742b8210 [0x742b7f40+0x2d0]
J 1229 C1 com.vmlens.stressTest.tests.TypeVariableGetBounds_jcstress.actor1()Ljava/lang/Void; (109 bytes) @ 0x742a6cb8 [0x742a6bf0+0xc8]
j  com.vmlens.stressTest.tests.TypeVariableGetBounds_jcstress$$Lambda$6.call()Ljava/lang/Object;+4
j  java.util.concurrent.FutureTask.run()V+42
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

It seems that the native array clone method can not cope with an uninitialized array.

So far I could not reproduce this error on my Intel i5 workstation.

Conclusion

java.lang.reflect.TypeVariable getBounds is not thread safe. Calling it from multiple threads might lead, depending on the java platform you are using, to strange errors.

How to crash the java virtual machine with a race condition

Thu, 02 Mar 2017 23:00:00 GMT

This is a how-to guide for crashing the java virtual machine. It gives you an introduction to race conditions and shows you what errors can happen if your code contains such bugs.

Create a race condition

Let us start with the following class:

import java.lang.reflect.Type;
public class DataRaceTest implements Runnable {
   private Type[] instance;
   @Override
   public void run() {
     if (instance == null) {
       Type[] ts = new Type[1];
       ts[0] = Object.class;
       instance = ts;
     }
     instance[0].getTypeName();
   }
}

If the run method is executed by multiple threads on the same instance, it leads to a race condition. More specifically, it leads to a data race. Data races are defined in the Java memory model as accesses to a shared field without correct synchronization. According to the Java memory model, data races lead to platform dependent undefined behavior:

Without correct synchronization, very strange, confusing and counterintuitive behaviors are possible.

If the code is executed in the given order everything is o.k. But if some component reorders the statements, the array might not be completely initialized. One such component is the cache system of the CPU. Intel processors, for example, have a cache system with strong memory guarantees. The cache system makes sure that the values written by one core are almost always seen by the other cores in the order in which they were written. But ARM compatible processors, such as those in smartphones or the Raspberry Pi, do not give this guarantee.

Create a Nullpointer Exception

To see what is happening, I execute the method multiple times by multiple threads. To do this I use a tool to reproduce race conditions, called stress test. I run it with the following command line options:

java -cp stress-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.vmlens.StressTest 
   -i 5 -w 16 com.vmlens.stressTest.examples.dataRace.DataRaceTestSetup

using the following test setup class:

public class DataRaceTestSetup implements TestSetup  {
   @Override
   public Runnable createTest() {
		return  new DataRaceTest();
   }
}

This runs the method run of DataRaceTest for 5 iterations. Each iteration consists of 16,000 tests run by 16 threads.
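The general idea of such a harness can be sketched with the standard library alone. The class MiniStressTest below is a hypothetical, much simplified illustration and not the stress-test tool used in this post: for every iteration it creates one fresh test object and lets all worker threads call it at the same time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical mini harness; not the stress-test tool used in this post.
class MiniStressTest {
    // Per iteration: create one fresh test object and run it once on each of `threads` workers.
    // Returns how many of those runs ended with a RuntimeException.
    static int run(Supplier<Runnable> setup, int iterations, int threads) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < iterations; i++) {
                Runnable test = setup.get();              // shared by all threads of this iteration
                CountDownLatch done = new CountDownLatch(threads);
                for (int t = 0; t < threads; t++) {
                    pool.execute(() -> {
                        try {
                            test.run();
                        } catch (RuntimeException e) {
                            failures.incrementAndGet();   // e.g. a NullPointerException
                        } finally {
                            done.countDown();
                        }
                    });
                }
                done.await();                             // wait until this iteration has finished
            }
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }
}
```

A call such as MiniStressTest.run(DataRaceTest::new, 5, 16) would exercise the class from above; whether it ever reports a failure depends on the hardware, as described in the following paragraphs.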

If I run this on my Intel i5 workstation, I cannot provoke an exception. If I run this on my Raspberry Pi, I see the following NullPointerException roughly once every 2000 tests:

java.lang.NullPointerException
        at com.vmlens.stressTest.examples.dataRace.DataRaceTest.run(DataRaceTest.java:78)
        at com.vmlens.stressTest.internal.TestCall.call(TestCall.java:36)
        at com.vmlens.stressTest.internal.TestCall.call(TestCall.java:7)
        at com.vmlens.stressTest.internal.WorkerThread.run(WorkerThread.java:22)

Crash the java virtual machine

Now let us add some native call. Let us clone the array:

@Override
public void run() {
	if (instance == null) {
		Type[] ts = new Type[1];
		ts[0] = Object.class;
		instance = ts;
	}
	Type[] clonedInstance = instance.clone();
	clonedInstance[0].getTypeName();
}

This time I run the test till I see at least one error. This is done by using the -e option:

java -cp stress-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.vmlens.StressTest
   -e 1 -w 16 com.vmlens.stressTest.examples.dataRace.DataRaceTestSetup

If I run this code on a Raspberry Pi, I see a java virtual machine crash after some time. It takes between half an hour and a day:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x741ab340, pid=26364, tid=1682633824
#
# JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: Java HotSpot(TM) Client VM (25.65-b01 mixed mode linux-arm )
# Problematic frame:
# J 38 C1 com.vmlens.stressTest.internal.TestSetupCall.call()Lcom/vmlens/stressTest/internal/Result; (45 bytes) @ 0x741ab340 [0x741ab250+0xf0]

Conclusion

Data races are hard to reproduce. You need different types of hardware. And even then it takes time till an error occurs.

That is the reason I developed vmlens, a tool to detect data races in the execution trace of an application.

java.util.concurrent.locks.ReentrantLock Cheat Sheet

Mon, 26 Sep 2016 22:00:00 GMT

The reentrant lock is a replacement for the easier-to-use synchronized statement when you need one of the following advanced techniques: lockInterruptibly, tryLock, lock coupling, multiple conditions, or fair locks. In the cheat sheet, I have summarized each technique. And in the blog post, I give a detailed description of those techniques.

Like synchronized, this lock is reentrant, which means the same thread can acquire the lock multiple times.

Lock Interruptible

private final ReentrantLock lock = new ReentrantLock();
   public void m() throws InterruptedException {
     lock.lockInterruptibly();  
     try {
       // ... method body
     } finally {
       lock.unlock();
     }
   }

lockInterruptibly throws an InterruptedException when another thread calls interrupt on the waiting thread. This allows you to interrupt the processing even while trying to acquire a lock, something which is not possible with the synchronized statement.
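The effect can be demonstrated with a small sketch: one thread holds the lock, a second thread waits in lockInterruptibly and is then interrupted. The class LockInterruptiblyExample and its method are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class LockInterruptiblyExample {
    // Returns true if the waiting thread received an InterruptedException
    // while trying to acquire the lock.
    static boolean interruptWaitingThread() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean interrupted = new AtomicBoolean(false);

        lock.lock(); // the calling thread holds the lock, so the waiter cannot get it
        try {
            Thread waiter = new Thread(() -> {
                try {
                    lock.lockInterruptibly();
                    try {
                        // ... method body, never reached in this sketch
                    } finally {
                        lock.unlock();
                    }
                } catch (InterruptedException e) {
                    interrupted.set(true);
                }
            });
            waiter.start();
            waiter.interrupt(); // cancel the pending lock acquisition
            waiter.join();
        } finally {
            lock.unlock();
        }
        return interrupted.get();
    }
}
```

With a synchronized block in place of lockInterruptibly, the waiting thread would stay blocked until the monitor is released, regardless of the interrupt.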

Try Lock

private final ReentrantLock lock = new ReentrantLock();  
   public boolean m() throws InterruptedException {
     if(! lock.tryLock(2 , TimeUnit.MILLISECONDS ) )
     {
       return false; 
     }
     try {
       // ... method body
     } finally {
       lock.unlock();
     }
     return true;
   }

Like lockInterruptibly, tryLock with a timeout throws an InterruptedException when the thread is interrupted by another thread. Additionally, it lets you specify how long to try to acquire the lock. This is useful when you have a task which is only valid for a specific amount of time. You can also use it to avoid deadlocks when you need to hold two locks at the same time: acquire both locks with tryLock. If you cannot acquire one of them within the given interval, release any lock you already hold, wait a little and try again.
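The two-lock recipe can be sketched as follows. The classes LockedAccount and TryLockTransfer are hypothetical, and the 2 millisecond timeout and the random back-off of 1 to 4 milliseconds are arbitrary values chosen for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account class used only for this sketch.
class LockedAccount {
    final ReentrantLock lock = new ReentrantLock();
    int balance; // only accessed while both involved locks are held

    LockedAccount(int balance) {
        this.balance = balance;
    }
}

class TryLockTransfer {
    // Acquires both locks with tryLock; on failure it releases what it holds,
    // backs off for a short random time and retries.
    static void transfer(LockedAccount from, LockedAccount to, int amount) throws InterruptedException {
        while (true) {
            if (from.lock.tryLock(2, TimeUnit.MILLISECONDS)) {
                try {
                    if (to.lock.tryLock(2, TimeUnit.MILLISECONDS)) {
                        try {
                            from.balance -= amount;
                            to.balance += amount;
                            return;
                        } finally {
                            to.lock.unlock();
                        }
                    }
                } finally {
                    from.lock.unlock();
                }
            }
            // Random back-off so that two threads do not retry in lockstep.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 5));
        }
    }

    private static void repeat(LockedAccount from, LockedAccount to, int times) {
        try {
            for (int i = 0; i < times; i++) {
                transfer(from, to, 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Two threads transfer in opposite directions; returns the sum of both balances.
    static int runOpposingTransfers(int transfersPerThread) throws InterruptedException {
        LockedAccount first = new LockedAccount(1000);
        LockedAccount second = new LockedAccount(1000);
        Thread a = new Thread(() -> repeat(first, second, transfersPerThread));
        Thread b = new Thread(() -> repeat(second, first, transfersPerThread));
        a.start();
        b.start();
        a.join();
        b.join();
        return first.balance + second.balance;
    }
}
```

Unlike nested synchronized blocks, the two threads here never wait for each other forever; in the worst case both give up, sleep and try again.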

Lock Coupling

The reentrant lock allows you to unlock a lock immediately after you have successfully acquired another lock, leading to a technique called lock coupling. This can, for example, be used to lock a linked list. You can see an implementation here.
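A minimal sketch of lock coupling on a sorted linked list could look like this. The class CoupledList is a hypothetical example written for this cheat sheet; it is not the implementation linked above. Every node has its own lock, and a traversal acquires the lock of the next node before it releases the lock of the previous one.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sorted linked list of ints; a sketch of hand-over-hand locking.
class CoupledList {
    private static final class Node {
        final int value;
        Node next; // only read or written while this node's lock is held
        final ReentrantLock lock = new ReentrantLock();

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null); // sentinel

    // Inserts value at its sorted position; at most two node locks are held at a time.
    void add(int value) {
        Node prev = head;
        prev.lock.lock();
        try {
            Node curr = prev.next;
            while (curr != null && curr.value < value) {
                curr.lock.lock();   // acquire the next lock ...
                prev.lock.unlock(); // ... then release the previous one
                prev = curr;
                curr = prev.next;
            }
            prev.next = new Node(value, curr);
        } finally {
            prev.lock.unlock();
        }
    }

    boolean contains(int value) {
        Node prev = head;
        prev.lock.lock();
        try {
            Node curr = prev.next;
            while (curr != null && curr.value < value) {
                curr.lock.lock();
                prev.lock.unlock();
                prev = curr;
                curr = prev.next;
            }
            return curr != null && curr.value == value;
        } finally {
            prev.lock.unlock();
        }
    }

    int size() {
        int count = 0;
        Node prev = head;
        prev.lock.lock();
        try {
            while (prev.next != null) {
                Node curr = prev.next;
                curr.lock.lock();
                prev.lock.unlock();
                prev = curr;
                count++;
            }
            return count;
        } finally {
            prev.lock.unlock();
        }
    }
}
```

Because the locks are released as the traversal moves on, several threads can work on different parts of the list at the same time. This is not possible with synchronized blocks, which can only be released in the reverse order of their acquisition.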

Multiple Conditions

Create:

Condition notFull = lock.newCondition();

Wait for Condition to become true

lock.lock();
try {
   while( condition not fulfilled )
   {
        notFull.await();
   }
}
finally {
   lock.unlock();
}

Signal that condition has become true

lock.lock(); try { notFull.signal(); } finally { lock.unlock(); }

Multiple conditions are useful when you need to wait for different conditions. ArrayBlockingQueue, for example, uses two conditions: one to wait until the queue is no longer full, the other to wait until the queue is no longer empty. Be careful not to mix the Condition methods await and signal with the java.lang.Object methods wait and notify.
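Put together, the create, wait and signal steps give a bounded buffer with two conditions. The class BoundedBuffer is a hypothetical sketch in the spirit of ArrayBlockingQueue, not its actual source code.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded buffer with one condition per waiting reason.
class BoundedBuffer<E> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final ArrayDeque<E> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    void put(E item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();   // releases the lock while waiting
            }
            items.addLast(item);
            notEmpty.signal();     // wake up one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    E take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            E item = items.removeFirst();
            notFull.signal();      // wake up one waiting producer
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

With a single monitor and notifyAll, producers and consumers would wake each other up needlessly; two conditions let each side signal only the threads that can actually make progress.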

Fair locks

private final ReentrantLock lock = new ReentrantLock(true);

Threads get the lock in the order they requested it. This has a high performance penalty; see the following benchmark for details.

Benchmark

The figure shows the throughput of synchronized vs. reentrant lock in fair and unfair mode for different thread counts. The benchmark was run on JDK 8 on an Intel i5 4 core CPU using jmh. The source of the benchmark can be downloaded here.

A new high throughput java executor service

Sun, 11 Sep 2016 22:00:00 GMT

The vmlens executor service is a high throughput executor service. It achieves three times higher throughput than the standard JDK executor service. The tradeoff is that the latency is much higher than that of the standard JDK executor service. Here are my benchmark results:

The figure shows the throughput of the vmlens executor Service compared to the standard JDK executor service for different threads on JDK 8.

The figure shows the latency of the vmlens executor service compared to the standard JDK executor service for different thread counts on JDK 8. Both benchmarks were run with jmh on an Intel i5 4 core CPU. The source of the benchmark can be downloaded here.

A wait-free algorithm for writing

The basic idea is that writing should be as fast as possible. The vmlens executor service uses a thread-local field to store its last written queue node. So writing consists of creating a new QueueNode, writing to a thread-local field and writing to the volatile QueueNode next field.

public class QueueManyWriters {
	private final ThreadLocal lastWrittenQueueNode = new ThreadLocal();
	public void accept(E element)
	{
		if( dispatcherThread.stop )
		{
			throw new RejectedExecutionException();
		}
		QueueNode current = new QueueNode(element);
		if( lastWrittenQueueNode.get() == null )
		{
			writingThreads.append(current,Thread.currentThread().getId());
			lastWrittenQueueNode.set(new LastWrittenQueueNode(current));
		}
		else
		{
			lastWrittenQueueNode.get().last.next = current;
			lastWrittenQueueNode.get().last = current;
		}
	}
	...
}
public class QueueNode {
	final E element;
	volatile QueueNode next;
	public QueueNode(E element) {
		super();
		this.element = element;
	}
}

Reading is done by a single dispatcher thread. The dispatcher collects the tasks and pushes them to one of the worker threads. For collecting the tasks the dispatcher thread uses a local list. The list contains the last read element for each writing thread.
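The interplay between one writing thread and the dispatcher can be sketched with a single chain of nodes. The class WriterChain and its members are hypothetical names chosen for this sketch, which assumes exactly one writing thread and one reading thread per chain; it is not the vmlens source code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the node chain of ONE writing thread.
class WriterChain<E> {
    static final class Node<E> {
        final E element;
        volatile Node<E> next; // written by the writer thread, read by the dispatcher

        Node(E element) {
            this.element = element;
        }
    }

    private final Node<E> stub = new Node<>(null);
    private Node<E> lastWritten = stub; // touched only by the writing thread
    private Node<E> lastRead = stub;    // touched only by the dispatcher thread

    // Writer side: no lock and no retry loop, just one volatile write.
    void append(E element) {
        Node<E> node = new Node<>(element);
        lastWritten.next = node;
        lastWritten = node;
    }

    // Dispatcher side: follow the next pointers starting at the last node already read.
    List<E> drain() {
        List<E> result = new ArrayList<>();
        Node<E> node = lastRead.next;
        while (node != null) {
            result.add(node.element);
            lastRead = node;
            node = node.next;
        }
        return result;
    }
}
```

Since every node is fully constructed before the volatile write to next makes it reachable, the dispatcher never sees a half-initialized node.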

Conclusion

The wait-free algorithm for writing leads to a three times higher throughput than the standard JDK executor service. The vmlens executor service is used in vmlens, a tool to test multithreaded applications, for asynchronously writing events to disk. Whatever type of executor service you use, you should test the multithreaded part of your application. Read more about testing multithreaded java code here.

Performance Improvements of Contended Java Monitor from JDK 6 to 9

Sun, 14 Aug 2016 22:00:00 GMT

The new JDK™ 9 early access release contains a JDK enhancement proposal JEP 143, Improve Contended Locking, to improve the performance of contended monitors. Monitors are used by the java synchronized statement to lock the access to a code block. If the synchronized block is called by many threads, the monitor becomes contended. This can degrade the performance dramatically. So let us look at the performance improvements of contended monitors.

The graphic shows the time of one method call. Lower means better performance. The test consists of 8 threads accessing a synchronized block. All threads are accessing the same monitor. You can download the test here. The test was run on an Intel i5 4 core CPU. As we see, JDK 9 improves the performance of contended monitors. Let us now look at a direct comparison between JDK 8 and 9.

Comparison between JDK 8 and JDK 9

The following shows how much the switch from JDK 8 to JDK 9 will bring.

The graphic shows the time of one method call for a contended monitor at different thread counts. As we see, the performance of JDK 9 degrades much more slowly than the performance of JDK 8. For 16 threads JDK 8 needs 2580 ns while JDK 9 only needs 1655 ns. JDK 9 therefore needs about 36 percent less time per call; put the other way round, JDK 8 takes about 56 percent longer. One piece of advice to improve the performance is to use reentrant locks instead of synchronized blocks. So let us see if this advice is still true for JDK 9.

Reentrant Locks vs synchronized

Let us look at the performance of a contended reentrant lock vs a contended monitor of a synchronized block.

The graphics show the time of one method call at different threads. As we see in JDK 9 the performance of the synchronized statements gets almost as fast as reentrant locks.

Conclusion

As we have seen, JDK 9 improves the performance of contended monitors. In JDK 9 contended monitors are almost as fast as contended reentrant locks. But JDK 9 also has a JDK enhancement proposal JEP 285, Spin-Wait Hints, to improve the performance of locks. I will look at this in the next article on this blog. If you have a question or remark please add a comment below.

Detecting Java Race Conditions With Tests, Part 2

Tue, 26 Jul 2016 22:00:00 GMT

If you update a field from different threads, you must make sure that between the read and the write from one thread the field is not updated by another thread. You can achieve this by using a synchronized block or the atomic compareAndSet method. But how can you test it?

Example of a Test

Let us look at an example of an incorrectly synchronized class:

class Account {
	private int amount = 0;
	public synchronized int getAmount() {
		return amount;
	}
	public synchronized void setAmount(int amount) {
		this.amount = amount;
	}
}

To test this class we use the following junit test case:

@RunWith(ConcurrentTestRunner.class)
public class TestAccount {
	private final Account account = new Account();
    private final static int THREAD_COUNT = 2;
	@Test
	@ThreadCount(THREAD_COUNT)
	public void testAdd() {
		account.setAmount(  account.getAmount() + 20  );
	}
	@After
	public void checkBalance()
	{
		assertEquals( "" , THREAD_COUNT * 20 , account.getAmount()  ); 
	}
}

This junit test uses concurrent-junit to run the test in parallel. The test sometimes succeeds and sometimes fails. Whenever both threads call the get method before either of them calls the set method, the sum is incorrect. In the vmlens Method Explorer you can see the order of the two methods. In case of an error you will see the following:

The Solution: Waitpoints

For a test this is rather inconvenient. If a test fails, which is bad enough, it should at least always fail. To achieve this, a thread needs to wait for the other threads before calling the set method. In vmlens this can be done by using a “waitpoint”. You can set waitpoints at the beginning of a monitor block or a volatile field access. If we set the waitpoint at the synchronized block of the set method, the test always fails:

Conclusion

In the first part you have seen how to detect lost updates with tests. This second part shows how to detect non-atomic updates. In the next part we will see how to use this technique to test for thread safety. If you have any questions or remarks please comment below.
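The atomic compareAndSet method mentioned at the start of this post could be used to repair the Account class roughly as follows. The class AtomicAccount and the helper runAdds are hypothetical names; the sketch replaces the separate get and set calls with a single add method.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fix for the Account class: one atomic add instead of get followed by set.
class AtomicAccount {
    private final AtomicInteger amount = new AtomicInteger();

    int getAmount() {
        return amount.get();
    }

    // Retries until no other thread has changed the amount between the read and the write.
    int add(int delta) {
        while (true) {
            int current = amount.get();
            int next = current + delta;
            if (amount.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Every thread adds 20 once; returns the final amount.
    static int runAdds(int threads) throws InterruptedException {
        AtomicAccount account = new AtomicAccount();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> account.add(20));
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return account.getAmount();
    }
}
```

With this class the final amount is always the thread count times 20, independent of the thread interleaving.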

5 Tips to make your classes thread safe

Tue, 26 Jul 2016 22:00:00 GMT

While testing vmlens, a tool to find data races in java applications, on open source projects, I found the following 5 tips for making classes thread safe.

1) Declare immutable member variables as final

Always declare immutable member variables as final. This makes sure that your class behaves correctly independent of how it is used. Take for example the field fieldAccessor in the class java.lang.reflect.Field.

 
   private FieldAccessor fieldAccessor; 
   private FieldAccessor getFieldAccessor(Object obj)
        throws IllegalAccessException
    {
        boolean ov = override;
        FieldAccessor a = (ov) ? overrideFieldAccessor : fieldAccessor;
        return (a != null) ? a : acquireFieldAccessor(ov);
    }

Since it is neither synchronized nor declared volatile, a thread reading this field might not see a completely initialized object, as described in DoubleCheckedLocking. But since the type of the created object, sun.reflect.UnsafeQualifiedIntegerFieldAccessorImpl, only uses final fields, there is no problem. Threads reading this field will always see a fully initialized object or null.

2) Create objects eagerly

Using final fields forces you to initialize your objects in the constructor. Lazy initialization of your objects, on the other hand, is almost never a good idea in concurrent programs.

Take for example the old version of org.apache.commons.lang.StringEscapeUtils. It uses the lazily initialized class org.apache.commons.lang.Entities$LookupEntityMap:

       private String[] lookupTable() {
            if (lookupTable == null) {
                createLookupTable();
            }
            return lookupTable;
        }

This only works with locks or synchronization. The new version, org.apache.commons.lang3.StringEscapeUtils, is much better: it eagerly creates the lookup tables and also uses a final field.

 public static final CharSequenceTranslator ESCAPE_XML10 =
        new AggregateTranslator(
            ...
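As a minimal sketch of eager creation, the lookup table below is built once during class initialization and stored in a final field, so no lock is needed for reading. The class XmlEntities and its three entries are made up for illustration and are not taken from commons-lang.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class XmlEntities {
    // Created eagerly when the class is initialized and never modified afterwards.
    private static final Map<Character, String> ESCAPES = createEscapes();

    private static Map<Character, String> createEscapes() {
        Map<Character, String> map = new HashMap<>();
        map.put('<', "&lt;");
        map.put('>', "&gt;");
        map.put('&', "&amp;");
        return Collections.unmodifiableMap(map);
    }

    // Only reads the table, so it can be called from any number of threads.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            String replacement = ESCAPES.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }
}
```

The JVM runs class initialization exactly once and makes its result visible to all threads, which is what makes the eager variant safe without further synchronization.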

3) Use volatile for mutable boolean variables

Mutable boolean fields are often used for controlling the flow of your application. For example to control the life cycle of a thread the following pattern can be used:

private volatile boolean isWorking;
while(isWorking)
{
  // do something
}

Use a volatile field to make the changes done in one thread visible in other threads.
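The fragment above can be turned into a small runnable sketch. The class name StoppableWorker and the helper startAndStop are assumptions made for this example.

```java
class StoppableWorker implements Runnable {
    // volatile: the write in stop() becomes visible to the thread executing run()
    private volatile boolean isWorking = true;

    @Override
    public void run() {
        while (isWorking) {
            // do something
        }
    }

    void stop() {
        isWorking = false;
    }

    // Starts a worker, stops it and reports whether the thread ended within the timeout.
    static boolean startAndStop(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        worker.stop();
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

Without the volatile modifier the loop is allowed to keep reading a stale value of isWorking, so the worker might never stop.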

4) Check 3rd party classes

A typical example of not doing so is the use of the non-thread-safe java.util.Date as a member variable without synchronization. Therefore always check whether a class is documented as thread safe. If it is not, chances are high that it is not thread safe.

5) Test

Like all other features of your application, concurrency must be tested. In my next blog post I will write about how to test concurrency. In the meantime you can give vmlens a try, which helps you to detect data races during testing.

3 Synchronization idioms

Tue, 26 Jul 2016 22:00:00 GMT

Signal

Intent

Change the behavior of a thread based on an event in another thread.

Motivation

In GUI frameworks like Swing, the event thread handles all user events. Some events do not depend on the current state of other threads. An example is the cancel event for a long running operation. To notify worker threads, an event handler uses the signal idiom.

Structure

The Signal idiom consists of a volatile field and at least one thread reading and one thread writing this field. The field must never be read and written in the same thread.

Sample Code

public class WorkerThread extends Thread {
	public volatile boolean canceled;
	public void run()
	{
		while( ! canceled )
		{
			// do some work
		}
	}
}

Snapshot

Intent

See a consistent state of an object.

Motivation

A thread needs to make a long running computation with data which is potentially changed by other threads in the middle of the computation. An example is the deployment descriptor of a web application in a web server. Through automatic reloads the deployment descriptor might change in the middle of the processing of a web request. To see a consistent state the web request processing thread uses a snapshot of the deployment descriptor.

Structure

The Snapshot idiom consists of a class holding the current snapshot. A thread reading values gets the current snapshot and reads values from this snapshot. A thread writing values clones the current snapshot, changes this copy and sets the clone as the new current snapshot.

Sample Code

public class CopyOnWriteArrayList<E> {
	  final transient ReentrantLock lock = new ReentrantLock();
	  // use a volatile field for the snapshot reference
	  private transient volatile Object[] array;
	  final Object[] getArray() {
	        return array;
	   }
	 final void setArray(Object[] a) {
	        array = a;
	   }
	    public boolean add(E e) {
	        final ReentrantLock lock = this.lock;
	        lock.lock();
	        try {
	            Object[] elements = getArray();
	            int len = elements.length;
	            // clone the object
	            Object[] newElements = Arrays.copyOf(elements, len + 1);
	            // change the local copy
	            newElements[len] = e;
	            // set the copy as new snapshot
	            setArray(newElements);
	            return true;
	        } finally {
	            lock.unlock();
	        }
	    }
	    public void forEach(Consumer<? super E> action) {
	        if (action == null) throw new NullPointerException();
	        // work with the current snapshot
	        Object[] elements = getArray();
	        int len = elements.length;
	        for (int i = 0; i < len; ++i) {
	            @SuppressWarnings("unchecked")
	            // all read operations are done on this snapshot 
	            E e = (E) elements[i];
	            action.accept(e);
	        }
	    }
}

Taken from java.util.concurrent.CopyOnWriteArrayList. Comments are mine.
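Applied to the deployment descriptor from the motivation, the idiom might look like the following sketch. The class DeploymentDescriptor and its methods are hypothetical, and a synchronized method takes the place of the ReentrantLock used above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class DeploymentDescriptor {
    // volatile reference to the current, never modified snapshot
    private volatile Map<String, String> current = Collections.emptyMap();

    // Readers take the snapshot once and do all their reads on it.
    Map<String, String> snapshot() {
        return current;
    }

    // Writers copy the current snapshot, change the copy and publish it.
    synchronized void set(String key, String value) {
        Map<String, String> copy = new HashMap<>(current);
        copy.put(key, value);
        current = Collections.unmodifiableMap(copy);
    }
}
```

A request that called snapshot() before a reload keeps working with the old values until it asks for a new snapshot.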

Put if absent

Intent

Get an object out of a map in a multithreaded environment. Create it if it does not exist.

Motivation

You have a map of objects which is accessed from many threads. Each thread behaves the same, checking if a value exists for a key, if not creating the value. You have many concurrent reads. A read should not be blocked by a write to a different key. For example in a web application the language specific formats are stored in a map, using the language as key. Each worker thread checks if the language specific formats are available in the map, if not the thread creates a new one.

Structure

A thread tries to get a value for a key calling get on a concurrent map. If the “get” method returns null it creates the missing value and calls putIfAbsent.

Sample Code

public Set<String> getLanguageTagSet(String category) {
    // get the value
    Set<String> tagset = langtagSets.get(category);
    // if value is null create one
       if (tagset == null) {
            tagset = createLanguageTagSet(category);
            // call putIfAbsent
            Set<String> ts = langtagSets.putIfAbsent(category, tagset);
            // if putIfAbsent returns a value, another thread has created a value in between
            if (ts != null) {
                tagset = ts;
            }
     }
     return tagset;
    }

Taken from sun.util.locale.provider.JRELocaleProviderAdapter. Comments are mine.
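A self-contained variant of the idiom, for experimenting, could look like this. LanguageTagCache, getTagSet and the placeholder entry are assumptions for this sketch and are not part of the JDK class quoted above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LanguageTagCache {
    private final ConcurrentMap<String, Set<String>> tagSets = new ConcurrentHashMap<>();

    Set<String> getTagSet(String category) {
        // get the value
        Set<String> tagSet = tagSets.get(category);
        if (tagSet == null) {
            // create a candidate; another thread may be doing the same right now
            Set<String> created = ConcurrentHashMap.newKeySet();
            created.add(category + "-default"); // placeholder content
            // only the first candidate is stored, all later ones are discarded
            Set<String> existing = tagSets.putIfAbsent(category, created);
            tagSet = (existing != null) ? existing : created;
        }
        return tagSet;
    }
}
```

All callers end up with the same instance per key, although several candidates may be created and thrown away. Since Java 8, computeIfAbsent on a ConcurrentHashMap expresses the same intent in one call and applies the creating function at most once per key.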

Conclusion

Each synchronization idiom can only be used for a specific access pattern. Using it outside this access pattern will lead to race conditions in your application. Therefore always use a tool like vmlens to detect race conditions during development and testing.

How to test if your multi threaded java rest service is thread safe?

Sun, 17 Jul 2016 22:00:00 GMT

A not thread safe counter

As an example we use the following jersey rest service. It consists of a resource which increments a counter for each post call and returns the new value:

@Path("counter")
public class Counter {
     private static int i = 0;
    @POST
    @Produces(MediaType.TEXT_PLAIN)
    public String addOne() {
    	return new Integer(i++).toString();
    }
}

This is clearly not thread safe. The access to the variable i is not synchronized, which leads to a race condition if the method "addOne" is called from multiple threads in parallel. Let us see if we can detect this bug with a test.

How to test

To test if this service is thread safe, we need a multi threaded test, like the following:

@RunWith(ConcurrentTestRunner.class)
public class CounterTest {
    private HttpServer server;
    private WebTarget target;
    @Before
    public void setUp() throws Exception {
        server = Main.startServer();
        Client c = ClientBuilder.newClient();
        target = c.target(Main.BASE_URI);
    }
    @After
    public void tearDown() throws Exception {
        server.shutdown();
    }
    @Test
    public void testAddOne() {	
    		String responseMsg = target.path("counter").request().post(Entity.json(null) , String.class);
    		/*
    		 * 
    		 * Checking the responseMsg left out for brevity...
    		 * 
    		 */  
    }
}

The concurrent test runner runs the test method in parallel from multiple threads. To detect race conditions, we need a tool that can detect them during tests. One such tool is vmlens. We can enable it by adding the vmlens agent path to the VM arguments. After the run we will see the race condition in vmlens:

After we have found the race condition, we want to fix the race.

Making the rest service thread safe

The easiest way to do this is to use a java.util.concurrent.atomic.AtomicInteger. AtomicInteger uses a volatile field internally, making updates visible to all threads. And the method used here, "incrementAndGet", is made atomic by using compareAndSet.

@Path("counter")
public class Counter {
     private static AtomicInteger i = new AtomicInteger();
    @POST
    @Produces(MediaType.TEXT_PLAIN)
    public String addOne() {
    	return new Integer(i.incrementAndGet()).toString();
    }
}
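Independent of Jersey, the effect of the fix can be checked with a few plain threads. This is only a sketch: the class AtomicCounterCheck and its method are hypothetical, and it exercises an AtomicInteger directly instead of going through the REST resource.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterCheck {
    // Increments one shared counter from several threads and returns the final value.
    static int incrementFromThreads(int threadCount, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.incrementAndGet();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With AtomicInteger the result is always threadCount * incrementsPerThread. With the plain int field of the first version, the same loop can lose increments.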

Conclusion

To test a multi threaded java rest service we need two things, a multi threaded test and a tool which can detect java race conditions. For the multi threaded test I used concurrent-junit, for the race condition detection I used vmlens. If you have a question or remark please add a comment below.

5 Ways to thread safe update a field in java

Fri, 24 Jun 2016 22:00:00 GMT

Here are 5 ways to update a field in Java in a thread-safe way. But before we start, what do you have to look out for? If you access a field from many threads, you must make sure that:

1. changes are made visible to all threads, and
2. the value is not changed during the update by the other thread, and
3. reading threads do not see the inconsistent intermediate state.

You can achieve this by one of the following 5 ways:

1) volatile Field

When to use?

You can use a volatile field when you have only one thread updating and many threads reading a single-valued field. Or use it, when the writing threads do not read the field. You can use it only for single valued fields like boolean or int. If you want to update object graphs or collections, use copy on write, as described below, instead.

Example

The following example code shows a worker thread, which stops processing based on a volatile field. This makes it possible, that other threads, like an event dispatching thread, stop the worker thread.

public class WorkerThread extends Thread {
	private volatile boolean canceled = false;
	public void cancelThread() {
		this.canceled = true;
	}
	@Override
	public void run() {
		while( ! canceled )
		{
			// Do Some Work		
		}		
	}
}

How does it work?

Declaring the field volatile makes changes made by one thread visible to all other threads. Since the writing thread does not read the value, point 2, “the value is not changed during the update by the other thread”, is fulfilled. Since the field is a single value, point 3, “reading threads do not see the inconsistent intermediate state”, is also fulfilled.

How to test?

By using vmlens, an Eclipse plugin to test multi-threaded software and to detect Java race conditions, we can find fields that should be declared volatile. After declaring the field volatile, we can check in the “order of event” view of vmlens that the field is correctly read and written:

2) copy on write

When to use?

Use copy on write, if you want to update a graph of objects or a collection and the threads mostly read and only rarely update.

Example

The following shows the add and get methods from java.util.concurrent.CopyOnWriteArrayList:

private transient volatile Object[] array;
final Object[] getArray() {
  return array;
}
public boolean add(E e) {
  final ReentrantLock lock = this.lock;
  lock.lock();
  try {
    Object[] elements = getArray();
    int len = elements.length;
    Object[] newElements = Arrays.copyOf(elements, len + 1);
    newElements[len] = e;
    setArray(newElements);
    return true;
  } finally {
    lock.unlock();
  }
}
public E get(int index) {
  return get(getArray(), index);
}

How does it work?

Again declaring the field volatile makes changes made by one thread visible to the other threads. By using a lock around the updating method, we make sure that the value is not changed during the updating process. Since we copy the data before changing it, reading threads do not see the inconsistent intermediate state.
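The snapshot behavior can be observed from the outside with an iterator. In this small sketch the class SnapshotIteration is made up for the example, and it relies on the documented behavior that an iterator of CopyOnWriteArrayList works on the array as it was when the iterator was created.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    // Returns how many elements an iterator sees that was created before a later add.
    static int countSeenByOldIterator() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");
        Iterator<String> iterator = list.iterator(); // holds the current array
        list.add("c");                               // installs a new array
        int seen = 0;
        while (iterator.hasNext()) {
            iterator.next();
            seen++;
        }
        return seen;
    }
}
```

The iterator sees two elements and does not throw a ConcurrentModificationException, because the add never touches the array the iterator is reading.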

How to test?

We can test this by using a multithreaded test and adding a wait point inside vmlens at the read of the field.

3) lock based atomic update

When to use?

Use locks, when updates and reads happen equally often. Use it until the lock becomes a bottleneck and you need the more performant solution compareAndSet as described below.

Example

The following example shows a lock based counter:

public class LockBasedCounter {
	private int i = 0;
	public synchronized void addOne()
	{
		i++;
	}
	public synchronized int get()
	{
		return i;
	}
}

How does it work?

The synchronized methods make sure that the changes made by one thread are seen by the other threads. Since only one thread can execute the methods protected by the lock at a given time, the value cannot be changed during the update by another thread and the other threads cannot see an intermediate inconsistent state.

How to test?

We can test this by using a multi-threaded test and adding a wait point at the updating method.

4) compare And Set based atomic update

When to use?

Use compareAndSet, when the lock in the solution described above becomes a bottleneck. Or use it, if there exists a ready-made solution, as for example the AtomicInteger as shown below.

Example

The following implements a counter based on AtomicInteger.

public class AtomicIntegerCounter {
	private final AtomicInteger i = new AtomicInteger();
	public void addOne()
	{
		i.incrementAndGet();
	}	
	public int get()
	{
		return i.get();
	}
}

AtomicInteger uses compareAndSet internally in the incrementAndGet method:

public final int incrementAndGet() {
  for (;;) {
    int current = get();
    int next = current + 1;
    if (compareAndSet(current, next))
      return next;
  }
}

How does it work?

Again, declaring the field volatile (AtomicInteger holds its value in a volatile field) makes changes made by one thread visible to the other threads. You optimistically calculate the new value and only set the calculated value when the value of the field is still the same as at the beginning of the calculation. Thereby you make sure that the value is only written if it was not changed by another thread. If your field points to a collection or a graph of objects, you must create a copy before your update, similar to copy on write.
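For a field pointing to a collection, the copy and the compareAndSet loop can be combined as in the following sketch. The class CasList and its methods are hypothetical, and an AtomicReference takes the role of the volatile field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class CasList {
    private final AtomicReference<List<String>> ref =
            new AtomicReference<>(Collections.<String>emptyList());

    void add(String value) {
        for (;;) {
            List<String> current = ref.get();
            // work on a copy, the published list is never modified
            List<String> next = new ArrayList<>(current);
            next.add(value);
            // publish only if no other thread replaced the list in between
            if (ref.compareAndSet(current, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }

    List<String> get() {
        return ref.get();
    }

    // Adds values from several threads and returns the final size.
    static int addFromThreads(int threadCount, int addsPerThread) {
        CasList list = new CasList();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < addsPerThread; n++) {
                    list.add("value");
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.get().size();
    }
}
```

A thread that loses the compareAndSet simply retries with a fresh copy, so no add is lost. The price is one copy of the list per attempt.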

How to test?

We can test this by using a multi-threaded test and adding a wait point in vmlens at the compareAndSet method.

5) Benign Data Race

When to use?

Only use this when you can sacrifice correctness for performance.

Example

The following example shows a counter used to switch between different implementations in the class sun.reflect.NativeMethodAccessorImpl:

class NativeMethodAccessorImpl extends MethodAccessorImpl {
  private Method method;
  private DelegatingMethodAccessorImpl parent;
  private int numInvocations;
  NativeMethodAccessorImpl(Method method) {
    this.method = method;
  }
  public Object invoke(Object obj, Object[] args)
      throws IllegalArgumentException, InvocationTargetException
  {
    if (++numInvocations > ReflectionFactory.inflationThreshold()) {
      MethodAccessorImpl acc = (MethodAccessorImpl)
          new MethodAccessorGenerator().
              generateMethod(method.getDeclaringClass(),
                             method.getName(),
                             method.getParameterTypes(),
                             method.getReturnType(),
                             method.getExceptionTypes(),
                             method.getModifiers());
      parent.setDelegate(acc);
    }
    return invoke0(method, obj, args);
  }
  ...
}

How does it work?

This way of updating the field guarantees neither that changes are visible in other threads nor that other threads do not change the field during an update. But sometimes, as in the above example, you can live with incorrect results for higher performance.

How to test?

You can test this only with a single threaded test since multiple threads lead to non-deterministic behavior.

Conclusion

Which of the 5 ways to update a field in a thread-safe way you use depends on your performance and safety requirements. Whichever way you use, you should test it. Read more about unit testing multi-threaded software with vmlens and concurrent-junit in a new way to junit test your multithreaded java code. If you have a question or remark, please add a comment below.

Detecting Java Race Conditions With Tests Part 1

Thu, 10 Mar 2016 23:00:00 GMT

Detecting java race conditions is hard, detecting them during tests is impossible. Really?

In the following I want to show you that it is actually rather easy. In this first part you will see how to detect lost updates, the first type of Java race conditions. In the second part we will look at the second type of Java race conditions: non-atomic access.

So what are visibility problems anyway?

If you read and write to the same field from different threads without synchronization, you will lose updates.

Why? In multi-core computers every core has a cache. If you write to a field, a thread running on another core may still see the old value, whereas a thread running on the same core sees the new value. Only if the cache was invalidated will every thread always see the new value. Without invalidating the cache you lose updates. And invalidating the cache is what a synchronization statement or a write to a volatile field does.

O.K. so what can I do?

Simply run a multithreaded test. Afterwards check if each concurrent field access is correctly synchronized. The Java memory model formally specifies which statements correctly synchronize a field access. For example, the following is correctly synchronized since a thread start lies between the write and the read:

public class ThreadStart extends Thread {	
	private int i = 0;
	public static void main(String[] args    )
	{
		ThreadStart threadStart = new ThreadStart();
		threadStart.i = 8;	
		/*
		 * 
		 * see https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.4
		 * An action that starts a thread synchronizes-with the first action in the thread 
		 * it starts. 
		 */
		threadStart.start();
	}
	@Override
	public void run() {
	   int j =  i;
	}
}

So let’s get started with a simple example, a counter:

public class Counter {
	private int i = 0;
	public void addOne()
	{
		i++;
	}	
}

To test the counter, we use the following junit test. The concurrent test runner runs the test method from four different threads.

import org.junit.Test;
import org.junit.runner.RunWith;
import com.anarsoft.vmlens.concurrent.junit.ConcurrentTestRunner;
@RunWith(ConcurrentTestRunner.class)
public class TestCounter {
	private final Counter counter = new Counter();
	@Test
	public void testAdd()
	{
		counter.addOne();
	}
}

To trace the fields and synchronization actions, add the vmlens agent path to the virtual machine arguments. After the run, vmlens checks all field accesses. If vmlens finds a field access which is not correctly synchronized, we have detected a java race condition.

It is not a bug, it is a feature

Probably you have noticed the detected java race condition accessing the field numInvocations in the class sun.reflect.NativeMethodAccessorImpl. If we look at the source code, we see that it is used to switch to a faster implementation, if a threshold is reached.

public Object invoke(Object obj, Object[] args)
         throws IllegalArgumentException, InvocationTargetException
{
      if (++numInvocations > ReflectionFactory.inflationThreshold()) {
            MethodAccessorImpl acc = (MethodAccessorImpl)
                new MethodAccessorGenerator().
                    generateMethod(method.getDeclaringClass(),
                                   method.getName(),
                                   method.getParameterTypes(),
                                   method.getReturnType(),
                                   method.getExceptionTypes(),
                                   method.getModifiers());
            parent.setDelegate(acc);
        }
        return invoke0(method, obj, args);
 }

This is an example of a Java race condition which is not a bug but a feature. The performance cost of making the numInvocations count thread safe is worse than losing some values.

If it is not tested it is probably broken

Since detecting java race conditions with testing is rather new, you can expect many undetected race conditions. Here is an example of race conditions during the start and stop of Jenkins, an open source continuous integration server:

So detecting lost updates with tests is possible. As test runner I used concurrent-junit, as race condition catcher I used vmlens. In the next part we will look at testing non atomic updates.

Is your java eclipse plugin thread safe?

Fri, 11 Dec 2015 23:00:00 GMT

Or does it contain race conditions?

According to wikipedia:

A race condition happens when the outcome of the program depends on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended

We searched inside Eclipse for race conditions to see which types of race conditions are most common inside Eclipse and its plugins.

The following types of race conditions were found by vmlens inside Eclipse Luna during startup and debugging of a Java project:

No synchronization at all

The most common cause for race conditions was accessing the same field from different threads without any synchronization at all.

Object	Count
Concurrently Accessed Fields	2065
Fields Traced	27114
Monitors	7162
Locks	427
Threads	52
Volatile Fields	2836

During this run 2065 different fields were accessed by more than one thread, 4 of them without synchronization.

2836 volatile fields were used. For 3 more fields it would have been necessary to declare them as volatile. This leads to the second type of race conditions found, visibility problems.

Visibility

A field is accessed by many threads, but not declared as volatile.

The JVM does not write field updates directly into main memory but first into registers or a cache of your CPU. As long as your program is running on one core of your PC, this is not a problem. But if the threads run on different cores, they probably will not see the updates to the fields.

This problem appears most often with boolean flags. Like the terminated field of org.eclipse.equinox.internal.util.impl.tpt.timer.TimerImpl in the run method:

public void run() {
	TimerQueueNode n = null;
	while (!terminated) {
		synchronized (sync) {
			if (n == null && queue.isEmpty()) {
				try {
					sync.wait();
				} catch (Exception e) {
				}
				// todo check if isEmpty is necessary
				if (queue.isEmpty() || terminated) {
					continue;
				}
			}
		}

Conclusion

For 2065 concurrently accessed fields, vmlens found 7 race conditions. All others were correctly synchronized by 7162 monitors or declared as volatile.

Synchronized java.util.HashMap vs. java.util.concurrent.ConcurrentHashMap

Sat, 14 Nov 2015 23:00:00 GMT

Using java.util.HashMap from many threads without any synchronization leads to race conditions. So we need some kind of synchronization. The easiest way is to synchronize the complete class. Another way is to use a java.util.concurrent.ConcurrentHashMap. But what does this mean for the performance of the application? As an example we use a HashMap to count the occurrences of a String:

import java.math.BigInteger;
import java.security.SecureRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
@State(Scope.Benchmark)
public class MyBenchmark {
	private HashMapBasedCollection map = new HashMapBasedCollection();
	private String nextSessionId(SecureRandom random) {
		return new BigInteger(130, random).toString(32);
	}
	public String[] buildNames(int size) {
		SecureRandom random = new SecureRandom();
		String[] result = new String[size];
		for (int i = 0; i < size; i++) {
			result[i] = nextSessionId(random);
		}
		return result;
	}
	@Benchmark
	public void testMethod() {
		String[] array = buildNames(40);
		for (int j = 0; j < 200; j++) {
			for (int i = 0; i < 40; i++) {
				map.addOne(array[i]);
			}
		}
	}
}

import java.util.HashMap;
import java.util.function.BiFunction;
public class HashMapBasedCollection {
	private HashMap<String, Integer> map = new HashMap<String, Integer>();
	public static final BiFunction<String, Integer, Integer> fun
			= new BiFunction<String, Integer, Integer>()
	{
		@Override
		public Integer apply(String t, Integer u) {
			if( u == null )
			{
				return new Integer(0);
			}
			return u + 1;
		}
	};
	/**
	 *
	 * Warning: not thread safe
	 *
	 */
	public void addOne(String key)
	{
		map.compute(key, fun);
	}
}

As expected, if we run the method “testMethod()” with two threads, we see a race condition, as reported by vmlens, a tool to detect race conditions. So we need to use java.util.HashMap with synchronization:

public class SynchronizedHashMapBasedCollection {
	private HashMap<String, Integer> map = new HashMap<String, Integer>();
	public synchronized void addOne(String key)
	{
		map.compute(key, HashMapBasedCollection.fun);
	}
}

Or a java.util.concurrent.ConcurrentHashMap:

public class ConcurrentHashMapBasedCollection {
	private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
	public void addOne(String key)
	{
		map.compute(key, HashMapBasedCollection.fun );
	}
}

Below you see the throughput of the two implementations for different thread counts. That the ConcurrentHashMap performs better with many threads is probably no surprise. What surprised me is that the performance with one thread is the same. This means ConcurrentHashMap is a good alternative to a synchronized HashMap even for few threads. The benchmark was created using jmh. The race condition was detected by vmlens.
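As a side note, counting with a ConcurrentHashMap does not need a hand-written BiFunction. The following sketch uses merge, which ConcurrentHashMap performs atomically. The class MergeBasedCounter is hypothetical and was not part of the benchmark, and unlike the function above it counts the first occurrence of a key as 1.

```java
import java.util.concurrent.ConcurrentHashMap;

class MergeBasedCounter {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Stores 1 for a new key, otherwise adds 1 to the existing count.
    void addOne(String key) {
        map.merge(key, 1, Integer::sum);
    }

    int count(String key) {
        return map.getOrDefault(key, 0);
    }
}
```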
