A new way to detect deadlocks during tests

October 19, 2017

In the following, I want to show you a new way to detect deadlocks during tests. A deadlock happens when two threads are trying to acquire a lock held by the other thread. To detect a deadlock you need to reach the exact time point when both threads are waiting for the other lock. Let us look at an example:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import org.junit.Test;
import org.junit.runner.RunWith;
import com.anarsoft.vmlens.concurrent.junit.ConcurrentTestRunner;
@RunWith(ConcurrentTestRunner.class)
public class TestConcurrentHashMapCompute {
	private final ConcurrentHashMap<Integer,Integer> map = 
	    new ConcurrentHashMap<Integer,Integer>();
	public TestConcurrentHashMapCompute()
	{
		map.put(1, 1);
		map.put(2, 2);	
	}
	@Test
	public void update12()
	{
		map.compute( 1 ,   			
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 2 , 1);
						return v;
					}
				}
				);
	}
	@Test
	public void update21()
	{
              map.compute( 2 ,   			
				new BiFunction<Integer,Integer,Integer>()
				{
					public Integer apply(Integer k, Integer v) {		
						map.put( 1 , 1);
						return v;
					}
				}
				);
	}
}

The ConcurrentTestRunner runs the JUnit test methods by 4 threads in parallel. The test succeeds at least almost all the time.

One way to see the deadlock hidden in this test is to execute the test by more threads multiple times by a machine with many cores. The other way is to trace the test execution and to analyze the lock order afterward.

We can do this by adding the vmlens agent to the VM arguments of our test. After analyzing the test run vmlens reports the following deadlock:

- deadlock: Monitor@java.util.concurrent.ConcurrentHashMap.putVal()<->Monitor@java.util.concurrent.ConcurrentHashMap.putVal()
  parent2Child:
    thread: Thread-4
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$2.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update21
  child2Parent:
    thread: Thread-1
    stack:
      - java.util.concurrent.ConcurrentHashMap.putVal <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - java.util.concurrent.ConcurrentHashMap.put
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute$1.apply
      - java.util.concurrent.ConcurrentHashMap.compute <<Monitor@java.util.concurrent.ConcurrentHashMap.putVal()>>
      - com.anarsoft.vmlens.concurrent.example.TestConcurrentHashMapCompute.update12

Here is the putVal method. The synchronized statement leading to the deadlock is in line 18:

  final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    // ... omitted    
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

and the compute method. The synchronized statement leading to the deadlock is in line 19:

    public V compute(K key,
                     BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        if (key == null || remappingFunction == null)
            throw new NullPointerException();
        int h = spread(key.hashCode());
        V val = null;
        int delta = 0;
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
                 // ... omitted contains call to remappingFunction function   
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                synchronized (f) {
                    // ... omitted contains call to remappingFunction function   
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    break;
                }
            }
        }
        if (delta != 0)
            addCount((long)delta, binCount);
        return val;
    }

Make your application thread safe

LEARN MORE