What fascinates me about the volatile keyword is that it is necessary because my software still runs on a silicon chip, even if my application runs in the cloud, on a virtual machine, inside the Java virtual machine. Despite all of those software layers abstracting away the underlying hardware, the volatile keyword is still needed because of the cache of the processor my software runs on.

The volatile keyword and the cache of modern processors

Processors cache values from main memory in per-core caches to improve memory access performance. While a read from a CPU register takes approximately 300 picoseconds, a read from main memory takes 50 to 100 nanoseconds. By using a cache, this time can be reduced to approximately one nanosecond. The numbers are taken from Computer Architecture: A Quantitative Approach, J. L. Hennessy and D. A. Patterson, 5th edition, page 72.

As pointed out by Reddit commenters, such a level 1 cache was already used in the i486 processor family.

Now the question is when a core should check whether the cached value was modified in the cache of another core. This is what declaring a field volatile controls. By declaring a field as volatile, we tell the JVM that when a thread reads the volatile field, we want to see the latest written value. The JVM then uses special instructions to tell the CPU that it should synchronize its caches. For the x86 processor family, those instructions are called memory fences, as described here.

The processor not only synchronizes the value of the volatile field but the complete cache. So if we read from a volatile field we see all writes on other cores to this variable and also the values which were written on those cores before the write to the volatile variable.
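The effect described above can be sketched in a few lines of Java. This is a minimal illustration, not code from the original test suite: the class name VolatilePublication and its fields payload and ready are hypothetical. The plain field payload is written before the volatile field ready, so a thread that observes ready == true also sees the earlier write to payload.

```java
// Minimal sketch (hypothetical names): a plain field published through a volatile flag.
class VolatilePublication {
    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile flag that publishes the payload

    void writer() {
        payload = 42;  // ordinary write
        ready = true;  // volatile write: the write above becomes visible to readers of 'ready'
    }

    int reader() {
        while (!ready) {
            Thread.onSpinWait(); // spin hint, available since Java 9
        }
        return payload; // the volatile read of 'ready' guarantees we see payload == 42
    }

    // Runs reader and writer on two threads and returns what the reader saw.
    static int runOnce() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread readerThread = new Thread(() -> seen[0] = p.reader());
        readerThread.start();
        p.writer();
        try {
            readerThread.join(); // join also makes seen[0] visible to the calling thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

If ready were a plain field, the reader would be allowed to spin forever or to return a stale payload.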

The volatile field in action

Now let us look at how this works in practice. Let us see if we read stale values when we use a field without volatile annotation:

public class Termination {
    private int v;

    public void runTest() throws InterruptedException {
        Thread workerThread = new Thread(() -> {
            while (v == 0) {
                // spin
            }
        });
        workerThread.start();
        v = 1;
        workerThread.join(); // test might hang here
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) {
            new Termination().runTest();
        }
    }
}

When the writing thread updates the field v on one core and the reading thread reads the field v on another core, the test may hang and run forever. But at least when I run the test on my machine, it never hangs. The reason is that the test needs so few CPU cycles that both threads typically run on the same core. And when both threads run on the same core, they read and write to the same cache.

Luckily, the OpenJDK provides a tool, jcstress, which helps with this type of test. jcstress uses multiple tricks to make sure that the threads of a test run on different cores. Here the above example is rewritten as a jcstress test:

import org.openjdk.jcstress.annotations.*;

@JCStressTest(Mode.Termination)
@Outcome(id = "TERMINATED", expect = Expect.ACCEPTABLE, desc = "Gracefully finished.")
@Outcome(id = "STALE", expect = Expect.ACCEPTABLE_INTERESTING, desc = "Test hung up.")
@State
public class APISample_03_Termination {
    int v;
    @Actor
    public void actor1() {
        while (v == 0) {
            // spin
        }
    }
    @Signal
    public void signal() {
        v = 1;
    }
}

This test is from the jcstress examples. By annotating the class with the annotation @JCStressTest, we tell jcstress that this class is a jcstress test. jcstress runs the methods annotated with @Actor and @Signal in separate threads. jcstress first starts the actor thread and then runs the signal thread. If the test exits in a reasonable time, jcstress records the result "TERMINATED", otherwise the result "STALE".

I have run this test on my development machine, once with a normal and once with a volatile field v. The test for the volatile field looked like this:

public class APISample_03_Termination {
   volatile int v;
   // methods omitted
}
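For readers who want to try the volatile variant without jcstress, here is a minimal plain-Java sketch. The class name VolatileTermination, the timeout parameter, and the daemon-thread setup are assumptions added for illustration and are not part of the jcstress sample.

```java
// Minimal sketch (hypothetical class): the spin loop from above with a volatile field.
class VolatileTermination {
    private volatile int v; // without 'volatile' the worker is allowed to spin forever

    // Returns true if the worker saw v != 0 and terminated within the timeout.
    boolean runTest(long timeoutMillis) {
        Thread worker = new Thread(() -> {
            while (v == 0) {
                // spin until the volatile write becomes visible
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the worker hangs
        worker.start();
        v = 1; // volatile write, must eventually be seen by the worker
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(new VolatileTermination().runTest(5_000));
    }
}
```

A single run like this is only a smoke test. jcstress remains the better tool for provoking the stale read in the non-volatile variant.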

jcstress runs the test case multiple times with different JVM parameters. Here are the results of this test on my development machine, an Intel i5 4-core CPU, using the test mode stress:

JVM options	Observed state	Occurrences (non-volatile)	Occurrences (volatile)
-client	TERMINATED	10	8980294
-client	STALE	10	0
-server	TERMINATED	11	9040080
-server	STALE	10	0
-XX:TieredStopAtLevel=1	TERMINATED	8858074	9052777
-Xint	TERMINATED	8035685	8454639
-server, -XX:-TieredCompilation	TERMINATED	0	8563250
-server, -XX:-TieredCompilation	STALE	10	0
-client, -XX:-TieredCompilation	TERMINATED	3	8719757
-client, -XX:-TieredCompilation	STALE	10	0

As we see, using fields without the volatile modifier does indeed lead to hung threads. The percentage of hung threads depends on the JVM flags and on the environment, such as the JDK version. Please run this on your own machine; you should see a different distribution between hung and completed runs.

When to use volatile fields

The volatile field is most often used as a flag to signal a specific condition like in the test above. Another usage of volatile fields is to use the volatile field for reading and locks for writing. Or you can use them with the JDK 9 VarHandle to achieve atomic operations. How to implement those techniques is described here.
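The two techniques mentioned above can be sketched as follows. This is a minimal illustration under my own assumptions, not the implementation from the linked description: the class names LockedWriteCounter, VarHandleCounter and CounterDemo are hypothetical. The first class reads through a volatile field and writes under a lock; the second updates a volatile field atomically through a VarHandle.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Technique 1 (sketch): lock-free volatile read, lock-protected write.
class LockedWriteCounter {
    private volatile int value;

    int get() { return value; } // plain volatile read, no lock needed

    synchronized int increment() { // the read-modify-write is guarded by the lock
        return ++value;
    }
}

// Technique 2 (sketch): atomic update of a volatile field through a VarHandle (JDK 9+).
class VarHandleCounter {
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(VarHandleCounter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int value;

    int get() { return value; }

    int increment() {
        // getAndAdd atomically adds 1 and returns the previous value
        return (int) VALUE.getAndAdd(this, 1) + 1;
    }
}

class CounterDemo {
    // Increments both counters from several threads and returns the final values.
    static int[] run(int threads, int perThread) {
        LockedWriteCounter a = new LockedWriteCounter();
        VarHandleCounter b = new VarHandleCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    a.increment();
                    b.increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 10_000);
        System.out.println(r[0] + " " + r[1]); // prints 40000 40000
    }
}
```

A bare value++ on a volatile field would not be enough here, because volatile makes single reads and writes visible but does not make the read-modify-write sequence atomic.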

The volatile field as an example of a happens-before relation

But typically I do not use volatile fields directly. I rather use data structures from the java.util.concurrent package for concurrent programming, which internally use volatile fields.

In the documentation of those classes we often read something about memory consistency effects and happens-before relation like in the following from the interface Future:

Memory consistency effects: Actions taken by the asynchronous computation happen-before actions following the corresponding Future.get() in another thread.

Now, with our knowledge about the volatile field, we can decode this documentation. If we read from a volatile field, we see all writes on other cores to this variable. In the words of the java.util.concurrent documentation, we would say that the write to a volatile variable happens-before every subsequent read of this variable. The term happens-before comes from the mathematical model which formalizes the effect of the volatile field. This model is described here.

So the above statement means that a thread which calls Future.get() always sees the values that the thread running the asynchronous computation wrote before it completed the Future.

Let us use the class FutureTask to transfer data between two threads as an example. FutureTask implements the interface Future, so a thread calling FutureTask.get() always sees the values written by another thread before that thread called, for example, FutureTask.set().

Here is a potential program flow to explain this: Thread A sets the variables x and y of object OA to one and calls FutureTask.set(OA). Now Thread B reads this object into the variable OB by calling FutureTask.get(). To make the example more interesting, Thread A now sets variable y to two. If Thread B reads variable x, it surely sees the value one, since the cache was synchronized between the calls to FutureTask.set(OA) and FutureTask.get(). But for variable y, Thread B reads one or two, depending on which cores the two threads were running on.

In pseudo code this looks like this:

Thread A	Thread B
OA.x = 1
OA.y = 1
FutureTask.set(OA)
	OB = FutureTask.get()
OA.y = 2	OB.x == 1
	OB.y == 1 or OB.y == 2
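The safe part of this flow can be sketched in runnable Java. This is a minimal illustration with hypothetical names (FutureTaskHandOff, Data, transfer). Because FutureTask.set() is protected, the sketch lets the task's Callable return the object, and FutureTask.run() then calls set() internally. The racy write OA.y = 2 after the hand-off is deliberately left out, since its outcome is not deterministic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Minimal sketch (hypothetical names): handing an object from Thread A to Thread B.
class FutureTaskHandOff {
    static class Data {
        int x; // plain, non-volatile fields
        int y;
    }

    // Thread A fills the object; the calling thread plays the role of Thread B.
    static int[] transfer() {
        FutureTask<Data> task = new FutureTask<>(() -> {
            Data oa = new Data();
            oa.x = 1;
            oa.y = 1;
            return oa; // FutureTask.run() passes this result to set(oa)
        });
        Thread threadA = new Thread(task);
        threadA.start();
        try {
            Data ob = task.get(); // everything written before set(oa) is visible here
            return new int[] { ob.x, ob.y };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        int[] r = transfer();
        System.out.println(r[0] + " " + r[1]); // prints 1 1
    }
}
```

Any write that Thread A makes to the object after the task has completed is no longer covered by this guarantee and would need its own synchronization.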

Tools to detect missing volatile annotations

If you forget to declare a field as volatile, a thread might read a stale value. But the chance of seeing this during tests is rather low: since the read and the write must happen at almost the same time and on different cores to read a stale value, this happens only under heavy load and after a long run time, e.g. in production.

So it is no surprise that there exist tools to detect such a problem in test runs:

ThreadSanitizer: ThreadSanitizer can detect missing volatile annotations in C++ programs. There is a draft for a Java enhancement proposal, JEP draft: Java Thread Sanitizer, to include ThreadSanitizer in the OpenJDK JVM. This would allow us to find missing volatile annotations in the JVM itself and also in the Java application executed by the JVM.
vmlens: vmlens, a tool I have written to test concurrent Java, can detect missing volatile annotations in Java test runs.

Conclusion

The volatile field is needed to make sure that multiple threads always see the newest value, even when the cache system or compiler optimizations are at work. Reading from a volatile variable always returns the latest value written to this variable. The methods of most classes in the java.util.concurrent package also have this property, often by using volatile fields internally.

