Article

Analysing Concurrent Queues Using CSP: Examining Java’s ConcurrentLinkedQueue

by Kevin Chalmers 1,*,† and Jan Bækgaard Pedersen 2,†
1 School of Computing, Architecture, and Emerging Technologies, Ravensbourne University, 6 Penrose Way, London SE10 0EW, UK
2 Department of Computer Science, University of Nevada Las Vegas, 4505 S. Maryland Parkway, Las Vegas, NV 89154, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Software 2025, 4(3), 15; https://doi.org/10.3390/software4030015
Submission received: 1 May 2025 / Revised: 23 June 2025 / Accepted: 28 June 2025 / Published: 7 July 2025

Abstract

In this paper we examine the OpenJDK library implementation of the ConcurrentLinkedQueue. We use model checking to verify that it behaves according to the algorithm it is based on: Michael and Scott’s fast and practical non-blocking concurrent queue algorithm. In addition, we develop a simple concurrent queue specification in CSP and verify that Michael and Scott’s algorithm satisfies it. We conclude that both the algorithm and the implementation are correct and both conform to our simpler concurrent queue specification, which we can use in place of either implementation in future verification tasks. The complete code is available on GitHub.

1. Introduction and Motivation

Library use in modern programming is ubiquitous. Java, for example, has an extensive SDK; Java 21 includes 4387 classes. The source code underpinning these libraries is rarely scrutinised for formal correctness. We take for granted that a library shipped with an SDK will work as defined in all circumstances.
In this paper we examine a specific class in the OpenJDK: the ConcurrentLinkedQueue (we have taken the implementation from the OpenJDK source available at https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java, accessed on 1 May 2025), based on work by Michael and Scott [1]. Our aim is to produce a specification of a concurrent queue that can be used in system modelling. The specification will reduce the state space explored in system models, thus speeding up overall model checking time.
This work can be viewed as an extension of Lowe's [2], which also considers the Michael and Scott algorithm, although only a garbage-collector-friendly version. We want to explore whether the Java implementation operates as defined by Michael and Scott, and also develop a simplified, reusable specification of such behaviour.
Our work utilises Hoare’s [3,4,5,6] Communicating Sequential Processes (CSP) to verify specification equivalence. We start by translating both the Java implementation and the algorithm from Michael and Scott [1] to CSPM (a machine-readable form of CSP). For the Java code, we perform the following steps:
  • We extract the Java implementation of ConcurrentLinkedQueue from the OpenJDK.
  • We change the object-oriented code to structured code to simplify translation to CSP.
  • We decompose complex operations to further simplify translation to CSP.
  • Finally, we translate the code to CSP.
We perform similar steps for Michael and Scott’s algorithm. Once we have the CSP versions of both the original algorithm and the Java implementation, we proceed to show—using the FDR [7] refinement checking tool—that the Java implementation matches the behaviour of the algorithm described in [1], and vice versa.
Finally, we develop a more general specification of a concurrent queue in CSP and then show that Michael and Scott’s and OpenJDK’s ConcurrentLinkedQueue Java implementation behave according to our new specification.
Our goal is to establish behavioural equivalence between implementations and specifications under CSP semantics. This verification is bounded to a fixed number of threads, enabling tractable model checking. This aligns with other bounded verification approaches in CSP, including Lowe’s work on verification of linearizability [2] with garbage collection effects, and distinguishes our contribution from approaches taken with PAT [8].

Layout of the Paper

In Section 2 we provide a brief introduction to CSP. We also discuss concurrent and lock-free data structures, along with the Michael and Scott algorithm. Finally, we consider related work and discuss serializability and linearizability.
Section 3 describes our method for converting both the algorithm and the OpenJDK implementation into CSP. We describe how we model the memory system in CSP to support the internal queue data structure. Section 3 also presents a simplified specification of a concurrent queue. In Section 4 we examine OpenJDK's implementation of ConcurrentLinkedQueue and consider its CSP translation using the method from Section 3. In Section 5 we investigate why a simple sequential queue cannot replace a more complex specification for a concurrent queue. We also prove linearizability of our concurrent queue specification. Finally, Section 6 summarises the results and Section 7 concludes with directions for future work.

2. Background and Related Work

In this section we present the background and related work for this article. We begin with a brief introduction to Communicating Sequential Processes (CSP) [3,4,5,6]. We then discuss concurrent and non-blocking data structures, including Michael and Scott's algorithm for a non-blocking (lock-free) concurrent queue [1], as well as OpenJDK's implementation in ConcurrentLinkedQueue. Finally, we review serializability and linearizability [9], and present related work.

2.1. Communicating Sequential Processes

Communicating Sequential Processes (CSP) [3,4,6,10] is a process algebra used for specifying concurrent systems through the use of processes and events. Processes are abstract components characterised by the events they perform.
Events in CSP are atomic, synchronous, and instantaneous. They are indivisible and require all participating processes to wait until the event occurs, and when an event occurs, it does so immediately for all involved processes. A CSP model specifies the events with which processes are willing to synchronise at various stages of their execution.
The simplest CSP processes are STOP—which performs no events and does not terminate—and SKIP—which likewise performs no events but will eventually terminate.
We introduce events into a process using the prefix operator (→). For example, the process P = x → STOP engages in event x, then halts. The general form of a process definition is: Process = event → Process′. Processes can be recursive (e.g., P = x → P).

2.1.1. Choice

CSP provides several choice operators to model branching behaviour. The three most commonly used types of choice are external (or deterministic) choice, internal (or non-deterministic) choice, and prefix choice.
Given two processes P and Q, the definition P □ Q (external choice) represents a process that will behave as P or Q depending on the first event offered by the environment. For example, the process:
P = (a → Q) □ (b → R)
can accept event a and then behave as Q, or accept event b and then behave as R. The system may choose nondeterministically between the two options when both a and b are available.
An internal choice, represented by ⊓, allows a process to behave as either P or Q without considering the external environment. That is, a process P ⊓ Q can behave as either P or Q without external input or interaction.
Both external and internal choices can be applied over a set of events. If E = {e1, …, en} is a set of events, these expand to:
□ a ∈ E • a → P = e1 → P □ e2 → P □ … □ en → P
⊓ a ∈ E • a → P = e1 → P ⊓ e2 → P ⊓ … ⊓ en → P
With prefix choice, we can define an event and a parameter. If we define a set of events as {c.v | v ∈ Values}, we can interpret c as a channel willing to communicate a value v. In this case, input and output operations (? and !, respectively) represent message passing and variable assignment. This shorthand can be represented by the following identities:
c!v → P = c.v → P
c?x → P = □ x ∈ Values • c.x → P
where Values is a finite set.
CSP includes a functional conditional construct, where each branch must conclude with a process definition. For example, c?v → (if (v == x) then P else Q) is semantically equivalent to (c.x → P) □ (□ v ∈ X ∖ {x} • c.v → Q), assuming that v does not appear in P.

2.1.2. Pre-Guards

The availability of choice branches can be controlled by placing a boolean expression (a guard), followed by &, before each branch. For example:
(e1 & c?x → P) □ (e2 & d?x → Q)
Only branches with true guards are available to the process. If the boolean expression e1 is true and e2 is false, only the c?x branch will be offered. Similarly, if only e2 is true, only the d?x branch will be offered.

2.1.3. Process Composition

Processes can be combined through parallel and sequential composition. We denote parallel composition as P ∥ Q, where P and Q must synchronise on a shared set of events. There are two forms of parallel composition:
  • The generalised parallel P ∥A Q defines two processes that synchronise on a specific set of events A. In CSPM (a machine-readable version of CSP) this is written as P |[ A ]| Q.
  • The alphabetised parallel P A∥B Q defines two processes, each restricted to their respective alphabets: P with alphabet A and Q with alphabet B. In CSPM this is written as P |[ A | B ]| Q.
For generalised parallel, both P and Q must offer an event in A simultaneously for it to occur. Events outside A do not synchronise P and Q. For example:
(a → b → P) ∥{b} (b → c → Q)
Here, a must occur first, then both processes synchronise on b, and then c is performed. If the synchronisation set also contained c, the right-hand side process would block since the left-hand side process does not offer c.
For alphabetised parallel, P and Q synchronise on their alphabets' intersection. For example:
(a → b → P) {a,b}∥{b,c} (b → c → Q)
Both processes synchronise on {b}: a occurs first, then both processes synchronise on b, followed by c. If the left-hand side alphabet did not contain a, the system would deadlock as a could not be performed.
Processes can also execute concurrently without synchronisation, using interleaving: P ||| Q, where P and Q execute in parallel but do not synchronise on any shared events. We denote sequential composition as P ; Q, meaning P must terminate before Q begins.

2.1.4. Traces and Hiding

The trace set (or traces) of a process is the set of all observable sequences of events that the process may perform. For example, P = a → P has the empty trace ⟨⟩ as its shortest observable trace. If P performs a, the trace is ⟨a⟩. Each subsequent a extends the trace. The traces of P are traces(P) = {⟨⟩, ⟨a⟩, ⟨a, a⟩, …}. Similarly, for a process Q = a → b → Q, the traces are {⟨⟩, ⟨a⟩, ⟨a, b⟩, ⟨a, b, a⟩, …}.
We use the hiding operator ∖ to conceal events, replacing them with the (ignored) internal event τ. For example, (a → b → a → SKIP) ∖ {a} has traces {⟨⟩, ⟨b⟩}. Similarly, traces(P ∖ {a}) = {⟨⟩}, and traces(Q ∖ {a}) = {⟨⟩, ⟨b⟩, ⟨b, b⟩, …}.

2.1.5. Models

There are three semantic models of behaviour used to analyse CSP models: the traces model, the stable failures model, and the failures-divergences model.
The Traces Model
The traces model describes the externally visible behaviour of a system. If the traces of an implementation Q are a subset of those of the specification P, then Q trace refines P (written P ⊑T Q). In a refinement test, Specification ⊑T Implementation, the Specification represents allowable behaviour.
Hiding events allows us to focus on the external behaviour of a process. An implementation may include events not present in the specification to model internal details. We hide these internal events when comparing the implementation to the specification.
The Stable Failures Model
The stable failures model [6,10] captures which events a process may refuse after executing a given trace. A stable state is one where a process cannot make internal progress (i.e., via hidden events) and must engage externally. A refusal is an event that a process cannot participate in when in a stable state.
The stable failures model overcomes limitations of trace comparison. For example, (P = a → P) ⊑T ((P = a → P) ⊓ STOP), although the right-hand side may non-deterministically refuse to accept any event. A failure is a pair (s, X): s is a trace, and X is the set of refused events after the trace s.
If P ⊑F Q then whenever Q refuses an event, P does likewise. More formally, P ⊑F Q ⇔ failures(Q) ⊆ failures(P).
The Failures-Divergences Model
Divergences are potential livelock scenarios, where a process continuously performs internal events without making any externally observable progress. For example, P = a → STOP ⊑F Q = (a → STOP) ⊓ DIV, where DIV is an immediately diverging process. However, Q can refuse a and repeatedly perform τ, resulting in livelock. The refinement ⊑FD allows comparisons between such processes.
Formally, each process P has an associated pair (failures⊥(P), divergences(P)). P ⊑FD Q ⇔ failures⊥(Q) ⊆ failures⊥(P) ∧ divergences(Q) ⊆ divergences(P). failures⊥(P) is defined as failures(P) ∪ {(s, X) | s ∈ divergences(P)}; that is, the set of traces leading to divergence is added to the set of stable failures to form the extended failures set.
Importantly, if both the specification and implementation are divergence-free, we only need to establish equivalence in the stable failures model. For our verification purposes, both specification and implementation are divergence-free, so we only verify SPEC ⊑F IMPLEMENTATION.

2.1.6. FDR

The FDR tool [11] is used to verify CSPM models against specifications. FDR supports the three models discussed in the previous section. FDR also supports verification of deadlock-freedom, divergence-freedom, and determinism.
In the stable failures model, FDR considers a process deterministic if there is no evidence of non-determinism. A “witness” to non-determinism is a trace tr and an event such that the process may both accept and refuse the event after that trace [7]. In the failures-divergences model, the process must additionally be divergence-free.

2.1.7. Modules

CSPM supports encapsulation of definitions using a module system. A typical module has the following structure:
module ModuleName
 <private declaration list>
exports
 <public declaration list>
endmodule
Modules can be nested and parametrised (and instantiated). Public declarations can be accessed using the notation ModuleName::VariableName.

2.2. Concurrent and Non-Blocking Data Structures

Concurrent access to shared data structures typically requires mutual exclusion mechanisms. Locks are commonly used to ensure only one thread accesses a structure at a time. Some concurrency may be possible, but modification typically requires sequential access control.
Sequential access control creates bottlenecks as it restricts concurrency. Data structures that do not require locks are advantageous. Non-blocking data structures can be classified into three levels, from weakest to strongest:
  • Obstruction-Free: A thread will progress if all other threads are suspended. Unlike locks, obstruction-freedom does not guarantee progress when threads contend for access.
  • Lock-Free: A data structure is lock-free if at least one thread can continue execution at any time. This differs from obstruction-freedom in that it ensures that system-wide progress continues, even if individual threads stall.
  • Wait-Free: A data structure is wait-free if it is lock-free and ensures that every thread will make progress after a finite number of steps. Wait-freedom is the strongest guarantee.
The concurrent queue that we are considering in this paper is lock-free. Lock-freedom is often supported via compare-and-swap operations. Compare-and-swap (CAS(x, expected, new)) compares x to the expected value and, if they are equal, updates x to the new value. CAS returns a boolean indicating whether the operation succeeded.
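Java exposes CAS directly on its atomic types. The following minimal sketch (ours, not taken from the paper or the OpenJDK source) demonstrates the success and failure cases of compareAndSet:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch (ours) of CAS(x, expected, new):
// the update succeeds only if x still holds the expected value.
public class CasDemo {
    // Attempt to advance x from expected to expected + 1 in one atomic step.
    public static boolean casIncrement(AtomicInteger x, int expected) {
        return x.compareAndSet(expected, expected + 1);
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(0);
        boolean first = casIncrement(x, 0);  // succeeds: x held 0, now holds 1
        boolean stale = casIncrement(x, 0);  // fails: x no longer holds 0
        System.out.println(first + " " + stale + " " + x.get()); // true false 1
    }
}
```

A failed CAS leaves the variable untouched, which is why the lock-free loops below simply retry after a failure.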
The algorithm underpinning OpenJDK’s implementation of ConcurrentLinkedQueue is the non-blocking (lock-free) concurrent queue algorithm by Michael and Scott [1]. The algorithm uses a linked list of nodes to store values in the queue, and CAS operations to support link changes.

2.3. Serializable vs. Linearizable

Serializability is a well-known concept from database systems. Serializability ensures that a set of parallel events results in the same outcome as if some serial order had been imposed on the events. In terms of CSP, such an ordering is a trace. Consider the following, where P performs a then b then c before terminating, Q performs d then b then e before terminating, and P ∥ Q is the parallel composition of P and Q synchronising on {b}:
P = a → b → c → SKIP
Q = d → b → e → SKIP
P ∥ Q = P |[ {b} ]| Q
The complete traces of P ∥ Q, where we have omitted the ✓ representing termination, are {⟨a, d, b, c, e⟩, ⟨d, a, b, c, e⟩, ⟨a, d, b, e, c⟩, ⟨d, a, b, e, c⟩}. a and d can happen in any order before b. Similarly, c and e can happen in any order after b.
Herlihy and Wing [9] define serializability as: “A history is serializable if it is equivalent to one in which transactions appear to execute sequentially, i.e., without interleaving.” In other words, we are guaranteed transactional isolation.
Linearizability originates from concurrent object systems. It is a stronger consistency condition than serializability, requiring that each individual operation on a shared object appears to occur atomically at some point between its invocation and response—known as the linearization point. Linearizability preserves the real-time ordering of non-overlapping operations—if one operation completes before another begins, this order must be reflected in the system’s behaviour.
Assume two processes are interacting with a shared register with an initial value of 0. We model their actions using call and ret events:
P = call.write!1 → ret.write → SKIP
Q = call.read → ret.read?x → SKIP
P ∥ Q = P ||| Q
P and Q invoke write and read concurrently. P and Q do not synchronise: each process independently calls and returns from the shared object.
Let P perform an invocation call.write.1 followed by ret.write. If P completes its interaction before Q begins (i.e., call.read follows ret.write in the trace), then a linearizable execution must ensure the result reflects this real-time order, and thus x would equal 1.
Let us consider a few trace examples:
  • Linearizable traces:
    - ⟨call.write.1, ret.write, call.read, ret.read.1⟩: P completes before Q begins, and Q sees the value of the register as 1.
    - ⟨call.read, ret.read.0, call.write.1, ret.write⟩: if Q reads before P writes, Q sees the value of the register as 0.
    - ⟨call.write.1, call.read, ret.read.1, ret.write⟩: although P has not completed, it has begun and the linearization point occurs before completion, thus Q sees the value of the register as 1.
  • Non-linearizable traces:
    - ⟨call.write.1, ret.write, call.read, ret.read.0⟩: P completes before Q begins, but Q sees the value of the register as 0. This violates linearizability, as it contradicts real-time ordering and results in an inconsistent view of memory.
Herlihy and Wing [9] define linearizability as: “A history is linearizable if it can be extended (by appending zero or more response events) to a history that is equivalent to some legal sequential history, and that preserves the real-time order of non-overlapping operations.” In other words, linearizability ensures both consistency and temporal ordering of operations on shared state.
Linearization points are important as we can treat them as events in our models. We will return to these points later in the paper.

2.4. Related Work

Lowe [2] has explored lock-free data structures using CSP with a practical approach. Lowe explored a garbage-collection-safe version [12] of Michael and Scott's [1] original queue, which forms the basis of the Java SDK's implementation. Lowe's work is similar to ours in that it uses CSP to model a lock-free algorithm. However, Lowe's work does not consider Michael and Scott's original algorithm, nor whether the garbage-collected version behaves as the original Michael and Scott algorithm does. We explore both of these implementations and define how to specify a general concurrent data structure in CSP. Lowe's goal was to explore linearizability in general, whereas we are interested in understanding and specifying the behaviour of concurrent data structures for reuse. Lowe is not explicit in the usage of linearization points, whereas we present them as fundamental for specifying concurrent data structures in CSP. Lowe's models are task focused rather than designed for general specification.
Liu et al. [8] also explore linearizability using CSP, but with the PAT tool (another CSP refinement checker). Liu et al. specifically check for linearizability. Unlike our approach, they do not model linearization points explicitly. Their work is more general than Lowe’s, exploring linearizability outside a specific implementation case.
Verification of linearizability is an ongoing area of research. O’Hearn et al. [13] use a Hindsight Lemma to verify linearizability by inferring global state. They do not require linearization points. In contrast, our work models linearization points explicitly within CSP to enable automated verification.
Derrick et al. [14] also explore linearizability verification using Input/Output Automata (IOA) to ensure correctness under crash-recovery scenarios. Their work addresses correctness under failure, which could extend our model to durable systems. In contrast, our work focuses on modelling linearizability using CSP, facilitating automated verification through model checking.
Dongol et al. [15] provide a more thorough overview of linearizability verification reviewing existing work and categorising the approaches taken.
Our work defines a Java-to-CSP translation pipeline, building on our previous work [16]. Mahmoud et al. [17] have surveyed code-to-code translations, some of which parallel our approach.

3. Method

In this section we present our approach to constructing CSP models of OpenJDK's ConcurrentLinkedQueue and the Michael and Scott algorithm. We start by considering the transformation of the algorithms into a form more amenable to CSP translation. We then describe the memory model, including support for atomic operations. Finally, we define both sequential and concurrent queue specifications for use with FDR.

3.1. OO to CSP Pipeline

Figure 1 illustrates the two pipelines used in the translation of the original algorithm (the top one) and the OpenJDK implementation (the bottom one). Boxes in Figure 1 marked with a ★ indicate tested Java code.
The algorithm from [1] is written in pseudo-C. This can be translated relatively easily into non-object-oriented Java. During the second step we transform the Java code into a CSP-friendly form by replacing loops with recursion and ensuring that all if-statements include explicit else branches. Finally, we translate the resulting Java code into CSP.
For the OpenJDK implementation we begin with object-oriented Java code that must be converted to non-object-oriented Java, then to CSP-friendly Java and finally to CSP.

3.1.1. Common Code

We define two data types—Node and TestQueue. Node is defined in Michael and Scott’s work as:
  • value: An AtomicInteger for the value stored on the node. The value can be null.
  • next: An AtomicReference<Node> to the next node in the queue. The node can be null.
TestQueue also has two values:
  • head: An AtomicReference<Node> pointing to the head node of the queue.
  • tail: An AtomicReference<Node> pointing to the tail node of the queue.
Both head and tail will always reference a node, even in an empty queue.
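Concretely, the two data types might be sketched in Java as follows. This is our illustration, not code from the paper; the field names follow the description above, and the constructor reflects the invariant that head and tail always reference a node by initialising both to the same dummy (sentinel) node, as in Michael and Scott's design:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Sketch (ours) of the two common data types described in the text.
public class QueueTypes {
    static class Node {
        // Value stored on the node.
        final AtomicInteger value = new AtomicInteger();
        // Reference to the next node in the queue; null at the end of the list.
        final AtomicReference<Node> next = new AtomicReference<>();
    }

    static class TestQueue {
        final AtomicReference<Node> head;
        final AtomicReference<Node> tail;

        TestQueue() {
            // head and tail always reference a node, even in an empty queue:
            // initially both point at the same dummy (sentinel) node.
            Node dummy = new Node();
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
        }
    }
}
```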

3.1.2. Converting Michael and Scott to Structured Java

We begin by examining Michael and Scott’s [1] algorithm for enqueuing a new value onto a non-blocking queue:
node = new_node()
node→value = value
node→next.ptr = null
loop
  tail = Q→Tail
  next = tail.ptr→next
  if tail == Q→Tail
   if next.ptr == null
    if CAS(&tail.ptr→next, next, node)
     break
    endif
   else
    CAS(&Q→Tail, tail, next.ptr)
   endif
  endif
endloop
CAS(&Q→Tail, tail, node)
Michael and Scott [1] explain their algorithm in more detail in their paper. Here, we focus on converting this code into a structured (minimal-OO) Java implementation. Let us break the code down into separate parts.
Firstly, the algorithm initialises a new node. Converting this code into Java is straightforward. The following table shows how the original algorithm maps to structured Java.
Michael & Scott                 Structured Java
node = new_node()               Node node = new Node();
node→value = value              node.value.set(value);
node→next.ptr = null            node.next.set(null);
In Java, value and next are represented as AtomicInteger and AtomicReference, respectively. We use the set() and get() methods to assign and retrieve values.
The loop in Michael and Scott’s algorithm is a while-loop in Java and if-statements are directly translated. Therefore, we only need concern ourselves with the CAS calls. In Java, these are replaced with compareAndSet() method calls on the atomic types.
With these steps in place, we can convert Michael and Scott’s algorithm to structured Java. We present the enqueue method in Table 1. dequeue is likewise straightforward.

3.1.3. Converting OpenJDK Code to Structured Java

The OpenJDK implementation of ConcurrentLinkedQueue (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java accessed on 1 May 2025) is written in an object-oriented style, which is incompatible with CSP’s process-based abstraction. Therefore, we convert it to a structured code format.
Specifically, we refactor instance methods into static ones that operate on a TestQueue object passed as an argument rather than using member variables. We reimplement offer() as enqueue() and poll() as dequeue(). There are two challenges in the OpenJDK source code:
  • Ternary Operators: In the offer() method, OpenJDK uses ternary operators to update values. For example, p = (t != (t = tail)) ? t : head;. CSP does not support such operations, so we rewrite them as explicit if-else structures to simplify CSP translation.
  • Use of VarHandle (used for performance and low-level memory operations): Rather than use Java’s atomic types directly, OpenJDK uses VarHandles. A VarHandle has atomic operators (e.g., compareAndSet). We replace VarHandles with standard AtomicReference and AtomicInteger types for simpler conversion.
With these changes, we can produce an equivalent structured version of the OpenJDK ConcurrentLinkedQueue implementation.
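To illustrate the first point, the ternary p = (t != (t = tail)) ? t : head; can be unfolded into an explicit if-else. The sketch below is ours, using plain Object references rather than the real node types, and only demonstrates the control flow, including the assignment embedded in the comparison:

```java
// Hypothetical unfolding (ours) of OpenJDK's
//   p = (t != (t = tail)) ? t : head;
// into an explicit if-else over plain references.
public class TernaryRewrite {
    public static Object choose(Object t, Object tail, Object head) {
        Object oldT = t;   // value of t before the embedded assignment
        t = tail;          // the embedded assignment: re-read tail
        Object p;
        if (oldT != t) {
            p = t;         // tail moved since the last read: continue from it
        } else {
            p = head;      // tail unchanged: fall back to head
        }
        return p;
    }
}
```

The unfolded form makes the evaluation order explicit, which is what the CSP translation needs.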

3.1.4. Converting Structured Java to CSP Friendly Java

Our next step is converting structured Java code into a form suitable for CSPM translation. CSPM has a functional syntax, lacks global variables, and cannot chain calls together. To convert the structured Java to a more CSP-friendly version we:
  • Remove method chaining.
  • Store return values before use.
  • Expand conditionals fully.
  • Move final operations outside loops.
  • Identify and flag loop continuation points.
Method chaining is a standard idiom in object-oriented code. For example, our structured Java code contains lines such as:
node.next.set(null);
In CSPM, we have no method chaining, and therefore we must explicitly assign intermediate results to temporary variables. For example, we expand the code above into:
tmp_node = node.next;
tmp_node.set(null);
Similarly, we cannot use return values from method calls directly and must store them. compareAndSet() returns a success flag. In structured Java, we assign this result to a variable.
if (tail.next.compareAndSet(next, node))
which we convert to:
tmp_node = tail.next;
boolean succ = tmp_node.compareAndSet(next, node);
if (succ)
For conditionals, CSPM requires both branches to be defined. We therefore add all missing else branches. Often, when a conditional branch does nothing, the intended behaviour is to start the next iteration of the operation loop. We flag this using continue, which makes it easier to see when converting loops to recursion for CSPM.
Finally, we take any final operation after the main loop and move it into the necessary branches of the conditionals within the loop. Again, this is to make it easier to undertake the final conversion.
As an example of these final steps, let us examine the loop for enqueueing a value provided by Michael and Scott. In our structured Java, this is defined as:
while (true) {
  tail = Q.tail.get();
  next = tail.next.get();
  if (tail == Q.tail.get()) {
   if (next == null) {
    if (tail.next.compareAndSet(next, node))
     break;
   }
   else {
    Q.tail.compareAndSet(tail, next);
   }
  }
 }
Q.tail.compareAndSet(tail, node);
The CSP-friendly version of the loop expands intermediate steps and ensures fully structured conditionals:
while (true){
  tail = Q.tail.get();
  tmp_node = tail.next;
  next = tmp_node.get();
  Node tmp = Q.tail.get();
  if (tail == tmp) {
   if (next == null) {
    tmp_node = tail.next;
    boolean succ = tmp_node.compareAndSet(next, node);
    if (succ) {
     Q.tail.compareAndSet(tail, node);
     break;
    }
    else {
     continue;
    }
   }
   else {
    Q.tail.compareAndSet(tail, next);
    continue;
   }
  }
  else {
   continue;
  }
 }
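As a sanity check on the control flow above, the loop (together with a matching dequeue) can be assembled into a self-contained Java class. This sketch is ours, not the OpenJDK code; it uses AtomicReference links and the dummy-node invariant described earlier:

```java
import java.util.concurrent.atomic.AtomicReference;

// Self-contained sketch (ours) of the Michael and Scott enqueue/dequeue
// control flow shown above, using AtomicReference for all links.
public class MsQueueSketch {
    static final class Node {
        final Integer value;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(Integer value) { this.value = value; }
    }

    final AtomicReference<Node> head;
    final AtomicReference<Node> tail;

    public MsQueueSketch() {
        Node dummy = new Node(null);          // head and tail always reference a node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    public void enqueue(int value) {
        Node node = new Node(value);
        while (true) {
            Node t = tail.get();
            Node next = t.next.get();
            if (t == tail.get()) {            // is our view of tail still consistent?
                if (next == null) {
                    if (t.next.compareAndSet(null, node)) {
                        tail.compareAndSet(t, node); // swing tail to the new node
                        return;
                    }
                } else {
                    tail.compareAndSet(t, next);     // help advance a lagging tail
                }
            }
        }
    }

    public Integer dequeue() {
        while (true) {
            Node h = head.get();
            Node t = tail.get();
            Node next = h.next.get();
            if (h == head.get()) {
                if (h == t) {
                    if (next == null) return null;   // queue is empty
                    tail.compareAndSet(t, next);     // tail is lagging: help it along
                } else if (head.compareAndSet(h, next)) {
                    return next.value;               // value lives in the new dummy node
                }
            }
        }
    }
}
```

The helping CAS on tail mirrors the else-branch in the structured code: a thread that observes a lagging tail advances it before retrying its own operation.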

3.2. Implementing Shared Memory Programs in CSPM

With CSP-friendly Java code, we can complete the transformation to CSPM. There are three areas of consideration for our work: how to model global state in CSP, how to model atomic operations in CSP, and how to convert Java code structures to CSPM.

3.2.1. Modelling Global State in CSP

CSP provides a global event space but does not maintain a global data space. Therefore, we must model global variables via global events, using processes to manage state values. A state variable is a process that maintains the current variable value with events to load the current value or store a new value. We define a process VARIABLE as follows:
VARIABLE(myLoad, myStore, val) =
  (myLoad!val → VARIABLE(myLoad, myStore, val))
  □
  (myStore?newVal → VARIABLE(myLoad, myStore, newVal))
myLoad communicates the value to the environment, myStore accepts new values, and val is the current value.

3.2.2. Modelling Atomic Variables in CSP

The load and store operations are atomic by design. However, the queue implementations also use CAS (compare-and-swap). As CSP channels can perform input and output operations in a single step, we can naturally model CAS semantics. We extend VARIABLE to include CAS:
ATOMIC_VARIABLE(get, set, cas, val) =
  get!val → ATOMIC_VARIABLE(get, set, cas, val)
  □
  set?newVal → ATOMIC_VARIABLE(get, set, cas, newVal)
  □
  cas?expected?newVal!(expected == val) →
   if (expected == val) then
    ATOMIC_VARIABLE(get, set, cas, newVal)
   else
    ATOMIC_VARIABLE(get, set, cas, val)
get communicates the value to the environment, set accepts a new value, and cas accepts a value to compare with and a new value to set if the comparison succeeds, communicating the success of the operation. val is the current value of the variable.

3.2.3. Converting CSP Friendly Java to CSPM

We model a CSP system using a set of processes. We define each process by the set of events it can perform. Events can carry typed data, thereby allowing input and output parameters. The CSPM language allows us to define processes that communicate with each other via these events.
We define object-oriented systems in terms of classes and methods. In CSPM, we can define the operation of a method as a process, with the events of the process representing method invocations. Method parameters are represented as the input and output values of the corresponding invocation events.
Objects (and data structures) are passive, whereas processes are active. In CSPM, we can model passive objects as processes that communicate values with the environment. Threads that use these objects are modelled as separate processes that invoke events on them. This allows us to model the behaviour of the system in a more structured way.
For example, we can define a queue by pointers to its head and tail. In Michael and Scott’s algorithm, the queue structure is specified as:
structure pointer_t {ptr: pointer to node_t}
structure node_t {value: datatype, next: pointer_t}
structure queue_t {Head: pointer_t, Tail: pointer_t}
Although we could define types in CSPM to represent these structures, we would still require processes to manage the values. As such, we define a process QUEUE that represents the head and tail as two separate ATOMIC_VARIABLEs.
HEAD_PTR =
  ATOMIC_VARIABLE(queue.HEAD.load, queue.HEAD.store, queue_cas.HEAD, NODE.1)
TAIL_PTR =
  ATOMIC_VARIABLE(queue.TAIL.load, queue.TAIL.store, queue_cas.TAIL, NODE.1)
QUEUE_OBJ = HEAD_PTR ||| TAIL_PTR
We define the queue and queue_cas events as follows:
channel queue : QUEUE.AccessOperations.Nodes_Not_Null
channel queue_cas : QUEUE.Nodes_Not_Null.Nodes_Not_Null.Bool
where QUEUE is an enumerated datatype with values HEAD and TAIL, AccessOperations is an enumerated datatype with values load and store, and the value types are defined accordingly.
Nodes_Not_Null represents all the node IDs in the system not including null (0). Because CSPM does not support passive data creation, we must define a separate process for each node used in the system. Each node is assigned a unique identifier (e.g., NODE.1), similar to a memory address.
Every data value requires such a definition, and the corresponding processes run throughout system operation. For example, we define the full set of NODEs as:
NODE_OBJS =
  ||| id : NODES_NOT_NULL • (
    VARIABLE(next.load.NODE.id, next.store.NODE.id, ANODE.NULL)
    |||
    VARIABLE(value.load.NODE.id, value.store.NODE.id, AINT.NULL))
Each property of the node is defined as a separate process. ANODE and AINT represent the atomic variables for the node references and integer, respectively.
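On the Java side, each such node corresponds to an object whose fields are independently updatable atomic cells. The following shape is our own illustrative sketch (field and class names are ours, not OpenJDK's), mirroring the per-node VARIABLE processes above:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the node shape the CSP model mimics: each field
// is a separate atomic cell, like the per-node VARIABLE processes.
public class NodeSketch {
    static final class Node {
        final AtomicReference<Integer> value = new AtomicReference<>(null); // AINT.NULL
        final AtomicReference<Node> next = new AtomicReference<>(null);     // ANODE.NULL
    }

    public static void main(String[] args) {
        Node n1 = new Node();              // plays the role of NODE.1
        Node n2 = new Node();              // plays the role of NODE.2
        n1.value.set(42);                  // value.store.NODE.1
        n1.next.set(n2);                   // next.store.NODE.1
        if (n1.value.get() != 42) throw new AssertionError();
        if (n1.next.get() != n2) throw new AssertionError();
        if (n2.next.get() != null) throw new AssertionError();
    }
}
```

Just as each CSPM node carries an identity such as NODE.1, each Java Node object has a distinct reference that acts as its address.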
To make the algorithm readable, we introduce aliases for accessing the queue.
getHead = queue.HEAD.load
setHead = queue.HEAD.store
casHead = queue_cas.HEAD
getTail = queue.TAIL.load
setTail = queue.TAIL.store
casTail = queue_cas.TAIL
We want to observe method call events to test correct operation. We define the enqueue and dequeue events as follows:
channel dequeue, end_enqueue : Processes
channel enqueue : Processes.Integers_Not_Null
channel return : Processes.Integers
This creates begin and end operation pairs (i.e., enqueue → end_enqueue and dequeue → return) that we can compare to our specification.
Our final change is to replace loops with recursion. This is a fairly straightforward process, using continue to indicate when to recurse. For example, we replace the enqueue loop in Michael and Scott’s algorithm with the following recursive CSPM definition:
ENQUEUE’(node, tl, nxt) =            -- while true
  getTail?tl →                       --   tail = Q→Tail
  getNext.tl?tmp_node →              --   next = tail.ptr→next
  getNode.tmp_node?nxt →
  getTail?tmp →
  if (tl == tmp) then (              --   if tail == Q→Tail {
    if (nxt == nullNode) then (      --     if next.ptr == null {
      getNext.tl?tmp_node →
      casNode.tmp_node!nxt!node?succ →
      if (succ) then (               --       if CAS(&tail.ptr→next, next, node) {
        casTail!tl!node?succ →       --         CAS(&Q→Tail, tail, node)
        SKIP                         --         break
      )                              --       }
      else                           --       else
        ENQUEUE’(node, tl, nxt)      --         continue
    )                                --     }
    else (                           --     else {
      casTail!tl!nxt?succ →          --       CAS(&Q→Tail, tail, next.ptr)
      ENQUEUE’(node, tl, nxt)        --       continue
    )                                --     }
  )                                  --   }
  else                               --   else
    ENQUEUE’(node, tl, nxt)          --     continue
SKIP denotes the successful termination of the recursive call at the return point of the algorithm. We replace continue points with recursion calls.
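For comparison, the loop that the recursion replaces can be rendered in Java. The following is our own illustrative sketch of Michael and Scott's enqueue (it is not the OpenJDK code, and class and method names are ours); where ENQUEUE' recurses, this version loops:

```java
import java.util.concurrent.atomic.AtomicReference;

// Our illustrative Java rendering of Michael and Scott's enqueue loop,
// not the OpenJDK implementation. ENQUEUE' above recurses where this loops.
public class MSEnqueueSketch {
    static final class Node {
        final Integer value;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(Integer v) { value = v; }
    }

    final Node dummy = new Node(null);                        // NODE.1: initial dummy node
    final AtomicReference<Node> head = new AtomicReference<>(dummy);
    final AtomicReference<Node> tail = new AtomicReference<>(dummy);

    void enqueue(int v) {
        Node node = new Node(v);
        while (true) {                                        // loop / ENQUEUE'
            Node tl = tail.get();                             // getTail?tl
            Node nxt = tl.next.get();                         // getNext.tl / getNode
            if (tl == tail.get()) {                           // tl == tmp
                if (nxt == null) {                            // nxt == nullNode
                    if (tl.next.compareAndSet(null, node)) {  // casNode
                        tail.compareAndSet(tl, node);         // casTail, then SKIP
                        return;
                    }                                         // failed CAS: retry
                } else {
                    tail.compareAndSet(tl, nxt);              // swing tail, then retry
                }
            }
        }
    }

    public static void main(String[] args) {
        MSEnqueueSketch q = new MSEnqueueSketch();
        q.enqueue(1);
        q.enqueue(2);
        Node first = q.head.get().next.get();                 // skip the dummy node
        if (first.value != 1) throw new AssertionError();
        if (first.next.get().value != 2) throw new AssertionError();
    }
}
```

Each CAS on the tail or on a node's next field corresponds to a casTail or casNode event in the CSPM model.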

3.3. Queue Specifications

We require specifications to determine whether both queue implementations behave as expected. We begin with a simple specification for a sequential queue, then introduce a concurrent queue specification based on linearization points.

3.3.1. A Sequential Queue Specification

We model a sequential queue in CSP as a choice between accepting enqueue events (when the queue is not full; we impose an upper limit on queue length, as CSP cannot handle unbounded data structures) and dequeue events. If the queue is empty, a dequeue operation returns null.
QUEUE_SPEC(q) =
  length(q) ≤ MAX_QUEUE_LENGTH &
    enqueue?proc?v → end_enqueue.proc → QUEUE_SPEC(q ^ <v>)
  □
  dequeue?proc →
    if (length(q) == 0) then
      return.proc!mem::INT.mem::NULL → QUEUE_SPEC(q)
    else
      return.proc!(head(q)) → QUEUE_SPEC(tail(q))
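The guarded behaviour above can be sketched as a plain Java class (ours, for illustration; we treat MAX_QUEUE_LENGTH as the capacity): enqueue is only accepted below the bound, and dequeue of an empty queue returns null.

```java
import java.util.ArrayDeque;

// Illustrative Java counterpart of the sequential QUEUE_SPEC (our sketch):
// a bounded queue that rejects enqueues at capacity and returns null on
// dequeue of an empty queue, mirroring the guarded choice above.
public class SeqQueueSpecSketch {
    static final int MAX_QUEUE_LENGTH = 3;
    private final ArrayDeque<Integer> q = new ArrayDeque<>();

    public boolean enqueue(int v) {                 // guard: queue not full
        if (q.size() >= MAX_QUEUE_LENGTH) return false;
        q.addLast(v);                               // q ^ <v>
        return true;
    }

    public Integer dequeue() {                      // null when empty
        return q.pollFirst();                       // head(q), or null
    }
}
```

Unlike the CSP process, a Java caller sees the rejected enqueue as a boolean rather than a refused event, but the state change per accepted operation is the same.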
Each user process simulates a thread that interacts with the queue; increasing the number of user processes simulates additional threads interacting with the queue. We define the thread process as USER:
USER(id, count) =
  count > 0 & enqueue.id?value → end_enqueue.id → USER(id, count − 1)
  □
  dequeue.id → return.id?value → USER(id, count)
Each USER will enqueue a set number of values (the initial value of count) or choose to dequeue a value. We wrap the sequential queue definition in a module SeqQueue, providing a SPEC process to act as the specification based on the number of user threads:
module SeqQueue
USER(id, count) = … -- from above
QUEUE_SPEC(q) = … -- from above
exports
SPEC(users) =
  (||| id : users • USER(id, MAX_QUEUE_LENGTH / card(users)))
  |[ αINTERACTION ]|
  QUEUE_SPEC(<>)
endmodule
where αINTERACTION is the synchronisation set used between the processes.

3.3.2. A Concurrent Queue Specification

A queue can enqueue or dequeue elements. However, to properly capture concurrent behaviour, we must allow the following scenario. In a system with two processes trying to enqueue a value, where the enqueue event takes a process identifier and a value (e.g., enqueue.0.1—process 0 wants to enqueue 1), we must consider what happens with two (concurrent) events:
enqueue.0.1 ||| enqueue.1.2
Process 0 wants to enqueue 1 and process 1 wants to enqueue 2. If the enqueue event is the point of commitment, the trace enqueue.0.1, enqueue.1.2 must result in [1, 2] and the trace enqueue.1.2, enqueue.0.1 must result in [2, 1]. However, this outcome is not guaranteed. It should be possible for the trace enqueue.0.1, enqueue.1.2 to result in the queue [2, 1]. In order to allow such behaviour, we expand enqueue to consist of three events:
  • enqueue.proc.value—the start of an enqueue operation.
  • lin_enqueue.proc.value—the linearization point at which the value is logically committed to the queue.
  • end_enqueue.proc—the end of an enqueuing operation.
For the dequeue operations, we define the events dequeue.proc, lin_dequeue.proc.value, and return.proc.value. Note, there is always a sequential ordering within the three events of an enqueue and a dequeue. We can now observe the following trace:
⟨enqueue.0.1, enqueue.1.2, lin_enqueue.1.2, end_enqueue.1, lin_enqueue.0.1, end_enqueue.0⟩
This results in a queue state of [2, 1].
Figure 2 illustrates this point. In this figure, S denotes the start, C the commit (linearization point), and E the end of the enqueue operation. We have four different scenarios that a second process (P2) could be in relative to the first (P1). Similar issues arise with dequeue as well.
In the first and third example of P2 in Figure 2, P2 starts after P1 but commits its value before P1. These commit points are linearization points.
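The scenario can be observed informally on the real class. The following test code (ours, not part of the paper's models) lets two threads enqueue concurrently and accepts either commit order, exactly as the specification must:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Informal demonstration of the scenario above: two threads enqueue
// concurrently, and either commit order is a correct outcome.
public class ConcurrentOrderDemo {
    public static String run() {
        ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();
        Thread p0 = new Thread(() -> q.offer(1));   // enqueue.0.1
        Thread p1 = new Thread(() -> q.offer(2));   // enqueue.1.2
        p0.start(); p1.start();
        try {
            p0.join(); p1.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return q.toString();                        // "[1, 2]" or "[2, 1]"
    }

    public static void main(String[] args) {
        String r = run();
        if (!r.equals("[1, 2]") && !r.equals("[2, 1]")) throw new AssertionError(r);
    }
}
```

A single execution shows only one ordering; the point of the linearization events in the specification is to admit both.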
A generic queue can now be specified as an external choice between these six events. The only two issues we have to handle are:
  • What happens if the queue is full? While a linked list implementation would rarely reach capacity in practice, for verification purposes we impose an upper limit (MAX_QUEUE_LENGTH) to manage state space size.
  • What happens if the queue is empty on dequeue? In this case, the operation returns null.
We define the CSP specification of a concurrent queue as:
QUEUE_SPEC(q) =
  enqueue?_?_ → QUEUE_SPEC(q)
  □
  length(q) ≤ MAX_QUEUE_LENGTH & lin_enqueue?_?v → QUEUE_SPEC(q ^ <v>)
  □
  end_enqueue?_ → QUEUE_SPEC(q)
  □
  dequeue?_ → QUEUE_SPEC(q)
  □
  (if (length(q) == 0) then
    lin_dequeue?_!mem::INT.mem::NULL → QUEUE_SPEC(q)
  else
    lin_dequeue?_!head(q) → QUEUE_SPEC(tail(q)))
  □
  return?_?_ → QUEUE_SPEC(q)
Note the ?_?_ in enqueue and other events. The first _ represents the process identifier and the second the value; _ means “accept any value and discard it”. The queue does not depend on the process identifier; only the values at the linearization points affect its state. However, we cannot remove the process identifier and value, as they are required for specification checking.
The USER process is updated to include the linearization events between enqueue/end_enqueue and dequeue/return.
USER(id, count) =
  count > 0 & enqueue.id?val → lin_enqueue.id.val → end_enqueue.id →
    USER(id, count − 1)
  □
  dequeue.id → lin_dequeue.id?value → return.id.value → USER(id, count)
We also create a ConcQueue module, analogous to SeqQueue, to simplify usage. The code is available online. Note that the two linearization channels are internal to the module, as they are not required for communication with the external environment.

4. Examining OpenJDK’s Implementation of a Concurrent Queue

ConcurrentLinkedQueue [18] is a parametrised class (in the java.util.concurrent package) extending the parametrised class AbstractQueue, which extends the parametrised class AbstractCollection. It implements a thread-safe, lock-free, and non-blocking linked-list queue. The documentation cites [1] as the basis for its implementation. We are primarily interested in inserting and removing elements from the queue, thus we only consider the following methods:
  • add/offer—Inserts an element into the queue.
  • poll—Retrieves and removes an element from the queue, or returns null if the queue is empty.
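A minimal single-threaded usage of the two operations under study (our test code, not from the paper):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal usage of the ConcurrentLinkedQueue operations considered here:
// offer/add insert, and poll removes or returns null when empty.
public class OfferPollDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<>();
        if (q.poll() != null) throw new AssertionError(); // empty queue -> null
        q.offer("a");                                     // insert an element
        q.add("b");                                       // add behaves like offer here
        if (!"a".equals(q.poll())) throw new AssertionError();
        if (!"b".equals(q.poll())) throw new AssertionError();
        if (q.poll() != null) throw new AssertionError(); // empty again
    }
}
```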
Given access to both the implementation and a specification derived from [1], we follow this approach:
  • Translate the OpenJDK Java implementation to CSP (this is the implementation).
  • Use the CSP model of Michael and Scott’s algorithm (this is the specification).
  • Verify that the OpenJDK implementation behaves in accordance with the specification.

4.1. The OpenJDK/Java Version in CSP

As we detailed the process and resulting CSP code for the Michael and Scott algorithm in Section 3.1, we omit the corresponding CSP for the translated OpenJDK version for brevity.

4.2. Results

We have translated both the OpenJDK implementation and the Michael and Scott algorithm into CSP. We can now check that the two implementations behave like each other.
Firstly, we ensure MSYSTEM and JSYSTEM are deadlock- and divergence-free. Deadlock freedom is essential, and divergence freedom means we only need to perform specification checking in the failures model. Both systems are indeed deadlock- and divergence-free up to three users (we limit to three users as the state space is too large for specification checking with four users).
We can now perform the check that the OpenJDK version behaves as the version proposed in [1] (and vice versa). The checks are:
MSYSTEM(X) ⊑F JSYSTEM(X)
JSYSTEM(X) ⊑F MSYSTEM(X)
where X = {P1}, {P1, P2}, {P1, P2, P3}. The refinement checks pass in both directions in both the traces and the stable failures models.
We have now shown that the OpenJDK implementation indeed behaves as Michael and Scott’s algorithm. Although Michael and Scott argue for the correctness of their algorithm, no formal proof exists that the algorithm—and, by extension, the OpenJDK implementation—behaves as a concurrent queue.
We know that MSYSTEM and JSYSTEM behave like each other, but to ensure correctness we must compare them to ConcQueue. ConcQueue is both deadlock- and livelock-free, allowing us to verify that the Michael and Scott algorithm refines ConcQueue in the failures model.
We begin by checking whether MSYSTEM and JSYSTEM behave like a sequential queue. We can check if MSYSTEM behaves as a sequential queue with just a single process:
SeqQueue::SPEC({P1}) ⊑F MSYSTEM({P1})
MSYSTEM({P1}) ⊑F SeqQueue::SPEC({P1})
However, for more than one user process this check fails in the stable failures model. Instead, we get a single trace-refinement, namely:
MSYSTEM({P1, P2}) ⊑T SeqQueue::SPEC({P1, P2})
This check treats MSYSTEM as a specification of SeqQueue, and SeqQueue (as an implementation) only has traces that MSYSTEM also provides. In other words, the SeqQueue behaviour is still possible within MSYSTEM (all of its traces are present). However, MSYSTEM can perform multiple (here, two) enqueue or dequeue operations at a time. Consequently, MSYSTEM has traces—and corresponding acceptances—that SeqQueue does not support.
When using ConcQueue as the specification, the refinement holds in both directions under the stable failures model:
ConcQueue::SPEC(X) ⊑F MSYSTEM(X)
MSYSTEM(X) ⊑F ConcQueue::SPEC(X)
where X = {P1, P2}, {P1, P2, P3}, and {P1, P2, P3, P4}. Thus, MSYSTEM exhibits only the behaviours defined by ConcQueue, and ConcQueue exhibits only the behaviours of MSYSTEM.
Since MSYSTEM and JSYSTEM are failures-divergences refinements of each other, it follows that JSYSTEM and ConcQueue are also failures-divergences refinements of each other.

4.3. Code Refactoring

We have proved that ConcQueue has the same behaviour as both JSYSTEM and MSYSTEM, and vice versa. Since ConcQueue is a smaller and simpler implementation of a concurrent queue, any future model checking or formal verification involving either the Michael and Scott version (MSYSTEM) or the OpenJDK version (JSYSTEM) can simply swap either for the much simpler and smaller ConcQueue, thereby reducing the state space required for verification.
Because our ConcQueue is modular and parameterised, it can be imported into new CSP models involving concurrent components without re-verifying underlying queue behaviours—a common bottleneck in formal modelling.

5. Not Sequential but Linearizable

We have shown that both the OpenJDK implementation and Michael and Scott’s algorithm conform to the concurrent queue specification. However, several interesting questions remain. The first is whether the OpenJDK and the paper versions also behave like a regular sequential queue.
We presented a sequential queue in Section 3.3.1, and evaluated whether MSYSTEM and JSYSTEM behave like a sequential queue. Unsurprisingly, with a single user process, both MSYSTEM and JSYSTEM do, but when more than one user process is present, additional behaviours become possible. This is because a single process cannot overlap operations, whereas concurrency allows interleaving among multiple processes.
The second question we can answer is whether the concurrent queue is linearizable. A concurrent queue is linearizable if its trace behaviour can be transformed into that of a sequential queue. In other words, the begin and end events of each concurrent operation can be repositioned to align with their respective linearization points. This converts the trace into a linearization.
Linearizability requires that each operation appears to commit instantaneously at some point during its execution. A linearizable queue is thus one for which any history of concurrent operations has a corresponding linearization that is consistent with some correct sequential execution.
Linearization points are when operations commit. By focusing solely on the linearization points and ignoring the begin and end events, we isolate the moments when the queue state is modified. With this knowledge, we can create a new specification SimpleQueue. We introduce the simple_enqueue and simple_dequeue events to represent enqueue and dequeue operations, respectively.
module SimpleQueue
  SIMPLE_QUEUE(q) =
    length(q) ≤ MAX_QUEUE_LENGTH & simple_enqueue?proc?v →
      SIMPLE_QUEUE(q ^ <v>)
    □
    length(q) == 0 & simple_dequeue?proc!mem::INT.mem::NULL →
      SIMPLE_QUEUE(q)
    □
    length(q) > 0 & simple_dequeue?proc!head(q) → SIMPLE_QUEUE(tail(q))
  USER(id, count) =
    count > 0 & simple_enqueue.id?val → USER(id, count − 1)
    □
    simple_dequeue.id?value → USER(id, count)
exports
  SPEC(users) =
    (||| id : users • USER(id, MAX_QUEUE_LENGTH / card(users)))
    |[ {| simple_enqueue, simple_dequeue |} ]|
    SIMPLE_QUEUE(<>)
endmodule
Thus, SimpleQueue models changes to the queue as atomic operations without begin and end events. We consider a trace from SimpleQueue to be a linearization that we can verify against. We do this by assuming that the begin and end events occur as part of the simple events. That is, we have reordered our begin and end events into a sequential trace by observing only the commit or linearization points. For example, an event simple_enqueue can be considered the trace ⟨begin_enqueue, simple_enqueue, end_enqueue⟩.
We can now use SimpleQueue to verify whether ConcQueue exhibits only linearizable behaviours. As we have demonstrated that ConcQueue specifies both queue behaviours, showing that ConcQueue produces only traces allowed by SimpleQueue confirms that both queue implementations are linearizable. We must modify ConcQueue so that linearization events are visible rather than hidden, and begin and end events are hidden rather than visible.
To support this, we define a modified process in ConcQueue where the enqueue, end_enqueue, dequeue, and return events have been hidden and the lin_enqueue event renamed to simple_enqueue and lin_dequeue renamed to simple_dequeue. This process is defined as:
SPEC’(users) =
(
  (||| id : users • USER(id, MAX_QUEUE_LENGTH / card(users)))
  |[ {| enqueue, lin_enqueue, end_enqueue, dequeue, lin_dequeue, return |} ]|
  QUEUE_SPEC(<>)
) \ {| enqueue, end_enqueue, dequeue, return |}
  [[ lin_enqueue ← simple_enqueue, lin_dequeue ← simple_dequeue ]]
We can now perform the following two successful checks:
SimpleQueue::SPEC(X) ⊑T ConcQueue::SPEC’(X)
ConcQueue::SPEC’(X) ⊑F SimpleQueue::SPEC(X)
where X = {P1}, {P1, P2}, {P1, P2, P3}.
Note, the second assertion passes in the stable failures model, whereas the first only passes in the traces model. This is expected. The stable failures model considers stable states—ones where no internal operations are possible. We have hidden the begin events in ConcQueue, so a user that has internally committed to starting an operation sits in a stable state whose refusals SimpleQueue does not exhibit; SimpleQueue has no such internal commitment. However, for the purpose of demonstrating linearizability, trace refinement is sufficient and failures need not be considered, which our verification confirms.

6. Overall Results

This section summarises the results presented in Section 4.2 and Section 5. For a number of processes, N, we have shown the following results:
(1) SeqQueue ⊑FD MSYSTEM (for N = 1), and vice versa
(2) SeqQueue ⊑FD JSYSTEM (for N = 1), and vice versa
(3) MSYSTEM ⊑T SeqQueue (for 2 ≤ N ≤ 4)
(4) JSYSTEM ⊑T SeqQueue (for 2 ≤ N ≤ 4)
(5) MSYSTEM ⊑FD JSYSTEM (for 1 ≤ N ≤ 3), and vice versa
(6) ConcQueue ⊑FD MSYSTEM (for 1 ≤ N ≤ 4), and vice versa
(7) ConcQueue ⊑FD MSYSTEM ⊑FD JSYSTEM (for 1 ≤ N ≤ 3), and vice versa
(8) SimpleQueue ⊑T ConcQueue (for 1 ≤ N ≤ 4), i.e., linearizable
Both the sequential (SeqQueue) and the concurrent (ConcQueue) queue specifications are deadlock- and divergence-free in the failures-divergence model with one, two, and three users. This means that checking assertions in the stable failures model is sufficient to conclude that the assertion will hold in the failures/divergence model (See Section 2.1.5). Note also that a single user process can both enqueue and dequeue using external choice, thus operating as if the system had a dedicated enqueue and a dedicated dequeue process.
Furthermore, the queue implementations of [1] (MSYSTEM) and the OpenJDK implementation (JSYSTEM) are also divergence- and deadlock-free for one, two, and three users.
The core objectives of this paper were two-fold:
  • Demonstrate that the algorithm of Michael and Scott [1] behaves as a concurrent queue.
  • Demonstrate that the OpenJDK implementation of ConcurrentLinkedQueue behaves in accordance with Michael and Scott’s algorithm, and thereby conforms to the concurrent queue specification.
FDR proved that MSYSTEM failures-refines JSYSTEM, and vice versa, for one, two, and three users (see Section 4.2). In that same section, we established a relationship between MSYSTEM/JSYSTEM and the ConcQueue specification by checking refinement in the stable failures model between MSYSTEM and ConcQueue. Again, FDR proved that the refinement holds in both directions.
Because MSYSTEM and JSYSTEM failures-refine each other, we can conclude that JSYSTEM also failures-refines ConcQueue, and vice versa.
The penultimate family of checks compared the generic sequential and concurrent queue specifications to each other. We posed several refinement-related questions concerning SeqQueue and ConcQueue.
  • Does the concurrent queue specification refine the sequential queue specification? For a single user, the answer is “yes, in the stable-failures model.” For more users, the answer is “no, not even in the traces model.”
  • Does the sequential queue specification refine the concurrent queue specification? For a single user, the answer is again “yes, in the stable-failures model.” For more users, the answer is “yes, but only in the traces model.” This is because the sequential queue has a different set of failures to the concurrent queue: a sequential queue allows at most one operation at a time, whereas the concurrent queue may support multiple operations simultaneously.
And finally, the last family of checks we performed compared ConcQueue (with enqueue, end_enqueue, dequeue, and return events hidden, and the linearization events visible and renamed) to a simple queue (SimpleQueue). See Section 5.
Our results provide three clear outcomes:
  • The first two results show that the concurrent queue implementations behave like a sequential queue when only one user thread is using them. The third and fourth results demonstrate that the concurrent queues are not sequential with more than one thread, but they still contain the complete behaviour of the sequential queue.
  • The fifth result demonstrates that the OpenJDK implementation of a concurrent queue behaves as Michael and Scott’s original algorithm (and vice versa). The sixth result demonstrates that Michael and Scott’s algorithm also behaves as our ConcQueue specification, leading to the seventh result through the transitive relationship of CSP refinement. This meets the key aim of our work—development of a specification of a concurrent queue that can be used in other models.
  • The final result demonstrates that the concurrent queue specification—and therefore the implementations—are linearizable, insofar that they behave as a queue that atomically allows adding and removing items. We only require trace refinement in this regard.

7. Conclusions and Future Work

7.1. Conclusions

In this paper we investigated the Michael and Scott algorithm for a non-blocking concurrent queue found in [1], as well as the OpenJDK implementation of the ConcurrentLinkedQueue. We have demonstrated that the OpenJDK implementation conforms to the Michael and Scott algorithm, and vice versa, under the stable failures model of CSP. Furthermore, we have shown that both implementations also behave as a generic concurrent queue, and that they are linearizable. Formally, we have established the following refinement relation:
ConcQueue ⊑FD MSYSTEM ⊑FD JSYSTEM
This shows that the OpenJDK implementation conforms to the behaviour specified by Michael and Scott’s algorithm, which in turn satisfies the concurrent queue specification.
With such a result, we can be confident in using our ConcQueue specification in other models where we require a concurrent queue as part of the implementation. This substitution significantly reduces the state space required for subsequent model checking.

Contributions

Our work advances existing CSP-based approaches for verifying linearizability by producing a reusable, formally verified concurrent queue model using CSP. We have produced a specification that is simple to plug into other models, enabling more efficient state space exploration during verification. Additionally, we have shown that our specification is linearizable by explicitly modelling linearization points that define concurrent behaviour.
Our approach is also generalisable. By modelling specifications with linearization points, and a number of users to simulate concurrent access, we can construct other concurrent data structure specifications with relative ease. For example, a stack will be similar to a queue, but with pushing to and popping from the head of the sequence:
STACK_SPEC(s) =
  ⋮
  length(s) ≤ MAX & lin_push?proc?v → STACK_SPEC(<v> ^ s)
  □
  if length(s) == 0 then
    lin_pop?proc!nullInt → STACK_SPEC(s)
  else
    lin_pop?proc!head(s) → STACK_SPEC(tail(s))
  ⋮
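On the implementation side, the corresponding lock-free structure would be a standard Treiber-style stack. The following Java sketch is ours, for illustration (it is not from the paper); each successful CAS on the head is the point that would match lin_push or lin_pop in a specification like STACK_SPEC:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative Treiber-style lock-free stack (our sketch): push and pop
// commit at a single CAS on the head, which would be the linearization
// point (lin_push / lin_pop) in a STACK_SPEC-style specification.
public class TreiberStackSketch {
    static final class Node {
        final int value;
        final Node next;
        Node(int v, Node n) { value = v; next = n; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>(null);

    public void push(int v) {
        while (true) {
            Node h = head.get();
            if (head.compareAndSet(h, new Node(v, h))) return; // lin_push
        }
    }

    public Integer pop() {                                     // null when empty
        while (true) {
            Node h = head.get();
            if (h == null) return null;                        // lin_pop (empty)
            if (head.compareAndSet(h, h.next)) return h.value; // lin_pop
        }
    }
}
```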
We have demonstrated both theoretical value (the formal specification model) and practical value (application to OpenJDK), including establishing the behavioural equivalence of OpenJDK to the original work of Michael and Scott [1] through full failures/divergences refinement checking.
Furthermore, we have provided an outline for converting existing object-oriented Java code into CSPM. We hope that the outline and techniques will enable others to accurately model concurrent Java applications using FDR.

7.2. Limitations

While the results demonstrate the effectiveness of our approach, several limitations should be acknowledged. Due to exponential growth of the state space with the number of processes, our verification scope is limited:
  • We have only checked MSYSTEM ⊑F JSYSTEM up to three user processes. However, this is in line with other works using CSP for verification of such complex models (e.g., Lowe [2]). Three user processes still provide a rich set of interactions with the queue.
  • We have only checked ConcQueue ⊑F MSYSTEM up to four user processes. Likewise, this provides a rich set of comparative behaviours to check against.
  • Our models do not include node recycling, and thus only support a fixed number of enqueue operations. Lowe [2] implemented a garbage collection system that we could have emulated (at the cost of increased state space complexity), but our work has focused on specification development. The queue allows multiple enqueues from different threads, and unlimited dequeues. Again, this provides a rich set of behaviours to explore during model checking.
Such limitations are standard in model checking of concurrent structures due to exponential state space growth as the number of threads increases.

7.3. Future Work

Our work has several directions of future research:
  • Model Scaling via Composition Techniques. While we have verified up to four concurrent users, the state space grows exponentially. Future work could explore compositional verification or data independence techniques to verify a greater number of users, following work by Roscoe and others.
  • Node Recycling and Garbage Collection Semantics. Our models assume a fixed number of enqueues and do not recycle nodes. Incorporating garbage collection mechanisms (e.g., as Lowe [2]) and node deletion schemes would enable a more accurate modelling of real-world behaviour. The limitation is in the size of models that would be created.
  • Application to Other Lock-Free Structures. The methodology and reusable specification style developed here could be applied to stacks, deques, priority queues, and concurrent maps. These structures present unique challenges in terms of linearization point placement and memory consistency.
  • Exploration of Other Memory Semantics. We have assumed that atomic operations occur in the order they are invoked (i.e., they are sequentially consistent). In modern memory systems, caching allows alternative memory models (e.g., relaxed rather than sequentially consistent). Building a richer set of atomic memory components would allow exploration of these different semantics.
  • Tool Integration and Usability. While our CSPM models are available online, integrating our workflow into development tools could facilitate broader adoption of formal verification for Java (or other language) code. Semi-automated translation from structured Java to CSPM could support both teaching and broader industry adoption.
  • Durable Linearizability. Future work could also investigate durable linearizability—i.e., ensuring consistency after a failure—building on the work of Derrick et al. [14], which is particularly relevant in systems with persistent memory.

Author Contributions

Conceptualization, K.C. and J.B.P.; Methodology, K.C. and J.B.P.; Software, K.C. and J.B.P.; Validation, K.C. and J.B.P.; Formal analysis, K.C. and J.B.P.; Investigation, K.C. and J.B.P.; Writing—original draft, K.C. and J.B.P.; Writing—review & editing, K.C. and J.B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All the code referenced in this paper can be found on GitHub at https://github.com/mattunlv/Concurrent-Queue (accessed on 1 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Michael, M.M.; Scott, M.L. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing (PODC ’96), New York, NY, USA, 23–26 May 1996; pp. 267–275.
  2. Lowe, G. Analysing Lock-Free Linearizable Datatypes Using CSP. In Concurrency, Security and Puzzles: Essays Dedicated to Andrew William Roscoe on the Occasion of His 60th Birthday; Gibson-Robinson, T., Hopcroft, P., Lazić, R., Eds.; LNCS; Springer: Cham, Switzerland, 2017; Volume 10160.
  3. Hoare, C.A.R. Communicating Sequential Processes. Commun. ACM 1978, 21, 666–677.
  4. Hoare, C.A.R. Communicating Sequential Processes; Prentice-Hall: Hoboken, NJ, USA, 1985.
  5. Roscoe, A. CSP is expressive enough for π. In Reflections on the Work of C.A.R. Hoare; Springer: London, UK, 2010; pp. 371–404.
  6. Roscoe, A.W. The Theory and Practice of Concurrency; Prentice Hall: Hoboken, NJ, USA, 1998. Available online: http://www.comlab.ox.ac.uk/publications/books/concurrency/ (accessed on 1 May 2025).
  7. FDR 4.2 Documentation. Available online: https://cocotec.io/fdr/manual/ (accessed on 29 April 2025).
  8. Liu, Y.; Chen, W.; Liu, Y.A.; Sun, J. Model checking linearizability via refinement. In Proceedings of FM 2009: Formal Methods: Second World Congress, Eindhoven, The Netherlands, 2–6 November 2009; Springer: Cham, Switzerland, 2009; pp. 321–337.
  9. Herlihy, M.P.; Wing, J.M. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 1990, 12, 463–492.
  10. Roscoe, A.W. Understanding Concurrent Systems; Springer: Cham, Switzerland, 2010.
  11. Gibson-Robinson, T.; Armstrong, P.; Boulgakov, A.; Roscoe, A.W. FDR3—A Modern Refinement Checker for CSP. In Tools and Algorithms for the Construction and Analysis of Systems; Ábrahám, E., Havelund, K., Eds.; LNCS; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8413, pp. 187–201.
  12. Herlihy, M.; Shavit, N.; Luchangco, V.; Spear, M. The Art of Multiprocessor Programming; Morgan Kaufmann: Burlington, MA, USA, 2020.
  13. O’Hearn, P.W.; Rinetzky, N.; Vechev, M.T.; Yahav, E.; Yorsh, G. Verifying linearizability with hindsight. In Proceedings of the 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Zurich, Switzerland, 25–28 July 2010; pp. 85–94.
  14. Derrick, J.; Doherty, S.; Dongol, B.; Schellhorn, G.; Wehrheim, H. Verifying correctness of persistent concurrent data structures: A sound and complete method. Form. Asp. Comput. 2021, 33, 547–573.
  15. Dongol, B.; Derrick, J. Verifying linearisability: A comparative survey. ACM Comput. Surv. 2015, 48, 1–43.
  16. Pedersen, J.B.; Chalmers, K. Toward verifying cooperatively scheduled runtimes using CSP. Form. Asp. Comput. 2023, 35, 1–45.
  17. Mahmoud, A.T.; Mohammed, A.A.; Ayman, M.; Medhat, W.; Selim, S.; Zayed, H.; Yousef, A.H.; Elaraby, N. Formal Verification of Code Conversion: A Comprehensive Survey. Technologies 2024, 12, 244.
  18. OpenJDK. LinkedBlockingQueue.java. 2018. Available online: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/LinkedBlockingQueue.html (accessed on 29 April 2025).
Figure 1. Translation Pipeline.
Figure 2. Example of different commit/linearization times.
Table 1. Conversion of Michael and Scott’s algorithm to Java.
| Michael & Scott | Structured Java |
| --- | --- |
| `node = new_node()` | `Node node = new Node();` |
| `node->value = value` | `node.value.set(value);` |
| `node->next.ptr = null` | `node.next.set(null);` |
|  | `Node tail;` |
|  | `Node next;` |
| `loop` | `while (true) {` |
| `  tail = Q->Tail` | `  tail = Q.tail.get();` |
| `  next = tail.ptr->next` | `  next = tail.next.get();` |
| `  if tail == Q->Tail` | `  if (tail == Q.tail.get()) {` |
| `    if next.ptr == null` | `    if (next == null) {` |
| `      if CAS(&tail.ptr->next, next, node)` | `      if (tail.next.compareAndSet(next, node)) {` |
| `        break` | `        break;` |
| `      endif` | `      }` |
|  | `    }` |
| `    else` | `    else {` |
| `      CAS(&Q->Tail, tail, next.ptr)` | `      Q.tail.compareAndSet(tail, next);` |
| `    endif` | `    }` |
| `  endif` | `  }` |
| `endloop` | `}` |
| `CAS(&Q->Tail, tail, node)` | `Q.tail.compareAndSet(tail, node);` |
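To make the translation in Table 1 concrete, the following is a minimal, self-contained Java sketch of a Michael and Scott queue built around the enqueue loop shown above. It is not the OpenJDK ConcurrentLinkedQueue source; the class and field names (`MSQueue`, `Node`, `head`, `tail`) are ours, and the dequeue is included only so the sketch is usable. The real OpenJDK implementation applies further optimisations (e.g., lazy tail updates and self-linking of dequeued nodes) that are omitted here.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal sketch of Michael & Scott's non-blocking queue,
// following the Java column of Table 1. Names are illustrative only.
class MSQueue<T> {
    static final class Node<T> {
        final AtomicReference<T> value = new AtomicReference<>();
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MSQueue() {
        Node<T> dummy = new Node<>();          // queue always holds a dummy node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T item) {
        Node<T> node = new Node<>();
        node.value.set(item);
        while (true) {
            Node<T> t = tail.get();
            Node<T> next = t.next.get();
            if (t == tail.get()) {             // is tail still consistent?
                if (next == null) {
                    // try to link the new node at the end of the list
                    if (t.next.compareAndSet(null, node)) {
                        tail.compareAndSet(t, node);   // swing tail forward
                        return;
                    }
                } else {
                    tail.compareAndSet(t, next);       // help a lagging tail
                }
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> h = head.get();
            Node<T> t = tail.get();
            Node<T> next = h.next.get();
            if (h == head.get()) {             // is head still consistent?
                if (h == t) {
                    if (next == null) return null;     // queue is empty
                    tail.compareAndSet(t, next);       // help swing tail
                } else {
                    T v = next.value.get();
                    if (head.compareAndSet(h, next)) return v;
                }
            }
        }
    }
}
```

Note how a failed `compareAndSet` on the tail is never retried directly: any thread that observes a lagging tail helps advance it, which is what makes the algorithm lock-free rather than merely non-blocking in a weaker sense.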