Intelligent Transaction Scheduling to Enhance Concurrency in High-Contention Workloads

Shuhan Chen; Congqi Shen; Chunming Wu

doi:10.3390/app15116341

¹ College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

² Zhejiang Laboratory, Hangzhou 311121, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(11), 6341; https://doi.org/10.3390/app15116341

This article belongs to the Special Issue AI-Based Data Science and Database Systems


Abstract

Concurrency control (CC) schemes based on transaction decomposition have significantly enhanced the concurrency performance of multicore in-memory databases, surpassing traditional CC schemes such as two-phase locking (2PL) and optimistic concurrency control (OCC), particularly in high-contention scenarios. However, this performance improvement introduces new challenges: balancing transaction dependency constraints against concurrency optimization remains a persistent issue, especially as the number of concurrent client requests increases and transaction dependencies become more complex. To address these challenges, we propose Dynamic Contention Scheduling (DCoS), a novel method that enhances transaction concurrency via a dual-granularity architecture. DCoS integrates a deep reinforcement learning (DRL)-based executor to schedule high-contention transactions while preserving dependency correctness. DCoS employs a one-shot execution model that enables fine-grained scheduling in high-contention scenarios, while retaining lightweight in-partition execution under low-contention conditions. The experimental results on both micro- and macro-benchmarks demonstrate that DCoS achieves a throughput up to three times higher than state-of-the-art CC protocols under high-contention workloads.

Keywords: in-memory database; transaction decomposition; transaction dependency; deep reinforcement learning; high-contention workloads

1. Introduction

With the exponential growth of multicore architectures in modern servers, optimizing concurrency control (CC) has become a critical challenge for online transaction processing (OLTP) systems. As the demand for higher throughput and lower latency intensifies, efficiently scaling transaction processing across cores is essential for latency-sensitive applications such as financial trading, e-commerce, and distributed databases. However, the performance of conventional CC protocols degrades rapidly on workloads with high contention [1], where hot data items are frequently accessed and modified by many concurrent transactions. Existing approaches to addressing high-contention workloads can be broadly classified into two main paradigms:

1. Partition-based execution. Partitioning techniques aim to colocate contending transactions within the same partition to minimize cross-partition communication and CC overhead. By assigning transactions to fixed threads or cores, partition-based strategies reduce inter-partition contention and facilitate execution with minimal coordination, making explicit CC mechanisms unnecessary in ideal deployment scenarios. However, they inherently limit intra-transaction concurrency and lack flexibility in the presence of dynamic or skewed workloads. Moreover, many real-world workloads are not naturally partitionable [2,3], which further constrains their applicability.

2. Fine-grained CC. Although accesses to hot data may constitute only a small portion of a transaction’s execution time, traditional CC schemes like two-phase locking (2PL) [4] and optimistic concurrency control (OCC) [5] operate at the transaction level, either holding locks or aborting entire transactions, leaving much of the potential concurrency unexploited. To overcome this problem, techniques such as transaction chopping [6,7], IC3 [8], and runtime pipelining [9] decompose transactions into smaller pieces or operations. These techniques enable the early exposure of intermediate results, but they require careful scheduling to prevent deadlocks and cyclic dependencies. For example, if one transaction accesses Table A before B and another accesses them in reverse order, IC3 merges these operations into a single execution unit to avoid cycles, sacrificing parallelism for safety. In real-world OLTP applications, transaction dependencies become increasingly complex due to high-frequency access to hot items and fluctuating client request patterns [10]. Existing methods face fundamental challenges in balancing transaction dependencies and concurrency, which motivates the design of dynamic, dependency-aware scheduling frameworks.

Deep reinforcement learning (DRL) has shown great promise in solving complex combinatorial optimization problems such as graph routing and job scheduling [11,12,13]. However, its application to contention-aware scheduling in OLTP systems remains relatively unexplored.

In this paper, we propose Dynamic Contention Scheduling (DCoS), a DRL-based framework designed to enhance transaction concurrency by adaptively resolving both inter-transaction and intra-transaction contentions. Rather than developing a new high-performance concurrency control scheme, we improve the effectiveness of transaction processing through dependency-aware scheduling, with the capability to adapt to varying levels of contention. We implement a flexible contention identification mechanism based on hot data, enabling DCoS to dynamically switch between lightweight execution for low-contention workloads and DRL-based fine-grained scheduling for high-contention scenarios. Specifically, DCoS introduces a two-stage transaction processing architecture:

Transactions are first dynamically partitioned. Transactions without cross-partition contention are executed independently and in parallel across cores.
Transactions with potential cross-partition contentions are further decomposed into fine-grained operations. A DRL agent is then used to learn contention-aware execution orders, optimizing core utilization and minimizing aborts under dynamic workloads.

Our contributions are summarized as follows:

We formulate complex-dependency transaction scheduling as a Markov Decision Process (MDP), enabling adaptive execution through learned scheduling policies.
We propose an Adaptive Placement Graph structure to jointly encode intra-transaction and inter-transaction dependencies, as well as partition-to-core mappings.
We develop a scalable, architecture-agnostic DRL executor with efficient feature embeddings, capable of generalizing across diverse concurrency scenarios and processing environments.
We conduct an extensive experimental evaluation to demonstrate the effectiveness of the proposed approach, showing consistent improvements over both existing transaction-scheduling strategies and state-of-the-art CC methods without scheduling.

3. Preliminaries

In this section, we establish the formal foundations of transaction processing and then introduce key concepts for reinforcement learning.

3.1. Formalism of Transaction Processing

Definition 1 (Transaction). A transaction $T_{i}$ constitutes a finite sequence of database operations $\{O_{i 1}, \dots, O_{i n}\}$ executing as an atomic unit. Each operation $O_{i j}$ represents either a $READ(d)$ or a $WRITE(d)$ action on data item $d \in D$.

Operations within the same transaction can exhibit two types of dependencies, which arise from the transaction’s logical structure. We refer to these as intra-transaction dependencies and divide them into (1) data dependencies and (2) commit dependencies.

Definition 2 (Intra-Transaction Dependency). For operations $O_{i j}$ and $O_{i k}$ within transaction $T_{i}$, the partial order $\prec_{intra}$ satisfies:

$$\forall O_{i j}, O_{i k} \in T_{i}, \quad (j < k) \Rightarrow (O_{i j} \prec_{intra} O_{i k})$$

(1)

Intra-transaction data dependency exists when $O_{i k}$ relies on data computed by $O_{i j}$ . Formally, $O_{i k}$ is said to be data-dependent on $O_{i j}$ if it requires a value produced by $O_{i j}$ .
Intra-transaction commit dependency ensures transaction atomicity when certain operations may be subject to logic-induced aborts. Logic-induced aborts result from violating integrity constraints defined by applications, which are captured by the set of constraints $≺_{intra}$ . Intuitively, if an operation is associated with at least one constraint that may not be satisfied after the execution of the operation, then it is abortable.

Another type of dependency exists between operations of different transactions, which the execution model induces.

Definition 3 (Inter-Transaction Dependency). For distinct transactions $T_{i} \neq T_{k}$, a dependency $O_{i j} \prec_{dep} O_{k l}$ exists iff:

$$\exists d \in D : (O_{i j} = WRITE(d)) \land (O_{k l} \in \{READ(d), WRITE(d)\})$$

(2)

This general dependency can be classified into two specific cases:

Write–read dependency: If $O_{k l} = READ (d)$ , then $O_{i j}$ ’s commit must precede $O_{k l}$ ’s execution to prevent cascading aborts.
Write–write dependency: If $O_{k l} = WRITE (d)$ , then $O_{i j}$ and $O_{k l}$ require mutual exclusion on d, meaning they must be serialized to avoid data inconsistency. Unlike write–read dependency, this type of contention does not enforce a strict execution order.
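The classification in Definition 3 can be illustrated with a minimal Python sketch. The `Op` record and the function name `inter_dependency` are hypothetical names introduced here for illustration; they are not part of DCoS. The sketch only checks the condition of Eq. (2) for an ordered pair of operations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record: O_ij of transaction `txn` on item `item`."""
    txn: int   # transaction index i
    kind: str  # "READ" or "WRITE"
    item: str  # data item d


def inter_dependency(first: Op, second: Op) -> Optional[str]:
    """Return the dependency type for first -> second per Eq. (2), or None."""
    if first.txn == second.txn:
        return None  # same transaction: covered by intra-transaction order instead
    if first.item != second.item or first.kind != "WRITE":
        return None  # Eq. (2) requires a WRITE(d) followed by an access to the same d
    # The second operation decides which of the two cases applies.
    return "write-read" if second.kind == "READ" else "write-write"
```

For example, a write by one transaction followed by a read of the same item by another yields `"write-read"`, while two reads of the same item yield no dependency.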

Transaction contention. Transaction contention primarily arises from inter-transaction dependencies, where multiple transactions contend for shared data. A transaction $T_{i}$ is considered conflict-free with another transaction $T_{j}$ if there are no contentions between them. The nature of these contentions is determined by the isolation level enforced by the CC protocol. For example:

Under serializability, contention occurs when two transactions access the same data item, and at least one modifies it.
Under snapshot isolation, contention occurs if two transactions attempt to write the same data item.

Effectively managing transaction contentions is essential for maintaining database integrity, ensuring correctness, and optimizing performance in a concurrent environment.

3.2. Reinforcement Learning

Reinforcement learning is a practical approach for adaptive decision making in dynamic environments. In this framework, an agent iteratively refines its actions based on feedback from the system state. This learning paradigm is typically modeled as a Markov Decision Process, represented by the tuple $(S, A, P, R, S^{'})$.

The agent interacts with the environment at discrete time steps. At each step t, the agent observes a state $s^{t}$ from the state space $S$ and selects an action $a^{t}$ from the action space $A$. After executing the action, the environment provides a reward $r^{t}$, and the agent transitions to a new state $s^{t + 1}$ according to the transition probability distribution $P$.

The process of selecting actions is guided by a policy $π(a | s)$, which specifies the probability of choosing an action a in a given state s. The objective of the learning-based optimization process is to derive an optimal policy $π$ that maximizes the expected cumulative reward:

$$R(π) = E_{π} \left[ \sum_{t} γ^{t} r^{t} \right]$$

(3)

where $γ \in [0, 1]$ is the discount factor that controls the importance of future rewards.
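As a small worked illustration of the quantity inside the expectation of Eq. (3), the following Python sketch computes the discounted sum for one sampled trajectory. The function name `discounted_return` is a hypothetical helper, and the expectation over the policy is not modeled here.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r^t over one trajectory (the term inside E_pi in Eq. (3))."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total
```

With rewards `[1, 1, 1]` and `gamma = 0.5`, the result is 1 + 0.5 + 0.25 = 1.75; with `gamma = 0` only the first reward counts, and with `gamma = 1` all rewards are weighted equally.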

4. Design of DCoS

DCoS adopts a one-shot transaction-processing model, in which a transaction begins execution only after all its inputs are ready. Many modern in-memory systems employ similar models, including H-Store [26], Calvin [17], and Silo [27], which leverage pre-declared access patterns to enable efficient scheduling and deterministic execution in some cases. DCoS adopts two coordinated phases to optimize transaction processing. Figure 1 shows the workflow of DCoS.

Figure 1. DCoS workflow in transaction processing for OLTP system.

(Section 4.1) Conflict-free partitioning phase: Transactions are partitioned based on high-contention data items.
(Section 4.2) Dynamic execution phase: The CC granularity of the transactions is determined dynamically according to data access patterns.

4.1. Conflict-Free Partitioning Phase

Given a transaction batch $B$, DCoS applies a partitioning function $Φ$ that, with the aim of minimizing CC overhead, divides the batch into:

p conflict-free clusters ${Q_{1}, Q_{2}, \dots, Q_{p}}$ ;
A residual set $T$ , where $| T | \leq α | B |$ , and $α \in [0, 1]$ serves as the residual threshold parameter.

The transaction-partitioning problem is generally NP-hard, and extensive research has been conducted in this domain [3,15,24]. We adopt the dynamic partitioning algorithm following [15], and DCoS is also compatible with other partitioning functions.

DCoS follows a proactive framework that leverages knowledge of the complete database schema and the complete set of stored procedures in OLTP applications. DCoS analyzes the contention among transactions by building a data access graph from their read–write sets. Specifically, the partitioning phase consists of three key processes:

1. Identifying hot data items. Hot data items $\{d_{1}, \dots, d_{δ}\}$ are probabilistically detected by randomly sampling transactions for $η$ trials. The data items accessed by these sampled transactions are merged into a designated hot cluster, ensuring that only $η$ transactions are processed and $δ$ hot clusters are formed in this step.

2. Expanding the hot cluster with cold items. For the remaining transactions, if at most one of their accessed data items belongs to an existing hot cluster, the data set of the entire transaction is merged into that cluster. However, if a transaction accesses data items spanning multiple hot clusters, it is skipped to prevent cross-cluster conflicts.

3. Optimizing the volume of the residual set. The system continuously monitors the residual growth ratio $\frac{| T |}{| B |}$. If the residual volume exceeds the threshold $α | B |$, a hot cluster merging process is triggered to balance the trade-off between reducing CC synchronization overhead and maintaining thread concurrency efficiency.
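The three processes above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the partitioning algorithm of [15]: the function names are hypothetical, the sampled transactions are passed in as `seed_ids` so that the sketch is deterministic, a transaction touching no hot cluster is assumed to start a new cluster (the text does not specify this case), and only the merge trigger of step 3 is shown, not the merging itself.

```python
def partition_batch(batch, seed_ids):
    """Split a batch into clusters and a residual set.

    batch: list of sets, one set of accessed data items per transaction.
    seed_ids: indices of the sampled transactions (step 1).
    """
    clusters = []  # each cluster: {"items": set of data items, "txns": list of indices}

    # Step 1: merge the items of sampled transactions into hot clusters.
    for i in seed_ids:
        merged = {"items": set(batch[i]), "txns": [i]}
        untouched = []
        for cluster in clusters:
            if cluster["items"] & merged["items"]:
                merged["items"] |= cluster["items"]
                merged["txns"] = cluster["txns"] + merged["txns"]
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]

    # Step 2: expand clusters with remaining transactions, or skip them.
    residual = []
    seeds = set(seed_ids)
    for i, items in enumerate(batch):
        if i in seeds:
            continue
        touched = [c for c in clusters if c["items"] & items]
        if len(touched) == 1:
            touched[0]["items"] |= items
            touched[0]["txns"].append(i)
        elif not touched:
            # Assumption (not stated in the text): start a new cluster.
            clusters.append({"items": set(items), "txns": [i]})
        else:
            residual.append(i)  # spans multiple hot clusters: left for the residual set
    return clusters, residual


def needs_cluster_merge(residual, batch, alpha):
    """Step 3 trigger: the residual volume exceeds alpha * |B|."""
    return len(residual) > alpha * len(batch)
```

In this sketch, a transaction that touches two hot clusters ends up in the residual set, which is the set later handled by the fine-grained scheduler.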

4.2. Dynamic Execution Phase

4.2.1. Execution Model

Existing solutions [15,28] employ a two-phase execution method, as shown in Figure 2b. In the first phase, each conflict-free cluster is executed serially, without requiring any CC protocol, on the m available cores $C = \{C_{1}, \dots, C_{m}\}$. We consider the situation in which each core is assigned to a single thread and use thread worker and core assignment interchangeably in the rest of the paper. In the second phase, residual transactions involving cross-partition data items are executed using conventional CC protocols across all cores. However, this approach has inherent limitations: transactions in the residual set bound the overall transaction processing latency due to cross-partition dependencies (see Section 4.2.2 for details). The system can manage conflicts more effectively and improve concurrency by decomposing transactions into smaller pieces based on their contention data set.

Figure 2. Transaction execution model.

DCoS solution. Given the varying workloads and processing times of conflict-free clusters, particularly for residual transactions, DCoS introduces a fine-grained scheduling approach for cross-partition transactions. First, transactions accessing multiple high-contention data items (i.e., cross-partition transactions) are decomposed into transaction pieces using runtime pipelining (RP) [9]. RP organizes these pieces based on data dependencies and assigns each group a rank. Depending on whether they access hot data items in different partitions, the transaction pieces are further divided into contention-intensive transaction pieces and non-contentious transaction pieces. Within the same group, existing RP-based methods mainly merge these accesses into a single execution unit to avoid deadlocks, which limits the level of parallelism that can be exploited. We employ fine-grained scheduling with a DRL-based executor to expose more concurrency within the same group (see Section 4.2.3 for details).

Then DCoS employs an adaptive two-level queue execution model as shown in Figure 2a.

Hot queue: Designed to handle contention-intensive transaction pieces requiring fine-grained scheduling at the operation level. A DRL-based executor is employed to maximize concurrency within these contention-intensive transaction pieces.
Cold queue: Designed to handle transactions (or transaction pieces) that do not involve cross-partition data. It uses a coarse-grained scheduling approach (e.g., conventional methods such as round-robin allocation), where entire transactions (or pieces) are assigned to threads.

The adaptive two-level queue execution model adheres to the following scheduling principles:

Principle 1: Concurrent execution and scheduling. DCoS executes cold-queue transactions while concurrently scheduling hot-queue transactions. The fine-grained scheduling process dynamically adapts to execution progress according to thread state.
Principle 2: Priority-based scheduling. Transactions in the hot queue are assigned higher execution priority, while cold-queue transactions are assigned lower priority. If no hot transactions can proceed due to dependency constraints, the thread fetches from the cold queue instead: rather than forcing hot-queue transactions to wait idly, the system dynamically schedules transactions in the cold queue to execute first.
Principle 3: Dependency enforcement. Hot-queue transactions are often dependent on cold-queue transactions. Since cold-queue transactions access low-contention data but may still hold locks on critical data items, hot-queue transactions that require those items must wait until the locks are released. In addition, execution dependencies between transaction pieces within the same transaction are enforced via the lock mechanism defined in the RP framework [9]. Specifically, a later transaction piece cannot acquire the locks until all earlier pieces have released theirs.
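The priority rule of Principle 2 can be sketched as a fetch routine for an idle thread. The function `next_work` and the `is_ready` callback are hypothetical names; `is_ready` stands in for the dependency and lock checks of Principle 3, which are not modeled in this sketch.

```python
from collections import deque


def next_work(hot_queue, cold_queue, is_ready):
    """Pick the next piece for an idle thread: hot queue first, cold queue as fallback.

    hot_queue, cold_queue: deques of pending transaction pieces.
    is_ready: callable returning True when a hot piece's dependencies are satisfied.
    """
    # Higher priority: the first hot piece that can proceed right now.
    for idx in range(len(hot_queue)):
        piece = hot_queue[idx]
        if is_ready(piece):
            del hot_queue[idx]
            return "hot", piece
    # No hot piece can proceed: run cold work instead of waiting idly.
    if cold_queue:
        return "cold", cold_queue.popleft()
    return None
```

A thread calling `next_work` only receives cold work when every hot piece is currently blocked, which mirrors the goal of keeping cores busy while hot transactions wait.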

Compared with the two-phase execution model, the two-level queue execution model dynamically adjusts execution order based on dependencies and thread status with the following objectives:

Reduced lock contention: Hot transactions experience fewer conflicts through fine-grained scheduling to reduce the probability of aborts.
Higher resource utilization: Cores remain active by executing transactions in the cold queue when hot transactions wait for a lock.

4.2.2. Contention Transaction Patterns

Before introducing the fine-grained scheduling design, we first analyze the fundamental dependency patterns within the residual set $T$ using two representative examples, as illustrated in Figure 3. $W(\cdot)$ denotes a write operation, and $R(\cdot)$ represents a read operation.

Figure 3. Dependency patterns. (a) Direct dependency. (b) Transitive dependency.

Example 1 (Direct dependency). Figure 3a illustrates a direct dependency scenario in which transactions $T_{1}$, $T_{2}$, and $T_{3}$ access shared data items $G_{2}$, $G_{3}$, and $G_{4}$ with overlapping write operations. To ensure serializability under such contention, transaction execution can be coordinated by assigning different start times based on dependency order. For instance, by applying proactive delayed data access [28], $T_{3}$’s write to $G_{2}$ can be scheduled first, followed by $T_{2}$, and finally $T_{1}$, respecting a write sequence to avoid conflicts.

Example 2 (Transitive dependency). In real-world workloads, more complex dependency chains frequently arise, as shown in Figure 3b. Here, transitive dependencies exist among $T_{1}$, $T_{2}$, and $T_{3}$. Unlike direct dependencies, simply adjusting the execution order of transactions cannot resolve the scheduling problem. An inappropriate sequence may still result in transaction aborts, which calls for a more sophisticated scheduling solution.

4.2.3. Fine-Grained Scheduling

To effectively schedule contention-intensive transactions, we design a DRL-based executor to determine the access sequence on hot data items (i.e., operation sequence) and their corresponding thread assignment, as shown in Figure 4. The scheduling process satisfies the following constraints:

Figure 4. Fine-grained scheduling process with a DRL executor.

Minimizing cross-partition coordination: Scheduling operations while respecting inter-transaction and intra-transaction dependency constraints to reduce synchronization overhead.
Optimizing core utilization: Ensuring proper core assignment to prevent resource underutilization and avoid excessive load concentration on specific cores.

Problem formulation. Given a residual transaction set $T = \{T_{1}, \dots, T_{n}\}$ consisting of variable-length operation sequences ($| T_{i} | \neq | T_{j} |$) over a data subset $\bar{D} \subseteq D$, the executor schedules these transactions across m available processing threads, $C = \{C_{1}, \dots, C_{m}\}$. We assume $n \leq m$, as DCoS processes residual transactions concurrently within a single period across available cores. If residual transactions exceed the available cores ($n > m$), transactions are grouped and scheduled across multiple periods.

Residual transaction processing follows the same mapping function as conflict-free transaction processing to ensure the serial execution of operations on the same data. Let $Φ : T \to Q = \{Q_{1}, \dots, Q_{p}\}$ represent the partitioning function that assigns transactions to p disjoint subsets. Additionally, a one-to-many mapping $Γ : Q \to C$ allocates these partitions to thread groups. Consequently, the mapping of operations to threads is formally defined as follows:

$$M = Φ(Γ) : O_{i j} \to \bar{C} \subseteq C$$

(4)

where $O_{i j}$ represents an operation from transaction $T_{i}$.

The makespan refers to the maximum completion time across all threads, i.e., the time by which the last thread completes its assigned operations. The objective of the scheduler is to minimize the total makespan required to process $T$. Specifically, the overall makespan is given by:

$$M_{max} = \max \{M_{i} \mid i \in \{1, \dots, m\}\}$$

(5)

where $M_{i}$ denotes the makespan of the i-th thread. By optimizing both the operation sequence and the assignment of operations to threads, the scheduler aims to achieve balanced workload distribution while reducing contention and synchronization overhead.
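Eq. (5) can be illustrated with a short Python sketch. It makes a simplifying assumption that is not part of the formulation above: each thread's makespan is taken as the sum of the durations of its assigned operations, ignoring any waiting caused by dependencies. The function names and the input layout are hypothetical.

```python
def thread_makespans(assignment, durations, num_threads):
    """Per-thread makespan M_i, assuming operations on a thread run back to back.

    assignment: list of (operation id, thread index) pairs.
    durations: dict mapping operation id to its processing time.
    """
    loads = [0.0] * num_threads
    for op, thread in assignment:
        loads[thread] += durations[op]
    return loads


def overall_makespan(assignment, durations, num_threads):
    """M_max of Eq. (5): the largest per-thread makespan."""
    return max(thread_makespans(assignment, durations, num_threads))
```

With durations 3, 2, and 4 for operations a, b, and c on two threads, placing b and c together gives a makespan of 6, while placing a and b together gives 5, which shows why thread assignment matters for the objective.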

5. DRL-Based Executor

DCoS implements a DRL agent to act as the executor and solve the scheduling problem above. We propose the Adaptive Placement Graph (APG) as the input to the scheduling process, providing a structured way to represent both intra-transaction and inter-transaction dependencies.

Graph structure. The APG is defined as:

$$G = (T = T \cup \{O_{start}, O_{end}\}, ω, φ)$$

(6)

where $O_{start}$ and $O_{end}$ are dummy operations marking the start and end of the scheduling sequence. The set $ω$ consists of intra-transaction dependency edges, capturing the execution order of operations within a single transaction. The set $φ$ represents resource edges, which link pairs of operations requiring the same thread worker for execution, as dictated by $M(O_{i j})$. Resource edges consist of a mix of directed and undirected arcs:

Undirected arcs: Represent write–write dependency, indicating that two operations contend for the same resource without enforcing a strict execution order.
Directed arcs: Represent write–read dependency, ensuring that a write operation must be committed before a dependent read operation is executed.

Figure 5 illustrates how DCoS schedules contention-intensive transactions in Example 2. By updating the resource edges (from undirected to directed), DCoS can generate a scheduling solution that optimizes the processing of transactions across available threads, ensuring minimal makespan while respecting the dependencies between transactions.

Figure 5. DCoS executor solves the scheduling problem in Example 2.

Moreover, DCoS leverages historical workload traces containing transaction-processing metadata (e.g., procedure parameters, execution timestamps, query parameters). Using this data, DCoS estimates the set of per-thread operation processing times $ε = \{ρ_{i j k}\}$ ($O_{i j} \in T$, $C_{k} \in C$) based on the current thread-worker state $S(C)^{t} = \{S(C_{1})^{t}, \dots, S(C_{m})^{t}\}$.

5.1. MDP Formulation

The scheduling process comprises $| T |$ consecutive decision steps. We formalize the MDP defined by a tuple $(S, A, P, R, S^{'})$ as follows:

States $S$. The global state $s^{t}$ at timestep t consists of an APG state $s_{sc}^{t} = \langle S(O)^{t}, κ^{t}, ξ^{t} \rangle$ and a thread-worker state $s_{c}^{t} = (S(C)^{t}, ε)$:

APG State: $κ^{t}$ and $ξ^{t}$ denote the scheduled and unscheduled sets of resource edges. For each $O_{i j}$, $S(O_{i j})^{t} = \{T(O_{i j})^{t}, X(O_{i j})^{t}\}$ represents the operation state, where $X(O_{i j})^{t}$ is a binary indicator (1 for a scheduled operation, 0 otherwise) and $T(O_{i j})^{t}$ is the scheduled completion time or the estimated minimum completion time, defined as:

$$T(O_{i j})^{t} = \begin{cases} \text{completion time} & \text{if scheduled} \\ T(O_{i, j - 1})^{t} + \min_{C_{k} \in C} ρ_{i j k} & \text{otherwise} \end{cases}$$

(7)
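The recursion in Eq. (7) can be traced with a minimal Python sketch for the operations of a single transaction. The function name and the dictionary layout are hypothetical, and the sketch assumes the value preceding the first operation is 0, which the text does not state explicitly.

```python
def completion_estimates(txn_ops, scheduled_done, rho):
    """Evaluate T(O_ij) of Eq. (7) for one transaction.

    txn_ops: operation ids of the transaction, in intra-transaction order.
    scheduled_done: dict op id -> actual completion time, for scheduled operations.
    rho: dict op id -> list of estimated processing times, one per thread worker.
    """
    estimates = {}
    previous = 0.0  # assumption: the first operation has no predecessor delay
    for op in txn_ops:
        if op in scheduled_done:
            previous = scheduled_done[op]       # case 1: already scheduled
        else:
            previous = previous + min(rho[op])  # case 2: lower-bound estimate
        estimates[op] = previous
    return estimates
```

If the first operation finished at time 5 and the next two have fastest processing times of 2 and 1, the estimates are 5, 7, and 8.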

Action $A$ . The hierarchical action space contains:

Operation selection ( $a_{1}^{t}$ ): Choose eligible operation $O_{i j}$ ;
Thread-worker assignment ( $a_{2}^{t}$ ): Choose one thread worker for $O_{i j}$ from candidate set $M (O_{i j})$ .

Reward $R$. The incremental reward at step t is:

$$r(s^{t}, a_{1}^{t}, a_{2}^{t}) = - [M(s^{t + 1}) - M(s^{t})]$$

(8)

The objective is to minimize the makespan $M = \max_{O_{i j}} T(O_{i j})$ by maximizing the cumulative reward $\sum_{t = 1}^{T} r^{t}$.

APG Transition $P$. At timestep t, after the agent's action updates one directed arc, a new APG is generated, yielding a new global state $S^{'}$ with $s^{t + 1} = (s_{sc}^{t + 1}, s_{c}^{t + 1})$.
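To see why maximizing the cumulative reward of Eq. (8) minimizes the makespan, note that the undiscounted sum of incremental rewards telescopes to the negative of the total makespan increase. The sketch below checks this on a made-up sequence of makespan values; the function name `step_rewards` is a hypothetical helper.

```python
def step_rewards(makespans):
    """Incremental rewards of Eq. (8) from the makespan observed at each state.

    makespans: [M(s^1), M(s^2), ..., M(s^{T+1})], one value per visited state.
    """
    return [-(makespans[t + 1] - makespans[t]) for t in range(len(makespans) - 1)]
```

For makespans `[0, 3, 3, 7]` the rewards are `[-3, 0, -4]`, and their sum is -7, i.e., the negative of the final makespan minus the initial one.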

5.2. Network Architecture

We employ distinct networks, $π_{θ_{1}}(a_{1} | s)$ for the operation selection policy and $π_{θ_{2}}(a_{2} | s, a_{1})$ for the thread-worker assignment policy, to guide the transaction-scheduling process. Each policy uses its own embedding layer, as follows.

Graph embedding. We adopt an L-layer graph isomorphism network (GIN) [29], a variant of graph neural network (GNN), to compute structural embeddings for $s_{sc}^{t}$. A GNN is a neural network architecture that operates on graph-structured data by iteratively aggregating feature information from a node’s neighbors. In our case, the embedding of each operation is updated at iteration $l \in \{1, 2, \dots, L\}$ as follows:

$$h_{l}^{O_{i j}, t} = {MLP}_{θ_{l}} \left( (1 + ϵ^{(l)}) \cdot h_{l - 1}^{O_{i j}, t} + \sum_{u \in N(O_{i j})} h_{l - 1}^{u, t} \right)$$

(9)

where $N(O_{i j})$ denotes the set of operations that have intra-transaction dependencies with $O_{i j}$, and u is one such operation. $θ_{l}$ represents the learnable parameters of layer l, and $ϵ^{(l)}$ is a trainable scalar that controls the relative weight of self-information and neighbor information during aggregation.

After L iterations, we obtain a global representation of the APG using an average pooling function that aggregates the embeddings of all operations into a vector:

$$h_{G}^{t} = \frac{1}{| T |} \sum_{O_{i j} \in T} h_{L}^{O_{i j}, t}$$

(10)
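The aggregation of Eq. (9) and the pooling of Eq. (10) can be sketched in plain Python without a deep learning library. This is an illustration only: the function names are hypothetical, features are plain lists, and the MLP is passed in as an arbitrary callable rather than a trained network.

```python
def gin_layer(h, neighbors, eps, mlp):
    """One GIN update (Eq. (9)): scale own features, add neighbor features, apply MLP.

    h: dict node -> feature list from the previous layer.
    neighbors: dict node -> list of neighboring nodes.
    eps: scalar weighting of self-information (trainable in a real model).
    mlp: callable mapping a feature list to a feature list.
    """
    updated = {}
    for node, features in h.items():
        agg = [(1.0 + eps) * x for x in features]
        for other in neighbors.get(node, []):
            agg = [a + b for a, b in zip(agg, h[other])]
        updated[node] = mlp(agg)
    return updated


def mean_pool(h):
    """Average all node embeddings into one graph-level vector (Eq. (10))."""
    count = len(h)
    dim = len(next(iter(h.values())))
    return [sum(vec[k] for vec in h.values()) / count for k in range(dim)]
```

Stacking L calls to `gin_layer` and then calling `mean_pool` corresponds to computing the graph representation; in practice a GNN library would typically replace this hand-written loop.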

We adopt a fully connected layer to encode the thread-worker state $s_{c}^{t}$ by computing the embedding vector of each thread worker $h^{C_{k}, t}$ and the pooling vector $u^{t}$.

Arc transformation. The APG is a mixed graph containing both undirected and directed arcs. For operations that have no inter-transaction dependency but are connected by undirected edges, we replace each undirected arc with two directed arcs in opposite directions. We ignore undirected disjunctive arcs in the initial state to reduce computational complexity [30], with $s_{sc}^{t} = \{S(O)^{t}, κ^{t}\}$, so that the APG does not become too dense to be processed efficiently by the GIN. As transaction scheduling proceeds, $κ^{t}$ grows because more directed arcs are added.

Action selection. We implement MLP layers as action decoders based on state embeddings. First, we compute the action score of the operation selection action, $Φ_{1}^{O_{i j}, t}$, and of the thread-worker assignment action, $Φ_{2}^{C_{k}, t}$:

$$\begin{aligned} Φ_{1}^{O_{i j}, t} &= {MLP}_{π_{θ_{1}}} (h_{L}^{O_{i j}, t} \,\|\, h_{G}^{t} \,\|\, u^{t}) \\ Φ_{2}^{C_{k}, t} &= {MLP}_{π_{θ_{2}}} (h^{C_{k}, t} \,\|\, h_{G}^{t} \,\|\, u^{t}) \end{aligned}$$

(11)

For operations that are already scheduled or violate precedence constraints, and for thread workers that are incompatible with the selected operation, we set the score to $- \infty$. Finally, we normalize the scores using the softmax function, a standard transformation that converts real-valued inputs into a probability distribution. This enables the computation of selection probabilities and supports a sampling-based strategy for action prediction.

$$\begin{aligned} P_{O_{i j}}(a_{1}^{t}) &= \frac{e^{Φ_{1}^{O_{i j}, t}}}{\sum_{O_{i' j'} \in T} e^{Φ_{1}^{O_{i' j'}, t}}} \\ P_{C_{k}}(a_{2}^{t}) &= \frac{e^{Φ_{2}^{C_{k}, t}}}{\sum_{C_{k'} \in C} e^{Φ_{2}^{C_{k'}, t}}} \end{aligned}$$

(12)
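The masking and normalization step can be sketched as a masked softmax. The function name and the boolean `mask` argument are hypothetical; the mask stands in for the eligibility checks (already scheduled, precedence violation, incompatible thread worker) described above.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores, with ineligible entries forced to probability 0.

    scores: list of real-valued action scores.
    mask: list of booleans, True where the action is eligible.
    """
    masked = [s if ok else float("-inf") for s, ok in zip(scores, mask)]
    top = max(masked)
    if top == float("-inf"):
        raise ValueError("no eligible action")
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For scores `[1.0, 1.0, 5.0]` with the third action masked out, the result is `[0.5, 0.5, 0.0]`: the masked action can never be sampled, regardless of its raw score.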

5.3. Training Algorithm

To train the policy network, we employ the Proximal Policy Optimization (PPO) algorithm [31] with a Sequence Actor for the operation selection policy, a Load-Balancing (LB) Actor for the thread-worker assignment policy, and a global critic, as shown in Figure 6. Actors running in the environment collect training experience tuples, and then the two sub-policies $π_{θ_{1}}$ and $π_{θ_{2}}$ are updated via the collected samples.

Figure 6. Actor–critic framework of DCoS executor.

Objectives of actor network. We implement identical objective computation mechanisms for both actor networks. The overall objective function combines the clipped surrogate objective and entropy regularization:

$$\begin{aligned} L(θ) &= α L_{CLIP}(θ) + β L_{E}(θ), \\ L_{CLIP}(θ) &= E_{t} \left[ \min \left( δ_{θ}^{t} \hat{A}^{t}, \; clip(δ_{θ}^{t}, 1 - ϵ, 1 + ϵ) \hat{A}^{t} \right) \right], \\ L_{E}(θ) &= E_{t} \left[ Entropy(π(a^{t} | s^{t})) \right], \\ δ_{θ}^{t} &= \frac{π(a^{t} | s^{t})}{π_{old}(a^{t} | s^{t})}, \end{aligned}$$

(13)

where $θ = \{θ_{1}, θ_{2}\}$ denotes the parameters of both actor networks. Here, $L_{CLIP}(θ)$ represents the clipped surrogate objective, $L_{E}(θ)$ is the entropy regularization term, and $δ_{θ}^{t}$ indicates the probability ratio between the updated and old policies. The hyperparameters $ϵ$, $α$, and $β$ control the clipping range and objective weighting.
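The clipped surrogate term of Eq. (13) can be evaluated on a batch of samples with a few lines of Python. This sketch covers only $L_{CLIP}$, with the expectation replaced by a batch mean; the entropy term and the weights α and β are omitted, and the function name is a hypothetical helper.

```python
def clipped_surrogate(ratios, advantages, clip_eps):
    """Batch mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).

    ratios: probability ratios between the updated and old policies.
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a ratio of 1.5 and a positive advantage, the clipped value 1.2 is used (for a clipping range of 0.2), so the objective stops rewarding further movement away from the old policy; with a ratio of 0.5 and a negative advantage, the more pessimistic clipped value is likewise selected.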

Objective of critic network. Both actors share a common critic network with state-value function $v_{ϕ}(s^{t})$. The advantage function estimator is calculated as follows:

$$\hat{A}^{t} = \sum_{t' = t}^{T} γ^{t' - t} r^{t'} - v_{ϕ}(s^{t}),$$

(14)

where $\hat{A}^{t}$ serves as a variance-reduced advantage estimator. The critic network is optimized by minimizing the mean squared error (MSE) between predicted and actual returns:

$$L_{MSE}(ϕ) = E_{t} \left[ \left( \sum_{t' = t}^{T} γ^{t' - t} r^{t'} - v_{ϕ}(s^{t}) \right)^{2} \right].$$

(15)
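The return, advantage, and critic loss can be computed for one trajectory as in the sketch below. It uses the return discounted relative to step t, as in the return term of Eq. (15), and replaces the expectation with a mean over the trajectory. The function names are hypothetical, and the critic values are supplied as plain numbers rather than network outputs.

```python
def returns_to_go(rewards, gamma):
    """Return sum_{t' >= t} gamma**(t' - t) * r^{t'} for every step t."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def advantages(rewards, values, gamma):
    """Advantage estimate: discounted return minus the critic's value."""
    return [g - v for g, v in zip(returns_to_go(rewards, gamma), values)]


def critic_mse(rewards, values, gamma):
    """Mean squared error between discounted returns and critic values."""
    errors = advantages(rewards, values, gamma)
    return sum(e * e for e in errors) / len(errors)
```

For rewards `[1, 2, 4]` with `gamma = 0.5`, the returns are `[3, 4, 4]`; against critic values `[2, 4, 5]` the advantages are `[1, 0, -1]` and the loss is 2/3.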

6. Deployment of DCoS

DCoS functions as an intelligent scheduling layer within the transaction processing pipeline. It builds upon a correctness enforcement layer that guarantees serializability through partitioning methods and runtime pipelining (RP). Figure 7 depicts the hierarchical interaction among partitioning, CC protocols, and scheduling methods. DCoS has the following characteristics:

Figure 7. Transaction processing of DCoS under varying contention levels.

Adaptive execution under varying contention levels. In low-contention scenarios, coarse-grained execution is sufficient. Transactions that do not access hot data and are free from contention can be executed entirely within a single partition. This approach eliminates coordination overhead and avoids the complexity associated with fine-grained control mechanisms. When contention is detected, typically through cross-partition accesses to hot data items, DCoS responds by activating a fine-grained scheduler based on DRL. The scheduler reorders and interleaves contention-intensive transaction pieces in a way that minimizes conflicts and improves concurrency.

Complementarity with existing CC protocols. DCoS is designed to serve as a complement to existing correctness mechanisms. Rather than replacing CC protocols that optimize lock acquisition order or validation timing, DCoS enhances them by operating at the scheduling layer. DCoS focuses on improving execution order to reduce conflicts and transaction aborts. DCoS assumes the presence of a base CC protocol, such as RP, to enforce correctness, while it adaptively applies fine-grained scheduling only when it is beneficial.
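The contention-dependent choice of execution path can be sketched as a simple routing step. The sketch below assumes that the set of hot keys has already been identified and that each transaction declares its access set up front (the one-shot model); `Transaction` and `route` are hypothetical names used only for illustration, not part of the DCoS prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: int
    keys: frozenset  # access set, known before execution (one-shot model)


def route(transactions, hot_keys):
    """Split a batch into a coarse-grained path and a fine-grained path.

    Transactions that touch no hot key run within their partition;
    the rest are handed to the fine-grained (DRL-based) scheduler.
    """
    coarse, fine = [], []
    for txn in transactions:
        if txn.keys & hot_keys:
            fine.append(txn)
        else:
            coarse.append(txn)
    return coarse, fine


if __name__ == "__main__":
    batch = [
        Transaction(1, frozenset({"a", "b"})),
        Transaction(2, frozenset({"c"})),
        Transaction(3, frozenset({"b", "d"})),
    ]
    coarse, fine = route(batch, hot_keys={"b"})
    print([t.txn_id for t in coarse], [t.txn_id for t in fine])  # [2] [1, 3]
```

Keeping the routing decision this cheap matters because it runs for every transaction, whereas the scheduler is invoked only for the contended subset.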

Limitations and Practical Considerations

DCoS introduces several practical considerations:

Assumption of known access patterns. DCoS relies on a one-shot transaction processing model where access patterns are known ahead of execution, which may limit its applicability in dynamic or ad hoc workloads with unpredictable access patterns.
Accuracy in hot data and contention detection. The integration of a DRL-based scheduler requires offline training. The effective activation of fine-grained scheduling depends on the accurate identification of hot data items and contention-intensive transaction pieces.

A promising direction for future work is the tighter integration of DCoS-style scheduling with enhanced CC mechanisms, enabling adaptive selection of scheduling strategies tailored to workload characteristics. Furthermore, exploring lightweight online learning and transfer learning techniques [32] could reduce training overhead and improve scheduler adaptability, broadening DCoS’s applicability across diverse OLTP environments.

7. Experimental Results

In this section, we comprehensively evaluate DCoS by comparing it with existing systems. Specifically, we focus on the following aspects:

(Section 7.2) First, we assess the effectiveness of the DCoS executor for handling high-contention workloads.
(Section 7.3 and Section 7.4) Then, we evaluate DCoS against state-of-the-art CCs for handling various workloads.
(Section 7.5) Finally, we analyze the overhead introduced by DCoS in high-contention scenarios to provide insights into the trade-offs involved in employing DRL-based scheduling.

7.1. Experimental Setup

We implement a prototype of DCoS within the DBx1000 [33] transaction-processing engine and integrate a partitioning algorithm following [15]. Our experiments are conducted on two machines, each equipped with two 16-core AMD 7302 processors and a 24 GB NVIDIA Ampere A30 GPU. One machine initializes the OLTP application, while the other serves as the transaction-processing engine.

Hyperparameters. The embedding network of the executor consists of two layers, each containing two hidden layers with a hidden dimension of 128 neurons. Following common practice in DRL tasks [31], we set the coefficients for clipping, policy loss, value function, and entropy to 0.2, 2, 1, and 0.01, respectively. During training, we use the widely adopted Adam optimizer [34] with a learning rate of lr = 1 \times 10^{-3}, which is a standard choice in policy gradient algorithms.

Metrics. We measure the performance of the transaction-processing system using the following metrics:

Makespan: The total execution time of a transaction batch, highlighting the effectiveness of scheduling under contention.
Throughput: The number of transactions committed per second, reflecting overall system efficiency.
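Both metrics reduce to simple arithmetic over per-transaction timestamps. The following sketch assumes each transaction is recorded as a hypothetical (start, finish) pair in seconds; it illustrates the definitions above and is not the measurement code used in the experiments.

```python
def makespan(intervals):
    """Total time to finish a batch: latest finish minus earliest start."""
    starts = [start for start, _ in intervals]
    finishes = [finish for _, finish in intervals]
    return max(finishes) - min(starts)


def throughput(num_committed, elapsed_seconds):
    """Committed transactions per second over a measurement window."""
    return num_committed / elapsed_seconds


if __name__ == "__main__":
    batch = [(0.0, 4.0), (1.0, 6.0), (2.0, 5.0)]
    print(makespan(batch))           # 6.0
    print(throughput(3_200_000, 1))  # 3200000.0
```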

7.2. Microbenchmark Evaluation

In this section, we evaluate the effectiveness of the DCoS DRL-based scheduler by comparing it against three state-of-the-art scheduling methods:

Simulated Annealing (SA) [10] explores scheduling configurations by probabilistically accepting higher-cost states.
Dynamic Priority (Priority) [33] prioritizes transactions based on the largest number of pending operations.
Greedy Scheduling (Greedy) selects immediately available thread workers using a first-fit allocation strategy.
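To make the Greedy baseline concrete, the sketch below assigns each transaction, in arrival order, to the worker that becomes free first (ties go to the lowest worker index). This is one plausible reading of the first-fit strategy described above, not the authors' implementation; it ignores dependencies between transaction pieces, and `greedy_makespan` is a hypothetical name.

```python
import heapq


def greedy_makespan(durations, num_workers):
    """List-schedule transactions onto workers and return (makespan, plan).

    plan[i] is the (worker, start_time) chosen for the i-th transaction.
    """
    # Heap of (time the worker becomes free, worker index).
    workers = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(workers)
    plan = []
    for duration in durations:
        free_at, worker = heapq.heappop(workers)
        plan.append((worker, free_at))
        heapq.heappush(workers, (free_at + duration, worker))
    return max(free_at for free_at, _ in workers), plan


if __name__ == "__main__":
    span, plan = greedy_makespan([3, 1, 2, 2], num_workers=2)
    print(span)  # 5.0
    print(plan)  # [(0, 0.0), (1, 0.0), (1, 1.0), (0, 3.0)]
```

In this example the greedy plan finishes at time 5, whereas pairing the jobs as (3, 1) and (2, 2) would finish at time 4, which illustrates how locally optimal placements can yield a longer makespan.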

We design a workload generator to evaluate the performance of DCoS under high contention with varying workload skew. We sample a set of hot data items from the database and then test a set of transactions |\bar{D}| that operate on these hot data items. As the number of concurrent contention-intensive transactions increases, the contention rate also increases. We manage workload variance by adjusting the number of cold items, while the processing time for hot transactions, denoted as t_{p}, can vary with the cold item distribution. Specifically, we define three workload states among cores based on average transaction-processing time: t_{p} \sim U(100, 120) μs for low workload variance, t_{p} \sim U(100, 150) μs for medium workload variance, and t_{p} \sim U(100, 200) μs for high workload variance.
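The three variance settings can be reproduced with a small sampler. The sketch below draws processing times from the uniform ranges stated above; the dictionary keys, the function name, and the seeding scheme are assumptions for illustration, since the actual workload generator is not described in further detail.

```python
import random

# Uniform processing-time ranges in microseconds, as stated in the text.
VARIANCE_RANGES_US = {
    "low": (100, 120),
    "medium": (100, 150),
    "high": (100, 200),
}


def sample_processing_times(num_txns, variance, seed=None):
    """Draw one processing time t_p ~ U(lo, hi) per hot transaction."""
    low, high = VARIANCE_RANGES_US[variance]
    rng = random.Random(seed)  # local generator keeps runs reproducible
    return [rng.uniform(low, high) for _ in range(num_txns)]


if __name__ == "__main__":
    for level in ("low", "medium", "high"):
        times = sample_processing_times(20, level, seed=0)
        print(level, round(min(times), 1), round(max(times), 1))
```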

Varying contention levels. We set the number of available thread workers to 5 and the number of concurrent contention-intensive transactions to 10, 15, and 20 to simulate increasing contention rates. The DCoS model was trained for 1000 episodes, until convergence was reached, to ensure stable policy learning. For testing, we generated 500 distinct transaction workload sets not encountered during training and report the mean makespan of processing the batch of hot items, as shown in Figure 8a. DCoS consistently outperforms the other algorithms across all scenarios, particularly under high workload variance and large numbers of concurrent transactions. Compared to SA, the best-performing baseline, DCoS reduces the makespan by 8.8% to 12.8% under low workload variance and by 14% to 15.5% under high workload variance as the number of concurrent transactions increases. The Greedy algorithm degrades as concurrency increases, which reflects the inherent weakness of greedy approaches: they make locally optimal but globally suboptimal decisions. The Priority algorithm performs comparably to SA under moderate conditions; however, as workload variance increases, the performance of SA, Greedy, and Priority all suffers because they cannot adapt to dynamic changes.

Figure 8. (a) Scalability comparison under different workloads. (b) Generalization performance with increasing concurrent transactions and thread workers.

Generalization. To evaluate cross-scale generalization, we conduct transfer learning experiments in which DCoS, trained on a small-scale configuration (five thread workers, 20 concurrent contention-intensive transactions), is deployed directly in large-scale environments (ten thread workers, 20–30 concurrent contention-intensive transactions). This zero-shot adaptation test involves no parameter fine-tuning, which isolates the model's generalization capacity. The evaluation protocol holds the number of thread workers at 10 while scaling the contention-intensive transactions to 100–150% of the original training workload. Figure 8b shows the testing results: we observe a trend similar to the small-scale setting, with DCoS achieving a 12.57% makespan reduction compared to SA, the best-performing baseline. Because the embedding layers allow DCoS to handle concurrent transaction sets and thread-worker pools of different sizes, it adapts to various transaction workload scenarios without additional training.

7.3. TPC-C Benchmark Evaluation

DBx1000 features a pluggable lock manager that supports multiple CC schemes, enabling a direct comparison of DCoS against various baseline approaches within the same system. Specifically, we implement and evaluate the following three methods:

NO_WAIT is a variant of 2PL in which any conflict causes the requesting transaction to abort immediately.
Silo [27] is a representative OCC protocol.
IC3 [8] combines the static analysis of the transaction workload with runtime techniques that track and enforce dependencies among concurrent transactions.
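As a reference point for the simplest of these baselines, the following single-threaded sketch shows the NO_WAIT rule: a lock request either succeeds at once or fails, in which case the caller aborts and restarts the transaction instead of waiting. The class and method names are hypothetical, and the sketch is not DBx1000's lock manager, which is concurrent and considerably more involved.

```python
class NoWaitLockTable:
    """Toy lock table: conflicting requests fail immediately (no waiting)."""

    def __init__(self):
        self._owners = {}  # key -> (mode, set of holder txn ids); mode is "S" or "X"

    def acquire(self, txn_id, key, exclusive):
        """Return True if the lock is granted, False if the caller must abort."""
        entry = self._owners.get(key)
        if entry is None:
            self._owners[key] = ("X" if exclusive else "S", {txn_id})
            return True
        mode, holders = entry
        if not exclusive and mode == "S":
            holders.add(txn_id)  # shared locks are compatible with each other
            return True
        if holders == {txn_id}:
            if exclusive:
                self._owners[key] = ("X", holders)  # sole holder may upgrade
            return True
        return False  # conflict: NO_WAIT aborts instead of blocking

    def release_all(self, txn_id):
        """Drop every lock held by txn_id (called on commit or abort)."""
        for key in list(self._owners):
            _, holders = self._owners[key]
            holders.discard(txn_id)
            if not holders:
                del self._owners[key]


if __name__ == "__main__":
    table = NoWaitLockTable()
    print(table.acquire(1, "a", exclusive=True))   # True
    print(table.acquire(2, "a", exclusive=False))  # False, so txn 2 aborts
```

Under high contention most requests on a hot key hit the conflict branch, which is why abort-related overhead dominates for this scheme.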

To ensure an industry-standard evaluation, we conduct comprehensive experiments using the TPC-C benchmark [35] with enhanced contention control analysis. We maintain a balanced transaction mix of 50% New-Order and 50% Payment transactions to closely model real-world order-processing workloads. These two transaction types comprise the vast majority of the benchmark and have previously been used to stress concurrency control under high contention. Our evaluation framework extends the classical TPC-C benchmark by varying two key factors: (i) the number of warehouses while maintaining a constant thread count, and (ii) the number of worker threads while keeping the warehouse count fixed.

We fix the number of worker threads at 32 and vary the number of warehouses from 1 (high contention) to 32 (low contention). As shown in Figure 9, DCoS achieves the highest throughput across all warehouse configurations, significantly outperforming the other methods, especially under high contention. At one warehouse, the throughput of DCoS is three times higher than that of Silo and 1.5 times higher than that of IC3. These results highlight the effectiveness of DCoS in mitigating contention bottlenecks and maintaining robust performance under varying workloads: it avoids unnecessary waiting through intelligent scheduling based on runtime information, whereas IC3 leverages only static information.

Figure 9. TPC-C performance (from high contention to low contention).

To assess system scalability, we vary the number of worker threads from 1 to 32 while keeping the warehouse count fixed at 1, as shown in Figure 10. The performance advantage of DCoS becomes more pronounced as the number of threads increases. At 16 threads, DCoS reaches 3.2 M txns/s, surpassing IC3 by 25% and significantly outperforming NO_WAIT and Silo.

Figure 10. Scalability testing (one warehouse).

7.4. TPC-E Benchmark Evaluation

To evaluate how well DCoS adapts to a more complex contention scenario, we conduct experiments using the TPC-E benchmark [36]. This benchmark simulates transaction workloads with varying data access patterns. We control contention by changing the frequency of updates to the SECURITY table, using a Zipfian distribution for these updates. We adjust the skew parameter θ from 0 to 4. Higher values of θ indicate more significant data access skew and increased contention.
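The effect of the skew parameter can be seen with a small sampler. The sketch below assumes the common Zipfian form in which the item of rank k is chosen with probability proportional to 1 / k^θ, so θ = 0 is uniform and larger θ concentrates accesses on a few rows; the exact generator used in the experiments is not specified, and the function names are hypothetical.

```python
import random


def zipf_weights(num_items, theta):
    """Unnormalized Zipfian weights: rank k gets weight 1 / k**theta."""
    return [1.0 / (rank ** theta) for rank in range(1, num_items + 1)]


def sample_zipf(num_items, theta, num_samples, seed=None):
    """Draw item indices (0 = hottest) according to the Zipfian weights."""
    rng = random.Random(seed)
    weights = zipf_weights(num_items, theta)
    return rng.choices(range(num_items), weights=weights, k=num_samples)


if __name__ == "__main__":
    for theta in (0, 1, 2, 4):
        weights = zipf_weights(100, theta)
        hottest_share = weights[0] / sum(weights)
        print(f"theta={theta}: hottest item receives {hottest_share:.1%} of accesses")
```

With 100 items, the hottest item receives 1% of accesses at θ = 0 but more than 90% at θ = 4 under this form, which is why nearly every update conflicts at the highest skew setting.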

As shown in Figure 11, throughput decreases for all methods as θ increases, reflecting the impact of contention due to skewed data access. DCoS maintains the highest throughput as skew grows. At θ = 0, DCoS achieves 1.2 M txns/s, slightly behind Silo (1.3 M txns/s), because DCoS performs dependency analysis; at θ = 4, however, DCoS still maintains 0.6 M txns/s, whereas Silo drops to 0.35 M txns/s. Silo performs best under low-skew scenarios but degrades significantly as contention increases. NO_WAIT and IC3 consistently underperform, struggling to handle skewed workloads efficiently. These findings highlight that DCoS is more resilient to data skew than traditional CC methods, as it dynamically adapts to runtime contention.

Figure 11. Throughput comparison under different Zipf skew values (θ).

To analyze the combined effect of parallelism and contention, we vary the number of worker threads while maintaining a Zipfian skew of θ = 3. The results, shown in Figure 12, indicate that DCoS significantly outperforms Silo and NO_WAIT beyond 16 threads. At 32 threads, DCoS achieves a throughput that is 1.63 times higher than Silo, 1.48 times higher than NO_WAIT, and 1.94 times higher than IC3.

Figure 12. Scalability analysis (θ = 3).

In summary, the findings from TPC-C and TPC-E demonstrate that DCoS is particularly effective at managing data skew, sustaining high throughput even in scenarios of extreme contention. These results highlight the significance of runtime-adaptive scheduling strategies employed in DCoS for optimizing transaction throughput in OLTP systems.

7.5. Overhead Analysis

To evaluate the runtime overhead of the DRL-based scheduling executor, we profile the TPC-E workload from Section 7.4 at θ = 4, which represents a highly skewed, high-contention scenario. Our overhead analysis focuses on high-contention workloads, where both the benefits and costs of DCoS are most evident. Figure 13 presents the execution profile across different phases of transaction processing. Note that for IC3, which uses an active waiting mechanism under high contention, we include its associated costs within the analysis phase for comparison. DCoS spends the largest percentage of execution time in the execute phase. Although DCoS introduces an additional analysis phase responsible for hot data item identification and DRL-based scheduling decisions, its abort-related overhead stays below 20%, which is significantly lower than the abort-related overhead observed in NO_WAIT (40%) and Silo (45%). We also observe that IC3, which fully decomposes transactions, incurs even higher analysis overhead while achieving lower execution efficiency. This comparison illustrates that DCoS strikes a more favorable balance between coordination cost and execution throughput.

Figure 13. Execution profile under TPC-E benchmark (SECURITY table, θ = 4).

8. Conclusions

In this paper, we propose DCoS, a transaction-scheduling method based on DRL. DCoS is compatible with existing partitioning methods that divide transactions into conflict-free sets, and adopts a fine-grained scheduling architecture to enable efficient concurrency for high-contention transaction sets. DCoS addresses the transaction-scheduling problem by updating the undirected arcs in the novel APG structure. Our experimental results demonstrate that DCoS improves the system throughput compared to state-of-the-art scheduling approaches across diverse workload scenarios and maintains robust generalization capabilities when exposed to unseen transaction workload patterns. The fine-grained scheduling mechanism is selectively applied in high-contention scenarios, thereby avoiding unnecessary overhead in low-contention settings. Future work involves extending DCoS to distributed settings and examining its compatibility with emerging CC schemes across varying deployment architectures.

Author Contributions

Conceptualization, S.C.; methodology, C.S.; software, S.C.; validation, C.W.; formal analysis, C.S.; investigation, S.C.; resources, S.C.; data curation, C.S.; writing—original draft preparation, C.S.; writing—review and editing, C.W.; visualization, S.C.; supervision, C.W.; project administration, C.S.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key R&D Program of China (2024YFB2906500) and Key R&D Program of Zhejiang (2024SSYS0001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The TPC-C and TPC-E benchmarks are available in the references; the self-constructed microbenchmarks will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DCoS	Dynamic Contention Scheduling
CC	Concurrency control
OLTP	Online transaction processing
2PL	Two-phase locking
OCC	Optimistic concurrency control
MDP	Markov decision process
MVCC	Multi-version concurrency control
RP	Runtime pipelining
GIN	Graph isomorphism network
DRL	Deep reinforcement learning

References

[1] Yu, X.; Bezerra, G.; Pavlo, A.; Devadas, S.; Stonebraker, M. Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores; MIT Libraries: Cambridge, MA, USA, 2014.
[2] Pavlo, A.; Curino, C.; Zdonik, S. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, AZ, USA, 20–24 May 2012; pp. 61–72.
[3] Curino, C.; Jones, E.P.C.; Zhang, Y.; Madden, S.R. Schism: A Workload-Driven Approach to Database Replication and Partitioning; MIT Libraries: Cambridge, MA, USA, 2010.
[4] Gray, J.; Reuter, A. Transaction Processing: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 1992.
[5] Kung, H.T.; Robinson, J.T. On optimistic methods for concurrency control. ACM Trans. Database Syst. (TODS) 1981, 6, 213–226.
[6] Shasha, D.; Llirbat, F.; Simon, E.; Valduriez, P. Transaction chopping: Algorithms and performance studies. ACM Trans. Database Syst. (TODS) 1995, 20, 325–363.
[7] Zhang, Y.; Power, R.; Zhou, S.; Sovran, Y.; Aguilera, M.K.; Li, J. Transaction chains: Achieving serializability with low latency in geo-distributed storage systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 3–6 November 2013; pp. 276–291.
[8] Wang, Z.; Mu, S.; Cui, Y.; Yi, H.; Chen, H.; Li, J. Scaling multicore databases via constrained parallel execution. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 1643–1658.
[9] Xie, C.; Su, C.; Littley, C.; Alvisi, L.; Kapritsos, M.; Wang, Y. High-performance ACID via modular concurrency control. In Proceedings of the 25th Symposium on Operating Systems Principles, Monterey, CA, USA, 5–7 October 2015; pp. 279–294.
[10] Zhou, N.; Zhou, X.; Zhang, X.; Du, X.; Wang, S. Reordering transaction execution to boost high-frequency trading applications. Data Sci. Eng. 2017, 2, 301–315.
[11] Liu, W.x.; Cai, J.; Chen, Q.C.; Wang, Y. DRL-R: Deep reinforcement learning approach for intelligent routing in software-defined data-center networks. J. Netw. Comput. Appl. 2021, 177, 102865.
[12] Yang, L.; Wei, Y.; Yu, F.R.; Han, Z. Joint routing and scheduling optimization in time-sensitive networks using graph-convolutional-network-based deep reinforcement learning. IEEE Internet Things J. 2022, 9, 23981–23994.
[13] Zhao, L.; Fan, J.; Zhang, C.; Shen, W.; Zhuang, J. A DRL-based reactive scheduling policy for flexible job shops with random job arrivals. IEEE Trans. Autom. Sci. Eng. 2023, 21, 2912–2923.
[14] Serafini, M.; Taft, R.; Elmore, A.J.; Pavlo, A.; Aboulnaga, A.; Stonebraker, M. Clay: Fine-grained adaptive partitioning for general database schemas. Proc. VLDB Endow. 2016, 10, 445–456.
[15] Prasaad, G.; Cheung, A.; Suciu, D. Handling highly contended OLTP workloads using fast dynamic partitioning. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 527–542.
[16] Bernstein, P.A.; Goodman, N. Multiversion concurrency control—Theory and algorithms. ACM Trans. Database Syst. (TODS) 1983, 8, 465–483.
[17] Thomson, A.; Diamond, T.; Weng, S.C.; Ren, K.; Shao, P.; Abadi, D.J. Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, AZ, USA, 20–24 May 2012; pp. 1–12.
[18] Ding, B.; Kot, L.; Gehrke, J. Improving optimistic concurrency control through transaction batching and operation reordering. Proc. VLDB Endow. 2018, 12, 169–182.
[19] Li, J.; Lu, Y.; Wang, Q.; Lin, J.; Yang, Z.; Shu, J. AlNiCo: SmartNIC-accelerated contention-aware request scheduling for transaction processing. In Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 22), Carlsbad, CA, USA, 11–13 July 2022; pp. 951–966.
[20] Guo, Z.; Wu, K.; Yan, C.; Yu, X. Releasing locks as early as you can: Reducing contention of hotspots by violating two-phase locking. In Proceedings of the 2021 International Conference on Management of Data, Xi’an, China, 20–25 June 2021; pp. 658–670.
[21] Zamanian, E.; Shun, J.; Binnig, C.; Kraska, T. Chiller: Contention-centric transaction execution and data partitioning for modern networks. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 511–526.
[22] Pandis, I.; Johnson, R.; Hardavellas, N.; Ailamaki, A. Data-oriented transaction execution. Proc. VLDB Endow. 2010, 3, 928–939.
[23] Qadah, T.; Gupta, S.; Sadoghi, M. Q-Store: Distributed, Multi-partition Transactions via Queue-oriented Execution and Communication. In Proceedings of the EDBT 23rd International Conference on Extending Database Technology, Copenhagen, Denmark, 30 March–2 April 2020; pp. 73–84.
[24] Yao, C.; Agrawal, D.; Chen, G.; Lin, Q.; Ooi, B.C.; Wong, W.F.; Zhang, M. Exploiting single-threaded model in multi-core in-memory systems. IEEE Trans. Knowl. Data Eng. 2016, 28, 2635–2650.
[25] Qadah, T.M.; Sadoghi, M. Quecc: A queue-oriented, control-free concurrency architecture. In Proceedings of the 19th International Middleware Conference, Rennes, France, 10–14 December 2018; pp. 13–25.
[26] Kallman, R.; Kimura, H.; Natkins, J.; Pavlo, A.; Rasin, A.; Zdonik, S.; Jones, E.P.; Madden, S.; Stonebraker, M.; Zhang, Y.; et al. H-store: A high-performance, distributed main memory transaction processing system. Proc. VLDB Endow. 2008, 1, 1496–1499.
[27] Tu, S.; Zheng, W.; Kohler, E.; Liskov, B.; Madden, S. Speedy transactions in multicore in-memory databases. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 3–6 November 2013; pp. 18–32.
[28] Cao, Y.; Fan, W.; Ou, W.; Xie, R.; Zhao, W. Transaction Scheduling: From Conflicts to Runtime Conflicts. Proc. ACM Manag. Data 2023, 1, 26.
[29] Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
[30] Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P.S.; Chi, X. Learning to dispatch for job shop scheduling via deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1621–1632.
[31] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
[32] Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
[33] DBx1000. Available online: https://github.com/yxymit/DBx1000 (accessed on 31 May 2025).
[34] Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
[35] Transaction Processing Performance Council. TPC-C Standards Specification; Technical Report; Transaction Processing Performance Council: San Francisco, CA, USA, 2010.
[36] Transaction Processing Performance Council. TPC-E Standards Specification; Technical Report; Transaction Processing Performance Council: San Francisco, CA, USA, 2015.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
