Article

A Deep Reinforcement Learning-Based Concurrency Control of Federated Digital Twin for Software-Defined Manufacturing Systems

1 Computer Science and Engineering, Computer Science Department, Korea University of Technology and Education, Cheonan-si 31253, Republic of Korea
2 Future Convergence and Engineering, Computer Science Department, Korea University of Technology and Education, Cheonan-si 31253, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8245; https://doi.org/10.3390/app15158245
Submission received: 28 June 2025 / Revised: 21 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025

Abstract

Modern manufacturing demands real-time, scalable coordination that legacy manufacturing management systems cannot provide. Digital transformation encompasses the entire manufacturing infrastructure, which can be represented by digital twins for facilitating efficient monitoring, prediction, and optimization of factory operations. A Federated Digital Twin (FDT) emerges by combining heterogeneous digital twins, enabling real-time collaboration, data sharing, and collective decision-making. However, deploying FDTs introduces new concurrency control challenges, such as priority inversion and synchronization failures, which can potentially cause process delays, missed deadlines, and reduced customer satisfaction. Traditional concurrency control approaches in the computing domain, due to their reliance on static priority assignments and centralized control, are inadequate for managing dynamic, real-time conflicts effectively in real production lines. To address these challenges, this study proposes a novel concurrency control framework combining Deep Reinforcement Learning with the Priority Ceiling Protocol. Using SimPy-based discrete-event simulations, which accurately model the asynchronous nature of FDT interactions, the proposed approach adaptively optimizes resource allocation and effectively mitigates priority inversion. The results demonstrate that, compared with the rule-based PCP controller, our hybrid DRLCC improves completion time by between 1.51% and 24.27% and urgent-job delay by between 2.18% and 6.65%, while keeping the number of priority inversions low.

1. Introduction

1.1. Background and Motivation

The manufacturing sector has progressed from the mechanization of the first Industrial Revolution to the highly connected, data-driven factories of Industry 4.0, where cyber–physical systems, the Internet of Things (IoT), and cloud platforms greatly amplify automation and data use [1]. Industry 5.0 extends this evolution by placing humans back in the loop, that is, keeping skilled operators actively involved alongside intelligent machines to achieve resilient, customized, and sustainable production [2]. To meet these expectations, self-organizing manufacturing systems (SOMSs) have emerged, allowing a decentralized network of autonomous agents to adapt collaboratively to rapidly changing shop floor conditions in real time [3]. SOMSs enable mass-personalization and agile reconfiguration, yet their very decentralization makes it hard to coordinate hundreds of independent resources when large-scale disturbances occur. Communication overhead, distributed scheduling conflicts, and the absence of a global oversight layer often degrade overall performance. Figure 1 illustrates the evolution of manufacturing from mechanized looms to human-centric Industry 5.0 systems.

1.2. Problem Statement and Industrial Challenge

Software-defined manufacturing systems (SDMSs) address these limitations by overlaying a software control plane anchored by digital twins (DTs) and centralized control mechanisms, enabling more efficient management of complex manufacturing environments [4]. DTs enable manufacturers to visualize, simulate, and optimize manufacturing workflows, monitor real-time operations, and predict equipment performance using sensor data, AI algorithms, and IoT integration [5]. However, DTs’ effective implementation requires seamless data integration from diverse sources, including legacy systems, sensors, and enterprise-wide repositories [6]. This complexity often hinders interoperability and real-time decision-making, limiting the full potential of DT-based manufacturing systems.
To address these integration bottlenecks at scale, researchers have recently proposed the Federated Digital Twin (FDT) paradigm, which enables multiple interconnected DTs to share operational insights across distributed manufacturing sites [7,8]. Despite these benefits, ensuring real-time concurrency control across synchronized DTs within SDMSs remains an open problem. In distributed FDT environments, concurrent updates to shared resources without proper coordination can cause data conflicts, synchronization failures, and priority inversions, where critical high-priority tasks are delayed by lower-priority operations, undermining Quality of Service (QoS) and delivery guarantees [9,10]. Traditional concurrency control mechanisms, designed for tightly coupled centralized systems, are ill-suited for these dynamic, distributed interactions [11]. Likewise, legacy rule-based schedulers lack the real-time adaptability required to prevent conflicts under fluctuating loads [12]. These shortcomings underscore the need for a smarter, real-time concurrency strategy tailored to FDT. We therefore propose a hybrid approach that couples Deep Reinforcement Learning (DRL) with the Priority Ceiling Protocol (PCP): DRL provides adaptive, data-driven resource allocation, while PCP enforces dynamically updated ceilings to block priority inversion. We call the hybrid scheduler Deep Reinforcement Learning-based Concurrency Control (DRLCC); the term “concurrency control” subsumes the Priority Ceiling Protocol layer, while “DRL-based” highlights the adaptive PPO agent.

1.3. Contributions

Our key contributions are as follows:
  • Hybrid Concurrency Scheme: Enforces mutual-exclusion ceilings via PCP while allowing a PPO-driven agent to adaptively schedule tasks based on real-time system states;
  • Simulation Platform: Implements a SimPy-based digital twin environment modeling Autonomous Mobile Robot (AMR) fleets, stochastic job arrivals, and resource demands;
  • Extensive Evaluation: Demonstrates up to 20.2% makespan reduction, 4.3% lower delay for high-priority tasks, and near-elimination of priority inversions compared to Priority Ceiling Protocol (PCP), Priority Inheritance Protocol (PIP), DRL-only, and DRL+PIP baselines under varying loads and robot counts.
The remainder of this paper is organized as follows. Section 2 reviews related work on concurrency control and DRL scheduling in smart manufacturing. Section 3 details the materials and methods, including the federated SimPy environment, the hybrid DRL-PCP architecture, and agent design. Section 4 presents simulation results and comparative analyses. Section 5 discusses implications, situates DRLCC within related work, outlines deployment considerations and finally concludes by suggesting avenues for future research.

2. Literature Review

2.1. Digital Twin Foundations and Federated Architectures

Modern manufacturing demands real-time, scalable coordination that legacy systems often cannot provide. Digital transformation, represented by digital twins, facilitates efficient monitoring, prediction, and optimization of factory operations [13]. Recent studies have shown that digital twins are evolving to solve increasingly complex real-world problems. DTs are expected to progress from physical object models to collaborative DTs, in which multiple DTs interact as autonomous agents in cyberspace and extend both local and global data analysis [14]. A federated architecture enables interoperability and data sharing among these digital twins, providing a more holistic and dynamic view of the entire manufacturing ecosystem.
The digital twin federation framework encompasses physical space, virtual space, data connection, and service layers, establishing a shared reference environment for secure data exchange, collaboration, and trusted execution [15]. Through real-time monitoring, simulation, and optimization, digital twins enhance production processes [16]. Building accurate twin models demands expertise in electromagnetism, fluid dynamics, and kinematics, with models kept current via direct data streams from operational assets [17]. Despite these benefits, FDTs magnify concurrency control challenges, as consistent, coordinated access to shared resources across the federation is critical for maintaining system integrity. Recent work has also begun exploring digital twin models in higher-education settings [18].

2.2. Concurrency Control Protocols in Distributed Manufacturing

In distributed manufacturing, concurrency control ensures that concurrent operations on shared resources do not conflict or corrupt state. Traditional concurrency control approaches, such as mutex locks, semaphores, and timestamp ordering, were designed for centralized computing and can lead to transaction blocking, causing priority inversion problems and degrading system schedulability [19]. A particularly severe issue is priority inversion, where a low-priority task holds a resource needed by a high-priority job, delaying critical operations and degrading real-time responsiveness.
The Priority Ceiling Protocol minimizes the duration of priority inversions by assigning each resource a ceiling equal to the highest priority of tasks that may access it; when a task locks the resource, its priority is elevated to this ceiling, bounding worst-case blocking to a single critical section, which is essential for real-time manufacturing [20]. The Priority Inheritance Protocol (PIP) tackles the same problem by temporarily raising the priority of any low-priority task that blocks a higher-priority one, enabling it to complete its critical section sooner and release the resource [21,22]. Unlike the Priority Ceiling Protocol, PIP does not require a system-wide ceiling table; it is therefore simpler to implement but offers only bounded rather than minimal blocking times. Figure 2 shows a three-thread scenario in which, without Priority Inheritance locking, the low-priority thread holds the mutex and blocks the high-priority thread while a medium-priority thread preempts the low-priority holder, causing an unbounded delay. With Priority Inheritance, the low-priority holder temporarily inherits the high priority, finishes its critical section sooner, and the inversion window is bounded. PIP works well when the number of shared resources is small, yet as resource counts and task chains grow, it can suffer from transitive priority boosts and increased schedulability analysis complexity.
Stack Resource Policy (SRP) is another resource locking protocol for real-time systems, which prevents unbounded blocking by assigning each resource a “ceiling” equal to the highest priority of any task that may use it; a task may only begin when its priority exceeds all current ceilings, thereby bounding blocking times to at most one critical section [23]. In addition to these lock protocols, real-time systems often rely on deadline-driven schedulers. Earliest Deadline First (EDF) is a preemptive, dynamic-priority scheduling algorithm that always dispatches the task with the nearest absolute deadline. While EDF is optimal for minimizing deadline misses on a single processor under no shared-resource constraints, it provides no protection against priority inversion when tasks contend for locks. To introduce inversion-avoidance into EDF, we pair it with the SRP. EDF + SRP thus isolates the benefit of inversion-avoidance on top of an otherwise ideal dynamic scheduler.
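As a rough illustration of the EDF dispatch rule described above, the following sketch orders a set of ready tasks by absolute deadline using a min-heap. The task names and deadlines are hypothetical, all tasks are assumed to be ready at the same instant, and no locking is modeled, so this shows plain EDF without the SRP layer.

```python
import heapq

def edf_order(tasks):
    """Return task names in the order EDF would dispatch them.

    `tasks` is an iterable of (absolute_deadline, name) pairs. All tasks are
    assumed ready at once, so preemption never comes into play here.
    """
    heap = list(tasks)
    heapq.heapify(heap)  # min-heap keyed on the absolute deadline
    order = []
    while heap:
        _deadline, name = heapq.heappop(heap)  # nearest deadline first
        order.append(name)
    return order

# Hypothetical transport tasks: (absolute deadline, name).
ready = [(12.0, "dispatch_B"), (5.0, "deliver_A"), (9.5, "deliver_C")]
print(edf_order(ready))  # ['deliver_A', 'deliver_C', 'dispatch_B']
```

Because the heap looks only at deadlines, nothing in this sketch prevents a task that holds a shared resource from delaying a more urgent one, which is the gap SRP is meant to close.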

2.3. Deep Reinforcement Learning for Decision

DRL combines reinforcement learning’s decision loop with deep networks’ ability to handle high-dimensional state spaces [24]. In manufacturing, DRL adapts to uncertain task arrivals and varying job characteristics, ensuring low deadline miss ratios and high resource utilization [25]. Its use has expanded from asset-level twins to entire industrial processes and human–machine collaboration. DRL also outperforms traditional methods in dynamic job-shop scheduling and resource allocation [26].
Most DRL studies, however, target throughput or energy metrics; very few explore their application to concurrency control in FDTs. DRL can learn adaptive allocation policies, preempt uncommitted lower-priority transactions, and prioritize critical jobs without restarting them, thereby mitigating priority inversion while maximizing throughput [2]. Combining DRL with PCP within an FDT framework merges adaptability and safety: DRL dynamically optimizes local resource assignments, and PCP enforces global priority constraints. This hybrid approach enables policies to evolve through environmental interaction, handle large state-action spaces, and adapt to fluctuating demand, thereby minimizing conflicts and boosting performance in software-defined manufacturing systems.
Traditional concurrency control and scheduling protocols such as EDF, SRP, PCP, and PIP were designed for relatively static domains (e.g., OS kernels or databases) with well-understood workloads. In Federated Digital Twin manufacturing, however, task arrivals, deadlines, and resource availability fluctuate unpredictably, and bursty contention patterns cause static algorithms to suffer high deadline miss rates and unbounded blocking under load (see Section 4). These challenges motivate the hybrid DRL and PCP framework proposed in Section 3.

3. Materials and Methods

3.1. Simulation Environment

The experiments were executed in Python 3.9.20 using SimPy 4.1.1, selected for its lightweight, open-source support of discrete-event modeling. SimPy allowed us to build the simulation environment efficiently and to model asynchronous job arrivals, task executions, battery charging cycles, and message delays.

3.1.1. Manufacturing Cell Layout

A discrete-event model, as shown in Figure 3, represents a cell-based factory in which multiple production cells operate concurrently. Each cell issues transport tasks, e.g., raw material delivery or finished goods dispatch, that must be handled by a shared fleet of AMRs.

3.1.2. Job Arrival Process

Jobs arrive randomly, with inter-arrival times following an exponential distribution with rate parameter λ = 0.4, which represents the average job arrival rate; this value was determined empirically. Each generated job comprises between one and three subtasks and is randomly assigned a priority level: 0 (low), 1 (medium), or 2 (high).
$$f(t;\lambda) = \lambda e^{-\lambda t}, \quad t \ge 0 \tag{1}$$
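The arrival process can be sketched with the standard library alone. The snippet below is a minimal illustration, not the authors' SimPy process: the field names are hypothetical, and the uniform choice of priority and subtask count is an assumption, since the text says only that these are assigned randomly.

```python
import random

ARRIVAL_RATE = 0.4  # lambda from the text (jobs per unit of simulated time)

def generate_jobs(n_jobs, rate=ARRIVAL_RATE, seed=0):
    """Generate jobs whose inter-arrival gaps are exponentially distributed."""
    rng = random.Random(seed)
    now = 0.0
    jobs = []
    for job_id in range(n_jobs):
        now += rng.expovariate(rate)  # gap ~ Exp(rate), mean 1 / rate
        jobs.append({
            "job_id": job_id,
            "arrival_time": now,
            "n_subtasks": rng.randint(1, 3),    # one to three subtasks
            "priority": rng.choice([0, 1, 2]),  # assumed uniform: low/medium/high
        })
    return jobs

jobs = generate_jobs(80)
mean_gap = jobs[-1]["arrival_time"] / len(jobs)
print(f"empirical mean gap {mean_gap:.2f}, theoretical {1 / ARRIVAL_RATE:.2f}")
```

In a SimPy model the same gap would typically be passed to a timeout inside a generator process rather than accumulated in a loop.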

3.1.3. AMR Configuration

The manufacturing cell is serviced by five Autonomous Mobile Robots (AMRs), each with a weight capacity randomly assigned between 150 and 200 units. AMRs consume battery power during task execution and automatically initiate charging once the battery level falls below 20%. They resume operations once the battery is charged to 95%.
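The charging rule amounts to a hysteresis band between the 20% and 95% levels. The class below is a small sketch of that rule only; the class name, method name, and the way level changes are supplied are assumptions made for illustration, and drain or charge rates are not specified in the text.

```python
class Battery:
    """Charge-state hysteresis for one AMR (levels are fractions of capacity)."""

    LOW = 0.20   # start charging once the level falls below this value
    FULL = 0.95  # resume operations once the level reaches this value

    def __init__(self, level=1.0):
        self.level = level
        self.charging = False

    def update(self, delta):
        """Apply a level change (negative drains, positive charges).

        Returns True if the AMR is operating (not charging) afterwards.
        """
        self.level = min(1.0, max(0.0, self.level + delta))
        if not self.charging and self.level < self.LOW:
            self.charging = True       # dropped below 20%: go and charge
        elif self.charging and self.level >= self.FULL:
            self.charging = False      # reached 95%: back to work
        return not self.charging

battery = Battery(level=0.25)
print(battery.update(-0.10))  # False: level fell below 0.20, charging starts
print(battery.update(+0.50))  # False: still below 0.95, keeps charging
print(battery.update(+0.40))  # True: level is clamped to 1.0, work resumes
```

The two thresholds prevent the robot from flapping between working and charging when its level hovers near a single cutoff.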

3.1.4. Task Duration Distribution

Subtask durations are drawn from a Gamma distribution (Equation (2)) to realistically represent variability and skewed processing times typical in actual manufacturing environments. The shape (k) and scale (θ) parameters for the Gamma distribution are set as k = 2 and θ = 1, respectively.
$$f(x;k,\theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}, \quad x > 0 \tag{2}$$
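For the stated parameters the density simplifies considerably. The short derivation below uses only standard properties of the Gamma distribution and shows why these values give right-skewed subtask durations.

```latex
% Gamma density with k = 2, \theta = 1, using \Gamma(2) = 1! = 1
f(x; 2, 1) = \frac{x^{2-1} e^{-x/1}}{1^{2}\,\Gamma(2)} = x\,e^{-x}, \qquad x > 0

% Standard moments of the Gamma distribution
\mathbb{E}[X] = k\theta = 2, \qquad
\operatorname{Var}[X] = k\theta^{2} = 2, \qquad
\operatorname{mode}(X) = (k-1)\theta = 1
```

The mode (1) lies below the mean (2), so most subtasks finish quickly while a long right tail produces occasional slow ones.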

3.2. Hybrid DRLCC Framework

The architecture comprises three modules.

3.2.1. Module 1: State Extraction

Module 1 performs two tasks: it keeps an in-memory digital twin (DT) database that mirrors every job and Autonomous Mobile Robot (AMR) on the shop floor, and compresses this rich state into a fixed-length vector consumed by the PPO agent.
  1. Digital Twin database: Two object classes are maintained:
    • Job (job_id, priority ∈ {0, 1, 2}, task_list, arrival_time, load_weight, feasible_amrs);
    • AMR (amr_id, capacity, battery_level (0–1), charging_flag, current_job, dynamic_ceiling).
     SimPy processes create and update these objects: a job arrival process appends job instances, battery and charging processes update AMR attributes, and task handlers bind jobs to robots.
  2. Context extraction: At every decision epoch, the helper function get_state(env, amrs, job_queue) builds a vector of length 4N + 8, where N is the number of AMRs, so the length changes if the fleet size changes; each AMR contributes four features and the queue contributes eight statistics:
  • AMR features (per robot): busy flag, priority of current job (–1 if idle), raw battery level (0.0–1.0), current PCP ceiling;
  • Queue statistics: queue length |Q|, counts of jobs at priorities 0–2, wait-time statistics:
    Mean $\bar{w} = \frac{1}{|Q|} \sum (t_{\text{now}} - t_{\text{arrival}})$;
    Maximum delay $\max (t_{\text{now}} - t_{\text{arrival}})$;
    Standard deviation $\sigma_w$;
    Mean priority $\bar{p}$ of waiting jobs.
  • Normalization: The concatenated array is z-scored (mean 0, std 1) before being passed to the agent. Figure 4 represents the whole architecture of our proposed method, DRLCC.
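The state construction described above can be sketched as follows. This is an illustrative reading rather than the authors' code: jobs and AMRs are plain dictionaries, `env` is any object exposing a `now` attribute (as a SimPy environment does), empty-queue statistics default to zero, and the z-score is taken across the whole concatenated vector, which is one interpretation of the normalization step.

```python
from statistics import mean, pstdev
from types import SimpleNamespace

def get_state(env, amrs, job_queue):
    """Build the (4N + 8)-dimensional state vector and z-score it."""
    features = []
    for amr in amrs:  # four features per AMR
        job = amr["current_job"]
        features += [
            0.0 if job is None else 1.0,                      # busy flag
            -1.0 if job is None else float(job["priority"]),  # -1 if idle
            float(amr["battery_level"]),                      # raw level, 0.0-1.0
            float(amr["dynamic_ceiling"]),                    # current PCP ceiling
        ]

    waits = [env.now - j["arrival_time"] for j in job_queue]
    prios = [j["priority"] for j in job_queue]
    features += [  # eight queue statistics
        float(len(job_queue)),
        float(prios.count(0)), float(prios.count(1)), float(prios.count(2)),
        mean(waits) if waits else 0.0,         # mean wait
        max(waits) if waits else 0.0,          # maximum delay
        pstdev(waits) if waits else 0.0,       # wait standard deviation
        float(mean(prios)) if prios else 0.0,  # mean priority of waiting jobs
    ]

    mu, sigma = mean(features), pstdev(features)
    if sigma == 0.0:  # constant vector: avoid dividing by zero
        return [0.0] * len(features)
    return [(x - mu) / sigma for x in features]

env = SimpleNamespace(now=12.0)  # stand-in for a SimPy environment
amrs = [
    {"current_job": {"priority": 2}, "battery_level": 0.8, "dynamic_ceiling": 2},
    {"current_job": None, "battery_level": 0.4, "dynamic_ceiling": 1},
]
job_queue = [
    {"priority": 1, "arrival_time": 10.0},
    {"priority": 2, "arrival_time": 7.5},
]
state = get_state(env, amrs, job_queue)
print(len(state))  # 16, i.e. 4 * 2 + 8
```

A fixed-length vector keeps the policy network's input size independent of queue length, at the cost of summarizing the queue through aggregate statistics.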

3.2.2. Module 2: Concurrency Layer (Dynamic Priority Ceiling Protocol)

The Concurrency Layer enforces mutual exclusion and prevents priority inversion by combining a classic Priority Ceiling Protocol (PCP) with dynamic updates driven by real-time queue information.
Dynamic ceiling calculation: In a traditional PCP, each resource’s ceiling is statically set to the maximum priority of all tasks that might ever use it. However, our manufacturing workloads are dynamic: job arrivals, cancellations, and completions change which tasks are contending for each AMR. To accommodate this, we recompute each AMR’s ceiling at every decision point (the dynamic variant shown in Figure 5). Each resource (AMR) in our system is therefore assigned a priority ceiling that is recalculated in real time: instead of being fixed, the ceiling is updated by continuously evaluating the waiting job queue and identifying the highest-priority job among those potentially requesting that resource.
$$\text{ceiling}_r(t) = \max_{\,j \in Q_t,\; r \in \text{feasible}(j)} \text{priority}(j)$$
where $Q_t$ is the set of jobs waiting at time $t$.
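The ceiling rule translates almost directly into code. In the sketch below, jobs are dictionaries with hypothetical `priority` and `feasible_amrs` fields, and the value returned when no waiting job can use the AMR (`idle_ceiling = -1`) is an assumption, since the text does not define the ceiling of an uncontended resource.

```python
def dynamic_ceiling(amr_id, waiting_jobs, idle_ceiling=-1):
    """Highest priority among waiting jobs that could use this AMR.

    Returns `idle_ceiling` (an assumed default) when no waiting job lists
    the AMR as feasible, so that any job can pass the ceiling check.
    """
    candidates = [
        job["priority"]
        for job in waiting_jobs
        if amr_id in job["feasible_amrs"]
    ]
    return max(candidates) if candidates else idle_ceiling

# Hypothetical waiting queue at some decision point.
queue = [
    {"job_id": 1, "priority": 0, "feasible_amrs": {"A", "B"}},
    {"job_id": 2, "priority": 2, "feasible_amrs": {"B"}},
]
ceilings = {amr: dynamic_ceiling(amr, queue) for amr in ("A", "B", "C")}
print(ceilings)  # {'A': 0, 'B': 2, 'C': -1}
```

Recomputing this at every decision point costs one pass over the queue per AMR, which is cheap for the queue sizes considered here.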
Resource acquisition and blocking logic: When a job reaches the front of the queue, it attempts to acquire an AMR. The Concurrency Layer imposes three simultaneous conditions:
  • Resource Availability: The AMR must be currently free (unoccupied by another job).
  • Capacity and Battery Threshold: The AMR’s payload capacity must meet the job’s load weight, and its battery level must be sufficient: above 0.30 if the AMR is not charging, or above 0.25 if it is charging.
    $\text{amr.capacity} \ge \text{job.load\_weight}$
  • Ceiling check: The requested job’s priority must be greater than or equal to the dynamically updated ceiling of the targeted AMR.
    $\text{job.priority} \ge \text{amr.ceiling}$
Unlike the strict “greater than” priority condition typically found in standard PCP implementations, our protocol employs a “greater than or equal to” condition. This approach provides enhanced flexibility, ensuring more responsive and dynamic resource allocation in rapidly evolving manufacturing scenarios.
Only AMRs satisfying all three conditions are deemed feasible. If these conditions are not met, the requested job enters a waiting state. During this waiting period, the system actively monitors and records delays, the number of times jobs are blocked, and occurrences of potential priority inversion scenarios, specifically when a high-priority job is delayed by a lower-priority task occupying an AMR. As jobs finish and release resources, priority ceilings are recalculated based on the real-time status of the waiting queue. This continuous recalibration significantly reduces improper resource allocation and effectively mitigates priority inversion.
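The three acquisition conditions can be combined into a single predicate. The following is a sketch under the assumption that AMRs and jobs are dictionaries with the attribute names from Module 1; the sample fleet is hypothetical.

```python
def is_feasible(amr, job):
    """Check availability, capacity/battery, and ceiling for one (AMR, job) pair."""
    free = amr["current_job"] is None
    fits = amr["capacity"] >= job["load_weight"]
    # Battery rule from the text: > 0.25 while charging, > 0.30 otherwise.
    threshold = 0.25 if amr["charging_flag"] else 0.30
    battery_ok = amr["battery_level"] > threshold
    # "Greater than or equal to" ceiling check, as in the proposed protocol.
    ceiling_ok = job["priority"] >= amr["dynamic_ceiling"]
    return free and fits and battery_ok and ceiling_ok

def feasible_amrs(amrs, job):
    """Return the ids of all AMRs that pass every check for this job."""
    return [amr["amr_id"] for amr in amrs if is_feasible(amr, job)]

fleet = [
    {"amr_id": "A", "current_job": None, "capacity": 180,
     "battery_level": 0.80, "charging_flag": False, "dynamic_ceiling": 2},
    {"amr_id": "B", "current_job": None, "capacity": 150,
     "battery_level": 0.28, "charging_flag": True, "dynamic_ceiling": 1},
    {"amr_id": "C", "current_job": "job-7", "capacity": 200,
     "battery_level": 0.90, "charging_flag": False, "dynamic_ceiling": 0},
    {"amr_id": "D", "current_job": None, "capacity": 200,
     "battery_level": 0.28, "charging_flag": False, "dynamic_ceiling": 0},
]
medium_job = {"priority": 1, "load_weight": 140}
print(feasible_amrs(fleet, medium_job))  # ['B']
```

Here AMR A fails the ceiling check, C is occupied, and D is below the non-charging battery threshold, so only B remains; an empty result would put the job into the waiting state.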
Integration with the scheduling loop: Within the main task handler, the Concurrency Layer is seamlessly woven into the scheduling loop. After ceilings are updated, the job builds its list of feasible AMRs; if non-empty, it invokes the DRL agent to select one, then double-checks that the chosen AMR still meets all PCP constraints before assignment. If at any point the chosen AMR fails a check (e.g., its battery dips), the job briefly yields and retries without penalizing the agent.
This tight interplay between dynamic PCP filtering, blocking/inversion detection, and RL-driven action selection ensures the following:
  • No high-priority job is indefinitely postponed by a lower-priority one.
  • The system adapts to changing workloads and resource states.
  • The RL agent learns scheduling strategies that respect real-time safety constraints.

3.2.3. Module 3: Decision-Making Layer (PPO Agent)

While the Priority Ceiling Protocol (PCP) guarantees mutual exclusion and prevents unbounded priority inversion, it cannot, by itself, optimize resource utilization in highly dynamic, unpredictable manufacturing scenarios. By integrating DRL, we enable the system to learn from experience and continually improve scheduling decisions, balancing throughput, delay reduction, and adherence to safety constraints.
RL Agent Overview: The PPO agent sits atop the simulated environment and closes the control loop. At each decision step, it receives a normalized state vector from Module 1 and samples a candidate AMR assignment from the policy network, masked to valid robots only. Module 2’s PCP logic then validates or rejects the action, yielding either execution or a penalty, and the environment returns a scalar reward reflecting task urgency, wait penalties, block counts, and inversion events. At the end of an episode, PPO ingests all $(s, a, r, \log \pi)$ tuples and updates its policy and value networks via a clipped surrogate objective with Generalized Advantage Estimation (GAE). Proximal Policy Optimization (PPO) was selected as the DRL backbone for its balance of stability, simplicity, sample efficiency, and ease of implementation. These qualities are especially critical in dynamic, distributed environments such as Federated Digital Twin-based manufacturing systems, where frequent policy updates, limited training time, and high variability in system states are key factors.
The objective function is formally expressed as follows:
$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,A_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) A_t\right)\right]$$
where
  • $\theta$ is the policy parameters;
  • $\mathbb{E}_t$ is the expectation over timesteps;
  • $r_t(\theta)$ is the probability ratio between new and old policies;
  • $A_t$ is the advantage function estimating the relative benefit of an action;
  • $\epsilon$ is the clipping threshold.
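To make the clipping behavior concrete, the objective can be evaluated by hand on a few timesteps. The sketch below uses plain Python rather than PyTorch, and the default `eps=0.2` is an assumed value chosen for illustration; the paper's actual setting is given in its Table 1.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Empirical mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)  # clip the ratio
        total += min(r * adv, clipped_r * adv)         # pessimistic bound
    return total / len(ratios)

# Three hypothetical timesteps.
ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
value = clipped_surrogate(ratios, advantages)
print(round(value, 6))
```

With these numbers the per-step terms are 1.2 (ratio clipped from 1.5), -0.8 (ratio clipped from 0.5, the more pessimistic branch for a negative advantage), and 2.0, giving a mean of roughly 0.8. The clip removes any incentive to move the ratio further outside the interval [1 - ε, 1 + ε].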
Our PPO agent is implemented using PyTorch 2.5.0+cu118, leveraging its deep learning and auto-differentiation capabilities for efficient policy gradient computation. A three-layer fully connected backbone with two 256-unit ReLU hidden layers feeds parallel policy and value heads; complete architectural and training settings are summarized in Table 1.
State representation: The DRL agent receives a state vector at each decision step, which captures the current status of the manufacturing system. This state representation is carefully designed to provide the agent with a comprehensive view of all relevant parameters needed for effective decision-making. The state includes the following features:
  • Job Queue Status: The number of pending jobs, their priority levels (0: low, 1: medium, 2: high), and their waiting times;
  • AMR Status: Availability (free or occupied), current location, battery level, and charging status of each AMR;
  • Resource Occupancy: Information about which AMRs are currently assigned and to which job;
  • Task-Specific Data: The number of subtasks remaining and the estimated processing times.
This rich, multi-dimensional state vector enables the DRL agent to reason over complex scheduling and resource allocation decisions within a dynamic environment.
Action Space: The action space consists of assigning a specific task to an available AMR. The agent selects an action based on the observed state, aiming to maximize system performance while respecting priority and resource constraints. The action space defines all the agent’s possible decisions at a given state. In our case, each action corresponds to the assignment of a specific AMR to a specific waiting job. Formally,
  • If there are N AMRs and J waiting jobs, the agent’s action space consists of $N \times J$ potential pairings.
  • At each step, the agent selects a (job, AMR) pairing, subject to validation by the PCP before execution.
Invalid actions (e.g., assigning a task to a charging AMR or violating the PCP ceiling) are filtered out by the PCP module, ensuring policy compliance with system constraints.
  • State $s_t \in \mathbb{R}^{4N+8}$, as detailed in Module 1.
  • Action $a_t$ chooses one of the N AMRs for the current job.
  • We mask out infeasible AMRs (capacity, battery, or ceiling violations) so that the policy selects only from valid AMRs.
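One common way to realize such masking is to apply a softmax over the logits of feasible AMRs only and assign zero probability to the rest. The sketch below illustrates that technique in plain Python; the function names and logit values are hypothetical, and the authors' PyTorch implementation may differ.

```python
import math
import random

def masked_probs(logits, mask):
    """Softmax restricted to entries whose mask is True; others get 0."""
    valid = [logit for logit, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("no feasible AMR: the job should wait instead")
    top = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - top) if ok else 0.0
            for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def sample_amr(logits, mask, rng=random):
    """Sample an AMR index from the masked policy distribution."""
    probs = masked_probs(logits, mask)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, 3.0]        # hypothetical policy outputs for 4 AMRs
mask = [True, False, True, False]    # only AMRs 0 and 2 are feasible
probs = masked_probs(logits, mask)
choice = sample_amr(logits, mask, rng=random.Random(0))
print([round(p, 3) for p in probs], choice)
```

Masking before sampling means the agent never spends probability mass on assignments the PCP layer would reject, which tends to speed up learning compared with penalizing invalid actions after the fact.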
Reward Mechanism: The agent receives a composite reward that reflects the three operational objectives of the system: efficient task completion, responsiveness to high-priority jobs, and avoidance of priority inversions (see Table 2). After each epoch, the updated actor weights are copied into the frozen “old” policy network so that probability ratios in the next epoch are computed against freshly learned parameters. A StepLR scheduler (factor 0.9 every 100 updates) is applied to the Adam optimizer to improve long-horizon convergence.
The complete training loop, including data collection, multi-epoch PPO updates, and checkpointing, is provided in Figure 6.
Table 3 provides a concise description of the helper routines and update steps invoked within Figure 6. All hyperparameter settings were kept constant across the six random seeds listed above. A one-off sensitivity check (±20% on γ, ε, and LR) showed < 3% variation in makespan, confirming that performance is not acutely sensitive to these values.

3.2.4. Training and Convergence

To validate that our PPO-based scheduler reliably learns good policies across different fleet sizes and workload intensities, we trained for 500 episodes (six independent random seeds; NVIDIA RTX 3080 GPU (NVIDIA Corporation, Santa Clara, CA, USA)). Figure 7 shows the value-loss, policy-loss, and training-reward curves for our PPO-based DRLCC agent over six independent seeds. Key convergence diagnostics were as follows:
  1. Reward plateau: Across both small (4 AMRs, 80 jobs) and larger scenarios (up to 8 AMRs, 200 jobs), the mean episodic reward rose sharply in early episodes. It plateaued by ~350 episodes, with inter-seed variance dropping below 5%.
  2. Loss stabilization: Policy and value losses showed oscillations during the first ~250 update steps but settled into a steady band by ~300 steps. We applied gradient clipping (max norm 0.5) and a StepLR schedule (γ = 0.9 every 100 steps) to dampen these oscillations.
  3. Hyperparameter robustness: A ±20% sweep on the learning rate, discount factor (γ), and clip ratio (ε) changed final makespans by <3% in all tested configurations, confirming that performance is not acutely sensitive to precise hyperparameter values.
  4. Transient dips: Less than 1% of training runs showed short-lived reward drops, typically due to random latency spikes, yet the clipping mechanism and entropy bonus enabled rapid recovery, with no lasting harm to the learned policy.
By combining smoothed reward curves, loss traces, and a modest sensitivity sweep, we ensured that our DRLCC models reliably converge under a variety of AMR fleet sizes and job arrival patterns before proceeding to comparative evaluation.

4. Results

4.1. Simulation Study

To evaluate the effectiveness of the proposed DRL-PCP framework, we focus on three key performance metrics that reflect both scheduling efficiency and real-time responsiveness in a Federated Digital Twin manufacturing context.

Performance Metrics

MakeSpan (Total completion Time): MakeSpan is defined as the total time required to complete all jobs in a single simulation episode. It is computed as follows:
$$\text{Makespan} = \max_i \,(C_i - A_i)$$
where
$C_i$ = completion time of job $i$;
$A_i$ = arrival time of job $i$.
A lower makespan indicates faster overall system throughput.
High-Priority Jobs Delay: This metric captures the cumulative delay experienced by all high-priority jobs (priority level = 2). Figure 8 schematically depicts the timeline for a representative high-priority job from its arrival at t1, through blocking, to the start of service before giving the formal definition.
It is computed as follows:
$$\text{High-Priority Delay} = \sum_{j \in H} (C_j - A_j)$$
where
$H$ = set of all high-priority jobs;
$C_j$ = completion time of high-priority job $j$;
$A_j$ = arrival time of job $j$.
Lower values reflect the system’s ability to respond promptly to critical tasks.
Priority Inversion Count: This metric counts the number of incidents where a high-priority job was blocked by a lower-priority job due to resource contention. A priority inversion is recorded when
  • A job with priority level = 2;
  • Is waiting for an AMR;
  • And the AMR is occupied by a job with priority < 2.
In indicator form, each such incident contributes
$$\text{Inv count} = \mathbb{1}\{\text{prio} = 2 \text{ waits} \;\wedge\; \text{AMR held by prio} < 2\}$$
The total count is tracked per episode. Fewer inversions reflect better concurrency control and prioritization enforcement.
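The three metrics can be computed from a per-episode log as sketched below. The record layout (dictionaries with `arrival`, `completion`, and `priority`, plus a list of blocking events) is an assumption made for illustration, and the sample values are invented.

```python
def makespan(jobs):
    """max_i (C_i - A_i), following the definition in the text."""
    return max(job["completion"] - job["arrival"] for job in jobs)

def high_priority_delay(jobs):
    """Sum of (C_j - A_j) over jobs with priority level 2."""
    return sum(job["completion"] - job["arrival"]
               for job in jobs if job["priority"] == 2)

def count_inversions(block_events):
    """Count blocking events where a priority-2 job waits on a lower one.

    `block_events` holds (waiting_priority, holder_priority) pairs, logged
    each time a job finds its target AMR occupied.
    """
    return sum(1 for waiting, holder in block_events
               if waiting == 2 and holder < 2)

# Hypothetical episode log.
episode_jobs = [
    {"arrival": 0, "completion": 10, "priority": 2},
    {"arrival": 2, "completion": 9, "priority": 0},
    {"arrival": 4, "completion": 20, "priority": 2},
]
episode_blocks = [(2, 0), (2, 2), (1, 0), (2, 1)]

print(makespan(episode_jobs))             # 16
print(high_priority_delay(episode_jobs))  # 26
print(count_inversions(episode_blocks))   # 2
```

Only the first and last blocking events count as inversions: the second involves two priority-2 jobs, and the third has a medium-priority waiter.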

4.2. Comparative Evaluation

The performance of the proposed DRL + PCP hybrid model was compared with six baseline approaches. The following seven models were evaluated and compared:
  • Priority Ceiling Protocol (PCP-only);
  • Priority Inheritance Protocol (PIP-only);
  • Deep Reinforcement Learning (DRL-only);
  • DRL with Priority Inheritance (DRL + PIP);
  • DRL with Priority Ceiling Protocol (DRLCC) (our proposed method);
  • Earliest Deadline First (EDF);
  • Stack Resource Policy with EDF (EDF-SRP).
The comparison was made across the following key metrics.

4.2.1. Efficiency (Makespan)

Traditional methods (PCP-only and PIP-only) struggled to optimize makespan under dynamic job conditions due to their static nature. The DRL-based methods showed better adaptability, with DRLCC achieving the lowest makespan on average.
As shown in Table 4, while all models experienced an increased makespan with higher job loads, the DRLCC framework consistently maintained the lowest total completion time, demonstrating better adaptability and decision-making under system pressure. Notably, traditional methods (PCP and PIP) scaled poorly due to their static conflict resolution mechanisms. DRL-only and DRL + PIP showed moderate degradation, whereas the proposed hybrid maintained performance with minimal drop-off. Figure 9 shows total completion time (mean ± SD), where bars represent the average; error bars show one standard deviation. DRLCC delivers the shortest completion time in every scenario, outperforming pure DRL, DRL + inheritance, fixed priority ceiling, and Priority Inheritance baselines.

4.2.2. Priority Responsiveness (High-Priority Job Delay)

PIP-only and DRL-only models often failed to meet the time-sensitive demands of high-priority jobs. In contrast, DRLCC significantly reduced high-priority job delay, confirming that adaptive scheduling combined with strict concurrency filtering yields more responsive behavior. Table 5 lists the corresponding numerical means, while Figure 10 reports the mean ± standard deviation delay experienced by high-priority jobs. Across all six workload configurations, DRLCC consistently attains the lowest delay, outperforming the next-best DRL-Inheritance baseline. The narrow error bars in Figure 10 indicate that these gains are statistically stable. These results confirm that combining adaptive scheduling with dynamic priority ceiling filtering yields markedly faster service for urgent jobs than either DRL or classical real-time locking used in isolation.

4.2.3. Priority Inversion

The DRL-only and DRL + PIP models suffered from frequent priority inversions. Although PIP attempts to resolve these, it often incurs delays due to task promotion overhead. Apart from PCP, which records no inversions at all, the DRLCC framework showed the lowest number of inversions, thanks to proactive ceiling enforcement. Table 6 and Figure 11 show the mean number of priority inversion occurrences (±SD) for each scheduler under six workload configurations. Lower values indicate better avoidance of priority inversion.
Table 6 shows that PCP never experiences priority inversion. This comes at the cost of strict sequential execution, however, which can leave high-priority jobs stuck behind a long queue. DRLCC, by contrast, learns when it is safe to let a lower-priority task proceed first (for example, when a robot’s battery is high or no urgent jobs are waiting) and only delays the most critical work. In applications where absolute avoidance of priority inversion is non-negotiable, classic PCP remains preferable thanks to its deterministic mutual-exclusion guarantees. Conversely, if minimizing the latency of high-priority tasks is paramount, our DRLCC framework delivers faster turnaround for urgent jobs without degrading overall throughput. Finally, in scenarios requiring both strong average completion rates and low delays for critical jobs, DRLCC improves on PCP’s total completion time (Table 4) while also reducing wait times for top-priority tasks (Table 5).
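The ceiling rule behind this trade-off (described in the caption of Figure 5) can be illustrated with a minimal sketch. It assumes the three priority levels {0, 1, 2} used in the simulation; the names `Job`, `dynamic_ceiling`, and `may_lock` are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    job_id: int
    priority: int  # 0 = low, 1 = medium, 2 = high


def dynamic_ceiling(queued: Iterable[Job]) -> int:
    """Ceiling = highest priority among queued jobs.

    Assumption: an empty queue yields the lowest level (0),
    so any requester may proceed.
    """
    return max((job.priority for job in queued), default=0)


def may_lock(requester: Job, queued: Iterable[Job]) -> bool:
    """A job may lock an idle AMR only if its priority meets or exceeds the ceiling."""
    return requester.priority >= dynamic_ceiling(queued)


queue = [Job(1, priority=0), Job(2, priority=2)]
low_ok = may_lock(queue[0], queue)   # low-priority job is held back
high_ok = may_lock(queue[1], queue)  # high-priority job is admitted
```

In DRLCC the learned policy then chooses among the AMRs that pass this filter; the filter itself is deterministic.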

4.2.4. Scalability Analysis (Varying Job Loads)

To evaluate how well the proposed framework scales with increasing workloads, we tested each model under higher job arrival volumes for fleets of 4, 5, and 6 AMRs. For each fleet size, the number of jobs per episode was increased: from 80 to 100 for 4 AMRs, and from 100 to 120 for 5 and 6 AMRs. The goal was to observe how each method maintains scheduling performance under growing system pressure.
Improvement gain

$$\text{Improvement}\,\% = \frac{\text{baseline} - \text{compared model}}{\text{baseline}} \times 100$$

$$r_t = \begin{cases} 5, & \text{if job priority} = 2 \\ 2, & \text{otherwise} \end{cases} \; - \; 0.1\,w_t \; - \; 0.2\,b_t \; - \; 10\,I_{\text{inversion}} \; + \; 20\,I_{\text{fast high prio}}$$
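As a minimal sketch, the per-step reward can be written as a single function whose coefficients follow Table 2. The function and argument names are hypothetical; the two indicator terms are passed in as booleans rather than derived from simulator state.

```python
def step_reward(priority: int, wait_s: float, block_count: int,
                inversion: bool, fast_high_priority: bool) -> float:
    """Reward r_t with the coefficients listed in Table 2.

    priority           -- 2 = high, otherwise medium/low
    wait_s             -- task wait time w_t in seconds
    block_count        -- job block count b_t
    inversion          -- True if a priority inversion was detected
    fast_high_priority -- True if a high-priority job finished
                          less than 20 s after arrival
    """
    base = 5.0 if priority == 2 else 2.0
    return (base
            - 0.1 * wait_s
            - 0.2 * block_count
            - 10.0 * int(inversion)
            + 20.0 * int(fast_high_priority))


# High-priority job: waited 10 s, blocked once, finished quickly.
r_fast = step_reward(2, 10.0, 1, inversion=False, fast_high_priority=True)
# Low-priority job that caused an inversion.
r_inv = step_reward(0, 0.0, 0, inversion=True, fast_high_priority=False)
```

With these inputs the first call evaluates to 5 − 1.0 − 0.2 + 20 = 23.8 and the second to 2 − 10 = −8.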
Simulation results show that the hybrid method reduces overall makespan and high-priority task delay relative to the PCP-only, DRL-only, and Priority Inheritance baselines. It also records fewer priority inversions than every baseline except PCP-only, which avoids them entirely. These results provide deployment guidance for smart-factory engineers. The percentage improvements of DRLCC over each baseline scheduler (PCP, DRL, DRL + Inheritance, PI, EDF, and EDF_SRP) are summarized in Table 7.
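The aggregate percentages can be reproduced from the per-configuration means in Table 4. The sketch below assumes that each reported figure is the arithmetic mean of the six per-configuration improvements; that averaging rule is inferred from the published numbers rather than stated in the text, and the helper names are hypothetical.

```python
from statistics import mean

# Mean total completion time per configuration (Table 4), in the order
# 4x80, 4x100, 5x100, 5x120, 6x100, 6x120 (AMRs x jobs).
DRLCC = [14.42, 14.98, 13.03, 13.29, 12.68, 12.94]
BASELINES = {
    "DRL": [15.20, 15.22, 13.16, 13.46, 12.66, 12.97],
    "Priority Ceiling": [19.81, 19.38, 17.92, 17.99, 15.46, 17.09],
}


def improvement_pct(baseline: float, compared: float) -> float:
    """Improvement % = (baseline - compared model) / baseline * 100."""
    return (baseline - compared) / baseline * 100.0


def mean_improvement_pct(baseline_runs, compared_runs) -> float:
    """Average the per-configuration improvements (assumed aggregation rule)."""
    return mean(improvement_pct(b, c)
                for b, c in zip(baseline_runs, compared_runs))


for name, runs in BASELINES.items():
    print(f"{name}: {mean_improvement_pct(runs, DRLCC):.2f}%")
```

Under this assumption the two rows evaluate to roughly 1.51% (DRL) and 24.27% (Priority Ceiling), matching Table 7 to two decimals.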

5. Conclusions

By embedding the Priority Ceiling Protocol at the core of our Concurrency Layer and refreshing ceilings on the fly, we achieve both strong mutual exclusion guarantees and the flexibility to handle highly dynamic manufacturing scenarios.
This study presented a hybrid concurrency control framework, DRLCC, that couples Deep Reinforcement Learning (Proximal Policy Optimization) with a dynamically updated PCP to resolve resource contention in Federated Digital Twin, software-defined manufacturing systems. Simulations in a SimPy-based discrete-event environment, modeling asynchronous job arrivals and AMR constraints, show that DRLCC outperforms four principal baselines (DRL-only, PCP-only, DRL + Priority Inheritance, and Priority Inheritance alone) on makespan and high-priority delay, and records fewer priority inversions than all of them except PCP-only, which has none.
Averaged over the six workload configurations, makespan was reduced by 24.27% relative to the classic Priority Ceiling Protocol, by 6.21% relative to DRL + Priority Inheritance, and by 1.51% relative to DRL-only. High-priority task delays decreased by between 2.18% (vs. DRL-only) and 6.65% (vs. Priority Inheritance), giving more responsive handling of critical jobs. Priority inversions were substantially reduced, from 6.51 to 35.37 per episode under Priority Inheritance to 0.65 to 7.30 with DRLCC, and fell below one per episode in the 6-AMR configurations. These results indicate that combining data-driven adaptability (DRL) with the real-time safety mechanism of PCP yields a scheduler that learns to optimize throughput while limiting priority inversion. Performance gains persist as job volume scales, underscoring the framework’s practicality for real-world factories with fluctuating workloads.

5.1. Limitations and Future Recommendations

Despite these strengths, our study has several limitations:
  • We modeled manufacturing dynamics in SimPy, which abstracted away low-level physics (e.g., robot kinematics, network jitter) and assumed perfect sensor telemetry.
  • All robots shared identical dynamics and capacity distributions. Real factories often deploy heterogeneous fleets with different speeds, charging profiles, and failure modes.
  • Our scenario considered one production cell network. Large plants feature multiple cells with cross-cell resource sharing and traffic conflicts not captured here.
  • In this work, we held sensor noise (σ = 0.03) and one-way communication latency (50 ms) fixed to isolate the impact of our DRLCC framework. In practice, both quantities can vary substantially. A higher σ would introduce noisier state observations (slowing convergence and increasing reward variance), and greater latency would risk decisions acting on stale state (increasing idle times, conflicts, and inversion penalties). A full sensitivity sweep over σ and latency is left to future work to delineate the robust operating envelope of DRLCC.
Our simulation uses idealized AMR models (identical robots, perfect sensing, zero communication loss), which may not fully capture real-world variability. While our SimPy experiments demonstrate DRLCC’s potential under controlled conditions, translating these results to real AMR fleets involves nontrivial system integration and resource investment. We therefore identify physical testbed deployment together with corresponding hardware-in-the-loop evaluations as an important future research direction.
Future work may focus on employing more advanced and realistic simulation environments to model generic and more complex manufacturing scenarios, potentially involving varying production layouts, dynamic AMR behavior, or disruptions. Additionally, further research can explore integrating federated learning techniques to support distributed training across multiple digital twin agents and extending the system to support heterogeneous AMRs and multi-agent coordination at larger production scales.

5.2. Computational Scalability of DRLCC

In addition to evaluating scheduling effectiveness, we profiled how the computational cost of our PPO-based scheduler grows with fleet size and workload. Because our state vector scales as O(N) (three features per AMR plus eight queue stats), the neural network input dimension, and thus the per-decision cost, grows linearly with the number of robots. On an NVIDIA RTX 3080, we observed the following:
  • Inference latency: Mean time per decision stayed below 0.5 ms (σ < 0.1 ms across seeds) for fleets up to 6 AMRs, ensuring real-time applicability even in our largest scenarios.
  • End-to-end training time: Running 500 episodes took approximately 35 min to 45 min for the smallest config (4 AMRs, 80 jobs) and about 1.5 h for the largest (6 AMRs, 120 jobs).
  • Policy-update cost: Each PPO update epoch (batch size 64, 6 passes) required on average 8–12 s with 4 AMRs, rising to 12–16 s with 6 AMRs.
These results indicate that both inference and learning times increase roughly linearly with problem size and remain within practical bounds for typical smart-manufacturing cells. For still larger deployments, one could explore vectorized environments or distributed training to further accelerate convergence.
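The linear growth in network size can be made concrete with a small parameter-count sketch. It uses the 4N + 8 input dimension and the 2 × 256 hidden layers listed in Table 1 (the text above mentions three features per AMR, so the per-AMR feature count is left as a parameter). The shared trunk feeding both heads and the choice of M = N actions are assumptions made only for illustration.

```python
def input_dim(n_amrs: int, per_amr_features: int = 4, queue_stats: int = 8) -> int:
    """State-vector length; the defaults reproduce 4N + 8 from Table 1."""
    return per_amr_features * n_amrs + queue_stats


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def param_count(n_amrs: int, n_actions: int, hidden: int = 256) -> int:
    """Trainable parameters, assuming a shared 2 x 256 trunk with two heads."""
    d = input_dim(n_amrs)
    trunk = dense_params(d, hidden) + dense_params(hidden, hidden)
    policy_head = dense_params(hidden, n_actions)  # M outputs, softmax
    value_head = dense_params(hidden, 1)           # scalar state value
    return trunk + policy_head + value_head


for n in (4, 5, 6):
    # Assumption: one action per AMR (M = N).
    print(n, input_dim(n), param_count(n, n_actions=n))
```

Only the first layer and the policy head depend on N, which is why the per-decision cost grows linearly and slowly over the fleet sizes studied here.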

Author Contributions

Conceptualization, J.-W.K., R.A. and W.-T.K.; methodology, R.A. and J.-W.K.; software, R.A.; validation, R.A. and W.-T.K.; investigation, J.-W.K. and R.A.; writing—original draft preparation, R.A.; writing—review and editing, J.-W.K., R.A. and W.-T.K.; visualization, J.-W.K.; supervision, W.-T.K.; project administration, W.-T.K.; funding acquisition, W.-T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Technology Innovation Program (Development of SDF-Based AI Autonomous Manufacturing Core Technology to Advance the Automobile Industry) funded by the Ministry of Trade, Industry and Energy (MOTIE), South Korea, under Grant RS-2024-00507388.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The simulation data (SimPy event logs and metrics) generated and analyzed during this study are not deposited in a public repository but are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, W.; Xiang, W.; Yang, Y.; Cheng, P. Optimizing Federated Learning with Deep Reinforcement Learning for Digital Twin Empowered Industrial IoT. IEEE Trans. Ind. Inform. 2023, 19, 1884–1893. [Google Scholar] [CrossRef]
  2. Xu, X.; Lu, Y.; Vogel-Heuser, B.; Wang, L. Industry 4.0 and Industry 5.0—Inception, Conception and Perception. J. Manuf. Syst. 2021, 61, 530–535. [Google Scholar] [CrossRef]
  3. Qin, Z.; Lu, Y. Self-Organizing Manufacturing Network: A Paradigm towards Smart Manufacturing in Mass Personalization. J. Manuf. Syst. 2021, 60, 35–47. [Google Scholar] [CrossRef]
  4. Shao, G.; Helu, M. Framework for a Digital Twin in Manufacturing: Scope and Requirements. Manuf. Lett. 2020, 24, 105–107. [Google Scholar] [CrossRef]
  5. Magomadov, V.S. The digital twin technology and its role in manufacturing. IOP Conf. Ser. Mater. Sci. Eng. 2020, 862, 032080. [Google Scholar] [CrossRef]
  6. Iliuţă, M.-E.; Moisescu, M.-A.; Pop, E.; Ionita, A.-D.; Caramihai, S.-I.; Mitulescu, T.-C. Digital Twin—A Review of the Evolution from Concept to Technology and Its Analytical Perspectives on Applications in Various Fields. Appl. Sci. 2024, 14, 5454. [Google Scholar] [CrossRef]
  7. Ahn, J.; Yun, S.; Kwon, J.-W.; Kim, W.-T. Literacy Deep Reinforcement Learning-Based Federated Digital Twin Scheduling for the Software-Defined Factory. Electronics 2024, 13, 4452. [Google Scholar] [CrossRef]
  8. Vergara, C.; Bahsoon, R.; Theodoropoulos, G.; Yanez, W.; Tziritas, N. Federated Digital Twin. In Proceedings of the 2023 IEEE/ACM 27th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Singapore, 4–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 115–116. [Google Scholar]
  9. Pang, T.Y.; Pelaez Restrepo, J.D.; Cheng, C.-T.; Yasin, A.; Lim, H.; Miletic, M. Developing a Digital Twin and Digital Thread Framework for an ‘Industry 4.0’ Shipyard. Appl. Sci. 2021, 11, 1097. [Google Scholar] [CrossRef]
  10. Fuller, A.; Fan, Z.; Day, C.; Barlow, C. Digital Twin: Enabling Technologies, Challenges and Open Research. IEEE Access 2020, 8, 108952–108971. [Google Scholar] [CrossRef]
  11. Lam, K.Y.; Kuo, T.W.; Kao, B.; Lee, T.S.H.; Cheng, R. Evaluation of Concurrency Control Strategies for Mixed Soft Real-Time Database Systems. Inf. Syst. 2002, 27, 123–149. [Google Scholar] [CrossRef]
  12. Chan, E.; Yu, K.M. A concurrency control model for PDM systems. Comput. Ind. 2007, 58, 823–831. [Google Scholar] [CrossRef]
  13. Lu, Y.; Liu, C.; Wang, K.I.-K.; Huang, H.; Xu, X. Digital Twin-driven Smart Manufacturing: Connotation, Reference Model, Applications and Research Issues. Robot. Comput.-Integr. Manuf. 2020, 61, 101837. [Google Scholar] [CrossRef]
  14. Kim, Y.-J.; Kim, H.; Ha, B.; Kim, W.-T. Federated Digital Twins: A Scheduling Approach Based on Temporal Graph Neural Network and Deep Reinforcement Learning. IEEE Access 2025, 13, 20763–20777. [Google Scholar] [CrossRef]
  15. Bécue, A.; Maia, E.; Feeken, L.; Borchers, P.; Praça, I. A New Concept of Digital Twin Supporting Optimization and Resilience of Factories of the Future. Appl. Sci. 2020, 10, 4482. [Google Scholar] [CrossRef]
  16. Ullah, A.; Younas, M. Development and Application of Digital Twin Control in Flexible Manufacturing Systems. J. Manuf. Mater. Process. 2024, 8, 214. [Google Scholar] [CrossRef]
  17. Lattanzi, L.; Raffaeli, R.; Peruzzini, M.; Pellicciari, M. Digital twin for smart manufacturing: A review of concepts towards a practical industrial implementation. Int. J. Comput. Integr. Manuf. 2021, 34, 567–597. [Google Scholar] [CrossRef]
  18. Selim, A.; Ali, I.; Saracevic, M.; Ristevski, B. Application of the digital twin model in higher education. Multimed. Tools Appl. 2025, 84, 24255–24272. [Google Scholar] [CrossRef]
  19. Davari, S.; Sha, L. Sources of Unbounded Priority Inversions in Real-Time Systems and a Comparative Study of Possible Solutions. In ACM SIGOPS Operating Systems Review; Association for Computing Machinery: New York, NY, USA, 1992; Volume 26, pp. 110–120. [Google Scholar]
  20. Yang, M.; Chen, Z.; Jiang, X.; Guan, N.; Lei, H. DPCP-p: A Distributed Locking Protocol for Parallel Real-Time Tasks. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
  21. Sha, L.; Rajkumar, R.; Lehoczky, J.P. Priority Inheritance Protocols: An Approach to Real-Time Synchronization. IEEE Trans. Comput. 1990, 39, 1175–1185. [Google Scholar] [CrossRef]
  22. Zhang, X.; Urban, C.; Wu, C. Priority Inheritance Protocol Proved Correct. J. Autom. Reason. 2020, 64, 73–95. [Google Scholar] [CrossRef]
  23. Baker, T.P. A stack-based resource allocation policy for real-time processes. In Proceedings of the 11th IEEE Real-Time Systems Symposium (RTSS ’90), Miami Beach, FL, USA, 3–6 December 1990; pp. 191–200. [Google Scholar]
  24. Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT. arXiv 2022, arXiv:2202.03575. [Google Scholar] [CrossRef]
  25. Hammami, N.E.H.; Lardeux, B.; Hadj-Alouane, A.B.; Jridi, M. Job Shop Scheduling: A Novel DRL Approach for Continuous Schedule-Generation Facing Real-Time Job Arrivals. PapersOnLine 2022, 55, 2493–2498. [Google Scholar] [CrossRef]
  26. Zhang, J.; Ding, G.; Zou, Y.; Qin, S.; Fu, J. Review of Job Shop Scheduling Research and Its New Perspectives under Industry 4.0. J. Intell. Manuf. 2019, 30, 1809–1830. [Google Scholar] [CrossRef]
Figure 1. Evolution from Industry 1.0 to Industry 5.0.
Figure 2. (a) Without inheritance: The low-priority holder acquires the lock at A, then is preempted by a medium-priority thread at B, blocking the high-priority thread for an unbounded interval until C. (b) With inheritance: Upon blocking the high-priority thread at A, the low-priority holder immediately inherits the higher priority, completes its critical section at B without interference, and releases the lock at C, bounding the priority-inversion window.
Figure 3. A scenario of cell-based factory.
Figure 4. DRLCC framework.
Figure 5. Flowchart of our dynamic Priority Ceiling Protocol for AMR allocation. When a robot becomes idle, a request from a waiting job triggers a priority check and real-time ceiling adjustment (the ceiling is set to the highest priority among queued jobs). If the requesting job’s priority meets or exceeds this ceiling, the robot is locked, and the task proceeds; otherwise, access is denied, and the job remains in the queue. Once the task finishes, the robot is released, and its ceiling is recomputed before the next allocation.
Figure 6. DRLCC training loop with PPO and dynamic priority ceiling. DRLCC Scheduler (PPO-based Concurrency Control with Dynamic PCP).
Figure 7. Training curves for the PPO-based DRLCC agent over 6 independent seeds for scenario 4AMR with 80 jobs. Mean value-loss (bold blue) with individual seed traces (faded lines) plotted against update step, demonstrating a steady decline and plateau by ~4000 steps. Mean policy-loss (bold red) with individual seed traces (faded lines), oscillating within a narrow band around zero after initial stabilization. Mean episodic reward (orange) and its exponential moving average (green dashed) over update steps, showing convergence to ~650 reward by ~3500 steps. Shaded regions indicate ±1 standard deviation across seeds.
Figure 8. Schematic timeline for a high-priority job. After arriving at t1, the job is blocked until t4 (the High-Priority Delay), then receives service from t4 to t7.
Figure 9. Total completion time (mean ± SD) achieved by five schedulers under six workload configurations: (a) 4 AMRs, 80 jobs; (b) 4 AMRs, 100 jobs; (c) 5 AMRs, 100 jobs; (d) 5 AMRs, 120 jobs; (e) 6 AMRs, 100 jobs; (f) 6 AMRs, 120 jobs.
Figure 10. High-priority job delay (mean ± SD) for five schedulers under six workload configurations: (a) 4 AMRs, 80 jobs; (b) 5 AMRs, 120 jobs; (c) 5 AMRs, 100 jobs; (d) 6 AMRs, 120 jobs; (e) 6 AMRs, 100 jobs; (f) 4 AMRs, 100 jobs.
Figure 11. Priority inversion count (mean ± SD) for five schedulers across six workload configurations: (a) 4 AMRs, 80 jobs; (b) 5 AMRs, 120 jobs; (c) 5 AMRs, 100 jobs; (d) 6 AMRs, 120 jobs; (e) 6 AMRs, 100 jobs; (f) 4 AMRs, 100 jobs.
Table 1. PPO agent neural network architecture and training hyperparameters.

| Category | Item | Value |
|---|---|---|
| Architecture | Input dimension | 4N + 8 |
| | Hidden layers | 2 × 256 (ReLU) |
| | Policy head | Fully connected, M outputs, Softmax |
| | Value head | Fully connected, 1 output, linear |
| Training | Optimizer | Adam |
| | Learning rate | 1 × 10⁻⁴ |
| | Clip ratio ε | 0.10 |
| | Discount γ | 0.95 |
| | GAE λ | 0.95 |
| | Mini batch/Epochs | 64/6 |
| | Entropy bonus | 0.01 |
Table 2. Reward values provided during the training of the model.

| Symbol | Trigger | Value |
|---|---|---|
| $R_{comp}^{2}$ | Base reward per completed high-priority task | +5 |
| $R_{comp}^{0,1}$ | Base reward per completed medium/low task | +2 |
| $R_{wait}$ | −0.1 × task-wait-time (s) | −0.1 s⁻¹ |
| $P_{block}$ | −0.2 × job-block-count | −0.2 block⁻¹ |
| $P_{inv}$ | Priority inversion penalty | −10 |
| $R_{quick}$ | Bonus if a high-priority job finishes < 20 s after arrival | +20 |
Table 3. Summary of Figure 6.

| Name | Summary |
|---|---|
| BATTERY_CHECK(amr) | every 1 s: if battery < 0.2 ⇒ start charging; if charging & battery ≥ 0.95 ⇒ stop. |
| JOB_ARRIVAL(…, λ) | spawn next job after Exp(λ); set priority ∈ {0,1,2}; env.process(HANDLE_JOB) |
| HANDLE_JOB(job) | Update AMR ceilings via PCP; while no feasible AMR: wait 1 s, penalise wait/block, detect inversion; else choose AMR = πθ(s, feasible), run task, give reward, release AMR |
| PPO_UPDATE() | If buffer ≥ 64 samples: compute advantages (GAE), optimise policy/value nets for K = 6 epochs, clip grads at 0.1, step LR scheduler (step = 100, γ = 0.9). |
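The advantage computation named in PPO_UPDATE() can be sketched in plain Python. This is a generic Generalized Advantage Estimation routine with the γ = 0.95 and λ = 0.95 values from Table 1, plus the standard PPO clipped surrogate with ε = 0.10; it is not the authors' implementation, and the function names are hypothetical.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   dones: Sequence[bool], last_value: float,
                   gamma: float = 0.95, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation over one rollout buffer.

    values[t] is V(s_t); last_value is V(s_T) used to bootstrap the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_adv = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping stops at episode boundaries.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.10) -> float:
    """PPO clipped objective for one sample (to be maximised)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], [False, True], last_value=0.0)
```

For the two-step example, the terminal step has advantage 1.0 and the first step 1 + 0.95 × 0.95 × 1 = 1.9025.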
Table 4. Comparison of mean values of total completion time of all models (mean ± SD; columns give number of AMRs × number of jobs).

| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 14.42 ± 0.14 | 14.98 ± 0.14 | 13.03 ± 0.04 | 13.29 ± 0.12 | 12.68 ± 0.04 | 12.94 ± 0.13 | - |
| DRL_Inheritance | 15.06 ± 0.13 | 15.79 ± 0.15 | 14.33 ± 0.13 | 14.70 ± 0.19 | 13.81 ± 0.05 | 14.10 ± 0.06 | 6.21% |
| DRL | 15.20 ± 0.28 | 15.22 ± 0.11 | 13.16 ± 0.07 | 13.46 ± 0.09 | 12.66 ± 0.06 | 12.97 ± 0.09 | 1.51% |
| Priority Ceiling | 19.81 ± 0.27 | 19.38 ± 0.27 | 17.92 ± 0.21 | 17.99 ± 0.23 | 15.46 ± 0.19 | 17.09 ± 0.23 | 24.27% |
| Priority Inheritance | 17.83 ± 0.31 | 19.52 ± 0.11 | 13.82 ± 0.11 | 14.34 ± 0.15 | 12.78 ± 0.10 | 13.12 ± 0.09 | 9.60% |
| EDF | 59.55 ± 0.29 | 73.95 ± 0.46 | 68.69 ± 0.64 | 82.61 ± 0.15 | 65.19 ± 0.42 | 78.27 ± 0.52 | 80.75% |
| EDF_SRP | 53.13 ± 0.22 | 65.74 ± 0.23 | 65.43 ± 0.21 | 77.81 ± 0.15 | 55.74 ± 0.19 | 66.00 ± 0.19 | 78.45% |
Table 5. Comparison of mean values of high-priority delay of all models (mean ± SD; columns give number of AMRs × number of jobs).

| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 4.40 ± 0.04 | 4.38 ± 0.02 | 4.18 ± 0.01 | 4.20 ± 0.02 | 4.14 ± 0.02 | 4.12 ± 0.01 | - |
| DRL_Inheritance | 4.40 ± 0.02 | 4.82 ± 0.02 | 4.46 ± 0.03 | 4.47 ± 0.03 | 4.36 ± 0.02 | 4.33 ± 0.03 | 5.22% |
| DRL | 4.69 ± 0.01 | 4.51 ± 0.02 | 4.25 ± 0.01 | 4.26 ± 0.02 | 4.15 ± 0.02 | 4.15 ± 0.03 | 2.18% |
| Priority Ceiling | 4.60 ± 0.03 | 4.59 ± 0.03 | 4.44 ± 0.02 | 4.47 ± 0.02 | 4.27 ± 0.02 | 4.35 ± 0.03 | 4.86% |
| Priority Inheritance | 5.04 ± 0.03 | 5.18 ± 0.01 | 4.38 ± 0.02 | 4.40 ± 0.02 | 4.18 ± 0.01 | 4.19 ± 0.01 | 6.65% |
| EDF | 7.85 ± 0.11 | 10.05 ± 0.16 | 11.69 ± 0.31 | 14.47 ± 0.24 | 10.08 ± 0.18 | 12.47 ± 0.28 | 60.25% |
| EDF_SRP | 4.77 ± 0.16 | 4.09 ± 0.15 | 4.46 ± 0.13 | 4.60 ± 0.09 | 7.65 ± 0.23 | 8.49 ± 0.31 | 18.83% |
Table 6. The number of priority inversion occurrences across all models (mean ± SD; columns give number of AMRs × number of jobs).

| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 5.95 ± 0.25 | 7.30 ± 0.27 | 2.37 ± 0.15 | 2.75 ± 0.11 | 0.65 ± 0.04 | 0.85 ± 0.03 | - |
| DRL_Inheritance | 14.20 ± 0.42 | 15.79 ± 0.65 | 10.43 ± 0.19 | 12.68 ± 0.29 | 6.63 ± 0.32 | 7.71 ± 0.40 | 74.86% |
| DRL | 12.57 ± 0.49 | 14.33 ± 0.52 | 4.27 ± 0.59 | 5.36 ± 0.32 | 1.20 ± 0.10 | 1.43 ± 0.08 | 47.36% |
| Priority Ceiling | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0% |
| Priority Inheritance | 25.51 ± 0.64 | 35.37 ± 0.36 | 16.23 ± 0.36 | 20.42 ± 0.48 | 6.51 ± 0.33 | 8.43 ± 0.24 | 84.87% |
| EDF | 13.41 ± 0.08 | 16.96 ± 0.04 | 15.89 ± 0.16 | 19.26 ± 0.12 | 14.95 ± 0.18 | 18.12 ± 0.12 | 79.51% |
| EDF_SRP | 7.34 ± 0.07 | 9.36 ± 0.09 | 9.09 ± 0.07 | 11.16 ± 0.10 | 11.36 ± 0.14 | 14.04 ± 0.26 | 63.81% |
Table 7. Percentage improvement of the proposed DRLCC framework over six baseline scheduling algorithms (PCP, DRL, DRL_I (DRL + Inheritance), Priority Inheritance (PI), EDF, and EDF_SRP), measured on three key metrics: total completion time, high-priority delay, and priority-inversion count.

| Metrics | PCP | DRL | DRL_I | PI | EDF | EDF_SRP |
|---|---|---|---|---|---|---|
| Total completion time | 24.27% | 1.51% | 6.21% | 9.60% | 80.75% | 78.45% |
| High-priority delay | 4.86% | 2.18% | 5.22% | 6.65% | 60.25% | 18.83% |
| Priority inversion count | 0 | 47.36% | 74.86% | 84.87% | 79.51% | 63.81% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Anwar, R.; Kwon, J.-W.; Kim, W.-T. A Deep Reinforcement Learning-Based Concurrency Control of Federated Digital Twin for Software-Defined Manufacturing Systems. Appl. Sci. 2025, 15, 8245. https://doi.org/10.3390/app15158245
