A Deep Reinforcement Learning-Based Concurrency Control of Federated Digital Twin for Software-Defined Manufacturing Systems
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Problem Statement and Industrial Challenge
1.3. Contributions
- Hybrid Concurrency Scheme: Enforces mutual-exclusion ceilings via PCP while allowing a PPO-driven agent to adaptively schedule tasks based on real-time system states;
- Simulation Platform: Implements a SimPy-based digital twin environment modeling Autonomous Mobile Robot (AMR) fleets, stochastic job arrivals, and resource demands;
- Extensive Evaluation: Demonstrates up to 20.2% makespan reduction, 4.3% lower delay for high-priority tasks, and near-elimination of priority inversions compared to Priority Ceiling Protocol (PCP), Priority Inheritance Protocol (PIP), DRL-only, and DRL+PIP baselines under varying loads and robot counts.
2. Literature Review
2.1. Digital Twin Foundations and Federated Architectures
2.2. Concurrency Control Protocols in Distributed Manufacturing
2.3. Deep Reinforcement Learning for Decision
3. Materials and Methods
3.1. Simulation Environment
3.1.1. Manufacturing Cell Layout
3.1.2. Job Arrival Process
3.1.3. AMR Configuration
3.1.4. Task Duration Distribution
3.2. Hybrid DRLCC Framework
3.2.1. Module 1: State Extraction
- Digital Twin database: Two object classes are maintained:
  - Job (job_id, priority ∈ {0, 1, 2}, task_list, arrival_time, load_weight, feasible_amrs);
  - AMR (amr_id, capacity, battery_level (0–1), charging_flag, current_job, dynamic_ceiling).
- Context extraction: At every decision epoch, the helper function get_state(env, amrs, job_queue) builds a vector of length 4N + 8, where N is the number of AMRs (4 features per AMR plus 8 queue statistics, so the length grows as AMRs are added):
  - AMR features (per robot): busy flag, priority of current job (−1 if idle), raw battery level (0.0–1.0), current PCP ceiling;
  - Queue statistics: queue length |Q|, counts of jobs at priorities 0–2, and wait-time statistics:
    - Mean wait time;
    - Maximum delay;
    - Standard deviation of wait time;
    - Mean priority of waiting jobs.
- Normalization: The concatenated array is z-scored (mean 0, std 1) before being passed to the agent. Figure 4 shows the overall architecture of our proposed method, DRLCC.
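
The state construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Job` and `AMR` dataclasses keep only the fields needed here, the `now` argument stands in for the simulation clock that `env` would provide, and z-scoring over the single vector (rather than over running statistics) is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Job:
    priority: int        # 0 = low, 1 = medium, 2 = high
    arrival_time: float  # simulation time at which the job arrived


@dataclass
class AMR:
    battery_level: float              # 0.0-1.0
    dynamic_ceiling: int              # current PCP ceiling
    current_job: Optional[Job] = None


def get_state(now: float, amrs: List[AMR], job_queue: List[Job]) -> List[float]:
    """Build the 4N + 8 feature vector and z-score it (assumed per-vector)."""
    features: List[float] = []
    # Four features per AMR: busy flag, current priority, battery, ceiling.
    for amr in amrs:
        busy = amr.current_job is not None
        features += [
            1.0 if busy else 0.0,
            float(amr.current_job.priority) if busy else -1.0,
            amr.battery_level,
            float(amr.dynamic_ceiling),
        ]
    # Eight queue statistics: |Q|, three priority counts, four summaries.
    waits = [now - job.arrival_time for job in job_queue]
    prios = [job.priority for job in job_queue]
    features += [
        float(len(job_queue)),
        float(prios.count(0)),
        float(prios.count(1)),
        float(prios.count(2)),
        float(mean(waits)) if waits else 0.0,
        float(max(waits)) if waits else 0.0,
        float(pstdev(waits)) if waits else 0.0,
        float(mean(prios)) if prios else 0.0,
    ]
    mu, sigma = mean(features), pstdev(features)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in features]
```

With four AMRs the returned vector has 4 × 4 + 8 = 24 entries, which matches the input dimension listed for the network.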
3.2.2. Module 2: Concurrency Layer (Dynamic Priority Ceiling Protocol)
- Resource Availability: The AMR must be currently free (unoccupied by another job).
- Capacity and Battery Threshold: The AMR's payload capacity must be at least the job's load weight. In addition, the AMR must either be not charging with battery level > 0.30 or, if charging, have battery level > 0.25.
- Ceiling check: The requested job’s priority must be greater than or equal to the dynamically updated ceiling of the targeted AMR.
- No high-priority job is indefinitely postponed by a lower-priority one.
- The system adapts to changing workloads and resource states.
- The RL agent learns scheduling strategies that respect real-time safety constraints.
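
The three admission conditions of the concurrency layer (availability, capacity and battery, ceiling) can be combined into one predicate. The sketch below is illustrative only: the `busy` field is a hypothetical stand-in for checking `current_job`, and the strict inequalities on battery level follow the thresholds quoted above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    priority: int       # 0 = low, 1 = medium, 2 = high
    load_weight: float


@dataclass
class AMR:
    capacity: float
    battery_level: float   # 0.0-1.0
    charging_flag: bool
    busy: bool             # hypothetical: True when current_job is set
    dynamic_ceiling: int


def is_feasible(amr: AMR, job: Job) -> bool:
    """Return True only if all three admission checks pass."""
    # 1. Resource availability: the AMR must be free.
    if amr.busy:
        return False
    # 2. Capacity and battery: payload must fit, battery above its threshold.
    if amr.capacity < job.load_weight:
        return False
    min_battery = 0.25 if amr.charging_flag else 0.30
    if amr.battery_level <= min_battery:
        return False
    # 3. Ceiling check: job priority must reach the AMR's dynamic ceiling.
    return job.priority >= amr.dynamic_ceiling
```

A job that fails the predicate for every AMR simply waits, which is where the wait and block penalties of the reward signal would apply.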
3.2.3. Module 3: Decision-Making Layer (PPO Agent)
- θ is the vector of policy parameters;
- E_t is the expectation over timesteps;
- r_t(θ) is the probability ratio between the new and old policies;
- A_t is the advantage function estimating the relative benefit of an action;
- ε is the clipping threshold.
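
For reference, the quantities listed above are those of the standard PPO clipped surrogate objective. The display equation did not survive in this copy, so the form below is the textbook one rather than a transcription of the authors' equation; the hat notation for the empirical expectation and the advantage estimate is assumed.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

With the clip ratio ε = 0.10 listed among the training hyperparameters, the ratio is effectively confined to [0.9, 1.1] in each update, which limits how far a single policy step can move.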
- Job Queue Status: The number of pending jobs, their priority levels (0: low, 1: medium, 2: high), and their waiting times;
- AMR Status: Availability (free or occupied), current location, battery level, and charging status of each AMR;
- Resource Occupancy: Information about which AMRs are currently assigned and to which job;
- Task-Specific Data: The number of subtasks remaining and the estimated processing times.
- If there are AMRs and waiting jobs, the agent’s action space consists of potential pairings.
- At each step, the agent selects a (job, AMR) pairing, subject to validation by the PCP before execution.
- State: as detailed in Module 1.
- Action: chooses one of the AMRs for the current job.
- We mask out infeasible AMRs (capacity, battery, or ceiling violations) so that the policy selects only among valid AMRs.
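
One common way to realize the masking step is to zero out the probability of infeasible AMRs before sampling. The sketch below assumes the policy head emits one logit per AMR and that a boolean feasibility mask is available; the function name and the choice to raise an error when nothing is feasible are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over feasible entries only; infeasible entries get probability 0."""
    if not any(mask):
        raise ValueError("no feasible AMR; the job should wait instead")
    # Subtract the largest feasible logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because infeasible AMRs receive exactly zero probability, an action sampled from this distribution already respects the capacity, battery, and ceiling constraints.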
3.2.4. Training and Convergence
- Reward plateau
- Loss stabilization
- Hyperparameter robustness
- Transient dips
4. Results
4.1. Simulation Study
Performance Metrics
- A job with priority level = 2;
- Is waiting for an AMR;
- And the AMR is occupied by a job with priority < 2.
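
The counting rule above can be written as a small predicate. This sketch assumes the usual meaning of priority inversion, namely that the occupying job has a lower priority than the waiting high-priority job; the function and constant names are hypothetical.

```python
from typing import Optional

HIGH_PRIORITY = 2  # priority levels are 0 (low), 1 (medium), 2 (high)


def is_priority_inversion(waiting_priority: int,
                          occupying_priority: Optional[int]) -> bool:
    """True when a high-priority job waits on an AMR held by a lower-priority job."""
    if waiting_priority != HIGH_PRIORITY:
        return False
    if occupying_priority is None:  # AMR is idle, so nothing blocks the job
        return False
    return occupying_priority < HIGH_PRIORITY
```

Evaluating the predicate at every waiting step and summing the hits would give a per-episode inversion count.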
4.2. Comparative Evaluation
- Priority Ceiling Protocol (PCP-only);
- Priority Inheritance Protocol (PIP-only);
- Deep Reinforcement Learning (DRL-only);
- DRL with Priority Inheritance (DRL + PIP);
- DRL with Priority Ceiling Protocol (DRLCC) (our proposed method);
- Earliest Deadline First (EDF);
- Stack Resource Policy with EDF (EDF-SRP).
4.2.1. Efficiency (Makespan)
4.2.2. Priority Responsiveness (High-Priority Job Delay)
4.2.3. Priority Inversion
4.2.4. Scalability Analysis (Varying Job Loads)
5. Conclusions
5.1. Limitations and Future Recommendations
- We modeled manufacturing dynamics in SimPy, which abstracted away low-level physics (e.g., robot kinematics, network jitter) and assumed perfect sensor telemetry.
- All robots shared identical dynamics and capacity distributions. Real factories often deploy heterogeneous fleets with different speeds, charging profiles, and failure modes.
- Our scenario considered one production cell network. Large plants feature multiple cells with cross-cell resource sharing and traffic conflicts not captured here.
- In this work, we held sensor noise (σ = 0.03) and one-way communication latency (50 ms) fixed to isolate the impact of our DRLCC framework. In practice, both quantities can vary substantially. A higher σ would introduce noisier state observations (slowing convergence and increasing reward variance), and greater latency would risk decisions acting on stale state (increasing idle times, conflicts, and inversion penalties). A full sensitivity sweep over σ and latency is left to future work to delineate the robust envelope of DRLCC.
5.2. Computational Scalability of DRLCC
- Inference latency: Mean time per decision stayed below 0.5 ms (σ < 0.1 ms across seeds) for fleets up to 6 AMRs, ensuring real-time applicability even in our largest scenarios.
- End-to-end training time: Running 500 episodes took approximately 35–45 min for the smallest configuration (4 AMRs, 80 jobs) and about 1.5 h for the largest (6 AMRs, 120 jobs).
- Policy-update cost: Each PPO update epoch (batch size 64, 6 passes) required on average 8–12 s with 4 AMRs, rising to 12–16 s with 6 AMRs.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest













| Category | Item | Value |
|---|---|---|
| Architecture | Input dimension | 4N + 8 |
| | Hidden layers | 2 × 256 (ReLU) |
| | Policy head | Fully connected, M outputs, Softmax |
| | Value head | Fully connected, 1 output, linear |
| Training | Optimizer | Adam |
| | Learning rate | 1 × 10⁻⁴ |
| | Clip ratio ε | 0.10 |
| | Discount γ | 0.95 |
| | GAE λ | 0.95 |
| | Mini-batch/Epochs | 64/6 |
| | Entropy bonus | 0.01 |

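
The discount γ = 0.95 and GAE λ = 0.95 in the table above enter the advantage estimate used by PPO. The following is a minimal sketch of generalized advantage estimation over one trajectory segment; it assumes no terminal flags inside the segment and a bootstrap value `last_value` for the state after the final step, both of which are simplifications.

```python
from typing import List, Sequence


def compute_gae(rewards: Sequence[float],
                values: Sequence[float],
                last_value: float,
                gamma: float = 0.95,
                lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates, computed backwards over the segment."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

With both factors at 0.95, a TD error k steps ahead is weighted by 0.9025^k, so the estimate is dominated by roughly the next ten steps.
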
| Symbol | Trigger | Value |
|---|---|---|
| | Base reward per completed high-priority task | +5 |
| | Base reward per completed medium/low task | +2 |
| | −0.1 × task-wait-time (s) | −0.1 |
| | −0.2 × job-block-count | −0.2 |
| | Priority inversion penalty | −10 |
| | Bonus if a high-priority job finishes < 20 s after arrival | +20 |

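
The reward terms in the table above can be combined into a single scalar. The sketch below assumes the terms are simply summed per decision step, which the table does not state explicitly; the function name and argument names are hypothetical.

```python
from typing import Optional


def step_reward(completed_priority: Optional[int] = None,
                wait_time_s: float = 0.0,
                block_count: int = 0,
                inversion: bool = False,
                completion_latency_s: Optional[float] = None) -> float:
    """Sum the reward-table terms for one step (additive form is an assumption)."""
    reward = 0.0
    if completed_priority is not None:
        # Base reward: +5 for high priority (2), +2 for medium/low.
        reward += 5.0 if completed_priority == 2 else 2.0
        # Bonus for a high-priority job finishing within 20 s of arrival.
        if (completed_priority == 2
                and completion_latency_s is not None
                and completion_latency_s < 20.0):
            reward += 20.0
    reward -= 0.1 * wait_time_s   # waiting penalty per second
    reward -= 0.2 * block_count   # blocking penalty per blocked job
    if inversion:
        reward -= 10.0            # priority inversion penalty
    return reward
```

Under this reading, a high-priority job completed 12 s after arrival earns 5 + 20 = 25, while a single second of waiting with one block and one inversion costs 10.3.
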
| Name | Summary |
|---|---|
| BATTERY_CHECK(amr) | every 1 s: if battery < 0.2 ⇒ start charging; if charging & battery ≥ 0.95 ⇒ stop. |
| JOB_ARRIVAL(…, λ) | spawn next job after Exp(λ); set priority ∈ {0,1,2}; env.process(HANDLE_JOB) |
| HANDLE_JOB(job) | Update AMR ceilings via PCP; while no feasible AMR: wait 1 s, penalise wait/block, and detect inversion; otherwise choose AMR = πθ(s, feasible), run the task, give the reward, and release the AMR. |
| PPO_UPDATE() | If buffer ≥ 64 samples: compute advantages (GAE), optimise policy/value nets for K = 6 epochs, clip grads at 0.1, step LR scheduler (step = 100, γ = 0.9). |
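
The BATTERY_CHECK rule summarized above is a hysteresis switch: charging starts below 0.2 and stops at 0.95 or higher. A plain-Python sketch of that rule follows; the `simulate` helper and its drain and charge rates per second are hypothetical values chosen only to exercise the rule, and the real environment would run this logic as a SimPy process instead of a loop.

```python
from typing import List, Tuple


def battery_check(battery: float, charging: bool) -> bool:
    """Return the new charging flag after one 1 s check."""
    if not charging and battery < 0.2:
        return True    # low battery: start charging
    if charging and battery >= 0.95:
        return False   # nearly full: stop charging
    return charging    # otherwise keep the current mode


def simulate(battery: float, charging: bool, seconds: int,
             drain: float = 0.01, charge: float = 0.05) -> List[Tuple[float, bool]]:
    """Step the rule once per second with assumed drain and charge rates."""
    trace = []
    for _ in range(seconds):
        charging = battery_check(battery, charging)
        if charging:
            battery = min(1.0, battery + charge)
        else:
            battery = max(0.0, battery - drain)
        trace.append((battery, charging))
    return trace
```

The gap between the two thresholds keeps an AMR from toggling in and out of charging on every check.
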
| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 14.42 ± 0.14 | 14.98 ± 0.14 | 13.03 ± 0.04 | 13.29 ± 0.12 | 12.68 ± 0.04 | 12.94 ± 0.13 | _ |
| DRL_Inheritance | 15.06 ± 0.13 | 15.79 ± 0.15 | 14.33 ± 0.13 | 14.70 ± 0.19 | 13.81 ± 0.05 | 14.10 ± 0.06 | 6.21% |
| DRL | 15.20 ± 0.28 | 15.22 ± 0.11 | 13.16 ± 0.07 | 13.46 ± 0.09 | 12.66 ± 0.06 | 12.97 ± 0.09 | 1.51% |
| Priority Ceiling | 19.81 ± 0.27 | 19.38 ± 0.27 | 17.92 ± 0.21 | 17.99 ± 0.23 | 15.46 ± 0.19 | 17.09 ± 0.23 | 24.27% |
| Priority Inheritance | 17.83 ± 0.31 | 19.52 ± 0.11 | 13.82 ± 0.11 | 14.34 ± 0.15 | 12.78 ± 0.10 | 13.12 ± 0.09 | 9.60% |
| EDF | 59.55 ± 0.29 | 73.95 ± 0.46 | 68.69 ± 0.64 | 82.61 ± 0.15 | 65.19 ± 0.42 | 78.27 ± 0.52 | 80.75% |
| EDF_SRP | 53.13 ± 0.22 | 65.74 ± 0.23 | 65.43 ± 0.21 | 77.81 ± 0.15 | 55.74 ± 0.19 | 66.00 ± 0.19 | 78.45% |
| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 4.40 ± 0.04 | 4.38 ± 0.02 | 4.18 ± 0.01 | 4.20 ± 0.02 | 4.14 ± 0.02 | 4.12 ± 0.01 | _ |
| DRL_Inheritance | 4.40 ± 0.02 | 4.82 ± 0.02 | 4.46 ± 0.03 | 4.47 ± 0.03 | 4.36 ± 0.02 | 4.33 ± 0.03 | 5.22% |
| DRL | 4.69 ± 0.01 | 4.51 ± 0.02 | 4.25 ± 0.01 | 4.26 ± 0.02 | 4.15 ± 0.02 | 4.15 ± 0.03 | 2.18% |
| Priority Ceiling | 4.60 ± 0.03 | 4.59 ± 0.03 | 4.44 ± 0.02 | 4.47 ± 0.02 | 4.27 ± 0.02 | 4.35 ± 0.03 | 4.86% |
| Priority Inheritance | 5.04 ± 0.03 | 5.18 ± 0.01 | 4.38 ± 0.02 | 4.40 ± 0.02 | 4.18 ± 0.01 | 4.19 ± 0.01 | 6.65% |
| EDF | 7.85 ± 0.11 | 10.05 ± 0.16 | 11.69 ± 0.31 | 14.47 ± 0.24 | 10.08 ± 0.18 | 12.47 ± 0.28 | 60.25% |
| EDF_SRP | 4.77 ± 0.16 | 4.09 ± 0.15 | 4.46 ± 0.13 | 4.60 ± 0.09 | 7.65 ± 0.23 | 8.49 ± 0.31 | 18.83% |
| Models | 4 × 80 | 4 × 100 | 5 × 100 | 5 × 120 | 6 × 100 | 6 × 120 | % Improvement |
|---|---|---|---|---|---|---|---|
| DRLCC (Ours) | 5.95 ± 0.25 | 7.30 ± 0.27 | 2.37 ± 0.15 | 2.75 ± 0.11 | 0.65 ± 0.04 | 0.85 ± 0.03 | _ |
| DRL_Inheritance | 14.20 ± 0.42 | 15.79 ± 0.65 | 10.43 ± 0.19 | 12.68 ± 0.29 | 6.63 ± 0.32 | 7.71 ± 0.40 | 74.86% |
| DRL | 12.57 ± 0.49 | 14.33 ± 0.52 | 4.27 ± 0.59 | 5.36 ± 0.32 | 1.20 ± 0.10 | 1.43 ± 0.08 | 47.36% |
| Priority Ceiling | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0% |
| Priority Inheritance | 25.51 ± 0.64 | 35.37 ± 0.36 | 16.23 ± 0.36 | 20.42 ± 0.48 | 6.51 ± 0.33 | 8.43 ± 0.24 | 84.87% |
| EDF | 13.41 ± 0.08 | 16.96 ± 0.04 | 15.89 ± 0.16 | 19.26 ± 0.12 | 14.95 ± 0.18 | 18.12 ± 0.12 | 79.51% |
| EDF_SRP | 7.34 ± 0.07 | 9.36 ± 0.09 | 9.09 ± 0.07 | 11.16 ± 0.10 | 11.36 ± 0.14 | 14.04 ± 0.26 | 63.81% |
| Metrics | PCP | DRL | DRL_I | PI | EDF | EDF_SRP |
|---|---|---|---|---|---|---|
| Total completion time | 24.27% | 1.51% | 6.21% | 9.60% | 80.75% | 78.45% |
| High-priority delay | 4.86% | 2.18% | 5.22% | 6.65% | 60.25% | 18.83% |
| Priority inversion count | 0 | 47.36% | 74.86% | 84.87% | 79.51% | 63.81% |
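
The "% Improvement" figures appear to be consistent with averaging the per-configuration relative improvement of DRLCC over each baseline. The sketch below checks that reading against values from the tables above; the averaging rule is an inference from the numbers, not a formula stated in the text.

```python
from typing import Sequence


def mean_relative_improvement(ours: Sequence[float],
                              baseline: Sequence[float]) -> float:
    """Average of (baseline - ours) / baseline over configurations, in percent."""
    gains = [(b - o) / b for o, b in zip(ours, baseline)]
    return 100.0 * sum(gains) / len(gains)


# Mean values per configuration (4x80, 4x100, 5x100, 5x120, 6x100, 6x120).
drlcc_delay = [4.40, 4.38, 4.18, 4.20, 4.14, 4.12]
drl_delay = [4.69, 4.51, 4.25, 4.26, 4.15, 4.15]
pcp_delay = [4.60, 4.59, 4.44, 4.47, 4.27, 4.35]
drlcc_makespan = [14.42, 14.98, 13.03, 13.29, 12.68, 12.94]
drl_makespan = [15.20, 15.22, 13.16, 13.46, 12.66, 12.97]

delay_vs_drl = mean_relative_improvement(drlcc_delay, drl_delay)
delay_vs_pcp = mean_relative_improvement(drlcc_delay, pcp_delay)
makespan_vs_drl = mean_relative_improvement(drlcc_makespan, drl_makespan)
```

Rounded to two decimals, these evaluate to about 2.18%, 4.86%, and 1.51%, matching the summary-table entries for high-priority delay against DRL and PCP and for total completion time against DRL.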
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Anwar, R.; Kwon, J.-W.; Kim, W.-T. A Deep Reinforcement Learning-Based Concurrency Control of Federated Digital Twin for Software-Defined Manufacturing Systems. Appl. Sci. 2025, 15, 8245. https://doi.org/10.3390/app15158245

