A Bi-Level Intelligent Control Framework Integrating Deep Reinforcement Learning and Bayesian Optimization for Multi-Objective Adaptive Scheduling in Opto-Mechanical Automated Manufacturing
Abstract
1. Introduction
- A Novel Bi-Level Intelligent Control Framework: We propose an integrated BO-DRL architecture that enables synergistic cooperation between perceptual decision-making and efficient parameter optimization, facilitating continuous system self-improvement.
- Domain-Specific Modeling and Benchmarking: We formalize a complex, large-scale scheduling benchmark for opto-mechanical automated manufacturing, incorporating realistic constraints to provide a rigorous testbed for advanced scheduling algorithms.
- Comprehensive Multi-Objective DRL Design: We develop a tailored DRL model featuring a graph neural network-based state encoder, a hierarchical action space, and a shaped reward function that dynamically balances competing objectives.
2. Problem Background and Description
3. Mathematical Model Analysis
3.1. Optimization Objectives and Constraints
3.2. Problem Complexity Analysis
4. Problem Difficulties and Algorithm Challenges
- Genetic Algorithm (GA) has difficulty maintaining feasibility under complex constraints through its crossover and mutation operations, especially for sequence-dependent calibration constraints. Its population-based search is also prone to becoming trapped in local optima in such high-dimensional spaces.
- Particle Swarm Optimization (PSO) is inherently mismatched with discrete scheduling problems because it was designed for continuous optimization. Although encoding transformations enable its application to scheduling, the physical meaning of the velocity and position update, $v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 (p_{id} - x_{id}^{t}) + c_2 r_2 (g_{d} - x_{id}^{t})$ and $x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$, becomes ambiguous in discrete space. In this formula, $v_{id}^{t}$ and $x_{id}^{t}$ represent the velocity and position of particle $i$ in dimension $d$ at iteration $t$, respectively; $p_{id}$ is the particle’s personal best position (cognitive component); $g_{d}$ is the swarm’s global best position (social component); $\omega$ is the inertia weight; $c_1$ and $c_2$ are the cognitive and social acceleration coefficients; and $r_1$, $r_2$ are random numbers uniformly distributed in $[0, 1]$. The continuous nature of this update mechanism makes the search susceptible to local optima in discrete scheduling spaces and significantly reduces the probability of finding the global optimum.
- Standard Reinforcement Learning (RL) faces critical challenges of sparse rewards and difficult credit assignment. In long sequential decision-making processes, the final performance metric must be propagated back across many intermediate decision steps, which makes learning inefficient. Additionally, the enormous dimensionality of the state space (formally defined in Section 5.1.1) makes training function approximators particularly challenging.
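
The PSO mismatch described above can be illustrated with a minimal sketch. It uses the textbook velocity and position update with the parameter values reported in Section 6.1 (inertia weight 0.7, learning factors 1.5, maximum velocity 0.2). The random-key decoding in `decode_random_keys` is an assumption made for illustration; the paper only states that "encoding transformations" are used and does not specify which one.

```python
import random


def pso_step(position, velocity, personal_best, global_best,
             inertia=0.7, c1=1.5, c2=1.5, v_max=0.2, rng=random):
    """One textbook PSO update for a single particle, with velocity clamping."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        v_next = inertia * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        v_next = max(-v_max, min(v_max, v_next))  # clamp to [-v_max, v_max]
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


def decode_random_keys(position):
    """Assumed encoding: sort dimensions by key value to get an operation order."""
    return sorted(range(len(position)), key=lambda d: position[d])


if __name__ == "__main__":
    # Two nearly identical continuous positions decode to different sequences.
    print(decode_random_keys([0.30, 0.31, 0.90]))
    print(decode_random_keys([0.31, 0.30, 0.90]))
```

Under this assumed encoding, a move of 0.01 in continuous space can swap two operations, while a much larger move may leave the decoded sequence unchanged, so distance in the particle's space says little about distance between schedules.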
5. Multi-Objective Adaptive APS Algorithm
5.1. Deep Reinforcement Learning Decision Model Design
5.1.1. State Space Design
- Machine State Vector: For each machine, it includes the current machine status, the elapsed processing time of the current operation, the number of operations in the current queue, and the utilization rate within the recent time window.
- Job State Vector: For each job, it contains the ratio of completed operations to total operations, the slack time, the current job status, and the set of available machines for the current process.
- Global State Vector: System time, average machine utilization, average queue length, and proportion of overdue jobs.
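
As a rough sketch of how the three feature groups above could be assembled, the following code builds one vector per machine, one per job, and one global vector. All class and field names are hypothetical. The slack definition (due date minus current time minus remaining work), the integer status encodings, the 0/1 eligibility mask, and the "overdue" test are assumptions, since the source does not give them here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineState:
    status: int                # assumed encoding: 0 idle, 1 busy, 2 down
    elapsed_time: float        # processed time of the current operation (h)
    queue_length: int          # operations waiting in the queue
    recent_utilization: float  # utilization in the recent window, in [0, 1]

    def to_vector(self) -> List[float]:
        return [float(self.status), self.elapsed_time,
                float(self.queue_length), self.recent_utilization]


@dataclass
class JobState:
    completed_ops: int
    total_ops: int
    due_date: float
    remaining_work: float      # assumed: remaining nominal processing time (h)
    status: int                # assumed encoding: 0 waiting, 1 in process, 2 done
    eligible_machines: List[int]

    def slack(self, now: float) -> float:
        # Assumed definition of slack time, for illustration only.
        return self.due_date - now - self.remaining_work

    def to_vector(self, now: float, n_machines: int) -> List[float]:
        # The set of available machines is encoded as a 0/1 mask.
        mask = [1.0 if k in self.eligible_machines else 0.0
                for k in range(n_machines)]
        return [self.completed_ops / self.total_ops, self.slack(now),
                float(self.status)] + mask


def global_vector(now: float, machines: List[MachineState],
                  jobs: List[JobState]) -> List[float]:
    """System time, average utilization, average queue length, overdue share."""
    avg_util = sum(m.recent_utilization for m in machines) / len(machines)
    avg_queue = sum(m.queue_length for m in machines) / len(machines)
    overdue = sum(1 for j in jobs
                  if j.completed_ops < j.total_ops and now > j.due_date)
    return [now, avg_util, avg_queue, overdue / len(jobs)]
```

In a graph-based encoder these vectors would typically serve as node and graph-level features; that wiring is left out of this sketch.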
5.1.2. Action Space Design
5.1.3. Reward Function Design
5.1.4. DRL Agent Architecture and Training Process
5.2. Mathematical Model of the BO-DRL Collaborative Mechanism
5.3. Efficient Rescheduling in Dynamic Environments
- Real-time State Updates: Any dynamic event (such as new order insertion or machine failure) triggers an immediate update of the state, enabling the DRL agent to respond based on the latest state information.
- Rapid DRL Response: A forward pass of the trained DRL policy network takes on the order of milliseconds, enabling real-time online scheduling.
- Continuous BO Learning: The system can periodically (e.g., monthly) collect new scheduling data and rerun the BO cycle to re-optimize the hyperparameters, allowing the DRL agent to continuously adapt to changes in the production environment.
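
The event-triggered pattern in the first two points can be sketched as follows. The event schema, the state dictionary, and `first_available_policy` are all hypothetical; in particular, the policy here is a trivial placeholder standing in for the trained DRL network, not the method used in the paper.

```python
def apply_event(state, event):
    """Return a new state that reflects one dynamic event (hypothetical schema)."""
    new_state = {
        "time": event["time"],
        "down": set(state["down"]),
        "pending": list(state["pending"]),
    }
    if event["type"] == "machine_failure":
        new_state["down"].add(event["machine"])
    elif event["type"] == "machine_repair":
        new_state["down"].discard(event["machine"])
    elif event["type"] == "new_order":
        new_state["pending"].append(event["job"])
    return new_state


def first_available_policy(state):
    """Placeholder for the trained policy: assign the first pending job to its
    lowest-numbered eligible machine that is not down."""
    for job in state["pending"]:
        for machine in sorted(job["eligible"]):
            if machine not in state["down"]:
                return job["id"], machine
    return None


def reschedule(state, events, policy):
    """Update the state after every event and query the policy immediately."""
    decisions = []
    for event in sorted(events, key=lambda e: e["time"]):
        state = apply_event(state, event)
        decisions.append((state["time"], policy(state)))
    return state, decisions
```

Because the policy is only queried, never retrained, inside this loop, the per-event cost is a single inference call; the periodic BO cycle would run outside it.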
6. Experimental Validation
6.1. Experimental Design Rationale and Algorithm Configuration
- (a) Benchmark Problem Selection: The core experiments use a bespoke 20 jobs × 20 machines FJSP benchmark. The choice of this scale is deliberate: as quantified by Equation (8), the solution space of this instance is far beyond the reach of exhaustive enumeration, and its growth with problem size is faster than exponential. This makes the 20 × 20 instance a demanding benchmark for scheduling algorithms. Furthermore, its enrichment with domain-specific constraints (Section 3) ensures that this complexity is not merely combinatorial but also reflects the intricate feasibility rules of high-precision opto-mechanical manufacturing, providing a valid and stringent testbed for advanced algorithms.
- (b) Choice of Baseline Algorithms: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are adopted as primary baselines. They are widely recognized as representative and high-performing meta-heuristics for combinatorial scheduling, providing a credible and standard reference point for performance comparison.
- (c) Comprehensive Performance Metrics: A multi-dimensional evaluation system is employed, encompassing makespan, mean flow time, machine utilization, number of tardy jobs, and robustness indices. This holistic approach aligns with practical manufacturing objectives and prevents over-optimization toward a single metric.
- (d) Fairness in Resource Allocation: All algorithms were allocated identical computational resources, including the maximum number of iterations and hardware environment. The parameters for each algorithm were independently fine-tuned through systematic preliminary experiments so that each operated in its most competitive configuration. This approach helps ensure that observed performance differences are attributable to the algorithms’ core mechanisms rather than to imbalances in resource allocation or parameter tuning.
- (e) Scalability and Robustness Tests: Systematic tests at scales from 5 × 5 to 100 × 100 (jobs × machines) assess scalability.
- (a) BO-DRL Algorithm: Discount factor 0.99, experience replay buffer size 10,000, batch size 32, employing an ε-greedy strategy.
- (b) Genetic Algorithm (GA): Population size 50, crossover probability 0.8, mutation probability 0.1, tournament selection (size 3), utilizing order crossover (OX) and swap mutation.
- (c) Particle Swarm Optimization (PSO): Population size 50, inertia weight 0.7, individual learning factor 1.5, social learning factor 1.5, maximum velocity 0.2.
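
For reproducibility, the settings listed above could be captured as configuration objects, as in the sketch below. The class and field names are hypothetical. The `epsilon_greedy` helper shows the generic ε-greedy rule only; the ε value and any decay schedule are not given in this section, so `epsilon` is left as a required argument.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DRLConfig:
    discount: float = 0.99
    replay_buffer_size: int = 10_000
    batch_size: int = 32


@dataclass(frozen=True)
class GAConfig:
    population_size: int = 50
    crossover_prob: float = 0.8
    mutation_prob: float = 0.1
    tournament_size: int = 3


@dataclass(frozen=True)
class PSOConfig:
    population_size: int = 50
    inertia: float = 0.7
    c1: float = 1.5   # individual learning factor
    c2: float = 1.5   # social learning factor
    v_max: float = 0.2


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Frozen dataclasses keep the three configurations immutable during a run, which makes it harder to change a baseline's settings by accident between experiments.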
6.2. Algorithm Performance Comparative Analysis
6.2.1. Solution Quality Comparison
6.2.2. Convergence Performance Analysis
- Directed Policy Search: Unlike the random search of metaheuristic algorithms, DRL performs directed policy improvement via policy gradients, continuously adjusting decisions towards actions that yield higher cumulative reward, naturally leading to higher search efficiency.
- Attention Mechanism: As described in Section 5.1.1, the attention mechanism within the state encoding network enables the agent to focus on the most critical scheduling decisions at any moment (e.g., bottleneck machines, urgent jobs), avoiding redundant searches on non-critical decisions. This is a key reason for its ability to quickly escape local optima.
- BO Preheating Effect: The outer-layer Bayesian optimization provides the DRL agent with a near-optimal initial hyperparameter configuration. This gives the inner-layer DRL training a higher starting point, equivalent to a high-quality “algorithm preheating,” significantly reducing the time required for convergence.
6.2.3. Algorithm Robustness Verification
- (a) Machine Failures: We randomly selected 5 out of the 20 machines (25% of the total fleet) to simulate unplanned breakdowns. This failure rate represents a moderate-to-high stress scenario for the system. Each failed machine became unavailable for a duration uniformly distributed between 2 and 8 h, after which it resumed operation. Failures were triggered at random time points after the 20th hour of the schedule to simulate mid-production disruptions, ensuring the initial schedule was already in execution.
- (b) Urgent Orders: We inserted 3 new high-priority jobs during the scheduling process. These jobs were released into the system at random times uniformly distributed between the 10th and 30th hours. To reflect their urgency, each was assigned a due date tightness factor of 0.3, which is significantly tighter than the average factor of 1.2 used for regular jobs in the benchmark. Their internal process plans and machine eligibility were generated with the same complexity distribution as the original benchmark jobs.
- (c) Processing Time Fluctuations: To simulate natural variability in operation execution, the actual processing time for every operation was subject to a random fluctuation. The realized time was set to $\tilde{p} = p\,(1 + \delta)$, where $p$ is the nominal processing time and $\delta$ was drawn from a uniform distribution over the interval $[-0.15, 0.15]$, representing a ±15% variation. This range captures typical variability observed in manual adjustment and precision assembly stages.
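
The three disturbance types above can be sampled with a short generator such as the one below. The function and field names are hypothetical. The `horizon` argument is an assumption: the text says failures occur after the 20th hour but gives no upper bound on the onset time, so a value of 70 h is used purely as a placeholder.

```python
import random


def make_disturbances(n_machines=20, n_failures=5, n_urgent=3,
                      horizon=70.0, seed=0):
    """Sample machine failures and urgent orders for one robustness scenario."""
    rng = random.Random(seed)
    failed = rng.sample(range(1, n_machines + 1), n_failures)
    failures = [
        {
            "machine": m,
            "start": rng.uniform(20.0, horizon),  # onset after the 20th hour
            "duration": rng.uniform(2.0, 8.0),    # downtime of 2 to 8 h
        }
        for m in failed
    ]
    urgent_jobs = [
        {"release": rng.uniform(10.0, 30.0), "tightness": 0.3}
        for _ in range(n_urgent)
    ]
    return failures, urgent_jobs


def realized_time(nominal, rng):
    """Apply a uniform +/-15% fluctuation to a nominal processing time."""
    return nominal * (1.0 + rng.uniform(-0.15, 0.15))
```

Fixing the seed makes a scenario repeatable, so every algorithm can be evaluated against the same set of failures, urgent orders, and processing-time draws.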
6.3. Algorithm Scalability Analysis
- Scale Adaptability Differences: The improvement of BO-DRL over GA continuously strengthens with increasing scale, indicating its stronger adaptability in complex large-scale problems, while its improvement over PSO stabilizes after peaking at medium scales, reflecting the characteristic differences of different algorithms when dealing with problems of varying complexity.
- Convergence Performance Advantage: For small to medium-scale problems, BO-DRL’s convergence speed significantly outperforms comparative algorithms, achieving stable solutions on average 60% earlier. For large-scale problems, BO-DRL effectively focuses on key scheduling decisions through its attention mechanism, avoiding redundant searches in invalid solution spaces.
- Disturbance Resistance Capability: As the problem scale increases, the disturbance resistance of all algorithms decreases, but BO-DRL shows the least degradation. For the 100 × 100 ultra-large-scale problem, BO-DRL’s solution quality retention rate (79.1%) is higher than that of GA by 9.3 percentage points and that of PSO by 14.8 percentage points, demonstrating its robustness.
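
The scale trends described above can be checked directly against the reported makespan and retention-rate values in the scalability table. The sketch below copies those values and computes the relative makespan improvement of BO-DRL over each baseline, plus the retention-rate gap in percentage points; the function names are illustrative.

```python
SCALES = ["5x5", "10x10", "15x15", "20x20", "25x25", "50x50", "100x100"]

# Values copied from the scalability table (makespan in hours, retention in %).
MAKESPAN = {
    "BO-DRL": [46.18, 69.54, 69.31, 69.32, 73.77, 90.19, 102.46],
    "GA": [46.55, 71.79, 75.38, 80.02, 89.44, 114.95, 134.73],
    "PSO": [48.82, 78.83, 83.70, 93.06, 95.02, 107.40, 123.23],
}
RETENTION = {
    "BO-DRL": [96.3, 94.7, 93.1, 91.5, 89.6, 85.2, 79.1],
    "GA": [90.2, 88.9, 86.8, 84.8, 82.5, 76.9, 69.8],
    "PSO": [87.6, 85.3, 83.2, 81.2, 78.7, 72.4, 64.3],
}


def relative_improvement(baseline, ours):
    """Percentage reduction of `ours` relative to `baseline`, per scale."""
    return [100.0 * (b - o) / b for b, o in zip(baseline, ours)]


def point_gap(ours, baseline):
    """Difference in percentage points, rounded to one decimal."""
    return [round(o - b, 1) for o, b in zip(ours, baseline)]


if __name__ == "__main__":
    for name in ("GA", "PSO"):
        gains = relative_improvement(MAKESPAN[name], MAKESPAN["BO-DRL"])
        gaps = point_gap(RETENTION["BO-DRL"], RETENTION[name])
        for scale, gain, gap in zip(SCALES, gains, gaps):
            print(f"{name:>3} {scale:>8}: makespan -{gain:5.2f}%, retention +{gap} pts")
```

With the tabulated values, the makespan improvement over GA rises monotonically from about 0.8% at 5 × 5 to about 24.0% at 100 × 100, while the improvement over PSO peaks at about 25.5% at 20 × 20 and then settles near 16 to 17% at the two largest scales.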
6.4. Discussion and Insights
- Adaptive Decision-Making Capability: Through deep reinforcement learning, BO-DRL can adaptively adjust scheduling strategies based on real-time states, rather than relying on fixed heuristic rules.
- Constraint Handling Capability: The attention mechanism enables the algorithm to effectively identify and satisfy the complex process constraints in opto-mechanical automated manufacturing.
- Hyperparameter Optimization: Bayesian optimization supplies the DRL algorithm with a near-optimal hyperparameter configuration, helping it realize its learning potential.
- Knowledge Accumulation and Transfer: Through curriculum learning, BO-DRL can transfer knowledge learned from simple problems to solve complex problems.
7. Discussion and Limitations
- (a) Computational Overhead of Offline Training. The initial phase of training the DRL agent and optimizing hyperparameters via BO is computationally intensive. Although the trained agent operates with millisecond-level latency online, this upfront cost must be considered for deployment scenarios where rapid adaptation to a completely new production configuration is required. Future work could explore meta-learning or transfer learning techniques to reduce this cold-start cost.
- (b) Dependence on Simulation Fidelity. The agent’s policy is learned and tuned entirely within a simulated production environment. Its performance in practice is therefore contingent on the accuracy of the simulation model in capturing the dynamics and stochasticity of the real shop floor. Discrepancies between simulation and reality could lead to suboptimal decisions. Enhancing the simulation with digital twin technologies or incorporating online fine-tuning mechanisms are valuable directions.
- (c) Generalizability of Dynamic Disturbance Models. The robustness tests, while designed with realistic parameters, employ specific, pre-defined disturbance profiles (e.g., uniform distribution for downtime). The framework’s performance under unforeseen or more extreme disruption patterns warrants further investigation. Extending the state representation and reward function to handle a broader, less structured set of anomalies remains a challenge.
- (d) Interpretability of the Learned Policy. Like many deep RL-based controllers, the inner decision-making logic of the trained DRL agent is not easily interpretable to human planners. In high-stakes, high-precision manufacturing, a degree of explainability may be required for trust and adoption. Developing methods to explain or distill the agent’s policy into human-understandable rules is an important avenue for future research.
8. Conclusions and Future Work
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Symbol | Description |
|---|---|
| | Indices for jobs; two indices are used to denote distinct jobs when formulating pairwise constraints |
| | Indices for operations within a job; two indices are used to denote distinct operations (e.g., an operation of one job vs. an operation of another job) |
| | Index for machines |
| | Index for time (iteration in PSO/DRL context) |
| | Makespan (maximum completion time) |
| | Completion time of a job |
| | Average machine utilization |
| | Maximum tardiness |
| | Quality performance index |
| | Start time of an operation of a job |
| | Processing time of an operation on a machine |
| | Binary variable; equals 1 if two operations are both processed on the same machine and the first precedes the second |
| | Binary variable; equals 1 if an operation is assigned to a machine |
| | Due date of a job |
| | Set of machines capable of processing an operation |
| | A sufficiently large positive number |
| | Total number of jobs |
| | Total number of machines |
| | Number of tardy jobs |
| | Indicator function (returns 1 if condition is true, else 0) |
| | Makespan under the static baseline schedule |
| | Makespan under dynamic disruptions |

| Indicator Type | Specific Indicator | Calculation Formula |
|---|---|---|
| Efficiency Index | Makespan | |
| Efficiency Index | Mean flow time | |
| Resource Utilization | Machine utilization | |
| Timeliness Indicator | Number of tardy jobs | |
| Robustness Index | Disturbance recovery capability | |
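
The indicators listed in the table above can be computed from a finished schedule along the following lines. This is a sketch under stated assumptions, not the paper's formulas: flow time is taken as completion time minus release time (release defaults to 0), utilization as total busy time divided by the number of machines times the makespan, and the disturbance-recovery indicator as the ratio of static to disrupted makespan. All function and argument names are hypothetical.

```python
def schedule_metrics(completion, due, busy_time, release=None):
    """Compute basic indicators from per-job and per-machine summaries.

    completion, due: dict mapping job id -> hours
    busy_time: dict mapping machine id -> busy hours
    release: optional dict mapping job id -> release time (defaults to 0)
    """
    if release is None:
        release = {job: 0.0 for job in completion}
    makespan = max(completion.values())
    mean_flow_time = sum(completion[j] - release[j] for j in completion) / len(completion)
    utilization = sum(busy_time.values()) / (len(busy_time) * makespan)
    tardy_jobs = sum(1 for j in completion if completion[j] > due[j])
    return {
        "makespan": makespan,
        "mean_flow_time": mean_flow_time,
        "utilization": utilization,
        "tardy_jobs": tardy_jobs,
    }


def retention_rate(static_makespan, dynamic_makespan):
    """Assumed robustness indicator: static over disrupted makespan, in percent."""
    return 100.0 * static_makespan / dynamic_makespan
```

For example, two jobs finishing at 10 h and 20 h with due dates of 12 h and 15 h give a makespan of 20 h, a mean flow time of 15 h, and one tardy job.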

| Category | Machine IDs | Function Description | Special Constraints |
|---|---|---|---|
| Cleaning | M6, M10, M12 | Optical component cleaning and dust removal | Cleanliness level requirements |
| Precision coating | M4, M11 | Optical surface coating treatment | Temperature and humidity control |
| High-precision assembly | M3, M8, M9, M13, M18 | Opto-mechanical integrated assembly | Vibration isolation |
| Calibration test | M1, M2, M5, M15, M16, M20 | Optical performance calibration and testing | Constant temperature environment |
| Special handling | M7, M14, M17, M19 | Special process treatment | Dedicated equipment |

| Operation of Job 1 | Eligible Machines (Processing Time) |
|---|---|
| Op1 | M2 (9.5 h), M6 (2.4 h), M10 (8 h) |
| Op2 | M9 (8.9 h) |
| Op3 | M1 (6.6 h), M17 (5.7 h), M19 (5.2 h) |
| Op4 | M6 (0.8 h) |
| Op5 | M7 (3.8 h), M19 (7.8 h), M1 (2 h) |
| Op6 | M15 (6.4 h), M20 (1.7 h), M5 (4.7 h) |
| Op7 | M15 (9.5 h), M20 (4.3 h), M5 (5.4 h) |
| Op8 | M17 (6.3 h), M5 (7.5 h) |
| Op9 | M13 (3.4 h), M9 (5.1 h) |
| Op10 | M15 (0.8 h), M16 (4.1 h) |

| Algorithm | Makespan (h) | Mean Flow Time (h) | Machine Utilization ± Std (%) | Number of Tardy Jobs |
|---|---|---|---|---|
| BO-DRL | 69.32 | 62.16 | 55.03 ± 11.39 | 1 |
| GA | 80.02 | 68.62 | 50.23 ± 10.30 | 1 |
| PSO | 93.06 | 77.16 | 43.80 ± 14.82 | 3 |

| Metric | Algorithm | 5 × 5 | 10 × 10 | 15 × 15 | 20 × 20 | 25 × 25 | 50 × 50 | 100 × 100 |
|---|---|---|---|---|---|---|---|---|
| Makespan (h) | BO-DRL | 46.18 | 69.54 | 69.31 | 69.32 | 73.77 | 90.19 | 102.46 |
| | GA | 46.55 | 71.79 | 75.38 | 80.02 | 89.44 | 114.95 | 134.73 |
| | PSO | 48.82 | 78.83 | 83.70 | 93.06 | 95.02 | 107.40 | 123.23 |
| Mean Flow Time (h) | BO-DRL | 43.52 | 61.59 | 62.15 | 62.16 | 67.60 | 77.40 | 83.32 |
| | GA | 42.18 | 63.05 | 67.17 | 68.62 | 76.92 | 93.48 | 102.83 |
| | PSO | 44.98 | 69.07 | 75.94 | 77.16 | 75.51 | 87.13 | 98.49 |
| Machine Utilization ± Std (%) | BO-DRL | 62.61 ± 3.96 | 54.98 ± 12.27 | 54.42 ± 14.85 | 55.03 ± 11.39 | 51.81 ± 10.60 | 43.20 ± 13.34 | 38.29 ± 15.42 |
| | GA | 68.30 ± 12.02 | 54.69 ± 5.48 | 47.94 ± 10.49 | 50.23 ± 10.30 | 43.22 ± 9.51 | 33.97 ± 11.93 | 29.17 ± 11.52 |
| | PSO | 61.29 ± 17.04 | 54.66 ± 12.00 | 45.88 ± 12.14 | 43.80 ± 14.82 | 39.51 ± 11.88 | 36.44 ± 14.21 | 31.97 ± 14.21 |
| Number of Tardy Jobs | BO-DRL | 0 | 1 | 2 | 1 | 3 | 13 | 45 |
| | GA | 0 | 1 | 2 | 1 | 10 | 32 | 75 |
| | PSO | 0 | 2 | 3 | 3 | 8 | 20 | 67 |
| Disturbance Resistance (Solution Quality Retention Rate, %) | BO-DRL | 96.3 | 94.7 | 93.1 | 91.5 | 89.6 | 85.2 | 79.1 |
| | GA | 90.2 | 88.9 | 86.8 | 84.8 | 82.5 | 76.9 | 69.8 |
| | PSO | 87.6 | 85.3 | 83.2 | 81.2 | 78.7 | 72.4 | 64.3 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yin, L.; Fang, Z.; Li, K.; Chen, J.; Fan, N.; Li, M. A Bi-Level Intelligent Control Framework Integrating Deep Reinforcement Learning and Bayesian Optimization for Multi-Objective Adaptive Scheduling in Opto-Mechanical Automated Manufacturing. Appl. Sci. 2026, 16, 732. https://doi.org/10.3390/app16020732