GA-HPO PPO: A Hybrid Algorithm for Dynamic Flexible Job Shop Scheduling
Abstract
1. Introduction
- How can a hybrid algorithm be effectively constructed to leverage the complementary strengths of Genetic Algorithms (GAs) for global search and Proximal Policy Optimization (PPO) for adaptive policy learning, specifically for the DFJSP?
- To what extent can the proposed GA-HPO PPO algorithm improve key scheduling performance metrics—such as the number of tardy tasks, makespan, and machine utilization—compared to established benchmarks like DDQN, standard PPO, and rule-based heuristics?
- Does the integration of automated hyperparameter optimization (HPO) using Optuna contribute significantly to the stability, convergence speed, and final performance of the reinforcement learning agent in dynamic scheduling environments?
- How well does the trained GA-HPO PPO model generalize across diverse and unseen scheduling scenarios, demonstrating robustness to variations in problem scale and dynamics?
2. Literature Review
2.1. Optimization Methods for the FJSP
2.2. PPO Algorithm
2.3. Genetic Algorithm
- First, an initial population containing multiple individuals is generated. Each individual represents a potential solution to the problem. The fitness value of each individual in the population is calculated, where the fitness function evaluates the “survivability” of each individual based on its performance.
- Individuals with higher fitness are more likely to be retained and used to generate the next generation. To potentially produce better solutions, selected individuals are paired to exchange some of their genes, generating new individuals.
- Then, the newly generated offspring undergo random mutations in some genes to introduce new gene combinations, increasing population diversity and preventing the search from becoming trapped in local optima.
- The newly generated offspring replace part or all of the old population, forming a new generation. The steps of fitness evaluation, selection, crossover, and mutation are repeated until the termination condition is met, as sketched below.
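To make the loop above concrete, the following is a minimal, schematic sketch in Python; the fitness, crossover, and mutate callables are placeholders, not the specific operators used later in this paper.

```python
import random

def genetic_algorithm(init_population, fitness, crossover, mutate,
                      generations=100, elite_frac=0.1, mutation_rate=0.1):
    """Schematic GA loop: evaluate, select, recombine, mutate, replace."""
    population = init_population()
    for _ in range(generations):
        # Evaluate the "survivability" of every individual.
        scored = sorted(population, key=fitness, reverse=True)
        # Keep the fittest individuals unchanged (elitism).
        n_elite = max(1, int(elite_frac * len(scored)))
        next_gen = scored[:n_elite]
        # Fill the rest of the new generation by recombining fitter parents.
        while len(next_gen) < len(population):
            p1, p2 = random.choices(scored[:len(scored) // 2], k=2)
            child = crossover(p1, p2)
            if random.random() < mutation_rate:
                child = mutate(child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```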
2.4. Our Research Content and Objectives
2.5. Summary of Previous Work
3. Problem Description
3.1. Dynamic Flexible Job-Shop Scheduling Problem
3.2. Problem Formulation
- Objective Functions:
- Constraints:
4. GA-HPO PPO Algorithm
4.1. GA Optimizer
4.1.1. Objectives and Motivation
4.1.2. Key Components and Workflow
- Search Space Definition:
- Network Structures: The GA explores a predefined set of network architectures, each characterized by distinct configurations of hidden layers and neurons. This search space includes variations in layer count (e.g., two or three hidden layers) and size (e.g., 128 or 256 neurons per layer), allowing flexibility in model complexity to better suit the scheduling task.
- Feasibility and Flexibility: By operating within a structured search space, the GA keeps exploration feasible and computationally manageable, balancing thorough search against computational cost.
- Population Initialization:
- Diversity: The initial population comprises a diverse set of PPO agents, each instantiated with unique network structures sampled from the defined search space. This diversity enables the GA to cover a broad spectrum of potential solutions, fostering robust exploration.
- Random Initialization: Each agent’s network parameters are randomly initialized using Xavier uniform initialization, ensuring unbiased starting conditions that support fair evolution across the population.
- Fitness Evaluation:
- Multi-Objective Criteria: Each individual is assessed based on multiple performance metrics, including tardy task count, makespan, and machine utilization. These metrics collectively determine the fitness of each solution, reflecting its overall scheduling effectiveness.
- Training and Evaluation: The fitness evaluation method partially trains each PPO agent, evaluating performance over multiple episodes. This comprehensive assessment provides an accurate measure of each individual’s scheduling efficacy.
- Selection Mechanism:
- Pareto Front Identification: A Pareto-based selection strategy identifies the non-dominated individuals within the population, so that solutions not outperformed across the metrics are prioritized for propagation.
- Elitism: A subset of the highest-performing individuals (elite) is preserved across generations, ensuring that the top solutions are retained and have the opportunity to inform the evolution of future generations.
- Genetic Operators: Crossover and mutation recombine and perturb selected individuals to produce the next generation of candidate architectures (a minimal sketch of the overall workflow follows this list).
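The components above can be tied together in a compact sketch. Everything below is illustrative: the architecture list, the feature and machine counts, and the random placeholder standing in for partial PPO training are assumptions, not the paper's exact configuration.

```python
import random
import torch
import torch.nn as nn

# Hypothetical architecture search space: tuples of hidden-layer widths (cf. Section 4.1.2).
SEARCH_SPACE = [(128, 128), (256, 256), (128, 128, 128), (256, 256, 256)]

def build_mlp(in_dim, hidden, out_dim):
    """Instantiate an MLP for a sampled architecture with Xavier uniform initialization."""
    layers, prev = [], in_dim
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, out_dim))
    net = nn.Sequential(*layers)
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)
    return net

def dominates(a, b):
    """Pareto dominance for objective tuples in which every entry is to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(individuals, objectives):
    """Return the non-dominated individuals of the current generation."""
    return [ind for ind, obj in zip(individuals, objectives)
            if not any(dominates(other, obj) for other in objectives if other is not obj)]

def evaluate(agent_net):
    """Placeholder for partial PPO training plus multi-episode evaluation.
    Returns (tardy_tasks, makespan, -utilization) so that all objectives are minimized."""
    return (random.randint(0, 50), random.uniform(100.0, 120.0), -random.uniform(0.6, 0.8))

# One GA generation over agents with different architectures (illustrative: 20 features, 5 machines).
architectures = [random.choice(SEARCH_SPACE) for _ in range(8)]
agents = [build_mlp(20, arch, 5) for arch in architectures]
objectives = [evaluate(agent) for agent in agents]
elite = pareto_front(agents, objectives)   # elitism: carried over unchanged to the next generation
```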
4.2. Hyperparameter Optimization with Optuna
4.2.1. Objectives and Rationale
4.2.2. Hyperparameter Search Space
4.2.3. Optimization Process
4.3. PPO Actor–Critic Model
4.3.1. Architectural Components
- Feature Extractor:
- Residual Blocks:
- Multi-Head Attention Layer:
- Action and Value Heads:
4.3.2. Weight Initialization
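A possible PyTorch layout matching the component list of Section 4.3.1 and the Xavier initialization of Section 4.3.2 is sketched below; the layer widths, the number of residual blocks, and the number of attention heads are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully connected residual block: two linear layers with a skip connection."""
    def __init__(self, dim):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
    def forward(self, x):
        return torch.relu(x + self.fc2(torch.relu(self.fc1(x))))

class ActorCritic(nn.Module):
    """Sketch of the actor-critic layout described in Section 4.3.1."""
    def __init__(self, state_dim, action_dim, hidden=256, n_heads=4):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())  # feature extractor
        self.res_blocks = nn.Sequential(ResidualBlock(hidden), ResidualBlock(hidden))
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)     # multi-head attention
        self.actor = nn.Linear(hidden, action_dim)   # action head: machine-selection logits
        self.critic = nn.Linear(hidden, 1)           # value head: state-value estimate
        for m in self.modules():                     # Xavier uniform initialization (Section 4.3.2)
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                nn.init.zeros_(m.bias)
    def forward(self, state):
        h = self.res_blocks(self.extractor(state))
        # Attention applied over a singleton sequence here, purely to illustrate the layer's placement.
        h, _ = self.attn(h.unsqueeze(1), h.unsqueeze(1), h.unsqueeze(1))
        h = h.squeeze(1)
        return torch.softmax(self.actor(h), dim=-1), self.critic(h)

probs, value = ActorCritic(state_dim=20, action_dim=5)(torch.randn(4, 20))  # illustrative dimensions
```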
4.4. PPO Algorithm
4.4.1. Initialization of the PPO Class and Key Hyperparameters
- (a) State Dimension (state_dim) and Action Dimension (action_dim): Represent the number of features in the task space and the number of available machines.
- (b) Network Structure (network_structure): Defines the neural network architecture through a series of hidden layers, controlling network depth and learning capacity.
- (c) Learning Rate (lr) and Discount Factor (γ): Regulate the speed of gradient updates and the weighting of future rewards, respectively.
- (d) Clipping Parameter (eps_clip): Limits the magnitude of policy updates, thus preventing drastic policy changes and ensuring stable learning.
- (e) Update Epochs (K_epochs): Specifies the number of times the policy network is updated in each learning cycle.
- (f) GAE Parameter (λ): Used in Generalized Advantage Estimation (GAE) to balance bias and variance in advantage estimation.
- (g) Batch Size (batch_size): Controls the size of mini-batches for each update, which directly impacts the stability and speed of learning (see the constructor sketch below).
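The hyperparameters above map onto a constructor along the following lines. This is a minimal, self-contained sketch; the default values and the simple placeholder policy network are illustrative, not the tuned settings reported in Section 5.

```python
import torch
import torch.nn as nn

class PPO:
    """Sketch of the PPO agent constructor holding the hyperparameters listed above."""
    def __init__(self, state_dim, action_dim, network_structure=(256, 256),
                 lr=3e-4, gamma=0.99, eps_clip=0.2, K_epochs=4, lam=0.95, batch_size=64):
        self.gamma, self.lam = gamma, lam     # discount factor and GAE smoothing parameter
        self.eps_clip = eps_clip              # clipping range for the policy ratio
        self.K_epochs = K_epochs              # policy-update epochs per learning cycle
        self.batch_size = batch_size          # mini-batch size per update
        # Simple placeholder policy network whose hidden layers follow network_structure.
        layers, prev = [], state_dim
        for width in network_structure:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers += [nn.Linear(prev, action_dim), nn.Softmax(dim=-1)]
        self.policy = nn.Sequential(*layers)
        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=lr)

agent = PPO(state_dim=20, action_dim=5)  # e.g., 20 task features, 5 machines (illustrative)
```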
4.4.2. Generalized Advantage Estimation (GAE) Calculation
With Generalized Advantage Estimation, the advantage at time step t is

$\hat{A}_t = \delta_t + (\gamma\lambda)\,\delta_{t+1} + (\gamma\lambda)^2\,\delta_{t+2} + \cdots$, with the temporal-difference error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, where:

- $r_t$ is the reward received at time step t.
- $V(s_t)$ and $V(s_{t+1})$ are the estimated values of states $s_t$ and $s_{t+1}$, respectively.
- $\gamma$ is the discount factor that determines the importance of future rewards.
- $\lambda$ is a smoothing parameter that controls the balance between bias and variance in the advantage estimation.
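A minimal implementation of this recursion is sketched below; it assumes a single, uninterrupted trajectory (no terminal masking), with per-step rewards and value estimates given as plain Python lists.

```python
def compute_gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    """GAE: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t);
    A_t = delta_t + gamma * lam * A_{t+1}, accumulated backwards in time."""
    values = list(values) + [next_value]
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages.insert(0, gae)
    return advantages

# Example: three-step trajectory with illustrative rewards and value estimates.
print(compute_gae(rewards=[1.0, 0.5, -0.2], values=[0.8, 0.7, 0.3], next_value=0.0))
```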
4.4.3. Policy Update
The policy is updated by maximizing the clipped surrogate objective

$L(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right) - c_1\!\left(V_\theta(s_t) - V_t^{\text{target}}\right)^2 + c_2\, S[\pi_\theta](s_t)\right]$, where:

- $r_t(\theta) = \pi_\theta(a_t \mid s_t)/\pi_{\theta_{\text{old}}}(a_t \mid s_t)$ is the probability ratio between the new and old policies.
- $\epsilon$ is the clipping parameter that limits the range of policy updates.
- $\hat{A}_t$ is the estimated advantage for each action.
- $c_1$ and $c_2$ are coefficients that weight the value loss and entropy term, respectively.
- $V_t^{\text{target}}$ is the target value, typically computed as $\hat{A}_t + V(s_t)$.
- $S[\pi_\theta](s_t)$ represents the entropy of the policy, promoting exploration by penalizing deterministic policies.
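A sketch of the corresponding loss, written as a quantity to minimize (the negated objective above), is given below; all arguments are assumed to be PyTorch tensors, and the coefficient defaults are illustrative.

```python
import torch

def ppo_loss(ratio, advantage, value, value_target, entropy, eps_clip=0.2, c1=0.5, c2=0.01):
    """Clipped surrogate loss: -min(r*A, clip(r, 1-eps, 1+eps)*A)
    + c1 * (V - V_target)^2 - c2 * entropy (entropy enters as a bonus)."""
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantage
    policy_loss = -torch.min(surr1, surr2).mean()
    value_loss = (value - value_target).pow(2).mean()
    return policy_loss + c1 * value_loss - c2 * entropy.mean()
```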
5. Experiments
5.1. Problem Instances and Parameter Setting
5.2. Environment Setup
5.2.1. State and Action Spaces
5.2.2. Reward Function
- Late Penalty: A negative reward is applied if a task is completed after its deadline; the magnitude of the penalty is controlled by the parameter α.
- Duration Penalty: Extended completion times (makespan) are penalized to encourage efficient scheduling.
- Machine Utilization Reward: High machine utilization is rewarded to promote efficient use of available resources.
- Dense Reward Structure: Smooth transitions and nonlinear scaling are combined to provide continuous feedback, promoting more stable and efficient learning by the RL agent (a schematic sketch follows).
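A schematic version of this shaping is sketched below; the weights, the α default, and the tanh scaling are illustrative assumptions rather than the paper's exact reward formula.

```python
import math

def step_reward(completion_time, deadline, makespan_delta, utilization, alpha=1.0):
    """Dense per-step reward combining lateness, makespan growth, and utilization terms."""
    reward = 0.0
    if completion_time > deadline:                 # late penalty, scaled by alpha
        reward -= alpha * (completion_time - deadline)
    reward -= 0.1 * makespan_delta                 # duration penalty on makespan growth
    reward += 0.5 * utilization                    # reward for high machine utilization
    return math.tanh(reward)                       # nonlinear scaling keeps feedback smooth and bounded
```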
5.2.3. Environment Dynamics
5.3. Results and Analysis
5.3.1. Convergence Performance
5.3.2. Comprehensive Comparison of the Metrics of the Five Algorithms and Visualization Results
5.3.3. Generalization Capability of Models
6. Conclusions
7. Managerial Insights
- Enhanced Responsiveness to Dynamic Events: GA-HPO PPO enables real-time adjustment to unforeseen disruptions such as machine breakdowns, urgent order insertions, or variable processing times. This capability allows managers to maintain high levels of service and on-time delivery, even under uncertainty.
- Improved Resource Utilization: By optimizing both task assignment and machine load, the algorithm helps reduce idle time and operational costs. Managers can achieve higher throughput with the same resource base, leading to better capital efficiency.
- Scalability to Large-Scale Scheduling: The algorithm’s robust performance on both small (100-task) and large (1000-task) datasets suggests its applicability in diverse industrial settings, from job shops to large-scale flexible manufacturing systems.
- Reduction in Manual Tuning Effort: The integration of automated hyperparameter optimization (HPO) and neural architecture search via the GA reduces the dependency on expert knowledge for parameter tuning, making advanced scheduling accessible to plants with limited technical expertise.
- Support for Multi-Objective Decision-Making: Managers can leverage the algorithm’s ability to balance competing objectives—such as minimizing tardiness, makespan, and maximizing utilization—to align scheduling outcomes with strategic goals like customer satisfaction, cost control, and energy efficiency.
- Facilitation of Digital Twin and Smart Manufacturing Initiatives: The reinforcement learning-based approach is compatible with digital twin frameworks and IoT-enabled production systems, providing a foundation for continuous learning and adaptive scheduling in Industry 4.0 environments.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Błażewicz, J.; Finke, G.; Haupt, R.; Schmidt, G. New trends in machine scheduling. Eur. J. Oper. Res. 1988, 37, 303–317.
- Mahmoodjanloo, M.; Tavakkoli-Moghaddam, R.; Baboli, A.; Bozorgi-Amiri, A. Flexible job shop scheduling problem with reconfigurable machine tools: An improved differential evolution algorithm. Appl. Soft Comput. 2020, 94, 106416.
- Luan, F.; Wang, W.; Fu, W.; Bao, Y.; Ren, G.; Wang, J.; Deng, M. FJSP solving by improved GA based on PST hierarchy structure. Comput. Integr. Manuf. Syst. 2014, 20, 2494–2501.
- Gao, J.; Sun, L.; Gen, M. A hybrid genetic and variable neighborhood descent algorithm for flexible job shop scheduling problems. Comput. Oper. Res. 2008, 35, 2892–2907.
- Chen, R.; Yang, B.; Li, S.; Wang, S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778.
- Fan, J.; Zhang, C.; Tian, S.; Shen, W.; Gao, L. Flexible job-shop scheduling problem with variable lot-sizing: An early release policy-based matheuristic. Comput. Ind. Eng. 2024, 193, 110290.
- Caldeira, R.H.; Gnanavelbabu, A.; Joseph Solomon, J. Solving the flexible job shop scheduling problem using a hybrid artificial bee colony algorithm. In Trends in Manufacturing and Engineering Management: Select Proceedings of ICMechD 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 833–843.
- Wang, Y.; Zhu, Q. A hybrid genetic algorithm for flexible job shop scheduling problem with sequence-dependent setup times and job lag times. IEEE Access 2021, 9, 104864–104873.
- Han, Y.; Chen, X.; Xu, M.; Gu, F. A study on multi-objective flexible job shop scheduling problem using a non-dominated sorting genetic algorithm. In Proceedings of the International Conference on Maintenance Engineering, Zhuhai, China, 23–25 October 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 745–755.
- Chen, F.; Xie, W.; Ma, J.; Chen, J.; Wang, X. Textile Flexible Job-Shop Scheduling Based on a Modified Ant Colony Optimization Algorithm. Appl. Sci. 2024, 14, 4082.
- Xin, B.; Lu, S.; Wang, Q.; Deng, F.; Shi, X.; Cheng, J.; Kang, Y. Simultaneous scheduling of processing machines and automated guided vehicles via a multi-view modeling-based hybrid algorithm. IEEE Trans. Autom. Sci. Eng. 2023, 21, 4753–4767.
- Lingkon, M.L.R.; Dash, A. Multi-Objective Flexible Job-Shop Scheduling with Limited Resource Constraints in Hospital Using Hybrid Discrete Firefly Algorithm. J. Inst. Eng. (India) Ser. C 2025, 106, 403–423.
- Wang, W.; Zhang, J.; Jia, Y. Multiobjective flexible job-shop scheduling optimization for manufacturing servitization. Int. J. Web Inf. Syst. 2024, 20, 374–394.
- Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
- Fuladi, S.K.; Kim, C.S. Dynamic events in the flexible job-shop scheduling problem: Rescheduling with a hybrid metaheuristic algorithm. Algorithms 2024, 17, 142.
- Fatemi-Anaraki, S.; Tavakkoli-Moghaddam, R.; Foumani, M.; Vahedi-Nouri, B. Scheduling of multi-robot job shop systems in dynamic environments: Mixed-integer linear programming and constraint programming approaches. Omega 2023, 115, 102770.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; A Bradford Book; MIT Press: Cambridge, MA, USA, 2018.
- Deisenroth, M.P.; Neumann, G.; Peters, J. A survey on policy search for robotics. Found. Trends Robot. 2013, 2, 1–142.
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
- Li, X.; Guo, X.; Tang, H.; Wu, R.; Wang, L.; Pang, S.; Liu, Z.; Xu, W.; Li, X. Survey of integrated flexible job shop scheduling problems. Comput. Ind. Eng. 2022, 174, 108786.
- Xu, Y.; Zhang, M.; Yang, M.; Wang, D. Hybrid quantum particle swarm optimization and variable neighborhood search for flexible job-shop scheduling problem. J. Manuf. Syst. 2024, 73, 334–348.
- Fan, J.; Shen, W.; Gao, L.; Zhang, C.; Zhang, Z. A hybrid Jaya algorithm for solving flexible job shop scheduling problem considering multiple critical paths. J. Manuf. Syst. 2021, 60, 298–311.
- Zhang, L.; Feng, Y.; Xiao, Q.; Xu, Y.; Li, D.; Yang, D.; Yang, Z. Deep reinforcement learning for dynamic flexible job shop scheduling problem considering variable processing times. J. Manuf. Syst. 2023, 71, 257–273.
- Guo, H.; Liu, J.; Wang, Y.; Zhuang, C. An improved genetic programming hyper-heuristic for the dynamic flexible job shop scheduling problem with reconfigurable manufacturing cells. J. Manuf. Syst. 2024, 74, 252–263.
- Uzer, M.S.; Inan, O. Application of improved hybrid whale optimization algorithm to optimization problems. Neural Comput. Appl. 2023, 35, 12433–12451.
- Yuan, G.; Yang, W. Study on optimization of economic dispatching of electric power system based on Hybrid Intelligent Algorithms (PSO and AFSA). Energy 2019, 183, 926–935.
- Corchado, E.; Abraham, A.; de Carvalho, A. Hybrid intelligent algorithms and applications. In Information Sciences—Informatics and Computer Science, Intelligent Systems, Applications: An International Journal; ACM: New York, NY, USA, 2010; Volume 180, pp. 2633–2634.
- Zhang, X.; Lin, Q.; Mao, W.; Liu, S.; Dou, Z.; Liu, G. Hybrid Particle Swarm and Grey Wolf Optimizer and its application to clustering optimization. Appl. Soft Comput. 2021, 101, 107061.
- Ding, H.; Gu, X. Hybrid of human learning optimization algorithm and particle swarm optimization algorithm with scheduling strategies for the flexible job-shop scheduling problem. Neurocomputing 2020, 414, 313–332.
- Xie, J.; Li, X.; Gao, L.; Gui, L. A hybrid genetic tabu search algorithm for distributed flexible job shop scheduling problems. J. Manuf. Syst. 2023, 71, 82–94.
- Chen, X.L.; Li, J.Q.; Du, Y. A hybrid evolutionary immune algorithm for fuzzy flexible job shop scheduling problem with variable processing speeds. Expert Syst. Appl. 2023, 233, 120891.
- Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P.S.; Chi, X. Learning to dispatch for job shop scheduling via deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1621–1632.
- Zhang, D.; Cai, S.; Ye, F.; Si, Y.W.; Nguyen, T.T. A hybrid algorithm for a vehicle routing problem with realistic constraints. Inf. Sci. 2017, 394, 167–182.
- Dauzère-Pérès, S.; Ding, J.; Shen, L.; Tamssaouet, K. The flexible job shop scheduling problem: A review. Eur. J. Oper. Res. 2024, 314, 409–432.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs; Springer Science & Business Media: New York, NY, USA, 2013.
- Pezzella, F.; Morganti, G.; Ciaschetti, G. A genetic algorithm for the flexible job-shop scheduling problem. Comput. Oper. Res. 2008, 35, 3202–3212.
- Li, X.; Gao, L. An Effective Genetic Algorithm for FJSP. In Effective Methods for Integrated Process Planning and Scheduling; Springer: Berlin/Heidelberg, Germany, 2020; pp. 133–155.
- Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208.
| No. | Method | Key Contribution(s) | Limitation(s) |
|---|---|---|---|
| 1 | Genetic Algorithm (GA) | Foundation for global search in JSP/FJSP; highly adaptable via various crossover, mutation, and selection mechanisms. | Performance degrades in dynamic environments; requires full preknowledge of all tasks; convergence speed and solution quality rely heavily on parameter tuning. |
| 2 | Particle Swarm Optimization (PSO) | Effective for continuous optimization problems; faster convergence than the GA in some scenarios. | Not naturally suited for discrete scheduling problems; requires problem-specific encoding; similar to the GA, struggles with dynamic events. |
| 3 | HLO-PSO (Hybrid) | Combines human-learning concepts with PSO to enhance local search and avoid premature convergence in the FJSP. | Complex hybrid mechanism; computational overhead may be high; evaluation primarily in static settings. |
| 4 | Tabu Search | Powerful local search with memory mechanism to escape local optima. | Performance sensitive to parameter setting (e.g., tabu tenure); neighborhood search can be computationally expensive for large instances. |
| 5 | Hybrid GA-Tabu Search | Leverages a GA for global exploration and Tabu Search for intensified local search in the distributed FJSP. | Designed for a distributed and likely static environment; the integration strategy may not be directly applicable to highly dynamic DFJSP. |
| 6 | GNN + PPO | Uses Graph Neural Networks (GNNs) to effectively model shop floor state for RL in the FJSP. | Model complexity; requires structured graph representation; generalization to unseen task-machine configurations needs validation. |
| 7 | GNN + RL | Proposes an adaptive scheduling framework using a GNN to capture complex interrelationships. | Focuses on state representation; the RL agent itself may lack specialized optimization, such as HPO. |
| 8 | DQN + Rules | Integrates Deep Q-Network with composite dispatching rules to handle new job insertions in the DFJSP. | Limited by the representational capacity of predefined rules; value-based RL (DQN) may be less stable than policy-based methods (e.g., PPO). |
| 9 | SLGA (Hybrid RL-GA) | Integrates SARSA/Q-learning with a GA for self-learning evolution, improving search efficiency. | Employs simpler RL algorithms (SARSA/Q-learning) which may have limitations in complex policy learning compared to advanced policy gradient methods. |
| 10 | GPHH (Hyper-Heuristic) | Uses Genetic Programming to evolve dispatching rules for the DFJSP, offering interpretability. | Evolved rules can become complex and less interpretable; performance might plateau against non-rule-based approaches. |
| This Work | GA-HPO PPO (Hybrid) | 1. Integrates a GA for neural architecture search. 2. Employs Optuna for automated HPO. 3. Combines strengths of both for superior performance in the DFJSP, with demonstrated generalization. | 1. Higher computational cost during offline optimization phase. 2. Performance in extreme real-time constraints not yet validated. |
| Variable | Meaning |
|---|---|
| n | Total number of tasks |
| m | Total number of machines |
| | Arrival time of task i |
| | Deadline of task i |
| | Type of task i |
| | Processing time of task i on machine k |
| | Set of machines capable of completing task type |
| | Total time machine k is busy |
| | Binary variable to determine if task i is assigned to machine k |
| | Start time of task i |
| | Completion time of task i |
| | Binary variable to determine if a task is overdue |
| | Average machine utilization rate |
| Hyperparameter | Range | Significance |
|---|---|---|
| Learning Rate | [1 × 10⁻⁵, 1 × 10⁻³] | Determines the step size during gradient descent, affecting convergence speed and stability. |
| Discount Factor | [0.95, 0.99] | Balances immediate and future rewards, influencing the agent’s long-term strategy. |
| Clipping Range | [0.1, 0.3] | Controls the extent of policy updates, ensuring stable and conservative adjustments. |
| Number of Epochs | [3, 10] | Specifies how many times the entire batch of data is used for policy updates, impacting learning efficiency. |
| GAE Parameter | [0.9, 0.99] | Governs the trade-off between bias and variance in advantage estimation, affecting the stability of learning. |
| Batch Size | {32, 64, 128} | Determines the number of samples processed before updating the network parameters, influencing training dynamics. |
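The ranges in the table above map directly onto an Optuna objective, sketched below. The train_and_evaluate helper is a hypothetical placeholder for partial PPO training and episode evaluation; the trial budget and the maximized score are illustrative.

```python
import random
import optuna

def train_and_evaluate(params):
    """Placeholder: in the actual pipeline this would partially train a PPO agent with
    the sampled hyperparameters and return an evaluation score (e.g., mean episode reward)."""
    return random.random()

def objective(trial):
    """Sample PPO hyperparameters from the ranges in the table above and score them."""
    params = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "gamma": trial.suggest_float("gamma", 0.95, 0.99),
        "eps_clip": trial.suggest_float("eps_clip", 0.1, 0.3),
        "K_epochs": trial.suggest_int("K_epochs", 3, 10),
        "lam": trial.suggest_float("lam", 0.9, 0.99),
        "batch_size": trial.suggest_categorical("batch_size", [32, 64, 128]),
    }
    return train_and_evaluate(params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```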
| Task ID | Interval Time | Arrival Time | Deadline | Task Type |
|---|---|---|---|---|
| 1 | 1.24 | 1.24 | 4.47 | 1 |
| 2 | 1.13 | 2.37 | 5.95 | 2 |
| 3 | 0.81 | 3.19 | 7.11 | 3 |
| … | … | … | … | … |
| 99 | 0.97 | 97.79 | 101.41 | 2 |
| 100 | 1.35 | 99.15 | 104.09 | 3 |
| Service Time (Si, Tj) | Task Type 1 | Task Type 2 | Task Type 3 |
|---|---|---|---|
| 1 | 4.60 | 3.52 | |
| 2 | 4.22 | 4.44 | |
| 3 | 3.54 | ||
| 4 | 3.56 | 4.90 | |
| 5 | 4.67 | 2.62 |
| Algorithm | Parameter Setting | ||
|---|---|---|---|
| GA-HPO PPO | K_epoch = 4 | lam = 0.91314 | batch_size = 128 |
| | lr = 1.1706 × 10⁻⁵ | gamma = 0.9507 | eps_clip = 0.21916 |
| PPO | K_epoch = 10 | lam = 0.95 | batch_size = 64 |
| | lr = 1 × 10⁻⁴ | gamma = 0.99 | eps_clip = 0.2 |
| GA | Iter_max = 2000 | initial_mutation_rate = 0.8 | elite_rate = 0.02 |
| | tournament_size = 5 | final_mutation_rate = 0.1 | |
| DDQN | lr = 0.0001 | batch_size = 64 | epsilon = 1.0 |
| | epsilon_decay = 0.995 | epsilon_min = 0.1 | |
Comparison of the five algorithms on instances V1, V2 (100 × 5) and V3, V4 (1000 × 30); for each instance the three columns give the number of tardy tasks, the makespan, and the machine utilization.

| Algorithm | V1 Tardy | V1 Makespan | V1 Util. | V2 Tardy | V2 Makespan | V2 Util. | V3 Tardy | V3 Makespan | V3 Util. | V4 Tardy | V4 Makespan | V4 Util. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GA | 49 | 102.62 | 77.04% | 42 | 104.51 | 70.96% | 212 | 508.69 | 22.74% | 0 | 506.02 | 8.75% |
| PPO | 68 | 116.90 | 71.20% | 57 | 108.39 | 68.30% | 676 | 509.51 | 31.20% | 113 | 506.61 | 16.40% |
| GA-HPO PPO | 58 | 108.33 | 75.00% | 46 | 114.20 | 67.00% | 308 | 508.62 | 28.30% | 10 | 506.52 | 14.00% |
| DDQN | 90 | 110.83 | 74.35% | 84 | 113.13 | 70.00% | 849 | 530.26 | 32.04% | 511 | 511.34 | 19.52% |
| FCFS + EAMS | 68 | 102.67 | 80.01% | 52 | 107.56 | 73.91% | 794 | 508.02 | 33.95% | 320 | 506.02 | 20.93% |
| Instance | n × m | GA-HPO PPO Tardy | GA-HPO PPO Makespan | GA-HPO PPO Util. | PPO Tardy | PPO Makespan | PPO Util. |
|---|---|---|---|---|---|---|---|
| MK01 | 100 × 5 | 54 | 114.71 | 72% | 58 | 111.12 | 73% |
| MK02 | 100 × 5 | 39 | 116.59 | 63% | 53 | 121.12 | 61% |
| MK03 | 100 × 5 | 52 | 106.16 | 77% | 60 | 123.33 | 67% |
| MK04 | 100 × 5 | 34 | 109.42 | 62% | 41 | 109.04 | 63% |
| MK05 | 100 × 5 | 30 | 102.63 | 68% | 40 | 102.84 | 71% |
| MK06 | 1000 × 30 | 519 | 510.69 | 28% | 684 | 512 | 31% |
| MK07 | 1000 × 30 | 527 | 503.69 | 29% | 691 | 502.84 | 31% |
| MK08 | 1000 × 30 | 527 | 509.12 | 29% | 704 | 509.53 | 32% |
| MK09 | 1000 × 30 | 578 | 494.46 | 30% | 683 | 493.46 | 33% |
| MK10 | 1000 × 30 | 522 | 504.72 | 29% | 689 | 506.65 | 32% |