1. Introduction
The operation of flexible job shops is impeded by numerous uncertain events, which significantly reduce production efficiency and prolong completion time. In the Intelligent Manufacturing System for Aluminum Profiles in particular, machine faults that require recovery and variable processing times pose a significant challenge. In practice, flexible production lines require additional time for machine replacement and maintenance after a machine fault, and machine processing times frequently vary because of wear and tear during the production of aluminum profiles. The dynamic flexible job shop scheduling problem (DFJSP) [1] with machine faults considering recovery conditions and variable processing time (MFRVT-DFJSP) is therefore particularly important. In this paper, problems of this type are collectively referred to as the DFJSP. The DFJSP is a variant of the flexible job shop problem (FJSP) [2]. The problems underlying the DFJSP, namely the FJSP [3] and the job shop problem (JSP) [4], are NP-hard combinatorial optimization problems in computer science and operations research and therefore present significant challenges.
The DFJSP rescheduling scheme must be generated in real time from incomplete job and machine information after dynamic events [1] occur. Methods for generating DFJSP rescheduling schemes fall into three categories: exact methods, meta-heuristic methods, and heuristic methods. Exact methods, such as integer linear programming [5], guarantee optimal solutions; however, they cannot solve the DFJSP within a reasonable time, which makes them impractical for such problems. Meta-heuristic methods, such as the genetic algorithm [6], particle swarm optimization [7], and the tabu search algorithm [8], obtain approximate solutions in a feasible time, but their iterative search makes it difficult to generate high-quality DFJSP rescheduling schemes in real time. Heuristic methods, represented by the priority dispatching rule (PDR) [9], generate DFJSP rescheduling schemes in real time; however, they have difficulty producing scheduling schemes that meet actual quality requirements. The Monte Carlo Tree Search algorithm (MCTS) [10] uses the Monte Carlo method to build search trees and find approximate solutions in a feasible time. MCTS can be efficiently combined with reinforcement learning, which provides a way to generate high-quality DFJSP rescheduling schemes in real time. Therefore, this research proposes an MCTS algorithm framework that combines relational graph attention networks (RGAT) [11] and reinforcement learning [12] (MGRL) to generate high-quality DFJSP rescheduling schemes in real time.
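To make the combination of MCTS and a learned model concrete, the Python sketch below shows the generic PUCT selection rule used in AlphaZero-style search, in which a network's prior probability biases which child node is explored next. This is an illustrative assumption, not the selection rule of MGRL (which is specified in Section 4 and may differ); the `Node` class and the `c_puct` constant are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node representing a scheduling decision."""
    prior: float                 # P(s, a): prior probability from a learned policy
    visits: int = 0              # N(s, a): number of times this node was visited
    value_sum: float = 0.0       # W(s, a): sum of backed-up values (e.g., negative makespan)
    children: list = field(default_factory=list)

    def q(self) -> float:
        """Mean value Q(s, a); 0 for unvisited nodes."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(parent: Node, c_puct: float = 1.5) -> Node:
    """Return the child maximizing Q + c * P * sqrt(N_parent) / (1 + N_child)."""
    # max(..., 1) lets the prior break ties before the parent has been visited
    sqrt_n = math.sqrt(max(parent.visits, 1))

    def score(child: Node) -> float:
        return child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)

    return max(parent.children, key=score)
```

With a strong prior, an unvisited child can be preferred over a well-explored one; as visit counts grow, the exploration term shrinks and the mean value Q dominates.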
Obtaining high-quality scheduling knowledge is a key challenge in solving the DFJSP in real time. Scheduling knowledge is integrated into optimization methods in different forms. For example, PDR methods rely on scheduling knowledge pre-defined by experts and express it in the form of rules. Such expert rules do not accurately reflect the optimal mapping between job operations and machines, so the scheduling schemes generated by PDR are of low quality. Accurately analyzing the scheduling disjunctive graph with traditional Graph Neural Networks (GNNs), and applying the resulting scheduling knowledge to obtain high-quality suggestions for optimizing the graph, is challenging. This is because optimizing the makespan of a scheduling disjunctive graph is a global graph task rather than a traditional local graph task. Traditional GNNs, which aggregate the neighborhood of each target node, are unsuitable for global graph tasks such as analyzing the scheduling disjunctive graph because of the over-squashing [13] problem, in which information is compressed or distorted as it passes between distant nodes. To address this problem, the MGRL employs the RGAT, which integrates attention-based transformer models [14] with relational-enhanced graph encoders to analyze the complex global relationships in the scheduling disjunctive graph [15], propose suggestions for optimizing the graph [16], and thereby obtain high-quality scheduling schemes. In particular, the relational-enhanced graph encoder in the RGAT enhances the representation of the graph structure and strengthens the modeling of correlations between pairs of nodes in the transformer model, which improves the quality of the suggestions for optimizing the scheduling disjunctive graph. Experiments show that the MGRL with RGAT is effective in improving the quality of scheduling schemes in real time.
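The claim that makespan optimization is a global graph task can be made concrete. Once every disjunctive arc is oriented (that is, the operation sequence on each machine is fixed), the makespan equals the length of the longest path through the resulting acyclic graph, and that path can pass through operations of many different jobs and machines. The Python sketch below computes it with a topological sort; the `durations` and `arcs` formats are illustrative assumptions, not the data structures used in MGRL.

```python
from collections import defaultdict, deque


def makespan(durations: dict, arcs: list) -> float:
    """Longest-path length (makespan) of an oriented disjunctive graph.

    durations: operation id -> processing time on its assigned machine.
    arcs: (u, v) pairs meaning v cannot start before u finishes; they include
          conjunctive (job-order) arcs and oriented disjunctive (machine-order) arcs.
    """
    succ = defaultdict(list)
    indeg = {op: 0 for op in durations}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    start = {op: 0.0 for op in durations}  # earliest start time of each operation
    queue = deque(op for op, d in indeg.items() if d == 0)
    processed = 0
    while queue:  # Kahn's topological order
        u = queue.popleft()
        processed += 1
        finish = start[u] + durations[u]
        for v in succ[u]:
            start[v] = max(start[v], finish)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(durations):
        raise ValueError("arc orientation contains a cycle (infeasible schedule)")
    return max(start[op] + durations[op] for op in durations)
```

For two jobs on two machines, reversing a single machine-order arc can shift the critical path to a different job entirely, which is why purely local neighborhood aggregation has difficulty with this objective.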
When solving the DFJSP in real time, two key challenges are enhancing the search capability of MCTS by efficiently using the high-quality scheduling knowledge learned through reinforcement learning, and improving the running efficiency of MCTS. MCTS is adept at making high-quality next-step optimization decisions for the current scheduling scheme when solving the DFJSP. In an MCTS algorithm with a learning mechanism-assisted Monte Carlo sampling method [17], the Monte Carlo sampling process improves the quality of the next-step decision by exploring the neighborhood of the current scheduling scheme, but it consumes a great deal of time and yields only a single move. A key observation is that the Monte Carlo sampling process frequently identifies potential solutions that are better than the current solution. The challenge is how to use these potential solutions to let MCTS move toward the global optimum more efficiently by skipping some nodes, without relying so heavily on local optima that the algorithm becomes trapped in a locally optimal region. To address this problem, the MGRL employs a skip-node restart strategy that allows MCTS to skip some nodes when certain conditions are satisfied and move directly to the locally optimal node, which significantly reduces the number of MCTS searches and improves its local search ability. In addition, the skip-node restart strategy employs a restart method [18] to prevent MCTS from being trapped in locally optimal regions when the quality of its optimization decisions declines. Experiments show that the MGRL with the skip-node restart strategy improves the quality of scheduling schemes, indicating that the strategy significantly enhances the ability to use the high-quality scheduling knowledge obtained through reinforcement learning.
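The Python sketch below illustrates the skip-and-restart control flow described above under our own simplifying assumptions; it is not the MGRL procedure. A hypothetical `sample_neighbors` callable stands in for the learning-assisted Monte Carlo sampling; the search jumps directly to the best sampled solution whenever it improves on the current one (skip-node), and after `patience` consecutive non-improving moves it restarts from the initial solution while keeping the best solution found so far (restart). The actual skip and restart conditions of MGRL are defined in Section 4.

```python
import random
from typing import Callable, List, Tuple

Solution = Tuple[int, ...]  # hypothetical encoding, e.g., an operation sequence


def skip_node_restart_search(
    initial: Solution,
    cost: Callable[[Solution], float],
    sample_neighbors: Callable[[Solution], List[Solution]],
    iterations: int = 100,
    patience: int = 5,
) -> Tuple[Solution, float]:
    """Illustrative local search with a skip move and a stall-triggered restart."""
    current = best = initial
    best_cost = cost(initial)
    stall = 0
    for _ in range(iterations):
        samples = sample_neighbors(current)   # stands in for Monte Carlo sampling
        if not samples:
            break
        candidate = min(samples, key=cost)
        if cost(candidate) < cost(current):
            current, stall = candidate, 0     # skip: move directly to the best sample
        else:
            current = random.choice(samples)  # ordinary single move, no improvement
            stall += 1
        if cost(current) < best_cost:
            best, best_cost = current, cost(current)
        if stall >= patience:
            current, stall = initial, 0       # restart to leave a stagnating region
    return best, best_cost
```

The design choice to keep the incumbent `best` across restarts means a restart can only add exploration; it never discards the best schedule already found.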
In summary, solving the DFJSP with machine faults, considering the recovery condition and the variable processing time, is key to improving production efficiency. The focus of this paper is on the design of an efficient MCTS and the proposal of the MGRL algorithm framework for generating real-time rescheduling schemes to minimize the makespan. The main research problems and contributions are as follows:
To acquire high-quality scheduling knowledge and analyze the scheduling disjunctive graph globally, the MGRL employs the RGAT, which integrates the attention-based transformer model and the relational-enhanced graph encoder.
To use high-quality scheduling knowledge efficiently and improve the running efficiency of MCTS by leveraging the local optimal solutions found during the Monte Carlo sampling process, the MGRL employs the skip-node restart strategy, which skips some nodes and directly selects the optimal node to find high-quality scheduling schemes faster, while preventing the MGRL from being trapped in a local optimal region through excessive reliance on local optima.
A transformer-integrated and constraint-enhanced RGAT is designed to analyze the scheduling disjunctive graph, guide the Monte Carlo sampling method to improve sampling efficiency, and enhance the quality of MCTS optimization decisions.
The relational-enhanced graph encoder in the RGAT is designed to further improve the ability of RGAT to acquire high-quality scheduling knowledge.
A skip-node restart strategy that utilizes local optimal solutions found during the Monte Carlo sampling process is designed to enhance the optimization efficiency of the MCTS in real time.
The remaining sections of this paper are structured as follows:
Section 2 reviews the related literature. Section 3 provides an overview of the DFJSP. Section 4 presents a detailed description of the proposed method. Section 5 reports the experimental results. Section 6 provides further discussion. Finally, Section 7 concludes the paper.
6. Further Discussion
6.1. Discussion About Generalization for MGRL
This paper considers machine failures and recoveries together with processing time variations because changes in the number of available machines and in machine processing time parameters are the varying parameters in many common dynamic scheduling events. The paper verifies the real-time optimization ability of MGRL on these classic and complex dynamic events, but the MGRL is not limited to these two types. To verify the generalization and real-time optimization ability of MGRL, we conducted comparative experiments on the classic FJSP. The experimental parameter settings are shown in Table 3, and the running times are set to 5, 30, and 60 s. The comparative experiments were carried out on the public MK benchmark and on the simulation benchmark. The MK benchmark, from Brandimarte [44], includes instances MK01 to MK10. The simulation benchmark includes instance problem sizes of 40 × 20, 50 × 25, and 70 × 35, with 20 instances used for each size. Each instance generated 100 dynamic events, and each dynamic event was executed 50 times to obtain the mean value. The hyperparameters for the simulation dataset are shown in Table 5.
Figure 14 shows the results of the MGRL and the comparison algorithms on the public MK benchmark. In Figure 14, the horizontal axis gives the benchmark instance (mk01 to mk10), and the vertical axis gives the Gap value; a smaller Gap indicates that the algorithm's performance is closer to the optimal solution. Figure 15 shows the results of the MGRL and the comparison algorithms on the simulation benchmark. In Figure 15, the horizontal axis gives the problem size, and the vertical axis gives the WR; a higher WR indicates that the algorithm's performance is closer to the optimal solution. Figure 14 and Figure 15 report all results for CPU times of 5, 30, and 60 s.
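The Gap values in Figure 14 measure how far a makespan lies from a reference value. A common definition, assumed here because the exact formula belongs to Section 5, is Gap = (C_alg - C_ref) / C_ref × 100%, where C_alg is the algorithm's makespan and C_ref is the optimal or best-known makespan of the instance. The Python sketch below computes it per instance and averaged over a benchmark; the dictionary inputs are hypothetical.

```python
from statistics import mean
from typing import Dict


def gap_percent(makespan: float, reference: float) -> float:
    """Relative gap (%) of a makespan to a reference (optimal or best-known) makespan."""
    if reference <= 0:
        raise ValueError("reference makespan must be positive")
    return (makespan - reference) * 100.0 / reference


def mean_gap(results: Dict[str, float], references: Dict[str, float]) -> float:
    """Average gap over all instances in `results` (e.g., keys 'mk01'..'mk10')."""
    return mean(gap_percent(results[k], references[k]) for k in results)
```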
As shown in Figure 14, the MGRL outperforms the MCTS algorithm on all 10 instances of the public benchmark when running for 60 s. These results, in which MGRL comprehensively outperforms MCTS on a public dataset, again confirm the effectiveness of the improvements in MGRL. Because the MK benchmark contains at most 20 jobs and is therefore a small-scale dataset, it is difficult to demonstrate on it the ability of the neural-network-based MGRL to accelerate optimization through high-quality decisions. As shown in Figure 15, on the simulation benchmark, whose problem sizes are larger than those of the MK benchmark, MGRL outperforms all baseline algorithms in four out of five instance problem sizes when running for 60 s. A single optimization iteration of a GA population is faster than a single optimization iteration of MGRL. Nevertheless, MGRL achieves better results than GA within 60 s, because the Monte Carlo sampling method in MGRL provides higher-quality optimization decisions than the GA. Thus, for reasonably scaled instances and within real-time constraints, the fact that MGRL outperforms the comparison algorithms in most problem sizes of the classical FJSP shows that the improvement in the optimization capability of MGRL generalizes to the FJSP.
Although the MGRL has demonstrated outstanding performance on both the DFJSP with machine breakdowns and processing time variability and the classic FJSP, its potential extends beyond these applications. The MGRL framework, which combines MCTS with a skip-node restart strategy and RL together with a graph attention network based on constraint-relation enhancement and the transformer model, does not depend on specific dynamic events. It can therefore be readily adapted to DFJSPs with a wider range of dynamic events, which gives the MGRL strong adaptability and generalization ability. This capability enables the MGRL not only to handle the dynamic events considered in this study efficiently but also to be potentially applied to more FJSP-derived problems and to DFJSPs with additional dynamic events.
Specifically, the results achieved by the MGRL in solving the FJSP and the DFJSP, considering machine breakdowns and processing time variability, have proven its efficiency and robustness in optimizing scheduling solutions. These achievements indicate that the MGRL can effectively utilize the high-quality scheduling knowledge obtained through reinforcement learning, conduct global analysis of complex scheduling problems via the relation-enhanced Graph Attention Network, and efficiently search for high-quality solutions in large-scale solution spaces through the Monte Carlo Tree Search. These characteristics enable the MGRL to quickly adapt and generate high-quality scheduling solutions when faced with other types of FJSP-derived problems, such as Distributed Flexible Job Shop Scheduling Problems, and different types of dynamic events, such as emergency order insertion.
Moreover, the application analysis of the MGRL in real-world industrial scenarios further validates its industrial value in complex dynamic environments. This demonstrates that the MGRL not only excels in theoretical research but also has the capability to solve complex scheduling problems in actual production settings. Therefore, the MGRL is expected to be applied to a broader range of FJSP-derived problems in future research, providing new ideas and methods for solving complex scheduling problems in actual production and further promoting the development of intelligent scheduling technologies.
6.2. Discussion About Device Difference
The core focus of this study is to solve the DFJSP by generating high-quality rescheduling solutions in real time when dynamic events occur. In this paper, the upper limit for real-time computation is set at 60 s, and the experiments in Section 5 validated the performance of MGRL within this limit. To ensure fairness, all comparative and ablation experiments in Section 5 were conducted on the same computing platform, consisting of an Intel i5-10400F CPU and an Nvidia RTX 3060 GPU with 12 GB of memory. This platform is referred to as I3060 in Table 8. However, the effect of different platforms on the execution of MGRL under this time limit is not clear. The following discussion therefore focuses on the influence of different devices on the execution of the MGRL.
Using the 100 instances of size 40 × 20 and the 100 instances of size 100 × 50 generated in Section 5.7, I3060 was compared with four common platforms from cloud service providers. Details of these four computing devices are shown in Table 8. They were selected from the computing devices offered by AutoDL, one of the most popular cloud computing providers in China, with an hourly cost of less than CNY 2; this price point is suitable for online deployment by industrial enterprises. In particular, E5-3060 was chosen because its CPU differs from that of I3060, which illustrates the effect of CPU differences on the execution of the MGRL. For each of these 200 instances, 10 dynamic events were simulated, and each dynamic event was executed 100 times on each of the five devices. The number of MGRL iterations within the 60 s time limit was then recorded for I3060 and the other four devices, and paired-sample t-tests were conducted to assess the significance of the differences between I3060 and each of the other four devices. The experimental results are shown in Table 9.
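The paired-sample t-test compares, pair by pair, the iteration counts obtained on I3060 and on another device for the same dynamic event. The Python sketch below uses only the standard library; it replaces the exact t distribution with a large-sample normal approximation, which is an assumption that is reasonable when there are thousands of pairs (for the exact test, SciPy provides `scipy.stats.ttest_rel`). The input lists are hypothetical per-event iteration counts.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_test(ref, other, alpha=0.05):
    """Paired comparison of per-event iteration counts (reference device vs. another).

    The p-value and confidence interval use a normal approximation to the
    t distribution, which is adequate only when the number of pairs is large.
    """
    if len(ref) != len(other) or len(ref) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(ref, other)]
    n = len(diffs)
    d_mean = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = d_mean / se
    z = NormalDist()
    p = 2 * (1 - z.cdf(abs(t)))       # two-tailed significance, Sig. (2-tailed)
    crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    ci = (d_mean - crit * se, d_mean + crit * se)
    return {"mean_diff": d_mean, "se": se, "t": t, "df": n - 1, "ci": ci, "p": p}
```

A p-value at or above 0.05 means the null hypothesis of equal mean iteration counts is not rejected, so the two devices cannot be distinguished at the 5% level.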
As shown in Table 9, each row represents one set of experiments, for a total of four sets. The Device Name column lists the computing platforms involved in each set. The Mean column records the mean number of MGRL iterations within the 60 s time limit, the Std. Error Mean column records the standard error of the mean, the 95% Confidence Interval column records the 95% confidence interval of the difference, and the Sig. (2-tailed) column records the two-tailed significance level for the number of MGRL iterations in each set. The significance levels, all above 0.05, indicate that the number of MGRL iterations on each cloud device does not differ significantly from that on I3060. This result further demonstrates that the MGRL-related experimental data in Section 5 are practically meaningful for low-cost computing devices.
6.3. Discussion About Simulation Environment Difference
In the experiments designed to validate the algorithm’s performance, a simulation environment based on aluminum profile processing scenarios was employed. Discussing the extent to which this simulation environment replicates real-world conditions is crucial for further analyzing the algorithm’s applicability. Generating FJSP instances that conform to aluminum profile production scenarios is the main contribution of this simulation environment. To demonstrate that the generated FJSP instances can reasonably simulate real-world FJSP instances, statistical tests based on job-related metrics were conducted.
Data from a factory in a province in Northwest China, including 312 types of aluminum profile jobs and the processing quantity of each job, were used as the basis for FJSP instances of the real-world aluminum profile processing environment. The processing machines for these real-world jobs are all listed in Table 6. The statistical experiments randomly sampled instances of 80 × 40, 100 × 50, and 200 × 100 problem sizes from the real-world job data and compared them statistically with instances of the same problem sizes generated by the simulation environment. When sampling these instances from the real-world job data, duplicate machines of the same type were added to make up any shortfall in the number of machines. The job-related statistical metrics of an instance are the average maximum processing time per job, the average minimum processing time per job, and the average maximum number of operations per job.
The maximum processing time per job, the minimum processing time per job, and the remaining number of operations per job are key metrics for PDR-based job scheduling decisions, which indicates that these three metrics are crucial factors in the design of scheduling schemes. The average maximum processing time per job, the average minimum processing time per job, and the average maximum number of operations per job within an instance are calculated as the means of these three commonly used PDR metrics over all jobs in the instance. The specific calculation formulas for these metrics are as follows:
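The Python sketch below computes the three instance-level means under stated assumptions: a job's maximum (minimum) processing time is taken as the sum, over its operations, of the largest (smallest) processing time among the eligible machines, and a job's operation count is its total number of operations before scheduling starts. These per-job definitions and the nested-list `Instance` layout are hypothetical and may differ from the paper's formulas.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: instance[j][o] maps machine id -> processing time of
# operation o of job j on that machine (only eligible machines are listed).
Instance = List[List[Dict[int, float]]]


def instance_job_metrics(instance: Instance) -> Tuple[float, float, float]:
    """Return (avg max processing time, avg min processing time, avg #operations) per job."""
    max_pt, min_pt, n_ops = [], [], []
    for job in instance:
        # Assumed per-job definitions: sum of the slowest / fastest eligible machine per operation
        max_pt.append(sum(max(op.values()) for op in job))
        min_pt.append(sum(min(op.values()) for op in job))
        n_ops.append(len(job))  # all operations remain before scheduling starts
    return mean(max_pt), mean(min_pt), mean(n_ops)
```

Applying this function to each simulated and each real-world-sampled instance of a given size yields the metric samples that the t-tests compare.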
Specifically, the simulation environment difference analysis conducted independent-samples t-tests between instances of 80 × 40, 100 × 50, and 200 × 100 problem sizes randomly sampled from the real-world job data described above and instances of the same problem sizes generated by the simulation environment. For each problem size, 300 instances were generated. The independent-samples t-tests assumed equal variances, and their results are shown in Table 10. These results indicate no significant differences in the three classic job metrics (the average maximum processing time per job, the average minimum processing time per job, and the average maximum number of operations per job) between instances generated by the simulation environment and instances sampled from real-world scenarios. This conclusion further demonstrates that the instances generated by the simulation environment reasonably mimic those sampled from real-world scenarios, so experiments and scheduling research based on instances generated by this simulation environment have practical significance.
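The equal-variance independent-samples t-test pools the two sample variances before computing the standard error. The standard-library Python sketch below shows the calculation for one metric; like the paired sketch, it approximates the t distribution with a normal distribution, an assumption that is close when the degrees of freedom are in the hundreds (for example, 300 + 300 - 2 = 598). SciPy's `scipy.stats.ttest_ind` with `equal_var=True` gives the exact test.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_test(x, y):
    """Independent-samples t-test assuming equal variances (pooled variance).

    Returns (t, degrees of freedom, approximate two-tailed p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(x) - mean(y)) / se
    # Normal approximation to the t distribution with df degrees of freedom
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p
```

Here `x` would hold one metric computed on the simulated instances and `y` the same metric on the real-world-sampled instances of the same size; a p-value above 0.05 means the difference in means is not significant.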
6.4. Limitations
Despite the performance improvements demonstrated by the MGRL in solving the DFJSP in real time, there are several limitations in the MGRL. First, MGRL’s performance on extremely large-scale problems requires further optimization. The key reason the MGRL outperforms the baseline algorithms is its superior decision-making efficiency. This means that the algorithm can make higher-quality optimization decisions within a limited time frame. The primary means by which the MGRL enhances the quality of optimization decisions is through the use of deep learning methods. Larger-scale instances pose greater challenges to the model’s convergence and online computation time. Second, the current implementation of MGRL focuses primarily on minimizing the makespan. Extending the optimization objective to consider multiple objectives, such as energy consumption, tardiness, and machine utilization, is an important direction for broader applicability. Additionally, the acquisition and efficient utilization of high-quality scheduling knowledge still need to be further addressed.
To address these limitations and further enhance the generalizability of MGRL, several future research directions are proposed. First, the combination of MCTS with GNN and RL could be optimized for extremely large-scale and distributed scheduling problems by exploring more efficient search strategies and parallel computing techniques. Second, the optimization objective could be extended so that the MGRL can address a wider range of practical scheduling scenarios, such as cascaded scheduling of a flow shop and a flexible job shop. Finally, future work will continue to explore efficient methods for integrating optimization algorithms with neural networks to acquire and use learned knowledge, thereby improving search capability.