Improved SP-MCTS-Based Scheduling for Multi-Constraint Hybrid Flow Shop

Abstract: As a typical non-deterministic polynomial (NP)-hard combinatorial optimization problem, the hybrid flow shop scheduling problem (HFSSP) arises from a layout that is very common in real-life manufacturing scenarios. Although many metaheuristic approaches have been presented for the HFSSP with the makespan criterion, metaheuristic methods remain limited in accuracy, efficiency, and adaptability. To address this challenge, an improved SP-MCTS (single-player Monte-Carlo tree search)-based scheduling method is proposed for the hybrid flow shop to minimize the makespan under multiple constraints. The Markov decision process (MDP) is applied to transform the HFSSP into a shortest-time branch path problem. The improvements to the algorithm are a selection policy that blends in the standard deviation, a single-branch expansion strategy and a 4-Rule simulation policy. With these improvements, the algorithm can accurately locate high-potential branches, economize computing resources and quickly optimize the solution. A parameter combination is then introduced to trade off selection and simulation, with the intention of balancing exploitation and exploration in the search process. Finally, analysis of the computational results demonstrates the validity of the improved SP-MCTS (ISP-MCTS) on the benchmarks, and the ISP-MCTS performs better than the other algorithms on large-scale problems.


Introduction
The flow shop scheduling problem (FSSP) has been a very active research field since it was first proposed by Johnson [1]. In the FSSP, a set of n jobs has to be processed on m single-machine stages, and every job follows the same route through the stages. In production, the FSSP may result in some stages being overloaded and some jobs being blocked [2]. To relieve these problems, parallel machines are introduced in some of the stages, and the standard flow shop layout becomes a hybrid flow shop (HFS), which appears in many real-life manufacturing scenarios. Given its practical interest, the hybrid flow shop scheduling problem (HFSSP) arises widely in today's manufacturing and production systems, such as heavy industry, light industry and other industrial systems. Because the HFSSP has been proved to be a non-deterministic polynomial (NP)-hard problem [3], its computational complexity has attracted in-depth research in this field. At the same time, production scheduling has been shown to play an important role in improving productivity and responsiveness in the manufacturing system [4]. Therefore, developing more efficient algorithms for this problem is significant.
To overcome the limitations of existing methods, some scholars have adopted machine learning to study the HFSSP and achieve efficient scheduling. Monte-Carlo tree search (MCTS) algorithms heuristically build an asymmetric partial search tree by applying machine learning, and then repeatedly traverse the promising branches to find the optimal strategy [31]. Chaslot et al. [32] first introduced MCTS for solving production-related problems. An MCTS-based algorithm for the multi-objective flexible job shop scheduling problem was presented by Wu et al. [33]; it was improved by incorporating the Variable Neighborhood Descent algorithm and other techniques, such as the rapid action value estimation heuristic and a transposition table. Chou et al. [34] used an improved MCTS algorithm to solve the multi-objective flexible shop scheduling problem and searched for the minimum completion time by adaptive value game comparison. Furuoka and Matsumoto [35] used an MCTS-based algorithm to find good schedules for a re-entrant scheduling problem. Although good efforts have been made in the above studies, MCTS is a family of decision-making methods for two-player zero-sum games, and there are still limitations in performance and learning efficiency when it is used to solve shop floor scheduling problems.
To minimize the makespan, an HFSSP scheduling approach using an improved SP-MCTS (single-player Monte-Carlo tree search) is proposed in this paper. The SP-MCTS algorithm is a machine learning algorithm proposed by Schadd et al. [36]. Schadd et al. [37] applied the algorithm to a single-player puzzle game, where it achieved good results in the standardized test. Shimpei et al. [38] verified the effectiveness of a new simulation policy through data experiments, calibrated the relationship between the search method and its parameters, and demonstrated the application potential of the SP-MCTS algorithm in actual scheduling.
The above review shows that SP-MCTS has been applied successfully to scheduling problems. Therefore, this paper applies an improved SP-MCTS (ISP-MCTS) to solve the HFSSP. The paper is organized as follows. In Section 2, the hybrid flow shop scheduling problem is introduced, and the mathematical model is established. The improvement scheme of SP-MCTS is then introduced in detail in Section 3. In Section 4, the algorithm parameters are calibrated. In Section 5, the experimental results and the comparison are presented. Finally, the conclusions are discussed in Section 6.

Hybrid Flow Shop Scheduling Problem
The HFSSP is described as follows: a set of n jobs $J = \{1, 2, \ldots, n\}$ must be processed through s stages in sequence, i.e., $O_{j1} \to O_{j2} \to \cdots \to O_{js}$, with $S = \{1, 2, \ldots, s\}$. Each stage $k \in S$ has $M_k \ge 1$ identical parallel machines, and each job j can be processed on any one of the $M_k$ parallel machines. The operation of job j at stage k, $O_{jk}$, has a deterministic processing time $P_{jk}$. Given these parameters and the assumptions below, a scheduling policy is sought that minimizes the maximum completion time for processing the n jobs.

Assumption
The HFSSP satisfies the following assumptions: The release time of every job is 0. Job j can be processed only once in each stage. Each operation of job j can be processed on only one machine, and each machine can process only one job at a time. There are infinite buffers between adjacent stages. Job setup times and travel times between consecutive stages are either included in the processing time $P_{jk}$ or ignored.
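As a minimal sketch of what these assumptions imply for evaluating a schedule, the following function computes the makespan of one job order. The dispatching rule used here (jobs queue at each stage in order of arrival and take the first machine that becomes free) is an assumption chosen for illustration, not the scheduling method of this paper, and the names `hfs_makespan`, `order`, `proc_times` and `machines` are hypothetical.

```python
import heapq


def hfs_makespan(order, proc_times, machines):
    """Makespan of a hybrid flow shop schedule built by simple list scheduling.

    order      -- job indices in the order they enter stage 1
    proc_times -- proc_times[j][k] is the processing time P_jk
    machines   -- machines[k] is the number of parallel machines M_k
    Assumed rule (illustration only): first-come, first-available machine.
    """
    ready = {j: 0 for j in order}  # every release time is 0
    for k in range(len(machines)):
        # Each machine is represented by the time at which it becomes free.
        free_at = [0] * machines[k]
        heapq.heapify(free_at)
        # Jobs queue in order of arrival; sorted() is stable, so ties keep
        # the order of the previous stage. Buffers are unlimited.
        queue = sorted(order, key=lambda j: ready[j])
        for j in queue:
            start = max(ready[j], heapq.heappop(free_at))
            finish = start + proc_times[j][k]
            heapq.heappush(free_at, finish)
            ready[j] = finish
        order = queue
    return max(ready.values())


# Illustrative data: 3 jobs, 2 stages, 2 machines at stage 1 and 1 at stage 2.
example_times = [[3, 2], [2, 4], [4, 1]]
example_makespan = hfs_makespan([0, 1, 2], example_times, [2, 1])
```

Setup and travel times do not appear in the sketch because they are assumed to be folded into `proc_times` or ignored.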

Symbol Definition
The notation of the HFSSP model is as follows:
$P_{jk}$: the processing time of job j at stage k.
$D_{jk}$: the start time of processing job j at stage k.
BM: a very large number, the traditional "Big M".
$x_{jik}$: $x_{jik} = 1$ if job j is assigned to machine i at stage k; otherwise $x_{jik} = 0$.
$y_{jj'k}$: $y_{jj'k} = 1$ if job j is processed earlier than job j' at stage k; otherwise $y_{jj'k} = 0$.

Mathematical Model
Using the above notation, the mathematical model of the HFSSP is formulated as follows. The objective function (1) minimizes the maximum completion time. Constraint (2) ensures that each job completes all the processing stages and is assigned to exactly one machine in each stage. Constraint (3) ensures that the first-stage start time of each job is greater than or equal to 0. Constraint (4) indicates that, when a job is processed in two consecutive stages, the next operation can start only after the previous operation is completed. Constraints (5) and (6) describe the sequencing of different jobs on the same machine: when two jobs are processed on one machine, the earlier job must be finished before the later job can start. Constraints (7) and (8) define the value ranges of the decision variables.
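The equations (1) to (8) themselves are not reproduced in this text. As a reading aid only, one common big-M formulation that matches the verbal descriptions can be sketched as follows. The symbol $C_{\max}$, the quantifiers and the exact form of the disjunctive constraints (5) and (6) are assumptions and may differ from the original formulation.

```latex
\begin{align}
& \min\; C_{\max}, \qquad C_{\max} \ge D_{js} + P_{js} && \forall j \in J \tag{1}\\
& \sum_{i=1}^{M_k} x_{jik} = 1 && \forall j \in J,\ k \in S \tag{2}\\
& D_{j1} \ge 0 && \forall j \in J \tag{3}\\
& D_{j,k+1} \ge D_{jk} + P_{jk} && \forall j \in J,\ k = 1, \ldots, s-1 \tag{4}\\
& D_{j'k} \ge D_{jk} + P_{jk} - BM\,(3 - x_{jik} - x_{j'ik} - y_{jj'k}) && \forall j \ne j',\ i,\ k \tag{5}\\
& D_{jk} \ge D_{j'k} + P_{j'k} - BM\,(2 - x_{jik} - x_{j'ik} + y_{jj'k}) && \forall j \ne j',\ i,\ k \tag{6}\\
& x_{jik} \in \{0, 1\} && \forall j,\ i,\ k \tag{7}\\
& y_{jj'k} \in \{0, 1\} && \forall j \ne j',\ k \tag{8}
\end{align}
```

In this sketch, (5) is active only when jobs j and j' share machine i at stage k and j precedes j'; (6) is active only when they share the machine and j' precedes j. In every other case the big-M term relaxes the constraint.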

ISP-MCTS Algorithm Design
The MCTS is based on the Monte Carlo simulation to build an asymmetric search tree, and subsequently find the "best" action in the current state [39]. The SP-MCTS algorithm derived from MCTS is proposed to solve HFSSP, and HFSSP is seen as a single-player optimization problem. In the solving process, the processing machines are regarded as starred positions (drop points), and the rules are formulated according to the constraint relationship in the mathematical model. With the goal of maximizing the score, the entire search process includes four phases of selection, expansion, simulation and backpropagation, as shown in Figure 1.

HFSSP Model
According to Markov decision processes (MDP), the model of the HFSSP is established as follows [40,41]: (1) s is the sequence of state matrices of the HFSSP, describing the relationship between job state and stage. $s_1$ and $s_{goal}$ represent the initial state and the completion state in the HFSSP, and $s_1, s_2, \ldots, s_{goal}$ describe the progress of the jobs through the stages. (2) a is the set of scheduling policies available in a state s; the policy sets $a_1, a_2, \ldots, a_v$ are extracted under the states $s_1, s_2, \ldots, s_{goal}$. A is the scheduling policy currently selected from the set a, so $A_1, A_2, \ldots, A_v$ are the policies selected from the policy sets $a_1, a_2, \ldots, a_v$ for the corresponding states from $s_1$ to $s_{goal}$. (3) $T(s, A, s') = 1$: once the scheduling policy A is selected, the state changes from s to s' with probability 1.
Following the design of the above MDP model, the relationship between the HFSSP field environment and the model state is established. The matrix model $S_{n \times m}$ satisfies the following conditions: (1) n is the number of jobs, and m is the number of stages. (2) The first column and the m-th column of the matrix are the online stage and the offline stage, respectively; the second to the (m-1)-th columns represent the processing stages. (3) The number of jobs being processed in a stage must be less than or equal to the number of parallel machines in that stage. (4) The job in a row is processed according to the sequence of stages, so its operation in the next column cannot start until its operation in the current column is completed. (5) For any job i, $P_{i1} = 0$ and $P_{im} = \infty$.
In Figure 1, S is a 4 × 4 state matrix, which corresponds to an HFSSP with four jobs and four stages. In this matrix, stage 1 and stage 4 are the online stage and the offline stage, respectively, and stage 2 and stage 3 each have two parallel processing machines. When the search reaches the k-th state matrix, the element $s_k(i,j) = 0$ means that job i is not on a machine of stage j, $s_k(i,j) = 1$ means that job i is being processed in stage j, and $s_k(i,j) = 2$ means that job i has finished processing in stage j and is waiting to be scheduled. $a_{kx}$ represents the scheduling policy set $(A_{k1}, A_{k2}, \ldots, A_{ku})$ in the state $s_k$. If the element $A_{k1}(i,j) = 1$, job i is dispatched from the machine in stage j to a processing machine in stage j + 1.
As shown in Figure 1, the HFSSP coding model is designed according to the above description. The model evolves as follows:
Step 1: Construct the n × m matrices of the initial state $s_1$ and the completion state $s_{goal}$. In Figure 1, M is the number of parallel machines in each stage, $M_1$ is the number of idle machines in each stage during the process, and the matrix evolves from $s_1$ to $s_{goal}$.
Step 2: When the search reaches the k-th shop state node $s_k$, the available scheduling policy set $a_k = (A_{k\gamma}, A_{k\beta}, \ldots, A_{k\delta})$ is determined from the state of the jobs and the occupancy of the machines. The search method is then chosen according to the number of traversals $N(s_k)$ of the state node $s_k$: (1) when $N(s_k) \le P$ (P is the critical value of the simulation times), heuristic rules are applied to select the scheduling policies for searching to the $s_{goal}$ state; (2) when $N(s_k) > P$, as shown in Figure 1, the selection policy of the SP-MCTS algorithm is used to evaluate the policy set $a_k$ and select the scheduling policy $A_k$. After policy $A_k$ is executed, the search moves from state $s_k$ to $s_{k+1}$.
Step 3: If $s_{k+1} \ne s_{goal}$, Step 2 is repeated for state evolution until $s_{goal}$ is reached.
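The state matrix and the way policies are derived from it can be illustrated with a small sketch. It simplifies the model in two labeled ways: each policy moves a single job (the paper's policy matrices may dispatch several jobs at once), and the offline stage is given unlimited capacity. The function names and the initial state, in which every job waits in the online stage with value 2 because $P_{i1} = 0$, are assumptions for illustration.

```python
def available_policies(state, machines):
    """List the dispatch moves (i, j) allowed in a state matrix.

    state[i][j] is 0 (job i not at stage j), 1 (processing) or 2 (finished, waiting).
    machines[j] is the number of parallel machines at stage j; None means
    unlimited capacity (assumed here for the online and offline stages).
    A move (i, j) sends job i from stage j to stage j + 1.
    """
    moves = []
    for j in range(len(machines) - 1):
        busy_next = sum(1 for row in state if row[j + 1] == 1)
        cap = machines[j + 1]
        if cap is not None and busy_next >= cap:
            continue  # condition (3): no idle machine in the next stage
        for i, row in enumerate(state):
            if row[j] == 2:
                moves.append((i, j))
    return moves


def dispatch(state, move):
    """Return a new state after moving job i from stage j to stage j + 1."""
    i, j = move
    new_state = [row[:] for row in state]
    new_state[i][j] = 0
    new_state[i][j + 1] = 1
    return new_state


# Four jobs, four stages; stages 2 and 3 have two machines each.
machines = [None, 2, 2, None]
s1 = [[2, 0, 0, 0] for _ in range(4)]           # assumed initial state
s_next = dispatch(dispatch(s1, (0, 0)), (1, 0))  # jobs 0 and 1 enter stage 2
```

After the two dispatches, both machines of stage 2 are occupied, so `available_policies(s_next, machines)` is empty until a job finishes and the state updates itself.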

ISP-MCTS Algorithm Optimization Process
In this paper, the HFSSP scheduling process is constructed as a tree structure. For the SP-MCTS algorithm, there are disadvantages such as poor branch positioning accuracy and slow convergence speed. Therefore, the SP-MCTS algorithm is improved as follows:

Selection
The balance between depth exploration and breadth exploration is continually adjusted over the existing state tree nodes, so that each exploration reaches as good a solution as possible. The UCT (upper confidence bounds applied to trees) algorithm [33] is used to select the scheduling policy, with the following improvements. The first two terms of the UCT algorithm in Equation (9) are retained. $N(s)$ is the total number of times the state s has been visited, and $N(s, a)$ is the number of times the policy a has been selected in the state s. $Q(s, a)$ is the average score of the policy a over its executions in state s. A third term is added to express the deviation obtained after the policy a is executed, where $\sum q(s, a)^2$ is the sum of the squared results achieved at the state s, and the constant D ensures that policies that have rarely been selected are not underestimated [32]. The deviation term, which does not rely on any particular neighborhood structure, is intended to take full account of the range over which the score varies when the policy is selected. The policy is finally selected using Equation (10). As shown in Figure 1, taking node $s_1$ as an example, when $N(s_1) > P$, every policy in the set $(A_{11}, A_{12}, \ldots, A_{16})$ is evaluated according to Equation (9), and the policy is selected on the basis of Equation (10).
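Equations (9) and (10) are not reproduced in this text. The sketch below implements the three-term value in the form commonly given for SP-MCTS (average score, UCT exploration term, and a deviation term inflated by D), followed by an argmax over the policy set. Whether this matches Equation (9) term by term is an assumption, and the function and argument names are hypothetical.

```python
import math


def sp_uct_score(q_mean, q_sq_sum, n_state, n_action, c, d):
    """Three-term selection value for one policy a in state s.

    q_mean   -- Q(s, a), the average score of a in s
    q_sq_sum -- sum of the squared scores q(s, a)^2 observed for a in s
    n_state  -- N(s), number of visits of s
    n_action -- N(s, a), number of times a was chosen in s (must be > 0)
    c, d     -- the constants C and D
    """
    explore = c * math.sqrt(math.log(n_state) / n_action)
    # Variance estimate, inflated by D so that rarely tried policies
    # are not underestimated.
    variance = (q_sq_sum - n_action * q_mean ** 2 + d) / n_action
    deviation = math.sqrt(max(variance, 0.0))
    return q_mean + explore + deviation


def select_policy(stats, n_state, c, d):
    """Return the policy with the largest value.

    stats maps a policy name to the tuple (Q, sum of squared scores, N(s, a)).
    """
    return max(
        stats,
        key=lambda a: sp_uct_score(stats[a][0], stats[a][1], n_state, stats[a][2], c, d),
    )


# Two policies with the same average score 5: scores (4, 6) versus (5, 5).
toy_stats = {"A11": (5.0, 52.0, 2), "A12": (5.0, 50.0, 2)}
chosen = select_policy(toy_stats, n_state=4, c=0.0, d=0.0)
```

With C = 0 and D = 0 the two policies differ only in the deviation term, so the policy whose scores vary more (`A11`) is chosen, which is the effect the third term is meant to have.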

Expansion
When $N(s) > P$, the branch node is expanded. As shown in Figure 1, if $N(s_1) > P$ at node $s_1$, the scheduling policy $A_{12}$ is selected, and the state is updated from $s_1$ to $s_2$. If $s_2$ is not yet a node in the tree, it is added to the tree as a node.
Single-branch node continuous expansion: in HFSSP operation, a state may offer only one optional policy, or may update itself without any decision, so a single-branch node continuous expansion method is designed. After $s_2$ is expanded as shown in Figure 1, the state updates itself when a job finishes processing, so it evolves to $s_3$. In $s_3$, the only scheduling policy is $A_{31}$. The node $s_3$ is therefore expanded immediately, and the state evolves to $s_4$, where the number of policies $(A_{41}, A_{42})$ is greater than 1. Node $s_4$ is then expanded into the tree, and heuristic rules are applied to search the scheduling policies from node $s_4$ to $s_{goal}$. As shown in Figure 1, the states $s_2 \to s_4$ are expanded continuously, which avoids repeated meaningless explorations and makes effective use of computing resources.
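The continuous expansion can be sketched as a loop that keeps adding nodes while the current state leaves no real choice. The callbacks `policies_of` and `step` are hypothetical stand-ins for the state model, and the state self-update is modeled here as a single forced pseudo-policy, which is a simplification made for this sketch.

```python
def expand_single_branch(tree, state, policies_of, step):
    """Add `state` to the tree, then keep expanding while only one policy exists.

    tree        -- set of states already in the search tree (mutated in place)
    policies_of -- hypothetical callback: state -> list of available policies
    step        -- hypothetical callback: (state, policy) -> next state
    Returns the first state that offers more than one policy, or a state
    with no policy at all (for example the goal state).
    """
    tree.add(state)
    policies = policies_of(state)
    while len(policies) == 1:
        state = step(state, policies[0])
        tree.add(state)
        policies = policies_of(state)
    return state


# Toy chain mirroring the text: s2 and s3 are forced, s4 offers two policies.
toy_policies = {2: ["self-update"], 3: ["A31"], 4: ["A41", "A42"]}
toy_tree = {1}
leaf = expand_single_branch(
    toy_tree,
    2,
    lambda s: toy_policies.get(s, []),
    lambda s, a: s + 1,
)
```

In the toy run the nodes 2, 3 and 4 are added in one pass and the function returns 4, the node from which simulation would start.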

Simulation
Heuristic rules are used to simulate from the leaf node s to $s_{goal}$. When the tree is searched along the $A_{12}$ branch to the leaf node $s_4$, as shown in Figure 1, $s_4$ is used as the initial state of the simulation, and the heuristic rule is applied to search for $s_{goal}$. At the start of the program, the root node $s_1$ itself serves as the first leaf node. Each simulation policy is obtained from a different heuristic rule.

Backpropagation
When the simulation is completed, backpropagation is performed according to the simulation result. Starting from the leaf node $s_4$, as shown in Figure 1, each child node passes the result back to its parent node until the root node $s_1$ is reached. The information is updated as follows: Equation (11) updates the total number of times the node s has been visited. Equation (12) records the number of times the policy a has been executed at the node s. Equation (13) updates the average score of executing policy a at node s. Equation (14) accumulates the sum of the squared scores.
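Equations (11) to (14) are not reproduced in this text; the four updates they describe can be sketched as below. The incremental-mean form of the average update and the data layout (`EdgeStats`, the `visits` and `edges` dictionaries) are assumptions made for illustration.

```python
class EdgeStats:
    """Statistics kept for one (state, policy) pair."""

    def __init__(self):
        self.n = 0         # N(s, a): times policy a was executed in s
        self.q = 0.0       # Q(s, a): running average score
        self.sq_sum = 0.0  # running sum of squared scores


def backpropagate(path, visits, edges, score):
    """Update every (state, policy) pair on the path, from leaf back to root.

    path   -- list of (state, policy) pairs visited in this iteration, root first
    visits -- dict: state -> N(s)
    edges  -- dict: (state, policy) -> EdgeStats
    score  -- evaluation score of the finished simulation
    """
    for state, policy in reversed(path):
        visits[state] = visits.get(state, 0) + 1           # visit count of s
        e = edges.setdefault((state, policy), EdgeStats())
        e.n += 1                                           # execution count of a in s
        e.q += (score - e.q) / e.n                         # incremental average
        e.sq_sum += score ** 2                             # sum of squared scores


visits, edges = {}, {}
toy_path = [("s1", "A12"), ("s4", "A41")]
backpropagate(toy_path, visits, edges, 4.0)
backpropagate(toy_path, visits, edges, 6.0)
```

After the two toy updates, the pair (`s1`, `A12`) has been executed twice with average score 5 and squared-score sum 52, which are exactly the quantities the selection value needs.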

Evaluation and Policy Set
Applying ISP-MCTS to the HFSSP differs from game-type MCTS problems. In the ISP-MCTS scheduling search, the completion time of the first iteration, $T_1$, is used as the benchmark, and the quality of each subsequent iteration n is evaluated from the difference $T_n - T_1$. When $T_n - T_1 \le 0$, the score is $T_1 - T_n + 1$, and when $T_n - T_1 > 0$, the score is 0. The optimization goal is therefore to select the policy set with the highest score.

The Complete Process of ISP-MCTS
Through the selection policy and the simulation policy, the search proceeds from the root node $s_1$ to $s_{goal}$. The selection step is designed around the average node score, and the optimization goal is the maximum score, that is, the minimum completion time $T_{min}$. The pseudocode of the ISP-MCTS for solving the HFSSP is given as Algorithm 1.

while the halting criterion is not satisfied do
    Step 1: Selection and expansion
    if N(s) ≤ P then
        go to Step 2
    else
        apply the improved UCT in Section 3.2.1 to select the scheduling policy a
        update state s → s'
        if the state s' is a node in the tree then
            s = s_1, return to Step 1
        else
            expand s' into the tree as a leaf node
        end if
        if the newly expanded leaf node s has one optional policy or none then
            N(s) = P + 1, return to Step 1
        else
            go to Step 2
        end if
    end if
    Step 2: Simulation
    take the node s' as the simulation start state
    select the simulation policy
    simulate to s_goal, obtain the completion time T_n, go to Step 3
    Step 3: Update simulation policy
    if T_n < T_min then
        record the policy set (A_1γ, A_2β, ..., A_vδ) from s_1 → s_goal, T_min = T_n
        go to Step 4
    else
        go to Step 4
    end if
    Step 4: Backpropagation
    update the information of the nodes on the branch path from the leaf node s to the root node s_1 according to Equation (11)

Simulation Policy Verification
To verify the simulation policy described above, a set of instances is generated with $n \in \{20, 40, 60, 80, 100\}$ and $s \in \{4, 6, 8\}$. Ten instances are generated for each combination of n and s, giving 150 instances in total. The processing time is generated from the uniform distribution [1], and the number of parallel machines $M_k$ is generated from the uniform distribution [1,5]. Three simulation policies were constructed for comparison with the 4-Rule policy of Section 3.2.3: 4-Rule without continuous expansion (N-Expan), NEH-LPT(λ) and a random policy (Random).
The simulation threshold is P = 4, and the combination of expansion and exploration parameters is taken as C = 0.5, D = 10,000.
These algorithms were coded in MATLAB 2016a and run on an Intel Core i5-3470 3.2 GHz PC with 4 GB of memory. Each ISP-MCTS variant was limited to a run time of 30 n × s ms and ran two fixed replications. The minimum completion time $C_{min}$ is the best result over all simulation policies on the current instance. The deviation between the final solution of each simulation policy and $C_{min}$ is given by Equation (15), in which $C_i$ is the solution value of the current simulation policy. The average relative percentage increase (ARPI) of the 10 problems under each combination of n and s is shown in Table 1. The 4-Rule policy has the smallest average deviation of the four simulation policies, 0.07%, so its solutions are better on average than those of the comparison policies.
The simulation policies were then analyzed on a 40 × 6 instance. Figure 2a shows the relationship between the number of iterations and the node depth. The number of callbacks and the average callback depth of each simulation policy are shown in Figure 2b. In Figure 2, NEH-LPT(λ) has a greater heuristic depth than the other three simulation policies throughout the iteration. During the iterations, NEH-LPT(λ) has the fewest callbacks, and its average callback depth is less than that of 4-Rule and N-Expan. It can therefore be concluded that NEH-LPT(λ) easily falls into a local minimum and that its evaluation positioning is not accurate. N-Expan has a high callback frequency in the heuristic process, so its search depth is less than that of 4-Rule. Random has the smallest search depth. In Figure 2b, 4-Rule has the largest average callback depth, so it can accurately locate the branch area within a limited number of callbacks.
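Equation (15) is not reproduced in this text. Assuming it is the usual relative percentage increase, $(C_i - C_{min}) / C_{min} \times 100$, the ARPI values of Table 1 could be computed along the following lines. The function names and the sample makespans are illustrative only and are not results from the paper.

```python
def rpi(c_i, c_min):
    """Relative percentage increase of a solution value over the best value."""
    return (c_i - c_min) / c_min * 100.0


def arpi(results):
    """Average RPI per simulation policy.

    results -- list of dicts, one per instance, mapping policy name -> makespan.
    C_min of an instance is the best makespan among all policies on that
    instance; every policy is assumed to appear in every instance.
    """
    totals = {}
    for instance in results:
        c_min = min(instance.values())
        for name, c_i in instance.items():
            totals[name] = totals.get(name, 0.0) + rpi(c_i, c_min)
    return {name: total / len(results) for name, total in totals.items()}


# Made-up makespans for two instances and two policies.
sample = [
    {"4-Rule": 100, "Random": 110},
    {"4-Rule": 200, "Random": 200},
]
sample_arpi = arpi(sample)
```

A policy that is best on every instance gets an ARPI of 0, so smaller values indicate a better simulation policy.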
A single-factor analysis of variance (ANOVA) was carried out, where the type of simulation policy in Table 2 is considered to be a factor. The 4-Rule and the other three simulation policies are analyzed by multi-comparison method using the least significant difference (LSD) procedure. It indicates that the 4-Rule is significantly better than the other three simulation policies of N-Expan, NEH-LPT(λ) and Random. Therefore, the heuristic advantage of the 4-Rule is verified, and the 4-Rule is selected as the simulation policy.

Parameter Setting
The Benchmark problems [42] and Liao problems [17] were introduced to verify the performance of the ISP-MCTS algorithm. The scale of the problems ranges from 10 jobs with 5 stages to 30 jobs with 10 stages. In this section, the parameter combination (C, D) is optimized in three segments according to the magnitude of the search space: 0 to $10^5$, $10^5$ to $10^6$ and $10^6$ to $5 \times 10^6$. The parameter combination (0.1, 32) focuses on developing promising nodes during the search. The combination (1, 20,000) emphasizes the exploration of unknown regions, and the combination (0.5, 10,000) takes both expansion and exploration into account. As shown in Table 3, j15c5d2 is selected for optimization analysis in each segment, and each parameter combination is run independently 20 times to obtain an average score. The depth is the distance between the deepest node and the root node, and the average depth is the average over the 20 independent runs.
As shown in Table 3, at the order of magnitude 0 to $10^5$ and within the specified time limit, the ISP-MCTS obtains the minimum average completion time and standard deviation when the parameter combination is (0.1, 32). The average depth of the search tree with (0.1, 32) is 57, which means that most of the nodes generated in scheduling the 15 jobs over 5 stages have been added to the tree by the ISP-MCTS. Compared with the other two parameter combinations, (0.5, 10,000) and (1, 20,000), the combination (0.1, 32) expands the nodes deeper.
At the orders of magnitude $10^5$ to $10^6$ and $10^6$ to $5 \times 10^6$, within the specified time limit, the ISP-MCTS obtains the best average minimum completion time when the parameter combination is (0.5, 10,000), and the algorithm is more stable than with the other combinations.
As the magnitude and the search time increase, the advantage of the exploratory parameter combination becomes more and more obvious. At the order of magnitude $10^6$ to $5 \times 10^6$, the exploration combination (1, 20,000) was significantly better than the combination (0.1, 32).
It can be observed from the above that, at the order of magnitude 0 to $10^5$ and within the specified time limit, the ISP-MCTS algorithm has the highest solution quality and the best stability with the parameter combination (0.1, 32). When the order of magnitude is $10^5$ to $10^6$ or $10^6$ to $5 \times 10^6$, the algorithm selects the parameter combination (0.5, 10,000).

Calculation Results and Analysis
According to the machine layout, the Benchmark problems are divided into two groups: (1) 47 a and b problems; and (2) 30 c and d problems. The ISP-MCTS algorithm was programmed in MATLAB 2016a and run on an Intel Core i5-3470 CPU (3.20 GHz) with 4.0 GB of main memory. Six algorithms, HVNS [20], IDABC [21], particle swarm optimization (PSO) [17], AIS [25], genetic algorithm (GA) [16] and B&B [8], are selected for comparison. Their results are taken from the source literature. The HVNS algorithm sets the maximum running time to 100 s and takes the minimum completion time of 20 independent runs under the same conditions. The IDABC algorithm limits its run time to 1600 s and also runs 20 times independently. The maximum running time of the algorithms B&B, AIS, GA and PSO in the literature is limited to 1600 s. The minimum $C_{max}$ over 20 independent runs under the same conditions is taken as the final solution, and the CPU time corresponding to that $C_{max}$ is recorded for each problem. The performance of the algorithms is compared by "%Deviation", the deviation between the lower bound LB and the $C_{max}$ of each algorithm, which is obtained using Equation (16) and shown in Tables 4 and 5. The statistical analysis results are shown in Table 6.
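Equation (16) is not reproduced in this text. Assuming it follows the usual definition of the percentage deviation from a lower bound, it would read as below; the exact form in the original is not confirmed here.

```latex
\%\,\mathrm{Deviation} = \frac{C_{\max} - LB}{LB} \times 100
```

Under this definition, a deviation of 0 means that the algorithm reached the lower bound, so the solution is provably optimal for that problem.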

Comparison of Carlier and Neron's Benchmarks
From Tables 4 and 5, only the B&B algorithm fails to obtain the optimal solution for problem j15c10a5 among the 47 a and b problems. The PSO and B&B algorithms each find the optimal solutions of 18 of the 30 c and d problems, with average deviations of 3.95% and 9.32%, respectively. As shown in Table 6, the IDABC, HVNS and GA algorithms find the optimal solutions of 17 of the 30 c and d problems, with average deviations of 4.00%, 4.00% and 4.17%, respectively. The AIS algorithm finds the optimal solutions of 16 of the 30 c and d problems, with an average deviation of 4.26%. The ISP-MCTS algorithm proposed in this paper finds the optimal solutions of 18 of the 30 c and d problems, with an average deviation of 3.4%. Therefore, the solution quality of the ISP-MCTS algorithm on the c and d problems is better than that of the other six algorithms. Tables 4 and 5 also show that the average CPU time of the ISP-MCTS algorithm is much smaller than that of the PSO and B&B algorithms. From the above comparison, the ISP-MCTS algorithm offers better solutions and more efficient optimization than the other six algorithms when solving the Carlier and Neron problems.

Comparison of Liao's Benchmarks
From the discussion in Section 5.1, the average completion time of the three algorithms IDABC, HVNS and PSO on Carlier and Neron's benchmarks is smaller than that of the GA, AIS and B&B algorithms. Therefore, IDABC, HVNS and PSO are selected as the comparison algorithms for the Liao problems in this section. In the source literature, the IDABC algorithm was coded in Visual C++ and run on an Intel Pentium 3.06 GHz PC with 2 GB of RAM under the Windows 7 operating system; each problem was tested 20 times, and the termination criterion for all tests was a computation time of 100 s. The running environment of the HVNS algorithm in the literature [20] was an Intel Core i5 3.3 GHz PC with 4 GB of memory. The PSO algorithm was limited to a run time of 200 s on the Liao problems, and each problem was run 20 times independently. The ISP-MCTS algorithm solves the Liao problems in the computing environment described in this paper, and the execution time for each problem is limited to 200 s.
As shown in Table 7, the average completion time of the ISP-MCTS solutions is smaller than that of IDABC, HVNS and PSO, and the ISP-MCTS further improves the best known solution of three of the Liao problems. A nonparametric test is applied: the null hypothesis $H_0$: $\mu_{\text{ISP-MCTS}} \ge \mu_{\text{IDABC}}$ is tested against the alternative hypothesis $H_1$: $\mu_{\text{ISP-MCTS}} < \mu_{\text{IDABC}}$ at the 0.05 significance level. Similarly, the hypotheses $H_0'$: $\mu_{\text{ISP-MCTS}} \ge \mu_{\text{HVNS}}$ and $H_1'$: $\mu_{\text{ISP-MCTS}} < \mu_{\text{HVNS}}$, and $H_0''$: $\mu_{\text{ISP-MCTS}} \ge \mu_{\text{PSO}}$ and $H_1''$: $\mu_{\text{ISP-MCTS}} < \mu_{\text{PSO}}$, are tested. As shown in Table 8, the ISP-MCTS algorithm performs significantly better than the PSO and AIS algorithms, and has certain advantages in solution quality and stability compared with the HVNS algorithm.
Six effective algorithms proposed in recent years for solving the HFSSP were selected for comparison with ISP-MCTS: HVNS [20], IDABC [21], PSO [17], $GA_R$ [2], ISA [43] and artificial bee colony with permutation ($ABC_P$) [21]. In the computing environment of this study, each algorithm solved the 150 problems of Section 4.1. Among the comparison algorithms, HVNS, IDABC and PSO solved the same problems as reported in the literature, whereas $GA_R$, ISA and $ABC_P$ have been used to solve complex HFSSPs, where their solving ability was verified. Each algorithm directly uses the optimal parameter combination from its source literature. All algorithm run times were limited to the maximum CPU time t = ρ mn s, with ρ = 10, 20, 30. All the above algorithms are programmed in MATLAB 2016a; each of the 150 instances is run independently 10 times, and the average of the 10 results is taken as the solution of that instance. The best solution $C_{min}$ found by any algorithm when ρ = 30 is used to calculate the relative percentage increase (RPI). The results are shown in Tables 9-11. As shown in Tables 9-11, the solution quality of ISP-MCTS is better than that of the other six algorithms.
When ρ = 10, the average deviation of ISP-MCTS is the smallest of the seven algorithms, at 0.115. This is less than that of IDABC (0.121), HVNS (0.263), PSO (0.495), $GA_R$ (0.743), ISA (0.509) and $ABC_P$ (0.591). Therefore, the solution quality of the ISP-MCTS algorithm is better than that of the other six algorithms. When ρ = 20 and ρ = 30, the average deviation of all algorithms improves compared with ρ = 10. The ISP-MCTS algorithm still has the smallest deviation among all algorithms, so the validity of the ISP-MCTS algorithm is verified.
A multi-factor analysis of variance was used to further compare the seven algorithms, with the algorithm type and ρ taken as factors. From the means plot and the two-factor Tukey HSD test (95% confidence interval) shown in Figure 3, the ISP-MCTS algorithm is significantly better than the HVNS, PSO, $GA_R$, ISA and $ABC_P$ algorithms at the different CPU times. This analysis of variance verifies the superiority of the ISP-MCTS algorithm in solving the HFSSP.
Figure 3. The means plot and 95% Tukey HSD confidence intervals for the interaction between the algorithms and the allowed CPU time.

Conclusions
1. This paper analyzed the HFSSP operation mechanism and proposed a tree search algorithm based on SP-MCTS to solve the HFSSP. To solve the HFSSP with the SP-MCTS algorithm, the three steps of selection, expansion and simulation were each improved. The improvements comprised a selection policy blending in the standard deviation, the single-branch expansion strategy and the 4-Rule simulation policy. The results show that these improvements can quickly and accurately locate high-potential branches and obtain the optimal solution.
2. The 4-Rule simulation policy was selected based on a comparative analysis of the problems, and the 4-Rule can give an accurate evaluation of the leaf nodes during the search process,