Cooperative Co-Evolution Algorithm with an MRF-Based Decomposition Strategy for Stochastic Flexible Job Shop Scheduling

Flexible job shop scheduling is an important issue in both the research area and real-world applications. The traditional flexible scheduling problem always assumes that the processing time of each operation is a fixed value given in advance. However, the stochastic factors in real-world applications cannot be ignored, especially for the processing times. We propose a hybrid cooperative co-evolution algorithm with a Markov random field (MRF)-based decomposition strategy (hCEA-MRF) for solving the stochastic flexible scheduling problem with the objective of minimizing the expectation and variance of the makespan. First, an improved cooperative co-evolution algorithm that is good at preserving evolutionary information is adopted in hCEA-MRF. Second, an MRF-based decomposition strategy is designed for decomposing all decision variables based on the learned network structure and the parameters of the MRF. Then, a self-adaptive parameter strategy is adopted to overcome the situation where the parameters cannot be accurately estimated when facing stochastic factors. Finally, numerical experiments demonstrate the effectiveness and efficiency of the proposed algorithm and show its superiority compared with the state-of-the-art from the literature.


Introduction
Scheduling is one of the important issues in combinatorial optimization, as scheduling problems not only have theoretical significance but also have practical implications in many real-world applications [1,2]. As an extended version of job shop scheduling (JSP), flexible JSP (FJSP) considers machine flexibility, i.e., each operation can be processed on any machine in a given machine set. Machine flexibility makes FJSP closer to the models in real-world applications [3,4]. For example, resource scheduling exists in smart power grid systems with the objective of minimizing the response time while balancing multiple resources [5,6]. In distributed computing centers, the limited resources must be allocated to complete all tasks within the handling capacity [7]. In logistics or subway systems, vehicle scheduling has the same basic model as flexible scheduling, in which different tasks from different users need to be transported by a set of vehicles; the objective is to transport all tasks in the minimum time with a balanced load [8], etc. All models in the mentioned real-world applications or systems are FJSPs. Briefly, FJSP can be viewed as a discrete optimization problem with two sub-problems, i.e., operation sequencing and machine assignment, and it has been proved to be NP-hard [9].

Motivation
Various researchers have concentrated on effective algorithms for minimizing the maximum completion time, named the makespan, of FJSP. Lin and Gen provided a survey in 2018 and presented information on scheduling, e.g., mathematical formulations and graph representations [10]. Chaudhry and Khan reviewed solution techniques and methods published for flexible scheduling and presented an overview of the research trend of flexible scheduling in 2016 [11]. However, most of the existing research assumes that the processing time of each operation on the corresponding machine is fixed and given in advance. In reality, the stochastic factors in real-world applications cannot be ignored, and the processing time appears to be particularly important [12]. Therefore, in this paper, we consider the stochastic FJSP (S-FJSP), and the stochastic processing time is modeled with three probability distributions, i.e., the uniform distribution, the Gaussian distribution and the exponential distribution.
In recent years, various studies have addressed stochastic scheduling problems. As listed in Table 1, Lei proposed a simplified multi-objective genetic algorithm (MOGA) for minimizing the expected makespan of JSP [13]. Hao et al. proposed an effective multi-objective estimation of distribution algorithm (moEDA) for minimizing the expected makespan and the total tardiness of JSP [14]. Kundakc and Kulak proposed a hybrid evolutionary algorithm for minimizing the expected makespan of JSP as well [15]. As represented by the above studies, some researchers modelled the stochastic processing times by the uniform distribution. To show the superiority of the proposed algorithms with more conviction, various researchers modelled the stochastic processing time not only by the uniform distribution but also by the normal distribution and the exponential distribution. Zhang and Wu proposed an artificial bee colony algorithm for minimizing the maximum lateness of JSP [16]. Zhang and Song proposed a two-stage particle swarm optimization for minimizing the expected total weighted tardiness [17]. Horng et al. and Azadeh et al. presented a two-stage optimization algorithm and an artificial neural network (ANN)-based optimization algorithm for minimizing the expected makespan, respectively [18,19]. These studies are all ensemble algorithms built on the search ability of various evolutionary algorithms (EAs), e.g., genetic algorithm (GA), particle swarm optimization (PSO), etc., together with local and global search strategies. As we know, the stochastic factors result in an exponentially increasing solution space, which may make EAs suffer from performance reduction [10]. Therefore, how to increase the search ability of the proposed algorithm is quite significant for stochastic scheduling problems. Gu et al. proposed a novel parallel quantum genetic algorithm (NPQGA) for the stochastic JSP with the objective of minimizing the expected makespan [20]. The NPQGA searches for the optimal solution through sub-populations and several universes. The exchange of evolutionary information plays an important role in EAs as well, and the parallel strategy used in NPQGA is weak in information exchange. This weakness results in poor cooperation among the EAs when exploring and exploiting a larger solution space. Nguyen et al. presented a cooperative coevolution genetic programming approach for the dynamic multi-objective JSP [21]. Gu et al. studied a co-evolutionary quantum genetic algorithm for the stochastic JSP [22].

Table 1. Existing studies on stochastic scheduling problems.

Methodologies | Distribution(s) | Objective(s) (min)
Simplified multi-objective genetic algorithm [13] | Uniform | Expected makespan
Effective multi-objective estimation of distribution algorithm (EDA) [14] | Uniform | Expected makespan & total tardiness
Hybrid evolutionary algorithm [15] | Uniform | Expected makespan
Artificial bee colony algorithm [16] | Uniform; Normal; Exponential | Maximum lateness
Two-stage particle swarm optimization [17] | Uniform; Normal; Exponential | Expected total weighted tardiness
Evolutionary strategy in ordinal optimization [19] | Uniform; Normal; Exponential | Expected makespan & total tardiness
Algorithm based on artificial neural networks [18] | Uniform; Normal; Exponential | Expected makespan
Novel parallel quantum genetic algorithm [20] | Normal | Expected makespan
Cooperative coevolution genetic programming (CCGP) [21] | Uniform | Expected makespan
Co-evolutionary quantum genetic algorithm [22] | Normal | Expected makespan
Two-stage optimization [23] | Uniform; Normal; Exponential | Expected makespan

The cooperative co-evolution algorithm (CEA) is widely used in the research area of large-scale optimization problems because its basic optimization strategy, i.e., "divide-and-conquer", is appropriate for problems with an increasing solution space. Various researchers have studied the performance of EAs applied to combinatorial optimization problems; however, EAs lose effectiveness as the problem scale increases, especially for FJSP [24] and high-dimensional discrete problems [25]. CEA searches for the optimal solution cooperatively by decomposing the whole solution space into various sub solution spaces. Therefore, the decomposition strategy is significant because the performance of CEA is sensitive to the decomposition status [26]. Various researchers have studied decomposition strategies [27-29]; however, they mainly focus on the optimization of mathematical function problems. The existing decomposition strategies cannot be used for combinatorial optimization problems directly because of the precedence constraints and dependency relationships among the decision variables.
The existing decomposition strategies can be divided into two categories: non-automatic decomposition and automatic decomposition. The non-automatic decomposition strategies contain three types, i.e., the one-dimension decomposition strategy, the random decomposition strategy and the extended decomposition strategy. They all decompose the decision variables by a random technique with no consideration of the correlation among the decision variables. The automatic decomposition strategies consider and detect the correlation among the decision variables through various techniques, i.e., delta grouping, the K-means clustering algorithm, cooperative co-evolution with variable interaction learning (CCVIL), interdependence learning (IL), fast interdependency identification (FII), differential grouping (DG), extended differential grouping (EDG), etc. The detailed decomposition methodologies, advantages and disadvantages are listed in Table 2. The detection methodology for the correlation among the decision variables therefore has a strong effect on the performance of CEA [30]. Although the performance of the automatic decomposition strategies is better than that of the non-automatic ones, their computational cost is always high, which is prohibitive for combinatorial optimization problems, especially for FJSP, due to the complex decoding process.

Table 2. Existing decomposition strategies.

Strategy | Decomposition methodology | Advantages | Disadvantages
One-dimension [31] | Decompose N variables into N groups | Easy to implement with low cost | Loses efficacy on non-separable problems
Random [32] | Randomly decompose N variables into k groups | Depends on the random technique | Loses efficacy on non-separable problems
Set-based [33] | Randomly decompose N variables based on sets | More effective than the one-dimension strategy | Loses efficacy on non-separable problems
Delta [28] | Detect relationships based on the averaged difference | More effective than random grouping | Loses efficacy on non-separable problems
K-means [34] | Detect relationships based on the K-means algorithm | Effective on unbalanced grouping statuses | High computational cost
CCVIL [35] | Detect relationships based on the non-monotonicity method | More effective than manual strategies | Insurmountable benchmarks exist
IL [36] | Detect relationships only once for each variable | Lower computational cost than CCVIL | Worse performance than CCVIL
FII [37] | Detect through fast interdependency identification | Lower computational cost than CCVIL | Worse performance for conditional variables
DG [38] | Detect relationships by the variance of fitness | Better performance combined with PSO | High computational cost
EDG [39] | Detect relationships by the variance of fitness | Better performance compared with PSO | High computational cost

Probabilistic graphical models (PGMs) consist of Bayesian networks (BNs) and Markov random fields (MRFs). BNs are directed acyclic graphs used for representing causal relationships. MRFs are undirected graphs used for representing dependency relationships. BNs and MRFs can provide an intuitive representation of the joint probability distribution of the decision variables. Various researchers have learned the correlation among decision variables, taking the problem characteristics and the graph characteristics into consideration, according to the network structure of MRFs [40,41]. Moreover, MRFs have been widely used in many research areas in recent years. Sun et al. proposed a supervised approach to detect the relationships of variables for image classification according to weighted MRFs [42]. Li and Wand studied a combination of convolutional neural networks and MRFs for image synthesis [43], etc. Paragios and Ramesh studied a variable detection approach with minimization techniques based on the learned MRF structure for solving the subway monitoring problem in a real-time environment [44]. Tombari and Stefano presented an automatic 3D segmentation algorithm according to the variable relationships learned from the structure of an MRF [45]. Various researchers have been devoted to detecting relationships for decomposing decision variables.

Contribution
In this paper, a decomposition strategy considering a Markov random field, called the MRF-based decomposition strategy, is proposed. All decision variables are decomposed according to the network structure of the MRF with respect to the estimated parameters. The decision variables are decomposed into two sets, and a redefined correlation between two variables is used for detecting the correlation relationship. The hybrid cooperative co-evolution algorithm with the MRF-based decomposition strategy is called hCEA-MRF. hCEA-MRF decomposes the whole solution space into various small sub solution spaces and searches for the optimal solution cooperatively. This increases the probability of finding the optimal solution for stochastic problems.

•
A cooperative co-evolution framework is improved and adopted. In the improved framework, CEA works iteratively within each sub solution space until the termination criterion is met. All sub solution spaces are evolved cooperatively, and the suitable representative selection strategy helps hCEA-MRF find the optimal solution.

•
An MRF-based decomposition strategy is proposed for decomposing the decision variables into various groups. All decision variables are decomposed according to the network structure with respect to the estimated parameters. The decision variables that are put into the same group are viewed as having strong relevance to each other. Each group is associated with a sub solution space.

•
A self-adaptive parameter mechanism is designed. Instead of the general linear and nonlinear self-adaptive mechanisms, our proposed self-adaptive mechanism is based on performance, i.e., the numbers and percentages of the individuals generated by the current parameters that successfully or unsuccessfully enter the next generation.
The rest of the paper is organized as follows: Section 2 introduces the formulation model of flexible scheduling. Section 3 describes the proposed hCEA-MRF in detail. The numerical experiments are designed and analyzed in Section 4, and Section 5 presents the conclusions.

Formulation Model of S-FJSP
As an extended version of the standard JSP, FJSP gives a wider availability of machines for each operation. JSP obtains the minimum makespan by finding the optimal operation sequence, while FJSP also needs to assign a suitable machine to each ordered operation. The makespan is defined as the maximum completion time of all jobs. Some assumptions hold for FJSP. There are strict precedence constraints among the operations of the same job but no precedence constraints between different jobs. Each machine is available from time zero and can process only one operation at a time, without interruption. Each operation can be processed only once. The setup times among different machines are included in the processing times, and all processing times are fixed and given in advance. The main difference between S-FJSP and FJSP is that the processing times of the operations in S-FJSP are not fixed and cannot be known until the schedule is completed. In this paper, a pure integer programming model is adopted to model S-FJSP by transmuting the stochastic processing times. The processing time ξp_ikj of each operation O_ik on machine M_j follows a probability distribution with given parameters, i.e., the expected value E[ξp_ikj] and the variance V_ij. In this paper, three probability distributions, i.e., the uniform distribution, the Gaussian distribution, and the exponential distribution, are used for predicting ξp_ikj. Consequently, the concept of a scenario ξ is used for constructing the uncertainty. The objective of S-FJSP is to minimize the expected makespan

min E[C_max(ξ)], with C_max(ξ) = max_{i,k,j} ξt^T_ikj,

where ξt^T_ikj = ξt^S_ikj + ξp_ikj is the completion time of O_ik on M_j in scenario ξ. During the whole scheduling process, each operation is processed on exactly one machine; in other words, the decision variables x_ikj must sum to 1 over all machines:

∑_j x_ikj = 1, for all i, k.

As discussed, there are precedence constraints among the operations within each job, i.e., a successor operation can begin only after the completion of its preceding operation. Thus, the start time of the successor operation must be no smaller than the end time of the preceding operation. No operation can be interrupted, i.e., a successor operation on a machine can begin only when the preceding operation on the same machine and the preceding operation in the same job have both finished. Finally, non-negativity restrictions on the decision variables are needed. The decision variable x_ikj can only be 0 or 1; x_ikj = 1 means O_ik is assigned to machine M_j, and x_ikj = 0 means it is not. The start time and the end time of any operation must be non-negative: ξt^S_ikj ≥ 0 and ξt^T_ikj ≥ 0.
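Since the makespan is a random variable over scenarios ξ, its expectation and variance are in practice estimated by sampling. The sketch below is a minimal illustration of this scenario approach under assumed toy data: a fixed schedule of three serial operations whose makespan is simply the sum of the sampled processing times. The helper name `estimate_makespan_stats` and the data are ours, not from the paper.

```python
import random
import statistics

def estimate_makespan_stats(sample_scenario, makespan_of, n_scenarios=30):
    """Monte Carlo estimate of E[C_max] and Var[C_max] for a fixed
    schedule under stochastic processing times."""
    values = [makespan_of(sample_scenario()) for _ in range(n_scenarios)]
    return statistics.mean(values), statistics.pvariance(values)

# Toy scenario generator (assumed data): three serial operations whose
# times follow U[(1 - delta) p, (1 + delta) p] with delta = 0.2, so the
# makespan of this fixed schedule is simply their sum.
p, delta = [4, 6, 5], 0.2
sample = lambda: [random.uniform((1 - delta) * t, (1 + delta) * t) for t in p]
mean_c, var_c = estimate_makespan_stats(sample, sum, n_scenarios=1000)
```

With δ = 0.2, every sampled makespan lies in [12, 18], so the estimated expectation stays close to the deterministic value 15 while the variance quantifies the spread induced by the stochastic factors.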

The Implement of hCEA-MRF
In this section, the hybrid cooperative coevolution algorithm with the MRF-based decomposition strategy (hCEA-MRF) is proposed for solving flexible scheduling with stochastic processing times to minimize the expected makespan. hCEA-MRF is described as follows. First, a real-number representation for PSO is adopted; the fractional part and the integral part represent the priority of the operation sequence and the machine assignment, respectively. Then, CEA is improved to maintain more optimized information for each cooperative sub solution space. Considering the correlation relationships, the MRF-based decomposition strategy groups all decision variables according to the learned network structure of the MRF. Finally, a two-point critical-path-based local search strategy is used for enhancing the search ability of hCEA-MRF. All details are introduced as follows, and the pseudocode is summarized in Algorithm 1.

Representation
The design of the representation of the candidate solutions plays an important role in the research of EAs. Designing an effective representation involves the following aspects [10]:

•

The genotype space needs to cover as many candidate solutions as possible in order to search out the optimal solution.

•
The necessary decode time needs to be as short as possible.

•
Each representation needs to correspond to a feasible candidate solution, and a new representation produced by the evolutionary operators needs to correspond to a feasible candidate solution as well.
In this paper, we adopt the random-key based representation. It encodes one candidate solution as a set of randomly generated real numbers. Each real number x_ij(ϕ) consists of two parts, i.e., an integral part and a fractional part. The integral part and the fractional part are randomly generated from [1, m] and [0, 1], respectively. An illustration of a random-key based representation for a simple example with full complexity (shown in Table 3) is given in Figure 1.
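As a sketch of how such a representation could be decoded, the helper below (our own illustration, not the paper's code) extracts the machine assignment from the integral parts and sorts the fractional parts as priorities; the precedence-repair step, which maps the sorted gene positions back to each job's next unprocessed operation, is an assumption standard for random-key decodings.

```python
def decode(keys, n_ops_per_job):
    """Decode a random-key individual into an operation sequence (OS)
    and a machine assignment (MA).

    keys:          one real number per operation (genes listed job by job);
                   the integral part selects the machine index and the
                   fractional part gives the operation's priority.
    n_ops_per_job: number of operations of each job, in job order.
    """
    # Machine assignment: integral part of each key.
    ma = [int(k) for k in keys]
    # Priority: here, a smaller fractional part means a higher priority.
    frac = [k - int(k) for k in keys]
    order = sorted(range(len(keys)), key=lambda g: frac[g])
    # Map gene positions back to jobs, then repair the sequence so that
    # operations of the same job keep their precedence order.
    job_of = []
    for j, n in enumerate(n_ops_per_job):
        job_of += [j] * n
    next_op = [0] * len(n_ops_per_job)
    os_seq = []
    for g in order:
        j = job_of[g]
        os_seq.append((j, next_op[j]))  # (job index, operation index)
        next_op[j] += 1
    return os_seq, ma
```

Because of the repair step, every decoded sequence respects the within-job precedence constraints, which is exactly the feasibility guarantee the text attributes to the random-key representation.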

Evolutionary Strategy
The initialization of the individuals of PSO decides the initial search point, while the update strategy decides the future search direction and the length of the moving steps. The traditional update strategy considers the global best solution (g_best(ϕ)) and the personal best solution (p_best(ϕ)). Thus, PSO with the traditional update strategy has good global search ability but poor neighbourhood search ability, as both kinds of historical best solutions can only give global guidance [11]. Moreover, EAs always need the cooperation of exploration and exploitation, especially in a stochastic environment. Therefore, we adopt two update strategies based on two distributions for the individuals of PSO: the normal distribution and the Cauchy distribution. The normal distribution is used for sampling new individuals within the neighbourhood structure, while the Cauchy distribution is used for exploring the solution space more widely. In the rewritten update strategy, N(0, 1) and C(1) represent numbers randomly generated from a normal distribution with parameters 0 and 1 and from a Cauchy distribution with parameter 1, respectively. For each iteration, a random number is generated. If the random number is larger than the given p, x_ij(ϕ + 1) is updated according to the equation with the Cauchy distribution; otherwise, x_ij(ϕ + 1) is updated according to the equation with the normal distribution. l_best(ϕ) stands for the best local individual in the ϕth generation. It is defined as the best individual among three adjacent individuals in the population, and it is updated at the end of each iteration. A critical-path-based local search is used in our hCEA-MRF for finding better solutions and increasing the local search ability.
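The paper's exact update equations are not reproduced in the extracted text, so the sketch below is only an assumed form: a move toward the personal and local best guides, scaled by an N(0, 1) or C(1) sample chosen with probability p as described above. The function name and the step shape are ours.

```python
import math
import random

def update_position(x, p_best, l_best, p=0.5):
    """Hedged sketch of the two-distribution update. N(0, 1) produces
    small neighbourhood moves; C(1), sampled via the inverse CDF of the
    standard Cauchy distribution, produces heavy-tailed exploratory
    moves. The step form toward the guides is an assumption."""
    if random.random() > p:
        step = math.tan(math.pi * (random.random() - 0.5))  # Cauchy C(1)
    else:
        step = random.gauss(0.0, 1.0)                       # Normal N(0, 1)
    return x + step * ((p_best - x) + (l_best - x)) / 2.0
```

Note the design intent: because the Cauchy distribution has no finite variance, occasional very large steps escape local optima, while the normal branch refines promising regions.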

CEA Evaluation
CEA is an evolutionary computation approach that works through the core idea of "divide and conquer": it divides one large-scale solution space into various subcomponents, and all divided components are evaluated cooperatively through multiple optimizers until the maximum number of generations (T) is reached. As described in Algorithm 1, e() is a function used for evaluating the makespan, and r(s, target, trail) returns a new individual that is a copy of target with the sth component replaced by the sth component of trail. If e(r(s, sub_ind(ϕ), p_best(ϕ))) is better than e(p_best(ϕ)), the sth sub-component of sub_ind(ϕ) has a positive effect on improving p_best(ϕ); then, the sth sub-component of p_best(ϕ) is replaced with the sth sub-component of sub_ind(ϕ) (lines 13-15). sg_best(ϕ) (lines 16-18) and g_best (lines 25-27) are updated in the same manner. CEA works on each sub-component for t generations in order to maintain the best historical information and search for the optimal solution. up_l works by updating the best local individual with the best personal individual among three adjacent individuals in the whole population.
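The replace-and-evaluate mechanics above can be sketched as follows. Individuals are represented here as lists of sub-components (groups of decision variables), which is our assumed layout; the argument order of r() and the acceptance test follow the description in the text.

```python
import copy

def r(s, target, trail):
    """Return a copy of `target` with its s-th sub-component replaced
    by the s-th sub-component of `trail`, as defined in the text."""
    new = copy.deepcopy(target)
    new[s] = copy.deepcopy(trail[s])
    return new

def absorb_if_better(s, sub_ind, p_best, e):
    """Lines 13-15 of Algorithm 1, following the text literally: if the
    mixed individual evaluates better than p_best (lower makespan), copy
    sub_ind's s-th sub-component into p_best."""
    if e(r(s, sub_ind, p_best)) < e(p_best):
        p_best[s] = copy.deepcopy(sub_ind[s])
    return p_best
```

sg_best and g_best would be updated with the same pattern, one sub-component at a time, which is what lets CEA preserve good historical sub-components while the rest of the individual keeps evolving.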

MRF-Based Decomposition Strategy
MRF (G, Φ) is an undirected graph that can represent the correlation relationships among the decision variables. The network structure G = (Ω, A) consists of a set of nodes (Ω) and a set of undirected arcs (A). Two nodes linked by one undirected arc have a correlation relationship. Φ represents the parameters of the MRF. The network structure used in our MRF-based decomposition strategy consists of D nodes, where D is equal to the length of the individual, i.e., the total number of all operations. We use each node (x_h, written simply for x_ij) to represent each gene (random key) in the individual, i.e., each node corresponds to one operation.

MRF Structure Learning
During the preprocessing period, various individuals are sampled and evaluated according to the classical PSO update strategy for π generations. The π best global individuals are recorded and stored. Each individual contains one critical path; thus, at least π critical paths are obtained. Then, the probability (P_cp) of each operation (node) being a critical operation (node) can be estimated as well. The critical path is a key factor for scheduling problems: only a change in the length of the critical path has a chance to change the final makespan. Therefore, we divide all nodes into two sets, i.e., the set of parent nodes (Ω_prt) and the set of child nodes (Ω_cld). The nodes with higher P_cp are put into the set of parent nodes, while the rest of the nodes are put into the set of child nodes.
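The estimation of P_cp from the π recorded solutions can be sketched as below. The threshold used to split parent and child nodes is our assumption, since the extracted text does not state the exact cut-off.

```python
from collections import Counter

def critical_probabilities(critical_paths, all_ops):
    """P_cp: fraction of the recorded solutions in which each operation
    (node) lies on the critical path."""
    counts = Counter(op for path in critical_paths for op in set(path))
    n = len(critical_paths)
    return {op: counts[op] / n for op in all_ops}

def split_nodes(p_cp, threshold=0.5):
    """Nodes with higher P_cp become parent nodes; the rest are child
    nodes. The 0.5 default threshold is an assumed cut-off."""
    parents = {op for op, p in p_cp.items() if p >= threshold}
    return parents, set(p_cp) - parents
```

Operations that appear on many critical paths are the ones whose timing most often determines the makespan, which is why they are promoted to parent nodes that anchor the groups.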
The correlation among the variables is important for learning the network structure. For clarity, we use ω_prt and ω_k to represent the nodes in Ω_prt and Ω_cld, respectively. We use V(ω_k/ω_prt) to represent the difference between the values before and after deleting ω_prt from the critical path. V(ω_k/ω_prt) is defined and calculated in Equation (8). A large V(ω_k/ω_prt) stands for a strong correlation between ω_prt and ω_k; otherwise, a small V(ω_k/ω_prt) stands for a weak correlation between them. In Equation (8), π is the number of solutions obtained in the preprocessing, and Z is the normalization function. E_S(ω_k/ω_prt) and L_S(ω_k/ω_prt) return the earliest starting time and the latest starting time of ω_k after deleting the critical node ω_prt, respectively. Each ω_k is linked with the ω_prt with the highest value of V because the higher the value, the stronger the correlation between ω_k and ω_prt. The network structure can be constructed by traversing all child nodes. Assuming there are b parent nodes among a nodes in total, constructing the network structure has b × (a − b) computational complexity, i.e., O(n).

MRF Parameters Learning
For an MRF there exists a set of potential functions, called "factors": nonnegative real functions defined on subsets of variables that define the probability distribution functions. For any subset of nodes in G, if there exists a connection between every pair of nodes, the subset of nodes is called a "clique". If no node can be added to a clique to form a new clique, this clique is called a "maximal clique", i.e., a maximal clique is one that cannot be included in any other clique. As shown in Figure 4, {X_1, X_2, X_3}, {X_3, X_4} and {X_4, X_5} are the maximal cliques. Obviously, each node appears in at least one clique. In an MRF, the joint probability distribution (JPD) can be used for estimating the relationship between two linked variables. It can be calculated as the product of multiple factors based on the decomposed cliques. For clarity, we use Q ∈ C to represent one clique in the set of cliques C, and x_Q to represent the variable set corresponding to Q. The JPD can be calculated as

P(X) = (1/W) ∏_{Q∈C} ψ_Q(x_Q),

where ψ_Q is the potential function corresponding to Q for modeling the variables within Q, and W is a normalizing factor ensuring the correctness of P(X). The JPD of X in Figure 4 can be calculated as

P(X) = (1/W) ψ_{123}(x_1, x_2, x_3) ψ_{34}(x_3, x_4) ψ_{45}(x_4, x_5).

The factors of X_1, X_2 and X_3 are shown in Figure 5. x_1^0 and x_1^1 represent that X_1 is a critical node or a non-critical node, respectively. The numbers in Figure 5 represent the numbers of solutions that conform to each combination of states. For example, 17 means that there exist 17 solutions in which X_1 is a non-critical node while X_2 is a critical node. We model the variables within the factors, and the JPD of X_1, X_2 and X_3 is rewritten based on these factor tables (Figure 5 shows the factors for X_1, X_2 and X_3 for 100 samples).
In order to obtain the marginal probability (MP) of X_2, we can sum the JPD over X_1 and X_3. The MP and the network structure are combined to decompose the nodes. Each parent node leads a group, and the child nodes are grouped based on the MP: each child node is grouped with the parent node with the highest MP. In this way, all nodes (variables) can be decomposed into several groups.
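With binary critical/non-critical states, the factor product and the marginalization of X_2 can be illustrated as follows. The factor counts below are made-up placeholders in the spirit of Figure 5, not the paper's actual numbers, and the chain X_1 - X_2 - X_3 with two pairwise factors is an assumed simplification of the clique structure.

```python
from itertools import product

# Hypothetical factor tables over binary states (0 = critical,
# 1 = non-critical, following the text's convention for x^0 and x^1).
psi_12 = {(0, 0): 30, (0, 1): 17, (1, 0): 25, (1, 1): 28}  # psi(X1, X2)
psi_23 = {(0, 0): 40, (0, 1): 12, (1, 0): 20, (1, 1): 28}  # psi(X2, X3)

def joint(x1, x2, x3):
    # Unnormalized JPD: the product of the clique factors.
    return psi_12[(x1, x2)] * psi_23[(x2, x3)]

# Normalizing factor W makes the JPD sum to one over all states.
W = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

def marginal_x2(x2):
    """Marginal probability of X2, obtained by summing the JPD over X1
    and X3; in the decomposition step, each child node is grouped with
    the parent yielding the highest such marginal."""
    return sum(joint(a, x2, c) for a in (0, 1) for c in (0, 1)) / W
```

Because W normalizes the product of factors, the two marginal values of X_2 sum to one, which is the sanity check used when estimating the parameters from the sampled solutions.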

Parameters Self-Adaptive Strategy
Parameters of EAs play an important role, and various researchers have studied how to find or set suitable parameters for decades [10]. However, fixed parameters cannot cope with all statuses when facing stochastic factors. Thus, our hCEA-MRF only gives an initial value of the selection probability of the update equations (p = 0.5), and p is updated every 15 iterations. The exact value is self-adjusted in the process of optimization according to the performance of the population. In the self-adaptive strategy, ω stands for the selection probability of the update equations for p, and the initial value of ω is 0.5. ω is updated every 50 iterations. Obviously, a self-adaptive ω is more suitable than a fixed value. Thus, our hCEA-MRF adopts Equation (14) for self-adapting ω, where θ_1 and θ_2 denote the numbers of individuals updated by the equation with the Cauchy distribution and the normal distribution, respectively, that are selected into the next generation, and λ_1 and λ_2 denote the numbers of individuals updated by the respective equations that are abandoned before the next generation.
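Equation (14) itself is not reproduced in the extracted text, so the success-rate form below (in the style of SaDE's strategy-probability adaptation) is only an assumed sketch of how θ_1, λ_1, θ_2 and λ_2 could drive the selection probability of the Cauchy update equation.

```python
def adapt_p(theta1, lam1, theta2, lam2):
    """Assumed SaDE-style update: theta1/lam1 count the offspring of the
    Cauchy equation accepted/rejected into the next generation, and
    theta2/lam2 the same for the normal equation. Returns the new
    selection probability of the Cauchy equation; assumes each equation
    has produced at least one accepted offspring."""
    s1 = theta1 * (theta2 + lam2)
    s2 = theta2 * (theta1 + lam1)
    return s1 / (s1 + s2)
```

The intent matches the text: the update equation whose offspring survive more often receives a higher selection probability in the following iterations.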

Simulation Experiments
In this section, the simulation experiments are described and the experimental results are discussed and analyzed. To guarantee fairness, our proposed algorithm and all compared algorithms are implemented in the same hardware environment, i.e., a computer with an Intel(R) Core(TM) i7-4790 CPU (Dell Inc., Round Rock, TX, USA) @ 3.60 GHz with 12 GB RAM. To evaluate the effectiveness of our proposed algorithm under the stochastic environment, each experiment is run for 30 independent repeats. Three state-of-the-art algorithms, i.e., hybrid GA (HA) [46], hybrid PSO (HPSO) [47], and hybrid harmony search and large neighborhood search (HHS/LNS) [48], are tested and compared. The parameters T, t and π are set to 20, 10 and 100, respectively.

Description of the Dataset
As discussed above, the stochastic factors in the processing times in real-world applications cannot be ignored. Thus, we model the stochastic processing times with three distributions, i.e., the uniform distribution, the Gaussian distribution and the exponential distribution. We choose the widely tested benchmark BRdata, first proposed by Brandimarte, as the basic benchmark. BRdata consists of 10 instances whose scale varies from 10 jobs and six machines to 20 jobs and 15 machines. For the uniform distribution, the parameter δ is set to 0.2, which means that the stochastic processing times are randomly generated from [(1 − δ)p_ikj, (1 + δ)p_ikj], where p_ikj represents the deterministic processing time. For the Gaussian distribution, the stochastic processing times are randomly generated from a Gaussian distribution with two parameters, α and β, where α and β are set to p_ikj and round(0.5 · p_ikj), respectively; round stands for the function that returns a rounded integer value. For the exponential distribution, the parameters are set to p_ikj and 1/p_ikj. All parameter settings follow [19]. The newly generated benchmarks are named according to the different probability distributions. For example, for the classical instance Mk01 [49] in BRdata, the generated instances with the uniform distribution, the Gaussian distribution and the exponential distribution are named UMk01, GMk01 and EMk01, respectively. To verify the effectiveness, we compare our hCEA-MRF with the three state-of-the-art algorithms on the generated instances with the three distributions. The average (Average) and standard variance (Variance) shown in Tables 4-6 are used for evaluating the performance. The bold numbers represent the best performance in each line, and they have the same meaning throughout the paper. We can see that our hCEA-MRF obtains the minimum average and variance of the makespan over 30 scenarios for all instances with all three probability distributions. The reasons for this can be analyzed as follows:

•

The improved cooperative coevolution, with the help of the rewritten update strategy of PSO, searches for the optimal solution in multiple sub solution spaces.

•
The MRF-based decomposition strategy considers the relationship among variables instead of decomposing them only based on the random technique.

•
The parameter self-adaptive strategy works in the process of optimization instead of using fixed parameters when facing stochastic factors.

We record the makespan of the 30 scenarios of UMk10, GMk10 and EMk10. The box plots are shown in Figures 6-8, respectively. We can see that our proposed hCEA-MRF has better stability in the following aspects: the minimum, median and maximum values obtained by our hCEA-MRF are all better than those of the state-of-the-art algorithms, and the box of hCEA-MRF is narrower than those of the state-of-the-art algorithms. In order to verify the effectiveness of the self-adaptive parameter strategy, we compared our proposed self-adaptive parameter strategy with two widely applied self-adaptive parameter strategies, i.e., the linear increase strategy and the linear decrease strategy. We call our hCEA-MRF with the linear increase and the linear decrease self-adaptive parameter strategies hCEA-MRF(I) and hCEA-MRF(D), respectively. The instances generated by all three probability distributions are used for testing the effectiveness of our proposed algorithm. The experimental results are recorded in Tables 7-9, respectively. We can see that our proposed self-adaptive parameter strategy performs better than the two compared strategies. The reason is that the compared strategies self-adjust only according to the iterations of the algorithm and take no performance information into consideration, whereas our proposed strategy considers not only the iteration information but also the performance of the different update equations. Thus, our proposed strategy performs better.
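The stochastic processing-time generation described at the start of this section (uniform with δ = 0.2, Gaussian with α = p_ikj and β = round(0.5 · p_ikj), exponential with mean p_ikj) can be sketched as below; the function name and the clipping of negative Gaussian samples at zero are our assumptions.

```python
import random

def stochastic_time(p, dist, delta=0.2):
    """Sample one stochastic processing time from a deterministic time
    p, using the parameter settings described in the text."""
    if dist == "uniform":        # U[(1 - delta) p, (1 + delta) p]
        return random.uniform((1 - delta) * p, (1 + delta) * p)
    if dist == "gaussian":       # N(p, round(0.5 p)); clipped at 0,
        return max(0.0, random.gauss(p, round(0.5 * p)))  # an assumption
    if dist == "exponential":    # mean p, i.e., rate 1 / p
        return random.expovariate(1.0 / p)
    raise ValueError("unknown distribution: " + dist)
```

Applying this sampler to every deterministic time of an instance such as Mk01 yields one scenario of UMk01, GMk01 or EMk01, depending on the chosen distribution.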

Effect of the Decomposition Strategy
In order to verify the effectiveness of the MRF-based decomposition strategy, we compare it with two effective, lower-computational-cost decomposition strategies, i.e., the fixed decomposition strategy and the set-based decomposition strategy. We name hCEA-MRF with the fixed and set-based decomposition strategies hCEA-MRF(F) and hCEA-MRF(S), respectively. The experimental results for the instances with the three probability distributions are recorded in Tables 10-12, respectively. We can see that our proposed decomposition strategy performs better than the two compared strategies. This is because the MRF-based decomposition strategy takes the correlations among variables into consideration instead of relying on a random technique. Meanwhile, the MRF-based decomposition strategy has a lower computational cost, as discussed in the previous section.
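The MRF structure learning itself is beyond a short sketch; as a simplified, hypothetical stand-in for the idea, the code below groups decision variables whose sampled values are strongly correlated across a population of candidate solutions, in contrast to purely random grouping. The correlation threshold and the union-find grouping are our assumptions for illustration only.

```python
import itertools
import statistics

def correlation_groups(samples, threshold=0.5):
    """Group variable indices whose pairwise |Pearson correlation|,
    estimated from `samples` (rows = candidate solutions), exceeds
    `threshold`. A simplified stand-in for MRF-based decomposition:
    learned MRF edges are replaced here by a correlation test.
    """
    n = len(samples[0])
    cols = list(zip(*samples))  # one tuple of values per variable

    def corr(a, b):
        ma, mb = statistics.fmean(a), statistics.fmean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5

    # union-find over edges whose correlation is strong enough
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(n), 2):
        if abs(corr(cols[i], cols[j])) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Each returned group would then be optimized by one sub-population of the cooperative co-evolution, while the remaining variables are held fixed.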

Conclusions
Flexible job shop scheduling is an important research issue and is widely applied in many real-world applications. However, the stochastic factors are usually ignored. In this paper, we modeled the stochastic processing times with three probability distributions. We proposed a hybrid cooperative co-evolution algorithm with a Markov random field (MRF)-based decomposition strategy (hCEA-MRF) for solving the stochastic flexible scheduling problem with the objective of minimizing the expectation and the standard variance of the makespan. Various simulation experiments were conducted, and the results demonstrate the validity of hCEA-MRF.

The sorted fractional part of each operation is used for decoding the representation of a candidate problem solution and provides the priority set of operations: {0.21, 0.16, 0.77, 0.15, 0.42, 0.09, 0.18, 0.66}. Considering the priority-based decoding process, sorting the fractional parts provides the operation sequence (OS): {O11, O31, O32, O12, O13, O21, O22, O23}. The integer part is in charge of the machine assignment (MA) for the corresponding operation and provides MA: {M3, M2, M2, M1, M3, M3, M2, M4}. The machine assignment is illustrated in Figure 2. The makespan is 10, and the Gantt chart is shown in Figure 3. We adopt the random-key based representation because it can represent a larger solution space and ensures that each decoded solution is feasible. These two points guarantee the search efficiency and the probability of finding an optimal solution.

Algorithm 1: The procedure of hCEA-MRF.
Require: problem data, problem model, parameters
Ensure: g_best
1: Initialize the population randomly;
2: while (τ < T) do
3:   if τ = 1 or Regrouping is true then
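The random-key decoding described above can be sketched as follows. Since the exact position-to-operation mapping of the paper's example depends on Figure 1, the sketch below uses its own small vector; the convention that smaller fractional parts get higher priority and that the integer part (modulo the machine count) selects the machine is our assumption.

```python
def decode(keys, ops, n_machines):
    """Decode a random-key vector: position i of `keys` corresponds to
    operation ops[i]. The fractional part gives the priority (smaller
    sorts earlier) and the integer part picks a machine. A sketch; the
    paper's exact encoding may differ in detail.
    """
    frac = [k - int(k) for k in keys]
    order = sorted(range(len(keys)), key=lambda i: frac[i])
    os_seq = [ops[i] for i in order]            # operation sequence (OS)
    ma = [int(k) % n_machines for k in keys]    # machine assignment (MA)
    return os_seq, ma
```

Because every permutation of priorities and every machine index yields a schedule, each decoded solution is feasible by construction, which is the property the paper highlights.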

Figure 1. Illustration of random-key based representation.

Figure 2. Illustration of the machine assignment process.

Figure 3. Gantt chart of the illustrated solution for the instance in Table 3.

Figure 4. Illustration of the network structure of MRF.

Table 1. A summary of the fuzzy models, objectives and methodologies.

Table 2. A summary of the definitions, advantages and disadvantages of the existing decomposition strategies.

Table 3. An instance of flexible job shop scheduling.

Table 4. Performance compared with the state-of-the-art on the instances generated by the uniform distribution.

Table 5. Performance compared with the state-of-the-art on the instances generated by the Gaussian distribution.

Table 6. Performance compared with the state-of-the-art on the instances generated by the exponential distribution.

Table 7. Performance of different self-adaptive parameter strategies on the instances generated by the uniform distribution.

Table 8. Performance of different self-adaptive parameter strategies on the instances generated by the Gaussian distribution.

Table 9. Performance of different self-adaptive parameter strategies on the instances generated by the exponential distribution.

Table 10. Performance of different decomposition strategies on the instances generated by the uniform distribution.

Table 11. Performance of different decomposition strategies on the instances generated by the Gaussian distribution.

Table 12. Performance of different decomposition strategies on the instances generated by the exponential distribution.