Article

A Dual-Population Genetic Algorithm with Q-Learning for Multi-Objective Distributed Hybrid Flow Shop Scheduling Problem

1 School of Mechanical Engineering, Anhui Institute of Information Technology, Wuhu 241000, China
2 School of Mechanical Engineering, Anhui Polytechnic University, Wuhu 241000, China
3 Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University, Wuhu 241000, China
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(4), 836; https://doi.org/10.3390/sym15040836
Submission received: 9 March 2023 / Revised: 27 March 2023 / Accepted: 28 March 2023 / Published: 30 March 2023
(This article belongs to the Special Issue Meta-Heuristics for Manufacturing Systems Optimization Ⅱ)

Abstract

In real-world production processes, the same enterprise often operates multiple factories, or a single factory runs multiple production lines, and multiple objectives must be considered during production. A dual-population genetic algorithm with Q-learning is proposed to minimize the maximum completion time and the number of tardy jobs for distributed hybrid flow shop scheduling problems, which exhibit symmetries in their machine configurations. Multiple crossover and mutation operators are proposed, and only one search strategy combination, consisting of one crossover operator and one mutation operator, is selected in each iteration. A population assessment method is provided to evaluate the evolutionary state of the population at the initial state and after each iteration. The two populations adopt different search strategies: the best search strategy is selected for the first population, while the search strategy of the second population is selected under the guidance of Q-learning. Experimental results show that the dual-population genetic algorithm with Q-learning is competitive for solving multi-objective distributed hybrid flow shop scheduling problems.

1. Introduction

The hybrid flow shop scheduling problem (HFSP) is a complex combinatorial optimization problem that has been extensively studied due to its practical relevance in a wide range of manufacturing and production environments [1,2]. In HFSP, the production process involves multiple stages or workstations, and each stage may have multiple machines that can process different types of jobs. The problem is to determine the processing sequence of jobs and the allocation of machines to each stage to minimize a specific objective function [3].
The distributed shop scheduling problem (DSSP) is a complex optimization problem that has gained significant attention in the literature due to its relevance to a wide range of manufacturing and production environments [4,5,6]. In DSSP, there are multiple production shops, each with its set of machines and jobs to be scheduled. The objective is to determine an optimal schedule for all the jobs and machines in the shops, subject to various constraints, such as machine availability, processing times, and due dates. DSSP has been studied extensively in the literature, and various algorithms have been proposed to solve this problem. In recent years, there has been a growing interest in developing efficient and effective algorithms to address DSSP with additional constraints such as uncertain processing times [7], energy consumption [8], and the presence of multiple objectives [9].
The distributed hybrid flow shop scheduling problem (DHFSP) is an extension of HFSP that has attracted significant research attention due to its importance in production environments [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. The objective of DHFSP is to minimize the maximum completion time across all production processes to enhance production efficiency. Several studies have addressed this problem using different approaches. Reference [26] proposed a mixed-integer linear programming formulation and a self-tuning iterated greedy (SIG) algorithm with an adaptive cocktail decoding mechanism to solve DHFSP with multiprocessor tasks. Reference [18] proposed a dynamic shuffled frog-leaping algorithm for the same problem and provided a lower bound. References [32,33,34] developed hybrid brain storm optimization algorithms. References [10,16,22] designed several kinds of artificial bee colony algorithms. Reference [11] proposed a bi-population cooperative memetic algorithm. Reference [30] presented a novel shuffled frog-leaping algorithm with reinforcement learning for DHFSP with assembly. Reference [29] formulated three novel mixed-integer linear programming models and a constraint programming model for DHFSP with sequence-dependent setup times. Reference [24] is the only single-objective DHFSP study to consider minimizing the sum of earliness, tardiness, and delivery costs; its authors proposed an adaptive human-learning-based genetic algorithm. In summary, DHFSP is a significant problem in production environments, and several studies have addressed it with optimization techniques including mixed-integer linear programming, the shuffled frog-leaping algorithm, hybrid brain storm optimization, artificial bee colony algorithms, and bi-population cooperative memetic algorithms, among others.
The proposed algorithms have shown promising results, but further research is needed to develop more effective and efficient algorithms for more complex DHFSP variants.

2. Logical Scheme to DHFSP

Recent research has explored the use of Q-learning, a type of reinforcement learning, in combination with intelligent optimization algorithms to improve algorithm performance [35,36]. Reference [37] demonstrated the use of Q-learning to adjust key parameters of a genetic algorithm, while Reference [38] used Q-learning to select a search operator dynamically. Reference [30] embedded Q-learning in a memeplex search strategy to select a search strategy dynamically. However, current research has only applied this approach to solve single-objective shop scheduling problems. To apply Q-learning and intelligent optimization algorithms to solve multi-objective optimization problems, further exploration is needed to design appropriate actions and evaluate populations. Therefore, there is a need for future research to investigate the combination of Q-learning and intelligent optimization algorithms for multi-objective optimization problems. Such research could contribute to developing effective algorithms for complex optimization problems in diverse fields. Results from such studies would be of interest to researchers and practitioners working in the area of intelligent optimization.
Optimization problems, such as the distributed hybrid flow shop scheduling problem, are often solved using intelligent optimization algorithms. Among these, genetic algorithms have been widely applied due to their effectiveness in solving complex optimization problems. Recently, the combination of intelligent optimization algorithms with reinforcement learning algorithms, such as Q-learning, has received attention as a means to improve algorithm performance. In this paper, we propose a dual-population genetic algorithm with Q-learning to minimize the maximum completion time and the number of tardy jobs for the distributed hybrid flow shop scheduling problem. The algorithm combines genetic algorithm (GA) with Q-learning to guide the selection of crossover and mutation operators. Multiple crossover and mutation operators are proposed, and only one search strategy combination is selected in each iteration. To evaluate the evolutionary state of the population, we provide a population assessment method at the initial state and after each iteration. Two populations adopt different search strategies, in which the best search strategy is selected for one population and the search strategy of the other population is selected under the guidance of Q-learning. The approach proposed in this paper could be applied to other optimization problems and could be of interest to researchers and practitioners working in the field of intelligent optimization. Figure 1 gives the logical scheme to DHFSP.
The remainder of the paper is organized as follows. The distributed hybrid flow shop scheduling problem is described in Section 3, and an introduction to GA and Q-learning is provided in Section 4. Section 5 presents the dual-population genetic algorithm with Q-learning, and numerical experiments on the performance of the algorithm are reported in Section 6. Section 7 concludes the paper.

3. Description of Distributed Hybrid Flow Shop Scheduling Problem

DHFSP is described as follows. There are $n$ jobs, $J_1, J_2, \dots, J_n$, to be processed in $F$ factories, each of which is a hybrid flow shop. Each job passes through $S$ stages and must be processed on one machine at each stage; the machines at a given stage are identical in performance and function. At stage $s$ of factory $f$ there are $m_{fs}$ identical parallel machines, $M_{fs1}, M_{fs2}, \dots, M_{fsm_{fs}}$. The processing time of $J_i$ at stage $s$ is $p_{is}$, and $d_i$ is the due date of $J_i$. The machine distribution across the factories is symmetric.
The goal of DHFSP is to minimize the maximum completion time and the number of tardy jobs simultaneously through reasonable scheduling.
$$C_{\max} = \max_{i \in \{1, 2, \dots, n\}} \{ C_i \}$$
$$\bar{U} = \sum_{i=1}^{n} W_i$$
where $C_{\max}$ is the maximum completion time over all jobs, $C_i$ is the completion time of $J_i$, $\bar{U}$ is the number of tardy jobs, and $W_i$ indicates whether $J_i$ is delivered on time: $W_i = 1$ if $C_i > d_i$, and $W_i = 0$ if $C_i \le d_i$.
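The two objectives can be computed directly from the job completion times. The following Python sketch illustrates this; the function and variable names are illustrative, not from the paper:

```python
def makespan_and_tardy(completion_times, due_dates):
    """Return (C_max, U_bar): the maximum completion time and the
    number of jobs finishing after their due dates."""
    c_max = max(completion_times)  # C_max = max_i C_i
    tardy = sum(1 for c, d in zip(completion_times, due_dates) if c > d)  # sum of W_i
    return c_max, tardy
```

For example, completion times [10, 25, 30] with due dates [15, 20, 35] give a makespan of 30 and one tardy job.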
DHFSP comprises three sub-problems: factory assignment, machine assignment, and processing sequence assignment. An acceptable solution can be obtained only when all three assignments are reasonable.
Table 1 describes an illustrative example of DHFSP with 20 jobs, 2 stages, and 2 factories. For example, the processing time of $J_1$ in the first stage is 39. There are four parallel machines in the first stage and five parallel machines in the second stage. Figure 2 shows a schedule of the example described in Table 1. As can be seen from Figure 2, job $J_1$ is processed on machine $M_{2,1,2}$, the second machine in the first stage of factory 2, and its processing time is $p_{11} = 39$.

4. Introduction to GA and Q-Learning

The main steps of GA are as follows: initialization, evaluation, selection, crossover, and mutation. (1) Initialization: the initial population contains many candidate solutions, usually generated using random numbers. (2) Evaluation: evaluate the fitness function value for each individual. (3) Selection: select individuals based on their fitness function values to be preserved in later evolution. (4) Crossover: perform crossover between two selected individuals to generate new offspring. (5) Mutation: randomly select some individuals and mutate them to increase diversity in the population. (6) Repeat steps (2)–(5) until a stopping condition is met. (7) Output the final best solution.
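The steps above can be sketched as a minimal GA skeleton in Python. Binary tournament selection stands in for the selection scheme, the operators are supplied as functions, and all names are illustrative rather than the paper's implementation:

```python
import random

def genetic_algorithm(pop, fitness, crossover, mutate, generations=100, pm=0.05, seed=1):
    """Minimal GA skeleton following steps (1)-(7), minimizing `fitness`."""
    rng = random.Random(seed)
    best = min(pop, key=fitness)                        # best-so-far
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]   # (2) evaluation
        def select():                                   # (3) binary tournament selection
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa <= fb else b
        children = []
        while len(children) < len(pop):
            child = crossover(select(), select(), rng)  # (4) crossover
            if rng.random() < pm:
                child = mutate(child, rng)              # (5) mutation
            children.append(child)
        pop = children                                  # (6) next generation
        best = min(pop + [best], key=fitness)
    return best                                         # (7) best solution found

# Illustrative operators for a toy bit-string minimization problem.
def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_bit(ind, rng):
    ind = list(ind)
    ind[rng.randrange(len(ind))] ^= 1
    return ind
```

With these toy operators, a small population quickly drives the bit-string sum toward zero.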
The main steps of Q-learning are as follows: (1) Initialize the Q-table: First, a table that stores Q values (Q-table) is created, with each element representing the Q value of a state–action pair. (2) Select an action: Under the current state, select an action, either randomly or based on the current Q values (for example, the action with the highest Q value). (3) Perform the action: Perform the selected action, thereby updating the state of the environment. (4) Calculate the reward: Based on the new state, calculate the reward obtained. (5) Update the Q value: Use the Q-learning formula to update the Q value of the current state–action pair. (6) Repeat steps (2)–(5) until the environment reaches the terminal state or the maximum number of steps is reached. (7) Output the result: Output the learned optimal strategy, that is, the action with the maximum Q value in each row of the Q-table. Table 2 gives an example of a Q-table, which is updated as the algorithm runs.
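The update in step (5) follows the standard Q-learning rule with learning rate $\alpha$ and discount factor $\gamma$. A generic sketch (not the paper's exact code) might look like this:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a table indexed as Q[state][action]."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    return Q[s][a]
```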

5. A Dual-Population Genetic Algorithm with Q-Learning

5.1. Coding and Decoding

DHFSP encompasses three sub-problems: factory assignment, machine assignment, and processing sequence assignment. A reasonable coding method is the foundation for solving this problem using intelligent optimization algorithms. A double-string coding method is adopted to encode the solution of the problem, where the encoding string contains machine assignment and processing sequence assignment.
An encoding of DHFSP is represented as a factory string $[\alpha_1, \alpha_2, \dots, \alpha_n]$ and a sequence string $[\beta_1, \beta_2, \dots, \beta_n]$, where $\alpha_i \in \{1, 2, \dots, F\}$, $\beta_i \in \{1, 2, \dots, n\}$, and $\beta_i \ne \beta_j$ for all $i \ne j$. The decoding procedure is as follows. All jobs are assigned to factories according to the factory string, with $J_i$ assigned to factory $\alpha_i$. In each factory, the order in which jobs are processed is determined by the sequence string: for two jobs $J_{\beta_i}$ and $J_{\beta_j}$ with $i < j$, $J_{\beta_i}$ is given priority and assigned to machines first. When selecting a machine for a job, the set of eligible machines is first determined, and then the machine that minimizes the completion time of the job is chosen.
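The decoding rule can be sketched in Python as a greedy simulator. This is a simplified sketch, assuming 0-based job indices, a job-by-job (rather than operation-by-operation) dispatch order, and identical machine layouts in every factory; all names are illustrative:

```python
def decode(factory_str, seq_str, proc_times, machines_per_stage):
    """Greedy decoding sketch. factory_str[i] is the (1-based) factory of
    job i; seq_str is the processing order; proc_times[i][s] is job i's
    processing time at stage s."""
    n_stages = len(machines_per_stage)
    n_factories = max(factory_str)
    # machine_free[f][s][l]: time when machine l at stage s of factory f is free
    machine_free = [[[0] * machines_per_stage[s] for s in range(n_stages)]
                    for _ in range(n_factories)]
    completion = [0] * len(seq_str)
    for job in seq_str:
        f = factory_str[job] - 1
        ready = 0
        for s in range(n_stages):
            # pick the machine giving the earliest start (hence completion) time
            l = min(range(machines_per_stage[s]),
                    key=lambda m: max(machine_free[f][s][m], ready))
            start = max(machine_free[f][s][l], ready)
            machine_free[f][s][l] = start + proc_times[job][s]
            ready = machine_free[f][s][l]
        completion[job] = ready
    return completion
```

For instance, two jobs with stage times 3 and 4 on a single machine in one factory finish at times 3 and 7, while in separate factories they finish at 3 and 4.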
To further explain the decoding process, consider the example of Section 3. A solution consists of a factory string [2,2,1,1,2,1,1,2,2,1,2,2,1,1,2,1,1,2,2,1] and a sequence string [2,19,8,20,13,4,12,7,17,1,14,18,3,9,15,6,10,5,11,16]. Based on the factory string, jobs $J_3$, $J_4$, $J_6$, $J_7$, $J_{10}$, $J_{13}$, $J_{14}$, $J_{16}$, $J_{17}$, and $J_{20}$ are assigned to factory 1, and their processing order, determined by the sequence string, is $J_{20}$, $J_{13}$, $J_4$, $J_7$, $J_{17}$, $J_{14}$, $J_3$, $J_6$, $J_{10}$, and $J_{16}$. The resulting schedule is shown as a Gantt chart in Figure 2. The final value of $C_{\max}$ is 296 and the number of tardy jobs is 13.

5.2. Crossover Operator

The crossover operator is a way of generating new solutions in genetic algorithms. Its purpose is to combine the genetic information of two parents to produce a new individual, with the new solution containing some features of both parents.
A crossover operator $CO(x, y)$ is designed for DHFSP, where x and y denote two parents. The process of $CO(x, y)$ is as follows: (1) randomly select a set A of jobs; (2) for the factory string, swap the values of the elements corresponding to the jobs in A between x and y; (3) for the sequence string, rearrange the order of the jobs in A within x to match their order in y. Figure 3 illustrates this process; the two offspring differ markedly from their parents, indicating a large search step during global search.
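Steps (1)–(3) can be sketched as follows. This is a simplified Python sketch producing one offspring from x's perspective (the second offspring is obtained by swapping the roles of x and y); 0-based job indices and all names are implementation assumptions:

```python
import random

def co_crossover(x, y, rng=None):
    """Sketch of CO(x, y). x and y are (factory_string, sequence_string) pairs."""
    rng = rng or random.Random(0)
    fx, sx = list(x[0]), list(x[1])
    fy, sy = y
    n = len(fx)
    A = set(rng.sample(range(n), rng.randint(1, n)))   # (1) random job set A
    for j in A:                                        # (2) take y's factory values for A
        fx[j] = fy[j]
    order_in_y = iter(j for j in sy if j in A)         # (3) reorder A-jobs as in y
    sx = [j if j not in A else next(order_in_y) for j in sx]
    return fx, sx
```

The offspring's sequence string remains a valid permutation, since only the relative order of the jobs in A changes.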

5.3. Mutation Operator

Mutation is a common operator used in evolutionary algorithms to generate new individuals in a population, increasing the diversity of the algorithm. Mutation is typically performed by introducing random changes to the gene values of an individual at some gene locus, either by randomizing or by applying some transformation rule. Mutation is usually applied to an individual in the current population to generate a new individual with slight variations. Unlike crossover, mutation does not require the pairing of individuals and can explore the search space independently. Mutation is often used in combination with crossover to increase the diversity of the population and facilitate better exploration of the search space.
Two mutation operators, MO1 based on insertion and MO2 based on exchange, are designed to generate new solutions by changing the factory string and the sequence string. In MO1, two positions $pos_1 < pos_2$ are randomly selected; the job $\pi_{pos_2}$ is inserted at position $pos_1$ in the sequence string, and its factory value is set to that of $\pi_{pos_1}$, i.e., $\theta_{\pi_{pos_2}} = \theta_{\pi_{pos_1}}$. In MO2, two positions $pos_1 < pos_2$ are randomly chosen; the values of $\pi_{pos_1}$ and $\pi_{pos_2}$ are swapped in the sequence string, and the factory values of the two jobs are swapped in the factory string. The processes of generating new solutions with MO1 and MO2 are illustrated in Figure 4 and Figure 5.
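The two operators can be sketched in Python as follows (0-based indices; the function names and the random-source handling are implementation assumptions, not the paper's code):

```python
import random

def mo1_insert(factory, seq, rng=None):
    """MO1 sketch: pick positions pos1 < pos2, move the job at pos2 to pos1
    in the sequence string, and copy the factory of the job at pos1 onto it."""
    rng = rng or random.Random(0)
    f, s = list(factory), list(seq)
    pos1, pos2 = sorted(rng.sample(range(len(s)), 2))
    job_moved, job_ref = s[pos2], s[pos1]
    s.insert(pos1, s.pop(pos2))
    f[job_moved] = f[job_ref]
    return f, s

def mo2_swap(factory, seq, rng=None):
    """MO2 sketch: pick positions pos1 < pos2, swap the two sequence entries,
    and swap the factory values of the two jobs involved."""
    rng = rng or random.Random(0)
    f, s = list(factory), list(seq)
    pos1, pos2 = sorted(rng.sample(range(len(s)), 2))
    s[pos1], s[pos2] = s[pos2], s[pos1]
    f[s[pos1]], f[s[pos2]] = f[s[pos2]], f[s[pos1]]
    return f, s
```

Both operators keep the sequence string a valid permutation; MO2 additionally preserves the multiset of factory values.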

5.4. Q-Learning Process

The Q-learning algorithm consists of a state set (with state $s_t$ at step t), an action set (with action $a_t$), a reward function, and an action selection strategy. The environment state is determined by population evaluation, while actions correspond to the various search strategies. A novel reward function is designed, and the action selection strategy adopts the $\varepsilon$-greedy strategy.
A new method is provided for evaluating the state of the population. The evaluation value of the population in generation t is calculated as
$$E_t = \frac{\bar{N}_t}{N}$$
where $\bar{N}_t$ is the number of solutions in the t-th generation population that are not dominated by any solution in the (t−1)-th generation population, $N$ is the population size, and $E_t$ is the evaluation value of the population in generation t.
The evaluation value $E_t$ of the t-th generation is bounded by $[E_1, E_2]$ with $E_1 = 0$ and $E_2 = 2$. The state set $S = \{1, 2, \dots, 10\}$ is defined by dividing the interval $[E_1, E_2]$ into 10 equal parts: the state value $s_t = k$ if $E_t \in [0.2(k-1),\, 0.2k)$ for $1 \le k \le 10$, and $s_t = 10$ if $E_t \ge 2$.
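The binning rule above amounts to a one-line mapping; a sketch (function name assumed):

```python
def population_state(E_t):
    """Map the evaluation value E_t in [0, 2] to a state in {1, ..., 10}
    by splitting the interval into 10 bins of width 0.2. (Values exactly on
    a bin boundary may shift by one bin due to floating-point division.)"""
    if E_t >= 2:
        return 10
    return int(E_t / 0.2) + 1
```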
Two variants of C O have been created, namely C O 1 and C O 2 . Each of these variants modifies only one string, with C O 1 changing the factory string and C O 2 altering the sequence string.
The set of actions, denoted by A, has been constructed by combining various crossover operators and mutation operators. Specifically, there are two crossover operators and two mutation operators, resulting in a total of four possible combinations. Table 3 illustrates the corresponding relationships between these combinations and the resulting actions in the set A.
The reward is defined as $r_{t+1} = s_{t+1} - s_t$, the change in the state value, where a larger state value indicates that the population has moved closer to the Pareto front in the solution space. If taking action $a_t$ in state $s_t$ improves the population's state, a positive reward is received; otherwise, the action is penalized.
To select an action, the $\varepsilon$-greedy method is employed. Typically, all entries of the initial Q-table are initialized to 0.
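The $\varepsilon$-greedy rule over one row of the Q-table can be sketched as follows (names are illustrative):

```python
import random

def epsilon_greedy(q_row, epsilon=0.2, rng=None):
    """ε-greedy choice over one Q-table row: explore a uniformly random
    action with probability ε, otherwise exploit the highest-Q action."""
    rng = rng or random.Random(0)
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With $\varepsilon = 0$ the rule degenerates to pure exploitation of the current best action.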

5.5. Dual-Population Genetic Algorithm with Q-Learning

QDGA is an algorithm that combines GA and Q-learning; Q-learning is used during the crossover and mutation processes to choose suitable crossover and mutation operators. The main steps of QDGA include population and Q-table initialization, selection, crossover and mutation guided by Q-learning, Q-table updating, information exchange between the two populations, and output of results.
In the initialization, the two populations are randomly generated and the Q-table is initialized to 0. The two populations perform crossover, mutation, and selection independently. Two parent solutions are chosen from the parent population by roulette-wheel selection, and the crossover and mutation operators corresponding to the chosen action are then selected according to the Q-table. The Q-table is updated according to the performance of the action in the current state. Next, non-dominated sorting is performed on all individuals in both populations, and a randomly chosen portion of individuals, amounting to 50% of each population, is exchanged between the two populations. The result is output when the termination condition is satisfied.
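The information-exchange step can be sketched as follows. This is a minimal Python sketch of swapping a random half of the individuals between the two populations; the function name and position-wise swap are assumptions:

```python
import random

def exchange_half(pop1, pop2, rng=None):
    """Randomly pick half of the positions and swap the individuals
    held there between the two populations (in place)."""
    rng = rng or random.Random(0)
    idx = rng.sample(range(len(pop1)), len(pop1) // 2)
    for i in idx:
        pop1[i], pop2[i] = pop2[i], pop1[i]
    return pop1, pop2
```

The total pool of individuals is preserved; only their distribution across the two populations changes.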

6. Computation Experiments

All experiments are programmed in C++ with Visual Studio 2022 and run on a computer with a 12th Gen Intel(R) Core(TM) i7-12700H CPU at 2.70 GHz and 16.0 GB of RAM.

6.1. Instances

To evaluate the effectiveness of the algorithm proposed in this study, 140 instances have been made available for download. These instances comprise varying numbers of jobs, factories, and stages, and can be obtained from https://gitee.com/caijingcao/DHFSP002 (30 March 2022). Each instance is represented by the notation n × F × S , where n denotes the number of jobs, F denotes the number of factories, and S denotes the number of stages.

6.2. Evaluation Metrics

Metric $\rho$ is the proportion of the reference solution set $\Omega$ that an algorithm A can provide. It is calculated as
$$\rho = \frac{|\{x \in \Omega_A : x \in \Omega\}|}{|\Omega|}$$
where $\Omega_A$ is the set of non-dominated solutions obtained by algorithm A, $\Omega$ is the reference solution set, and $|\Omega|$ is the number of solutions in $\Omega$.
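Computed on solutions represented as hashable objective tuples, the metric reduces to a set intersection; a sketch with assumed names:

```python
def rho(omega_a, omega):
    """Fraction of the reference set omega contributed by algorithm A."""
    return len(set(omega_a) & set(omega)) / len(omega)
```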
Metric $\mathcal{C}$ is applied to compare the approximate Pareto optimal sets obtained by two algorithms: $\mathcal{C}(L, B)$ measures the fraction of members of B that are dominated by members of L,
$$\mathcal{C}(L, B) = \frac{|\{b \in B : \exists\, h \in L,\ h \succ b\}|}{|B|}$$
where $h \succ b$ indicates that h dominates b and $|B|$ is the number of solutions in B. A Pareto optimal set is a set of solutions of a multi-objective optimization problem in which further improving one objective would degrade at least one other objective.
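For minimization objectives, dominance and the coverage metric can be sketched as:

```python
def dominates(h, b):
    """h dominates b (minimization): no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(h, b)) and any(x < y for x, y in zip(h, b))

def coverage(L, B):
    """C(L, B): fraction of members of B dominated by at least one member of L."""
    return sum(1 for b in B if any(dominates(h, b) for h in L)) / len(B)
```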
Metric $IGD$ (inverted generational distance) is a comprehensive performance indicator for evaluating algorithms: the smaller the value of $IGD$, the better the overall performance of algorithm A. It is calculated as
$$IGD(\Omega_A, \Omega) = \frac{1}{|\Omega|} \sum_{x \in \Omega} \min_{y \in \Omega_A} d(x, y)$$
where $d(x, y)$ is the Euclidean distance between solutions x and y in the normalized objective space.
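A direct transcription of the formula, assuming the objective vectors are already normalized (names illustrative):

```python
import math

def igd(omega_a, omega):
    """Inverted generational distance: mean Euclidean distance from each
    reference solution to its nearest solution found by the algorithm."""
    def dist(x, y):
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return sum(min(dist(x, y) for y in omega_a) for x in omega) / len(omega)
```

An algorithm whose solution set contains the whole reference set attains $IGD = 0$.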

6.3. Comparison Algorithms

To assess the effectiveness of QDGA, two widely recognized multi-objective evolutionary algorithms were selected for comparison: the non-dominated sorting genetic algorithm II (NSGA-II [39]) and the strength Pareto evolutionary algorithm 2 (SPEA2 [40]). These algorithms were chosen for their demonstrated ability to solve complex multi-objective optimization problems, and comparing QDGA against them clarifies its strengths and weaknesses. In addition, to assess the impact of Q-learning, a variant of QDGA in which the crossover and mutation operators are selected randomly was implemented; this variant is referred to as GA. It serves as a control group against which the original QDGA can be compared, so the results can be used to evaluate the effectiveness of Q-learning within the algorithm.

6.4. Parameter Settings

QDGA relies on $N$, $P_c$, $P_m$, $\alpha$, $\gamma$, $\varepsilon$, and a stopping condition. Although longer runs could potentially yield better outcomes, experiments demonstrated that QDGA and its comparison algorithms tend to converge, or improve only marginally, after running for $0.1 \times n \times S$ seconds. To ensure a fair comparison, $0.1 \times n \times S$ seconds is chosen as the stopping condition for all algorithms, consistent with similar studies [30].
The Taguchi method [31] is a powerful statistical approach to optimize the performance of a product or process by identifying the best combination of controllable factors or parameters. This method has been widely used in various fields including engineering, manufacturing, and healthcare. The Taguchi method was utilized to determine the optimal settings for the other parameters, with several instances featuring varying numbers of jobs, factories, or stages used in parameter experiments.
By conducting Taguchi experiments on instances of different scales, a single set of parameters that performs best across all instances can be obtained. Based on the results, among the different parameter combinations for QDGA, the setting $N = 100$, $P_c = 0.8$, $P_m = 0.05$, $\alpha = 0.1$, $\gamma = 0.9$, and $\varepsilon = 0.2$ achieves the best performance, so these settings are adopted for QDGA in the further experiments. All parameters of NSGA-II, SPEA2, and GA are determined in the same way.

6.5. Results and Analyses

Each of the four algorithms was run 10 times on every instance, and the computational results are reported. Table 4, Table 5 and Table 6 display the performance metrics of the algorithms. The reference set $\Omega$ was created by aggregating the non-dominated solutions obtained by the four algorithms.
The computational results presented in Table 4 indicate that QDGA outperforms GA, NSGAII, and SPEA2 in most instances in terms of the metric $\rho$. Specifically, the metric $\rho$ of QDGA is higher than that of its comparative algorithms in 45 instances, and it equals 1 in 22 instances, which implies that all members of the reference set $\Omega$ are generated by QDGA in those instances. Moreover, the statistical results displayed in Figure 6 support the advantage of QDGA over the other algorithms. These findings suggest that QDGA is a more effective method for DHFSP.
Table 5 shows the computational results of the four algorithms on metric $\mathcal{C}$, where 'Ins', 'Q', 'G', 'N', and 'S' represent 'Instance', 'QDGA', 'GA', 'NSGAII', and 'SPEA2', respectively. The results demonstrate that in 41 instances the non-dominated solutions of QDGA are not dominated by any solution of GA, since $\mathcal{C}(G, Q)$ equals 0. $\mathcal{C}(Q, N)$ is larger than $\mathcal{C}(N, Q)$ in 58 instances, and $\mathcal{C}(Q, N)$ equals 1 in 38 instances, indicating that in those instances all solutions of NSGAII are dominated by the non-dominated solutions of QDGA. Compared with SPEA2, QDGA also has obvious advantages in terms of metric $\mathcal{C}$. The statistical results presented in Figure 7 illustrate the differences in $\mathcal{C}$ between QDGA and its three comparative algorithms. It can be concluded that QDGA generates better results than the comparative algorithms.
Based on the comprehensive analysis presented in Table 6, QDGA surpasses GA, NSGA-II, and SPEA2 with regard to convergence in the majority of instances examined. Specifically, QDGA attains a significantly lower $IGD$ than the three comparative algorithms in 50 instances, and a value of 0 in 22 instances, which indicates that in those instances all members of the reference set $\Omega$ are generated solely by QDGA. The statistical results reported in Figure 8 further validate the superior convergence performance of QDGA.
The performance of QDGA is superior, primarily due to the significant role played by Q-learning in the search process. The results presented in Table 4, Table 5 and Table 6 demonstrate that QDGA achieves outstanding performance. The statistical results shown in Figure 6, Figure 7 and Figure 8 indicate, with statistical significance, that QDGA outperforms the compared algorithms. Q-learning is capable of effectively determining which of the various designed crossover and mutation operators should be used to achieve better algorithm performance.

7. Conclusions

A dual-population genetic algorithm with Q-learning is proposed to address the distributed hybrid flow shop scheduling problem, which is common in real-world production processes where multiple objectives must be considered. QDGA could also be applied to other problems, such as multi-legged robot control, cooperative multi-agent motion, voltage control of power systems and distribution networks, and urban water resource management systems. The proposed algorithm employs multiple crossover and mutation operators and a population assessment method to evaluate the evolutionary state of the population, and it utilizes Q-learning to guide the search strategy of the second population. The experimental results demonstrate that the proposed algorithm outperforms the basic genetic algorithm. Therefore, the dual-population genetic algorithm with Q-learning can be considered an effective approach for distributed hybrid flow shop scheduling problems.
In the near future, we will continue our research on distributed scheduling problems. We aim to apply meta-heuristics such as the imperialist competitive algorithm to solve these problems. Additionally, we plan to address the problem with energy-related objectives and focus on developing solutions for energy-efficient HFSP. By incorporating energy-efficient objectives, we can create solutions that align with the current trend of sustainability in industrial manufacturing. Therefore, we anticipate that this area of research will continue to gain importance in the future, and our work will contribute to the development of sustainable and efficient manufacturing practices.

Author Contributions

Conceptualization, J.C.; methodology, J.C.; software, J.C.; validation, J.C.; formal analysis, J.C.; investigation, J.C.; resources, J.C.; data curation, J.Z.; writing—original draft preparation, J.C.; writing—review and editing, J.C.; visualization, J.Z.; supervision, J.Z.; project administration, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research Initiation Foundation of Anhui Polytechnic University (2022YQQ002), Anhui Polytechnic University Research Project (Xjky2022002), and the Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices (JCKJ2022B01).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qin, H.X.; Han, Y.Y.; Zhang, B.; Meng, L.L.; Liu, Y.P.; Pan, Q.K.; Gong, D.W. An improved iterated greedy algorithm for the energy-efficient blocking hybrid flow shop scheduling problem. Swarm Evol. Comput. 2022, 69, 100992.
  2. Qin, M.; Wang, R.; Shi, Z.; Liu, L.; Shi, L. A Genetic Programming-Based Scheduling Approach for Hybrid Flow Shop with a Batch Processor and Waiting Time Constraint. IEEE Trans. Autom. Sci. Eng. 2021, 18, 94–105.
  3. Meng, L.L.; Zhang, C.Y.; Shao, X.Y.; Zhang, B.; Ren, Y.P.; Lin, W.W. More MILP models for hybrid flow shop scheduling problem and its extended problems. Int. J. Prod. Res. 2020, 58, 3905–3930.
  4. Wang, G.; Li, X.; Gao, L.; Li, P. An effective multi-objective whale swarm algorithm for energy-efficient scheduling of distributed welding flow shop. Ann. Oper. Res. 2021, 310, 223–255.
  5. Zhao, F.Q.; Zhang, L.X.; Cao, J.; Tang, J.X. A cooperative water wave optimization algorithm with reinforcement learning for the distributed assembly no-idle flowshop scheduling problem. Comput. Ind. Eng. 2021, 153, 107082.
  6. Zhang, Z.Q.; Qian, B.; Hu, R.; Jin, H.P.; Wang, L. A matrix-cube-based estimation of distribution algorithm for the distributed assembly permutation flow-shop scheduling problem. Swarm Evol. Comput. 2021, 60, 116484.
  7. Yang, J.; Xu, H. Hybrid Memetic Algorithm to Solve Multiobjective Distributed Fuzzy Flexible Job Shop Scheduling Problem with Transfer. Processes 2022, 10, 1517.
  8. Shao, W.; Shao, Z.; Pi, D. A multi-neighborhood-based multi-objective memetic algorithm for the energy-efficient distributed flexible flow shop scheduling problem. Neural Comput. Appl. 2022, 34, 22303–22330.
  9. Meng, L.; Ren, Y.; Zhang, B.; Li, J.Q.; Sang, H.; Zhang, C. MILP Modeling and Optimization of Energy-Efficient Distributed Flexible Job Shop Scheduling Problem. IEEE Access 2020, 8, 191191–191203.
  10. Li, Y.; Li, F.; Pan, Q.K.; Gao, L.; Tasgetiren, M.F. An artificial bee colony algorithm for the distributed hybrid flowshop scheduling problem. Procedia Manuf. 2019, 39, 1158–1166.
  11. Wang, J.J.; Wang, L. A bi-population cooperative memetic algorithm for distributed hybrid flow-shop scheduling. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 5, 947–961.
  12. Cai, J.C.; Lei, D.M. A cooperated shuffled frog-leaping algorithm for distributed energy-efficient hybrid flow shop scheduling with fuzzy processing time. Complex Intell. Syst. 2021, 7, 2235–2253.
  13. Zheng, J.; Wang, L.; Wang, J.J. A cooperative coevolution algorithm for multi-objective fuzzy distributed hybrid flow shop. Knowl.-Based Syst. 2020, 194, 105536.
  14. Wang, J.J.; Wang, L. A cooperative memetic algorithm with learning-based agent for energy-aware distributed hybrid flow-shop scheduling. IEEE Trans. Evol. Comput. 2021, 26, 461–475.
  15. Jiang, E.D.; Wang, L.; Wang, J.J. Decomposition-based multi-objective optimization for energy-aware distributed hybrid flow shop scheduling with multiprocessor tasks. Tsinghua Sci. Technol. 2021, 26, 646–663.
  16. Li, Y.; Li, X.; Gao, L.; Zhang, B.; Pan, Q.K.; Tasgetiren, M.F.; Meng, L. A discrete artificial bee colony algorithm for distributed hybrid flowshop scheduling problem with sequence-dependent setup times. Int. J. Prod. Res. 2021, 59, 3880–3899.
  17. Lei, D.M.; Xi, B.J. Diversified teaching-learning-based optimization for fuzzy two-stage hybrid flow shop scheduling with setup time. J. Intell. Fuzzy Syst. 2021, 41, 4159–4173.
  18. Cai, J.C.; Zhou, R.; Lei, D.M. Dynamic shuffled frog-leaping algorithm for distributed hybrid flow shop scheduling with multiprocessor tasks. Eng. Appl. Artif. Intell. 2020, 90, 103540.
  19. Wang, L.; Li, D.D. Fuzzy distributed hybrid flow shop scheduling problem with heterogeneous factory and unrelated parallel machine: A shuffled frog leaping algorithm with collaboration of multiple search strategies. IEEE Access 2020, 8, 214209–214223.
  20. Cai, J.C.; Zhou, R.; Lei, D.M. Fuzzy distributed two-stage hybrid flow shop scheduling problem with setup time: Collaborative variable search. J. Intell. Fuzzy Syst. 2020, 38, 3189–3199.
  21. Dong, J.; Ye, C. Green scheduling of distributed two-stage reentrant hybrid flow shop considering distributed energy resources and energy storage system. Comput. Ind. Eng. 2022, 169, 108146.
  22. Li, Y.L.; Li, X.Y.; Gao, L.; Meng, L.L. An improved artificial bee colony algorithm for distributed heterogeneous hybrid flowshop scheduling problem with sequence-dependent setup times. Comput. Ind. Eng. 2020, 147, 106638.
  23. Li, J.Q.; Yu, H.; Chen, X.; Li, W.; Du, Y.; Han, Y.Y. An improved brain storm optimization algorithm for fuzzy distributed hybrid flowshop scheduling with setup time. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), New York, NY, USA, 8–12 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 275–276.
  24. Qin, H.; Li, T.; Teng, Y.; Wang, K. Integrated production and distribution scheduling in distributed hybrid flow shops. Memetic Comput. 2021, 13, 185–202.
  25. Li, J.; Chen, X.L.; Duan, P.; Mou, J.H. KMOEA: A knowledge-based multi-objective algorithm for distributed hybrid flow shop in a prefabricated system. IEEE Trans. Ind. Inform. 2021, 18, 5318–5329.
  26. Ying, K.C.; Lin, S.W. Minimizing makespan for the distributed hybrid flowshop scheduling problem with multiprocessor tasks. Expert Syst. Appl. 2018, 92, 132–141.
  27. Shao, W.S.; Shao, Z.S.; Pi, D.C. Modeling and multi-neighborhood iterated greedy algorithm for distributed hybrid flow shop scheduling problem. Knowl.-Based Syst. 2020, 194, 105527.
  28. Shao, W.S.; Shao, Z.S.; Pi, D.C. Multi-objective evolutionary algorithm based on multiple neighborhoods local search for multi-objective distributed hybrid flow shop scheduling problem. Expert Syst. Appl. 2021, 183, 115453.
  29. Meng, L.; Gao, K.; Ren, Y.; Zhang, B.; Sang, H.; Zhang, C. Novel MILP and CP models for distributed hybrid flowshop scheduling problem with sequence-dependent setup times. Swarm Evol. Comput. 2022, 71, 101058.
  30. Cai, J.; Lei, D.; Wang, J.; Wang, L. A novel shuffled frog-leaping algorithm with reinforcement learning for distributed assembly hybrid flow shop scheduling. Int. J. Prod. Res. 2023, 61, 1233–1251.
  31. Cai, J.; Lei, D.; Li, M. A shuffled frog-leaping algorithm with memeplex quality for bi-objective distributed scheduling in hybrid flow shop. Int. J. Prod. Res. 2020, 59, 5404–5421.
  32. Hao, J.H.; Li, J.Q.; Du, Y.; Song, M.X.; Duan, P.; Zhang, Y.Y. Solving distributed hybrid flowshop scheduling problems by a hybrid brain storm optimization algorithm. IEEE Access 2019, 7, 66879–66894.
  33. Lei, D.; Wang, T. Solving distributed two-stage hybrid flowshop scheduling using a shuffled frog-leaping algorithm with memeplex grouping. Eng. Optim. 2019, 52, 1461–1474.
  34. Li, J.Q.; Li, J.K.; Zhang, L.J.; Sang, H.Y.; Han, Y.Y.; Chen, Q.D. Solving type-2 fuzzy distributed hybrid flowshop scheduling using an improved brain storm optimization algorithm. Int. J. Fuzzy Syst. 2021, 23, 1194–1212.
  35. Atallah, M.J.; Blanton, M. Algorithms and Theory of Computation Handbook, Volume 2: Special Topics and Techniques; CRC Press: Boca Raton, FL, USA, 2009.
  36. Aggarwal, C.C. Neural Networks and Deep Learning: A Textbook; Springer: Berlin/Heidelberg, Germany, 2018.
  37. Chen, R.; Yang, B.; Li, S.; Wang, S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778.
  38. Wang, J.; Lei, D.; Cai, J. An adaptive artificial bee colony with reinforcement learning for distributed three-stage assembly scheduling with maintenance. Appl. Soft Comput. 2021, 117, 108371.
  39. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
  40. Zitzler, E.; Laumanns, M.; Thiele, L. SPEA2: Improving the Strength Pareto Evolutionary Algorithm for Multiobjective Optimization; TIK-Report 103; Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich: Zurich, Switzerland, 2001; pp. 1–20.
Figure 1. The logical scheme of the DHFSP.
Figure 2. Gantt chart of the case.
Figure 3. The process of global search.
Figure 4. The process of local search MO1.
Figure 5. The process of local search MO2.
Figure 6. The box plots of metric ρ for the four algorithms.
Figure 7. The box plots of metric C for the four algorithms.
Figure 8. The box plots of metric IGD for the four algorithms.
Table 1. An illustrative example.

Job i    1   2   3   4   5   6   7    8   9  10  11  12  13  14  15  16  17  18  19  20
p_i1    39  81  88  43  61  36  48  100  45  61  30  98  85  99  62  89  39  69  61  98
p_i2    53  41  81  41  85  98  47   45  84  42  92  75  69  74  52  86  38  80  45  92
Table 2. An example of a Q-table.

            Action 1  Action 2  Action 3  Action 4
State 1     0.029     0.783     0.069     0.684
State 2     0.320     0.381     0.577     0.732
State 3     0.548     0.486     0.006     0.316
State 4     0.636     0.416     0.615     0.630
State 5     0.451     0.005     0.735     0.469
State 6     0.429     0.639     0.647     0.512
State 7     0.742     0.209     0.386     0.442
State 8     0.583     0.691     0.347     0.209
State 9     0.674     0.181     0.164     0.346
State 10    0.104     0.811     0.135     0.224
Table 3. The corresponding relationship between action and search method.

Action    Search Method
1         CO1 + MO1
2         CO1 + MO2
3         CO2 + MO1
4         CO2 + MO2
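As a rough illustration of how a Q-table such as the one in Table 2 can drive the choice among the four crossover/mutation combinations in Table 3, the sketch below implements a standard ε-greedy action selection and the usual one-step Q-learning update. The values of ε, α, and γ are illustrative assumptions, not parameters reported by the paper.

```python
import random

# Hypothetical sketch of Q-table-guided strategy selection (cf. Tables 2 and 3).
# Hyperparameters (epsilon, alpha, gamma) are illustrative assumptions.
SEARCH_METHODS = {1: "CO1 + MO1", 2: "CO1 + MO2", 3: "CO2 + MO1", 4: "CO2 + MO2"}

def select_action(q_row, epsilon=0.1, rng=random):
    """Explore with probability epsilon; otherwise exploit the action
    with the largest Q-value in the current state's row."""
    if rng.random() < epsilon:
        return rng.choice(list(SEARCH_METHODS))
    return max(SEARCH_METHODS, key=lambda a: q_row[a - 1])

def q_update(q_row_s, action, reward, q_row_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update for entry (state, action)."""
    td_target = reward + gamma * max(q_row_next)
    q_row_s[action - 1] += alpha * (td_target - q_row_s[action - 1])

# State 1 of Table 2: action 2 (CO1 + MO2) holds the largest Q-value, so the
# greedy choice (epsilon = 0) selects search combination CO1 + MO2.
state1 = [0.029, 0.783, 0.069, 0.684]
```

With ε = 0 the selection is purely greedy, which matches reading the largest entry of each row in Table 2.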
Table 4. Computational results of four algorithms on metric ρ.

Instance  QDGA   GA     NSGA-II  SPEA2    Instance  QDGA   GA     NSGA-II  SPEA2
1         0.833  0.000  0.167    0.000    43        0.800  0.000  0.200    0.000
2         0.667  0.000  0.333    0.000    44        0.400  0.000  0.200    0.400
3         0.500  0.000  0.500    0.500    45        1.000  0.000  0.000    0.000
4         0.000  0.500  0.500    0.000    46        1.000  0.000  0.000    0.000
5         0.667  0.667  0.333    0.000    47        1.000  0.000  0.000    0.000
6         0.400  0.400  0.200    0.000    48        0.333  0.667  0.000    0.000
7         1.000  1.000  1.000    1.000    49        0.333  0.000  0.333    0.667
8         0.667  0.333  0.000    0.000    50        0.333  0.000  0.000    0.667
9         1.000  1.000  1.000    1.000    51        0.667  0.000  0.333    0.000
10        1.000  0.000  0.000    0.000    52        0.250  0.000  0.750    0.000
11        1.000  0.000  0.000    0.000    53        0.500  0.000  0.500    0.000
12        0.800  0.200  0.200    0.000    54        1.000  0.000  0.000    0.000
13        1.000  0.000  0.000    0.000    55        0.000  0.000  1.000    0.000
14        1.000  0.000  0.000    0.000    56        0.500  0.000  0.000    0.500
15        0.500  0.000  0.000    0.500    57        0.667  0.333  0.000    0.000
16        0.500  0.250  0.250    0.000    58        1.000  0.000  0.000    0.000
17        0.667  0.333  0.000    0.000    59        0.667  0.000  0.000    0.333
18        0.000  0.333  0.333    0.333    60        1.000  0.000  0.000    0.000
19        0.000  0.000  1.000    0.000    61        0.000  0.000  0.000    1.000
20        0.000  0.333  0.000    0.667    62        0.000  0.000  0.500    0.500
21        0.167  0.333  0.500    0.000    63        0.250  0.250  0.000    0.500
22        0.429  0.429  0.000    0.143    64        0.000  0.000  0.000    1.000
23        0.250  0.750  0.000    0.250    65        0.333  0.333  0.333    0.000
24        1.000  0.000  0.000    0.000    66        0.000  0.000  1.000    0.000
25        0.500  0.500  0.000    0.000    67        0.000  1.000  0.000    0.000
26        1.000  0.000  0.000    0.000    68        0.000  0.000  0.500    0.500
27        0.500  0.000  0.000    0.500    69        0.000  0.000  0.200    0.800
28        0.000  0.000  0.000    1.000    70        0.000  0.250  0.750    0.000
29        1.000  0.000  0.000    0.000    71        0.667  0.000  0.000    0.667
30        0.500  0.500  0.000    0.000    72        0.667  0.000  0.000    0.333
31        0.333  0.333  0.333    0.000    73        0.667  0.000  0.000    0.333
32        1.000  0.000  0.000    0.000    74        0.500  0.500  0.000    0.000
33        0.000  0.333  0.333    0.333    75        0.667  0.000  0.000    0.333
34        1.000  0.000  0.000    0.000    76        0.333  0.000  0.333    0.333
35        0.667  0.000  0.333    0.000    77        0.500  0.000  0.000    0.500
36        0.000  0.333  0.667    0.000    78        0.000  0.333  0.000    0.667
37        0.500  0.500  0.000    0.000    79        1.000  0.000  0.000    0.000
38        0.000  1.000  0.000    0.000    80        0.333  0.167  0.500    0.000
39        1.000  0.000  0.000    0.000    81        0.000  0.500  0.500    0.000
40        1.000  0.000  0.000    0.000    82        1.000  0.000  0.000    0.000
41        0.500  0.000  0.500    0.000    83        0.667  0.333  0.000    0.000
42        0.500  0.500  0.250    0.000    84        1.000  0.000  0.000    0.000
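The metric ρ reported in Table 4 is commonly defined as each algorithm's share of the combined nondominated reference set built from all fronts; the paper's exact definition is not reproduced in this section, so the following is a minimal sketch under that assumption, with both objectives (makespan and number of tardy jobs) minimized.

```python
# Hedged sketch of a common rho metric: the fraction of the combined
# nondominated set contributed by each algorithm. Assumes minimization.
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def rho(fronts):
    """fronts: dict name -> list of objective vectors. Returns name -> rho."""
    pool = [(name, p) for name, pts in fronts.items() for p in pts]
    nondom = [(n, p) for n, p in pool
              if not any(dominates(q, p) for _, q in pool)]
    return {name: sum(1 for n, _ in nondom if n == name) / len(nondom)
            for name in fronts}
```

On a toy instance where algorithm A supplies the whole combined front, A scores 1.000 and the other algorithm 0.000, matching the pattern of many rows in Table 4.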
Table 5. Computational results of four algorithms on metric C.

Ins  C(Q,G)  C(G,Q)  C(Q,N)  C(N,Q)  C(Q,S)  C(S,Q)    Ins  C(Q,G)  C(G,Q)  C(Q,N)  C(N,Q)  C(Q,S)  C(S,Q)
1    1.000   0.000   0.857   0.000   0.833   0.000     43   1.000   0.000   0.667   0.000   1.000   0.000
2    1.000   0.000   0.750   0.333   1.000   0.000     44   1.000   0.000   0.750   0.000   0.500   0.000
3    1.000   0.000   0.333   0.333   0.333   0.333     45   1.000   0.000   1.000   0.000   1.000   0.000
4    0.667   0.000   0.000   1.000   1.000   0.000     46   1.000   0.000   1.000   0.000   1.000   0.000
5    0.333   0.333   0.667   0.000   1.000   0.000     47   1.000   0.000   1.000   0.000   1.000   0.000
6    0.600   0.500   0.667   0.500   0.833   0.000     48   0.333   0.750   0.250   0.500   1.000   0.000
7    0.000   0.000   0.000   0.000   0.000   0.000     49   1.000   0.000   0.667   0.000   0.000   0.500
8    0.500   0.333   1.000   0.000   1.000   0.000     50   0.000   0.600   1.000   0.000   0.000   0.800
9    0.000   0.000   0.000   0.000   0.000   0.000     51   1.000   0.000   0.500   0.333   1.000   0.000
10   1.000   0.000   1.000   0.000   1.000   0.000     52   0.500   0.333   0.000   0.667   1.000   0.000
11   1.000   0.000   1.000   0.000   1.000   0.000     53   1.000   0.000   0.667   0.500   0.500   0.500
12   0.750   0.000   0.750   0.000   1.000   0.000     54   1.000   0.000   1.000   0.000   1.000   0.000
13   1.000   0.000   1.000   0.000   1.000   0.000     55   0.000   0.333   0.000   1.000   0.000   1.000
14   1.000   0.000   1.000   0.000   1.000   0.000     56   1.000   0.000   1.000   0.000   0.000   0.500
15   1.000   0.000   0.750   0.200   0.000   0.600     57   0.500   0.333   1.000   0.000   1.000   0.000
16   0.500   0.333   0.667   0.000   1.000   0.000     58   1.000   0.000   1.000   0.000   1.000   0.000
17   0.667   0.333   0.667   0.333   1.000   0.000     59   1.000   0.000   1.000   0.000   0.750   0.333
18   0.500   0.500   0.667   0.000   0.500   0.500     60   1.000   0.000   1.000   0.000   1.000   0.000
19   1.000   0.000   0.000   1.000   1.000   0.000     61   0.000   1.000   0.000   1.000   0.000   1.000
20   0.667   0.333   1.000   0.000   0.000   1.000     62   0.500   0.000   0.000   1.000   0.000   1.000
21   0.400   0.333   0.250   0.667   1.000   0.000     63   0.667   0.000   0.400   0.333   0.000   0.667
22   0.500   0.571   1.000   0.000   0.750   0.000     64   0.333   0.667   0.000   1.000   0.000   1.000
23   0.000   0.750   1.000   0.000   0.000   0.500     65   0.750   0.000   0.667   0.750   1.000   0.000
24   1.000   0.000   1.000   0.000   1.000   0.000     66   0.500   0.500   0.000   1.000   0.667   0.000
25   0.000   0.500   1.000   0.000   0.667   0.000     67   0.000   1.000   0.500   0.500   1.000   0.000
26   1.000   0.000   1.000   0.000   1.000   0.000     68   1.000   0.000   0.000   0.333   0.000   1.000
27   0.500   0.200   0.667   0.200   0.333   0.600     69   0.000   1.000   0.000   1.000   0.000   1.000
28   1.000   0.000   1.000   0.000   0.000   1.000     70   0.333   0.500   0.000   1.000   1.000   0.000
29   1.000   0.000   1.000   0.000   1.000   0.000     71   1.000   0.000   1.000   0.000   0.333   0.333
30   0.000   0.750   1.000   0.000   0.250   0.625     72   1.000   0.000   1.000   0.000   0.500   0.500
31   0.500   0.000   0.500   0.500   0.750   0.000     73   1.000   0.000   1.000   0.000   0.667   0.333
32   1.000   0.000   1.000   0.000   1.000   0.000     74   0.667   0.667   1.000   0.000   0.750   0.667
33   0.500   0.250   0.000   0.750   0.000   0.750     75   1.000   0.000   0.500   0.250   0.500   0.500
34   1.000   0.000   1.000   0.000   1.000   0.000     76   1.000   0.000   0.667   0.000   0.750   0.000
35   1.000   0.000   0.667   0.333   1.000   0.000     77   0.000   0.200   1.000   0.000   0.000   0.600
36   0.000   1.000   0.000   1.000   0.667   0.333     78   0.000   1.000   0.000   1.000   0.000   1.000
37   0.000   0.500   1.000   0.000   1.000   0.000     79   1.000   0.000   1.000   0.000   1.000   0.000
38   0.000   1.000   0.000   0.500   0.667   0.250     80   0.333   0.333   0.000   0.667   1.000   0.000
39   1.000   0.000   1.000   0.000   1.000   0.000     81   0.000   1.000   0.000   1.000   0.000   0.667
40   1.000   0.000   1.000   0.000   1.000   0.000     82   1.000   0.000   1.000   0.000   1.000   0.000
41   1.000   0.000   0.000   0.667   1.000   0.000     83   0.500   0.000   1.000   0.000   1.000   0.000
42   0.333   0.000   0.500   0.000   0.667   0.000     84   1.000   0.000   1.000   0.000   1.000   0.000
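The set-coverage metric C(A, B) in Table 5 is, in its standard form due to Zitzler, the fraction of solutions in front B dominated by at least one solution in front A; a minimal sketch under minimization, noting that C(A, B) and C(B, A) need not sum to one (e.g., instance 7 has every entry equal to 0.000, meaning no front dominates any point of another):

```python
# Hedged sketch of the set-coverage metric C(A, B) for minimization.
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def coverage(A, B):
    """Fraction of points in B dominated by at least one point of A."""
    return sum(1 for b in B if any(dominates(a, b) for a in A)) / len(B)
```

C(A, B) = 1.000 means every solution of B is dominated by A, the pattern behind the many 1.000/0.000 pairs in Table 5.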
Table 6. Computational results of four algorithms on metric IGD.

Instance  QDGA   GA     NSGA-II  SPEA2    Instance  QDGA   GA     NSGA-II  SPEA2
1         0.021  0.020  0.014    0.015    43        0.011  0.165  0.146    0.437
2         0.002  0.037  0.003    0.235    44        0.092  0.252  0.027    0.023
3         0.029  0.221  0.155    0.142    45        0.000  0.065  0.723    0.453
4         0.445  0.018  0.531    0.179    46        0.000  0.029  0.037    0.167
5         0.001  0.027  0.027    0.083    47        0.000  0.248  0.091    0.057
6         0.161  0.098  0.008    0.087    48        0.040  0.042  0.062    0.157
7         0.000  0.000  0.000    0.000    49        0.189  0.277  0.039    0.149
8         0.001  0.021  0.036    0.040    50        0.037  0.145  0.289    0.029
9         0.000  0.000  0.000    0.000    51        0.014  0.341  0.168    0.168
10        0.000  0.027  0.009    0.024    52        0.109  0.098  0.085    0.210
11        0.000  0.063  0.500    0.813    53        0.073  0.100  0.014    0.037
12        0.013  0.087  0.038    0.041    54        0.000  0.573  0.401    1.190
13        0.000  0.255  0.407    0.276    55        0.416  0.309  0.000    0.262
14        0.000  0.176  0.077    0.103    56        0.000  0.627  0.044    0.131
15        0.016  0.247  0.022    0.091    57        0.004  0.057  0.113    0.539
16        0.059  0.213  0.072    0.145    58        0.000  0.063  0.397    0.253
17        0.003  0.026  0.042    0.028    59        0.012  0.020  0.168    0.049
18        0.100  0.103  0.035    0.139    60        0.000  0.324  0.431    0.403
19        0.203  0.700  0.000    1.000    61        0.042  0.029  0.032    0.000
20        0.060  0.076  0.058    0.033    62        0.546  0.583  0.189    0.223
21        0.030  0.055  0.058    0.060    63        0.034  0.176  0.076    0.023
22        0.036  0.021  0.080    0.076    64        0.403  0.337  0.049    0.000
23        0.056  0.000  0.142    0.072    65        0.039  0.194  0.155    0.185
24        0.000  0.072  0.032    0.113    66        0.534  1.111  0.000    0.588
25        0.020  0.135  0.055    0.046    67        0.137  0.000  0.176    0.798
26        0.000  0.083  0.141    0.163    68        0.173  0.398  0.372    0.514
27        0.022  0.119  0.234    0.120    69        0.353  0.192  0.156    0.000
28        0.023  0.346  0.040    0.000    70        0.030  0.062  0.013    0.324
29        0.000  1.160  0.058    0.400    71        0.002  0.719  0.245    0.007
30        0.020  0.134  0.108    0.084    72        0.005  0.095  0.087    0.077
31        0.064  0.063  0.191    0.158    73        0.005  0.052  0.079    0.014
32        0.000  0.038  0.192    0.628    74        0.053  0.053  0.051    0.022
33        0.057  0.073  0.107    0.141    75        0.014  0.327  0.298    0.050
34        0.000  0.025  0.017    0.084    76        0.095  0.083  0.078    0.061
35        0.000  0.099  0.046    0.201    77        0.008  0.086  0.136    0.082
36        0.088  0.378  0.000    0.124    78        1.398  0.204  0.394    0.054
37        0.000  0.040  0.038    0.161    79        0.000  0.166  0.134    0.122
38        0.012  0.000  0.012    0.013    80        0.042  0.142  0.036    0.352
39        0.000  0.428  1.121    0.376    81        0.297  0.000  0.070    0.102
40        0.000  0.134  0.043    0.113    82        0.000  0.138  0.130    0.901
41        0.016  0.168  0.501    0.289    83        0.189  0.274  0.427    0.426
42        0.151  0.017  0.127    0.141    84        0.000  0.176  0.566    0.795
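Inverted generational distance (IGD), the metric of Table 6, is standardly the mean Euclidean distance from each point of a reference (combined or true) Pareto front to its nearest point in the obtained front; lower is better, and 0.000 indicates the obtained front covers the whole reference set. A minimal sketch of that standard definition:

```python
import math

# Hedged sketch of the standard IGD metric: average distance from each
# reference-front point to the nearest point of the obtained front.
def igd(reference, front):
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    return sum(min(dist(r, s) for s in front) for r in reference) / len(reference)
```

For example, a front containing only one of two reference points scores half the distance to the missing point, which is why an algorithm whose front equals the reference set scores exactly 0.000 in Table 6.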
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhang, J.; Cai, J. A Dual-Population Genetic Algorithm with Q-Learning for Multi-Objective Distributed Hybrid Flow Shop Scheduling Problem. Symmetry 2023, 15, 836. https://doi.org/10.3390/sym15040836
