Article

An Improved Multi-Objective Memetic Algorithm with Q-Learning for Distributed Hybrid Flow Shop Considering Sequence-Dependent Setup Times

School of Software, Yunnan University, Kunming 650000, China
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(1), 135; https://doi.org/10.3390/sym18010135
Submission received: 5 December 2025 / Revised: 6 January 2026 / Accepted: 7 January 2026 / Published: 9 January 2026
(This article belongs to the Section Computer)

Abstract

Most multi-objective studies on distributed hybrid flow shops that include tardiness-related objectives focus solely on optimizing makespan alongside a single tardiness objective. However, in real-world scenarios with strict contractual deadlines or high penalty costs for delays, minimizing both total tardiness and the number of tardy jobs becomes critically important. This paper addresses this gap by prioritizing tardiness-related objectives while simultaneously optimizing makespan, total tardiness, and the number of tardy jobs. It investigates a distributed hybrid flow shop scheduling problem (DHFSP) that exhibits symmetries in its machine configuration. We propose an improved multi-objective memetic algorithm incorporating Q-learning (IMOMA-QL) to solve this problem, featuring (1) a hybrid initialization method that generates high-quality, diverse solutions by balancing all three objectives; (2) a multi-factory SB2OX crossover operator preserving high-performance job sequences across factories; (3) six problem-specific neighborhood structures for efficient solution space exploration; and (4) a Q-learning-guided variable neighborhood search that adaptively selects neighborhood structures. Based on extensive numerical experiments across 100 generated instances and a comprehensive comparison with four competing algorithms, the proposed IMOMA-QL demonstrates its effectiveness and proves to be a competitive method for solving the DHFSP.

1. Introduction

With the development of globalization, many enterprises set up production bases in different countries and regions. A distributed flow shop can coordinate production activities between different geographical areas, achieve load balancing between factories, avoid overloading some factories while wasting resources in others, and improve overall production efficiency. The distributed flow shop scheduling problem (DFSP) has been studied thoroughly, yielding numerous results that address practical constraints such as re-entrant jobs [1], no-idle machines [2], deteriorating jobs [3], sequence-dependent setup times (SDSTs) [4,5,6,7], and energy consciousness [8]. Various evolutionary algorithms have been widely used to solve such real-world problems, including the memetic algorithm [9,10,11], artificial bee colony algorithm [12], estimation of distribution algorithm (EDA) [13], discrete fruit fly optimization algorithm [14], hybrid meta-heuristics [15], variable neighborhood descent algorithm [8], and spherical evolution algorithm [16].
To address distributed scheduling problems (DSPs), researchers have developed diverse solution approaches. Meta-heuristic algorithms have gained widespread adoption for shop scheduling optimization due to their notable advantages: straightforward implementation, robust performance, rapid convergence characteristics, and seamless compatibility with other algorithmic frameworks.
Memetic algorithms (MAs) have gained significant attention for their effectiveness in tackling various NP-hard optimization problems, particularly single-objective DSPs. For instance, Wang [17] developed an EDA-based MA to address the DFSP and minimize makespan. To optimize the makespan in a two-stage DFSP, Zhang [18] integrated the social spider optimization method into an MA framework. Wang [19] further explored a cooperative bi-population MA that incorporated collaborative initialization, inter-population cooperation, and an intensified local search for minimizing makespan in distributed hybrid flow shop scheduling problems (DHFSPs). Zhang et al. [20] achieved makespan minimization by leveraging cooperation within an MA.
MAs have also been widely applied to multi-objective DFSPs. Deng and Wang [11] investigated a multi-objective DFSP aimed at minimizing both makespan and total tardiness, and developed a competitive MA employing two populations with distinct operators for each objective. Wang [19] examined an energy-focused variant targeting the reduction of energy consumption and makespan, introducing a collaborative MA guided by reinforcement learning policy agents. Shao [21] proposed a network-based MA to minimize total tardiness, overall production cost, and carbon emissions.
The rapid advancement of AI is fundamentally transforming operations across a multitude of fields. As a typical reinforcement learning algorithm originating from dynamic programming, Q-learning makes the best decision at each step to optimize the overall process. To address the uncertainty in assembly job shop scheduling and to enhance scheduling algorithms under various production environments, Q-learning has been widely integrated into different frameworks. For instance, in assembly job shop scheduling, a dual-loop framework based on Q-learning is proposed to cope with environmental uncertainty by self-learning [22]. In DFSP, Q-learning is combined with metaheuristics such as the fruit fly optimization algorithm to enhance neighborhood selection and improve solution quality [23]. For the studied DFSP variant with consistent sublots, the method employs a value-based RL method. It is coupled with the meta-heuristic to achieve adaptive operator selection [24].
In practical scenarios, decision-makers are often concerned not only with minimizing the makespan but also with tardiness-related objectives, which are particularly important in industries where late deliveries incur significant penalties or disrupt downstream processes. Cai [25] proposed two enhanced shuffled frog-leaping algorithms (SFLAs) for solving the DHFSP in a multi-processor setting, aiming to minimize total tardiness and makespan simultaneously. Later, Li developed a neighborhood-based heuristic to address a two-stage DHFSP with SDST, targeting reductions in total tardiness and makespan [12]. In addition, Lei [26] investigated an SFLA with memeplex partitioning for the DHFSP. To address the dual objectives of makespan and maximum tardiness minimization, Lei [27] crafted a novel multi-class optimization approach based on the teaching–learning paradigm, which enhances search efficiency through inter-class interaction.
Few studies have treated tardiness-related objectives as the main focus in multi-objective optimization. Lei and Zheng [28] tackled HFSP with assembly operations and minimized total tardiness, maximum tardiness, and makespan with tardiness objectives regarded as key ones. In the DHFSP, which is more complex than the standard HFSP, it is therefore of great importance to develop effective algorithms that can simultaneously optimize total tardiness, the number of tardy jobs, and makespan.
In light of the above literature on DFSPs and memetic algorithms, this study addresses the DHFSP with SDST, aiming to optimize makespan, total tardiness, and the number of tardy jobs while prioritizing the tardiness-related objectives. To tackle this problem, an improved multi-objective memetic algorithm with Q-learning (IMOMA-QL) is proposed. The major contributions of this paper are as follows:
  • Hybrid initialization method—A mixed initialization strategy is proposed to simultaneously optimize total tardiness, the number of tardy jobs, and makespan, generating a high-quality and diverse population.
  • Multi-factory SB2OX crossover operator—The Similar Block 2-Point Order Crossover (SB2OX) is extended to a multi-factory context, leveraging structural similarity of job sequences to retain high-quality sub-sequences and enhance information exchange between factories.
  • Problem-specific neighborhood structures are developed to guide the search process toward more promising regions. Considering the optimization objectives and problem characteristics, these neighborhood structures effectively explore the solution space.
  • Q-learning-guided variable neighborhood search—A Q-learning strategy is introduced to adaptively choose the most effective neighborhood structure during the search process. The reward is designed based on the change in distance between the new and old solutions to their nearest Pareto front solution, encouraging moves that improve convergence toward the Pareto front while balancing intensification and diversification.
The paper is divided into the following sections. Section 2 formally describes DHFSP with SDST and presents its mathematical model. It also introduces the foundational framework of the memetic algorithm. Section 3 elaborates on the details of our proposed IMOMA, including its novel initialization, genetic operators, and the Q-learning-guided variable neighborhood search. Section 4 provides a comprehensive evaluation of IMOMA, including the experimental setup, sensitivity analysis, and comparisons with four other algorithms. Following this, Section 5 summarizes the main findings of this study, discusses its limitations, and suggests potential directions for future research.

2. Description of the Problem

2.1. Multi-Objective Optimization Problem

Many applied contexts involve simultaneously optimizing multiple conflicting objectives subject to a set of constraints. These are known as multi-objective optimization problems (MOPs). A mathematical model for an MOP is defined as follows:
$$\min F(x) = \big(f_1(x), f_2(x), \dots, f_m(x)\big), \quad x \in D$$
where $D$ denotes the decision space of the multi-objective optimization problem, $d$ denotes the dimension of the decision space, $m$ denotes the number of objectives included in the multi-objective optimization problem, $x = (X_1, X_2, \dots, X_i, \dots, X_d)$ denotes the decision vector, and $X_i$ denotes the $i$-th decision variable.
A solution $a$ is said to dominate a solution $b$ if and only if
$$f_i(a) \le f_i(b), \ \forall i \in \{1, 2, \dots, m\}, \quad \text{and} \quad \exists j \in \{1, 2, \dots, m\}: f_j(a) < f_j(b).$$
When solution $a$ does not dominate solution $b$ and solution $b$ does not dominate solution $a$, the two solutions are said to be non-dominated and are considered to perform equally well in the MOP.
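As a generic illustration of the dominance definition above (not code from the paper), a minimization-oriented dominance test can be sketched in a few lines:

```python
# Illustrative sketch: Pareto dominance for a minimization MOP.
# Objective vectors are plain tuples/lists of numbers.
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(a, b):
    """True when neither solution dominates the other."""
    return not dominates(a, b) and not dominates(b, a)
```

For example, `(1, 2, 3)` dominates `(2, 2, 3)`, while `(1, 3)` and `(2, 2)` are mutually non-dominated.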

2.2. Problem Definition

A DHFSP instance comprises a set of $n$ jobs, denoted by $\{J_1, J_2, \dots, J_n\}$, each of which must complete $s$ stages of processing. The jobs are distributed among $F$ factories, each of which is a hybrid flow shop with $m$ parallel machines per stage. Each operation then selects an appropriate machine in the assigned factory. Symmetry exists both in the processing sequence of adjacent jobs on the same machine and in the machine distribution across different factories. The optimization objectives are makespan, total tardiness, and the number of tardy jobs. The notation used in this model is defined in Table 1.
The mathematical formulation is presented below.
$$\text{Minimise} \ \sum_{j \in J} T_j = \sum_{j \in J} \max\{0, \, E_{j,k} - d_j\} \quad (3)$$
$$\text{Minimise} \ C_{\max} \quad (4)$$
$$\text{Minimise} \ \sum_{j \in J} U_j \quad (5)$$
$$\sum_{f \in F} X_{j,f} = 1, \quad \forall j \in J \quad (6)$$
$$X_{j,f} = \sum_{m \in M_{i,f}} \sum_{p \in P} Y_{j,f,m,p,i}, \quad \forall i \in I, \ j \in J, \ f \in F \quad (7)$$
$$\sum_{j \in J} Y_{j,f,m,p,i} \le 1, \quad \forall f \in F, \ m \in M_{i,f}, \ p \in P \quad (8)$$
$$\sum_{j \in J} Y_{j,f,m,p,i} \ge \sum_{j \in J} Y_{j,f,m,p+1,i}, \quad \forall f \in F, \ i \in I, \ m \in M_{i,f}, \ p \in \{1, \dots, n-1\} \quad (9)$$
$$ME_{f,m,p} = MS_{f,m,p} + \sum_{j \in J} p_{j,i} \, Y_{j,f,m,p,i}, \quad \forall f \in F, \ i \in I, \ m \in M_{i,f} \quad (10)$$
$$MS_{f,m,p+1} \ge ME_{f,m,p}, \quad \forall f \in F, \ m \in M_f, \ p \in \{1, \dots, n-1\} \quad (11)$$
$$MS_{f,m,1} \ge s_{j,i} - M(1 - Y_{j,f,m,1,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_f \quad (13)$$
$$MS_{f,m,p} \le S_{j,i} + M(1 - Y_{j,f,m,p,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_{i,f}, \ p \in P \quad (14)$$
$$MS_{f,m,p} \ge S_{j,i} - M(1 - Y_{j,f,m,p,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_{i,f}, \ p \in P \quad (15)$$
$$E_{j,i} = S_{j,i} + p_{j,i}, \quad \forall j \in J, \ i \in I \quad (16)$$
$$E_{j,i} \le S_{j,i+1}, \quad \forall j \in J, \ i \in \{1, \dots, s-1\} \quad (17)$$
$$MS_{f,m,p} \ge 0, \ \forall f \in F, \ m \in M_f, \ p \in P; \qquad S_{j,i} \ge 0, \ \forall j \in J, \ i \in I \quad (18)$$
The objective functions (3)–(5) are to minimize the total tardiness, makespan, and the number of tardy jobs. Constraint (6) specifies that every job must be assigned to a single factory. Constraints (7)–(9) collectively govern machine–job assignment and processing sequence: each job is processed by exactly one machine per stage; no machine may process more than one job simultaneously; and job assignments must adhere to consecutive machine positions, ensuring previous slots are occupied. Constraints (10) and (11) define the start and finish times for each machine position. Constraint (12) incorporates setup time requirements. According to Constraint (13), the start time of the first job on any machine must account for the required setup time. Constraints (14) and (15) align machine positions with the corresponding job order. Constraint (16) defines a job’s completion time as the sum of its start time and processing duration. Sequential processing of jobs is enforced by Constraint (17), while Constraint (18) ensures that no job can start processing before time zero, which is equivalent to assuming all jobs have a release time of zero.
The layout of the distributed hybrid flow shop is shown in Figure 1.

2.3. Encoding and Decoding Methods

This paper utilizes a permutation-based coding scheme, where a solution is represented by factory vectors $F = (F_1, \dots, F_c, \dots, F_f)$, with each vector $F_c$ corresponding to a specific factory, and $\alpha = (\alpha_1, \dots, \alpha_j, \dots, \alpha_n)$ representing the job processing sequence for the first stage in each factory. The decoding mechanism employs a combination of “FIFO” (first in, first out) and “FMA” (first machine available).
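As an illustrative sketch (not the paper's exact implementation), the FIFO + FMA decoding for a single factory might look as follows. The `proc` table layout, the omission of setup times, and the function name are assumptions for this example:

```python
import heapq

# Hypothetical sketch of FIFO + FMA decoding for one factory: jobs enter each
# stage in the order they finish the previous stage (FIFO) and are assigned to
# the machine that becomes available first (FMA).  `proc[j][i]` is an assumed
# table of processing times of job j at stage i; setup times are omitted.
def decode(sequence, proc, machines_per_stage):
    ready = {j: 0 for j in sequence}                 # arrival time at stage 0
    for i, m in enumerate(machines_per_stage):
        free = [0.0] * m                             # machine availability times
        heapq.heapify(free)
        # FIFO: process jobs in order of arrival at this stage
        for j in sorted(sequence, key=lambda j: ready[j]):
            start = max(heapq.heappop(free), ready[j])   # FMA: earliest machine
            finish = start + proc[j][i]
            heapq.heappush(free, finish)
            ready[j] = finish                        # arrival time at next stage
    return ready                                     # completion time of each job
```

With two jobs and one machine per stage, e.g. `decode([0, 1], {0: [2, 2], 1: [1, 1]}, [1, 1])`, job 0 finishes at time 4 and job 1 at time 5.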

2.4. A Simple Memetic Algorithm

The MA fuses global evolutionary search with dedicated local search within a population-based paradigm to effectively navigate the exploration–exploitation trade-off [29]. A simple MA typically iterates through the following phases: initial population generation, fitness evaluation, selection, crossover, mutation, local refinement, and population update. The key feature distinguishing an MA from a standard genetic algorithm is the embedded local search procedure, which intensively refines individuals to reach local optima within promising regions identified by the evolutionary process.
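As a generic illustration of these phases (not the paper's algorithm), the loop below sketches a minimal single-objective memetic algorithm. The one-max fitness, the operators, and all parameters are placeholder assumptions chosen only to make the skeleton runnable:

```python
import random

def one_max(x):                       # toy fitness: number of ones (maximize)
    return sum(x)

def local_search(x):
    """First-improvement bit-flip hill climbing (the memetic refinement)."""
    for i in range(len(x)):
        y = x[:]; y[i] ^= 1
        if one_max(y) > one_max(x):
            x = y
    return x

def memetic(n_bits=12, pop_size=8, generations=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: binary tournament
        parents = [max(rng.sample(pop, 2), key=one_max) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, n_bits)            # one-point crossover
            for c in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                c = c[:]
                c[rng.randrange(n_bits)] ^= 1         # single-bit mutation
                children.append(local_search(c))      # embedded local search
        # elitist replacement
        pop = sorted(pop + children, key=one_max, reverse=True)[:pop_size]
    return max(pop, key=one_max)
```

Removing the `local_search` call turns this into a plain genetic algorithm, which is exactly the distinction drawn above.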

3. Improved Multi-Objective Memetic Algorithm

3.1. Algorithm Procedure

The proposed improved multi-objective memetic algorithm (IMOMA) is composed of four main components. First, a hybrid initialization procedure adopts a combination of random generation and problem-specific heuristics to yield a diverse set of well-performing initial solutions. Second, genetic operators perform population-based global exploration of the search space. Third, a Q-learning-guided multi-neighborhood search is applied, in which a reinforcement learning mechanism adaptively selects among multiple neighborhood structures to intensify the search and enhance convergence toward the Pareto front. Fourth, non-dominated sorting with an elitism strategy updates the population at the end of each generation. The algorithm flowchart is illustrated in Figure A1 and the pseudocode is shown in Algorithm 1. Each of these components contributes to the overall computational complexity: the time complexity per iteration is $O(N \cdot n + N^2)$, where $N$ denotes the population size and $n$ represents the number of jobs. Given a maximum of $T$ iterations, the overall time complexity of IMOMA is $O(T \cdot (N \cdot n + N^2))$.
Algorithm 1 Algorithm of IMOMA
Input: instance data, parameters
Output: approximate Pareto front
 1: Initialize population $POP_0$
 2: Set $t = 0$
 3: while $cputime < cputime_{max}$ do
 4:   for each $i \in [1, popSize/2]$ do
 5:     Select two parents $p_1, p_2$ from $POP_t$ using binary tournament selection
 6:     Generate offspring $s_1, s_2$ using the crossover operator
 7:     Generate offspring $s_1', s_2'$ using the mutation operator
 8:     Merge $s_1', s_2'$ into $Pop_t'$
 9:   end for
10:   Apply VNS guided by Q-learning to $Pop_t'$ to generate offspring population $Pop_t''$
11:   Combine $Pop_t'$ and $Pop_t''$ into $CombinedPop$
12:   Perform non-dominated sorting on $CombinedPop$
13:   Select the $N$ best individuals to form $Pop_{t+1}$
14: end while
15: Output the Pareto optimal solutions

3.2. Hybrid Initialization

High-quality initialization plays a pivotal role in multi-objective optimization algorithms for DHFSP by significantly influencing convergence speed, solution diversity, and computational efficiency. A well-designed initialization strategy reduces the algorithm’s exploration burden.
To address the multi-objective characteristics of the DHFSP—specifically the simultaneous minimization of makespan, total tardiness, and number of tardy jobs—this research develops a hybrid initialization approach. The proposed methodology combines targeted heuristic generation with stochastic exploration mechanisms, ensuring both high-quality initial solutions and adequate population diversity.
This paper introduces six initialization methods based on the LPT (Longest Processing Time) and EDD (Earliest Due Date) rules, combined with randomization, to enhance both the quality and diversity of the initial population.
Method 1: Generate a job vector in descending order of total processing time (LPT), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the maximum completion time.
Method 2: Generate a job vector in ascending order of due date (EDD), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the total tardiness.
Method 3: Generate a job vector in ascending order of due date (EDD), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the number of tardy jobs.
Method 4: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the maximum completion time.
Method 5: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the total tardiness.
Method 6: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the number of tardy jobs.
Methods 1–3 each generate one solution, while Methods 4–6 generate the rest of the population in the ratio 2:2:1. Specifically, Methods 4 and 5 each generate $\lfloor 2(N-3)/5 \rfloor$ solutions, and Method 6 generates the remaining $(N-3) - 2\lfloor 2(N-3)/5 \rfloor$ solutions, ensuring the total population size is $N$. The random seed for the initialization is set to 2020.
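The split above can be sketched directly; floor division is an assumption for handling non-divisible population sizes:

```python
# Sketch of the 2:2:1 population split described above.  Methods 1-3
# contribute one solution each; Methods 4 and 5 each take 2(N-3)/5 (floored)
# and Method 6 takes whatever remains, so the counts always sum to N.
def init_counts(N):
    rest = N - 3                        # solutions left after Methods 1-3
    m4 = m5 = (2 * rest) // 5
    m6 = rest - m4 - m5
    return {1: 1, 2: 1, 3: 1, 4: m4, 5: m5, 6: m6}
```

For instance, with $N = 80$ this yields 30, 30, and 17 solutions for Methods 4, 5, and 6, respectively.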

3.3. Selection

Binary tournament selection is used to choose the two parents. The steps are as follows:
  • Randomly select two individuals: two candidate solutions are chosen at random from the existing population.
  • Select the better individual for the next generation:
    (1) If a dominates b, choose a.
    (2) If b dominates a, choose b.
    (3) If the two individuals are mutually non-dominated, choose one of them at random.
  • Repeat until the next generation is filled.
The flowchart of the selection process is shown in Figure 2.
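The steps above can be sketched as follows (an illustrative, self-contained snippet rather than the paper's code; solutions are represented only by their objective vectors):

```python
import random

# Sketch of binary tournament selection under Pareto dominance:
# two random individuals are compared; ties are broken at random.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def binary_tournament(objectives, rng=random):
    """Return the index of the winning individual."""
    a, b = rng.sample(range(len(objectives)), 2)
    if dominates(objectives[a], objectives[b]):
        return a
    if dominates(objectives[b], objectives[a]):
        return b
    return rng.choice((a, b))        # non-dominated: pick one at random
```

Calling `binary_tournament` repeatedly fills the mating pool; with two individuals where one dominates the other, the dominating one always wins.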

3.4. Genetic Operator

SB2OX has been used to solve flow shop problems with SDST [30], but only for the single-factory case. This article proposes a crossover operator based on SB2OX that is adapted to distributed flow shops. Two vectors represent the scheduling sequences of the two parents:
$$\text{parent}_1 = (x_1^1, x_2^1, \dots, x_{n_1}^1; \ x_1^2, x_2^2, \dots, x_{n_2}^2; \ \dots; \ x_1^F, x_2^F, \dots, x_{n_F}^F),$$
$$\text{parent}_2 = (x_1^1, x_2^1, \dots, x_{n_1}^1; \ x_1^2, x_2^2, \dots, x_{n_2}^2; \ \dots; \ x_1^F, x_2^F, \dots, x_{n_F}^F).$$
The superscript of $x$ denotes the factory, and the subscript indicates the position within that factory's scheduling sequence.
Step 1: The two parents are compared position by position. Identical blocks containing at least two consecutive matching jobs, i.e., $[x_p^{c_1}, \dots, x_q^{c_1}] = [x_p^{c_2}, \dots, x_q^{c_2}]$, are transferred directly to the offspring. Notably, the retained blocks need not lie in the same factory in both parents: if an identical block is processed in factory $c_1$ in Parent 1 and in factory $c_2$ in Parent 2, Child 1 retains the block at the same location in factory $c_1$, and Child 2 retains it at the same location in factory $c_2$.
Step 2: Two cut points are randomly selected in the scheduling sequence of each factory of the two parents. For each factory, Child 1 retains all jobs between cut point 1 and cut point 2 of Parent 1 in their original positions; Child 2 is generated from Parent 2 by the same principle.
Step 3: In the final step, the missing elements are copied in the relative order of the other parent. In this step, parental information is exchanged efficiently and jobs are reassigned between factories.
Nevertheless, relying exclusively on crossover operations proves inadequate. To enhance population diversity, a random swap mutation (RSM) mechanism is additionally implemented.
RSM: Randomly select two jobs in the sequence and swap their positions.
An illustration of the crossover process is shown in Figure 3.
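As an illustrative, deliberately simplified single-factory sketch of Steps 2–3 and of RSM (the full multi-factory SB2OX with similar-block retention is more involved; function names and the reduction to one sequence are assumptions):

```python
import random

# Simplified order crossover: the child keeps Parent 1's segment between the
# two cut points and fills the remaining positions with the missing jobs in
# Parent 2's relative order.
def order_crossover(p1, p2, cut1, cut2):
    child = [None] * len(p1)
    child[cut1:cut2] = p1[cut1:cut2]             # retained segment (Step 2)
    fill = [j for j in p2 if j not in child]     # missing jobs, p2's order (Step 3)
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def rsm(seq, rng=random):
    """Random swap mutation: exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(seq)), 2)
    seq = seq[:]
    seq[i], seq[j] = seq[j], seq[i]
    return seq
```

For example, `order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 1, 3)` keeps the segment `[2, 3]` and yields `[5, 2, 3, 4, 1]`, a valid permutation.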

3.5. Problem-Specific Neighborhood Structures

To address the DHFSP, the designed neighborhood structures generate alternative feasible solutions by modifying the current ones. The effectiveness of these structures has a significant influence on both the solution quality and the computational efficiency of the algorithm. Well-designed neighborhoods can guide the search process toward more promising regions. Hence, in order to improve the performance of IMOMA-QL, six tailored neighborhood structures are proposed to produce superior solutions.
NS1: Randomly select a job from the factory with the largest maximum completion time and insert it into another position in the same factory.
NS2: Randomly select a job from the factory with the largest maximum completion time and exchange it with another job in the same factory.
NS3: Randomly select a tardy job that does not have the largest tardiness, and exchange it with each job that has a greater tardiness.
NS4: Randomly select a tardy job and insert it into an earlier position among all factory positions.
NS5: Randomly select a job from the factory with the largest total tardiness and insert it into another position in the same factory.
NS6: Randomly select a tardy job from the factory with the largest number of tardy jobs and insert it into another position in the same factory.
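A move of this kind can be sketched as follows, using NS1 as the example. This is an assumed, simplified representation (a list of job sequences per factory plus precomputed makespans), not the paper's implementation:

```python
import random

# Hypothetical sketch of NS1: remove a random job from the critical factory
# (the one with the largest makespan) and reinsert it at a different position
# in the same factory's sequence.  `makespans[k]` is the makespan of factory k.
def ns1(factories, makespans, rng=random):
    f = max(range(len(factories)), key=lambda k: makespans[k])
    seq = factories[f][:]
    i = rng.randrange(len(seq))
    job = seq.pop(i)
    positions = [p for p in range(len(seq) + 1) if p != i]   # force a real move
    seq.insert(rng.choice(positions), job)
    new = [s[:] for s in factories]                          # other factories untouched
    new[f] = seq
    return new
```

The remaining structures differ only in which factory or job is targeted (tardiness-based for NS3–NS6) and whether the move is an insertion or a swap.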

3.6. Variable Neighborhood Search with Q-Learning

Q-learning operates by maintaining and updating a state–action value table (Q-table), which stores estimated cumulative rewards to derive an optimal policy [31]. As a model-free algorithm, it has been employed to tackle a range of scheduling problems [23,32]. Figure 4 illustrates this interaction process. The Q-table is updated according to the following formula:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $\alpha$ and $\gamma$ are the learning rate and discount factor, respectively, $s'$ is the next state, and $R$ is the immediate reward.
The agent selects actions according to the Q-values stored in the Q-table. This work initializes the Q-table with zeros, as shown in Figure 5, indicating that the agent starts with no prior knowledge of the environment. Actions are selected using an $\epsilon$-greedy strategy that maximizes the expected reward while maintaining exploration. A random number $p \in [0, 1]$ determines the selection: if $p < \epsilon$, the action with the maximum Q-value is selected (exploitation); otherwise, a random action is chosen (exploration). The Q-table is iteratively updated through this action–state–reward cycle. An example of a Q-table update is provided in Appendix A.3.
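The selection rule and the update formula can be sketched as follows (an illustrative snippet with integer state/action indices; the default $\alpha$ and $\gamma$ values are assumptions):

```python
import random

# Minimal sketch of epsilon-greedy action selection and the Q-value update.
# Note the convention above: p < epsilon means EXPLOIT (epsilon is the greedy
# rate), otherwise a random action is explored.  The Q-table starts at zero.
def select_action(Q, s, epsilon, rng=random):
    if rng.random() < epsilon:                           # exploit
        return max(range(len(Q[s])), key=lambda a: Q[s][a])
    return rng.randrange(len(Q[s]))                      # explore

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.8):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
```

With three states (Section 3.6's makespan/total-tardiness/remainder partition) and six actions (NS1–NS6), `Q` is simply a 3 × 6 table of floats.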
Q-learning, when integrated with neighborhood search, plays the role of adaptively selecting the most promising neighborhood structures based on feedback from the search performance, rather than relying on a fixed or predetermined local search method. This reinforcement learning mechanism allows the search process to dynamically focus on neighborhoods that are more likely to yield improvements, thereby enhancing both convergence speed and solution quality. In this study, the six neighborhood structures described earlier are defined as the action set in the Q-learning framework, where each action corresponds to applying one specific neighborhood to generate new solutions.
In this paper, the state is determined according to the three objective values of each solution. First, the current population is sorted in ascending order of makespan. The top 30% of solutions with the smallest makespan are assigned to State 1. The remaining solutions are then sorted in ascending order of total tardiness, and the top 30% of these solutions are assigned to State 2. The remaining solutions are assigned to State 3.
This study employs a delayed reward mechanism based on population cooperation to train the Q-learning strategy. In each generation, all individuals conduct local search following the current strategy. Upon completion of the local search for the entire population, non-dominated sorting is applied to obtain an updated Pareto front. After obtaining the updated Pareto front via non-dominated sorting, we compute the minimum Euclidean distance from every solution to this front both before and after its local search. This signal is utilized to update the shared Q-learning strategy. Let x be the original solution to be replaced, and y be the candidate solution generated by the local search. A post is the non-dominated solution set obtained from the population after applying the variable neighborhood local search, inserting the new solution, and removing dominated solutions. The Euclidean distance from a solution z to A post is defined as
$$d(z, A_{\text{post}}) = \min_{a \in A_{\text{post}}} \sqrt{\sum_{i=1}^{m} \big(f_i(z) - f_i(a)\big)^2},$$
where $f_i(z)$ denotes the objective value of $z$ on the $i$-th objective and $m$ represents the total number of objectives. The reward $r_t$ is then given by
$$r_t = \begin{cases} 1, & \text{if } d(y, A_{\text{post}}) < d(x, A_{\text{post}}), \\ 0, & \text{otherwise.} \end{cases}$$
This binary reward provides a positive signal whenever the candidate solution is closer to the updated Pareto front than the original solution, thereby encouraging the selection of neighborhood structures that improve convergence towards the Pareto front.
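The distance and reward computations can be sketched directly from the two formulas above (objective vectors as tuples; `math.dist` computes the Euclidean distance):

```python
import math

# Sketch of the delayed binary reward: 1 if the candidate solution y is closer
# (in objective space) to the updated non-dominated set A_post than the
# original solution x, and 0 otherwise.
def distance_to_front(z, front):
    """Minimum Euclidean distance from objective vector z to the front."""
    return min(math.dist(z, a) for a in front)

def reward(x_obj, y_obj, front):
    return 1 if distance_to_front(y_obj, front) < distance_to_front(x_obj, front) else 0
```

For example, with the front `[(0, 0, 0)]`, moving from `(3, 0, 0)` to `(1, 0, 0)` earns reward 1, while the reverse move earns 0.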
The Q-learning-guided local search is illustrated in Algorithm 2.
Algorithm 2 Variable Neighborhood Search using Q-learning
Input: $POP_t$ (current population), $PopSize_t$ (population size), Q-table, $s_i$ (state of the $i$-th individual), $a_i$ (action taken by the $i$-th individual)
Output: $Pop_t'$, Q-table
 1: $s_1, \dots, s_{PopSize}$ ← rank the three objective values of all individuals in the population to determine the state of each individual
 2: for each $i$ in $PopSize_t$ do
 3:   $a_i$ ← select an action via the $\epsilon$-greedy strategy and the Q-table
 4:   $Pop_t'(i)$ ← apply action $a_i$ to the $i$-th individual $Pop_t(i)$
 5: end for
 6: $Pop_t'$ ← combine $Pop_t'(1)$ to $Pop_t'(PopSize)$
 7: $A_{post}$ ← non-dominated sorting of $Pop_t \cup Pop_t'$
 8: for each $i$ in $PopSize_t$ do
 9:   for each individual in $\{Pop_t(i), Pop_t'(i)\}$ do
10:     dist ← $\infty$
11:     for each solution $x \in A_{post}$ do
12:       curr_dist ← $\lVert \text{individual} - x \rVert_2$ {Euclidean distance in objective space}
13:       if curr_dist < dist then dist ← curr_dist end if
14:     end for
15:     Record dist as the minimum distance for this individual
16:   end for
17:   if $d_{after} < d_{before}$ then $r$ ← 1 else $r$ ← 0 end if
18:   Obtain the next state $s_i'$
19:   Update Q-table: $Q(s_i, a_i) \leftarrow Q(s_i, a_i) + \alpha [r + \gamma \max_{a} Q(s_i', a) - Q(s_i, a_i)]$
20: end for

3.7. Non-Dominated Order and Elitism Strategy

In the iterative process, the offspring population is merged with the parent population, and the merged population is subjected to fast non-dominated sorting. Lower-rank (better) solutions are preferred over higher-rank ones; if two solutions have the same rank, the one with the greater crowding distance is selected over the one with the smaller value. The next-generation population of the given size is then selected according to these principles.
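The crowding-distance tie-breaker can be sketched as follows (the standard NSGA-II-style computation, given here as an illustration; the input is a list of objective vectors belonging to one rank):

```python
# Sketch of the crowding-distance computation used for tie-breaking within a
# rank: boundary solutions get infinite distance, and interior solutions
# accumulate the normalized gap between their two neighbors per objective.
def crowding_distance(front):
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue                         # degenerate objective: no spread
        for a, i in enumerate(order[1:-1], start=1):
            dist[i] += (front[order[a + 1]][k] - front[order[a - 1]][k]) / (hi - lo)
    return dist
```

Solutions with larger crowding distance lie in less dense regions of the front and are therefore kept first, which preserves diversity in the next generation.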

4. Experimental Comparison and Analysis

4.1. Experiment Setting

All experiments were conducted in MATLAB R2022b (64-bit) with the Optimization Toolbox and Global Optimization Toolbox enabled. The operating system was Windows 11 (Version 22H2, 64-bit). All computations were performed on a desktop computer equipped with an Intel Core i7-12700K.
This article generates instances following the design described by Sun [33]. Each instance is characterized by the combination (N, F, S), where N = 50, 100, 150, 200; F = 2, 3, 4, 5, 6; and S = 2, 4, 6, 8, 10. The processing time of each job at each stage and the sequence-dependent setup times are randomly generated within the range [1, 99]. The number of identical parallel machines at each stage is randomly generated within the range [1, 5]. The random seed is set to 2025. In total, there are 4 × 5 × 5 = 100 instances. The CPU time per instance run is set to 0.08 × FN × JN × S seconds, where FN, JN, and S denote the numbers of factories, jobs, and stages, respectively.
Equations (22) and (23) [11] establish the due date for each job in each instance in the mathematical model provided in this study.
$$D_j = P_j \times \big(1 + 3 \times \text{rand}(0, 1)\big) \quad (22)$$
$$P_j = \sum_{i=1}^{S} p_{i,j} \quad (23)$$
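Equations (22) and (23) can be sketched as follows (illustrative only; the `proc` layout with one row of per-stage processing times per job is an assumption):

```python
import random

# Sketch of the due-date rule in Equations (22)-(23): each job's due date is
# its total processing time over all stages, scaled by a random factor drawn
# from [1, 4).  `proc[j]` lists job j's processing times per stage (assumed).
def due_dates(proc, rng=random):
    return [sum(p) * (1 + 3 * rng.random()) for p in proc]
```

By construction, each due date lies between $P_j$ and $4 P_j$, so jobs with longer total processing time receive proportionally looser deadlines.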

4.2. Experimental Indicators

To evaluate the behavior of IMOMA, two performance metrics were used in the experiments.
The hypervolume (HV) represents the volume of the region bounded by the non-dominated solution set produced by the algorithm in the objective space and the reference points. The reference point is set to (1.2, 1.2, 1.2). A higher HV value indicates better overall performance. HV is calculated as follows.
$$HV = T\left( \bigcup_{i=1}^{|PF|} v_i \right)$$
where $T(\cdot)$ denotes the Lebesgue measure and $v_i$ denotes the hypervolume bounded by the reference point and the $i$-th solution of the non-dominated set.
The inverted generational distance (IGD) measures both convergence and diversity; a lower IGD value indicates better performance. It is calculated as
$$IGD = \frac{\sum_{x \in PF^*} d(x, PF)}{|PF^*|}$$
where $d(x, PF)$ represents the minimum Euclidean distance from an individual $x$ in the reference front $PF^*$ to the obtained non-dominated set $PF$, and $|PF^*|$ denotes the number of solutions in the reference front.
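The IGD computation reduces to a few lines (an illustrative snippet; both fronts are given as lists of objective vectors):

```python
import math

# Sketch of IGD: the average, over the reference front PF*, of the minimum
# Euclidean distance to the obtained non-dominated set PF.  Lower is better.
def igd(reference_front, obtained_front):
    total = sum(min(math.dist(z, x) for x in obtained_front)
                for z in reference_front)
    return total / len(reference_front)
```

An obtained front that exactly covers the reference front gives IGD = 0; a single point at distance 5 from a single reference point gives IGD = 5.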

4.3. Parameter Calibration

This section calibrates the main parameters of the algorithm, including the population size, discount factor, greedy rate, crossover probability, and mutation probability, using the design of experiments (DOE) method. The levels of each parameter were $PS$ = {60, 80, 100}, $p_c$ = {0.7, 0.8, 0.9}, $p_m$ = {0.2, 0.4, 0.6}, $\gamma$ = {0.7, 0.8, 0.9}, and $\epsilon$ = {0.7, 0.8, 0.9}. The five key parameters were analyzed with the Taguchi method using the orthogonal array $L_{18}(3^7)$, which consists of 18 different parameter combinations.
Figure 6 shows the main effects plot for the parameters of IMOMA, from which the optimal parameter configuration is identified as follows: $PS$ = 80, $p_c$ = 0.7, $p_m$ = 0.2, $\gamma$ = 0.8, $\epsilon$ = 0.8.
To further investigate the influence of key parameters on the performance of the proposed IMOMA algorithm, an extended sensitivity analysis was conducted. This analysis adopts a one-factor-at-a-time (OFAT) approach to observe the individual effect of each parameter clearly. During the test of each target parameter, the remaining parameters were fixed at their empirically determined optimal baseline values. The five target parameters were tested across three levels each. The analysis was performed on three representative instances of different scales selected from the benchmark set: a small-scale instance (F = 2, n = 50, s = 2), a medium-scale instance (F = 4, n = 150, s = 4), and a large-scale instance (F = 6, n = 150, s = 6).
The experimental results are shown in Figure 7. Based on the extended sensitivity analysis, the mutation probability ( p m ) exhibits the most pronounced influence on algorithm performance, where increased values consistently lead to degradation across all tested instances, firmly validating the optimal baseline setting of p m = 0.2 . The population size ( P S ) shows moderate sensitivity, with its optimal value shifting slightly with the problem scale, though the baseline P S = 80 remains robust. The Q-learning discount factor ( γ ) demonstrates a moderate level of sensitivity, performing optimally near its baseline of γ = 0.8 . In contrast, the crossover probability ( p c ) and the Q-learning exploration rate ( ϵ ) exhibit relatively low sensitivity within the tested ranges, confirming the stability of their baseline settings ( p c = 0.7 , ϵ = 0.8 ). The overall low-to-moderate sensitivity of the Q-learning hyperparameters ( γ and ϵ ) indicates that the proposed Q-learning-guided local search module possesses good robustness, as its performance does not critically depend on their precise tuning. Overall, the results strongly justify the selected parameter configuration as a reliable default, while indicating that fine-tuning efforts for future applications should primarily focus on p m and P S , particularly when addressing problems of substantially different scales.

4.4. Effectiveness of Each Improvement Component of IMOMA-QL

To examine the effectiveness of each component, IMOMA-QL was compared with four variant versions in which a specific component was removed. These variants are IMOMA-QL without the hybrid initialization strategy (denoted as IMOMA-QL1), IMOMA-QL without the genetic operators (denoted as IMOMA-QL2), IMOMA-QL without the multi-neighborhood search (denoted as IMOMA-QL3), and IMOMA-QL without the Q-learning selection mechanism (denoted as IMOMA-QL4). Specifically, IMOMA-QL1 adopts random population initialization instead of the hybrid method; IMOMA-QL2 and IMOMA-QL3, respectively, remove the genetic operators and the multi-neighborhood search while retaining the rest of the algorithm; and IMOMA-QL4 replaces the Q-learning mechanism with a random local search. To ensure a fair comparison, all algorithmic parameters for these variants remained consistent with those of the original IMOMA-QL. All algorithms were executed independently 10 times on the test instances. The reference Pareto front was constructed from the combined non-dominated solutions of all compared algorithms.
The experimental results systematically evaluate performance across different problem scales by grouping the 100 instances according to the number of factories F, jobs n, and stages s. The average HV and IGD values are presented in Table 2 and Table 3. The complete experimental data for each instance are presented in Appendix A, Tables A1–A10. IMOMA-QL consistently achieved the best performance across all instances.
Figure 8 and Figure 9 display interval plots with 95% confidence intervals for HV and IGD metrics across all instances, comparing IMOMA-QL with its four variants. The results demonstrate that each component of IMOMA-QL contributes to performance improvements at varying degrees.
The Friedman test rankings with 95% confidence intervals are presented in Table 4. IMOMA-QL ranked first against all four variants, with statistically significant improvements (all p-values < 0.05). Removing any single component thus leads to a clear performance drop, confirming that each proposed enhancement contributes substantially to the superior performance of the full algorithm.
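The Friedman ranking used in such comparisons can be sketched in pure Python. The code below ranks the five algorithm variants within each instance (best HV receives rank 1, ties receive average ranks) and computes the Friedman chi-square statistic; the input rows are the five F = 2, n = 50 HV rows of Table A1. This is our own minimal sketch, not the authors' statistical pipeline.

```python
def friedman_statistic(blocks):
    """blocks: one tuple of HV values per instance (one value per algorithm).
    Returns the Friedman chi-square statistic and the mean ranks."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank within the block: largest HV = rank 1; ties get average ranks.
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for m in range(i, j + 1):
                ranks[order[m]] = avg_rank
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in mean_ranks)
    return chi2, mean_ranks

# HV rows from Table A1 (F = 2, n = 50), columns:
# IMOMA-QL, IMOMA-QL1, IMOMA-QL2, IMOMA-QL3, IMOMA-QL4.
blocks = [
    (1.20, 1.10, 0.89, 0.96, 1.08),
    (1.24, 1.08, 1.00, 1.05, 1.15),
    (1.24, 1.15, 0.97, 1.02, 1.15),
    (1.16, 1.13, 0.92, 1.03, 1.08),
    (1.18, 1.10, 0.88, 0.95, 1.07),
]
chi2, mean_ranks = friedman_statistic(blocks)
print(f"Friedman chi-square = {chi2:.2f}, mean ranks = {mean_ranks}")
```

The p-value is then obtained from the chi-square distribution with k − 1 degrees of freedom (or from `scipy.stats.friedmanchisquare`, which performs the same computation).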

4.5. Comparison of IMOMA-QL and Other Algorithms

We select four comparative algorithms: IMPGA [34], MQSFLA [25], MOEA/D [35], and NSGA-II [36]. The first two are recent algorithms specifically designed for DHFSP, which is the same problem domain studied in this work. IMPGA minimizes both makespan and total tardiness—two of the three objectives optimized in our paper—and is composed of multiple populations that co-evolve in sub-regions, a greedy inter-factory job insertion neighborhood structure for local search, and a probability-sampling-based re-initialization procedure. MQSFLA also targets the minimization of makespan and total tardiness, sharing a highly similar objective set with our work. It incorporates a memeplex quality measurement mechanism, a search process guided by solution quality, and a novel memeplex shuffling that dynamically selects memeplexes based on evolution quality. The latter two, NSGA-II and MOEA/D, are classic multi-objective optimizers widely used in scheduling problems. NSGA-II employs non-dominated sorting and crowding distance to preserve solution diversity; in contrast, MOEA/D converts the multi-objective problem into a set of scalar subproblems. For IMPGA and MQSFLA, we directly adopt the parameter settings reported in their respective original papers. For NSGA-II and MOEA/D, we set the population size to 50, crossover probability pc to 0.7, and mutation probability pm to 0.2. Together, these algorithms provide diverse and strong benchmarks for evaluating our proposed method.
Each comparison algorithm was run independently 10 times on each instance. The average HV and IGD values are presented in Table 5 and Table 6. The complete experimental data for each instance are presented in Appendix A, Tables A11–A20. IMOMA-QL consistently achieved the best performance across all instances.
Figure 10 and Figure 11 display interval plots with 95% confidence intervals for the HV and IGD metrics across all instances, comparing IMOMA-QL with the four comparative algorithms. As evidenced by these figures, IMOMA-QL outperforms all other algorithms, achieving higher HV and lower IGD values. To further validate this superiority, the Friedman test rankings with 95% confidence intervals are presented in Table 7, where IMOMA-QL again ranks first among all compared algorithms.
Figure 12 displays 3D scatter plots of the non-dominated sets obtained by the five methods on a representative instance. The non-dominated sets of the respective algorithms form distinct layers, and the solutions obtained by IMOMA-QL lie closest to the ideal point at which all objective values are lowest. The non-dominated set obtained by IMOMA-QL is therefore clearly superior to those obtained by the other algorithms.
The superior performance of IMOMA-QL can be attributed to four key algorithmic components. The hybrid initialization strategy improves the diversity and quality of the initial population, providing a better starting point for the search. The genetic operators enhance global exploration and the recombination of high-quality solutions. The multi-neighborhood search mechanism diversifies local search patterns, improving the chance of escaping local optima. Finally, the Q-learning-guided variable neighborhood search adaptively selects promising neighborhoods based on search feedback, further enhancing search efficiency.

5. Conclusions

In the field of multi-objective optimization for tardiness-related scheduling problems, most existing studies focus on optimizing makespan along with only one tardiness-related objective. This study addresses DHFSP with SDST, optimizing three critical objectives: makespan, total tardiness, and the number of tardy jobs. This work emphasizes tardiness-related objectives, which are crucial in real-world manufacturing scenarios where meeting the due date is essential—such as just-in-time production, order-driven manufacturing, and supply chain scheduling with strict delivery commitments. To solve this problem, we propose a multi-objective memetic algorithm enhanced with a Q-learning-guided variable neighborhood search (VNS). Extensive numerical experiments and comparisons with four comparative algorithms demonstrate that the proposed method significantly improves solution quality, convergence speed, and robustness in handling the multi-objective DHFSP.
Despite the promising results, this study has certain limitations. The proposed model and algorithm operate under a set of standardized assumptions, such as deterministic processing times and static job availability. Consequently, they cannot be directly applied to real-world scheduling environments with stochastic processing times or dynamically arriving jobs.
Future research can extend this work in several meaningful directions. A promising avenue is to investigate more comprehensive and environmentally conscious objective sets, for example, jointly minimizing makespan, total tardiness, and total energy consumption. Addressing such integrated problems would require new models that capture the energy dynamics of machines and efficient algorithms capable of balancing productivity against energy consumption.

Author Contributions

Resource provision, Y.S.; project administration, Y.S. and Y.L.; data curation and management, Y.L.; figure and visualization, Y.L. and Q.C.; original draft writing, Y.L.; software implementation, X.S.; result validation, X.S. and Q.C.; research supervision, X.S.; study conception and design, H.K.; methodology development, H.K.; formal analysis, H.K.; manuscript revision and editing, H.K. and X.S.; investigation, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Open Foundation of Key Laboratory of Software Engineering of Yunnan Province: 2020SE308, Open Foundation of Key Laboratory of Software Engineering of Yunnan Province: 2020SE309, New round of Double First-class Project of Yunnan University: CY22624103, National Natural Science Foundation of China: 62366057, Special Fund for the Central Government to Guide Local Science: 202407AB110003, and Key Research and Development Program of Yunnan Province: 202402AA310056.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Detailed Experimental Results

Table A1. HV of the proposed algorithm and its four variants (F = 2).
F = 2
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.20  1.10  0.89  0.96  1.08
(50, 4)  1.24  1.08  1.00  1.05  1.15
(50, 6)  1.24  1.15  0.97  1.02  1.15
(50, 8)  1.16  1.13  0.92  1.03  1.08
(50, 10)  1.18  1.10  0.88  0.95  1.07
(100, 2)  1.26  1.16  1.00  1.01  1.15
(100, 4)  1.17  1.11  0.94  0.96  1.08
(100, 6)  1.18  1.11  0.85  0.94  1.07
(100, 8)  1.21  1.14  0.96  1.02  1.14
(100, 10)  1.16  1.07  0.88  0.98  1.06
(150, 2)  1.25  1.09  0.96  1.01  1.11
(150, 4)  1.27  1.19  1.03  1.09  1.14
(150, 6)  1.20  1.09  0.89  1.03  1.06
(150, 8)  1.22  1.13  0.95  1.09  1.10
(150, 10)  1.25  1.20  1.00  1.10  1.12
(200, 2)  1.26  1.17  0.98  1.03  1.16
(200, 4)  1.25  1.22  0.99  1.09  1.16
(200, 6)  1.17  1.10  0.93  0.99  1.08
(200, 8)  1.17  1.08  0.97  1.05  1.09
(200, 10)  1.18  1.13  0.92  1.02  1.11
Table A2. HV of the proposed algorithm and its four variants (F = 3).
F = 3
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.24  1.14  1.02  1.03  1.13
(50, 4)  1.19  1.16  0.98  0.97  1.10
(50, 6)  1.13  1.07  0.91  0.94  1.02
(50, 8)  1.17  1.11  1.02  0.97  1.11
(50, 10)  1.13  1.11  0.96  0.94  1.05
(100, 2)  1.19  1.05  0.96  0.99  1.06
(100, 4)  1.16  1.10  0.99  0.98  1.08
(100, 6)  1.13  1.06  0.86  0.94  1.02
(100, 8)  1.18  1.14  0.96  0.97  1.12
(100, 10)  1.20  1.17  1.00  1.01  1.12
(150, 2)  1.29  1.18  1.04  1.04  1.11
(150, 4)  1.15  1.10  0.94  0.96  1.02
(150, 6)  1.25  1.16  0.99  1.00  1.08
(150, 8)  1.21  1.11  1.05  1.06  1.07
(150, 10)  1.23  1.18  1.03  1.03  1.11
(200, 2)  1.17  1.08  0.94  0.95  1.04
(200, 4)  1.17  1.13  1.00  0.97  1.07
(200, 6)  1.18  1.08  0.92  0.98  1.06
(200, 8)  1.15  1.08  0.95  0.95  1.07
(200, 10)  1.21  1.13  1.04  1.05  1.14
Table A3. HV of the proposed algorithm and its four variants (F = 4).
F = 4
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.29  1.17  1.05  1.09  1.22
(50, 4)  1.20  1.13  1.03  1.09  1.18
(50, 6)  1.15  1.04  0.89  1.00  1.08
(50, 8)  1.18  1.16  0.97  1.06  1.15
(50, 10)  1.13  1.07  0.89  0.98  1.11
(100, 2)  1.19  1.07  0.92  1.01  1.10
(100, 4)  1.13  1.09  0.94  1.02  1.11
(100, 6)  1.11  1.00  0.88  0.97  1.06
(100, 8)  1.12  1.01  0.93  0.99  1.06
(100, 10)  1.17  1.13  0.95  1.08  1.13
(150, 2)  1.20  1.13  0.92  1.05  1.07
(150, 4)  1.17  1.07  0.91  1.06  1.10
(150, 6)  1.24  1.12  0.96  1.09  1.06
(150, 8)  1.13  1.03  0.90  0.94  1.04
(150, 10)  1.21  1.12  0.94  1.07  1.13
(200, 2)  1.22  1.12  0.96  1.09  1.15
(200, 4)  1.14  1.10  0.96  1.05  1.10
(200, 6)  1.13  1.05  0.93  0.96  1.09
(200, 8)  1.16  1.14  0.96  1.00  1.13
(200, 10)  1.21  1.12  1.02  1.09  1.15
Table A4. HV of the proposed algorithm and its four variants (F = 5).
F = 5
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.33  1.25  1.01  1.12  1.20
(50, 4)  1.23  1.16  0.98  1.04  1.17
(50, 6)  1.25  1.19  0.96  1.10  1.14
(50, 8)  1.19  1.14  0.98  1.02  1.10
(50, 10)  1.25  1.17  1.03  1.04  1.17
(100, 2)  1.23  1.13  0.94  1.02  1.14
(100, 4)  1.27  1.18  1.00  1.11  1.14
(100, 6)  1.19  1.11  0.93  1.00  1.09
(100, 8)  1.25  1.15  0.98  1.04  1.15
(100, 10)  1.14  1.07  0.83  1.08  1.06
(150, 2)  1.23  1.07  0.93  0.97  1.07
(150, 4)  1.26  1.16  1.01  1.08  1.14
(150, 6)  1.24  1.14  0.97  1.03  1.08
(150, 8)  1.25  1.15  0.99  1.11  1.14
(150, 10)  1.16  1.06  0.89  1.02  1.03
(200, 2)  1.24  1.13  0.98  1.04  1.11
(200, 4)  1.26  1.20  0.97  1.12  1.21
(200, 6)  1.26  1.19  1.00  1.09  1.16
(200, 8)  1.24  1.16  1.00  1.10  1.18
(200, 10)  1.20  1.15  0.95  1.04  1.14
Table A5. HV of the proposed algorithm and its four variants (F = 6).
F = 6
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.28  1.23  1.03  1.11  1.23
(50, 4)  1.25  1.15  1.00  1.13  1.18
(50, 6)  1.16  1.06  0.87  1.01  1.05
(50, 8)  1.21  1.19  0.94  1.08  1.15
(50, 10)  1.17  1.12  0.92  1.03  1.11
(100, 2)  1.32  1.18  1.02  1.09  1.19
(100, 4)  1.19  1.08  0.93  1.03  1.10
(100, 6)  1.14  1.05  0.85  0.96  1.03
(100, 8)  1.23  1.17  1.01  1.07  1.16
(100, 10)  1.12  1.07  0.87  1.00  1.07
(150, 2)  1.22  1.09  0.92  0.99  1.06
(150, 4)  1.30  1.20  1.05  1.12  1.16
(150, 6)  1.25  1.12  0.92  1.06  1.10
(150, 8)  1.18  1.10  0.90  1.00  1.07
(150, 10)  1.21  1.12  0.92  1.04  1.09
(200, 2)  1.20  1.08  0.93  1.01  1.10
(200, 4)  1.25  1.21  1.00  1.12  1.20
(200, 6)  1.16  1.09  0.90  0.96  1.07
(200, 8)  1.25  1.06  1.04  1.07  1.10
(200, 10)  1.24  1.16  1.00  1.01  1.06
Table A6. IGD of the proposed algorithm and its four variants (F = 2).
F = 2
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.116  0.128  0.153  0.157  0.121
(50, 4)  0.110  0.133  0.160  0.157  0.127
(50, 6)  0.073  0.099  0.153  0.152  0.103
(50, 8)  0.060  0.099  0.127  0.093  0.065
(50, 10)  0.129  0.134  0.143  0.152  0.148
(100, 2)  0.138  0.152  0.177  0.182  0.165
(100, 4)  0.121  0.134  0.165  0.159  0.129
(100, 6)  0.069  0.081  0.117  0.106  0.072
(100, 8)  0.090  0.103  0.160  0.140  0.095
(100, 10)  0.100  0.124  0.182  0.143  0.137
(150, 2)  0.123  0.176  0.193  0.202  0.173
(150, 4)  0.132  0.148  0.176  0.182  0.159
(150, 6)  0.087  0.123  0.136  0.158  0.115
(150, 8)  0.093  0.135  0.134  0.129  0.127
(150, 10)  0.102  0.123  0.151  0.143  0.129
(200, 2)  0.132  0.137  0.161  0.167  0.148
(200, 4)  0.077  0.084  0.152  0.143  0.097
(200, 6)  0.105  0.122  0.163  0.159  0.123
(200, 8)  0.083  0.103  0.122  0.128  0.111
(200, 10)  0.117  0.132  0.167  0.152  0.142
Table A7. IGD of the proposed algorithm and its four variants (F = 3).
F = 3
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.110  0.131  0.177  0.182  0.155
(50, 4)  0.108  0.144  0.169  0.162  0.132
(50, 6)  0.070  0.095  0.144  0.133  0.093
(50, 8)  0.094  0.140  0.161  0.156  0.121
(50, 10)  0.108  0.140  0.178  0.168  0.136
(100, 2)  0.082  0.107  0.157  0.165  0.130
(100, 4)  0.103  0.130  0.177  0.181  0.151
(100, 6)  0.072  0.098  0.161  0.149  0.098
(100, 8)  0.064  0.071  0.103  0.093  0.078
(100, 10)  0.071  0.078  0.138  0.141  0.075
(150, 2)  0.123  0.155  0.175  0.181  0.150
(150, 4)  0.083  0.119  0.154  0.157  0.110
(150, 6)  0.067  0.114  0.176  0.150  0.112
(150, 8)  0.050  0.087  0.149  0.123  0.107
(150, 10)  0.109  0.132  0.145  0.162  0.142
(200, 2)  0.117  0.136  0.156  0.165  0.168
(200, 4)  0.115  0.132  0.161  0.153  0.161
(200, 6)  0.076  0.102  0.187  0.151  0.113
(200, 8)  0.050  0.075  0.121  0.134  0.100
(200, 10)  0.062  0.090  0.165  0.166  0.083
Table A8. IGD of the proposed algorithm and its four variants (F = 4).
F = 4
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.122  0.157  0.176  0.165  0.131
(50, 4)  0.115  0.153  0.168  0.170  0.120
(50, 6)  0.101  0.137  0.186  0.150  0.110
(50, 8)  0.107  0.149  0.163  0.139  0.130
(50, 10)  0.082  0.122  0.163  0.140  0.085
(100, 2)  0.097  0.130  0.176  0.154  0.131
(100, 4)  0.107  0.157  0.178  0.157  0.108
(100, 6)  0.093  0.138  0.181  0.127  0.112
(100, 8)  0.053  0.086  0.118  0.129  0.070
(100, 10)  0.069  0.091  0.106  0.139  0.080
(150, 2)  0.124  0.136  0.166  0.142  0.135
(150, 4)  0.105  0.127  0.143  0.140  0.131
(150, 6)  0.060  0.103  0.124  0.091  0.099
(150, 8)  0.097  0.145  0.137  0.139  0.125
(150, 10)  0.114  0.142  0.176  0.179  0.130
(200, 2)  0.107  0.133  0.169  0.155  0.121
(200, 4)  0.130  0.146  0.163  0.171  0.153
(200, 6)  0.101  0.125  0.167  0.152  0.132
(200, 8)  0.116  0.138  0.171  0.164  0.147
(200, 10)  0.105  0.122  0.150  0.142  0.108
Table A9. IGD of the proposed algorithm and its four variants (F = 5).
F = 5
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.102  0.134  0.151  0.141  0.113
(50, 4)  0.105  0.126  0.165  0.138  0.127
(50, 6)  0.060  0.089  0.134  0.120  0.073
(50, 8)  0.123  0.159  0.180  0.169  0.154
(50, 10)  0.099  0.137  0.137  0.124  0.125
(100, 2)  0.082  0.114  0.106  0.113  0.105
(100, 4)  0.079  0.101  0.140  0.147  0.084
(100, 6)  0.088  0.113  0.156  0.146  0.090
(100, 8)  0.070  0.083  0.105  0.101  0.075
(100, 10)  0.061  0.098  0.124  0.128  0.093
(150, 2)  0.134  0.172  0.160  0.196  0.153
(150, 4)  0.099  0.114  0.162  0.157  0.118
(150, 6)  0.115  0.154  0.174  0.156  0.125
(150, 8)  0.111  0.144  0.150  0.133  0.118
(150, 10)  0.099  0.121  0.154  0.137  0.134
(200, 2)  0.140  0.141  0.178  0.178  0.152
(200, 4)  0.123  0.148  0.181  0.170  0.147
(200, 6)  0.073  0.089  0.121  0.098  0.089
(200, 8)  0.076  0.097  0.129  0.111  0.109
(200, 10)  0.127  0.149  0.168  0.158  0.152
Table A10. IGD of the proposed algorithm and its four variants (F = 6).
F = 6
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.155  0.160  0.178  0.176  0.167
(50, 4)  0.119  0.135  0.172  0.163  0.139
(50, 6)  0.088  0.103  0.167  0.149  0.119
(50, 8)  0.096  0.119  0.118  0.153  0.125
(50, 10)  0.104  0.120  0.158  0.168  0.136
(100, 2)  0.096  0.107  0.151  0.141  0.114
(100, 4)  0.137  0.147  0.182  0.211  0.163
(100, 6)  0.075  0.082  0.148  0.109  0.077
(100, 8)  0.05  0.072  0.085  0.110  0.052
(100, 10)  0.061  0.063  0.130  0.105  0.066
(150, 2)  0.115  0.128  0.126  0.157  0.120
(150, 4)  0.118  0.122  0.185  0.188  0.132
(150, 6)  0.058  0.066  0.136  0.128  0.074
(150, 8)  0.100  0.121  0.164  0.148  0.011
(150, 10)  0.11  0.108  0.152  0.177  0.141
(200, 2)  0.130  0.135  0.172  0.183  0.159
(200, 4)  0.129  0.140  0.186  0.179  0.137
(200, 6)  0.100  0.118  0.161  0.135  0.126
(200, 8)  0.098  0.102  0.155  0.170  0.101
(200, 10)  0.110  0.125  0.182  0.182  0.125
Table A11. HV of the proposed algorithm and the comparison algorithms (F = 2).
F = 2
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.29  0.97  1.00  0.72  0.88
(50, 4)  1.18  0.96  0.90  0.58  0.71
(50, 6)  1.18  0.92  0.88  0.62  0.74
(50, 8)  1.24  0.98  0.94  0.66  0.67
(50, 10)  1.27  0.96  0.98  0.75  0.73
(100, 2)  1.27  0.95  0.98  0.66  0.81
(100, 4)  1.33  0.92  0.95  0.76  0.82
(100, 6)  1.25  0.99  0.93  0.66  0.71
(100, 8)  1.15  0.87  0.85  0.56  0.67
(100, 10)  1.18  0.85  0.96  0.59  0.66
(150, 2)  1.30  0.99  1.01  0.64  0.79
(150, 4)  1.18  0.98  0.88  0.52  0.64
(150, 6)  1.34  0.90  0.91  0.66  0.80
(150, 8)  1.28  1.01  0.88  0.61  0.83
(150, 10)  1.22  0.95  0.97  0.67  0.78
(200, 2)  1.28  0.95  1.07  0.73  0.82
(200, 4)  1.28  0.99  0.95  0.62  0.81
(200, 6)  1.29  0.96  0.98  0.70  0.80
(200, 8)  1.26  1.03  0.95  0.76  0.78
(200, 10)  1.25  0.86  1.01  0.63  0.73
Table A12. HV of the proposed algorithm and the comparison algorithms (F = 3).
F = 3
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.27  1.01  0.98  0.74  0.84
(50, 4)  1.22  0.94  0.85  0.66  0.67
(50, 6)  1.24  0.98  0.93  0.65  0.73
(50, 8)  1.15  0.88  0.83  0.62  0.78
(50, 10)  1.31  0.97  0.99  0.72  0.72
(100, 2)  1.21  0.95  1.04  0.72  0.79
(100, 4)  1.29  1.00  0.91  0.67  0.81
(100, 6)  1.20  0.90  0.95  0.62  0.77
(100, 8)  1.17  0.82  0.83  0.57  0.72
(100, 10)  1.19  0.82  0.94  0.63  0.63
(150, 2)  1.19  0.88  0.97  0.68  0.75
(150, 4)  1.14  0.95  0.86  0.65  0.77
(150, 6)  1.34  1.07  0.95  0.74  0.87
(150, 8)  1.18  0.97  0.95  0.72  0.77
(150, 10)  1.26  0.85  0.88  0.58  0.71
(200, 2)  1.27  0.95  0.99  0.69  0.84
(200, 4)  1.22  0.88  0.83  0.59  0.69
(200, 6)  1.25  0.92  1.00  0.71  0.68
(200, 8)  1.25  0.97  0.88  0.64  0.75
(200, 10)  1.12  0.81  0.90  0.59  0.78
Table A13. HV of the proposed algorithm and the comparison algorithms (F = 4).
F = 4
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.25  0.98  0.87  0.78  0.75
(50, 4)  1.14  0.95  0.84  0.70  0.60
(50, 6)  1.24  0.96  0.90  0.63  0.71
(50, 8)  1.16  0.87  0.80  0.68  0.78
(50, 10)  1.33  0.94  0.90  0.66  0.72
(100, 2)  1.19  0.94  0.96  0.69  0.77
(100, 4)  1.28  1.04  0.91  0.61  0.84
(100, 6)  1.17  0.93  0.84  0.54  0.80
(100, 8)  1.15  0.86  0.74  0.55  0.71
(100, 10)  1.17  0.82  0.91  0.55  0.58
(150, 2)  1.15  0.88  0.98  0.61  0.73
(150, 4)  1.14  0.92  0.87  0.63  0.77
(150, 6)  1.25  1.03  0.93  0.73  0.83
(150, 8)  1.21  0.98  0.87  0.71  0.82
(150, 10)  1.24  0.87  0.82  0.61  0.73
(200, 2)  1.27  0.96  0.96  0.63  0.87
(200, 4)  1.24  0.93  0.78  0.56  0.71
(200, 6)  1.20  0.89  0.99  0.66  0.69
(200, 8)  1.27  0.96  0.84  0.61  0.67
(200, 10)  1.15  0.90  0.84  0.55  0.69
Table A14. HV of the proposed algorithm and the comparison algorithms (F = 5).
F = 5
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.32  0.93  1.00  0.75  0.81
(50, 4)  1.21  0.94  0.85  0.66  0.74
(50, 6)  1.25  0.99  0.91  0.63  0.75
(50, 8)  1.13  0.95  0.85  0.69  0.85
(50, 10)  1.25  0.95  0.93  0.60  0.63
(100, 2)  1.19  0.89  0.93  0.64  0.71
(100, 4)  1.23  1.09  0.96  0.55  0.76
(100, 6)  1.20  0.90  0.83  0.49  0.72
(100, 8)  1.18  0.89  0.77  0.50  0.64
(100, 10)  1.19  0.77  0.87  0.53  0.59
(150, 2)  1.18  0.82  0.94  0.59  0.74
(150, 4)  1.15  0.88  0.86  0.57  0.76
(150, 6)  1.31  1.04  0.95  0.76  0.90
(150, 8)  1.17  0.94  0.83  0.71  0.82
(150, 10)  1.25  0.86  0.82  0.59  0.77
(200, 2)  1.20  0.98  1.01  0.56  0.80
(200, 4)  1.29  0.93  0.81  0.56  0.73
(200, 6)  1.19  0.84  0.96  0.59  0.60
(200, 8)  1.27  0.92  0.81  0.61  0.70
(200, 10)  1.23  0.81  0.87  0.53  0.70
Table A15. HV of the proposed algorithm and the comparison algorithms (F = 6).
F = 6
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.20  0.96  1.01  0.68  0.71
(50, 4)  1.21  0.93  0.87  0.69  0.75
(50, 6)  1.25  0.91  0.83  0.68  0.79
(50, 8)  1.13  0.94  0.84  0.71  0.78
(50, 10)  1.25  0.82  0.88  0.63  0.72
(100, 2)  1.19  0.86  0.96  0.66  0.76
(100, 4)  1.23  0.92  0.96  0.60  0.79
(100, 6)  1.20  0.82  0.85  0.54  0.73
(100, 8)  1.18  0.94  0.95  0.53  0.68
(100, 10)  1.09  0.96  0.90  0.56  0.62
(150, 2)  1.18  0.94  0.84  0.63  0.78
(150, 4)  1.15  0.89  0.85  0.61  0.78
(150, 6)  1.22  0.84  0.87  0.72  0.80
(150, 8)  1.17  1.00  0.89  0.66  0.83
(150, 10)  1.25  0.95  0.94  0.63  0.81
(200, 2)  1.20  0.88  0.86  0.59  0.82
(200, 4)  1.29  0.83  0.88  0.58  0.77
(200, 6)  1.19  0.91  0.87  0.63  0.73
(200, 8)  1.27  0.90  0.86  0.64  0.75
(200, 10)  1.23  0.98  0.95  0.56  0.73
Table A16. IGD of the proposed algorithm and the comparison algorithms (F = 2).
F = 2
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.096  0.131  0.159  0.211  0.189
(50, 4)  0.117  0.159  0.173  0.264  0.216
(50, 6)  0.105  0.173  0.163  0.267  0.246
(50, 8)  0.135  0.181  0.169  0.244  0.204
(50, 10)  0.146  0.166  0.152  0.239  0.279
(100, 2)  0.076  0.15  0.19  0.256  0.212
(100, 4)  0.110  0.174  0.182  0.284  0.234
(100, 6)  0.124  0.167  0.146  0.274  0.256
(100, 8)  0.066  0.15  0.19  0.257  0.21
(100, 10)  0.102  0.165  0.183  0.289  0.223
(150, 2)  0.133  0.153  0.16  0.227  0.189
(150, 4)  0.098  0.163  0.178  0.296  0.214
(150, 6)  0.131  0.152  0.1611  0.278  0.245
(150, 8)  0.105  0.131  0.159  0.296  0.144
(150, 10)  0.132  0.155  0.179  0.265  0.263
(200, 2)  0.089  0.166  0.135  0.254  0.223
(200, 4)  0.078  0.169  0.174  0.288  0.237
(200, 6)  0.066  0.132  0.158  0.275  0.257
(200, 8)  0.094  0.145  0.163  0.274  0.223
(200, 10)  0.107  0.158  0.165  0.231  0.17
Table A17. IGD of the proposed algorithm and the comparison algorithms (F = 3).
F = 3
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.06  0.12  0.144  0.195  0.189
(50, 4)  0.104  0.165  0.172  0.224  0.191
(50, 6)  0.129  0.182  0.179  0.235  0.229
(50, 8)  0.11  0.175  0.161  0.254  0.242
(50, 10)  0.133  0.164  0.151  0.228  0.232
(100, 2)  0.053  0.118  0.103  0.223  0.181
(100, 4)  0.083  0.101  0.118  0.264  0.245
(100, 6)  0.086  0.143  0.13  0.295  0.198
(100, 8)  0.078  0.128  0.139  0.251  0.183
(100, 10)  0.069  0.137  0.123  0.185  0.169
(150, 2)  0.071  0.158  0.145  0.258  0.171
(150, 4)  0.074  0.133  0.164  0.212  0.201
(150, 6)  0.085  0.179  0.176  0.231  0.218
(150, 8)  0.059  0.115  0.132  0.252  0.172
(150, 10)  0.115  0.169  0.158  0.252  0.192
(200, 2)  0.076  0.125  0.121  0.289  0.167
(200, 4)  0.071  0.124  0.136  0.296  0.175
(200, 6)  0.08  0.151  0.131  0.264  0.223
(200, 8)  0.096  0.136  0.142  0.255  0.233
(200, 10)  0.082  0.163  0.178  0.278  0.209
Table A18. IGD of the proposed algorithm and the comparison algorithms (F = 4).
F = 4
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.062  0.161  0.207  0.275  0.253
(50, 4)  0.132  0.156  0.161  0.255  0.235
(50, 6)  0.078  0.141  0.173  0.252  0.228
(50, 8)  0.076  0.172  0.189  0.246  0.216
(50, 10)  0.105  0.166  0.172  0.236  0.202
(100, 2)  0.058  0.117  0.106  0.303  0.223
(100, 4)  0.073  0.129  0.166  0.241  0.171
(100, 6)  0.069  0.141  0.136  0.254  0.154
(100, 8)  0.054  0.134  0.146  0.258  0.161
(100, 10)  0.076  0.174  0.157  0.223  0.238
(150, 2)  0.069  0.142  0.16  0.266  0.171
(150, 4)  0.091  0.167  0.145  0.229  0.179
(150, 6)  0.094  0.189  0.212  0.212  0.239
(150, 8)  0.063  0.143  0.16  0.264  0.151
(150, 10)  0.111  0.154  0.159  0.256  0.187
(200, 2)  0.102  0.142  0.134  0.262  0.157
(200, 4)  0.09  0.162  0.149  0.273  0.223
(200, 6)  0.069  0.131  0.123  0.245  0.205
(200, 8)  0.076  0.135  0.144  0.28  0.197
(200, 10)  0.066  0.133  0.142  0.259  0.182
Table A19. IGD of the proposed algorithm and the comparison algorithms (F = 5).
F = 5
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.107  0.163  0.143  0.229  0.185
(50, 4)  0.123  0.175  0.184  0.256  0.211
(50, 6)  0.125  0.135  0.158  0.256  0.196
(50, 8)  0.081  0.153  0.171  0.258  0.194
(50, 10)  0.113  0.189  0.215  0.276  0.233
(100, 2)  0.084  0.164  0.214  0.244  0.132
(100, 4)  0.135  0.153  0.173  0.269  0.236
(100, 6)  0.092  0.147  0.164  0.278  0.205
(100, 8)  0.069  0.129  0.111  0.324  0.206
(100, 10)  0.107  0.169  0.151  0.278  0.214
(150, 2)  0.129  0.183  0.25  0.319  0.269
(150, 4)  0.102  0.191  0.211  0.282  0.251
(150, 6)  0.125  0.142  0.154  0.303  0.188
(150, 8)  0.095  0.156  0.148  0.237  0.247
(150, 10)  0.11  0.162  0.175  0.251  0.267
(200, 2)  0.107  0.164  0.152  0.266  0.332
(200, 4)  0.079  0.132  0.25  0.29  0.311
(200, 6)  0.144  0.183  0.196  0.279  0.233
(200, 8)  0.057  0.147  0.153  0.225  0.165
(200, 10)  0.119  0.189  0.157  0.239  0.195
Table A20. IGD of the proposed algorithm and the comparison algorithms (F = 6).
F = 6
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.075  0.153  0.148  0.243  0.194
(50, 4)  0.121  0.139  0.146  0.220  0.174
(50, 6)  0.115  0.181  0.178  0.236  0.180
(50, 8)  0.114  0.167  0.188  0.232  0.216
(50, 10)  0.136  0.178  0.166  0.257  0.227
(100, 2)  0.075  0.136  0.133  0.246  0.177
(100, 4)  0.085  0.164  0.142  0.256  0.200
(100, 6)  0.088  0.157  0.112  0.230  0.173
(100, 8)  0.092  0.122  0.160  0.238  0.198
(100, 10)  0.072  0.129  0.161  0.249  0.212
(150, 2)  0.114  0.166  0.183  0.223  0.183
(150, 4)  0.082  0.135  0.171  0.253  0.247
(150, 6)  0.069  0.137  0.146  0.262  0.156
(150, 8)  0.114  0.194  0.197  0.249  0.228
(150, 10)  0.094  0.165  0.158  0.223  0.189
(200, 2)  0.056  0.128  0.136  0.224  0.178
(200, 4)  0.074  0.153  0.162  0.356  0.220
(200, 6)  0.115  0.184  0.183  0.263  0.202
(200, 8)  0.076  0.173  0.186  0.227  0.207
(200, 10)  0.068  0.132  0.146  0.256  0.187

Appendix A.2. Illustrative Example of SB2OX Crossover

This appendix provides a step-by-step illustrative example of the SB2OX crossover operator, complementing the description in Section 3.2.
Consider two parent solutions, each containing two factories; a semicolon separates the two factory sequences:
Parent 1 : { 1 , 5 , 8 , 11 , 9 , 6 , 4 , 3 ; 12 , 2 , 7 , 13 , 6 , 14 , 15 , 10 } Parent 2 : { 2 , 16 , 8 , 13 , 10 , 6 , 4 , 3 ; 11 , 6 , 5 , 12 , 14 , 15 , 9 , 10 }
Step 1: Identical Block Inheritance—Both parents are compared position by position. Identical blocks containing at least two consecutive matching jobs are identified and directly inherited by the offspring. In the first factory, jobs 4 and 3 at positions 7–8 are identical in both parents. In the second factory, jobs 14 and 15 at positions 6–7 are identical in both parents. These blocks are transferred to the children, preserving their factory assignments and positions:
Child 1 and Child 2 after Step 1 : { _ , _ , _ , _ , _ , _ , 4 , 3 ; _ , _ , _ , _ , _ , 14 , 15 , _ }
where _ denotes an empty position.
Step 2: Random Segment Inheritance—Two cut points are randomly selected in each factory. The segments between these cut points are inherited as follows. For Child 1 (derived from Parent 1), the first factory retains jobs at positions 3–5 (jobs 8, 11, and 13) and the second factory retains jobs at positions 4–5 (jobs 13 and 12). For Child 2 (derived from Parent 2), the first factory retains jobs at positions 2–4 (jobs 16, 8, and 13) and the second factory retains jobs at positions 2–5 (jobs 6, 5, 12, and 14). After Step 2
Child 1 : { _ , _ , 8 , 11 , 13 , _ , 4 , 3 ; _ , _ , _ , 13 , 12 , 14 , 15 , _ } Child 2 : { _ , 16 , 8 , 13 , _ , _ , 4 , 3 ; _ , 1 , 6 , 5 , 12 , 14 , 15 , _ }
Step 3: Remaining Position Filling—The empty positions are filled with the missing jobs in the order they appear in the other parent. For Child 1, missing jobs are taken from Parent 2’s sequence (excluding those already in Child 1) and filled left-to-right. For Child 2, missing jobs are taken from Parent 1’s sequence.
The final children are
Child 1 : { 2 , 16 , 8 , 11 , 9 , 10 , 4 , 3 ; 7 , 1 , 6 , 13 , 12 , 14 , 15 , 5 } Child 2 : { 11 , 16 , 8 , 13 , 9 , 2 , 4 , 3 ; 7 , 1 , 6 , 5 , 12 , 14 , 15 , 10 }
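The three steps above can be sketched in code for a single factory's sequence. This is our own simplified illustration, not the paper's implementation: it assumes both parents hold the same job set within the factory (job-to-factory assignments identical), the function and variable names are ours, and the parent sequences below are illustrative rather than taken from the example above.

```python
import random

def sb2ox_factory(p1, p2, rng):
    """Simplified SB2OX on one factory's job sequence (p1, p2: permutations
    of the same job set). Returns one child derived primarily from p1."""
    n = len(p1)
    child = [None] * n

    # Step 1: inherit blocks of >= 2 consecutive positions where parents agree.
    i = 0
    while i < n:
        j = i
        while j < n and p1[j] == p2[j]:
            j += 1
        if j - i >= 2:
            child[i:j] = p1[i:j]
        i = max(j, i + 1)

    # Step 2: inherit the segment of p1 between two random cut points.
    c1, c2 = sorted(rng.sample(range(n + 1), 2))
    for pos in range(c1, c2):
        child[pos] = p1[pos]

    # Step 3: fill remaining positions with the missing jobs,
    # in the order they appear in the other parent (p2).
    used = {job for job in child if job is not None}
    fill = iter(job for job in p2 if job not in used)
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(fill)
    return child

rng = random.Random(7)
parent1 = [1, 5, 8, 2, 7, 6, 4, 3]  # shares blocks (8, 2) and (4, 3) with parent2
parent2 = [5, 1, 8, 2, 6, 7, 4, 3]
child = sb2ox_factory(parent1, parent2, rng)
print(child)
```

Because Steps 1 and 2 only copy jobs from the first parent and Step 3 fills strictly with the missing jobs, the child is always a valid permutation of the factory's job set.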

Appendix A.3. Q-Table Update Example

To illustrate the Q-learning update mechanism, consider a Q-table with three states and six actions per state. The initial Q-values are randomly initialized as integers between 1 and 3:
Table A21. Initial Q-table (with integer values 1–3).
State  Action 1  Action 2  Action 3  Action 4  Action 5  Action 6
State 1  3  1  2  2  1  2
State 2  1  2  1  3  2  1
State 3  2  1  3  2  2  1
Assume the agent is in State 1 and greedily selects Action 1 (which has the highest Q-value of 3 in State 1). For the purpose of this example, we assume that after executing this action, the agent receives a reward r = 1 and transitions to State 2.
Assuming a learning rate α = 1 and a discount factor γ = 0.9, the Q-value is updated according to
$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
  • Current Q-value: Q(State 1, Action 1) = 3.
  • Maximum Q-value in the next state: max_{a′} Q(State 2, a′) = 3 (Action 4 in State 2).
  • Temporal-difference target: r + γ max_{a′} Q(s′, a′) = 1 + 0.9 × 3 = 1 + 2.7 = 3.7.
  • Updated Q-value: Q(State 1, Action 1) ← 3 + 1 × (3.7 − 3) = 3.7.
After the update, the Q-table becomes
Table A22. Q-table after update with α = 1.0.
State  Action 1  Action 2  Action 3  Action 4  Action 5  Action 6
State 1  3.7  1  2  2  1  2
State 2  1  2  1  3  2  1
State 3  2  1  3  2  2  1
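The update above can be reproduced in a few lines of Python. The sketch below mirrors Table A21 (state and action indices are zero-based) and applies the standard Q-learning temporal-difference rule with the example's α = 1 and γ = 0.9.

```python
# Q-table from Table A21: rows = states, columns = actions
# (here, the candidate neighborhood structures).
q_table = [
    [3, 1, 2, 2, 1, 2],  # State 1
    [1, 2, 1, 3, 2, 1],  # State 2
    [2, 1, 3, 2, 2, 1],  # State 3
]

alpha, gamma = 1.0, 0.9  # learning rate and discount factor from the example

def q_update(q, state, action, reward, next_state):
    """Standard Q-learning temporal-difference update."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])

# Greedy action in State 1 is Action 1 (Q = 3); reward 1, transition to State 2.
q_update(q_table, state=0, action=0, reward=1, next_state=1)
print(round(q_table[0][0], 2))  # → 3.7
```

With α = 1 the old Q-value is fully replaced by the temporal-difference target, which is why the updated entry equals 3.7 exactly as in Table A22; smaller learning rates would move the entry only part of the way toward the target.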
Figure A1. Flowchart of the proposed algorithm.

Figure 1. Illustration of DHFSP.
Figure 2. The flowchart of the selection.
Figure 3. The crossover operator.
Figure 4. Learning process of Q-learning.
Figure 5. The initial Q-table.
Figure 6. Main effects plot.
Figure 7. Extended sensitivity analysis.
Figure 8. HV metrics: proposed algorithm vs. four variants.
Figure 9. IGD metrics: proposed algorithm vs. four variants.
Figure 10. HV metrics: proposed algorithm vs. comparison algorithms.
Figure 11. IGD metrics: proposed algorithm vs. comparison algorithms.
Figure 12. The 3D scatter plots.
Table 1. Sets, parameters, and variables.

Symbol          Description
Sets
N               Factories: N = {1, …, f, …, F}
I               Stages: I = {1, …, i, …, k}
J               Jobs: J = {1, …, j, …, n}
P               Positions: P = {1, …, p, …, n}
M_f             Machines at factory f: M_f = {1, …, m, …, l}
M_{i,f}         Machines at stage i in factory f: M_{i,f} = {1, …, m, …, F × l}
Parameters
p_{j,i}         Processing time of job j at stage i
d_j             Due date of job j
S_{j,i}         Start time of job j at stage i
E_{j,i}         Completion time of job j at stage i
MS_{f,m,p}      Start time of machine m at the pth position in factory f
ME_{f,m,p}      Completion time of machine m at the pth position in factory f
s_{j,j′,i}      Setup time at stage i from job j to job j′
Variables
U_j             Binary variable: 1 if job j is tardy, 0 otherwise
X_{j,f}         Binary variable: 1 if job j is assigned to factory f, 0 otherwise
Y_{j,f,m,p,i}   Binary variable: 1 if job j is processed at the pth position of machine m at stage i in factory f, 0 otherwise
Table 2. HV of the proposed algorithm and its four variants.

Instance   IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
F = 2      1.21      1.13       0.95       1.02       1.11
F = 3      1.19      1.12       0.98       0.99       1.08
F = 4      1.17      1.10       0.94       1.03       1.11
F = 5      1.23      1.15       0.97       1.05       1.13
F = 6      1.22      1.13       0.95       1.04       1.11
n = 50     1.21      1.13       0.96       1.03       1.13
n = 100    1.18      1.10       0.93       1.01       1.10
n = 150    1.22      1.12       0.96       1.04       1.09
n = 200    1.20      1.13       0.97       1.03       1.12
s = 2      1.24      1.13       0.97       1.03       1.12
s = 4      1.21      1.14       0.98       1.05       1.13
s = 6      1.19      1.10       0.92       1.00       1.08
s = 8      1.19      1.12       0.96       1.03       1.11
s = 10     1.18      1.13       0.95       1.02       1.10
Table 3. IGD of the proposed algorithm and its four variants.

Instance   IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
F = 2      0.105     0.130      0.176      0.167      0.128
F = 3      0.083     0.115      0.191      0.173      0.120
F = 4      0.102     0.134      0.181      0.161      0.120
F = 5      0.098     0.124      0.166      0.151      0.117
F = 6      0.102     0.114      0.177      0.170      0.114
n = 50     0.103     0.132      0.182      0.169      0.125
n = 100    0.085     0.108      0.162      0.147      0.102
n = 150    0.107     0.131      0.181      0.168      0.124
n = 200    0.108     0.122      0.188      0.173      0.128
s = 2      0.120     0.143      0.204      0.188      0.144
s = 4      0.111     0.134      0.189      0.180      0.132
s = 6      0.082     0.108      0.158      0.145      0.103
s = 8      0.082     0.112      0.162      0.143      0.100
s = 10     0.096     0.120      0.180      0.166      0.120
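IGD, reported in the tables above, measures the average distance from each point of the reference Pareto front to its nearest obtained solution, so lower values indicate better convergence and coverage. The sketch below is a minimal illustration with made-up three-objective points, not the paper's data or its exact normalization.

```python
import math

def igd(reference_front, obtained_set):
    """Inverted generational distance: mean over reference points of the
    Euclidean distance to the nearest obtained solution (lower is better)."""
    def dist(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    return sum(min(dist(r, s) for s in obtained_set)
               for r in reference_front) / len(reference_front)

# Toy normalized 3-objective points (makespan, total tardiness, tardy jobs);
# values are illustrative only.
ref = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
obtained = [(0.1, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.1, 0.0)]
print(round(igd(ref, obtained), 3))  # 0.067
```

Two of the three reference points are missed by a distance of 0.1 and one is matched exactly, so the mean distance is 0.2 / 3 ≈ 0.067.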
Table 4. Friedman test rankings across all algorithm variants.

Algorithms   HV Rank  HV p-Value         IGD Rank  IGD p-Value
IMOMA-QL     1        –                  1.00      –
IMOMA-QL1    2.32     3.56476 × 10⁻⁹     2.56      0
IMOMA-QL2    4.93     0                  4.75      0
IMOMA-QL3    4.05     0                  4.18      0
IMOMA-QL4    2.70     2.90878 × 10⁻¹⁴    2.48      5.68936 × 10⁻¹¹
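Friedman rankings like those in Table 4 are typically obtained by ranking the competing algorithms on each problem instance (rank 1 = best) and then averaging those ranks per algorithm. The sketch below illustrates this averaging step only, on toy HV data (higher is better); it is not the paper's data, it omits tie handling, and the names are ours.

```python
# Average Friedman ranks on toy HV values (higher is better).
hv = {  # algorithm -> HV value on each of three instances (made-up numbers)
    "A": [1.21, 1.19, 1.17],
    "B": [1.13, 1.12, 1.10],
    "C": [0.95, 0.98, 0.94],
}

def mean_ranks(scores, higher_is_better=True):
    """Rank algorithms per instance (1 = best), then average per algorithm.
    Ties are not handled here for brevity."""
    algos = list(scores)
    n_inst = len(next(iter(scores.values())))
    totals = {a: 0.0 for a in algos}
    for i in range(n_inst):
        ordered = sorted(algos, key=lambda a: scores[a][i],
                         reverse=higher_is_better)
        for rank, a in enumerate(ordered, start=1):
            totals[a] += rank
    return {a: totals[a] / n_inst for a in algos}

print(mean_ranks(hv))  # {'A': 1.0, 'B': 2.0, 'C': 3.0}
```

For IGD the same routine would be called with `higher_is_better=False`, since lower IGD is better.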
Table 5. HV of the proposed algorithm and the comparison algorithms.

Instance   IMOMA-QL  IMPGA  MQSLFA  MOEA/D  NSGA-II
F = 2      1.25      0.95   0.94    0.64    0.75
F = 3      1.22      0.92   0.92    0.65    0.74
F = 4      1.21      0.91   0.87    0.61    0.73
F = 5      1.23      0.92   0.88    0.60    0.72
F = 6      1.20      0.88   0.90    0.64    0.75
n = 50     1.22      0.93   0.90    0.69    0.73
n = 100    1.19      0.90   0.91    0.61    0.72
n = 150    1.24      0.90   0.91    0.63    0.74
s = 2      1.19      0.93   0.97    0.69    0.78
s = 4      1.22      0.94   0.87    0.62    0.72
s = 6      1.25      0.93   0.91    0.66    0.75
s = 8      1.19      0.93   0.85    0.65    0.75
s = 10     1.21      0.86   0.92    0.62    0.69
Table 6. IGD of the proposed algorithm and comparison algorithms.

Instance   IMOMA-QL  IMPGA  MQSLFA  MOEA/D  NSGA-II
F = 2      0.105     0.157  0.167   0.263   0.222
F = 3      0.086     0.144  0.145   0.247   0.201
F = 4      0.081     0.149  0.157   0.254   0.199
F = 5      0.105     0.161  0.177   0.268   0.224
F = 6      0.092     0.155  0.160   0.247   0.197
n = 50     0.108     0.162  0.169   0.244   0.214
n = 100    0.083     0.144  0.149   0.259   0.200
n = 150    0.099     0.157  0.170   0.256   0.206
n = 200    0.085     0.150  0.157   0.266   0.212
s = 2      0.085     0.147  0.156   0.251   0.199
s = 4      0.095     0.152  0.168   0.265   0.219
s = 6      0.099     0.157  0.159   0.259   0.212
s = 8      0.086     0.149  0.160   0.256   0.200
s = 10     0.103     0.161  0.162   0.249   0.214
Table 7. Friedman test rankings across IMOMA-QL and comparative algorithms.

Algorithms   HV Rank  HV p-Value          IGD Rank  IGD p-Value
IMOMA-QL     1        –                   1         –
IMPGA        2.42     1.60385 × 10⁻¹⁰     2.35      1.56633 × 10⁻⁹
MQSLFA       2.61     6.01519 × 10⁻¹³     2.80      8.88178 × 10⁻¹⁶
MOEA/D       4.86     0                   4.72      0
NSGA-II      4.10     0                   4.13      0
Share and Cite

Shen, Y.; Liu, Y.; Kang, H.; Sun, X.; Chen, Q. An Improved Multi-Objective Memetic Algorithm with Q-Learning for Distributed Hybrid Flow Shop Considering Sequence-Dependent Setup Times. Symmetry 2026, 18, 135. https://doi.org/10.3390/sym18010135