Article

An Improved Multi-Objective Memetic Algorithm with Q-Learning for Distributed Hybrid Flow Shop Considering Sequence-Dependent Setup Times

School of Software, Yunnan University, Kunming 650000, China
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(1), 135; https://doi.org/10.3390/sym18010135
Submission received: 5 December 2025 / Revised: 6 January 2026 / Accepted: 7 January 2026 / Published: 9 January 2026
(This article belongs to the Section Computer)

Abstract

Most multi-objective studies on distributed hybrid flow shops that include tardiness-related objectives focus solely on optimizing makespan alongside a single tardiness objective. However, in real-world scenarios with strict contractual deadlines or high penalty costs for delays, minimizing both total tardiness and the number of tardy jobs becomes critically important. This paper addresses this gap by prioritizing tardiness-related objectives while simultaneously optimizing makespan, total tardiness, and the number of tardy jobs. It investigates a distributed hybrid flow shop scheduling problem (DHFSP) that exhibits symmetries in its machine configuration. We propose an improved multi-objective memetic algorithm incorporating Q-learning (IMOMA-QL) to solve this problem, featuring (1) a hybrid initialization method that generates high-quality, diverse solutions by balancing all three objectives; (2) a multi-factory SB2OX crossover operator preserving high-performance job sequences across factories; (3) six problem-specific neighborhood structures for efficient solution space exploration; and (4) a Q-learning-guided variable neighborhood search that adaptively selects neighborhood structures. Based on extensive numerical experiments across 100 generated instances and a comprehensive comparison with four competing algorithms, the proposed IMOMA-QL demonstrates its effectiveness and proves to be a competitive method for solving the DHFSP.

1. Introduction

With the development of globalization, many enterprises set up production bases in different countries and regions. A distributed flow shop can coordinate production activities between different geographical areas, achieve load balancing between factories, avoid overloading some factories while wasting resources in others, and improve overall production efficiency. The distributed flow shop scheduling problem (DFSP) has been studied thoroughly, yielding numerous results that address practical constraints such as re-entrant jobs [1], no-idle machines [2], deteriorating jobs [3], sequence-dependent setup times (SDSTs) [4,5,6,7], and energy consciousness [8]. Various evolutionary algorithms have been widely used to solve such real-world problems, including the memetic algorithm [9,10,11], artificial bee colony algorithm [12], estimation of distribution algorithm (EDA) [13], discrete fruit fly optimization algorithm [14], hybrid meta-heuristics [15], variable neighborhood descent algorithm [8], and spherical evolution algorithm [16].
To address distributed scheduling problems (DSPs), researchers have developed diverse solution approaches. Meta-heuristic algorithms have gained widespread adoption for shop scheduling optimization due to their notable advantages: straightforward implementation, robust performance, rapid convergence characteristics, and seamless compatibility with other algorithmic frameworks.
Memetic algorithms (MAs) have gained significant attention for their effectiveness in tackling various NP-hard optimization problems, particularly single-objective DSPs. For instance, Wang [17] developed an EDA-based MA to address the DFSP and minimize makespan. To optimize the makespan in a two-stage DFSP, Zhang [18] integrated the social spider optimization method into an MA framework. Wang [19] further explored a cooperative bi-population MA that incorporated collaborative initialization, inter-population cooperation, and an intensified local search for minimizing makespan in distributed hybrid flow shop scheduling problems (DHFSPs). Zhang et al. [20] achieved makespan minimization by leveraging cooperation within an MA.
MAs have also been widely applied to multi-objective DFSPs. Deng and Wang [11] investigated a multi-objective DFSP aimed at minimizing both makespan and total tardiness, and developed a competitive MA employing two populations with distinct operators for each objective. Wang [19] examined an energy-focused variant targeting the reduction of energy consumption and makespan, introducing a collaborative MA guided by reinforcement learning policy agents. Shao [21] proposed a network-based MA to minimize total tardiness, overall production cost, and carbon emissions.
The rapid advancement of AI is fundamentally transforming operations across a multitude of fields. As a typical reinforcement learning algorithm originating from dynamic programming, Q-learning makes the best decision at each step to optimize the overall process. To address the uncertainty in assembly job shop scheduling and to enhance scheduling algorithms under various production environments, Q-learning has been widely integrated into different frameworks. For instance, in assembly job shop scheduling, a dual-loop framework based on Q-learning is proposed to cope with environmental uncertainty by self-learning [22]. In DFSP, Q-learning is combined with metaheuristics such as the fruit fly optimization algorithm to enhance neighborhood selection and improve solution quality [23]. For the studied DFSP variant with consistent sublots, the method employs a value-based RL method. It is coupled with the meta-heuristic to achieve adaptive operator selection [24].
In practical scenarios, decision-makers are often concerned not only with minimizing the makespan but also with tardiness-related objectives, which are particularly important in industries where late deliveries incur significant penalties or disrupt downstream processes. Cai [25] proposed two enhanced shuffled frog-leaping algorithms (SFLAs) for solving the DHFSP in a multi-processor setting, aiming to minimize total tardiness and makespan simultaneously. Later, Li developed a neighborhood-based heuristic to address a two-stage DHFSP with SDST, targeting reductions in total tardiness and makespan [12]. In addition, Lei [26] investigated an SFLA with memeplex partitioning for the DHFSP. To address the dual objectives of makespan and maximum tardiness minimization, Lei [27] crafted a novel multi-class optimization approach based on the teaching–learning paradigm, which enhances search efficiency through inter-class interaction.
Few studies have treated tardiness-related objectives as the main focus in multi-objective optimization. Lei and Zheng [28] tackled HFSP with assembly operations and minimized total tardiness, maximum tardiness, and makespan with tardiness objectives regarded as key ones. In the DHFSP, which is more complex than the standard HFSP, it is therefore of great importance to develop effective algorithms that can simultaneously optimize total tardiness, the number of tardy jobs, and makespan.
In light of the above literature on DFSPs and memetic algorithms, this study addresses the DHFSP with SDST, aiming to optimize makespan, total tardiness, and the number of tardy jobs while prioritizing the tardiness-related objectives. To tackle this problem, an improved multi-objective memetic algorithm with Q-learning (IMOMA-QL) is proposed. The major contributions of this paper are as follows:
  • Hybrid initialization method—A mixed initialization strategy is proposed to simultaneously optimize total tardiness, the number of tardy jobs, and makespan, generating a high-quality and diverse population.
  • Multi-factory SB2OX crossover operator—The Similar Block 2-Point Order Crossover (SB2OX) is extended to a multi-factory context, leveraging structural similarity of job sequences to retain high-quality sub-sequences and enhance information exchange between factories.
  • Problem-specific neighborhood structures are developed to guide the search process toward more promising regions. Considering the optimization objectives and problem characteristics, these neighborhood structures effectively explore the solution space.
  • Q-learning-guided variable neighborhood search—A Q-learning strategy is introduced to adaptively choose the most effective neighborhood structure during the search process. The reward is designed based on the change in distance between the new and old solutions to their nearest Pareto front solution, encouraging moves that improve convergence toward the Pareto front while balancing intensification and diversification.
The paper is divided into the following sections. Section 2 formally describes DHFSP with SDST and presents its mathematical model. It also introduces the foundational framework of the memetic algorithm. Section 3 elaborates on the details of our proposed IMOMA, including its novel initialization, genetic operators, and the Q-learning-guided variable neighborhood search. Section 4 provides a comprehensive evaluation of IMOMA, including the experimental setup, sensitivity analysis, and comparisons with four other algorithms. Following this, Section 5 summarizes the main findings of this study, discusses its limitations, and suggests potential directions for future research.

2. Description of the Problem

2.1. Multi-Objective Optimization Problem

Many applied contexts involve simultaneously optimizing multiple conflicting objectives subject to a set of constraints. These are known as multi-objective optimization problems (MOPs). A mathematical model for an MOP is defined as follows:
$$\min F(x) = \big(f_1(x), f_2(x), \dots, f_m(x)\big), \quad x \in D$$
where $D$ denotes the decision space of the multi-objective optimization problem, $d$ denotes the dimension of the decision space, $m$ denotes the number of objectives included in the multi-objective optimization problem, $x = (X_1, X_2, \dots, X_i, \dots, X_d)$ denotes the decision vector, and $X_i$ denotes the $i$-th decision variable.
A solution $a$ is said to dominate a solution $b$ if and only if
$$f_i(a) \le f_i(b), \ \forall i \in \{1, 2, \dots, m\}, \quad \text{and} \quad \exists j \in \{1, 2, \dots, m\}: f_j(a) < f_j(b).$$
When solution $a$ does not dominate solution $b$ and solution $b$ does not dominate solution $a$, the two solutions are said to be non-dominated and are considered to perform equally well in the MOP.
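As a generic illustration of the dominance definition above (not code from the paper), a minimization-oriented dominance test can be sketched in a few lines:

```python
# Illustrative sketch: Pareto dominance for a minimization MOP.
# Objective vectors are plain tuples/lists of numbers.
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(a, b):
    """True when neither solution dominates the other."""
    return not dominates(a, b) and not dominates(b, a)
```

For example, `(1, 2, 3)` dominates `(2, 2, 3)`, while `(1, 3)` and `(2, 2)` are mutually non-dominated.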

2.2. Problem Definition

A DHFSP instance comprises a set of $n$ jobs, denoted by $\{J_1, J_2, \dots, J_n\}$, each of which must complete $s$ stages of processing. The jobs are distributed among $F$ factories, each of which is a hybrid flow shop with $m$ parallel machines per stage. Each operation then selects an appropriate machine in the assigned factory. Symmetry exists both in the processing sequence of adjacent jobs on the same machine and in the machine distribution across different factories. The optimization objectives are makespan, total tardiness, and the number of tardy jobs. The notation used in this model is defined in Table 1.
The mathematical formulation is presented below.
$$\text{Minimise} \ \sum_{j \in J} T_j = \sum_{j \in J} \max\{0, \, E_{j,k} - d_j\} \quad (3)$$
$$\text{Minimise} \ C_{\max} \quad (4)$$
$$\text{Minimise} \ \sum_{j \in J} U_j \quad (5)$$
$$\sum_{f \in F} X_{j,f} = 1, \quad \forall j \in J \quad (6)$$
$$X_{j,f} = \sum_{m \in M_{i,f}} \sum_{p \in P} Y_{j,f,m,p,i}, \quad \forall i \in I, \ j \in J, \ f \in F \quad (7)$$
$$\sum_{j \in J} Y_{j,f,m,p,i} \le 1, \quad \forall f \in F, \ m \in M_{i,f}, \ p \in P \quad (8)$$
$$\sum_{j \in J} Y_{j,f,m,p,i} \ge \sum_{j \in J} Y_{j,f,m,p+1,i}, \quad \forall f \in F, \ i \in I, \ m \in M_{i,f}, \ p \in \{1, \dots, n-1\} \quad (9)$$
$$ME_{f,m,p} = MS_{f,m,p} + \sum_{j \in J} p_{j,i} \, Y_{j,f,m,p,i}, \quad \forall f \in F, \ i \in I, \ m \in M_{i,f} \quad (10)$$
$$MS_{f,m,p+1} \ge ME_{f,m,p}, \quad \forall f \in F, \ m \in M_f, \ p \in \{1, \dots, n-1\} \quad (11)$$
$$MS_{f,m,1} \ge s_{j,i} - M(1 - Y_{j,f,m,1,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_f \quad (13)$$
$$MS_{f,m,p} \le S_{j,i} + M(1 - Y_{j,f,m,p,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_{i,f}, \ p \in P \quad (14)$$
$$MS_{f,m,p} \ge S_{j,i} - M(1 - Y_{j,f,m,p,i}), \quad \forall f \in F, \ i \in I, \ j \in J, \ m \in M_{i,f}, \ p \in P \quad (15)$$
$$E_{j,i} = S_{j,i} + p_{j,i}, \quad \forall j \in J, \ i \in I \quad (16)$$
$$E_{j,i} \le S_{j,i+1}, \quad \forall j \in J, \ i \in \{1, \dots, s-1\} \quad (17)$$
$$MS_{f,m,p} \ge 0, \ \forall f \in F, \ m \in M_f, \ p \in P; \qquad S_{j,i} \ge 0, \ \forall j \in J, \ i \in I \quad (18)$$
The objective functions (3)–(5) are to minimize the total tardiness, makespan, and the number of tardy jobs. Constraint (6) specifies that every job must be assigned to a single factory. Constraints (7)–(9) collectively govern machine–job assignment and processing sequence: each job is processed by exactly one machine per stage; no machine may process more than one job simultaneously; and job assignments must adhere to consecutive machine positions, ensuring previous slots are occupied. Constraints (10) and (11) define the start and finish times for each machine position. Constraint (12) incorporates setup time requirements. According to Constraint (13), the start time of the first job on any machine must account for the required setup time. Constraints (14) and (15) align machine positions with the corresponding job order. Constraint (16) defines a job’s completion time as the sum of its start time and processing duration. Sequential processing of jobs is enforced by Constraint (17), while Constraint (18) ensures that no job can start processing before time zero, which is equivalent to assuming all jobs have a release time of zero.
The layout of the distributed hybrid flow shop is shown in Figure 1.

2.3. Encoding and Decoding Methods

This paper utilizes a permutation-based coding scheme, where a solution is represented by factory vectors $F = (F_1, \dots, F_c, \dots, F_f)$, with each vector $F_c$ corresponding to a specific factory, and $\alpha = (\alpha_1, \dots, \alpha_j, \dots, \alpha_n)$ representing the job processing sequence for the first stage in each factory. The decoding mechanism employs a combination of “FIFO” (first in, first out) and “FMA” (first machine available).
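As an illustrative sketch (not the paper's exact implementation), the FIFO + FMA decoding for a single factory might look as follows. The `proc` table layout, the omission of setup times, and the function name are assumptions for this example:

```python
import heapq

# Hypothetical sketch of FIFO + FMA decoding for one factory: jobs enter each
# stage in the order they finish the previous stage (FIFO) and are assigned to
# the machine that becomes available first (FMA).  `proc[j][i]` is an assumed
# table of processing times of job j at stage i; setup times are omitted.
def decode(sequence, proc, machines_per_stage):
    ready = {j: 0 for j in sequence}                 # arrival time at stage 0
    for i, m in enumerate(machines_per_stage):
        free = [0.0] * m                             # machine availability times
        heapq.heapify(free)
        # FIFO: process jobs in order of arrival at this stage
        for j in sorted(sequence, key=lambda j: ready[j]):
            start = max(heapq.heappop(free), ready[j])   # FMA: earliest machine
            finish = start + proc[j][i]
            heapq.heappush(free, finish)
            ready[j] = finish                        # arrival time at next stage
    return ready                                     # completion time of each job
```

With two jobs and one machine per stage, e.g. `decode([0, 1], {0: [2, 2], 1: [1, 1]}, [1, 1])`, job 0 finishes at time 4 and job 1 at time 5.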

2.4. A Simple Memetic Algorithm

The MA fuses global evolutionary search with dedicated local search within a population-based paradigm to effectively navigate the exploration–exploitation trade-off [29]. A simple MA typically iterates through the following phases: initial population generation, fitness evaluation, selection, crossover, mutation, local refinement, and population update. The key feature distinguishing an MA from a standard genetic algorithm is the embedded local search procedure, which intensively refines individuals to reach local optima within promising regions identified by the evolutionary process.
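As a generic illustration of these phases (not the paper's algorithm), the loop below sketches a minimal single-objective memetic algorithm. The one-max fitness, the operators, and all parameters are placeholder assumptions chosen only to make the skeleton runnable:

```python
import random

def one_max(x):                       # toy fitness: number of ones (maximize)
    return sum(x)

def local_search(x):
    """First-improvement bit-flip hill climbing (the memetic refinement)."""
    for i in range(len(x)):
        y = x[:]; y[i] ^= 1
        if one_max(y) > one_max(x):
            x = y
    return x

def memetic(n_bits=12, pop_size=8, generations=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: binary tournament
        parents = [max(rng.sample(pop, 2), key=one_max) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, n_bits)            # one-point crossover
            for c in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                c = c[:]
                c[rng.randrange(n_bits)] ^= 1         # single-bit mutation
                children.append(local_search(c))      # embedded local search
        # elitist replacement
        pop = sorted(pop + children, key=one_max, reverse=True)[:pop_size]
    return max(pop, key=one_max)
```

Removing the `local_search` call turns this into a plain genetic algorithm, which is exactly the distinction drawn above.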

3. Improved Multi-Objective Memetic Algorithm

3.1. Algorithm Procedure

The proposed improved multi-objective memetic algorithm (IMOMA) is composed of four main components. First, a hybrid initialization procedure adopts a combination of random generation and problem-specific heuristics to yield a diverse set of well-performing initial solutions. Second, genetic operators perform population-based global exploration of the search space. Third, a Q-learning-guided multi-neighborhood search is applied, in which a reinforcement learning mechanism adaptively selects among multiple neighborhood structures to intensify the search and enhance convergence toward the Pareto front. Fourth, non-dominated sorting with an elitism strategy updates the population at the end of each generation. The algorithm flowchart is illustrated in Figure A1 and the pseudocode is shown in Algorithm 1. Each of these components contributes to the overall computational complexity: the time complexity per iteration is $O(N \cdot n + N^2)$, where $N$ denotes the population size and $n$ represents the number of jobs. Given a maximum of $T$ iterations, the overall time complexity of IMOMA is $O(T \cdot (N \cdot n + N^2))$.
Algorithm 1 Algorithm of IMOMA
Input: instance data, parameters
Output: approximate Pareto front
 1: Initialize population $POP_0$
 2: Set $t = 0$
 3: while $cputime < cputime_{max}$ do
 4:   for each $i \in [1, popSize/2]$ do
 5:     Select two parents $p_1, p_2$ from $POP_t$ using binary tournament selection
 6:     Generate offspring $s_1, s_2$ using the crossover operator
 7:     Generate offspring $s_1', s_2'$ using the mutation operator
 8:     Merge $s_1', s_2'$ into $Pop_t'$
 9:   end for
10:   Apply VNS guided by Q-learning to $Pop_t'$ to generate offspring population $Pop_t''$
11:   Combine $Pop_t'$ and $Pop_t''$ into $CombinedPop$
12:   Perform non-dominated sorting on $CombinedPop$
13:   Select the $N$ best individuals to form $Pop_{t+1}$
14: end while
15: Output the Pareto optimal solutions

3.2. Hybrid Initialization

High-quality initialization plays a pivotal role in multi-objective optimization algorithms for DHFSP by significantly influencing convergence speed, solution diversity, and computational efficiency. A well-designed initialization strategy reduces the algorithm’s exploration burden.
To address the multi-objective characteristics of the DHFSP—specifically the simultaneous minimization of makespan, total tardiness, and number of tardy jobs—this research develops a hybrid initialization approach. The proposed methodology combines targeted heuristic generation with stochastic exploration mechanisms, ensuring both high-quality initial solutions and adequate population diversity.
This paper introduces six initialization methods based on the LPT (Longest Processing Time) and EDD (Earliest Due Date) rules, combined with randomization, to enhance both the quality and diversity of the initial population.
Method 1: Generate a job vector in descending order of total processing time (LPT), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the maximum completion time.
Method 2: Generate a job vector in ascending order of due date (EDD), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the total tardiness.
Method 3: Generate a job vector in ascending order of due date (EDD), then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the number of tardy jobs.
Method 4: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the maximum completion time.
Method 5: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the total tardiness.
Method 6: Randomly generate a job vector, then iteratively insert each job into a factory, in the order of the job vector, so as to minimize the number of tardy jobs.
Methods 1–3 each generate one solution, while Methods 4–6 generate the rest of the population in the ratio 2:2:1. Specifically, Methods 4 and 5 each generate $\lfloor 2(N-3)/5 \rfloor$ solutions, and Method 6 generates the remaining $(N-3) - 2\lfloor 2(N-3)/5 \rfloor$ solutions, ensuring the total population size is $N$. The random seed for the initialization is set to 2020.
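The split above can be sketched directly; floor division is an assumption for handling non-divisible population sizes:

```python
# Sketch of the 2:2:1 population split described above.  Methods 1-3
# contribute one solution each; Methods 4 and 5 each take 2(N-3)/5 (floored)
# and Method 6 takes whatever remains, so the counts always sum to N.
def init_counts(N):
    rest = N - 3                        # solutions left after Methods 1-3
    m4 = m5 = (2 * rest) // 5
    m6 = rest - m4 - m5
    return {1: 1, 2: 1, 3: 1, 4: m4, 5: m5, 6: m6}
```

For instance, with $N = 80$ this yields 30, 30, and 17 solutions for Methods 4, 5, and 6, respectively.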

3.3. Selection

Binary tournament selection is used to choose the two parents. The steps are as follows:
  • Randomly select two individuals: two candidate solutions are chosen at random from the existing population.
  • Select the better individual for the next generation:
    (1) If a dominates b, choose a.
    (2) If b dominates a, choose b.
    (3) If the two individuals are mutually non-dominated, choose one of them at random.
  • Repeat until the next generation is filled.
The flowchart of the selection process is shown in Figure 2.
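The steps above can be sketched as follows (an illustrative, self-contained snippet rather than the paper's code; solutions are represented only by their objective vectors):

```python
import random

# Sketch of binary tournament selection under Pareto dominance:
# two random individuals are compared; ties are broken at random.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def binary_tournament(objectives, rng=random):
    """Return the index of the winning individual."""
    a, b = rng.sample(range(len(objectives)), 2)
    if dominates(objectives[a], objectives[b]):
        return a
    if dominates(objectives[b], objectives[a]):
        return b
    return rng.choice((a, b))        # non-dominated: pick one at random
```

Calling `binary_tournament` repeatedly fills the mating pool; with two individuals where one dominates the other, the dominating one always wins.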

3.4. Genetic Operator

SB2OX has been used to solve flow shop problems with SDST [30], but only for the single-factory case. This article proposes a crossover operator based on SB2OX that is adapted to distributed flow shops. Two vectors represent the scheduling sequences of the two parents:
$$\text{parent}_1 = (x_1^1, x_2^1, \dots, x_{n_1}^1; \ x_1^2, x_2^2, \dots, x_{n_2}^2; \ \dots; \ x_1^F, x_2^F, \dots, x_{n_F}^F),$$
$$\text{parent}_2 = (x_1^1, x_2^1, \dots, x_{n_1}^1; \ x_1^2, x_2^2, \dots, x_{n_2}^2; \ \dots; \ x_1^F, x_2^F, \dots, x_{n_F}^F).$$
The superscript of $x$ denotes the factory, and the subscript indicates the position within that factory's scheduling sequence.
Step 1: The two parents are compared position by position. Identical blocks containing at least two consecutive matching jobs, i.e., $[x_p^{c_1}, \dots, x_q^{c_1}] = [x_p^{c_2}, \dots, x_q^{c_2}]$, are transferred directly to the offspring. Notably, the retained blocks need not lie in the same factory in both parents: if an identical block is processed in factory $c_1$ in Parent 1 and in factory $c_2$ in Parent 2, Child 1 retains the block at the same location in factory $c_1$, and Child 2 retains it at the same location in factory $c_2$.
Step 2: Two cut points are randomly selected in the scheduling sequence of each factory of the two parents. For each factory, Child 1 retains all jobs between cut point 1 and cut point 2 of Parent 1 in their original positions; Child 2 is generated from Parent 2 by the same principle.
Step 3: In the final step, the missing elements are copied in the relative order of the other parent. In this step, parental information is exchanged efficiently and jobs are reassigned between factories.
Nevertheless, relying exclusively on crossover operations proves inadequate. To enhance population diversity, a random swap mutation (RSM) mechanism is additionally implemented.
RSM: Randomly select two jobs in the sequence and swap their positions.
An illustration of the crossover process is shown in Figure 3.
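As an illustrative, deliberately simplified single-factory sketch of Steps 2–3 and of RSM (the full multi-factory SB2OX with similar-block retention is more involved; function names and the reduction to one sequence are assumptions):

```python
import random

# Simplified order crossover: the child keeps Parent 1's segment between the
# two cut points and fills the remaining positions with the missing jobs in
# Parent 2's relative order.
def order_crossover(p1, p2, cut1, cut2):
    child = [None] * len(p1)
    child[cut1:cut2] = p1[cut1:cut2]             # retained segment (Step 2)
    fill = [j for j in p2 if j not in child]     # missing jobs, p2's order (Step 3)
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def rsm(seq, rng=random):
    """Random swap mutation: exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(seq)), 2)
    seq = seq[:]
    seq[i], seq[j] = seq[j], seq[i]
    return seq
```

For example, `order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 1, 3)` keeps the segment `[2, 3]` and yields `[5, 2, 3, 4, 1]`, a valid permutation.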

3.5. Problem-Specific Neighborhood Structures

To address the DHFSP, the designed neighborhood structures generate alternative feasible solutions by modifying the current ones. The effectiveness of these structures has a significant influence on both the solution quality and the computational efficiency of the algorithm. Well-designed neighborhoods can guide the search process toward more promising regions. Hence, in order to improve the performance of IMOMA-QL, six tailored neighborhood structures are proposed to produce superior solutions.
NS1: Randomly select a job from the factory with the largest maximum completion time and insert it into another position in the same factory.
NS2: Randomly select a job from the factory with the largest maximum completion time and exchange it with another job in the same factory.
NS3: Randomly select a tardy job that does not have the largest tardiness, and exchange it with each job that has a greater tardiness.
NS4: Randomly select a tardy job and insert it into an earlier position among all factory positions.
NS5: Randomly select a job from the factory with the largest total tardiness and insert it into another position in the same factory.
NS6: Randomly select a tardy job from the factory with the largest number of tardy jobs and insert it into another position in the same factory.
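A move of this kind can be sketched as follows, using NS1 as the example. This is an assumed, simplified representation (a list of job sequences per factory plus precomputed makespans), not the paper's implementation:

```python
import random

# Hypothetical sketch of NS1: remove a random job from the critical factory
# (the one with the largest makespan) and reinsert it at a different position
# in the same factory's sequence.  `makespans[k]` is the makespan of factory k.
def ns1(factories, makespans, rng=random):
    f = max(range(len(factories)), key=lambda k: makespans[k])
    seq = factories[f][:]
    i = rng.randrange(len(seq))
    job = seq.pop(i)
    positions = [p for p in range(len(seq) + 1) if p != i]   # force a real move
    seq.insert(rng.choice(positions), job)
    new = [s[:] for s in factories]                          # other factories untouched
    new[f] = seq
    return new
```

The remaining structures differ only in which factory or job is targeted (tardiness-based for NS3–NS6) and whether the move is an insertion or a swap.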

3.6. Variable Neighborhood Search with Q-Learning

Q-learning operates by maintaining and updating a state–action value table (Q-table), which stores estimated cumulative rewards to derive an optimal policy [31]. As a model-free algorithm, it has been employed to tackle a range of scheduling problems [23,32]. Figure 4 illustrates this interaction process. The Q-table is updated according to the following formula:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $\alpha$ and $\gamma$ are the learning rate and discount factor, respectively, $s'$ is the next state, and $R$ is the immediate reward.
The agent selects actions according to the Q-values stored in the Q-table. This work initializes the Q-table with zeros, as shown in Figure 5, indicating that the agent starts with no prior knowledge of the environment. Actions are selected using an $\epsilon$-greedy strategy that maximizes the expected reward while maintaining exploration. A random number $p \in [0, 1]$ determines the selection: if $p < \epsilon$, the action with the maximum Q-value is selected (exploitation); otherwise, a random action is chosen (exploration). The Q-table is iteratively updated through this action–state–reward cycle. An example of a Q-table update is provided in Appendix A.3.
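The selection rule and the update formula can be sketched as follows (an illustrative snippet with integer state/action indices; the default $\alpha$ and $\gamma$ values are assumptions):

```python
import random

# Minimal sketch of epsilon-greedy action selection and the Q-value update.
# Note the convention above: p < epsilon means EXPLOIT (epsilon is the greedy
# rate), otherwise a random action is explored.  The Q-table starts at zero.
def select_action(Q, s, epsilon, rng=random):
    if rng.random() < epsilon:                           # exploit
        return max(range(len(Q[s])), key=lambda a: Q[s][a])
    return rng.randrange(len(Q[s]))                      # explore

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.8):
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
```

With three states (Section 3.6's makespan/total-tardiness/remainder partition) and six actions (NS1–NS6), `Q` is simply a 3 × 6 table of floats.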
Q-learning, when integrated with neighborhood search, plays the role of adaptively selecting the most promising neighborhood structures based on feedback from the search performance, rather than relying on a fixed or predetermined local search method. This reinforcement learning mechanism allows the search process to dynamically focus on neighborhoods that are more likely to yield improvements, thereby enhancing both convergence speed and solution quality. In this study, the six neighborhood structures described earlier are defined as the action set in the Q-learning framework, where each action corresponds to applying one specific neighborhood to generate new solutions.
In this paper, the state is determined according to the three objective values of each solution. First, the current population is sorted in ascending order of makespan. The top 30% of solutions with the smallest makespan are assigned to State 1. The remaining solutions are then sorted in ascending order of total tardiness, and the top 30% of these solutions are assigned to State 2. The remaining solutions are assigned to State 3.
This study employs a delayed reward mechanism based on population cooperation to train the Q-learning strategy. In each generation, all individuals conduct local search following the current strategy. Upon completion of the local search for the entire population, non-dominated sorting is applied to obtain an updated Pareto front. After obtaining the updated Pareto front via non-dominated sorting, we compute the minimum Euclidean distance from every solution to this front both before and after its local search. This signal is utilized to update the shared Q-learning strategy. Let x be the original solution to be replaced, and y be the candidate solution generated by the local search. A post is the non-dominated solution set obtained from the population after applying the variable neighborhood local search, inserting the new solution, and removing dominated solutions. The Euclidean distance from a solution z to A post is defined as
$$d(z, A_{\text{post}}) = \min_{a \in A_{\text{post}}} \sqrt{\sum_{i=1}^{m} \big(f_i(z) - f_i(a)\big)^2},$$
where $f_i(z)$ denotes the objective value of $z$ on the $i$-th objective and $m$ represents the total number of objectives. The reward $r_t$ is then given by
$$r_t = \begin{cases} 1, & \text{if } d(y, A_{\text{post}}) < d(x, A_{\text{post}}), \\ 0, & \text{otherwise.} \end{cases}$$
This binary reward provides a positive signal whenever the candidate solution is closer to the updated Pareto front than the original solution, thereby encouraging the selection of neighborhood structures that improve convergence towards the Pareto front.
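The distance and reward computations can be sketched directly from the two formulas above (objective vectors as tuples; `math.dist` computes the Euclidean distance):

```python
import math

# Sketch of the delayed binary reward: 1 if the candidate solution y is closer
# (in objective space) to the updated non-dominated set A_post than the
# original solution x, and 0 otherwise.
def distance_to_front(z, front):
    """Minimum Euclidean distance from objective vector z to the front."""
    return min(math.dist(z, a) for a in front)

def reward(x_obj, y_obj, front):
    return 1 if distance_to_front(y_obj, front) < distance_to_front(x_obj, front) else 0
```

For example, with the front `[(0, 0, 0)]`, moving from `(3, 0, 0)` to `(1, 0, 0)` earns reward 1, while the reverse move earns 0.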
The Q-learning-guided local search is illustrated in Algorithm 2.
Algorithm 2 Variable Neighborhood Search using Q-learning
Input: $POP_t$ (current population), $PopSize_t$ (population size), Q-table, $s_i$ (state of the $i$-th individual), $a_i$ (action taken by the $i$-th individual)
Output: $Pop_t'$, Q-table
 1: $s_1, \dots, s_{PopSize}$ ← rank the three objective values of all individuals in the population to determine the state of each individual
 2: for each $i$ in $PopSize_t$ do
 3:   $a_i$ ← select an action via the $\epsilon$-greedy strategy and the Q-table
 4:   $Pop_t'(i)$ ← apply action $a_i$ to the $i$-th individual $Pop_t(i)$
 5: end for
 6: $Pop_t'$ ← combine $Pop_t'(1)$ to $Pop_t'(PopSize)$
 7: $A_{post}$ ← non-dominated sorting of $Pop_t \cup Pop_t'$
 8: for each $i$ in $PopSize_t$ do
 9:   for each individual in $\{Pop_t(i), Pop_t'(i)\}$ do
10:     dist ← $\infty$
11:     for each solution $x \in A_{post}$ do
12:       curr_dist ← $\lVert \text{individual} - x \rVert_2$ {Euclidean distance in objective space}
13:       if curr_dist < dist then dist ← curr_dist end if
14:     end for
15:     Record dist as the minimum distance for this individual
16:   end for
17:   if $d_{after} < d_{before}$ then $r$ ← 1 else $r$ ← 0 end if
18:   Obtain the next state $s_i'$
19:   Update Q-table: $Q(s_i, a_i) \leftarrow Q(s_i, a_i) + \alpha [r + \gamma \max_{a} Q(s_i', a) - Q(s_i, a_i)]$
20: end for

3.7. Non-Dominated Order and Elitism Strategy

In the iterative process, the offspring population is merged with the parent population, and the merged population is subjected to fast non-dominated sorting. Lower-rank (better) solutions are preferred over higher-rank ones; if two solutions have the same rank, the one with the greater crowding distance is selected over the one with the smaller value. The next-generation population of the given size is then selected according to these principles.
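The crowding-distance tie-breaker can be sketched as follows (the standard NSGA-II-style computation, given here as an illustration; the input is a list of objective vectors belonging to one rank):

```python
# Sketch of the crowding-distance computation used for tie-breaking within a
# rank: boundary solutions get infinite distance, and interior solutions
# accumulate the normalized gap between their two neighbors per objective.
def crowding_distance(front):
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue                         # degenerate objective: no spread
        for a, i in enumerate(order[1:-1], start=1):
            dist[i] += (front[order[a + 1]][k] - front[order[a - 1]][k]) / (hi - lo)
    return dist
```

Solutions with larger crowding distance lie in less dense regions of the front and are therefore kept first, which preserves diversity in the next generation.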

4. Experimental Comparison and Analysis

4.1. Experiment Setting

All experiments were conducted in MATLAB R2022b (64-bit) with the Optimization Toolbox and Global Optimization Toolbox enabled. The operating system was Windows 11 (Version 22H2, 64-bit). All computations were performed on a desktop computer equipped with an Intel Core i7-12700K.
This article generates instances following the design described by Sun [33]. Each instance is characterized by the combination (N, F, S), where N = 50, 100, 150, 200; F = 2, 3, 4, 5, 6; and S = 2, 4, 6, 8, 10. The processing time of each job at each stage and the sequence-dependent setup times are randomly generated within the range [1, 99]. The number of identical parallel machines at each stage is randomly generated within the range [1, 5]. The random seed is set to 2025. In total, there are 4 × 5 × 5 = 100 instances. The CPU time per instance run is set to 0.08 × FN × JN × S seconds, where FN, JN, and S denote the numbers of factories, jobs, and stages, respectively.
Equations (22) and (23) [11] establish the due date for each job in each instance in the mathematical model provided in this study.
$$D_j = P_j \times \big(1 + 3 \times \text{rand}(0, 1)\big) \quad (22)$$
$$P_j = \sum_{i=1}^{S} p_{i,j} \quad (23)$$
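Equations (22) and (23) can be sketched as follows (illustrative only; the `proc` layout with one row of per-stage processing times per job is an assumption):

```python
import random

# Sketch of the due-date rule in Equations (22)-(23): each job's due date is
# its total processing time over all stages, scaled by a random factor drawn
# from [1, 4).  `proc[j]` lists job j's processing times per stage (assumed).
def due_dates(proc, rng=random):
    return [sum(p) * (1 + 3 * rng.random()) for p in proc]
```

By construction, each due date lies between $P_j$ and $4 P_j$, so jobs with longer total processing time receive proportionally looser deadlines.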

4.2. Experimental Indicators

To evaluate the behavior of IMOMA, two performance metrics were used in the experiments.
The hypervolume (HV) represents the volume of the region bounded by the non-dominated solution set produced by the algorithm in the objective space and the reference points. The reference point is set to (1.2, 1.2, 1.2). A higher HV value indicates better overall performance. HV is calculated as follows.
$$HV = T\left( \bigcup_{i=1}^{|PF|} v_i \right)$$
where $T(\cdot)$ denotes the Lebesgue measure and $v_i$ denotes the hypervolume bounded by the reference point and the $i$-th solution of the non-dominated set.
The inverted generational distance (IGD) measures both convergence and diversity; a lower IGD value indicates better performance. It is calculated as
$$IGD = \frac{\sum_{x \in PF^*} d(x, PF)}{|PF^*|}$$
where $d(x, PF)$ represents the minimum Euclidean distance from an individual $x$ in the reference front $PF^*$ to the obtained non-dominated set $PF$, and $|PF^*|$ denotes the number of solutions in the reference front.
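The IGD computation reduces to a few lines (an illustrative snippet; both fronts are given as lists of objective vectors):

```python
import math

# Sketch of IGD: the average, over the reference front PF*, of the minimum
# Euclidean distance to the obtained non-dominated set PF.  Lower is better.
def igd(reference_front, obtained_front):
    total = sum(min(math.dist(z, x) for x in obtained_front)
                for z in reference_front)
    return total / len(reference_front)
```

An obtained front that exactly covers the reference front gives IGD = 0; a single point at distance 5 from a single reference point gives IGD = 5.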

4.3. Parameter Calibration

This section calibrates the main parameters of the algorithm, including the population size, discount factor, greedy rate, crossover probability, and mutation probability, using the design of experiments (DOE) method. The levels of each parameter were $PS$ = {60, 80, 100}, $p_c$ = {0.7, 0.8, 0.9}, $p_m$ = {0.2, 0.4, 0.6}, $\gamma$ = {0.7, 0.8, 0.9}, and $\epsilon$ = {0.7, 0.8, 0.9}. The five key parameters were analyzed with the Taguchi method using the orthogonal array $L_{18}(3^7)$, which consists of 18 different parameter combinations.
Figure 6 shows the main effects plot for the parameters of IMOMA, from which the optimal parameter configuration is identified as follows: $PS$ = 80, $p_c$ = 0.7, $p_m$ = 0.2, $\gamma$ = 0.8, $\epsilon$ = 0.8.
To further investigate the influence of key parameters on the performance of the proposed IMOMA algorithm, an extended sensitivity analysis was conducted. This analysis adopts a one-factor-at-a-time (OFAT) approach to observe the individual effect of each parameter clearly. During the test of each target parameter, the remaining parameters were fixed at their empirically determined optimal baseline values. The five target parameters were tested across three levels each. The analysis was performed on three representative instances of different scales selected from the benchmark set: a small-scale instance (F = 2, n = 50, s = 2), a medium-scale instance (F = 4, n = 150, s = 4), and a large-scale instance (F = 6, n = 150, s = 6).
The experimental results are shown in Figure 7. Based on the extended sensitivity analysis, the mutation probability ( p m ) exhibits the most pronounced influence on algorithm performance, where increased values consistently lead to degradation across all tested instances, firmly validating the optimal baseline setting of p m = 0.2 . The population size ( P S ) shows moderate sensitivity, with its optimal value shifting slightly with the problem scale, though the baseline P S = 80 remains robust. The Q-learning discount factor ( γ ) demonstrates a moderate level of sensitivity, performing optimally near its baseline of γ = 0.8 . In contrast, the crossover probability ( p c ) and the Q-learning exploration rate ( ϵ ) exhibit relatively low sensitivity within the tested ranges, confirming the stability of their baseline settings ( p c = 0.7 , ϵ = 0.8 ). The overall low-to-moderate sensitivity of the Q-learning hyperparameters ( γ and ϵ ) indicates that the proposed Q-learning-guided local search module possesses good robustness, as its performance does not critically depend on their precise tuning. Overall, the results strongly justify the selected parameter configuration as a reliable default, while indicating that fine-tuning efforts for future applications should primarily focus on p m and P S , particularly when addressing problems of substantially different scales.

4.4. Effectiveness of Each Improvement Component of IMOMA-QL

To examine the effectiveness of each component, IMOMA-QL was compared with four variant versions in which a specific component was removed. These variants are IMOMA-QL without the hybrid initialization strategy (denoted as IMOMA-QL1), IMOMA-QL without the genetic operators (denoted as IMOMA-QL2), IMOMA-QL without the multi-neighborhood search (denoted as IMOMA-QL3), and IMOMA-QL without the Q-learning selection mechanism (denoted as IMOMA-QL4). Specifically, IMOMA-QL1 adopts random population initialization instead of the hybrid method; IMOMA-QL2 and IMOMA-QL3, respectively, remove the genetic operators and the multi-neighborhood search while retaining the rest of the algorithm; and IMOMA-QL4 replaces the Q-learning mechanism with a random local search. To ensure a fair comparison, all algorithmic parameters for these variants remained consistent with those of the original IMOMA-QL. All algorithms were executed independently 10 times on the test instances. The reference Pareto front was constructed from the combined non-dominated solutions of all compared algorithms.
The experimental results systematically evaluate performance across different problem scales by grouping the 100 instances according to the number of factories F, jobs n, and stages s. The average HV and IGD values are presented in Table 2 and Table 3. The complete experimental data for each instance are presented in Appendix A, Tables A1–A10. IMOMA-QL consistently achieved the best performance across all instances.
Figure 8 and Figure 9 display interval plots with 95% confidence intervals for HV and IGD metrics across all instances, comparing IMOMA-QL with its four variants. The results demonstrate that each component of IMOMA-QL contributes to performance improvements at varying degrees.
The Friedman test rankings with 95% confidence intervals are presented in Table 4. IMOMA-QL ranked first against all four variants, with statistically significant improvements (all p-values < 0.05). Removing any single component thus leads to a clear performance drop, confirming that each proposed enhancement contributes substantially to the superior performance of the full algorithm.
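The Friedman ranking used in such comparisons can be sketched in pure Python. The code below ranks the five algorithm variants within each instance (best HV receives rank 1, ties receive average ranks) and computes the Friedman chi-square statistic; the input rows are the five F = 2, n = 50 HV rows of Table A1. This is our own minimal sketch, not the authors' statistical pipeline.

```python
def friedman_statistic(blocks):
    """blocks: one tuple of HV values per instance (one value per algorithm).
    Returns the Friedman chi-square statistic and the mean ranks."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank within the block: largest HV = rank 1; ties get average ranks.
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for m in range(i, j + 1):
                ranks[order[m]] = avg_rank
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in mean_ranks)
    return chi2, mean_ranks

# HV rows from Table A1 (F = 2, n = 50), columns:
# IMOMA-QL, IMOMA-QL1, IMOMA-QL2, IMOMA-QL3, IMOMA-QL4.
blocks = [
    (1.20, 1.10, 0.89, 0.96, 1.08),
    (1.24, 1.08, 1.00, 1.05, 1.15),
    (1.24, 1.15, 0.97, 1.02, 1.15),
    (1.16, 1.13, 0.92, 1.03, 1.08),
    (1.18, 1.10, 0.88, 0.95, 1.07),
]
chi2, mean_ranks = friedman_statistic(blocks)
print(f"Friedman chi-square = {chi2:.2f}, mean ranks = {mean_ranks}")
```

The p-value is then obtained from the chi-square distribution with k − 1 degrees of freedom (or from `scipy.stats.friedmanchisquare`, which performs the same computation).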

4.5. Comparison of IMOMA-QL and Other Algorithms

We select four comparative algorithms: IMPGA [34], MQSFLA [25], MOEA/D [35], and NSGA-II [36]. The first two are recent algorithms specifically designed for DHFSP, which is the same problem domain studied in this work. IMPGA minimizes both makespan and total tardiness—two of the three objectives optimized in our paper—and is composed of multiple populations that co-evolve in sub-regions, a greedy inter-factory job insertion neighborhood structure for local search, and a probability-sampling-based re-initialization procedure. MQSFLA also targets the minimization of makespan and total tardiness, sharing a highly similar objective set with our work. It incorporates a memeplex quality measurement mechanism, a search process guided by solution quality, and a novel memeplex shuffling that dynamically selects memeplexes based on evolution quality. The latter two, NSGA-II and MOEA/D, are classic multi-objective optimizers widely used in scheduling problems. NSGA-II employs non-dominated sorting and crowding distance to preserve solution diversity; in contrast, MOEA/D converts the multi-objective problem into a set of scalar subproblems. For IMPGA and MQSFLA, we directly adopt the parameter settings reported in their respective original papers. For NSGA-II and MOEA/D, we set the population size to 50, crossover probability pc to 0.7, and mutation probability pm to 0.2. Together, these algorithms provide diverse and strong benchmarks for evaluating our proposed method.
Each comparison algorithm was run independently 10 times on each instance. The average HV and IGD values are presented in Table 5 and Table 6. The complete experimental data for each instance are presented in Appendix A, Tables A11–A20. IMOMA-QL consistently achieved the best performance across all instances.
Figure 10 and Figure 11 display interval plots with 95% confidence intervals for the HV and IGD metrics across all instances, comparing IMOMA-QL with the four comparative algorithms. As evidenced by these figures, IMOMA-QL outperforms all other algorithms, achieving higher HV and lower IGD values. To further validate this superiority, the Friedman test rankings with 95% confidence intervals are presented in Table 7, where IMOMA-QL again ranks first among all compared algorithms.
Figure 12 displays 3D scatter plots of the non-dominated sets obtained by the five methods on a representative instance. The non-dominated sets of the respective algorithms form distinct layers, and the solutions obtained by IMOMA-QL lie closest to the ideal point at which all objective values are lowest. The non-dominated set obtained by IMOMA-QL is therefore clearly superior to those obtained by the other algorithms.
The superior performance of IMOMA-QL can be attributed to four key algorithmic components. The hybrid initialization strategy improves the diversity and quality of the initial population, providing a better starting point for the search. The genetic operators enhance global exploration and the recombination of high-quality solutions. The multi-neighborhood search mechanism diversifies local search patterns, improving the chance of escaping local optima. Finally, the Q-learning-guided variable neighborhood search adaptively selects promising neighborhoods based on search feedback, further enhancing search efficiency.

5. Conclusions

In the field of multi-objective optimization for tardiness-related scheduling problems, most existing studies focus on optimizing makespan along with only one tardiness-related objective. This study addresses DHFSP with SDST, optimizing three critical objectives: makespan, total tardiness, and the number of tardy jobs. This work emphasizes tardiness-related objectives, which are crucial in real-world manufacturing scenarios where meeting the due date is essential—such as just-in-time production, order-driven manufacturing, and supply chain scheduling with strict delivery commitments. To solve this problem, we propose a multi-objective memetic algorithm enhanced with a Q-learning-guided variable neighborhood search (VNS). Extensive numerical experiments and comparisons with four comparative algorithms demonstrate that the proposed method significantly improves solution quality, convergence speed, and robustness in handling the multi-objective DHFSP.
Despite the promising results, this study has certain limitations. The proposed model and algorithm operate under a set of standardized assumptions, such as deterministic processing times and static job availability. Consequently, they cannot be directly applied to real-world scheduling environments with stochastic processing times or dynamically arriving jobs.
Future research can extend this work in several meaningful directions. A promising avenue is to investigate more comprehensive and environmentally conscious objective sets, for example, jointly minimizing makespan, total tardiness, and total energy consumption. Addressing such integrated problems would require new models that capture the energy dynamics of machines and efficient algorithms capable of balancing productivity against energy consumption.

Author Contributions

Resource provision, Y.S.; project administration, Y.S. and Y.L.; data curation and management, Y.L.; figure and visualization, Y.L. and Q.C.; original draft writing, Y.L.; software implementation, X.S.; result validation, X.S. and Q.C.; research supervision, X.S.; study conception and design, H.K.; methodology development, H.K.; formal analysis, H.K.; manuscript revision and editing, H.K. and X.S.; investigation, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Open Foundation of Key Laboratory of Software Engineering of Yunnan Province: 2020SE308, Open Foundation of Key Laboratory of Software Engineering of Yunnan Province: 2020SE309, New round of Double First-class Project of Yunnan University: CY22624103, National Natural Science Foundation of China: 62366057, Special Fund for the Central Government to Guide Local Science: 202407AB110003, and Key Research and Development Program of Yunnan Province: 202402AA310056.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Detailed Experimental Results

Table A1. HV of the proposed algorithm and its four variants (F = 2).
F = 2
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.20  1.10  0.89  0.96  1.08
(50, 4)  1.24  1.08  1.00  1.05  1.15
(50, 6)  1.24  1.15  0.97  1.02  1.15
(50, 8)  1.16  1.13  0.92  1.03  1.08
(50, 10)  1.18  1.10  0.88  0.95  1.07
(100, 2)  1.26  1.16  1.00  1.01  1.15
(100, 4)  1.17  1.11  0.94  0.96  1.08
(100, 6)  1.18  1.11  0.85  0.94  1.07
(100, 8)  1.21  1.14  0.96  1.02  1.14
(100, 10)  1.16  1.07  0.88  0.98  1.06
(150, 2)  1.25  1.09  0.96  1.01  1.11
(150, 4)  1.27  1.19  1.03  1.09  1.14
(150, 6)  1.20  1.09  0.89  1.03  1.06
(150, 8)  1.22  1.13  0.95  1.09  1.10
(150, 10)  1.25  1.20  1.00  1.10  1.12
(200, 2)  1.26  1.17  0.98  1.03  1.16
(200, 4)  1.25  1.22  0.99  1.09  1.16
(200, 6)  1.17  1.10  0.93  0.99  1.08
(200, 8)  1.17  1.08  0.97  1.05  1.09
(200, 10)  1.18  1.13  0.92  1.02  1.11
Table A2. HV of the proposed algorithm and its four variants (F = 3).
F = 3
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.24  1.14  1.02  1.03  1.13
(50, 4)  1.19  1.16  0.98  0.97  1.10
(50, 6)  1.13  1.07  0.91  0.94  1.02
(50, 8)  1.17  1.11  1.02  0.97  1.11
(50, 10)  1.13  1.11  0.96  0.94  1.05
(100, 2)  1.19  1.05  0.96  0.99  1.06
(100, 4)  1.16  1.10  0.99  0.98  1.08
(100, 6)  1.13  1.06  0.86  0.94  1.02
(100, 8)  1.18  1.14  0.96  0.97  1.12
(100, 10)  1.20  1.17  1.00  1.01  1.12
(150, 2)  1.29  1.18  1.04  1.04  1.11
(150, 4)  1.15  1.10  0.94  0.96  1.02
(150, 6)  1.25  1.16  0.99  1.00  1.08
(150, 8)  1.21  1.11  1.05  1.06  1.07
(150, 10)  1.23  1.18  1.03  1.03  1.11
(200, 2)  1.17  1.08  0.94  0.95  1.04
(200, 4)  1.17  1.13  1.00  0.97  1.07
(200, 6)  1.18  1.08  0.92  0.98  1.06
(200, 8)  1.15  1.08  0.95  0.95  1.07
(200, 10)  1.21  1.13  1.04  1.05  1.14
Table A3. HV of the proposed algorithm and its four variants (F = 4).
F = 4
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.29  1.17  1.05  1.09  1.22
(50, 4)  1.20  1.13  1.03  1.09  1.18
(50, 6)  1.15  1.04  0.89  1.00  1.08
(50, 8)  1.18  1.16  0.97  1.06  1.15
(50, 10)  1.13  1.07  0.89  0.98  1.11
(100, 2)  1.19  1.07  0.92  1.01  1.10
(100, 4)  1.13  1.09  0.94  1.02  1.11
(100, 6)  1.11  1.00  0.88  0.97  1.06
(100, 8)  1.12  1.01  0.93  0.99  1.06
(100, 10)  1.17  1.13  0.95  1.08  1.13
(150, 2)  1.20  1.13  0.92  1.05  1.07
(150, 4)  1.17  1.07  0.91  1.06  1.10
(150, 6)  1.24  1.12  0.96  1.09  1.06
(150, 8)  1.13  1.03  0.90  0.94  1.04
(150, 10)  1.21  1.12  0.94  1.07  1.13
(200, 2)  1.22  1.12  0.96  1.09  1.15
(200, 4)  1.14  1.10  0.96  1.05  1.10
(200, 6)  1.13  1.05  0.93  0.96  1.09
(200, 8)  1.16  1.14  0.96  1.00  1.13
(200, 10)  1.21  1.12  1.02  1.09  1.15
Table A4. HV of the proposed algorithm and its four variants (F = 5).
F = 5
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.33  1.25  1.01  1.12  1.20
(50, 4)  1.23  1.16  0.98  1.04  1.17
(50, 6)  1.25  1.19  0.96  1.10  1.14
(50, 8)  1.19  1.14  0.98  1.02  1.10
(50, 10)  1.25  1.17  1.03  1.04  1.17
(100, 2)  1.23  1.13  0.94  1.02  1.14
(100, 4)  1.27  1.18  1.00  1.11  1.14
(100, 6)  1.19  1.11  0.93  1.00  1.09
(100, 8)  1.25  1.15  0.98  1.04  1.15
(100, 10)  1.14  1.07  0.83  1.08  1.06
(150, 2)  1.23  1.07  0.93  0.97  1.07
(150, 4)  1.26  1.16  1.01  1.08  1.14
(150, 6)  1.24  1.14  0.97  1.03  1.08
(150, 8)  1.25  1.15  0.99  1.11  1.14
(150, 10)  1.16  1.06  0.89  1.02  1.03
(200, 2)  1.24  1.13  0.98  1.04  1.11
(200, 4)  1.26  1.20  0.97  1.12  1.21
(200, 6)  1.26  1.19  1.00  1.09  1.16
(200, 8)  1.24  1.16  1.00  1.10  1.18
(200, 10)  1.20  1.15  0.95  1.04  1.14
Table A5. HV of the proposed algorithm and its four variants (F = 6).
F = 6
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  1.28  1.23  1.03  1.11  1.23
(50, 4)  1.25  1.15  1.00  1.13  1.18
(50, 6)  1.16  1.06  0.87  1.01  1.05
(50, 8)  1.21  1.19  0.94  1.08  1.15
(50, 10)  1.17  1.12  0.92  1.03  1.11
(100, 2)  1.32  1.18  1.02  1.09  1.19
(100, 4)  1.19  1.08  0.93  1.03  1.10
(100, 6)  1.14  1.05  0.85  0.96  1.03
(100, 8)  1.23  1.17  1.01  1.07  1.16
(100, 10)  1.12  1.07  0.87  1.00  1.07
(150, 2)  1.22  1.09  0.92  0.99  1.06
(150, 4)  1.30  1.20  1.05  1.12  1.16
(150, 6)  1.25  1.12  0.92  1.06  1.10
(150, 8)  1.18  1.10  0.90  1.00  1.07
(150, 10)  1.21  1.12  0.92  1.04  1.09
(200, 2)  1.20  1.08  0.93  1.01  1.10
(200, 4)  1.25  1.21  1.00  1.12  1.20
(200, 6)  1.16  1.09  0.90  0.96  1.07
(200, 8)  1.25  1.06  1.04  1.07  1.10
(200, 10)  1.24  1.16  1.00  1.01  1.06
Table A6. IGD of the proposed algorithm and its four variants (F = 2).
F = 2
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.116  0.128  0.153  0.157  0.121
(50, 4)  0.110  0.133  0.160  0.157  0.127
(50, 6)  0.073  0.099  0.153  0.152  0.103
(50, 8)  0.060  0.099  0.127  0.093  0.065
(50, 10)  0.129  0.134  0.143  0.152  0.148
(100, 2)  0.138  0.152  0.177  0.182  0.165
(100, 4)  0.121  0.134  0.165  0.159  0.129
(100, 6)  0.069  0.081  0.117  0.106  0.072
(100, 8)  0.090  0.103  0.160  0.140  0.095
(100, 10)  0.100  0.124  0.182  0.143  0.137
(150, 2)  0.123  0.176  0.193  0.202  0.173
(150, 4)  0.132  0.148  0.176  0.182  0.159
(150, 6)  0.087  0.123  0.136  0.158  0.115
(150, 8)  0.093  0.135  0.134  0.129  0.127
(150, 10)  0.102  0.123  0.151  0.143  0.129
(200, 2)  0.132  0.137  0.161  0.167  0.148
(200, 4)  0.077  0.084  0.152  0.143  0.097
(200, 6)  0.105  0.122  0.163  0.159  0.123
(200, 8)  0.083  0.103  0.122  0.128  0.111
(200, 10)  0.117  0.132  0.167  0.152  0.142
Table A7. IGD of the proposed algorithm and its four variants (F = 3).
F = 3
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.110  0.131  0.177  0.182  0.155
(50, 4)  0.108  0.144  0.169  0.162  0.132
(50, 6)  0.070  0.095  0.144  0.133  0.093
(50, 8)  0.094  0.140  0.161  0.156  0.121
(50, 10)  0.108  0.140  0.178  0.168  0.136
(100, 2)  0.082  0.107  0.157  0.165  0.130
(100, 4)  0.103  0.130  0.177  0.181  0.151
(100, 6)  0.072  0.098  0.161  0.149  0.098
(100, 8)  0.064  0.071  0.103  0.093  0.078
(100, 10)  0.071  0.078  0.138  0.141  0.075
(150, 2)  0.123  0.155  0.175  0.181  0.150
(150, 4)  0.083  0.119  0.154  0.157  0.110
(150, 6)  0.067  0.114  0.176  0.150  0.112
(150, 8)  0.050  0.087  0.149  0.123  0.107
(150, 10)  0.109  0.132  0.145  0.162  0.142
(200, 2)  0.117  0.136  0.156  0.165  0.168
(200, 4)  0.115  0.132  0.161  0.153  0.161
(200, 6)  0.076  0.102  0.187  0.151  0.113
(200, 8)  0.050  0.075  0.121  0.134  0.100
(200, 10)  0.062  0.090  0.165  0.166  0.083
Table A8. IGD of the proposed algorithm and its four variants (F = 4).
F = 4
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.122  0.157  0.176  0.165  0.131
(50, 4)  0.115  0.153  0.168  0.170  0.120
(50, 6)  0.101  0.137  0.186  0.150  0.110
(50, 8)  0.107  0.149  0.163  0.139  0.130
(50, 10)  0.082  0.122  0.163  0.140  0.085
(100, 2)  0.097  0.130  0.176  0.154  0.131
(100, 4)  0.107  0.157  0.178  0.157  0.108
(100, 6)  0.093  0.138  0.181  0.127  0.112
(100, 8)  0.053  0.086  0.118  0.129  0.070
(100, 10)  0.069  0.091  0.106  0.139  0.080
(150, 2)  0.124  0.136  0.166  0.142  0.135
(150, 4)  0.105  0.127  0.143  0.140  0.131
(150, 6)  0.060  0.103  0.124  0.091  0.099
(150, 8)  0.097  0.145  0.137  0.139  0.125
(150, 10)  0.114  0.142  0.176  0.179  0.130
(200, 2)  0.107  0.133  0.169  0.155  0.121
(200, 4)  0.130  0.146  0.163  0.171  0.153
(200, 6)  0.101  0.125  0.167  0.152  0.132
(200, 8)  0.116  0.138  0.171  0.164  0.147
(200, 10)  0.105  0.122  0.150  0.142  0.108
Table A9. IGD of the proposed algorithm and its four variants (F = 5).
F = 5
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.102  0.134  0.151  0.141  0.113
(50, 4)  0.105  0.126  0.165  0.138  0.127
(50, 6)  0.060  0.089  0.134  0.120  0.073
(50, 8)  0.123  0.159  0.180  0.169  0.154
(50, 10)  0.099  0.137  0.137  0.124  0.125
(100, 2)  0.082  0.114  0.106  0.113  0.105
(100, 4)  0.079  0.101  0.140  0.147  0.084
(100, 6)  0.088  0.113  0.156  0.146  0.090
(100, 8)  0.070  0.083  0.105  0.101  0.075
(100, 10)  0.061  0.098  0.124  0.128  0.093
(150, 2)  0.134  0.172  0.160  0.196  0.153
(150, 4)  0.099  0.114  0.162  0.157  0.118
(150, 6)  0.115  0.154  0.174  0.156  0.125
(150, 8)  0.111  0.144  0.150  0.133  0.118
(150, 10)  0.099  0.121  0.154  0.137  0.134
(200, 2)  0.140  0.141  0.178  0.178  0.152
(200, 4)  0.123  0.148  0.181  0.170  0.147
(200, 6)  0.073  0.089  0.121  0.098  0.089
(200, 8)  0.076  0.097  0.129  0.111  0.109
(200, 10)  0.127  0.149  0.168  0.158  0.152
Table A10. IGD of the proposed algorithm and its four variants (F = 6).
F = 6
(n, s)  IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
(50, 2)  0.155  0.160  0.178  0.176  0.167
(50, 4)  0.119  0.135  0.172  0.163  0.139
(50, 6)  0.088  0.103  0.167  0.149  0.119
(50, 8)  0.096  0.119  0.118  0.153  0.125
(50, 10)  0.104  0.120  0.158  0.168  0.136
(100, 2)  0.096  0.107  0.151  0.141  0.114
(100, 4)  0.137  0.147  0.182  0.211  0.163
(100, 6)  0.075  0.082  0.148  0.109  0.077
(100, 8)  0.05  0.072  0.085  0.110  0.052
(100, 10)  0.061  0.063  0.130  0.105  0.066
(150, 2)  0.115  0.128  0.126  0.157  0.120
(150, 4)  0.118  0.122  0.185  0.188  0.132
(150, 6)  0.058  0.066  0.136  0.128  0.074
(150, 8)  0.100  0.121  0.164  0.148  0.011
(150, 10)  0.11  0.108  0.152  0.177  0.141
(200, 2)  0.130  0.135  0.172  0.183  0.159
(200, 4)  0.129  0.140  0.186  0.179  0.137
(200, 6)  0.100  0.118  0.161  0.135  0.126
(200, 8)  0.098  0.102  0.155  0.170  0.101
(200, 10)  0.110  0.125  0.182  0.182  0.125
Table A11. HV of the proposed algorithm and the comparison algorithms (F = 2).
F = 2
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.29  0.97  1.00  0.72  0.88
(50, 4)  1.18  0.96  0.90  0.58  0.71
(50, 6)  1.18  0.92  0.88  0.62  0.74
(50, 8)  1.24  0.98  0.94  0.66  0.67
(50, 10)  1.27  0.96  0.98  0.75  0.73
(100, 2)  1.27  0.95  0.98  0.66  0.81
(100, 4)  1.33  0.92  0.95  0.76  0.82
(100, 6)  1.25  0.99  0.93  0.66  0.71
(100, 8)  1.15  0.87  0.85  0.56  0.67
(100, 10)  1.18  0.85  0.96  0.59  0.66
(150, 2)  1.30  0.99  1.01  0.64  0.79
(150, 4)  1.18  0.98  0.88  0.52  0.64
(150, 6)  1.34  0.90  0.91  0.66  0.80
(150, 8)  1.28  1.01  0.88  0.61  0.83
(150, 10)  1.22  0.95  0.97  0.67  0.78
(200, 2)  1.28  0.95  1.07  0.73  0.82
(200, 4)  1.28  0.99  0.95  0.62  0.81
(200, 6)  1.29  0.96  0.98  0.70  0.80
(200, 8)  1.26  1.03  0.95  0.76  0.78
(200, 10)  1.25  0.86  1.01  0.63  0.73
Table A12. HV of the proposed algorithm and the comparison algorithms (F = 3).
F = 3
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.27  1.01  0.98  0.74  0.84
(50, 4)  1.22  0.94  0.85  0.66  0.67
(50, 6)  1.24  0.98  0.93  0.65  0.73
(50, 8)  1.15  0.88  0.83  0.62  0.78
(50, 10)  1.31  0.97  0.99  0.72  0.72
(100, 2)  1.21  0.95  1.04  0.72  0.79
(100, 4)  1.29  1.00  0.91  0.67  0.81
(100, 6)  1.20  0.90  0.95  0.62  0.77
(100, 8)  1.17  0.82  0.83  0.57  0.72
(100, 10)  1.19  0.82  0.94  0.63  0.63
(150, 2)  1.19  0.88  0.97  0.68  0.75
(150, 4)  1.14  0.95  0.86  0.65  0.77
(150, 6)  1.34  1.07  0.95  0.74  0.87
(150, 8)  1.18  0.97  0.95  0.72  0.77
(150, 10)  1.26  0.85  0.88  0.58  0.71
(200, 2)  1.27  0.95  0.99  0.69  0.84
(200, 4)  1.22  0.88  0.83  0.59  0.69
(200, 6)  1.25  0.92  1.00  0.71  0.68
(200, 8)  1.25  0.97  0.88  0.64  0.75
(200, 10)  1.12  0.81  0.90  0.59  0.78
Table A13. HV of the proposed algorithm and the comparison algorithms (F = 4).
F = 4
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.25  0.98  0.87  0.78  0.75
(50, 4)  1.14  0.95  0.84  0.70  0.60
(50, 6)  1.24  0.96  0.90  0.63  0.71
(50, 8)  1.16  0.87  0.80  0.68  0.78
(50, 10)  1.33  0.94  0.90  0.66  0.72
(100, 2)  1.19  0.94  0.96  0.69  0.77
(100, 4)  1.28  1.04  0.91  0.61  0.84
(100, 6)  1.17  0.93  0.84  0.54  0.80
(100, 8)  1.15  0.86  0.74  0.55  0.71
(100, 10)  1.17  0.82  0.91  0.55  0.58
(150, 2)  1.15  0.88  0.98  0.61  0.73
(150, 4)  1.14  0.92  0.87  0.63  0.77
(150, 6)  1.25  1.03  0.93  0.73  0.83
(150, 8)  1.21  0.98  0.87  0.71  0.82
(150, 10)  1.24  0.87  0.82  0.61  0.73
(200, 2)  1.27  0.96  0.96  0.63  0.87
(200, 4)  1.24  0.93  0.78  0.56  0.71
(200, 6)  1.20  0.89  0.99  0.66  0.69
(200, 8)  1.27  0.96  0.84  0.61  0.67
(200, 10)  1.15  0.90  0.84  0.55  0.69
Table A14. HV of the proposed algorithm and the comparison algorithms (F = 5).
F = 5
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.32  0.93  1.00  0.75  0.81
(50, 4)  1.21  0.94  0.85  0.66  0.74
(50, 6)  1.25  0.99  0.91  0.63  0.75
(50, 8)  1.13  0.95  0.85  0.69  0.85
(50, 10)  1.25  0.95  0.93  0.60  0.63
(100, 2)  1.19  0.89  0.93  0.64  0.71
(100, 4)  1.23  1.09  0.96  0.55  0.76
(100, 6)  1.20  0.90  0.83  0.49  0.72
(100, 8)  1.18  0.89  0.77  0.50  0.64
(100, 10)  1.19  0.77  0.87  0.53  0.59
(150, 2)  1.18  0.82  0.94  0.59  0.74
(150, 4)  1.15  0.88  0.86  0.57  0.76
(150, 6)  1.31  1.04  0.95  0.76  0.90
(150, 8)  1.17  0.94  0.83  0.71  0.82
(150, 10)  1.25  0.86  0.82  0.59  0.77
(200, 2)  1.20  0.98  1.01  0.56  0.80
(200, 4)  1.29  0.93  0.81  0.56  0.73
(200, 6)  1.19  0.84  0.96  0.59  0.60
(200, 8)  1.27  0.92  0.81  0.61  0.70
(200, 10)  1.23  0.81  0.87  0.53  0.70
Table A15. HV of the proposed algorithm and the comparison algorithms (F = 6).
F = 6
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  1.20  0.96  1.01  0.68  0.71
(50, 4)  1.21  0.93  0.87  0.69  0.75
(50, 6)  1.25  0.91  0.83  0.68  0.79
(50, 8)  1.13  0.94  0.84  0.71  0.78
(50, 10)  1.25  0.82  0.88  0.63  0.72
(100, 2)  1.19  0.86  0.96  0.66  0.76
(100, 4)  1.23  0.92  0.96  0.60  0.79
(100, 6)  1.20  0.82  0.85  0.54  0.73
(100, 8)  1.18  0.94  0.95  0.53  0.68
(100, 10)  1.09  0.96  0.90  0.56  0.62
(150, 2)  1.18  0.94  0.84  0.63  0.78
(150, 4)  1.15  0.89  0.85  0.61  0.78
(150, 6)  1.22  0.84  0.87  0.72  0.80
(150, 8)  1.17  1.00  0.89  0.66  0.83
(150, 10)  1.25  0.95  0.94  0.63  0.81
(200, 2)  1.20  0.88  0.86  0.59  0.82
(200, 4)  1.29  0.83  0.88  0.58  0.77
(200, 6)  1.19  0.91  0.87  0.63  0.73
(200, 8)  1.27  0.90  0.86  0.64  0.75
(200, 10)  1.23  0.98  0.95  0.56  0.73
Table A16. IGD of the proposed algorithm and the comparison algorithms (F = 2).
F = 2
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.096  0.131  0.159  0.211  0.189
(50, 4)  0.117  0.159  0.173  0.264  0.216
(50, 6)  0.105  0.173  0.163  0.267  0.246
(50, 8)  0.135  0.181  0.169  0.244  0.204
(50, 10)  0.146  0.166  0.152  0.239  0.279
(100, 2)  0.076  0.15  0.19  0.256  0.212
(100, 4)  0.110  0.174  0.182  0.284  0.234
(100, 6)  0.124  0.167  0.146  0.274  0.256
(100, 8)  0.066  0.15  0.19  0.257  0.21
(100, 10)  0.102  0.165  0.183  0.289  0.223
(150, 2)  0.133  0.153  0.16  0.227  0.189
(150, 4)  0.098  0.163  0.178  0.296  0.214
(150, 6)  0.131  0.152  0.1611  0.278  0.245
(150, 8)  0.105  0.131  0.159  0.296  0.144
(150, 10)  0.132  0.155  0.179  0.265  0.263
(200, 2)  0.089  0.166  0.135  0.254  0.223
(200, 4)  0.078  0.169  0.174  0.288  0.237
(200, 6)  0.066  0.132  0.158  0.275  0.257
(200, 8)  0.094  0.145  0.163  0.274  0.223
(200, 10)  0.107  0.158  0.165  0.231  0.17
Table A17. IGD of the proposed algorithm and the comparison algorithms (F = 3).
F = 3
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.06  0.12  0.144  0.195  0.189
(50, 4)  0.104  0.165  0.172  0.224  0.191
(50, 6)  0.129  0.182  0.179  0.235  0.229
(50, 8)  0.11  0.175  0.161  0.254  0.242
(50, 10)  0.133  0.164  0.151  0.228  0.232
(100, 2)  0.053  0.118  0.103  0.223  0.181
(100, 4)  0.083  0.101  0.118  0.264  0.245
(100, 6)  0.086  0.143  0.13  0.295  0.198
(100, 8)  0.078  0.128  0.139  0.251  0.183
(100, 10)  0.069  0.137  0.123  0.185  0.169
(150, 2)  0.071  0.158  0.145  0.258  0.171
(150, 4)  0.074  0.133  0.164  0.212  0.201
(150, 6)  0.085  0.179  0.176  0.231  0.218
(150, 8)  0.059  0.115  0.132  0.252  0.172
(150, 10)  0.115  0.169  0.158  0.252  0.192
(200, 2)  0.076  0.125  0.121  0.289  0.167
(200, 4)  0.071  0.124  0.136  0.296  0.175
(200, 6)  0.08  0.151  0.131  0.264  0.223
(200, 8)  0.096  0.136  0.142  0.255  0.233
(200, 10)  0.082  0.163  0.178  0.278  0.209
Table A18. IGD of the proposed algorithm and the comparison algorithms (F = 4).
F = 4
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.062  0.161  0.207  0.275  0.253
(50, 4)  0.132  0.156  0.161  0.255  0.235
(50, 6)  0.078  0.141  0.173  0.252  0.228
(50, 8)  0.076  0.172  0.189  0.246  0.216
(50, 10)  0.105  0.166  0.172  0.236  0.202
(100, 2)  0.058  0.117  0.106  0.303  0.223
(100, 4)  0.073  0.129  0.166  0.241  0.171
(100, 6)  0.069  0.141  0.136  0.254  0.154
(100, 8)  0.054  0.134  0.146  0.258  0.161
(100, 10)  0.076  0.174  0.157  0.223  0.238
(150, 2)  0.069  0.142  0.16  0.266  0.171
(150, 4)  0.091  0.167  0.145  0.229  0.179
(150, 6)  0.094  0.189  0.212  0.212  0.239
(150, 8)  0.063  0.143  0.16  0.264  0.151
(150, 10)  0.111  0.154  0.159  0.256  0.187
(200, 2)  0.102  0.142  0.134  0.262  0.157
(200, 4)  0.09  0.162  0.149  0.273  0.223
(200, 6)  0.069  0.131  0.123  0.245  0.205
(200, 8)  0.076  0.135  0.144  0.28  0.197
(200, 10)  0.066  0.133  0.142  0.259  0.182
Table A19. IGD of the proposed algorithm and the comparison algorithms (F = 5).
F = 5
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.107  0.163  0.143  0.229  0.185
(50, 4)  0.123  0.175  0.184  0.256  0.211
(50, 6)  0.125  0.135  0.158  0.256  0.196
(50, 8)  0.081  0.153  0.171  0.258  0.194
(50, 10)  0.113  0.189  0.215  0.276  0.233
(100, 2)  0.084  0.164  0.214  0.244  0.132
(100, 4)  0.135  0.153  0.173  0.269  0.236
(100, 6)  0.092  0.147  0.164  0.278  0.205
(100, 8)  0.069  0.129  0.111  0.324  0.206
(100, 10)  0.107  0.169  0.151  0.278  0.214
(150, 2)  0.129  0.183  0.25  0.319  0.269
(150, 4)  0.102  0.191  0.211  0.282  0.251
(150, 6)  0.125  0.142  0.154  0.303  0.188
(150, 8)  0.095  0.156  0.148  0.237  0.247
(150, 10)  0.11  0.162  0.175  0.251  0.267
(200, 2)  0.107  0.164  0.152  0.266  0.332
(200, 4)  0.079  0.132  0.25  0.29  0.311
(200, 6)  0.144  0.183  0.196  0.279  0.233
(200, 8)  0.057  0.147  0.153  0.225  0.165
(200, 10)  0.119  0.189  0.157  0.239  0.195
Table A20. IGD of the proposed algorithm and the comparison algorithms (F = 6).
F = 6
(n, s)  IMOMA-QL  IMPGA  MQSFLA  MOEA/D  NSGA-II
(50, 2)  0.075  0.153  0.148  0.243  0.194
(50, 4)  0.121  0.139  0.146  0.220  0.174
(50, 6)  0.115  0.181  0.178  0.236  0.180
(50, 8)  0.114  0.167  0.188  0.232  0.216
(50, 10)  0.136  0.178  0.166  0.257  0.227
(100, 2)  0.075  0.136  0.133  0.246  0.177
(100, 4)  0.085  0.164  0.142  0.256  0.200
(100, 6)  0.088  0.157  0.112  0.230  0.173
(100, 8)  0.092  0.122  0.160  0.238  0.198
(100, 10)  0.072  0.129  0.161  0.249  0.212
(150, 2)  0.114  0.166  0.183  0.223  0.183
(150, 4)  0.082  0.135  0.171  0.253  0.247
(150, 6)  0.069  0.137  0.146  0.262  0.156
(150, 8)  0.114  0.194  0.197  0.249  0.228
(150, 10)  0.094  0.165  0.158  0.223  0.189
(200, 2)  0.056  0.128  0.136  0.224  0.178
(200, 4)  0.074  0.153  0.162  0.356  0.220
(200, 6)  0.115  0.184  0.183  0.263  0.202
(200, 8)  0.076  0.173  0.186  0.227  0.207
(200, 10)  0.068  0.132  0.146  0.256  0.187

Appendix A.2. Illustrative Example of SB2OX Crossover

This appendix provides a step-by-step illustrative example of the SB2OX crossover operator, complementing the description in Section 3.2.
Consider two parent solutions, each containing two factories; a semicolon separates the two factory sequences:
Parent 1 : { 1 , 5 , 8 , 11 , 9 , 6 , 4 , 3 ; 12 , 2 , 7 , 13 , 6 , 14 , 15 , 10 } Parent 2 : { 2 , 16 , 8 , 13 , 10 , 6 , 4 , 3 ; 11 , 6 , 5 , 12 , 14 , 15 , 9 , 10 }
Step 1: Identical Block Inheritance—Both parents are compared position by position. Identical blocks containing at least two consecutive matching jobs are identified and directly inherited by the offspring. In the first factory, jobs 4 and 3 at positions 7–8 are identical in both parents. In the second factory, jobs 14 and 15 at positions 6–7 are identical in both parents. These blocks are transferred to the children, preserving their factory assignments and positions:
Child 1 and Child 2 after Step 1 : { _ , _ , _ , _ , _ , _ , 4 , 3 ; _ , _ , _ , _ , _ , 14 , 15 , _ }
where _ denotes an empty position.
Step 2: Random Segment Inheritance—Two cut points are randomly selected in each factory. The segments between these cut points are inherited as follows. For Child 1 (derived from Parent 1), the first factory retains jobs at positions 3–5 (jobs 8, 11, and 13) and the second factory retains jobs at positions 4–5 (jobs 13 and 12). For Child 2 (derived from Parent 2), the first factory retains jobs at positions 2–4 (jobs 16, 8, and 13) and the second factory retains jobs at positions 2–5 (jobs 6, 5, 12, and 14). After Step 2
Child 1 : { _ , _ , 8 , 11 , 13 , _ , 4 , 3 ; _ , _ , _ , 13 , 12 , 14 , 15 , _ } Child 2 : { _ , 16 , 8 , 13 , _ , _ , 4 , 3 ; _ , 1 , 6 , 5 , 12 , 14 , 15 , _ }
Step 3: Remaining Position Filling—The empty positions are filled with the missing jobs in the order they appear in the other parent. For Child 1, missing jobs are taken from Parent 2’s sequence (excluding those already in Child 1) and filled left-to-right. For Child 2, missing jobs are taken from Parent 1’s sequence.
The final children are
Child 1 : { 2 , 16 , 8 , 11 , 9 , 10 , 4 , 3 ; 7 , 1 , 6 , 13 , 12 , 14 , 15 , 5 } Child 2 : { 11 , 16 , 8 , 13 , 9 , 2 , 4 , 3 ; 7 , 1 , 6 , 5 , 12 , 14 , 15 , 10 }
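The three steps above can be sketched in code for a single factory's sequence. This is our own simplified illustration, not the paper's implementation: it assumes both parents hold the same job set within the factory (job-to-factory assignments identical), the function and variable names are ours, and the parent sequences below are illustrative rather than taken from the example above.

```python
import random

def sb2ox_factory(p1, p2, rng):
    """Simplified SB2OX on one factory's job sequence (p1, p2: permutations
    of the same job set). Returns one child derived primarily from p1."""
    n = len(p1)
    child = [None] * n

    # Step 1: inherit blocks of >= 2 consecutive positions where parents agree.
    i = 0
    while i < n:
        j = i
        while j < n and p1[j] == p2[j]:
            j += 1
        if j - i >= 2:
            child[i:j] = p1[i:j]
        i = max(j, i + 1)

    # Step 2: inherit the segment of p1 between two random cut points.
    c1, c2 = sorted(rng.sample(range(n + 1), 2))
    for pos in range(c1, c2):
        child[pos] = p1[pos]

    # Step 3: fill remaining positions with the missing jobs,
    # in the order they appear in the other parent (p2).
    used = {job for job in child if job is not None}
    fill = iter(job for job in p2 if job not in used)
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(fill)
    return child

rng = random.Random(7)
parent1 = [1, 5, 8, 2, 7, 6, 4, 3]  # shares blocks (8, 2) and (4, 3) with parent2
parent2 = [5, 1, 8, 2, 6, 7, 4, 3]
child = sb2ox_factory(parent1, parent2, rng)
print(child)
```

Because Steps 1 and 2 only copy jobs from the first parent and Step 3 fills strictly with the missing jobs, the child is always a valid permutation of the factory's job set.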

Appendix A.3. Q-Table Update Example

To illustrate the Q-learning update mechanism, consider a Q-table with three states and six actions per state. The initial Q-values are randomly initialized as integers between 1 and 3:
Table A21. Initial Q-table (with integer values 1–3).
State  Action 1  Action 2  Action 3  Action 4  Action 5  Action 6
State 1  3  1  2  2  1  2
State 2  1  2  1  3  2  1
State 3  2  1  3  2  2  1
Assume the agent is in State 1 and greedily selects Action 1 (which has the highest Q-value of 3 in State 1). For the purpose of this example, we assume that after executing this action, the agent receives a reward r = 1 and transitions to State 2.
Assuming a learning rate α = 1 and a discount factor γ = 0.9, the Q-value is updated according to
$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
  • Current Q-value: Q(State 1, Action 1) = 3.
  • Maximum Q-value in the next state: max_{a′} Q(State 2, a′) = 3 (Action 4 in State 2).
  • Temporal-difference target: r + γ max_{a′} Q(s′, a′) = 1 + 0.9 × 3 = 1 + 2.7 = 3.7.
  • Updated Q-value: Q(State 1, Action 1) ← 3 + 1 × (3.7 − 3) = 3.7.
After the update, the Q-table becomes
Table A22. Q-table after update with α = 1.0.
State  Action 1  Action 2  Action 3  Action 4  Action 5  Action 6
State 1  3.7  1  2  2  1  2
State 2  1  2  1  3  2  1
State 3  2  1  3  2  2  1
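The update above can be reproduced in a few lines of Python. The sketch below mirrors Table A21 (state and action indices are zero-based) and applies the standard Q-learning temporal-difference rule with the example's α = 1 and γ = 0.9.

```python
# Q-table from Table A21: rows = states, columns = actions
# (here, the candidate neighborhood structures).
q_table = [
    [3, 1, 2, 2, 1, 2],  # State 1
    [1, 2, 1, 3, 2, 1],  # State 2
    [2, 1, 3, 2, 2, 1],  # State 3
]

alpha, gamma = 1.0, 0.9  # learning rate and discount factor from the example

def q_update(q, state, action, reward, next_state):
    """Standard Q-learning temporal-difference update."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])

# Greedy action in State 1 is Action 1 (Q = 3); reward 1, transition to State 2.
q_update(q_table, state=0, action=0, reward=1, next_state=1)
print(round(q_table[0][0], 2))  # → 3.7
```

With α = 1 the old Q-value is fully replaced by the temporal-difference target, which is why the updated entry equals 3.7 exactly as in Table A22; smaller learning rates would move the entry only part of the way toward the target.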
Figure A1. Flowchart of the proposed algorithm.

Figure 1. Illustration of DHFSP.
Figure 2. The flowchart of the selection.
Figure 3. The crossover operator.
Figure 4. Learning process of Q-learning.
Figure 5. The initial Q-table.
Figure 6. Main effects plot.
Figure 7. Extended sensitivity analysis.
Figure 8. HV metrics: proposed algorithm vs. four variants.
Figure 9. IGD metrics: proposed algorithm vs. four variants.
Figure 10. HV metrics: proposed algorithm vs. comparison algorithms.
Figure 11. IGD metrics: proposed algorithm vs. comparison algorithms.
Figure 12. The 3D scatter plots.
Table 1. Sets, parameters, and variables.

Symbol          Description
Sets
N               Factories: N = {1, …, f, …, F}
I               Stages: I = {1, …, i, …, k}
J               Jobs: J = {1, …, j, …, n}
P               Positions: P = {1, …, p, …, n}
M_f             Machines at factory f: M_f = {1, …, m, …, l}
M_{i,f}         Machines at stage i in factory f: M_{i,f} = {1, …, m, …, F × l}
Parameters
p_{j,i}         Processing time of job j at stage i
d_j             Due date of job j
S_{j,i}         Start time of job j at stage i
E_{j,i}         Completion time of job j at stage i
MS_{f,m,p}      Start time of machine m at the pth position in factory f
ME_{f,m,p}      Completion time of machine m at the pth position in factory f
s_{j,j′,i}      Setup time at stage i from job j to job j′
Variables
U_j             Binary variable: 1 if job j is tardy, 0 otherwise
X_{j,f}         Binary variable: 1 if job j is assigned to factory f, 0 otherwise
Y_{j,f,m,p,i}   Binary variable: 1 if job j is processed at the pth position of machine m at stage i in factory f, 0 otherwise
Table 2. HV of the proposed algorithm and its four variants.

Instance   IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
F = 2      1.21      1.13       0.95       1.02       1.11
F = 3      1.19      1.12       0.98       0.99       1.08
F = 4      1.17      1.10       0.94       1.03       1.11
F = 5      1.23      1.15       0.97       1.05       1.13
F = 6      1.22      1.13       0.95       1.04       1.11
n = 50     1.21      1.13       0.96       1.03       1.13
n = 100    1.18      1.10       0.93       1.01       1.10
n = 150    1.22      1.12       0.96       1.04       1.09
n = 200    1.20      1.13       0.97       1.03       1.12
s = 2      1.24      1.13       0.97       1.03       1.12
s = 4      1.21      1.14       0.98       1.05       1.13
s = 6      1.19      1.10       0.92       1.00       1.08
s = 8      1.19      1.12       0.96       1.03       1.11
s = 10     1.18      1.13       0.95       1.02       1.10
Table 3. IGD of the proposed algorithm and its four variants.

Instance   IMOMA-QL  IMOMA-QL1  IMOMA-QL2  IMOMA-QL3  IMOMA-QL4
F = 2      0.105     0.130      0.176      0.167      0.128
F = 3      0.083     0.115      0.191      0.173      0.120
F = 4      0.102     0.134      0.181      0.161      0.120
F = 5      0.098     0.124      0.166      0.151      0.117
F = 6      0.102     0.114      0.177      0.170      0.114
n = 50     0.103     0.132      0.182      0.169      0.125
n = 100    0.085     0.108      0.162      0.147      0.102
n = 150    0.107     0.131      0.181      0.168      0.124
n = 200    0.108     0.122      0.188      0.173      0.128
s = 2      0.120     0.143      0.204      0.188      0.144
s = 4      0.111     0.134      0.189      0.180      0.132
s = 6      0.082     0.108      0.158      0.145      0.103
s = 8      0.082     0.112      0.162      0.143      0.100
s = 10     0.096     0.120      0.180      0.166      0.120
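IGD, reported in the tables above, measures the average distance from each point of the reference Pareto front to its nearest obtained solution, so lower values indicate better convergence and coverage. The sketch below is a minimal illustration with made-up three-objective points, not the paper's data or its exact normalization.

```python
import math

def igd(reference_front, obtained_set):
    """Inverted generational distance: mean over reference points of the
    Euclidean distance to the nearest obtained solution (lower is better)."""
    def dist(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    return sum(min(dist(r, s) for s in obtained_set)
               for r in reference_front) / len(reference_front)

# Toy normalized 3-objective points (makespan, total tardiness, tardy jobs);
# values are illustrative only.
ref = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
obtained = [(0.1, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.1, 0.0)]
print(round(igd(ref, obtained), 3))  # 0.067
```

Two of the three reference points are missed by a distance of 0.1 and one is matched exactly, so the mean distance is 0.2 / 3 ≈ 0.067.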
Table 4. Friedman test rankings across all algorithm variants.

Algorithms   HV Rank  HV p-Value         IGD Rank  IGD p-Value
IMOMA-QL     1        –                  1.00      –
IMOMA-QL1    2.32     3.56476 × 10⁻⁹     2.56      0
IMOMA-QL2    4.93     0                  4.75      0
IMOMA-QL3    4.05     0                  4.18      0
IMOMA-QL4    2.70     2.90878 × 10⁻¹⁴    2.48      5.68936 × 10⁻¹¹
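Friedman rankings like those in Table 4 are typically obtained by ranking the competing algorithms on each problem instance (rank 1 = best) and then averaging those ranks per algorithm. The sketch below illustrates this averaging step only, on toy HV data (higher is better); it is not the paper's data, it omits tie handling, and the names are ours.

```python
# Average Friedman ranks on toy HV values (higher is better).
hv = {  # algorithm -> HV value on each of three instances (made-up numbers)
    "A": [1.21, 1.19, 1.17],
    "B": [1.13, 1.12, 1.10],
    "C": [0.95, 0.98, 0.94],
}

def mean_ranks(scores, higher_is_better=True):
    """Rank algorithms per instance (1 = best), then average per algorithm.
    Ties are not handled here for brevity."""
    algos = list(scores)
    n_inst = len(next(iter(scores.values())))
    totals = {a: 0.0 for a in algos}
    for i in range(n_inst):
        ordered = sorted(algos, key=lambda a: scores[a][i],
                         reverse=higher_is_better)
        for rank, a in enumerate(ordered, start=1):
            totals[a] += rank
    return {a: totals[a] / n_inst for a in algos}

print(mean_ranks(hv))  # {'A': 1.0, 'B': 2.0, 'C': 3.0}
```

For IGD the same routine would be called with `higher_is_better=False`, since lower IGD is better.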
Table 5. HV of the proposed algorithm and the comparison algorithms.

Instance   IMOMA-QL  IMPGA  MQSLFA  MOEA/D  NSGA-II
F = 2      1.25      0.95   0.94    0.64    0.75
F = 3      1.22      0.92   0.92    0.65    0.74
F = 4      1.21      0.91   0.87    0.61    0.73
F = 5      1.23      0.92   0.88    0.60    0.72
F = 6      1.20      0.88   0.90    0.64    0.75
n = 50     1.22      0.93   0.90    0.69    0.73
n = 100    1.19      0.90   0.91    0.61    0.72
n = 150    1.24      0.90   0.91    0.63    0.74
s = 2      1.19      0.93   0.97    0.69    0.78
s = 4      1.22      0.94   0.87    0.62    0.72
s = 6      1.25      0.93   0.91    0.66    0.75
s = 8      1.19      0.93   0.85    0.65    0.75
s = 10     1.21      0.86   0.92    0.62    0.69
Table 6. IGD of the proposed algorithm and comparison algorithms.

Instance   IMOMA-QL  IMPGA  MQSLFA  MOEA/D  NSGA-II
F = 2      0.105     0.157  0.167   0.263   0.222
F = 3      0.086     0.144  0.145   0.247   0.201
F = 4      0.081     0.149  0.157   0.254   0.199
F = 5      0.105     0.161  0.177   0.268   0.224
F = 6      0.092     0.155  0.160   0.247   0.197
n = 50     0.108     0.162  0.169   0.244   0.214
n = 100    0.083     0.144  0.149   0.259   0.200
n = 150    0.099     0.157  0.170   0.256   0.206
n = 200    0.085     0.150  0.157   0.266   0.212
s = 2      0.085     0.147  0.156   0.251   0.199
s = 4      0.095     0.152  0.168   0.265   0.219
s = 6      0.099     0.157  0.159   0.259   0.212
s = 8      0.086     0.149  0.160   0.256   0.200
s = 10     0.103     0.161  0.162   0.249   0.214
Table 7. Friedman test rankings across IMOMA-QL and comparative algorithms.

Algorithms   HV Rank  HV p-Value          IGD Rank  IGD p-Value
IMOMA-QL     1        –                   1         –
IMPGA        2.42     1.60385 × 10⁻¹⁰     2.35      1.56633 × 10⁻⁹
MQSLFA       2.61     6.01519 × 10⁻¹³     2.80      8.88178 × 10⁻¹⁶
MOEA/D       4.86     0                   4.72      0
NSGA-II      4.10     0                   4.13      0
Share and Cite

Shen, Y.; Liu, Y.; Kang, H.; Sun, X.; Chen, Q. An Improved Multi-Objective Memetic Algorithm with Q-Learning for Distributed Hybrid Flow Shop Considering Sequence-Dependent Setup Times. Symmetry 2026, 18, 135. https://doi.org/10.3390/sym18010135