1. Introduction
With the development of globalization, many enterprises set up production bases in different countries and regions. A distributed flow shop can coordinate production activities across geographical areas, balance the load among factories, avoid overloading some factories while leaving resources idle in others, and improve overall production efficiency. The distributed flow shop scheduling problem (DFSP) has been studied extensively, yielding numerous results that address pragmatic constraints such as re-entrant jobs [1], no-idle machines [2], deteriorating jobs [3], sequence-dependent setup times (SDSTs) [4,5,6,7], and energy consciousness [8]. Various evolutionary algorithms have been widely applied to such real-world problems, including the memetic algorithm [9,10,11], the artificial bee colony algorithm [12], the estimation of distribution algorithm (EDA) [13], the discrete fruit fly optimization algorithm [14], hybrid meta-heuristics [15], the variable neighborhood descent algorithm [8], and the spherical evolution algorithm [16].
To address distributed scheduling problems (DSPs), researchers have developed diverse solution approaches. Meta-heuristic algorithms have gained widespread adoption for shop scheduling optimization due to their notable advantages: straightforward implementation, robust performance, rapid convergence characteristics, and seamless compatibility with other algorithmic frameworks.
Memetic algorithms (MAs) have gained significant attention for their effectiveness in tackling various NP-hard optimization problems, particularly single-objective distributed scheduling problems (DSPs). For instance, Wang [17] developed an MA based on EDA to address the DFSP and minimize makespan. To optimize the makespan in a two-stage DFSP, Zhang [18] integrated the social spider optimization method into an MA framework. Wang [19] further explored a cooperative bi-population MA that incorporated collaborative initialization, inter-population cooperation, and an intensified local search for minimizing makespan in distributed hybrid flow shop scheduling problems (DHFSPs). Zhang et al. [20] achieved makespan minimization by leveraging cooperation within an MA.
MAs have also been widely applied to multi-objective DFSPs. Deng and Wang [11] investigated a multi-objective DFSP aimed at minimizing both makespan and total tardiness, and developed a competitive MA employing two populations with distinct operators for each objective. Wang [19] examined an energy-focused variant targeting the reduction of energy consumption and makespan, introducing a collaborative MA guided by reinforcement learning policy agents. Shao [21] proposed a network-based MA to minimize total tardiness, overall production cost, and carbon emissions.
The rapid advancement of AI is fundamentally transforming operations across a multitude of fields. As a typical reinforcement learning algorithm originating from dynamic programming, Q-learning makes the best decision at each step to optimize the overall process. To address uncertainty in assembly job shop scheduling and to enhance scheduling algorithms under various production environments, Q-learning has been integrated into different frameworks. For instance, in assembly job shop scheduling, a dual-loop framework based on Q-learning was proposed to cope with environmental uncertainty through self-learning [22]. In the DFSP, Q-learning has been combined with metaheuristics such as the fruit fly optimization algorithm to enhance neighborhood selection and improve solution quality [23]. For a DFSP variant with consistent sublots, a value-based RL method was coupled with a meta-heuristic to achieve adaptive operator selection [24].
In practical scenarios, decision-makers are often concerned not only with minimizing the makespan but also with tardiness-related objectives, which are particularly important in industries where late deliveries incur significant penalties or disrupt downstream processes. Cai [25] proposed two enhanced shuffled frog leaping algorithms (SFLAs) for solving the DHFSP in a multi-processor setting, aiming to minimize both total tardiness and makespan simultaneously. Later, Li [12] developed a neighborhood-based heuristic to address a two-stage DHFSP with SDST, targeting reductions in total tardiness and makespan. In addition, Lei [26] investigated an SFLA with memeplex partitioning for the DHFSP. To address the dual objectives of makespan and maximum tardiness minimization, Lei [27] crafted a novel multi-class optimization approach based on the teaching–learning paradigm, which enhances search efficiency through inter-class interaction.
Few studies have treated tardiness-related objectives as the main focus in multi-objective optimization. Lei and Zheng [28] tackled the HFSP with assembly operations and minimized total tardiness, maximum tardiness, and makespan, with the tardiness objectives regarded as the key ones. Since the DHFSP is more complex than the standard HFSP, it is of great importance to develop effective algorithms that can simultaneously optimize total tardiness, the number of tardy jobs, and makespan.
In light of the above literature on the DFSP and memetic algorithms, this study addresses the DHFSP with SDST, aiming to optimize makespan, total tardiness, and the number of tardy jobs, with priority given to the tardiness-related objectives. To tackle this problem, a multi-objective memetic algorithm combined with Q-learning (IMOMA-QL) is proposed. The major contributions of this paper are as follows:
Hybrid initialization method—A mixed initialization strategy is proposed to simultaneously optimize total tardiness, the number of tardy jobs, and makespan, generating a high-quality and diverse population.
Multi-factory SB2OX crossover operator—The Similar Block 2-Point Order Crossover (SB2OX) is extended to a multi-factory context, leveraging structural similarity of job sequences to retain high-quality sub-sequences and enhance information exchange between factories.
Problem-specific neighborhood structures—Considering the optimization objectives and problem characteristics, six tailored neighborhood structures are developed to guide the search process toward more promising regions and effectively explore the solution space.
Q-learning-guided variable neighborhood search—A Q-learning strategy is introduced to adaptively choose the most effective neighborhood structure during the search process. The reward is designed based on the change in distance between the new and old solutions to their nearest Pareto front solution, encouraging moves that improve convergence toward the Pareto front while balancing intensification and diversification.
The paper is divided into the following sections.
Section 2 formally describes DHFSP with SDST and presents its mathematical model. It also introduces the foundational framework of the memetic algorithm.
Section 3 elaborates on the details of our proposed IMOMA, including its novel initialization, genetic operators, and the Q-learning-guided variable neighborhood search.
Section 4 provides a comprehensive evaluation of IMOMA, including the experimental setup, sensitivity analysis, and comparisons with four other algorithms. Following this,
Section 5 summarizes the main findings of this study, discusses its limitations, and suggests potential directions for future research.
3. Improved Multi-Objective Memetic Algorithm
3.1. Algorithm Procedure
The proposed improved multi-objective memetic algorithm (IMOMA) is composed of four main components. First, a hybrid initialization procedure combines random generation with problem-specific heuristics to yield a diverse set of well-performing initial solutions. Second, genetic operators perform population-based global exploration of the search space. Third, a Q-learning-guided multi-neighborhood search is applied, in which a reinforcement learning mechanism adaptively selects among multiple neighborhood structures to intensify the search and enhance convergence toward the Pareto front. Fourth, an elitism strategy based on fast non-dominated sorting and crowding distance selects the next generation from the merged parent and offspring populations. The algorithm flowchart is illustrated in
Figure A1 and the pseudocode is shown in Algorithm 1. Each iteration of IMOMA executes these components in turn, and its per-iteration time complexity is a function of the population size N and the number of jobs n; given a maximum of T iterations, the overall time complexity grows linearly in T.
| Algorithm 1 Algorithm of IMOMA |
Input: population size N, maximum number of iterations T
Output: Pareto optimal solution set
1: Initialize population P(0);
2: Set t = 0;
3: while t < T do
4:   for each pair of offspring to be generated do
5:     Select two parents p1, p2 from P(t) by binary tournament selection;
6:     Generate offspring c1, c2 using the crossover operator;
7:     Mutate c1, c2 using the mutation operator;
8:     Merge c1, c2 into Q(t);
9:   end for
10:  Apply VNS guided by Q-learning to Q(t) to generate offspring population Q'(t);
11:  Combine P(t) and Q'(t) into R(t);
12:  Perform non-dominated sorting on R(t);
13:  Select the N best individuals to form P(t + 1); set t = t + 1;
14: end while
15: Output Pareto optimal solutions
|
3.2. Hybrid Initialization
High-quality initialization plays a pivotal role in multi-objective optimization algorithms for DHFSP by significantly influencing convergence speed, solution diversity, and computational efficiency. A well-designed initialization strategy reduces the algorithm’s exploration burden.
To address the multi-objective characteristics of the DHFSP—specifically the simultaneous minimization of makespan, total tardiness, and number of tardy jobs—this research develops a hybrid initialization approach. The proposed methodology combines targeted heuristic generation with stochastic exploration mechanisms, ensuring both high-quality initial solutions and adequate population diversity.
This paper introduces six initialization methods based on the LPT (Longest Processing Time) and EDD (Earliest Due Date) rules, combined with randomization, to enhance both the quality and diversity of the initial population.
Method 1: Generate a job vector in descending order of total processing time (LPT), and then iteratively insert each job, in the order of the vector, into the factory that minimizes the maximum completion time.
Method 2: Generate a job vector in ascending order of due date (EDD), and then iteratively insert each job, in the order of the vector, into the factory that minimizes the total tardiness.
Method 3: Generate a job vector in ascending order of due date (EDD), and then iteratively insert each job, in the order of the vector, into the factory that minimizes the number of tardy jobs.
Method 4: Randomly generate a job vector, and then iteratively insert each job, in the order of the vector, into the factory that minimizes the maximum completion time.
Method 5: Randomly generate a job vector, and then iteratively insert each job, in the order of the vector, into the factory that minimizes the total tardiness.
Method 6: Randomly generate a job vector, and then iteratively insert each job, in the order of the vector, into the factory that minimizes the number of tardy jobs.
Methods 1–3 each generate one solution, while Methods 4–6 generate the rest of the population in the ratio 2:2:1; that is, Methods 4 and 5 each generate 2(N − 3)/5 solutions and Method 6 generates (N − 3)/5 solutions, ensuring a total population size of N. The random seed for the initialization is set to 2020.
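As an illustration, the six rules above can be sketched as follows. This is a simplified, single-stage approximation: `proc` and `due` are toy data, and total factory workload stands in for the full DHFSP objective evaluation, which the paper computes on the complete schedule.

```python
import random

def lpt_order(proc_time):
    # Descending total processing time (LPT rule)
    return sorted(range(len(proc_time)), key=lambda j: -proc_time[j])

def edd_order(due_date):
    # Ascending due date (EDD rule)
    return sorted(range(len(due_date)), key=lambda j: due_date[j])

def greedy_assign(order, proc_time, n_factories):
    # Insert each job into the factory that minimizes the chosen objective;
    # here approximated by the resulting factory workload (a makespan proxy).
    factories = [[] for _ in range(n_factories)]
    loads = [0] * n_factories
    for j in order:
        f = min(range(n_factories), key=lambda k: loads[k] + proc_time[j])
        factories[f].append(j)
        loads[f] += proc_time[j]
    return factories

proc = [5, 9, 3, 7, 4, 8]
due = [10, 25, 8, 30, 12, 20]
sol_lpt = greedy_assign(lpt_order(proc), proc, 2)             # Method 1 analogue
sol_edd = greedy_assign(edd_order(due), proc, 2)              # Method 2/3 analogue
sol_rnd = greedy_assign(random.sample(range(6), 6), proc, 2)  # Methods 4-6 analogue
```

In the full algorithm, the greedy insertion for Methods 2/3 and 5/6 would evaluate total tardiness or the number of tardy jobs instead of workload.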
3.3. Selection
Binary tournament selection is used to choose each parent. The steps are as follows:
Randomly select two individuals: two candidate solutions a and b are chosen at random from the current population.
Select the better individual:
- (1)
If a dominates b, choose a.
- (2)
If b dominates a, choose b.
If neither individual dominates the other, one of the two is chosen at random.
Repeat until the next generation is filled.
The flowchart of the selection process is shown in
Figure 2.
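A minimal sketch of this selection scheme for minimization objectives (the solution encoding is abstracted away; `pop` and `objs` are illustrative placeholders):

```python
import random

def dominates(a, b):
    # a dominates b if it is no worse on every objective and
    # strictly better on at least one (minimization assumed).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def binary_tournament(population, objectives):
    # Pick two candidates at random; keep the dominating one,
    # or break the tie randomly if neither dominates.
    i, j = random.sample(range(len(population)), 2)
    if dominates(objectives[i], objectives[j]):
        return population[i]
    if dominates(objectives[j], objectives[i]):
        return population[j]
    return population[random.choice((i, j))]

pop = ["s0", "s1", "s2"]
objs = [(3, 5, 1), (2, 4, 1), (4, 6, 2)]   # s1 dominates both s0 and s2
parent = binary_tournament(pop, objs)
```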
3.4. Genetic Operator
SB2OX was proposed for flow shop problems with SDST [30], but it was designed only for single-factory flow shop scheduling.
This article proposes a crossover operator based on SB2OX that is adapted to distributed flow shops. Two vectors are used to represent the scheduling sequences of the two parents, where the superscript x denotes the factory and the subscript indicates the position in the scheduling sequence of that factory.
Step 1: Both parents are compared position by position. Identical blocks containing at least two consecutive matching jobs in both parents are transferred directly to the offspring. Notably, the blocks retained from Parent 1 and Parent 2 need not lie in the same factory: if an identical block is processed in factory c1 in Parent 1 and in factory c2 in Parent 2, then Child 1 retains the block at the same positions in factory c1, and Child 2 retains it at the same positions in factory c2.
Step 2: Two cut points are randomly selected in the scheduling sequence of each factory of each parent. For each factory, Child 1 retains all jobs of Parent 1 lying between cut point 1 and cut point 2 at their original positions; Child 2 is generated from Parent 2 by the same principle.
Step 3: Finally, the missing jobs are copied in the relative order in which they appear in the other parent. In this step, parental information is exchanged efficiently and jobs may be reassigned between factories.
Nevertheless, relying exclusively on crossover operations proves inadequate. To enhance population diversity, a random swap mutation (RSM) mechanism is additionally implemented.
RSM: Randomly select two jobs in the sequence and swap their positions.
An illustration of the crossover process is shown in
Figure 3.
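The similar-block and order-preserving fill steps can be sketched on a single sequence as follows; the paper's multi-factory extension would apply the same logic per factory with cross-factory block retention. `sb2ox_child` is an illustrative simplification, not the authors' exact operator.

```python
import random

def similar_blocks(p1, p2):
    # Positions where both parents carry the same job and the match
    # extends over at least two consecutive positions (Step 1).
    same = [i for i in range(len(p1)) if p1[i] == p2[i]]
    keep = set()
    for i in same:
        if i + 1 in same or i - 1 in same:
            keep.add(i)
    return keep

def sb2ox_child(p1, p2):
    # The child inherits the similar blocks and a random two-point slice
    # from p1 (Step 2), then fills the remaining positions with the
    # missing jobs in the relative order of p2 (Step 3).
    n = len(p1)
    child = [None] * n
    for i in similar_blocks(p1, p2):
        child[i] = p1[i]
    c1, c2 = sorted(random.sample(range(n), 2))
    for i in range(c1, c2 + 1):
        child[i] = p1[i]
    fill = [j for j in p2 if j not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

child = sb2ox_child([0, 1, 2, 3, 4, 5], [0, 1, 4, 3, 2, 5])
```

Because positions are either copied from p1 or filled with the jobs missing from the child, the result is always a valid permutation.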
3.5. Problem-Specific Neighborhood Structures
To address the DHFSP, the designed neighborhood structures generate alternative feasible solutions by modifying the current ones. The effectiveness of these structures has a significant influence on both the solution quality and the computational efficiency of the algorithm. Well-designed neighborhoods can guide the search process toward more promising regions. Hence, in order to improve the performance of IMOMA-QL, six tailored neighborhood structures are proposed to produce superior solutions.
NS1: Randomly select a job from the factory with the largest maximum completion time and insert it at another position in the same factory.
NS2: Randomly select a job from the factory with the largest maximum completion time and swap it with another job in the same factory.
NS3: Randomly select a tardy job that does not have the largest tardiness, and exchange it, in turn, with each job that has a greater tardiness.
NS4: Randomly select a tardy job and insert it at an earlier position, considering positions across all factories.
NS5: Randomly select a job in the factory with the largest total tardiness and insert it at another position in the same factory.
NS6: Randomly select a tardy job in the factory with the largest number of tardy jobs and insert it at another position in the same factory.
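As an example, NS1 might be implemented along the following lines; total factory load is used here as a stand-in for the maximum completion time, which the full algorithm obtains from the schedule evaluation.

```python
import random

def ns1_insert(schedule, loads):
    # NS1: pick a random job from the factory with the largest makespan
    # (approximated by its load) and reinsert it at a different position
    # within the same factory.
    f = max(range(len(schedule)), key=lambda k: loads[k])
    seq = schedule[f][:]
    if len(seq) < 2:
        return schedule
    i = random.randrange(len(seq))
    job = seq.pop(i)
    # Choose any insertion position except the one that restores the original.
    j = random.choice([k for k in range(len(seq) + 1) if k != i])
    seq.insert(j, job)
    new_schedule = [s[:] for s in schedule]
    new_schedule[f] = seq
    return new_schedule

sched = [[0, 1, 2], [3, 4, 5, 6]]
neighbor = ns1_insert(sched, loads=[20, 35])
```

The other five structures follow the same pattern, differing only in how the source factory, the job, and the target position are chosen.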
3.6. Variable Neighborhood Search with Q-Learning
Q-learning operates by maintaining and updating a state–action value table (Q-table), which stores estimated cumulative rewards to derive an optimal policy [
31]. As a model-free algorithm, it has been employed to tackle a range of scheduling problems [
23,
32].
Figure 4 illustrates this interaction process. The Q-table is updated according to the following formula:
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ],
where α and γ are the learning rate and discount factor, respectively.
The agent selects actions according to the Q-values stored in the Q-table. This work initializes the Q-table with zeros, as shown in
Figure 5. Actions are selected using an ε-greedy strategy that balances exploiting the highest expected reward with maintaining exploration. A random number p ∈ [0, 1] determines the selection: if p < ε, the action with the maximum Q-value is selected (exploitation); otherwise, a random action is chosen (exploration). The Q-table is iteratively updated through this action–state–reward cycle. An example of a Q-table update is provided in
Appendix A.3.
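A minimal tabular sketch of these selection and update rules follows. The 3 × 6 table shape matches the three states and six neighborhood actions defined in this paper; γ = 0.7 and ε = 0.2 are the calibrated settings reported later, while the learning rate α = 0.1 is an assumed illustrative value, as the paper does not calibrate it here.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.7, 0.2   # learning rate, discount factor, greedy rate

def select_action(q_table, state, n_actions):
    # Paper's convention: if p < epsilon, exploit the best-known action;
    # otherwise explore with a random one.
    if random.random() < EPSILON:
        return max(range(n_actions), key=lambda a: q_table[state][a])
    return random.randrange(n_actions)

def update_q(q_table, s, a, r, s_next):
    # Standard tabular Q-learning update:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q_table[s_next])
    q_table[s][a] += ALPHA * (r + GAMMA * best_next - q_table[s][a])

# 3 states (makespan-best / tardiness-best / rest), 6 neighborhood actions
Q = [[0.0] * 6 for _ in range(3)]
update_q(Q, 0, 2, 1.0, 1)   # reward 1: the move got closer to the front
```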
Q-learning, when integrated with neighborhood search, plays the role of adaptively selecting the most promising neighborhood structures based on feedback from the search performance, rather than relying on a fixed or predetermined local search method. This reinforcement learning mechanism allows the search process to dynamically focus on neighborhoods that are more likely to yield improvements, thereby enhancing both convergence speed and solution quality. In this study, the six neighborhood structures described earlier are defined as the action set in the Q-learning framework, where each action corresponds to applying one specific neighborhood to generate new solutions.
In this paper, the state is determined according to the three objective values of each solution. First, the current population is sorted in ascending order of makespan. The top 30% of solutions with the smallest makespan are assigned to State 1. The remaining solutions are then sorted in ascending order of total tardiness, and the top 30% of these solutions are assigned to State 2. The remaining solutions are assigned to State 3.
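This state partition can be sketched as follows (states are 0-indexed here, and ties and rounding are handled in a simplified way):

```python
def assign_states(objectives):
    # objectives[i] = (makespan, total_tardiness, n_tardy) for solution i.
    # Top 30% by makespan -> state 0; of the rest, top 30% by total
    # tardiness -> state 1; everything else -> state 2.
    n = len(objectives)
    states = [2] * n
    by_mk = sorted(range(n), key=lambda i: objectives[i][0])
    k1 = max(1, int(0.3 * n))
    for i in by_mk[:k1]:
        states[i] = 0
    rest = by_mk[k1:]
    by_tt = sorted(rest, key=lambda i: objectives[i][1])
    k2 = max(1, int(0.3 * len(rest)))
    for i in by_tt[:k2]:
        states[i] = 1
    return states

objs = [(10, 5, 1), (12, 1, 0), (15, 9, 2), (11, 2, 1), (20, 0, 0)]
states = assign_states(objs)
```

Note that a solution with a poor makespan can still reach State 1 through a small total tardiness, which is exactly the behavior the partition is meant to capture.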
This study employs a delayed reward mechanism based on population cooperation to train the Q-learning strategy. In each generation, all individuals conduct local search following the current strategy. Upon completion of the local search for the entire population, non-dominated sorting is applied to obtain an updated Pareto front, and the minimum Euclidean distance from every solution to this front is computed both before and after its local search. This signal is utilized to update the shared Q-learning strategy. Let x be the original solution to be replaced, and y be the candidate solution generated by the local search. Let PF* be the non-dominated solution set obtained from the population after applying the variable neighborhood local search, inserting the new solutions, and removing dominated solutions. The Euclidean distance from a solution z to PF* is defined as
d(z, PF*) = min_{p ∈ PF*} sqrt( Σ_{i=1}^{m} (f_i(z) − f_i(p))² ),
where f_i(z) denotes the objective value of z on the i-th objective and m is the total number of objectives. The reward r is then given by
r = 1 if d(y, PF*) < d(x, PF*), and r = 0 otherwise.
This binary reward provides a positive signal whenever the candidate solution is closer to the updated Pareto front than the original solution, thereby encouraging the selection of neighborhood structures that improve convergence towards the Pareto front.
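The distance and reward computations above can be sketched as:

```python
import math

def dist_to_front(z, front):
    # Minimum Euclidean distance from objective vector z
    # to the non-dominated set `front`.
    return min(math.dist(z, p) for p in front)

def reward(old_obj, new_obj, front):
    # Binary reward: 1 if the local-search result is strictly closer
    # to the updated Pareto front than the solution it replaces.
    return 1 if dist_to_front(new_obj, front) < dist_to_front(old_obj, front) else 0

front = [(10, 0, 0), (8, 3, 1), (12, 1, 0)]   # illustrative non-dominated set
r = reward(old_obj=(14, 4, 2), new_obj=(11, 1, 0), front=front)
```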
The Q-learning-guided local search is illustrated in Algorithm 2.
| Algorithm 2 Variable Neighborhood Search using Q-learning |
Input: current population P, population size N, Q-table, state s_i and action a_i of the i-th individual
Output: offspring population P', updated Q-table
1: Determine the state s_i of each individual by ranking the three objective values of all individuals;
2: for each i in 1..N do
3:   Select an action a_i via the ε-greedy strategy and the Q-table;
4:   Apply action a_i (a neighborhood structure) to the i-th individual x_i to obtain y_i;
5: end for
6: Combine the original and new solutions;
7: Obtain the updated Pareto front PF* by non-dominated sorting;
8: for each i in 1..N do
9:   Compute d(x_i, PF*) and d(y_i, PF*) as the minimum Euclidean distances to PF*;
10:  if d(y_i, PF*) < d(x_i, PF*) then r = 1 else r = 0;
11:  Observe the next state s'_i;
12:  Update the Q-table: Q(s_i, a_i) ← Q(s_i, a_i) + α [ r + γ max_a Q(s'_i, a) − Q(s_i, a_i) ];
13: end for
|
3.7. Non-Dominated Sorting and Elitism Strategy
In each iteration, the offspring population is merged with the parent population, and fast non-dominated sorting is applied to the merged population. Solutions of lower rank are preferred to solutions of higher rank; if two solutions have the same rank, the one with the greater crowding distance is selected. The next-generation population of the given size is selected according to these principles.
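For reference, the ranking and crowding computations can be sketched as follows (a quadratic-time illustration for minimization objectives, rather than the fast non-dominated sorting procedure of NSGA-II):

```python
def non_dominated_sort(objs):
    # Returns the Pareto rank (0 = best front) of each solution.
    n = len(objs)
    dominates = lambda a, b: all(x <= y for x, y in zip(a, b)) and a != b
    ranks = [None] * n
    remaining = set(range(n))
    r = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = r
        remaining -= front
        r += 1
    return ranks

def crowding_distance(objs):
    # Larger distance = less crowded; boundary solutions get infinity.
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi > lo:
            for a, i in enumerate(order[1:-1], 1):
                dist[i] += (objs[order[a + 1]][k] - objs[order[a - 1]][k]) / (hi - lo)
    return dist
```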
4. Experimental Comparison and Analysis
4.1. Experiment Setting
All experiments were conducted in MATLAB R2022b (64-bit) with the Optimization Toolbox and Global Optimization Toolbox enabled. The operating system was Windows 11 (Version 22H2, 64-bit). All computations were performed on a desktop computer equipped with an Intel Core i7-12700K.
This article generates instances following the design described by Sun [33]. Each instance is defined by a combination of (N, F, S), where the number of jobs is N = 50, 100, 150, 200, the number of factories is F = 2, 3, 4, 5, 6, and the number of stages is S = 2, 4, 6, 8, 10. The processing time of each job at each stage and the sequence-dependent setup times are randomly generated within the range [1, 99]. The number of identical parallel machines per stage is randomly generated within the range [1, 5]. The random seed is set to 2025. In total, there are 4 × 5 × 5 = 100 instances. The CPU time limit per instance run is set to 0.08 × F × N × S seconds.
Equations (22) and (23) [11] establish the due date for each job in each instance in the mathematical model provided in this study.
4.2. Experimental Indicators
To evaluate the behavior of IMOMA, two performance metrics were used in the experiments.
The hypervolume (HV) represents the volume of the region in objective space bounded by the non-dominated solution set produced by the algorithm and a reference point; on the normalized objectives, the reference point is set to (1.2, 1.2, 1.2). A higher HV value indicates better overall performance. HV is calculated as
HV(S) = λ( ⋃_{x∈S} [f(x), r] ),
where λ denotes the Lebesgue measure and [f(x), r] is the hyper-rectangle bounded by the objective vector of solution x and the reference point r.
The inverted generational distance (IGD) averages, over all points of the reference Pareto front PF, the minimum Euclidean distance to the obtained non-dominated set S:
IGD(S) = (1/|PF|) Σ_{z∈PF} min_{x∈S} d(x, z),
where d(x, z) is the Euclidean distance and |PF| denotes the number of solutions in the reference front. A lower IGD value indicates better performance.
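A direct sketch of the IGD computation:

```python
import math

def igd(approx, reference_front):
    # Inverted generational distance: average, over the reference front,
    # of the minimum Euclidean distance to the approximation set.
    total = 0.0
    for z in reference_front:
        total += min(math.dist(x, z) for x in approx)
    return total / len(reference_front)

ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
perfect = igd(ref, ref)            # an approximation matching the front scores 0
poor = igd([(1.0, 1.0)], ref)      # a single dominated point scores higher
```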
4.3. Parameter Calibration
This section calibrates the main parameters of the algorithm: the population size N, the discount factor γ, the greedy rate ε, the crossover probability Pc, and the mutation probability Pm. The parameters were calibrated using the design of experiments (DOE) method. The levels of each parameter were N ∈ {60, 80, 100}, γ ∈ {0.7, 0.8, 0.9}, ε ∈ {0.2, 0.4, 0.6}, Pc ∈ {0.7, 0.8, 0.9}, and Pm ∈ {0.7, 0.8, 0.9}. The five key parameters were analyzed using the Taguchi method with an orthogonal array consisting of 18 different parameter combinations.
Figure 6 shows the main effects plot for the parameters of IMOMA; the optimal configuration is identified as N = 80, γ = 0.7, ε = 0.2, Pc = 0.8, and Pm = 0.8.
To further investigate the influence of key parameters on the performance of the proposed IMOMA algorithm, an extended sensitivity analysis was conducted. This analysis adopts a one-factor-at-a-time (OFAT) approach to observe the individual effect of each parameter clearly. During the test of each target parameter, the remaining parameters were fixed at their empirically determined optimal baseline values. The five target parameters were tested across three levels each. The analysis was performed on three representative instances of different scales selected from the benchmark set: a small-scale instance (F = 2, n = 50, s = 2), a medium-scale instance (F = 4, n = 150, s = 4), and a large-scale instance (F = 6, n = 150, s = 6).
The experimental results are shown in
Figure 7. Based on the extended sensitivity analysis, the mutation probability Pm exhibits the most pronounced influence on algorithm performance: increased values consistently lead to degradation across all tested instances, validating the baseline setting of Pm = 0.8. The population size N shows moderate sensitivity, with its optimal value shifting slightly with the problem scale, though the baseline N = 80 remains robust. The Q-learning discount factor γ demonstrates a moderate level of sensitivity, performing optimally near its baseline of γ = 0.7. In contrast, the crossover probability Pc and the Q-learning exploration rate ε exhibit relatively low sensitivity within the tested ranges, confirming the stability of their baseline settings (Pc = 0.8, ε = 0.2). The overall low-to-moderate sensitivity of the Q-learning hyperparameters γ and ε indicates that the proposed Q-learning-guided local search module is robust, as its performance does not critically depend on their precise tuning. Overall, the results justify the selected parameter configuration as a reliable default, while indicating that fine-tuning efforts in future applications should primarily focus on Pm and N, particularly when addressing problems of substantially different scales.
4.4. Effectiveness of Each Improvement Component of IMOMA-QL
To examine the effectiveness of each component, IMOMA-QL was compared with four variant versions in which a specific component was removed. These variants are IMOMA-QL without the hybrid initialization strategy (denoted as IMOMA-QL1), IMOMA-QL without the genetic operators (denoted as IMOMA-QL2), IMOMA-QL without the multi-neighborhood search (denoted as IMOMA-QL3), and IMOMA-QL without the Q-learning selection mechanism (denoted as IMOMA-QL4). Specifically, IMOMA-QL1 adopts random population initialization instead of the hybrid method; IMOMA-QL2 and IMOMA-QL3, respectively, remove the genetic operators and the multi-neighborhood search while retaining the rest of the algorithm; and IMOMA-QL4 replaces the Q-learning mechanism with a random local search. To ensure a fair comparison, all algorithmic parameters for these variants remained consistent with those of the original IMOMA-QL. All algorithms were executed independently 10 times on the test instances. The reference Pareto front was constructed from the combined non-dominated solutions of all compared algorithms.
The experimental results systematically evaluate performance across different problem scales by grouping the 100 instances according to the number of factories F, jobs n, and stages s. The average HV and IGD values are presented in
Table 2 and Table 3. The complete experimental data for each instance are presented in Appendix A, Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9 and Table A10. IMOMA-QL consistently achieved the best performance across all instances.
Figure 8 and
Figure 9 display interval plots with 95% confidence intervals for the HV and IGD metrics across all instances, comparing IMOMA-QL with its four variants. The results demonstrate that each component of IMOMA-QL contributes to performance improvement to varying degrees.
The Friedman test rankings with 95% confidence intervals are presented in
Table 4. IMOMA-QL achieved first-place rankings against all four variants with statistically significant improvements (all
p-values < 0.05), demonstrating that each proposed enhancement contributes substantially to its superior performance. The results indicate that removing any of these components leads to a clear performance drop, demonstrating the necessity of each component.
4.5. Comparison of IMOMA-QL and Other Algorithms
We select four comparative algorithms: IMPGA [34], MQSFLA [25], MOEA/D [35], and NSGA-II [36]. The first two are recent algorithms specifically designed for the DHFSP, the same problem domain studied in this work. IMPGA minimizes both makespan and total tardiness—two of the three objectives optimized in our paper—and is composed of multiple populations that co-evolve in sub-regions, a greedy inter-factory job insertion neighborhood structure for local search, and a probability-sampling-based re-initialization procedure. MQSFLA also targets the minimization of makespan and total tardiness, sharing a highly similar objective set with our work. It incorporates a memeplex quality measurement mechanism, a search process guided by solution quality, and a novel memeplex shuffling that dynamically selects memeplexes based on evolution quality. The latter two, NSGA-II and MOEA/D, are classic multi-objective optimizers widely used in scheduling problems. NSGA-II employs non-dominated sorting and crowding distance to preserve solution diversity; in contrast, MOEA/D converts the multi-objective problem into a set of scalar subproblems. For IMPGA and MQSFLA, we directly adopt the parameter settings reported in their respective original papers. For NSGA-II and MOEA/D, we set the population size to 50, the crossover probability pc to 0.7, and the mutation probability pm to 0.2. Together, these algorithms provide diverse and strong benchmarks for evaluating our proposed method.
Figure 10 and
Figure 11 display interval plots with 95% confidence intervals for HV and IGD metrics across all instances, comparing IMOMA-QL with its comparative algorithms. As evidenced by the figure, IMOMA-QL outperforms all other algorithms, yielding superior results in both HV (higher) and IGD (lower) metrics. To validate the superior performance of the proposed algorithm, the Friedman test rankings with 95% confidence intervals are presented in
Table 7. It can be seen that IMOMA-QL outperforms the other comparison algorithms.
Figure 12 displays 3D scatter plots of the non-dominated sets obtained by the five methods on one representative instance. The non-dominated solution sets obtained by the respective algorithms form distinct layers, and the solutions obtained by IMOMA-QL lie closest to the point at which all objective values are lowest. The non-dominated set obtained by IMOMA-QL is therefore clearly better than those obtained by the other algorithms.
The superior performance of IMOMA-QL can be attributed to four key algorithmic innovations. The hybrid initialization strategy improves the diversity and quality of the initial population, providing a better starting point for the search. The genetic operators enhance global exploration and the recombination of high-quality solutions. The multi-neighborhood search mechanism diversifies local search patterns, improving the chance of escaping local optima. Finally, the Q-learning-guided variable neighborhood search adaptively selects promising neighborhoods based on search feedback, further enhancing search efficiency.
5. Conclusions
In the field of multi-objective optimization for tardiness-related scheduling problems, most existing studies focus on optimizing makespan along with only one tardiness-related objective. This study addresses the DHFSP with SDST, optimizing three critical objectives: makespan, total tardiness, and the number of tardy jobs. This work emphasizes tardiness-related objectives, which are crucial in real-world manufacturing scenarios where meeting due dates is essential—such as just-in-time production, order-driven manufacturing, and supply chain scheduling with strict delivery commitments. To solve this problem, we propose a multi-objective memetic algorithm enhanced with Q-learning-guided variable neighborhood search (VNS). Extensive numerical experiments and comparisons with four representative algorithms demonstrate that the proposed method significantly improves solution quality, convergence speed, and robustness in handling the multi-objective DHFSP.
Despite the promising results, this study has certain limitations. The proposed model and algorithm operate under a set of standardized assumptions, such as deterministic processing times and static job availability. Consequently, they cannot be directly applied to real-world scheduling environments that violate these assumptions.
Future research can extend this work in several meaningful directions. A promising avenue is to investigate more comprehensive and environmentally conscious multi-objective formulations. For example, meaningful problem variations to explore could include minimizing makespan, total tardiness, and total energy consumption, or alternatively, minimizing makespan, maximum tardiness, and total energy consumption. Addressing such integrated problems would require developing new models that capture the energy dynamics of machines and designing efficient algorithms capable of balancing productivity and energy consumption.