Article

An Improved NSGA-II for Three-Stage Distributed Heterogeneous Hybrid Flowshop Scheduling with Flexible Assembly and Discrete Transportation

by Zhiyuan Shi 1, Haojie Chen 2,*, Fuqian Yan 1, Xutao Deng 1, Haiqiang Hao 3, Jialei Zhang 1 and Qingwen Yin 1

1 Dongfang Electric Academy of Science and Technology Co., Ltd., Chengdu 610063, China
2 School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
3 School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430070, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1306; https://doi.org/10.3390/sym17081306
Submission received: 7 July 2025 / Revised: 31 July 2025 / Accepted: 4 August 2025 / Published: 12 August 2025
(This article belongs to the Section Engineering and Materials)

Abstract

This study tackles scheduling challenges in multi-product assembly within distributed manufacturing, where components are produced simultaneously at dedicated factories (single capacity per site) and assembled centrally upon completion. To minimize the makespan and the maximum tardiness, we design a symmetry-exploiting enhanced Non-dominated Sorting Genetic Algorithm II (NSGA-II) integrated with Q-learning. Our approach systematically explores the solution space using dual symmetric variable neighborhood search (VNS) strategies and two novel crossover operators that enhance solution-space symmetry and genetic diversity. An ε-greedy policy leveraging maximum Q-values guides the symmetry-aware search toward optimality while enabling strategic exploration. We validate an MILP model implemented in Gurobi and benchmark the symmetry-refined algorithm against six heuristics. Experiments at multiple scales confirm its superiority, with Friedman tests demonstrating statistically significant gains over the benchmarks, providing actionable insights for efficient distributed manufacturing scheduling.

1. Introduction

In the context of global economic competition, the agility to navigate dynamic market demands while simultaneously optimizing cost structures is paramount for the design of efficacious production systems [1]. The assembly flowshop model represents a complex combinatorial production system where components are independently manufactured on parallel production lines before converging for final assembly [2]. These systems demonstrate versatility by synthesizing diverse goods through strategic integration of parts and subassemblies. From a scheduling perspective, they align with the assembly scheduling problem (ASP) framework [3].
In the contemporary economic milieu, characterized by decentralization and globalization, the ascendant trend of product customization and the advent of intelligent manufacturing have catalyzed the proliferation of assembly production methodologies [4]. The ubiquity of multinational corporations endowed with a plethora of production centers further attests to the preeminent and irreplaceable role that distributed and intelligent production assumes within the modern manufacturing sector. The scheduling methods related to these systems are widely implemented in supply chain management and the larger manufacturing sector [5].
To maintain a competitive stance within the flux of an ever-transforming marketplace, managerial acumen is requisite for the expeditious articulation of strategic decisions pertaining to the distribution of labor across facilities and the efficacious scheduling within each plant. Thus, the distributed scheduling problem has garnered significant scholarly and research interest, emerging as a salient topic of investigation among academicians and industry practitioners.
The Distributed Assembly Permutation Flowshop Scheduling Problem (DAPFSP) is an advanced variant and broader concept derived from the traditional ASP, gaining substantial importance in both industry and academic research [6]. The existing literature describes the DAPFSP as consisting of two separate operational stages: production and assembly. The production stage is similar to a distributed permutation flowshop scheduling situation, where a fixed set of n jobs is assigned for processing at f different manufacturing facilities, each having m machines set up in a flowshop layout. Each job is directly associated with a specific product entity. In the production stage, the sequential processing steps that are essential to each product cannot be interrupted or interleaved with processes related to other products. Subsequently, the assembly phase mirrors the Assembly Flowshop Scheduling Problem (AFSP), wherein s distinct products are subjected to assembly within a dedicated assembly facility housing a solitary assembly apparatus, contingent upon the completion of all processes initiated during the production phase [7].
Nevertheless, the interstitial transportation phase, situated between the production and assembly stages, is frequently either overlooked or oversimplified within the DAPFSP framework. Such an approach may prove inadequate within process-intensive industries, exemplified by the pharmaceutical sector, where job transfers are often enmeshed within a labyrinthine procedural tapestry. Moreover, the conventional DAPFSP posits that the constituent factories are homogenous in nature, with each operational phase confined to execution by a single machine [8,9]. These assumptions, however, often fail to resonate with the variegated realities of the real-world manufacturing landscape. Consequently, we introduce a novel paradigm: a three-stage Distributed Heterogeneous Hybrid Flowshop Scheduling with Transportation and Flexible Assembly (DHFSTFA). In this framework, jobs that are finished in the production phase can be easily moved to the assembly area. The motivation for DHFSTFA arises from the manufacturing process of wind turbines, which consists of large-scale tasks. Each task can be executed in any of the separate factories and needs to be brought to the same assembly station as other tasks associated with the same product.
Makespan has emerged as a pivotal and salient objective within the contemporary manufacturing milieu, particularly in scenarios where the expedited completion of a batch of tasks is imperative. This prioritization facilitates the minimization of workflow latency and the enhancement of resource allocation efficiency. Concurrently, minimizing the maximum tardiness, defined as the largest lateness of any job relative to its designated due date, assumes significant relevance. This parameter is instrumental in ensuring the punctual fulfillment of the most critical tasks, thus circumventing the potential for substantial penalties or operational disruptions within the manufacturing sequence.
The DAPFSP, an advanced derivative of the conventional permutation flowshop scheduling paradigm, is recognized as an NP-hard problem when m ≥ 2 [10]. In a parallel vein, the DHFSTFA is conceptualized as a sophisticated extension of the DAPFSP, introducing additional layers of complexity. Consequently, the DHFSTFA, when formulated with objectives centered on minimizing the makespan and the maximum tardiness, is likewise classified within the strongly NP-hard category.
Q-learning is a model-free reinforcement learning algorithm that is commonly utilized to address a wide range of issues, including scheduling challenges. In scheduling applications, Q-learning can be employed to determine an optimal strategy for task allocation and arrangement, resulting in enhanced efficiency and a shorter makespan. The algorithm modifies the action-value function (Q-function) by factoring in the reward received and the predicted future rewards associated with executing a specific action in a particular state. This iterative approach continues until the Q-values stabilize at their optimal levels, enabling the scheduler to make choices that enhance the overall performance criteria, such as reducing total flow time or delays. This paper presents an enhanced NSGA-II that incorporates the Q-learning algorithm to tackle the DHFSTFA with the objectives of minimizing the makespan and the maximum tardiness. The primary contributions of this paper are as follows:
  • A mathematical model has been created for the three-stage distributed assembly problem.
  • An enhanced version of the Nondominated Sorting Genetic Algorithm II, incorporating Q-learning (termed QNSGA), has been introduced to reduce both the makespan and the maximum tardiness. In QNSGA, Q-learning dynamically selects the optimal search strategy to improve the solution set, based on 12 states that evaluate population quality and 8 actions that combine search operators with acceptance rules.
  • Heuristics have been designed to produce initial solutions.
  • Extensive experiments have been carried out to assess the performance of QNSGA in comparison to other methods found in the literature. The computational results show that the introduction of new strategies, including the Q-learning algorithm, is both effective and efficient, and QNSGA yields promising results for the analyzed three-stage distributed assembly problem.
The rest of the paper is structured as follows: Section 2 reviews related work, Section 3 formulates the problem and its mathematical model, Section 4 presents the constructive heuristics and the proposed QNSGA, Section 5 reports the numerical experiments and an industrial case study, and the final section summarizes the conclusions and provides some topics for future research.

2. Literature Review

2.1. ASP and DAPFSP

Both ASP and DAPFSP are fundamental concepts in operations research, essential for enhancing production processes across different sectors. These problems involve scheduling jobs in a two-stage process: production and assembly, with the objective of optimizing various criteria such as makespan, total flowtime, and tardiness. Over the years, researchers have proposed a myriad of algorithms and methodologies to tackle the complexities of ASP and DAPFSP.
A detailed examination of the existing literature on deterministic assembly scheduling problems shows that a range of models and solutions have been put forward to tackle the ASP and its various forms. These include flowshop, jobshop, and permutation flowshop scheduling models, each characterized by distinct constraints and goals. The intricacy of these issues has prompted the creation of heuristic and metaheuristic methods, including genetic algorithms, simulated annealing, and particle swarm optimization, in order to discover near-optimal solutions.
Initially, the focus was on developing an understanding of the basic ASP, where the primary goal was to minimize makespan or total completion time [11]. The concept of the Distributed Assembly Permutation Flowshop Scheduling Problem (DAPFSP) was introduced, expanding the traditional ASP to account for distributed production environments. A Mixed Integer Linear Programming (MILP) model and several constructive algorithms were proposed, setting the stage for further advancements in the field. Recent advances have focused on multi-objective optimization, where the DAPFSP is considered with multiple criteria such as total flow time and total tardiness [12,13]. These multi-objective problems (MOP) require algorithms that can effectively balance the trade-offs between different objectives. A two-phase evolutionary algorithm (TEA) has been proposed to tackle the multi-objective DAPFSP (MO-DAPFSP), demonstrating improved performance in terms of solution quality and diversity [14].
The DAPFSP has also been extended to consider stochastic processing times, leading to the Stochastic DAPFSP (SDAPFSP) [15]. This version of the problem introduces uncertainty in job processing times, making scheduling more challenging. A biased-randomized Simheuristic algorithm has been proposed to address the SDAPFSP, integrating biased randomization with simulation techniques to effectively manage the uncertainty and optimize the expected makespan [16]. Researchers [3] have delved into more complex scenarios, such as ASP with nested operations, proposing heuristics and improved genetic algorithms to address the bi-criteria of makespan and average passage time. Ref. [17] addressed the combined optimization of production planning and scheduling in unpredictable re-entrance settings, presenting a different iterative method based on an Improved Genetic Algorithm (AI-IGA), which showcases the advancement of ASP towards more dynamic and uncertain circumstances.
The development of heuristic and metaheuristic approaches has been pivotal in addressing the computational challenges associated with ASP and DAPFSP. Study [18] presented an in-depth study of the AFSP with a focus on makespan minimization, introducing a dominance relation, and proposing effective heuristics and algorithms for the problem, with a comprehensive computational analysis comparing their performance. Study [19] proposed improved heuristics for the two-stage multi-machine assembly scheduling problem, demonstrating that these heuristics outperform existing methods. This research signifies the continuous improvement and refinement of solution approaches in ASP.
In parallel, the use of ASP and DAPFSP in practical applications has garnered significant attention. For example, researchers have investigated the role of ASP in supply chain management and its effects on supply chain efficiency. Notably, work by [9] introduced new distributed assembly permutation flowshop scheduling challenges that include flexible assembly and batch delivery aimed at reducing delivery and lateness costs. Meanwhile, ref. [20] created an immune algorithm to address the distributed ASP issues in the pharmaceutical sector, which resulted in effective job distribution and batch transport while minimizing completion times and the number of late products. Their findings indicated that the algorithm demonstrated better stability and reliability than six alternative strategies.
In conclusion, the ASP and DAPFSP are critical research areas in the scheduling literature, with applications in various sectors such as electronics, automotive, and aerospace. The development of effective solution algorithms for these problems is essential for improving manufacturing efficiency and reducing production costs. Future research directions may include the exploration of more realistic problem settings, the incorporation of machine learning techniques, and the development of hybrid algorithms that can handle large-scale and dynamic scheduling environments.

2.2. Reinforcement Learning and NSGA-II for ASP

Assembly scheduling is inherently complex, involving the orchestration of multiple tasks, resources, and constraints to achieve production goals. Traditional methods, such as heuristics and meta-heuristics, have served well in static environments. However, with the arrival of Industry 4.0 and the movement towards mass customization, there is growing research in dynamic, flexible jobshop scheduling (FJS) that can adapt to uncertainties and changes in real-time [21].
Reinforcement Learning (RL), with its ability to learn optimal policies through trial and error, presents a compelling solution to ASPs. By leveraging the agent-environment interaction framework, RL algorithms can adapt to the stochastic nature of assembly lines [22], optimizing scheduling decisions based on accumulated rewards. The NSGA-II has been widely recognized for its ability to handle multi-objective optimization problems effectively [23]. Its strength lies in its elitism, ensuring that only the best solutions are passed on to the next generation, and in its mechanism for preserving diversity, which keeps a varied set of solutions within the population.
The foray of RL into assembly scheduling begins with simpler scenarios, where the goal is to minimize makespan or maximize throughput. Early applications of RL focused on single-objective optimization, such as minimizing the completion time of assembly tasks. Q-learning algorithms were pioneers in this domain, providing a foundation for more complex scheduling strategies. For example, ref. [24] presented a Multi-Agent RL system designed for dynamic FJS in a robot assembly cell, utilizing a Double DQN-based algorithm. This method showed enhanced performance compared to rule-based heuristic techniques, highlighting the capabilities of RL in managing decentralized decision-making processes in assembly settings.
As assembly scheduling problems grow in complexity, so does the need for multi-objective optimization. RL’s evolution has seen the incorporation of algorithms such as NSGA-II, which is renowned for its efficiency in solving multi-objective problems by maintaining non-dominated solutions. Ref. [25] proposed a novel deep RL approach to solve the panel block AFSP, an important step in shipbuilding. This study highlighted the effectiveness of RL in improving computational efficiency and model performance, crucial for large-scale assembly operations. The application of NSGA-II in assembly scheduling is vast and varied. Ref. [26] used NSGA-II coupled with an iterated greedy strategy for a bi-objective three-stage assembly flowshop scheduling problem, demonstrating the efficiency of the hybrid approach. Similarly, ref. [26] applied a customized NSGA-III for a two-stage AFSP with release time, showing competitive solutions for the multi-objective case.
In many studies, researchers have opted for hybrid approaches that integrate NSGA-II with other heuristics or metaheuristics to enhance solution quality. For instance, ref. [27] proposed a simulation-based optimization algorithm combining the ARENA software with FLC-NSGA-II, which uses a fuzzy logic controller to adjust crossover and mutation probabilities. Ref. [28] tackled a three-stage assembly flowshop scheduling problem (with m machines at stage one, a transportation machine at stage two, and an assembly machine at stage three) aiming to minimize total flowtime and total tardiness; the authors proposed an NSGA-II with an IG strategy, compared it with the standard NSGA-II and GRASP, and demonstrated the efficiency of the hybrid NSGA-II approach. Ref. [29] presented a comprehensive review and bibliometric analysis of NSGA-II adaptations in scheduling problems, including a detailed examination of its application in various domains. Their study underscores the wide-ranging impact of NSGA-II and identifies key gaps and opportunities for future research. The integration of energy efficiency considerations into assembly scheduling has gained traction, with NSGA-II playing a pivotal role. Ref. [30] developed an improved NSGA-II for an energy-efficient distributed assembly blocking flowshop, emphasizing the minimization of both maximum completion time and total energy consumption.
The collaboration between reinforcement learning (RL) and traditional metaheuristic approaches has created new opportunities in research on assembly scheduling problems (ASPs). By combining RL with algorithms such as shuffled frog-leaping and artificial bee colony, researchers seek to leverage the exploration features of metaheuristics while also taking advantage of the exploitation strengths found in RL. One study [31] introduced an innovative shuffled frog-leaping algorithm integrated with Q-learning for distributed assembly hybrid flowshop scheduling, showcasing the effectiveness of merging RL with meta-heuristic strategies to tackle complex scheduling challenges. In the context of advanced manufacturing, digital twin technology has been combined with the NSGA-II for adaptive scheduling. Another research effort [32] put forward a method that employs digital twin technology for real-time observation and rescheduling within intricate product assembly environments, alongside an enhanced multi-objective evolutionary algorithm based on NSGA-II.
RL and the NSGA-II both show great potential for addressing ASPs. RL has the ability to learn the best policies by interacting with its environment, which can result in dynamic and adaptive scheduling solutions. On the other hand, NSGA-II excels at managing multi-objective optimization, making it a strong option for intricate ASP situations. Combining the online learning features of RL with the evolutionary techniques of NSGA-II could lead to more effective and intelligent scheduling approaches that adapt quickly to real-time changes and complex objectives.

3. Problem Formulation

The problem under consideration, as depicted in Figure 1, can be delineated across three distinct stages. It involves the processing of n jobs across f disparate factories, aimed at manufacturing t products, utilizing w assembly machines. Each job encompasses m sequential operations within a factory. The processing capacities of the machines exhibit heterogeneity, denoted by $K_{il}$, the number of parallel machines available for the ith operation in the lth factory. The parallel machines allocated to the first operation in the first factory are indexed from 1 to $K_{11}$; for the second operation, the machines are numbered from $K_{11}+1$ to $K_{11}+K_{21}$, and this pattern continues sequentially. The last machine dedicated to the ith operation in the gth factory is thus indexed as $M_{gi} = \sum_{l=1}^{g-1}\sum_{s=1}^{m} K_{sl} + \sum_{s=1}^{i} K_{sg}$. Let $p_{jk}$ denote the processing time of job j on machine $M_k$, where $k \in \{1, \ldots, M_{fm}\}$. It is important to note that there are no buffers between consecutive stages within a factory; consequently, if operation $O_{ji}$ is completed while all machines of the subsequent operation are occupied, job $J_j$ remains blocked on its current machine until a machine in the next stage becomes available. Unlike the batched transportation considered in [20], each job is transported to the target assembly station individually. The transportation time of job $J_j$ from factory $F_g$ to assembly station $A_a$ is denoted as $r_{gaj}$, and the buffer at the assembly stage is assumed to be sufficiently large. The assembly time and the due date of product $P_h$ are $q_{ha}$ and $d_h$, respectively. The overarching objective is to minimize the makespan and the maximum tardiness across all products, taking into account the inherent heterogeneity of the factories and machines involved.
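To make the machine-indexing convention concrete, the following minimal Python sketch (Python is also the implementation language used in Section 5) computes the cumulative index $M_{gi}$ from an assumed matrix K[g][i] of parallel-machine counts; the data and 0-based indexing are illustrative only.

def last_machine_index(K, g, i):
    # K[g][i]: number of parallel machines for operation i in factory g (0-based here).
    # Returns the 1-based index of the last machine of operation i in factory g,
    # i.e., all machines of the preceding factories plus operations 0..i of factory g.
    idx = sum(sum(K[gg]) for gg in range(g))
    idx += sum(K[g][:i + 1])
    return idx

K = [[2, 1, 3], [1, 2, 2]]                 # f = 2 factories, m = 3 operations (assumed data)
assert last_machine_index(K, 0, 0) == 2    # machines 1-2 serve operation 1 of factory 1
assert last_machine_index(K, 1, 2) == 11   # M_fm: the last machine overall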

3.1. Notations

$M_{gi}$ (g = 1, …, f): the index of the last machine for the ith operation in the gth factory, with the conventions $M_{0i} = 0$ and $M_{g,0} = M_{g-1,m}$;
$K = \{K_1, \ldots, K_{M_{fm}}\}$: the set encompassing all machines;
$S_{ji}$: the start time of the ith operation $O_{ji}$ of job j;
$B_h$: the start time of the assembly process for the hth product;
$y_{jh}$: binary variable, equal to 1 if job j belongs to product $P_h$, and 0 otherwise;
$X_{jk}$: binary variable, equal to 1 if job j is processed on machine k;
$\gamma_{jj'k}$: binary variable, equal to 1 if job j precedes job j' on machine k;
$Y_{ha}$: binary variable, equal to 1 if product h is processed on assembly machine a;
$\beta_{hh'a}$: binary variable, equal to 1 if product h precedes product h' on assembly machine a;
$U$: a large positive value, utilized as a big-M constant in the mathematical model to enforce certain logical constraints.

3.2. Mathematical Model

Equation (1) is the objective function, aiming to minimize the maximum completion time (makespan) and the maximum tardiness.
$\min \; C_{max} = \max_h (C_h), \quad T_{max} = \max_h (T_h)$
Equation (2) forces each factory to have at least one job, and it is sufficient to determine that at least one job passes through the first processing stage of each factory.
$\sum_{j=1}^{n} \sum_{k=M_{g-1,m}+1}^{M_{g1}} X_{jk} \ge 1, \quad \forall g$
Equations (3) and (4) indicate that each job needs to undergo m operations, and each operation can only be performed by one machine.
$\sum_{k=1}^{M_{fm}} X_{jk} = m, \quad \forall j$
$\sum_{g=1}^{f} \sum_{k=M_{g,i-1}+1}^{M_{gi}} X_{jk} = 1, \quad \forall j, i$
Equations (5) and (6) ensure that each job is assigned to only one factory, which means that all operations for each job are performed within that same factory. Furthermore, all operations, starting from the second one, are processed in the same factory as the first operation.
$\sum_{k=M_{g-1,m}+1}^{M_{gm}} X_{jk} = m \sum_{k=M_{g-1,m}+1}^{M_{g1}} X_{jk}, \quad \forall j, g$
$\sum_{k=M_{g,i-1}+1}^{M_{gi}} X_{jk} = \sum_{k=M_{g-1,m}+1}^{M_{g1}} X_{jk}, \quad \forall j, g, \; i = 2, \ldots, m$
Equation (7) demonstrates that each machine can process only one job every time.
$\gamma_{jj'k} + \gamma_{j'jk} = X_{jk} X_{j'k}, \quad \forall k, \; j \ne j'$
Equation (8) shows that the processing sequence of two jobs within a factory remains the same through all operations.
$\sum_{k=M_{g,i-1}+1}^{M_{gi}} \gamma_{jj'k} = \sum_{k=M_{g,i'-1}+1}^{M_{gi'}} \gamma_{jj'k}, \quad \forall g, \; j \ne j', \; i > 1, \; i' = 1, \ldots, i-1$
Equations (9) and (10) imply that once the processing of a job begins, it cannot be halted until completion.
$S_{j1} \ge 0, \quad \forall j$
$S_{j,i+1} \ge S_{ji} + X_{jk} p_{jk}, \quad \forall j, \; i \le m-1, \; k = M_{g,i-1}+1, \ldots, M_{gi}$
Equations (11) and (12) represent the constraints of job sequencing and blocking, stating that a job can leave a machine only when its next stage is released.
$S_{j'i} \ge S_{ji} + p_{jk} + (\gamma_{jj'k} + X_{jk} + X_{j'k} - 3) U, \quad \forall j \ne j', \; k = M_{g,i-1}+1, \ldots, M_{gi}$
$S_{j'i} \ge S_{j,i+1} + (\gamma_{jj'k} + X_{jk} + X_{j'k} - 3) U, \quad \forall j \ne j', \; k = M_{g,i-1}+1, \ldots, M_{gi}$
Equations (13) and (14) indicate that during the assembly stage, each product can only be completed by one assembly machine, and each assembly machine must process at least one product.
$\sum_{a=1}^{w} Y_{ha} = 1, \quad \forall h$
$\sum_{h=1}^{t} Y_{ha} \ge 1, \quad \forall a$
Equation (15) specifies that an assembly machine can assemble only one product each time.
$\beta_{hh'a} + \beta_{h'ha} = Y_{ha} Y_{h'a}, \quad \forall a, \; h \ne h'$
Equations (16) and (17) mean that a product cannot begin to assemble until all completed jobs have been transported to the assembly station. In addition, the assembly machine must be available before a product can be assembled.
$B_h \ge S_{jm} + p_{jk} + r_{gaj} + (X_{jk} + y_{jh} - 2) U, \quad \forall a, h, j; \; k = M_{g,m-1}+1, \ldots, M_{gm}$
$B_{h'} \ge B_h + q_{ha} + (\beta_{hh'a} + Y_{ha} + Y_{h'a} - 3) U, \quad \forall a, \; h \ne h'$
Equations (18) and (19) respectively represent the completion time and tardiness for each product.
$C_h \ge B_h + Y_{ha} q_{ha}, \quad \forall a, h$
$T_h \ge C_h - d_h, \quad \forall h$
The mathematical model has been verified and validated with the Gurobi solver. To illustrate the problem with a simple case, consider the following parameters: t = 10, n = 16, f = 2, and w = 2, with each operation carried out on 1 to 2 parallel machines. Figure 2 shows the Gantt charts for two different schedules: in Figure 2a the schedule is generated without blocking constraints, while in Figure 2b the same schedule is executed under the blocking constraints. It is evident that the schedule with the blocking constraint results in a larger time span and longer system idle time. Therefore, the optimal solution for a hybrid flowshop may no longer be optimal when the blocking constraint is taken into account.
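For readers who wish to reproduce the Gurobi validation, the hedged gurobipy sketch below shows how one big-M constraint family of the model, Eq. (11), could be encoded; all dimensions, dummy processing times, and variable names are illustrative placeholders rather than the authors' full implementation.

import gurobipy as gp
from gurobipy import GRB

n_jobs, n_ops, n_machines, U = 4, 2, 6, 10**5                                # toy dimensions
p = {(j, k): 10 + j + k for j in range(n_jobs) for k in range(n_machines)}   # dummy times

mdl = gp.Model("dhfstfa_sketch")
S = mdl.addVars(n_jobs, n_ops, lb=0.0, name="S")                             # start times S_ji
X = mdl.addVars(n_jobs, n_machines, vtype=GRB.BINARY, name="X")              # assignment X_jk
gam = mdl.addVars(n_jobs, n_jobs, n_machines, vtype=GRB.BINARY, name="gamma")  # precedence

i, machines_of_op_i = 0, range(0, 3)   # assume machines 0..2 serve operation i = 0
for k in machines_of_op_i:
    for j in range(n_jobs):
        for jp in range(n_jobs):
            if j != jp:
                # Eq. (11): active only when both jobs use machine k and j precedes jp;
                # otherwise the big-M term relaxes the inequality.
                mdl.addConstr(S[jp, i] >= S[j, i] + p[j, k]
                              + (gam[j, jp, k] + X[j, k] + X[jp, k] - 3) * U)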

4. Symmetry Analysis of Q-Learning Reinforced NSGA-II

4.1. Symmetric Solution Representation

The solution space is partitioned into four symmetric components: the job sequence $S_J$, the job assignment $M_J$, the assembly sequence $S_P$, and the assembly assignment $M_P$. This decomposition establishes stage-wise symmetry between processing and assembly operations, enabling parallel optimization of manufacturing stages and consistent constraint handling across stages.
The approach prioritizes minimizing processing time, initially computing the average processing time for each job across all factories. Subsequently, the product’s overall job processing time is prioritized to obtain the sorted order of product parts. The next step involves distributing tasks to factories according to which can complete waiting tasks the fastest. Within each factory, machines are assigned based on which can finish the current operation the soonest, leading to the initial assignment of machines for the first phase. The job sequence SJ is constructed from t part sequences ShJ. The part sequences ShJ and product sequence π are determined first, followed by connecting the t part sequences according to π.
For example, if a problem contains three products and nine jobs, with subsequences $S_{1J}$ = [8, 6, 1, 9], $S_{2J}$ = [3, 4], $S_{3J}$ = [2, 5, 7] and π = [3, 2, 1], then the complete job sequence is $S_J$ = [2, 5, 7, 3, 4, 8, 6, 1, 9]. It is crucial to emphasize that the job assignment to factory machines and the allocation of products to assembly machines may exhibit variability, even when utilizing the same $S_J$ and $S_P$ within the problem.

4.1.1. Subsequence Initialization for Jobs

Subsequences $S_{hJ}$ are obtained by the following steps:
  • Set i = 1;
  • Calculate the average processing time of operation $O_{ji}$ over the available machines of factory $F_g$ as $\bar{p}_{jig} = \sum_{k=M_{g,i-1}+1}^{M_{gi}} p_{jk} / K_{ig}$;
  • Compute the average processing time of $O_{ji}$ across all factories as $\bar{p}_{ji} = \sum_{g=1}^{f} \bar{p}_{jig} / f$;
  • Let i = i + 1. If i > m, proceed to the next step; otherwise, return to Step 2;
  • Compute the total processing time of job $J_j$ as $p_j = \sum_{i=1}^{m} \bar{p}_{ji}$.
Subsequently, the jobs of $P_h$ are arranged in ascending order of $p_j$, resulting in the part sequence $S_{hJ}$.
Then the products are sorted according to their expected urgency, which can be evaluated as $EU_h = d_h - \sum_{j=1}^{n} p_j y_{jh}$. Finally, π is obtained by sorting $EU_h$ in ascending order, and $S_J$ results from concatenating the part sequences $S_{hJ}$, h = 1, …, t, in the order given by π; a compact sketch of this initialization is given below.
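As forward-referenced above, the sketch below implements this initialization on a toy instance; avg_p, the product-membership matrix y, and the due dates are assumed placeholder data, and job/product IDs are 1-based as in the worked example of Section 4.1.

import numpy as np

rng = np.random.default_rng(0)
n, m, f, t = 9, 3, 2, 3
avg_p = rng.integers(1, 100, size=(n, m, f)).astype(float)   # \bar{p}_{jig}, assumed known
y = np.zeros((n, t))
y[[7, 5, 0, 8], 0] = 1          # product 1 owns jobs 8, 6, 1, 9 (0-based rows)
y[[2, 3], 1] = 1                # product 2 owns jobs 3, 4
y[[1, 4, 6], 2] = 1             # product 3 owns jobs 2, 5, 7
due = np.array([300.0, 250.0, 200.0])

p_bar = avg_p.mean(axis=2)      # \bar{p}_{ji}: average over factories
p_j = p_bar.sum(axis=1)         # total processing time p_j of each job
sub_seqs = [sorted(np.flatnonzero(y[:, h]) + 1, key=lambda j: p_j[j - 1])
            for h in range(t)]  # S_hJ: jobs of each product in ascending p_j
EU = due - y.T @ p_j            # expected urgency EU_h = d_h - sum_j p_j * y_jh
pi = list(np.argsort(EU) + 1)   # product sequence, most urgent (smallest EU_h) first
SJ = [j for h in pi for j in sub_seqs[h - 1]]   # concatenate part sequences along pi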

4.1.2. Jobs Assignment to the Target Factory and Machines

Constraint (6) stipulates that each job is assigned to exactly one factory. Once the target factory is determined by the target machine of the first operation, the target machines of the remaining operations of the job can be selected according to the heuristic rule below.
Specifically, let $T_{gk}$ denote the current available time of machine k in factory $F_g$ for the first operation, where $k = M_{g,0}+1, \ldots, M_{g,1}$.
Then the target machine and factory are determined by minimizing the expression in Equation (20):
$(g^*, k^*) = \arg\min_{g=1,\ldots,f;\; k=M_{g,0}+1,\ldots,M_{g,1}} \left( T_{gk} + \sum_{i=1}^{m} \bar{p}_{jig} \right)$
Subsequently, the remaining operations of the job are processed in the target factory $F_{g^*}$. The target machine for the ith operation is selected via Equation (21), where $T_{g^*k}$ is the current available time of machine k in factory $g^*$ for the ith operation; the corresponding variable $X_{jk}$ is then set to 1.
$k^* = \arg\min_{k=M_{g^*,i-1}+1,\ldots,M_{g^*,i}} \left( T_{g^*k} + p_{jk} \right)$
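A small sketch of the assignment rules in Eqs. (20) and (21) follows; T_avail, machines_of, avg_p, and p are assumed data structures (machine availability times, a factory/operation-to-machines map, and the average and actual processing times), not part of the original notation.

def assign_first_operation(j, factories, machines_of, T_avail, avg_p, m):
    # Eq. (20): choose the factory g* and first-stage machine k* minimizing the
    # machine's available time plus the job's average workload in that factory.
    best = None
    for g in factories:
        tail = sum(avg_p[j][i][g] for i in range(m))
        for k in machines_of(g, 0):
            cand = (T_avail[k] + tail, g, k)
            best = cand if best is None or cand < best else best
    _, g_star, k_star = best
    return g_star, k_star

def assign_operation(j, i, g_star, machines_of, T_avail, p):
    # Eq. (21): within the chosen factory, pick the machine that can finish
    # operation i of job j the soonest.
    return min(machines_of(g_star, i), key=lambda k: T_avail[k] + p[j][k])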

4.1.3. Products Sequencing and Allocation

After the jobs are assigned, the completion time $C_{ji}$ (for i = 1, …, m) of each job in factory $F_g$ can be determined. Additionally, the expected lead time of each product is calculated using Equation (22). The sequence of products in the assembly stage is obtained by sorting $E_h$ in non-descending order.
$E_h = d_h - \max_{j=1,\ldots,n} \left\{ y_{jh} \left( C_{jm} + \sum_{a=1}^{w} r_{gaj} / w \right) \right\}$
To minimize the maximum tardiness, the assembly sequence is determined using Equations (23) and (24). Here, Ta represents the current available time of assembly machine a, Cjm is the completion time of job j in the processing stage, and rgaj is the transportation time of job j from its optimal factory g to assembly machine a. The variable Yha = 1 is then confirmed.
$E_{ha} = \sum_{j=1}^{n} y_{jh} \left( C_{jm} + r_{gaj} \right)$
$a^* = \arg\min_{a=1,\ldots,w} \left( \max\{ T_a, E_{ha} \} + q_{ha} \right)$
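The following sketch mirrors Eqs. (23) and (24) for choosing the assembly machine of a product; jobs_of_product, C_jm, r, q, T_assembly, and factory_of are assumed inputs describing job completion times, transportation times, assembly times, and machine availability.

def pick_assembly_machine(h, jobs_of_product, C_jm, r, q, T_assembly, factory_of):
    # Eq. (23): E_ha aggregates the arrival terms C_jm + r_gaj of the product's jobs.
    # Eq. (24): a* minimizes max(T_a, E_ha) + q_ha over the w assembly machines.
    best_a, best_val = None, float("inf")
    for a in range(len(T_assembly)):
        E_ha = sum(C_jm[j] + r[factory_of[j]][a][j] for j in jobs_of_product[h])
        val = max(T_assembly[a], E_ha) + q[h][a]
        if val < best_val:
            best_a, best_val = a, val
    return best_a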

4.2. Mechanism and Four-Tuple in Reinforcement-Learning-Enhanced NSGA-II

The NSGA-II effectively handles two-objective conflicts by balancing them through nondominated sorting and crowding distance while maintaining population diversity.
However, the decision-making process for complex problems can sometimes lead to local optima. To address this challenge, recent years have witnessed the integration of well-designed search methods to further enhance solutions [23,33,34,35,36]. In pursuit of this goal, we propose a QNSGA to expedite the optimization process, as shown in Figure 3.
RL algorithms excel in maximizing cumulative rewards through sequential actions in complex environments [37,38,39,40].
Q-learning is a classical model-free RL algorithm designed for value-based learning within a Markov Decision Process. Both Q-learning and SARSA iteratively optimize the Q-value function using the Bellman equation, as shown in (25) and (26). The key distinction lies in their update rules: Q-learning is off-policy and bootstraps greedily on the maximum Q-value of the next state, whereas SARSA is on-policy, updating on the action actually taken and therefore behaving more conservatively. Q-learning is defined by a four-tuple (S, A, R, P), where S represents the state space capturing the agent’s schedule in the problem, A denotes the action space encompassing all possible selections when a state is reached, R reflects the reward function evaluating the effectiveness of an action in a given state, and P indicates the state transition probability function from one state to the next when an action is taken.
$Q(s_t, a_t) \leftarrow (1-\alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)$
$Q(s_t, a_t) \leftarrow (1-\alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma\, Q(s_{t+1}, a_{t+1}) \right)$
Here, α represents the learning rate of the agent, γ is the discount factor, $r_t$ denotes the immediate reward received from the system for taking action $a_t$ in state $s_t$, and $\max_{a} Q(s_{t+1}, a)$ is the largest Q-value over all actions in the next state $s_{t+1}$.
The agent makes decisions based on optimal actions derived from the current Q-values in the Q-table. Initially, all Q-values are initialized to zero and updated as the agent learns from its experiences. By incorporating feedback from the exploration environment, the agent leverages its acquired knowledge to explore better alternatives. The ultimate objective is to achieve the best Q-table, leading to the most effective decision sequence or solution for the problem. The process of optimizing the Q-table involves striking a balance between exploration and exploitation, making the choice of strategy crucial. The ϵ-greedy strategy is a common approach to updating the Q-table: the agent randomly selects an action a if a random value is less than ϵ; otherwise, it selects the action $a = \arg\max_{a} Q(s_t, a)$ with the highest Q-value.
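A minimal tabular sketch of this machinery follows, using the 12 states and 8 actions of QNSGA and the calibrated values α = 0.15, γ = 0.99, ϵ = 0.1 from Section 5.2 purely for illustration.

import numpy as np

n_states, n_actions = 12, 8
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.15, 0.99, 0.1
rng = np.random.default_rng(42)

def select_action(s):
    # epsilon-greedy policy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # Eq. (25): Q(s,a) <- (1-alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a')).
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_next].max())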
The Q-learning process is used to improve NSGA-II (QNSGA), and the components of the four-tuple are associated with algorithm parameters and strategies, as discussed in previous work [41], which was proposed for single-objective problems. Specifically, s t represents the population evaluation in the tth iteration. The quality of a ‘population’ can be assessed through changes in non-dominated solutions, crowding distances of each non-dominated front, and fitness improvements in the best solutions. a t denotes the selected update method, r t quantifies the degree of improvement produced by the selected action for the population, and P can be a random transition probability due to random search strategies.

4.3. States Construction in Q-Learning

In the QNSGA, the initial population is generated with random solutions and subsequently updated using one or more search strategies. Each population is evaluated across three dimensions, each compared with the previous population. The first dimension of population quality is the number of objectives improved relative to the last population, denoted as $N_{ft}$. Suppose that the best fitness value of the ith objective in the tth population is $D_{it}$; in the dual-objective case, $N_{ft} = 2$ if $D_{it} > D_{i,t-1}$ for all i.
The second dimension $L_t$ assesses the number of non-dominated solutions: if the number of non-dominated solutions $l_t$ is larger than in the former population ($l_{t-1}$), the indicator $L_t$ is one; otherwise, it is zero.
The third dimension $TD_t$ reflects the crowding distance of the non-dominated front: if the total crowding distance is larger than in the former population, $TD_t$ is 1; otherwise, it is 0. Consequently, the total number of possible states is 12, $S = \{S_1, \ldots, S_{12}\}$, as summarized in Table 1.
These 12 states are defined by a symmetric triple evaluation over three dimensions (a small encoding sketch follows this list):
$N_{ft}$: dominance improvement (symmetric scaling);
$L_t$: non-dominated solution count (cardinal symmetry);
$TD_t$: crowding distance (diversity metric symmetry).
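A possible encoding of this triple into a single state index is sketched below; the exact correspondence to the labels of Table 1 is an assumption, and only the 3 × 2 × 2 = 12 cardinality is taken from the paper.

def encode_state(n_ft, l_t, td_t):
    # n_ft in {0, 1, 2}: number of improved objectives; l_t, td_t in {0, 1}.
    # Returns one of 12 indices (0..11); Table 1 may order its labels differently.
    assert n_ft in (0, 1, 2) and l_t in (0, 1) and td_t in (0, 1)
    return n_ft * 4 + l_t * 2 + td_t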

4.4. Actions Symmetry Construction in QNSGA

In QNSGA, actions are comprised of combinations of searching and preservation strategies to explore and update the population. Each solution consists of four parts: job sequence, job assignment to machines, product sequence, and product assignment to assembly machines. Different searching methods are developed for various scenarios. This section introduces two metaheuristics of VNS for solution evolution and two crossover operators for solution interaction. It is noteworthy that each update in the processing stage triggers the corresponding update in the assembly stage based on heuristic methods. However, the update in the assembly stage does not affect the processing stage.
As depicted in Figure 4, $N_1$ attempts to exchange two jobs belonging to the same product ($y_{jh} = y_{j'h}$) but assigned to different factories ($X_{jk} \ne X_{j'k}$ for some k). The distinction between $N_2$ and $N_1$ is that the two jobs are assigned to the same factory. Moving on to $N_3$, the target factory assigned to a job is randomly changed. $N_4$ involves selecting two different jobs of a product and reversing their order. In $N_5$, the factory with the most jobs is identified, and one of its jobs is selected and reassigned to the least-loaded factory. $N_6$ is similar to $N_5$, but it transfers a job from the factory with the largest completion time to the one with the smallest completion time.
Continuing with the longest factory, two different jobs are selected and swapped in the sequence with $N_7$. In $N_8$, the processing machine with the most jobs is selected, and two jobs are swapped within it. In $N_9$, the part sequences $S_{1J}$ and $S_{2J}$ are exchanged in the overall sequence.
The assembly stage is also updated: $N_{10}$ changes the assembly machine of a product, and $N_{11}$ transfers one of the products on the most heavily loaded assembly machine to the one with the minimum load.
With any of the 11 neighborhood structures mentioned above, each solution can be partially updated independently. To balance the updating of the processing stage and assembly stage, two VNS methods, V N S 1 and V N S 2 , are designed by combining neighborhood structures N 1 to N 11 . V N S 1 operates on a solution x 0 with the following steps:
  • Set b = 1 and e   = 0;
  • Generate a solution x b with a random search N b based on x 0 ;
  • If x b is superior to x 0 , update x 0   =   x b , set e   = 1 , and proceed to Step 5; otherwise, increment b and go to Step 4;
  • If b > 11, proceed to Step 5; otherwise, return to Step 2;
  • Output the solution x 0 and terminate the search.
As the eleven neighborhoods do not have prioritization during the search process, V N S 2 reorders the eleven neighborhood structures and is executed similarly to V N S 1 .
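In code, the two VNS procedures reduce to a first-improvement sweep over the neighborhood list; neighborhoods (a list of callables returning a perturbed copy of the solution) and better (the bi-objective improvement test) are assumed helpers.

import random

def vns1(x0, neighborhoods, better):
    # Try N_1 .. N_11 in order; accept the first improving move and stop.
    for move in neighborhoods:
        xb = move(x0)
        if better(xb, x0):
            return xb, True
    return x0, False                       # no neighborhood improved x0

def vns2(x0, neighborhoods, better):
    # Same mechanism as VNS_1, but over a reordered copy of the neighborhoods.
    shuffled = random.sample(neighborhoods, len(neighborhoods))
    return vns1(x0, shuffled, better)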
Neighborhood structures exhibit structural symmetry:
  • Intra-factory symmetry: N2 (job swap in same factory) and N8 (job swap on same machine) share identical permutation logic;
  • Cross-stage symmetry: N5 (job transfer between factories) and N11 (product transfer between assembly machines) both implement load-balancing heuristics;
  • Unified procedure: Both stages use identical VNS update mechanisms (VNS1, VNS2).
The VNS methods mentioned earlier operate on a single solution. In contrast, two crossover operators, $C_1$ and $C_2$, are designed to evolve the job sequence (Figure 5). $C_1$ operates on solutions $p_1$ and $p_2$ by randomly generating two positions s and e (s < e), selecting the jobs between s and e in both $p_1$ and $p_2$, and swapping the order of these jobs between $p_1$ and $p_2$ while keeping the remaining jobs unchanged. $C_2$ operates by finding a cycle of jobs and exchanging the jobs outside the cycle. For example, suppose the randomly chosen job in $p_1$ is 7 and the job at the same position of $p_2$ is 6; the position of job 6 in $p_1$ is then located, the job at that position of $p_2$ is 8, and the procedure continues until the cycle 7-6-8-9-3-7 is obtained. Jobs on the cycle keep their positions, while jobs outside the cycle exchange positions between the two parents.
Both operators preserve permutation symmetry:
C1: Positional swap maintaining sequence validity
C2: Cycle-based exchange preserving topological order.
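The cycle detection of C2 can be written compactly as below; the starting position is fixed to 0 here for brevity, whereas the paper picks the starting job at random, and the toy sequences only reproduce the worked example above.

def cycle_crossover(p1, p2):
    # Follow p1[i] -> position of p2[i] in p1 until the cycle closes; jobs on the
    # cycle keep their positions, jobs outside it are swapped between the parents.
    pos_in_p1 = {job: i for i, job in enumerate(p1)}
    cycle, i = set(), 0
    while i not in cycle:
        cycle.add(i)
        i = pos_in_p1[p2[i]]
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if i not in cycle:
            c1[i], c2[i] = p2[i], p1[i]
    return c1, c2

o1, o2 = cycle_crossover([7, 6, 8, 9, 3, 1, 2], [6, 8, 9, 3, 7, 2, 1])   # cycle 7-6-8-9-3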
In addition to the updating methods, the preservation strategy is also a crucial part of the population-updating actions. For QNSGA with an adaptive strategy, aggressive exploration is encouraged in the early stages, followed by conservative exploitation of the accumulated experience and solutions; this creates an equilibrium in the exploration-exploitation trade-off. Consequently, both the ϵ-greedy strategy and the greedy strategy are integrated into the action set.
Eight actions $A = \{A_1, \ldots, A_8\}$ are established, as listed in Table 2, where $Acc_1$ is the ϵ-greedy acceptance rule and $Acc_2$ is the greedy acceptance rule.
A 1 works as follows.
  • Randomly select two solutions x 1 and x 2 from different Pareto fronts, where x 1 dominates x 2 .
  • Generate two offspring solutions x g 1 and x g 2 with crossover C 1 and compare them with x 1 .
  • If even the dominated one of $x_{g1}$ and $x_{g2}$ (the worse of the two offspring) dominates $x_1$, substitute $x_1$ with the better offspring; otherwise, let r = rand(0,1): if r > ϵ, substitute $x_1$ with the better offspring; otherwise, update $x_1$ with $VNS_1$ based on the ϵ-greedy acceptance rule.

4.5. Reward Function Determination

A reward is generated when a superior population is produced. In Section 4.3, twelve states were defined based on possible changes in a population. In the standard NSGA-II, if the best solution of a population dominates that of another population, then it is considered better and should be encouraged; consequently, State 9 is valued more than State 1. In contrast, we prioritize State 5 over State 1 to promote the exploration of solutions. Additionally, considering the number of non-dominated solutions, a larger crowding distance may lead to better solutions, so State 3 is assigned greater importance than State 2. The most favorable state is State 12, indicating the generation of a dominant solution. In summary, the reward function is defined as $r_t = \theta_{s_t} - \theta_{s_{t-1}}$, where $\theta_{s_t}$ is the label of state $s_t$ in Table 1.

4.6. Procedure of QNSGA

The algorithm’s parameters consist of the population size (N), the maximum execution time (TM), the crossover probability (pc), and the acceptance probability for suboptimal solutions (ϵ). The flowchart of the QNSGA algorithm is presented in Figure 3.
Initially, a population comprising N solutions is generated randomly. Following this, the population undergoes fast non-dominated sorting, yielding a set of non-dominated front solutions, denoted as NF = {F1, …, Flt}, where lt (≤N) is the count of non-dominated solutions in the tth generation. To create a new population, solutions from P\NF are selected to execute the action at = (Ck, VNSi, Accp). The procedure is outlined in Algorithm 1. Conversely, in study [20], batched transportation constraints yield fragmented feasible solutions ill-suited for reinforcement learning.
Upon obtaining a new population P′, the next step involves merging P and P′ if the maximum generation count is not reached. Simultaneously, the quality of P′ is evaluated, updating the state st. Subsequently, the reward generated by the state transition and at is calculated, refreshing the Q-table. The procedure for updating the Q-table with twelve states and eight actions is given in Algorithm 2. In this algorithm, the ϵ-greedy policy is used to select the target action, while the ϵ-greedy acceptance rule in the designed actions Ai (i = 1, …, 8) described in Section 4.4 is utilized to determine the updated solution.
Algorithm 1: Generate a new population of size N.
Require: Original population P , selected action at = (Ck, VNSi, Accp)
Ensure: A new population P′
j ⇐ lt, P′ ⇐ NF
while jN do
  Select a solution x r randomly from P \NF
  Select an elite solution x g randomly from NF
  Cross xr and xg with Ck
  if xr dominates x g then
    Append xr to P′
    j++
  else
    Randomly generate r ⇐ rand (0, 1)
    if r is acceptable with Accp then
      Append xr to set P′
    else
      Search the neighborhood solution of xr with VNSi
      if xr is acceptable with Accp then
        Append xr to set P′
        j ++
      end if
    end if
  end if
end while
Algorithm 2: QNSGA for the problem
Require: Problem environment, learning rate α, discount factor γ, exploration rate ϵ, maximum running time TM
Initialize the population P.
Perform non-dominated sorting and compute the crowding distance.
Initialize the Q-table, current time, and state s based on the population P of the initial generation.
Select an action a from state s by applying the ϵ-greedy policy informed by the Q-table.
while t < TM do
  Take action a to generate a new population P , observe the new state s′ and reward
   Choose action a′ from s′ using the ϵ-greedy policy derived from the Q-table
  Perform non-dominated sorting and compute the crowding distance.
  Update the Q-value for the current state-action pair using Equation (25)
  Update current state and action: s ← s′, a ← a′
end while

5. Experimental Evaluation and Industrial Validation

This section systematically assesses the performance of the proposed QNSGA when applied to the three-stage distributed assembly problem. The evaluation is conducted through an extensive suite of numerical experiments, aimed at quantifying the algorithm’s efficiency and efficacy. Initially, the QNSGA undergoes a meticulous calibration process to fine-tune its parameters. Subsequently, its performance is juxtaposed with that of established multi-objective optimization algorithms, as well as with the exact solutions obtained from Gurobi 11.0.0.
All computational experiments in this study were implemented in Python 3.10 for its robustness and versatility in scientific computing. The experiments were conducted on an Intel Core i7-8700H processor, 16 GB of RAM, and the Windows 11 operating system, ensuring a consistent and powerful computational environment for algorithmic comparison.

5.1. Experimental Methodology

Random instances are generated based on the problem-specific parameters outlined in Table 3. These instances vary according to the number of products (t), number of operations (m), number of factories (f), number of jobs per product (|Ph|), number of parallel machines (Kil), and number of assembly machines. Two separate experimental sets are designed to independently validate the proposed methodologies. For the small-scale cases, the number of parallel machines is selected from the set {1, 2, 3}, whereas for the large-scale cases, it is chosen from {1, 2, 3, 4, 5}. The number of parallel assembly machines is defined as ⌈t/2⌉. Machine operation times are randomly generated within the interval [1, 99], consistent with common practices in related scheduling research [15,42]. Assembly times for each product are dependent on the number of assigned jobs and are sampled from a specified interval. Transportation times between factories and assembly facilities are randomly drawn from the interval [1, 49], as reported in prior studies [3]. Due date times are established with reference to existing literature [43].
In total, 108 parameter combinations are constructed, with ten instances generated for each combination, resulting in a total of 1080 instances. The experiments utilize uniform termination criteria and consistent encoding and decoding procedures across all algorithms employed.
Three distinct performance metrics are employed to assess the effectiveness of the algorithms [4,14,44]: (1) Spread (Δ), which measures the uniformity of the distribution within the Pareto solution set; (2) Generational Distance (GD), which evaluates the convergence behavior of the algorithms; and (3) Inverted Generational Distance (IGD), a composite metric that concurrently captures both uniformity and convergence. Following the execution of each algorithm, the resulting non-dominated solution sets are documented and subsequently amalgamated with those generated by the other algorithms. After all experimental runs are complete, a comprehensive ideal non-dominated solution set is established as the reference front.
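For reference, a common formulation of GD and IGD on normalized objective vectors is sketched below (fronts are given as NumPy arrays; the exact variants used in [4,14,44] may differ in the averaging exponent).

import numpy as np

def gd(front, ref):
    # Generational distance: how far the obtained front lies from the reference front.
    d = np.min(np.linalg.norm(front[:, None, :] - ref[None, :, :], axis=2), axis=1)
    return np.sqrt((d ** 2).sum()) / len(front)

def igd(front, ref):
    # Inverted generational distance: how well the reference front is covered.
    d = np.min(np.linalg.norm(ref[:, None, :] - front[None, :, :], axis=2), axis=1)
    return d.mean()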
The experimental setup is illustrated in Figure 6. The parameters of the QNSGA algorithm are optimized in accordance with the principles of Design of Experiments (DOE) and Analysis of Variance (ANOVA), methodologies frequently utilized in scheduling research for algorithm calibration [7,38]. In order to validate the accuracy of the proposed mathematical model, it was solved using the Gurobi 11.0.0 solver across multiple problem instances. Furthermore, a comparative analysis was conducted between QNSGA and its variants to ascertain the efficacy of its core mechanisms. Subsequent to the comparative evaluation, statistical differences among the various methods were analyzed using ANOVA. In conclusion, the proposed approaches were applied to a pharmaceutical manufacturing case study.

5.2. Calibration of Algorithmic Parameters

The effectiveness of the QNSGA algorithm depends critically on the precise adjustment of four key parameters: the initial population size (N), the greediness coefficient ( ϵ ), the learning rate ( α ), and the discount factor ( γ ). The optimization of these parameters is conducted through a systematic approach employing Analysis of Variance (ANOVA) in conjunction with Design of Experiments (DOE) techniques. ANOVA serves as a robust statistical tool for discerning significant variances among group means, a technique instrumental in parameter optimization [45]. DOE, conversely, provides a structured framework for experimental planning, enabling a comprehensive assessment of how individual factors influence the outcome variable [46].
Preliminary empirical data have been utilized to delineate potential parameter ranges: N ∈ {10, 20, 30, 50}, ϵ ∈ {0.01, 0.1, 0.15, 0.2}, α ∈ {0.1, 0.15, 0.2, 0.3}, γ ∈ {0.8, 0.85, 0.9, 0.99}. To mitigate the risk of overfitting and to ensure the generalizability of the model, 16 distinct problem instances have been artificially constructed for the calibration process. These instances encapsulate a diverse spectrum of scenarios, defined by varying combinations of t ∈ {10, 20}, m ∈ {3, 5}, f ∈ {2, 5}, |Ph| ∈ {4, 6}, with additional parameters detailed in Table 3. For each combination, five unique instances are crafted, and each is subjected to ten independent runs. The computational budget for each run is capped at Tmax = ln(n) × f × m CPU seconds, a criterion that has been observed to yield near-optimal solutions for smaller instances when compared against the Gurobi solver.
The response metric is articulated as $\rho_i = |PF_i \cap PF^*| / |PF^*|$, the ratio of an algorithm's non-dominated solutions that belong to the reference Pareto front $PF^*$ to the cardinality of that front. A larger ρi indicates a better combination of algorithm parameters. The empirical results, as articulated in Table 4, reveal that the optimal parameter set is configuration 6, with corresponding p-values that are uniformly below the threshold of 0.05, denoting a statistically significant parameter impact on the QNSGA’s performance. Figure 7 displays a mean response plot accompanied by 95% Least Significant Difference (LSD) intervals, providing visual confirmation of the optimality of the parameter set: N = 20, ϵ = 0.1, α = 0.15, and γ = 0.99.

5.3. Assessment of the Mathematical Model

The efficacy of the mathematical model is rigorously examined utilizing the Gurobi mathematical programming solver, adhering to the methodological frameworks established in pertinent literature [37]. We employ small-scale instances for this validation exercise, allocating a maximum solving duration of 2100 s to Gurobi, whereas the runtime for QNSGA is governed by Tmax = t × f × m CPU seconds.
Since the Gurobi solver does not inherently support bi-objective optimization models, we reformulate the problem into an equivalent single-objective framework. Let the objectives of the optimal solution derived from the six heuristics be denoted as $f_1^0$ and $f_2^0$. The composite objective function is then articulated as $f = f_1 / f_1^0 + f_2 / f_2^0$, where equal weighting is applied to both objectives, underscoring their equivalence in significance and the proportional reduction observed in our experiments. This methodological adaptation enables us to harness Gurobi’s computational prowess while adhering to the unique constraints inherent in our model. The Relative Percentage Deviation (RPD) serves as the evaluative metric, operationalized through Equation (27).
$RPD = \dfrac{f_i - f_{best}}{f_{best}} \times 100\%$
where $f_i$ is the ith objective value yielded by a given algorithm for a specified instance, while $f_{best}$ represents the best makespan or maximum tardiness identified across all compared methodologies, inclusive of the MILP solver.
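In code, the metric is a one-liner; f_i and f_best follow the definitions above.

def rpd(f_i, f_best):
    # Relative Percentage Deviation of Eq. (27), in percent.
    return (f_i - f_best) / f_best * 100.0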
Table 5 presents a comparative analysis of the computational outcomes and the average execution time for both the MILP solver and QNSGA, delineating the mean and standard deviation (STD) of RPD. The results presented in Table 5 indicate that the MILP solver attains the optimal solution within relatively short computational times for both objectives solely in the case where t = 3. For t = 5, while the model identifies the optimal solution for maximum tardiness, the computational expense escalates, nearing 1404.29 s. The model encounters insurmountable challenges in optimally resolving the majority of instances with t = 8, with computational times reaching the threshold of 2100 s. QNSGA, while slightly outperformed by the MILP solver for t = 3 and 5 on the second objective, demonstrates superiority over the Gurobi solver for t = 8 on the first objective. As the problem scale escalates, QNSGA consistently yields more satisfactory outcomes compared to the MILP solver. More notably, QNSGA’s computational demands are significantly lower than those of the MILP solver. Concurrently, QNSGA exhibits a lower standard deviation across the majority of instances, indicative of its enhanced stability. Consequently, QNSGA is capable of generating promising solutions for small-scale instances within a compressed time frame, whereas the MILP solver is deemed more suitable for instances of a smaller scale.

5.4. Effectiveness of Eight Actions

This section delves into the efficacy of the eight distinct actions within the QNSGA, as enumerated in Table 2. To assess these actions, we introduce eight specialized variants of QNSGA, denoted as $V_i$, each exclusively incorporating a single action $a_i$. A rigorous experimental framework is employed, with each variant subjected to ten iterations on large-scale problem instances. The computational budget for these experiments is capped by a maximum running time, calculated as $T_{max} = t \times \ln(f \times m \times |P_h|)$ CPU seconds, ensuring a consistent evaluation baseline.
Table 6 synthesizes a comparative analysis between the original QNSGA and its eight variants across large-scale instances, while Figure 8 graphically represents the average performance metrics per instance. Employing Equation (22), the optimal performance metric is identified by benchmarking against all compared algorithms.
The data in Table 6 consistently indicate that QNSGA outperforms its variants in identifying dominant solutions across three key metrics. Notably, the variant V 1 excels, whereas V 8 lags behind. The performance metrics escalate with an increase in parameters t, m, and |Ph|, whereas an increase in the factory parameter f is associated with a decline in metric values. This inverse relationship suggests that the problem’s complexity diminishes with a higher number of factories, potentially due to increased flexibility in scheduling. The variants, constructed from three pairs of operators, reveal the intrinsic differences between the searching operators. Figure 8 presents the 95% interval Least Significant Difference (LSD) among the algorithms, highlighting an ascending trend in performance metrics for the variants, with the exception of QNSGA. The operators forming V 1 emerge as the most efficacious.
Further validation is provided by the Friedman test in Table 7, applied to the nine algorithms, which confirms significant performance differences among them at a threshold of 0.05. This prompts a deeper investigation into the frequency of operator application. Figure 9, presenting boxplots of operator frequency, underscores the superiority of operator C1 over C2, Acc1 over Acc2, and the ϵ -greedy policy over a purely greedy approach.
Additionally, we recorded the application frequency of each operator throughout the QNSGA's iterative process. The findings indicate that the VNS operator VNS1 contributes most significantly, followed by the ε-greedy policy, the crossover C1, and the remaining three operators.
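Such frequency statistics can be collected with a simple counter, as in the sketch below; the helper name and operator labels are illustrative.

    from collections import Counter

    op_counter = Counter()

    def record(op_name: str) -> None:
        # Called once per operator application, e.g. record("VNS1") or record("C1");
        # the accumulated counts back boxplots such as those in Figure 9.
        op_counter[op_name] += 1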

5.5. Comparison with State-of-the-Art Algorithms

This section provides a comparative analysis of the proposed algorithm in relation to several state-of-the-art methods developed for similar problems. The algorithms selected for this comparison are MOPSO [22], QSFL [31], RLABC [41], CWWORL [40], and CMAF [47]. Additionally, to benchmark the performance of our Q-learning integrated NSGA-II (QNSGA), we have adapted the SARSA algorithm into NSGA-II, denoted as SNSGA. Given that these selected algorithms were not originally tailored for the specific problem addressed in this paper, we retained their core algorithmic frameworks while modifying other components, such as encoding, decoding, crossover, mutation, and neighborhood structures, to align with the requirements of our study.
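The essential difference between QNSGA (Q-learning) and the SNSGA baseline (SARSA) lies in the temporal-difference target used to update the Q-table; the sketch below states the two standard update rules, with the learning rate alpha and discount factor gamma as generic symbols rather than the tuned values of this paper.

    def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
        # Off-policy target: bootstrap with the maximum Q-value of the next state (QNSGA).
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
        # On-policy target: bootstrap with the Q-value of the action actually taken next (SNSGA).
        Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])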
Employing the established metrics of Δ, GD, and IGD, the comparative results of the metaheuristic algorithms are displayed in Table 8 and illustrated in Figure 10. The findings indicate that both SNSGA and QNSGA generally outperform the other five algorithms across various instances. Notably, SNSGA surpasses QNSGA in certain scenarios where the problem scale is modest; however, QNSGA demonstrates a marked dominance over SNSGA in the majority of cases, particularly as the problem scale escalates. The LSD intervals depicted in Figure 10 underscore significant performance differences among the algorithms, which are especially pronounced for large-scale problems. Furthermore, the metric values increase with the parameters t, m, and |Ph|, and decrease as the factory parameter f grows.
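As a reference for how these indicators are computed, the sketch below implements plain mean-distance forms of GD and IGD for two-objective points; the exact normalization used in the paper may differ, so treat this as an illustrative assumption.

    import math

    def _nearest(point, points):
        return min(math.dist(point, q) for q in points)

    def gd(front, reference):
        # Generational distance: mean distance from each obtained solution
        # to its nearest point on the reference Pareto front.
        return sum(_nearest(p, reference) for p in front) / len(front)

    def igd(front, reference):
        # Inverted generational distance: mean distance from each reference
        # point to its nearest obtained solution.
        return sum(_nearest(r, front) for r in reference) / len(reference)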
To statistically validate the comparative results, the Friedman test is applied to the three metrics obtained from the seven metaheuristic algorithms. The results, as documented in Table 9, confirm the existence of statistically significant differences between the algorithms. Additionally, the Pareto front solutions from twelve distinct cases are visualized in Figure 11, where the non-dominated solutions of the seven algorithms are juxtaposed. The visualization clearly demonstrates that QNSGA consistently identifies superior dominant solutions.
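In practice, such a Friedman test can be run directly with SciPy, as the small sketch below shows; the metric values are placeholders, not the paper's data.

    from scipy.stats import friedmanchisquare

    # Placeholder IGD values of three algorithms over the same five instances.
    alg_a = [0.0051, 0.0048, 0.0052, 0.0047, 0.0050]
    alg_b = [0.0062, 0.0059, 0.0065, 0.0061, 0.0060]
    alg_c = [0.0070, 0.0072, 0.0069, 0.0071, 0.0073]

    stat, p = friedmanchisquare(alg_a, alg_b, alg_c)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")  # p < 0.05 rejects the equal-performance hypothesis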

5.6. Industrial Case Study: Wind Turbine Manufacturing Application

This section presents an industrial case study within the wind turbine manufacturing sector, focusing on the application of the DHFSTFA model and the QNSGA algorithm to a real-world problem [48]. The case involves the assembly of nine distinct products, each requiring a sequence of operations including material intake, identification engraving, milling, polishing, cleaning, and final inspection. The workshop is equipped with multiple machines capable of performing identical operations, such as milling machines X52K, HCN6800-II, and BV100, among others, which underscores the complexity and variability inherent in the scheduling process.
The objective of this case study is to optimize the scheduling order of each job to minimize the makespan and the maximum tardiness. The case is solved with the seven metaheuristic algorithms, including the proposed QNSGA, and Pareto front solutions are obtained. Figure 12 illustrates the mean objective values of each set of non-dominated solutions for each algorithm across iterations. It is evident that QNSGA identifies superior solutions more rapidly and converges to the non-dominated solution set with greater efficiency.
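For the bi-objective case (makespan and maximum tardiness, both minimized), the non-dominated set whose mean objectives are tracked in Figure 12 can be extracted with a simple filter such as the following sketch; the solution representation is illustrative.

    def dominates(a, b):
        # a dominates b if it is no worse in both objectives and differs in at least one.
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    def non_dominated(solutions):
        # solutions: list of (makespan, max_tardiness) tuples.
        return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

    # Example: (480, 0) and (470, 12) are mutually non-dominated; (500, 20) is dominated.
    print(non_dominated([(480, 0), (470, 12), (500, 20)]))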
Figure 13 compares one of the initial solutions with the final solution yielded by QNSGA. The makespan is reduced by 49.36% (from 952.65 to 482.43), and the maximum tardiness is eliminated entirely (from 369.36 to 0), which attests to the high search efficiency of QNSGA. These results are particularly significant in the context of wind turbine manufacturing, where operational efficiency and timeliness are critical for meeting production goals and ensuring customer satisfaction.
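As a quick check, the reported reduction follows directly from the two makespan values: (952.65 − 482.43) / 952.65 = 470.22 / 952.65 ≈ 0.4936, i.e., 49.36%.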
The industrial case study demonstrates the practical applicability of the QNSGA algorithm in a complex manufacturing environment. The significant improvements in makespan and tardiness highlight the potential of QNSGA for optimizing scheduling in the wind turbine industry and other sectors with similar logistical challenges. The case study serves as a validation of the QNSGA’s robustness and effectiveness in real-world scenarios, providing a promising solution for industry professionals seeking to enhance their production scheduling strategies.

6. Conclusions and Future Work

This study introduces a novel three-stage Distributed Heterogeneous Hybrid Flowshop Scheduling with Transportation and Flexible Assembly (DHFSTFA) model, tailored for the wind turbine manufacturing process. The model addresses the intricate logistics of transporting individual workpieces from various manufacturing stages to an assembly station, where final assembly is conducted. Given the complexity of this scheduling paradigm, we have devised a Q-learning augmented Non-dominated Sorting Genetic Algorithm II (QNSGA). Our approach integrates two distinct crossover operators, two variable neighborhood search operators, and an ε-greedy policy to enhance the exploration and exploitation balance within the solution space.
Through rigorous experimentation, we have substantiated the efficacy of our proposed algorithm. Initially, a Taguchi experimental design was employed to identify the optimal parameters of the algorithm. Subsequent small-scale experiments, leveraging the Gurobi solver, validated the mathematical model’s effectiveness, demonstrating that QNSGA achieves performance comparable to Gurobi within a significantly shorter time frame. Ablation studies on the eight actions within QNSGA confirmed their individual effectiveness and established significant differences among them, thereby pinpointing the most effective actions.
Furthermore, we conducted comparative experiments with other algorithms, focusing on the optimization of makespan and maximum tardiness. The results of these comparisons corroborate the superior performance of QNSGA. After each comparative trial, Least Significant Difference (LSD) tests and Friedman tests were conducted. These statistical methodologies provided auxiliary evidence, affirming the superiority of QNSGA.
In future research, we aim to expand our exploration of diverse distributed scheduling challenges and their applicability in real-world scenarios, while simultaneously enhancing the design of RL algorithms to achieve greater efficiency. Additionally, we plan to broaden the scope of our investigations to address the multifaceted demands of contemporary production environments, thereby contributing to the advancement of scheduling methodologies in industrial settings.

Author Contributions

Z.S.: methodology, conceptualization, validation, writing—original draft. H.C.: funding acquisition, writing—review and editing. F.Y.: writing—review and editing. X.D.: writing—review and editing. H.H.: writing—review and editing. J.Z.: writing—original draft, data curation. Q.Y.: methodology, conceptualization, validation, writing—original draft, data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant No. 52305533, the Major Science and Technology Project of Sichuan Province of China under Grant No. 2022ZDZX0003, and the National Key Research and Development Program of China under Grant No. 2023YFB3307900.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Zhiyuan Shi, Fuqian Yan, Xutao Deng, Jialei Zhang, and Qingwen Yin were employed by Dongfang Electric Academy of Science and Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

References

  1. Liu, C.G.; Yang, N.; Li, W.J.; Lian, J.; Evans, S.; Yin, Y. Training and assignment of multi-skilled workers for implementing seru production systems. Int. J. Adv. Manuf. Technol. 2013, 69, 937–959. [Google Scholar] [CrossRef]
  2. Zhang, G.; Xing, K.; Zhang, G.; He, Z. Memetic Algorithm with Meta-Lamarckian Learning and Simplex Search for Distributed Flexible Assembly Permutation Flowshop Scheduling Problem. IEEE Access 2020, 8, 96115–96128. [Google Scholar] [CrossRef]
  3. Hao, H.; Zhu, H.; Shen, L.; Zhen, G.; Chen, Z. Research on assembly scheduling problem with nested operations. Comput. Ind. Eng. 2023, 175, 108830. [Google Scholar] [CrossRef]
  4. Wu, X.L.; Liu, X.J.; Zhao, N. An improved differential evolution algorithm for solving a distributed assembly flexible job shop scheduling problem. Memetic Comput. 2019, 11, 335–355. [Google Scholar] [CrossRef]
  5. Dolgui, A.; Ivanov, D.; Sethi, S.P.; Sokolov, B. Scheduling in production, supply chain and Industry 4.0 systems by optimal control: Fundamentals, state-of-the-art and applications. Int. J. Prod. Res. 2019, 57, 411–432. [Google Scholar] [CrossRef]
  6. Zhang, Z.-Q.; Hu, R.; Qian, B.; Jin, H.-P.; Wang, L.; Yang, J.-B. A matrix cube-based estimation of distribution algorithm for the energy-efficient distributed assembly permutation flow-shop scheduling problem. Expert Syst. Appl. 2022, 194, 116484. [Google Scholar] [CrossRef]
  7. Hosseini, S.M.H. Distributed assembly permutation flow-shop scheduling problem with non-identical factories and considering budget constraints. Kybernetes 2022, 52, 2018–2044. [Google Scholar] [CrossRef]
  8. Zheng, J.; Wang, Y. A Hybrid Bat Algorithm for Solving the Three-Stage Distributed Assembly Permutation Flowshop Scheduling Problem. Appl. Sci. 2021, 11, 10102. [Google Scholar] [CrossRef]
  9. Yang, S.; Xu, Z. The distributed assembly permutation flowshop scheduling problem with flexible assembly and batch delivery. Int. J. Prod. Res. 2021, 59, 4053–4071. [Google Scholar] [CrossRef]
  10. Hatami, S.; Ruiz, R.; Andres-Romano, C. The Distributed Assembly Permutation Flowshop Scheduling Problem. Int. J. Prod. Res. 2013, 51, 5292–5308. [Google Scholar] [CrossRef]
  11. Hatami, S.; Ruiz García, R.; Romano, C.A. The Distributed Assembly Parallel Machine Scheduling Problem with eligibility constraints. Int. J. Prod. Manag. Eng. 2015, 3, 13–23. [Google Scholar] [CrossRef]
  12. Li, Z.; Qian, B.; Hu, R.; Zhang, C. Adaptive hybrid estimation of distribution algorithm for solving a certain kind of three-stage assembly flowshop scheduling problem. Comput. Integr. Manuf. Syst. 2015, 21, 1829–1845. [Google Scholar]
  13. Kim, M.-G.; Yu, J.-M.; Lee, D.-H. Scheduling algorithms for remanufacturing systems with parallel flow-shop-type reprocessing lines. Int. J. Prod. Res. 2015, 53, 1819–1831. [Google Scholar] [CrossRef]
  14. Huang, Y.-Y.; Pan, Q.-K.; Gao, L.; Miao, Z.-H.; Peng, C. A two-phase evolutionary algorithm for multi-objective distributed assembly permutation flowshop scheduling problem. Swarm Evol. Comput. 2022, 74, 101128. [Google Scholar] [CrossRef]
  15. Fu, Y.; Zhang, Z.; Liang, P.; Tian, G.; Zhang, C. Integrated remanufacturing scheduling of disassembly, reprocessing and reassembly considering energy efficiency and stochasticity through group teaching optimization and simulation approaches. Eng. Optim. 2024, 56, 2018–2039. [Google Scholar] [CrossRef]
  16. Maria Gonzalez-Neira, E.; Ferone, D.; Hatami, S.; Juan, A.A. A biased-randomized simheuristic for the distributed assembly permutation flowshop problem with stochastic processing times. Simul. Model. Pract. Theory 2017, 79, 23–36. [Google Scholar] [CrossRef]
  17. Jiang, N.-Y.; Yan, H.-S. Integrated optimization of production planning and scheduling in uncertain re-entrance environment for fixed-position assembly workshops. J. Intell. Fuzzy Syst. 2022, 42, 1705–1722. [Google Scholar] [CrossRef]
  18. Allahverdi, A.; Al-Anzi, F.S. Evolutionary heuristics and an algorithm for the two-stage assembly scheduling problem to minimize makespan with setup times. Int. J. Prod. Res. 2006, 44, 4713–4735. [Google Scholar] [CrossRef]
  19. Talens, C.; Fernandez-Viagas, V.; Perez-Gonzalez, P.; Framinan, J.M. New efficient constructive heuristics for the two-stage multi-machine assembly scheduling problem. Comput. Ind. Eng. 2020, 140, 106223. [Google Scholar] [CrossRef]
  20. Hao, H.; Zhu, H.; Luo, Y. A multi-objective Immune Balancing Algorithm for Distributed Heterogeneous Batching-integrated Assembly Hybrid Flowshop Scheduling. Expert Syst. Appl. 2025, 259, 125288. [Google Scholar] [CrossRef]
  21. Yuan, Y.; Xu, H. Multiobjective Flexible Job Shop Scheduling Using Memetic Algorithms. IEEE Trans. Autom. Sci. Eng. 2015, 12, 336–353. [Google Scholar] [CrossRef]
  22. Zhao, Y.; Wang, Y.; Tan, Y.; Zhang, J.; Yu, H. Dynamic Jobshop Scheduling Algorithm Based on Deep Q Network. IEEE Access 2021, 9, 122995–123011. [Google Scholar] [CrossRef]
  23. Zhu, C.; Sun, G.; Yang, K.; Huang, Y.; Cheng, D.; Zhao, Q.; Zhang, H.; Fang, X. A comprehensive study on integrated optimization of flexible manufacturing system layout and scheduling for nylon components production. Int. J. Ind. Eng. Theory Appl. Pract. 2022, 29, 979–1001. [Google Scholar]
  24. Johnson, D.; Chen, G.; Lu, Y. Multi-Agent Reinforcement Learning for Real-Time Dynamic Production Scheduling in a Robot Assembly Cell. IEEE Robot. Autom. Lett. 2022, 7, 7684–7691. [Google Scholar] [CrossRef]
  25. Zhou, T.; Luo, L.; He, Y.; Fan, Z.; Ji, S. Solving Panel Block Assembly Line Scheduling Problem via a Novel Deep Reinforcement Learning Approach. Appl. Sci. 2023, 13, 8483. [Google Scholar] [CrossRef]
  26. Sheikh, S.; Komaki, G.M.; Kayvanfar, V. Multi objective two-stage assembly flow shop with release time. Comput. Ind. Eng. 2018, 124, 276–292. [Google Scholar] [CrossRef]
  27. Li, X.; Chehade, H.; Yalaoui, F.; Amodeo, L. A new method coupling simulation and a hybrid metaheuristic to solve a multiobjective hybrid flowshop scheduling problem. In Proceedings of the EUSFLAT Conference, Aix-les-Bains, France, 18–22 July 2011. [Google Scholar]
  28. Campos, S.C.; Arroyo, J.E.C. NSGA-II with iterated greedy for a bi-objective three-stage assembly flowshop scheduling problem. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, Vancouver, BC, Canada, 12–16 July 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 429–436. [Google Scholar]
  29. Rahimi, I.; Gandomi, A.H.; Deb, K.; Chen, F.; Nikoo, M.R. Scheduling by NSGA-II: Review and Bibliometric Analysis. Processes 2022, 10, 98. [Google Scholar] [CrossRef]
  30. Du, S.-L.; Zhou, W.-J.; Fei, M.-R.; Nee, A.Y.C.; Ong, S.K. Bi-objective scheduling for energy-efficient distributed assembly blocking flow shop. CIRP Ann. 2024, 73, 357–360. [Google Scholar] [CrossRef]
  31. Cai, J.; Lei, D.; Wang, J.; Wang, L. A novel shuffled frog-leaping algorithm with reinforcement learning for distributed assembly hybrid flow shop scheduling. Int. J. Prod. Res. 2023, 61, 1233–1251. [Google Scholar] [CrossRef]
  32. Gao, Q.; Liu, J.; Li, H.; Zhuang, C.; Liu, Z. Digital twin-driven dynamic scheduling for the assembly workshop of complex products with workers allocation. Robot. Comput.-Integr. Manuf. 2024, 89, 102786. [Google Scholar] [CrossRef]
  33. Tiwari, A.; Chang, P.C.; Tiwari, M.K.; Kollanoor, N.J. A Pareto block-based estimation and distribution algorithm for multi-objective permutation flow shop scheduling problem. Int. J. Prod. Res. 2015, 53, 793–834. [Google Scholar] [CrossRef]
  34. Wang, F.C.; Deng, G.L.; Jiang, T.H.; Zhang, S.N. Multi-Objective Parallel Variable Neighborhood Search for Energy Consumption Scheduling in Blocking Flow Shops. IEEE Access 2018, 6, 68686–68700. [Google Scholar] [CrossRef]
  35. Wang, Z.-Y.; Lu, C. An integrated job shop scheduling and assembly sequence planning approach for discrete manufacturing. J. Manuf. Syst. 2021, 61, 27–44. [Google Scholar] [CrossRef]
  36. Seyyedi, M.H.; Saghih, A.M.F.; Azimi, Z.N. A fuzzy mathematical model for multi-objective flexible job-shop scheduling problem with new job insertion and earliness/tardiness penalty. Int. J. Ind. Eng.-Theory Appl. Pract. 2021, 28, 256–276. [Google Scholar]
  37. Luo, S.; Zhang, L.; Fan, Y. Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning. Comput. Ind. Eng. 2021, 159, 107489. [Google Scholar] [CrossRef]
  38. Luo, S.; Zhang, L.; Fan, Y. Real-Time Scheduling for Dynamic Partial-No-Wait Multiobjective Flexible Job Shop by Deep Reinforcement Learning. IEEE Trans. Autom. Sci. Eng. 2021, 19, 3020–3038. [Google Scholar] [CrossRef]
  39. Zhang, Y.; Yang, Q.; An, D.; Li, D.; Wu, Z. Multistep Multiagent Reinforcement Learning for Optimal Energy Schedule Strategy of Charging Stations in Smart Grid. IEEE Trans. Cybern. 2022, 53, 4292–4305. [Google Scholar] [CrossRef]
  40. Zhao, F.; Zhang, L.; Cao, J.; Tang, J. A cooperative water wave optimization algorithm with reinforcement learning for the distributed assembly no-idle flowshop scheduling problem. Comput. Ind. Eng. 2021, 153, 107082. [Google Scholar] [CrossRef]
  41. Wang, J.; Lei, D.; Cai, J. An adaptive artificial bee colony with reinforcement learning for distributed three-stage assembly scheduling with maintenance. Appl. Soft Comput. 2022, 117, 108371. [Google Scholar] [CrossRef]
  42. Hao, H.; Zhu, H. A self-learning particle swarm optimization for bi-level assembly scheduling of material-sensitive orders. Comput. Ind. Eng. 2024, 195, 110427. [Google Scholar] [CrossRef]
  43. Maulidya, R.; Wangsaputra, R.; Halim, A.H. A Batch Scheduling Model for a Three-stage Hybrid Flowshop Producing Products with Hierarchical Assembly Structures. Int. J. Technol. 2020, 11, 608–618. [Google Scholar] [CrossRef]
  44. Zhou, B.; Zhao, L. A multi-objective decomposition evolutionary algorithm based on the double-faced mirror boundary for a milk-run material feeding scheduling optimization problem. Comput. Ind. Eng. 2022, 171, 108385. [Google Scholar] [CrossRef]
  45. Xiong, F.; Xing, K.; Wang, F.; Lei, H.; Han, L. Minimizing the total completion time in a distributed two stage assembly system with setup times. Comput. Oper. Res. 2014, 47, 92–105. [Google Scholar] [CrossRef]
  46. Hatami, S.; Ruiz García, R.; Romano, C.A. Heuristics and metaheuristics for the distributed assembly permutation flowshop scheduling problem with sequence dependent setup times. Int. J. Prod. Econ. 2015, 169, 76–88. [Google Scholar] [CrossRef]
  47. Wang, J.-J.; Wang, L. A cooperative memetic algorithm with feedback for the energy-aware distributed flow-shops with flexible assembly scheduling. Comput. Ind. Eng. 2022, 168, 108126. [Google Scholar] [CrossRef]
  48. Zhang, B.; Meng, L.; Lu, C.; Han, Y.; Sang, H. Automatic design of constructive heuristics for a reconfigurable distributed flowshop group scheduling problem. Comput. Oper. Res. 2024, 161, 106432. [Google Scholar] [CrossRef]
Figure 1. Process of DHFSTFA.
Figure 2. Comparison between with and without blocking scheduling for a solution.
Figure 3. Flow of QNSGA.
Figure 4. Partial representation of neighborhood structures.
Figure 5. Two crossover operators.
Figure 6. Experimental Methodology.
Figure 7. Mean plots accompanied by 95% Least Significant Difference (LSD) intervals for QNSGA parameters.
Figure 8. Mean plots with 95% LSD intervals for RPD values of eight variants and QNSGA in three metrics.
Figure 9. Distribution of count rates for six search operators during large-scale iterations.
Figure 10. Mean plots with 95% LSD intervals for RPD values of state-of-the-art algorithms in three metrics.
Figure 11. Pareto front solutions of state-of-the-art algorithms in twelve cases.
Figure 12. Objectives comparison among metaheuristics and QNSGA during iteration.
Figure 13. Gantt chart comparison between heuristic solution and QNSGA solution. (a) Scheduling Gantt chart of the initial solution; (b) Scheduling Gantt chart of QNSGA.
Table 1. Possible States of the Population in the tth Iteration.
State | Indicator | State | Indicator
S1 | N_f^t = 0, L^t = 0, TD^t = 0 | S7 | N_f^t = 1, L^t = 0, TD^t = 1
S2 | N_f^t = 0, L^t = 1, TD^t = 0 | S8 | N_f^t = 1, L^t = 1, TD^t = 1
S3 | N_f^t = 0, L^t = 0, TD^t = 1 | S9 | N_f^t = 2, L^t = 0, TD^t = 0
S4 | N_f^t = 0, L^t = 1, TD^t = 1 | S10 | N_f^t = 2, L^t = 1, TD^t = 0
S5 | N_f^t = 1, L^t = 0, TD^t = 0 | S11 | N_f^t = 2, L^t = 0, TD^t = 1
S6 | N_f^t = 1, L^t = 1, TD^t = 0 | S12 | N_f^t = 2, L^t = 1, TD^t = 1
Table 2. Eight actions in the construction of QNSGA.
Action | Indicator | Action | Indicator
A1 | C1 + VNS1 + Acc1 | A5 | C2 + VNS2 + Acc1
A2 | C2 + VNS1 + Acc1 | A6 | C2 + VNS1 + Acc2
A3 | C1 + VNS2 + Acc1 | A7 | C1 + VNS2 + Acc2
A4 | C1 + VNS1 + Acc2 | A8 | C2 + VNS2 + Acc2
Table 3. Parameter settings for both small- and large-scale problems.
Parameter | Small | Large
t | {3, 5, 8} | {10, 30, 50}
m | {2, 3, 4} | {6, 8, 10}
f | {2, 3, 4} | {5, 6, 8}
|Ph| | {2, 4} | {10, 15}
w | ⌈t/2⌉ | ⌈t/2⌉
K_il | RandSelect {1, 2, 3} | RandSelect {1, 2, 3, 4, 5}
p_jk | U[1, 99] | U[1, 99]
q_ha | [1·N_h, 99·N_h] | [1·N_h, 99·N_h]
r_gaj | U[1, 49] | U[1, 49]
d_h | (Σ_{j=1}^{n} p_j y_{jh} + Σ_{a=1}^{w} q_{ha}) / f | (Σ_{j=1}^{n} p_j y_{jh} + Σ_{a=1}^{w} q_{ha}) / f
Table 4. Taguchi experimental results of QNSGA.
No. | N | ε | α | γ | RV
1 | 1 | 1 | 1 | 1 | 0.09
2 | 1 | 2 | 2 | 2 | 0.11
3 | 1 | 3 | 3 | 3 | 0.09
4 | 1 | 4 | 4 | 4 | 0.12
5 | 2 | 1 | 2 | 3 | 0.12
6 | 2 | 2 | 1 | 4 | 0.15
7 | 2 | 3 | 4 | 1 | 0.09
8 | 2 | 4 | 3 | 2 | 0.08
9 | 3 | 1 | 3 | 4 | 0.06
10 | 3 | 2 | 4 | 3 | 0.06
11 | 3 | 3 | 1 | 2 | 0.06
12 | 3 | 4 | 2 | 1 | 0.03
13 | 4 | 1 | 4 | 2 | 0.01
14 | 4 | 2 | 3 | 1 | 0.01
15 | 4 | 3 | 2 | 4 | 0.08
16 | 4 | 4 | 1 | 3 | 0.01
I | 0.10 | 0.08 | 0.08 | 0.06
II | 0.11 | 0.08 | 0.09 | 0.07
III | 0.05 | 0.08 | 0.07 | 0.07
IV | 0.03 | 0.06 | 0.10 | 0.10
R | 0.08 | 0.02 | 0.03 | 0.05
Table 5. Computational results of RPD and CPU time (seconds) for the MILP solver and QNSGA on small-scale instances (best bolded).
Instance | MILP Cmax (Mean/STD) | MILP Tmax (Mean/STD) | MILP Time | QNSGA Cmax (Mean/STD) | QNSGA Tmax (Mean/STD) | QNSGA Time
t = 3 | 0.000/0.000 | 0.000/0.000 | 345.21 | 0.084/0.013 | 0.013/0.014 | 27
t = 5 | 0.146/0.023 | 0.021/0.029 | 1404.29 | 0.150/0.022 | 0.045/0.022 | 45
t = 8 | 0.243/0.101 | 0.032/0.051 | 1887.67 | 0.153/0.025 | 0.079/0.041 | 72
m = 2 | 0.015/0.017 | 0.016/0.022 | 1162.25 | 0.066/0.013 | 0.013/0.017 | 32
m = 3 | 0.127/0.025 | 0.022/0.032 | 1152.81 | 0.100/0.013 | 0.049/0.024 | 48
m = 4 | 0.300/0.043 | 0.030/0.055 | 1680.75 | 0.220/0.021 | 0.075/0.040 | 64
f = 2 | 0.240/0.035 | 0.031/0.045 | 1679.10 | 0.174/0.022 | 0.078/0.035 | 32
f = 3 | 0.123/0.019 | 0.021/0.025 | 1248.82 | 0.119/0.017 | 0.043/0.018 | 48
f = 4 | 0.079/0.013 | 0.017/0.017 | 1067.89 | 0.093/0.013 | 0.016/0.013 | 64
|Ph| = 2 | 0.096/0.016 | 0.013/0.021 | 1297.78 | 0.074/0.015 | 0.026/0.016 | 48
|Ph| = 4 | 0.199/0.027 | 0.032/0.044 | 1451.48 | 0.183/0.021 | 0.065/0.035 | 48
Average | 0.130/0.029 | 0.021/0.031 | 1307.1 | 0.129/0.018 | 0.046/0.025 | 48
Table 6. Average RPD values on three metrics comparing variants and QNSGA in large-scale instances (best results bolded).
a. Based on the Δ metric
Instance | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | QNSGA
t = 10 | 0.138 | 0.164 | 0.201 | 0.217 | 0.259 | 0.309 | 0.333 | 0.367 | 0.1110
t = 30 | 0.159 | 0.179 | 0.225 | 0.225 | 0.307 | 0.326 | 0.351 | 0.408 | 0.1245
t = 50 | 0.173 | 0.209 | 0.227 | 0.240 | 0.313 | 0.350 | 0.378 | 0.441 | 0.1464
m = 6 | 0.140 | 0.164 | 0.192 | 0.209 | 0.267 | 0.313 | 0.311 | 0.369 | 0.1042
m = 8 | 0.158 | 0.188 | 0.222 | 0.231 | 0.285 | 0.332 | 0.348 | 0.391 | 0.1269
m = 10 | 0.172 | 0.198 | 0.239 | 0.242 | 0.328 | 0.340 | 0.403 | 0.456 | 0.1499
f = 5 | 0.170 | 0.215 | 0.232 | 0.252 | 0.317 | 0.355 | 0.405 | 0.439 | 0.1442
f = 6 | 0.155 | 0.177 | 0.216 | 0.223 | 0.298 | 0.327 | 0.352 | 0.398 | 0.1253
f = 8 | 0.145 | 0.159 | 0.205 | 0.206 | 0.265 | 0.304 | 0.306 | 0.379 | 0.1124
|Ph| = 10 | 0.148 | 0.170 | 0.212 | 0.213 | 0.298 | 0.317 | 0.332 | 0.389 | 0.1250
|Ph| = 15 | 0.166 | 0.197 | 0.223 | 0.241 | 0.289 | 0.340 | 0.376 | 0.421 | 0.1296
Average | 0.157 | 0.184 | 0.218 | 0.227 | 0.293 | 0.328 | 0.354 | 0.405 | 0.1273
b. Based on the GD metric
Instance | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | QNSGA
t = 10 | 0.059 | 0.071 | 0.085 | 0.086 | 0.115 | 0.133 | 0.141 | 0.162 | 0.042
t = 30 | 0.062 | 0.078 | 0.087 | 0.098 | 0.129 | 0.140 | 0.145 | 0.169 | 0.044
t = 50 | 0.067 | 0.083 | 0.096 | 0.102 | 0.134 | 0.156 | 0.157 | 0.189 | 0.049
m = 6 | 0.059 | 0.070 | 0.082 | 0.085 | 0.117 | 0.129 | 0.142 | 0.151 | 0.040
m = 8 | 0.060 | 0.073 | 0.087 | 0.094 | 0.125 | 0.131 | 0.148 | 0.165 | 0.044
m = 10 | 0.068 | 0.089 | 0.099 | 0.107 | 0.136 | 0.168 | 0.154 | 0.204 | 0.050
f = 5 | 0.068 | 0.082 | 0.101 | 0.101 | 0.137 | 0.151 | 0.167 | 0.183 | 0.047
f = 6 | 0.064 | 0.076 | 0.087 | 0.095 | 0.124 | 0.144 | 0.149 | 0.172 | 0.045
f = 8 | 0.055 | 0.075 | 0.080 | 0.089 | 0.118 | 0.133 | 0.127 | 0.163 | 0.041
|Ph| = 10 | 0.060 | 0.073 | 0.084 | 0.088 | 0.121 | 0.129 | 0.140 | 0.171 | 0.044
|Ph| = 15 | 0.065 | 0.082 | 0.095 | 0.102 | 0.131 | 0.157 | 0.155 | 0.175 | 0.046
Average | 0.062 | 0.078 | 0.089 | 0.095 | 0.126 | 0.143 | 0.148 | 0.173 | 0.045
c. Based on the IGD metric
Instance | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | QNSGA
t = 10 | 0.006 | 0.007 | 0.009 | 0.008 | 0.012 | 0.014 | 0.014 | 0.017 | 0.005
t = 30 | 0.007 | 0.008 | 0.009 | 0.011 | 0.014 | 0.015 | 0.016 | 0.018 | 0.005
t = 50 | 0.008 | 0.009 | 0.010 | 0.012 | 0.014 | 0.014 | 0.018 | 0.021 | 0.005
m = 6 | 0.006 | 0.008 | 0.009 | 0.008 | 0.011 | 0.013 | 0.015 | 0.016 | 0.005
m = 8 | 0.007 | 0.008 | 0.009 | 0.009 | 0.013 | 0.015 | 0.016 | 0.019 | 0.005
m = 10 | 0.008 | 0.009 | 0.010 | 0.013 | 0.016 | 0.016 | 0.018 | 0.021 | 0.005
f = 5 | 0.008 | 0.008 | 0.010 | 0.011 | 0.014 | 0.015 | 0.018 | 0.020 | 0.006
f = 6 | 0.007 | 0.008 | 0.010 | 0.010 | 0.014 | 0.015 | 0.016 | 0.020 | 0.005
f = 8 | 0.007 | 0.008 | 0.009 | 0.009 | 0.012 | 0.013 | 0.015 | 0.017 | 0.004
|Ph| = 10 | 0.007 | 0.007 | 0.009 | 0.010 | 0.013 | 0.013 | 0.015 | 0.017 | 0.005
|Ph| = 15 | 0.007 | 0.009 | 0.010 | 0.010 | 0.014 | 0.016 | 0.018 | 0.021 | 0.005
Average | 0.007 | 0.008 | 0.009 | 0.010 | 0.014 | 0.014 | 0.016 | 0.019 | 0.005
Table 7. Friedman test results of comparison between variants and QNSGA in large-scale instances.
Metric | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | QNSGA
Δ | 3.421 | 3.768 | 4.308 | 4.207 | 5.607 | 6.445 | 7.105 | 8.339 | 2.902 (p-value 0.000)
GD | 2.914 | 3.320 | 4.342 | 4.341 | 6.143 | 6.512 | 6.692 | 7.966 | 2.068 (p-value 0.000)
IGD | 2.893 | 3.173 | 4.311 | 4.511 | 6.100 | 6.597 | 7.374 | 8.588 | 2.375 (p-value 0.000)
Table 8. Average RPD values on three metrics comparisons between state-of-the-art algorithms in large-scale instances (best results bolded).
a. Based on the Δ metric
Instance | MOPSO | QSFL | RLABC | CWWORL | CMAF | SNSGA | QNSGA
t = 10 | 5.68 × 10−3 | 5.14 × 10−3 | 6.18 × 10−3 | 6.40 × 10−3 | 5.54 × 10−3 | 3.02 × 10−3 | 3.15 × 10−3
t = 30 | 8.14 × 10−3 | 7.06 × 10−3 | 7.87 × 10−3 | 7.80 × 10−3 | 7.87 × 10−3 | 6.31 × 10−3 | 5.22 × 10−3
t = 50 | 1.13 × 10−2 | 1.01 × 10−2 | 1.08 × 10−2 | 9.04 × 10−3 | 9.42 × 10−3 | 9.25 × 10−3 | 9.19 × 10−3
m = 6 | 5.64 × 10−3 | 5.29 × 10−3 | 6.06 × 10−3 | 6.38 × 10−3 | 5.67 × 10−3 | 3.12 × 10−3 | 3.17 × 10−3
m = 8 | 7.85 × 10−3 | 7.15 × 10−3 | 7.90 × 10−3 | 8.09 × 10−3 | 8.10 × 10−3 | 6.95 × 10−3 | 6.22 × 10−3
m = 10 | 1.16 × 10−2 | 9.86 × 10−3 | 1.09 × 10−2 | 8.77 × 10−3 | 9.07 × 10−3 | 8.52 × 10−3 | 8.18 × 10−3
f = 5 | 1.15 × 10−2 | 1.01 × 10−2 | 1.04 × 10−2 | 9.34 × 10−3 | 9.52 × 10−3 | 9.06 × 10−3 | 8.67 × 10−3
f = 6 | 7.86 × 10−3 | 6.99 × 10−3 | 8.16 × 10−3 | 7.51 × 10−3 | 7.65 × 10−3 | 6.55 × 10−3 | 5.95 × 10−3
f = 8 | 5.79 × 10−3 | 5.18 × 10−3 | 6.30 × 10−3 | 6.39 × 10−3 | 5.67 × 10−3 | 2.97 × 10−3 | 2.94 × 10−3
|Ph| = 10 | 6.91 × 10−3 | 6.10 × 10−3 | 7.02 × 10−3 | 7.10 × 10−3 | 6.71 × 10−3 | 4.13 × 10−3 | 4.09 × 10−3
|Ph| = 15 | 9.85 × 10−3 | 8.77 × 10−3 | 9.55 × 10−3 | 8.40 × 10−3 | 8.52 × 10−3 | 8.25 × 10−3 | 7.62 × 10−3
Average | 8.38 × 10−3 | 7.44 × 10−3 | 8.28 × 10−3 | 7.75 × 10−3 | 7.61 × 10−3 | 6.19 × 10−3 | 5.86 × 10−3
b. Based on the GD metric
Instance | MOPSO | QSFL | RLABC | CWWORL | CMAF | SNSGA | QNSGA
t = 10 | 4.67 × 10−5 | 3.75 × 10−5 | 4.70 × 10−5 | 4.03 × 10−5 | 4.10 × 10−5 | 3.25 × 10−5 | 2.73 × 10−5
t = 30 | 6.12 × 10−5 | 5.70 × 10−5 | 6.37 × 10−5 | 5.58 × 10−5 | 5.94 × 10−5 | 5.94 × 10−5 | 4.79 × 10−5
t = 50 | 7.22 × 10−5 | 8.38 × 10−5 | 8.42 × 10−5 | 8.04 × 10−5 | 8.92 × 10−5 | 7.54 × 10−5 | 7.25 × 10−5
m = 6 | 4.52 × 10−5 | 3.89 × 10−5 | 4.67 × 10−5 | 4.23 × 10−5 | 4.24 × 10−5 | 3.36 × 10−5 | 2.77 × 10−5
m = 8 | 6.24 × 10−5 | 5.90 × 10−5 | 6.45 × 10−5 | 5.63 × 10−5 | 5.71 × 10−5 | 6.02 × 10−5 | 4.89 × 10−5
m = 10 | 7.24 × 10−5 | 8.05 × 10−5 | 8.37 × 10−5 | 7.80 × 10−5 | 9.01 × 10−5 | 7.35 × 10−5 | 7.11 × 10−5
f = 5 | 7.24 × 10−5 | 7.05 × 10−5 | 8.37 × 10−5 | 7.80 × 10−5 | 9.01 × 10−5 | 7.35 × 10−5 | 7.61 × 10−5
f = 6 | 6.03 × 10−5 | 5.99 × 10−5 | 6.35 × 10−5 | 5.76 × 10−5 | 5.85 × 10−5 | 6.29 × 10−5 | 5.02 × 10−5
f = 8 | 4.73 × 10−5 | 4.79 × 10−5 | 4.77 × 10−5 | 4.09 × 10−5 | 4.10 × 10−5 | 3.09 × 10−5 | 2.14 × 10−5
|Ph| = 10 | 5.38 × 10−5 | 4.89 × 10−5 | 5.56 × 10−5 | 4.93 × 10−5 | 4.97 × 10−5 | 4.69 × 10−5 | 3.83 × 10−5
|Ph| = 15 | 6.62 × 10−5 | 7.00 × 10−5 | 7.43 × 10−5 | 6.84 × 10−5 | 7.67 × 10−5 | 6.47 × 10−5 | 6.02 × 10−5
Average | 6.00 × 10−5 | 5.95 × 10−5 | 6.50 × 10−5 | 5.88 × 10−5 | 6.32 × 10−5 | 5.58 × 10−5 | 4.93 × 10−5
c. Based on the IGD metric
Instance | MOPSO | QSFL | RLABC | CWWORL | CMAF | SNSGA | QNSGA
t = 10 | 1.24 × 10−3 | 1.13 × 10−3 | 1.16 × 10−3 | 1.85 × 10−3 | 1.87 × 10−3 | 1.02 × 10−3 | 1.11 × 10−3
t = 30 | 2.33 × 10−3 | 2.00 × 10−3 | 2.33 × 10−3 | 2.20 × 10−3 | 3.21 × 10−3 | 1.92 × 10−3 | 1.50 × 10−3
t = 50 | 4.07 × 10−3 | 3.53 × 10−3 | 3.76 × 10−3 | 3.37 × 10−3 | 4.27 × 10−3 | 2.95 × 10−3 | 2.42 × 10−3
m = 6 | 1.45 × 10−3 | 1.33 × 10−3 | 1.32 × 10−3 | 1.92 × 10−3 | 1.85 × 10−3 | 1.06 × 10−3 | 1.25 × 10−3
m = 8 | 2.41 × 10−3 | 2.04 × 10−3 | 2.34 × 10−3 | 2.31 × 10−3 | 3.18 × 10−3 | 1.94 × 10−3 | 1.56 × 10−3
m = 10 | 3.78 × 10−3 | 3.29 × 10−3 | 3.59 × 10−3 | 3.19 × 10−3 | 4.33 × 10−3 | 2.89 × 10−3 | 2.22 × 10−3
f = 5 | 3.18 × 10−3 | 3.12 × 10−3 | 3.14 × 10−3 | 3.26 × 10−3 | 4.33 × 10−3 | 2.98 × 10−3 | 2.63 × 10−3
f = 6 | 2.46 × 10−3 | 2.07 × 10−3 | 2.41 × 10−3 | 2.20 × 10−3 | 3.14 × 10−3 | 1.74 × 10−3 | 1.57 × 10−3
f = 8 | 2.00 × 10−3 | 1.47 × 10−3 | 1.70 × 10−3 | 1.96 × 10−3 | 1.89 × 10−3 | 1.17 × 10−3 | 8.34 × 10−4
|Ph| = 10 | 1.83 × 10−3 | 1.58 × 10−3 | 1.73 × 10−3 | 2.11 × 10−3 | 2.51 × 10−3 | 1.50 × 10−3 | 1.40 × 10−3
|Ph| = 15 | 3.26 × 10−3 | 2.86 × 10−3 | 3.10 × 10−3 | 2.83 × 10−3 | 3.72 × 10−3 | 2.43 × 10−3 | 1.95 × 10−3
Average | 2.55 × 10−3 | 2.22 × 10−3 | 2.42 × 10−3 | 2.47 × 10−3 | 3.12 × 10−3 | 1.96 × 10−3 | 1.68 × 10−3
Table 9. Friedman test results of comparison between state-of-the-art algorithms in large-scale instances.
Metric | MOPSO | QSFL | RLABC | CWWORL | CMAF | SNSGA | QNSGA
Δ | 6.555 | 3.043 | 7.504 | 4.212 | 4.139 | 2.567 | 2.184 (p-value 0.000)
GD | 6.077 | 3.041 | 7.416 | 3.999 | 4.296 | 2.790 | 2.347 (p-value 0.000)
IGD | 6.350 | 3.790 | 7.126 | 4.223 | 5.323 | 3.353 | 2.863 (p-value 0.000)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
