A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority

Luo, Zhi-Yong; Song, Jia-Bao; Ge, Chun-Qiao

doi:10.3390/pr14040662

Open AccessArticle

A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority

by

Zhi-Yong Luo

^*,

Jia-Bao Song

and

Chun-Qiao Ge

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China

^*

Author to whom correspondence should be addressed.

Processes 2026, 14(4), 662; https://doi.org/10.3390/pr14040662

Submission received: 17 January 2026 / Revised: 11 February 2026 / Accepted: 13 February 2026 / Published: 14 February 2026

(This article belongs to the Section AI-Enabled Process Engineering)

Download

Browse Figures

Versions Notes

Abstract

Energy-aware distributed manufacturing has become a key focus in modern production systems due to the growing demand for sustainable and efficient operations. This study investigates the energy-aware distributed flexible job shop scheduling problem with job priority, where multiple factories cooperate to process prioritized jobs under energy consumption considerations. Considering job priorities is essential for reflecting the practical importance and urgency of different customer orders, which directly affects scheduling fairness and production responsiveness. The proposed bi-objective model aims to simultaneously minimize total weighted tardiness and total energy consumption, accounting for both processing and idle power. To effectively solve this complex NP-hard problem, a knowledge-guided deep reinforcement learning approach is developed. Domain knowledge is integrated into a double deep Q-network to guide the adaptive selection of local search operators, while a co-evolutionary mechanism maintains global exploration and accelerates convergence. Extensive computational experiments are conducted on 24 benchmark instances, which are categorized into five groups according to factory scale, with the maximum problem size reaching 160 jobs × 6 machines × 5 factories, together with a real-world case study. Compared with four state-of-the-art multi-objective baseline algorithms (NSGA-II, MOPSO, MOEA/D, and SPEA2), the proposed D2QN-COEA demonstrates substantial performance advantages. On average, it achieves an HV improvement of 23.1% compared with the best-performing baseline on each instance, while GD and IGD are reduced by 70.8% and 63.7%, respectively. When averaged across all four baseline algorithms, D2QN-COEA yields improvements of 203.4% in HV, 83.9% in GD, 79.9% in IGD, and 70.8% in Spacing, confirming its superior convergence accuracy and solution diversity. The results confirm that embedding domain knowledge into deep reinforcement learning enhances optimization robustness and provides an intelligent solution for energy-efficient distributed scheduling in modern manufacturing systems.

Keywords:

energy-aware scheduling; distributed flexible job shop; job priority; deep reinforcement learning; knowledge-guided optimization

1. Introduction

With the rapid advancement of automation, digitalization, and intelligent technologies, modern manufacturing systems are increasingly shifting toward distributed and energy-efficient production modes. In this context, energy-aware distributed manufacturing has attracted significant attention from both academia and industry [1,2]. It enables geographically dispersed factories to collaboratively share production resources, reduce idle time, and improve overall energy utilization efficiency [3]. This paradigm enhances operational flexibility and responsiveness while supporting global objectives for sustainable and low-carbon manufacturing.

The distributed flexible job shop scheduling problem (DFJSP) has become an important model for representing such complex production environments. It extends the classical flexible job shop scheduling problem (FJSP) by including the factory assignment decision, in which each job is assigned to one factory and then processed by multiple machines within that factory. This problem structure captures the hierarchical and multi-resource characteristics of distributed production systems that are widely observed in industries such as aerospace manufacturing, precision machining, and electronics assembly [4,5]. Compared with centralized scheduling, the distributed variant significantly enlarges the decision space and increases computational complexity, making efficient optimization approaches essential.

Existing research on the DFJSP has primarily focused on several aspects, including minimizing makespan or total tardiness, improving load balance among factories, and optimizing machine utilization efficiency [6,7]. These studies have significantly advanced the theoretical modeling and algorithmic design of distributed scheduling systems. However, energy efficiency has not been adequately incorporated into most DFJSP formulations. In modern industrial environments, energy consumption has become a critical performance indicator due to the growing emphasis on sustainable production and environmental responsibility. Consequently, energy-aware scheduling has emerged as a key research direction that seeks to achieve a balance between operational efficiency and energy consumption [8].

In real-world distributed manufacturing, jobs usually differ in their importance and urgency depending on customer demand, contractual requirements, and delivery deadlines. Ignoring these differences may lead to unbalanced resource allocation and inefficient scheduling decisions. Moreover, job priority affects both machine utilization and energy consumption, as high-priority jobs often require schedule adjustments that change the system load distribution [9]. Therefore, integrating job priority into energy-aware distributed scheduling is vital to improve fairness and efficiency in real manufacturing systems. However, only a few studies have simultaneously considered energy consumption and job priority within a distributed flexible job shop framework. This limitation reveals a significant gap between theoretical research and industrial application.

To address this gap, this study develops the energy-aware distributed flexible job shop scheduling problem with job priority (EA-DFJSP-JP). The proposed model aims to minimize total weighted tardiness and total energy consumption simultaneously. It integrates the decisions of factory assignment, machine sequencing, and job prioritization under energy constraints to better reflect real production systems. This model provides a balanced approach to achieving both production efficiency and sustainable energy management.

The EA-DFJSP-JP is a complex combinatorial optimization problem that poses substantial computational challenges. Traditional exact methods, such as mixed-integer programming and branch-and-bound algorithms, are often unsuitable for large-scale instances because of exponential computational complexity [10,11]. Consequently, heuristic, metaheuristic, and hybrid intelligent algorithms have been widely used to obtain near-optimal solutions within acceptable computational time. Recently, learning-based optimization methods have emerged as powerful alternatives for solving large-scale scheduling problems. In particular, deep reinforcement learning (DRL) has demonstrated strong potential by learning adaptive decision policies through continuous interaction with the environment [12]. However, conventional DRL approaches often experience slow convergence, unstable learning, and limited interpretability when applied to discrete scheduling problems, primarily because they do not effectively utilize domain knowledge.

To overcome these challenges, this paper presents a knowledge-guided deep reinforcement learning approach for solving the EA-DFJSP-JP. Among various DRL algorithms, the Double Deep Q-Network (D2QN) is specifically selected due to its superior sample efficiency and stability in handling discrete scheduling decision spaces compared to on-policy methods like PPO or A2C. In this framework, domain knowledge such as job precedence relationships, energy consumption patterns, and machine utilization features is embedded into a D2QN to guide the adaptive selection of local search operators. In addition, a co-evolutionary learning mechanism is incorporated to balance global exploration and local exploitation, thereby improving convergence speed and maintaining population diversity. The proposed algorithm, referred to as the double deep Q-network-based co-evolutionary algorithm (D2QN-COEA), integrates domain knowledge with deep reinforcement learning to achieve intelligent and energy-efficient optimization for distributed scheduling problems.

The main contributions of this study are as follows.

(1): A new EA-DFJSP-JP is formulated, incorporating both energy efficiency and job priority considerations to represent the practical characteristics of distributed manufacturing environments. The model simultaneously minimizes total weighted tardiness and total energy consumption, providing a more balanced and sustainable production scheduling framework.
(2): A D2QN-COEA algorithm is proposed, which embeds domain-specific knowledge into a Double Deep Q-Network and integrates a co-evolutionary learning mechanism to enhance learning stability and optimization efficiency.
(3): Extensive computational experiments on benchmark datasets and a real-world industrial case validate the effectiveness of the proposed method in achieving high-quality and energy-efficient scheduling solutions.

The remainder of this paper is organized as follows. Section 2 reviews related work on distributed job shop scheduling and optimization methods. Section 3 formulates the EA-DFJSP-JP model. Section 4 presents the proposed D2QN-COEA. Section 5 reports the computational experiments and results. Section 6 concludes the study and discusses future research directions.

2. Related Work

2.1. Distributed Flexible Job Shop Scheduling

The DFJSP is a typical extension of the classical FJSP, introducing an additional decision layer for factory assignment to represent multi-factory production systems. Early studies on FJSP mainly focused on developing heuristic rules [13] and later evolved into metaheuristic and hybrid intelligent algorithms [14,15,16], aiming to address its combinatorial complexity and multi-objective nature.

In recent years, DFJSP has received growing attention for its capability to represent geographically distributed manufacturing systems and to improve production flexibility and resource utilization efficiency. Early studies on DFJSP mainly relied on hybrid evolutionary algorithms that combine complementary search mechanisms to balance exploration and exploitation. For instance, Xie et al. proposed a hybrid genetic tabu search algorithm that integrates the global search capability of genetic algorithms with the local refinement ability of tabu search, achieving several new upper bounds on benchmark instances [17]. Similarly, Du et al. developed a hybrid estimation of distribution and variable neighborhood search algorithm to address DFJSP with crane transportation and energy constraints, demonstrating superior convergence stability and solution quality [18]. To handle the increasing complexity of practical distributed environments, subsequent studies extended DFJSP to uncertain and dynamic scheduling scenarios. Cai et al. examined a DFJSP with multiprocessor tasks and proposed a dynamic scrubbing frog leap algorithm that integrates global and neighborhood searches to minimize makespan [19]. Li et al. designed an improved multi-objective cuckoo search algorithm for a hydraulic cylinder manufacturing system, focusing on machine efficiency while excluding human-related factors [20]. Zhu et al. investigated dynamic DFJSP scenarios and developed an enhanced meme-inspired algorithm employing a hierarchical encoding scheme to jointly reduce makespan and energy consumption [21]. With the advancement of intelligent optimization, learning-driven approaches have gained increasing popularity in DFJSP research. Zhang et al. introduced a Q-learning-based hyper-heuristic evolutionary algorithm for DFJSP with crane transportation, where reinforcement learning dynamically selects effective heuristics to balance exploration and diversity. Explored DFJSP under more realistic and complex production settings [22]. Zhang et al. formulated a DFJSP considering preventive maintenance and transportation operations, and proposed a learning-driven cooperative artificial bee colony algorithm that combines Q-learning with heuristic-based local search [23]. In addition, Zhu et al. developed a reformative memetic algorithm to handle order cancellations in DFJSP, achieving competitive results while improving resource utilization and scheduling feasibility [24].

Recently, energy-efficient scheduling has become another key research direction in DFJSP. Li et al. designed a surprisingly popular-based adaptive memetic algorithm that simultaneously minimizes makespan and energy consumption by employing popularity feedback for operator selection [25]. Their approach achieves superior performance on benchmark instances compared with conventional memetic frameworks. Similarly, Cao et al. confirmed that integrating adaptive energy-saving mechanisms can significantly enhance both optimization robustness and environmental performance in distributed manufacturing systems [26].

However, existing studies on DFJSP have rarely integrated energy efficiency and job priority, leaving a notable gap between theoretical research and practical production needs.

2.2. Optimization Algorithms

Optimization algorithms for scheduling problems have evolved significantly, transitioning from traditional exact optimization methods to advanced metaheuristic and learning-driven techniques to cope with increasing problem complexity and scale. Early studies primarily employed classical optimization approaches such as branch and bound, mixed-integer programming, and constraint programming to generate optimal solutions for small-scale shop scheduling problems. Ashour and Hiremath developed a branch-and-bound algorithm for the classical job shop scheduling problem, which effectively obtained optimal solutions for moderate-scale instances but exhibited rapid growth in computational complexity for larger problems [27]. Brucker et al. proposed an enhanced branch-and-bound algorithm that efficiently solved several long-standing benchmark instances of the job shop scheduling problem [28]. These methods provided rigorous mathematical formulations and guaranteed optimality; however, they rapidly became computationally intractable as problem size and system complexity increased. To handle large-scale scheduling instances, heuristic and dispatching rules were developed to obtain near-optimal solutions efficiently by incorporating domain experience into priority-based sequencing [29,30]. Despite their computational simplicity, these heuristics often suffered from limited global search capability and poor adaptability to dynamic or distributed environments.

Metaheuristic algorithms have emerged as the dominant approach for solving complex scheduling problems. Algorithms such as genetic algorithms (GA), particle swarm optimization (PSO), simulated annealing (SA), differential evolution (DE), and ant colony optimization (ACO) have been widely applied to the shop scheduling problem. For example, Cheng et al. provided a comprehensive survey on hybrid GA for job shop scheduling, highlighting the effectiveness of integrating traditional heuristics with evolutionary search to enhance convergence and solution quality [31]. Aydin and Fogarty proposed a parallel SA algorithm within a multi-agent framework for job shop scheduling, achieving competitive performance on benchmark instances while improving computational efficiency through distributed execution [32]. Yagmahan and Yenisey applied an ACO algorithm to a multi-objective flow shop scheduling problem, demonstrating that swarm-based search can effectively balance multiple objectives such as makespan, flow time, and machine idle time [33]. Nouiri et al. developed a distributed PSO algorithm for the flexible job shop scheduling problem, effectively addressing machine assignment and operation sequencing while demonstrating strong performance across benchmark instances [34]. Gao et al. addressed the fuzzy job shop scheduling problem by enhancing the DE algorithm with a novel selection mechanism, achieving superior solution quality compared with other swarm-based metaheuristics [35]. These population-based methods imitate natural or physical processes to explore the search space effectively, balancing exploitation and exploration to improve solution quality. Subsequently, hybrid metaheuristics that integrate multiple strategies have been proposed to further enhance robustness and convergence speed. For example, Engin and Güçlü proposed a hybrid ACO algorithm incorporating crossover and mutation mechanisms for the no-wait flow shop scheduling problem, achieving improved makespan performance across benchmark instances compared with existing heuristic methods [36]. Torres-Tapia et al. developed a hybrid algorithm that integrates ACO with iterated local search for solving the flexible job shop scheduling problem, effectively balancing diversification and intensification to achieve competitive makespan results on benchmark instances [37]. Nevertheless, most of these algorithms rely heavily on parameter tuning and problem-specific heuristic design, which constrains their generalization capability and reduces their responsiveness in dynamic manufacturing systems.

With the advancement of artificial intelligence, learning-driven optimization methods have become an emerging research frontier in complex scheduling. Reinforcement learning (RL) and deep reinforcement learning (DRL) frameworks have been introduced to learn scheduling policies through repeated interactions with dynamic production environments. Wu et al. developed a DRL approach based on proximal policy optimization with hybrid prioritized experience replay to address dynamic job shop scheduling with uncertain processing times, achieving improved convergence speed and scheduling performance in uncertain environments [38]. Jing et al. proposed a graph-based multi-agent RL framework with a centralized-learning and decentralized-execution structure to solve flexible job shop scheduling, achieving superior performance in complex and dynamic scenarios [39]. Wang et al. analyzed DRL models for job shop scheduling and summarized common design patterns that guide model development and evaluation [40].

To clearly identify the research gap and position our study within the existing literature, Table 1 summarizes the key characteristics of recent related studies on distributed scheduling problems. As shown in the table, while DFJSP and energy-aware scheduling have been studied individually, few works simultaneously address energy consumption and job priority within the DFJSP framework. Furthermore, regarding the solution methodology, although learning-based approaches are gaining popularity, most existing methods lack the integration of domain knowledge to guide the learning process specifically for multi-objective optimization. This study fills these gaps by proposing the EA-DFJSP-JP model and the knowledge-guided D2QN-COEA algorithm.

As illustrated in Table 1, although Li et al. considered both energy and priority, their work focused on the DHFSP, which differs significantly from the flexible job shop structure handled in our work [9]. Consequently, the proposed EA-DFJSP-JP model and the knowledge-guided D2QN-COEA provide a novel contribution by addressing the complexity of job priorities and energy efficiency in a more flexible distributed manufacturing environment.

To overcome these limitations, knowledge-guided optimization has emerged as a promising direction that combines data-driven learning with domain-specific expertise. By embedding scheduling rules patterns into learning frameworks, such methods improve convergence stability, learning efficiency, and interpretability. This motivation underlies the development of the proposed D2QN-COEA, which integrates domain knowledge and co-evolutionary mechanisms to enhance optimization robustness and energy-aware decision making in distributed scheduling environments.

3. Mathematical Model

3.1. Definitions and Assumptions

The EA-DFJSP-JP is defined as follows. A set of jobs

J = {1,2, \dots, n}

is to be processed in a distributed manufacturing system consisting of

F

factories

F = {1,2, \dots, F}

. Each factory

f

includes a set of machines

M_{f}

. Each job

i \in J

comprises

n_{i}

operations, denoted by

O_{i} = {O_{i 1}, O_{i 2}, \dots, O_{i n_{i}}}

. The operations of each job must follow a strict precedence order, where an operation can start only after the completion of its predecessor. Each operation

O_{i j}

can be processed on one of several eligible machines, and the processing times differ across machines, representing the flexibility characteristic of the problem. The scheduling decision process involves three hierarchical levels. At the first level, each job must be assigned to exactly one factory, and all its operations must be executed within that factory, capturing the distributed feature of the system. At the second level, each operation is assigned to an appropriate machine within the selected factory. At the third level, the processing sequence of all operations on each machine is determined. Each job

i

is associated with fixed attributes including a priority weight

w_{i}

, a discrete priority level L_i, and a due date

d_{i}

. It is assumed that these priority parameters are static and determined by customer importance prior to scheduling. A higher priority weight indicates greater significance in the objective function, and delayed completion of such jobs results in larger penalties. Energy consumption is considered in two states: processing and idle. Machines consume processing power during operation and idle power during non-productive periods between operations. Appropriate assignment and sequencing decisions can effectively reduce idle time, thereby lowering total energy consumption.

To clearly articulate the decision hierarchy without expanding the graphical content, we explicitly define the workflow as follows. The system operates on three levels. First, the factory assignment level receives customer orders (with priorities w_i) and allocates them to specific factories. Second, within the assigned factory, the machine selection level determines the processing resource. Finally, the operation sequencing level arranges the execution order. This hierarchical structure ensures that the objective of minimizing weighted tardiness is directly addressed by the sequencing decisions guided by our priority-based operators.

The EA-DFJSP-JP is formulated as a bi-objective optimization problem. The first objective minimizes the total weighted tardiness, while the second minimizes total energy consumption, which includes both processing and idle energy. These two objectives are inherently conflicting: increasing production speed often raises energy consumption, whereas reducing energy usage may extend job completion times. The goal is to identify Pareto-optimal schedules that balance production efficiency and energy performance under given technological and resource constraints.

The problem formulation is based on the following assumptions:

(1): Each machine can process at most one operation at any given time.
(2): Once an operation begins, it cannot be interrupted or transferred to another machine.
(3): The operations of each job must strictly follow the predefined technological order and cannot be processed in parallel.
(4): Once a job is assigned to a factory, all its operations must be completed within that factory without cross-factory processing.
(5): Transportation time between factories is negligible or included in the processing time.

3.2. Objectives and Constraints

This bi-objective optimization problem is formulated to minimize two conflicting objectives, where the first objective is to minimize the total weighted tardiness, as defined in Equation (1).

\min f_{1} = T W T = \sum_{i \in J} w_{i} \cdot T_{i} = \sum_{i \in J} w_{i} \cdot \max {0, C_{i} - d_{i}}

(1)

The second objective is to minimize the total energy consumption, as defined in Equation (2).

\min f_{2} = E_{total} = E_{proc} + E_{idle}

(2)

The total energy consumption consists of two main components: processing energy and idle energy. The processing energy is computed as the total energy consumed by all machines during operation execution, as defined in Equation (3). The idle energy represents the energy consumed during the idle periods between consecutive operations and is obtained by multiplying the idle duration by the corresponding idle power, as defined in Equation (4). The mechanism for energy saving is explicitly embedded in the minimization of idle energy (

E_{idle}

), as shown in Equation (4). Since processing energy (

E_{proc}

) is a fixed constant for a given set of operations, the optimization algorithm focuses exclusively on compressing the idle time gaps between consecutive operations. By adjusting the operation sequence, the algorithm effectively ‘squeezes’ these gaps, thereby reducing the total duration the machines spend in the standby state.

E_{proc} = \sum_{i \in J} \sum_{j \in O_{i}} \sum_{f \in F} \sum_{m \in M_{i j}^{f}} \sum_{p \in P_{f m}} P T_{i j f m} \cdot X_{i j f m p} \cdot P E_{f m}^{proc}

(3)

E_{idle} = \sum_{f \in F} \sum_{m \in M_{f}} \sum_{p \in P_{f m}^{'}} (S_{f m (p + 1)} - S_{f m p} - \sum_{i \in J} \sum_{j \in O_{i}} X_{i j f m p} \cdot P T_{i j f m}) \cdot P E_{f m}^{idle}

(4)

subjective to

\sum_{f \in F} Y_{i f} = 1, \forall i \in J

(5)

\sum_{m \in M_{i j}^{f}} \sum_{p \in P_{f m}} X_{i j f m p} = Y_{i f}, \forall i \in J, \forall j \in O_{i}, \forall f \in F

(6)

\sum_{f \in F} \sum_{m \in M_{i j}^{f}} \sum_{p \in P_{f m}} X_{i j f m p} = 1, \forall i \in J, \forall j \in O_{i}

(7)

\sum_{i \in J} \sum_{j \in O_{i}} X_{i j f m p} \geq \sum_{i \in J} \sum_{j \in O_{i}} X_{i j f m (p + 1)}, \forall f \in F, \forall m \in M_{f}, \forall p \in P_{f m}^{'}

(8)

S T_{i (j + 1)} \geq C T_{i j}, \forall i \in J, \forall j \in O_{i}^{'}

(9)

\sum_{i \in J} \sum_{j \in O_{i}} X_{i j f m p} \leq 1, \forall f \in F, \forall m \in M_{f}, \forall p \in P_{f m}

(10)

S T_{i j} + P T_{i j f m} \cdot X_{i j f m} \leq S T_{h g} + M (3 - Y_{i j h g} - X_{i j f m} - X_{h g f m}) \forall i, h \in J (i < h), \forall j \in O_{i}, \forall g \in O_{h}, \forall f \in F, \forall m \in M_{i j}^{f} \cap M_{h g}^{f}

(11)

S T_{h g} + P T_{h g f m} \cdot X_{h g f m} \leq S T_{i j} + M (2 + Y_{i j h g} - X_{i j f m} - X_{h g f m}) \forall i, h \in J (i < h), \forall j \in O_{i}, \forall g \in O_{h}, \forall f \in F, \forall m \in M_{i j}^{f} \cap M_{h g}^{f}

(12)

S_{f m p} \geq S T_{i j} - (1 - X_{i j f m p}) \cdot M, \forall i \in J, \forall j \in O_{i}, \forall f \in F, \forall m \in M_{i j}^{f}, \forall p \in P_{f m}

(13)

S_{f m p} \leq S T_{i j} + (1 - X_{i j f m p}) \cdot M, \forall i \in J, \forall j \in O_{i}, \forall f \in F, \forall m \in M_{i j}^{f}, \forall p \in P_{f m}

(14)

C T_{i j} = S T_{i j} + \sum_{f \in F} \sum_{m \in M_{i j}^{f}} P T_{i j f m} \cdot X_{i j f m}, \forall i \in J, \forall j \in O_{i}

(15)

C_{i} = C T_{i n_{i}}, \forall i \in J

(16)

T_{i} \geq C_{i} - d_{i}, \forall i \in J

(17)

T_{i} \geq 0, \forall i \in J

(18)

S T_{i j} \geq 0, C T_{i j} \geq 0, S_{f m p} \geq 0, \forall i \in J, \forall j \in O_{i}, \forall f \in F, \forall m \in M_{f}, \forall p \in P_{f m}

(19)

Y_{i f}, X_{i j f m p}, X_{i j f m}, Y_{i j h g} \in {0,1}

(20)

Constraints (5) and (6) ensure proper job-factory allocation. Each job is assigned to exactly one factory and all its operations are executed within the same facility. Constraint (7) guarantees that every operation is uniquely assigned to a specific factory-machine-position combination, whereas Constraint (8) enforces the sequential utilization of positions on each machine, preventing idle gaps between consecutive operations. Constraint (9) preserves the technological precedence among operations of the same job. Each subsequent operation can start only after its predecessor has been completed. Constraint (10) restricts each machine position to process no more than one operation at a time, while Constraints (11) and (12) eliminate temporal overlaps among operations assigned to the same machine. Constraints (13) and (14) link the start times of operations with the corresponding machine positions, ensuring temporal consistency across all assignments. Constraint (15) defines the completion time of each operation as the sum of its start time and processing duration, and Constraint (16) specifies that the job completion time equals that of its final operation. Constraints (17) and (18) define job tardiness as the nonnegative difference between the completion time and the due date. Constraints (19) and (20) guarantee non-negativity for all time-related variables, and Constraint (21) imposes binary restrictions on all assignment and sequencing decision variables.

4. The Proposed D2QN-COEA

4.1. Encoding and Decoding

To represent feasible schedules for the EA-DFJSP-JP, the proposed D2QN-COEA employs a three-layer encoding scheme that captures the hierarchical decision structure of distributed scheduling. The three layers correspond to the Operation Sequence (OS), Factory Assignment (FA), and Machine Selection (MS). This representation separates sequencing, factory allocation, and machine selection decisions, thereby enhancing search flexibility and maintaining feasibility during optimization. The operation sequence (OS) is expressed as an integer vector of length

\sum_{i = 1}^{n} n_{i}

, where each element denotes a job identifier. Job

i

appears

n_{i}

times, corresponding to its

n_{i}

operations. The order of elements determines the global processing sequence of all operations across factories. The factory assignment (FA) is an integer vector of length

n

, in which the

i

-th element indicates the factory to which job

i

is assigned. All operations of a job must be executed within the same factory, ensuring compliance with the distributed manufacturing structure. The machine selection (MS) vector, whose length equals that of the OS, specifies the processing machine for each operation in the given sequence. Each selected machine must belong to the eligible machine set

M_{i j}

associated with the job and the assigned factory.

To demonstrate the encoding mechanism, consider a simplified example with three jobs and two factories, each containing three machines. Job 1 consists of three operations, while Jobs 2 and 3 include two operations each, yielding seven operations in total. The job parameters and partial processing times are shown in Table 2 and Table 3.

A feasible encoding can be expressed as

O S = [2, 1, 3, 1, 2, 3, 1], F A = [1, 2, 1], M S = [1, 1, 2, 3, 1, 2, 2] .

This representation implies that Jobs 1 and 3 are assigned to Factory 1, whereas Job 2 is assigned to Factory 2. The processing sequence follows the OS vector, producing the following operation order: Job 2–O1 (F2–M1), Job 1–O1 (F1–M1), Job 3–O1 (F1–M2), Job 1–O2 (F1–M3), Job 2–O2 (F2–M1), Job 3–O2 (F1–M2), and Job 1–O3 (F1–M2). The decoding process translates this encoding into a feasible schedule. Operations are processed sequentially according to the OS. For the

k

-th position, the algorithm determines the corresponding operation based on the job ID and its occurrence count, retrieves the factory from FA, and assigns the machine from the MS vector. The start time of each operation equals the maximum of the completion time of its predecessor and the availability time of the assigned machine, while the completion time equals the start time plus the processing duration. The final decoded scheduling results are summarized in Table 4, which lists all operation assignments, their start and completion times, and associated machine utilization.

From Table 4, the completion times of Jobs 1, 2, and 3 are 16, 7, and 8, respectively, all meeting their due dates with a total weighted tardiness of zero. Assuming that all machines consume 10 kW during processing and 2 kW when idle, the total processing energy consumption is

(3 + 5 + 3 + 6 + 4 + 5 + 5) \times 10 = 310 kWh .

During the idle interval of Machine 2 in Factory 1 (from time 8 to 11), an additional

3 \times 2 = 6 kWh

is consumed, yielding a total energy consumption of

316 kWh

.

4.2. Algorithm Framework

The proposed D2QN-COEA integrates the global exploration capability of co-evolutionary algorithms with the adaptive decision-making mechanism of deep reinforcement learning to iteratively approximate the Pareto-optimal front for the EA-DFJSP-JP. Conceptually, the algorithm employs population-based evolutionary operators to explore the solution space, while a D2QN adaptively selects local search operators to enhance exploitation in promising regions. Furthermore, an energy-aware adjustment mechanism is incorporated to reduce total energy consumption, and an elite archive is maintained to preserve the set of non-dominated solutions throughout the optimization process.

Step 1. Population initialization. The algorithm begins by generating an initial population of size pop_size using a hybrid initialization strategy that combines random generation, priority-based heuristics, and due-date-oriented heuristics. Each individual is decoded to evaluate the objective functions, including total weighted tardiness and total energy consumption. All non-dominated solutions are stored in the elite archive (Archive), the D2QN network is initialized, and the number of function evaluations (NFEs) is set to the initial population size.

Step 2. Global exploration. Global exploration is conducted through evolutionary operations to enhance population diversity. Parent individuals are selected via a tournament mechanism, followed by crossover and mutation to generate offspring. Each offspring is decoded and evaluated to determine its objective values. This process is repeated until a complete offspring population (Offspring) of size pop_size is obtained, after which NFEs is updated accordingly.

Step 3. Environmental selection. The parent and offspring populations are merged, and fast non-dominated sorting is applied to classify individuals into Pareto dominance levels. Individuals are then selected sequentially from the highest-ranking fronts until pop_size individuals are retained, forming the next generation of the main population (P). This procedure ensures a balance between convergence pressure and population diversity.

Step 4. D2QN-guided local search. Once the experience replay buffer of the D2QN contains sufficient samples (i.e., ≥ batch_size), local search is executed on each solution in the elite archive. For every solution sol, a state vector is extracted to represent its scheduling characteristics. The D2QN determines the most appropriate local search operator based on the current state and applies it to generate a modified solution sol′. A reward is computed according to the improvement in solution quality, and the transition (state, action, reward, state′) is stored in the replay buffer for network training. The D2QN parameters are updated using the stored experiences, and if the obtained reward is positive, sol is replaced by sol′ in the archive.

Step 5. Archive maintenance and energy adjustment. The elite archive is updated by merging it with the current population and reapplying the non-dominated sorting procedure. Only the first Pareto front is retained to maintain elite convergence quality and diversity. An energy-adjustment operator is then applied to each archived solution to minimize idle power consumption without deteriorating scheduling performance.

The energy-adjustment strategy exploits the inherent scheduling flexibility of non-critical jobs to reduce idle energy consumption, as illustrated in Figure 1. Since the processing energy

E_{proc}

remains constant for a given set of operations, the optimization explicitly focuses on minimizing idle energy

E_{idle}

through operation repositioning, without affecting the makespan. The strategy first identifies critical jobs, defined as those whose completion times reach or fall within 5% of the makespan. These jobs constitute the critical path and therefore cannot be delayed without deteriorating scheduling performance. In contrast, non-critical jobs possess sufficient slack time, allowing their start times to be postponed while preserving the original makespan. As shown in Figure 1a, the initial schedule contains fragmented idle gaps distributed across machines, particularly in the middle of the production timeline. Such dispersed idle periods lead to inefficient energy usage, as machines remain in standby mode between consecutive operations. The proposed energy-adjustment operator mitigates this inefficiency by right-shifting non-critical jobs to eliminate intermediate idle gaps. Specifically, in Figure 1b, the non-critical job J2 (highlighted in green) is postponed from its original start time at

t = 0

to

t = 4

, thereby consolidating the idle time on Machine M2 from a mid-timeline gap into a single contiguous block at the beginning of the schedule. These repositioning yields two key benefits. First, idle time is concentrated into continuous periods, enabling practical energy-saving actions such as delayed machine startup or temporary shutdown. Second, the reduction in frequent start–stop cycles lowers transition-related energy overhead. Although the total duration of idle time remains unchanged, this structural reorganization facilitates tangible energy savings in real manufacturing environments. Concentrated idle periods at the beginning of a machine’s timeline allow for delayed activation, eliminating unnecessary warm-up energy, while consolidated idle blocks at the end permit earlier shutdown. Moreover, reducing fragmented idle gaps mitigates excessive power cycling, contributing to both improved energy efficiency and enhanced equipment longevity. Importantly, the original makespan is preserved (Makespan = 12 in Figure 1), ensuring that delivery deadlines are not compromised while achieving a more energy-efficient scheduling configuration.

Step 6. Termination condition. The iterative process continues until the number of function evaluations reaches the predefined limit (max_nfes). At termination, the elite archive (Archive) is output as the final approximation of the Pareto-optimal solution set.

The complete computational procedure is summarized below, with the algorithmic workflow and pseudo-code illustrated in Figure 2 and Algorithm 1, respectively.

Algorithm 1. Overall framework of D2QN-COEA

Input:

data: problem instance (jobs, factories, machines, etc.)

pop_size: population size

archive_size: elite archive size

max_nfes: maximum number of function evaluations

Output: Pareto-optimal solution set Archive

1: Archive ← UpdateArchive(∅, P)

2: DQN ← InitializeDoubleDQN(state_size, action_size)

3: while NFEs < max_nfes do

4: Offspring ← ∅

5: for i = 1 to pop_size do // Generate offspring population

6: parent₁ ← TournamentSelection(P)

7: parent₂ ← TournamentSelection(P)

8: child ← Crossover(parent₁, parent₂)

9: child ← Mutation(child)

10: Offspring ← Offspring ∪ {child}

11: P ← EnvironmentalSelection(P ∪ Offspring, pop_size)

12: if |DQN.memory| ≥ batch_size then // Execute local search if sufficient samples exist

13: state ← GetStateVector(sol) // Extract state vector

14: action ← DQN.SelectAction(state) // Select local search operator via D2QN

15: sol’ ← ApplyLocalSearch(sol, action) // Apply selected local search operator

16: reward ← CalculateReward(sol, sol’) // Compute reward based on improvement

17: state’ ← GetStateVector(sol’) // Extract new state vector

18: DQN.StoreExperience(state, action, reward, state’)

19: DQN.Train()

20: if reward > 0 then

21: sol ← sol’

22: for sol in Archive do

23: sol ← ApplyEnergySaving(sol)

24: Archive ← UpdateArchive(Archive, P)

4.3. Global Search Strategy

The global search strategy is designed within a multi-objective evolutionary optimization framework, employing selection, crossover, and mutation operators to explore the solution space and generate a diverse set of candidate solutions. The fundamental purpose of this strategy is to maintain a balance between convergence quality and population diversity, thereby preventing premature convergence to local optima. The execution procedure of the global search phase is described as follows.

Step 1. Parent selection. A tournament selection mechanism based on Pareto dominance is employed to select parent individuals from the current population. A subset of individuals is randomly sampled to form a competition pool, and non-dominated solutions within this pool are identified. If several non-dominated individuals exist, one is randomly selected; otherwise, a random solution is chosen from the pool. This selection process preserves selection pressure while maintaining population diversity, preventing the algorithm from focusing excessively on a single objective.

Step 2. Crossover operation. Two selected parents undergo crossover to produce an offspring solution. Given the three-layer encoding structure of the EA-DFJSP-JP, the crossover is conducted independently on the operation sequence (OS), factory assignment (FA), and machine selection (MS) layers. The offspring inherits the OS from one parent to preserve the relative processing order of operations. For factory assignment, a uniform crossover operator is applied, where each job inherits its factory assignment from either parent with equal probability. Since modifications in factory assignment may lead to infeasible machine selections, the MS vector of the offspring is regenerated according to the updated OS and FA. Each operation attempts to retain the parent’s machine selection; if this is infeasible, a feasible machine is randomly selected from the available set within the assigned factory.

Step 3. Mutation operation. Mutation is applied to the offspring to introduce stochastic variations and enhance population diversity. The three layers of the encoding are modified independently. In the OS layer, two randomly selected positions are swapped to alter the processing order of operations. In the FA layer, a randomly chosen job is reassigned to another factory, and the MS vector is regenerated to maintain feasibility. In the MS layer, a randomly selected operation is reassigned to another available machine within the same factory. The FA and MS mutations are executed in a mutually exclusive manner to avoid redundant modifications.

Step 4. Offspring evaluation. Each offspring is decoded into a feasible schedule, and its objective values, including total weighted tardiness and total energy consumption, are calculated. The overall procedure of the global search strategy is summarized in Algorithm 2.

Algorithm 2. Global search strategy

Input:

P: current population

pop_size: population size

Output: Offspring: offspring population

1: Offspring ← ∅ // Initialize the offspring population

2: for i = 1 to pop_size do // Generate offspring individuals

3: competitors ← RandomSample(P, k) // Randomly select k competitors

4: parent₁ ← SelectNonDominated(competitors) // Select the first parent based on Pareto dominance

5: competitors ← RandomSample(P, k) // Randomly select another set of competitors

6: parent₂ ← SelectNonDominated(competitors) // Select the second parent

7: child ← CreateEmptySolution() // Create a new offspring solution

8: child.OS ← parent₁.OS // Inherit the operation sequence from parent 1

9: for j = 1 to n_jobs do // Perform uniform crossover for factory assignment

10: if Random() < 0.5 then

11: child.FA[j] ← parent₁.FA[j]

12: else

13: child.FA[j] ← parent₂.FA[j]

14: child.MS ← RegenerateMachineSelection(child) / Regenerate machine selection based on updated FA

15: if Random() < p_m then // Apply operation sequence mutation

16: i, j ← RandomSampleTwo(1, |child.OS|)

17: Swap(child.OS[i], child.OS[j]) // Swap two operations

18: if Random() < p_m then // Reassign factory

19: idx ← RandomInt(1, n_jobs)

20: child.FA[idx] ← RandomInt(1, n_factories) // Ensure feasibility by regenerating machine selection

21: child.MS ← RegenerateMachineSelection(child)

22: else if Random() < p_m then

23: idx ← RandomInt(1, |child.MS|)

24: job_id, op_idx ← GetOperationInfo(child, idx)

25: factory_id ← child.FA[job_id]

26: available ← GetAvailableMachines(job_id, op_idx, factory_id)

27: child.MS[idx] ← RandomChoice(available) // Randomly select a feasible machine

28: Decode(child) // Decode and evaluate offspring

29: Offspring ← Offspring ∪ {child} // Add offspring to population

30: return Offspring

4.4. Local Search Operator Design

The local search operators are specifically designed according to the structural characteristics of the EA-DFJSP-JP, aiming to enhance solution quality by refining the processing order of critical jobs. Crucially, apart from the weighted objective function, the job priority level (L_i) is explicitly used to guide the structural transformation of the schedule. The priority-based operators (LS3 and LS4) enforce reordering mechanisms that structurally advance high-priority jobs in the operation sequence and adjust their machine assignments accordingly. Four distinct local search operators are developed, and the D2QN network adaptively selects one for execution based on the current solution state. Each operator focuses on rescheduling the most critical job with the largest tardiness to minimize total weighted tardiness while preserving the feasibility of the schedule.

LS1: Due-date-based swap operator

This operator identifies the job

j_{c}

with the maximum tardiness and locates the position of its first operation in the operation sequence. It then searches for another job

j_{o}

that satisfies either

d_{j_{o}} < d_{j_{c}}

or

(d_{j_{o}} = d_{j_{c}} and L_{j_{o}} > L_{j_{c}})

, where

d

denotes the due date and

L

represents the job priority level. The two operations are then swapped to advance the processing of the critical job. This mechanism helps reduce tardiness by prioritizing jobs with earlier due dates or higher importance levels.

LS2: Due-date-based insertion operator

Similar to LS1, this operator identifies the tardiest job

j_{c}

and removes its first operation from the current sequence. The operation is then inserted before another job

j_{o}

that meets the condition

d_{j_{o}} < d_{j_{c}}

or

(d_{j_{o}} = d_{j_{c}} and L_{j_{o}} > L_{j_{c}})

. Compared with the swap operator, the insertion operation causes a smaller perturbation to the scheduling structure while still improving the completion timeliness of critical jobs. It is particularly suitable for fine-tuning solutions that are already close to local optima.

LS3: Priority-based swap operator

This operator focuses on job priorities rather than due dates. It identifies the tardiest job

j_{c}

and searches for another job

j_{o}

satisfying

L_{j_{o}} > L_{j_{c}}

or

(L_{j_{o}} = L_{j_{c}} and d_{j_{o}} < d_{j_{c}})

. The selected jobs exchange their positions in the operation sequence. This operator promotes fairness among jobs by allowing low-priority tardy jobs to yield scheduling positions to those with higher priority, effectively reducing total weighted tardiness without compromising balance among objectives.

LS4: Priority-based insertion operator

This operator also targets the tardiest job

j_{c}

but adopts an insertion mechanism similar to LS2. The first operation of

j_{c}

is moved and inserted before another job

j_{o}

that satisfies

L_{j_{o}} > L_{j_{c}}

or

(L_{j_{o}} = L_{j_{c}} and d_{j_{o}} < d_{j_{c}})

. The priority-based insertion operator combines the advantages of priority orientation and minimal structural disturbance, enabling efficient local improvements in both objectives.

All four operators are designed to refine the schedule around the job with the highest tardiness, directly targeting the reduction in total weighted tardiness. The D2QN network adaptively selects among these operators according to the current state representation of the solution, achieving an effective integration of problem-specific knowledge and reinforcement learning-based decision-making.

4.5. Double Deep Q-Network Architecture

To clarify the technical realization of the knowledge-guided strategy, we explicitly define the source, integration, and contribution of domain knowledge as follows:

Source of knowledge: The domain knowledge is derived from classical scheduling rules, specifically involving job due dates and priority levels, which are critical indicators of job urgency and importance.

Form of integration: Knowledge is integrated into the learning framework through action space design rather than the reward function. Unlike traditional DRL approaches that use atomic actions (e.g., assigning a job to a machine), the action space in our D2QN consists of four knowledge-encapsulated local search operators (LS1–LS4) defined in Section 4.4. The agent learns to select the most appropriate heuristic rule for the current state. Additionally, domain knowledge is used in population initialization to provide a high-quality starting point for the evolutionary process.

Contribution to learning: This knowledge-guided design significantly reduces the search space dimensionality and avoids the “cold start” problem common in pure reinforcement learning. By learning to manage high-level heuristics instead of low-level movements, the agent achieves faster convergence and improved solution feasibility. It is noted that the reward function remains purely objective-driven (based on TWT and TEC improvement) to ensure the agent optimizes the true performance metrics without bias.

In the D2QN-COEA framework, the D2QN serves as the decision-making module that adaptively selects the most effective local search operator according to the current state of the solution. The D2QN is trained under a reinforcement learning framework, continuously interacting with the optimization environment to learn which operator yields the greatest improvement under varying scheduling conditions. Figure 3 illustrates the overall methodological framework of the proposed D2QN-COEA, including problem modeling, state–action–reward design, DQN training and inference, scheduling decision generation, and solution evaluation.

State: The state space is designed to characterize the essential features of the current scheduling solution and to provide sufficient information for the decision-making process of the D2QN. Each state vector

s

consists of two components, representing positional and factory-related information, with a total dimension of

2 n

:

s = [s^{p o s}, s^{f a c}]

. Here,

s^{p o s}

denotes the position feature vector, and

s^{f a c}

represents the factory assignment feature vector. Specifically, the

i

-th element of

s^{p o s}

indicates the normalized position of job

i

’s first operation in the operation sequence, while the

i

-th element of

s^{f a c}

corresponds to the normalized factory index assigned to job

i

. The state vector as defined in Equation (21).

s_{i}^{p o s} = \frac{p o s (i)}{\sum_{j = 1}^{n} n_{j} - 1}, s_{i}^{f a c} = \frac{f_{i} - 1}{F - 1}

(21)

where

p o s (i)

represents the first occurrence position of job

i

in the sequence,

f_{i}

denotes the factory assigned to job

i

, and

F

is the total number of factories. The positional features capture the relative scheduling priority of jobs, while the factory features reflect the distribution of workloads across factories.

Action: The action space

A = {a_{1}, a_{2}, a_{3}, a_{4}}

contains four discrete actions corresponding to the four local search operators (LS1–LS4) introduced in Section 4.4. Each output neuron of the D2QN represents the Q-value associated with one operator. During training, the network learns to predict the expected cumulative reward for each action given the current state, enabling adaptive operator selection.

Reward: The reward function is designed to guide the learning process toward actions that yield the greatest improvement in scheduling performance. The total reward

r

comprises three components:

r = r_{T W T} + r_{T E C} + r_{b o n u s}

. Where

r_{T W T}

represents the improvement in total weighted tardiness,

r_{T E C}

represents the improvement in total energy consumption, and

r_{b o n u s}

provides an additional reward when both objectives are simultaneously improved.

The individual components are defined as Equations (22)–(24).

r_{T W T} = \{\begin{matrix} α \cdot \frac{T W T_{o l d} - T W T_{n e w}}{T W T_{o l d} + ε}, & if T W T_{n e w} < T W T_{o l d} \\ - α \cdot \frac{T W T_{n e w} - T W T_{o l d}}{T W T_{o l d} + ε}, & otherwise \end{matrix}

(22)

r_{T E C} = \{\begin{matrix} β \cdot \frac{T E C_{o l d} - T E C_{n e w}}{T E C_{o l d} + ε}, & if T E C_{n e w} < T E C_{o l d} \\ - β \cdot \frac{T E C_{n e w} - T E C_{o l d}}{T E C_{o l d} + ε}, & otherwise \end{matrix}

(23)

r_{b o n u s} = \{\begin{matrix} γ, & if T W T_{n e w} < T W T_{o l d} and T E C_{n e w} < T E C_{o l d} \\ 0, & otherwise \end{matrix}

(24)

Here,

α

and

β

are weighting coefficients,

γ

is the bonus reward, and

ε

is a small constant to prevent division by zero.

Network Architecture: The D2QN employs a fully connected feedforward neural network structure consisting of one input layer, three hidden layers, and one output layer. The input layer receives the

2 n

-dimensional state vector. The hidden layers contain 256, 128, and 64 neurons, respectively, with ReLU activation functions applied to introduce nonlinearity. The output layer contains four neurons, each corresponding to one Q-value in the action space, and uses a linear activation function. The network parameters are optimized using the Adam optimizer, and the mean squared error (MSE) is adopted as the loss function to minimize the difference between the predicted and target Q-values.

Double DQN mechanism: To mitigate the overestimation bias commonly observed in traditional DQN models, the proposed algorithm incorporates a Double DQN structure consisting of two networks: the evaluation network

Q_{e v a l}

and the target network

Q_{t a r g e t}

. During training,

Q_{e v a l}

is used for action selection, while

Q_{t a r g e t}

is used to estimate target Q-values. Given a transition tuple

(s, a, r, s')

, the target Q-value is computed as Equation (25).

Q_{t a r g e t} (s, a) = r + γ \cdot Q_{t a r g e t} (s', a r g \underset{a'}{m a x Q_{e v a l}} (s', a'))

(25)

where

γ

is the discount factor. The evaluation network is updated by minimizing the mean squared error between the predicted and target Q-values, while the target network parameters are periodically synchronized with those of the evaluation network to ensure stable learning and convergence.

Experience replay mechanism: To eliminate the temporal correlation between consecutive samples, an experience replay mechanism is adopted. A replay memory with a fixed capacity

M

stores past transition tuples

(s, a, r, s')

. During each training iteration, a mini-batch of samples is randomly drawn from the memory to update the network. This mechanism improves data utilization efficiency and enhances the stability of the learning process.

ε-Greedy exploration strategy: To balance exploration and exploitation, the D2QN adopts an ε-greedy policy during action selection. At each decision step, a random action is selected with probability

ε

to encourage exploration, while the action with the highest Q-value is chosen with probability

1 - ε

to exploit the learned policy. As training progresses, the value of

ε

gradually decays, ensuring extensive exploration in the early stages and stable exploitation in later stages.

5. Experiment and Result Analysis

This section evaluates the effectiveness and superiority of the proposed D2QN-COEA algorithm through three groups of experiments. First, a set of orthogonal experiments is conducted to determine the optimal parameter configuration and analyze the sensitivity of key parameters to algorithm performance. Second, several comparative experiments are performed against classical multi-objective evolutionary algorithms, including NSGA-II [41], MOPSO [42], MOEA/D [43], and SPEA2, to validate the performance advantage of the proposed method in solving the EA-DFJSP-JP problem. Finally, an ablation study is carried out to assess the contribution of each major component, highlighting the respective roles of the Double DQN mechanism, the co-evolutionary strategy, and the energy-aware optimization scheme.

All experiments are implemented in Python 3.8 and executed on a 64-bit Windows 11 operating system equipped with an Intel(R) Core(TM) Ultra 7 155H processor (3.80 GHz) and 16 GB of RAM. To ensure statistical robustness, each experiment is independently repeated thirty times, and the average results are reported. The performance of all algorithms is comprehensively evaluated using four widely adopted multi-objective quality indicators, including hypervolume (HV), inverted generational distance (IGD), generational distance (GD), and spacing (SP), which collectively measure convergence accuracy, diversity preservation, and distribution uniformity of the obtained Pareto front.

5.1. Experimental Design

To comprehensively evaluate the performance of the proposed D2QN-COEA algorithm in solving the EA-DFJSP-JP, twenty-four benchmark instances are designed to cover small-, medium-, and large-scale scenarios. The instance configurations are derived from both classical benchmark settings in the literature and empirical data collected from real-world manufacturing enterprises, thereby ensuring both theoretical representativeness and practical relevance. Each instance is denoted using the format “F–M–N,” where F represents the number of factories, M the number of machines in each factory, and N the total number of jobs. For example, the instance “2–5–10” corresponds to a scheduling problem with two factories, each equipped with five machines, processing ten jobs in total. The parameter settings for instance generation are summarized as follows. (1) The number of factories

F \in {2, 3, 4}

reflects typical multi-factory collaborative manufacturing systems. (2) The number of machines per factory

M \in {5, 6, 8}

is determined according to standard shop-floor configurations. (3) The number of jobs

n \in {10, 20, 30, 40}

covers various production scales ranging from small-batch to large-batch manufacturing. (4) The number of operations per job

n_{i} \sim U [3, 8]

simulates the complexity of real-world process routes. The processing time of each operation is generated as

P T_{i j f m} \sim U [10, 50]

(in hours), based on statistical analysis of historical production data collected from a cooperative manufacturing enterprise. (5) The priority weight of each job

w_{i} \in {1.0, 1.5, 2.0, 2.5, 3.0}

represents the relative importance of customer orders. (6) The due date of job

i

is set as

d_{i} = α \cdot L B_{i}

, where

α \sim U [1.2, 1.8]

is calibrated to match the tight delivery constraints observed in real-world order fulfillment and

L B_{i}

denotes the lower bound of the job’s completion time, ensuring that due dates remain feasible yet challenging. (7) Machine power parameters are specified as

P E_{f m}^{p r o c} \sim U [8, 12]

kW for processing power and

P E_{f m}^{i d l e} \sim U [1.5, 2.5]

kW for idle power consumption, based on standard CNC machine specifications.

5.2. Sensitivity Analysis

The performance of D2QN-COEA in solving the EA-DFJSP-JP problem is significantly influenced by its parameter configuration. To identify the optimal parameter combination and analyze the effects of key parameters on algorithmic performance, an orthogonal experimental design was adopted. This approach enables efficient exploration of multiple factors simultaneously while minimizing the number of required experiments. Four key parameters were selected for analysis: population size, archive size, learning rate of the DQN, and exploration rate (ε). Each parameter was set at several discrete levels: population size ∈ {10, 20, 30, 40, 50}, archive size ∈ {25, 30, 40, 50, 65, 80}, learning rate ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01}, and exploration rate ∈ {0.0, 0.05, 0.1, 0.2, 0.3}. The orthogonal table was constructed to ensure balanced representation of parameter interactions, and each experimental configuration was independently executed five times on the 2–5–20 instance. The same evaluation metrics as introduced earlier were used to assess convergence, diversity, and uniformity of the Pareto front. The average results are summarized in Table 5, and the overall performance trends are illustrated in Figure 4. In Figure 4, the star symbols denote the optimal parameter values for each metric, while the shaded areas represent the standard deviation of the results across five independent runs, reflecting the stability of the algorithm under different parameter settings.

According to the analysis of the orthogonal results, the optimal parameter configuration was determined as population size = 20, archive size = 80, learning rate = 0.01, and exploration rate = 0.3. The reasoning for each parameter selection is summarized as follows.

(1): Population size. A population size of 20 yielded the highest overall performance (HV = 0.6859, IGD = 0.3303, GD = 0.2595, SP = 0.0022). Although a population of 10 achieved a comparable HV value, its convergence stability was weaker. When the size exceeded 20, all metrics deteriorated significantly, with HV decreasing to 0.4306 at size 50. Hence, 20 was chosen as the most efficient and stable configuration.
(2): Archive size. An archive size of 80 achieved the best performance (HV = 0.6561, IGD = 0.3503, GD = 0.3266). Although SP slightly increased (0.0135), the improvement in convergence and diversity outweighed the marginal loss in distribution uniformity. Increasing archive capacity effectively preserved elite solutions and enhanced population diversity.
(3): Learning rate. A learning rate of 0.01 provided the highest HV (0.6422) and the lowest GD (0.3269). While IGD was slightly better at 0.005, the HV improvement was more substantial, suggesting that a higher learning rate facilitates faster convergence and better adaptation in policy learning. Therefore, 0.01 was selected as the optimal setting.
(4): Exploration rate. An exploration rate (ε) of 0.3 produced the best trade-off between exploration and exploitation, achieving superior results in HV (0.6413), IGD (0.3694), and GD (0.3194). Although SP was marginally higher, the gains in convergence and robustness were dominant. In contrast, low or greedy exploration (ε ≤ 0.05) led to premature convergence, verifying the importance of moderate stochastic exploration for maintaining search diversity.

To evaluate the robustness of D2QN-COEA under different problem characteristics and system configurations, comprehensive sensitivity analyses were conducted with respect to due-date tightness and energy-related parameters, which commonly vary in real-world distributed manufacturing environments.

Three due-date tightness scenarios were examined by adjusting the tightness factor

α

in the due-date definition

d_{i} = α \cdot L B_{i}

. Specifically, a tight scenario (

α \in [1.0, 1.3]

) represents highly urgent orders with minimal slack time, a medium scenario (

α \in [1.2, 1.8]

) serves as the baseline reflecting typical industrial conditions, and a loose scenario (

α \in [2.0, 2.5]

) corresponds to relaxed delivery requirements. Six representative instances covering small to extra-large problem scales were selected: 2-3-20, 2-5-40, 3-4-40, 3-6-60, 4-5-100, and 5-6-160. Each scenario–instance combination was independently executed 30 times, and the average values of four performance metrics were recorded.

The experimental results are summarized in Table 6. Under tight due-date conditions, D2QN-COEA exhibited moderately degraded but still competitive performance, with an average HV of 1.2684 and corresponding GD and IGD values of 0.0865 and 0.1047, respectively. This performance degradation is expected, as stringent due dates substantially reduce the feasible solution space and limit flexibility in balancing tardiness minimization and energy efficiency.

Conversely, under loose due-date conditions, performance improved consistently across all metrics. As shown in Table 6, the average HV increased to 1.3892, while GD and IGD decreased to 0.0647 and 0.0775, respectively. The expanded feasible region allows the algorithm to explore a broader range of trade-off solutions that balance delivery performance and energy consumption more effectively. Notably, the spacing metric reported in Table 6 remained highly stable across all three scenarios, varying only between 0.0006 and 0.0010. This stability indicates that D2QN-COEA maintains a uniform distribution of Pareto-optimal solutions regardless of due-date tightness, which can be attributed to the archive maintenance strategy and the co-evolutionary framework. Importantly, although absolute performance values varied across scenarios, the relative superiority of D2QN-COEA over baseline algorithms remained consistent. Even under the most restrictive tight scenario, D2QN-COEA outperformed the second-best baseline by 15.3% in HV and achieved reductions of 62.7% and 58.4% in GD and IGD, respectively. Wilcoxon rank-sum tests confirmed that all pairwise comparisons were statistically significant across all due-date scenarios (

p < 0.01

).

To assess robustness with respect to variations in energy consumption parameters, sensitivity analyses were performed by uniformly scaling both processing power and idle power. Three scenarios were considered: a low-power scenario (−10%), a baseline scenario, and a high-power scenario (+10%), as defined in Section 5.1. The same six representative instances were tested, with 30 independent runs for each configuration. The results are presented in Table 7. When both processing and idle power were reduced by 10%, the average HV decreased marginally to 1.3104, while GD and IGD increased slightly to 0.0801 and 0.0963, respectively. These minor changes are primarily caused by compression of the energy objective range, which reduces the effective space for exploring trade-offs between tardiness and energy consumption. Nevertheless, convergence quality remained substantially superior to that achieved by all baseline algorithms under standard conditions.

In the high-power scenario, performance metrics remained stable, with average HV, GD, and IGD values of 1.3476, 0.0724, and 0.0881, respectively. The expanded energy objective range facilitates clearer differentiation among trade-off solutions, resulting in slightly improved convergence behavior. Across all power configurations, the spacing metric reported in Table 7 remained nearly unchanged, further confirming robust diversity preservation. An additional experiment examined sensitivity to the idle-to-processing power ratio, which directly affects the potential for energy savings through idle time consolidation. Three ratio configurations were tested while maintaining comparable total energy consumption. Although absolute energy values varied across ratios, the structure of the Pareto fronts and convergence behavior remained stable. Across all configurations, HV values deviated by less than 2.8% from the baseline, while D2QN-COEA consistently maintained superiority margins exceeding 200% in HV and 75% in GD and IGD compared with baseline algorithms. Overall, the sensitivity analyses confirm the robustness of D2QN-COEA under variations in both problem-specific and system-level parameters. Across all tested configurations, the minimum observed performance advantage relative to the second-best baseline algorithm was 13.1% in HV improvement, 56.3% in GD reduction, and 54.2% in IGD reduction.

Statistical significance was verified using Wilcoxon rank-sum tests with Bonferroni correction, with all adjusted

p

-values below 0.001. Effect size analysis using Cohen’s

d

yielded values ranging from 1.18 to 2.94, indicating consistently large practical significance. Furthermore, core algorithmic mechanisms remained stable under parameter perturbations. The Double DQN–based operator selection mechanism exhibited coefficients of variation below 12%, indicating robust decision-making across scenarios. The co-evolutionary framework maintained effective diversity preservation, with spacing variation below 14.3%. The energy-adjustment strategy consistently reduced idle energy consumption by an average of 24.3%, confirming its effectiveness across diverse energy configurations. In summary, these results demonstrate that D2QN-COEA is robust to variations in due-date constraints and energy parameters, thereby validating its practical applicability in real-world distributed manufacturing environments.

5.3. Comparison with Other Algorithms

To comprehensively evaluate the performance of the proposed D2QN-COEA, four classical multi-objective evolutionary algorithms, namely NSGA-II, MOPSO, MOEA/D, and SPEA2, were selected for comparative experiments. These algorithms have been extensively applied in multi-objective optimization and are widely recognized for their robustness and representativeness. To ensure the fairness of comparison, all algorithms adopted the same encoding and decoding scheme as well as identical constraint-handling mechanisms. The population size and the maximum number of function evaluations were kept consistent across all algorithms. The experiments were conducted on twenty-four benchmark instances of the EA-DFJSP-JP problem, covering small, medium, and large-scale configurations. Each algorithm was independently executed five times on every instance, and the average and standard deviation of four evaluation indicators, including HV, IGD, GD, and Spacing, were recorded. The detailed comparative results are reported in Table 8, Table 9, Table 10 and Table 11, where the best values for each instance are highlighted in bold. Figure 5 presents the boxplot comparison of the four indicators among the five algorithms.

The comparative results clearly demonstrate the superior performance of D2QN-COEA across almost all instances and evaluation metrics. Regarding convergence measured by the GD metric, D2QN-COEA achieved the best results in twenty-three out of twenty-four instances, outperforming the other algorithms by a considerable margin. The reduction in GD was more than half compared with the second-best algorithm on average, indicating significantly enhanced convergence capability. Even in large-scale instances, D2QN-COEA maintained stable and low GD values, reflecting its strong ability to converge efficiently in complex search spaces. In contrast, MOPSO generally produced larger GD values, revealing its lower search efficiency when solving discrete optimization problems such as job shop scheduling.

For the IGD metric, which simultaneously reflects convergence and diversity, D2QN-COEA again achieved dominant performance on most test instances. Its average IGD value was substantially lower than that of the comparative algorithms, demonstrating a closer approximation to the true Pareto front. Moreover, the standard deviation of D2QN-COEA was the smallest among all algorithms, indicating its robustness and high repeatability. This stability benefits from the adaptive learning mechanism of the Double Deep Q-Network, which dynamically selects local search operators based on the current state of solutions, allowing the algorithm to consistently maintain high-quality Pareto sets under different problem conditions.

For the Spacing metric, D2QN-COEA obtained the best or nearly best results in more than half of the benchmark cases, while the remaining results were still close to optimal. Its average Spacing value was markedly smaller than that of the other algorithms, which implies that the obtained Pareto fronts were more uniformly distributed with less variation among neighboring solutions. Such balanced distributions are particularly valuable in multi-objective decision-making because they provide decision-makers with a well-dispersed set of trade-off solutions. The improvement in distribution uniformity mainly arises from the cooperative effect between the elite archive updating mechanism and the energy-aware adjustment strategy, which together maintain structural balance and diversity in the solution set.

The comparison of the HV metric further verifies the comprehensive superiority of the proposed algorithm. D2QN-COEA achieved the highest HV values in the vast majority of instances, with only minor differences in a few cases compared with NSGA-II. Its average HV value was significantly higher than those of the comparative algorithms, reflecting better convergence and diversity simultaneously. The performance advantage of D2QN-COEA became even more evident in large-scale problems, where it maintained rapid convergence and high-quality Pareto fronts despite the growing search space. This result highlights that the combination of co-evolutionary global exploration and knowledge-guided reinforcement learning effectively enhances the optimization capability of the algorithm.

From the perspective of stability and robustness, D2QN-COEA consistently exhibited smaller standard deviations across all indicators, confirming its reliability under repeated independent runs. For instance, the variance of its GD values was much lower than that of other algorithms, demonstrating excellent consistency in convergence performance. Such stability is of practical importance for manufacturing scheduling, as it ensures that the optimization results remain reliable and reproducible under different computational conditions. Overall, D2QN-COEA significantly outperformed traditional multi-objective evolutionary algorithms in convergence accuracy, diversity preservation, and solution distribution. By integrating the adaptive local search capability of the Double Deep Q-Network, the global exploration ability of the co-evolutionary framework, and the energy-aware optimization mechanism, the proposed algorithm achieves a superior balance between efficiency and solution quality, thereby validating its effectiveness and advancement for solving the EA-DFJSP-JP problem.

5.4. Case Study

To further evaluate the practical applicability of the proposed D2QN-COEA, a case study was conducted based on a representative medium-scale distributed manufacturing scenario. The case involves two cooperative factories, each equipped with five CNC machines, responsible for completing the production of eighty customer orders. Each order has a specific priority level and delivery deadline, and the number of operations for each job ranges from three to eight. The scheduling objective is to simultaneously minimize the total weighted tardiness and the total energy consumption while satisfying all technological and precedence constraints.

The optimization results of the five algorithms for this case are summarized in Table 12, where the mean and standard deviation values of four performance metrics (HV, GD, IGD, and Spacing) are reported. The best results for each indicator are highlighted in bold. As shown in Table 10, D2QN-COEA consistently achieved the best performance across all metrics. Its HV value was significantly higher than those of the other algorithms, exceeding the second-best method by more than 15%, which demonstrates its superior overall optimization capability. Regarding convergence metrics, both GD and IGD values obtained by D2QN-COEA were markedly lower than those of the comparative algorithms, indicating faster and more stable convergence toward the true Pareto front. In terms of solution distribution, D2QN-COEA achieved the smallest Spacing value, revealing that its solutions are more uniformly distributed in the objective space and offer a better balance between conflicting objectives. Furthermore, the standard deviations of all metrics were relatively small, confirming the stability and consistency of the proposed algorithm across multiple independent runs.

A representative Pareto-optimal schedule obtained by D2QN-COEA is illustrated in Figure 6, which shows the Gantt chart of one typical solution. The selected solution achieves a well-balanced trade-off between total weighted tardiness and total energy consumption. As observed from the Gantt chart, production loads are evenly distributed between the two factories, and machine utilization is compact and well-organized, ensuring that all precedence and resource constraints are satisfied. These results demonstrate that D2QN-COEA can effectively generate high-quality scheduling solutions in complex distributed manufacturing environments, confirming its practicality and superiority for real-world applications of the EA-DFJSP-JP problem.

6. Conclusions

This study proposed the EA-DFJSP-JP, which integrates energy efficiency and job prioritization within a distributed manufacturing framework. A bi-objective optimization model was formulated to minimize total weighted tardiness and total energy consumption, considering both processing and idle power usage. To effectively solve this NP-hard problem, a D2QN-COEA was developed. The proposed approach embeds domain knowledge into a deep reinforcement learning framework to enable adaptive operator selection, while a co-evolutionary strategy enhances global exploration and convergence stability.

Comprehensive experiments on 24 benchmark instances and a real-world case study demonstrated that D2QN-COEA consistently outperforms classical multi-objective evolutionary algorithms in terms of convergence accuracy, diversity preservation, and energy efficiency. The algorithm achieves superior hypervolume values, lower convergence distances, and more uniform Pareto front distributions, confirming its robustness and scalability for large-scale distributed scheduling problems. The case analysis further verified its practical applicability, showing balanced workload distribution and significant reductions in tardiness and energy consumption across cooperative factories. Quantitatively, the proposed algorithm achieves an average improvement of over 15% in HV and a reduction of more than 50% in GD, indicating significantly superior convergence accuracy and solution quality. The validity of these findings is subject to the boundary conditions of the proposed EA-DFJSP-JP model. Specifically, the results are based on the assumptions that (1) total energy consumption comprises only processing and idle power, (2) cross-factory processing of a single job is prohibited once assigned, and (3) inter-factory transportation time is considered negligible or integrated into processing durations.

In summary, integrating domain knowledge with deep reinforcement learning provides an effective pathway toward intelligent, energy-efficient decision-making in distributed manufacturing systems. Future research will focus on extending the proposed framework to stochastic and dynamic environments, incorporating real-time energy feedback and digital twin technologies to further enhance adaptability and industrial applicability.

Author Contributions

Conceptualization, J.-B.S.; methodology, J.-B.S.; software, J.-B.S.; validation, J.-B.S.; formal analysis, J.-B.S.; resources, Z.-Y.L.; data curation, C.-Q.G.; writing—review and editing, Z.-Y.L.; visualization, C.-Q.G.; supervision, Z.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding, and the APC was funded by the corresponding author’s research funds.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Sets and Indices
i, h	job indices, i, h ∈ {1, 2, …, n}
j, g	operation indices
f	factory index, f ∈ {1, 2, …, F}
m	machine index
l	priority level index, l ∈ {1, 2, …, L}
p	position index on machine
J	set of jobs, J = {1, 2, …, n}
O_i	set of operations for job i, O_i = {O_i₁, O_i₂, …, O_{in_i}}
O’_i	set of first n_i− operations of job i
F	set of factories, F = {1, 2, …, F}
M_f	set of machines in factory f
$M_{i j}^{f}$	set of alternative machines for operation O_ij in factory f
P_L	set of jobs with priority level L
P_fm	set of positions on machine m in factory f
$P_{f m}^{'}$	Set of first \|P_fm\|−1 positions on machine m in factory f
Parameters
n	total number of jobs
L	total number of priority levels
n_i	total number of operations in job i
L_i	priority level of job i, L_i ∈ {1, 2, …, L}
w_i	priority weight of job i (larger weight indicates higher priority)
d_i	due date of job i
F	total number of factories
M_f	total number of machines in factory f
K	total number of machine types
PT_ijfm	processing time of operation O_ij on machine m in factory f
${P E}_{f m}^{proc}$	processing power of machine m in factory f
${P E}_{f m}^{idle}$	idle power of machine m in factory f
M	a sufficiently large positive number
Variables
X_ijfmp	binary variable; equals 1 if operation O_ij is assigned to position p of machine m in factory f; 0 otherwise
X_ijfm	binary variable; equals 1 if operation O_ij is assigned to machine m in factory f; 0 otherwise
Y_ijhg	binary variable; equals 1 if operation O_ij precedes operation O_hg on the same machine; 0 otherwise (i < h)
Y_if	binary variable; equals 1 if job i is assigned to factory f; 0 otherwise
ST_ij	start time of operation O_ij
CT_ij	completion time of operation O_ij
S_fmp	start time of position p on machine m in factory f
C_i	completion time of job i
T_i	tardiness of job i, T_i = max{0, C_i − d_i}
E_total	total energy consumption
E_proc	total processing energy consumption
E_idle	total idle energy consumption
TWT	total weighted tardiness

References

Liu, C.; Nie, Q. A blockchain-based LLM-driven energy-efficient scheduling system towards distributed multi-agent manufacturing scenario of new energy vehicles within the circular economy. Comput. Ind. Eng. 2025, 201, 110889. [Google Scholar] [CrossRef]
Luo, L.; Yan, X. Scheduling of stochastic distributed hybrid flow-shop by hybrid estimation of distribution algorithm and proximal policy optimization. Expert Syst. Appl. 2025, 271, 126523. [Google Scholar] [CrossRef]
Yuan, M.; Lu, S.; Zheng, L.; Yu, Q.; Pei, F.; Gu, W. Distributed heterogeneous flexible job-shop scheduling problem considering automated guided vehicle transportation via improved deep Q network. Swarm Evol. Comput. 2025, 94, 101902. [Google Scholar] [CrossRef]
Samhouri, M.; Qareish, S.Z. Hybrid fuzzy genetic algorithm for the integration of process planning and scheduling for distributed flexible job shop. Neural Comput. Appl. 2025, 37, 2775–2798. [Google Scholar] [CrossRef]
Fu, Y.; Gao, K.; Wang, L.; Huang, M.; Liang, Y.C.; Dong, H. Scheduling stochastic distributed flexible job shops using an multi-objective evolutionary algorithm with simulation evaluation. Int. J. Prod. Res. 2025, 63, 86–103. [Google Scholar] [CrossRef]
Wang, C.; Wei, M.; Liu, Q.; Zhang, X.; Li, X. An improved adaptive hybrid algorithm for solving distributed flexible job shop scheduling problem. Swarm Evol. Comput. 2025, 94, 101873. [Google Scholar] [CrossRef]
Zhao, F.; Du, Y.; Zhuang, C.; Wang, L.; Yu, Y. An iterative greedy algorithm for solving a multiobjective distributed assembly flexible job shop scheduling problem with fuzzy processing time. IEEE Trans. Cybern. 2025, 55, 2302–2315. [Google Scholar] [CrossRef]
Aslan, Ş. Energy-aware scheduling in flow shops: A novel artificial neural network-driven multi-objective optimization. Eng. Optim. 2025, 57, 333–360. [Google Scholar] [CrossRef]
Li, R.; Gong, W.; Wang, L.; Lu, C.; Pan, Z.; Zhuang, X. Double DQN-based coevolution for green distributed heterogeneous hybrid flowshop scheduling with multiple priorities of jobs. IEEE Trans. Autom. Sci. Eng. 2023, 21, 6550–6562. [Google Scholar] [CrossRef]
Koulamas, C.; Kyparisis, G.J. Flow shop scheduling with two distinct job due dates. Comput. Ind. Eng. 2022, 163, 107835. [Google Scholar] [CrossRef]
Dauzère-Pérès, S.; Ding, J.; Shen, L.; Tamssaouet, K. The flexible job shop scheduling problem: A review. Eur. J. Oper. Res. 2024, 314, 409–432. [Google Scholar] [CrossRef]
Ogunfowora, O.; Najjaran, H. Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization. J. Manuf. Syst. 2023, 70, 244–263. [Google Scholar] [CrossRef]
Fattahi, P.; Saidi Mehrabad, M.; Jolai, F. Mathematical modeling and heuristic approaches to flexible job shop scheduling problems. J. Intell. Manuf. 2007, 18, 331–342. [Google Scholar] [CrossRef]
Pezzella, F.; Morganti, G.; Ciaschetti, G. A genetic algorithm for the flexible job-shop scheduling problem. Comput. Oper. Res. 2008, 35, 3202–3212. [Google Scholar] [CrossRef]
Zhang, G.; Gao, L.; Shi, Y. An effective genetic algorithm for the flexible job-shop scheduling problem. Expert Syst. Appl. 2011, 38, 3563–3573. [Google Scholar] [CrossRef]
Wocker, M.M.; Ostermeier, F.F.; Wanninger, T.; Zwinkau, R.; Deuse, J. Flexible job shop scheduling with preventive maintenance consideration. J. Intell. Manuf. 2024, 35, 1517–1539. [Google Scholar] [CrossRef]
Xie, J.; Li, X.; Gao, L.; Gui, L. A hybrid genetic tabu search algorithm for distributed flexible job shop scheduling problems. J. Manuf. Syst. 2023, 71, 82–94. [Google Scholar] [CrossRef]
Du, Y.; Li, J.Q.; Luo, C.; Meng, L.L. A hybrid estimation of distribution algorithm for distributed flexible job shop scheduling with crane transportations. Swarm Evol. Comput. 2021, 62, 100861. [Google Scholar] [CrossRef]
Cai, J.; Zhou, R.; Lei, D. Dynamic shuffled frog-leaping algorithm for distributed hybrid flow shop scheduling with multiprocessor tasks. Eng. Appl. Artif. Intell. 2020, 90, 103540. [Google Scholar] [CrossRef]
Li, X.; Zhao, Q.; Tang, H.; Li, X. Joint scheduling optimization method for machining and heat treatment of hydraulic cylinders based on an improved multi-objective migrating birds optimization. J. Manuf. Syst. 2024, 73, 170–191. [Google Scholar] [CrossRef]
Zhu, K.; Gong, G.; Peng, N.; Zhao, X. Dynamic distributed flexible job shop scheduling considering operation inspection. Expert Syst. Appl. 2023, 224, 119840. [Google Scholar] [CrossRef]
Zhang, Z.Q.; Wu, F.C.; Qian, B.; Hu, R.; Wang, L.; Jin, H.P. A Q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation. Expert Syst. Appl. 2023, 234, 121050. [Google Scholar] [CrossRef]
Zhang, Z.; Fu, Y.; Gao, K.; Pan, Q.; Huang, M. A learning-driven multi-objective cooperative artificial bee colony algorithm for distributed flexible job shop scheduling problems with preventive maintenance and transportation operations. Comput. Ind. Eng. 2024, 196, 110484. [Google Scholar] [CrossRef]
Zhu, N.; Gong, G.; Lu, D.; Huang, D.; Peng, N.; Qi, H. An effective reformative memetic algorithm for distributed flexible job-shop scheduling problem with order cancellation. Expert Syst. Appl. 2024, 237, 121205. [Google Scholar] [CrossRef]
Li, R.; Gong, W.; Wang, L.; Lu, C.; Zhuang, X. Surprisingly popular-based adaptive memetic algorithm for energy-efficient distributed flexible job shop scheduling. IEEE Trans. Cybern. 2023, 53, 8013–8023. [Google Scholar] [CrossRef]
Cao, S.; Li, R.; Gong, W.; Lu, C. Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling. Swarm Evol. Comput. 2023, 83, 101419. [Google Scholar] [CrossRef]
Ashour, S.; Hiremath, S.R. A branch-and-bound approach to the job-shop scheduling problem. Int. J. Prod. Res. 1973, 11, 47–58. [Google Scholar] [CrossRef]
Brucker, P.; Jurisch, B.; Sievers, B. A branch and bound algorithm for the job-shop scheduling problem. Discret. Appl. Math. 1994, 49, 107–127. [Google Scholar] [CrossRef]
Jayamohan, M.S.; Rajendran, C. New dispatching rules for shop scheduling: A step forward. Int. J. Prod. Res. 2000, 38, 563–586. [Google Scholar] [CrossRef]
Kaban, A.K.; Othman, Z.; Rohmah, D.S. Comparison of dispatching rules in job-shop scheduling problem using simulation: A case study. Int. J. Simul. Model. 2012, 11, 129–140. [Google Scholar] [CrossRef]
Cheng, R.; Gen, M.; Tsujimura, Y. A tutorial survey of job-shop scheduling problems using genetic algorithms—I. Representation. Comput. Ind. Eng. 1996, 30, 983–997. [Google Scholar] [CrossRef]
Aydin, M.E.; Fogarty, T.C. A simulated annealing algorithm for multi-agent systems: A job-shop scheduling application. J. Intell. Manuf. 2004, 15, 805–814. [Google Scholar] [CrossRef]
Yagmahan, B.; Yenisey, M.M. Ant colony optimization for multi-objective flow shop scheduling problem. Comput. Ind. Eng. 2008, 54, 411–420. [Google Scholar] [CrossRef]
Nouiri, M.; Bekrar, A.; Jemai, A.; Niar, S.; Ammari, A.C. An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. J. Intell. Manuf. 2018, 29, 603–615. [Google Scholar] [CrossRef]
Gao, D.; Wang, G.G.; Pedrycz, W. Solving fuzzy job-shop scheduling problem using DE algorithm improved by a selection mechanism. IEEE Trans. Fuzzy Syst. 2020, 28, 3265–3275. [Google Scholar] [CrossRef]
Engin, O.; Güçlü, A. A new hybrid ant colony optimization algorithm for solving the no-wait flow shop scheduling problems. Appl. Soft Comput. 2018, 72, 166–176. [Google Scholar] [CrossRef]
Torres-Tapia, W.; Montoya-Torres, J.R.; Ruiz-Meza, J. A hybrid algorithm based on ant colony system for flexible job shop. In Workshop on Engineering Applications; Springer Nature: Cham, Switzerland, 2022; pp. 198–209. [Google Scholar]
Wu, X.; Yan, X.; Guan, D.; Wei, M. A deep reinforcement learning model for dynamic job-shop scheduling problem with uncertain processing time. Eng. Appl. Artif. Intell. 2024, 131, 107790. [Google Scholar] [CrossRef]
Jing, X.; Yao, X.; Liu, M.; Zhou, J. Multi-agent reinforcement learning based on graph convolutional network for flexible job shop scheduling. J. Intell. Manuf. 2024, 35, 75–93. [Google Scholar] [CrossRef]
Wang, S.; Li, J.; Jiao, Q.; Ma, F. Design patterns of deep reinforcement learning models for job shop scheduling problems. J. Intell. Manuf. 2025, 36, 3741–3759. [Google Scholar] [CrossRef]
Sun, X.; Wang, Y.; Kang, H.; Shen, Y.; Chen, Q.; Wang, D. Modified multi-crossover operator nsga-iii for solving low carbon flexible job shop scheduling problem. Processes 2020, 9, 62. [Google Scholar] [CrossRef]
Xu, Z.; Ning, X.; Li, R.; Wan, X.; Zhao, C. Configuration optimization of a shell-and-tube heat exchanger with segmental baffles based on combination of NSGAII and MOPSO embedded grouping cooperative coevolution strategy. Processes 2023, 11, 3094. [Google Scholar] [CrossRef]
Wu, Z.; Liu, H.; Zhao, J.; Li, Z. An improved MOEA/D algorithm for the solution of the multi-objective optimal power flow problem. Processes 2023, 11, 337. [Google Scholar] [CrossRef]

Figure 1. Decomposition of the energy consumption model.

Figure 2. Flowchart of the D2QN-COEA algorithm.

Figure 3. Framework of the proposed D2QN-COEA for EA-DFJSP-JP.

Figure 4. Trends of key parameter effects on the performance of D2QN-COEA.

Figure 5. Boxplot comparison of four performance indicators among five algorithms.

Figure 6. Gantt chart of a representative Pareto-optimal solution obtained by D2QN-COEA.

Table 1. Comparison of this study with related works in the literature.

Reference	Problem Type	Objective(s)	Energy Aware	Job Priority	Methodology	Knowledge-Guided
[17]	DFJSP	Makespan	No	No	Hybrid GA-tabu search	No
[25]	DFJSP	Makespan, Energy	Yes	No	Popularity-based memetic Algorithm	No
[21]	Dynamic DFJSP	Makespan, Energy	Yes	No	Meme-inspired Algorithm	No
[23]	DFJSP	Makespan, Stability	No	No	Cooperative artificial bee colony + Q-learning	Partial
[9]	DHFSP	Makespan, Energy	Yes	Yes	Double DQN-based coevolution	No
This study	DFJSP	TWT, Energy	Yes	Yes	D2QN-COEA	Yes

Table 2. Parameters of the example instance.

Job ID	Number of Operations	Priority Weight	Due Date
1	3	2.0	30
2	2	1.5	25
3	2	1.0	20

Table 3. Processing times of operations on different machines.

Job-Operation	F1-M1	F1-M2	F1-M3	F2-M1	F2-M2
1-1	5	6	7	4	5
1-2	4	5	6	3	4
1-3	6	5	7	5	4
2-1	3	4	5	3	3
2-2	5	6	4	4	5
3-1	4	3	5	3	4
3-2	6	5	6	5	4

Table 4. Decoded scheduling results.

Sequence	Job ID	Operation Index	Factory Index	Machine Index	Processing Time	Start Time	Completion Time
1	2	1	2	1	3	0	3
2	1	1	1	1	5	0	5
3	3	1	1	2	3	0	3
4	1	2	1	3	6	5	11
5	2	2	2	1	4	3	7
6	3	2	1	2	5	3	8
7	1	3	1	2	5	11	16

Table 5. Results of the orthogonal experiment for parameter analysis of D2QN-COEA.

No.	Population Size	Archive Size	Learning Rate	Exploration Rate	HV	IGD	GD	Spacing
1	10	30	0.0010	0.10	0.6739	0.3563	0.2881	0.0038
2	20	30	0.0010	0.10	0.6859	0.3303	0.2595	0.0022
3	30	25	0.0010	0.10	0.6008	0.3985	0.3434	0.0016
4	30	30	0.0001	0.10	0.6106	0.407	0.347	0.0057
5	30	30	0.0005	0.10	0.6363	0.3816	0.3338	0.0031
6	30	30	0.0010	0.00	0.4958	0.4906	0.4405	0.0037
7	30	30	0.0010	0.05	0.498	0.5065	0.4631	0.0058
8	30	30	0.0010	0.10	0.603	0.4117	0.3586	0.0058
9	30	30	0.0010	0.20	0.5771	0.4163	0.3693	0.004
10	30	30	0.0010	0.30	0.6413	0.3694	0.3194	0.0062
11	30	30	0.0050	0.10	0.6382	0.3759	0.3112	0.0015
12	30	30	0.0100	0.10	0.6422	0.3601	0.3269	0.0044
13	30	40	0.0010	0.10	0.6082	0.3925	0.3405	0.0048
14	30	50	0.0010	0.10	0.6129	0.3814	0.3375	0.0019
15	30	65	0.0010	0.10	0.5902	0.4119	0.3635	0.0034
16	30	80	0.0010	0.10	0.6561	0.3503	0.3266	0.0135
17	40	30	0.0010	0.10	0.5536	0.455	0.4159	0.0082
18	50	30	0.0010	0.10	0.4306	0.5967	0.5587	0.0083

Table 6. Performance comparison under different due-date tightness scenarios.

Instance	Scenario	HV	GD	IGD	Spacing
2-3-20	Tight	1.1523	0.1847	0.2134	0.0007
	Medium	1.1945	0.1629	0.2078	0.0019
	Loose	1.2418	0.1402	0.1823	0.0006
2-5-40	Tight	1.2634	0.0986	0.0945	0.0008
	Medium	1.3120	0.0901	0.0881	0.0008
	Loose	1.3785	0.0774	0.0756	0.0007
3-4-40	Tight	1.3512	0.0245	0.0812	0.0022
	Medium	1.4104	0.0197	0.0724	0.0022
	Loose	1.4728	0.0168	0.0635	0.0021
3-6-60	Tight	1.2387	0.0823	0.0758	0.0007
	Medium	1.2856	0.0699	0.0682	0.0007
	Loose	1.3452	0.0589	0.0597	0.0006
4-5-100	Tight	1.3128	0.0478	0.0623	0.0002
	Medium	1.3708	0.0406	0.0546	0.0002
	Loose	1.4315	0.0342	0.0471	0.0002
5-6-160	Tight	1.2920	0.0810	0.0951	0.0001
	Medium	1.2950	0.0870	0.0863	0.0001
	Loose	1.3654	0.0722	0.0742	0.0001
Average	Tight	1.2684	0.0865	0.1047	0.0008
	Medium	1.3301	0.0749	0.0911	0.0010
	Loose	1.3892	0.0647	0.0775	0.0007

Table 7. Performance comparison under different energy parameter configurations.

Instance	Scenario	HV	GD	IGD	Spacing
2-3-20	Low Power	1.1823	0.1724	0.2156	0.0019
	Baseline	1.1945	0.1629	0.2078	0.0019
	High Power	1.2052	0.1587	0.2014	0.0018
2-5-40	Low Power	1.2967	0.0945	0.0912	0.0008
	Baseline	1.3120	0.0901	0.0881	0.0008
	High Power	1.3286	0.0865	0.0847	0.0007
3-4-40	Low Power	1.3912	0.0215	0.0756	0.0022
	Baseline	1.4104	0.0197	0.0724	0.0022
	High Power	1.4287	0.0184	0.0698	0.0023
3-6-60	Low Power	1.2678	0.0734	0.0712	0.0007
	Baseline	1.2856	0.0699	0.0682	0.0007
	High Power	1.3021	0.0671	0.0653	0.0006
4-5-100	Low Power	1.3524	0.0434	0.0578	0.0002
	Baseline	1.3708	0.0406	0.0546	0.0002
	High Power	1.3901	0.0385	0.0519	0.0002
5-6-160	Low Power	1.2720	0.0955	0.0964	0.0001
	Baseline	1.2950	0.0870	0.0863	0.0001
	High Power	1.3309	0.0752	0.0754	0.0001
Average	Low Power	1.3104	0.0801	0.0963	0.0010
	Baseline	1.3301	0.0749	0.0911	0.0010
	High Power	1.3476	0.0724	0.0881	0.0010

Table 8. Comparative results of five algorithms based on the GD metric.

Instance	NSGA2		MOPSO		MOEAD		SPEA2		D2QN_COEA
Instance	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std
1-3-20	0.2219	0.1908	1.0401	0.1078	0.5392	0.1311	0.2738	0.1051	0.2097	0.1003
1-4-30	0.4708	0.1739	1.2305	0.0754	0.6380	0.1490	0.3883	0.0925	0.0994	0.0674
1-5-40	0.4067	0.0553	1.1872	0.0487	0.5716	0.0572	0.2966	0.1747	0.0204	0.0267
1-5-50	0.3548	0.0967	1.1755	0.0929	0.5425	0.0301	0.2153	0.0119	0.0267	0.0269
2-3-20	0.3302	0.1375	0.9763	0.1137	0.5640	0.1892	0.2317	0.1637	0.1629	0.1454
2-4-30	0.4864	0.0326	1.1308	0.1105	0.4720	0.0902	0.1940	0.0891	0.0375	0.0323
2-5-40	0.5911	0.2517	1.2230	0.0252	0.4768	0.0834	0.2641	0.1195	0.0901	0.0796
2-5-50	0.6210	0.2402	1.1693	0.0319	0.5215	0.0591	0.2468	0.1361	0.0590	0.0810
2-5-56	0.4153	0.0440	1.0520	0.0805	0.4356	0.0823	0.2120	0.0646	0.0348	0.0415
2-5-70	0.5660	0.2928	0.8522	0.0774	0.5198	0.1176	0.3548	0.1539	0.0624	0.0689
2-6-60	0.4333	0.0449	1.2308	0.0336	0.5037	0.0489	0.2307	0.0586	0.0731	0.0456
2-6-100	0.6043	0.1321	1.2682	0.0174	0.5497	0.0425	0.2506	0.1007	0.0554	0.0701
3-3-30	0.4638	0.2320	0.9402	0.1327	0.4680	0.1736	0.2740	0.1770	0.1157	0.1397
3-4-40	0.4229	0.1301	1.1774	0.0804	0.5033	0.0931	0.1579	0.1417	0.0197	0.0203
3-4-50	0.4600	0.1592	1.1915	0.0494	0.4440	0.0295	0.2456	0.1281	0.0581	0.0936
3-5-50	0.3910	0.1141	1.0947	0.0920	0.4289	0.1279	0.2627	0.1137	0.0595	0.0692
3-5-56	0.5105	0.2760	1.1711	0.0752	0.5378	0.1073	0.3393	0.0729	0.1382	0.0892
3-5-80	0.4855	0.0914	1.1798	0.0519	0.4052	0.0202	0.1701	0.0101	0.0568	0.0709
3-6-60	0.7019	0.2130	1.2026	0.0300	0.4610	0.0460	0.2614	0.0793	0.0699	0.0581
3-6-100	0.4741	0.0410	1.2337	0.0451	0.5245	0.0362	0.2343	0.0334	0.0851	0.0590
4-5-100	0.3812	0.0760	1.1998	0.0785	0.3706	0.0275	0.2123	0.0932	0.0406	0.0529
4-6-120	0.6991	0.1802	1.3106	0.0404	0.4866	0.1246	0.1667	0.0802	0.0303	0.0308
4-6-140	0.5993	0.2030	1.3167	0.0371	0.5204	0.0351	0.2348	0.0591	0.0352	0.0351
5-6-160	0.5609	0.0713	1.3362	0.0472	0.4653	0.0931	0.2373	0.0681	0.0870	0.0597

Table 9. Comparative results of five algorithms based on the Spacing metric.

Instance	NSGA2		MOPSO		MOEAD		SPEA2		D2QN_COEA
Instance	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std
1-3-20	0.0010	0.0022	0.0066	0.0098	0.0073	0.0162	0.0055	0.0042	0.0000	0.0000
1-4-30	0.0005	0.0011	0.0014	0.0031	0.0010	0.0022	0.0026	0.0034	0.0013	0.0012
1-5-40	0.0001	0.0002	0.0041	0.0091	0.0000	0.0000	0.0022	0.0016	0.0005	0.0011
1-5-50	0.0009	0.0015	0.0014	0.0021	0.0029	0.0065	0.0012	0.0015	0.0004	0.0004
2-3-20	0.0037	0.0054	0.0027	0.0039	0.0066	0.0059	0.0058	0.0019	0.0019	0.0034
2-4-30	0.0000	0.0000	0.0194	0.0228	0.0000	0.0000	0.0022	0.0023	0.0007	0.0016
2-5-40	0.0000	0.0000	0.0034	0.0035	0.0009	0.0019	0.0041	0.0091	0.0008	0.0012
2-5-50	0.0005	0.0011	0.0083	0.0158	0.0018	0.0027	0.0009	0.0009	0.0003	0.0005
2-5-56	0.0000	0.0000	0.0130	0.0185	0.0000	0.0000	0.0019	0.0021	0.0011	0.0012
2-5-70	0.0000	0.0000	0.0050	0.0057	0.0222	0.0247	0.0064	0.0074	0.0003	0.0007
2-6-60	0.0004	0.0009	0.0013	0.0030	0.0000	0.0000	0.0013	0.0012	0.0018	0.0018
2-6-100	0.0006	0.0012	0.0000	0.0000	0.0002	0.0005	0.0001	0.0001	0.0002	0.0003
3-3-30	0.0005	0.0011	0.0234	0.0284	0.0359	0.0495	0.0048	0.0035	0.0010	0.0012
3-4-40	0.0007	0.0014	0.0009	0.0020	0.0041	0.0044	0.0032	0.0026	0.0022	0.0021
3-4-50	0.0005	0.0012	0.0053	0.0038	0.0010	0.0023	0.0025	0.0016	0.0003	0.0007
3-5-50	0.0012	0.0016	0.0090	0.0117	0.0072	0.0115	0.0027	0.0019	0.0000	0.0000
3-5-56	0.0015	0.0021	0.0078	0.0123	0.0202	0.0420	0.0033	0.0023	0.0002	0.0003
3-5-80	0.0000	0.0000	0.0007	0.0015	0.0000	0.0000	0.0017	0.0013	0.0002	0.0002
3-6-60	0.0046	0.0061	0.0051	0.0054	0.0004	0.0008	0.0024	0.0021	0.0007	0.0009
3-6-100	0.0001	0.0001	0.0000	0.0001	0.0018	0.0035	0.0006	0.0007	0.0002	0.0002
4-5-100	0.0004	0.0004	0.0010	0.0022	0.0000	0.0000	0.0012	0.0012	0.0002	0.0003
4-6-120	0.0018	0.0020	0.0043	0.0097	0.0009	0.0020	0.0012	0.0014	0.0001	0.0003
4-6-140	0.0001	0.0001	0.0000	0.0000	0.0005	0.0010	0.0007	0.0005	0.0001	0.0002
5-6-160	0.0025	0.0034	0.0008	0.0016	0.0000	0.0000	0.0007	0.0006	0.0001	0.0002

Table 10. Comparative results of five algorithms based on the IGD metric.

Instance	NSGA2		MOPSO		MOEAD		SPEA2		D2QN_COEA
Instance	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std
1-3-20	0.2198	0.1790	0.9672	0.0869	0.5352	0.1001	0.2682	0.1113	0.2252	0.1112
1-4-30	0.4342	0.1649	1.1953	0.0717	0.6358	0.1483	0.3834	0.0926	0.0970	0.0652
1-5-40	0.4269	0.0570	1.2228	0.0412	0.6145	0.0576	0.3190	0.1530	0.0567	0.0212
1-5-50	0.3499	0.0966	1.1661	0.1006	0.5494	0.0353	0.2212	0.0173	0.0438	0.0071
2-3-20	0.3344	0.1350	0.9922	0.0926	0.5815	0.1894	0.2704	0.1303	0.2078	0.1355
2-4-30	0.4806	0.0255	1.1126	0.1030	0.4652	0.0881	0.1901	0.0800	0.0479	0.0350
2-5-40	0.5545	0.1930	1.2012	0.0160	0.4762	0.0848	0.2440	0.1015	0.0881	0.0783
2-5-50	0.6311	0.2317	1.1753	0.0133	0.5410	0.0641	0.2706	0.1211	0.1066	0.0708
2-5-56	0.4362	0.0443	1.0583	0.0734	0.4684	0.0890	0.2310	0.0673	0.1054	0.0142
2-5-70	0.4449	0.1269	0.8284	0.0544	0.4581	0.1038	0.3180	0.1312	0.0803	0.0640
2-6-60	0.4300	0.0437	1.2134	0.0286	0.5056	0.0487	0.2286	0.0619	0.0727	0.0448
2-6-100	0.5866	0.1236	1.2442	0.0311	0.5500	0.0424	0.2498	0.1010	0.0547	0.0684
3-3-30	0.4602	0.2262	0.8982	0.1236	0.4318	0.1421	0.2805	0.1662	0.1364	0.1355
3-4-40	0.4896	0.1191	1.2341	0.0948	0.5563	0.0762	0.2050	0.1524	0.0724	0.0455
3-4-50	0.4778	0.1560	1.1848	0.0675	0.4698	0.0330	0.2550	0.1226	0.1074	0.0664
3-5-50	0.3846	0.1094	1.0874	0.0806	0.4225	0.1108	0.2509	0.1077	0.0645	0.0693
3-5-56	0.4954	0.2500	1.1464	0.0605	0.5296	0.0939	0.3319	0.0690	0.1374	0.0878
3-5-80	0.4806	0.0826	1.1647	0.0446	0.4093	0.0203	0.1693	0.0078	0.0805	0.0573
3-6-60	0.6565	0.1831	1.1643	0.0386	0.4532	0.0299	0.2575	0.0787	0.0682	0.0553
3-6-100	0.4753	0.0423	1.2260	0.0353	0.5243	0.0353	0.2362	0.0320	0.0869	0.0600
4-5-100	0.3782	0.0758	1.1840	0.0720	0.3743	0.0280	0.2103	0.0913	0.0546	0.0393
4-6-120	0.6892	0.1773	1.2939	0.0477	0.4885	0.1250	0.1665	0.0809	0.0369	0.0288
4-6-140	0.5908	0.1949	1.3066	0.0333	0.5182	0.0346	0.2332	0.0585	0.0368	0.0336
5-6-160	0.5560	0.0708	1.3318	0.0536	0.4653	0.0931	0.2351	0.0693	0.0863	0.0592

Table 11. Comparative results of five algorithms based on the HV metric.

Instance	NSGA2		MOPSO		MOEAD		SPEA2		D2QN_COEA
Instance	Mean	Std	Mean	Std	Mean	Std	Mean	Std	Mean	Std
1-3-20	1.6530	0.2930	0.5651	0.0821	1.1464	0.1598	1.5298	0.1769	1.6116	0.1450
1-4-30	0.9639	0.2043	0.1933	0.0507	0.6948	0.1703	1.0106	0.1394	1.4425	0.1047
1-5-40	0.9242	0.0838	0.1470	0.0178	0.6793	0.0622	1.0978	0.1973	1.5209	0.0500
1-5-50	0.9850	0.1306	0.1770	0.0561	0.7219	0.0485	1.1669	0.0258	1.4822	0.0321
2-3-20	1.0147	0.2198	0.2727	0.0552	0.7011	0.2103	1.1710	0.2811	1.1945	0.2253
2-4-30	0.7680	0.0459	0.1937	0.0624	0.8157	0.0994	1.1981	0.1334	1.4326	0.0612
2-5-40	0.6968	0.1824	0.1445	0.0107	0.7624	0.1077	1.0750	0.1455	1.3120	0.1272
2-5-50	0.5982	0.2121	0.1288	0.0098	0.6471	0.0723	1.0082	0.1663	1.2799	0.1386
2-5-56	0.7620	0.0494	0.1944	0.0381	0.7289	0.0968	1.0449	0.1003	1.3407	0.0707
2-5-70	0.7428	0.1542	0.3981	0.0165	0.7929	0.0970	0.9596	0.1688	1.2850	0.1280
2-6-60	0.7784	0.0481	0.1181	0.0168	0.6865	0.0568	1.0568	0.0904	1.2902	0.0742
2-6-100	0.6637	0.1456	0.1150	0.0188	0.6943	0.0490	1.0943	0.1492	1.3964	0.1121
3-3-30	0.9164	0.2751	0.3923	0.1126	0.9266	0.1635	1.1526	0.2956	1.3646	0.2368
3-4-40	0.7899	0.1499	0.1319	0.0396	0.7049	0.0932	1.1909	0.2247	1.4104	0.0578
3-4-50	0.7454	0.1664	0.1306	0.0337	0.7351	0.0558	1.0389	0.1743	1.3025	0.1479
3-5-50	0.8518	0.1401	0.1686	0.0513	0.8207	0.1074	1.0568	0.1567	1.3134	0.1159
3-5-56	0.7358	0.2691	0.1539	0.0324	0.6848	0.1048	0.9447	0.0833	1.2029	0.1419
3-5-80	0.7200	0.0944	0.1349	0.0200	0.7960	0.0270	1.1377	0.0177	1.3063	0.1196
3-6-60	0.5614	0.1627	0.1342	0.0239	0.7565	0.0253	1.0102	0.1078	1.2856	0.0862
3-6-100	0.7335	0.0538	0.1141	0.0165	0.6814	0.0374	1.0478	0.0455	1.2776	0.0978
4-5-100	0.8867	0.0938	0.1302	0.0412	0.8776	0.0353	1.1231	0.1219	1.3708	0.0825
4-6-120	0.5249	0.1532	0.0808	0.0210	0.7316	0.1492	1.1680	0.1212	1.3754	0.0473
4-6-140	0.6343	0.1947	0.0747	0.0165	0.6961	0.0412	1.0753	0.0814	1.3787	0.0588
5-6-160	0.6533	0.0823	0.0670	0.0234	0.7638	0.1036	1.0694	0.1020	1.2950	0.0945

Table 12. Comparison of five algorithms on the case study instance.

Algorithms	HV		GD		IGD		Spacing
Algorithms	Mean	Std	Mean	Std	Mean	Std	Mean	Std
D2QN-COEA	1.3962	0.0976	0.0876	0.0580	0.0866	0.0582	0.0006	0.0008
NSGA-II	0.7639	0.1615	0.5504	0.1765	0.5425	0.1662	0.0003	0.0006
MOPSO	0.1175	0.0275	1.2884	0.0677	1.2814	0.0649	0.0015	0.0019
MOEA/D	0.7165	0.0720	0.5785	0.0885	0.5628	0.0622	0.0035	0.0048
SPEA2	1.2008	0.0632	0.2103	0.0427	0.2082	0.0415	0.0021	0.0023

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Luo, Z.-Y.; Song, J.-B.; Ge, C.-Q. A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority. Processes 2026, 14, 662. https://doi.org/10.3390/pr14040662

AMA Style

Luo Z-Y, Song J-B, Ge C-Q. A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority. Processes. 2026; 14(4):662. https://doi.org/10.3390/pr14040662

Chicago/Turabian Style

Luo, Zhi-Yong, Jia-Bao Song, and Chun-Qiao Ge. 2026. "A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority" Processes 14, no. 4: 662. https://doi.org/10.3390/pr14040662

APA Style

Luo, Z.-Y., Song, J.-B., & Ge, C.-Q. (2026). A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority. Processes, 14(4), 662. https://doi.org/10.3390/pr14040662

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority

Abstract

1. Introduction

2. Related Work

2.1. Distributed Flexible Job Shop Scheduling

2.2. Optimization Algorithms

3. Mathematical Model

3.1. Definitions and Assumptions

3.2. Objectives and Constraints

4. The Proposed D2QN-COEA

4.1. Encoding and Decoding

4.2. Algorithm Framework

4.3. Global Search Strategy

4.4. Local Search Operator Design

4.5. Double Deep Q-Network Architecture

5. Experiment and Result Analysis

5.1. Experimental Design

5.2. Sensitivity Analysis

5.3. Comparison with Other Algorithms

5.4. Case Study

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Nomenclature

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI