Article

Solving Dynamic Multi-Objective Flexible Job Shop Scheduling Problems Using a Dual-Level Integrated Deep Q-Network Approach

School of Artificial Intelligence and Computer Science, Jiangnan University, Li Hu Avenue, Wuxi 214122, China
* Author to whom correspondence should be addressed.
Processes 2025, 13(2), 386; https://doi.org/10.3390/pr13020386
Submission received: 29 November 2024 / Revised: 30 December 2024 / Accepted: 29 January 2025 / Published: 31 January 2025
(This article belongs to the Section Automation Control Systems)

Abstract:
Economic performance in modern manufacturing enterprises is often influenced by random dynamic events, requiring real-time scheduling to manage multiple conflicting production objectives simultaneously. However, traditional scheduling methods often fall short due to their limited responsiveness in dynamic environments. To address this challenge, this paper proposes an innovative online rescheduling framework called the Dual-Level Integrated Deep Q-Network (DLIDQN). This framework is designed to solve the dynamic multi-objective flexible job shop scheduling problem (DMOFJSP), which is affected by six types of dynamic events: new job insertion, job operation modification, job deletion, machine addition, machine tool replacement, and machine breakdown. The optimization focuses on three key objectives: minimizing makespan, maximizing average machine utilization ($U_{ave}$), and minimizing average job tardiness rate ($TR_{ave}$). The DLIDQN framework leverages a hierarchical reinforcement learning approach and consists of two integrated IDQN-based agents. The high-level IDQN serves as the decision-maker during rescheduling, implementing dual-level decision-making by dynamically selecting optimization objectives based on the current system state and guiding the low-level IDQN's actions. To meet diverse optimization requirements, two reward mechanisms are designed, focusing on job tardiness and machine utilization, respectively. The low-level IDQN acts as the executor, selecting the best scheduling rules to achieve the optimization goals determined by the high-level agent. To improve scheduling adaptability, nine composite scheduling rules are introduced, enabling the low-level IDQN to flexibly choose strategies for job sequencing and machine assignment, effectively addressing both sub-tasks to achieve optimal scheduling performance. Additionally, a local search algorithm is incorporated to further enhance efficiency by optimizing idle time between jobs. The numerical experimental results show that in 27 test scenarios, the DLIDQN framework consistently outperforms all proposed composite scheduling rules in terms of makespan, surpasses the widely used single scheduling rules in 26 instances, and always exceeds other reinforcement learning-based methods. Regarding the $U_{ave}$ metric, the framework demonstrates superiority in 21 instances over all composite scheduling rules and maintains a consistent advantage over single scheduling rules and other RL-based strategies. For the $TR_{ave}$ metric, DLIDQN outperforms composite and single scheduling rules in 20 instances and surpasses other RL-based methods in 25 instances. Specifically, compared to the baseline methods, our model achieves maximum performance improvements of approximately 37%, 34%, and 30% for the three objectives, respectively. These results validate the robustness and adaptability of the proposed framework in dynamic manufacturing environments and highlight its significant potential to enhance scheduling efficiency and economic benefits.

1. Introduction

The flexible job shop scheduling problem (FJSP) is a common challenge in production scheduling and extends the classical job shop scheduling problem (JSP). JSP is a combinatorial optimization problem (COP) that is NP-hard [1]. In JSP, each job consists of a series of operations, with each operation assigned to a specific machine. In contrast, FJSP introduces a machine selection mechanism, allowing each operation to be assigned to any machine from a set of compatible options, increasing scheduling flexibility and complexity.
Most research on FJSP focuses on static production environments. However, real-world production is often uncertain, with dynamic events such as order changes, machine breakdowns, and task adjustments being unavoidable [2]. These disruptions can cause deviations from the original plan, rendering existing schedules ineffective and negatively impacting production efficiency and resource utilization. As a result, developing dynamic scheduling methods for multi-objective FJSP is crucial. These methods not only enhance theoretical models and algorithms but also offer significant practical value.
In practical manufacturing environments, dynamic real-time scheduling methods help manage uncertainty by improving system flexibility and robustness. They allow quick adjustments to job sequences and resource allocations, minimizing the impact of disruptions like machine breakdowns or order changes. Real-time scheduling optimizes multiple objectives, such as minimizing makespan, maximizing machine utilization, and reducing job tardiness. This multi-objective optimization improves efficiency, balances resource use, and ensures smooth operation despite dynamic disturbances.
Moreover, real-time scheduling supports continuous decision-making in highly automated production environments. It allows systems to quickly respond to changing conditions, reducing the need for human intervention and decision delays. This enhances operational efficiency and stability. In smart manufacturing, real-time scheduling helps businesses quickly recover from disruptions, improving customer satisfaction while boosting competitiveness, efficiency, and resource utilization.
The dynamic multi-objective flexible job shop scheduling problem (DMOFJSP) has been extensively studied in recent years, with numerous solutions emerging from years of exploration. Traditional methods for solving DMOFJSP generally fall into two categories: meta-heuristic algorithms and dispatching rules [3]. Meta-heuristic methods encompass a variety of algorithms, such as genetic algorithms (GA) [4,5], particle swarm optimization (PSO) [6], simulated annealing (SA) [7], grey wolf optimization (GWO) [8], ant colony optimization (ACO) [9,10], and tabu search (TS) [11]. These methods typically decompose dynamic scheduling problems into a series of static sub-problems, which are then solved sequentially. Although these methods can generate solutions close to the global optimum, their high computational complexity necessitates recalculating an optimal solution whenever a dynamic event occurs. This often results in prolonged computation times, hindering quick responses to dynamic changes and limiting their suitability for real-time scheduling.
Dispatching rules are widely used in dynamic job shop scheduling, where they prioritize jobs and machines based on specific criteria to allocate and process jobs. Known for their computational efficiency and simplicity, these rules are highly responsive in dynamic environments. However, many dispatching rules have limitations, often being too myopic to yield optimal long-term results [12]. In different shop configurations, no single dispatching rule consistently outperforms others [13,14]. A promising approach to balance time efficiency and solution quality involves dynamically selecting the most appropriate rule at each decision point. This enables short-term optimization of multiple objectives, ensuring timely scheduling while improving overall shop efficiency.
To select the most appropriate rule at each scheduling decision point, the DMOFJSP can be modeled as a Markov Decision Process (MDP), providing a formal framework for decision-making under uncertainty. In this framework, an intelligent agent chooses the optimal action from the action space based on the current state features of the environment. This involves selecting from predefined scheduling rules to optimize the scheduling objectives. Recently, reinforcement learning (RL), a proven method for solving MDP problems [15], has made significant strides in dynamic real-time scheduling. For instance, Aydin and Oztemel [16] developed an enhanced Q-learning algorithm that trains an agent based on real-time shop floor conditions. The agent selects the most appropriate dispatching rule from three candidates, aiming to minimize mean tardiness in a dynamic job shop with new job insertions. Bouazza et al. [17] applied Q-learning to solve the dynamic flexible job shop scheduling problem with new job insertions. Two Q-matrices were used to store the probabilities for machine selection and scheduling rules, with the goal of minimizing the weighted average waiting time. More relevant research findings will be discussed in detail in Section 2.
Traditional reinforcement learning (RL) has made strides in dynamic scheduling, yet two critical challenges persist. Firstly, early studies primarily depended on Q-learning or SARSA algorithms [15,18,19], which employ Q-tables to record Q-values for state–action pairs. However, in real-world production scenarios, state features are usually continuous, leading to a rapid expansion—or even an infinite growth—of the state space. For intricate scheduling problems, the sheer volume of state–action combinations renders Q-table storage and computation unfeasible. Secondly, classical RL faces difficulties in managing continuous or high-dimensional state spaces. A common workaround is discretizing these spaces into finite intervals [20], but this approach sacrifices precision by overlooking minor state variations. Furthermore, identifying the optimal discretization strategy is complex, as it can significantly influence outcomes. Additionally, traditional RL is constrained in handling multi-objective optimization and adaptability, limiting its efficacy in complex production settings. These methods often struggle to reconcile conflicting objectives and lack the agility to respond to unforeseen disruptions, such as equipment failures or urgent task insertions.
In recent years, deep reinforcement learning (DRL), capable of directly handling continuous state spaces, has emerged as a promising alternative to traditional RL methods [21,22]. Among DRL methods, the Deep Q-Network (DQN) is one of the most recognized. It maps continuous states to the maximum Q-values for each action, enabling the selection of optimal actions at decision points, and has been widely applied in real-time dynamic scheduling. However, designing a single RL agent to optimize multiple objectives in multi-objective rescheduling problems remains highly challenging. Each objective corresponds to a distinct reward function and strategy, making it difficult for a single agent to balance them effectively. To address this issue, hierarchical reinforcement learning (HRL) provides a promising solution [23,24]. HRL leverages hierarchical temporal and spatial abstraction, where a high-level controller learns strategies for different objectives over extended time scales, while a low-level executor determines specific actions in real-time to achieve the high-level goals. By allowing the high-level controller to adaptively adjust optimization objectives (i.e., reward functions) and the low-level executor to select suitable scheduling rules, HRL facilitates balanced optimization across multiple objectives throughout the scheduling process.
Given the reasons outlined above, this study proposes a real-time dynamic scheduling framework based on hierarchical deep reinforcement learning (HDRL) to effectively address the complex scheduling challenges in real-world production environments. The framework integrates multiple optimization techniques and employs an integrated DQN structure, specifically designed to solve the dynamic multi-objective flexible job shop scheduling problem (DMOFJSP), which involves six types of dynamic disturbances: new job insertions, job cancellation, job operation modification, machine addition, machine tool replacement and machine breakdown. The model simultaneously optimizes three key objectives: makespan, average machine utilization, and average job tardiness rate, providing more accurate and efficient scheduling solutions.
Building upon the motivations discussed above, this paper introduces a model called the Dual-Level Integrated Deep Q-Network (DLIDQN) to solve the dynamic multi-objective flexible job shop scheduling problem (DMOFJSP). The key contributions of this study are as follows:
  • State Feature Extraction: Six state features are extracted and normalized to the range [0, 1] to accurately represent the system’s state at each rescheduling point. These features are specifically designed to account for six types of dynamic events: new job insertions, job cancellation, job operation modification, machine addition, machine tool replacement and machine breakdown. By normalizing state features, the model’s adaptability and convergence speed can be significantly improved.
  • Composite Dispatching Rule Design: Three job selection rules and three machine assignment rules are formulated, resulting in nine composite dispatching rules through pairwise combinations. These rules constitute the agent’s action space, addressing both job sequencing and machine allocation tasks. They exhibit strong generalization capabilities, making them suitable for scheduling problems across various production configurations.
  • Dual-Level Integrated Deep Q-Network (DLIDQN): This paper proposes a Dual-Level Integrated Deep Q-Network (DLIDQN) framework, which combines various customized network architectures and employs a dual-level structure to enhance the efficiency and accuracy of scheduling decisions. The framework consists of high-level and low-level IDQN agents. The high-level IDQN is responsible for strategy decision-making, analyzing state features to set optimization goals for the low-level agent (using different reward functions). The low-level IDQN operates at the execution layer, calculating the Q-values for each dispatching rule and selecting the optimal scheduling action based on the optimization goals set by the high-level agent. At each rescheduling point, the framework dynamically adjusts strategies through an efficient collaboration mechanism, maximizing cumulative rewards, thereby significantly improving decision-making efficiency and enhancing adaptability to dynamic environmental changes.
  • Multi-Objective Optimization and Local Search Algorithm: This paper optimizes three key indicators: makespan, average machine utilization, and average job tardiness rate. Two high-level optimization goals are defined to balance these objectives, each with its own reward function during training. To further improve scheduling performance and stability, a specialized local search algorithm is introduced to ensure the quality and stability of the scheduling solution.
Finally, extensive experiments conducted on the dataset developed in this study validate the effectiveness and advantages of the proposed model, highlighting its capability and practicality in addressing dynamic scheduling problems across various scenarios.
The remainder of this paper is organized as follows: Section 2 reviews dynamic scheduling methods based on classical RL and DRL. Section 3 introduces the theoretical foundations of RL, including Q-learning and deep Q-learning, and defines various architectures of IDQN. Section 4 establishes the mathematical model of DMOFJSP. Section 5 details the key elements of MDP, including the definition of states, actions, and rewards, describes the implementation of the local search algorithm, and provides the specific implementation details of DLIDQN. Section 6 presents and analyzes the experimental results. Finally, Section 7 summarizes this research and suggests directions for future work.

2. Literature Review

In research on the application of reinforcement learning (RL) to dynamic job shop scheduling problems, several effective models have been developed, though methodological challenges remain. Early studies predominantly employed traditional RL algorithms (such as Q-learning) to tackle these problems. For instance, Wei and Mingyang [25] proposed a set of composite scheduling rules encompassing both job and machine selection, specifically for dynamic job shop scheduling with random job arrivals. Using the Q-learning algorithm, they developed agents capable of intelligently identifying and selecting appropriate scheduling rules during each rescheduling process. Zhang et al. [26] modeled the scheduling problem as a semi-Markov Decision Process (semi-MDP) and applied Q-learning to learn the optimal choice among five heuristic rules. Chen et al. [27] introduced a rule-driven approach for multi-objective dynamic scheduling, where Q-learning was employed to learn optimal weights within a discrete value range to form composite scheduling rules. Shahrabi et al. [28] applied Q-learning to optimize the parameters of the variable neighborhood search (VNS) algorithm in dynamic job shop environments, addressing challenges such as random job arrivals and machine failures. Shiue et al. [29] proposed a reinforcement learning (RL)-based real-time scheduling mechanism, where agents were trained via Q-learning to select the most appropriate scheduling rule from a predefined set. Targeting the minimization of earliness and tardiness penalties in a job shop with new job insertions, Wang [30] developed a dynamic multi-agent scheduling model and employed a weighted Q-learning algorithm to train the agents. These studies predominantly relied on Q-learning, utilizing Q-tables to determine the optimal scheduling rule for each state. However, in real-world production environments, the state space is often continuous or infinite, rendering the construction and maintenance of large Q-tables impractical. A typical workaround is to discretize the continuous state space, though this approach often compromises accuracy to some degree.
Deep reinforcement learning (DRL), which combines the strengths of deep learning and reinforcement learning, has become an effective approach for addressing complex dynamic job shop scheduling problems. Among DRL-based models, Deep Q-Network (DQN) methods have been widely applied in this area. Waschneck et al. [31] proposed a dynamic scheduling strategy using multiple DQN agents, each focused on optimizing the scheduling rules for a specific work center while also considering interactions with other agents to improve machine utilization. Altenmüller et al. [32] developed a DQN-based framework to address job scheduling problems with strict deadlines. Luo [33] addressed dynamic scheduling problems involving new job insertions, aiming to minimize delays. His method utilized DQN to train intelligent agents capable of selecting the optimal rule from six predefined composite scheduling rules at each decision point, using seven continuous state features. Luo et al. [34] proposed a two-level Deep Q-Network (THDQN) algorithm to optimize both total weighted tardiness and average machine utilization. Luo et al. [35] developed a hierarchical deep reinforcement learning framework to address the multi-objective flexible job shop scheduling problem with partial no-wait constraints, training it using a multi-agent PPO algorithm. Li et al. [36] introduced a hybrid Deep Q-Network (HDQN) algorithm that effectively solves multi-objective problems in dynamic flexible job shop scheduling with insufficient transportation resources. In the flexible job shop scheduling problem with random job arrivals, Zhao et al. [37] enhanced the DRL method by incorporating an attention mechanism to optimize the total tardiness objective. Wu et al. [38] proposed a dual-layer DDQN method that utilizes decision points to select scheduling rules, aiming to reduce both maximum completion time and total tardiness in dynamic flexible job shop scheduling with random workpiece arrivals. Table 1 summarizes and compares the results of these Q-learning and DQN-based studies with those of the current work.

3. Preliminaries

3.1. Definition of RL and Q-Learning

In reinforcement learning (RL), common problems are modeled as a Markov Decision Process (MDP), represented by the quintuple $(S, A, P, \gamma, R)$. In this model, $S$ denotes the set of states $s$, $A$ the set of actions $a$, $P$ represents the transition probabilities between states, $R$ is the reward received when taking action $a$ in state $s$, and $\gamma$ is the discount factor that balances the importance of future rewards. Within the MDP framework, an agent interacts with its environment based on a policy $\pi$, aiming to maximize the total expected return over time. At each time step $t$, the agent observes the current state $s_t \in S$, selects an action $a_t \in A$ according to the policy $\pi(S \to A)$, transitions to a new state $s_{t+1}$ based on the state transition probability $p(s_{t+1} \mid s_t, a_t) \in P(S \times A \to S)$, and receives an immediate reward $r_t \in R$.
The core objective of reinforcement learning is to find the optimal policy $\pi^*$ that maximizes the expected value of future discounted rewards when selecting action $a$ in state $s$ and acting according to the policy $\pi$, as demonstrated in Equation (1).
$Q^{\pi^*}(s, a) = \max_{\pi} Q^{\pi}(s, a) = \max_{\pi} \mathbb{E}\left[ R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots \mid s_t = s,\, a_t = a,\, \pi \right]$ (1)
where the discount factor $\gamma \in (0, 1]$ weights rewards further in the future less heavily, balancing the importance of short-term and long-term rewards.
$Q^{\pi}(s, a)$, also known as the Q-function or action-value function, was proven by Bellman [39] to satisfy the Bellman optimality equation under the optimal policy $\pi^*$, as shown in Equation (2). This equation forms the theoretical foundation of the Q-learning algorithm.
$Q^{\pi^*}(s, a) = \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \max_{a'} Q^{\pi^*}(s', a') \right]$ (2)

3.2. Definition of Deep Q-Learning

Q-learning requires continuous updates and maintenance of a Q-table that stores the Q-values for each state–action pair when addressing decision-making problems. Q-learning performs well when the state space is small. However, as the state space grows, the size of the Q-table increases, consuming significant resources and causing a sharp decline in decision-making efficiency. To overcome the curse of dimensionality in standard Q-learning, Mnih et al. [40] introduced the concept of a Deep Q-Network (DQN). DQN uses the state features at time step $t$ (typically continuous values) as input to a deep neural network, which outputs the Q-values corresponding to each action in state $s_t$. The size of the output layer corresponds to the number of actions. This enables DQN to efficiently handle complex decision-making processes in continuous state spaces.
To overcome the limitations of standard Q-learning, deep Q-learning (DQL) introduces two key improvements. First, DQL incorporates an experience replay mechanism, which stores the agent's experience at each time step $t$, $(s_t, a_t, r_t, s_{t+1})$, in an experience replay pool. When updating the neural network, a random batch of past experiences is sampled from this buffer for training. This method reduces the correlation between consecutive data, enhancing the stability of the training process. Second, DQL introduces an independent target network $\hat{Q}(s, a; \theta^-)$. During training, the weights $\theta^-$ of the target network remain fixed and are only updated by copying the evaluation network's weights $\theta$ every $C$ steps. The loss between the target network's output $y_t$ and the evaluation network's output $y$ is used to update the evaluation network's parameters, with the calculation of $y$ provided in Equation (3).
$y = \max_{a} Q(s_t, a; \theta)$ (3)
The target value $y_t$, as given in Equation (4), is calculated where $\max_{a} \hat{Q}(s_{t+1}, a; \theta^-)$ represents the maximum Q-value the target network can obtain in state $s_{t+1}$.
$y_t = r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a; \theta^-)$ (4)
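To make Equations (3) and (4) concrete, the short PyTorch sketch below computes the target value from a uniformly sampled mini-batch and copies the evaluation weights into the target network every $C$ steps. It is an illustrative sketch only: the network sizes, buffer capacity, and hyper-parameter values are assumptions, not the exact configuration used in this paper.

import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Evaluation/target network mapping state features to one Q-value per action."""
    def __init__(self, state_dim=6, n_actions=9, hidden=50):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, s):
        return self.net(s)

q_eval, q_target = QNet(), QNet()
q_target.load_state_dict(q_eval.state_dict())           # theta^- starts as a copy of theta
optimizer = torch.optim.Adam(q_eval.parameters(), lr=1e-3)
buffer = deque(maxlen=2000)                              # experience replay pool
gamma, C = 0.95, 200                                     # discount factor, target-update period

def dqn_update(step, batch_size=32):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)            # uniform sampling (PER is introduced later)
    s, a, r, s_next, done = zip(*batch)
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)             # Q(s_t, a_t; theta)
    with torch.no_grad():                                             # Equation (4)
        y_t = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, y_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % C == 0:                                                 # copy theta into theta^-
        q_target.load_state_dict(q_eval.state_dict())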

3.3. Definition of IDQN

The Integrated Deep Q-Network (IDQN) algorithm combines the strengths of DQN, double DQN (DDQN), noisy DQN (NDQN), prioritized experience replay (PER), and dueling DQN (D3QN) frameworks. It models states using neural networks, mitigates Q-value overestimation through DDQN, improves exploration diversity by introducing noise, accelerates training via prioritized experience replay, and enhances Q-value estimation accuracy with the dueling structure. Crucially, these optimization techniques are complementary, effectively enhancing overall performance while mitigating individual weaknesses.
The standard DQN tends to overestimate Q-values because it uses the same Q-values for both action selection and evaluation [41]. To address this issue, Van Hasselt et al. [42] proposed the double DQN (DDQN) algorithm. This algorithm employs two DQN networks: the online Q-network is responsible for selecting the action with the highest Q-value, while the target network $\hat{Q}$ is used to estimate the state–action value, which is then used to compute the target value $y_t$, as defined in Equation (5). By decoupling action selection from evaluation, DDQN effectively mitigates the overestimation problem, leading to a more stable learning process.
$y_t = r_t + \gamma \hat{Q}\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^-\big)$ (5)
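Relative to the sketch after Equation (4), double DQN only changes the target computation: the online network picks the greedy action and the target network evaluates it. A drop-in replacement for the $y_t$ line of that sketch (same variable names, same assumptions) would be:

with torch.no_grad():                                                 # Equation (5)
    best_a = q_eval(s_next).argmax(dim=1, keepdim=True)               # arg max_a Q(s_{t+1}, a; theta)
    y_t = r + gamma * (1 - done) * q_target(s_next).gather(1, best_a).squeeze(1)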
Noisy DQN (NDQN) enhances exploration diversity by introducing noise into the parameters of the Q-network. At the beginning of each episode, Gaussian noise is added to each parameter of the Q-network $Q(s, a; w)$, resulting in $\tilde{Q}(s, a, \xi; \mu, \sigma)$. Here, $\mu$ and $\sigma$ represent the mean and standard deviation, respectively, which are parameters to be learned, while $\xi$ denotes random noise, with each element independently sampled from a standard normal distribution $N(0, 1)$. The next action $a_t$ is selected using $\tilde{Q}(s, a, \xi; \mu, \sigma)$, as shown in Equation (6).
$a_t = \arg\max_{a \in A} \tilde{Q}(s, a, \xi; \mu, \sigma)$ (6)
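A minimal sketch of the noisy-parameter idea behind Equation (6), assuming a simple (non-factorized) noisy linear layer whose weights are $w = \mu + \sigma \odot \xi$ with $\xi$ resampled at the start of each episode; the initialization constants and where such layers are placed in the network are assumptions of the sketch.

import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer with learnable mean/std parameters and per-episode Gaussian noise."""
    def __init__(self, in_dim, out_dim, sigma0=0.5):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_dim, in_dim).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_dim, in_dim), sigma0 / in_dim ** 0.5))
        self.mu_b = nn.Parameter(torch.zeros(out_dim))
        self.sigma_b = nn.Parameter(torch.full((out_dim,), sigma0 / in_dim ** 0.5))
        self.reset_noise()
    def reset_noise(self):                        # call once at the start of each episode
        self.xi_w = torch.randn_like(self.mu_w)   # xi ~ N(0, 1), element-wise
        self.xi_b = torch.randn_like(self.mu_b)
    def forward(self, x):
        w = self.mu_w + self.sigma_w * self.xi_w  # w = mu + sigma * xi
        b = self.mu_b + self.sigma_b * self.xi_b
        return nn.functional.linear(x, w, b)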
In classic DQN, experience replay samples uniformly at random from the experience pool. However, the TD errors of different samples may vary. The TD error in the Q-network is defined as the difference between the target value provided by the target network and the estimated value produced by the online network. Prioritized experience replay (PER) DQN assigns a priority to each sample based on its TD error. The loss function, which incorporates the sample priorities, is defined in Equation (7), where $\omega_i$ represents the weight of sample $i$.
$L_i(\theta_i) = \mathbb{E}\left[ \omega_i \left( y_t - Q(s, a; \theta_i) \right)^2 \right]$ (7)
In the DQN algorithm, $Q(s, a)$ is influenced by both the state and the action, though the impact of these two factors is not equal. Dueling DQN decomposes the Q-network into two separate components: one calculates the action advantage function $A(s, a)$, while the other computes the state value function $V(s)$. These components are then combined to form the state–action value function. The 'advantage' refers to how one action compares to others within the current state. If the advantage is greater than zero, the action outperforms the average; if it is less than zero, the action underperforms the average. This approach assigns higher values to actions with greater advantages, which facilitates faster convergence. The $Q(s, a)$ function is then expressed as shown in Equation (8).
$Q(s, a) = V(s) + A(s, a)$ (8)
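The decomposition in Equation (8) can be sketched as two output heads on a shared trunk; the layer sizes below are placeholders rather than the exact DLIDQN dimensions (those are given in Section 5.5).

import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: a 1-node V(s) branch and an n_actions-node A(s, a) branch."""
    def __init__(self, state_dim=6, n_actions=9, hidden=50):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)
    def forward(self, s):
        h = self.trunk(s)
        # Q(s, a) = V(s) + A(s, a) as in Equation (8); many implementations also
        # subtract the mean advantage for identifiability.
        return self.value(h) + self.advantage(h)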

4. DMOFJSP Formulation

The dynamic multi-objective flexible job shop scheduling problem (DMOFJSP) addressed in this paper, which involves six disturbance events, is defined as follows: There are $n$ jobs, $J = \{J_1, J_2, J_3, \ldots, J_n\}$, to be processed on $m$ machines, $M = \{M_1, M_2, M_3, \ldots, M_m\}$. Each job $J_i$ consists of $n_i$ operations, with the $j$-th operation of job $J_i$ denoted as $O_{i,j}$. Each operation $O_{i,j}$ corresponds to a set of compatible machines, $M_{i,j}$ ($M_{i,j} \subseteq M$), from which any machine $M_k$ can be selected for processing the operation. The processing time of operation $O_{i,j}$ on machine $M_k$ is denoted as $t_{i,j,k}$, and the actual completion time is represented by $C_{i,j}$. The relevant notations are presented in Table 2.
This study chooses makespan, average machine utilization rate ($U_{ave}$), and average job tardiness rate ($TR_{ave}$) as optimization goals because they are critical in production scheduling. These goals also account for the impact of dynamic events and the capabilities of the DLIDQN model. Makespan is important for measuring production efficiency, as disruptions like new job insertion and machine breakdown can extend production time. Minimizing makespan helps DLIDQN reduce delays caused by these events and improves system responsiveness. $U_{ave}$ shows how well resources are allocated. Events like machine addition or replacement can cause fluctuations, but DLIDQN dynamically adjusts resource use to avoid both under-utilization and overload. $TR_{ave}$ affects delivery times and is influenced by events like job modification or deletion. By focusing on these goals, DLIDQN effectively manages dynamic environments, improving the robustness and practical value of the scheduling system.
To simplify the problem, the following predefined constraints must be satisfied.
  • Each machine is capable of processing only one operation at a given time.
  • All operations within each job are required to adhere to a predetermined priority sequence and must be processed without interruption.
  • Job transportation times and machine setup time are disregarded in this study.
  • Each job is assigned a specific delivery deadline. Jobs not completed by their deadlines are classified as late, with their delays duly recorded.
  • The arrival times of jobs may differ. When a new job is introduced, all ongoing operations are allowed to finish before the remaining operations and the newly added job proceed to the scheduling stage to formulate a revised production plan.
  • If a task must be canceled, the system ceases the current process and shifts all remaining tasks to the scheduling stage in order to devise a fresh production plan.
  • Job operation modification involves expanding the existing job by introducing new operations. Once the current operation is completed, the additional operations will be integrated into the job, and all remaining tasks will be transferred to the rescheduling phase to develop a new production plan.
  • The process of machine addition refers to the system's ability to deploy additional machines. After the completion of current tasks by all machines, the system will assess the average machine utilization $U_{ave}$. If $U_{ave}$ is higher than a given threshold, new machines will be introduced and integrated into the new scheduling plan.
  • Machines are capable of switching between different processing types by replacing tools. During the tool replacement process, machines enter a conversion state and temporarily halt operations. The processing time for a machine accounts for the additional time required for tool changes.
  • When a machine experiences a breakdown, it transitions into a repair state. If alternative machines are unavailable, the affected job enters a blocked state. Upon completion of the machine repair, the job exits the blocked state, and all jobs re-enter the scheduling phase with an updated production plan.
  • If no available machine is able to process the current operation, the job enters a blocked state. Once the final operation of the job is completed, the job transitions into a finished state, and no further processing is carried out.
When the operation has not been assigned to a specific machine, its processing time cannot be represented by $t_{i,j,k}$. Instead, the average processing time of operation $O_{i,j}$ across all machines in its compatible machine set is used, denoted as $\bar{t}_{i,j}$, as shown in Equation (9).
$\bar{t}_{i,j} = \dfrac{\sum_{k \in M_{i,j}} t_{i,j,k}}{|M_{i,j}|}$ (9)
The estimated completion time $ET_i$ for the remaining operations of job $J_i$ is the theoretically calculated time required to complete all unfinished operations, which is the sum of the average processing times of all remaining operations. The details are shown in Equation (10).
$ET_i = \sum_{j = OP_i + 1}^{n_i} \bar{t}_{i,j}$ (10)
The due date $D_i$ of job $J_i$ is defined as the job's arrival time $A_i$ plus the product of the job's due date tightness ($DDT$) and the estimated completion time of all remaining operations. The formula is given in Equation (11).
$D_i = A_i + (DDT \cdot Ur_i) \cdot ET_i = A_i + (0.2 + 0.5 \cdot Ur_i) \cdot ET_i$ (11)
At the current decision point, $OPT_i$ indicates the total processing time of the completed operations for job $J_i$, while $OP_i$ refers to the number of operations that have been completed for job $J_i$ up to this point. This formula can be found in Equation (12).
$OPT_i = A_i + \sum_{j=1}^{OP_i} \sum_{k \in M_{i,j}} X_{i,j,k} \, t_{i,j,k}$ (12)
The completion rate $CRJ_i$ of job $J_i$ is used to describe the job's completion status, defined as the ratio of the processing time of completed operations to the estimated total processing time. The formula is provided in Equation (13).
$CRJ_i = \dfrac{OPT_i}{OPT_i + ET_i}$ (13)
$CT_k$ denotes the completion time of the most recently scheduled operation on machine $M_k$. It represents the time at which the latest operation assigned to machine $M_k$ finishes. The corresponding formula is given in Equation (14).
$CT_k = \max \left\{ C_{i,j} \mid X_{i,j,k} = 1 \right\}$ (14)
The machine utilization $U_k$ represents the operational efficiency of machine $M_k$, indicating the proportion of machine $M_k$'s working time in the total running time. Its mathematical definition is illustrated in Equation (15).
$U_k = \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{OP_i} X_{i,j,k} \cdot t_{i,j,k}}{CT_k}$ (15)
The processing tardiness rate $TR_i$ of job $J_i$ represents the difference between the estimated completion time of all operations and the job's deadline $D_i$, divided by the total theoretical processing time of the job. It is defined in Equation (16).
$TR_i = \dfrac{OPT_i + ET_i - D_i}{OPT_i + ET_i}$ (16)
This study optimizes three metrics: minimizing the maximum job completion time (makespan) $C_{max}$, minimizing the inverse of the average utilization of all machines, $1/U_{ave}$, and minimizing the average job processing tardiness rate $TR_{ave}$. The mathematical formulation of the objective functions is as follows:
Minimize $\begin{cases} C_{max} = \max\limits_{1 \le i \le n} C_{i, n_i} \\ 1/U_{ave}, \ \text{where } U_{ave} = \dfrac{\sum_{k=1}^{m} U_k}{m} \\ TR_{ave} = \dfrac{\sum_{i=1}^{n} TR_i}{n} \end{cases}$ (17)
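The following Python sketch computes the three objective values for a finished schedule from the definitions above; the data layout (a list of job dictionaries) is an assumption made purely for illustration.

def objectives(jobs, machines):
    """C_max, U_ave and TR_ave for a completed schedule.
    jobs     : list of dicts with keys 'ops' (list of (start, end, machine) tuples),
               'arrival' (A_i), 'due' (D_i) and 'remaining_est' (ET_i, 0 when finished).
    machines : list of machine identifiers."""
    c_max = max(end for job in jobs for (_, end, _) in job['ops'])        # makespan
    u_sum = 0.0
    for k in machines:
        ops_k = [(s, e) for job in jobs for (s, e, mk) in job['ops'] if mk == k]
        if ops_k:
            ct_k = max(e for _, e in ops_k)                               # CT_k, Equation (14)
            u_sum += sum(e - s for s, e in ops_k) / ct_k                  # U_k, Equation (15)
    u_ave = u_sum / len(machines)                                         # U_ave, Equation (18)
    tr = []
    for job in jobs:
        opt_i = job['arrival'] + sum(e - s for s, e, _ in job['ops'])     # OPT_i, Equation (12)
        total = opt_i + job['remaining_est']
        tr.append((total - job['due']) / total)                           # TR_i, Equation (16)
    return c_max, u_ave, sum(tr) / len(tr)                                # (C_max, U_ave, TR_ave)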

5. Methods for DMOFJSP

This section begins by defining the key attributes of the MDP, including state features, candidate composite scheduling rules (i.e., actions), and the reward function configuration. Next, it details the DLIDQN network architecture, which incorporates two deep neural networks for hierarchical decision-making and integrates a real-time scheduling framework to address dynamic events. Finally, a local search algorithm leveraging machine idle time is introduced to refine the scheduling results further.

5.1. Definition of State Features

Reinforcement learning requires the agent to select the next action based on the current environmental state, highlighting the importance of accurately representing the environment through state features. In intricate real-world scenarios, this study identifies six normalized ratios, each within the range [0, 1], as inputs to the network. By scaling features to the same range, the model becomes more robust to dynamic events like machine breakdowns and job insertions. It is less affected by changes in scheduling scale or event frequency, which improves adaptability. This approach creates a uniform data distribution and avoids gradient instability caused by varying feature scales. It also ensures fair treatment of all features, preventing bias toward specific objectives. Additionally, reducing input variation helps the model explore the policy space more efficiently, especially in complex scenarios with dynamic disturbances. Detailed definitions of these features are provided below.

5.1.1. Detailed Definitions and Formula Explanations

(1) The average utilization rate of machines, denoted as $U_{ave}$, is defined in Equation (18).
$U_{ave} = \dfrac{\sum_{k=1}^{m} U_k}{m}$ (18)
(2) The standard deviation of the machine utilization rate, $U_{std}$, is outlined in Equation (19).
$U_{std} = \sqrt{\dfrac{\sum_{k=1}^{m} (U_k - U_{ave})^2}{m}}$ (19)
(3) The average completion rate of jobs, $CRJ_{ave}$, is given by Equation (20).
$CRJ_{ave} = \dfrac{\sum_{i=1}^{n} CRJ_i}{n}$ (20)
(4) The standard deviation of the job completion rate, $CRJ_{std}$, is provided in Equation (21).
$CRJ_{std} = \sqrt{\dfrac{\sum_{i=1}^{n} (CRJ_i - CRJ_{ave})^2}{n}}$ (21)
(5) The average processing tardiness rate of jobs, $TR_{ave}$, is expressed in Equation (22).
$TR_{ave} = \dfrac{\sum_{i=1}^{n} TR_i}{n}$ (22)
(6) The standard deviation of the job processing tardiness rate, $TR_{std}$, is shown in Equation (23).
$TR_{std} = \sqrt{\dfrac{\sum_{i=1}^{n} (TR_i - TR_{ave})^2}{n}}$ (23)
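Assuming the per-machine utilizations $U_k$, per-job completion rates $CRJ_i$, and per-job tardiness rates $TR_i$ have already been computed at the current rescheduling point, the six features of Equations (18)-(23) reduce to means and population standard deviations, for example:

import numpy as np

def state_features(U, CRJ, TR):
    """Six-dimensional state vector of Equations (18)-(23)."""
    U, CRJ, TR = (np.asarray(x, dtype=np.float64) for x in (U, CRJ, TR))
    return np.array([
        U.mean(),    # U_ave,    Equation (18)
        U.std(),     # U_std,    Equation (19)  (divisor m, i.e. population std)
        CRJ.mean(),  # CRJ_ave,  Equation (20)
        CRJ.std(),   # CRJ_std,  Equation (21)
        TR.mean(),   # TR_ave,   Equation (22)
        TR.std(),    # TR_std,   Equation (23)
    ], dtype=np.float32)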

5.1.2. Impact of State Feature Changes on Scheduling Decisions

To illustrate more intuitively how changes in the above state features impact scheduling decisions, we provide three examples. These examples focus on the variations in the average machine utilization rate ($U_{ave}$), the average job completion rate ($CRJ_{ave}$), and the average job processing tardiness rate ($TR_{ave}$) and explore how these changes drive the dynamic adjustment of scheduling strategies.
  • Changes in $U_{ave}$: If $U_{ave}$ increases from 50% to 80%, the scheduling system may adjust accordingly. For example, the system may determine that machine resources are being well-utilized and decide to increase the workload by scheduling more jobs. Additionally, the system may assign new jobs to machines with lower utilization rates to balance the workload and avoid overloading specific machines.
  • Changes in $CRJ_{ave}$: When $CRJ_{ave}$ increases, it means most jobs are nearly finished. In this case, the scheduling system may shift focus to newly inserted or unstarted jobs, prioritizing them over those that are almost complete. The system may also delay the start of some new jobs to prevent congestion, which helps improve resource allocation and overall efficiency.
  • Changes in $TR_{ave}$: If $TR_{ave}$ increases, it indicates potential delays, possibly due to machine breakdowns or resource shortages. The scheduling system may respond by prioritizing jobs with higher tardiness rates, ensuring that critical tasks are completed on time. It may also reassign resources, such as moving delayed jobs to idle machines or prioritizing repairs on affected machines, to reduce overall delays and improve performance.

5.2. Definition of Proposed Composite Dispatching Rules

Two critical sub-tasks in scheduling are operation sequencing and machine allocation. The agent addresses these sub-tasks based on state features. Single dispatching rules, due to their myopic nature (Nie et al., 2013), often fail to perform well across diverse configurations, necessitating the use of composite dispatching rules for action selection. Accordingly, this study designs three dispatching rules for sub-task 1 and three additional rules for sub-task 2. By pairing these two types of rules, a total of nine composite dispatching rules are formed. Detailed descriptions of each dispatching rule are provided below.

5.2.1. Scheduling Rule 1 for Sub-Task 1

First, let $UC_{job}$ represent the set of unfinished jobs. Sort the jobs $J_i$ in this set based on their completion rate $CRJ_i$, prioritizing jobs with the lowest completion rate for execution. Simultaneously, the calculated job completion rate $CRJ_i$ is weighted by an urgency coefficient. This rule preferentially selects jobs with relatively low completion rates but high urgency, effectively reducing the average job processing tardiness rate $TR_{ave}$. The specific procedure is detailed in Algorithm 1.
Algorithm 1: Procedure for Dispatching Rule 1 in Sub-task 1
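The pseudocode for this rule appears as a figure in the original article. The Python sketch below follows the textual description only; the exact form of the urgency weighting is not specified in the text, so the score used here is an assumption.

from dataclasses import dataclass

@dataclass
class Job:
    due: float          # due date D_i
    crj: float          # completion rate CRJ_i
    et: float           # estimated remaining completion time ET_i
    op_index: int = 0   # number of completed operations OP_i

def job_rule_1(uc_jobs, t_cur):
    """Sub-task 1, rule 1 (sketch): pick the unfinished job with the lowest
    urgency-weighted completion rate, then advance to its next operation."""
    def score(job):
        urgency = 1.0 / max(job.due - t_cur, 1e-6)   # assumed urgency coefficient: tighter due date -> more urgent
        return job.crj / urgency                     # low completion rate + high urgency -> small score
    job = min(uc_jobs, key=score)
    job.op_index += 1                                # its next operation becomes the current one
    return job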

5.2.2. Scheduling Rule 2 for Sub-Task 1

For rule 2 of sub-task 1, begin by defining the set of unfinished jobs, $UC_{job}$. From this set, extract the delayed jobs $J_i$, where the delivery deadline $D_i$ has been exceeded, to form a set of tardy jobs, $Tard_{job}$. If delayed jobs exist, prioritize those with the longest overdue time and highest urgency. If no delayed jobs are present, select jobs from the $UC_{job}$ set which exhibit a smaller difference between their overdue time and estimated completion time, combined with high urgency. The detailed procedure is outlined in Algorithm 2.
Algorithm 2: Procedure for Dispatching Rule 2 in Sub-task 1
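Again, the original pseudocode is presented as a figure; the sketch below (reusing the Job class from the previous sketch) follows the textual description, and the tie-breaking details are assumptions.

def job_rule_2(uc_jobs, t_cur):
    """Sub-task 1, rule 2 (sketch): prefer the tardy job with the longest overdue
    time; if no job is tardy, pick the job whose remaining slack is closest to its
    estimated remaining completion time ET_i."""
    tardy = [j for j in uc_jobs if t_cur > j.due]                      # Tard_job
    if tardy:
        job = max(tardy, key=lambda j: t_cur - j.due)                  # longest overdue first
    else:
        job = min(uc_jobs, key=lambda j: abs((j.due - t_cur) - j.et))  # smallest slack/ET_i gap
    job.op_index += 1
    return job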

5.2.3. Scheduling Rule 3 for Sub-Task 1

The third dispatching rule for sub-task 1 involves randomly selecting a job $J_i$ and processing its next operation. The procedure is expressed in Algorithm 3.
Algorithm 3: Procedure for Dispatching Rule 3 in Sub-task 1
    Input: Current time $T_{cur}$; set of unfinished jobs $UC_{job}$
    1. Randomly select a job $J_i$ from $UC_{job}$;
    2. Set the next operation of job $J_i$ as the current processing operation;
    3. Update the operation number for job $J_i$: $OP_i = OP_i + 1$;
    Output: Selected job $J_i$ and updated operation number $OP_i$

5.2.4. Scheduling Rule 1 for Sub-Task 2

After selecting the operation to be processed, a set of available machines from its compatible machine set $M_{i,j}$, which are neither broken down nor undergoing tool changes, can be identified. The goal of the second sub-task is to allocate the operation $O_{i,j}$ to a suitable machine $M_k$. Rule 1 selects the machine that can provide service the earliest for operation $O_{i,j}$, thereby enhancing the average machine utilization rate $U_{ave}$. The detailed procedure is illustrated in Algorithm 4.
Algorithm 4: Procedure for Dispatching Rule 1 in Sub-task 2
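The pseudocode is given as a figure in the original article; the sketch below encodes the textual description, with an illustrative Machine record whose field names are assumptions.

from dataclasses import dataclass, field

@dataclass
class Machine:
    available_at: float                             # AT_k: time at which the machine becomes free
    proc_time: dict = field(default_factory=dict)   # t_{i,j,k} keyed by operation id
    broken: bool = False                            # currently in a repair state
    tool_change: bool = False                       # currently replacing a tool

def machine_rule_1(op_id, compatible, t_cur):
    """Sub-task 2, rule 1 (sketch): among usable compatible machines, pick the one
    that can start operation O_{i,j} the earliest.  op_id is kept only for a
    uniform rule interface."""
    usable = [m for m in compatible if not m.broken and not m.tool_change]
    return min(usable, key=lambda m: max(m.available_at, t_cur))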

5.2.5. Scheduling Rule 2 for Sub-Task 2

The second scheduling rule of sub-task 2 builds on the first rule by combining the processing time $t_{i,j,k}$ of operation $O_{i,j}$ on machine $M_k$ with the earliest available time. This rule takes both the availability time and processing capability into account. The process is outlined in Algorithm 5.
Algorithm 5: Procedure for Dispatching Rule 2 in Sub-task 2
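As above, a sketch only (reusing the Machine class): the text says the rule combines the earliest available time with the processing time $t_{i,j,k}$, and the earliest-completion form used here is an assumption about how the two criteria are combined.

def machine_rule_2(op_id, compatible, t_cur):
    """Sub-task 2, rule 2 (sketch): pick the usable machine that would finish
    operation O_{i,j} the soonest (earliest start + processing time t_{i,j,k})."""
    usable = [m for m in compatible if not m.broken and not m.tool_change]
    return min(usable, key=lambda m: max(m.available_at, t_cur) + m.proc_time[op_id])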

5.2.6. Scheduling Rule 3 for Sub-Task 2

To enhance the robustness of the scheduling process and avoid local optima, a random selection rule is recommended as the third rule for sub-task 2. The step-by-step procedure can be found in Algorithm 6.
Algorithm 6: Procedure for Dispatching Rule 3 in Sub-task 2
Input: Current time $T_{cur}$; operation $O_{i,j}$; set of compatible machines $M_{i,j}$
1. Randomly select an available machine $M_k$ from $M_{i,j}$;
Output: Selected machine $M_k$
By combining the rules of sub-task 1 and sub-task 2, nine composite scheduling rules are generated. This design enhances the flexibility and diversity of the scheduling strategies. The job sequencing and machine allocation rules address multiple scheduling needs, such as job delays, machine availability, and urgency, while also handling the impact of six dynamic events on the system. These combined rules form various scheduling strategies, allowing the system to adapt to complex production scenarios and improving its overall adaptability.
Although the rule set contains only nine composite rules, they effectively cover the main production scheduling needs. First, job selection rules 1 and 2 prioritize jobs with higher delays and urgency, ensuring timely scheduling and proper prioritization. They address the conflict between job completion rate and deadlines, especially when dynamic events, such as job insertion or operation modification, occur, enabling quick strategy adjustments. Second, machine assignment rules 1 and 2 consider machine availability and job processing time to select the most suitable machines. This improves machine utilization ($U_{ave}$) and reduces production bottlenecks, especially when dynamic events like machine breakdowns or tool changes affect the system, ensuring optimal resource allocation.

5.3. Definition of Reward Function

In the deep Q-learning algorithm, when different actions $a_t$ are executed in state $s_t$, the agent transitions to a new state $s_{t+1}$, and this process generates a corresponding reward $r_t$. The magnitude of the reward $r_t$ directly influences the agent's strategy selection, determining how it will choose the next action in subsequent states. To achieve multi-objective optimization, this study uses a Dual-Level Integrated Deep Q-Network structure and switches between different reward mechanisms. However, using makespan as the optimization goal leads to a sparse reward problem in reinforcement learning. In DMOFJSP, makespan is only determined once all operations for each job are completed. This means that the agent cannot obtain immediate feedback on makespan during the early stages of scheduling. The agent only receives a final reward based on the maximum completion time when all jobs are finished. Before that, there is little reward signal related to makespan optimization, making it difficult to learn effective strategies. To address this, the study uses the job tardiness rate ($TR_{ave}$) and machine utilization rate ($U_{ave}$) as the main reward mechanisms. These two objectives provide timely feedback after each decision, helping the agent make better choices faster and improving the scheduling strategy, which indirectly helps reduce makespan.
Specifically, the high-level IDQN first receives the state features as input, and based on this input, it outputs one of two goal values $g \in \{1, 2\}$. When the high-level IDQN selects a goal value $g$, the low-level IDQN chooses a distinct reward algorithm depending on the selected goal. If $g = 1$, the reward algorithm is based on the average machine utilization rate $U_{ave}$, as defined in Equations (15) and (18). If $g = 2$, the reward algorithm focuses on the average job processing tardiness rate $TR_{ave}$, defined in Equations (16) and (22). The detailed execution process of these two distinct reward algorithms is outlined in the pseudocode in Algorithm 7.
Algorithm 7: Reward Calculation Based on g = 1 or g = 2
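Algorithm 7 itself is given as a figure, so the exact reward formulas are not reproduced here. The sketch below shows one plausible increment-based form, in which the agent is rewarded for improving the objective selected by the high-level goal $g$ between consecutive rescheduling points; this particular form is an assumption, not the paper's exact rule.

def reward(g, u_ave_prev, u_ave_now, tr_ave_prev, tr_ave_now):
    """Goal-dependent reward (illustrative sketch).
    g = 1 rewards an increase in the average machine utilization U_ave;
    g = 2 rewards a decrease in the average job tardiness rate TR_ave."""
    if g == 1:
        return u_ave_now - u_ave_prev    # positive when utilization improves
    return tr_ave_prev - tr_ave_now      # positive when tardiness drops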
This multi-objective optimization approach enables the agent to flexibly choose different reward mechanisms, allowing it to balance machine utilization and job tardiness rates in complex dispatching tasks, ultimately optimizing overall scheduling performance.

5.4. Local Search Algorithm

This study uses a unique local search algorithm to determine the processing time interval for operation $O_{i,j}$ on machine $M_k$. The algorithm focuses on selecting idle time intervals for local optimization, which is essential for solving the DMOFJSP. This problem involves dynamic events, such as job insertions and machine breakdowns, making the scheduling process complex and uncertain. Idle time intervals, which represent machine downtime, provide a flexible optimization space. By scheduling jobs within these intervals, machine utilization is maximized, resource wastage is avoided, and scheduling conflicts caused by unexpected events are reduced. The local optimization targets the best idle time intervals, ensuring proper job sequencing and resource allocation. This reduces computational load and speeds up the optimization process. Overall, this method enhances the system's adaptability and stability in dynamic environments, effectively addressing varying scheduling needs while optimizing objectives like minimizing makespan, maximizing $U_{ave}$, and minimizing $TR_{ave}$.
The search process is detailed in Algorithm 8. We create an idle time interval list $ITIL_k$ for machine $M_k$, consisting of $I_k$ idle time intervals, denoted as $ITI_1, ITI_2, \ldots, ITI_{I_k}$. Next, we search through the list $ITIL_k$ for idle time intervals that satisfy the following conditions: the end time $E_{ITI}$ must be later than the end time $E_{O_{i,j-1}}$ of operation $O_{i,j-1}$, and the duration must be long enough to accommodate the processing time of $O_{i,j}$. Once a suitable idle time interval is found, we schedule operation $O_{i,j}$ within that interval. During scheduling, we ensure that the start time $S_{ITI}$ of the idle time interval is no earlier than the availability time $AT_k$ of machine $M_k$, and the end time $E_{O_{i,j-1}}$ of the previous operation $O_{i,j-1}$ is later than the start time $ST_i$ of the job. These time constraints are crucial for ensuring the feasibility and effectiveness of the scheduling plan.
Algorithm 8: Local Search Process
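Algorithm 8 is likewise presented as a figure; the sketch below implements the interval test described in the preceding paragraph, with the interval bookkeeping simplified for illustration.

def local_search_insert(itil_k, p_time, e_prev, at_k):
    """Try to place operation O_{i,j} (processing time p_time) into an idle
    interval of machine M_k.
    itil_k : list of (start, end) idle time intervals ITI of the machine
    e_prev : completion time E_{O_{i,j-1}} of the job's previous operation
    at_k   : availability time AT_k of the machine
    Returns the chosen (start, end) processing window, or None if no interval fits."""
    for s_iti, e_iti in itil_k:
        earliest = max(s_iti, e_prev, at_k)          # cannot start before the interval opens,
                                                     # the previous operation ends, or the machine is free
        if e_iti > e_prev and e_iti - earliest >= p_time:
            return earliest, earliest + p_time       # schedule O_{i,j} inside this idle interval
    return None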

5.5. The Network of the DLIDQN

The proposed DLIDQN framework integrates multiple network structures, consisting of two IDQNs. Each IDQN is primarily composed of a dueling DQN model and includes noisy layers to enhance exploration capabilities. The higher-level IDQN model consists of three hidden layers, each with ten nodes. It takes a 6-dimensional state feature as input and outputs a Q-value that combines the state value and advantage function, where the state value branch outputs one node, and the advantage function branch outputs two nodes, corresponding to the two high-level optimization goals (i.e., the two forms of reward functions). The lower-level IDQN model is more complex, containing seven hidden layers, each with 50 nodes. With a 7-dimensional state feature (six state features and one high-level goal) as input, it calculates the final Q-value by combining the state value and the advantage function. The advantage function branch produces nine nodes, each corresponding to one of the nine composite scheduling rules. Both models use the ReLU activation function and are optimized with the mean square error (MSE) loss function and the Adam optimizer. At each rescheduling event, scheduling rules are executed to generate a new state, and a reward algorithm is selected based on the goal values to obtain corresponding rewards. The state, action, reward, new state, reward function targets, and completion indicator are stored for prioritized experience replay to update the network parameters. Figure 1 illustrates the structure of the proposed DLIDQN and the overall algorithm process.
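A minimal PyTorch sketch of the two dueling networks with the dimensions stated above; the noisy layers, prioritized replay, and training plumbing are omitted, so this is a structural illustration rather than the full DLIDQN implementation.

import torch
import torch.nn as nn

class IDQNHead(nn.Module):
    """Dueling MLP: `depth` hidden ReLU layers of width `hidden`, then a 1-node
    state-value branch and an `n_actions`-node advantage branch."""
    def __init__(self, in_dim, hidden, depth, n_actions):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        self.trunk = nn.Sequential(*layers)
        self.value = nn.Linear(d, 1)
        self.advantage = nn.Linear(d, n_actions)
    def forward(self, x):
        h = self.trunk(x)
        return self.value(h) + self.advantage(h)      # Q = V + A, Equation (8)

high_level = IDQNHead(in_dim=6, hidden=10, depth=3, n_actions=2)   # selects the goal g (two reward forms)
low_level = IDQNHead(in_dim=7, hidden=50, depth=7, n_actions=9)    # selects one of the nine composite rules

# Both networks are trained with the MSE loss and the Adam optimizer, as stated above.
opt_high = torch.optim.Adam(high_level.parameters(), lr=1e-3)
opt_low = torch.optim.Adam(low_level.parameters(), lr=1e-3)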

6. Numerical Experiments

A detailed account of the numerical experiments used to assess the effectiveness and generalization of the DLIDQN framework is provided in this section, along with an analysis of the results. We first introduce the parameter settings of the dataset and algorithms used during the training and testing process. After completing the training, we evaluate the DLIDQN model against each composite dispatching rule introduced in this study, as well as commonly adopted classic dispatching rules. To further highlight the strengths of the proposed model, we compare DLIDQN with three other deep reinforcement learning algorithms that utilize the same action space of nine composite dispatching rules. Through experiments and visualizations, the high performance and strong generalization capability of the DLIDQN model are confirmed.

6.1. Training Process

This section explains the model training process, which is carried out over 200 runs using randomly generated data. To ensure the model can adapt to different workshop configurations, the initial number of jobs, machines, operations per job, and new job insertions are varied randomly during each training session. New jobs arrive according to a Poisson process, with the time between consecutive job insertions following an exponential distribution $\exp(1/\lambda)$. Other environmental parameters are also randomly updated within a specified range. The instance parameters used during training are listed in Table 3. The number of compatible machines for each operation is randomly generated, with a maximum of $M_{num}$ and a minimum of two, ensuring that at least two machines are always available.
Figure 2 illustrates the training logic flow of the DLIDQN model. During the training process, the dual IDQN models collaborate to optimize performance. First, the high-level model receives the state feature $S_t$ and generates the target value $g$, then combines $S_t$ and $g$ to create a new state $S_t'$, which is passed to the low-level model. The low-level model selects action $A_t$ based on $S_t'$, applies the scheduling rules, and uses a local search algorithm to improve the results. Next, the new state $S_{t+1}$ is calculated, and the reward $R_{t+1}$ is obtained. If all jobs are scheduled, the final results are output, marking the end of the training. Otherwise, the tuple $(S_t, A_t, g, R_{t+1}, S_{t+1}, done)$ is saved in the replay buffer. After the defined step count is reached, the prioritized experience replay mechanism is used to update the network parameters, continuing the process until all jobs are scheduled.
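A compact sketch of the interaction loop in Figure 2, assuming an `env` object that wraps the shop-floor simulation (its interface, and the helper below, are placeholders invented for illustration):

import torch

def select_action(net, features):
    """Greedy action from a (noisy) Q-network; exploration comes from the noise."""
    with torch.no_grad():
        q = net(torch.as_tensor(features, dtype=torch.float32))
    return int(q.argmax())

def train_episode(env, high_level, low_level, buffer):
    s = env.reset()                                   # six state features S_t
    done = False
    while not done:
        g = select_action(high_level, s)              # high level: pick the goal (reward form)
        s_goal = list(s) + [float(g)]                 # S_t' = (S_t, g), the 7-dimensional low-level input
        a = select_action(low_level, s_goal)          # low level: pick one of the 9 composite rules
        s_next, r, done = env.step(a, g)              # apply the rule + local search, goal-specific reward
        buffer.append((s, a, g, r, s_next, done))     # stored for prioritized experience replay
        s = s_next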
Table 4 lists the parameter settings for the training algorithm. Most of these parameters are based on pre-experimental results, while some use common default values from deep reinforcement learning algorithms. Specifically, the buffer capacity is set to 2000. When the capacity is exceeded, old data will be overwritten to make space for new data. The learning rate ( η ) is set to 0.001, balancing convergence speed and model stability. Pre-experimental results show that a high learning rate can cause instability, while a low learning rate slows convergence, making it hard to achieve good results in a limited number of iterations. The target network update frequency (C) is set to 200 steps. Regular updates help stabilize the training process and reduce fluctuations. The experiments also found that too low of an update frequency causes significant fluctuations, while too high of a frequency can slow down convergence.
A test instance containing 16 machines, with $E_{ave}$ set to 70 and 40 new job insertions, was prepared to validate the DLIDQN model obtained after each training round. During testing, the curve of makespan for each training round and the average makespan over every 20 training rounds are shown in Figure 3. The graph shows the trend of the average makespan achieved by DLIDQN during each training epoch. The green “Makespan” curve represents the makespan for each epoch, with some fluctuations. The red “Average Makespan Obtained Every 20 Training Epochs” line demonstrates a steady decline as training progresses.
In the first 100 epochs, the makespan decreases quickly, indicating that the model is learning the key features of the task. Between 100 and 200 epochs, the decline slows, suggesting that the model is entering a fine-tuning phase. After 200 epochs, the makespan stabilizes, and further training does not significantly improve performance.
Comparing the performance across different epochs shows that there is still room for improvement after 100 epochs. However, after 200 epochs, the makespan reaches its lowest point. The small performance gap between 150 and 200 epochs indicates that 200 epochs are enough to cover the task’s complexity and achieve effective learning. Moreover, the use of diverse training instances ensures the model’s ability to generalize and perform well in different production environments. Therefore, 200 training epochs are sufficient for the model to reach optimal training and provide strong scheduling performance.

6.2. Testing Settings

After completing model training, the evaluation phase begins, with all test samples generated randomly. Table 5 shows the specific parameters for each test sample. The initial number of jobs is set to 20, 30, or 40, with the number of operations per job assigned arbitrarily and processing times for each operation determined randomly. The number of machines is initialized to 8, 12, or 16 units, and the number of dynamic events is selected randomly from 30, 50, or 70, with event types also allocated at random. Other parameters related to jobs and machines follow the same setup as in the training stage. A total of 27 distinct test configurations were established, each tested 20 times independently. These test configurations cover different task scales, machine numbers, and dynamic event distributions, validating the robustness of the DLIDQN model in complex dynamic environments. The results show that despite significant randomness and uncertainty in the input parameters, the model consistently produces high-quality scheduling results. This demonstrates that the DLIDQN model can adapt to changes in task scale, processing time fluctuations, and disruptions from dynamic events. This robustness highlights the model’s wide applicability and reliability in real-world production environments.
The testing was implemented in Python, running in a Python 3.8 environment on a computer with an Intel i7-9750H processor, 24 GB of RAM, and a GTX 1650Ti graphics card. In this setup, we compared the performance of our DLIDQN algorithm against the baseline algorithms using the same set of test cases.

6.3. Comparisons with the Proposed Composite Dispatching Rules

To assess the performance of DLIDQN, this study conducted 20 independent experiments on each of the 27 test instances with various parameter configurations, comparing DLIDQN with the nine proposed composite scheduling rules. For each instance, we calculated the mean and standard deviation of makespan, U a v e , and  T R a v e across the 20 runs and summarized the results in Table 6, Table 7 and Table 8, where the best values are highlighted in bold. To present the data more intuitively, the corresponding line charts (Figure 4, Figure 5 and Figure 6) compare the performance of the single composite rules and DLIDQN across the different scenarios. These charts clearly illustrate DLIDQN’s advantages in average makespan, average machine utilization, and average job tardiness rate, further demonstrating its superior performance under various parameter configurations.
The results show that DLIDQN, whose action space covers the nine composite scheduling rules, can flexibly select the most suitable rule at each rescheduling point. This allows it to outperform the individual rules in most production environments, and it performs better than the random strategy (Rule 9) in nearly all test cases. In some scenarios, however, its performance is slightly below that of certain individual rules. For example, on the U a v e metric, DLIDQN falls short of Rule 1 in six instances; Rule 1 prioritizes urgent jobs by combining their completion rate C R J i with an urgency coefficient and assigns them to machines that can start processing earlier, minimizing idle time. Similarly, DLIDQN’s T R a v e is slightly worse than that of Rules 4 and 7 in some cases, as these rules also prioritize urgent jobs. A single rule is simple and clear and requires little computation, so in certain test cases it can be faster and, by avoiding the complexity and computational cost of DLIDQN, achieve better performance in those specific scenarios.
DLIDQN also requires more computation and strategy exploration during learning, which can lead to suboptimal performance in isolated cases. However, no single scheduling rule achieves optimal results consistently across the various production environments, whereas DLIDQN adapts its rule choice to the current system state and therefore remains competitive in every scenario, including untrained production configurations. Figure 7 shows the three-dimensional Pareto fronts obtained by DLIDQN and the composite scheduling rules in three test cases, further confirming the superiority of the DLIDQN model.
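To make the Pareto comparison concrete, the sketch below shows how a non-dominated set over (makespan, U a v e , T R a v e ) could be extracted from the recorded runs of each method; the three sample points are the mean values of DLIDQN, Rule 1, and Rule 2 from the first row of Tables 6–8, used purely as an illustration rather than the authors' plotting code.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on one.
    Makespan and TR_ave are minimized; U_ave is maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] < b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(points):
    """Keep only the non-dominated (makespan, U_ave, TR_ave) points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Mean values of DLIDQN, Rule 1, and Rule 2 for the first instance of Tables 6-8.
runs = [(1635.0, 0.982, 0.666), (1760.0, 0.979, 0.783), (1956.0, 0.913, 0.754)]
print(pareto_front(runs))  # -> [(1635.0, 0.982, 0.666)]
```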

6.4. Comparisons with Classic Dispatching Rules

To validate its superiority more comprehensively, we compared DLIDQN with five widely recognized classic dispatching rules, each of which is described below.
  • First In First Out (FIFO): decide the next operation based on job arrival time, selecting the next operation of the job that arrived first.
  • Earliest Due Date (EDD): sort jobs by their due dates and choose the next operation of the job with the earliest due date to promote timely delivery.
  • Shortest Processing Time (SPT): select the next operation of the job with the shortest processing time.
  • Longest Processing Time (LPT): the opposite of SPT; select the job with the longest processing time, typically used to advance long jobs early and balance machine workloads.
  • Most Remaining Processing Time (MRT): select the next operation of the job with the most remaining processing time, so that long jobs do not lag behind and the overall schedule stays balanced.
It should be noted that these classic strategies specify only job sequencing and not machine assignment, so they cannot be applied directly to the flexible job shop scheduling problem (FJSP) addressed in this paper. To overcome this shortcoming, we established a supplementary machine assignment principle: each selected job is allocated to the earliest available machine in its compatible set (as sketched below), allowing a fair performance comparison between these classic strategies and our method.
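A minimal sketch of this comparison setup follows, assuming each ready job is a dictionary with arrival, due-date, processing-time, and compatible-machine fields and each machine records when it next becomes available; these names are illustrative placeholders, not the authors' data structures.

```python
RULES = {
    "FIFO": lambda job: job["arrival"],          # earliest arrival first
    "EDD":  lambda job: job["due_date"],         # earliest due date first
    "SPT":  lambda job: job["total_time"],       # shortest total processing time first
    "LPT":  lambda job: -job["total_time"],      # longest total processing time first
    "MRT":  lambda job: -job["remaining_time"],  # most remaining work first
}

def dispatch(rule, ready_jobs, machines):
    """Select the next job with the chosen classic rule, then assign it to the
    compatible machine that becomes available earliest (the supplementary principle)."""
    job = min(ready_jobs, key=RULES[rule])
    machine = min(job["compatible"], key=lambda m: machines[m]["available_at"])
    return job, machine
```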
We independently conducted 20 experiments for each test instance; the results are summarized in Table 9, Table 10 and Table 11, which give the averages and standard deviations of the three key performance indicators: makespan, average machine utilization ( U a v e ), and average tardiness rate ( T R a v e ), with the optimal values highlighted in bold. The corresponding line charts are shown in Figure 8, Figure 9 and Figure 10. The analysis shows that our algorithm outperforms the traditional scheduling strategies on all three metrics in most test cases. In some scenarios the DLIDQN framework performs slightly worse than EDD in terms of  T R a v e ; this is because EDD always prioritizes the job with the earliest due date, which directly reduces tardiness risk. When due dates are closely distributed or many jobs are urgent, this exclusive focus on due dates lowers the average tardiness rate. The framework, in contrast, uses dynamic rule selection to balance multiple objectives, so it does not always prioritize the most urgent jobs; nevertheless, it achieves the best performance in most instances, and where it does not, the gap in  T R a v e to the optimal value is small. The experimental results therefore show that the framework, acting through the proposed composite scheduling rules, is effective in reducing tardiness and improving machine utilization.
Furthermore, as shown in Figure 11, the Pareto optimal front clearly highlights DLIDQN’s advantage over other well-known dispatching rules in a range of representative instances.

6.5. Comparisons with Other RL-Based Scheduling Algorithms

In order to demonstrate the superiority of our method for multi-objective optimization tasks, we carried out a comprehensive comparison, evaluating DLIDQN against three mainstream reinforcement learning-based training algorithms. These three algorithms are DQN (the classic Deep Q-Network), dueling double DQN (D3QN, an improved architecture of double DQN), and HDMDDQN, a novel two-layer deep double Q-network training method proposed by Wang et al. (2022) [43].
DQN, as the control group, features a simple design, relying on a single RL agent for rule selection without higher-level scheduling objectives. Details of its reward mechanism are provided in Algorithm 9. For fairness, all methods share the same action space as DLIDQN, consisting of nine composite scheduling rules. We compared the means and standard deviations of makespan, U a v e , and  T R a v e across various production environments. The results are presented in Table 12, Table 13 and Table 14. The optimal values are highlighted in bold. The line charts corresponding to the tables are shown in Figure 12, Figure 13 and Figure 14.
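For clarity, the sketch below illustrates what sharing the action space means in practice: each compared agent outputs one of nine indices, and that index is mapped to the corresponding composite rule. The `q_values` argument and the placeholder rule list are assumptions for illustration, not the authors' networks or rules.

```python
import random

def choose_rule(q_values, composite_rules, epsilon):
    """Epsilon-greedy selection over the shared action space of nine composite rules.
    `q_values` is a length-9 list of state-action values produced by any of the agents."""
    if random.random() < epsilon:
        idx = random.randrange(len(composite_rules))               # explore
    else:
        idx = max(range(len(q_values)), key=q_values.__getitem__)  # exploit
    return composite_rules[idx]

# Example with nine placeholder rules represented only by their names.
rules = [f"composite_rule_{i}" for i in range(1, 10)]
print(choose_rule([0.2, 0.5, 0.1, 0.9, 0.3, 0.0, 0.4, 0.6, 0.7], rules, epsilon=0.1))
```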
Algorithm 9: DQN Reward Algorithm
The comparison results show that DLIDQN holds a significant advantage over the other RL training methods in the vast majority of test instances. The comparison with the single-layer DQN in particular confirms that the proposed two-layer structure is both necessary and highly effective in practice: the high-level DQN component of DLIDQN intelligently selects the optimal optimization objective at each rescheduling point, thereby achieving an excellent balance among makespan, U a v e , and T R a v e . Compared with the other two advanced dual-layer algorithms, the proposed DLIDQN model, which integrates multiple network structures and a prioritized experience replay mechanism, also displays clear performance advantages, further highlighting its capabilities. Figure 15 evaluates DLIDQN and the other RL-based methods on the three objectives over several representative instances, where DLIDQN outperforms the others on all three objectives; Figure 16 further shows that our method remains at the forefront of the Pareto optimal front. Together, the Pareto charts and the other results show that DLIDQN reduces completion time, improves resource utilization, and lowers the tardiness rate, demonstrating its effectiveness in both single-objective and multi-objective optimization and its efficiency and broad applicability in practical scenarios.

7. Conclusions

This paper presents the Dual-Level Integrated Deep Q-Network (DLIDQN) algorithm, an effective solution to the dynamic multi-objective flexible job shop scheduling problem (DMOFJSP) under six types of dynamic events: new job insertion, job deletion, job operation modification, machine addition, machine tool replacement, and machine breakdown. The algorithm implements goal-oriented scheduling through two IDQN agents. At each rescheduling moment, the high-level IDQN dynamically generates an optimization target for the low-level IDQN based on the current state features; guided by this goal and the same state features, the low-level IDQN executes the most suitable scheduling rule to achieve it. In addition, DLIDQN integrates the benefits of diverse network architectures, resulting in powerful optimization capabilities. With two reward algorithms, six essential state features, and nine composite scheduling rules, the algorithm demonstrates outstanding performance in optimizing makespan, average machine utilization ( U a v e ), and average job tardiness rate ( T R a v e ).
To comprehensively assess the performance and generalization ability of DLIDQN, a series of extensive numerical experiments were conducted across various shop configurations. The results indicate that DLIDQN not only outperforms all the proposed composite scheduling rules but also surpasses existing classical scheduling rules and other reinforcement learning-based scheduling methods. Notably, DLIDQN performs exceptionally well in untrained production environments, demonstrating its remarkable algorithmic performance and adaptability.
Future research will focus on exploring the application of advanced policy-based algorithms, such as actor–critic (AC) and Proximal Policy Optimization (PPO), to dynamic multi-objective flexible job shop scheduling problems (DMOFJSP). Additionally, we plan to investigate multi-agent reinforcement learning (MARL) to analyze how multiple agents can achieve their goals through interactive learning in a shared environment. By integrating these insights with policy-based training methods, we aim to further enhance the performance of scheduling algorithms in complex dynamic environments.

Author Contributions

Conceptualization, H.X., J.Z., L.H. and J.T.; methodology, H.X., J.Z. and C.Z.; software, H.X., J.Z. and J.T.; validation, H.X., J.Z. and L.H.; formal analysis, J.Z. and C.Z.; writing—original draft preparation, H.X. and J.Z.; writing—review and editing, L.H., J.T. and C.Z.; visualization, H.X. and J.Z.; supervision, L.H., J.T. and C.Z.; project administration, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Garey, M.R.; Johnson, D.S.; Sethi, R. The Complexity of Flowshop and Jobshop Scheduling. Math. Oper. Res. 1976, 1, 117–129. [Google Scholar] [CrossRef]
  2. Ouelhadj, D.; Petrovic, S. A survey of dynamic scheduling in manufacturing systems. J. Sched. 2009, 12, 417–431. [Google Scholar] [CrossRef]
  3. Lou, P.; Liu, Q.; Zhou, Z.; Wang, H.; Sun, S.X. Multi-agent-based proactive–reactive scheduling for a job shop. Int. J. Adv. Manuf. Technol. 2012, 59, 311–324. [Google Scholar] [CrossRef]
  4. Pezzella, F.; Morganti, G.; Ciaschetti, G. A genetic algorithm for the Flexible Job-shop Scheduling Problem. Comput. Oper. Res. 2008, 35, 3202–3212. [Google Scholar] [CrossRef]
  5. Kundakcı, N.; Kulak, O. Hybrid genetic algorithms for minimizing makespan in dynamic job shop scheduling problem. Comput. Ind. Eng. 2016, 96, 31–51. [Google Scholar] [CrossRef]
  6. Ning, T.; Huang, M.; Liang, X.; Jin, H. A novel dynamic scheduling strategy for solving flexible job-shop problems. J. Ambient Intell. Humaniz. Comput. 2016, 7, 721–729. [Google Scholar] [CrossRef]
  7. Cruz-Chávez, M.A.; Martínez-Rangel, M.G.; Cruz-Rosales, M.H. Accelerated simulated annealing algorithm applied to the flexible job shop scheduling problem. Int. Trans. Oper. Res. 2017, 24, 1119–1137. [Google Scholar] [CrossRef]
  8. Zhu, Z.; Zhou, X. An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints. Comput. Ind. Eng. 2020, 140, 106280. [Google Scholar] [CrossRef]
  9. El Khoukhi, F.; Boukachour, J.; Alaoui, A.E.H. The “Dual-Ants Colony”: A novel hybrid approach for the flexible job shop scheduling problem with preventive maintenance. Comput. Ind. Eng. 2017, 106, 236–255. [Google Scholar] [CrossRef]
  10. Zhang, S.; Li, X.; Zhang, B.; Wang, S. Multi-objective optimisation in flexible assembly job shop scheduling using a distributed ant colony system. Eur. J. Oper. Res. 2020, 283, 441–460. [Google Scholar] [CrossRef]
  11. Poppenborg, J.; Knust, S.; Hertzberg, J. Online scheduling of flexible job-shops with blocking and transportation. Eur. J. Ind. Eng. 2012, 6, 497–518. [Google Scholar] [CrossRef]
  12. Mohan, J.; Lanka, K.; Rao, A.N. A Review of Dynamic Job Shop Scheduling Techniques. Procedia Manuf. 2019, 30, 34–39. [Google Scholar] [CrossRef]
  13. Baker, K.R. Sequencing Rules and Due-Date Assignments in a Job Shop. Manag. Sci. 1984, 30, 1093–1104. [Google Scholar] [CrossRef]
  14. Nie, L.; Gao, L.; Li, P.; Li, X. A GEP-based reactive scheduling policies constructing approach for dynamic flexible job shop scheduling problem with job release dates. J. Intell. Manuf. 2013, 24, 763–774. [Google Scholar] [CrossRef]
  15. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  16. Aydin, M.; Öztemel, E. Dynamic job-shop scheduling using reinforcement learning agents. Robot. Auton. Syst. 2000, 33, 169–178. [Google Scholar] [CrossRef]
  17. Bouazza, W.; Sallez, Y.; Beldjilali, B. A distributed approach solving partially flexible job-shop scheduling problem with a Q-learning effect. IFAC-PapersOnLine 2017, 50, 15890–15895. [Google Scholar] [CrossRef]
  18. Wei, Y.; Zhao, M. Composite rules selection using reinforcement learning for dynamic job-shop scheduling. In Proceedings of the IEEE Conference on Robotics, Automation and Mechatronics, Singapore, 1–3 December 2004; Volume 2, pp. 1083–1088. [Google Scholar]
  19. Wang, Y.C.; Usher, J.M. Learning policies for single machine job dispatching. Robot. Comput.-Integr. Manuf. 2004, 20, 553–562. [Google Scholar] [CrossRef]
  20. Chen, R.; Yang, B.; Li, S.; Wang, S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778. [Google Scholar] [CrossRef]
  21. Li, Y. Deep Reinforcement Learning: An Overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
  22. Wang, L.; Pan, Z.; Wang, J. A Review of Reinforcement Learning Based Intelligent Optimization for Manufacturing Scheduling. Complex Syst. Model. Simul. 2021, 1, 257–270. [Google Scholar] [CrossRef]
  23. Rafati, J.; Noelle, D.C. Learning representations in model-free hierarchical reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 10009–10010. [Google Scholar]
  24. Li, A.C.; Florensa, C.; Clavera, I.; Abbeel, P. Sub-policy adaptation for hierarchical reinforcement learning. arXiv 2019, arXiv:1906.05862. [Google Scholar]
  25. Wei, Y.; Zhao, M. A reinforcement learning-based approach to dynamic job-shop scheduling. Acta Autom. Sin. 2005, 31, 765. [Google Scholar]
  26. Zhang, Z.; Zheng, L.; Weng, M.X. Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning. Int. J. Adv. Manuf. Technol. 2007, 34, 968–980. [Google Scholar] [CrossRef]
  27. Chen, X.; Hao, X.; Lin, H.W.; Murata, T. Rule driven multi objective dynamic scheduling by data envelopment analysis and reinforcement learning. In Proceedings of the 2010 IEEE International Conference on Automation and Logistics, Hong Kong, China, 16–20 August 2010; pp. 396–401. [Google Scholar] [CrossRef]
  28. Shahrabi, J.; Adibi, M.A.; Mahootchi, M. A reinforcement learning approach to parameter estimation in dynamic job shop scheduling. Comput. Ind. Eng. 2017, 110, 75–82. [Google Scholar] [CrossRef]
  29. Shiue, Y.R.; Lee, K.C.; Su, C.T. Real-time scheduling for a smart factory using a reinforcement learning approach. Comput. Ind. Eng. 2018, 125, 604–614. [Google Scholar] [CrossRef]
  30. Wang, Y.F. Adaptive job shop scheduling strategy based on weighted Q-learning algorithm. J. Intell. Manuf. 2020, 31, 417–432. [Google Scholar] [CrossRef]
  31. Waschneck, B.; Reichstaller, A.; Belzner, L.; Altenmüller, T.; Bauernhansl, T.; Knapp, A.; Kyek, A. Optimization of global production scheduling with deep reinforcement learning. Procedia CIRP 2018, 72, 1264–1269. [Google Scholar] [CrossRef]
  32. Altenmüller, T.; Stüker, T.; Waschneck, B.; Kuhnle, A.; Lanza, G. Reinforcement learning for an intelligent and autonomous production control of complex job-shops under time constraints. Prod. Eng. 2020, 14, 319–328. [Google Scholar] [CrossRef]
  33. Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208. [Google Scholar] [CrossRef]
  34. Luo, S.; Zhang, L.; Fan, Y. Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning. Comput. Ind. Eng. 2021, 159, 107489. [Google Scholar] [CrossRef]
  35. Luo, S.; Zhang, L.; Fan, Y. Real-time scheduling for dynamic partial-no-wait multiobjective flexible job shop by deep reinforcement learning. IEEE Trans. Autom. Sci. Eng. 2022, 19, 3020–3038. [Google Scholar] [CrossRef]
  36. Li, Y.; Gu, W.; Yuan, M.; Tang, Y. Real-time data-driven dynamic scheduling for flexible job shop with insufficient transportation resources using hybrid deep Q network. Robot. Comput.-Integr. Manuf. 2022, 74, 102283. [Google Scholar] [CrossRef]
  37. Zhao, L.; Fan, J.; Zhang, C.; Shen, W.; Zhuang, J. A DRL-Based Reactive Scheduling Policy for Flexible Job Shops with Random Job Arrivals. IEEE Trans. Autom. Sci. Eng. 2023, 21, 2912–2923. [Google Scholar] [CrossRef]
  38. Wu, Z.; Fan, H.; Sun, Y.; Peng, M. Efficient Multi-Objective Optimization on Dynamic Flexible Job Shop Scheduling Using Deep Reinforcement Learning Approach. Processes 2023, 11, 2018. [Google Scholar] [CrossRef]
  39. Bellman, R. A Markovian decision process. J. Math. Mech. 1957, 6, 679–684. [Google Scholar] [CrossRef]
  40. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  41. Hasselt, H. Double Q-learning. In Advances in Neural Information Processing Systems 23, Proceedings of the 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010; Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A., Eds.; Neural Information Processing Systems Foundation, Inc. (NeurIPS): San Diego, CA, USA, 2010. [Google Scholar]
  42. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar] [CrossRef]
  43. Wang, H.; Cheng, J.; Liu, C.; Zhang, Y.; Hu, S.; Chen, L. Multi-objective reinforcement learning framework for dynamic flexible job shop scheduling problem with uncertain events. Appl. Soft Comput. 2022, 131, 109717. [Google Scholar] [CrossRef]
Figure 1. The structure of DLIDQN and the overall algorithm process.
Figure 2. Dynamic multi-objective scheduling workflow of DLIDQN.
Figure 3. Average makespan achieved by DLIDQN at each training epoch. The green “Makespan” curve represents the makespan recorded for each epoch. The red “Average makespan obtained every 20 training epochs” line indicates the average makespan over every 20 epochs.
Figure 4. Average makespan comparison across scenarios with single composite rules vs. DLIDQN.
Figure 5. Average machine utilization rate comparison across scenarios with single composite rules vs. DLIDQN.
Figure 6. Average job tardiness rate comparison across scenarios with single composite rules vs. DLIDQN.
Figure 7. Pareto fronts obtained by DLIDQN and the proposed composite dispatching rules after 20 runs on three representative instances.
Figure 8. Average makespan comparison across scenarios with classic dispatching rules vs. DLIDQN.
Figure 9. Average machine utilization rate comparison with classic dispatching rules vs. DLIDQN.
Figure 10. Average job tardiness rate comparison across scenarios with classic dispatching rules vs. DLIDQN.
Figure 11. Pareto fronts obtained by DLIDQN and well-known classic dispatching rules after 20 runs on three representative instances.
Figure 12. Average makespan comparison across scenarios with other RL-based algorithms vs. DLIDQN.
Figure 13. Average machine utilization rate comparison across scenarios with other RL-based algorithms vs. DLIDQN.
Figure 14. Average job tardiness rate comparison across scenarios with other RL-based algorithms vs. DLIDQN.
Figure 15. The average performance of DLIDQN and other RL-based algorithms across three objectives was calculated for three representative instances.
Figure 16. Pareto fronts obtained by DLIDQN and other RL-based algorithms after 20 runs on three representative instances.
Table 1. Existing reinforcement learning-based research on the dynamic job shop scheduling problem.
Work | Algorithm | Problem | Objective | Dynamic Events
Aydin and Oztemel (2000) [16] | Q-learning | Dynamic job scheduling | Mean tardiness | New job insertions
Bouazza et al. (2017) [17] | Q-learning | Dynamic multi-objective flexible job scheduling | Makespan; Total weighted completion time; Weighted average waiting time | New job insertions
Yingzi and Mingyang (2004) [18] | Q-learning | Dynamic job scheduling | Mean tardiness | New job insertions
Wang (2020) [30] | Q-learning | Dynamic job scheduling | Earliness punishment; Tardiness punishment | New job insertions
Zhang et al. (2007) [26] | Q-learning | Dynamic parallel-machine scheduling | Mean weighted tardiness | Sequence-dependent setup times; Machine job qualification
Chen et al. (2010) [27] | Q-learning | Dynamic multi-objective job scheduling | Mean tardiness; Mean flow time | Fluctuation of work in process
Shahrabi et al. (2017) [28] | Q-learning | Dynamic job scheduling | Mean flow time | New job insertions; Machine breakdowns
Waschneck et al. (2018) [31] | Deep Q-learning | Dynamic flexible job scheduling | Uptime utilization | Machine breakdowns
Altenmüller et al. (2020) [32] | Deep Q-learning | Dynamic job scheduling | Count of violation events | New job insertions; Machine breakdowns
Luo (2020) [33] | DQN | Dynamic multi-objective flexible job scheduling | Total tardiness | New job insertions
Luo et al. (2021) [34] | THDQN | Dynamic flexible job scheduling | Total weighted tardiness rate; Average machine utilization | New job insertions
Luo et al. (2021b) [35] | HMAPPO | Dynamic partial-no-wait multi-objective flexible job shop scheduling (DMOFJSP-PNW) | Total weighted tardiness; Average machine utilization rate; Variance of machine workload | New job insertions; Machine breakdowns
Li et al. (2022) [36] | HDQN | Dynamic multi-objective flexible job scheduling | Completion time; Total energy | Insufficient transportation resources
Wu et al. (2023) [38] | Dual layer DDQN | Dynamic multi-objective flexible job scheduling | Maximum completion time; Total tardiness | Random workpiece arrivals
Table 2. List of relevant notations.
Parameter | Definition
n | The total number of jobs
m | The total number of machines
J i | The i-th job
O i , j | The j-th operation of job J i
n i | The number of operations for job J i
M k | The k-th machine
M i , j | The set of compatible machines for operation O i , j
t i , j , k | The processing time of operation O i , j on machine M k
A i | The arrival time of job J i
U r i | The urgency of job J i , classified into three levels: level 1 is the most urgent and level 3 the least; jobs with higher urgency are assigned shorter due dates
D D T | Due date tightness, an indicator of the relationship between a job’s due date and its estimated completion time; a lower DDT value indicates a tighter due date, leaving less time to complete the task, whereas a higher value gives the job more flexibility
D i | The due date of job J i
O P i | The number of operations of job J i completed at the current time
X i , j , k | Machine assignment variable: X i , j , k = 1 if operation O i , j is assigned to machine M k ; otherwise, X i , j , k = 0
Y i , j , h , g | Precedence variable between operations O i , j and O h , g : Y i , j , h , g = 1 if O i , j is a predecessor of O h , g , and Y i , j , h , g = −1 if it is a successor
C i , j | The completion time of operation O i , j
U k | The utilization of machine M k
T r i | The processing tardiness rate of job J i
Table 3. Parameter settings for various production configurations in the training process.
Parameter | Value
The total number of initial jobs | Unif [1, 20]
The total number of machines | Unif [8, 16]
Number of operations belonging to a job | Unif [1, 20]
Available machines per operation | Unif [1, m−2]
Operation time on an available machine | Unif [1, 50]
Urgency degree U r i of each job | [1, 2, 3]
Due date tightness ( D D T ) | [0.7, 1.2, 1.7]
Mean value ( E a v e ) of the interarrival time exp ( 1 / λ ) | [30, 50, 70]
Machine repair time | Unif [1, 99]
Tool replacement time | Unif [1, 50]
Table 4. Hyperparameter settings for the training algorithm.
Hyperparameter | Value
Number of training epochs (L) | 20
Priority experience replay buffer size (N) | 2000
Minibatch size to perform gradient descent (batch_size) | 32
Controls the influence of sample prioritization ( α ) | 0.6
Adjusts the correction level of importance sampling ( β ) | 0.4
Epsilon in ϵ -greedy action policy ( ϵ ) | Linear drop from 0.6 to 0.01
Target network update frequency (C) | 200
Discount rate ( γ ) | 0.95
Learning rate ( η ) | 0.001
Optimizer | Adam
Table 5. Parameter settings for various production configurations in the testing process.
Parameter | Value
The total number of initial jobs | Unif [1, 20]
The total number of machines | [8, 12, 16]
Number of operations belonging to a job | Unif [1, 20]
Number of available machines of each operation | Unif [1, m−2]
Processing time of an operation on an available machine | Unif [1, 50]
Urgency degree U r i of each job | [1, 2, 3]
Due date tightness ( D D T ) | [0.7, 1.2, 1.7]
Mean value ( E a v e ) of the interarrival time exp ( 1 / λ ) | [30, 50, 70]
The total number of new insert jobs | [20, 30, 40]
Machine repair time | Unif [1, 99]
Tool replacement time | Unif [1, 50]
Table 6. Analysis of the averages and standard deviations of makespan across 20 runs for single composite rules vs. DLIDQN.
n | m | e | Ours | rule1 | rule2 | rule3 | rule4 | rule5 | rule6 | rule7 | rule8 | rule9 (Random)
20830 1.635 × 10 3 / 7.285 × 10 1 1.760 × 10 3 / 8.863 × 10 1 1.956 × 10 3 / 1.091 × 10 2 3.170 × 10 3 / 2.472 × 10 2 1.947 × 10 3 / 9.476 × 10 1 2.258 × 10 3 / 1.192 × 10 2 4.365 × 10 3 / 3.926 × 10 2 3.300 × 10 3 / 2.255 × 10 2 3.528 × 10 3 / 1.585 × 10 2 1.074 × 10 4 / 5.153 × 10 2
50 1.867 × 10 3 / 4.618 × 10 1 1.989 × 10 3 / 5.903 × 10 1 2.204 × 10 3 / 1.403 × 10 2 3.283 × 10 3 / 2.414 × 10 2 2.015 × 10 3 / 7.731 × 10 1 2.409 × 10 3 / 1.035 × 10 2 4.428 × 10 3 / 3.635 × 10 2 3.184 × 10 3 / 1.990 × 10 2 4.090 × 10 3 / 2.830 × 10 2 1.099 × 10 4 / 5.016 × 10 2
70 1.937 × 10 3 / 5.959 × 10 1 2.033 × 10 3 / 6.116 × 10 1 2.295 × 10 3 / 1.025 × 10 2 3.644 × 10 3 / 3.290 × 10 2 2.215 × 10 3 / 1.224 × 10 2 2.563 × 10 3 / 1.619 × 10 2 4.803 × 10 3 / 3.720 × 10 2 4.026 × 10 3 / 2.684 × 10 2 4.210 × 10 3 / 2.481 × 10 2 1.282 × 10 4 / 6.665 × 10 2
1230 1.087 × 10 3 / 6.275 × 10 1 1.253 × 10 3 / 3.809 × 10 1 1.431 × 10 3 / 7.658 × 10 1 2.570 × 10 3 / 2.763 × 10 2 1.358 × 10 3 / 6.320 × 10 1 1.668 × 10 3 / 1.043 × 10 2 3.526 × 10 3 / 2.826 × 10 2 2.196 × 10 3 / 8.715 × 10 1 2.765 × 10 3 / 1.205 × 10 2 9.697 × 10 3 / 5.170 × 10 2
50 1.220 × 10 3 / 4.861 × 10 1 1.397 × 10 3 / 8.655 × 10 1 1.699 × 10 3 / 1.828 × 10 2 2.684 × 10 3 / 2.016 × 10 2 1.563 × 10 3 / 1.071 × 10 2 1.886 × 10 3 / 1.116 × 10 2 3.824 × 10 3 / 3.419 × 10 2 2.244 × 10 3 / 2.145 × 10 2 2.885 × 10 3 / 1.982 × 10 2 1.051 × 10 4 / 6.544 × 10 2
70 9.584 × 10 2 / 3.765 × 10 1 1.220 × 10 3 / 7.093 × 10 1 1.382 × 10 3 / 1.150 × 10 2 2.198 × 10 3 / 1.623 × 10 2 1.263 × 10 3 / 8.941 × 10 1 1.601 × 10 3 / 1.058 × 10 2 3.278 × 10 3 / 2.886 × 10 2 1.954 × 10 3 / 8.076 × 10 1 2.316 × 10 3 / 1.332 × 10 2 8.610 × 10 3 / 5.791 × 10 2
1630 7.884 × 10 2 / 5.579 × 10 1 1.046 × 10 3 / 7.275 × 10 1 1.191 × 10 3 / 7.147 × 10 1 2.031 × 10 3 / 2.084 × 10 2 9.994 × 10 2 / 6.505 × 10 1 1.311 × 10 3 / 9.132 × 10 1 2.951 × 10 3 / 3.290 × 10 2 1.430 × 10 3 / 7.051 × 10 1 1.902 × 10 3 / 1.376 × 10 2 8.592 × 10 3 / 6.833 × 10 2
50 7.910 × 10 2 / 2.519 × 10 1 1.072 × 10 3 / 6.334 × 10 1 1.309 × 10 3 / 1.137 × 10 2 2.211 × 10 3 / 2.079 × 10 2 1.045 × 10 3 / 8.777 × 10 1 1.394 × 10 3 / 5.468 × 10 1 3.098 × 10 3 / 2.996 × 10 2 3.206 × 10 3 / 2.254 × 10 2 2.112 × 10 3 / 1.028 × 10 2 8.832 × 10 3 / 7.015 × 10 2
70 7.568 × 10 2 / 4.819 × 10 1 1.064 × 10 3 / 7.385 × 10 1 1.241 × 10 3 / 1.171 × 10 2 1.889 × 10 3 / 2.262 × 10 2 1.004 × 10 3 / 6.775 × 10 1 1.386 × 10 3 / 1.198 × 10 2 2.762 × 10 3 / 2.020 × 10 2 1.428 × 10 3 / 1.060 × 10 2 1.723 × 10 3 / 8.687 × 10 1 8.073 × 10 3 / 3.702 × 10 2
30830 1.841 × 10 3 / 4.900 × 10 1 1.912 × 10 3 / 5.429 × 10 1 2.111 × 10 3 / 1.031 × 10 2 3.596 × 10 3 / 2.777 × 10 2 1.986 × 10 3 / 8.544 × 10 1 2.326 × 10 3 / 9.872 × 10 1 4.684 × 10 3 / 3.704 × 10 2 3.664 × 10 3 / 9.822 × 10 1 4.016 × 10 3 / 2.115 × 10 2 1.204 × 10 4 / 5.869 × 10 2
50 1.841 × 10 3 / 6.275 × 10 1 1.939 × 10 3 / 5.414 × 10 1 2.185 × 10 3 / 1.616 × 10 2 3.954 × 10 3 / 4.840 × 10 2 2.069 × 10 3 / 1.001 × 10 2 2.378 × 10 3 / 1.120 × 10 2 4.990 × 10 3 / 4.393 × 10 2 3.631 × 10 3 / 2.182 × 10 2 4.195 × 10 3 / 1.729 × 10 2 1.218 × 10 4 / 4.072 × 10 2
70 2.547 × 10 3 / 4.407 × 10 1 2.624 × 10 3 / 4.412 × 10 1 2.826 × 10 3 / 1.044 × 10 2 4.143 × 10 3 / 3.045 × 10 2 2.644 × 10 3 / 9.337 × 10 1 3.064 × 10 3 / 1.507 × 10 2 5.462 × 10 3 / 4.116 × 10 2 4.995 × 10 3 / 2.614 × 10 2 5.369 × 10 3 / 1.531 × 10 2 1.580 × 10 4 / 5.752 × 10 2
1230 1.785 × 10 3 / 6.856 × 10 1 1.941 × 10 3 / 3.315 × 10 1 2.285 × 10 3 / 1.889 × 10 2 3.569 × 10 3 / 2.648 × 10 2 2.127 × 10 3 / 1.225 × 10 1 2.535 × 10 3 / 1.649 × 10 2 4.919 × 10 3 / 4.702 × 10 2 3.877 × 10 3 / 2.003 × 10 2 5.016 × 10 3 / 1.578 × 10 2 1.551 × 10 4 / 6.152 × 10 2
50 1.491 × 10 3 / 5.342 × 10 1 1.651 × 10 3 / 4.640 × 10 1 1.897 × 10 3 / 1.000 × 10 2 3.003 × 10 3 / 2.336 × 10 2 1.742 × 10 3 / 8.557 × 10 1 2.102 × 10 3 / 1.125 × 10 2 4.318 × 10 3 / 2.883 × 10 2 3.403 × 10 3 / 2.269 × 10 2 3.954 × 10 3 / 2.443 × 10 2 1.401 × 10 4 / 5.852 × 10 2
70 1.366 × 10 3 / 4.562 × 10 1 1.539 × 10 3 / 3.842 × 10 1 1.766 × 10 3 / 1.054 × 10 2 3.119 × 10 3 / 2.519 × 10 2 1.570 × 10 3 / 7.571 × 10 1 1.986 × 10 3 / 1.050 × 10 2 4.137 × 10 3 / 3.324 × 10 2 2.753 × 10 3 / 1.771 × 10 2 3.059 × 10 3 / 1.326 × 10 2 1.320 × 10 4 / 6.762 × 10 2
1630 8.613 × 10 2 / 5.363 × 10 1 1.090 × 10 3 / 5.304 × 10 1 1.288 × 10 3 / 8.674 × 10 1 2.358 × 10 3 / 1.419 × 10 2 1.087 × 10 3 / 7.024 × 10 1 1.453 × 10 3 / 7.323 × 10 1 3.313 × 10 3 / 3.232 × 10 2 1.883 × 10 3 / 1.270 × 10 2 2.423 × 10 3 / 1.872 × 10 2 8.405 × 10 3 / 4.059 × 10 2
50 1.226 × 10 3 / 4.746 × 10 1 1.411 × 10 3 / 6.309 × 10 1 1.629 × 10 3 / 1.817 × 10 2 2.822 × 10 3 / 2.216 × 10 2 1.452 × 10 3 / 7.994 × 10 1 1.949 × 10 3 / 1.152 × 10 2 4.038 × 10 3 / 3.789 × 10 2 2.550 × 10 3 / 1.941 × 10 2 3.278 × 10 3 / 2.100 × 10 2 1.397 × 10 4 / 8.220 × 10 2
70 1.027 × 10 3 / 4.635 × 10 1 1.230 × 10 3 / 6.183 × 10 1 1.458 × 10 3 / 1.564 × 10 2 2.711 × 10 3 / 2.615 × 10 2 1.295 × 10 3 / 9.345 × 10 1 1.688 × 10 3 / 1.083 × 10 2 3.625 × 10 3 / 3.254 × 10 2 2.364 × 10 3 / 1.891 × 10 2 2.581 × 10 3 / 1.516 × 10 2 1.204 × 10 4 / 6.382 × 10 2
40830 2.068 × 10 3 / 6.852 × 10 1 2.153 × 10 3 / 5.187 × 10 1 2.396 × 10 3 / 1.644 × 10 2 4.473 × 10 3 / 4.886 × 10 2 2.219 × 10 3 / 8.323 × 10 1 2.618 × 10 3 / 9.296 × 10 1 5.361 × 10 3 / 3.166 × 10 2 4.267 × 10 3 / 2.126 × 10 2 6.022 × 10 3 / 2.332 × 10 2 1.420 × 10 4 / 8.249 × 10 2
50 3.585 × 10 3 / 4.697 × 10 1 3.630 × 10 3 / 5.702 × 10 1 4.011 × 10 3 / 2.587 × 10 2 6.367 × 10 3 / 5.365 × 10 2 3.760 × 10 3 / 1.398 × 10 2 4.133 × 10 3 / 1.037 × 10 2 8.084 × 10 3 / 4.942 × 10 2 7.740 × 10 3 / 3.701 × 10 2 8.743 × 10 3 / 3.495 × 10 2 2.401 × 10 4 / 7.333 × 10 2
70 3.578 × 10 3 / 5.479 × 10 1 3.651 × 10 3 / 6.753 × 10 1 3.859 × 10 3 / 1.471 × 10 2 5.369 × 10 3 / 2.846 × 10 2 3.624 × 10 3 / 1.232 × 10 2 4.108 × 10 3 / 1.300 × 10 2 7.084 × 10 3 / 4.547 × 10 2 6.700 × 10 3 / 2.265 × 10 2 7.399 × 10 3 / 2.355 × 10 2 2.181 × 10 4 / 7.874 × 10 2
1230 1.757 × 10 3 / 5.697 × 10 1 1.865 × 10 3 / 4.361 × 10 1 2.139 × 10 3 / 1.564 × 10 2 3.729 × 10 3 / 4.427 × 10 2 1.929 × 10 3 / 8.429 × 10 1 2.407 × 10 3 / 8.586 × 10 1 4.866 × 10 3 / 2.835 × 10 2 4.057 × 10 3 / 2.235 × 10 2 4.137 × 10 3 / 1.524 × 10 2 1.634 × 10 4 / 8.712 × 10 2
50 1.797 × 10 3 / 6.543 × 10 1 1.925 × 10 3 / 6.132 × 10 1 2.181 × 10 3 / 1.151 × 10 2 3.812 × 10 3 / 3.176 × 10 2 2.022 × 10 3 / 9.291 × 10 1 2.437 × 10 3 / 1.527 × 10 2 5.150 × 10 3 / 3.098 × 10 2 3.868 × 10 3 / 2.097 × 10 2 4.454 × 10 3 / 2.197 × 10 2 1.706 × 10 4 / 9.795 × 10 2
70 1.893 × 10 3 / 4.507 × 10 1 1.988 × 10 3 / 4.965 × 10 1 2.301 × 10 3 / 1.727 × 10 2 3.902 × 10 3 / 4.383 × 10 2 2.049 × 10 3 / 9.024 × 10 1 2.467 × 10 3 / 1.061 × 10 2 4.598 × 10 3 / 3.579 × 10 2 3.975 × 10 3 / 2.067 × 10 2 4.642 × 10 3 / 2.020 × 10 2 1.761 × 10 4 / 8.169 × 10 2
1630 1.410 × 10 3 / 4.413 × 10 1 1.610 × 10 3 / 5.994 × 10 1 1.888 × 10 3 / 1.368 × 10 2 3.466 × 10 3 / 3.161 × 10 2 1.639 × 10 3 / 1.052 × 10 2 2.011 × 10 3 / 8.084 × 10 1 4.702 × 10 3 / 3.209 × 10 2 3.171 × 10 3 / 1.935 × 10 2 3.863 × 10 3 / 2.113 × 10 2 1.664 × 10 4 / 9.417 × 10 2
50 1.477 × 10 3 / 3.263 × 10 1 1.643 × 10 3 / 3.908 × 10 1 1.891 × 10 3 / 9.949 × 10 1 3.622 × 10 3 / 2.864 × 10 2 1.643 × 10 3 / 5.604 × 10 1 2.141 × 10 3 / 1.384 × 10 2 4.856 × 10 3 / 3.314 × 10 2 2.682 × 10 3 / 1.192 × 10 2 3.792 × 10 3 / 2.014 × 10 2 1.857 × 10 4 / 7.537 × 10 2
70 1.357 × 10 3 / 6.114 × 10 1 1.581 × 10 3 / 3.710 × 10 1 1.842 × 10 3 / 2.554 × 10 2 3.147 × 10 3 / 2.013 × 10 2 1.644 × 10 3 / 8.888 × 10 1 2.095 × 10 3 / 1.163 × 10 2 4.598 × 10 3 / 3.579 × 10 2 3.599 × 10 3 / 1.606 × 10 2 3.528 × 10 3 / 1.585 × 10 2 1.676 × 10 4 / 9.295 × 10 2
Table 7. Analysis of the averages and standard deviations of U a v e across 20 runs for single composite rules vs. DLIDQN.
n | m | e | Ours | rule1 | rule2 | rule3 | rule4 | rule5 | rule6 | rule7 | rule8 | rule9
20830 9.819 × 10 1 / 7.412 × 10 3 9.786 × 10 1 / 8.309 × 10 3 9.126 × 10 1 / 4.266 × 10 2 6.194 × 10 1 / 3.469 × 10 2 8.157 × 10 1 / 2.368 × 10 2 8.014 × 10 1 / 2.618 × 10 2 4.408 × 10 1 / 3.713 × 10 2 4.303 × 10 1 / 2.308 × 10 2 4.716 × 10 1 / 1.629 × 10 2 1.700 × 10 1 / 7.108 × 10 3
50 9.884 × 10 1 / 5.444 × 10 3 9.856 × 10 1 / 4.914 × 10 3 9.026 × 10 1 / 6.047 × 10 2 6.401 × 10 1 / 3.804 × 10 2 8.391 × 10 1 / 2.246 × 10 2 8.176 × 10 1 / 2.058 × 10 2 4.714 × 10 1 / 3.564 × 10 2 4.876 × 10 1 / 2.612 × 10 2 4.621 × 10 1 / 2.803 × 10 2 1.772 × 10 1 / 6.924 × 10 3
70 9.910 × 10 1 / 4.426 × 10 3 9.889 × 10 1 / 4.923 × 10 3 9.117 × 10 1 / 5.952 × 10 2 6.360 × 10 1 / 5.165 × 10 2 8.411 × 10 1 / 2.252 × 10 2 8.160 × 10 1 / 2.760 × 10 2 4.676 × 10 1 / 3.738 × 10 2 3.988 × 10 1 / 2.379 × 10 2 4.761 × 10 1 / 2.182 × 10 2 1.683 × 10 1 / 6.905 × 10 3
1230 9.244 × 10 1 / 7.699 × 10 3 9.349 × 10 1 / 1.177 × 10 2 8.570 × 10 1 / 5.049 × 10 2 5.355 × 10 1 / 4.538 × 10 2 7.627 × 10 1 / 2.820 × 10 2 7.701 × 10 1 / 2.083 × 10 2 3.759 × 10 1 / 2.850 × 10 2 4.180 × 10 1 / 1.998 × 10 2 4.493 × 10 1 / 1.883 × 10 2 1.320 × 10 1 / 7.380 × 10 3
50 9.204 × 10 1 / 1.044 × 10 2 9.031 × 10 1 / 2.308 × 10 2 8.002 × 10 1 / 7.483 × 10 2 5.320 × 10 1 / 3.260 × 10 2 7.474 × 10 1 / 2.953 × 10 2 7.371 × 10 1 / 2.061 × 10 2 3.818 × 10 1 / 3.268 × 10 2 4.955 × 10 1 / 2.253 × 10 2 4.963 × 10 1 / 2.350 × 10 2 1.285 × 10 1 / 7.699 × 10 3
70 9.278 × 10 1 / 1.432 × 10 2 8.977 × 10 1 / 2.094 × 10 2 8.189 × 10 1 / 6.008 × 10 2 5.540 × 10 1 / 4.414 × 10 2 7.420 × 10 1 / 2.454 × 10 2 7.382 × 10 1 / 2.469 × 10 2 3.720 × 10 1 / 3.083 × 10 2 4.439 × 10 1 / 1.414 × 10 2 4.769 × 10 1 / 2.009 × 10 2 1.348 × 10 1 / 8.405 × 10 3
1630 8.353 × 10 1 / 1.646 × 10 2 8.424 × 10 1 / 1.960 × 10 2 7.469 × 10 1 / 4.256 × 10 2 4.671 × 10 1 / 3.732 × 10 2 6.820 × 10 1 / 2.214 × 10 2 7.263 × 10 1 / 1.677 × 10 2 3.269 × 10 1 / 3.158 × 10 2 4.659 × 10 1 / 2.116 × 10 2 5.027 × 10 1 / 2.083 × 10 2 1.055 × 10 1 / 8.867 × 10 3
50 8.436 × 10 1 / 1.386 × 10 2 8.635 × 10 1 / 1.786 × 10 2 7.527 × 10 1 / 4.805 × 10 2 4.765 × 10 1 / 3.658 × 10 2 7.282 × 10 1 / 2.635 × 10 2 7.160 × 10 1 / 1.896 × 10 2 3.395 × 10 1 / 2.480 × 10 2 3.778 × 10 1 / 2.153 × 10 2 4.818 × 10 1 / 2.864 × 10 2 1.110 × 10 1 / 8.002 × 10 3
70 8.654 × 10 1 / 1.417 × 10 2 8.540 × 10 1 / 1.844 × 10 2 7.686 × 10 1 / 5.307 × 10 2 5.165 × 10 1 / 5.378 × 10 2 7.100 × 10 1 / 2.767 × 10 2 7.311 × 10 1 / 1.813 × 10 2 3.536 × 10 1 / 2.611 × 10 2 4.712 × 10 1 / 1.760 × 10 2 5.431 × 10 1 / 1.854 × 10 2 1.154 × 10 1 / 6.833 × 10 3
30830 9.945 × 10 1 / 4.542 × 10 3 9.944 × 10 1 / 2.196 × 10 3 9.397 × 10 1 / 3.625 × 10 2 6.126 × 10 1 / 3.676 × 10 2 8.624 × 10 1 / 2.012 × 10 2 8.460 × 10 1 / 2.231 × 10 2 4.657 × 10 1 / 3.859 × 10 2 4.319 × 10 1 / 1.412 × 10 2 4.567 × 10 1 / 1.861 × 10 2 1.697 × 10 1 / 8.199 × 10 3
50 9.942 × 10 1 / 4.098 × 10 3 9.886 × 10 1 / 5.555 × 10 3 9.163 × 10 1 / 5.653 × 10 2 5.773 × 10 1 / 6.025 × 10 2 8.345 × 10 1 / 1.971 × 10 2 8.352 × 10 1 / 2.325 × 10 2 4.316 × 10 1 / 3.053 × 10 2 4.297 × 10 1 / 1.486 × 10 2 4.521 × 10 1 / 1.653 × 10 2 1.684 × 10 1 / 5.152 × 10 3
70 9.985 × 10 1 / 1.511 × 10 3 9.984 × 10 1 / 1.219 × 10 3 9.563 × 10 1 / 3.517 × 10 2 7.039 × 10 1 / 4.722 × 10 2 8.841 × 10 1 / 2.262 × 10 2 8.786 × 10 1 / 1.950 × 10 2 5.089 × 10 1 / 3.653 × 10 2 3.882 × 10 1 / 1.385 × 10 2 4.792 × 10 1 / 1.180 × 10 2 1.673 × 10 1 / 5.804 × 10 3
1230 9.744 × 10 1 / 8.648 × 10 3 9.724 × 10 1 / 5.246 × 10 3 8.892 × 10 1 / 7.667 × 10 2 5.838 × 10 1 / 3.681 × 10 2 8.022 × 10 1 / 2.833 × 10 2 7.809 × 10 1 / 2.531 × 10 2 4.218 × 10 1 / 3.261 × 10 2 4.124 × 10 1 / 2.179 × 10 2 4.215 × 10 1 / 1.135 × 10 2 1.251 × 10 1 / 4.692 × 10 3
50 9.853 × 10 1 / 5.352 × 10 3 9.753 × 10 1 / 6.076 × 10 3 9.044 × 10 1 / 6.014 × 10 2 6.064 × 10 1 / 4.969 × 10 2 8.177 × 10 1 / 2.110 × 10 2 8.009 × 10 1 / 1.651 × 10 2 4.182 × 10 1 / 2.874 × 10 2 3.680 × 10 1 / 2.715 × 10 2 4.075 × 10 1 / 2.342 × 10 2 1.233 × 10 1 / 5.013 × 10 3
70 9.689 × 10 1 / 7.467 × 10 3 9.658 × 10 1 / 7.288 × 10 3 9.015 × 10 1 / 4.312 × 10 2 5.416 × 10 1 / 3.870 × 10 2 8.102 × 10 1 / 2.006 × 10 2 8.034 × 10 1 / 1.722 × 10 2 4.026 × 10 1 / 3.295 × 10 2 4.342 × 10 1 / 2.216 × 10 2 4.929 × 10 1 / 1.353 × 10 2 1.197 × 10 1 / 5.064 × 10 3
1630 9.234 × 10 1 / 1.387 × 10 2 9.278 × 10 1 / 8.972 × 10 3 8.542 × 10 1 / 5.236 × 10 2 5.089 × 10 1 / 4.835 × 10 2 7.481 × 10 1 / 2.249 × 10 2 7.656 × 10 1 / 1.815 × 10 2 3.525 × 10 1 / 3.206 × 10 2 4.167 × 10 1 / 1.526 × 10 2 4.703 × 10 1 / 3.045 × 10 2 1.066 × 10 1 / 5.622 × 10 3
50 9.202 × 10 1 / 1.081 × 10 2 9.309 × 10 1 / 1.089 × 10 2 8.409 × 10 1 / 7.432 × 10 2 5.202 × 10 1 / 4.224 × 10 2 7.576 × 10 1 / 2.185 × 10 2 7.436 × 10 1 / 2.586 × 10 2 3.687 × 10 1 / 2.584 × 10 2 4.203 × 10 1 / 1.529 × 10 2 4.516 × 10 1 / 2.002 × 10 2 1.005 × 10 1 / 5.314 × 10 3
70 9.194 × 10 1 / 1.180 × 10 2 9.156 × 10 1 / 1.311 × 10 2 8.334 × 10 1 / 7.675 × 10 2 4.941 × 10 1 / 3.913 × 10 2 7.551 × 10 1 / 2.314 × 10 2 7.598 × 10 1 / 2.102 × 10 2 3.540 × 10 1 / 2.470 × 10 2 3.903 × 10 1 / 1.986 × 10 2 4.704 × 10 1 / 2.615 × 10 2 1.034 × 10 1 / 4.783 × 10 3
40830 9.942 × 10 1 / 5.093 × 10 3 9.927 × 10 1 / 3.991 × 10 3 9.348 × 10 1 / 5.970 × 10 2 5.885 × 10 1 / 4.101 × 10 2 8.675 × 10 1 / 1.696 × 10 2 8.648 × 10 1 / 1.646 × 10 2 4.630 × 10 1 / 2.494 × 10 2 3.948 × 10 1 / 1.510 × 10 2 3.919 × 10 1 / 1.124 × 10 2 1.692 × 10 1 / 7.472 × 10 3
50 9.996 × 10 1 / 5.870 × 10 4 9.994 × 10 1 / 7.161 × 10 4 9.422 × 10 1 / 5.475 × 10 2 6.686 × 10 1 / 4.412 × 10 2 9.016 × 10 1 / 1.741 × 10 2 8.867 × 10 1 / 1.208 × 10 2 5.025 × 10 1 / 2.854 × 10 2 3.613 × 10 1 / 1.458 × 10 2 3.811 × 10 1 / 1.664 × 10 2 1.583 × 10 1 / 3.981 × 10 3
70 9.998 × 10 1 / 4.882 × 10 4 9.992 × 10 1 / 1.274 × 10 3 9.630 × 10 1 / 3.884 × 10 2 7.334 × 10 1 / 3.733 × 10 2 9.147 × 10 1 / 1.560 × 10 2 8.896 × 10 1 / 1.367 × 10 2 5.386 × 10 1 / 3.623 × 10 2 3.943 × 10 1 / 1.182 × 10 2 4.280 × 10 1 / 1.049 × 10 2 1.637 × 10 1 / 4.873 × 10 3
1230 9.928 × 10 1 / 3.390 × 10 3 9.905 × 10 1 / 3.948 × 10 3 9.284 × 10 1 / 5.917 × 10 2 5.807 × 10 1 / 5.659 × 10 2 8.411 × 10 1 / 1.804 × 10 2 8.244 × 10 1 / 1.774 × 10 2 4.330 × 10 1 / 2.074 × 10 2 3.627 × 10 1 / 1.503 × 10 2 4.491 × 10 1 / 1.010 × 10 2 1.217 × 10 1 / 5.432 × 10 3
50 9.880 × 10 1 / 5.327 × 10 3 9.857 × 10 1 / 6.843 × 10 3 9.235 × 10 1 / 5.124 × 10 2 5.820 × 10 1 / 4.482 × 10 2 8.427 × 10 1 / 2.015 × 10 2 8.336 × 10 1 / 1.317 × 10 2 4.200 × 10 1 / 2.054 × 10 2 4.051 × 10 1 / 1.549 × 10 2 4.427 × 10 1 / 1.612 × 10 2 1.200 × 10 1 / 5.070 × 10 3
70 9.911 × 10 1 / 3.814 × 10 3 9.917 × 10 1 / 3.356 × 10 3 9.308 × 10 1 / 6.190 × 10 2 5.911 × 10 1 / 5.269 × 10 2 8.510 × 10 1 / 2.217 × 10 2 8.485 × 10 1 / 1.387 × 10 2 4.265 × 10 1 / 3.274 × 10 2 3.847 × 10 1 / 1.434 × 10 2 4.242 × 10 1 / 1.317 × 10 2 1.193 × 10 1 / 3.164 × 10 3
1630 9.781 × 10 1 / 6.968 × 10 3 9.738 × 10 1 / 7.856 × 10 3 9.030 × 10 1 / 6.627 × 10 2 5.299 × 10 1 / 4.800 × 10 2 8.122 × 10 1 / 2.140 × 10 2 8.144 × 10 1 / 1.328 × 10 2 3.816 × 10 1 / 2.113 × 10 2 3.894 × 10 1 / 2.641 × 10 2 4.332 × 10 1 / 2.197 × 10 2 1.003 × 10 1 / 4.763 × 10 3
50 9.917 × 10 1 / 2.110 × 10 3 9.885 × 10 1 / 3.234 × 10 3 9.287 × 10 1 / 5.615 × 10 2 5.478 × 10 1 / 4.127 × 10 2 8.505 × 10 1 / 1.465 × 10 2 8.177 × 10 1 / 2.072 × 10 2 3.947 × 10 1 / 2.433 × 10 2 4.850 × 10 1 / 1.660 × 10 2 4.555 × 10 1 / 2.156 × 10 2 9.741 × 10 2 / 4.147 × 10 3
70 9.734 × 10 1 / 4.744 × 10 3 9.596 × 10 1 / 8.007 × 10 3 8.883 × 10 1 / 8.667 × 10 2 5.546 × 10 1 / 3.013 × 10 2 8.050 × 10 1 / 1.758 × 10 2 7.884 × 10 1 / 1.969 × 10 2 3.758 × 10 1 / 2.630 × 10 2 4.297 × 10 1 / 1.436 × 10 2 4.584 × 10 1 / 1.905 × 10 2 9.760 × 10 2 / 4.974 × 10 3
Table 8. Analysis of the averages and standard deviations of T R a v e across 20 runs for single composite rules vs. DLIDQN.
n | m | e | Ours | rule1 | rule2 | rule3 | rule4 | rule5 | rule6 | rule7 | rule8 | rule9
20830 6.657 × 10 1 / 1.125 × 10 2 7.832 × 10 1 / 8.543 × 10 3 7.542 × 10 1 / 2.043 × 10 2 8.764 × 10 1 / 9.013 × 10 3 6.841 × 10 1 / 1.915 × 10 2 7.350 × 10 1 / 1.453 × 10 2 8.503 × 10 1 / 2.299 × 10 2 6.034 × 10 1 / 2.664 × 10 2 6.514 × 10 1 / 1.870 × 10 2 8.283 × 10 1 / 1.975 × 10 2
50 5.858 × 10 1 / 4.284 × 10 2 8.034 × 10 1 / 5.551 × 10 3 7.714 × 10 1 / 1.725 × 10 2 8.605 × 10 1 / 1.192 × 10 2 6.726 × 10 1 / 2.832 × 10 2 7.232 × 10 1 / 2.591 × 10 2 8.303 × 10 1 / 2.657 × 10 2 6.420 × 10 1 / 1.405 × 10 2 6.528 × 10 1 / 2.448 × 10 2 8.418 × 10 1 / 1.999 × 10 2
70 5.745 × 10 1 / 8.057 × 10 3 7.676 × 10 1 / 9.679 × 10 3 7.526 × 10 1 / 1.843 × 10 2 8.666 × 10 1 / 2.041 × 10 2 7.142 × 10 1 / 1.690 × 10 2 7.375 × 10 1 / 1.885 × 10 2 8.573 × 10 1 / 1.898 × 10 2 6.548 × 10 1 / 1.204 × 10 2 7.035 × 10 1 / 1.244 × 10 2 8.479 × 10 1 / 1.955 × 10 2
1230 5.259 × 10 1 / 1.194 × 10 2 6.843 × 10 1 / 8.545 × 10 3 6.401 × 10 1 / 3.337 × 10 2 8.304 × 10 1 / 1.944 × 10 2 5.307 × 10 1 / 2.691 × 10 2 6.092 × 10 1 / 2.733 × 10 2 7.862 × 10 1 / 2.724 × 10 2 5.505 × 10 1 / 1.377 × 10 2 6.101 × 10 1 / 1.634 × 10 2 7.823 × 10 1 / 1.725 × 10 2
50 5.152 × 10 1 / 1.306 × 10 2 6.553 × 10 1 / 1.023 × 10 2 6.494 × 10 1 / 2.327 × 10 2 8.080 × 10 1 / 1.189 × 10 2 5.021 × 10 1 / 2.541 × 10 2 5.817 × 10 1 / 2.486 × 10 2 7.782 × 10 1 / 3.133 × 10 2 5.035 × 10 1 / 2.125 × 10 2 5.533 × 10 1 / 1.918 × 10 2 7.883 × 10 1 / 2.352 × 10 2
70 5.041 × 10 1 / 1.302 × 10 2 6.729 × 10 1 / 1.072 × 10 2 6.346 × 10 1 / 1.987 × 10 2 8.024 × 10 1 / 1.510 × 10 2 5.066 × 10 1 / 3.302 × 10 2 5.918 × 10 1 / 1.869 × 10 2 7.741 × 10 1 / 2.459 × 10 2 5.187 × 10 1 / 1.339 × 10 2 5.816 × 10 1 / 1.273 × 10 2 8.071 × 10 1 / 2.278 × 10 2
1630 3.798 × 10 1 / 1.751 × 10 2 5.851 × 10 1 / 1.400 × 10 2 5.469 × 10 1 / 3.217 × 10 2 7.957 × 10 1 / 2.129 × 10 2 3.279 × 10 1 / 3.796 × 10 2 4.734 × 10 1 / 3.284 × 10 2 7.425 × 10 1 / 4.493 × 10 2 4.288 × 10 1 / 2.154 × 10 2 4.975 × 10 1 / 1.638 × 10 2 7.832 × 10 1 / 2.746 × 10 2
50 3.117 × 10 1 / 1.502 × 10 2 5.208 × 10 1 / 1.873 × 10 2 5.592 × 10 1 / 2.547 × 10 2 7.418 × 10 1 / 2.109 × 10 2 3.593 × 10 1 / 3.608 × 10 2 4.829 × 10 1 / 2.416 × 10 2 7.414 × 10 1 / 3.213 × 10 2 5.275 × 10 1 / 1.583 × 10 2 5.249 × 10 1 / 1.912 × 10 2 7.938 × 10 1 / 1.999 × 10 2
70 2.826 × 10 1 / 2.583 × 10 2 5.474 × 10 1 / 1.682 × 10 2 5.201 × 10 1 / 3.392 × 10 2 7.427 × 10 1 / 2.913 × 10 2 3.074 × 10 1 / 3.159 × 10 2 4.413 × 10 1 / 2.971 × 10 2 7.040 × 10 1 / 2.999 × 10 2 3.872 × 10 1 / 2.245 × 10 2 4.350 × 10 1 / 1.553 × 10 2 7.708 × 10 1 / 2.326 × 10 2
30830 5.900 × 10 1 / 6.196 × 10 3 8.032 × 10 1 / 5.038 × 10 3 7.687 × 10 1 / 2.268 × 10 2 8.764 × 10 1 / 9.259 × 10 3 6.842 × 10 1 / 2.928 × 10 2 7.354 × 10 1 / 2.018 × 10 2 8.453 × 10 1 / 2.357 × 10 2 6.097 × 10 1 / 1.229 × 10 2 6.385 × 10 1 / 1.695 × 10 2 8.101 × 10 1 / 1.665 × 10 2
50 6.151 × 10 1 / 7.524 × 10 3 7.355 × 10 1 / 5.036 × 10 3 7.089 × 10 1 / 2.229 × 10 2 8.221 × 10 1 / 1.666 × 10 2 6.556 × 10 1 / 2.681 × 10 2 6.975 × 10 1 / 3.707 × 10 2 8.094 × 10 1 / 2.447 × 10 2 6.042 × 10 1 / 1.592 × 10 2 6.185 × 10 1 / 2.045 × 10 2 8.104 × 10 1 / 1.848 × 10 2
70 7.271 × 10 1 / 3.618 × 10 3 8.214 × 10 1 / 4.351 × 10 3 8.106 × 10 1 / 2.450 × 10 2 8.837 × 10 1 / 1.221 × 10 2 7.650 × 10 1 / 1.684 × 10 2 7.934 × 10 1 / 1.677 × 10 2 8.753 × 10 1 / 1.580 × 10 2 7.071 × 10 1 / 7.209 × 10 3 7.209 × 10 1 / 1.023 × 10 2 8.625 × 10 1 / 1.411 × 10 2
1230 6.067 × 10 1 / 1.041 × 10 2 7.675 × 10 1 / 6.345 × 10 3 7.436 × 10 1 / 2.310 × 10 2 8.585 × 10 1 / 2.219 × 10 2 6.876 × 10 1 / 2.909 × 10 2 7.250 × 10 1 / 2.246 × 10 2 8.290 × 10 1 / 1.992 × 10 2 6.173 × 10 1 / 1.566 × 10 2 6.855 × 10 1 / 1.132 × 10 2 8.328 × 10 1 / 2.167 × 10 2
50 5.031 × 10 1 / 1.188 × 10 2 7.320 × 10 1 / 6.748 × 10 3 6.955 × 10 1 / 3.014 × 10 2 8.389 × 10 1 / 1.318 × 10 2 4.998 × 10 1 / 2.011 × 10 2 6.685 × 10 1 / 2.613 × 10 2 8.207 × 10 1 / 2.097 × 10 2 5.415 × 10 1 / 1.543 × 10 2 5.870 × 10 1 / 2.085 × 10 2 8.102 × 10 1 / 2.667 × 10 2
70 4.854 × 10 1 / 9.758 × 10 3 7.245 × 10 1 / 7.092 × 10 3 6.894 × 10 1 / 3.044 × 10 2 8.380 × 10 1 / 1.556 × 10 2 5.731 × 10 1 / 2.510 × 10 2 6.456 × 10 1 / 2.422 × 10 2 8.089 × 10 1 / 2.429 × 10 2 5.351 × 10 1 / 1.923 × 10 2 5.945 × 10 1 / 1.437 × 10 2 8.408 × 10 1 / 2.010 × 10 2
1630 3.849 × 10 1 / 1.738 × 10 2 4.799 × 10 1 / 1.155 × 10 2 5.667 × 10 1 / 3.163 × 10 2 7.589 × 10 1 / 2.592 × 10 2 4.151 × 10 1 / 3.330 × 10 2 5.391 × 10 1 / 3.453 × 10 2 7.642 × 10 1 / 2.348 × 10 2 4.459 × 10 1 / 1.750 × 10 2 4.972 × 10 1 / 1.525 × 10 2 7.757 × 10 1 / 2.054 × 10 2
50 5.003 × 10 1 / 1.339 × 10 2 6.464 × 10 1 / 8.000 × 10 3 6.290 × 10 1 / 2.658 × 10 2 7.973 × 10 1 / 1.475 × 10 2 4.863 × 10 1 / 2.805 × 10 2 5.814 × 10 1 / 1.995 × 10 2 7.799 × 10 1 / 1.546 × 10 2 4.865 × 10 1 / 1.665 × 10 2 5.323 × 10 1 / 1.698 × 10 2 8.059 × 10 1 / 2.170 × 10 2
70 4.882 × 10 1 / 1.172 × 10 2 6.515 × 10 1 / 1.430 × 10 2 6.128 × 10 1 / 3.036 × 10 2 8.034 × 10 1 / 1.295 × 10 2 4.908 × 10 1 / 2.071 × 10 2 5.886 × 10 1 / 2.801 × 10 2 7.775 × 10 1 / 2.789 × 10 2 4.995 × 10 1 / 1.641 × 10 2 5.632 × 10 1 / 1.975 × 10 2 8.163 × 10 1 / 2.442 × 10 2
40830 6.549 × 10 1 / 6.776 × 10 3 7.711 × 10 1 / 7.521 × 10 3 7.587 × 10 1 / 2.163 × 10 2 8.540 × 10 1 / 1.405 × 10 2 7.324 × 10 1 / 1.626 × 10 2 7.584 × 10 1 / 1.859 × 10 2 8.638 × 10 1 / 1.712 × 10 2 6.068 × 10 1 / 1.397 × 10 2 6.588 × 10 1 / 1.286 × 10 2 8.055 × 10 1 / 1.637 × 10 2
50 6.082 × 10 1 / 1.018 × 10 2 8.140 × 10 1 / 6.494 × 10 3 7.945 × 10 1 / 1.812 × 10 2 8.771 × 10 1 / 1.183 × 10 2 7.705 × 10 1 / 2.021 × 10 2 7.987 × 10 1 / 1.879 × 10 2 8.750 × 10 1 / 1.531 × 10 2 6.682 × 10 1 / 1.317 × 10 2 7.062 × 10 1 / 9.001 × 10 3 8.169 × 10 1 / 2.249 × 10 2
70 6.572 × 10 1 / 7.475 × 10 3 8.558 × 10 1 / 5.269 × 10 3 8.517 × 10 1 / 2.017 × 10 2 8.998 × 10 1 / 8.636 × 10 3 8.094 × 10 1 / 1.775 × 10 2 8.322 × 10 1 / 1.402 × 10 2 8.874 × 10 1 / 1.757 × 10 2 7.155 × 10 1 / 1.138 × 10 2 7.452 × 10 1 / 1.101 × 10 2 8.772 × 10 1 / 1.300 × 10 2
1230 5.907 × 10 1 / 8.579 × 10 3 7.243 × 10 1 / 5.455 × 10 3 6.928 × 10 1 / 2.974 × 10 2 8.051 × 10 1 / 1.613 × 10 2 6.350 × 10 1 / 2.210 × 10 2 6.920 × 10 1 / 2.874 × 10 2 8.188 × 10 1 / 2.338 × 10 2 5.491 × 10 1 / 1.459 × 10 2 5.961 × 10 1 / 1.937 × 10 2 7.802 × 10 1 / 1.869 × 10 2
50 5.464 × 10 1 / 1.071 × 10 2 7.632 × 10 1 / 1.083 × 10 2 7.269 × 10 1 / 3.742 × 10 2 8.616 × 10 1 / 1.230 × 10 2 6.737 × 10 1 / 2.455 × 10 2 6.972 × 10 1 / 2.913 × 10 2 8.501 × 10 1 / 2.112 × 10 2 5.753 × 10 1 / 1.518 × 10 2 5.941 × 10 1 / 1.654 × 10 2 8.135 × 10 1 / 2.164 × 10 2
70 5.796 × 10 1 / 3.508 × 10 3 7.912 × 10 1 / 5.226 × 10 3 7.634 × 10 1 / 2.454 × 10 2 8.767 × 10 1 / 1.225 × 10 2 6.814 × 10 1 / 2.769 × 10 2 7.313 × 10 1 / 1.620 × 10 2 8.521 × 10 1 / 2.389 × 10 2 5.829 × 10 1 / 1.085 × 10 2 6.444 × 10 1 / 1.244 × 10 2 8.589 × 10 1 / 1.831 × 10 2
1630 5.018 × 10 1 / 9.645 × 10 3 6.895 × 10 1 / 9.719 × 10 3 6.709 × 10 1 / 2.979 × 10 2 8.264 × 10 1 / 1.478 × 10 2 5.808 × 10 1 / 3.893 × 10 2 6.529 × 10 1 / 2.370 × 10 2 8.185 × 10 1 / 2.219 × 10 2 5.243 × 10 1 / 1.347 × 10 2 5.772 × 10 1 / 8.974 × 10 3 8.046 × 10 1 / 1.731 × 10 2
50 6.226 × 10 1 / 4.583 × 10 3 7.461 × 10 1 / 5.823 × 10 3 7.317 × 10 1 / 2.471 × 10 2 8.695 × 10 1 / 1.195 × 10 2 6.303 × 10 1 / 2.238 × 10 2 7.009 × 10 1 / 1.846 × 10 2 8.568 × 10 1 / 1.549 × 10 2 6.522 × 10 1 / 1.267 × 10 2 6.601 × 10 1 / 6.936 × 10 3 8.777 × 10 1 / 9.884 × 10 3
70 4.869 × 10 1 / 1.310 × 10 2 7.273 × 10 1 / 6.786 × 10 3 7.062 × 10 1 / 2.400 × 10 2 8.495 × 10 1 / 1.294 × 10 2 5.799 × 10 1 / 2.923 × 10 2 6.641 × 10 1 / 2.204 × 10 2 8.189 × 10 1 / 1.948 × 10 2 5.362 × 10 1 / 1.454 × 10 2 5.810 × 10 1 / 1.630 × 10 2 8.276 × 10 1 / 2.197 × 10 2
Table 9. Analysis of the averages and standard deviations of makespan across 20 runs for classic dispatching rules vs. DLIDQN.
n | m | e | Ours | FIFO | EDD | SPT | LPT | MRT
20830 1.635 × 10 3 / 7.285 × 10 1 3.808 × 10 3 / 1.586 × 10 2 3.432 × 10 3 / 1.683 × 10 2 3.023 × 10 3 / 2.343 × 10 2 2.590 × 10 3 / 1.685 × 10 2 1.844 × 10 3 / 7.232 × 10 1
50 1.867 × 10 3 / 4.618 × 10 1 3.503 × 10 3 / 2.867 × 10 2 3.781 × 10 3 / 2.280 × 10 2 3.039 × 10 3 / 1.589 × 10 2 2.776 × 10 3 / 1.404 × 10 2 1.877 × 10 3 / 6.817 × 10 1
70 1.937 × 10 3 / 5.959 × 10 1 4.015 × 10 3 / 2.511 × 10 2 4.229 × 10 3 / 1.651 × 10 2 3.183 × 10 3 / 1.393 × 10 2 3.301 × 10 3 / 1.578 × 10 2 2.043 × 10 3 / 6.586 × 10 1
1230 1.087 × 10 3 / 6.275 × 10 1 2.616 × 10 3 / 1.811 × 10 2 2.638 × 10 3 / 1.328 × 10 2 2.371 × 10 3 / 1.630 × 10 2 1.997 × 10 3 / 9.397 × 10 1 1.173 × 10 3 / 6.041 × 10 1
50 1.220 × 10 3 / 4.861 × 10 1 2.920 × 10 3 / 1.441 × 10 2 3.208 × 10 3 / 1.562 × 10 2 2.623 × 10 3 / 1.271 × 10 2 2.400 × 10 3 / 1.614 × 10 2 1.355 × 10 3 / 6.211 × 10 1
70 9.584 × 10 2 / 3.765 × 10 1 2.240 × 10 3 / 7.706 × 10 1 1.982 × 10 3 / 1.558 × 10 2 1.617 × 10 3 / 9.245 × 10 1 1.563 × 10 3 / 6.518 × 10 1 1.103 × 10 3 / 5.697 × 10 1
20 | 16 | 30 | 7.884×10^2 / 5.579×10^1 | 1.888×10^3 / 1.519×10^2 | 1.799×10^3 / 1.051×10^2 | 1.615×10^3 / 1.384×10^2 | 1.421×10^3 / 7.434×10^1 | 8.478×10^2 / 4.950×10^1
20 | 16 | 50 | 7.910×10^2 / 2.519×10^1 | 1.827×10^3 / 6.531×10^1 | 1.793×10^3 / 8.798×10^1 | 1.639×10^3 / 1.109×10^2 | 1.468×10^3 / 7.821×10^1 | 9.243×10^2 / 3.926×10^1
20 | 16 | 70 | 7.568×10^2 / 4.819×10^1 | 1.702×10^3 / 8.798×10^1 | 1.736×10^3 / 8.543×10^1 | 1.609×10^3 / 9.762×10^1 | 1.484×10^3 / 1.021×10^2 | 9.083×10^2 / 3.925×10^1
30 | 8 | 30 | 1.841×10^3 / 4.900×10^1 | 3.625×10^3 / 1.499×10^2 | 3.587×10^3 / 1.073×10^2 | 2.983×10^3 / 1.195×10^2 | 2.995×10^3 / 1.340×10^2 | 1.878×10^3 / 5.872×10^1
30 | 8 | 50 | 1.841×10^3 / 6.275×10^1 | 4.059×10^3 / 1.659×10^2 | 4.354×10^3 / 1.886×10^2 | 3.487×10^3 / 2.077×10^2 | 3.183×10^3 / 1.261×10^2 | 1.961×10^3 / 9.982×10^1
30 | 8 | 70 | 2.547×10^3 / 4.407×10^1 | 5.664×10^3 / 1.865×10^2 | 5.733×10^3 / 1.833×10^2 | 4.469×10^3 / 1.619×10^2 | 4.804×10^3 / 1.626×10^2 | 2.610×10^3 / 6.262×10^1
30 | 12 | 30 | 1.785×10^3 / 6.856×10^1 | 4.109×10^3 / 2.138×10^2 | 4.350×10^3 / 1.920×10^2 | 4.032×10^3 / 2.308×10^2 | 3.756×10^3 / 2.272×10^2 | 1.902×10^3 / 4.610×10^1
30 | 12 | 50 | 1.491×10^3 / 5.342×10^1 | 3.782×10^3 / 1.717×10^2 | 3.653×10^3 / 1.777×10^2 | 2.970×10^3 / 1.827×10^2 | 3.081×10^3 / 1.579×10^2 | 1.641×10^3 / 5.171×10^1
30 | 12 | 70 | 1.366×10^3 / 4.562×10^1 | 2.955×10^3 / 1.452×10^2 | 2.941×10^3 / 1.397×10^2 | 2.375×10^3 / 1.445×10^2 | 2.209×10^3 / 1.390×10^2 | 1.484×10^3 / 6.512×10^1
30 | 16 | 30 | 8.613×10^2 / 5.363×10^1 | 2.213×10^3 / 9.350×10^1 | 2.491×10^3 / 1.188×10^2 | 1.926×10^3 / 7.350×10^1 | 1.888×10^3 / 1.060×10^2 | 9.702×10^2 / 4.338×10^1
30 | 16 | 50 | 1.226×10^3 / 4.746×10^1 | 2.932×10^3 / 1.363×10^2 | 3.035×10^3 / 1.794×10^2 | 2.566×10^3 / 1.275×10^2 | 2.253×10^3 / 1.470×10^2 | 1.328×10^3 / 7.583×10^1
30 | 16 | 70 | 1.027×10^3 / 4.635×10^1 | 2.639×10^3 / 9.533×10^1 | 2.475×10^3 / 1.503×10^2 | 2.253×10^3 / 1.292×10^2 | 2.173×10^3 / 1.194×10^2 | 1.134×10^3 / 3.871×10^1
40 | 8 | 30 | 2.068×10^3 / 6.852×10^1 | 4.521×10^3 / 2.399×10^2 | 4.716×10^3 / 1.659×10^2 | 3.325×10^3 / 1.293×10^2 | 3.552×10^3 / 1.246×10^2 | 2.195×10^3 / 7.037×10^1
40 | 8 | 50 | 3.585×10^3 / 4.697×10^1 | 8.473×10^3 / 2.913×10^2 | 7.601×10^3 / 2.736×10^2 | 5.469×10^3 / 1.052×10^2 | 7.083×10^3 / 2.537×10^2 | 3.588×10^3 / 8.020×10^1
40 | 8 | 70 | 3.578×10^3 / 5.479×10^1 | 6.688×10^3 / 1.646×10^2 | 6.967×10^3 / 3.112×10^2 | 5.103×10^3 / 1.575×10^2 | 5.731×10^3 / 3.152×10^2 | 3.552×10^3 / 8.742×10^1
40 | 12 | 30 | 1.757×10^3 / 5.697×10^1 | 4.648×10^3 / 2.678×10^2 | 4.663×10^3 / 2.821×10^2 | 3.200×10^3 / 1.990×10^2 | 3.157×10^3 / 1.523×10^2 | 1.846×10^3 / 4.708×10^1
40 | 12 | 50 | 1.797×10^3 / 6.543×10^1 | 4.416×10^3 / 2.563×10^2 | 4.196×10^3 / 1.621×10^2 | 3.375×10^3 / 1.981×10^2 | 3.525×10^3 / 2.258×10^2 | 1.930×10^3 / 6.738×10^1
40 | 12 | 70 | 1.893×10^3 / 4.507×10^1 | 4.633×10^3 / 2.002×10^2 | 4.523×10^3 / 2.069×10^2 | 3.488×10^3 / 1.515×10^2 | 3.305×10^3 / 1.456×10^2 | 1.978×10^3 / 5.715×10^1
40 | 16 | 30 | 1.410×10^3 / 4.413×10^1 | 3.567×10^3 / 1.703×10^2 | 3.621×10^3 / 1.192×10^2 | 2.687×10^3 / 1.494×10^2 | 2.677×10^3 / 1.324×10^2 | 1.491×10^3 / 6.010×10^1
40 | 16 | 50 | 1.477×10^3 / 3.263×10^1 | 2.831×10^3 / 9.829×10^1 | 2.998×10^3 / 9.805×10^1 | 2.458×10^3 / 1.281×10^2 | 2.488×10^3 / 8.177×10^1 | 1.570×10^3 / 3.971×10^1
40 | 16 | 70 | 1.357×10^3 / 6.114×10^1 | 3.200×10^3 / 2.024×10^2 | 3.267×10^3 / 1.511×10^2 | 2.465×10^3 / 1.021×10^2 | 2.513×10^3 / 1.200×10^2 | 1.511×10^3 / 6.568×10^1
Table 10. Analysis of the averages and standard deviations of U_ave across 20 runs for classic dispatching rules vs. DLIDQN.
n | m | e | Ours | FIFO | EDD | SPT | LPT | MRT (each cell: average / standard deviation over 20 runs)
20 | 8 | 30 | 9.819×10^-1 / 7.412×10^-3 | 3.667×10^-1 / 1.372×10^-2 | 3.932×10^-1 / 1.277×10^-2 | 4.747×10^-1 / 2.344×10^-2 | 5.361×10^-1 / 2.003×10^-2 | 8.372×10^-1 / 1.774×10^-2
20 | 8 | 50 | 9.884×10^-1 / 5.444×10^-3 | 4.225×10^-1 / 2.985×10^-2 | 4.074×10^-1 / 1.898×10^-2 | 5.313×10^-1 / 1.919×10^-2 | 6.092×10^-1 / 1.586×10^-2 | 8.800×10^-1 / 1.091×10^-2
20 | 8 | 70 | 9.910×10^-1 / 4.426×10^-3 | 4.018×10^-1 / 1.988×10^-2 | 3.609×10^-1 / 9.805×10^-3 | 5.197×10^-1 / 2.275×10^-2 | 4.604×10^-1 / 1.325×10^-2 | 8.933×10^-1 / 1.272×10^-2
20 | 12 | 30 | 9.244×10^-1 / 7.699×10^-3 | 3.799×10^-1 / 2.046×10^-2 | 3.499×10^-1 / 1.366×10^-2 | 4.081×10^-1 / 2.113×10^-2 | 4.745×10^-1 / 2.597×10^-2 | 8.120×10^-1 / 1.247×10^-2
20 | 12 | 50 | 9.204×10^-1 / 1.044×10^-2 | 3.797×10^-1 / 1.946×10^-2 | 3.705×10^-1 / 1.374×10^-2 | 4.917×10^-1 / 2.388×10^-2 | 4.562×10^-1 / 1.739×10^-2 | 7.800×10^-1 / 1.787×10^-2
20 | 12 | 70 | 9.278×10^-1 / 1.432×10^-2 | 3.824×10^-1 / 1.289×10^-2 | 4.174×10^-1 / 2.680×10^-2 | 4.909×10^-1 / 2.053×10^-2 | 5.414×10^-1 / 1.991×10^-2 | 7.794×10^-1 / 1.452×10^-2
20 | 16 | 30 | 8.353×10^-1 / 1.646×10^-2 | 3.604×10^-1 / 2.208×10^-2 | 3.669×10^-1 / 1.664×10^-2 | 3.896×10^-1 / 2.598×10^-2 | 4.567×10^-1 / 1.633×10^-2 | 7.375×10^-1 / 1.336×10^-2
20 | 16 | 50 | 8.436×10^-1 / 1.386×10^-2 | 3.811×10^-1 / 1.319×10^-2 | 4.075×10^-1 / 1.741×10^-2 | 4.896×10^-1 / 2.840×10^-2 | 5.437×10^-1 / 1.819×10^-2 | 7.497×10^-1 / 1.271×10^-2
20 | 16 | 70 | 8.654×10^-1 / 1.417×10^-2 | 4.033×10^-1 / 1.052×10^-2 | 3.805×10^-1 / 1.103×10^-2 | 4.677×10^-1 / 1.728×10^-2 | 4.591×10^-1 / 1.437×10^-2 | 7.043×10^-1 / 2.031×10^-2
30 | 8 | 30 | 9.945×10^-1 / 4.542×10^-3 | 4.184×10^-1 / 1.349×10^-2 | 4.044×10^-1 / 1.234×10^-2 | 5.112×10^-1 / 1.459×10^-2 | 5.763×10^-1 / 1.589×10^-2 | 9.019×10^-1 / 1.457×10^-2
30 | 8 | 50 | 9.942×10^-1 / 4.098×10^-3 | 3.863×10^-1 / 1.011×10^-2 | 3.668×10^-1 / 1.242×10^-2 | 4.507×10^-1 / 1.783×10^-2 | 5.021×10^-1 / 1.668×10^-2 | 8.905×10^-1 / 1.632×10^-2
30 | 8 | 70 | 9.985×10^-1 / 1.511×10^-3 | 3.346×10^-1 / 8.075×10^-3 | 3.352×10^-1 / 7.793×10^-3 | 4.253×10^-1 / 9.587×10^-3 | 4.128×10^-1 / 1.227×10^-2 | 9.267×10^-1 / 7.287×10^-3
30 | 12 | 30 | 9.744×10^-1 / 8.648×10^-3 | 3.686×10^-1 / 1.305×10^-2 | 3.433×10^-1 / 1.266×10^-2 | 4.193×10^-1 / 1.832×10^-2 | 4.190×10^-1 / 1.831×10^-2 | 8.435×10^-1 / 6.755×10^-3
30 | 12 | 50 | 9.853×10^-1 / 5.352×10^-3 | 3.577×10^-1 / 1.684×10^-2 | 3.547×10^-1 / 1.428×10^-2 | 4.739×10^-1 / 1.850×10^-2 | 4.311×10^-1 / 2.077×10^-2 | 8.536×10^-1 / 1.252×10^-2
30 | 12 | 70 | 9.689×10^-1 / 7.467×10^-3 | 4.008×10^-1 / 1.274×10^-2 | 3.910×10^-1 / 1.808×10^-2 | 5.039×10^-1 / 1.979×10^-2 | 5.364×10^-1 / 1.916×10^-2 | 8.379×10^-1 / 1.527×10^-2
30 | 16 | 30 | 9.234×10^-1 / 1.387×10^-2 | 3.535×10^-1 / 1.053×10^-2 | 2.994×10^-1 / 1.379×10^-2 | 4.116×10^-1 / 1.323×10^-2 | 4.171×10^-1 / 1.737×10^-2 | 7.898×10^-1 / 1.461×10^-2
30 | 16 | 50 | 9.202×10^-1 / 1.081×10^-2 | 3.441×10^-1 / 1.619×10^-2 | 3.411×10^-1 / 1.137×10^-2 | 4.623×10^-1 / 1.690×10^-2 | 4.693×10^-1 / 1.559×10^-2 | 7.725×10^-1 / 1.865×10^-2
30 | 16 | 70 | 9.194×10^-1 / 1.180×10^-2 | 3.890×10^-1 / 1.124×10^-2 | 3.546×10^-1 / 1.906×10^-2 | 4.243×10^-1 / 1.519×10^-2 | 4.491×10^-1 / 2.340×10^-2 | 8.049×10^-1 / 1.116×10^-2
40 | 8 | 30 | 9.942×10^-1 / 5.093×10^-3 | 3.783×10^-1 / 1.741×10^-2 | 3.834×10^-1 / 1.348×10^-2 | 5.328×10^-1 / 1.160×10^-2 | 5.129×10^-1 / 1.228×10^-2 | 9.211×10^-1 / 1.105×10^-2
40 | 8 | 50 | 9.996×10^-1 / 5.870×10^-4 | 3.209×10^-1 / 7.069×10^-3 | 3.616×10^-1 / 1.039×10^-2 | 5.301×10^-1 / 9.666×10^-3 | 3.973×10^-1 / 1.206×10^-2 | 9.398×10^-1 / 4.763×10^-3
40 | 8 | 70 | 9.998×10^-1 / 4.882×10^-4 | 3.866×10^-1 / 7.138×10^-3 | 3.756×10^-1 / 1.354×10^-2 | 5.461×10^-1 / 1.414×10^-2 | 4.915×10^-1 / 1.504×10^-2 | 9.328×10^-1 / 9.328×10^-3
40 | 12 | 30 | 9.928×10^-1 / 3.390×10^-3 | 3.142×10^-1 / 1.127×10^-2 | 3.023×10^-1 / 1.253×10^-2 | 4.543×10^-1 / 1.875×10^-2 | 4.542×10^-1 / 1.716×10^-2 | 8.752×10^-1 / 1.112×10^-2
40 | 12 | 50 | 9.880×10^-1 / 5.327×10^-3 | 3.665×10^-1 / 1.492×10^-2 | 3.590×10^-1 / 1.209×10^-2 | 4.519×10^-1 / 2.199×10^-2 | 4.780×10^-1 / 1.799×10^-2 | 8.699×10^-1 / 1.328×10^-2
40 | 12 | 70 | 9.911×10^-1 / 3.814×10^-3 | 3.425×10^-1 / 1.191×10^-2 | 3.447×10^-1 / 1.468×10^-2 | 4.507×10^-1 / 1.780×10^-2 | 4.790×10^-1 / 1.640×10^-2 | 8.842×10^-1 / 1.098×10^-2
40 | 16 | 30 | 9.781×10^-1 / 6.968×10^-3 | 3.528×10^-1 / 1.503×10^-2 | 3.313×10^-1 / 7.206×10^-3 | 4.649×10^-1 / 1.482×10^-2 | 4.536×10^-1 / 1.654×10^-2 | 8.578×10^-1 / 1.574×10^-2
40 | 16 | 50 | 9.917×10^-1 / 2.110×10^-3 | 4.355×10^-1 / 1.325×10^-2 | 4.208×10^-1 / 1.292×10^-2 | 5.324×10^-1 / 1.991×10^-2 | 5.246×10^-1 / 1.120×10^-2 | 8.820×10^-1 / 6.211×10^-3
40 | 16 | 70 | 9.734×10^-1 / 4.744×10^-3 | 3.848×10^-1 / 1.938×10^-2 | 3.549×10^-1 / 1.093×10^-2 | 4.949×10^-1 / 1.732×10^-2 | 4.580×10^-1 / 1.992×10^-2 | 8.281×10^-1 / 1.517×10^-2
Table 11. Analysis of the averages and standard deviations of TR_ave across 20 runs for classic dispatching rules vs. DLIDQN.
n | m | e | Ours | FIFO | EDD | SPT | LPT | MRT (each cell: average / standard deviation over 20 runs)
20 | 8 | 30 | 6.657×10^-1 / 1.125×10^-2 | 7.383×10^-1 / 1.928×10^-2 | 6.505×10^-1 / 2.400×10^-2 | 7.000×10^-1 / 2.139×10^-2 | 7.442×10^-1 / 1.251×10^-2 | 7.871×10^-1 / 8.480×10^-3
20 | 8 | 50 | 5.858×10^-1 / 4.284×10^-2 | 7.271×10^-1 / 1.623×10^-2 | 6.169×10^-1 / 1.952×10^-2 | 7.855×10^-1 / 1.444×10^-2 | 7.605×10^-1 / 1.455×10^-2 | 8.078×10^-1 / 7.737×10^-3
20 | 8 | 70 | 5.745×10^-1 / 8.057×10^-3 | 7.768×10^-1 / 1.403×10^-2 | 6.952×10^-1 / 1.888×10^-2 | 7.475×10^-1 / 1.843×10^-2 | 6.730×10^-1 / 1.800×10^-2 | 8.036×10^-1 / 5.971×10^-3
20 | 12 | 30 | 5.259×10^-1 / 1.194×10^-2 | 7.322×10^-1 / 1.308×10^-2 | 5.539×10^-1 / 9.554×10^-3 | 5.981×10^-1 / 1.249×10^-2 | 6.266×10^-1 / 1.210×10^-2 | 6.926×10^-1 / 1.468×10^-2
20 | 12 | 50 | 5.152×10^-1 / 1.306×10^-2 | 6.651×10^-1 / 1.583×10^-2 | 5.063×10^-1 / 1.379×10^-2 | 5.431×10^-1 / 1.787×10^-2 | 5.005×10^-1 / 2.423×10^-2 | 6.875×10^-1 / 1.433×10^-2
20 | 12 | 70 | 5.041×10^-1 / 1.302×10^-2 | 7.337×10^-1 / 8.501×10^-3 | 5.386×10^-1 / 2.020×10^-2 | 6.473×10^-1 / 2.360×10^-2 | 5.732×10^-1 / 1.480×10^-2 | 6.744×10^-1 / 1.008×10^-2
20 | 16 | 30 | 3.798×10^-1 / 1.751×10^-2 | 5.850×10^-1 / 3.235×10^-2 | 4.617×10^-1 / 1.963×10^-2 | 4.737×10^-1 / 3.067×10^-2 | 5.019×10^-1 / 2.351×10^-2 | 5.337×10^-1 / 1.823×10^-2
20 | 16 | 50 | 3.117×10^-1 / 1.502×10^-2 | 5.132×10^-1 / 1.626×10^-2 | 3.693×10^-1 / 1.873×10^-2 | 4.621×10^-1 / 1.855×10^-2 | 5.109×10^-1 / 1.779×10^-2 | 5.372×10^-1 / 1.648×10^-2
20 | 16 | 70 | 2.826×10^-1 / 2.583×10^-2 | 5.035×10^-1 / 1.458×10^-2 | 3.160×10^-1 / 2.550×10^-2 | 4.051×10^-1 / 2.101×10^-2 | 3.898×10^-1 / 1.543×10^-2 | 4.808×10^-1 / 2.423×10^-2
30 | 8 | 30 | 5.900×10^-1 / 6.196×10^-3 | 5.494×10^-1 / 1.016×10^-2 | 5.722×10^-1 / 1.563×10^-2 | 7.238×10^-1 / 1.158×10^-2 | 7.240×10^-1 / 1.346×10^-2 | 8.114×10^-1 / 5.384×10^-3
30 | 8 | 50 | 6.151×10^-1 / 7.524×10^-3 | 7.827×10^-1 / 8.962×10^-3 | 5.814×10^-1 / 2.205×10^-2 | 6.426×10^-1 / 2.322×10^-2 | 7.011×10^-1 / 1.406×10^-2 | 8.222×10^-1 / 8.668×10^-3
30 | 8 | 70 | 7.271×10^-1 / 3.618×10^-3 | 8.343×10^-1 / 6.147×10^-3 | 7.445×10^-1 / 9.601×10^-3 | 8.055×10^-1 / 8.791×10^-3 | 8.374×10^-1 / 7.061×10^-3 | 8.446×10^-1 / 3.724×10^-3
30 | 12 | 30 | 6.067×10^-1 / 1.041×10^-2 | 7.969×10^-1 / 8.965×10^-3 | 5.756×10^-1 / 2.526×10^-2 | 7.246×10^-1 / 1.155×10^-2 | 7.618×10^-1 / 9.478×10^-3 | 8.246×10^-1 / 3.205×10^-3
30 | 12 | 50 | 5.031×10^-1 / 1.188×10^-2 | 7.584×10^-1 / 1.830×10^-2 | 5.732×10^-1 / 1.475×10^-2 | 6.355×10^-1 / 1.894×10^-2 | 6.663×10^-1 / 1.268×10^-2 | 7.560×10^-1 / 8.467×10^-3
30 | 12 | 70 | 4.854×10^-1 / 9.758×10^-3 | 7.090×10^-1 / 1.846×10^-2 | 5.102×10^-1 / 2.223×10^-2 | 6.546×10^-1 / 1.319×10^-2 | 6.212×10^-1 / 1.267×10^-2 | 7.514×10^-1 / 1.039×10^-2
30 | 16 | 30 | 3.849×10^-1 / 1.738×10^-2 | 6.171×10^-1 / 1.479×10^-2 | 3.958×10^-1 / 1.420×10^-2 | 5.046×10^-1 / 2.517×10^-2 | 5.075×10^-1 / 2.122×10^-2 | 6.028×10^-1 / 1.644×10^-2
30 | 16 | 50 | 5.003×10^-1 / 1.339×10^-2 | 6.893×10^-1 / 1.354×10^-2 | 5.511×10^-1 / 1.653×10^-2 | 5.887×10^-1 / 1.757×10^-2 | 6.251×10^-1 / 1.991×10^-2 | 6.825×10^-1 / 1.926×10^-2
30 | 16 | 70 | 4.882×10^-1 / 1.172×10^-2 | 4.504×10^-1 / 1.608×10^-2 | 4.593×10^-1 / 2.556×10^-2 | 5.773×10^-1 / 2.425×10^-2 | 6.330×10^-1 / 1.228×10^-2 | 6.730×10^-1 / 1.014×10^-2
40 | 8 | 30 | 6.549×10^-1 / 6.776×10^-3 | 7.838×10^-1 / 1.384×10^-2 | 6.078×10^-1 / 2.138×10^-2 | 7.419×10^-1 / 1.499×10^-2 | 7.929×10^-1 / 7.281×10^-3 | 8.493×10^-1 / 4.188×10^-3
40 | 8 | 50 | 6.082×10^-1 / 1.018×10^-2 | 8.310×10^-1 / 9.457×10^-3 | 6.225×10^-1 / 1.657×10^-2 | 7.624×10^-1 / 7.822×10^-3 | 7.880×10^-1 / 5.875×10^-3 | 8.689×10^-1 / 2.150×10^-3
40 | 8 | 70 | 6.572×10^-1 / 7.475×10^-3 | 8.561×10^-1 / 7.765×10^-3 | 6.904×10^-1 / 1.192×10^-2 | 8.479×10^-1 / 8.289×10^-3 | 8.327×10^-1 / 7.667×10^-3 | 8.913×10^-1 / 2.529×10^-3
40 | 12 | 30 | 5.907×10^-1 / 8.579×10^-3 | 7.605×10^-1 / 1.137×10^-2 | 6.441×10^-1 / 1.454×10^-2 | 6.911×10^-1 / 1.527×10^-2 | 7.254×10^-1 / 1.331×10^-2 | 8.083×10^-1 / 3.797×10^-3
40 | 12 | 50 | 5.464×10^-1 / 1.071×10^-2 | 7.303×10^-1 / 1.628×10^-2 | 5.606×10^-1 / 2.521×10^-2 | 7.541×10^-1 / 1.266×10^-2 | 7.200×10^-1 / 1.988×10^-2 | 7.975×10^-1 / 6.598×10^-3
40 | 12 | 70 | 5.796×10^-1 / 3.508×10^-3 | 7.846×10^-1 / 1.147×10^-2 | 5.573×10^-1 / 2.382×10^-2 | 5.363×10^-1 / 8.257×10^-3 | 7.318×10^-1 / 1.081×10^-2 | 8.055×10^-1 / 5.346×10^-3
40 | 16 | 30 | 5.018×10^-1 / 9.645×10^-3 | 7.147×10^-1 / 1.287×10^-2 | 5.164×10^-1 / 1.410×10^-2 | 6.386×10^-1 / 1.373×10^-2 | 6.178×10^-1 / 1.306×10^-2 | 7.101×10^-1 / 1.165×10^-2
40 | 16 | 50 | 6.226×10^-1 / 4.583×10^-3 | 7.045×10^-1 / 1.149×10^-2 | 5.864×10^-1 / 1.181×10^-2 | 6.289×10^-1 / 1.453×10^-2 | 6.574×10^-1 / 9.230×10^-3 | 7.528×10^-1 / 4.476×10^-3
40 | 16 | 70 | 4.869×10^-1 / 1.310×10^-2 | 7.245×10^-1 / 1.797×10^-2 | 4.990×10^-1 / 2.615×10^-2 | 6.492×10^-1 / 1.149×10^-2 | 6.864×10^-1 / 1.459×10^-2 | 7.545×10^-1 / 8.700×10^-3
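The baselines in Tables 9–11 are classic dispatching rules that rank the queued jobs by a single priority key. As an illustration only (not the authors' implementation), the selection step can be sketched as follows, assuming the standard textbook definitions of FIFO, EDD, SPT, and LPT; MRT is omitted here, and the field names arrival_time, due_date, and next_op_time are hypothetical.

# Illustrative sketch (not the authors' code) of classic dispatching rule priority keys.
def fifo(queue):
    # First In, First Out: the job that arrived earliest goes first.
    return min(queue, key=lambda job: job["arrival_time"])

def edd(queue):
    # Earliest Due Date: the job with the smallest due date goes first.
    return min(queue, key=lambda job: job["due_date"])

def spt(queue):
    # Shortest Processing Time: smallest processing time of the next operation first.
    return min(queue, key=lambda job: job["next_op_time"])

def lpt(queue):
    # Longest Processing Time: largest processing time of the next operation first.
    return max(queue, key=lambda job: job["next_op_time"])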
Table 12. Analysis of the averages and standard deviations of makespan across 20 runs for other RL-based algorithms vs. DLIDQN.
n | m | e | Ours | DQN | D3QN | HDMDDQN (each cell: average / standard deviation over 20 runs)
20 | 8 | 30 | 1.635×10^3 / 7.285×10^1 | 2.386×10^3 / 1.656×10^2 | 2.912×10^3 / 2.174×10^2 | 2.093×10^3 / 1.262×10^2
20 | 8 | 50 | 1.867×10^3 / 4.618×10^1 | 2.677×10^3 / 1.970×10^2 | 3.137×10^3 / 3.069×10^2 | 2.248×10^3 / 1.448×10^2
20 | 8 | 70 | 1.937×10^3 / 5.959×10^1 | 3.003×10^3 / 2.023×10^2 | 3.430×10^3 / 2.934×10^2 | 2.422×10^3 / 1.964×10^2
20 | 12 | 30 | 1.087×10^3 / 6.275×10^1 | 1.867×10^3 / 1.270×10^2 | 2.304×10^3 / 2.598×10^2 | 1.526×10^3 / 1.470×10^2
20 | 12 | 50 | 1.220×10^3 / 4.861×10^1 | 2.219×10^3 / 2.468×10^2 | 2.598×10^3 / 2.130×10^2 | 1.721×10^3 / 1.758×10^2
20 | 12 | 70 | 9.584×10^2 / 3.765×10^1 | 1.924×10^3 / 1.660×10^2 | 2.043×10^3 / 1.930×10^2 | 1.451×10^3 / 1.253×10^2
20 | 16 | 30 | 7.884×10^2 / 5.579×10^1 | 1.547×10^3 / 1.635×10^2 | 1.814×10^3 / 1.770×10^2 | 1.069×10^3 / 9.941×10^1
20 | 16 | 50 | 7.910×10^2 / 2.519×10^1 | 1.858×10^3 / 1.646×10^2 | 1.913×10^3 / 2.086×10^2 | 1.195×10^3 / 1.102×10^2
20 | 16 | 70 | 7.568×10^2 / 4.819×10^1 | 1.857×10^3 / 1.354×10^2 | 1.788×10^3 / 1.554×10^2 | 1.204×10^3 / 1.783×10^2
30 | 8 | 30 | 1.841×10^3 / 4.900×10^1 | 2.548×10^3 / 1.202×10^2 | 3.083×10^3 / 2.833×10^2 | 2.151×10^3 / 1.307×10^2
30 | 8 | 50 | 1.841×10^3 / 6.275×10^1 | 2.835×10^3 / 1.983×10^2 | 3.439×10^3 / 3.073×10^2 | 2.206×10^3 / 1.792×10^2
30 | 8 | 70 | 2.547×10^3 / 4.407×10^1 | 3.385×10^3 / 2.226×10^2 | 3.987×10^3 / 3.447×10^2 | 3.017×10^3 / 2.216×10^2
30 | 12 | 30 | 1.785×10^3 / 6.856×10^1 | 3.005×10^3 / 3.705×10^2 | 3.378×10^3 / 2.679×10^2 | 2.331×10^3 / 2.154×10^2
30 | 12 | 50 | 1.491×10^3 / 5.342×10^1 | 2.746×10^3 / 2.367×10^2 | 2.941×10^3 / 2.804×10^2 | 1.880×10^3 / 1.998×10^2
30 | 12 | 70 | 1.366×10^3 / 4.562×10^1 | 2.391×10^3 / 2.133×10^2 | 2.692×10^3 / 3.133×10^2 | 1.741×10^3 / 1.170×10^2
30 | 16 | 30 | 8.613×10^2 / 5.363×10^1 | 1.769×10^3 / 1.492×10^2 | 2.248×10^3 / 2.555×10^2 | 1.225×10^3 / 1.641×10^2
30 | 16 | 50 | 1.226×10^3 / 4.746×10^1 | 2.451×10^3 / 2.362×10^2 | 2.727×10^3 / 2.052×10^2 | 1.651×10^3 / 2.411×10^2
30 | 16 | 70 | 1.027×10^3 / 4.635×10^1 | 2.416×10^3 / 2.551×10^2 | 2.431×10^3 / 3.394×10^2 | 1.497×10^3 / 2.043×10^2
40 | 8 | 30 | 2.068×10^3 / 6.852×10^1 | 3.060×10^3 / 2.890×10^2 | 4.111×10^3 / 4.236×10^2 | 2.419×10^3 / 2.065×10^2
40 | 8 | 50 | 3.585×10^3 / 4.697×10^1 | 4.887×10^3 / 3.161×10^2 | 6.079×10^3 / 6.604×10^2 | 3.997×10^3 / 3.871×10^2
40 | 8 | 70 | 3.578×10^3 / 5.479×10^1 | 4.518×10^3 / 1.816×10^2 | 5.134×10^3 / 3.955×10^2 | 4.008×10^3 / 2.101×10^2
40 | 12 | 30 | 1.757×10^3 / 5.697×10^1 | 2.762×10^3 / 1.791×10^2 | 3.438×10^3 / 2.914×10^2 | 2.119×10^3 / 2.098×10^2
40 | 12 | 50 | 1.797×10^3 / 6.543×10^1 | 2.871×10^3 / 2.330×10^2 | 3.683×10^3 / 3.346×10^2 | 2.158×10^3 / 2.101×10^2
40 | 12 | 70 | 1.893×10^3 / 4.507×10^1 | 2.934×10^3 / 1.419×10^2 | 3.571×10^3 / 3.300×10^2 | 2.262×10^3 / 1.898×10^2
40 | 16 | 30 | 1.410×10^3 / 4.413×10^1 | 2.854×10^3 / 2.830×10^2 | 3.159×10^3 / 2.557×10^2 | 1.798×10^3 / 2.171×10^2
40 | 16 | 50 | 1.477×10^3 / 3.263×10^1 | 2.764×10^3 / 2.313×10^2 | 3.550×10^3 / 3.488×10^2 | 1.860×10^3 / 2.026×10^2
40 | 16 | 70 | 1.357×10^3 / 6.114×10^1 | 2.669×10^3 / 2.094×10^2 | 2.978×10^3 / 2.564×10^2 | 1.824×10^3 / 2.414×10^2
Table 13. Analysis of the averages and standard deviations of U_ave across 20 runs for other RL-based algorithms vs. DLIDQN.
n | m | e | Ours | DQN | D3QN | HDMDDQN (each cell: average / standard deviation over 20 runs)
20 | 8 | 30 | 9.819×10^-1 / 7.412×10^-3 | 7.424×10^-1 / 4.370×10^-2 | 6.461×10^-1 / 4.627×10^-2 | 7.937×10^-1 / 3.099×10^-2
20 | 8 | 50 | 9.884×10^-1 / 5.444×10^-3 | 7.305×10^-1 / 4.664×10^-2 | 6.542×10^-1 / 5.372×10^-2 | 7.968×10^-1 / 4.610×10^-2
20 | 8 | 70 | 9.910×10^-1 / 4.426×10^-3 | 6.962×10^-1 / 3.636×10^-2 | 6.493×10^-1 / 4.626×10^-2 | 7.980×10^-1 / 4.454×10^-2
20 | 12 | 30 | 9.244×10^-1 / 7.699×10^-3 | 6.709×10^-1 / 3.878×10^-2 | 5.591×10^-1 / 4.300×10^-2 | 7.173×10^-1 / 4.426×10^-2
20 | 12 | 50 | 9.204×10^-1 / 1.044×10^-2 | 6.006×10^-1 / 5.249×10^-2 | 5.266×10^-1 / 3.982×10^-2 | 6.925×10^-1 / 4.896×10^-2
20 | 12 | 70 | 9.278×10^-1 / 1.432×10^-2 | 5.847×10^-1 / 4.656×10^-2 | 5.619×10^-1 / 3.518×10^-2 | 7.009×10^-1 / 3.631×10^-2
20 | 16 | 30 | 8.353×10^-1 / 1.646×10^-2 | 5.889×10^-1 / 4.527×10^-2 | 4.952×10^-1 / 3.488×10^-2 | 6.659×10^-1 / 2.753×10^-2
20 | 16 | 50 | 8.436×10^-1 / 1.386×10^-2 | 5.142×10^-1 / 3.624×10^-2 | 5.093×10^-1 / 4.067×10^-2 | 6.640×10^-1 / 3.389×10^-2
20 | 16 | 70 | 8.654×10^-1 / 1.417×10^-2 | 4.868×10^-1 / 3.722×10^-2 | 5.154×10^-1 / 4.283×10^-2 | 6.460×10^-1 / 4.527×10^-2
30 | 8 | 30 | 9.945×10^-1 / 4.542×10^-3 | 7.676×10^-1 / 2.806×10^-2 | 6.845×10^-1 / 4.141×10^-2 | 8.336×10^-1 / 3.115×10^-2
30 | 8 | 50 | 9.942×10^-1 / 4.098×10^-3 | 7.174×10^-1 / 4.945×10^-2 | 6.108×10^-1 / 4.225×10^-2 | 8.070×10^-1 / 5.105×10^-2
30 | 8 | 70 | 9.985×10^-1 / 1.511×10^-3 | 7.832×10^-1 / 3.734×10^-2 | 7.055×10^-1 / 5.408×10^-2 | 8.276×10^-1 / 4.155×10^-2
30 | 12 | 30 | 9.744×10^-1 / 8.648×10^-3 | 6.593×10^-1 / 6.212×10^-2 | 5.972×10^-1 / 4.095×10^-2 | 7.584×10^-1 / 4.801×10^-2
30 | 12 | 50 | 9.853×10^-1 / 5.352×10^-3 | 6.246×10^-1 / 3.722×10^-2 | 6.035×10^-1 / 4.394×10^-2 | 7.853×10^-1 / 4.864×10^-2
30 | 12 | 70 | 9.689×10^-1 / 7.467×10^-3 | 6.544×10^-1 / 4.112×10^-2 | 5.951×10^-1 / 5.339×10^-2 | 7.668×10^-1 / 3.496×10^-2
30 | 16 | 30 | 9.234×10^-1 / 1.387×10^-2 | 6.106×10^-1 / 4.021×10^-2 | 5.038×10^-1 / 3.474×10^-2 | 7.014×10^-1 / 4.341×10^-2
30 | 16 | 50 | 9.202×10^-1 / 1.081×10^-2 | 5.664×10^-1 / 4.694×10^-2 | 5.196×10^-1 / 3.620×10^-2 | 7.099×10^-1 / 6.776×10^-2
30 | 16 | 70 | 9.194×10^-1 / 1.180×10^-2 | 5.235×10^-1 / 4.360×10^-2 | 5.228×10^-1 / 5.052×10^-2 | 7.013×10^-1 / 4.678×10^-2
40 | 8 | 30 | 9.942×10^-1 / 5.093×10^-3 | 7.429×10^-1 / 4.121×10^-2 | 6.117×10^-1 / 4.876×10^-2 | 8.243×10^-1 / 3.908×10^-2
40 | 8 | 50 | 9.996×10^-1 / 5.870×10^-4 | 7.731×10^-1 / 4.184×10^-2 | 6.752×10^-1 / 5.806×10^-2 | 8.638×10^-1 / 5.852×10^-2
40 | 8 | 70 | 9.998×10^-1 / 4.882×10^-4 | 8.117×10^-1 / 2.664×10^-2 | 7.429×10^-1 / 4.296×10^-2 | 8.517×10^-1 / 3.509×10^-2
40 | 12 | 30 | 9.928×10^-1 / 3.390×10^-3 | 7.197×10^-1 / 3.915×10^-2 | 6.157×10^-1 / 4.371×10^-2 | 8.024×10^-1 / 4.583×10^-2
40 | 12 | 50 | 9.880×10^-1 / 5.327×10^-3 | 7.052×10^-1 / 4.750×10^-2 | 5.832×10^-1 / 4.088×10^-2 | 7.995×10^-1 / 4.937×10^-2
40 | 12 | 70 | 9.911×10^-1 / 3.814×10^-3 | 7.121×10^-1 / 2.417×10^-2 | 6.207×10^-1 / 3.849×10^-2 | 8.072×10^-1 / 4.488×10^-2
40 | 16 | 30 | 9.781×10^-1 / 6.968×10^-3 | 5.944×10^-1 / 4.672×10^-2 | 5.521×10^-1 / 4.119×10^-2 | 7.739×10^-1 / 5.489×10^-2
40 | 16 | 50 | 9.917×10^-1 / 2.110×10^-3 | 6.278×10^-1 / 5.165×10^-2 | 5.393×10^-1 / 4.234×10^-2 | 7.956×10^-1 / 4.841×10^-2
40 | 16 | 70 | 9.734×10^-1 / 4.744×10^-3 | 6.096×10^-1 / 4.650×10^-2 | 5.647×10^-1 / 3.668×10^-2 | 7.621×10^-1 / 5.192×10^-2
Table 14. Analysis of the averages and standard deviations of TR_ave across 20 runs for other RL-based algorithms vs. DLIDQN.
n | m | e | Ours | DQN | D3QN | HDMDDQN (each cell: average / standard deviation over 20 runs)
20 | 8 | 30 | 6.657×10^-1 / 1.125×10^-2 | 7.293×10^-1 / 2.358×10^-2 | 8.239×10^-1 / 2.777×10^-2 | 7.086×10^-1 / 2.652×10^-2
20 | 8 | 50 | 5.858×10^-1 / 4.284×10^-2 | 7.441×10^-1 / 1.567×10^-2 | 8.098×10^-1 / 3.450×10^-2 | 7.040×10^-1 / 2.595×10^-2
20 | 8 | 70 | 5.745×10^-1 / 8.057×10^-3 | 7.575×10^-1 / 1.474×10^-2 | 8.212×10^-1 / 3.412×10^-2 | 7.341×10^-1 / 1.825×10^-2
20 | 12 | 30 | 5.259×10^-1 / 1.194×10^-2 | 6.583×10^-1 / 2.002×10^-2 | 7.243×10^-1 / 5.078×10^-2 | 5.805×10^-1 / 3.818×10^-2
20 | 12 | 50 | 5.152×10^-1 / 1.306×10^-2 | 6.506×10^-1 / 1.821×10^-2 | 7.440×10^-1 / 2.920×10^-2 | 5.445×10^-1 / 3.458×10^-2
20 | 12 | 70 | 5.041×10^-1 / 1.302×10^-2 | 7.029×10^-1 / 3.741×10^-2 | 7.170×10^-1 / 3.485×10^-2 | 5.738×10^-1 / 2.395×10^-2
20 | 16 | 30 | 3.798×10^-1 / 1.751×10^-2 | 5.707×10^-1 / 3.973×10^-2 | 6.859×10^-1 / 5.074×10^-2 | 4.028×10^-1 / 4.550×10^-2
20 | 16 | 50 | 3.117×10^-1 / 1.502×10^-2 | 6.460×10^-1 / 3.907×10^-2 | 6.622×10^-1 / 4.081×10^-2 | 4.444×10^-1 / 4.880×10^-2
20 | 16 | 70 | 2.826×10^-1 / 2.583×10^-2 | 6.417×10^-1 / 3.933×10^-2 | 6.463×10^-1 / 4.617×10^-2 | 3.828×10^-1 / 5.359×10^-2
30 | 8 | 30 | 5.900×10^-1 / 6.196×10^-3 | 7.405×10^-1 / 1.665×10^-2 | 8.105×10^-1 / 3.604×10^-2 | 7.039×10^-1 / 2.302×10^-2
30 | 8 | 50 | 6.151×10^-1 / 7.524×10^-3 | 7.114×10^-1 / 2.346×10^-2 | 7.691×10^-1 / 3.471×10^-2 | 6.612×10^-1 / 2.547×10^-2
30 | 8 | 70 | 7.271×10^-1 / 3.618×10^-3 | 7.756×10^-1 / 1.138×10^-2 | 8.503×10^-1 / 3.213×10^-2 | 6.954×10^-1 / 4.078×10^-2
30 | 12 | 30 | 6.067×10^-1 / 1.041×10^-2 | 7.242×10^-1 / 1.797×10^-2 | 8.091×10^-1 / 3.695×10^-2 | 6.804×10^-1 / 2.036×10^-2
30 | 12 | 50 | 5.031×10^-1 / 1.188×10^-2 | 7.263×10^-1 / 2.682×10^-2 | 7.756×10^-1 / 5.029×10^-2 | 6.253×10^-1 / 2.306×10^-2
30 | 12 | 70 | 4.854×10^-1 / 9.758×10^-3 | 7.002×10^-1 / 2.594×10^-2 | 7.589×10^-1 / 4.567×10^-2 | 6.057×10^-1 / 2.173×10^-2
30 | 16 | 30 | 3.849×10^-1 / 1.738×10^-2 | 6.253×10^-1 / 4.275×10^-2 | 7.025×10^-1 / 4.501×10^-2 | 4.435×10^-1 / 4.732×10^-2
30 | 16 | 50 | 5.003×10^-1 / 1.339×10^-2 | 6.871×10^-1 / 3.377×10^-2 | 7.539×10^-1 / 4.799×10^-2 | 5.307×10^-1 / 4.648×10^-2
30 | 16 | 70 | 4.882×10^-1 / 1.172×10^-2 | 7.246×10^-1 / 5.090×10^-2 | 7.234×10^-1 / 5.064×10^-2 | 5.372×10^-1 / 3.008×10^-2
40 | 8 | 30 | 6.549×10^-1 / 6.776×10^-3 | 7.456×10^-1 / 2.167×10^-2 | 8.141×10^-1 / 3.439×10^-2 | 7.105×10^-1 / 2.174×10^-2
40 | 8 | 50 | 6.082×10^-1 / 1.018×10^-2 | 7.882×10^-1 / 7.882×10^-2 | 8.416×10^-1 / 8.416×10^-2 | 7.495×10^-1 / 7.495×10^-2
40 | 8 | 70 | 6.572×10^-1 / 7.475×10^-3 | 8.138×10^-1 / 2.135×10^-2 | 8.766×10^-1 / 2.664×10^-2 | 7.994×10^-1 / 1.607×10^-2
40 | 12 | 30 | 5.907×10^-1 / 8.579×10^-3 | 7.117×10^-1 / 4.097×10^-2 | 7.590×10^-1 / 5.410×10^-2 | 5.404×10^-1 / 2.853×10^-2
40 | 12 | 50 | 5.464×10^-1 / 1.071×10^-2 | 7.219×10^-1 / 2.530×10^-2 | 8.121×10^-1 / 4.172×10^-2 | 6.486×10^-1 / 2.902×10^-2
40 | 12 | 70 | 5.796×10^-1 / 3.508×10^-3 | 7.472×10^-1 / 1.971×10^-2 | 8.307×10^-1 / 4.075×10^-2 | 6.855×10^-1 / 2.770×10^-2
40 | 16 | 30 | 5.018×10^-1 / 9.645×10^-3 | 7.341×10^-1 / 3.631×10^-2 | 7.766×10^-1 / 4.318×10^-2 | 5.820×10^-1 / 3.263×10^-2
40 | 16 | 50 | 6.226×10^-1 / 4.583×10^-3 | 7.594×10^-1 / 1.817×10^-2 | 8.385×10^-1 / 3.791×10^-2 | 6.554×10^-1 / 3.236×10^-2
40 | 16 | 70 | 4.869×10^-1 / 1.310×10^-2 | 7.241×10^-1 / 3.470×10^-2 | 7.975×10^-1 / 4.295×10^-2 | 5.989×10^-1 / 3.449×10^-2
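Each cell in Tables 9–14 reports the average and standard deviation of one objective (makespan, U_ave, or TR_ave) over 20 independent runs of a given (n, m, e) configuration. A minimal sketch of this aggregation step is given below; it assumes the per-run metric values are already available, and the function and variable names (summarize, runs) are illustrative rather than taken from the authors' implementation.

import numpy as np

def summarize(runs):
    # runs: one dict per run, e.g. {"makespan": 1612.0, "U_ave": 0.975, "TR_ave": 0.52}.
    # Returns, for each objective, the (mean, standard deviation) over all runs.
    summary = {}
    for metric in ("makespan", "U_ave", "TR_ave"):
        values = np.array([run[metric] for run in runs], dtype=float)
        summary[metric] = (values.mean(), values.std())
    return summary

# Example with three dummy runs (the paper aggregates 20 runs per configuration):
runs = [
    {"makespan": 1612.0, "U_ave": 0.975, "TR_ave": 0.52},
    {"makespan": 1655.0, "U_ave": 0.981, "TR_ave": 0.50},
    {"makespan": 1640.0, "U_ave": 0.978, "TR_ave": 0.51},
]
print(summarize(runs))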