Article

Low-Carbon and Energy-Efficient Dynamic Flexible Job Shop Scheduling Method Towards Renewable Energy Driven Manufacturing

1
School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang 550001, China
2
Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang 550001, China
*
Author to whom correspondence should be addressed.
Machines 2026, 14(1), 88; https://doi.org/10.3390/machines14010088
Submission received: 22 November 2025 / Revised: 5 January 2026 / Accepted: 9 January 2026 / Published: 10 January 2026
(This article belongs to the Special Issue Artificial Intelligence in Mechanical Engineering Applications)

Abstract

As one of the major sources of global carbon emissions, the manufacturing industry urgently requires green transformation. The utilization of renewable energy in production workshops offers a promising route toward zero-carbon manufacturing. However, renewable energy fluctuations and dynamic workshop events make efficient scheduling increasingly challenging. This paper introduces a low-carbon and energy-efficient dynamic flexible job shop scheduling problem oriented towards renewable energy integration and develops a multi-agent deep reinforcement learning framework for dynamic and intelligent production scheduling. Based on the Proximal Policy Optimization (PPO) algorithm, a routing agent and a sequencing agent are designed for machine assignment and job sequencing, respectively. Customized state representations and reward functions are also designed to enhance learning performance and scheduling efficiency. Simulation results demonstrate that the proposed method achieves superior performance in multi-objective optimization, effectively balancing production efficiency, energy consumption, and carbon emission reduction across various job shop scheduling scenarios.

1. Introduction

Globally, energy security and environmental pollution have become increasingly prominent issues, posing significant constraints on sustainable economic and social development. In recent years, global greenhouse gas emissions have remained at a high level and continued to increase. Statistics show that global fossil fuel CO2 emissions reached approximately 36.8 GtCO2 in 2023, representing an increase of about 1.4% compared with 2022, and are projected to further rise to around 37.4 GtCO2 in 2024, setting a new historical record [1]. Existing studies indicate that global carbon emissions have not yet shown clear signs of peaking, underscoring the growing urgency of low-carbon transitions and emission-reduction efforts [2]. As one of the major contributors to global energy consumption and carbon emissions, the manufacturing sector is highly dependent on energy supply; under energy structures dominated by fossil fuels, its carbon emissions remain particularly significant. In this context, reducing carbon emissions while ensuring production efficiency and system stability has become a critical challenge for the transformation and upgrading of the manufacturing sector [3,4,5].
To achieve sustainable development, the concept of green manufacturing has gradually become a mainstream direction in industrial development. In this context, the integration of renewable energy presents a significant opportunity to reduce carbon emissions in the manufacturing sector. Clean energy sources such as wind, solar, hydro, and biomass are not only renewable and low-emission but also effectively reduce dependence on fossil fuels [6,7]. With the maturity of photovoltaic and wind power technologies, an increasing number of manufacturing enterprises have begun to adopt clean energy for production. However, renewable energy is inherently intermittent and fluctuating. When utilizing such energy sources, manufacturers must ensure both supply stability and continuity of the production process. This imposes higher demands on production planning and scheduling strategies, particularly in terms of multi-objective trade-offs and adaptation to dynamic environments.
The flexible job shop scheduling problem (FJSP) provides an important theoretical foundation for optimization and scheduling research in manufacturing systems. Compared with the traditional job shop scheduling problem (JSP), FJSP offers greater flexibility in resource allocation and task execution routing, enabling better adaptation to the diverse demands of complex manufacturing environments [8]. This characteristic has led to the widespread application of FJSP in industries such as semiconductor manufacturing, aerospace, and precision machinery. In recent years, researchers have begun to focus on green FJSP, aiming to incorporate energy consumption and carbon emission metrics in addition to traditional objectives such as makespan and tardiness [9]. Such studies have played a positive role in promoting green manufacturing; however, most are based on static scenarios and give limited consideration to the uncertainty of energy supply.
In actual manufacturing systems, the environment is often dynamic. Random job arrivals, temporary changes in delivery deadlines, equipment failures and maintenance, as well as the insertion of high-priority jobs, can all disrupt the original scheduling plan [10]. In the context of integrating renewable energy, the complexity of the problem increases further. On one hand, production loads need to be matched in real time with the fluctuating supply of clean energy; on the other hand, the uncertainty of energy supply often coincides with production disturbances, making it difficult for traditional scheduling methods to generate high-quality schedules within a limited time [11]. Furthermore, existing green FJSP studies have largely focused on energy consumption optimization and lack systematic modeling of the fluctuation characteristics of renewable energy, resulting in limited robustness of scheduling solutions under dynamic environments [12]. In practice, this manifests as reduced production efficiency, insufficient energy utilization, and difficulty in achieving carbon reduction targets.
Therefore, an intelligent scheduling approach capable of simultaneously perceiving both production and energy states is required to achieve coordinated optimization of multiple objectives such as production efficiency, energy consumption, and carbon emissions. In recent years, deep reinforcement learning (DRL), with its adaptability in uncertain environments and its ability for policy learning, has received growing attention in complex scheduling problems. In particular, multi-agent reinforcement learning methods, which model different scheduling components through distributed decision-making, have shown promising potential in dynamic FJSP (DFJSP) involving stochastic job arrivals and machine status variations.
However, as renewable energy is increasingly integrated into manufacturing power systems, the time-varying and fluctuating nature of energy supply begins to directly affect scheduling feasibility and optimization outcomes. Although existing studies have made progress in handling production-side disturbances, uncertainties on the energy side are still largely overlooked. Most existing DRL-based FJSP approaches assume a stable energy supply, lacking systematic modeling of renewable energy variability and a unified decision-making framework that can jointly address production disturbances and energy fluctuations. This research gap limits the full potential of DRL in dynamic green scheduling scenarios where energy availability plays a significant role.
Motivated by these challenges, this study focuses on DRL-based FJSP under renewable energy fluctuations and formulates a low-carbon and energy-efficient renewable-energy-integrated dynamic FJSP (LEDFJSP-RE). The main contributions of this work are as follows:
  • A dynamic flexible job shop scheduling model with renewable energy integration (LEDFJSP-RE) is formulated. The model establishes a real-time coupling mechanism between production workloads and renewable energy supply, enabling energy-aware and low-carbon decision-making under dynamic job arrivals.
  • A multi-agent PPO framework is developed to solve the LEDFJSP-RE. The framework supports variable-length state inputs and incorporates a state representation compatible with dynamic events. The state space, action space, and reward mechanism are specifically designed to capture both production dynamics and renewable energy fluctuations.
  • Numerical experiments with different scales of job insertions are conducted to assess the adaptability and performance of the proposed approach. In addition to comparisons with conventional dispatching rules, further experiments against representative DRL-based and evolutionary scheduling methods, as well as an energy-oriented scheduling rule, are carried out to provide a more comprehensive evaluation. The results indicate that the proposed approach achieves consistently superior performance in terms of scheduling efficiency, energy consumption, carbon emissions, and overall cost across diverse problem settings.
The structure of this paper is organized as follows: Section 2 reviews the research progress on green FJSP and DRL-based FJSP; Section 3 introduces the mathematical modeling of the LEDFJSP-RE; Section 4 presents the multi-agent reinforcement learning solution framework; Section 5 describes the experimental design and results analysis; and Section 6 concludes the study and discusses future research directions.

2. Literature Review

This section reviews recent studies on green FJSP and DRL-based FJSP, and then summarizes the main research gaps in existing work.

2.1. Green FJSP

As the pressure from energy consumption and carbon emissions increases, the manufacturing industry urgently needs to transition towards green, low-carbon, and intelligent systems for sustainable development [13,14]. Therefore, green scheduling has gradually become an important research direction in FJSP.
Early research on green scheduling mainly focused on production efficiency, employing metaheuristic methods such as genetic algorithms and simulated annealing to minimize makespan and balance machine loads [15]. With the rise of green manufacturing, scheduling objectives have expanded to include energy consumption, carbon emissions, and other green factors [4,16]. These methods break the static load assumptions of traditional scheduling models, promoting the coupling of production scheduling with energy management systems.
In recent years, researchers have gradually proposed low-carbon scheduling and multi-objective optimization models aimed at reducing carbon emissions and optimizing production scheduling. For example, Ref. [13] introduced makespan, delay time, bottleneck load rate, and carbon emissions into a low-carbon multi-objective scheduling model and proposed an improved multi-objective evolutionary algorithm (IMOEA/D-HS). Ref. [17] proposed a model based on an improved gray wolf optimization algorithm (SC-GWO) to optimize the weighted sum of carbon emissions and makespan. However, these studies, while effectively reducing carbon emissions, do not account for fluctuations in renewable energy supply or dynamic events on the shop floor.
The introduction of renewable energy offers a new perspective for green FJSP. Research shows that incorporating renewable energy into flexible manufacturing systems not only reduces carbon emissions but also enhances energy efficiency and reduces energy costs [18]. For instance, Ref. [19] proposed an “energy self-sufficient manufacturing system” that integrates renewable distributed energy to replace traditional grid power, reducing grid dependence and electricity costs. Additionally, Ref. [20] introduced distributed energy and energy storage systems into re-entrant hybrid flow shop scheduling, demonstrating energy-saving, carbon reduction, and cost control advantages. These studies show that renewable energy holds significant potential in green manufacturing, but managing its instability in dynamic scheduling while meeting production needs remains a challenge.

2.2. DRL-Related FJSP

With the development of DRL technologies, DFJSPs have gradually become a research hotspot. DRL allows self-learning in dynamic environments, optimizing task allocation, resource scheduling, and responses to unexpected events, thereby enhancing the adaptability and robustness of production systems.
In DFJSP research, DRL methods typically model job scheduling and resource allocation as a Markov Decision Process (MDP) to achieve decision optimization. Common DRL algorithms include deep Q-networks (DQN), Double DQN (DDQN) [21,22], and proximal policy optimization (PPO), which can effectively handle complex state spaces and high-dimensional action spaces, providing optimal decisions for production scheduling [23,24]. For instance, PPO, known for its stability and support for continuous action spaces, has become a common optimization method in DFJSP, while DQN and DDQN adjust action selection strategies through reinforcement learning, effectively reducing job delays and improving resource utilization. Ref. [25] proposed a deep multi-agent reinforcement learning scheduling framework based on DDQN for high-variance dynamic job shops. The framework treats each machine as an independent agent and uses a centralized-training, decentralized-execution approach with parameter sharing to mitigate non-stationarity in the environment. By combining knowledge-driven reward shaping and scalable state representation, this approach achieves the dual goals of reducing job delays and improving resource utilization. To address high-dimensional, multi-objective scheduling problems, recent research has also introduced multi-agent DRL into DFJSP. By assigning job scheduling and machine allocation tasks to multiple agents for parallel learning, multi-agent reinforcement learning (MARL) enhances the flexibility and efficiency of scheduling systems, reducing scheduling conflicts and improving overall scheduling performance [26]. For example, Ref. [9] proposed an improved multi-agent PPO (MMAPPO) algorithm to solve the "partial re-entry, dynamic disturbance, multi-objective" hybrid flow shop scheduling problem, effectively reducing job delays and energy consumption while improving resource utilization.
In summary, while DRL has made significant progress in handling dynamic factors (such as processing time variations and equipment failures) in traditional DFJSP, existing studies have not considered the impact of clean energy, particularly the fluctuations of renewable energy, on scheduling. Our research innovatively combines multi-agent reinforcement learning with the dynamic characteristics of renewable energy to address the scheduling challenges posed by energy fluctuations, thus further enhancing the adaptability and energy efficiency of scheduling systems. Table 1 offers a detailed comparison between this study and prior DRL-oriented scheduling research.

2.3. Research Gaps

Based on the review of existing studies, two research gaps can be identified:
  • Most existing studies on green FJSP primarily adopt metaheuristic approaches such as genetic algorithms and typically assume that energy supply conditions are stable or exogenously given. Although distributed renewable energy is introduced in [20], the corresponding scheduling decisions remain largely production-oriented, with limited consideration of the temporal variability of renewable energy output. In particular, the energy supply state has not been modeled as an endogenous dynamic factor influencing scheduling decisions. To address this limitation, this study explicitly incorporates renewable energy generation states into the scheduling decision process and develops an energy-aware hierarchical and distributed multi-agent scheduling framework, enabling dynamic coordination between production scheduling and fluctuating renewable energy supply.
  • As summarized in Table 1, existing works mainly focus on local dynamic events such as new job arrivals or changes in processing times, and carbon emissions are usually not included as optimization objectives. Meanwhile, under green scheduling settings, production dynamics and energy dynamics often occur simultaneously, whereas the volatility of renewable energy supply is still widely overlooked in the literature. In response to these gaps, this study considers a broader range of dynamic job events together with fluctuating renewable energy supply, incorporates carbon emissions into the optimization objectives, and establishes an energy-aware scheduling model for fully dynamic environments.

3. Problem Formulation

As shown in Figure 1, the LEDFJSP-RE model consists of photovoltaic (PV) units, wind turbines (WT), an energy storage system (ESS), an energy management system (EMS), and a job shop. The job shop system, serving as the load unit of the LEDFJSP-RE model, is composed of $m$ heterogeneous machines that execute the processing tasks of arriving jobs. Each job $J_i$ consists of multiple operations, and each operation $O_{i,j}$ can be processed on any of the $m$ machines, with varying processing times and energy consumption across different machines.
During the production process, the PV and WT units can directly supply energy to the job shop. Let $P_{PV}(t)$ and $P_{WT}(t)$ denote the power outputs of the PV and WT units at time $t$, and let $L(t)$ represent the load demand of the job shop at time $t$. When $P_{PV}(t) + P_{WT}(t) > L(t)$, the EMS stores the surplus energy in the ESS. Conversely, when $P_{PV}(t) + P_{WT}(t) < L(t)$, the ESS is first used to cover the shortage; if the remaining energy in the ESS is insufficient, additional power is purchased from the main grid to meet the load demand. This energy management logic defines the real-time energy balance constraints under which scheduling decisions are executed in the job shop.
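This dispatch logic can be sketched as a single-time-step rule. The sketch below is a minimal Python illustration: the function and parameter names are hypothetical, charging/discharging efficiencies are omitted, and a unit time step is assumed.

```python
def dispatch(p_pv, p_wt, load, e_ess, e_cap, p_cha_max, p_dis_max):
    """One EMS time step: returns (grid power purchased, new ESS energy level).

    Surplus renewable power charges the ESS, bounded by the charging limit
    and the free capacity; a shortage is covered first by discharging the
    ESS, then by purchasing from the main grid.
    """
    renewable = p_pv + p_wt
    if renewable >= load:
        # Surplus case: store what the ESS can absorb this step.
        surplus = renewable - load
        charge = min(surplus, p_cha_max, e_cap - e_ess)
        return 0.0, e_ess + charge
    # Shortage case: ESS first, then the grid covers the remainder.
    shortage = load - renewable
    discharge = min(shortage, p_dis_max, e_ess)
    grid = shortage - discharge
    return grid, e_ess - discharge
```

For example, with 80 units of renewable output against a 60-unit load, the 20-unit surplus is charged into the ESS up to its limits; with only 20 units of output, the ESS discharges first and the grid supplies the rest.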
Under this energy structure, different scheduling decisions can significantly affect both the production performance and the energy-related outcomes of the job shop. Therefore, the scheduling system must optimize traditional objectives while simultaneously pursuing low-carbon operation under fluctuating renewable energy supply. To this end, an energy-aware scheduling strategy driven by two cooperative intelligent agents is designed. As shown in Figure 2, the routing agent (RA) determines the assignment of operations to machines, while the sequencing agent (SA) selects the next job to be processed from the waiting queue when a machine becomes idle. At each decision epoch triggered by dynamic events such as job arrivals or machine idle states, the agents observe both production states and energy-related information and generate corresponding scheduling actions. Through this hierarchical decision structure, job routing and sequencing decisions are jointly optimized, enabling the system to dynamically adapt to variations in job arrivals and renewable energy supply while balancing scheduling efficiency and carbon emission control. In this way, the LEDFJSP-RE is formulated as a sequential decision-making problem under coupled production and energy dynamics. To simplify the modeling complexity and focus on the scheduling and energy-aware decision-making mechanisms, a set of commonly adopted assumptions is introduced in this study, following related research on flexible job shop scheduling and energy-aware production scheduling. It should be noted that these assumptions mainly correspond to operational constraints that are frequently encountered in practical manufacturing systems but are not central to the decision logic investigated in this work. Neglecting these constraints allows clearer analysis and comparison of scheduling strategies without affecting the core scheduling mechanisms:
  • Each machine can process at most one operation at any given time.
  • The operations of each job must be executed sequentially according to a predefined technological order.
  • Once an operation starts processing, it cannot be interrupted.
  • Job transportation times between machines are not considered.
  • Finite buffer constraints between consecutive operations are not considered, and unlimited intermediate buffers are assumed.
  • Sequence-dependent setup times, such as tool changes or product changeovers, are not considered.
  • Random machine failures and maintenance downtime are not taken into account.
  • Transitions between idle and processing states of machines are assumed not to incur additional time delays.
These assumptions are widely adopted in the flexible job shop scheduling literature to maintain model tractability while preserving the essential characteristics of scheduling decision-making. The incorporation of the above omitted constraints into the proposed framework will be considered as an important direction for future research.
The parameters and key variables of the LEDFJSP-RE model are presented in Table 2.

Mathematical Model

Objective functions:
$$C_{max} = \max_{i \in J} C_i \tag{1}$$
$$Tard_{avg} = \frac{1}{n} \sum_{i=1}^{n} \max(C_i - D_i, 0) \tag{2}$$
$$C_{sum} = C \sum_{t=0}^{C_{max}} P_{Grid}(t)\,\Delta t \tag{3}$$
$$E_{sum} = \sum_{t=0}^{C_{max}} L(t)\,\Delta t \tag{4}$$
Equations (1)–(4) represent the optimization objectives of this study: makespan, average job tardiness, total carbon emissions, and total energy consumption. In Equation (3), $C$ denotes the unit carbon emission coefficient, $P_{Grid}(t)$ denotes the power purchased from the main grid at time $t$, and $\Delta t = 1$ min. In Equation (4), $L(t) = \sum_{k=1}^{m} P_k(t)$ is the total load of the job shop, and $\Delta t = 1$ min. This study aims to minimize these objectives.
Cost:
$$Cost = C_{tard} + C_{ele} \tag{5}$$
$$C_{tard} = p_{tard} \sum_{i=1}^{n} \max(C_i - D_i, 0) \tag{6}$$
$$C_{ele} = \sum_{t=0}^{C_{max}} p_{Grid} P_{Grid}(t)\,\Delta t \tag{7}$$
Equations (5)–(7) define the total cost model used to evaluate the economic performance of a scheduling solution. In Equation (5), $Cost$ represents the total cost incurred during the scheduling process, which consists of the tardiness cost $C_{tard}$ and the electricity cost $C_{ele}$. Equation (6) formulates the tardiness cost $C_{tard}$: here, $C_i$ and $D_i$ denote the completion time and due date of job $J_i$, respectively; the term $\max(C_i - D_i, 0)$ represents the tardiness of job $J_i$, and the total tardiness is weighted by the unit tardiness penalty coefficient $p_{tard}$. Equation (7) describes the electricity cost $C_{ele}$, where $p_{Grid}$ denotes the unit electricity price and $P_{Grid}(t)$ represents the power purchased from the main grid at time $t$. The electricity cost is obtained by accumulating the grid power consumption over the scheduling horizon with a time interval of $\Delta t$.
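Under a constant electricity price, the cost evaluation of Equations (5)–(7) reduces to a straightforward accumulation. The sketch below uses illustrative names and assumes $\Delta t$ is expressed in the same units as the per-period grid power series.

```python
def total_cost(completion, due, p_grid_series, p_tard, p_ele, dt=1.0):
    """Total cost of a schedule (Eqs. (5)-(7)).

    completion, due  : per-job completion times C_i and due dates D_i
    p_grid_series    : grid power purchased in each period, P_Grid(t)
    p_tard, p_ele    : unit tardiness penalty and unit electricity price
    """
    # Eq. (6): tardiness cost, summing max(C_i - D_i, 0) over all jobs.
    c_tard = p_tard * sum(max(c - d, 0) for c, d in zip(completion, due))
    # Eq. (7): electricity cost, accumulating grid energy over the horizon.
    c_ele = p_ele * sum(p_grid_series) * dt
    # Eq. (5): total cost.
    return c_tard + c_ele
```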
System constraints:
$$\sum_{k \in M} X_{i,j,k}(t) \leq 1, \quad \forall i, j, t \tag{8}$$
$$S_{i,j} \geq C_{i',j'}, \quad \forall (i,j) \neq (i',j'), \ \text{if } X_{i,j,k}(t) = X_{i',j',k}(t) = 1 \tag{9}$$
$$S_{i,j+1} \geq C_{i,j}, \quad \forall i, \ j = 1, \ldots, n_i - 1 \tag{10}$$
$$C_{i,j} = S_{i,j} + \sum_{k=1}^{m} X_{i,j,k}(t)\, T_{i,j,k} \tag{11}$$
$$P_k(t) \geq P_{I,k} \tag{12}$$
$$L(t) = P_{PV}(t) + P_{WT}(t) + P_{dis}(t) - P_{cha}(t) + P_{Grid}(t), \quad \forall t \tag{13}$$
$$E_{ESS}(t+1) = E_{ESS}(t) + \eta_{cha} P_{cha}(t) - \frac{1}{\eta_{dis}} P_{dis}(t), \quad \forall t \tag{14}$$
$$0 \leq P_{cha}(t) \leq P_{cha,max}, \quad \forall t \tag{15}$$
$$0 \leq P_{dis}(t) \leq P_{dis,max}, \quad \forall t \tag{16}$$
$$u_t^{dis} + u_t^{cha} \leq 1, \quad u_t^{dis}, u_t^{cha} \in \{0, 1\} \tag{17}$$
Equations (8)–(17) represent the constraints of this study. Equation (8) indicates that each operation $O_{i,j}$ can be assigned to at most one machine; if $\sum_{k \in M} X_{i,j,k}(t) = 0$, operation $O_{i,j}$ is unassigned at time $t$. Equation (9) enforces the machine capacity constraint: two different operations assigned to the same machine cannot be processed simultaneously, and their processing intervals must not overlap. Equation (10) represents the technological precedence constraint, which ensures that the operations of each job are executed sequentially in the predefined order. Equation (11) defines the completion time of each operation as the sum of its start time and the processing time on the selected machine, thereby reflecting the non-preemptive processing assumption. Equation (12) ensures that the power consumption of each machine $P_k(t)$ does not fall below its idle power $P_{I,k}$, reflecting the baseline energy consumption of machines even when they are not actively processing. Equation (13) defines the power balance within the job shop: the total load demand $L(t)$ at time $t$ must be met by the sum of renewable generation from the PV and WT units, the discharging power of the ESS, and supplementary power purchased from the main grid, minus the charging power of the ESS. Equation (14) describes the state update of the ESS: the energy stored in the ESS at time $t+1$, denoted by $E_{ESS}(t+1)$, depends on the previous energy level, the effective charging power, and the effective discharging power. Equations (15) and (16) set the feasible ranges of the charging and discharging power. Equation (17) imposes the operational constraint that the ESS cannot charge and discharge simultaneously; the binary variables $u_t^{dis}$ and $u_t^{cha}$ indicate the discharging and charging states, ensuring mutual exclusivity.
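The ESS dynamics of Equations (14)–(17) can be sketched as a guarded state transition. The function name and the exception-based guard are illustrative; the check rejects any charging/discharging action that violates the box constraints (15)–(16) or the mutual-exclusivity constraint (17).

```python
def ess_update(e_ess, p_cha, p_dis, eta_cha, eta_dis, p_cha_max, p_dis_max):
    """One-step ESS state transition (Eq. (14)) with feasibility checks.

    Charging is scaled by eta_cha; discharging draws 1/eta_dis per unit
    delivered, matching the efficiency terms in Eq. (14).
    """
    # Eqs. (15)-(16): charging and discharging power must stay in range.
    if not (0 <= p_cha <= p_cha_max and 0 <= p_dis <= p_dis_max):
        raise ValueError("charging/discharging power out of bounds")
    # Eq. (17): the ESS cannot charge and discharge at the same time.
    if p_cha > 0 and p_dis > 0:
        raise ValueError("ESS cannot charge and discharge simultaneously")
    # Eq. (14): state-of-energy update with efficiency losses.
    return e_ess + eta_cha * p_cha - p_dis / eta_dis
```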

4. Proposed Approach

4.1. DRL and PPO

In reinforcement learning (RL), the interaction process between an agent and the environment is typically modeled as a Markov decision process (MDP), which is widely used for sequential decision-making problems. An MDP is usually defined as a five-tuple $(S, A, P, R, \gamma)$, where $S$ is the state space, $A$ is the action space, $P$ is the state transition probability function, $R$ is the reward function, and $\gamma \in (0, 1)$ is the discount factor.
At each time step $t$, the agent observes the current state $s_t$, selects an action $a_t$, receives an immediate reward $r_t$, and transitions to the next state $s_{t+1}$. The goal is to learn an optimal policy $\pi_\theta(a_t \mid s_t)$ that maximizes the expected cumulative discounted reward $V(s_t)$, as defined in Equation (18):
$$V(s_t) = \mathbb{E}_\pi \left[ R_t \mid s_t \right] \tag{18}$$
where $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$. To achieve this objective, policy gradient (PG) methods are widely applied, especially in complex environments with high-dimensional or continuous state-action spaces. PG optimizes the policy by computing the gradient of the expected return with respect to the policy parameters and iteratively updating the policy toward optimality. However, traditional PG methods face challenges in selecting appropriate policy update step sizes, which can lead to instability during training.
To address this, trust region policy optimization (TRPO) was proposed. It introduces a Kullback–Leibler (KL) divergence constraint between the new and old policies to limit the magnitude of policy updates, thereby improving learning stability [27]. Building on this, PPO further replaces the KL divergence constraint with a clipping mechanism in the objective function, simplifying the optimization process and improving training efficiency and stability in large-scale tasks [28]. Unlike fixed exploration probability strategies, PPO employs a sampling mechanism based on the policy’s probability distribution, allowing it to automatically adjust the exploration intensity during training. This approach notably enhances policy robustness and convergence efficiency, especially in the later stages of learning.
The optimization objective of PPO is formulated as $L^{CLIP}(\theta)$, as defined in Equation (19):
$$L^{CLIP}(\theta) = \mathbb{E}_t \left[ \min\left( r_t(\theta) A_t,\ \mathrm{clip}\left(r_t(\theta), 1-\varepsilon, 1+\varepsilon\right) A_t \right) \right] \tag{19}$$
where $r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}$ represents the probability ratio of the new and old policies for the current action, $\varepsilon$ is the clipping-range hyperparameter, and $A_t$ is the estimated advantage function, commonly computed via generalized advantage estimation (GAE), as defined in Equation (20):
$$A_t = R_t - V(s_t) \tag{20}$$
In this study, considering the characteristics of the job shop scheduling problem (dynamic state scales, heterogeneous constraints, and long-term dependencies), we adopt a unified PPO-based policy optimization approach to train both the RA and the SA. This method is well suited to handling variable input structures and offers strong policy sampling capabilities and convergence stability. By combining the clipping mechanism with advantage function estimation, PPO effectively controls the magnitude of policy updates, preventing policy oscillation and degradation and thereby enabling more efficient scheduling decisions in complex scheduling systems.
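The clipped surrogate of Equation (19), averaged over a sampled batch, can be sketched in a few lines. This is a minimal pure-Python illustration with hypothetical names; in an actual implementation the objective is maximized by gradient ascent on the policy parameters through an automatic-differentiation framework.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Batch-mean clipped surrogate objective L^CLIP (Eq. (19)).

    ratios     : probability ratios r_t = pi_theta(a|s) / pi_theta_old(a|s)
    advantages : advantage estimates A_t (Eq. (20): return minus baseline)
    eps        : clipping-range hyperparameter epsilon
    """
    def clip(r):
        # Restrict the ratio to [1 - eps, 1 + eps].
        return max(1 - eps, min(r, 1 + eps))
    # Per-sample min of unclipped and clipped terms, then the batch mean.
    terms = [min(r * a, clip(r) * a) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

Note how the `min` makes the objective pessimistic: a large ratio only helps when the clipped term agrees, which is what bounds the size of each policy update.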

4.2. State

Before detailing the state modeling of the RA, we introduce the power purchase ratio, denoted as $\rho$: the ratio of actual purchased power to total energy consumption, which serves as a key indicator for the system to perceive changes in energy availability. For a candidate job–machine assignment, $\rho$ aggregates the expected dependence on grid electricity, reflecting the overall balance between energy demand and renewable supply during the upcoming operation.
The state of RA is composed of four categories of information sets, detailed as follows:
  • The set of estimated utilization rates for all machines in the system.
  • The set of estimated processing times of job $J_i$ on each machine.
  • The set of estimated processing power consumptions of job $J_i$ on each machine.
  • The set of estimated power purchase ratios $\rho$ of job $J_i$ on each machine.
We first define the expected utilization rate of $M_k$ at scheduling time $t$ as $U_k(t)$, as defined in Equation (21):
$$U_k(t) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} T_{i,j,k}\, X_{i,j,k}(t)}{\max_{l \in M} EI_l(t)} \tag{21}$$
where $EI_l(t)$ denotes the estimated idle time of machine $M_l$ at rescheduling time $t$. It is important to note that the machine utilization rate calculated here is forward-looking, taking into account the jobs in the machine's queue that have not yet started processing.
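Equation (21) can be computed per machine from the queued workload and the estimated idle times. The list-based inputs and names below are illustrative, not the paper's data structures.

```python
def expected_utilization(queued_times, machine_idle_ends):
    """Forward-looking utilization per machine (Eq. (21)).

    queued_times[k]     : processing times of operations currently assigned
                          to machine k, including not-yet-started queue jobs
    machine_idle_ends[k]: estimated idle time EI_k(t) of machine k
    """
    # Eq. (21) denominator: the latest estimated idle time over all machines.
    horizon = max(machine_idle_ends)
    # Eq. (21) numerator per machine: its total assigned workload.
    return [sum(times) / horizon for times in queued_times]
```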
We further define the expected energy consumption of the system at rescheduling time $t$ as $E(t)$, as defined in Equation (22):
$$E(t) = \sum_{T=0}^{\max_{l \in M} EI_l(t)} \sum_{k=1}^{m} P_k(T) \tag{22}$$
where $P_k(T)$ denotes the power consumption of machine $M_k$ at time $T$. If $T \leq t$, then $P_k(T)$ represents the actual power consumption, which includes both idle and processing power, depending on the machine's operational state; for instance, if $M_k$ is processing operation $O_{i,j}$ at time $T$, then $P_k(T) = P_{i,j,k}$. If $T > t$, then $P_k(T)$ represents the expected power, which is estimated based on the job queue $Q_k(t)$: jobs are assumed to be processed sequentially according to the queue, and the corresponding power usage is projected accordingly.
Furthermore, to characterize the energy consumption and carbon emissions associated with assigning a job to different machines, we define the expected total power purchase rate $\rho_k(t)$ when $J_i$ is assigned to $M_k$, as defined in Equation (23):
$$\rho_k(t) = \frac{\sum_{T=0}^{EI_{max}} P_{Grid}(T)}{E(t) + T_{i,j,k}\, P_{i,j,k}} \tag{23}$$
where $EI_{max} = \max\left( \max_{l \in M,\, l \neq k} EI_l(t),\ EI_k(t) + T_{i,j,k} \right)$ denotes the estimated maximum machine idle end time across all machines. The numerator represents the expected amount of electricity to be purchased from the grid over the upcoming processing period, i.e., the net power demand, calculated as the machines' power requirement minus the available renewable energy supply; the denominator corresponds to the expected energy consumption required to process the assigned job.
Although short-term renewable variations are not explicitly included in the state, the use of $\rho$ still enables adaptive behavior. When renewable output drops, the effective grid purchase for certain assignments increases, leading to lower rewards under the grid-purchase penalty. Through training, the routing agent (RA) learns to avoid such assignments and favor those with lower $\rho$. Thus, the RA responds to renewable fluctuations implicitly, without relying on high-frequency renewable measurements. This design is consistent with existing multi-agent scheduling studies that use aggregated process-level indicators to maintain tractability while preserving sensitivity to dynamic energy conditions [29,30].
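As a minimal illustration of Equation (23), the ratio can be evaluated once the projected grid-power series and the expected energy are known. The names and pre-aggregated inputs are assumptions; in the actual model, the grid-power series itself results from the energy balance of Equation (13).

```python
def purchase_ratio(grid_power, expected_energy, proc_time, proc_power):
    """Expected power purchase ratio rho_k(t) (Eq. (23)).

    grid_power      : projected per-period grid purchases P_Grid(T)
                      over [0, EI_max]
    expected_energy : expected system energy consumption E(t) (Eq. (22))
    proc_time       : processing time T_{i,j,k} of the candidate operation
    proc_power      : processing power P_{i,j,k} of the candidate operation
    """
    purchased = sum(grid_power)                      # Eq. (23) numerator
    demand = expected_energy + proc_time * proc_power  # Eq. (23) denominator
    return purchased / demand
```

A smaller value indicates that the assignment is expected to be covered mostly by renewable supply, which is exactly the signal the routing agent learns to exploit.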
The state of RA is modeled as a matrix with dimensions  4 × m  where each row corresponds to one of the four categories of information described above, and each column represents a specific machine. The detailed composition of the state set is presented in Table 3.
We assign one SA to each machine in the system. The state information for each SA also consists of four categories:
  • The set of processing times for each job.
  • The set of slack times for each job.
  • The set of expected tardiness rates for each job.
  • The set of actual tardiness rates for each job.
The slack time S_i(t) of J_i at time t is defined as the ratio between its remaining time until the due date and the estimated remaining processing time, as defined in Equation (24):

S_i(t) = ( D_i − t ) / ( Σ_{j=CO_i(t)+1}^{n_i} T̂_{i,j} )

where

T̂_{i,j} = { T_{i,j,k}, if O_{i,j} is assigned to M_k;  (1/m) Σ_{k=1}^{m} T_{i,j,k}, otherwise }

is the estimated processing time of operation O_{i,j}.
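As a minimal sketch of Equation (24), the slack time can be computed from per-operation processing-time lists. The data layout below (one list per operation with an entry per machine, and `None` marking unrouted operations) is an assumed convention for illustration.

```python
def slack_time(due_i, t, proc_times, assigned, next_op, n_ops):
    """S_i(t) (Eq. 24): time left until the due date divided by the
    estimated remaining processing time.

    proc_times[j][k]: T_{i,j,k}; assigned[j]: chosen machine or None;
    next_op: index of the first unfinished operation.
    """
    remaining = 0.0
    for j in range(next_op, n_ops):
        k = assigned[j]
        if k is not None:                      # operation already routed
            remaining += proc_times[j][k]
        else:                                  # average over all machines
            remaining += sum(proc_times[j]) / len(proc_times[j])
    return (due_i - t) / remaining
```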
To calculate the tardiness rate, we first introduce the tardiness indicator function f(x) = 1 if x > 0, and f(x) = 0 otherwise. During each decision-making step, the SA on M_k selects a job J_p from the queue and places it at the front, forming a new job queue Q′_k(t) = {J_p} ∪ {J_q ∈ Q_k(t) | J_q ≠ J_p}. For each job J_q in the new queue Q′_k(t), its waiting time W_q is calculated as defined in Equation (25):

W_q = Σ_{ J_i ∈ Q′_k(t), J_i before J_q } T_{i, CO_i(t)+1, k}
Based on this, we define the expected tardiness rate r_p^etard(t) and the actual tardiness rate r_p^atard(t) associated with selecting job J_p, as defined in Equations (26) and (27), respectively:

r_p^etard(t) = (1/|Q′_k(t)|) Σ_{J_q ∈ Q′_k(t)} f( t + W_q + Σ_{j=CO_q(t)+1}^{n_q} T̂_{q,j} − D_q )

r_p^atard(t) = (1/|Q′_k(t)|) Σ_{J_q ∈ Q′_k(t)} f( t + W_q + T_{q, CO_q(t)+1, k} − D_q )
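Equations (25)–(27) can be evaluated in one pass over the reordered queue. The sketch below assumes per-job scalars `proc_time[q]` (the next operation's time on machine k) and `est_remaining[q]` (the sum of estimated remaining processing times) have already been computed; it illustrates the definitions rather than reproducing the authors' code.

```python
def tardiness_rates(queue, p, t, proc_time, est_remaining, due):
    """Expected/actual tardiness rates (Eqs. 26-27) after moving
    job p to the front of machine k's queue."""
    new_queue = [p] + [q for q in queue if q != p]   # reordered queue
    e_tardy = a_tardy = 0
    wait = 0.0                                       # W_q of Eq. (25)
    for q in new_queue:
        e_tardy += t + wait + est_remaining[q] > due[q]   # expected late
        a_tardy += t + wait + proc_time[q] > due[q]       # actually late
        wait += proc_time[q]              # jobs behind q wait for it
    n = len(new_queue)
    return e_tardy / n, a_tardy / n
```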
In the actual scheduling process, the job queue on each machine dynamically evolves as jobs arrive and are completed. Accordingly, the state of the sequencing agent (SA) is explicitly modeled as a job-level matrix with dimensions 4 × |Q_k(t)|, where |Q_k(t)| denotes the current number of jobs waiting in the queue of machine M_k, and each column corresponds to the feature vector of one queued job. The four state features describe the dynamic attributes of each job and are detailed in Table 4.

4.3. Action

The action space of RA is a discrete set representing the candidate machines to which the current operation can be assigned. If there are m machines available in the system, the action space is defined as A_RA = {0, 1, …, m − 1}, where each action corresponds to assigning the current operation to machine M_k with index k.
For the SA on machine M_k, the action space consists of the indices of all jobs waiting in the current queue. It is defined as A_SA = {0, 1, …, |Q_k(t)| − 1}, where each action selects the job at the corresponding position in the queue for processing next.

4.4. Reward

To guide RA to make decisions that balance energy efficiency and scheduling quality whenever an operation O_{i,j} arrives in the system, we design an immediate reward feedback mechanism based on the state-action pair. The instant reward r_t for RA upon completing action a_t is defined as the sum of three components, as defined in Equation (28):

r_t = r_t^grid + r_t^efficiency + r_t^utilization
Firstly, RA is encouraged to prioritize machine assignments with lower power purchase rates. The corresponding reward r_t^grid is defined as in Equation (29):

r_t^grid = (1/m) Σ_{l=1}^{m} ρ_l^t − ρ_k^t
In addition to controlling the power purchase intensity, further reducing the processing energy consumption of jobs also helps decrease the total purchased electricity, thereby indirectly enhancing the utilization of clean energy and reducing carbon dioxide emissions. Therefore, we introduce a reward component based on energy consumption deviation. Let Ē denote the average processing energy consumption of the target operation O_{i,j} across all machines, and E_k its processing energy consumption on the currently selected machine M_k. The energy efficiency reward term r_t^efficiency is then defined as in Equation (30):

r_t^efficiency = ( Ē − E_k ) / max_{M_j ∈ M} E_j

where E_k = P_{i,j,k} T_{i,j,k} and Ē = (1/m) Σ_{k=1}^{m} P_{i,j,k} T_{i,j,k}. Let the average expected machine utilization before and after the current scheduling action be U_avg(t) and U′_avg(t), respectively, where U_avg(t) = (1/m) Σ_{k=1}^{m} U_k(t). The machine utilization reward term r_t^utilization is defined as in Equation (31):

r_t^utilization = { 1, if U′_avg(t) > U_avg(t);  0, otherwise }
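Putting Equations (28)–(31) together, the routing reward for assigning the arriving operation to machine k can be sketched as below; `rho`, `e_proc`, and the utilization averages are assumed to be precomputed per candidate machine.

```python
def routing_reward(k, rho, e_proc, u_avg_before, u_avg_after):
    """Instant RA reward (Eq. 28) for choosing machine k.

    rho[l]: expected purchase rate if assigned to machine l (Eq. 23).
    e_proc[l]: processing energy P_{i,j,l} * T_{i,j,l} on machine l.
    """
    m = len(rho)
    r_grid = sum(rho) / m - rho[k]                       # Eq. (29)
    r_eff = (sum(e_proc) / m - e_proc[k]) / max(e_proc)  # Eq. (30)
    r_util = 1.0 if u_avg_after > u_avg_before else 0.0  # Eq. (31)
    return r_grid + r_eff + r_util
```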
The SA is triggered when machine M_k becomes idle and selects a job from its queue Q_k(t) to process. To guide the SA to consider both scheduling urgency and queue disruption risk when selecting jobs, we formulate the following immediate reward function r_t, as defined in Equation (32):

r_t = r_t^slack + r_t^pt + r_t^etard + r_t^atard
where the slackness guidance term r_t^slack is used to encourage prioritizing jobs with smaller slack times. Let Ŝ_i = S_i / max(S) be the normalized slack time, and Ŝ the set of normalized slack times. If the currently selected job J_i satisfies:

Ŝ_i ≤ min(Ŝ) + ε_S

then the slackness reward r_t^slack is defined as in Equation (34):

r_t^slack = { 0.7, if Ŝ_i ≤ min(Ŝ) + ε_S;  −0.1, otherwise }
where ε_S is the tolerance threshold for slackness. The values +0.7 and −0.1 are assigned to establish slackness as the dominant priority signal in the reward structure. A relatively strong positive value (0.7) ensures that jobs with the smallest slack are consistently prioritized, whereas a mild penalty (−0.1) prevents non-urgent selections without destabilizing learning. The magnitude difference reflects the importance of slackness while allowing sufficient exploration.
Let T̂_{i,j,k} = T_{i,j,k} / median(T) denote the normalized processing time, and T̂ the set of normalized processing times. If, within the slackness neighborhood, the currently selected job's processing time satisfies:

T̂_{i,j,k} ≤ min(T̂) + ε_P

then the processing time reward r_t^pt is defined as in Equation (36):

r_t^pt = { 0.3, if Ŝ_i ≤ min(Ŝ) + ε_S and T̂_{i,j,k} ≤ min(T̂) + ε_P;  0, otherwise }
where ε_P is the tolerance threshold for processing time. The value +0.3 is intentionally smaller than the slack reward (0.7), reflecting that shorter processing time is a secondary preference used only to refine decisions within the urgent-job neighborhood. The zero reward for non-qualifying jobs avoids imposing unnecessary negative signals and helps maintain policy diversity.
The expected and actual tardiness interference terms, r_t^etard and r_t^atard, are defined in Equations (37) and (38), respectively:

r_t^etard = { 0, if r_i^etard = min(R^etard);  −0.1, otherwise }

r_t^atard = { 0, if r_i^atard = min(R^atard);  −0.3, otherwise }
The penalty magnitudes follow a rational hierarchy: a mild penalty (−0.1) is applied to expected tardiness, as it is an estimated risk, whereas a stronger penalty (−0.3) is applied to actual tardiness, which directly harms scheduling performance. Keeping both penalties moderate avoids destabilizing the learning dynamics while ensuring that the agent consistently avoids tardiness-inducing decisions.
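The four sequencing terms of Equation (32) can likewise be evaluated in one function. The tolerance values `eps_s` and `eps_p` below are illustrative placeholders for ε_S and ε_P, and all inputs are assumed to be precomputed over the current queue.

```python
def sequencing_reward(i, s_hat, t_hat, r_etard, r_atard,
                      eps_s=0.05, eps_p=0.05):
    """Instant SA reward (Eqs. 32-38) for selecting candidate job i.

    s_hat / t_hat: normalized slack and processing times per candidate.
    r_etard / r_atard: expected/actual tardiness rates per candidate.
    """
    urgent = s_hat[i] <= min(s_hat) + eps_s
    r_slack = 0.7 if urgent else -0.1                                 # Eq. (34)
    r_pt = 0.3 if urgent and t_hat[i] <= min(t_hat) + eps_p else 0.0  # Eq. (36)
    r_e = 0.0 if r_etard[i] == min(r_etard) else -0.1                 # Eq. (37)
    r_a = 0.0 if r_atard[i] == min(r_atard) else -0.3                 # Eq. (38)
    return r_slack + r_pt + r_e + r_a
```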

4.5. Process and Structure of Network

To address the dynamic job queue length faced by SA at each decision point, we design a variable-length job feature sequence-based policy modeling approach (this method is also applicable to RA). As shown in Figure 3, both RA and SA adopt the same Actor–Critic architecture and train their scheduling policies based on the PPO algorithm.
This architecture consists of two subnetworks: the Actor network and the Critic network. At each scheduling decision point  t , the agent receives the current system state  s t —a variable-length feature sequence of size  N —as input. The policy network outputs the probability distribution  π θ a t | s t  over all candidate actions, from which an action  a t  is sampled. After executing  a t , the agent obtains an immediate reward  r t  and transitions to the next state  s t + 1 .
Simultaneously, we record log π_θold(a_t|s_t), the log-probability of the chosen action under the old policy. This is crucial because in PPO the policy ratio used for updates is r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t), where π_θold corresponds to the policy used to sample the current action. Hence, we must store log π_θold at action sampling time so that it can serve as the baseline for ratio computation during subsequent training. The tuple (s_t, a_t, log π_θold(a_t|s_t), r_t, s_{t+1}) is recorded and stored in a centralized replay buffer as the data foundation for future policy updates. We compute the advantage vector A_t to update the Actor network and use the mean squared error loss between the return R_t and the state value estimate V(s_t) to update the Critic network.
The entire neural network consists of three main components:
  • Shared network: This part consists of two fully connected layers with 128 and 128 nodes, respectively, each using the ReLU activation function. It extracts state features and constructs a unified feature embedding. This module is shared between the Actor and Critic networks to enhance representation consistency and training efficiency.
  • Actor head: After obtaining the embedding for each state feature, a linear transformation layer (dimension 128 → 1) generates a score for each feature. Then, a Softmax function normalizes these scores into a probability distribution, representing the action-selection policy under the current state.
  • Critic head: First, average pooling is applied to all state feature embeddings to obtain a compact representation of the global state. Then, a linear layer (128 → 1) estimates the value function of this state.
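The shapes of this three-part network can be sketched with random, untrained weights; the point is how a variable-length sequence of N candidate features yields an N-way action distribution and a single pooled value. NumPy is used here for brevity in place of the paper's PyTorch implementation.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())          # shift for numerical stability
    return z / z.sum()

class ActorCritic:
    """Shape-level sketch of the shared-trunk Actor-Critic:
    two 128-unit ReLU layers, a per-feature 128 -> 1 scoring head,
    and a mean-pooled 128 -> 1 value head. Weights are random."""

    def __init__(self, d_in, d_h=128, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (d_in, d_h))
        self.W2 = rng.normal(0, 0.1, (d_h, d_h))
        self.w_actor = rng.normal(0, 0.1, d_h)
        self.w_critic = rng.normal(0, 0.1, d_h)

    def forward(self, S):
        """S: (N, d_in) feature sequence, one row per candidate action."""
        h = np.maximum(S @ self.W1, 0.0)               # shared layer 1
        h = np.maximum(h @ self.W2, 0.0)               # shared layer 2
        probs = softmax(h @ self.w_actor)              # actor head
        value = float(h.mean(axis=0) @ self.w_critic)  # critic head
        return probs, value
```

Because the scoring head is applied to each feature row before the Softmax, the same parameters handle any sequence length N, which is what lets the SA cope with a dynamically changing number of queued jobs.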
Algorithm 1 presents the details of the PPO algorithm.
Algorithm 1: PPO algorithm
1: Initialize shared network parameters ϕ, actor head parameters θ, critic head parameters φ
2: Set learning rate η, clipping threshold ε, discount factor γ, GAE parameter λ, training episode number Ep, mini-batch size B, buffer capacity C, number of training epochs S
3: Initialize centralized experience buffer Buff
4: for episode = 1 to Ep do
5:   Reset environment; get initial state s_0
6:   while not done do
7:     Observe current state s_t = {s_1, …, s_N}
8:     Compute action probabilities π_t = Softmax(f_θ(ϕ(s_1)), …, f_θ(ϕ(s_N)))
9:     Sample action a_t ∼ π_t
10:    Execute a_t; observe next state s_{t+1} and reward r_t
11:    Compute and store log π_θold(a_t | s_t)
12:    Store transition (s_t, a_t, log π_θold(a_t | s_t), r_t, s_{t+1}) in buffer Buff
13:    if |Buff| > C then remove the oldest (|Buff| − C) transitions
14:  end while
15:  if |Buff| ≥ B then
16:    Compute V(s_i) = f_φ(ϕ(s_i)) for all i in Buff
17:    Compute δ_i = r_i + γ · V(s_{i+1}) − V(s_i)
18:    Estimate advantages A and returns R via GAE
19:    for epoch = 1 to S do
20:      Sample mini-batch M ⊆ Buff
21:      Compute r_i(θ) = exp( log π_θ(a_i | s_i) − log π_θold(a_i | s_i) ) for all i in M
22:      Compute actor loss: L_clip(θ) = E[ min( r(θ) · A, clip(r(θ), 1 − ε, 1 + ε) · A ) ]
23:      Compute value loss: L_value(φ) = MSE( V(s), R )
24:      Backpropagate ∇_θ L_clip; update θ and ϕ
25:      Backpropagate ∇_φ L_value; update φ and ϕ
26:    end for
27:    θ_old ← θ
28:  end if
29: end for
30: Return ϕ, θ, φ
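Lines 17–22 of Algorithm 1 are the core numerical steps. Assuming a single stored trajectory with a bootstrap value appended, they can be sketched as:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """TD errors, advantages, and returns (Alg. 1, lines 17-18).
    values has length T+1: it ends with the bootstrap value of the
    final next state."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last      # discounted sum of deltas
        adv[t] = last
    return adv, adv + values[:T]               # advantages, returns

def clipped_loss(logp_new, logp_old, adv, eps=0.2):
    """PPO clipped surrogate objective (Alg. 1, line 22)."""
    ratio = np.exp(logp_new - logp_old)        # r(theta), line 21
    surr = np.minimum(ratio * adv,
                      np.clip(ratio, 1 - eps, 1 + eps) * adv)
    return surr.mean()
```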

5. Numerical Experiments

This section presents the experimental design and performance evaluation of the proposed framework. Specifically, Section 5.1 describes the parameter settings of the numerical test instances, followed by the evaluation metrics for multi-objective optimization in Section 5.2. The training details of the RA and SA are then introduced in Section 5.3. Section 5.4 and Section 5.5 report the experimental results on static and dynamic instances, respectively, including comparisons with rule-based heuristics as well as representative DRL-based and evolutionary scheduling methods. Finally, Section 5.6 provides a case study to analyze the scheduling behaviors of different methods in energy-aware scheduling scenarios.

5.1. Parameters of Numerical Instances

Following the study [31], we randomly generate numerical simulation instances to train and evaluate the proposed RA and SA. The job arrival process is assumed to follow an exponential distribution Exp(λ_new), where λ_new = 1/μ_new and μ_new denotes the average inter-arrival time of new jobs. Each job consists of multiple operations, and the number of operations is randomly sampled within a predefined range to represent manufacturing tasks of varying complexity.
Each operation can be processed on m candidate machines. To capture heterogeneous processing capabilities and energy consumption differences, the processing time T_{i,j,k} and power consumption P_{i,j,k} of each operation on each machine are randomly generated. Additionally, the idle power P_{I,k} of each machine is assigned varying values to highlight the scheduling challenges under energy constraints. The due date D_i of job J_i is calculated by scaling the total average processing time of its operations with a tightness factor L and adding the job's arrival time offset A_i, creating a mix of high-urgency and low-urgency tasks:

D_i = L Σ_{j=1}^{n_i} (1/m) Σ_{k=1}^{m} T_{i,j,k} + A_i
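A minimal sketch of this due-date rule is given below; the sampling ranges in `sample_job` are placeholders for illustration, not the values of Table 5.

```python
import random

def generate_due_date(proc_times, L, arrival_time):
    """D_i: tightness factor L times the job's total machine-averaged
    processing time, offset by the job's arrival time A_i."""
    total_avg = sum(sum(op) / len(op) for op in proc_times)
    return L * total_avg + arrival_time

def sample_job(n_ops, m, seed=42):
    """Draw a job's T_{i,j,k} table (ranges are illustrative)."""
    rng = random.Random(seed)
    return [[rng.randint(5, 20) for _ in range(m)] for _ in range(n_ops)]
```

A small L produces tight due dates (high urgency), while a large L relaxes them.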
Regarding energy modeling, the outputs of PV and WT are subject to natural variability and are modeled as stochastic time series. Wind power output depends on wind speed, which follows a Weibull distribution [32], and is expressed as:

P_WT(t) = { 0, if v < v_in or v > v_out;  P_r (v − v_in)/(v_r − v_in), if v_in ≤ v < v_r;  P_r, if v_r ≤ v < v_out }
where v is the actual wind speed, v_r is the rated wind speed, v_in and v_out are the cut-in and cut-out speeds, and P_r is the rated output power. PV power generation primarily depends on solar irradiance and panel efficiency [33], and can be approximated as:

P_PV(t) = η_PV S_PV G(t)

where S_PV is the PV panel area, G(t) is the solar irradiance, and η_PV is the PV conversion efficiency. The daily outputs of PV and WT over 24 h are illustrated in Figure 4 [32] and serve as input for scheduling simulations.
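The two generation models can be sketched directly from the formulas above; the turbine and panel parameters below are illustrative defaults, not the study's configuration.

```python
def wind_power(v, v_in=3.0, v_r=12.0, v_out=25.0, p_r=100.0):
    """Piecewise wind turbine power curve as a function of wind speed v."""
    if v < v_in or v > v_out:
        return 0.0                              # below cut-in / above cut-out
    if v < v_r:
        return p_r * (v - v_in) / (v_r - v_in)  # ramp between cut-in and rated
    return p_r                                  # rated output region

def pv_power(irradiance, area=50.0, eta=0.2):
    """PV output: conversion efficiency x panel area x irradiance."""
    return eta * area * irradiance
```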
The value ranges and generation methods of all parameters are summarized in Table 5, where  r a n d i  denotes uniform integer sampling and  r a n d f  denotes uniform floating-point sampling. This instance generation framework enables realistic simulation of dynamic job arrivals, heterogeneous machine capabilities, and renewable energy fluctuations, providing a robust basis for training and evaluating multi-agent scheduling strategies.

5.2. Evaluation Metrics

To evaluate the overall performance of RA and SA in the multi-objective scheduling task, we adopt three comprehensive performance metrics: Generational Distance (GD) [34], Inverted Generational Distance Plus (IGD+) [35], and Hypervolume (HV) [36]. Among these, GD measures the convergence of the solution set. IGD+ evaluates both convergence and distribution uniformity, using a modified distance calculation to more accurately reflect the quality of the solution set relative to the reference Pareto front. HV indicates the dominance extent and distribution quality of the solution set within the objective space.
Before calculating these metrics, all solution sets are normalized using min-max scaling.
(1) GD

GD measures the average distance from the obtained solution set A to the true Pareto front P*, as defined in Equation (42):

GD(A, P*) = (1/|A|) ( Σ_{a∈A} min_{p∈P*} d(p, a)² )^{1/2}

where d(p, a) denotes the Euclidean distance between solution a and its closest point p on the reference Pareto front. A smaller GD value indicates that the solution set is closer to the true Pareto front.
(2) IGD+

IGD+ is a commonly used comprehensive metric in multi-objective optimization for evaluating the quality of a solution set, taking into account both convergence and distribution uniformity, as defined in Equation (43):

IGD+(A, P*) = (1/|P*|) Σ_{p∈P*} min_{a∈A} d⁺(p, a)

where A denotes the solution set to be evaluated, P* represents the reference Pareto front, and d⁺(p, a) is the direction-sensitive modified distance function. This function only accounts for the distance when the objective values of solution a are worse than those of the reference point p, thereby emphasizing the penalty for solutions that are dominated relative to the Pareto front.

Compared to GD, which measures from the solution set's perspective, IGD+ is based on the reference Pareto front, allowing it to assess both how well the current solution set approximates the true Pareto boundary and how well it covers it. A smaller IGD+ value indicates that the solution set is more uniformly distributed along the reference front and closer to the Pareto optimum, reflecting better overall performance.
(3) HV

HV measures the volume covered by the solution set in the objective space and is defined as in Equation (44):

HV(A) = Volume( ∪_{a∈A} [a_1, r_1] × [a_2, r_2] × ⋯ × [a_m, r_m] )

where r is the reference point (typically chosen beyond the maximum values of all objectives in the solution set), and m denotes the number of objectives. A larger HV value indicates that the solution set dominates a larger region in the objective space, reflecting higher solution quality.
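The three metrics can be sketched in a few lines; `hv_2d` is a two-objective sweep-line simplification of Equation (44) (all objectives minimized), whereas the paper evaluates HV over four objectives.

```python
import numpy as np

def gd(A, P):
    """Generational Distance (Eq. 42) of solution set A to front P."""
    d = np.linalg.norm(A[:, None, :] - P[None, :, :], axis=2).min(axis=1)
    return float(np.sqrt((d ** 2).sum()) / len(A))

def igd_plus(A, P):
    """IGD+ (Eq. 43): only objective-wise shortfalls with respect to
    each reference point contribute (minimization assumed)."""
    d_plus = np.linalg.norm(
        np.maximum(A[None, :, :] - P[:, None, :], 0.0), axis=2)
    return float(d_plus.min(axis=1).mean())

def hv_2d(A, ref):
    """Hypervolume for two minimized objectives via a left-to-right
    sweep accumulating dominated rectangles up to reference point ref."""
    vol, prev_y = 0.0, ref[1]
    for x, y in sorted({tuple(a) for a in A}):
        if y < prev_y:                       # point is non-dominated so far
            vol += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return vol
```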

5.3. The Training Details of RA and SA

We first present the well-known routing and sequencing rules in Table 6 and Table 7. In this study, we adopt a separated asynchronous learning architecture: RA is independently trained using PPO under the FIFO sequencing baseline rule, while SA is independently trained using PPO under the EA routing baseline rule. Both agents are trained in a job shop with 10 machines, and the training simulates 100 new job insertions, with the urgency factor L of each job set to 1.
The training tasks are implemented using the SimPy discrete-event simulation framework (SimPy 4.0.1) [37] and executed on a locally configured computing platform equipped with an Intel i7 processor, 16 GB RAM, and an NVIDIA RTX 3060 GPU. To ensure experimental reproducibility, the Python environment (Python 3.9.0) and deep learning framework (PyTorch 2.1) are explicitly specified, a fixed random seed (seed = 42) is applied to all training runs, and the entire training duration is recorded.
The main PPO parameters include the learning rate  η , mini-batch size  B , and discount factor  γ . In addition, the recommended clipping ratio of 0.2 is adopted, and the Adam optimizer is used to improve gradient-based optimization efficiency. Considering the significant influence of these parameters on training performance, an orthogonal experimental design is employed to determine their combinations, with the specific parameter levels provided in Table 8. For each parameter configuration, 10 independent training runs are conducted for every instance, with the number of training episodes fixed at 1500. The averaged HV values obtained from these experiments are subsequently compared and analyzed, and the trends of the seven parameter settings are illustrated in Figure 5.
Based on the experimental results, the optimal parameter combination is determined to be  η = 5 × 10 5 B = 32 , and  γ = 0.99 . The final PPO parameter configurations are summarized in Table 9.
The training results are shown in Figure 6. As training episodes increase, the cumulative rewards of both RA and SA show an overall upward trend, indicating that the agents have successfully learned effective policies.

5.4. Experimental Results on Static Instances

To comprehensively evaluate the generalization ability and robustness of the proposed RA_SA method for solving the LEDFJSP-RE problem, this section compares it with eleven representative scheduling approaches, including the DDQN-based method proposed in [31], the classical genetic algorithm (GA), and nine rule-based strategies listed in Table 10. To ensure fairness and comparability among different solution paradigms, all experiments were conducted on the widely used Brandimarte benchmark instances. In the static experimental setting, job arrival dynamics and renewable energy fluctuations are not considered. The machine power levels and job due-date parameters are set according to [31], thereby ensuring the rationality and consistency of the experimental configuration. Each instance is independently executed ten times, and the average values of all evaluation metrics are reported to mitigate the influence of randomness.
Table 11, Table 12 and Table 13 present the comparative results of different algorithms under multiple performance metrics, where the best results are highlighted in bold. The experimental results show that RA_SA achieves the lowest IGD+ values in most benchmark instances, particularly in Mk02, Mk03, and Mk04, indicating its superior convergence performance. Meanwhile, RA_SA attains the minimum GD values in several instances, including Mk02, Mk04, Mk05, and Mk10, suggesting that the obtained solution sets are closer to the true Pareto front and exhibit higher overall solution quality. Even in cases where RA_SA does not achieve the best IGD+ or GD values, it still demonstrates a clear advantage in terms of HV, reflecting stronger stability and consistency in balancing multiple objectives.
Furthermore, as illustrated by the Pareto fronts in Figure 7, RA_SA is able to obtain solution sets that are closer to the ideal front than those produced by DDQN and GA in most benchmark instances. This observation indicates that scheduling methods based on fixed or predefined rules are limited in their ability to fully capture system state characteristics at each decision stage and to adapt effectively to environmental changes, leading to performance degradation in complex scheduling scenarios. In contrast, reinforcement learning–based agents can dynamically perceive environmental states through continuous interaction and make adaptive decisions accordingly, thereby exhibiting superior optimization capability in multi-objective scheduling problems.
Further analysis indicates that although the single-agent DDQN method exhibits a certain degree of convergence in some instances, it still suffers from notable limitations in terms of solution diversity and stability. This limitation mainly arises from its single-agent decision-making structure, in which job assignment and sequencing decisions are coupled within a unified policy, thereby lacking sufficient flexibility and coordination when handling conflicting multiple objectives. Figure 8 presents violin plots of the four optimization objectives for the MK04 and MK10 instances, from which it can be observed that the solution sets obtained by RA_SA are clearly superior to those of the other comparative methods in terms of both distribution uniformity and numerical performance.

5.5. Experimental Results on Dynamic Instances

To comprehensively evaluate the performance of the proposed RA_SA framework under dynamic scheduling scenarios, a series of systematic experiments were conducted on multiple dynamic test instances. All experiments were carried out under conditions involving dynamic job arrivals and scheduling disturbances, in order to more realistically characterize the uncertainties encountered in practical production environments.
Table 14, Table 15 and Table 16 present the comparative results of different algorithms under multiple performance metrics, where the best results are highlighted in bold. The experimental results indicate that RA_SA achieves the lowest IGD+ values in most instances, attains the minimum GD values in several representative cases, and shows clear advantages in terms of the HV metric for the majority of instances. Overall, across all dynamic test instances, RA_SA achieves the best IGD+ values in approximately 55.6% of the cases, the best GD values in about 61.1% of the cases, and the best HV values in roughly 88.8% of the cases. These quantitative results substantiate the earlier qualitative observations regarding the advantages of RA_SA, demonstrating its superior convergence performance, higher solution quality, and stronger stability and consistency in balancing multiple objectives.
Furthermore, the Pareto fronts obtained by RA_SA and other competing algorithms for several representative instances are illustrated in Figure 9. It can be observed that RA_SA is able to generate solution sets that are closer to the ideal Pareto front and exhibit a more balanced distribution among different objectives, highlighting its comprehensive advantages in terms of convergence and diversity. These results indicate that RA_SA can effectively coordinate conflicting objectives such as makespan, energy consumption, and carbon emissions in dynamic environments.
Further analysis from the perspective of problem scale reveals that RA_SA performs particularly well on medium- and large-scale instances, especially those with high job volumes and system complexity, such as the case of 1.5 × 20 × 200 ( L  ×  m  ×  N j o b s ). Such instances typically involve higher decision dimensionality and more complex job–resource coupling relationships, posing greater challenges to scheduling strategies. The experimental results demonstrate that RA_SA is able to maintain good convergence behavior and solution quality under these conditions, reflecting its strong robustness and scalability.
Overall, the dynamic experimental results demonstrate that RA_SA can effectively handle job insertions, renewable energy fluctuations, and multi-objective conflicts. Compared with traditional rule-based methods and single-agent approaches, RA_SA exhibits stronger adaptability and superior overall optimization performance in dynamic environments, confirming its effectiveness and practicality for complex green scheduling problems.

5.6. Case Study

It should be noted that the primary objective of this case study is not to directly demonstrate the immediate industrial deployability of the proposed framework, but rather to analyze and illustrate the decision-making behaviors of different scheduling paradigms under coupled job–energy dynamics. By constructing a scheduling scenario with renewable energy participation, this case study aims to provide insights into how different scheduling strategies respond to interactions among production load, energy utilization, and carbon emission control, thereby offering an analytical basis for future extensions toward practical industrial applications.
A test instance with a scale of 1.5 × 20 × 200 ( L × m × N j o b s ) is selected to evaluate the scheduling performance of RA_SA in terms of carbon emission control and operational cost under scenarios involving renewable energy participation. In addition to RA_SA, three comparative scheduling strategies are considered in the case study, including the EA_FIFO, the CT_SPT rule that ranked second overall in the previous experiments, and an energy-oriented scheduling method that combines a minimum energy routing rule (denoted as ME) with the SPT sequencing rule. Specifically, the ME routing rule prioritizes the assignment of each operation to the machine with the lowest processing energy consumption at the machine-selection stage.
Figure 10 presents a comparative illustration of the production load and renewable energy supply curves obtained by the three scheduling rules under the same renewable energy scenario. It can be observed that the ME routing rule consistently favors low-energy-consumption machines during machine selection, while neither explicitly accounting for the temporal fluctuations of renewable energy supply nor imposing constraints on job completion time and tardiness risk. As a result, jobs tend to be concentrated on a small number of low-energy machines, leading to load congestion in the time dimension and a significant extension of the makespan. Meanwhile, traditional rules such as EA_FIFO and CT_SPT, which lack a joint consideration of energy states and overall system load, struggle to achieve a balanced trade-off among makespan, energy consumption, carbon emissions, and job tardiness. In contrast, RA_SA not only accounts for machine-level energy consumption differences but also incorporates explicit awareness of renewable energy generation states. Through cooperative multi-agent decision-making, RA_SA dynamically adjusts job assignments and processing sequences, enabling the production load to better align with the renewable energy supply curve over time. This allows RA_SA to effectively reduce energy consumption and carbon emissions while simultaneously restraining the growth of makespan and job tardiness.
Furthermore, as shown in Table 17, additional experiments conducted on multiple randomly selected test instances yield consistent results with the above observations. Across different problem instances, RA_SA is able to achieve a more balanced trade-off between carbon emission reduction and operational cost, demonstrating its effectiveness and robustness in complex scheduling environments with renewable energy participation.

6. Conclusions and Future Work

This study develops an LEDFJSP-RE scheduling framework to support low-carbon and efficient operation in smart manufacturing environments. To address dynamic job arrivals and fluctuations in renewable energy supply, two cooperative agents—RA and SA—are designed and independently trained using PPO. With problem-specific state representations, tailored reward mechanisms, and a self-adaptive job-selection strategy, both agents maintain stable decision-making performance under dynamic conditions. Experimental results show that the proposed multi-agent framework consistently outperforms a wide range of baseline scheduling strategies under different disturbance scenarios. Case analyses further indicate that the coordinated RA_SA mechanism provides greater scheduling flexibility and enables better temporal alignment between production load and renewable energy generation profiles. Overall, the framework offers an effective intelligent solution for energy-aware scheduling in flexible job-shop environments.
Several limitations should be acknowledged. The framework is validated in a simulation environment, which enables controlled modeling of dynamic events and systematic analysis of scheduling behaviors, but inevitably abstracts certain complexities of real production systems. Factors such as transportation delays, sequence-dependent setup operations, finite buffer capacities, stochastic machine failures, and detailed machine-level energy behaviors (e.g., start-up peaks, shutdown policies, and idle-to-active transition costs) are not included. These simplifications reduce modeling complexity but may constrain direct applicability in certain industrial scenarios.
As manufacturing systems continue to evolve toward higher uncertainty and tighter coupling between production and energy resources, incorporating the omitted operational constraints will be essential. This includes modeling equipment failures, material-handling delays, buffer limitations, and setup operations, as well as integrating the scheduling framework more closely with energy management functions such as renewable energy forecasting, energy storage operation, and time-dependent electricity pricing. In addition, constructing standardized dynamic benchmark sets would facilitate more systematic and comprehensive comparisons across different classes of scheduling methods. Furthermore, recent advances in digital-twin-assisted dynamic rescheduling provide promising directions for enabling real-time state synchronization, disturbance prediction, and adaptive schedule adjustment [38]. Integrating digital twin mechanisms into the proposed MARL framework may further enhance its responsiveness and robustness in complex industrial environments.
Overall, the LEDFJSP-RE framework establishes a solid foundation for real-time, energy-aware scheduling in flexible job-shop systems. With the progressive integration of more realistic operational constraints, advanced energy management strategies, and real-time feedback mechanisms, the framework has the potential to evolve into a practical and scalable solution for more complex and large-scale manufacturing settings.

Author Contributions

Conceptualization, Q.Z. and Y.L.; methodology, Y.L.; software, Q.Z.; validation, Q.Z. and Y.L.; formal analysis, Q.Z. and Y.L.; investigation, Q.Z.; resources, Y.L.; data curation, Q.Z.; writing—original draft preparation, Q.Z.; writing—review and editing, Y.L., C.T., T.Z. and E.H.; visualization, Q.Z.; supervision, Y.L., C.T., T.Z. and E.H.; project administration, Q.Z.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this paper is financially supported by the Guizhou Provincial Basic Research Program (Natural Science) (Grant No. Qian Ke He Ji Chu-ZK [2024] General 439).

Data Availability Statement

All data generated or analyzed during this study are included in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Friedlingstein, P.; O’Sullivan, M.; Jones, M.W.; Andrew, R.M.; Hauck, J.; Landschützer, P.; Le Quéré, C.; Li, H.; Luijkx, I.T.; Olsen, A.; et al. Global Carbon Budget 2024. Earth Syst. Sci. Data 2025, 17, 965–1039. [Google Scholar] [CrossRef]
  2. Deng, Z.; Zhu, B.; Davis, S.J.; Ciais, P.; Guan, D.; Gong, P.; Liu, Z. Global carbon emissions and decarbonization in 2024. Nat. Rev. Earth Environ. 2025, 6, 231–233. [Google Scholar] [CrossRef]
  3. Ghorbanzadeh, M.; Davari, M.; Ranjbar, M. Energy-aware flow shop scheduling with uncertain renewable energy. Comput. Oper. Res. 2024, 170, 106741. [Google Scholar] [CrossRef]
  4. Meng, L.; Zhang, C.; Shao, X.; Ren, Y. MILP models for energy-aware flexible job shop scheduling problem. J. Clean. Prod. 2019, 210, 710–723. [Google Scholar] [CrossRef]
  5. Xin, X.; Jiang, Q.; Li, S.; Gong, S.; Chen, K. Energy-efficient scheduling for a permutation flow shop with variable transportation time using an improved discrete whale swarm optimization. J. Clean. Prod. 2021, 293, 126121. [Google Scholar] [CrossRef]
  6. Meeks, R.C.; Thompson, H.; Wang, Z. Decentralized renewable energy to grow manufacturing? Evidence from microhydro mini-grids in Nepal. J. Environ. Econ. Manag. 2025, 130, 103092. [Google Scholar] [CrossRef]
  7. Subramanyam, V.; Jin, T.; Novoa, C. Sizing a renewable microgrid for flow shop manufacturing using climate analytics. J. Clean. Prod. 2020, 252, 119829. [Google Scholar] [CrossRef]
  8. Destouet, C.; Tlahig, H.; Bettayeb, B.; Mazari, B. Flexible job shop scheduling problem under Industry 5.0: A survey on human reintegration, environmental consideration and resilience improvement. J. Manuf. Syst. 2023, 67, 155–173. [Google Scholar] [CrossRef]
  9. Fazli Khalaf, A.; Wang, Y. Energy-cost-aware flow shop scheduling considering intermittent renewables, energy storage, and real-time electricity pricing. Int. J. Energy Res. 2018, 42, 3928–3942. [Google Scholar] [CrossRef]
  10. Wu, J.; Liu, Y. A modified multi-agent proximal policy optimization algorithm for multi-objective dynamic partial re-entrant hybrid flow shop scheduling problem. Eng. Appl. Artif. Intell. 2025, 140, 109688. [Google Scholar] [CrossRef]
  11. Islam, M.M.; Rahman, M.; Heidari, F.; Gude, V. Optimal onsite microgrid design for net-zero energy operation in manufacturing industry. Procedia Comput. Sci. 2021, 185, 81–90. [Google Scholar] [CrossRef]
  12. Hao, L.; Zou, Z.; Liang, X. Solving multi-objective energy-saving flexible job shop scheduling problem by hybrid search genetic algorithm. Comput. Ind. Eng. 2025, 200, 110829. [Google Scholar] [CrossRef]
  13. Wang, Z.; He, M.; Wu, J.; Chen, H.; Cao, Y. An improved MOEA/D for low-carbon many-objective flexible job shop scheduling problem. Comput. Ind. Eng. 2024, 188, 109926. [Google Scholar] [CrossRef]
  14. Wu, X.; Sun, Y. A green scheduling algorithm for flexible job shop with energy-saving measures. J. Clean. Prod. 2018, 172, 3249–3264. [Google Scholar] [CrossRef]
  15. Shahsavari-Pour, N.; Ghasemishabankareh, B. A novel hybrid meta-heuristic algorithm for solving multi objective flexible job shop scheduling. J. Manuf. Syst. 2013, 32, 771–780. [Google Scholar] [CrossRef]
  16. Yin, L.; Li, X.; Gao, L.; Lu, C.; Zhang, Z. A novel mathematical model and multi-objective method for the low-carbon flexible job shop scheduling problem. Sustain. Comput. Inform. Syst. 2017, 13, 15–30. [Google Scholar] [CrossRef]
  17. Zhou, K.; Tan, C.; Wu, Y.; Yang, B.; Long, X. Research on low-carbon flexible job shop scheduling problem based on improved Grey Wolf Algorithm. J. Supercomput. 2024, 80, 12123–12153. [Google Scholar] [CrossRef]
  18. Beier, J.; Thiede, S.; Herrmann, C. Energy flexibility of manufacturing systems for variable renewable energy supply integration: Real-time control method and simulation. J. Clean. Prod. 2017, 141, 648–661. [Google Scholar] [CrossRef]
  19. Schulz, J.; Scharmer, V.M.; Zaeh, M.F. Energy self-sufficient manufacturing systems—Integration of renewable and decentralized energy generation systems. Procedia Manuf. 2020, 43, 40–47. [Google Scholar] [CrossRef]
  20. Dong, J.; Ye, C. Green scheduling of distributed two-stage reentrant hybrid flow shop considering distributed energy resources and energy storage system. Comput. Ind. Eng. 2022, 169, 108146. [Google Scholar] [CrossRef]
  21. Lu, S.; Wang, Y.; Kong, M.; Wang, W.; Tan, W.; Song, Y. A Double Deep Q-Network framework for a flexible job shop scheduling problem with dynamic job arrivals and urgent job insertions. Eng. Appl. Artif. Intell. 2024, 133, 108487. [Google Scholar] [CrossRef]
  22. Li, Y.; Gu, W.; Yuan, M.; Tang, Y. Real-time data-driven dynamic scheduling for flexible job shop with insufficient transportation resources using hybrid deep Q network. Robot. Comput. Manuf. 2022, 74, 102283. [Google Scholar] [CrossRef]
  23. Zhang, L.; Feng, Y.; Xiao, Q.; Xu, Y.; Li, D.; Yang, D.; Yang, Z. Deep reinforcement learning for dynamic flexible job shop scheduling problem considering variable processing times. J. Manuf. Syst. 2023, 71, 257–273. [Google Scholar] [CrossRef]
  24. Zhang, W.; Zhao, F.; Li, Y.; Du, C.; Feng, X.; Mei, X. A novel collaborative agent reinforcement learning framework based on an attention mechanism and disjunctive graph embedding for flexible job shop scheduling problem. J. Manuf. Syst. 2024, 74, 329–345. [Google Scholar] [CrossRef]
  25. Liu, R.; Piplani, R.; Toro, C. A deep multi-agent reinforcement learning approach to solve dynamic job shop scheduling problem. Comput. Oper. Res. 2023, 159, 106294. [Google Scholar] [CrossRef]
  26. Wan, L.; Fu, L.; Li, C.; Li, K. An effective multi-agent-based graph reinforcement learning method for solving flexible job shop scheduling problem. Eng. Appl. Artif. Intell. 2025, 139, 109557. [Google Scholar] [CrossRef]
  27. Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust region policy optimization. arXiv 2015, arXiv:1502.05477. [Google Scholar] [CrossRef]
  28. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  29. Xiao, J.; Zhang, Z.; Terzi, S.; Tao, F.; Anwer, N.; Eynard, B. Multi-scenario digital twin-driven human-robot collaboration multi-task disassembly process planning based on dynamic time petri-net and heterogeneous multi-agent double deep Q-learning network. J. Manuf. Syst. 2025, 83, 284–305. [Google Scholar] [CrossRef]
  30. Xiao, J.; Gao, J.; Anwer, N.; Eynard, B. Multi-Agent Reinforcement Learning Method for Disassembly Sequential Task Optimization Based on Human–Robot Collaborative Disassembly in Electric Vehicle Battery Recycling. J. Manuf. Sci. Eng. 2023, 145, 121001. [Google Scholar] [CrossRef]
  31. Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208. [Google Scholar] [CrossRef]
  32. Yuan, G.; Gao, Y.; Ye, B.; Huang, R. Real-time pricing for smart grid with multi-energy microgrids and uncertain loads: A bilevel programming method. Int. J. Electr. Power Energy Syst. 2020, 123, 106206. [Google Scholar] [CrossRef]
  33. Tan, Z.; Fan, W.; Li, H.; De, G.; Ma, J.; Yang, S.; Ju, L.; Tan, Q. Dispatching optimization model of gas-electricity virtual power plant considering uncertainty based on robust stochastic optimization theory. J. Clean. Prod. 2020, 247, 119106. [Google Scholar] [CrossRef]
  34. Zitzler, E.; Deb, K.; Thiele, L. Comparison of multiobjective evolutionary algorithms: Empirical results. Evol. Comput. 2000, 8, 173–195. [Google Scholar] [CrossRef]
  35. Ishibuchi, H.; Masuda, H.; Tanigaki, Y.; Nojima, Y. Modified distance calculation in generational distance and inverted generational distance. In Evolutionary Multi-Criterion Optimization; Gaspar-Cunha, A., Henggeler Antunes, C., Coello, C.C., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 110–125. [Google Scholar]
  36. Zitzler, E.; Thiele, L. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. 1999, 3, 257–271. [Google Scholar] [CrossRef]
  37. Matloff, N. Introduction to Discrete-Event Simulation and the SimPy Language; University of California, Davis: Davis, CA, USA, 2008. [Google Scholar]
  38. Yang, Y.; Yang, M.; Anwer, N.; Eynard, B.; Shu, L.; Xiao, J. A novel digital twin-assisted prediction approach for optimum rescheduling in high-efficient flexible production workshops. Comput. Ind. Eng. 2023, 182, 109398. [Google Scholar] [CrossRef]
Figure 1. LEDFJSP-RE model.
Figure 2. Framework of the routing and sequencing scheduling system.
Figure 3. Process and structure of the network.
Figure 4. Energy output of PV and WT in 24 h.
Figure 5. The trend plots for the three parameters.
Figure 6. Convergence curve of cumulative reward during training.
Figure 7. Pareto fronts obtained by RA_SA and other dispatching rules on static instances.
Figure 8. Violin plot of non-dominated solutions across dispatching rules for four objectives.
Figure 9. Pareto fronts obtained by RA_SA and other dispatching rules on dynamic instances.
Figure 10. Hourly-averaged production load and renewable energy supply under different scheduling strategies for the 1.5 × 20 × 200 test case.
Table 1. Some studies on DRL-related flexible job shop scheduling.

| Work | Algorithm | Agent | Dynamic Events | Objectives |
|---|---|---|---|---|
| [10] | PPO | Multiple | New job insertions; machine breakdowns | Makespan; total energy consumption |
| [21] | DDQN | Single | New job insertions | Makespan; average tardiness |
| [22] | Hybrid DQN | Single | New job insertions; machine breakdowns; insufficient transportation resources | Makespan; total energy consumption |
| [23] | PPO | Single | Variable processing times | Makespan |
| [24] | SAC (Soft Actor-Critic) + DQN | Multiple | None | Makespan |
| [25] | DDQN | Multiple | New job insertions | Total tardiness |
| [26] | PPO | Multiple | None | Makespan |
| Ours | PPO | Multiple | Dynamic job insertions; due date changes; renewable energy supply fluctuations | Makespan; average tardiness; total energy consumption; total carbon emissions |
Table 2. Notations and descriptions for the LEDFJSP-RE model.

| Parameter | Description | Unit |
|---|---|---|
| n | Total number of jobs. | / |
| m | Total number of machines. | / |
| J_i | The i-th job. | / |
| J | The set of all jobs. | / |
| n_i | Total number of operations belonging to job J_i. | / |
| O_{i,j} | The j-th operation of job J_i. | / |
| M_k | The k-th machine. | / |
| M | The set of all machines. | / |
| T_{i,j,k} | The processing time of operation O_{i,j} on machine M_k. | min |
| P_{i,j,k} | The processing power of operation O_{i,j} on machine M_k. | kW |
| P_{I,k} | The idle power of machine M_k. | kW |
| A_i | The arrival time of job J_i. | min |
| C_i | The completion time of job J_i. | min |
| S_{i,j} | The start time of operation O_{i,j}. | min |
| C_{i,j} | The completion time of operation O_{i,j}. | min |
| D_i | The due date of job J_i. | min |
| k | Index of machines, k = 1, 2, …, m. | / |
| p_Grid | Unit electricity price of the main grid. | CNY/kWh |
| p_tard | Unit tardiness penalty cost. | CNY/min |
| P_{dis,max} | Maximum discharging power of the ESS. | kW |
| P_{cha,max} | Maximum charging power of the ESS. | kW |
| η_dis | Discharging efficiency of the ESS. | / |
| η_cha | Charging efficiency of the ESS. | / |

| Decision variable | Description | Unit |
|---|---|---|
| Now | Current system time. | min |
| X_{i,j,k}(t) | 1 if operation O_{i,j} has been assigned to machine M_k by time t; 0 otherwise. | / |
| Q_k(t) | The set of job operations waiting to be processed on machine M_k at time t. | / |
| CO_i(t) | The number of operations of job J_i actually completed by time t. | / |
| EI_k(t) | The expected idle time of machine M_k at time t, i.e., the actual idle time of M_k after completing all jobs in Q_k(t). | min |
| P_k(t) | Power of machine M_k at time t. | kW |
| P_PV(t) | The PV power output at time t. | kW |
| P_WT(t) | The WT power output at time t. | kW |
| P_dis(t) | Discharging power of the ESS at time t. | kW |
| P_cha(t) | Charging power of the ESS at time t. | kW |
| E_ESS(t) | The remaining energy level of the ESS at time t. | kWh |
| P_Grid(t) | Main grid power at time t. | kW |
| L(t) | The load power at time t. | kW |
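The ESS quantities in Table 2 follow a standard charge/discharge energy balance. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: renewables (PV + WT) are consumed first, the ESS absorbs any surplus or covers any deficit within its power and capacity limits, and the main grid supplies the remainder. The function name `step_ess` and the dispatch priority are hypothetical; the numeric defaults mirror Table 5.

```python
def step_ess(e_ess, p_load, p_pv, p_wt, dt_h,
             e_min=10.0, e_max=200.0,
             p_cha_max=200.0, p_dis_max=200.0,
             eta_cha=0.95, eta_dis=0.95):
    """One scheduling-step energy balance (hypothetical dispatch order):
    renewables first, then ESS, with any residual deficit met by the grid.
    Returns the updated ESS energy level (kWh) and the grid power (kW)."""
    surplus = p_pv + p_wt - p_load  # kW; positive means excess renewables
    if surplus >= 0:
        # Charge the ESS with surplus renewable power, within power and
        # capacity limits (charging losses reduce stored energy gain).
        p_cha = min(surplus, p_cha_max, (e_max - e_ess) / (eta_cha * dt_h))
        e_ess += eta_cha * p_cha * dt_h
        p_grid = 0.0
    else:
        deficit = -surplus
        # Discharge the ESS to cover the deficit, within power limits and
        # without dropping below the minimum state of charge.
        p_dis = min(deficit, p_dis_max, (e_ess - e_min) * eta_dis / dt_h)
        e_ess -= p_dis * dt_h / eta_dis
        p_grid = deficit - p_dis  # residual load met by the main grid
    return e_ess, p_grid
```

For example, with a 30 kW deficit over one hour the ESS discharges 30 kW (drawing 30/0.95 kWh of stored energy) and no grid power is needed, provided the state of charge permits it.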
Table 3. State features for RA.

- The set of processing times: { T_{i,j,k} | k ∈ M }
- The set of processing powers: { P_{i,j,k} | k ∈ M }
- The set of power purchase rates: { ρ_k(t) | k ∈ M }

Table 4. State features for SA.

- The set of slack: S = { S_i(t) | J_i ∈ Q_k(t) }
- The set of processing times: T = { T_{i,j,k} | J_i ∈ Q_k(t) }
- The set of actual tardy rates: R^atard = { r_i^atard(t) | J_i ∈ Q_k(t) }
- The set of expected tardy rates: R^etard = { r_i^etard(t) | J_i ∈ Q_k(t) }
Table 5. Parameter settings of training and testing benchmarks.

| Parameter | Value |
|---|---|
| Total number of machines m | randi[5, 20] |
| Idle power of machines P_{I,k} (kW) | randf[1, 3] |
| Total number of newly inserted jobs N_jobs | randi[50, 200] |
| Number of operations per job n_i | randi[1, 10] |
| Tightness of the due date L | randf[0.5, 1.5] |
| Mean interarrival time μ_new (min) | 10 |
| Processing time T_{i,j,k} (min) | randi[10, 20] |
| Processing power P_{i,j,k} (kW) | randf[10, 20] |
| Carbon emission intensity C (kg/kWh) | 0.998 |
| Tolerance threshold of slack ε_S | 0.05 |
| Tolerance threshold of processing time ε_P | 0.3 |
| Discharging efficiency of ESS η_dis | 0.95 |
| Charging efficiency of ESS η_cha | 0.95 |
| Maximum state of ESS E_{ESS,max} (kWh) | 200 |
| Minimum state of ESS E_{ESS,min} (kWh) | 10 |
| Initial state of ESS E_ESS(0) (kWh) | 100 |
| Maximum discharging power of the ESS P_{dis,max} (kW) | 200 |
| Maximum charging power of the ESS P_{cha,max} (kW) | 200 |
| Unit tardiness penalty cost p_tard (CNY/min) | 0.1 |
| Unit electricity price of the main grid p_Grid (CNY/kWh) | 0.779 |
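The sampling ranges in Table 5 suffice to generate synthetic instances. The sketch below is one possible generator, not the authors' benchmark code: it assumes exponentially distributed interarrival times with mean μ_new and a hypothetical due-date rule (arrival time plus L times the minimal total processing time); `generate_instance` and the dictionary layout are illustrative names.

```python
import random

def generate_instance(seed=0):
    """Sample one synthetic benchmark instance following Table 5 ranges."""
    rng = random.Random(seed)
    m = rng.randint(5, 20)                        # number of machines
    n_jobs = rng.randint(50, 200)                 # dynamically inserted jobs
    mu_new = 10.0                                 # mean interarrival time (min)
    tightness = rng.uniform(0.5, 1.5)             # due-date tightness L
    jobs, arrival = [], 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(1.0 / mu_new)  # assumed Poisson arrivals
        ops = []
        for _ in range(rng.randint(1, 10)):       # operations per job
            # Each operation can run on any machine with its own time/power.
            ops.append({k: (rng.randint(10, 20),       # T_{i,j,k} (min)
                            rng.uniform(10.0, 20.0))   # P_{i,j,k} (kW)
                        for k in range(m)})
        # Hypothetical due-date rule: arrival + L * minimal processing load.
        total_t = sum(min(t for t, _ in op.values()) for op in ops)
        jobs.append({"arrival": arrival,
                     "ops": ops,
                     "due": arrival + tightness * total_t})
    return m, jobs
```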
Table 6. Benchmark routing rules.

| No. | Routing Rule | Description | Calculation |
|---|---|---|---|
| 1 | Earliest Completion Time (CT) | CT chooses the machine with the earliest completion time. | M_k = argmin_{k ∈ M} (EI_k(t) + T_{i,j,k}) |
| 2 | Minimum Utilization Rate (MU) | MU chooses the machine with the minimum utilization rate. | M_k = argmin_{k ∈ M} U_k(t) |
| 3 | Earliest Available Machine (EA) | EA chooses the machine with the earliest available time. | M_k = argmin_{k ∈ M} EI_k(t) |

Table 7. Benchmark sequencing rules.

| No. | Sequencing Rule | Description | Calculation |
|---|---|---|---|
| 1 | Earliest Due Date (EDD) | EDD chooses the next operation among the queued jobs with the earliest due date. | J_i = argmin_{i ∈ Q_k(t)} D_i |
| 2 | Shortest Processing Time (SPT) | SPT chooses the next operation among the queued jobs with the shortest processing time. | J_i = argmin_{i ∈ Q_k(t)} T_{i,j,k} |
| 3 | First In First Out (FIFO) | FIFO chooses the next operation of the earliest-arriving job. | J_i = argmin_{i ∈ Q_k(t)} A_i |
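The argmin selections in Tables 6 and 7 translate directly into code. The sketch below is a minimal illustration assuming machine/job indices and dictionaries of EI_k(t), T_{i,j,k}, and due dates are available; the function names are hypothetical.

```python
def ct_rule(eligible, ei, t_proc):
    """Earliest Completion Time: machine minimizing EI_k(t) + T_{i,j,k}."""
    return min(eligible, key=lambda k: ei[k] + t_proc[k])

def ea_rule(eligible, ei):
    """Earliest Available: machine with the smallest expected idle time."""
    return min(eligible, key=lambda k: ei[k])

def edd_rule(queue, due):
    """Earliest Due Date: queued job with the earliest due date."""
    return min(queue, key=lambda i: due[i])

def spt_rule(queue, t_proc):
    """Shortest Processing Time: queued job with the shortest next operation."""
    return min(queue, key=lambda i: t_proc[i])
```

The MU and FIFO rules follow the same pattern with U_k(t) and A_i as the keys.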
Table 8. Levels of each parameter in PPO.

| Parameter | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| η | 1 × 10⁻⁵ | 5 × 10⁻⁵ | 1 × 10⁻⁴ |
| λ | 0.90 | 0.95 | 0.99 |
| B | 32 | 64 | 128 |
Table 9. Parameter settings for the proposed PPO.

| Parameter | Value |
|---|---|
| Discount factor γ | 0.99 |
| GAE lambda λ | 0.95 |
| Clipping parameter ε | 0.2 |
| Learning rate η | 5 × 10⁻⁵ |
| Mini-batch size B | 32 |
| Training episode number E_p | 1000 |
| Number of training epochs S | 5 |
| Buffer capacity C | 1000 |
| Optimizer | Adam |
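The roles of γ, λ, and ε in Table 9 can be made concrete with the two standard PPO building blocks they parameterize: generalized advantage estimation and the clipped surrogate objective [28]. The sketch below is a minimal NumPy illustration under the assumption of a finite episode with zero terminal value; the actual agents use neural policy and value networks, and the function names are illustrative.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation with the gamma/lambda of Table 9.
    Assumes a finite episode whose value after the last step is zero."""
    n = len(rewards)
    adv, running = np.zeros(n), 0.0
    for t in reversed(range(n)):
        next_v = values[t + 1] if t + 1 < n else 0.0
        delta = rewards[t] + gamma * next_v - values[t]   # TD residual
        running = delta + gamma * lam * running           # discounted sum
        adv[t] = running
    return adv

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Clipped surrogate objective (to be maximized); ratio is
    pi_new(a|s) / pi_old(a|s) for each sampled action."""
    return np.minimum(ratio * adv,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv).mean()
```

Clipping the probability ratio to [1 − ε, 1 + ε] bounds how far a single mini-batch update (size B = 32, S = 5 epochs) can move the policy.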
Table 10. Combined dispatching rules.

| No. | Rule |
|---|---|
| 1 | EA_EDD |
| 2 | EA_SPT |
| 3 | EA_FIFO |
| 4 | CT_EDD |
| 5 | CT_SPT |
| 6 | CT_FIFO |
| 7 | MU_EDD |
| 8 | MU_SPT |
| 9 | MU_FIFO |
Table 11. Comparison of IGD+ values between RA_SA and other rules on static instances.

| Instance | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MK01 | 0.2310 | 0.2010 | 0.9290 | 0.2340 | 0.2290 | 0.2470 | 0.7520 | 0.8000 | 0.7430 | 0.8330 | 1.0600 | 0.7630 |
| MK02 | 0.2760 | 0.2890 | 0.8510 | 0.3190 | 0.2890 | 0.3040 | 0.8630 | 0.8740 | 0.8700 | 0.9410 | 1.1400 | 0.8790 |
| MK03 | 0.2250 | 0.3810 | 0.6410 | 0.3130 | 0.3430 | 0.3720 | 0.9840 | 0.9590 | 1.0100 | 0.9860 | 1.2200 | 0.9850 |
| MK04 | 0.1460 | 0.2860 | 0.2860 | 0.2430 | 0.2130 | 0.2850 | 1.0300 | 1.0200 | 1.0100 | 1.0200 | 1.2100 | 1.0400 |
| MK05 | 0.2540 | 0.2640 | 0.2410 | 0.3260 | 0.2260 | 0.2140 | 0.9740 | 0.9380 | 0.9230 | 0.9760 | 1.1700 | 0.9460 |
| MK06 | 0.2330 | 0.2600 | 0.2290 | 0.2890 | 0.2450 | 0.2050 | 1.0500 | 1.0500 | 0.9950 | 1.0800 | 1.3000 | 1.0500 |
| MK07 | 0.2150 | 0.2360 | 0.2430 | 0.2160 | 0.2670 | 0.2110 | 1.4200 | 1.4400 | 1.4300 | 1.3900 | 1.4200 | 1.4300 |
| MK08 | 0.2760 | 0.3180 | 0.3010 | 0.3560 | 0.3220 | 0.2970 | 1.4200 | 1.4300 | 1.4600 | 1.4900 | 1.0300 | 1.4400 |
| MK09 | 0.2290 | 0.2800 | 0.2870 | 0.3170 | 0.2790 | 0.2650 | 1.4400 | 1.4100 | 1.4200 | 1.4500 | 1.4300 | 1.4100 |
| MK10 | 0.2060 | 0.3640 | 0.3090 | 0.4150 | 0.3280 | 0.3330 | 0.7520 | 0.8000 | 0.7430 | 0.8330 | 1.0600 | 0.7630 |
Table 12. Comparison of GD values between RA_SA and other rules on static instances.

| Instance | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MK01 | 0.0525 | 0.2340 | 0.6290 | 0.3810 | 0.0626 | 0.0598 | 0.6880 | 0.7340 | 0.6750 | 0.7700 | 0.8020 | 0.6910 |
| MK02 | 0.0371 | 0.2890 | 0.7760 | 0.0587 | 0.0410 | 0.0605 | 0.7170 | 0.7460 | 0.7100 | 0.8070 | 0.9080 | 0.7510 |
| MK03 | 0.1170 | 0.0381 | 0.5130 | 0.3800 | 0.0395 | 0.0441 | 0.8390 | 0.8200 | 0.8380 | 0.8230 | 0.9340 | 0.8270 |
| MK04 | 0.0170 | 0.2860 | 0.8280 | 0.1000 | 0.0877 | 0.0837 | 1.0600 | 1.0100 | 0.9970 | 1.0200 | 1.1800 | 1.0400 |
| MK05 | 0.0147 | 0.2640 | 0.7710 | 0.1500 | 0.0721 | 0.0705 | 0.9110 | 0.8890 | 0.8430 | 0.9390 | 0.9690 | 0.9060 |
| MK06 | 0.0485 | 0.2600 | 0.9560 | 0.1290 | 0.0688 | 0.0318 | 0.9540 | 0.9500 | 0.8910 | 0.9970 | 1.0500 | 0.9570 |
| MK07 | 0.0290 | 0.0216 | 0.6780 | 0.0705 | 0.0753 | 0.0653 | 0.9420 | 0.8800 | 0.8590 | 0.8550 | 0.9290 | 0.9250 |
| MK08 | 0.1870 | 0.3180 | 0.8050 | 0.0441 | 0.0461 | 0.0410 | 1.0900 | 0.9280 | 0.8710 | 0.9150 | 0.9850 | 1.0400 |
| MK09 | 0.0243 | 0.0280 | 0.8040 | 0.0584 | 0.5380 | 0.0648 | 0.9770 | 0.9560 | 0.8850 | 0.9060 | 0.9710 | 1.0000 |
| MK10 | 0.1030 | 0.3640 | 0.6090 | 0.0833 | 0.0983 | 0.1260 | 0.7540 | 0.6470 | 0.6980 | 0.7760 | 0.8570 | 0.7290 |
Table 13. Comparison of HV values between RA_SA and other rules on static instances.

| Instance | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MK01 | 0.7560 | 0.2340 | 0.2290 | 0.3230 | 0.3390 | 0.3200 | 0.0748 | 0.0650 | 0.0697 | 0.0520 | 0.0137 | 0.0665 |
| MK02 | 0.8450 | 0.2890 | 0.1140 | 0.1950 | 0.1850 | 0.1930 | 0.0277 | 0.0302 | 0.0220 | 0.0177 | 0.0045 | 0.0247 |
| MK03 | 0.8640 | 0.3810 | 0.1110 | 0.1190 | 0.1070 | 0.1180 | 0.0150 | 0.0203 | 0.0118 | 0.0134 | 0.0021 | 0.0151 |
| MK04 | 0.4780 | 0.2860 | 0.1460 | 0.4110 | 0.3550 | 0.3640 | 0.0393 | 0.0460 | 0.0380 | 0.0451 | 0.0126 | 0.0402 |
| MK05 | 0.4630 | 0.7640 | 0.1610 | 0.4070 | 0.4420 | 0.4900 | 0.0228 | 0.0311 | 0.0277 | 0.0240 | 0.0069 | 0.0266 |
| MK06 | 0.8640 | 0.2600 | 0.1290 | 0.4300 | 0.4690 | 0.5140 | 0.0127 | 0.0134 | 0.0170 | 0.0107 | 0.0028 | 0.0145 |
| MK07 | 0.7190 | 0.2160 | 0.2430 | 0.3300 | 0.2980 | 0.3260 | 0.0117 | 0.0095 | 0.0103 | 0.0131 | 0.0124 | 0.0110 |
| MK08 | 0.7500 | 0.3180 | 0.3010 | 0.1450 | 0.1620 | 0.1590 | 0.0113 | 0.0098 | 0.0098 | 0.0098 | 0.0103 | 0.0106 |
| MK09 | 0.9360 | 0.2800 | 0.2870 | 0.1410 | 0.1390 | 0.1450 | 0.0068 | 0.0079 | 0.0073 | 0.0059 | 0.0072 | 0.0075 |
| MK10 | 0.7510 | 0.3640 | 0.3090 | 0.2200 | 0.2680 | 0.2280 | 0.0588 | 0.0794 | 0.0431 | 0.0425 | 0.0102 | 0.0620 |
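Tables 11–13 report the IGD+ [35], GD [34], and HV [36] quality indicators. The sketch below illustrates GD and IGD+ for minimization objectives, assuming normalized objective vectors and Euclidean distance; HV is omitted for brevity, and the function names are illustrative rather than the authors' evaluation code.

```python
import numpy as np

def gd(front, reference):
    """Generational Distance: mean distance from each obtained solution
    to its nearest point in the reference front (lower is better)."""
    d = np.linalg.norm(front[:, None, :] - reference[None, :, :], axis=2)
    return d.min(axis=1).mean()

def igd_plus(front, reference):
    """IGD+ for minimization (Ishibuchi et al. [35]): for each reference
    point, only the amount by which the nearest solution is worse in each
    objective contributes to the distance (lower is better)."""
    diff = np.maximum(front[None, :, :] - reference[:, None, :], 0.0)
    d = np.linalg.norm(diff, axis=2)
    return d.min(axis=1).mean()
```

Unlike plain IGD, IGD+ does not penalize a solution for being *better* than a reference point in some objective, which makes it weakly Pareto compliant.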
Table 14. IGD+ values for the Pareto fronts obtained by RA_SA and other rules on dynamic instances.

| L | m | N_jobs | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 10 | 50 | 0.3310 | 0.5980 | 0.6170 | 0.2340 | 0.2290 | 0.2480 | 0.8640 | 0.8830 | 0.8560 | 0.8370 | 0.8930 | 0.9000 |
| | | 100 | 0.2760 | 0.7820 | 0.7590 | 0.2890 | 0.2890 | 0.3040 | 0.8810 | 0.8780 | 0.9090 | 0.8820 | 0.8860 | 0.9010 |
| | | 200 | 0.2250 | 0.8980 | 0.8420 | 0.3810 | 0.3810 | 0.3720 | 0.9680 | 0.9450 | 0.9490 | 0.9520 | 0.9340 | 0.9640 |
| | 20 | 50 | 0.3460 | 0.9360 | 0.9330 | 0.2860 | 0.2860 | 0.2850 | 1.0300 | 1.0100 | 1.0100 | 0.9790 | 0.9970 | 0.9820 |
| | | 100 | 0.3540 | 0.9230 | 0.9330 | 0.2640 | 0.2410 | 0.2140 | 1.0500 | 1.0600 | 1.0700 | 1.0800 | 1.0500 | 1.0700 |
| | | 200 | 0.4130 | 0.9780 | 0.9700 | 0.2600 | 0.2290 | 0.2050 | 1.0900 | 1.1000 | 1.1100 | 1.0900 | 1.1000 | 1.1100 |
| 1 | 10 | 50 | 0.3150 | 1.0100 | 1.0300 | 0.2160 | 0.2430 | 0.2110 | 0.7920 | 0.8580 | 0.8350 | 0.8590 | 0.8760 | 0.8610 |
| | | 100 | 0.2760 | 1.0400 | 1.0700 | 0.3180 | 0.3010 | 0.2970 | 0.8780 | 0.8700 | 0.8470 | 0.8730 | 0.8440 | 0.8440 |
| | | 200 | 0.2290 | 1.0600 | 1.0800 | 0.2800 | 0.2870 | 0.2650 | 0.9190 | 0.9330 | 0.9480 | 0.9290 | 0.9050 | 0.9410 |
| | 20 | 50 | 0.2060 | 0.6370 | 0.6120 | 0.3640 | 0.3090 | 0.3330 | 0.9930 | 1.0300 | 0.9850 | 0.9590 | 0.9830 | 0.9780 |
| | | 100 | 0.2310 | 0.7800 | 0.7210 | 0.2380 | 0.2370 | 0.2230 | 1.0800 | 1.0400 | 1.0600 | 1.0300 | 1.0300 | 1.0400 |
| | | 200 | 0.2180 | 0.8500 | 0.8440 | 0.2570 | 0.2530 | 0.2700 | 1.0700 | 1.0800 | 1.0600 | 1.0300 | 1.0100 | 1.0100 |
| 1.5 | 10 | 50 | 0.2430 | 0.8830 | 0.8590 | 0.2370 | 0.2530 | 0.2340 | 0.8010 | 0.8160 | 0.8110 | 0.8470 | 0.8190 | 0.8470 |
| | | 100 | 0.2210 | 0.9260 | 0.8710 | 0.2850 | 0.2680 | 0.2780 | 0.8060 | 0.8100 | 0.8370 | 0.8060 | 0.8100 | 0.7890 |
| | | 200 | 0.2530 | 0.9450 | 0.9120 | 0.2180 | 0.2250 | 0.2150 | 0.8600 | 0.8550 | 0.8840 | 0.8520 | 0.8750 | 0.9070 |
| | 20 | 50 | 0.1780 | 1.0100 | 0.9880 | 0.2880 | 0.3220 | 0.2850 | 0.8630 | 0.8430 | 0.8810 | 0.8560 | 0.8380 | 0.8730 |
| | | 100 | 0.1390 | 1.0400 | 1.0500 | 0.3930 | 0.4210 | 0.3880 | 0.8150 | 0.8380 | 0.8650 | 0.7950 | 0.7840 | 0.8110 |
| | | 200 | 0.1840 | 1.0400 | 1.0800 | 0.2660 | 0.3410 | 0.2740 | 0.8570 | 0.8710 | 0.8520 | 0.8280 | 0.8420 | 0.8260 |
Table 15. GD values for the Pareto fronts obtained by RA_SA and other rules on dynamic instances.

| L | m | N_jobs | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 10 | 50 | 0.5250 | 0.0727 | 0.0856 | 0.0379 | 0.0622 | 0.0598 | 0.7900 | 0.7490 | 0.7520 | 0.7530 | 0.7890 | 0.7790 |
| | | 100 | 0.3710 | 0.0811 | 0.1840 | 0.0587 | 0.0440 | 0.0605 | 0.7660 | 0.7670 | 0.8050 | 0.7550 | 0.7570 | 0.7860 |
| | | 200 | 0.1170 | 0.0626 | 0.1030 | 0.0380 | 0.0395 | 0.0441 | 0.8810 | 0.8310 | 0.8570 | 0.8710 | 0.8370 | 0.8950 |
| | 20 | 50 | 0.0170 | 0.0692 | 0.0736 | 0.1000 | 0.0987 | 0.0837 | 0.8160 | 0.8110 | 0.7960 | 0.7710 | 0.7970 | 0.7970 |
| | | 100 | 0.0147 | 0.0530 | 0.0710 | 0.1500 | 0.0821 | 0.0705 | 0.8200 | 0.8270 | 0.8380 | 0.8680 | 0.8370 | 0.8500 |
| | | 200 | 0.0485 | 0.0656 | 0.0730 | 0.1290 | 0.0778 | 0.0318 | 0.9410 | 0.9670 | 0.9610 | 0.9370 | 0.9230 | 0.9620 |
| 1 | 10 | 50 | 0.1900 | 0.1260 | 0.1140 | 0.0705 | 0.0753 | 0.0653 | 0.8470 | 0.8950 | 0.8800 | 0.8980 | 0.9140 | 0.8880 |
| | | 100 | 0.0187 | 0.1770 | 0.0716 | 0.0441 | 0.0461 | 0.0490 | 0.8230 | 0.7780 | 0.7980 | 0.8200 | 0.7870 | 0.8170 |
| | | 200 | 0.3430 | 0.1310 | 0.0602 | 0.0584 | 0.0538 | 0.0648 | 0.8490 | 0.8520 | 0.8720 | 0.8540 | 0.8470 | 0.8890 |
| | 20 | 50 | 0.0133 | 0.0675 | 0.1270 | 0.0833 | 0.0983 | 0.1060 | 0.7830 | 0.8020 | 0.7820 | 0.7890 | 0.7580 | 0.7620 |
| | | 100 | 0.0225 | 0.0769 | 0.1890 | 0.1080 | 0.0681 | 0.1120 | 0.9160 | 0.8680 | 0.9080 | 0.8750 | 0.8810 | 0.8540 |
| | | 200 | 0.0201 | 0.0535 | 0.1080 | 0.0621 | 0.0760 | 0.0621 | 0.8300 | 0.8230 | 0.8080 | 0.7660 | 0.7440 | 0.7790 |
| 1.5 | 10 | 50 | 0.0183 | 0.1050 | 0.0784 | 0.0249 | 0.0325 | 0.0374 | 0.8160 | 0.8160 | 0.8110 | 0.8440 | 0.7910 | 0.8360 |
| | | 100 | 0.2310 | 0.0448 | 0.0316 | 0.0731 | 0.0726 | 0.0573 | 0.7540 | 0.7100 | 0.7600 | 0.7320 | 0.7230 | 0.7150 |
| | | 200 | 0.4470 | 0.0645 | 0.1120 | 0.0472 | 0.0516 | 0.0538 | 0.8110 | 0.8150 | 0.8500 | 0.8080 | 0.8220 | 0.8690 |
| | 20 | 50 | 0.0494 | 0.1850 | 0.1010 | 0.0961 | 0.1990 | 0.1560 | 0.6400 | 0.6330 | 0.6720 | 0.6600 | 0.6540 | 0.6840 |
| | | 100 | 0.0281 | 0.1410 | 0.1120 | 0.1680 | 0.2790 | 0.2730 | 0.6220 | 0.6390 | 0.6510 | 0.5780 | 0.6110 | 0.5940 |
| | | 200 | 0.0151 | 0.2240 | 0.1410 | 0.0864 | 0.1810 | 0.1420 | 0.6180 | 0.6160 | 0.6270 | 0.6190 | 0.5770 | 0.6020 |
Table 16. HV values for the Pareto fronts obtained by RA_SA and other rules on dynamic instances.

| L | m | N_jobs | RA_SA | DDQN | GA | CT_EDD | CT_FIFO | CT_SPT | EA_EDD | EA_FIFO | EA_SPT | MU_EDD | MU_FIFO | MU_SPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 10 | 50 | 0.6550 | 0.1770 | 0.0556 | 0.3230 | 0.3390 | 0.3200 | 0.0439 | 0.0441 | 0.0480 | 0.0521 | 0.0354 | 0.0383 |
| | | 100 | 0.7550 | 0.1060 | 0.0302 | 0.1950 | 0.1850 | 0.1930 | 0.0273 | 0.0306 | 0.0251 | 0.0288 | 0.0311 | 0.0273 |
| | | 200 | 0.8750 | 0.0657 | 0.0203 | 0.1190 | 0.1070 | 0.1180 | 0.0188 | 0.0214 | 0.0224 | 0.0239 | 0.0243 | 0.0197 |
| | 20 | 50 | 0.4990 | 0.2970 | 0.2990 | 0.4110 | 0.3550 | 0.3640 | 0.0167 | 0.0229 | 0.0200 | 0.0263 | 0.0211 | 0.0240 |
| | | 100 | 0.4630 | 0.1970 | 0.1640 | 0.4070 | 0.4420 | 0.4900 | 0.0112 | 0.0100 | 0.0096 | 0.0093 | 0.0116 | 0.0106 |
| | | 200 | 0.4640 | 0.1020 | 0.0943 | 0.4300 | 0.4690 | 0.5140 | 0.0089 | 0.0093 | 0.0079 | 0.0098 | 0.0091 | 0.0079 |
| 1 | 10 | 50 | 0.6890 | 0.3300 | 0.3670 | 0.3300 | 0.2980 | 0.3260 | 0.0679 | 0.0488 | 0.0633 | 0.0545 | 0.0523 | 0.0537 |
| | | 100 | 0.7300 | 0.3910 | 0.4810 | 0.1450 | 0.1620 | 0.1590 | 0.0320 | 0.0274 | 0.0335 | 0.0317 | 0.0355 | 0.0343 |
| | | 200 | 0.9130 | 0.3830 | 0.4610 | 0.1410 | 0.1390 | 0.1450 | 0.0235 | 0.0212 | 0.0198 | 0.0222 | 0.0245 | 0.0217 |
| | 20 | 50 | 0.7390 | 0.1680 | 0.0579 | 0.2200 | 0.2680 | 0.2280 | 0.0249 | 0.0199 | 0.0253 | 0.0306 | 0.0257 | 0.0279 |
| | | 100 | 0.7370 | 0.1100 | 0.0334 | 0.4510 | 0.4330 | 0.4470 | 0.0179 | 0.0229 | 0.0187 | 0.0234 | 0.0228 | 0.0203 |
| | | 200 | 0.7220 | 0.0643 | 0.0203 | 0.3500 | 0.3560 | 0.3120 | 0.0105 | 0.0108 | 0.0128 | 0.0154 | 0.0168 | 0.0146 |
| 1.5 | 10 | 50 | 0.8330 | 0.3070 | 0.2560 | 0.2610 | 0.2330 | 0.2680 | 0.0640 | 0.0623 | 0.0628 | 0.0525 | 0.0603 | 0.0519 |
| | | 100 | 0.8950 | 0.1660 | 0.1390 | 0.1780 | 0.1740 | 0.1820 | 0.0439 | 0.0425 | 0.0364 | 0.0467 | 0.0414 | 0.0480 |
| | | 200 | 0.9250 | 0.1370 | 0.1110 | 0.1260 | 0.1180 | 0.1200 | 0.0255 | 0.0264 | 0.0210 | 0.0246 | 0.0231 | 0.0191 |
| | 20 | 50 | 0.8950 | 0.2290 | 0.3070 | 0.4310 | 0.4020 | 0.4250 | 0.0646 | 0.0658 | 0.0621 | 0.0647 | 0.0756 | 0.0591 |
| | | 100 | 0.9360 | 0.3640 | 0.3850 | 0.2810 | 0.2640 | 0.2750 | 0.0630 | 0.0545 | 0.0471 | 0.0696 | 0.0681 | 0.0625 |
| | | 200 | 0.8580 | 0.2250 | 0.3200 | 0.4400 | 0.3400 | 0.3830 | 0.0448 | 0.0362 | 0.0417 | 0.0511 | 0.0458 | 0.0492 |
Table 17. Comparison of total cost and carbon emissions under different scheduling strategies across test cases.

| Case | Item | RA_SA | MU_SPT | CT_SPT | EA_FIFO |
|---|---|---|---|---|---|
| 0.5 × 10 × 50 | Total cost | 102.0 | 172.0 | 186.2 | 294.3 |
| | Carbon emissions | 405.5 | 442.1 | 497.5 | 602.1 |
| 1 × 10 × 100 | Total cost | 208.8 | 355.0 | 336.1 | 700.1 |
| | Carbon emissions | 337.3 | 393.6 | 443.4 | 1429.1 |
| 1.5 × 10 × 200 | Total cost | 541.5 | 878.1 | 743.0 | 1246.2 |
| | Carbon emissions | 921.4 | 1078.1 | 1304.4 | 2742.3 |
| 0.5 × 20 × 50 | Total cost | 132.5 | 176.4 | 191.4 | 277.1 |
| | Carbon emissions | 382.3 | 435.0 | 488.1 | 710.6 |
| 1 × 20 × 100 | Total cost | 298.4 | 487.0 | 451.7 | 753.4 |
| | Carbon emissions | 399.4 | 463.1 | 522.0 | 1924.0 |
| 1.5 × 20 × 200 | Total cost | 505.3 | 988.4 | 930.8 | 1987.3 |
| | Carbon emissions | 1175.6 | 1325.4 | 1513.9 | 5481.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
