Search Results (118)

Search Parameters:
Keywords = DDQN

17 pages, 3062 KiB  
Article
Spatiotemporal Risk-Aware Patrol Planning Using Value-Based Policy Optimization and Sensor-Integrated Graph Navigation in Urban Environments
by Swarnamouli Majumdar, Anjali Awasthi and Lorant Andras Szolga
Appl. Sci. 2025, 15(15), 8565; https://doi.org/10.3390/app15158565 (registering DOI) - 1 Aug 2025
Abstract
This study proposes an intelligent patrol planning framework that leverages reinforcement learning, spatiotemporal crime forecasting, and simulated sensor telemetry to optimize autonomous vehicle (AV) navigation in urban environments. Crime incidents from Washington DC (2024–2025) and Seattle (2008–2024) are modeled as a dynamic spatiotemporal graph, capturing the evolving intensity and distribution of criminal activity across neighborhoods and time windows. The agent’s state space incorporates synthetic AV sensor inputs—including fuel level, visual anomaly detection, and threat signals—to reflect real-world operational constraints. We evaluate and compare three learning strategies: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), and Proximal Policy Optimization (PPO). Experimental results show that DDQN outperforms DQN in convergence speed and reward accumulation, while PPO demonstrates greater adaptability in sensor-rich, high-noise conditions. Real-map simulations and hourly risk heatmaps validate the effectiveness of our approach, highlighting its potential to inform scalable, data-driven patrol strategies in next-generation smart cities. Full article
(This article belongs to the Special Issue AI-Aided Intelligent Vehicle Positioning in Urban Areas)
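The DQN-vs-DDQN comparison reported above hinges on one change to the bootstrap target: the online network selects the next action and the target network evaluates it. A minimal PyTorch sketch of the two targets (not the authors' code; tensor shapes and network names are placeholders):

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next action,
    # which drives the Q-value over-estimation that DDQN is meant to reduce.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network picks argmax_a Q(s', a); the target network
    # evaluates that action, decoupling selection from evaluation.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```

Both functions return a batch of TD targets (`done` as a 0/1 float tensor); the loss is then the usual Huber or MSE between Q(s, a) and the target.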

27 pages, 3211 KiB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Viewed by 400
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode⁻¹) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min⁻¹ (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators. Full article
(This article belongs to the Section Systems Engineering)
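The abstract describes a state vector that concatenates the CNN's fatigue–skill classification with task-queue and robot-status features, and a scalarised reward trading off throughput, workload and safety. A rough sketch of that coupling, assuming a one-hot operator encoding and illustrative weights that are not taken from the paper:

```python
import numpy as np

FATIGUE_LEVELS, SKILL_LEVELS = 3, 3   # nine CNN-classified operator states, per the abstract

def build_state(fatigue_cls, skill_cls, queue_feats, robot_feats):
    """Concatenate a one-hot operator state with task-queue and robot-status features."""
    op = np.zeros(FATIGUE_LEVELS * SKILL_LEVELS)
    op[fatigue_cls * SKILL_LEVELS + skill_cls] = 1.0
    return np.concatenate([op, queue_feats, robot_feats])

def reward(throughput, operator_load, safety_violation,
           w_thr=1.0, w_load=0.5, w_safe=5.0):   # weights are illustrative assumptions
    """Scalarised multi-objective reward: favour throughput, penalise workload and unsafe events."""
    return w_thr * throughput - w_load * operator_load - w_safe * float(safety_violation)

state = build_state(2, 1, np.array([3.0, 0.4]), np.array([1.0, 0.0]))
print(state.shape, reward(throughput=1.0, operator_load=0.3, safety_violation=False))
```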

22 pages, 2867 KiB  
Article
Hierarchical Deep Reinforcement Learning-Based Path Planning with Underlying High-Order Control Lyapunov Function—Control Barrier Function—Quadratic Programming Collision Avoidance Path Tracking Control of Lane-Changing Maneuvers for Autonomous Vehicles
by Haochong Chen and Bilin Aksun-Guvenc
Electronics 2025, 14(14), 2776; https://doi.org/10.3390/electronics14142776 - 10 Jul 2025
Viewed by 350
Abstract
Path planning and collision avoidance are essential components of an autonomous driving system (ADS), ensuring safe navigation in complex environments shared with other road users. High-quality planning and reliable obstacle avoidance strategies are essential for advancing the SAE autonomy level of autonomous vehicles, which can largely reduce the risk of traffic accidents. In daily driving scenarios, lane changing is a common maneuver used to avoid unexpected obstacles such as parked vehicles or suddenly appearing pedestrians. Notably, lane-changing behavior is also widely regarded as a key evaluation criterion in driver license examinations, highlighting its practical importance in real-world driving. Motivated by this observation, this paper aims to develop an autonomous lane-changing system capable of dynamically avoiding obstacles in multi-lane traffic environments. To achieve this objective, we propose a hierarchical decision-making and control framework in which a Double Deep Q-Network (DDQN) agent operates as the high-level planner to select lane-level maneuvers, while a High-Order Control Lyapunov Function–High-Order Control Barrier Function–based Quadratic Program (HOCLF-HOCBF-QP) serves as the low-level controller to ensure safe and stable trajectory tracking under dynamic constraints. Simulation studies are used to evaluate the planning efficiency and overall collision avoidance performance of the proposed hierarchical control framework. The results demonstrate that the system is capable of autonomously executing appropriate lane-changing maneuvers to avoid multiple obstacles in complex multi-lane traffic environments. In computational cost tests, the low-level controller operates at 100 Hz with an average solve time of 0.66 ms per step, and the high-level policy operates at 5 Hz with an average solve time of 0.60 ms per step. The results demonstrate real-time capability in autonomous driving systems. Full article
(This article belongs to the Special Issue Intelligent Technologies for Vehicular Networks, 2nd Edition)
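The hierarchy runs at two rates: the DDQN planner re-selects a lane-level maneuver at 5 Hz while the HOCLF-HOCBF-QP tracker runs at 100 Hz. The sketch below keeps only that timing structure; the greedy lane cost and the proportional tracker are placeholder stand-ins for the paper's learned policy and QP controller:

```python
import numpy as np

STEPS_PER_PLAN = 20                        # 100 Hz low-level control / 5 Hz high-level planning
LANE_CENTERS = np.array([-3.5, 0.0, 3.5])  # metres; an illustrative three-lane road

def high_level_plan(y, obstacles):
    """Placeholder for the DDQN: pick the lane centre farthest from the nearest obstacle."""
    costs = [min(abs(c - o) for o in obstacles) for c in LANE_CENTERS]
    return int(np.argmax(costs))

def low_level_control(y, target_y, k_p=1.5):
    """Placeholder for the HOCLF-HOCBF-QP tracker: proportional lateral correction."""
    return k_p * (target_y - y)

y, dt = 0.0, 0.01
obstacles = [0.2]                          # an obstacle near the ego lane centre
for k in range(500):                       # 5 s of simulated driving
    if k % STEPS_PER_PLAN == 0:            # re-plan every 20 control steps (5 Hz)
        lane = high_level_plan(y, obstacles)
    y += low_level_control(y, LANE_CENTERS[lane]) * dt
print(f"final lateral position: {y:.2f} m (target lane centre {LANE_CENTERS[lane]} m)")
```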

28 pages, 1293 KiB  
Article
A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
by Shuang Gao, Yuntao Zou and Li Feng
Electronics 2025, 14(13), 2569; https://doi.org/10.3390/electronics14132569 - 25 Jun 2025
Viewed by 352
Abstract
Industrial Internet of Things (IIoT) deployments in thermal power plants face significant energy efficiency challenges due to harsh operating conditions and device resource constraints. This paper presents gradient memory double-deep Q-network (GM-DDQN), a lightweight reinforcement learning approach for energy optimization on resource-constrained IIoT devices. At its core, GM-DDQN introduces the gradient memory mechanism, a novel memory-efficient alternative to experience replay. This core innovation, combined with a simplified neural network architecture and efficient parameter quantization, collectively reduces memory requirements by 99% and computation time by 85–90% compared to standard methods. Experimental evaluations across three realistic simulated thermal power plant scenarios demonstrate that GM-DDQN improves energy efficiency by 42% compared to fixed policies and 27% compared to threshold-based approaches, extending battery lifetime from 8–9 months to 14–15 months while maintaining 96–97% PSR. The method enables sophisticated reinforcement learning directly on IIoT edge devices without requiring cloud connectivity, reducing maintenance costs and improving monitoring reliability in industrial environments. Full article
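Part of the reported memory saving is attributed to "efficient parameter quantization"; the paper's exact scheme is not given here, so the sketch below shows a generic symmetric 8-bit quantization of one small layer, purely to illustrate the storage reduction such a step buys on an IIoT device:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store int8 weights plus one float scale."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(32, 16)).astype(np.float32)  # a small layer of a "simplified" network
q, s = quantize_int8(w)
print("memory:", w.nbytes, "B float32 ->", q.nbytes, "B int8")
print("max abs reconstruction error:", float(np.max(np.abs(w - dequantize(q, s)))))
```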

27 pages, 9972 KiB  
Article
Multi-Scenario Robust Distributed Permutation Flow Shop Scheduling Based on DDQN
by Shilong Guo and Ming Chen
Appl. Sci. 2025, 15(12), 6560; https://doi.org/10.3390/app15126560 - 11 Jun 2025
Viewed by 361
Abstract
In order to address the Distributed Permutation Flow Shop Scheduling Problem (DPFSP) with uncertain processing times in real production environments, Plant Simulation is employed to construct a simulation model for the MSRDPFSP. The model conducts quantitative analyses of workshop layout, assembly line design, worker status, operating status of robotic arms and AGV vehicles, and production system failure rates. A hybrid NEH-DDQN algorithm is integrated into the simulation model via a COM interface and DLL, where the NEH algorithm ensures the model maintains optimal performance during the early training phase. Four scheduling strategies are designed for workpiece allocation across different workshops. A deep neural network replaces the traditional Q-table for greedy selection among these four scheduling strategies, using each workshop’s completion time as a simplified state variable. This approach reduces algorithm training complexity by abstracting away intricate workpiece allocation details. Experimental comparisons show that, on an instance with 500 workpieces, the NEH algorithm produces in 3 s solutions of quality equivalent to those the GA algorithm produces in 300 s. After 2000 iterations, the DDQN algorithm achieves a 15% reduction in makespan with only a 2.5% increase in computational time compared to random search. This joint simulation system offers an efficient and stable solution for the modeling and optimization of the MSRDPFSP. Full article
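Per the abstract, the DDQN does not place individual workpieces; it greedily chooses one of four allocation strategies, with each workshop's completion time as the state. A small sketch of that interface (workshop count, layer sizes and the epsilon value are illustrative assumptions, not the paper's settings):

```python
import numpy as np
import torch
import torch.nn as nn

N_WORKSHOPS, N_STRATEGIES = 3, 4   # the paper fixes four dispatch strategies; workshop count assumed

class StrategyQNet(nn.Module):
    """State = per-workshop completion times; output = Q-value of each dispatch strategy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_WORKSHOPS, 64), nn.ReLU(),
                                 nn.Linear(64, N_STRATEGIES))
    def forward(self, completion_times):
        return self.net(completion_times)

def select_strategy(q_net, completion_times, epsilon=0.1):
    """Epsilon-greedy choice among the four workpiece-allocation rules."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_STRATEGIES)
    with torch.no_grad():
        q = q_net(torch.tensor(completion_times, dtype=torch.float32))
    return int(q.argmax())

q_net = StrategyQNet()
print(select_strategy(q_net, [120.0, 95.0, 140.0]))   # completion times in, strategy index out
```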

20 pages, 772 KiB  
Article
A DDQN-Guided Dual-Population Evolutionary Multitasking Framework for Constrained Multi-Objective Ship Berthing
by Jinyou Mou and Qidan Zhu
J. Mar. Sci. Eng. 2025, 13(6), 1068; https://doi.org/10.3390/jmse13061068 - 28 May 2025
Viewed by 353
Abstract
Autonomous ship berthing requires advanced path planning to balance multiple objectives, such as minimizing berthing time, reducing energy consumption, and ensuring safety under dynamic environmental constraints. However, traditional planning and learning methods often suffer from inefficient search or sparse rewards in such constrained and high-dimensional settings. This study introduces a double deep Q-network (DDQN)-guided dual-population constrained multi-objective evolutionary algorithm (CMOEA) framework for autonomous ship berthing. By integrating deep reinforcement learning (DRL) with CMOEA, the framework employs DDQN to dynamically guide operator selection, enhancing search efficiency and solution diversity. The designed reward function optimizes thrust, time, and heading accuracy while accounting for vessel kinematics, water currents, and obstacles. Simulations on the CSAD vessel model demonstrate that this framework outperforms baseline algorithms such as evolutionary multitasking constrained multi-objective optimization (EMCMO), DQN, Q-learning, and non-dominated sorting genetic algorithm II (NSGA-II), achieving superior efficiency and stability while maintaining the required berthing angle. The framework also exhibits strong adaptability across varying environmental conditions, making it a promising solution for autonomous ship berthing in port environments. Full article
(This article belongs to the Section Ocean Engineering)
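The framework's core loop lets a learned agent pick which variation operator the CMOEA applies next, rewarding it with the resulting improvement. In the sketch below a simple average-reward estimate stands in for the paper's DDQN, and a 1-D toy objective stands in for the berthing objectives; only the select-apply-reward pattern is the point:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: (x - 3.0) ** 2          # toy objective standing in for the berthing objectives

OPERATORS = [
    lambda pop: pop + rng.normal(0, 1.0, pop.size),    # large mutation
    lambda pop: pop + rng.normal(0, 0.05, pop.size),   # small mutation
    lambda pop: (pop + pop[np.argmin(f(pop))]) / 2,    # pull toward the current best
]
q = np.zeros(len(OPERATORS))          # stand-in for the DDQN's operator values
counts = np.zeros(len(OPERATORS))

pop = rng.uniform(-10, 10, 20)
best = f(pop).min()
for gen in range(200):
    op = rng.integers(len(OPERATORS)) if rng.random() < 0.2 else int(q.argmax())
    candidate = OPERATORS[op](pop.copy())
    pop = np.where(f(candidate) < f(pop), candidate, pop)  # keep improvements (elitist)
    reward = best - f(pop).min()                           # indicator improvement drives learning
    counts[op] += 1
    q[op] += (reward - q[op]) / counts[op]                 # incremental mean instead of a full DDQN
    best = f(pop).min()
print("best objective:", round(best, 6), "learned operator values:", np.round(q, 4))
```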

23 pages, 4463 KiB  
Article
Dual-Priority Delayed Deep Double Q-Network (DPD3QN): A Dueling Double Deep Q-Network with Dual-Priority Experience Replay for Autonomous Driving Behavior Decision-Making
by Shuai Li, Peicheng Shi, Aixi Yang, Heng Qi and Xinlong Dong
Algorithms 2025, 18(5), 291; https://doi.org/10.3390/a18050291 - 19 May 2025
Viewed by 436
Abstract
The behavior decision control of autonomous vehicles is a critical aspect of advancing autonomous driving technology. However, current behavior decision algorithms based on deep reinforcement learning still face several challenges, such as insufficient safety and sparse reward mechanisms. To solve these problems, this paper proposes a dueling double deep Q-network based on dual-priority experience replay—DPD3QN. Initially, the dueling network is integrated with the double deep Q-network, and the original network’s output layer is restructured to enhance the precision of action value estimation. Subsequently, dual-priority experience replay is incorporated to facilitate the model’s ability to swiftly recognize and leverage critical experiences. Ultimately, the training and evaluation are conducted on the OpenAI Gym simulation platform. The test results show that DPD3QN helps to improve the convergence speed of driverless vehicle behavior decision-making. Compared with the currently popular DQN and DDQN algorithms, DPD3QN achieves higher success rates in challenging scenarios: the success rate in test scenario I increases by 11.8 and 25.8 percentage points, respectively, and in test scenario II by 8.8 and 22.2 percentage points, respectively, indicating a more secure and efficient autonomous driving decision-making capability. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
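DPD3QN combines a dueling head, double Q-learning and a dual-priority replay buffer. The sketch shows the two standard ingredients: the dueling decomposition Q(s,a) = V(s) + A(s,a) - mean(A) and proportional prioritisation. Layer sizes and the priority exponent are illustrative, and the paper's specific "dual-priority" weighting is not reproduced here:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, state_dim=12, n_actions=5, hidden=128):   # sizes are illustrative
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # action advantages A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

def priority_probs(td_errors, alpha=0.6):
    """Proportional prioritisation: sample transition i with probability p_i proportional to |delta_i|^alpha."""
    p = (torch.abs(td_errors) + 1e-6) ** alpha
    return p / p.sum()

q = DuelingQNet()
print(q(torch.zeros(1, 12)).shape)                       # torch.Size([1, 5])
print(priority_probs(torch.tensor([0.5, -2.0, 0.1])))    # sampling weights over three transitions
```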

16 pages, 2932 KiB  
Article
Research on Mobile Agent Path Planning Based on Deep Reinforcement Learning
by Shengwei Jin, Xizheng Zhang, Ying Hu, Ruoyuan Liu, Qing Wang, Haihua He, Junyu Liao and Lijing Zeng
Systems 2025, 13(5), 385; https://doi.org/10.3390/systems13050385 - 16 May 2025
Viewed by 383
Abstract
For mobile agent path planning, traditional path planning algorithms frequently induce abrupt variations in path curvature and steering angles, increasing the risk of lateral tire slippage and undermining operational safety. Concurrently, conventional reinforcement learning methods struggle to converge rapidly, leading to an insufficient efficiency in planning to meet the demand for energy economy. This study proposes LSTM Bézier–Double Deep Q-Network (LB-DDQN), an advanced path-planning framework for mobile agents based on deep reinforcement learning. The architecture first enables mapless navigation through a DDQN foundation, subsequently integrates long short-term memory (LSTM) networks for the fusion of environmental features and preservation of training information, and ultimately enhances the path’s quality through redundant node elimination via an obstacle–path relationship analysis, combined with Bézier curve-based trajectory smoothing. A sensor-driven three-dimensional simulation environment featuring static obstacles was constructed using the ROS and Gazebo platforms, where LiDAR-equipped mobile agent models were trained for real-time environmental perception and strategy optimization prior to deployment on experimental vehicles. The simulation and physical implementation results reveal that LB-DDQN achieves effective collision avoidance, while demonstrating marked enhancements in critical metrics: the path’s smoothness, energy efficiency, and motion stability exhibit average improvements exceeding 50%. The framework further maintains superior safety standards and operational efficiency across diverse scenarios. Full article
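After redundant waypoints are pruned, LB-DDQN smooths the remaining nodes with a Bézier curve. A short de Casteljau evaluation over a handful of illustrative waypoints (the coordinates are made up, not from the paper):

```python
import numpy as np

def bezier_curve(control_points, n_samples=50):
    """Evaluate a Bézier curve with de Casteljau's algorithm over the given control points."""
    pts = np.asarray(control_points, dtype=float)
    out = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = pts.copy()
        while len(p) > 1:                        # repeated linear interpolation
            p = (1 - t) * p[:-1] + t * p[1:]
        out.append(p[0])
    return np.array(out)

# Waypoints remaining after redundant-node elimination (values illustrative):
waypoints = [(0, 0), (2, 0), (2, 2), (4, 2), (6, 4)]
smooth = bezier_curve(waypoints)
print(smooth[0], smooth[-1])   # the curve starts and ends at the original endpoints
```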

35 pages, 8275 KiB  
Article
Marine Voyage Optimization and Weather Routing with Deep Reinforcement Learning
by Charilaos Latinopoulos, Efstathios Zavvos, Dimitrios Kaklis, Veerle Leemen and Aristides Halatsis
J. Mar. Sci. Eng. 2025, 13(5), 902; https://doi.org/10.3390/jmse13050902 - 30 Apr 2025
Viewed by 1904
Abstract
Marine voyage optimization determines the optimal route and speed to ensure timely arrival. The problem becomes particularly complex when incorporating a dynamic environment, such as future expected weather conditions along the route and unexpected disruptions. This study explores two model-free Deep Reinforcement Learning (DRL) algorithms: (i) a Double Deep Q Network (DDQN) and (ii) a Deep Deterministic Policy Gradient (DDPG). These algorithms are computationally costly, so we split optimization into an offline phase (costly pre-training for a route) and an online phase where the algorithms are fine-tuned as updated weather data become available. Fine tuning is quick enough for en-route adjustments and for updating the offline planning for different dates where the weather might be very different. The models are compared to classical and heuristic methods: the DDPG achieved a 4% lower fuel consumption than the DDQN and was only outperformed by Tabu Search by 1%. Both DRL models demonstrate high adaptability to dynamic weather updates, achieving up to 12% improvement in fuel consumption compared to the distance-based baseline model. Additionally, they are non-graph-based and self-learning, making them more straightforward to extend and integrate into future digital twin-driven autonomous solutions, compared to traditional approaches. Full article
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations—3rd Edition)

33 pages, 3244 KiB  
Article
Long Short-Term Memory–Model Predictive Control Speed Prediction-Based Double Deep Q-Network Energy Management for Hybrid Electric Vehicle to Enhanced Fuel Economy
by Haichao Liu, Hongliang Wang, Miao Yu, Yaolin Wang and Yang Luo
Sensors 2025, 25(9), 2784; https://doi.org/10.3390/s25092784 - 28 Apr 2025
Viewed by 815
Abstract
Further improving the fuel economy and emission performance of hybrid vehicles through well-designed energy management strategies has become an urgent issue. This paper proposes an energy management model based on speed prediction using Long Short-Term Memory (LSTM) neural networks. The initial learning rate and dropout probability of the LSTM speed prediction model are optimized using a Double Deep Q-Network (DDQN) algorithm. Furthermore, the LSTM speed prediction function is implemented within a Model Predictive Control (MPC) framework. A fuzzy logic-based driving mode recognition system divides the typical driving cycle into different driving modes and identifies the real-time driving mode. The LSTM-MPC method achieves low RMSE across different prediction horizons. Using predicted power demand, battery SOC, and real-time power demand as inputs, the model implements MPC for real-time control. In our experiments, four prediction horizons (5 s, 10 s, 15 s, and 20 s) were set. The energy management strategy demonstrated optimal performance and the lowest fuel consumption at a 5 s horizon, with fuel usage at only 6.3220 L, saving 2.034 L compared to the rule-based strategy. Validation under the UDDS driving cycle revealed that the LSTM-MPC-DDQN strategy reduced fuel consumption by 0.2729 L compared to the rule-based approach and showed only a 0.0749 L difference from the DP strategy. Full article
(This article belongs to the Section Vehicular Sensing)
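The speed predictor is an LSTM whose initial learning rate and dropout probability are the two knobs the DDQN tunes, and whose multi-step forecast feeds the MPC over a 5–20 s horizon. A sketch of such a predictor with those hyper-parameters exposed as arguments (layer sizes and the random data are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class SpeedPredictor(nn.Module):
    """Maps a history of vehicle speeds to an H-step-ahead speed forecast for the MPC.
    `dropout` and the optimizer's learning rate are the quantities the paper tunes with a DDQN."""
    def __init__(self, horizon=5, hidden=64, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, speed_history):           # (batch, T, 1)
        _, (h, _) = self.lstm(speed_history)
        return self.head(self.drop(h[-1]))      # (batch, horizon) predicted speeds

model = SpeedPredictor(horizon=5, dropout=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr would come from the DDQN search
history = torch.randn(8, 30, 1)                # 8 sequences of 30 past speed samples (illustrative)
print(model(history).shape)                    # torch.Size([8, 5])
```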

28 pages, 6260 KiB  
Article
Development of Chiller Plant Models in OpenAI Gym Environment for Evaluating Reinforcement Learning Algorithms
by Xiangrui Wang, Qilin Zhang, Zhihua Chen, Jingjing Yang and Yixing Chen
Energies 2025, 18(9), 2225; https://doi.org/10.3390/en18092225 - 27 Apr 2025
Viewed by 857
Abstract
To face the global energy crisis, the requirement of energy transition and sustainable development has emphasized the importance of controlling building energy management systems. Reinforcement learning (RL) has shown notable energy-saving potential in the optimal control of heating, ventilation, and air-conditioning (HVAC) systems. However, the coupling of the algorithms and environments limits the cross-scenario application. This paper develops chiller plant models in OpenAI Gym environments to evaluate different RL algorithms for optimizing condenser water loop control. A shopping mall in Changsha, China, was selected as the case study building. First, an energy simulation model in EnergyPlus was generated using AutoBPS. Then, the OpenAI Gym chiller plant system model was developed and validated by comparing it with the EnergyPlus simulation results. Moreover, two RL algorithms, Deep-Q-Network (DQN) and Double Deep-Q-Network (DDQN), were deployed to control the condenser water flow rate and approach temperature of cooling towers in the RL environment. Finally, the optimization performance of DQN across three climate zones was evaluated using the AutoBPS-Gym toolkit. The findings indicated that during the cooling season in a shopping mall in Changsha, the DQN control method resulted in energy savings of 14.16% for the cooling water system, whereas the DDQN method achieved savings of 14.01%. Using the average control values from DQN, the EnergyPlus simulation recorded an energy-saving rate of 10.42% compared to the baseline. Furthermore, implementing the DQN algorithm across three different climatic zones led to an average energy savings of 4.0%, highlighting the toolkit’s ability to effectively utilize RL for optimal control in various environmental contexts. Full article
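The contribution here is packaging the chiller plant as an OpenAI Gym environment so different RL algorithms can be swapped in. Below is a toy environment in the classic (pre-0.26) Gym interface, with the two control actions named in the abstract: condenser water flow rate and cooling tower approach temperature. The one-line plant model is a crude placeholder, not the AutoBPS/EnergyPlus-calibrated one:

```python
import numpy as np
import gym
from gym import spaces

class ChillerPlantEnv(gym.Env):
    """Toy condenser-water-loop environment; reward is negative total electric power."""
    def __init__(self):
        # Actions: [condenser water flow rate (kg/s), cooling tower approach temperature (K)].
        self.action_space = spaces.Box(low=np.array([20.0, 2.0]),
                                       high=np.array([80.0, 8.0]), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=50.0, shape=(2,), dtype=np.float32)
        self.state = None

    def reset(self):
        self.state = np.array([30.0, 25.0], dtype=np.float32)   # [outdoor wet-bulb, load proxy]
        return self.state

    def step(self, action):
        flow, approach = action
        pump_kw = 0.002 * flow ** 2                       # pumping power grows with flow
        chiller_kw = 50.0 + 3.0 * approach - 0.1 * flow   # warmer condenser water hurts chiller COP
        reward = -(pump_kw + chiller_kw)
        self.state = self.state + np.random.normal(0, 0.1, size=2).astype(np.float32)
        return self.state, float(reward), False, {}

env = ChillerPlantEnv()
obs = env.reset()
obs, r, done, info = env.step(env.action_space.sample())
print(obs, r)
```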

21 pages, 5686 KiB  
Article
Path Planning for Agricultural UAVs Based on Deep Reinforcement Learning and Energy Consumption Constraints
by Haitao Fu, Zheng Li, Weijian Zhang, Yuxuan Feng, Li Zhu, Yunze Long and Jian Li
Agriculture 2025, 15(9), 943; https://doi.org/10.3390/agriculture15090943 - 26 Apr 2025
Viewed by 651
Abstract
Traditional pesticide application methods pose systemic threats to sustainable agriculture due to inefficient spraying practices and ecological contamination. Although agricultural drones demonstrate potential to address these challenges, they face critical limitations in energy-constrained complete coverage path planning for field operations. This study proposes a novel BiLG-D3QN algorithm by integrating deep reinforcement learning with Bi-LSTM and Bi-GRU architectures, specifically designed to optimize segmented coverage path planning under payload-dependent energy consumption constraints. The methodology encompasses four components: payload-energy consumption modeling, soybean cultivation area identification using Google Earth Engine-derived spatial distribution data, raster map construction, and enhanced segmented coverage path planning implementation. Through simulation experiments, the BiLG-D3QN algorithm demonstrated superior coverage efficiency, outperforming DDQN by 13.45%, D3QN by 12.27%, Dueling DQN by 14.62%, A-Star by 15.59%, and PPO by 22.15%. Additionally, the algorithm achieved an average redundancy rate of only 2.45%, which is significantly lower than that of DDQN (18.89%), D3QN (17.59%), Dueling DQN (17.59%), A-Star (21.54%), and PPO (25.12%). These results highlight the notable advantages of the BiLG-D3QN algorithm in addressing the challenges of pesticide spraying tasks in agricultural UAV applications. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
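The energy constraint that forces segmented coverage comes from the payload: each metre flown costs more energy while the pesticide tank is full. A toy payload–energy model in that spirit, where a segment ends once either the tank or the battery runs out (all constants are assumptions, not the paper's calibration):

```python
EMPTY_MASS_KG = 10.0
ENERGY_PER_M_PER_KG = 0.8      # J per metre per kg of total take-off mass (illustrative)
SPRAY_RATE_KG_PER_M = 0.02     # pesticide released per metre flown while spraying

def move_cost(distance_m, payload_kg):
    """Energy for one grid move at the current payload."""
    return ENERGY_PER_M_PER_KG * (EMPTY_MASS_KG + payload_kg) * distance_m

def fly_segment(cells, cell_size_m, payload_kg, battery_j):
    """Spray along a list of grid cells until either the tank or the battery runs out."""
    for i in range(len(cells) - 1):
        cost = move_cost(cell_size_m, payload_kg)
        if cost > battery_j or payload_kg <= 0.0:
            return i, payload_kg, battery_j          # stop here -> plan a return-to-base leg
        battery_j -= cost
        payload_kg = max(0.0, payload_kg - SPRAY_RATE_KG_PER_M * cell_size_m)
    return len(cells) - 1, payload_kg, battery_j

covered, payload, battery = fly_segment(list(range(200)), 5.0, payload_kg=8.0, battery_j=20000.0)
print(covered, round(payload, 2), round(battery, 1))   # cells covered before the segment must end
```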

25 pages, 1392 KiB  
Article
Dynamic Scheduling for Multi-Objective Flexible Job Shops with Machine Breakdown by Deep Reinforcement Learning
by Rui Wu, Jianxin Zheng and Xiyan Yin
Processes 2025, 13(4), 1246; https://doi.org/10.3390/pr13041246 - 20 Apr 2025
Viewed by 856
Abstract
Dynamic scheduling for flexible job shops under machine breakdown is a complex and challenging problem with valuable applications in real-life production. However, prior studies have struggled to perform well in changeable scenarios. To address this challenge, this paper introduces a dual-objective deep reinforcement learning (DRL) algorithm. The algorithm is based on the Double Deep Q-network (DDQN) and incorporates an attention mechanism. It decouples action relationships in the action space to reduce problem dimensionality and introduces an adaptive weighting method in agent decision-making to obtain high-quality Pareto front solutions. The algorithm is evaluated on a set of benchmark instances and compared with state-of-the-art algorithms. The experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms regarding machine offset and total tardiness, demonstrating greater stability and higher-quality solutions. Practical use is also verified on cases from real enterprises, where the results remain better than those of a multi-objective meta-heuristic algorithm. Full article
(This article belongs to the Special Issue Transfer Learning Methods in Equipment Reliability Management)
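The two objectives reported are machine offset and total tardiness, and the output is a Pareto front rather than a single schedule. A brute-force non-domination filter of the kind such a front implies (the sample points are illustrative):

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset for a two-objective minimisation problem
    (e.g. machine offset and total tardiness); an O(n^2) filter for clarity."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(pts) if j != i)
        if not dominated:
            keep.append(i)
    return pts[keep]

solutions = [(3.0, 10.0), (2.0, 12.0), (4.0, 9.0), (3.5, 11.0)]   # (offset, tardiness) pairs
print(pareto_front(solutions))   # the dominated (3.5, 11.0) point is dropped
```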

42 pages, 2232 KiB  
Article
Federated Reinforcement Learning-Based Dynamic Resource Allocation and Task Scheduling in Edge for IoT Applications
by Saroj Mali, Feng Zeng, Deepak Adhikari, Inam Ullah, Mahmoud Ahmad Al-Khasawneh, Osama Alfarraj and Fahad Alblehai
Sensors 2025, 25(7), 2197; https://doi.org/10.3390/s25072197 - 30 Mar 2025
Cited by 1 | Viewed by 1917
Abstract
Using Google cluster traces, the research presents a task offloading algorithm and a hybrid forecasting model that unites Bidirectional Long Short-Term Memory (BiLSTM) with Gated Recurrent Unit (GRU) layers along with an attention mechanism. This model predicts resource usage for flexible task scheduling in Internet of Things (IoT) applications based on edge computing. The suggested algorithm improves task distribution to boost performance and reduce energy consumption. The system’s design includes collecting data, fusing and preparing it for use, training models, and performing simulations with EdgeSimPy. Experimental outcomes show that the proposed method outperforms the basic best-fit, first-fit, and worst-fit algorithms. It maintains stable power usage across edge servers while surpassing traditional heuristic techniques. Moreover, we also propose a Deep Deterministic Policy Gradient (D4PG)-based Federated Learning algorithm for adjusting the participation of dynamic user equipment (UE) according to resource availability and data distribution. This algorithm is compared to DQN, DDQN, Dueling DQN, and Dueling DDQN models using the Non-IID EMNIST, IID EMNIST, and Crop Prediction datasets. Results indicate that the proposed D4PG method achieves superior performance, with an accuracy of 92.86% on the Crop Prediction dataset, outperforming alternative models. On the Non-IID EMNIST dataset, the proposed approach achieves an F1-score of 0.9192, demonstrating better efficiency and fairness in model updates while preserving privacy. Similarly, on the IID EMNIST dataset, the proposed D4PG model attains an F1-score of 0.82 and an accuracy of 82%, surpassing other Reinforcement Learning-based approaches. Additionally, for edge server power consumption, the hybrid offloading algorithm reduces fluctuations compared to existing methods, ensuring more stable energy usage across edge nodes. This corroborates that the proposed method preserves privacy while handling fairness in model updates and improving efficiency beyond state-of-the-art alternatives. Full article
(This article belongs to the Special Issue Securing E-Health Data Across IoMT and Wearable Sensor Networks)
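The federated side adjusts which user equipment participates each round based on resource availability and weights the aggregation by local data size. A FedAvg-style sketch of that round structure, with a fixed resource threshold standing in for the paper's D4PG-driven participation policy and a single gradient step standing in for local training:

```python
import numpy as np

rng = np.random.default_rng(0)

def federated_round(global_w, clients, min_resource=0.5):
    """One aggregation round with dynamic participation: only UEs whose resource score
    clears the threshold train this round, and their updates are weighted by data size."""
    selected = [c for c in clients if c["resource"] >= min_resource]
    if not selected:
        return global_w
    total = sum(c["n_samples"] for c in selected)
    new_w = np.zeros_like(global_w)
    for c in selected:
        local_w = global_w - 0.1 * c["grad"]              # one local SGD step (placeholder training)
        new_w += (c["n_samples"] / total) * local_w       # data-size-weighted averaging
    return new_w

global_w = np.zeros(4)
clients = [{"resource": rng.random(), "n_samples": int(rng.integers(50, 500)),
            "grad": rng.normal(size=4)} for _ in range(10)]
for _ in range(5):
    global_w = federated_round(global_w, clients)
print(np.round(global_w, 3))
```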

27 pages, 6487 KiB  
Article
Flexible Job Shop Dynamic Scheduling and Fault Maintenance Personnel Cooperative Scheduling Optimization Based on the ACODDQN Algorithm
by Jiansha Lu, Jiarui Zhang, Jun Cao, Xuesong Xu, Yiping Shao and Zhenbo Cheng
Mathematics 2025, 13(6), 932; https://doi.org/10.3390/math13060932 - 11 Mar 2025
Viewed by 864
Abstract
In order to address the impact of equipment fault diagnosis and repair delays on production schedule execution in the dynamic scheduling of flexible job shops, this paper proposes a multi-resource, multi-objective dynamic scheduling optimization model, which aims to minimize delay time and completion time. It integrates the scheduling of workpieces, machines, and maintenance personnel to improve the response efficiency of emergency equipment maintenance. To this end, a self-learning Ant Colony Algorithm based on deep reinforcement learning (ACODDQN) is designed in this paper. The algorithm searches the solution space with the ACO, ranks solutions using non-dominated sorting, and adapts scheduling decisions by integrating the pheromone update mechanism with the DDQN framework. The generated solutions are then locally adjusted via a feasible-solution optimization strategy to ensure that all constraints are satisfied, ultimately yielding a high-quality Pareto optimal solution set. Simulation results based on standard examples and real cases show that the ACODDQN algorithm delivers significant improvements in several tests, verifying its superiority and practical application potential in dynamic scheduling problems. Full article
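ACODDQN keeps the two classical ACO ingredients, a pheromone-weighted transition rule and evaporation-plus-deposit updates on retained solutions, and lets the DDQN steer how they are applied. The sketch below shows only the classical part on a toy 5-job sequence; the DDQN coupling and the non-dominated selection are specific to the paper and omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_next(pheromone_row, heuristic_row, visited, alpha=1.0, beta=2.0):
    """ACO transition rule: pick the next job with probability proportional to tau^alpha * eta^beta
    over unvisited jobs (in the paper, this choice is further steered by DDQN-learned values)."""
    weights = (pheromone_row ** alpha) * (heuristic_row ** beta)
    weights[list(visited)] = 0.0
    return int(rng.choice(len(weights), p=weights / weights.sum()))

def update_pheromone(pheromone, elite_tours, tour_costs, rho=0.1, q=1.0):
    """Evaporate everywhere, then deposit on edges of the retained (e.g. non-dominated) tours."""
    pheromone *= (1.0 - rho)
    for tour, cost in zip(elite_tours, tour_costs):
        for a, b in zip(tour[:-1], tour[1:]):
            pheromone[a, b] += q / cost
    return pheromone

n = 5
pheromone = np.ones((n, n))
heuristic = 1.0 / rng.uniform(1, 10, (n, n))   # e.g. inverse processing time (illustrative)
tour = [0]
while len(tour) < n:
    tour.append(choose_next(pheromone[tour[-1]], heuristic[tour[-1]], set(tour)))
pheromone = update_pheromone(pheromone, [tour], [42.0])   # cost value is illustrative
print(tour)
```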
