Search Results (94)

Search Parameters:
Keywords = double deep Q-network (DDQN)

44 pages, 6212 KiB  
Article
A Hybrid Deep Reinforcement Learning Architecture for Optimizing Concrete Mix Design Through Precision Strength Prediction
by Ali Mirzaei and Amir Aghsami
Math. Comput. Appl. 2025, 30(4), 83; https://doi.org/10.3390/mca30040083 (registering DOI) - 3 Aug 2025
Viewed by 32
Abstract
Concrete mix design plays a pivotal role in ensuring the mechanical performance, durability, and sustainability of construction projects. However, the nonlinear interactions among the mix components challenge traditional approaches in predicting compressive strength and optimizing proportions. This study presents a two-stage hybrid framework that integrates deep learning with reinforcement learning to overcome these limitations. First, a Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) model was developed to capture spatial–temporal patterns from a dataset of 1030 historical concrete samples. The extracted features were enhanced using an eXtreme Gradient Boosting (XGBoost) meta-model to improve generalizability and noise resistance. Then, a Dueling Double Deep Q-Network (Dueling DDQN) agent was used to iteratively identify optimal mix ratios that maximize the predicted compressive strength. The proposed framework outperformed ten benchmark models, achieving an MAE of 2.97, RMSE of 4.08, and R2 of 0.94. Feature attribution methods—including SHapley Additive exPlanations (SHAP), Elasticity-Based Feature Importance (EFI), and Permutation Feature Importance (PFI)—highlighted the dominant influence of cement content and curing age, as well as revealing non-intuitive effects such as the compensatory role of superplasticizers in low-water mixtures. These findings demonstrate the potential of the proposed approach to support intelligent concrete mix design and real-time optimization in smart construction environments. Full article
(This article belongs to the Section Engineering)
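The Dueling DDQN agent mentioned here uses the standard dueling decomposition of the Q-function into state-value and advantage streams. A minimal PyTorch sketch of that head (layer sizes and the mix-design state/action encoding are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Minimal dueling Q-network: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s,a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                    # shape (batch, 1)
        a = self.advantage(h)                # shape (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)

# Example: 8 assumed mix-design features (cement, water, aggregates, age, ...),
# 5 discrete proportion-adjustment actions
net = DuelingQNet(state_dim=8, n_actions=5)
q_values = net(torch.randn(4, 8))            # (4, 5) batch of Q-values
```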

31 pages, 3480 KiB  
Article
The First Step of AI in LEO SOPs: DRL-Driven Epoch Credibility Evaluation to Enhance Opportunistic Positioning Accuracy
by Jiaqi Yin, Feilong Li, Ruidan Luo, Xiao Chen, Linhui Zhao, Hong Yuan and Guang Yang
Remote Sens. 2025, 17(15), 2692; https://doi.org/10.3390/rs17152692 - 3 Aug 2025
Viewed by 57
Abstract
Low Earth orbit (LEO) signal of opportunity (SOP) positioning relies on the accumulation of epochs obtained through prolonged observation periods. The contribution of a single LEO satellite epoch to positioning accuracy is influenced by multi-level characteristics that are challenging for traditional models to capture. To address this limitation, we propose an Agent-Weighted Recursive Least Squares (RLS) Positioning Framework (AWR-PF). This framework employs an agent to comprehensively analyze individual epoch characteristics, assess their credibility, and convert them into adaptive weights for RLS iterations. We developed a novel Markov Decision Process (MDP) model to assist the agent in addressing the epoch weighting problem and trained the agent utilizing the Double Deep Q-Network (DDQN) algorithm on 107 h of Iridium signal data. Experimental validation on a separate 28 h Iridium signal test set through 97 positioning trials demonstrated that AWR-PF achieves superior average positioning accuracy compared to both standard RLS and randomly weighted RLS throughout nearly the entire iterative process. In a single positioning trial, AWR-PF improves positioning accuracy by up to 45.15% over standard RLS. To the best of our knowledge, this work represents the first instance where an AI algorithm is used as the core decision-maker in LEO SOP positioning, establishing a groundbreaking paradigm for future research. Full article
(This article belongs to the Special Issue LEO-Augmented PNT Service)
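For readers unfamiliar with weighted recursive least squares, the way an agent-assigned credibility score can enter an RLS iteration as a measurement weight is sketched below; the state layout and scalar-measurement form are assumptions for illustration, not the AWR-PF specification:

```python
import numpy as np

def weighted_rls_update(x, P, H, z, weight, eps=1e-9):
    """One weighted RLS step: a higher agent-assigned weight means the epoch's
    measurement z = H @ x + noise is trusted more (smaller effective variance)."""
    R = 1.0 / max(weight, eps)                 # effective measurement variance
    S = (H @ P @ H.T).item() + R               # innovation covariance (scalar case)
    K = P @ H.T / S                            # gain, shape (n, 1)
    x = x + K * (z - (H @ x).item())           # state update
    P = P - K @ H @ P                          # covariance update
    return x, P

# Toy usage: 2-D unknown state, one scalar measurement per epoch
x = np.zeros((2, 1)); P = np.eye(2) * 1e3
H = np.array([[1.0, 0.5]]); z = 2.0
x, P = weighted_rls_update(x, P, H, z, weight=0.8)
```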

17 pages, 3062 KiB  
Article
Spatiotemporal Risk-Aware Patrol Planning Using Value-Based Policy Optimization and Sensor-Integrated Graph Navigation in Urban Environments
by Swarnamouli Majumdar, Anjali Awasthi and Lorant Andras Szolga
Appl. Sci. 2025, 15(15), 8565; https://doi.org/10.3390/app15158565 (registering DOI) - 1 Aug 2025
Viewed by 176
Abstract
This study proposes an intelligent patrol planning framework that leverages reinforcement learning, spatiotemporal crime forecasting, and simulated sensor telemetry to optimize autonomous vehicle (AV) navigation in urban environments. Crime incidents from Washington DC (2024–2025) and Seattle (2008–2024) are modeled as a dynamic spatiotemporal graph, capturing the evolving intensity and distribution of criminal activity across neighborhoods and time windows. The agent’s state space incorporates synthetic AV sensor inputs—including fuel level, visual anomaly detection, and threat signals—to reflect real-world operational constraints. We evaluate and compare three learning strategies: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), and Proximal Policy Optimization (PPO). Experimental results show that DDQN outperforms DQN in convergence speed and reward accumulation, while PPO demonstrates greater adaptability in sensor-rich, high-noise conditions. Real-map simulations and hourly risk heatmaps validate the effectiveness of our approach, highlighting its potential to inform scalable, data-driven patrol strategies in next-generation smart cities. Full article
(This article belongs to the Special Issue AI-Aided Intelligent Vehicle Positioning in Urban Areas)
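Since the comparison above hinges on how DQN and DDQN form their bootstrap targets, a minimal PyTorch sketch of the two target computations (batch shapes assumed for illustration):

```python
import torch

def dqn_target(reward, next_q_target, gamma, done):
    # DQN: the target network both selects and evaluates the next action
    # (prone to overestimating Q-values)
    return reward + gamma * (1 - done) * next_q_target.max(dim=1).values

def ddqn_target(reward, next_q_online, next_q_target, gamma, done):
    # DDQN: the online network selects the action, the target network evaluates it
    best_action = next_q_online.argmax(dim=1, keepdim=True)
    return reward + gamma * (1 - done) * next_q_target.gather(1, best_action).squeeze(1)

# Toy batch of 3 transitions with 4 actions
r = torch.tensor([1.0, 0.0, 0.5]); d = torch.tensor([0.0, 0.0, 1.0])
q_online = torch.randn(3, 4); q_target = torch.randn(3, 4)
print(dqn_target(r, q_target, 0.99, d))
print(ddqn_target(r, q_online, q_target, 0.99, d))
```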

27 pages, 3211 KiB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Viewed by 493
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart-rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode−1) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min−1 (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators. Full article
(This article belongs to the Section Systems Engineering)
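A scalarized multi-objective reward of the kind the abstract describes is commonly a weighted combination of throughput, workload, and safety terms; the weights and term definitions below are illustrative assumptions, not the paper's calibrated reward:

```python
def hrc_reward(tasks_done, fatigue, collision, w=(1.0, 0.5, 10.0)):
    """Illustrative multi-objective reward: encourage throughput, penalize
    operator fatigue, and heavily penalize any safety violation."""
    w_throughput, w_fatigue, w_safety = w
    return (w_throughput * tasks_done
            - w_fatigue * fatigue             # fatigue assumed in [0, 1]
            - w_safety * float(collision))    # collision is a boolean flag

# Example: 3 tasks completed this step, moderate fatigue, no collision
print(hrc_reward(tasks_done=3, fatigue=0.4, collision=False))  # 2.8
```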

22 pages, 2867 KiB  
Article
Hierarchical Deep Reinforcement Learning-Based Path Planning with Underlying High-Order Control Lyapunov Function—Control Barrier Function—Quadratic Programming Collision Avoidance Path Tracking Control of Lane-Changing Maneuvers for Autonomous Vehicles
by Haochong Chen and Bilin Aksun-Guvenc
Electronics 2025, 14(14), 2776; https://doi.org/10.3390/electronics14142776 - 10 Jul 2025
Viewed by 373
Abstract
Path planning and collision avoidance are essential components of an autonomous driving system (ADS), ensuring safe navigation in complex environments shared with other road users. High-quality planning and reliable obstacle avoidance strategies are essential for advancing the SAE autonomy level of autonomous vehicles, which can largely reduce the risk of traffic accidents. In daily driving scenarios, lane changing is a common maneuver used to avoid unexpected obstacles such as parked vehicles or suddenly appearing pedestrians. Notably, lane-changing behavior is also widely regarded as a key evaluation criterion in driver license examinations, highlighting its practical importance in real-world driving. Motivated by this observation, this paper aims to develop an autonomous lane-changing system capable of dynamically avoiding obstacles in multi-lane traffic environments. To achieve this objective, we propose a hierarchical decision-making and control framework in which a Double Deep Q-Network (DDQN) agent operates as the high-level planner to select lane-level maneuvers, while a High-Order Control Lyapunov Function–High-Order Control Barrier Function–based Quadratic Program (HOCLF-HOCBF-QP) serves as the low-level controller to ensure safe and stable trajectory tracking under dynamic constraints. Simulation studies are used to evaluate the planning efficiency and overall collision avoidance performance of the proposed hierarchical control framework. The results demonstrate that the system is capable of autonomously executing appropriate lane-changing maneuvers to avoid multiple obstacles in complex multi-lane traffic environments. In computational cost tests, the low-level controller operates at 100 Hz with an average solve time of 0.66 ms per step, and the high-level policy operates at 5 Hz with an average solve time of 0.60 ms per step. The results demonstrate real-time capability in autonomous driving systems. Full article
(This article belongs to the Special Issue Intelligent Technologies for Vehicular Networks, 2nd Edition)
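The low-level safety layer in such hierarchies is a small quadratic program solved every control step. A toy first-order CBF-QP for a single-integrator robot avoiding a circular obstacle gives the flavor (a simplified stand-in for the paper's HOCLF-HOCBF-QP formulation, with cvxpy assumed as the solver):

```python
import cvxpy as cp
import numpy as np

def cbf_qp_control(x, u_ref, x_obs, radius, alpha=1.0):
    """Single integrator x_dot = u. Barrier h(x) = ||x - x_obs||^2 - r^2 >= 0.
    Enforce h_dot + alpha * h >= 0 while staying close to the reference input."""
    u = cp.Variable(2)
    h = float(np.sum((x - x_obs) ** 2) - radius ** 2)
    grad_h = 2.0 * (x - x_obs)                        # dh/dx
    constraints = [grad_h @ u + alpha * h >= 0]       # CBF condition for x_dot = u
    objective = cp.Minimize(cp.sum_squares(u - u_ref))
    cp.Problem(objective, constraints).solve()
    return u.value

# Reference input heads straight at the obstacle; the QP deflects/limits it
u_safe = cbf_qp_control(x=np.array([0.0, 0.0]), u_ref=np.array([1.0, 0.0]),
                        x_obs=np.array([1.5, 0.0]), radius=1.0)
print(u_safe)
```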

28 pages, 1293 KiB  
Article
A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
by Shuang Gao, Yuntao Zou and Li Feng
Electronics 2025, 14(13), 2569; https://doi.org/10.3390/electronics14132569 - 25 Jun 2025
Viewed by 367
Abstract
Industrial Internet of Things (IIoT) deployments in thermal power plants face significant energy efficiency challenges due to harsh operating conditions and device resource constraints. This paper presents gradient memory double-deep Q-network (GM-DDQN), a lightweight reinforcement learning approach for energy optimization on resource-constrained IIoT devices. At its core, GM-DDQN introduces the gradient memory mechanism, a novel memory-efficient alternative to experience replay. This core innovation, combined with a simplified neural network architecture and efficient parameter quantization, collectively reduces memory requirements by 99% and computation time by 85–90% compared to standard methods. Experimental evaluations across three realistic simulated thermal power plant scenarios demonstrate that GM-DDQN improves energy efficiency by 42% compared to fixed policies and 27% compared to threshold-based approaches, extending battery lifetime from 8–9 months to 14–15 months while maintaining 96–97% PSR. The method enables sophisticated reinforcement learning directly on IIoT edge devices without requiring cloud connectivity, reducing maintenance costs and improving monitoring reliability in industrial environments. Full article
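The parameter quantization mentioned above can be illustrated with generic uniform 8-bit weight quantization (this is not GM-DDQN's specific scheme, nor its gradient memory mechanism):

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric 8-bit quantization: store int8 weights plus one float scale."""
    max_abs = np.max(np.abs(w))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 32).astype(np.float32)   # a small layer's weights
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q.nbytes / w.nbytes)                       # 0.25: 4x smaller storage
print(np.max(np.abs(w - w_hat)))                 # worst-case quantization error
```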

20 pages, 772 KiB  
Article
A DDQN-Guided Dual-Population Evolutionary Multitasking Framework for Constrained Multi-Objective Ship Berthing
by Jinyou Mou and Qidan Zhu
J. Mar. Sci. Eng. 2025, 13(6), 1068; https://doi.org/10.3390/jmse13061068 - 28 May 2025
Viewed by 361
Abstract
Autonomous ship berthing requires advanced path planning to balance multiple objectives, such as minimizing berthing time, reducing energy consumption, and ensuring safety under dynamic environmental constraints. However, traditional planning and learning methods often suffer from inefficient search or sparse rewards in such constrained and high-dimensional settings. This study introduces a double deep Q-network (DDQN)-guided dual-population constrained multi-objective evolutionary algorithm (CMOEA) framework for autonomous ship berthing. By integrating deep reinforcement learning (DRL) with CMOEA, the framework employs DDQN to dynamically guide operator selection, enhancing search efficiency and solution diversity. The designed reward function optimizes thrust, time, and heading accuracy while accounting for vessel kinematics, water currents, and obstacles. Simulations on the CSAD vessel model demonstrate that this framework outperforms baseline algorithms such as evolutionary multitasking constrained multi-objective optimization (EMCMO), DQN, Q-learning, and non-dominated sorting genetic algorithm II (NSGA-II), achieving superior efficiency and stability while maintaining the required berthing angle. The framework also exhibits strong adaptability across varying environmental conditions, making it a promising solution for autonomous ship berthing in port environments. Full article
(This article belongs to the Section Ocean Engineering)
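The "DDQN dynamically guides operator selection" idea amounts to an epsilon-greedy choice among variation operators, scored by a Q-network on the current search state; a sketch under assumed state features and operator names:

```python
import numpy as np

OPERATORS = ["sbx_crossover", "polynomial_mutation", "de_rand_1", "local_search"]

def select_operator(q_network, search_state, epsilon=0.1, rng=None):
    """Epsilon-greedy operator selection: q_network maps a search-state feature
    vector (e.g., hypervolume trend, constraint-violation ratio) to one Q-value
    per variation operator."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(OPERATORS)))        # explore
    return int(np.argmax(q_network(search_state)))      # exploit

# Stub Q-network for illustration: random score per operator
stub_q = lambda state: np.random.randn(len(OPERATORS))
op_idx = select_operator(stub_q, search_state=np.array([0.3, 0.05, 0.7]))
print(OPERATORS[op_idx])
```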

23 pages, 4463 KiB  
Article
Dual-Priority Delayed Deep Double Q-Network (DPD3QN): A Dueling Double Deep Q-Network with Dual-Priority Experience Replay for Autonomous Driving Behavior Decision-Making
by Shuai Li, Peicheng Shi, Aixi Yang, Heng Qi and Xinlong Dong
Algorithms 2025, 18(5), 291; https://doi.org/10.3390/a18050291 - 19 May 2025
Viewed by 446
Abstract
The behavior decision control of autonomous vehicles is a critical aspect of advancing autonomous driving technology. However, current behavior decision algorithms based on deep reinforcement learning still face several challenges, such as insufficient safety and sparse reward mechanisms. To solve these problems, this paper proposes a dueling double deep Q-network based on dual-priority experience replay—DPD3QN. Initially, the dueling network is integrated with the double deep Q-network, and the original network's output layer is restructured to enhance the precision of action value estimation. Subsequently, dual-priority experience replay is incorporated to facilitate the model's ability to swiftly recognize and leverage critical experiences. Ultimately, the training and evaluation are conducted on the OpenAI Gym simulation platform. The test results show that DPD3QN helps to improve the convergence speed of driverless vehicle behavior decision-making. Compared with the currently popular DQN and DDQN algorithms, this algorithm achieves higher success rates in challenging scenarios: the success rate in test scenario I increases by 11.8 and 25.8 percentage points over DQN and DDQN, respectively, while in test scenario II it rises by 8.8 and 22.2 percentage points, respectively, indicating a more secure and efficient autonomous driving decision-making capability. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
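Dual-priority replay builds on standard proportional prioritized experience replay, which samples transitions in proportion to their TD error and corrects the induced bias with importance weights; a minimal numpy sketch of that base mechanism (the dual-priority extension itself is not reproduced):

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Sample indices with probability p_i^alpha / sum_j p_j^alpha, where
    p_i = |TD error_i| + eps, and return importance-sampling weights
    w_i = (N * P(i))^(-beta), normalized by the maximum weight."""
    p = (np.abs(td_errors) + eps) ** alpha
    probs = p / p.sum()
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()

td = np.array([0.1, 2.0, 0.05, 0.7, 1.2])
idx, w = sample_prioritized(td, batch_size=3)
print(idx, w)
```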

16 pages, 2932 KiB  
Article
Research on Mobile Agent Path Planning Based on Deep Reinforcement Learning
by Shengwei Jin, Xizheng Zhang, Ying Hu, Ruoyuan Liu, Qing Wang, Haihua He, Junyu Liao and Lijing Zeng
Systems 2025, 13(5), 385; https://doi.org/10.3390/systems13050385 - 16 May 2025
Viewed by 389
Abstract
For mobile agent path planning, traditional path planning algorithms frequently induce abrupt variations in path curvature and steering angles, increasing the risk of lateral tire slippage and undermining operational safety. Concurrently, conventional reinforcement learning methods struggle to converge rapidly, leading to an insufficient efficiency in planning to meet the demand for energy economy. This study proposes LSTM Bézier–Double Deep Q-Network (LB-DDQN), an advanced path-planning framework for mobile agents based on deep reinforcement learning. The architecture first enables mapless navigation through a DDQN foundation, subsequently integrates long short-term memory (LSTM) networks for the fusion of environmental features and preservation of training information, and ultimately enhances the path’s quality through redundant node elimination via an obstacle–path relationship analysis, combined with Bézier curve-based trajectory smoothing. A sensor-driven three-dimensional simulation environment featuring static obstacles was constructed using the ROS and Gazebo platforms, where LiDAR-equipped mobile agent models were trained for real-time environmental perception and strategy optimization prior to deployment on experimental vehicles. The simulation and physical implementation results reveal that LB-DDQN achieves effective collision avoidance, while demonstrating marked enhancements in critical metrics: the path’s smoothness, energy efficiency, and motion stability exhibit average improvements exceeding 50%. The framework further maintains superior safety standards and operational efficiency across diverse scenarios. Full article
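The Bézier-based trajectory smoothing stage can be illustrated with a single cubic Bézier segment over four waypoints (a generic sketch, not LB-DDQN's node-elimination logic):

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Evaluate a cubic Bezier curve B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1
    + 3(1-t) t^2 p2 + t^3 p3 at n parameter values in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Smooth a sharp corner in a planned grid path: waypoints act as control points
waypoints = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
             np.array([1.0, 1.0]), np.array([2.0, 1.0])]
smooth = cubic_bezier(*waypoints)
print(smooth[:3])
```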

35 pages, 8275 KiB  
Article
Marine Voyage Optimization and Weather Routing with Deep Reinforcement Learning
by Charilaos Latinopoulos, Efstathios Zavvos, Dimitrios Kaklis, Veerle Leemen and Aristides Halatsis
J. Mar. Sci. Eng. 2025, 13(5), 902; https://doi.org/10.3390/jmse13050902 - 30 Apr 2025
Viewed by 1969
Abstract
Marine voyage optimization determines the optimal route and speed to ensure timely arrival. The problem becomes particularly complex when incorporating a dynamic environment, such as future expected weather conditions along the route and unexpected disruptions. This study explores two model-free Deep Reinforcement Learning (DRL) algorithms: (i) a Double Deep Q Network (DDQN) and (ii) a Deep Deterministic Policy Gradient (DDPG). These algorithms are computationally costly, so we split optimization into an offline phase (costly pre-training for a route) and an online phase where the algorithms are fine-tuned as updated weather data become available. Fine tuning is quick enough for en-route adjustments and for updating the offline planning for different dates where the weather might be very different. The models are compared to classical and heuristic methods: the DDPG achieved a 4% lower fuel consumption than the DDQN and was only outperformed by Tabu Search by 1%. Both DRL models demonstrate high adaptability to dynamic weather updates, achieving up to 12% improvement in fuel consumption compared to the distance-based baseline model. Additionally, they are non-graph-based and self-learning, making them more straightforward to extend and integrate into future digital twin-driven autonomous solutions, compared to traditional approaches. Full article
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations—3rd Edition)

33 pages, 3244 KiB  
Article
Long Short-Term Memory–Model Predictive Control Speed Prediction-Based Double Deep Q-Network Energy Management for Hybrid Electric Vehicle to Enhanced Fuel Economy
by Haichao Liu, Hongliang Wang, Miao Yu, Yaolin Wang and Yang Luo
Sensors 2025, 25(9), 2784; https://doi.org/10.3390/s25092784 - 28 Apr 2025
Viewed by 853
Abstract
Further improving the fuel economy and emission performance of hybrid vehicles through well-designed energy management strategies has become an urgent issue. This paper proposes an energy management model based on speed prediction using Long Short-Term Memory (LSTM) neural networks. The initial learning rate and dropout probability of the LSTM speed prediction model are optimized using a Double Deep Q-Network (DDQN) algorithm. Furthermore, the LSTM speed prediction function is implemented within a Model Predictive Control (MPC) framework. A fuzzy logic-based driving mode recognition system divides the typical driving cycle into distinct driving modes and identifies real-time driving conditions. The LSTM-MPC method achieves low RMSE across different prediction horizons. Using predicted power demand, battery SOC, and real-time power demand as inputs, the model implements MPC for real-time control. In our experiments, four prediction horizons (5 s, 10 s, 15 s, and 20 s) were set. The energy management strategy demonstrated optimal performance and the lowest fuel consumption at the 5 s horizon, with fuel usage of only 6.3220 L, saving 2.034 L compared to the rule-based strategy. Validation under the UDDS driving cycle revealed that the LSTM-MPC-DDQN strategy reduced fuel consumption by 0.2729 L compared to the rule-based approach and showed only a 0.0749 L difference from the DP strategy. Full article
(This article belongs to the Section Vehicular Sensing)
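A short-horizon LSTM speed predictor of the general kind described above might look as follows; the input features, horizon, and layer sizes are assumptions, not the paper's tuned model:

```python
import torch
import torch.nn as nn

class SpeedPredictor(nn.Module):
    """Predict the next `horizon` vehicle speeds from a window of past speeds."""
    def __init__(self, horizon: int = 5, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, past_speeds: torch.Tensor) -> torch.Tensor:
        # past_speeds: (batch, window, 1)
        _, (h_n, _) = self.lstm(past_speeds)
        return self.head(h_n[-1])            # (batch, horizon) predicted speeds

model = SpeedPredictor(horizon=5)            # 5 s horizon at 1 Hz, as an assumption
window = torch.randn(8, 20, 1)               # 8 samples, 20 past time steps
pred = model(window)                          # (8, 5)
```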

28 pages, 6260 KiB  
Article
Development of Chiller Plant Models in OpenAI Gym Environment for Evaluating Reinforcement Learning Algorithms
by Xiangrui Wang, Qilin Zhang, Zhihua Chen, Jingjing Yang and Yixing Chen
Energies 2025, 18(9), 2225; https://doi.org/10.3390/en18092225 - 27 Apr 2025
Cited by 1 | Viewed by 884
Abstract
Amid the global energy crisis, the demands of the energy transition and sustainable development have underscored the importance of controlling building energy management systems. Reinforcement learning (RL) has shown notable energy-saving potential in the optimal control of heating, ventilation, and air-conditioning (HVAC) systems. However, the tight coupling between algorithms and simulation environments limits cross-scenario application. This paper develops chiller plant models in OpenAI Gym environments to evaluate different RL algorithms for optimizing condenser water loop control. A shopping mall in Changsha, China, was selected as the case study building. First, an energy simulation model in EnergyPlus was generated using AutoBPS. Then, the OpenAI Gym chiller plant system model was developed and validated by comparing it with the EnergyPlus simulation results. Moreover, two RL algorithms, Deep-Q-Network (DQN) and Double Deep-Q-Network (DDQN), were deployed to control the condenser water flow rate and approach temperature of cooling towers in the RL environment. Finally, the optimization performance of DQN across three climate zones was evaluated using the AutoBPS-Gym toolkit. The findings indicated that during the cooling season in a shopping mall in Changsha, the DQN control method resulted in energy savings of 14.16% for the cooling water system, whereas the DDQN method achieved savings of 14.01%. Using the average control values from DQN, the EnergyPlus simulation recorded an energy-saving rate of 10.42% compared to the baseline. Furthermore, implementing the DQN algorithm across three different climatic zones led to an average energy savings of 4.0%, highlighting the toolkit's ability to effectively utilize RL for optimal control in various environmental contexts. Full article
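Wrapping a plant model in the classic OpenAI Gym interface follows a fixed reset/step pattern with declared observation and action spaces; a minimal skeleton in that style (pre-0.26 Gym API assumed; the state variables, action grid, and energy model are placeholders, not the AutoBPS-Gym chiller model):

```python
import gym
import numpy as np
from gym import spaces

class ToyChillerEnv(gym.Env):
    """Minimal Gym-style environment: observe [cooling load, wet-bulb temp],
    choose a discrete condenser-water flow setting, receive negative energy use."""
    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Discrete(5)          # 5 flow-rate settings
        self.t = 0

    def reset(self):
        self.t = 0
        return self.observation_space.sample()

    def step(self, action):
        self.t += 1
        obs = self.observation_space.sample()
        energy = 1.0 + 0.1 * action + float(obs[0])     # placeholder energy model
        done = self.t >= 24                              # one simulated day, hourly steps
        return obs, -energy, done, {}

env = ToyChillerEnv()
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```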

25 pages, 1392 KiB  
Article
Dynamic Scheduling for Multi-Objective Flexible Job Shops with Machine Breakdown by Deep Reinforcement Learning
by Rui Wu, Jianxin Zheng and Xiyan Yin
Processes 2025, 13(4), 1246; https://doi.org/10.3390/pr13041246 - 20 Apr 2025
Viewed by 893
Abstract
Dynamic scheduling for flexible job shops under machine breakdown is a complex and challenging problem with valuable applications in real-life production. However, prior studies have struggled to perform well in changeable scenarios. To address this challenge, this paper introduces a dual-objective deep reinforcement learning (DRL) algorithm. The algorithm is based on the Double Deep Q-network (DDQN) and incorporates an attention mechanism. It decouples action relationships in the action space to reduce problem dimensionality and introduces an adaptive weighting method into agent decision-making to obtain high-quality Pareto front solutions. The algorithm is evaluated on a set of benchmark instances and compared with state-of-the-art algorithms. The experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms regarding machine offset and total tardiness, demonstrating greater stability and higher-quality solutions. Its practical use is also verified on cases from real enterprises, where the results remain better than those of a multi-objective meta-heuristic algorithm. Full article
(This article belongs to the Special Issue Transfer Learning Methods in Equipment Reliability Management)
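With two minimization objectives (machine offset and total tardiness), extracting the Pareto front from a set of candidate schedules is a plain dominance filter; a generic sketch:

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated points for minimization objectives.
    objectives: array-like of shape (n_solutions, n_objectives)."""
    obj = np.asarray(objectives, dtype=float)
    keep = np.ones(len(obj), dtype=bool)
    for i in range(len(obj)):
        # j dominates i if j is <= in every objective and < in at least one
        dominates_i = np.all(obj <= obj[i], axis=1) & np.any(obj < obj[i], axis=1)
        if dominates_i.any():
            keep[i] = False
    return np.where(keep)[0]

# Toy candidates: (machine offset, total tardiness)
cands = [(3.0, 10.0), (2.0, 12.0), (4.0, 9.0), (3.5, 11.0)]
print(pareto_front(cands))   # [0 1 2]: the last candidate is dominated by (3.0, 10.0)
```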

19 pages, 4962 KiB  
Article
A Prediction of the Shooting Trajectory for a Tuna Purse Seine Using the Double Deep Q-Network (DDQN) Algorithm
by Daeyeon Cho and Jihoon Lee
J. Mar. Sci. Eng. 2025, 13(3), 530; https://doi.org/10.3390/jmse13030530 - 10 Mar 2025
Viewed by 760
Abstract
The purse seine is a fishing method in which a net is used to encircle a fish school, capturing isolated fish by tightening a purse line at the bottom of the net. Tuna purse seine operations are technically complex, requiring the evaluation of fish movements, vessel dynamics, and their interactions, with success largely dependent on the expertise of the crew. In particular, efficiency in terms of highly complex tasks, such as calculating the shooting trajectory during fishing operations, varies significantly based on the fisher’s skill level. To address this challenge, developing techniques to support less experienced fishers is necessary, particularly for operations targeting free-swimming fish schools, which are more difficult to capture compared to those utilizing Fish Aggregating Devices (FADs). This study proposes a method for predicting shooting trajectories using the Double Deep Q-Network (DDQN) algorithm. Observation states, actions, and reward functions were designed to identify optimal scenarios for shooting, and the catchability of the predicted trajectories was evaluated through gear behavior analysis. The findings of this study are expected to aid in the development of a trajectory prediction system for inexperienced fishers and serve as foundational data for automating purse seine fishing systems. Full article

17 pages, 3949 KiB  
Article
A Novel Approach to Autonomous Driving Using Double Deep Q-Network-Based Deep Reinforcement Learning
by Ahmed Khlifi, Mohamed Othmani and Monji Kherallah
World Electr. Veh. J. 2025, 16(3), 138; https://doi.org/10.3390/wevj16030138 - 1 Mar 2025
Cited by 1 | Viewed by 2392
Abstract
Deep reinforcement learning (DRL) trains agents to make decisions by learning from rewards and penalties, using trial and error. It combines reinforcement learning (RL) with deep neural networks (DNNs), enabling agents to process large datasets and learn from complex environments. DRL has achieved notable success in gaming, robotics, decision-making, etc. However, real-world applications, such as self-driving cars, face challenges due to complex state and action spaces, requiring precise control. Researchers continue to develop new algorithms to improve performance in dynamic settings. A key algorithm, Deep Q-Network (DQN), uses neural networks to approximate the Q-value function but suffers from overestimation bias, leading to suboptimal outcomes. To address this, Double Deep Q-Network (DDQN) was introduced, which decouples action selection from evaluation, thereby reducing bias and promoting more stable learning. This study evaluates the effectiveness of DQN and DDQN in autonomous driving using the CARLA simulator. The key findings emphasize DDQN’s advantages in significantly reducing overestimation bias and enhancing policy performance, making it a more robust and reliable approach for complex real-world applications like self-driving cars. The results underscore DDQN’s potential to improve decision-making accuracy and stability in dynamic environments. Full article
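The decoupling of action selection from evaluation described here shows up directly in the bootstrap targets. With online parameters \theta and target parameters \theta^-, the standard formulations are:

$$
\begin{aligned}
y^{\mathrm{DQN}}  &= r + \gamma \max_{a'} Q\left(s', a'; \theta^{-}\right),\\
y^{\mathrm{DDQN}} &= r + \gamma\, Q\left(s', \operatorname*{arg\,max}_{a'} Q\left(s', a'; \theta\right); \theta^{-}\right),
\end{aligned}
$$

so DDQN uses the online network only to pick the next action and the target network to score it, which is what damps the max-induced overestimation.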
