Search Results (171)

Search Parameters:
Keywords = double-deep Q-learning

18 pages, 1138 KiB  
Article
Intelligent Priority-Aware Spectrum Access in 5G Vehicular IoT: A Reinforcement Learning Approach
by Adeel Iqbal, Tahir Khurshaid and Yazdan Ahmad Qadri
Sensors 2025, 25(15), 4554; https://doi.org/10.3390/s25154554 - 23 Jul 2025
Abstract
Efficient and intelligent spectrum access is crucial for meeting the diverse Quality of Service (QoS) demands of Vehicular Internet of Things (V-IoT) systems in next-generation cellular networks. This work proposes reinforcement learning-based priority-aware spectrum management (RL-PASM), a novel centralized self-learning framework operating through Roadside Units (RSUs). RL-PASM dynamically allocates spectrum resources across three traffic classes: high-priority (HP), low-priority (LP), and best-effort (BE). This work compares four reinforcement learning (RL) algorithms: Q-Learning (QL), Double Q-Learning (DQL), Deep Q-Network (DQN), and Actor-Critic (AC) methods. The environment is modeled as a discrete-time Markov Decision Process (MDP), and a context-sensitive reward function guides fairness-preserving decisions for access, preemption, coexistence, and hand-off. Extensive simulations conducted under realistic vehicular load conditions evaluate performance across key metrics, including throughput, delay, energy efficiency, fairness, blocking, and interruption probability. Unlike prior approaches, RL-PASM introduces a unified multi-objective reward formulation and centralized RSU-based control to support adaptive priority-aware access in dynamic vehicular environments. Simulation results confirm that RL-PASM balances throughput, latency, fairness, and energy efficiency, demonstrating its suitability for scalable and resource-constrained deployments. The results also show that DQN achieves the highest average throughput, followed by vanilla QL, while DQL and AC maintain high fairness and low average interruption probability. QL demonstrates the lowest average delay and the highest energy efficiency, making it a suitable candidate for edge-constrained vehicular deployments. With an appropriately selected RL method, RL-PASM offers a robust and adaptable solution for scalable, intelligent, and priority-aware spectrum access in vehicular communication infrastructures. Full article
(This article belongs to the Special Issue Emerging Trends in Next-Generation mmWave Cognitive Radio Networks)
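Since this entry compares tabular Q-Learning and Double Q-Learning among its baselines, a minimal reference sketch of the two update rules may help readers unfamiliar with the distinction. The state/action counts, learning rate, and discount below are arbitrary illustration values, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3      # illustrative sizes, not from the paper
alpha, gamma = 0.1, 0.95        # assumed learning rate and discount

# Standard Q-Learning: one table, max over its own estimates (prone to overestimation).
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Double Q-Learning: two tables; one selects the greedy action, the other evaluates it.
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def double_q_learning_update(s, a, r, s_next):
    if rng.random() < 0.5:
        a_star = QA[s_next].argmax()                                       # select with table A
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])    # evaluate with table B
    else:
        a_star = QB[s_next].argmax()                                       # select with table B
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])    # evaluate with table A

# Example transition (state 0, action 1, reward 1.0, next state 2):
q_learning_update(0, 1, 1.0, 2)
double_q_learning_update(0, 1, 1.0, 2)
```

The key point is that Double Q-Learning selects the greedy next action with one table and evaluates it with the other, which counteracts the maximization bias of standard Q-Learning.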

40 pages, 4846 KiB  
Article
Comparative Analysis of Some Methods and Algorithms for Traffic Optimization in Urban Environments Based on Maximum Flow and Deep Reinforcement Learning
by Silvia Baeva, Nikolay Hinov and Plamen Nakov
Mathematics 2025, 13(14), 2296; https://doi.org/10.3390/math13142296 - 17 Jul 2025
Viewed by 159
Abstract
This paper presents a comparative analysis between classical maximum flow algorithms and modern deep Reinforcement Learning (RL) algorithms applied to traffic optimization in urban environments. Through SUMO simulations and statistical tests, algorithms such as Ford–Fulkerson, Edmonds–Karp, Dinitz, Preflow–Push, Boykov–Kolmogorov and Double DQN are compared. Their efficiency and stability are evaluated in terms of metrics such as cumulative vehicle dispersion and the ratio of waiting time to vehicle number. The results show that classical algorithms such as Edmonds–Karp and Dinitz perform stably under deterministic conditions, while Double DQN suffers from high variation. Recommendations are made regarding the selection of an appropriate algorithm based on the characteristics of the environment, and opportunities for improvement using DRL techniques such as PPO and A2C are indicated. Full article
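For readers unfamiliar with the classical maximum-flow baselines named in this abstract, the following is a generic Edmonds–Karp sketch (shortest augmenting paths via BFS on a capacity matrix); the toy four-node network is invented for illustration and is unrelated to the paper's SUMO road networks:

```python
from collections import deque

def edmonds_karp(capacity, source, sink):
    """Max flow via shortest augmenting paths (BFS). capacity is an n x n matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    max_flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:          # no augmenting path left
            return max_flow
        # Find the bottleneck residual capacity along the path, then augment.
        bottleneck, v = float("inf"), sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        max_flow += bottleneck

# Toy 4-node network: node 0 is the source, node 3 is the sink.
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(edmonds_karp(cap, 0, 3))  # -> 5
```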

22 pages, 3392 KiB  
Article
Research on Wellbore Trajectory Optimization and Drilling Control Based on the TD3 Algorithm
by Haipeng Gu, Yang Wu, Xiaowei Li and Zhaokai Hou
Appl. Sci. 2025, 15(13), 7258; https://doi.org/10.3390/app15137258 - 27 Jun 2025
Viewed by 289
Abstract
In modern oil and gas exploration and development, wellbore trajectory optimization and control is a key technology for improving drilling efficiency, reducing costs, and ensuring safety. In the drilling of non-vertical wells in complex formations, traditional static trajectory functions combined with classical optimization algorithms have difficulty adapting to the parameter fluctuations caused by formation changes and lack real-time performance. Therefore, this paper proposes a wellbore trajectory optimization model based on deep reinforcement learning to realize non-vertical well trajectory design and control while drilling. To meet the real-time optimization requirements of complex drilling scenarios, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted to handle high-dimensional continuous decision-making through delayed policy updates, a double Q-network, and target policy smoothing. After reinforcement learning training, the trajectory offset is significantly reduced and the accuracy is greatly improved. This research shows that the TD3 algorithm is superior to the multi-objective optimization algorithm in optimizing key parameters such as well deviation, kickoff point (KOP), and trajectory length, especially in well deviation and KOP optimization. This study provides a new approach for wellbore trajectory optimization and design while drilling, advances intelligent drilling technology, and offers a theoretical basis and technical support for more accurate and efficient wellbore trajectory optimization and design while drilling in the future. Full article
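As background on the three TD3 ingredients this abstract mentions (delayed policy updates, a double Q-network, and target policy smoothing), here is a hedged sketch of the clipped double-Q critic target. The tiny linear "networks", noise scale, and action bounds are placeholders for illustration, not the paper's drilling model:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2
gamma, policy_noise, noise_clip, max_action = 0.99, 0.2, 0.5, 1.0  # assumed hyperparameters

# Stand-ins for the target actor and the two target critics (linear maps for brevity).
W_actor = rng.normal(size=(action_dim, state_dim))
W_q1 = rng.normal(size=(state_dim + action_dim,))
W_q2 = rng.normal(size=(state_dim + action_dim,))

def target_actor(s):  return np.tanh(W_actor @ s) * max_action
def target_q1(s, a):  return W_q1 @ np.concatenate([s, a])
def target_q2(s, a):  return W_q2 @ np.concatenate([s, a])

def td3_critic_target(r, s_next, done):
    # Target policy smoothing: add clipped noise to the target action.
    noise = np.clip(rng.normal(0, policy_noise, size=action_dim), -noise_clip, noise_clip)
    a_next = np.clip(target_actor(s_next) + noise, -max_action, max_action)
    # Clipped double-Q: take the minimum of the two target critics.
    q_next = min(target_q1(s_next, a_next), target_q2(s_next, a_next))
    return r + gamma * (1.0 - done) * q_next

# In full TD3, the actor and the target networks are updated only every d critic
# steps ("delayed" updates); the delay d is a tuning choice not shown here.
y = td3_critic_target(r=1.0, s_next=rng.normal(size=state_dim), done=0.0)
print(y)
```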

28 pages, 1293 KiB  
Article
A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
by Shuang Gao, Yuntao Zou and Li Feng
Electronics 2025, 14(13), 2569; https://doi.org/10.3390/electronics14132569 - 25 Jun 2025
Viewed by 327
Abstract
Industrial Internet of Things (IIoT) deployments in thermal power plants face significant energy efficiency challenges due to harsh operating conditions and device resource constraints. This paper presents gradient memory double-deep Q-network (GM-DDQN), a lightweight reinforcement learning approach for energy optimization on resource-constrained IIoT devices. At its core, GM-DDQN introduces the gradient memory mechanism, a novel memory-efficient alternative to experience replay. This core innovation, combined with a simplified neural network architecture and efficient parameter quantization, collectively reduces memory requirements by 99% and computation time by 85–90% compared to standard methods. Experimental evaluations across three realistic simulated thermal power plant scenarios demonstrate that GM-DDQN improves energy efficiency by 42% compared to fixed policies and 27% compared to threshold-based approaches, extending battery lifetime from 8–9 months to 14–15 months while maintaining 96–97% PSR. The method enables sophisticated reinforcement learning directly on IIoT edge devices without requiring cloud connectivity, reducing maintenance costs and improving monitoring reliability in industrial environments. Full article

23 pages, 20322 KiB  
Article
An Intelligent Path Planning System for Urban Airspace Monitoring: From Infrastructure Assessment to Strategic Optimization
by Qianyu Liu, Wei Dai, Zichun Yan and Claudio J. Tessone
Smart Cities 2025, 8(3), 100; https://doi.org/10.3390/smartcities8030100 - 19 Jun 2025
Viewed by 377
Abstract
Urban Air Mobility (UAM) requires reliable communication and surveillance infrastructures to ensure safe Unmanned Aerial Vehicle (UAV) operations in dense metropolitan environments. However, urban infrastructure is inherently heterogeneous, leading to significant spatial variations in monitoring performance. This study proposes a unified framework that integrates infrastructure readiness assessment with Deep Reinforcement Learning (DRL)-based UAV path planning. Using Singapore as a representative case, we employ a data-driven methodology combining clustering analysis and in situ measurements to estimate the citywide distribution of surveillance quality. We then introduce an infrastructure-aware path planning algorithm based on a Double Deep Q-Network (DQN) with a convolutional architecture, which enables UAVs to learn efficient trajectories while avoiding surveillance blind zones. Extensive simulations demonstrate that the proposed approach significantly improves path success rates, reduces traversal through poorly monitored regions, and maintains high navigation efficiency. These results highlight the potential of combining infrastructure modeling with DRL to support performance-aware airspace operations and inform future UAM governance systems. Full article
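For context on the Double Deep Q-Network used in this entry (the paper adds a convolutional architecture over the surveillance-quality map, which is not reproduced here), a minimal sketch of how the double-DQN target decouples action selection by the online network from evaluation by the target network:

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions, state_dim, gamma = 4, 8, 0.99   # illustrative sizes and discount

# Placeholder Q-functions: any callables mapping a state to a vector of action values.
W_online = rng.normal(size=(n_actions, state_dim))
W_target = rng.normal(size=(n_actions, state_dim))
q_online = lambda s: W_online @ s
q_target = lambda s: W_target @ s

def double_dqn_target(r, s_next, done):
    a_star = int(np.argmax(q_online(s_next)))                    # select with the online network
    return r + gamma * (1.0 - done) * q_target(s_next)[a_star]   # evaluate with the target network

def dqn_target(r, s_next, done):
    return r + gamma * (1.0 - done) * np.max(q_target(s_next))   # plain DQN: max over the target net

s_next = rng.normal(size=state_dim)
print(double_dqn_target(1.0, s_next, 0.0), dqn_target(1.0, s_next, 0.0))
```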

19 pages, 3650 KiB  
Article
Enhanced-Dueling Deep Q-Network for Trustworthy Physical Security of Electric Power Substations
by Nawaraj Kumar Mahato, Junfeng Yang, Jiaxuan Yang, Gangjun Gong, Jianhong Hao, Jing Sun and Jinlu Liu
Energies 2025, 18(12), 3194; https://doi.org/10.3390/en18123194 - 18 Jun 2025
Viewed by 346
Abstract
This paper introduces an Enhanced-Dueling Deep Q-Network (EDDQN) specifically designed to bolster the physical security of electric power substations. We model the intricate substation security challenge as a Markov Decision Process (MDP), segmenting the facility into three zones, each with potential normal, suspicious, or attacked states. The EDDQN agent learns to strategically select security actions, aiming for optimal threat prevention while minimizing disruptive errors and false alarms. This methodology integrates Double DQN for stable learning, Prioritized Experience Replay (PER) to accelerate the learning process, and a sophisticated neural network architecture tailored to the complexities of multi-zone substation environments. Empirical evaluation using synthetic data derived from historical incident patterns demonstrates the significant advantages of EDDQN over other standard DQN variations, yielding an average reward of 7.5, a threat prevention success rate of 91.1%, and a notably low false alarm rate of 0.5%. The learned action policy exhibits a proactive security posture, establishing EDDQN as a promising and reliable intelligent solution for enhancing the physical resilience of power substations against evolving threats. This research directly addresses the critical need for adaptable and intelligent security mechanisms within the electric power infrastructure. Full article
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 3rd Edition)
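The dueling architecture this abstract combines with Double DQN and prioritized experience replay can be summarized by its value/advantage aggregation. The sketch below shows only the standard dueling head; the feature dimension and action count are illustrative, not the paper's tailored multi-zone network:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(feat_dim, 1)               # scalar state value
        self.advantage = nn.Linear(feat_dim, n_actions)   # per-action advantage

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                          # (batch, 1)
        a = self.advantage(features)                      # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)        # (batch, n_actions)

# Hypothetical usage: zone states (normal / suspicious / attacked) would be encoded
# upstream into a feature vector; here we feed random 32-dim features for 2 samples.
head = DuelingHead(feat_dim=32, n_actions=6)
print(head(torch.randn(2, 32)).shape)  # torch.Size([2, 6])
```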

27 pages, 3479 KiB  
Article
A Hybrid IVFF-AHP and Deep Reinforcement Learning Framework for an ATM Location and Routing Problem
by Bahar Yalcin Kavus, Kübra Yazici Sahin, Alev Taskin and Tolga Kudret Karaca
Appl. Sci. 2025, 15(12), 6747; https://doi.org/10.3390/app15126747 - 16 Jun 2025
Viewed by 574
Abstract
The impact of alternative distribution channels, such as bank Automated Teller Machines (ATMs), on the financial industry is growing due to technological advancements. Investing in ideal locations is critical for new ATM companies. Due to the many factors to be evaluated, this study addresses the problem of determining the best location for ATMs to be deployed in Istanbul districts by utilizing a multi-criteria decision-making framework. Furthermore, the advantages of fuzzy logic are used to convert expert opinions into mathematical expressions and incorporate them into decision-making processes. For the first time in the literature, a model is proposed for ATM location selection that integrates clustering and the interval-valued Fermatean fuzzy analytic hierarchy process (IVFF-AHP). With the proposed methodology, the districts of Istanbul are first clustered to find the risky ones. Then, the most suitable alternative location in this district is determined using IVFF-AHP. After deciding the ATM locations with IVFF-AHP, in the last step, a Double Deep Q-Network reinforcement learning model is used to optimize the Cash in Transit (CIT) vehicle route. The study results reveal that the proposed approach provides stable, efficient, and adaptive routing for real-world CIT operations. Full article

25 pages, 7158 KiB  
Article
Anti-Jamming Decision-Making for Phased-Array Radar Based on Improved Deep Reinforcement Learning
by Hang Zhao, Hu Song, Rong Liu, Jiao Hou and Xianxiang Yu
Electronics 2025, 14(11), 2305; https://doi.org/10.3390/electronics14112305 - 5 Jun 2025
Viewed by 541
Abstract
In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment. However, manually designing or selecting anti-jamming decisions is often difficult and unreliable in complex jamming environments. Therefore, reinforcement learning is applied to anti-jamming decision-making to solve the above problems. However, the existing anti-jamming decision-making models based on reinforcement learning often suffer from problems such as low convergence speeds and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed to improve the exploration policy, the network structure, and the training methods of the deep Q-network. In order to solve the problem of the ϵ-greedy strategy being highly dependent on hyperparameter settings, and the Q-value being overly influenced by the action in other deep Q-networks, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, which incorporates an adaptive exploration policy into the neural network and increases the influence of the state itself on the Q-value. These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. In order to calculate the target value more accurately during the training process and improve the stability of the parameter update, this paper proposes a training method that combines n-step learning, target soft update, variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method that more accurately simulates the storage and update mechanism of human memory is used in the MAI-DQN. The VDDPER improves the decision-making accuracy by dynamically adjusting the sample size based on different values of experience during training, enhancing exploration during the early stages of training, and placing greater emphasis on high-value experiences in the later stages. Enhancements to the training method improve the model’s convergence speed. Moreover, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, which ensures a high reward convergence speed with fewer computational resources. The findings of a simulation experiment show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN can achieve a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist. Full article
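Two of the training-side techniques listed in this abstract, n-step returns and soft target updates, are standard and can be sketched briefly. The values of n, tau, and the discount below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

gamma, n_step, tau = 0.99, 3, 0.005   # assumed values for illustration

def n_step_return(rewards, bootstrap_value):
    """Discounted sum of the next n rewards plus a bootstrapped tail value."""
    head = rewards[:n_step]
    g = sum(gamma ** k * r for k, r in enumerate(head))
    return g + gamma ** len(head) * bootstrap_value

def soft_update(target_params, online_params):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    for name in target_params:
        target_params[name] = tau * online_params[name] + (1.0 - tau) * target_params[name]
    return target_params

online = {"w": np.ones((4, 4))}
target = {"w": np.zeros((4, 4))}
print(n_step_return([1.0, 0.5, 0.25], bootstrap_value=2.0))
print(soft_update(target, online)["w"][0, 0])  # 0.005 after one soft update
```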

20 pages, 772 KiB  
Article
A DDQN-Guided Dual-Population Evolutionary Multitasking Framework for Constrained Multi-Objective Ship Berthing
by Jinyou Mou and Qidan Zhu
J. Mar. Sci. Eng. 2025, 13(6), 1068; https://doi.org/10.3390/jmse13061068 - 28 May 2025
Viewed by 337
Abstract
Autonomous ship berthing requires advanced path planning to balance multiple objectives, such as minimizing berthing time, reducing energy consumption, and ensuring safety under dynamic environmental constraints. However, traditional planning and learning methods often suffer from inefficient search or sparse rewards in such constrained and high-dimensional settings. This study introduces a double deep Q-network (DDQN)-guided dual-population constrained multi-objective evolutionary algorithm (CMOEA) framework for autonomous ship berthing. By integrating deep reinforcement learning (DRL) with CMOEA, the framework employs DDQN to dynamically guide operator selection, enhancing search efficiency and solution diversity. The designed reward function optimizes thrust, time, and heading accuracy while accounting for vessel kinematics, water currents, and obstacles. Simulations on the CSAD vessel model demonstrate that this framework outperforms baseline algorithms such as evolutionary multitasking constrained multi-objective optimization (EMCMO), DQN, Q-learning, and non-dominated sorting genetic algorithm II (NSGA-II), achieving superior efficiency and stability while maintaining the required berthing angle. The framework also exhibits strong adaptability across varying environmental conditions, making it a promising solution for autonomous ship berthing in port environments. Full article
(This article belongs to the Section Ocean Engineering)

26 pages, 2589 KiB  
Article
Sensor-Generated In Situ Data Management for Smart Grids: Dynamic Optimization Driven by Double Deep Q-Network with Prioritized Experience Replay
by Peiying Zhang, Siyi Li, Dandan Li, Qingyang Ding and Lei Shi
Appl. Sci. 2025, 15(11), 5980; https://doi.org/10.3390/app15115980 - 26 May 2025
Viewed by 382
Abstract
According to forecast data from the State Grid Corporation of China, the number of terminal devices connected to the power grid is expected to reach the scale of 2 billion within the next five years. With the continuous growth in the number and variety of terminal devices in the smart grid, traditional cloud-edge-end architecture will face the increasing issue of response latency. In this context, in situ computing, as a paradigm for local or near-source data processing within cloud-edge-end architecture, is gradually becoming a key technological pathway in industrial systems. The in situ server system, by deploying servers near terminals, enables the near-data-source processing of terminal-generated in situ data, representing an important implementation of in situ computing. To enhance the processing efficiency and response capability of in situ data in smart grid scenarios, this study designs an in situ data processing mechanism and an access demand management framework. Due to the heterogeneity of in situ server performance, there are variations in response capabilities for access demands across rounds. This study introduces a double deep Q-network with prioritized experience replay to assist in response decision-making. Simulation experiments show that the proposed method reduces waiting latency and response latency by an average of 67.69% and 68.77%, respectively, compared to traditional algorithms and other reinforcement learning algorithms, verifying its effectiveness in in situ data management. This scheme can also be widely applied to in situ computing scenarios with low-latency data management, such as smart cities and industrial IoT. Full article
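A brief sketch of the proportional prioritized experience replay rule referenced in this abstract (priorities derived from TD errors raised to a power alpha, with importance-sampling weights correcting the sampling bias); the buffer contents and alpha/beta values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, eps = 0.6, 0.4, 1e-6     # assumed PER hyperparameters

td_errors = np.array([0.1, 2.0, 0.5, 0.05])        # toy TD errors for 4 stored transitions
priorities = (np.abs(td_errors) + eps) ** alpha    # proportional prioritization
probs = priorities / priorities.sum()

# Sample a mini-batch of indices according to the priority distribution.
batch_idx = rng.choice(len(td_errors), size=2, p=probs)

# Importance-sampling weights correct the bias introduced by non-uniform sampling.
weights = (len(td_errors) * probs[batch_idx]) ** (-beta)
weights /= weights.max()

print(batch_idx, weights)
```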

23 pages, 4463 KiB  
Article
Dual-Priority Delayed Deep Double Q-Network (DPD3QN): A Dueling Double Deep Q-Network with Dual-Priority Experience Replay for Autonomous Driving Behavior Decision-Making
by Shuai Li, Peicheng Shi, Aixi Yang, Heng Qi and Xinlong Dong
Algorithms 2025, 18(5), 291; https://doi.org/10.3390/a18050291 - 19 May 2025
Viewed by 412
Abstract
The behavior decision control of autonomous vehicles is a critical aspect of advancing autonomous driving technology. However, current behavior decision algorithms based on deep reinforcement learning still face several challenges, such as insufficient safety and sparse reward mechanisms. To solve these problems, this paper proposes a dueling double deep Q-network based on dual-priority experience replay (DPD3QN). Initially, the dueling network is integrated with the double deep Q-network, and the original network's output layer is restructured to enhance the precision of action value estimation. Subsequently, dual-priority experience replay is incorporated to facilitate the model's ability to swiftly recognize and leverage critical experiences. Ultimately, the training and evaluation are conducted on the OpenAI Gym simulation platform. The test results show that DPD3QN helps to improve the convergence speed of driverless vehicle behavior decision-making. Compared with the currently popular DQN and DDQN algorithms, the proposed algorithm achieves higher success rates in challenging scenarios: in test scenario I the success rate rises by 11.8 and 25.8 percentage points, respectively, and in test scenario II by 8.8 and 22.2 percentage points, respectively, indicating a more secure and efficient autonomous driving decision-making capability. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

16 pages, 2932 KiB  
Article
Research on Mobile Agent Path Planning Based on Deep Reinforcement Learning
by Shengwei Jin, Xizheng Zhang, Ying Hu, Ruoyuan Liu, Qing Wang, Haihua He, Junyu Liao and Lijing Zeng
Systems 2025, 13(5), 385; https://doi.org/10.3390/systems13050385 - 16 May 2025
Viewed by 364
Abstract
For mobile agent path planning, traditional path planning algorithms frequently induce abrupt variations in path curvature and steering angles, increasing the risk of lateral tire slippage and undermining operational safety. Concurrently, conventional reinforcement learning methods struggle to converge rapidly, leading to an insufficient efficiency in planning to meet the demand for energy economy. This study proposes LSTM Bézier–Double Deep Q-Network (LB-DDQN), an advanced path-planning framework for mobile agents based on deep reinforcement learning. The architecture first enables mapless navigation through a DDQN foundation, subsequently integrates long short-term memory (LSTM) networks for the fusion of environmental features and preservation of training information, and ultimately enhances the path’s quality through redundant node elimination via an obstacle–path relationship analysis, combined with Bézier curve-based trajectory smoothing. A sensor-driven three-dimensional simulation environment featuring static obstacles was constructed using the ROS and Gazebo platforms, where LiDAR-equipped mobile agent models were trained for real-time environmental perception and strategy optimization prior to deployment on experimental vehicles. The simulation and physical implementation results reveal that LB-DDQN achieves effective collision avoidance, while demonstrating marked enhancements in critical metrics: the path’s smoothness, energy efficiency, and motion stability exhibit average improvements exceeding 50%. The framework further maintains superior safety standards and operational efficiency across diverse scenarios. Full article
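To illustrate the Bézier-curve smoothing step mentioned in this abstract, here is a generic cubic Bézier evaluation over four waypoints; the control points are hypothetical and the paper's actual post-processing of the DDQN path may differ:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, num=20):
    """Sample a cubic Bezier curve defined by four 2D control points (Bernstein form)."""
    t = np.linspace(0.0, 1.0, num)[:, None]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Four waypoints from a (hypothetical) pruned path, smoothed into one curve segment.
waypoints = [np.array([0.0, 0.0]), np.array([1.0, 2.0]),
             np.array([3.0, 2.5]), np.array([4.0, 0.5])]
smooth = cubic_bezier(*waypoints)
print(smooth.shape)  # (20, 2)
```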

30 pages, 8160 KiB  
Article
Developing a Novel Adaptive Double Deep Q-Learning-Based Routing Strategy for IoT-Based Wireless Sensor Network with Federated Learning
by Nalini Manogaran, Mercy Theresa Michael Raphael, Rajalakshmi Raja, Aarav Kannan Jayakumar, Malarvizhi Nandagopal, Balamurugan Balusamy and George Ghinea
Sensors 2025, 25(10), 3084; https://doi.org/10.3390/s25103084 - 13 May 2025
Viewed by 786
Abstract
The Internet of Things (IoT) ecosystem depends extensively on real-time data collection, sharing, and automated operation. Among these fundamentals, wireless sensor networks (WSNs) play a central role through their many distributed Sensor Nodes (SNs), which sense and transmit environmental data wirelessly. Although WSNs offer clear advantages for remote data collection, they are severely hampered by the limited energy capacity of SNs; hence, energy-efficient routing is a pertinent challenge. Clustering and routing therefore both play important roles: clustering is performed to reduce energy consumption and prolong the lifetime of the network, while routing determines the actual paths used for data transmission. Addressing the limitations of conventional IoT-based data routing, this work presents a federated learning (FL)-oriented framework with a new energy-efficient routing scheme. Routing is driven by the adaptive double deep Q-learning (ADDQL) model, which enables intelligent, high-speed routing across changing WSN scenarios. The proposed ADDQL-IRHO model has been compared to existing state-of-the-art algorithms on multiple performance metrics, such as energy consumption, communication delay, temporal complexity, data sum rate, message overhead, and scalability, with extensive experimental evaluation reporting superior performance. These results substantiate the applicability and competitiveness of the framework in variable-serviced IoT-oriented WSNs for next-generation intelligent routing solutions. Full article
(This article belongs to the Section Internet of Things)

28 pages, 4738 KiB  
Article
AEM-D3QN: A Graph-Based Deep Reinforcement Learning Framework for Dynamic Earth Observation Satellite Mission Planning
by Shuo Li, Gang Wang and Jinyong Chen
Aerospace 2025, 12(5), 420; https://doi.org/10.3390/aerospace12050420 - 9 May 2025
Viewed by 548
Abstract
Efficient and adaptive mission planning for Earth Observation Satellites (EOSs) remains a challenging task due to the growing complexity of user demands, task constraints, and limited satellite resources. Traditional heuristic and metaheuristic approaches often struggle with scalability and adaptability in dynamic environments. To overcome these limitations, we introduce AEM-D3QN, a novel intelligent task scheduling framework that integrates Graph Neural Networks (GNNs) with an Adaptive Exploration Mechanism-enabled Double Dueling Deep Q-Network (D3QN). This framework constructs a Directed Acyclic Graph (DAG) atlas to represent task dependencies and constraints, leveraging GNNs to extract spatial–temporal task features. These features are then encoded into a reinforcement learning model that dynamically optimizes scheduling policies under multiple resource constraints. The adaptive exploration mechanism improves learning efficiency by balancing exploration and exploitation based on task urgency and satellite status. Extensive experiments conducted under both periodic and emergency planning scenarios demonstrate that AEM-D3QN outperforms state-of-the-art algorithms in scheduling efficiency, response time, and task completion rate. The proposed framework offers a scalable and robust solution for real-time satellite mission planning in complex and dynamic operational environments. Full article
(This article belongs to the Section Astronautics & Space Science)

35 pages, 8275 KiB  
Article
Marine Voyage Optimization and Weather Routing with Deep Reinforcement Learning
by Charilaos Latinopoulos, Efstathios Zavvos, Dimitrios Kaklis, Veerle Leemen and Aristides Halatsis
J. Mar. Sci. Eng. 2025, 13(5), 902; https://doi.org/10.3390/jmse13050902 - 30 Apr 2025
Viewed by 1701
Abstract
Marine voyage optimization determines the optimal route and speed to ensure timely arrival. The problem becomes particularly complex when incorporating a dynamic environment, such as future expected weather conditions along the route and unexpected disruptions. This study explores two model-free Deep Reinforcement Learning (DRL) algorithms: (i) a Double Deep Q Network (DDQN) and (ii) a Deep Deterministic Policy Gradient (DDPG). These algorithms are computationally costly, so we split optimization into an offline phase (costly pre-training for a route) and an online phase where the algorithms are fine-tuned as updated weather data become available. Fine-tuning is quick enough for en-route adjustments and for updating the offline planning for different dates where the weather might be very different. The models are compared to classical and heuristic methods: the DDPG achieved a 4% lower fuel consumption than the DDQN and was only outperformed by Tabu Search by 1%. Both DRL models demonstrate high adaptability to dynamic weather updates, achieving up to 12% improvement in fuel consumption compared to the distance-based baseline model. Additionally, they are non-graph-based and self-learning, making them more straightforward to extend and integrate into future digital twin-driven autonomous solutions, compared to traditional approaches. Full article
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations—3rd Edition)
