Search Results (425)

Search Parameters:
Keywords = deep Q-networks (DQN)

33 pages, 3714 KB  
Article
SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks
by Hilary Kelechi Anabi, Samuel Frimpong and Sanjay Madria
Sensors 2026, 26(2), 730; https://doi.org/10.3390/s26020730 - 21 Jan 2026
Viewed by 92
Abstract
The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, while also formulating a SADQN approach with dual-variable updates to mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: increasing cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieving substantial gains over random beamforming and non-beamforming approaches. The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount. Full article
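
For readers unfamiliar with the Lagrangian-relaxation step described above, the following minimal Python sketch illustrates the general idea of folding constraint violations into the DQN reward and updating the dual variables by projected ascent; the function names, step size, and constraint list are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Lagrangian-relaxed reward and dual-variable update for a
# constrained DQN; names and values are assumptions, not the paper's code.

def lagrangian_reward(harvested_energy, constraint_violations, multipliers):
    """Scalarize the constrained objective: reward minus weighted constraint violations."""
    penalty = sum(lmb * v for lmb, v in zip(multipliers, constraint_violations))
    return harvested_energy - penalty

def update_multipliers(multipliers, constraint_violations, step_size=0.01):
    """Projected dual ascent: raise a multiplier when its constraint is violated, keep it >= 0."""
    return [max(0.0, lmb + step_size * v)
            for lmb, v in zip(multipliers, constraint_violations)]

# Example: fairness, minimum-energy, duty-cycle, and uplink-utilization constraints
multipliers = [0.0, 0.0, 0.0, 0.0]
violations = [0.05, 0.0, 0.12, 0.0]          # positive = constraint violated
r = lagrangian_reward(harvested_energy=1.8,
                      constraint_violations=violations,
                      multipliers=multipliers)
multipliers = update_multipliers(multipliers, violations)
```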

17 pages, 1555 KB  
Article
Path Planning in Sparse Reward Environments: A DQN Approach with Adaptive Reward Shaping and Curriculum Learning
by Hongyi Yang, Bo Cai and Yunlong Li
Algorithms 2026, 19(1), 89; https://doi.org/10.3390/a19010089 - 21 Jan 2026
Viewed by 202
Abstract
Deep reinforcement learning (DRL) has shown great potential in path planning tasks. However, in sparse reward environments, DRL still faces significant challenges such as low training efficiency and a tendency to converge to suboptimal policies. Traditional reward shaping methods can partially alleviate these issues, but they typically rely on hand-crafted designs, which often introduce complex reward coupling, make hyperparameter tuning difficult, and limit generalization capability. To address these challenges, this paper proposes Curriculum-guided Learning with Adaptive Reward Shaping for Deep Q-Network (CLARS-DQN), a path planning algorithm that integrates Adaptive Reward Shaping (ARS) and Curriculum Learning (CL). The algorithm consists of two key components: (1) ARS-DQN, which augments the DQN framework with a learnable intrinsic reward function to reduce reward sparsity and dependence on expert knowledge; and (2) a curriculum strategy that guides policy optimization through a staged training process, progressing from simple to complex tasks to enhance generalization. Training also incorporates Prioritized Experience Replay (PER) to improve sample efficiency and training stability. CLARS-DQN outperforms baseline methods in task success rate, path quality, training efficiency, and hyperparameter robustness. In unseen environments, the method improves task success rate and average path length by 12% and 26%, respectively, demonstrating strong generalization. Ablation studies confirm the critical contribution of each module. Full article
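
As a rough illustration of the adaptive reward shaping idea described in this abstract, the sketch below adds a learnable, bounded intrinsic reward to the sparse environment reward before the DQN update; the network size and mixing weight `beta` are assumptions, and the training of the shaping network itself (the adaptive part) is omitted.

```python
import torch
import torch.nn as nn

# Minimal sketch of combining a sparse extrinsic reward with a learnable intrinsic
# reward, in the spirit of the ARS-DQN component; shapes and beta are assumptions.
class IntrinsicReward(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Tanh())  # bounded in [-1, 1]

    def forward(self, state):
        return self.net(state).squeeze(-1)

intrinsic = IntrinsicReward(state_dim=8)
beta = 0.1                                   # weight of the shaping term

def shaped_reward(extrinsic_r, state):
    # state: torch tensor of shape (state_dim,); total reward fed to the DQN update.
    return extrinsic_r + beta * intrinsic(state).item()
```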

25 pages, 4648 KB  
Systematic Review
Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review
by Lekhetho Joseph Mpoporo, Pius Adewale Owolawi and Chunling Tu
Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048 - 20 Jan 2026
Viewed by 139
Abstract
Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while remaining effective in classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently surfaced as a viable substitute, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDS). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the research fields were identified using a bibliometric analysis. The SLR was used to synthesize methodological techniques, datasets, and performance analysis. The results indicate that DRL algorithms such as deep Q-network (DQN), double DQN (DDQN), dueling DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized for enhancing IDS performance in various applications and datasets. The results highlight the increasing significance of DRL-based solutions for developing intelligent and robust intrusion detection systems and advancing cybersecurity. Full article
(This article belongs to the Special Issue Advances in Cyber Security)

24 pages, 3185 KB  
Article
A Hybrid Optimization Approach for Multi-Generation Intelligent Breeding Decisions
by Mingxiang Yang, Ziyu Li, Jiahao Li, Bingling Huang, Xiaohui Niu, Xin Lu and Xiaoxia Li
Information 2026, 17(1), 106; https://doi.org/10.3390/info17010106 - 20 Jan 2026
Viewed by 154
Abstract
Multi-generation intelligent breeding (MGIB) decision-making is a technique used by plant breeders to select mating individuals to produce new generations and allocate resources for each generation. However, existing research remains scarce on dynamic optimization of resources under limited budget and time constraints. Inspired by advances in reinforcement learning (RL), a framework that integrates evolutionary algorithms with deep RL was proposed to fill this gap. The framework combines two modules: the Improved Look-Ahead Selection (ILAS) module and the Deep Q-Networks (DQNs) module. The former employs a simulated annealing-enhanced estimation of distribution algorithm to make mating decisions. Based on the selected mating individuals, the latter module learns multi-generation resource allocation policies using DQN. To evaluate our framework, numerical experiments were conducted on two realistic breeding datasets, i.e., Corn2019 and CUBIC. ILAS outperformed LAS on Corn2019, increasing the maximum and mean population Genomic Estimated Breeding Value (GEBV) by 9.1% and 7.7%, respectively. ILAS-DQN consistently outperformed the baseline methods, achieving significant and practical improvements in both top-performing and elite-average GEBVs across two independent datasets. The results demonstrated that our method outperforms traditional baselines in both generalization and effectiveness for complex agricultural problems with delayed rewards. Full article
(This article belongs to the Section Artificial Intelligence)

17 pages, 1621 KB  
Article
Reinforcement Learning-Based Optimization of Environmental Control Systems in Battery Energy Storage Rooms
by So-Yeon Park, Deun-Chan Kim and Jun-Ho Bang
Energies 2026, 19(2), 516; https://doi.org/10.3390/en19020516 - 20 Jan 2026
Viewed by 134
Abstract
This study proposes a reinforcement learning (RL)-based optimization framework for the environmental control system of battery rooms in Energy Storage Systems (ESS). Conventional rule-based air-conditioning strategies are unable to adapt to real-time temperature and humidity fluctuations, often leading to excessive energy consumption or insufficient thermal protection. To overcome these limitations, both value-based (DQN, Double DQN, Dueling DQN) and policy-based (Policy Gradient, PPO, TRPO) RL algorithms are implemented and systematically compared. The algorithms are trained and evaluated using one year of real ESS operational data and corresponding meteorological data sampled at 15-min intervals. Performance is assessed in terms of convergence speed, learning stability, and cooling-energy consumption. The experimental results show that the DQN algorithm reduces time-averaged cooling power consumption by 46.5% compared to conventional rule-based control, while maintaining temperature, humidity, and dew-point constraint violation rates below 1% throughout the testing period. Among the policy-based methods, the Policy Gradient algorithm demonstrates competitive energy-saving performance but requires longer training time and exhibits higher reward variance. These findings confirm that RL-based control can effectively adapt to dynamic environmental conditions, thereby improving both energy efficiency and operational safety in ESS battery rooms. The proposed framework offers a practical and scalable solution for intelligent thermal management in ESS facilities. Full article
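
For context on the value-based methods compared here, the sketch below shows the standard one-step DQN target they build on (a generic illustration, not the authors' code); Double DQN differs by letting the online network choose the next action, and Dueling DQN changes only the network architecture.

```python
import torch

# Generic one-step DQN target: y = r + gamma * (1 - done) * max_a' Q_target(s', a')
gamma = 0.99

def dqn_targets(rewards, next_q_target, done):
    # rewards, done: shape (batch,); next_q_target: (batch, n_actions) from the frozen target net
    return rewards + gamma * (1.0 - done) * next_q_target.max(dim=1).values

y = dqn_targets(torch.tensor([1.0, 0.5]),
                torch.tensor([[0.2, 0.9], [1.1, 0.3]]),
                torch.tensor([0.0, 1.0]))
# y = [1.0 + 0.99 * 0.9, 0.5] = [1.891, 0.5]
```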

22 pages, 3437 KB  
Article
A Soft Actor-Critic-Based Energy Management Strategy for Fuel Cell Vehicles Considering Fuel Cell Degradation
by Handong Zeng, Changqing Du and Yifeng Hu
Energies 2026, 19(2), 430; https://doi.org/10.3390/en19020430 - 15 Jan 2026
Viewed by 129
Abstract
Energy management strategies (EMSs) play a critical role in improving both the efficiency and durability of fuel cell electric vehicles (FCEVs). To overcome the limited adaptability and insufficient durability consideration of existing deep reinforcement learning-based EMSs, this study develops a degradation-aware energy management strategy based on the Soft Actor–Critic (SAC) algorithm. By leveraging SAC’s maximum-entropy framework, the proposed method enhances exploration efficiency and avoids premature convergence to operating patterns that are unfavorable to fuel cell durability. A reward function explicitly penalizing hydrogen consumption, power fluctuation, and degradation-related operating behaviors is designed, and the influences of reward weighting and key hyperparameters on learning stability and performance are systematically analyzed. The proposed SAC-based EMS is evaluated against Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) strategies under both training and unseen driving cycles. Simulation results demonstrate that SAC achieves a superior and robust trade-off between hydrogen economy and degradation mitigation, maintaining improved adaptability and durability under varying operating conditions. These findings indicate that integrating degradation awareness with entropy-regularized reinforcement learning provides an effective framework for practical EMS design in FCEVs. Full article
(This article belongs to the Section E: Electric Vehicles)
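
The reward design mentioned above (penalizing hydrogen consumption, power fluctuation, and degradation-related operating behavior) can be pictured as a simple weighted sum; the term names and weights below are purely illustrative assumptions, not the paper's values.

```python
# Illustrative multi-term EMS reward; w1-w3 and the degradation proxy are assumptions.
def ems_reward(h2_consumption_g, fc_power_delta_kw, degradation_proxy,
               w1=1.0, w2=0.1, w3=0.5):
    # Larger hydrogen use, larger fuel-cell power swings, and degradation-prone
    # operation all reduce the reward.
    return -(w1 * h2_consumption_g + w2 * abs(fc_power_delta_kw) + w3 * degradation_proxy)
```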

19 pages, 2822 KB  
Article
A New Framework for Job Shop Integrated Scheduling and Vehicle Path Planning Problem
by Ruiqi Li, Jianlin Mao, Xing Wu, Wenna Zhou, Chengze Qian and Haoshuang Du
Sensors 2026, 26(2), 543; https://doi.org/10.3390/s26020543 - 13 Jan 2026
Viewed by 160
Abstract
With the development of the manufacturing industry, traditional fixed-process production methods cannot adapt to changes in workshop operations or the demand for small batches and multiple orders. It is therefore necessary to introduce multiple robots to provide a more flexible production mode. Current formulations of the Job Shop Scheduling Problem with Transportation (JSP-T) typically consider only job scheduling and vehicle task allocation and do not address collision-free paths between vehicles. This article proposes a novel solution framework that integrates workshop scheduling, material-handling robot task allocation, and conflict-free path planning between robots. With the goal of minimizing the maximum completion time (makespan) including handling, this paper first establishes an extended JSP-T model that integrates handling time and robot paths, and provides the corresponding workshop layout map. Second, in the scheduling layer, an improved Deep Q-Network (DQN) method is used for dynamic scheduling to generate a feasible and optimized machining schedule. Subsequently, considering the robots' position information, the task sequence is assigned to the robot path-execution layer. Finally, at the path-execution layer, the Priority-Based Search (PBS) algorithm is applied to compute conflict-free paths for the handling robots, yielding an optimized makespan for all jobs under conflict-free transport. The experimental results show that, compared with algorithms such as PPO, the proposed scheduling algorithm improves makespan by 9.7%, and the PBS algorithm obtains optimized conflict-free paths for multiple handling robots. The framework handles scheduling, task allocation, and conflict-free path planning in a unified optimization process and thus adapts well to job changes and flexible manufacturing. Full article

19 pages, 6478 KB  
Article
An Intelligent Dynamic Cluster Partitioning and Regulation Strategy for Distribution Networks
by Keyan Liu, Kaiyuan He, Dongli Jia, Huiyu Zhan, Wanxing Sheng, Zukun Li, Yuxuan Huang, Sijia Hu and Yong Li
Energies 2026, 19(2), 384; https://doi.org/10.3390/en19020384 - 13 Jan 2026
Viewed by 181
Abstract
As distributed generators (DGs) and flexible adjustable loads (FALs) further penetrate distribution networks (DNs), DGs and FALs should be grouped into several clusters to reduce regulation complexity compared with traditional centralized control frameworks and to make their dispatch standard practice in the industry. To mitigate the negative influence of the spatiotemporal distribution and uncertain output characteristics of DGs and FALs on dispatch, this paper proposes an intelligent dynamic cluster partitioning strategy for DNs, through which the DN's resources and loads can be aggregated, organized, and regulated dynamically and optimally with relatively high implementation efficiency. An environmental model based on the Markov decision process (MDP) is first developed for DN cluster partitioning, in which a continuous state space, a discrete action space, and a dispatching performance-oriented reward are designed. Then, a novel random forest Q-learning network (RF-QN) is developed to implement dynamic cluster partitioning by interacting with the proposed environmental model; by combining deep learning and decision trees, the generalization capability and robustness of the Q-function estimate are improved. Finally, a modified IEEE 33-node system is adopted to verify the effectiveness of the proposed intelligent dynamic cluster partitioning and regulation strategy; the results also indicate that the proposed RF-QN is superior to the traditional deep Q-network (DQN) model in terms of renewable energy accommodation rate, training efficiency, and partitioning and regulation performance. Full article
(This article belongs to the Special Issue Advanced in Modeling, Analysis and Control of Microgrids)
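
The core idea of estimating the Q-function with tree ensembles can be sketched as fitted Q-iteration over a replay batch with a random forest regressor, as below; this is an illustrative reconstruction with synthetic data, not the paper's RF-QN implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fitted Q-iteration with a random forest as the Q-function approximator (sketch).
rng = np.random.default_rng(0)
n, state_dim, n_actions, gamma = 512, 6, 4, 0.95

# A replay batch of (state, action, reward, next_state) transitions (synthetic here).
s  = rng.normal(size=(n, state_dim))
a  = rng.integers(0, n_actions, size=n)
r  = rng.normal(size=n)
s2 = rng.normal(size=(n, state_dim))

q = RandomForestRegressor(n_estimators=100, random_state=0)
q.fit(np.hstack([s, a[:, None]]), r)          # initial fit on immediate rewards

for _ in range(5):                            # a few fitted-Q iterations
    q_next = np.max(
        [q.predict(np.hstack([s2, np.full((n, 1), b)])) for b in range(n_actions)],
        axis=0)
    target = r + gamma * q_next               # Bellman backup as regression target
    q.fit(np.hstack([s, a[:, None]]), target)
```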

19 pages, 14874 KB  
Article
Deep Q-Network for Maneuver Planning in Beyond-Visual-Range Aerial Pursuit–Evasion with Target Re-Engagement
by Long-Jun Zhu, Kevin W. Tong and Edmond Q. Wu
Aerospace 2026, 13(1), 77; https://doi.org/10.3390/aerospace13010077 - 11 Jan 2026
Viewed by 191
Abstract
Decision-making for maneuvering in the presence of long-range threats is crucial for enhancing the safety and reliability of autonomous aerial platforms operating in beyond-line-of-sight environments. This study employs the Deep Q-Network (DQN) method to investigate maneuvering strategies for simultaneously avoiding incoming high-speed threats and re-establishing tracking of a maneuvering target platform. First, kinematic models for the aerial platforms and the approaching interceptor are developed, and a DQN training environment is constructed based on these models. A DQN framework is then designed, integrating scenario-specific state representation, action space, and a hybrid reward structure to enable autonomous strategy learning without prior expert knowledge. The agent is trained within this environment to achieve near-optimal maneuvering decisions, with comparative evaluations against Q-learning and deep deterministic policy gradient (DDPG) baselines. Simulation results demonstrate that the trained model outperforms the baselines on key metrics by effectively avoiding approaching threats, re-establishing robust target tracking, reducing maneuver time, and exhibiting strong generalization across challenging scenarios. This work advances Beyond-Visual-Range (BVR) maneuver planning and provides a foundational methodological framework for future research on complex multi-stage aerial pursuit–evasion problems. Full article
(This article belongs to the Section Aeronautics)

17 pages, 459 KB  
Article
Adaptive Credit Card Fraud Detection: Reinforcement Learning Agents vs. Anomaly Detection Techniques
by Houda Ben Mekhlouf, Abdellatif Moussaid and Fadoua Ghanimi
FinTech 2026, 5(1), 9; https://doi.org/10.3390/fintech5010009 - 9 Jan 2026
Viewed by 302
Abstract
Credit card fraud detection remains a critical challenge for financial institutions, particularly due to extreme class imbalance and the continuously evolving nature of fraudulent behavior. This study investigates two complementary approaches: anomaly detection based on multivariate normal distribution and deep reinforcement learning using a Deep Q-Network. While anomaly detection effectively identifies deviations from normal transaction patterns, its static nature limits adaptability in real-time systems. In contrast, the DQN reinforcement learning model continuously learns from every transaction, autonomously adapting to emerging fraud strategies. Experimental results demonstrate that, although initial performance metrics of the DQN are modest compared to anomaly detection, its capacity for online learning and policy refinement enables long-term improvement and operational scalability. This work highlights reinforcement learning as a highly promising paradigm for dynamic, high-volume fraud detection, capable of evolving with the environment and achieving near-optimal detection rates over time. Full article
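
Of the two approaches compared, the multivariate-normal anomaly detector is the simpler one; a minimal sketch with synthetic data follows, where the threshold epsilon is an assumption that would normally be tuned on a labeled validation set.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of a multivariate-normal anomaly detector fitted on legitimate transactions.
rng = np.random.default_rng(1)
normal_tx = rng.normal(size=(10_000, 5))      # stand-in for legitimate transaction features

mu = normal_tx.mean(axis=0)
cov = np.cov(normal_tx, rowvar=False)
density = multivariate_normal(mean=mu, cov=cov)

def is_fraud(x, epsilon=1e-6):
    """Flag a transaction whose likelihood under the fitted model falls below epsilon."""
    return density.pdf(x) < epsilon
```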

21 pages, 988 KB  
Article
Study of Performance from Hierarchical Decision Modeling in IVAs Within a Greedy Context
by Francisco Federico Meza-Barrón, Nelson Rangel-Valdez, María Lucila Morales-Rodríguez, Claudia Guadalupe Gómez-Santillán, Juan Javier González-Barbosa, Guadalupe Castilla-Valdez, Nohra Violeta Gallardo-Rivas and Ana Guadalupe Vélez-Chong
Math. Comput. Appl. 2026, 31(1), 8; https://doi.org/10.3390/mca31010008 - 7 Jan 2026
Viewed by 336
Abstract
This study examines decision-making in intelligent virtual agents (IVAs) and formalizes the distinction between tactical decisions (individual actions) and strategic decisions (composed of sequences of tactical actions) using a mathematical model based on set theory and the Bellman equation. Although the equation itself is not modified, the analysis reveals that the discount factor (γ) influences the type of decision: low values favor tactical decisions, while high values favor strategic ones. The model was implemented and validated in a proof-of-concept simulated environment, namely the Snake Coin Change Problem (SCCP), using a Deep Q-Network (DQN) architecture, showing significant differences between agents with different decision profiles. These findings suggest that adjusting γ can serve as a useful mechanism to regulate both tactical and strategic decision-making processes in IVAs, thus offering a conceptual basis that could facilitate the design of more intelligent and adaptive agents in domains such as video games, and potentially in robotics and artificial intelligence as future research directions. Full article
(This article belongs to the Special Issue Numerical and Evolutionary Optimization 2025)
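
The claim that low discount factors favor tactical (immediate) decisions while high ones favor strategic (delayed) decisions follows directly from the discounted return; the toy computation below, with invented reward sequences, makes the effect concrete.

```python
# Toy illustration of how the discount factor gamma shifts preference between an
# immediate small reward and a delayed larger one; reward sequences are invented.

def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

tactical  = [1.0, 0.0, 0.0, 0.0]   # small reward now
strategic = [0.0, 0.0, 0.0, 3.0]   # larger reward later

for gamma in (0.3, 0.95):
    better = ("tactical" if discounted_return(tactical, gamma) >
              discounted_return(strategic, gamma) else "strategic")
    print(f"gamma={gamma}: prefers the {better} sequence")
# gamma=0.3 favors the immediate reward; gamma=0.95 favors the delayed one.
```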

22 pages, 3874 KB  
Article
Cloud-Edge Collaboration-Based Data Processing Method for Distribution Terminal Unit Edge Clusters
by Ruijiang Zeng, Zhiyong Li, Sifeng Li, Jiahao Zhang and Xiaomei Chen
Energies 2026, 19(1), 269; https://doi.org/10.3390/en19010269 - 4 Jan 2026
Viewed by 211
Abstract
Distribution terminal units (DTUs) play critical roles in the smart grid, supporting data acquisition, remote monitoring, and fault management. A single DTU generates continuous data streams, imposing new challenges on data processing. To tackle these issues, a cloud-edge collaboration-based data processing method is introduced for DTU edge clusters. First, considering the load imbalance degree of DTU data queues, a cloud-edge integrated data processing architecture is designed. It optimizes edge server selection, the offloading splitting ratio, and edge-cloud computing resource allocation in a collaboration mechanism. Second, an optimization problem is formulated to maximize the weighted difference between the total data processing volume and the load imbalance degree. Next, a cloud-edge collaboration-based data processing method is proposed. In the first stage, cloud-edge collaborative data offloading based on the load imbalance degree is performed, and a data-volume-aware deep Q-network (DQN) is developed. A penalty function based on load fluctuations and the data volume deficit is incorporated; it drives the DQN to suppress fluctuations of the load imbalance degree while ensuring differentiated long-term data volume constraints. In the second stage, cloud-edge computing resource allocation based on adaptive differential evolution is designed. An adaptive mutation scaling factor is introduced to overcome the gene overlapping issues of traditional heuristic approaches, enabling deeper exploration of the solution space and accelerating global optimum identification. Finally, the simulation results demonstrate that the proposed method effectively improves the data processing efficiency of DTUs while reducing the load imbalance degree. Full article
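
The adaptive differential-evolution step mentioned for the resource-allocation stage can be sketched as a mutation whose scaling factor shrinks over generations; the annealing rule, bounds, and population size below are assumptions for illustration only.

```python
import numpy as np

# Sketch of one differential-evolution mutation step with an adaptive scaling factor.
rng = np.random.default_rng(3)
pop = rng.uniform(0, 1, size=(20, 5))          # 20 candidate resource allocations

def adaptive_F(gen, max_gen, f_max=0.9, f_min=0.4):
    """Anneal the mutation scaling factor from f_max to f_min over the run (assumed rule)."""
    return f_max - (f_max - f_min) * gen / max_gen

def mutate(pop, gen, max_gen):
    i, j, k = rng.choice(len(pop), size=3, replace=False)
    F = adaptive_F(gen, max_gen)
    return np.clip(pop[i] + F * (pop[j] - pop[k]), 0.0, 1.0)

mutant = mutate(pop, gen=10, max_gen=100)
```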

21 pages, 1330 KB  
Article
A Clustering and Reinforcement Learning-Based Handover Strategy for LEO Satellite Networks in Power IoT Scenarios
by Jin Shao, Weidong Gao, Kuixing Liu, Rantong Qiao, Haizhi Yu, Kaisa Zhang, Xu Zhao and Junbao Duan
Electronics 2026, 15(1), 174; https://doi.org/10.3390/electronics15010174 - 30 Dec 2025
Viewed by 263
Abstract
Communication infrastructure in remote areas struggles to deliver stable, high-quality services for power systems. Low Earth Orbit (LEO) satellite networks offer an effective solution through their low latency and extensive coverage. Nevertheless, the high orbital velocity of LEO satellites combined with massive user access frequently leads to signaling congestion and degradation of service quality. To address these challenges, this paper proposes a LEO satellite handover strategy based on Quality of Service (QoS)-constrained K-Means clustering and Deep Q-Network (DQN) learning. The proposed framework first partitions users into groups via the K-Means algorithm and then imposes an intra-group QoS fairness constraint to refine clustering and designate a cluster head for each group. These cluster heads act as proxies that execute unified DQN-driven handover decisions on behalf of all group members, thereby enabling coordinated multi-user handover. Simulation results demonstrate that, compared with conventional handover schemes, the proposed strategy achieves an optimal balance between performance and signaling overhead, significantly enhances system scalability while ensuring long-term QoS gains, and provides an efficient solution for mobility management in future large-scale LEO satellite networks. Full article
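
A minimal sketch of the clustering step, assuming users are grouped by K-Means and each group's head is the user closest to its centroid, is shown below; the QoS-fairness refinement and the DQN handover policy itself are omitted, and the positions and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Group users and pick a cluster head per group (closest user to each centroid).
rng = np.random.default_rng(2)
user_positions = rng.uniform(0, 100, size=(200, 2))   # synthetic user coordinates

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(user_positions)
heads = [int(np.argmin(np.linalg.norm(user_positions - c, axis=1)))
         for c in km.cluster_centers_]
# Each head would then execute the DQN-driven handover decision for its group.
```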

35 pages, 3269 KB  
Article
Multi-Head Attention DQN and Dynamic Priority for Path Planning of Unmanned Aerial Vehicles Oriented to Penetration
by Liuyu Cheng and Wei Shang
Electronics 2026, 15(1), 167; https://doi.org/10.3390/electronics15010167 - 29 Dec 2025
Viewed by 278
Abstract
Unmanned aerial vehicle (UAV) penetration missions in hostile environments face significant challenges due to dense threat coverage, dynamic defense systems, and the need for real-time decision-making under uncertainty. Traditional path planning methods suffer from computational intractability in high-dimensional spaces, while existing deep reinforcement learning approaches lack efficient feature extraction and sample utilization mechanisms for threat-dense scenarios. To address these limitations, this paper presents an enhanced Deep Q-Network (DQN) framework integrating multi-head attention mechanisms with dynamic priority experience replay for autonomous UAV path planning. The proposed architecture employs four specialized attention heads operating in parallel to extract proximity, danger, alignment, and threat density features, enabling selective focus on critical environmental aspects. A dynamic priority mechanism adaptively adjusts sampling strategies during training, prioritizing informative experiences in early exploration while maintaining balanced learning in later stages. Experimental results demonstrate that the proposed method achieves 94.3% mission success rate in complex penetration scenarios, representing 7.1–17.5% improvement over state-of-the-art baselines with 2.2× faster convergence. The approach shows superior robustness in high-threat environments and meets real-time operational requirements with 18.3 ms inference latency, demonstrating its practical viability for autonomous UAV penetration missions. Full article
(This article belongs to the Section Artificial Intelligence)
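
A compact sketch of a Q-network that pools a 4-head self-attention block over per-threat features, loosely in the spirit of the architecture described, is given below; all dimensions and the mean-pooling choice are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

# Q-network with a 4-head self-attention block over threat features (illustrative).
class AttentionDQN(nn.Module):
    def __init__(self, feat_dim=32, n_actions=9, n_threats=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=4,
                                          batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, threat_feats):            # (batch, n_threats, feat_dim)
        ctx, _ = self.attn(threat_feats, threat_feats, threat_feats)
        return self.head(ctx.mean(dim=1))       # (batch, n_actions) Q-values

q_net = AttentionDQN()
q_values = q_net(torch.randn(4, 10, 32))        # Q-values for a batch of 4 states
```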

27 pages, 4812 KB  
Article
Development of an Initial Burial Rate Estimation Simulator for Bottom-Contact Mines and a Reinforcement Learning-Based Mine-Laying Route Optimization Method
by Su Hwan Kim, Young Seo Park and Se Won Kim
J. Mar. Sci. Eng. 2026, 14(1), 51; https://doi.org/10.3390/jmse14010051 - 26 Dec 2025
Viewed by 228
Abstract
In modern naval operations, the strategic value of naval mines has been increasingly emphasized, highlighting the need for intelligent and efficient deployment strategies. This study proposes an integrated framework that combines mine burial rate estimation with reinforcement learning-based optimization to generate mine-laying routes that maximize burial effectiveness. An initial burial rate estimation simulator was developed using environmental factors such as sediment bulk density and shear strength estimated from sediment type and mean grain size to predict the burial rates of bottom-contact mines. The simulator was integrated into two reinforcement learning frameworks, Deep Q-Network (DQN) and proximal policy optimization (PPO). The reinforcement learning methods were trained to autonomously explore the environment and generate routes that strategically utilize high-burial-rate regions while satisfying navigational constraints. Experimental results demonstrate that the reinforcement learning methods consistently generated routes with higher average burial rates while requiring significantly shorter computation time compared with the A* algorithm. These findings suggest that reinforcement learning, when coupled with environmental modeling, provides a practical and scalable strategy for improving the effectiveness, concealment, and autonomy of naval mine-laying operations. Full article
(This article belongs to the Special Issue Advanced Research on Path Planning for Intelligent Ships)