Search Results (965)

Search Parameters:
Keywords = Deep Q-Network

20 pages, 1369 KB  
Article
Symmetry-Aware Interpretable Anomaly Alarm Optimization Method for Power Monitoring Systems Based on Hierarchical Attention Deep Reinforcement Learning
by Zepeng Hou, Qiang Fu, Weixun Li, Yao Wang, Zhengkun Dong, Xianlin Ye, Xiaoyu Chen and Fangyu Zhang
Symmetry 2026, 18(2), 216; https://doi.org/10.3390/sym18020216 - 23 Jan 2026
Viewed by 223
Abstract
With the rapid advancement of smart grids driven by renewable energy integration and the extensive deployment of supervisory control and data acquisition (SCADA) and phasor measurement units (PMUs), addressing the escalating alarm flooding via intelligent analysis of large-scale alarm data is pivotal to safeguarding the safe and stable operation of power grids. To tackle these challenges, this study introduces a pioneering alarm optimization framework based on symmetry-driven crowdsourced active learning and interpretable deep reinforcement learning (DRL). Firstly, an anomaly alarm annotation method integrating differentiated crowdsourcing and active learning is proposed to mitigate the inherent asymmetry in data distribution. Secondly, a symmetrically structured DRL-based hierarchical attention deep Q-network is designed with a dual-path encoder to balance the processing of multi-scale alarm features. Finally, a SHAP-driven interpretability framework is established, providing global and local attribution to enhance decision transparency. Experimental results on a real-world power alarm dataset demonstrate that the proposed method achieves a Fleiss’ Kappa of 0.82 in annotation consistency and an F1-Score of 0.95 in detection performance, significantly outperforming state-of-the-art baselines. Additionally, the false positive rate is reduced to 0.04, verifying the framework’s effectiveness in suppressing alarm flooding while maintaining high recall.
(This article belongs to the Special Issue Symmetry and Asymmetry in Data Analysis)
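The dual-path encoder named in this abstract can be pictured with a minimal sketch. The PyTorch block below pairs a per-alarm local path with a pooled global path, fuses them with multi-head attention, and emits Q-values; the layer sizes, mean-pooling, and fusion scheme are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of a dual-path attention Q-network (assumed layout):
# one path encodes each alarm locally, the other pools a global context,
# and multi-head attention fuses them before the Q-value head.
import torch
import torch.nn as nn

class DualPathAttentionQNet(nn.Module):
    def __init__(self, feat_dim: int, n_actions: int, d_model: int = 64):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(feat_dim, d_model), nn.ReLU())
        self.glob = nn.Sequential(nn.Linear(feat_dim, d_model), nn.ReLU())
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, x):                    # x: (batch, n_alarms, feat_dim)
        local = self.local(x)                # per-alarm (local) features
        glob = self.glob(x).mean(dim=1, keepdim=True)  # pooled global context
        fused, _ = self.attn(glob, local, local)       # attend over alarms
        return self.head(fused.squeeze(1))   # Q-value per action
```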

27 pages, 5704 KB  
Article
Intent-Aware Collision Avoidance for UAVs in High-Density Non-Cooperative Environments Using Deep Reinforcement Learning
by Xuchuan Liu, Yuan Zheng, Chenglong Li, Bo Jiang and Wenyong Gu
Aerospace 2026, 13(2), 111; https://doi.org/10.3390/aerospace13020111 - 23 Jan 2026
Viewed by 110
Abstract
Collision avoidance between unmanned aerial vehicles (UAVs) and non-cooperative targets (e.g., off-nominal operations or birds) presents significant challenges in urban air mobility (UAM). This difficulty arises due to the highly dynamic and unpredictable flight intentions of these targets. Traditional collision-avoidance methods primarily focus on cooperative targets or non-cooperative ones with fixed behavior, rendering them ineffective when dealing with highly unpredictable flight patterns. To address this, we introduce a deep reinforcement learning-based collision-avoidance approach leveraging global and local intent prediction. Specifically, we propose a Global and Local Perception Prediction Module (GLPPM) that combines a state-space-based global intent association mechanism with a local feature extraction module, enabling accurate prediction of short- and long-term flight intents. Additionally, we propose a Fusion Sector Flight Control Module (FSFCM) that is trained with a Dueling Double Deep Q-Network (D3QN). The module integrates both predicted future and current intents into the state space and employs a specifically designed reward function, thereby ensuring safe UAV operations. Experimental results demonstrate that the proposed method significantly improves mission success rates in high-density environments, with up to 80 non-cooperative targets per square kilometer. In 1000 flight tests, the mission success rate is 15.2 percentage points higher than that of the baseline D3QN. Furthermore, the approach retains an 88.1% success rate even under extreme target densities of 120 targets per square kilometer. Finally, interpretability analysis via Deep SHAP further verifies the decision-making rationality of the algorithm.
(This article belongs to the Section Aeronautics)
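The FSFCM is trained with a Dueling Double Deep Q-Network (D3QN); a minimal sketch of its two defining pieces follows: the dueling value/advantage aggregation and the double-DQN target. Network sizes and the batch interface are generic assumptions.

```python
# Sketch of the D3QN ingredients: a dueling head and the double-DQN target.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)        # state value V(s)
        self.adv = nn.Linear(128, n_actions)  # advantages A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        a = self.adv(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a): dueling aggregation
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online, target, reward, next_obs, done, gamma=0.99):
    # Online net selects the next action; target net evaluates it.
    with torch.no_grad():
        best = online(next_obs).argmax(dim=-1, keepdim=True)
        q_next = target(next_obs).gather(-1, best).squeeze(-1)
    return reward + gamma * (1.0 - done) * q_next
```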

33 pages, 3714 KB  
Article
SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks
by Hilary Kelechi Anabi, Samuel Frimpong and Sanjay Madria
Sensors 2026, 26(2), 730; https://doi.org/10.3390/s26020730 - 21 Jan 2026
Viewed by 86
Abstract
The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy, to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, while also formulating a SADQN approach with dual-variable updates to mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP-hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: increasing cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieving substantial gains over random beamforming and non-beamforming approaches. The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount.
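The dual-variable updates mentioned above follow the usual Lagrangian pattern: constraint costs are folded into the reward through multipliers that rise while a constraint is violated and relax otherwise. The constraint set, thresholds, and step size below are illustrative, not the paper's values.

```python
# Sketch of Lagrangian-relaxed reward shaping for a constrained MDP.
import numpy as np

lambdas = np.zeros(4)   # duals: fairness, min-energy, duty cycle, uplink use
limits = np.array([0.90, 0.05, 0.10, 0.80])  # per-constraint limits (assumed)
eta = 1e-3              # dual-ascent step size

def shaped_reward(reward: float, costs: np.ndarray) -> float:
    # Penalized reward consumed by an otherwise standard DQN update.
    return reward - float(lambdas @ costs)

def dual_update(avg_costs: np.ndarray) -> None:
    # Projected gradient ascent: raise a multiplier when its constraint
    # is violated on average, clip at zero when it is satisfied.
    global lambdas
    lambdas = np.maximum(0.0, lambdas + eta * (avg_costs - limits))
```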

17 pages, 1555 KB  
Article
Path Planning in Sparse Reward Environments: A DQN Approach with Adaptive Reward Shaping and Curriculum Learning
by Hongyi Yang, Bo Cai and Yunlong Li
Algorithms 2026, 19(1), 89; https://doi.org/10.3390/a19010089 - 21 Jan 2026
Viewed by 189
Abstract
Deep reinforcement learning (DRL) has shown great potential in path planning tasks. However, in sparse reward environments, DRL still faces significant challenges such as low training efficiency and a tendency to converge to suboptimal policies. Traditional reward shaping methods can partially alleviate these issues, but they typically rely on hand-crafted designs, which often introduce complex reward coupling, make hyperparameter tuning difficult, and limit generalization capability. To address these challenges, this paper proposes Curriculum-guided Learning with Adaptive Reward Shaping for Deep Q-Network (CLARS-DQN), a path planning algorithm that integrates Adaptive Reward Shaping (ARS) and Curriculum Learning (CL). The algorithm consists of two key components: (1) ARS-DQN, which augments the DQN framework with a learnable intrinsic reward function to reduce reward sparsity and dependence on expert knowledge; and (2) a curriculum strategy that guides policy optimization through a staged training process, progressing from simple to complex tasks to enhance generalization. Training also incorporates Prioritized Experience Replay (PER) to improve sample efficiency and training stability. CLARS-DQN outperforms baseline methods in task success rate, path quality, training efficiency, and hyperparameter robustness. In unseen environments, the method improves task success rate and average path length by 12% and 26%, respectively, demonstrating strong generalization. Ablation studies confirm the critical contribution of each module.
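A compact way to picture the ARS component: a small learnable network adds a bounded intrinsic bonus to the sparse extrinsic reward. The shaper's architecture, the Tanh bounding, and the mixing weight are assumptions, not the paper's exact design.

```python
# Sketch of a learnable intrinsic reward added to a sparse extrinsic one.
import torch
import torch.nn as nn

class IntrinsicReward(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Tanh())

    def forward(self, obs):
        return self.net(obs).squeeze(-1)   # bounded intrinsic bonus

def total_reward(r_ext, obs, shaper: IntrinsicReward, beta: float = 0.1):
    # Shaped reward used by the DQN update; beta scales the learned bonus.
    return r_ext + beta * shaper(obs)
```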

25 pages, 4648 KB  
Systematic Review
Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review
by Lekhetho Joseph Mpoporo, Pius Adewale Owolawi and Chunling Tu
Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048 - 20 Jan 2026
Viewed by 122
Abstract
Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while remaining effective in classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently surfaced as a viable substitute, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDS). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the research fields were identified using a bibliometric analysis. The SLR was used to synthesize methodological techniques, datasets, and performance analyses. The results indicate that DRL algorithms such as the deep Q-network (DQN), double DQN (DDQN), dueling double DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized for enhancing IDS performance across various applications and datasets. The results highlight the increasing significance of DRL-based solutions for developing intelligent and robust intrusion detection systems and advancing cybersecurity.
(This article belongs to the Special Issue Advances in Cyber Security)

24 pages, 3185 KB  
Article
A Hybrid Optimization Approach for Multi-Generation Intelligent Breeding Decisions
by Mingxiang Yang, Ziyu Li, Jiahao Li, Bingling Huang, Xiaohui Niu, Xin Lu and Xiaoxia Li
Information 2026, 17(1), 106; https://doi.org/10.3390/info17010106 - 20 Jan 2026
Viewed by 147
Abstract
Multi-generation intelligent breeding (MGIB) decision-making is a technique used by plant breeders to select mating individuals to produce new generations and allocate resources for each generation. However, existing research remains scarce on dynamic optimization of resources under limited budget and time constraints. Inspired by advances in reinforcement learning (RL), a framework that integrates evolutionary algorithms with deep RL was proposed to fill this gap. The framework combines two modules: the Improved Look-Ahead Selection (ILAS) module and the Deep Q-Network (DQN) module. The former employs a simulated annealing-enhanced estimation of distribution algorithm to make mating decisions. Based on the selected mating individuals, the latter module learns multi-generation resource allocation policies using DQN. To evaluate our framework, numerical experiments were conducted on two realistic breeding datasets, i.e., Corn2019 and CUBIC. ILAS outperformed LAS on Corn2019, increasing the maximum and mean population Genomic Estimated Breeding Value (GEBV) by 9.1% and 7.7%, respectively. ILAS-DQN consistently outperformed the baseline methods, achieving significant and practical improvements in both top-performing and elite-average GEBVs across two independent datasets. The results demonstrated that our method outperforms traditional baselines in both generalization and effectiveness for complex agricultural problems with delayed rewards.
(This article belongs to the Section Artificial Intelligence)
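The simulated annealing enhancement can be pictured by its acceptance rule: improving mating plans are always kept, while worse ones are accepted with a probability that decays as the temperature cools. The fitness signal (e.g., predicted GEBV gain) is an assumed stand-in for the paper's criterion.

```python
# Sketch of a simulated-annealing acceptance rule for mating decisions.
import math
import random

def sa_accept(fit_new: float, fit_old: float, temperature: float) -> bool:
    # Always accept improvements; otherwise accept with Boltzmann probability
    # exp(delta / T), which shrinks as the temperature T is annealed toward 0.
    if fit_new >= fit_old:
        return True
    return random.random() < math.exp((fit_new - fit_old) / temperature)
```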

17 pages, 1621 KB  
Article
Reinforcement Learning-Based Optimization of Environmental Control Systems in Battery Energy Storage Rooms
by So-Yeon Park, Deun-Chan Kim and Jun-Ho Bang
Energies 2026, 19(2), 516; https://doi.org/10.3390/en19020516 - 20 Jan 2026
Viewed by 132
Abstract
This study proposes a reinforcement learning (RL)-based optimization framework for the environmental control system of battery rooms in Energy Storage Systems (ESS). Conventional rule-based air-conditioning strategies are unable to adapt to real-time temperature and humidity fluctuations, often leading to excessive energy consumption or insufficient thermal protection. To overcome these limitations, both value-based (DQN, Double DQN, Dueling DQN) and policy-based (Policy Gradient, PPO, TRPO) RL algorithms are implemented and systematically compared. The algorithms are trained and evaluated using one year of real ESS operational data and corresponding meteorological data sampled at 15-min intervals. Performance is assessed in terms of convergence speed, learning stability, and cooling-energy consumption. The experimental results show that the DQN algorithm reduces time-averaged cooling power consumption by 46.5% compared to conventional rule-based control, while maintaining temperature, humidity, and dew-point constraint violation rates below 1% throughout the testing period. Among the policy-based methods, the Policy Gradient algorithm demonstrates competitive energy-saving performance but requires longer training time and exhibits higher reward variance. These findings confirm that RL-based control can effectively adapt to dynamic environmental conditions, thereby improving both energy efficiency and operational safety in ESS battery rooms. The proposed framework offers a practical and scalable solution for intelligent thermal management in ESS facilities.
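All three value-based controllers compared here share one core gradient step; a generic PyTorch sketch is below, with the replay-batch interface, Huber loss, and target network treated as conventional assumptions rather than the paper's exact configuration.

```python
# Generic single DQN update step over a replay batch.
import torch
import torch.nn.functional as F

def dqn_step(online, target, optimizer, batch, gamma: float = 0.99):
    obs, act, rew, next_obs, done = batch      # act: LongTensor of action ids
    q = online(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(next_obs).max(dim=1).values
        y = rew + gamma * (1.0 - done) * q_next   # bootstrap target
    loss = F.smooth_l1_loss(q, y)              # Huber loss for stability
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```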

14 pages, 2906 KB  
Proceeding Paper
Onboard Deep Reinforcement Learning: Deployment and Testing for CubeSat Attitude Control
by Sajjad Zahedi, Jafar Roshanian, Mehran Mirshams and Krasin Georgiev
Eng. Proc. 2026, 121(1), 26; https://doi.org/10.3390/engproc2025121026 - 20 Jan 2026
Viewed by 90
Abstract
Recent progress in Reinforcement Learning (RL), especially deep RL, has created new possibilities for autonomous control in complex and uncertain environments. This study explores these possibilities through a practical approach, implementing an RL agent on a custom-built CubeSat. The CubeSat, equipped with a reaction wheel for active attitude control, serves as a physical testbed for validating RL-based strategies. To mimic space-like conditions, the CubeSat was placed on a custom air-bearing platform that allows near-frictionless rotation along a single axis, simulating microgravity. Unlike simulation-only research, this work showcases real-time hardware-level implementation of a Double Deep Q-Network (DDQN) controller. The DDQN agent receives real system state data and outputs control commands to orient the CubeSat via its reaction wheel. For comparison, a traditional PID controller was also tested under identical conditions. Both controllers were evaluated based on response time, accuracy, and resilience to disturbances. The DDQN outperformed the PID, showing better adaptability and control. This research demonstrates the successful integration of RL into real aerospace hardware, bridging the gap between theoretical algorithms and practical space applications through a hands-on CubeSat platform.

20 pages, 390 KB  
Systematic Review
Systematic Review of Quantization-Optimized Lightweight Transformer Architectures for Real-Time Fruit Ripeness Detection on Edge Devices
by Donny Maulana and R Kanesaraj Ramasamy
Computers 2026, 15(1), 69; https://doi.org/10.3390/computers15010069 - 19 Jan 2026
Viewed by 353
Abstract
Real-time visual inference on resource-constrained hardware remains a core challenge for edge computing and embedded artificial intelligence systems. Recent deep learning architectures, particularly Vision Transformers (ViTs) and Detection Transformers (DETRs), achieve high detection accuracy but impose substantial computational and memory demands that limit their deployment on low-power edge platforms such as NVIDIA Jetson and Raspberry Pi devices. This paper presents a systematic review of model compression and optimization strategies—specifically quantization, pruning, and knowledge distillation—applied to lightweight object detection architectures for edge deployment. Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, peer-reviewed studies were analyzed from Scopus, IEEE Xplore, and ScienceDirect to examine the evolution of efficient detectors from convolutional neural networks to transformer-based models. The synthesis highlights a growing focus on real-time transformer variants, including Real-Time DETR (RT-DETR) and low-bit quantized approaches such as Q-DETR, alongside optimized YOLO-based architectures. While quantization enables substantial theoretical acceleration (e.g., up to 16× operation reduction), aggressive low-bit precision introduces accuracy degradation, particularly in transformer attention mechanisms, highlighting a critical efficiency–accuracy tradeoff. The review further shows that Quantization-Aware Training (QAT) consistently outperforms Post-Training Quantization (PTQ) in preserving performance under low-precision constraints. Finally, this review identifies critical open research challenges, emphasizing the efficiency–accuracy tradeoff and the high computational demands imposed by Transformer architectures. Future directions are proposed, including hardware-aware optimization, robustness to imbalanced datasets, and multimodal sensing integration, to ensure reliable real-time inference in practical agricultural edge computing environments.
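The PTQ baseline that the review contrasts with QAT reduces to mapping float weights onto a low-bit integer grid with a scale factor. A minimal symmetric int8 sketch follows; production toolchains add zero-points, per-channel scales, and calibration data, none of which are shown here.

```python
# Minimal symmetric per-tensor int8 post-training quantization.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0   # avoid zero scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # recover an approximation via q.astype(np.float32) * scale
```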

28 pages, 11626 KB  
Article
A Dynamic Illumination-Constrained Spatio-Temporal A* Algorithm for Path Planning in Lunar South Pole Exploration
by Qingliang Miao and Guangfei Wei
Remote Sens. 2026, 18(2), 310; https://doi.org/10.3390/rs18020310 - 16 Jan 2026
Viewed by 152
Abstract
Future lunar south pole missions face dual challenges of highly variable illumination and rugged terrain that directly constrain rover mobility and energy sustainability. To address these issues, this study proposes a dynamic illumination-constrained spatio-temporal A* (DIC3D-A*) path-planning algorithm that jointly optimizes terrain safety and illumination continuity in polar environments. Using high-resolution digital elevation model data from the Lunar Reconnaissance Orbiter Laser Altimeter, a 1300 m × 1300 m terrain model with 5 m/pixel spatial resolution was constructed. Hourly solar visibility for November–December 2026 was computed based on planetary ephemerides to generate a dynamic illumination dataset. The algorithm integrates slope, distance, and illumination into a unified heuristic cost function, performing a time-dependent search in a 3D spatiotemporal state space. Simulation results show that, compared with conventional A* algorithms considering only terrain or distance, the DIC3D-A* algorithm improves CSDV by 106.1% and 115.1%, respectively. Moreover, relative to illumination-based A* algorithms, it reduces the average terrain roughness index by 17.2%, while achieving shorter path length and faster computation than both the Rapidly-exploring Random Tree Star and Deep Q-Network baselines. These results demonstrate that dynamic illumination is the dominant environmental factor affecting lunar polar rover traversal and that DIC3D-A* provides an efficient, energy-aware framework for illumination-adaptive navigation in upcoming missions such as Chang’E-7.
(This article belongs to the Special Issue Remote Sensing and Photogrammetry Applied to Deep Space Exploration)
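The unified heuristic cost can be illustrated as a single edge cost over the spatio-temporal grid, combining distance, slope, and an hourly illumination mask. The weights and array layout are assumptions; only the 5 m/pixel step matches the stated resolution.

```python
# Sketch of a distance/slope/illumination edge cost for a time-expanded A*.
import numpy as np

def edge_cost(dem, lit, x0, y0, x1, y1, t,
              w_slope: float = 2.0, w_illum: float = 5.0) -> float:
    step = float(np.hypot(x1 - x0, y1 - y0)) * 5.0   # metres, 5 m/pixel DEM
    slope = abs(dem[y1, x1] - dem[y0, x0]) / max(step, 1e-6)
    shadow = 0.0 if lit[t, y1, x1] else 1.0          # hourly illumination mask
    return step * (1.0 + w_slope * slope + w_illum * shadow)
```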

16 pages, 1725 KB  
Article
A Reinforcement Learning-Based Link State Optimization for Handover and Link Duration Performance Enhancement in Low Earth Orbit Satellite Networks
by Sihwa Jin, Doyeon Park, Sieun Kim, Jinho Lee and Inwhee Joe
Electronics 2026, 15(2), 398; https://doi.org/10.3390/electronics15020398 - 16 Jan 2026
Viewed by 214
Abstract
This study proposes a reinforcement learning-based link selection method for Low Earth Orbit satellite networks, aiming to reduce handover frequency while extending link duration under highly dynamic orbital environments. The proposed approach relies solely on basic satellite positional information, namely latitude, longitude, and altitude, to construct compact state representations without requiring complex sensing or prediction mechanisms. Using relative satellite and terminal geometry, each state is represented as a vector consisting of azimuth, elevation, range, and direction difference. To validate the feasibility of policy learning under realistic conditions, a total of 871,105 orbit-based data samples were generated through simulations of 300 LEO satellite orbits. The reinforcement learning environment was implemented using the OpenAI Gym framework, in which an agent selects an optimal communication target from a prefiltered set of candidate satellites at each time step. Three reinforcement learning algorithms, namely SARSA, Q-Learning, and Deep Q-Network, were evaluated under identical experimental conditions. Performance was assessed in terms of smoothed total reward per episode, average handover count, and average link duration. The results show that the Deep Q-Network-based approach achieves approximately 77.4% fewer handovers than SARSA and 49.9% fewer than Q-Learning, while providing the longest average link duration. These findings demonstrate that effective handover control can be achieved using lightweight state information and indicate the potential of deep reinforcement learning for future LEO satellite communication systems.
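The compact state is easy to make concrete: each candidate satellite reduces to a four-element vector of azimuth, elevation, range, and direction difference relative to the terminal. The local-ENU geometry below is a simplified illustration, not the paper's exact transform.

```python
# Sketch of the 4-D state vector built from relative satellite geometry.
import numpy as np

def state_vector(sat_enu, sat_vel_enu, term_heading: float) -> np.ndarray:
    e, n, u = sat_enu                     # satellite position, local ENU frame
    rng = float(np.linalg.norm(sat_enu))
    azimuth = np.arctan2(e, n)
    elevation = np.arcsin(u / rng)
    sat_heading = np.arctan2(sat_vel_enu[0], sat_vel_enu[1])
    # Wrap the heading difference into (-pi, pi].
    ddiff = np.arctan2(np.sin(sat_heading - term_heading),
                       np.cos(sat_heading - term_heading))
    return np.array([azimuth, elevation, rng, ddiff], dtype=np.float32)
```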

22 pages, 3437 KB  
Article
A Soft Actor-Critic-Based Energy Management Strategy for Fuel Cell Vehicles Considering Fuel Cell Degradation
by Handong Zeng, Changqing Du and Yifeng Hu
Energies 2026, 19(2), 430; https://doi.org/10.3390/en19020430 - 15 Jan 2026
Viewed by 128
Abstract
Energy management strategies (EMSs) play a critical role in improving both the efficiency and durability of fuel cell electric vehicles (FCEVs). To overcome the limited adaptability and insufficient durability consideration of existing deep reinforcement learning-based EMSs, this study develops a degradation-aware energy management strategy based on the Soft Actor–Critic (SAC) algorithm. By leveraging SAC’s maximum-entropy framework, the proposed method enhances exploration efficiency and avoids premature convergence to operating patterns that are unfavorable to fuel cell durability. A reward function explicitly penalizing hydrogen consumption, power fluctuation, and degradation-related operating behaviors is designed, and the influences of reward weighting and key hyperparameters on learning stability and performance are systematically analyzed. The proposed SAC-based EMS is evaluated against Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) strategies under both training and unseen driving cycles. Simulation results demonstrate that SAC achieves a superior and robust trade-off between hydrogen economy and degradation mitigation, maintaining improved adaptability and durability under varying operating conditions. These findings indicate that integrating degradation awareness with entropy-regularized reinforcement learning provides an effective framework for practical EMS design in FCEVs.
(This article belongs to the Section E: Electric Vehicles)
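The reward structure described, with penalties on hydrogen use, power fluctuation, and degradation-prone operating points, can be sketched in a few lines; all weights and the low/high-power thresholds are illustrative assumptions, not the paper's values.

```python
# Sketch of a degradation-aware EMS reward for a fuel cell stack.
def ems_reward(h2_kg: float, p_fc: float, p_fc_prev: float, p_rated: float,
               w_h2: float = 1.0, w_fluct: float = 0.05, w_deg: float = 0.2):
    fluct = abs(p_fc - p_fc_prev)                 # load-change penalty
    # Penalize idling and near-rated operation, both degradation drivers.
    deg = float(p_fc < 0.05 * p_rated) + float(p_fc > 0.90 * p_rated)
    return -(w_h2 * h2_kg + w_fluct * fluct + w_deg * deg)
```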

19 pages, 2822 KB  
Article
A New Framework for Job Shop Integrated Scheduling and Vehicle Path Planning Problem
by Ruiqi Li, Jianlin Mao, Xing Wu, Wenna Zhou, Chengze Qian and Haoshuang Du
Sensors 2026, 26(2), 543; https://doi.org/10.3390/s26020543 - 13 Jan 2026
Viewed by 154
Abstract
With the development of the manufacturing industry, traditional fixed-process machining methods cannot adapt to changes in workshop operations or the demand for small batches and multiple orders. It is therefore necessary to introduce multiple robots to enable a more flexible production mode. Existing work on Job Shop Scheduling Problems with Transportation (JSP-T) often considers only job scheduling and vehicle task allocation, neglecting collision-free paths between vehicles. This article proposes a novel solution framework that integrates workshop scheduling, material-handling robot task allocation, and conflict-free path planning between robots. With the goal of minimizing the maximum completion time (Makespan), including handling, this paper first establishes an extended JSP-T model that integrates handling time and robot paths, and provides the corresponding workshop layout map. Secondly, at the scheduling layer, an improved Deep Q-Network (DQN) method is used for dynamic scheduling to generate a feasible, optimized machining schedule. Subsequently, considering the robots' position information, the task sequence is assigned to the robot path-execution layer. Finally, at the path-execution layer, the Priority-Based Search (PBS) algorithm is applied to solve conflict-free paths for the handling robots, yielding an optimized Makespan for all jobs under conflict-free handling. The experimental results show that, compared with algorithms such as PPO, the proposed scheduling algorithm improves Makespan by 9.7%, and the PBS algorithm obtains optimized, conflict-free paths for multiple handling robots. The framework handles scheduling, task allocation, and conflict-free path planning in a unified optimization process, adapting well to job changes and supporting flexible manufacturing.

19 pages, 6478 KB  
Article
An Intelligent Dynamic Cluster Partitioning and Regulation Strategy for Distribution Networks
by Keyan Liu, Kaiyuan He, Dongli Jia, Huiyu Zhan, Wanxing Sheng, Zukun Li, Yuxuan Huang, Sijia Hu and Yong Li
Energies 2026, 19(2), 384; https://doi.org/10.3390/en19020384 - 13 Jan 2026
Viewed by 178
Abstract
As distributed generators (DGs) and flexible adjustable loads (FALs) further penetrate distribution networks (DNs), DGs and FALs should be packed into several clusters to reduce regulation complexity relative to traditional centralized control frameworks and to enable their dispatch to become standard practice in the industry. To mitigate the negative influence of DGs' and FALs' spatiotemporal distribution and uncertain output characteristics on dispatch, this paper proposes an intelligent dynamic cluster partitioning strategy for DNs, through which the DN's resources and loads can be intelligently aggregated, organized, and regulated in a dynamic and optimal way with relatively high implementation efficiency. An environmental model based on the Markov decision process (MDP) technique is first developed for DN cluster partitioning, in which a continuous state space, a discrete action space, and a dispatching performance-oriented reward are designed. Then, a novel random forest Q-learning network (RF-QN) is developed to implement dynamic cluster partitioning by interacting with the proposed environmental model, improving the generalization and robustness of the Q-function estimate by combining deep learning and decision trees. Finally, a modified IEEE-33-node system is adopted to verify the effectiveness of the proposed intelligent dynamic cluster partitioning and regulation strategy; the results also indicate that the proposed RF-QN is superior to the traditional deep Q-network (DQN) model in terms of renewable energy accommodation rate, training efficiency, and partitioning and regulation performance.
(This article belongs to the Special Issue Advanced in Modeling, Analysis and Control of Microgrids)
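One reading of a random-forest Q-function is a fitted-Q scheme in which a forest regresses Bellman targets over (state, action) pairs, in the spirit of Ernst et al.'s tree-based batch RL. The sketch below follows that interpretation and is not the paper's exact RF-QN.

```python
# Sketch of one fitted-Q iteration with a random-forest Q-function.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_q_step(forest, batch, n_actions: int, gamma: float = 0.99):
    s, a, r, s2, done = batch  # forest must already be fit on (state, action)
    # Evaluate Q(s', a') for every discrete action with the current forest.
    q_next = np.stack([
        forest.predict(np.hstack([s2, np.full((len(s2), 1), act)]))
        for act in range(n_actions)
    ], axis=1)
    y = r + gamma * (1.0 - done) * q_next.max(axis=1)   # Bellman targets
    new_forest = RandomForestRegressor(n_estimators=100)
    new_forest.fit(np.hstack([s, a[:, None]]), y)       # regress the targets
    return new_forest
```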

19 pages, 14874 KB  
Article
Deep Q-Network for Maneuver Planning in Beyond-Visual-Range Aerial Pursuit–Evasion with Target Re-Engagement
by Long-Jun Zhu, Kevin W. Tong and Edmond Q. Wu
Aerospace 2026, 13(1), 77; https://doi.org/10.3390/aerospace13010077 - 11 Jan 2026
Viewed by 184
Abstract
Decision-making for maneuvering in the presence of long-range threats is crucial for enhancing the safety and reliability of autonomous aerial platforms operating in beyond-line-of-sight environments. This study employs the Deep Q-Network (DQN) method to investigate maneuvering strategies for simultaneously avoiding incoming high-speed threats and re-establishing tracking of a maneuvering target platform. First, kinematic models for the aerial platforms and the approaching interceptor are developed, and a DQN training environment is constructed based on these models. A DQN framework is then designed, integrating scenario-specific state representation, action space, and a hybrid reward structure to enable autonomous strategy learning without prior expert knowledge. The agent is trained within this environment to achieve near-optimal maneuvering decisions, with comparative evaluations against Q-learning and deep deterministic policy gradient (DDPG) baselines. Simulation results demonstrate that the trained model outperforms the baselines on key metrics by effectively avoiding approaching threats, re-establishing robust target tracking, reducing maneuver time, and exhibiting strong generalization across challenging scenarios. This work advances Beyond-Visual-Range (BVR) maneuver planning and provides a foundational methodological framework for future research on complex multi-stage aerial pursuit–evasion problems.
(This article belongs to the Section Aeronautics)
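The hybrid reward can be sketched as an evasion term driven by interceptor proximity plus a tracking term for re-engagement and a small per-step time cost; the distances, gains, and constants below are illustrative, not the paper's values.

```python
# Sketch of a hybrid evade-and-re-engage reward for BVR maneuver planning.
def hybrid_reward(d_threat_km: float, d_target_km: float, tracking: bool,
                  w_evade: float = 1.0, w_track: float = 0.5) -> float:
    evade = -w_evade / max(d_threat_km, 1.0)   # harsher when threat is close
    # Reward closing on the target once tracking is re-established.
    track = w_track * (1.0 / max(d_target_km, 1.0) if tracking else -0.1)
    return evade + track - 0.01                # small per-step time cost
```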