
Search Results (242)

Search Parameters:
Keywords = double deep Q-network

19 pages, 2115 KB  
Article
Graph-Density-Aware Joint Energy-Latency Optimization in Multi-UAV IoT Networks Using Dueling Deep Q-Network
by Mohammad Ahmed Alnakhli
Drones 2026, 10(4), 275; https://doi.org/10.3390/drones10040275 - 10 Apr 2026
Viewed by 314
Abstract
Multi-UAV communication networks face significant challenges in achieving high energy efficiency and low communication latency under dynamic topology and interference conditions. This paper proposes a Dueling Deep Q-Network (DQN) framework for joint resource optimization in 6G-enabled multi-UAV systems. The proposed approach jointly optimizes transmit power allocation, inter-UAV link association, and adaptive graph density within a unified reinforcement learning framework. By employing a dueling value–advantage decomposition, the proposed model improves learning stability and convergence compared to conventional DQN and Double DQN (DDQN) schemes. Simulation results under varying network densities and UAV scales show that the proposed Dueling DQN achieves up to 15% higher energy efficiency and 12% lower end-to-end latency, while maintaining robust performance in dense connectivity scenarios. These results demonstrate the effectiveness and scalability of the proposed framework for energy- and latency-sensitive UAV communication applications. Full article
(This article belongs to the Section Drone Communications)
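The dueling value–advantage decomposition mentioned in the abstract can be sketched in a few lines. This is the standard mean-subtracted aggregation, shown with NumPy and hypothetical head outputs rather than the authors' network:

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine a scalar state value V(s) and per-action advantages A(s, a)
    into Q-values via the mean-subtracted aggregation
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Hypothetical head outputs for a 3-action agent:
q = dueling_q(2.0, [1.0, 0.0, -1.0])
# Mean advantage is 0, so Q = [3.0, 2.0, 1.0]; the greedy action is 0.
```

Subtracting the mean advantage makes the decomposition identifiable (V and A cannot trade off freely), which is what improves training stability relative to a plain Q head.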

32 pages, 2316 KB  
Article
Energy-Efficient and Maintenance-Aware Control of a Residential Split-Type Air Conditioner Using an Enhanced Deep Q-Network
by Natdanai Kiewwath, Pattaraporn Khuwuthyakorn and Orawit Thinnukool
Sustainability 2026, 18(7), 3578; https://doi.org/10.3390/su18073578 - 6 Apr 2026
Viewed by 299
Abstract
Residential air conditioning systems are a major contributor to household electricity consumption in tropical regions, where environmental factors such as climate variability and particulate pollution (PM10) can further increase cooling demand and accelerate equipment degradation. This study proposes an Enhanced Deep Q-Network (Enhanced DQN) for energy-efficient and maintenance-aware control of residential split-type air conditioners under dynamic environmental conditions. The proposed method integrates several stability-oriented reinforcement learning mechanisms, including Double Q-learning, a dueling architecture, prioritized experience replay, multi-step returns, Bayesian-style regularization via Monte Carlo dropout, and entropy-aware exploration. The framework is evaluated through a two-stage process consisting of a diagnostic benchmark on LunarLander-v3 to assess learning stability, followed by a realistic 365-day simulation driven by Thai weather and PM10 data. Compared with a fixed 25 °C baseline, the proposed controller reduced annual electricity consumption from 5116.22 kWh to as low as 4440.03 kWh, corresponding to a saving of 13.22%. The learned policy also exhibited environmentally adaptive behavior under high PM10 conditions, indicating maintenance-aware characteristics. These findings demonstrate that reinforcement learning can provide robust, adaptive, and sustainable control strategies for residential cooling systems in tropical environments. Full article
(This article belongs to the Special Issue AI in Smart Cities and Urban Mobility)
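Among the stability mechanisms the abstract lists, multi-step returns are the easiest to illustrate. A minimal sketch of the standard n-step target, with hypothetical rewards and discount rather than values from the paper:

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99, done=False):
    """n-step return: G = sum_k gamma^k * r_k + gamma^n * V(s_n),
    dropping the bootstrap term if the episode terminated within the window."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    if not done:
        g += (gamma ** len(rewards)) * bootstrap_value
    return g

# 3-step return with gamma = 0.5 and a bootstrap value of 8:
# 1 + 0.5*2 + 0.25*4 + 0.125*8 = 4.0
target = n_step_target([1.0, 2.0, 4.0], 8.0, gamma=0.5)
```

Longer windows propagate reward information faster at the cost of higher variance, which is why n-step returns are usually paired with the other stabilizers named above.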

45 pages, 7679 KB  
Article
Conquering the Urban Firefighting Challenge: A Deep Q-Network Approach for Autonomous UAV Navigation
by Shafiqul Alam Khan, Damian Valles, Marcelo M. Carvalho and Wenquan Dong
Inventions 2026, 11(2), 35; https://doi.org/10.3390/inventions11020035 - 2 Apr 2026
Viewed by 405
Abstract
Firefighters must locate victims reliably to carry out rescue operations within burning structures during urban firefighting events. Low visibility, reduced oxygen levels, weakened structural rigidity, and dense smoke make it difficult to locate victims. In addition to these challenges, victims may be unconscious and unable to report their locations to firefighters. This research work explores the Double Deep Q-Network (Double DQN), Dueling Deep Q-Network (Dueling DQN), and Dueling Double Deep Q-Network (D3QN) agents for an unmanned aerial vehicle (UAV) to navigate around a structure and locate trapped victims within it. The UAV’s position, Light Detection and Ranging (LiDAR), and infrared camera data are utilized as inputs for the Deep Q-Networks. Prioritized Experience Replay (PER) is used to store transitions and sample them according to priority for training. Python’s Pygame library is used in this research to create a simulated environment in which infrared camera and LiDAR data are simulated. The performance of the UAV agent is evaluated using cumulative maximum reward, reward distribution histogram, Temporal Difference (TD) error over time, and number of successful episodes. Among the three DQN UAV agents, the Dueling DQN and Double DQN have potential for real-world applications in firefighting. Full article
(This article belongs to the Special Issue Unmanned Aerial Vehicles (UAVs): Innovations and Applications)
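The proportional prioritization behind PER can be sketched as follows. The priorities, exponent, and buffer contents here are hypothetical, and a production implementation would normally use a sum-tree rather than renormalizing the whole buffer on every draw:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i = (|delta_i| + eps)^alpha,
    normalized into a sampling distribution over the replay buffer."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def sample_batch(td_errors, batch_size, rng):
    """Sample transition indices with probability proportional to priority."""
    probs = per_probabilities(td_errors)
    return rng.choice(len(probs), size=batch_size, p=probs)

# Transitions with larger TD error are replayed more often:
rng = np.random.default_rng(0)
idx = sample_batch([0.1, 2.0, 0.05, 1.5], batch_size=2, rng=rng)
```

The small `eps` keeps zero-error transitions sampleable; in the full algorithm, importance-sampling weights correct the bias this non-uniform sampling introduces.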

31 pages, 1687 KB  
Article
A Hybrid Planning–Learning Framework for Autonomous Navigation with Dynamic Obstacles
by Hatice Arslan Öztürk, Sırma Yavuz and Çetin Kaya Koç
Appl. Sci. 2026, 16(6), 2961; https://doi.org/10.3390/app16062961 - 19 Mar 2026
Viewed by 363
Abstract
Traditional navigation methods work well in known, static environments but degrade in real-world settings with dynamic and unpredictable obstacles. This paper presents Double Deep Q-Network with A* guidance (DDQNA), a hybrid navigation algorithm that enables an agent to traverse mazes containing static and dynamic obstacles while maintaining a low probability of collision. DDQNA combines A* guidance with Double Deep Q-Network (DDQN) learning using an ϵ-greedy policy, and it introduces a redesigned reward function and an improved action-selection mechanism to better exploit A*’s directional cues during training. We evaluate DDQNA in a custom Pygame simulation across 11 environments of increasing difficulty. Experimental results show that DDQNA consistently outperforms the standard DDQN and other state-of-the-art reinforcement learning baselines, achieving higher goal-reaching rates, fewer visited cells, shorter computation times, and higher cumulative rewards. These results indicate that DDQNA provides both effective navigation and computational efficiency in complex environments with static and dynamic obstacles. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
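The Double DQN update that DDQNA builds on decouples action selection (online network) from action evaluation (target network) to curb the overestimation bias of vanilla DQN. A minimal sketch of the standard target, with hypothetical Q-values rather than the authors' code:

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it:
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

# Hypothetical next-state Q-values for a 3-action agent:
y = double_dqn_target(1.0, q_online_next=[0.2, 0.9, 0.1],
                      q_target_next=[0.5, 0.4, 0.3], gamma=0.5)
# Online argmax is action 1; the target network evaluates it: 1.0 + 0.5 * 0.4 = 1.2
```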

25 pages, 2297 KB  
Article
A Multi-Agent Advisory Board Reinforcement Learning Framework for Adaptive Cooperative Control
by Onur Osman, Tolga Kudret Karaca, Bahar Yalcin Kavus, Gokalp Tulum and Sajjad Nematzadeh
Algorithms 2026, 19(3), 230; https://doi.org/10.3390/a19030230 - 18 Mar 2026
Viewed by 284
Abstract
This study proposes Advisory Board Reinforcement Learning (AdvB-RL), a cooperative reinforcement-learning framework that integrates multiple advisory neural networks to guide policy optimization. Unlike conventional single-agent architectures, AdvB-RL maintains a set of independently trained advisory networks that contribute to action selection through a dynamic aggregation mechanism. This design preserves diverse experiential knowledge while improving learning stability and the exploration–exploitation balance. The framework is evaluated on three benchmark control tasks, namely LunarLander-v2, CartPole-v1, and MountainCar-v0, using advisory board sizes of 1, 5, and 10 members against a Double Deep Q-Network (DDQN) baseline. The best-performing configuration, 10 AdvB, achieved 270.02 ± 24.74 on LunarLander-v2 versus 227.92 ± 86.02 for DDQN, 497.79 ± 5.18 on CartPole-v1 versus 304.37 ± 144.04, and −103.16 ± 15.46 on MountainCar-v0 versus −130.71 ± 31.64, indicating higher returns together with markedly lower variability. Across the three environments, these results show that increasing the number of advisory members improves both reward consistency and overall robustness, with the 10-member setting providing the strongest performance. Within the tested configurations, the advisory board mechanism remains computationally feasible, while preliminary experiments beyond 10 advisors show diminishing returns relative to added complexity. Overall, AdvB-RL provides a robust and modular alternative to single-policy reinforcement learning for adaptive cooperative control. Full article
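Aggregating advisory opinions can be illustrated with a weighted average of per-member Q-values followed by a greedy choice. The paper's aggregation mechanism is dynamic, so this fixed-weight mean is only an illustrative stand-in with made-up numbers:

```python
import numpy as np

def board_action(member_q_values, weights=None):
    """Combine each advisor's Q-value vector by a weighted average,
    then act greedily on the combined estimate. Uniform weights are
    used when none are given; the paper's weights adapt over time."""
    q = np.asarray(member_q_values, dtype=float)   # shape: (members, actions)
    if weights is None:
        weights = np.full(q.shape[0], 1.0 / q.shape[0])
    combined = weights @ q
    return int(np.argmax(combined)), combined

# Three hypothetical advisors disagree; the average resolves the conflict:
action, combined = board_action([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
```

Averaging independently trained value estimates reduces the variance of any single network's errors, which is consistent with the lower result variability the abstract reports.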

36 pages, 4478 KB  
Article
CBAM-BiLSTM-DDQN: A Novel Adaptive Quantitative Trading Model for Financial Data Analysis
by Yan Zhang, Mingxuan Zhou, Feng Sun and Yuehua Wu
Axioms 2026, 15(3), 222; https://doi.org/10.3390/axioms15030222 - 16 Mar 2026
Viewed by 691
Abstract
Financial data analysis remains a significant challenge due to the inherent stochasticity, non-stationarity, and low signal-to-noise ratio of market data. Conventional methods often struggle to disentangle intrinsic trends from noise and frequently overlook the critical influence of investor sentiment on price dynamics. To address these issues, we propose an adaptive trading model named CBAM-BiLSTM-DDQN, which integrates signal decomposition, multi-source feature fusion, and deep reinforcement learning. First, we construct a comprehensive heterogeneous feature set by combining price signals decomposed via Variational Mode Decomposition (VMD) and investor sentiment indices extracted from financial texts. Subsequently, a Genetic Algorithm (GA) is employed to identify the most significant feature subset, effectively reducing dimensionality and redundancy. Finally, these optimized features are input into a Double Deep Q-Network (DDQN) agent equipped with a Convolutional Block Attention Module (CBAM) and a Bidirectional Long Short-Term Memory (BiLSTM) network to capture complex spatiotemporal dependencies. We evaluated this approach through simulated trading on three major Chinese stock indices—the Shanghai Stock Exchange Composite (SSEC), the Shenzhen Stock Exchange Component (SZSE), and the China Securities 300 (CSI 300). Experimental results demonstrate the superiority of our method over traditional strategies and standard baselines; specifically, the trading agent achieved robust cumulative returns across the SSEC and CSI 300 indices, confirming the model’s exceptional capability in balancing profitability and risk aversion in complex financial environments. Furthermore, additional experiments on individual stocks in the Chinese A-share market reinforce the robustness and generalization ability of our proposed model, validating its practical potential for diverse trading scenarios. 
Full article
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics, 2nd Edition)

27 pages, 2344 KB  
Article
Cloud-Edge Resource Scheduling and Offloading Optimization Based on Deep Reinforcement Learning
by Lili Yin, Yunze Xie, Ze Zhao and Jie Gao
Sensors 2026, 26(5), 1704; https://doi.org/10.3390/s26051704 - 8 Mar 2026
Viewed by 420
Abstract
In the context of smart manufacturing, with the widespread deployment of Industrial Internet of Things (IoT) devices, a large number of computation tasks that are highly sensitive to latency and have strict deadlines have emerged, requiring real-time processing. Effectively offloading tasks to address the issues of increased latency and task dropouts caused by dynamic changes in edge node load has become a key challenge in the cloud–edge–end collaborative environment of smart manufacturing. To tackle the complex issues of unknown edge node loads and dynamic system state changes, this paper proposes a distributed algorithm based on deep reinforcement learning, utilizing convolutional neural networks (CNN) and the Informer architecture. The proposed algorithm leverages CNN to extract local features of edge node loads while utilizing Informer’s self-attention mechanism to capture long-term load variation trends, thereby effectively handling the uncertainty and dynamics inherent in node loads. Furthermore, by integrating the Dueling Deep Q-Network (DQN) and Double DQN techniques, the algorithm achieves a precise approximation of the state–action value function, further enhancing its capability to perceive system temporal characteristics and adapt to heterogeneous tasks. Each mobile device can independently make task offloading decisions and scheduling strategies based on its observations, enabling dynamic task allocation and optimization of execution order. Simulation results show that, compared to various existing algorithms, the proposed method reduces task dropout rates by 82.3–94% and average latency by 28–39.2%. Experimental results validate the significant advantages of this method in intelligent manufacturing scenarios with high load and latency-sensitive tasks. Full article
(This article belongs to the Section Internet of Things)

14 pages, 1058 KB  
Article
QCNN-Inspired Variational Circuits for Enhanced Noise Robustness in Quantum Deep Q-Learning
by Louyang Yu, Wenbin Yu, Yadang Chen and Chengjun Zhang
Information 2026, 17(3), 250; https://doi.org/10.3390/info17030250 - 3 Mar 2026
Viewed by 339
Abstract
Quantum reinforcement learning (QRL) is often evaluated under idealized, noiseless assumptions, yet realistic quantum devices inevitably introduce noise that can severely degrade performance. This paper improves the robustness of quantum deep Q-learning (QDQN) by redesigning the variational quantum circuit (VQC) used in its value-function approximator. Motivated by recent advances in quantum convolutional neural networks (QCNNs), we construct four QCNN-inspired VQC variants (Models A–D) by combining representative QCNN two-qubit building blocks with an explicit fully connected (all-to-all) layer. Using a 10-fold evaluation protocol at a fixed noise level p = 0.005, Model D achieves the best robustness, reducing the mean number of episodes required to reach a target reward from 1981 (baseline) to 1243. Under a stricter success criterion, Model D also doubles the empirically observed noise-tolerance boundary from 0.002 to 0.004. These results indicate that carefully chosen QCNN-style circuit components and connectivity can significantly improve the noise robustness of QDQN-like QRL agents. Full article

22 pages, 35239 KB  
Article
TBDDQN: Imbalanced Fault Diagnosis for Blast Furnace Ironmaking Process via Transformer–BiLSTM Double Deep Q-Networks
by Jinlong Zheng, Ping Wu, Ruirui Zuo, Xin Su, Yinzhu Liu and Nabin Kandel
Machines 2026, 14(3), 276; https://doi.org/10.3390/machines14030276 - 2 Mar 2026
Viewed by 346
Abstract
The blast furnace ironmaking process (BFIP) is a highly complex and dynamic industrial system where strong spatiotemporal coupling and severe data imbalance pose substantial challenges for fault diagnosis. To address these issues, this study proposes a Transformer–BiLSTM Double Deep Q-Network (TBDDQN) framework for intelligent fault diagnosis. The framework employs a dual-branch architecture that integrates a Transformer-based spatial encoder with a BiLSTM-attention temporal extractor to capture global dependencies and dynamic patterns from multivariate time-series data. To mitigate class imbalance and asymmetric fault costs, a cost-sensitive reinforcement learning scheme based on Double DQN is incorporated, featuring prioritized experience replay and adaptive misclassification penalties. Experiments on real blast furnace datasets show that TBDDQN achieves a macro-averaged precision of 0.970 and a macro-averaged F1-score of 0.929, outperforming conventional CNN, LSTM, and DQN-based baselines. These results demonstrate that TBDDQN offers a robust and interpretable solution for imbalanced industrial fault diagnosis in the BFIP. Full article
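The cost-sensitive idea — penalizing a missed fault more heavily than a false alarm — can be sketched as a reward function for classification framed as RL. The class names and penalty magnitudes below are hypothetical, not taken from the paper:

```python
def diagnosis_reward(predicted, actual, fault_penalty=5.0):
    """Cost-sensitive reward: +1 for a correct label, a mild penalty
    for flagging a normal sample as faulty, and a larger penalty for
    missing a true fault. Penalty values are illustrative only."""
    if predicted == actual:
        return 1.0
    # Misclassifying a fault (rare, costly) hurts more than a false alarm.
    return -fault_penalty if actual != "normal" else -1.0
```

Because the agent maximizes expected reward, the asymmetric penalties steer it toward recall on the rare fault classes, counteracting the class imbalance the abstract describes.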

32 pages, 2534 KB  
Article
A Knowledge-Guided Deep Reinforcement Learning Approach for Energy-Aware Distributed Flexible Job Shop Scheduling with Job Priority
by Zhi-Yong Luo, Jia-Bao Song and Chun-Qiao Ge
Processes 2026, 14(4), 662; https://doi.org/10.3390/pr14040662 - 14 Feb 2026
Viewed by 537
Abstract
Energy-aware distributed manufacturing has become a key focus in modern production systems due to the growing demand for sustainable and efficient operations. This study investigates the energy-aware distributed flexible job shop scheduling problem with job priority, where multiple factories cooperate to process prioritized jobs under energy consumption considerations. Considering job priorities is essential for reflecting the practical importance and urgency of different customer orders, which directly affects scheduling fairness and production responsiveness. The proposed bi-objective model aims to simultaneously minimize total weighted tardiness and total energy consumption, accounting for both processing and idle power. To effectively solve this complex NP-hard problem, a knowledge-guided deep reinforcement learning approach is developed. Domain knowledge is integrated into a double deep Q-network to guide the adaptive selection of local search operators, while a co-evolutionary mechanism maintains global exploration and accelerates convergence. Extensive computational experiments are conducted on 24 benchmark instances, which are categorized into five groups according to factory scale, with the maximum problem size reaching 160 jobs × 6 machines × 5 factories, together with a real-world case study. Compared with four state-of-the-art multi-objective baseline algorithms (NSGA-II, MOPSO, MOEA/D, and SPEA2), the proposed D2QN-COEA demonstrates substantial performance advantages. On average, it achieves an HV improvement of 23.1% compared with the best-performing baseline on each instance, while GD and IGD are reduced by 70.8% and 63.7%, respectively. When averaged across all four baseline algorithms, D2QN-COEA yields improvements of 203.4% in HV, 83.9% in GD, 79.9% in IGD, and 70.8% in Spacing, confirming its superior convergence accuracy and solution diversity. 
The results confirm that embedding domain knowledge into deep reinforcement learning enhances optimization robustness and provides an intelligent solution for energy-efficient distributed scheduling in modern manufacturing systems. Full article
(This article belongs to the Section AI-Enabled Process Engineering)

16 pages, 3489 KB  
Article
A Deployment Strategy for Reconfigurable Intelligent Surfaces with Joint Phase and Position Optimization
by Guangsong Yang, Hongbo Huang, Chuwei Sun, Yiliang Wu, Xinjie Xu and Shan Huang
Electronics 2026, 15(3), 718; https://doi.org/10.3390/electronics15030718 - 6 Feb 2026
Cited by 1 | Viewed by 389
Abstract
The actual implementation of fifth-generation (5G) and beyond networks faces persistent challenges, including environmental interference and limited coverage, which compromise transmission stability and network feasibility. Reconfigurable Intelligent Surfaces (RISs) have emerged as a promising technology to dynamically reconfigure wireless propagation environments and enhance communication quality. To fully unlock the potential of RIS, this paper proposes a novel deployment strategy based on Double Deep Q-Networks (DDQNs) that jointly optimizes the RIS placement and phase shift configuration to maximize the system sum-rate. Specifically, the coverage area is discretized into a grid, and at each candidate location, a DDQN-based method is developed to solve the corresponding non-convex phase optimization problem. Simulation results reveal that our proposed strategy significantly surpasses conventional benchmark schemes, resulting in a sum-rate improvement of up to 38.41%. The study provides a practical and efficient pre-deployment framework for RIS-enhanced wireless networks. Full article

8 pages, 1055 KB  
Proceeding Paper
Subchannel Allocation in Massive Multiple-Input Multiple-Output Orthogonal Frequency-Division Multiple Access and Hybrid Beamforming Systems with Deep Reinforcement Learning
by Jih-Wei Lee and Yung-Fang Chen
Eng. Proc. 2025, 120(1), 55; https://doi.org/10.3390/engproc2025120055 - 6 Feb 2026
Viewed by 295
Abstract
In this study, we emphasize that the maximum sum rate can be achieved through AI-based subchannel allocation, while taking into account all users’ quality of service (QoS) requirements in data rates for hybrid beamforming systems. We assume a limited number of radio frequency (RF) chains in practical hybrid beamforming architectures. This constraint makes subchannel allocation a critical aspect of hybrid beamforming in massive multiple-input multiple-output (MIMO) systems with orthogonal frequency division multiple access (MIMO-OFDMA), as it enables the system to serve more users within a single time slot. Unlike conventional subcarrier allocation methods, we employ a deep reinforcement learning (DRL)-based algorithm to address real-time decision-making challenges. Specifically, we propose a dueling double deep Q-network (Dueling-DDQN) to implement dynamic subchannel allocation. Simulation results demonstrate that the performance of the proposed algorithm gradually approaches that of the greedy method. Furthermore, both the average sum rate and the average spectral efficiency per user improve with a reasonable variation in outage probability. Full article
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

26 pages, 5704 KB  
Article
Intent-Aware Collision Avoidance for UAVs in High-Density Non-Cooperative Environments Using Deep Reinforcement Learning
by Xuchuan Liu, Yuan Zheng, Chenglong Li, Bo Jiang and Wenyong Gu
Aerospace 2026, 13(2), 111; https://doi.org/10.3390/aerospace13020111 - 23 Jan 2026
Viewed by 527
Abstract
Collision avoidance between unmanned aerial vehicles (UAVs) and non-cooperative targets (e.g., off-nominal operations or birds) presents significant challenges in urban air mobility (UAM). This difficulty arises due to the highly dynamic and unpredictable flight intentions of these targets. Traditional collision-avoidance methods primarily focus on cooperative targets or non-cooperative ones with fixed behavior, rendering them ineffective when dealing with highly unpredictable flight patterns. To address this, we introduce a deep reinforcement learning-based collision-avoidance approach leveraging global and local intent prediction. Specifically, we propose a Global and Local Perception Prediction Module (GLPPM) that combines a state-space-based global intent association mechanism with a local feature extraction module, enabling accurate prediction of short- and long-term flight intents. Additionally, we propose a Fusion Sector Flight Control Module (FSFCM) that is trained with a Dueling Double Deep Q-Network (D3QN). The module integrates both predicted future and current intents into the state space and employs a specifically designed reward function, thereby ensuring safe UAV operations. Experimental results demonstrate that the proposed method significantly improves mission success rates in high-density environments, with up to 80 non-cooperative targets per square kilometer. In 1000 flight tests, the mission success rate is 15.2 percentage points higher than that of the baseline D3QN. Furthermore, the approach retains an 88.1% success rate even under extreme target densities of 120 targets per square kilometer. Finally, interpretability analysis via Deep SHAP further verifies the decision-making rationality of the algorithm. Full article
(This article belongs to the Section Aeronautics)

25 pages, 4648 KB  
Systematic Review
Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review
by Lekhetho Joseph Mpoporo, Pius Adewale Owolawi and Chunling Tu
Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048 - 20 Jan 2026
Viewed by 971
Abstract
Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against the ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while remaining effective in classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently surfaced as a viable substitute, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDS). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the research fields were identified using a bibliometric analysis. SLR was used to synthesize methodological techniques, datasets, and performance analysis. The results indicate that DRL algorithms such as deep Q-network (DQN), double DQNs (DDQN), dueling DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized for enhancing IDS performance in various applications and datasets. The results highlight the increasing significance of DRL-based solutions for developing intelligent and robust intrusion detection systems and advancing cybersecurity. Full article
(This article belongs to the Special Issue Advances in Cyber Security)

14 pages, 2906 KB  
Proceeding Paper
Onboard Deep Reinforcement Learning: Deployment and Testing for CubeSat Attitude Control
by Sajjad Zahedi, Jafar Roshanian, Mehran Mirshams and Krasin Georgiev
Eng. Proc. 2026, 121(1), 26; https://doi.org/10.3390/engproc2025121026 - 20 Jan 2026
Viewed by 481
Abstract
Recent progress in Reinforcement Learning (RL), especially deep RL, has created new possibilities for autonomous control in complex and uncertain environments. This study explores these possibilities through a practical approach, implementing an RL agent on a custom-built CubeSat. The CubeSat, equipped with a reaction wheel for active attitude control, serves as a physical testbed for validating RL-based strategies. To mimic space-like conditions, the CubeSat was placed on a custom air-bearing platform that allows near-frictionless rotation along a single axis, simulating microgravity. Unlike simulation-only research, this work showcases real-time hardware-level implementation of a Double Deep Q-Network (DDQN) controller. The DDQN agent receives real system state data and outputs control commands to orient the CubeSat via its reaction wheel. For comparison, a traditional PID controller was also tested under identical conditions. Both controllers were evaluated based on response time, accuracy, and resilience to disturbances. The DDQN outperformed the PID, showing better adaptability and control. This research demonstrates the successful integration of RL into real aerospace hardware, bridging the gap between theoretical algorithms and practical space applications through a hands-on CubeSat platform. Full article
