Search Results (27)

Search Parameters:
Keywords = Dueling Double Deep Q Network (D3QN)

31 pages, 9881 KiB  
Article
Guide Robot Based on Image Processing and Path Planning
by Chen-Hsien Yang and Jih-Gau Juang
Machines 2025, 13(7), 560; https://doi.org/10.3390/machines13070560 - 27 Jun 2025
Viewed by 296
Abstract
While guide dogs remain the primary aid for visually impaired individuals, robotic guides continue to be an important area of research. This study introduces an indoor guide robot designed to physically assist a blind person by holding their hand with a robotic arm and guiding them to a specified destination. To enable hand-holding, we employed a camera combined with object detection to identify the human hand and a closed-loop control system to manage the robotic arm’s movements. For path planning, we implemented a Dueling Double Deep Q Network (D3QN) enhanced with a genetic algorithm. To address dynamic obstacles, the robot utilizes a depth camera alongside fuzzy logic to control its wheels and navigate around them. A 3D point cloud map is generated to determine the start and end points accurately. The D3QN algorithm, supplemented by variables defined using the genetic algorithm, is then used to plan the robot’s path. As a result, the robot can safely guide blind individuals to their destinations without collisions.
(This article belongs to the Special Issue Autonomous Navigation of Mobile Robots and UAVs, 2nd Edition)
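
The dueling architecture named throughout these results splits the Q-function into a state-value stream and an advantage stream. Below is a minimal PyTorch sketch of such a head; the layer sizes and single hidden layer are illustrative assumptions, not the architecture from this paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

q_net = DuelingQNet(state_dim=8, n_actions=4)
print(q_net(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```
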
29 pages, 5292 KiB  
Article
Path Planning for Lunar Rovers in Dynamic Environments: An Autonomous Navigation Framework Enhanced by Digital Twin-Based A*-D3QN
by Wei Liu, Gang Wan, Jia Liu and Dianwei Cong
Aerospace 2025, 12(6), 517; https://doi.org/10.3390/aerospace12060517 - 8 Jun 2025
Viewed by 636
Abstract
In lunar exploration missions, rovers must navigate multiple waypoints within strict time constraints while avoiding dynamic obstacles, demanding real-time, collision-free path planning. This paper proposes a digital twin-enhanced hierarchical planning method, A*-D3QN-Opt (A-Star-Dueling Double Deep Q-Network-Optimized). The framework combines the A* algorithm for global optimal paths in static environments with an improved D3QN (Dueling Double Deep Q-Network) for dynamic obstacle avoidance. A multi-dimensional reward function balances path efficiency, safety, energy, and time, while priority experience replay accelerates training convergence. A high-fidelity digital twin simulation environment integrates a YOLOv5-based multimodal perception system for real-time obstacle detection and distance estimation. Experimental validation across low-, medium-, and high-complexity scenarios demonstrates superior performance: the method achieves shorter paths, zero collisions in dynamic settings, and 30% faster convergence than baseline D3QN. Results confirm its ability to harmonize optimality, safety, and real-time adaptability under dynamic constraints, offering critical support for autonomous navigation in lunar missions like Chang’e and future deep space exploration, thereby reducing operational risks and enhancing mission efficiency.
(This article belongs to the Section Astronautics & Space Science)
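
The "double" part of D3QN decouples action selection from action evaluation when forming bootstrap targets. A hedged sketch in PyTorch, with `gamma` and the network call interfaces assumed for illustration:

```python
import torch

def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double-Q bootstrap: the online net picks the action, the target net scores it."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # select
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)  # evaluate
        return reward + gamma * (1.0 - done) * next_q
```
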
28 pages, 4738 KiB  
Article
AEM-D3QN: A Graph-Based Deep Reinforcement Learning Framework for Dynamic Earth Observation Satellite Mission Planning
by Shuo Li, Gang Wang and Jinyong Chen
Aerospace 2025, 12(5), 420; https://doi.org/10.3390/aerospace12050420 - 9 May 2025
Viewed by 604
Abstract
Efficient and adaptive mission planning for Earth Observation Satellites (EOSs) remains a challenging task due to the growing complexity of user demands, task constraints, and limited satellite resources. Traditional heuristic and metaheuristic approaches often struggle with scalability and adaptability in dynamic environments. To overcome these limitations, we introduce AEM-D3QN, a novel intelligent task scheduling framework that integrates Graph Neural Networks (GNNs) with an Adaptive Exploration Mechanism-enabled Double Dueling Deep Q-Network (D3QN). This framework constructs a Directed Acyclic Graph (DAG) atlas to represent task dependencies and constraints, leveraging GNNs to extract spatial–temporal task features. These features are then encoded into a reinforcement learning model that dynamically optimizes scheduling policies under multiple resource constraints. The adaptive exploration mechanism improves learning efficiency by balancing exploration and exploitation based on task urgency and satellite status. Extensive experiments conducted under both periodic and emergency planning scenarios demonstrate that AEM-D3QN outperforms state-of-the-art algorithms in scheduling efficiency, response time, and task completion rate. The proposed framework offers a scalable and robust solution for real-time satellite mission planning in complex and dynamic operational environments.
(This article belongs to the Section Astronautics & Space Science)
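
One way to read the adaptive exploration mechanism is as an epsilon-greedy rule modulated by task urgency; the decay schedule, urgency scaling, and bounds below are invented for illustration, not the authors' formula.

```python
import math
import random

def adaptive_epsilon(step: int, urgency: float, eps_min: float = 0.05,
                     eps_max: float = 0.9, decay: float = 1e-4) -> float:
    """Epsilon decays with training; urgent tasks (urgency in [0, 1]) push toward exploitation."""
    base = eps_min + (eps_max - eps_min) * math.exp(-decay * step)
    return max(eps_min, base * (1.0 - 0.5 * urgency))

def select_action(q_values, step: int, urgency: float) -> int:
    if random.random() < adaptive_epsilon(step, urgency):
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```
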
23 pages, 6216 KiB  
Article
A Macro-Control and Micro-Autonomy Pathfinding Strategy for Multi-Automated Guided Vehicles in Complex Manufacturing Scenarios
by Jiahui Le, Lili He and Junhong Zheng
Appl. Sci. 2025, 15(10), 5249; https://doi.org/10.3390/app15105249 - 8 May 2025
Viewed by 514
Abstract
To effectively plan the travel paths of automated guided vehicles (AGVs) in complex manufacturing scenarios and avoid dynamic obstacles, this paper proposes a pathfinding strategy that integrates macro-control and micro-autonomy. At the macro level, a central system employs a modified A* algorithm for preliminary pathfinding, guiding the AGVs toward their targets. At the micro level, a distributed system incorporates a navigation and obstacle avoidance strategy trained by Prioritized Experience Replay Double Dueling Deep Q-Network with ε-Dataset Aggregation (PER-D3QN-EDAgger). Each AGV integrates its current state with information from the central system and the neighboring AGVs to make autonomous pathfinding decisions. The experimental results indicate that this strategy exhibits a strong adaptability to diverse environments, low path costs, and rapid solution speeds. It effectively avoids neighboring AGVs and other dynamic obstacles, and maintains a high task completion rate of over 95% when the number of AGVs is below 200 and the obstacle density is below 0.5. This approach combines the advantages of centralized pathfinding, which ensures high path quality, with distributed planning, which enhances adaptability to dynamic environments.
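
At the macro level the central system runs a modified A*; the abstract does not specify the modification, so here is plain grid A* with a Manhattan heuristic as a baseline sketch.

```python
import heapq

def astar(grid, start, goal):
    """grid: 2D list of 0 (free) / 1 (blocked); start, goal: (row, col) tuples."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no route found
```
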
28 pages, 16065 KiB  
Article
Optimization of Adaptive Observation Strategies for Multi-AUVs in Complex Marine Environments Using Deep Reinforcement Learning
by Jingjing Zhang, Weidong Zhou, Xiong Deng, Shuo Yang, Chunwang Yang and Hongliang Yin
J. Mar. Sci. Eng. 2025, 13(5), 865; https://doi.org/10.3390/jmse13050865 - 26 Apr 2025
Cited by 1 | Viewed by 511
Abstract
This paper explores the application of Deep Reinforcement Learning (DRL) to optimize adaptive observation strategies for multi-AUV systems in complex marine environments. Traditional algorithms struggle with the strong coupling between environmental information and observation modeling, making it challenging to derive optimal strategies. To address this, we designed a DRL framework based on the Dueling Double Deep Q-Network (D3QN), enabling AUVs to interact directly with the environment for more efficient 3D dynamic ocean observation. However, traditional D3QN faces slow convergence and weak action–decision correlation in partially observable, dynamic marine settings. To overcome these challenges, we integrate a Gated Recurrent Unit (GRU) into the D3QN, improving state-space prediction and accelerating reward convergence. This enhancement allows AUVs to optimize observations, leverage ocean currents, and navigate obstacles while minimizing energy consumption. Experimental results demonstrate that the proposed approach excels in safety, energy efficiency, and observation effectiveness. Additionally, experiments with three, five, and seven AUVs reveal that while increasing platform numbers enhances predictive accuracy, the benefits diminish with additional units.
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
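
Folding a GRU into D3QN lets the agent summarize observation history under partial observability. A minimal PyTorch sketch, with sizes and the single recurrent layer assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class GRUDuelingQNet(nn.Module):
    """GRU feature extractor feeding a dueling Q head, for partially observable states."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor, h0=None):
        out, hn = self.gru(obs_seq, h0)   # obs_seq: (batch, time, obs_dim)
        h = out[:, -1]                    # summary of the observation history
        a = self.advantage(h)
        q = self.value(h) + a - a.mean(dim=-1, keepdim=True)
        return q, hn
```
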
30 pages, 22571 KiB  
Article
Joint Pricing, Server Orchestration and Network Slice Deployment in Mobile Edge Computing Networks
by Yijian Hou, Kaisa Zhang, Gang Chuai, Weidong Gao, Xiangyu Chen and Siqi Liu
Electronics 2025, 14(5), 841; https://doi.org/10.3390/electronics14050841 - 21 Feb 2025
Viewed by 777
Abstract
The integration of mobile edge computing (MEC) and network slicing can provide low-latency and customized services. In such integrated wireless networks, we propose a pricing-driven joint MEC server orchestration and network slice deployment scheme (PD-JSOSD), jointly solving the pricing, MEC server orchestration and network slicing deployment issues. We divide the system into an infrastructure provider layer (IPL), network planning layer (NPL) and resource allocation layer (RAL), and a three-stage Stackelberg game is proposed to describe their relationships. To obtain the Stackelberg equilibrium, we propose a three-layer deep reinforcement learning (DRL) algorithm. Specifically, the dueling double deep Q-network (D3QN) is used in the IPL, and the DRL with branching dueling Q-network (BDQ) is used in the NPL and the RAL to cope with the large-scale discrete action spaces. Moreover, we propose an innovative illegal action modification algorithm to improve the convergence of the BDQ. Simulations verify the convergence of the three-layer DRL and the superiority of modified-BDQ in dealing with large-scale action spaces, where modified-BDQ can improve the convergence by 21.9% and 28.3%. Furthermore, compared with the benchmark algorithms, JSOSD in the NPL and the RAL can improve system utility by up to 52.1%, proving the superiority of the server orchestration and slice deployment scheme.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
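
The abstract does not detail the illegal action modification algorithm; one common realization, shown here purely as an assumption, is to mask invalid actions' Q-values to negative infinity before greedy selection or bootstrapping.

```python
import torch

def mask_illegal(q_values: torch.Tensor, legal: torch.Tensor) -> torch.Tensor:
    """q_values: (batch, n_actions); legal: bool mask of the same shape."""
    return q_values.masked_fill(~legal, float("-inf"))

q = torch.tensor([[1.0, 5.0, 3.0]])
legal = torch.tensor([[True, False, True]])
print(mask_illegal(q, legal).argmax(dim=1))  # tensor([2]): action 1 can never win
```
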
16 pages, 8546 KiB  
Article
Reactive Power Optimization Method of Power Network Based on Deep Reinforcement Learning Considering Topology Characteristics
by Tianhua Chen, Zemei Dai, Xin Shan, Zhenghong Li, Chengming Hu, Yang Xue and Ke Xu
Energies 2024, 17(24), 6454; https://doi.org/10.3390/en17246454 - 21 Dec 2024
Viewed by 1394
Abstract
To address the load fluctuations caused by a high proportion of grid-connected renewable generation, a reactive power optimization method based on deep reinforcement learning (DRL) that considers topological characteristics is proposed. The proposed method transforms the reactive power optimization problem into a Markov decision process and models and solves it within a deep reinforcement learning framework. The Dueling Double Deep Q-Network (D3QN) algorithm is adopted to improve the accuracy and efficiency of the calculation. Because deep reinforcement learning algorithms struggle to capture the topological characteristics of power flow, the Graph Convolutional Dueling Double Deep Q-Network (GCD3QN) algorithm is proposed. A graph convolutional neural network (GCN) is integrated into the D3QN model, and information aggregation over topological nodes is realized through the graph convolution operator, which resolves the difficulty deep learning algorithms face in non-Euclidean spaces and improves the accuracy of reactive power optimization. The IEEE standard node system is used for simulation analysis, and the effectiveness of the proposed method is verified.
(This article belongs to the Section F: Electrical Engineering)
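
GCD3QN's graph convolution operator aggregates each node's features from its neighbors. A minimal dense-adjacency GCN layer in PyTorch, using the standard symmetric normalization (an assumption; the paper may normalize differently):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, in_dim); adj: (n_nodes, n_nodes) float adjacency matrix.
        a_hat = adj + torch.eye(adj.size(0))         # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)      # degree normalization
        norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.lin(norm @ x))
```
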
24 pages, 2771 KiB  
Article
Redundant Path Optimization in Smart Ship Software-Defined Networking and Time-Sensitive Networking Networks: An Improved Double-Dueling-Deep-Q-Networks-Based Approach
by Yanli Xu, Songtao He, Zirui Zhou and Jingxin Xu
J. Mar. Sci. Eng. 2024, 12(12), 2214; https://doi.org/10.3390/jmse12122214 - 2 Dec 2024
Cited by 2 | Viewed by 1404
Abstract
Traditional network architectures in smart ship communication systems struggle to efficiently manage the integration of heterogeneous sensor data. Additionally, conventional end-to-end transmission algorithms that rely on single-metric and single-path selection are inadequate in fulfilling the high reliability and real-time transmission requirements essential for high-priority service data. This inadequacy results in increased latency and packet loss for critical control information. To address these challenges, this study proposes an innovative ship network framework that synergistically integrates Software-Defined Networking (SDN) and Time-Sensitive Networking (TSN) technologies. Central to this framework is the introduction of a redundant multipath selection algorithm, which leverages Double Dueling Deep Q-Networks (D3QNs) in conjunction with Graph Convolutional Networks (GCNs). Initially, an optimization function encompassing transmission latency, bandwidth utilization, and packet loss rate is formulated within a software-defined time-sensitive network transmission framework tailored for smart ships. The proposed D3QN-GCN-based algorithm effectively identifies optimal working and redundant paths for TSN switches. These dual-path configurations are then disseminated by the SDN controller to the TSN switches, enabling the TSN’s inherent reliability redundancy mechanisms to facilitate the simultaneous transmission of critical service flows across multiple paths. Experimental evaluations demonstrate that the proposed algorithm exhibits robust convergence characteristics and significantly outperforms existing algorithms in terms of reducing network latency and packet loss rates. Furthermore, the algorithm enhances bandwidth utilization and promotes balanced network load distribution. This research offers a novel and effective solution for shipboard switch path selection, thereby advancing the reliability and efficiency of smart ship communication systems.
(This article belongs to the Section Ocean Engineering)
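
The optimization function here spans transmission latency, bandwidth utilization, and packet loss rate; the weighted combination below is a hedged stand-in with invented weights and scaling, intended only to show the shape of such a scalar path score.

```python
def path_reward(latency_ms: float, bw_utilization: float, loss_rate: float,
                w_lat: float = 0.5, w_bw: float = 0.3, w_loss: float = 0.2) -> float:
    """Higher is better: penalize latency and loss, reward bandwidth utilization."""
    return w_bw * bw_utilization - w_lat * (latency_ms / 100.0) - w_loss * loss_rate
```
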
15 pages, 3686 KiB  
Article
Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory
by Weishi Zhang, Chuan He, Haichao Wang, Hanhan Qian, Zhemin Lin and Hui Qi
Energies 2024, 17(15), 3612; https://doi.org/10.3390/en17153612 - 23 Jul 2024
Cited by 4 | Viewed by 1485
Abstract
As the scale of units within virtual power plants (VPPs) continues to expand, establishing an effective operational game model for these internal units has become a pressing issue for enhancing management and operations. This paper integrates photovoltaic generation, wind power, energy storage, and constant-temperature responsive loads, and it also considers micro gas turbines as auxiliary units, collectively forming a typical VPP case study. An operational optimization model was developed for the VPP control center and the micro gas turbines, and the game relationship between them was analyzed. A Stackelberg game model between the VPP control center and the micro gas turbines was proposed. Lastly, an improved D3QN (Dueling Double Deep Q-network) algorithm was employed to compute the VPP’s optimal operational strategy based on Stackelberg game theory. The results demonstrate that the proposed model can balance the energy complementarity between the VPP control center and the micro gas turbines, thereby enhancing the overall economic efficiency of operations.
(This article belongs to the Section F: Electrical Engineering)
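
To make the leader-follower structure concrete, here is a toy Stackelberg interaction in which a leader sets a price while anticipating the follower's best response; the quadratic cost and all numbers are invented, and the paper solves its game with an improved D3QN rather than this grid search.

```python
def follower_output(price: float) -> float:
    """Follower maximizes price*q - 0.5*q**2, so its best response is q* = price."""
    return price

def leader_profit(price: float) -> float:
    q = follower_output(price)              # leader anticipates the response
    demand_value, unit_cost = 10.0, 2.0     # toy parameters
    return (demand_value - price) * q - unit_cost * q

best_price = max((p / 100 for p in range(1001)), key=leader_profit)
print(best_price)  # 4.0 for these toy numbers
```
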
20 pages, 4919 KiB  
Article
Mobile Robot Navigation Based on Noisy N-Step Dueling Double Deep Q-Network and Prioritized Experience Replay
by Wenjie Hu, Ye Zhou and Hann Woei Ho
Electronics 2024, 13(12), 2423; https://doi.org/10.3390/electronics13122423 - 20 Jun 2024
Cited by 4 | Viewed by 2133
Abstract
Effective real-time autonomous navigation for mobile robots in static and dynamic environments has become a challenging and active research topic. Although the simultaneous localization and mapping (SLAM) algorithm offers a solution, it often heavily relies on complex global and local maps, resulting in significant computational demands, slower convergence rates, and prolonged training times. In response to these challenges, this paper presents a novel algorithm called PER-n2D3QN, which integrates prioritized experience replay, a noisy network with factorized Gaussian noise, n-step learning, and a dueling structure into a double deep Q-network. This combination enhances the efficiency of experience replay, facilitates exploration, and provides more accurate Q-value estimates, thereby significantly improving the performance of autonomous navigation for mobile robots. To further bolster stability and robustness, refinements such as target “soft” updates and gradient clipping are employed. Additionally, a novel and powerful target-oriented reshaping reward function is designed to expedite learning. The proposed model is validated through extensive experiments using the robot operating system (ROS) and Gazebo simulation environment. Furthermore, this paper presents a quantitative analysis of the simulation environment to better reflect its complexity. The experimental results demonstrate that PER-n2D3QN exhibits heightened accuracy, accelerated convergence rates, and enhanced robustness in both static and dynamic scenarios.
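
The two stabilizers named here, target "soft" updates and gradient clipping, are standard and easy to sketch; the tau and max-norm values below are typical defaults, assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

def soft_update(target: nn.Module, online: nn.Module, tau: float = 0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    with torch.no_grad():
        for t_p, o_p in zip(target.parameters(), online.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)

def clipped_step(optimizer, loss: torch.Tensor, net: nn.Module, max_norm: float = 10.0):
    """One gradient step with the gradient norm clipped for stability."""
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(net.parameters(), max_norm)
    optimizer.step()
```
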
19 pages, 6305 KiB  
Article
Deep Reinforcement Learning-Based 3D Trajectory Planning for Cellular Connected UAV
by Xiang Liu, Weizhi Zhong, Xin Wang, Hongtao Duan, Zhenxiong Fan, Haowen Jin, Yang Huang and Zhipeng Lin
Drones 2024, 8(5), 199; https://doi.org/10.3390/drones8050199 - 15 May 2024
Cited by 8 | Viewed by 3180
Abstract
To address the issue of limited application scenarios associated with connectivity assurance based on two-dimensional (2D) trajectory planning, this paper proposes an improved deep reinforcement learning (DRL)-based three-dimensional (3D) trajectory planning method for cellular-connected unmanned aerial vehicle (UAV) communication. By considering the 3D space environment and integrating factors such as UAV mission completion time and connectivity, we develop an objective function for path optimization and utilize the advanced dueling double deep Q network (D3QN) to optimize it. Additionally, we introduce the prioritized experience replay (PER) mechanism to enhance learning efficiency and expedite convergence. In order to further aid in trajectory planning, our method incorporates a simultaneous navigation and radio mapping (SNARM) framework that generates simulated 3D radio maps and simulates flight processes by utilizing measurement signals from the UAV during flight, thereby reducing actual flight costs. The simulation results demonstrate that the proposed approach effectively enables UAVs to avoid weak coverage regions in space, thereby reducing the weighted sum of flight time and expected interruption time.
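
Prioritized experience replay (PER) resamples transitions in proportion to their TD error. A compact list-based sketch; production code usually uses a sum-tree and importance-sampling weights, and the alpha value here is an assumed default.

```python
import random

class PERBuffer:
    """Proportional prioritized replay, simplified (no sum-tree, no IS weights)."""

    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error: float = 1.0):
        if len(self.data) >= self.capacity:   # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```
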
23 pages, 10010 KiB  
Article
Task-Offloading Strategy of Mobile Edge Computing for WBANs
by Yuhong Li and Wenzhu Zhang
Electronics 2024, 13(8), 1422; https://doi.org/10.3390/electronics13081422 - 9 Apr 2024
Cited by 2 | Viewed by 1859
Abstract
In recent years, mobile edge computing has become a popular way to provide computing resources for body area networks. However, existing research considers only minimizing the offloading cost when solving the task-offloading optimization problem, ignoring the trustworthiness of edge computing nodes; offloading tasks to such nodes may disclose user information and degrade the quality of the user experience. In response, this study aims to minimize the average user cost and designs a task-offloading strategy based on the D3QN (dueling double deep Q-network) algorithm in conjunction with a blockchain information security storage model. This strategy uses deep reinforcement learning to obtain the minimum average offloading cost of the system while accounting for user latency, energy consumption, and data protection. The simulation results show that, compared to traditional schemes and other reinforcement learning-based schemes, this scheme reduces the average system cost more effectively, with a 31.25% reduction at convergence. In addition, as the complexity of the model increases, the scheme provides users with better quality of experience, with 53.7% of the 1000 users reporting very good experience quality.
(This article belongs to the Special Issue The Applications of Deep Neural Network in Edge Computing)
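
A hedged sketch of a per-user offloading cost of the kind this abstract describes: a weighted sum of latency and energy with a penalty for low-trust edge nodes. Every weight and the trust term itself are illustrative assumptions, not the paper's model.

```python
def offload_cost(latency_s: float, energy_j: float, node_trust: float,
                 w_time: float = 0.6, w_energy: float = 0.4,
                 risk_penalty: float = 5.0) -> float:
    """node_trust in [0, 1]; lower trust inflates the expected cost of offloading."""
    return w_time * latency_s + w_energy * energy_j + risk_penalty * (1.0 - node_trust)
```
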
25 pages, 5686 KiB  
Article
Carbon Dioxide Emission Reduction-Oriented Optimal Control of Traffic Signals in Mixed Traffic Flow Based on Deep Reinforcement Learning
by Zhaowei Wang, Le Xu and Jianxiao Ma
Sustainability 2023, 15(24), 16564; https://doi.org/10.3390/su152416564 - 5 Dec 2023
Cited by 8 | Viewed by 2855
Abstract
To alleviate traffic congestion and reduce carbon emissions at intersections, exploiting reinforcement learning for intersection signal control has become a frontier topic in intelligent transportation. This study utilizes a deep reinforcement learning algorithm based on the D3QN (dueling double deep Q network) to achieve adaptive control of signal timings. Under a mixed traffic environment with connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), this study constructs a reward function (Reward—CO2 Reduction) to minimize vehicle waiting time and carbon dioxide emissions at the intersection. Additionally, to account for the spatiotemporal distribution characteristics of traffic flow, an adaptive-phase action space and a fixed-phase action space are designed to optimize action selections. The proposed algorithm is validated in a SUMO simulation with different traffic volumes and CAV penetration rates. The experimental results are compared with other control strategies like Webster’s method (fixed-time control). The analysis shows that the proposed model can effectively reduce carbon dioxide emissions when the traffic volume is low or medium. As the penetration rate of CAVs increases, the average carbon dioxide emissions and waiting time can be further reduced with the proposed model. The significance of this study is twofold: the proposed strategy lowers carbon dioxide emissions while enhancing traffic efficiency, offering a tangible example of the advancement of green intelligent transportation systems.
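
The Reward—CO2 Reduction signal couples waiting time with carbon dioxide emissions. One plausible shape, with invented weights (the paper defines its own form and normalization), is the negative weighted sum accumulated since the last signal decision:

```python
def co2_reward(total_waiting_s: float, co2_emitted_g: float,
               w_wait: float = 0.5, w_co2: float = 0.5) -> float:
    """Negative weighted cost: less waiting and less CO2 yield a higher reward."""
    return -(w_wait * total_waiting_s + w_co2 * co2_emitted_g)
```
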
17 pages, 2793 KiB  
Article
Multi-Objective Flexible Flow Shop Production Scheduling Problem Based on the Double Deep Q-Network Algorithm
by Hua Gong, Wanning Xu, Wenjuan Sun and Ke Xu
Processes 2023, 11(12), 3321; https://doi.org/10.3390/pr11123321 - 29 Nov 2023
Cited by 9 | Viewed by 2611
Abstract
In this paper, motivated by the production process of electronic control modules in the digital electronic detonators industry, we study a multi-objective flexible flow shop scheduling problem. The objective is to find a feasible schedule that minimizes both the makespan and the total tardiness. Considering the constraints imposed by the jobs and the machines throughout the manufacturing process, a mixed integer programming model is formulated. By transforming the scheduling problem into a Markov decision process, the agent state features and the actions are designed based on the processing status of the machines and the jobs, along with heuristic rules. Furthermore, a reward function based on the optimization objectives is designed. Based on the deep reinforcement learning algorithm, the Dueling Double Deep Q-Network (D3QN) algorithm is designed to solve the scheduling problem by incorporating the target network, the dueling network, and the experience replay buffer. The D3QN algorithm is compared with heuristic rules, the genetic algorithm (GA), and the optimal solutions generated by Gurobi, and ablation experiments are conducted. The experimental results demonstrate the high performance of the D3QN algorithm with the target network and the dueling network proposed in this paper. The scheduling model and the algorithm proposed in this paper can provide theoretical support to make the production plan of electronic control modules reasonable and improve production efficiency.
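
Putting the pieces this abstract names together (target network, dueling head, replay buffer), a generic D3QN update step looks like the sketch below; the smooth-L1 loss and hyperparameters are conventional choices assumed here, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def d3qn_update(online, target, optimizer, batch, gamma: float = 0.99) -> float:
    """One D3QN step on a replayed batch of (s, a, r, s', done) tensors."""
    state, action, reward, next_state, done = batch
    q = online(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(next_state).argmax(dim=1, keepdim=True)   # double-Q selection
        next_q = target(next_state).gather(1, next_a).squeeze(1)  # target evaluation
        target_q = reward + gamma * (1.0 - done) * next_q
    loss = F.smooth_l1_loss(q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
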
15 pages, 836 KiB  
Article
RIS-Assisted Robust Beamforming for UAV Anti-Jamming and Eavesdropping Communications: A Deep Reinforcement Learning Approach
by Chao Zou, Cheng Li, Yong Li and Xiaojuan Yan
Electronics 2023, 12(21), 4490; https://doi.org/10.3390/electronics12214490 - 1 Nov 2023
Cited by 7 | Viewed by 3249
Abstract
The reconfigurable intelligent surface (RIS) has been widely recognized as a rising paradigm for physical layer security due to its potential to substantially adjust the electromagnetic propagation environment. In this regard, this paper adopted the RIS deployed on an unmanned aerial vehicle (UAV) to enhance information transmission while defending against both jamming and eavesdropping attacks. Furthermore, an innovative deep reinforcement learning (DRL) approach is proposed with the purpose of optimizing the power allocation of the base station (BS) and the discrete phase shifts of the RIS. Specifically, considering the imperfect illegitimate node’s channel state information (CSI), we first reformulated the non-convex and non-conventional original problem into a Markov decision process (MDP) framework. Subsequently, a noisy dueling double-deep Q-network with prioritized experience replay (Noisy-D3QN-PER) algorithm was developed with the objective of maximizing the achievable sum rate while ensuring the fulfillment of the security requirements. Finally, numerical simulations showed that the proposed algorithm outperformed the baselines in both achievable system rate and transmission protection level.
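
The "noisy" ingredient of Noisy-D3QN-PER replaces linear layers with parameterized-noise layers. Below is a factorized-Gaussian NoisyLinear following the standard NoisyNet recipe, which we assume this paper also uses (the abstract does not spell out its variant).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable factorized Gaussian noise on weights and biases."""

    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        bound = 1.0 / math.sqrt(in_f)
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.b_mu = nn.Parameter(torch.zeros(out_f))
        self.b_sigma = nn.Parameter(torch.full((out_f,), sigma0 * bound))
        self.in_f, self.out_f = in_f, out_f

    @staticmethod
    def _scale(x: torch.Tensor) -> torch.Tensor:
        return x.sign() * x.abs().sqrt()   # f(x) = sign(x) * sqrt(|x|)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps_in = self._scale(torch.randn(self.in_f))
        eps_out = self._scale(torch.randn(self.out_f))
        w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)  # factorized noise
        b = self.b_mu + self.b_sigma * eps_out
        return F.linear(x, w, b)
```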