Search Results (27)

Search Parameters:
Keywords = Dueling Double Deep Q Network (D3QN)

31 pages, 9881 KiB  
Article
Guide Robot Based on Image Processing and Path Planning
by Chen-Hsien Yang and Jih-Gau Juang
Machines 2025, 13(7), 560; https://doi.org/10.3390/machines13070560 - 27 Jun 2025
Viewed by 296
Abstract
While guide dogs remain the primary aid for visually impaired individuals, robotic guides continue to be an important area of research. This study introduces an indoor guide robot designed to physically assist a blind person by holding their hand with a robotic arm and guiding them to a specified destination. To enable hand-holding, we employed a camera combined with object detection to identify the human hand and a closed-loop control system to manage the robotic arm’s movements. For path planning, we implemented a Dueling Double Deep Q Network (D3QN) enhanced with a genetic algorithm. To address dynamic obstacles, the robot utilizes a depth camera alongside fuzzy logic to control its wheels and navigate around them. A 3D point cloud map is generated to determine the start and end points accurately. The D3QN algorithm, supplemented by variables defined using the genetic algorithm, is then used to plan the robot’s path. As a result, the robot can safely guide blind individuals to their destinations without collisions.
(This article belongs to the Special Issue Autonomous Navigation of Mobile Robots and UAVs, 2nd Edition)
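
The dueling architecture named throughout these results splits the Q-function into a state-value stream and an advantage stream. Below is a minimal PyTorch sketch of such a head; the layer sizes and single hidden layer are illustrative assumptions, not the architecture from this paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

q_net = DuelingQNet(state_dim=8, n_actions=4)
print(q_net(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```
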
29 pages, 5292 KiB  
Article
Path Planning for Lunar Rovers in Dynamic Environments: An Autonomous Navigation Framework Enhanced by Digital Twin-Based A*-D3QN
by Wei Liu, Gang Wan, Jia Liu and Dianwei Cong
Aerospace 2025, 12(6), 517; https://doi.org/10.3390/aerospace12060517 - 8 Jun 2025
Viewed by 636
Abstract
In lunar exploration missions, rovers must navigate multiple waypoints within strict time constraints while avoiding dynamic obstacles, demanding real-time, collision-free path planning. This paper proposes a digital twin-enhanced hierarchical planning method, A*-D3QN-Opt (A-Star-Dueling Double Deep Q-Network-Optimized). The framework combines the A* algorithm for global optimal paths in static environments with an improved D3QN (Dueling Double Deep Q-Network) for dynamic obstacle avoidance. A multi-dimensional reward function balances path efficiency, safety, energy, and time, while priority experience replay accelerates training convergence. A high-fidelity digital twin simulation environment integrates a YOLOv5-based multimodal perception system for real-time obstacle detection and distance estimation. Experimental validation across low-, medium-, and high-complexity scenarios demonstrates superior performance: the method achieves shorter paths, zero collisions in dynamic settings, and 30% faster convergence than baseline D3QN. Results confirm its ability to harmonize optimality, safety, and real-time adaptability under dynamic constraints, offering critical support for autonomous navigation in lunar missions like Chang’e and future deep space exploration, thereby reducing operational risks and enhancing mission efficiency.
(This article belongs to the Section Astronautics & Space Science)
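
The "double" part of D3QN decouples action selection from action evaluation when forming bootstrap targets. A hedged sketch in PyTorch, with `gamma` and the network call interfaces assumed for illustration:

```python
import torch

def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double-Q bootstrap: the online net picks the action, the target net scores it."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # select
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)  # evaluate
        return reward + gamma * (1.0 - done) * next_q
```
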
28 pages, 4738 KiB  
Article
AEM-D3QN: A Graph-Based Deep Reinforcement Learning Framework for Dynamic Earth Observation Satellite Mission Planning
by Shuo Li, Gang Wang and Jinyong Chen
Aerospace 2025, 12(5), 420; https://doi.org/10.3390/aerospace12050420 - 9 May 2025
Viewed by 604
Abstract
Efficient and adaptive mission planning for Earth Observation Satellites (EOSs) remains a challenging task due to the growing complexity of user demands, task constraints, and limited satellite resources. Traditional heuristic and metaheuristic approaches often struggle with scalability and adaptability in dynamic environments. To overcome these limitations, we introduce AEM-D3QN, a novel intelligent task scheduling framework that integrates Graph Neural Networks (GNNs) with an Adaptive Exploration Mechanism-enabled Double Dueling Deep Q-Network (D3QN). This framework constructs a Directed Acyclic Graph (DAG) atlas to represent task dependencies and constraints, leveraging GNNs to extract spatial–temporal task features. These features are then encoded into a reinforcement learning model that dynamically optimizes scheduling policies under multiple resource constraints. The adaptive exploration mechanism improves learning efficiency by balancing exploration and exploitation based on task urgency and satellite status. Extensive experiments conducted under both periodic and emergency planning scenarios demonstrate that AEM-D3QN outperforms state-of-the-art algorithms in scheduling efficiency, response time, and task completion rate. The proposed framework offers a scalable and robust solution for real-time satellite mission planning in complex and dynamic operational environments.
(This article belongs to the Section Astronautics & Space Science)
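
One way to read the adaptive exploration mechanism is as an epsilon-greedy rule modulated by task urgency; the decay schedule, urgency scaling, and bounds below are invented for illustration, not the authors' formula.

```python
import math
import random

def adaptive_epsilon(step: int, urgency: float, eps_min: float = 0.05,
                     eps_max: float = 0.9, decay: float = 1e-4) -> float:
    """Epsilon decays with training; urgent tasks (urgency in [0, 1]) push toward exploitation."""
    base = eps_min + (eps_max - eps_min) * math.exp(-decay * step)
    return max(eps_min, base * (1.0 - 0.5 * urgency))

def select_action(q_values, step: int, urgency: float) -> int:
    if random.random() < adaptive_epsilon(step, urgency):
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```
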
23 pages, 6216 KiB  
Article
A Macro-Control and Micro-Autonomy Pathfinding Strategy for Multi-Automated Guided Vehicles in Complex Manufacturing Scenarios
by Jiahui Le, Lili He and Junhong Zheng
Appl. Sci. 2025, 15(10), 5249; https://doi.org/10.3390/app15105249 - 8 May 2025
Viewed by 514
Abstract
To effectively plan the travel paths of automated guided vehicles (AGVs) in complex manufacturing scenarios and avoid dynamic obstacles, this paper proposes a pathfinding strategy that integrates macro-control and micro-autonomy. At the macro level, a central system employs a modified A* algorithm for preliminary pathfinding, guiding the AGVs toward their targets. At the micro level, a distributed system incorporates a navigation and obstacle avoidance strategy trained by Prioritized Experience Replay Double Dueling Deep Q-Network with ε-Dataset Aggregation (PER-D3QN-EDAgger). Each AGV integrates its current state with information from the central system and the neighboring AGVs to make autonomous pathfinding decisions. The experimental results indicate that this strategy exhibits a strong adaptability to diverse environments, low path costs, and rapid solution speeds. It effectively avoids neighboring AGVs and other dynamic obstacles, and maintains a high task completion rate of over 95% when the number of AGVs is below 200 and the obstacle density is below 0.5. This approach combines the advantages of centralized pathfinding, which ensures high path quality, with distributed planning, which enhances adaptability to dynamic environments.
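
At the macro level the central system runs a modified A*; the abstract does not specify the modification, so here is plain grid A* with a Manhattan heuristic as a baseline sketch.

```python
import heapq

def astar(grid, start, goal):
    """grid: 2D list of 0 (free) / 1 (blocked); start, goal: (row, col) tuples."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no route found
```
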
28 pages, 16065 KiB  
Article
Optimization of Adaptive Observation Strategies for Multi-AUVs in Complex Marine Environments Using Deep Reinforcement Learning
by Jingjing Zhang, Weidong Zhou, Xiong Deng, Shuo Yang, Chunwang Yang and Hongliang Yin
J. Mar. Sci. Eng. 2025, 13(5), 865; https://doi.org/10.3390/jmse13050865 - 26 Apr 2025
Cited by 1 | Viewed by 511
Abstract
This paper explores the application of Deep Reinforcement Learning (DRL) to optimize adaptive observation strategies for multi-AUV systems in complex marine environments. Traditional algorithms struggle with the strong coupling between environmental information and observation modeling, making it challenging to derive optimal strategies. To address this, we designed a DRL framework based on the Dueling Double Deep Q-Network (D3QN), enabling AUVs to interact directly with the environment for more efficient 3D dynamic ocean observation. However, traditional D3QN faces slow convergence and weak action–decision correlation in partially observable, dynamic marine settings. To overcome these challenges, we integrate a Gated Recurrent Unit (GRU) into the D3QN, improving state-space prediction and accelerating reward convergence. This enhancement allows AUVs to optimize observations, leverage ocean currents, and navigate obstacles while minimizing energy consumption. Experimental results demonstrate that the proposed approach excels in safety, energy efficiency, and observation effectiveness. Additionally, experiments with three, five, and seven AUVs reveal that while increasing platform numbers enhances predictive accuracy, the benefits diminish with additional units.
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
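
Folding a GRU into D3QN lets the agent summarize observation history under partial observability. A minimal PyTorch sketch, with sizes and the single recurrent layer assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class GRUDuelingQNet(nn.Module):
    """GRU feature extractor feeding a dueling Q head, for partially observable states."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor, h0=None):
        out, hn = self.gru(obs_seq, h0)   # obs_seq: (batch, time, obs_dim)
        h = out[:, -1]                    # summary of the observation history
        a = self.advantage(h)
        q = self.value(h) + a - a.mean(dim=-1, keepdim=True)
        return q, hn
```
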
30 pages, 22571 KiB  
Article
Joint Pricing, Server Orchestration and Network Slice Deployment in Mobile Edge Computing Networks
by Yijian Hou, Kaisa Zhang, Gang Chuai, Weidong Gao, Xiangyu Chen and Siqi Liu
Electronics 2025, 14(5), 841; https://doi.org/10.3390/electronics14050841 - 21 Feb 2025
Viewed by 777
Abstract
The integration of mobile edge computing (MEC) and network slicing can provide low-latency and customized services. In such integrated wireless networks, we propose a pricing-driven joint MEC server orchestration and network slice deployment scheme (PD-JSOSD), jointly solving the pricing, MEC server orchestration and network slicing deployment issues. We divide the system into an infrastructure provider layer (IPL), network planning layer (NPL) and resource allocation layer (RAL), and a three-stage Stackelberg game is proposed to describe their relationships. To obtain the Stackelberg equilibrium, we propose a three-layer deep reinforcement learning (DRL) algorithm. Specifically, the dueling double deep Q-network (D3QN) is used in the IPL, and the DRL with branching dueling Q-network (BDQ) is used in the NPL and the RAL to cope with the large-scale discrete action spaces. Moreover, we propose an innovative illegal action modification algorithm to improve the convergence of the BDQ. Simulations verify the convergence of the three-layer DRL and the superiority of modified-BDQ in dealing with large-scale action spaces, where modified-BDQ can improve the convergence by 21.9% and 28.3%. Furthermore, compared with the benchmark algorithms, JSOSD in the NPL and the RAL can improve system utility by up to 52.1%, proving the superiority of the server orchestration and slice deployment scheme.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
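
The abstract does not detail the illegal action modification algorithm; one common realization, shown here purely as an assumption, is to mask invalid actions' Q-values to negative infinity before greedy selection or bootstrapping.

```python
import torch

def mask_illegal(q_values: torch.Tensor, legal: torch.Tensor) -> torch.Tensor:
    """q_values: (batch, n_actions); legal: bool mask of the same shape."""
    return q_values.masked_fill(~legal, float("-inf"))

q = torch.tensor([[1.0, 5.0, 3.0]])
legal = torch.tensor([[True, False, True]])
print(mask_illegal(q, legal).argmax(dim=1))  # tensor([2]): action 1 can never win
```
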
16 pages, 8546 KiB  
Article
Reactive Power Optimization Method of Power Network Based on Deep Reinforcement Learning Considering Topology Characteristics
by Tianhua Chen, Zemei Dai, Xin Shan, Zhenghong Li, Chengming Hu, Yang Xue and Ke Xu
Energies 2024, 17(24), 6454; https://doi.org/10.3390/en17246454 - 21 Dec 2024
Viewed by 1394
Abstract
To address the load fluctuations caused by a high proportion of grid-connected renewable generation, a reactive power optimization method based on deep reinforcement learning (DRL) that considers topological characteristics is proposed. The proposed method transforms the reactive power optimization problem into a Markov decision process and models and solves it within a deep reinforcement learning framework. The Dueling Double Deep Q-Network (D3QN) algorithm is adopted to improve the accuracy and efficiency of the calculation. Because deep reinforcement learning algorithms struggle to capture the topological characteristics of power flow, the Graph Convolutional Dueling Double Deep Q-Network (GCD3QN) algorithm is proposed. A graph convolutional neural network (GCN) is integrated into the D3QN model, and information aggregation over topological nodes is realized through the graph convolution operator, which resolves the difficulty deep learning algorithms face in non-Euclidean spaces and improves the accuracy of reactive power optimization. The IEEE standard node system is used for simulation analysis, and the effectiveness of the proposed method is verified.
(This article belongs to the Section F: Electrical Engineering)
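
GCD3QN's graph convolution operator aggregates each node's features from its neighbors. A minimal dense-adjacency GCN layer in PyTorch, using the standard symmetric normalization (an assumption; the paper may normalize differently):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, in_dim); adj: (n_nodes, n_nodes) float adjacency matrix.
        a_hat = adj + torch.eye(adj.size(0))         # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)      # degree normalization
        norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.lin(norm @ x))
```
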
24 pages, 2771 KiB  
Article
Redundant Path Optimization in Smart Ship Software-Defined Networking and Time-Sensitive Networking Networks: An Improved Double-Dueling-Deep-Q-Networks-Based Approach
by Yanli Xu, Songtao He, Zirui Zhou and Jingxin Xu
J. Mar. Sci. Eng. 2024, 12(12), 2214; https://doi.org/10.3390/jmse12122214 - 2 Dec 2024
Cited by 2 | Viewed by 1404
Abstract
Traditional network architectures in smart ship communication systems struggle to efficiently manage the integration of heterogeneous sensor data. Additionally, conventional end-to-end transmission algorithms that rely on single-metric and single-path selection are inadequate in fulfilling the high reliability and real-time transmission requirements essential for high-priority service data. This inadequacy results in increased latency and packet loss for critical control information. To address these challenges, this study proposes an innovative ship network framework that synergistically integrates Software-Defined Networking (SDN) and Time-Sensitive Networking (TSN) technologies. Central to this framework is the introduction of a redundant multipath selection algorithm, which leverages Double Dueling Deep Q-Networks (D3QNs) in conjunction with Graph Convolutional Networks (GCNs). Initially, an optimization function encompassing transmission latency, bandwidth utilization, and packet loss rate is formulated within a software-defined time-sensitive network transmission framework tailored for smart ships. The proposed D3QN-GCN-based algorithm effectively identifies optimal working and redundant paths for TSN switches. These dual-path configurations are then disseminated by the SDN controller to the TSN switches, enabling the TSN’s inherent reliability redundancy mechanisms to facilitate the simultaneous transmission of critical service flows across multiple paths. Experimental evaluations demonstrate that the proposed algorithm exhibits robust convergence characteristics and significantly outperforms existing algorithms in terms of reducing network latency and packet loss rates. Furthermore, the algorithm enhances bandwidth utilization and promotes balanced network load distribution. This research offers a novel and effective solution for shipboard switch path selection, thereby advancing the reliability and efficiency of smart ship communication systems.
(This article belongs to the Section Ocean Engineering)
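
The optimization function here spans transmission latency, bandwidth utilization, and packet loss rate; the weighted combination below is a hedged stand-in with invented weights and scaling, intended only to show the shape of such a scalar path score.

```python
def path_reward(latency_ms: float, bw_utilization: float, loss_rate: float,
                w_lat: float = 0.5, w_bw: float = 0.3, w_loss: float = 0.2) -> float:
    """Higher is better: penalize latency and loss, reward bandwidth utilization."""
    return w_bw * bw_utilization - w_lat * (latency_ms / 100.0) - w_loss * loss_rate
```
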
15 pages, 3686 KiB  
Article
Optimal Operation of Virtual Power Plants Based on Stackelberg Game Theory
by Weishi Zhang, Chuan He, Haichao Wang, Hanhan Qian, Zhemin Lin and Hui Qi
Energies 2024, 17(15), 3612; https://doi.org/10.3390/en17153612 - 23 Jul 2024
Cited by 4 | Viewed by 1485
Abstract
As the scale of units within virtual power plants (VPPs) continues to expand, establishing an effective operational game model for these internal units has become a pressing issue for enhancing management and operations. This paper integrates photovoltaic generation, wind power, energy storage, and constant-temperature responsive loads, and it also considers micro gas turbines as auxiliary units, collectively forming a typical VPP case study. An operational optimization model was developed for the VPP control center and the micro gas turbines, and the game relationship between them was analyzed. A Stackelberg game model between the VPP control center and the micro gas turbines was proposed. Lastly, an improved D3QN (Dueling Double Deep Q-network) algorithm was employed to compute the VPP’s optimal operational strategy based on Stackelberg game theory. The results demonstrate that the proposed model can balance the energy complementarity between the VPP control center and the micro gas turbines, thereby enhancing the overall economic efficiency of operations.
(This article belongs to the Section F: Electrical Engineering)
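
To make the leader-follower structure concrete, here is a toy Stackelberg interaction in which a leader sets a price while anticipating the follower's best response; the quadratic cost and all numbers are invented, and the paper solves its game with an improved D3QN rather than this grid search.

```python
def follower_output(price: float) -> float:
    """Follower maximizes price*q - 0.5*q**2, so its best response is q* = price."""
    return price

def leader_profit(price: float) -> float:
    q = follower_output(price)              # leader anticipates the response
    demand_value, unit_cost = 10.0, 2.0     # toy parameters
    return (demand_value - price) * q - unit_cost * q

best_price = max((p / 100 for p in range(1001)), key=leader_profit)
print(best_price)  # 4.0 for these toy numbers
```
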
20 pages, 4919 KiB  
Article
Mobile Robot Navigation Based on Noisy N-Step Dueling Double Deep Q-Network and Prioritized Experience Replay
by Wenjie Hu, Ye Zhou and Hann Woei Ho
Electronics 2024, 13(12), 2423; https://doi.org/10.3390/electronics13122423 - 20 Jun 2024
Cited by 4 | Viewed by 2133
Abstract
Effective real-time autonomous navigation for mobile robots in static and dynamic environments has become a challenging and active research topic. Although the simultaneous localization and mapping (SLAM) algorithm offers a solution, it often heavily relies on complex global and local maps, resulting in significant computational demands, slower convergence rates, and prolonged training times. In response to these challenges, this paper presents a novel algorithm called PER-n2D3QN, which integrates prioritized experience replay, a noisy network with factorized Gaussian noise, n-step learning, and a dueling structure into a double deep Q-network. This combination enhances the efficiency of experience replay, facilitates exploration, and provides more accurate Q-value estimates, thereby significantly improving the performance of autonomous navigation for mobile robots. To further bolster stability and robustness, refinements such as target “soft” updates and gradient clipping are employed. Additionally, a novel and powerful target-oriented reshaping reward function is designed to expedite learning. The proposed model is validated through extensive experiments using the robot operating system (ROS) and Gazebo simulation environment. Furthermore, this paper presents a quantitative analysis of the simulation environment to better reflect its complexity. The experimental results demonstrate that PER-n2D3QN exhibits heightened accuracy, accelerated convergence rates, and enhanced robustness in both static and dynamic scenarios.
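
The two stabilizers named here, target "soft" updates and gradient clipping, are standard and easy to sketch; the tau and max-norm values below are typical defaults, assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

def soft_update(target: nn.Module, online: nn.Module, tau: float = 0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    with torch.no_grad():
        for t_p, o_p in zip(target.parameters(), online.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)

def clipped_step(optimizer, loss: torch.Tensor, net: nn.Module, max_norm: float = 10.0):
    """One gradient step with the gradient norm clipped for stability."""
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(net.parameters(), max_norm)
    optimizer.step()
```
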
19 pages, 6305 KiB  
Article
Deep Reinforcement Learning-Based 3D Trajectory Planning for Cellular Connected UAV
by Xiang Liu, Weizhi Zhong, Xin Wang, Hongtao Duan, Zhenxiong Fan, Haowen Jin, Yang Huang and Zhipeng Lin
Drones 2024, 8(5), 199; https://doi.org/10.3390/drones8050199 - 15 May 2024
Cited by 8 | Viewed by 3180
Abstract
To address the issue of limited application scenarios associated with connectivity assurance based on two-dimensional (2D) trajectory planning, this paper proposes an improved deep reinforcement learning (DRL)-based three-dimensional (3D) trajectory planning method for cellular-connected unmanned aerial vehicle (UAV) communication. By considering the 3D space environment and integrating factors such as UAV mission completion time and connectivity, we develop an objective function for path optimization and utilize the advanced dueling double deep Q network (D3QN) to optimize it. Additionally, we introduce the prioritized experience replay (PER) mechanism to enhance learning efficiency and expedite convergence. In order to further aid in trajectory planning, our method incorporates a simultaneous navigation and radio mapping (SNARM) framework that generates simulated 3D radio maps and simulates flight processes by utilizing measurement signals from the UAV during flight, thereby reducing actual flight costs. The simulation results demonstrate that the proposed approach effectively enables UAVs to avoid weak coverage regions in space, thereby reducing the weighted sum of flight time and expected interruption time.
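
Prioritized experience replay (PER) resamples transitions in proportion to their TD error. A compact list-based sketch; production code usually uses a sum-tree and importance-sampling weights, and the alpha value here is an assumed default.

```python
import random

class PERBuffer:
    """Proportional prioritized replay, simplified (no sum-tree, no IS weights)."""

    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error: float = 1.0):
        if len(self.data) >= self.capacity:   # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```
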
23 pages, 10010 KiB  
Article
Task-Offloading Strategy of Mobile Edge Computing for WBANs
by Yuhong Li and Wenzhu Zhang
Electronics 2024, 13(8), 1422; https://doi.org/10.3390/electronics13081422 - 9 Apr 2024
Cited by 2 | Viewed by 1859
Abstract
In recent years, mobile edge computing has become a popular way to provide computing resources for body area networks. However, existing research considers only minimizing the offloading cost when solving the task-offloading optimization problem, ignoring the trustworthiness of edge computing nodes; offloading tasks to such nodes may disclose user information and degrade the quality of the user experience. In response, this study aims to minimize the average user cost and designs a task-offloading strategy based on the D3QN (dueling double deep Q-network) algorithm in conjunction with a blockchain information security storage model. This strategy uses deep reinforcement learning to obtain the minimum average offloading cost of the system while accounting for user latency, energy consumption, and data protection. The simulation results show that, compared to traditional schemes and other reinforcement learning-based schemes, this scheme reduces the average system cost more effectively, with a 31.25% reduction at convergence. In addition, as the complexity of the model increases, the scheme provides users with better quality of experience, with 53.7% of the 1000 users reporting very good experience quality.
(This article belongs to the Special Issue The Applications of Deep Neural Network in Edge Computing)
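
A hedged sketch of a per-user offloading cost of the kind this abstract describes: a weighted sum of latency and energy with a penalty for low-trust edge nodes. Every weight and the trust term itself are illustrative assumptions, not the paper's model.

```python
def offload_cost(latency_s: float, energy_j: float, node_trust: float,
                 w_time: float = 0.6, w_energy: float = 0.4,
                 risk_penalty: float = 5.0) -> float:
    """node_trust in [0, 1]; lower trust inflates the expected cost of offloading."""
    return w_time * latency_s + w_energy * energy_j + risk_penalty * (1.0 - node_trust)
```
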
25 pages, 5686 KiB  
Article
Carbon Dioxide Emission Reduction-Oriented Optimal Control of Traffic Signals in Mixed Traffic Flow Based on Deep Reinforcement Learning
by Zhaowei Wang, Le Xu and Jianxiao Ma
Sustainability 2023, 15(24), 16564; https://doi.org/10.3390/su152416564 - 5 Dec 2023
Cited by 8 | Viewed by 2855
Abstract
To alleviate traffic congestion and reduce carbon emissions at intersections, exploiting reinforcement learning for intersection signal control has become a frontier topic in intelligent transportation. This study utilizes a deep reinforcement learning algorithm based on the D3QN (dueling double deep Q network) to achieve adaptive control of signal timings. Under a mixed traffic environment with connected and automated vehicles (CAVs) and human-driven vehicles (HDVs), this study constructs a reward function (Reward—CO2 Reduction) to minimize vehicle waiting time and carbon dioxide emissions at the intersection. Additionally, to account for the spatiotemporal distribution characteristics of traffic flow, an adaptive-phase action space and a fixed-phase action space are designed to optimize action selections. The proposed algorithm is validated in a SUMO simulation with different traffic volumes and CAV penetration rates. The experimental results are compared with other control strategies like Webster’s method (fixed-time control). The analysis shows that the proposed model can effectively reduce carbon dioxide emissions when the traffic volume is low or medium. As the penetration rate of CAVs increases, the average carbon dioxide emissions and waiting time can be further reduced with the proposed model. The significance of this study is twofold: the proposed strategy lowers carbon dioxide emissions while enhancing traffic efficiency, offering a tangible example of the advancement of green intelligent transportation systems.
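
The Reward—CO2 Reduction signal couples waiting time with carbon dioxide emissions. One plausible shape, with invented weights (the paper defines its own form and normalization), is the negative weighted sum accumulated since the last signal decision:

```python
def co2_reward(total_waiting_s: float, co2_emitted_g: float,
               w_wait: float = 0.5, w_co2: float = 0.5) -> float:
    """Negative weighted cost: less waiting and less CO2 yield a higher reward."""
    return -(w_wait * total_waiting_s + w_co2 * co2_emitted_g)
```
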
17 pages, 2793 KiB  
Article
Multi-Objective Flexible Flow Shop Production Scheduling Problem Based on the Double Deep Q-Network Algorithm
by Hua Gong, Wanning Xu, Wenjuan Sun and Ke Xu
Processes 2023, 11(12), 3321; https://doi.org/10.3390/pr11123321 - 29 Nov 2023
Cited by 9 | Viewed by 2611
Abstract
In this paper, motivated by the production process of electronic control modules in the digital electronic detonators industry, we study a multi-objective flexible flow shop scheduling problem. The objective is to find a feasible schedule that minimizes both the makespan and the total tardiness. Considering the constraints imposed by the jobs and the machines throughout the manufacturing process, a mixed integer programming model is formulated. By transforming the scheduling problem into a Markov decision process, the agent state features and the actions are designed based on the processing status of the machines and the jobs, along with heuristic rules. Furthermore, a reward function based on the optimization objectives is designed. Based on the deep reinforcement learning algorithm, the Dueling Double Deep Q-Network (D3QN) algorithm is designed to solve the scheduling problem by incorporating the target network, the dueling network, and the experience replay buffer. The D3QN algorithm is compared with heuristic rules, the genetic algorithm (GA), and the optimal solutions generated by Gurobi, and ablation experiments are conducted. The experimental results demonstrate the high performance of the D3QN algorithm with the target network and the dueling network proposed in this paper. The scheduling model and the algorithm proposed in this paper can provide theoretical support to make the production plan of electronic control modules reasonable and improve production efficiency.
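
Putting the pieces this abstract names together (target network, dueling head, replay buffer), a generic D3QN update step looks like the sketch below; the smooth-L1 loss and hyperparameters are conventional choices assumed here, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def d3qn_update(online, target, optimizer, batch, gamma: float = 0.99) -> float:
    """One D3QN step on a replayed batch of (s, a, r, s', done) tensors."""
    state, action, reward, next_state, done = batch
    q = online(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(next_state).argmax(dim=1, keepdim=True)   # double-Q selection
        next_q = target(next_state).gather(1, next_a).squeeze(1)  # target evaluation
        target_q = reward + gamma * (1.0 - done) * next_q
    loss = F.smooth_l1_loss(q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
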
15 pages, 836 KiB  
Article
RIS-Assisted Robust Beamforming for UAV Anti-Jamming and Eavesdropping Communications: A Deep Reinforcement Learning Approach
by Chao Zou, Cheng Li, Yong Li and Xiaojuan Yan
Electronics 2023, 12(21), 4490; https://doi.org/10.3390/electronics12214490 - 1 Nov 2023
Cited by 7 | Viewed by 3249
Abstract
The reconfigurable intelligent surface (RIS) has been widely recognized as a rising paradigm for physical layer security due to its potential to substantially adjust the electromagnetic propagation environment. In this regard, this paper adopted the RIS deployed on an unmanned aerial vehicle (UAV) to enhance information transmission while defending against both jamming and eavesdropping attacks. Furthermore, an innovative deep reinforcement learning (DRL) approach is proposed with the purpose of optimizing the power allocation of the base station (BS) and the discrete phase shifts of the RIS. Specifically, considering the imperfect illegitimate node’s channel state information (CSI), we first reformulated the non-convex and non-conventional original problem into a Markov decision process (MDP) framework. Subsequently, a noisy dueling double-deep Q-network with prioritized experience replay (Noisy-D3QN-PER) algorithm was developed with the objective of maximizing the achievable sum rate while ensuring the fulfillment of the security requirements. Finally, numerical simulations showed that the proposed algorithm outperformed the baselines in both achievable system rate and transmission protection level.
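
The "noisy" ingredient of Noisy-D3QN-PER replaces linear layers with parameterized-noise layers. Below is a factorized-Gaussian NoisyLinear following the standard NoisyNet recipe, which we assume this paper also uses (the abstract does not spell out its variant).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable factorized Gaussian noise on weights and biases."""

    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        bound = 1.0 / math.sqrt(in_f)
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.b_mu = nn.Parameter(torch.zeros(out_f))
        self.b_sigma = nn.Parameter(torch.full((out_f,), sigma0 * bound))
        self.in_f, self.out_f = in_f, out_f

    @staticmethod
    def _scale(x: torch.Tensor) -> torch.Tensor:
        return x.sign() * x.abs().sqrt()   # f(x) = sign(x) * sqrt(|x|)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        eps_in = self._scale(torch.randn(self.in_f))
        eps_out = self._scale(torch.randn(self.out_f))
        w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)  # factorized noise
        b = self.b_mu + self.b_sigma * eps_out
        return F.linear(x, w, b)
```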