Search Results (58)

Search Parameters:
Keywords = deep neural Q-network (DQN)

26 pages, 2081 KiB  
Article
Tariff-Sensitive Global Supply Chains: Semi-Markov Decision Approach with Reinforcement Learning
by Duygu Yilmaz Eroglu
Systems 2025, 13(8), 645; https://doi.org/10.3390/systems13080645 - 1 Aug 2025
Viewed by 138
Abstract
Global supply chains often face uncertainties in production lead times, fluctuating exchange rates, and varying tariff regulations, all of which can significantly impact total profit. To address these challenges, this study formulates a multi-country supply chain problem as a Semi-Markov Decision Process (SMDP), integrating both currency variability and tariff levels. Using a Q-learning-based method (SMART), we explore three scenarios: (1) wide currency gaps under a uniform tariff, (2) narrowed currency gaps encouraging more local sourcing, and (3) distinct tariff structures that highlight how varying duties can reshape global fulfillment decisions. Beyond these baselines we analyze uncertainty-extended variants and targeted sensitivities (quantity discounts, tariff escalation, and the joint influence of inventory holding costs and tariff costs). Simulation results, accompanied by policy heatmaps and performance metrics, illustrate how small or large shifts in exchange rates and tariffs can alter sourcing strategies, transportation modes, and inventory management. A Deep Q-Network (DQN) is also applied to validate the Q-learning policy, demonstrating alignment with a more advanced neural model for moderate-scale problems. These findings underscore the adaptability of reinforcement learning in guiding practitioners and policymakers, especially under rapidly changing trade environments where exchange rate volatility and incremental tariff changes demand robust, data-driven decision-making.
(This article belongs to the Special Issue Modelling and Simulation of Transportation Systems)
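For readers unfamiliar with the SMDP formulation, here is a minimal sketch of a SMART-style average-reward Q-update; the state/action sizes, learning rates, and the reward-rate update rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def smart_update(Q, s, a, r, tau, s_next, rho, alpha=0.1, beta=0.01):
    """One SMART-style update for a semi-Markov decision process.

    Q   : (n_states, n_actions) action-value table
    r   : lump reward accumulated over the transition
    tau : sojourn time spent in state s before reaching s_next
    rho : current estimate of the average reward rate
    """
    target = r - rho * tau + Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    rho += beta * (r / tau - rho)   # track reward earned per unit time
    return Q, rho

# Toy usage with hypothetical sizes: 4 supply-chain states, 3 sourcing actions.
Q = np.zeros((4, 3))
Q, rho = smart_update(Q, s=0, a=2, r=120.0, tau=5.0, s_next=1, rho=0.0)
```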

27 pages, 3211 KiB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Viewed by 468
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart-rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode⁻¹) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min⁻¹ (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators.
(This article belongs to the Section Systems Engineering)
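A minimal sketch of how a CNN's fatigue-skill output could be concatenated with task-queue and robot-status features into a DDQN state vector; all dimensions and the softmax step are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class HRCStateEncoder(nn.Module):
    """Builds a DDQN state vector: the CNN's posterior over the nine
    fatigue-skill classes concatenated with task-queue and robot-status
    features. All dimensions are illustrative placeholders."""
    def __init__(self, n_classes=9, queue_dim=5, robot_dim=4):
        super().__init__()
        self.out_dim = n_classes + queue_dim + robot_dim

    def forward(self, cnn_logits, queue_feat, robot_feat):
        fatigue_skill = torch.softmax(cnn_logits, dim=1)  # CNN output as probabilities
        return torch.cat([fatigue_skill, queue_feat, robot_feat], dim=1)

# Toy usage: a batch of 2 operators.
enc = HRCStateEncoder()
state = enc(torch.randn(2, 9), torch.rand(2, 5), torch.rand(2, 4))  # shape (2, 18)
```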

20 pages, 3000 KiB  
Article
NRNH-AR: A Small Robotic Agent Using Tri-Fold Learning for Navigation and Obstacle Avoidance
by Carlos Vasquez-Jalpa, Mariko Nakano, Martin Velasco-Villa and Osvaldo Lopez-Garcia
Appl. Sci. 2025, 15(15), 8149; https://doi.org/10.3390/app15158149 - 22 Jul 2025
Viewed by 247
Abstract
We propose a tri-fold learning algorithm, called Neuroevolution of Hybrid Neural Networks in a Robotic Agent (acronym in Spanish, NRNH-AR), based on deep reinforcement learning (DRL), with self-supervised learning (SSL) and unsupervised learning (USL) steps, specifically designed to be implemented in a small autonomous navigation robot capable of operating in constrained physical environments. The NRNH-AR algorithm is designed for a small physical robotic agent with limited resources. The proposed algorithm was evaluated in four critical aspects: computational cost, learning stability, required memory size, and operation speed. The results obtained show that the performance of NRNH-AR is within the ranges of the Deep Q Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). The proposed algorithm comprises three types of learning algorithms: SSL, USL, and DRL. Thanks to the series of learning algorithms, the proposed algorithm optimizes the use of resources and demonstrates adaptability in dynamic environments, a crucial aspect of navigation robotics. By integrating computer vision techniques based on a Convolutional Neural Network (CNN), the algorithm enhances its abilities to understand visual observations of the environment rapidly and detect a specific object, avoiding obstacles.

19 pages, 3650 KiB  
Article
Enhanced-Dueling Deep Q-Network for Trustworthy Physical Security of Electric Power Substations
by Nawaraj Kumar Mahato, Junfeng Yang, Jiaxuan Yang, Gangjun Gong, Jianhong Hao, Jing Sun and Jinlu Liu
Energies 2025, 18(12), 3194; https://doi.org/10.3390/en18123194 - 18 Jun 2025
Viewed by 369
Abstract
This paper introduces an Enhanced-Dueling Deep Q-Network (EDDQN) specifically designed to bolster the physical security of electric power substations. We model the intricate substation security challenge as a Markov Decision Process (MDP), segmenting the facility into three zones, each with potential normal, suspicious, or attacked states. The EDDQN agent learns to strategically select security actions, aiming for optimal threat prevention while minimizing disruptive errors and false alarms. This methodology integrates Double DQN for stable learning, Prioritized Experience Replay (PER) to accelerate the learning process, and a sophisticated neural network architecture tailored to the complexities of multi-zone substation environments. Empirical evaluation using synthetic data derived from historical incident patterns demonstrates the significant advantages of EDDQN over other standard DQN variations, yielding an average reward of 7.5, a threat prevention success rate of 91.1%, and a notably low false alarm rate of 0.5%. The learned action policy exhibits a proactive security posture, establishing EDDQN as a promising and reliable intelligent solution for enhancing the physical resilience of power substations against evolving threats. This research directly addresses the critical need for adaptable and intelligent security mechanisms within the electric power infrastructure.
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 3rd Edition)
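A simplified sketch of proportional prioritized experience replay, one of the components EDDQN combines, is shown below; it uses a plain list rather than a sum-tree, and the capacity and hyperparameters are illustrative.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay, simplified (list instead of
    a sum-tree); capacity and hyperparameters are illustrative."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prios)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        w = (len(self.data) * p[idx]) ** (-beta)   # importance-sampling weights
        return [self.data[i] for i in idx], idx, w / w.max()
```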

25 pages, 7158 KiB  
Article
Anti-Jamming Decision-Making for Phased-Array Radar Based on Improved Deep Reinforcement Learning
by Hang Zhao, Hu Song, Rong Liu, Jiao Hou and Xianxiang Yu
Electronics 2025, 14(11), 2305; https://doi.org/10.3390/electronics14112305 - 5 Jun 2025
Viewed by 612
Abstract
In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment. However, manually designing or selecting anti-jamming decisions is often difficult and unreliable in complex jamming environments. Therefore, reinforcement learning is applied to anti-jamming decision-making to solve the above problems. However, the existing anti-jamming decision-making models based on reinforcement learning often suffer from problems such as low convergence speeds and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed to improve the exploration policy, the network structure, and the training methods of the deep Q-network. In order to solve the problem of the ϵ-greedy strategy being highly dependent on hyperparameter settings, and the Q-value being overly influenced by the action in other deep Q-networks, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, which incorporates an adaptive exploration policy into the neural network and increases the influence of the state itself on the Q-value. These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. In order to calculate the target value more accurately during the training process and improve the stability of the parameter update, this paper proposes a training method that combines n-step learning, target soft update, variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method that more accurately simulates the storage and update mechanism of human memory is used in the MAI-DQN. The VDDPER improves the decision-making accuracy by dynamically adjusting the sample size based on different values of experience during training, enhancing exploration during the early stages of training, and placing greater emphasis on high-value experiences in the later stages. Enhancements to the training method improve the model's convergence speed. Moreover, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, which ensures a high reward convergence speed with fewer computational resources. The findings of a simulation experiment show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN can achieve a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist.
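Two of the ingredients named above are standard enough to sketch: a dueling Q-head and a soft (Polyak) target update. The layer sizes and blending rate below are assumptions, not the MAI-DQN configuration.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Feature size and action count are placeholders."""
    def __init__(self, feat_dim=128, n_actions=8):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.adv = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, feat):
        v, a = self.value(feat), self.adv(feat)
        return v + a - a.mean(dim=1, keepdim=True)

def soft_update(target_net, online_net, tau=0.005):
    """Polyak ('soft') target update: theta_target <- tau*theta + (1-tau)*theta_target."""
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * op.data)
```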

19 pages, 4737 KiB  
Article
A Novel Reactive Power Sharing Control Strategy for Shipboard Microgrids Based on Deep Reinforcement Learning
by Wangyang Li, Hong Zhao, Jingwei Zhu and Tiankai Yang
J. Mar. Sci. Eng. 2025, 13(4), 718; https://doi.org/10.3390/jmse13040718 - 3 Apr 2025
Cited by 1 | Viewed by 541
Abstract
Reactive power sharing in distributed generators (DGs) is one of the key issues in the control technologies of greenship microgrids. Reactive power imbalance in ship microgrids can cause instability and potential equipment damage. In order to improve the poor performance of the traditional adaptive droop control methods used in microgrids under high-load conditions and influenced by virtual impedance parameters, this paper proposes a novel strategy based on the deep reinforcement learning DQN-VI, in which a deep Q network (DQN) is combined with the virtual impedance (VI) method. Unlike traditional methods which may use static or heuristically adjusted VI parameters, the DQN-VI strategy employs deep reinforcement learning to dynamically optimize these parameters, enhancing the microgrid's performance under varying conditions. The proposed DQN-VI strategy considers the current situation in greenships, wherein microgrids are generally equipped with cables of different lengths and measuring the impedance of each cable is challenging due to the lack of space. By modeling the control process as a Markov decision process, the observation space, action space, and reward function are designed. In addition, a deep neural network is used to estimate the Q function that describes the relationship between the state and the action. During the training of the DQN agent, the process is optimized step-by-step by observing the state and rewards of the system, thereby effectively improving the performance of the microgrids. The comparative simulation experiments verify the effectiveness and superiority of the proposed strategy.
(This article belongs to the Special Issue Optimization and Control of Marine Renewable Energy Systems)
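A minimal sketch of how a discrete DQN action set could map to virtual-impedance adjustments with a reactive-power-sharing reward; the step sizes, two-DG setup, and reward form are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical discrete action set: raise, lower, or hold the virtual impedance
# of each of two DGs (the step size in ohms is an assumption).
ACTIONS = [(-0.05, 0.0), (0.05, 0.0), (0.0, -0.05), (0.0, 0.05), (0.0, 0.0)]

def apply_action(virtual_z, action_idx):
    """Return the updated virtual-impedance pair after one DQN action."""
    dz1, dz2 = ACTIONS[action_idx]
    z1, z2 = virtual_z
    return (max(z1 + dz1, 0.0), max(z2 + dz2, 0.0))

def sharing_reward(q1_pu, q2_pu):
    """Penalize the per-unit reactive-power sharing error between the two DGs."""
    return -abs(q1_pu - q2_pu)

# Toy usage: start equal, let the agent raise DG1's virtual impedance.
z = apply_action((0.5, 0.5), action_idx=1)
r = sharing_reward(0.48, 0.52)
```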

17 pages, 3949 KiB  
Article
A Novel Approach to Autonomous Driving Using Double Deep Q-Network-Based Deep Reinforcement Learning
by Ahmed Khlifi, Mohamed Othmani and Monji Kherallah
World Electr. Veh. J. 2025, 16(3), 138; https://doi.org/10.3390/wevj16030138 - 1 Mar 2025
Cited by 1 | Viewed by 2371
Abstract
Deep reinforcement learning (DRL) trains agents to make decisions by learning from rewards and penalties, using trial and error. It combines reinforcement learning (RL) with deep neural networks (DNNs), enabling agents to process large datasets and learn from complex environments. DRL has achieved notable success in gaming, robotics, decision-making, etc. However, real-world applications, such as self-driving cars, face challenges due to complex state and action spaces, requiring precise control. Researchers continue to develop new algorithms to improve performance in dynamic settings. A key algorithm, Deep Q-Network (DQN), uses neural networks to approximate the Q-value function but suffers from overestimation bias, leading to suboptimal outcomes. To address this, Double Deep Q-Network (DDQN) was introduced, which decouples action selection from evaluation, thereby reducing bias and promoting more stable learning. This study evaluates the effectiveness of DQN and DDQN in autonomous driving using the CARLA simulator. The key findings emphasize DDQN's advantages in significantly reducing overestimation bias and enhancing policy performance, making it a more robust and reliable approach for complex real-world applications like self-driving cars. The results underscore DDQN's potential to improve decision-making accuracy and stability in dynamic environments.
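The difference between the two backups is easy to show. The sketch below contrasts the vanilla DQN target with the Double DQN target; the network objects and tensor shapes are assumed.

```python
import torch

def dqn_target(target_net, reward, next_state, done, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, which is the source of the overestimation bias.
    with torch.no_grad():
        return reward + gamma * (1.0 - done) * target_net(next_state).max(dim=1).values

def ddqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, decoupling selection from evaluation.
    with torch.no_grad():
        a = online_net(next_state).argmax(dim=1, keepdim=True)
        return reward + gamma * (1.0 - done) * target_net(next_state).gather(1, a).squeeze(1)
```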

17 pages, 1774 KiB  
Article
Training a Minesweeper Agent Using a Convolutional Neural Network
by Wenbo Wang and Chengyou Lei
Appl. Sci. 2025, 15(5), 2490; https://doi.org/10.3390/app15052490 - 25 Feb 2025
Viewed by 1301
Abstract
The Minesweeper game is modeled as a sequential decision-making task, for which a neural network architecture, state encoding, and reward function were herein designed. Both a Deep Q-Network (DQN) and supervised learning methods were successfully applied to optimize the training of the game. The experiments were conducted on the AutoDL platform using an NVIDIA RTX 3090 GPU for efficient computation. The results showed that in a 6 × 6 grid with four mines, the DQN model achieved an average win rate of 93.3% (standard deviation: 0.77%), while the supervised learning method achieved 91.2% (standard deviation: 0.9%), both outperforming human players and baseline algorithms and demonstrating high intelligence. The mechanisms of the two methods in the Minesweeper task were analyzed, with the reasons for the faster training speed and more stable performance of supervised learning explained from the perspectives of means–ends analysis and feedback control. Although there is room for improvement in sample efficiency and training stability in the DQN model, its greater generalization ability makes it highly promising for application in more complex decision-making tasks.
(This article belongs to the Section Computing and Artificial Intelligence)
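A minimal sketch of a one-hot state encoding and a shaped reward for a 6 × 6 board follows; the channel layout and reward values are illustrative choices, not the paper's exact design.

```python
import numpy as np

def encode_board(board):
    """One-hot encode a 6x6 Minesweeper board for a CNN: channels 0-8 hold the
    revealed adjacent-mine counts, channel 9 marks still-hidden cells."""
    h, w = board.shape
    state = np.zeros((10, h, w), dtype=np.float32)
    for r in range(h):
        for c in range(w):
            v = board[r, c]
            state[9 if v == -1 else v, r, c] = 1.0
    return state

def step_reward(revealed_safe, hit_mine, won):
    """Simple shaped reward: terminal win/lose plus a small bonus per safe reveal."""
    if hit_mine:
        return -1.0
    if won:
        return 1.0
    return 0.1 if revealed_safe else 0.0

state = encode_board(np.full((6, 6), -1))   # all cells hidden at the start
```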

37 pages, 1551 KiB  
Article
Deep Reinforcement Learning: A Chronological Overview and Methods
by Juan Terven
AI 2025, 6(3), 46; https://doi.org/10.3390/ai6030046 - 24 Feb 2025
Cited by 5 | Viewed by 9307
Abstract
Introduction: Deep reinforcement learning (deep RL) integrates the principles of reinforcement learning with deep neural networks, enabling agents to excel in diverse tasks ranging from playing board games such as Go and Chess to controlling robotic systems and autonomous vehicles. By leveraging foundational concepts of value functions, policy optimization, and temporal difference methods, deep RL has rapidly evolved and found applications in areas such as gaming, robotics, finance, and healthcare. Objective: This paper seeks to provide a comprehensive yet accessible overview of the evolution of deep RL and its leading algorithms. It aims to serve both as an introduction for newcomers to the field and as a practical guide for those seeking to select the most appropriate methods for specific problem domains. Methods: We begin by outlining fundamental reinforcement learning principles, followed by an exploration of early tabular Q-learning methods. We then trace the historical development of deep RL, highlighting key milestones such as the advent of deep Q-networks (DQN). The survey extends to policy gradient methods, actor–critic architectures, and state-of-the-art algorithms such as proximal policy optimization, soft actor–critic, and emerging model-based approaches. Throughout, we discuss the current challenges facing deep RL, including issues of sample efficiency, interpretability, and safety, as well as open research questions involving large-scale training, hierarchical architectures, and multi-task learning. Results: Our analysis demonstrates how critical breakthroughs have driven deep RL into increasingly complex application domains. We highlight existing limitations and ongoing bottlenecks, such as high data requirements and the need for more transparent, ethically aligned systems. Finally, we survey potential future directions, highlighting the importance of reliability and ethical considerations for real-world deployments.
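As a reference point for the tabular methods the survey starts from, a minimal Q-learning update is sketched below; the toy problem sizes and hyperparameters are assumptions.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Tabular Q-learning, the precursor of DQN:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage on a hypothetical 5-state, 2-action problem.
Q = np.zeros((5, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=3, done=False)
```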

31 pages, 3605 KiB  
Article
Intelligent IoT-Based Network Clustering and Camera Distribution Algorithm Using Reinforcement Learning
by Islam T. Almalkawi, Rami Halloush, Mohammad F. Al-Hammouri, Alaa Alghazo, Loiy Al-Abed, Mohammad Amra, Ayooub Alsarhan and Sami Aziz Alshammari
Technologies 2025, 13(1), 4; https://doi.org/10.3390/technologies13010004 - 24 Dec 2024
Viewed by 1891
Abstract
The advent of a wide variety of affordable communication devices and cameras has enabled IoT systems to provide effective solutions for a wide range of civil and military applications. One of the potential applications is a surveillance system in which several cameras collaborate to monitor a specific area. However, existing surveillance systems are often based on traditional camera distribution and come with additional communication costs and redundancy in the detection range. Thus, we propose a smart and efficient camera distribution system based on machine learning using two Reinforcement Learning (RL) methods: Q-Learning and neural networks. Our proposed approach initially uses a geometric distributed network clustering algorithm that optimizes camera placement based on the camera Field of View (FoV). Then, to improve the camera distribution system, we integrate it with an RL technique, the role of which is to dynamically adjust the previous/existing setup to maximize target coverage while minimizing the number of cameras. The reinforcement agent modifies system parameters—such as the overlap distance between adjacent cameras, the camera FoV, and the number of deployed cameras—based on changing traffic distribution and conditions in the surveilled area. Simulation results confirm that the proposed camera distribution algorithm outperforms the existing methods when comparing the required number of cameras, network coverage percentage, and traffic coverage.
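A minimal sketch of an epsilon-greedy Q-learning step over discrete camera-adjustment actions with a coverage-versus-cost reward follows; the action set, state discretization, and weights are assumptions, not the paper's algorithm.

```python
import random
import numpy as np

# Hypothetical discrete actions: step the camera overlap distance, the FoV,
# or the number of deployed cameras up or down.
ACTIONS = ["overlap+", "overlap-", "fov+", "fov-", "cameras+", "cameras-"]

def select_action(Q, state, eps=0.1):
    """Epsilon-greedy choice over the camera-adjustment actions."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(Q[state]))

def coverage_reward(covered_frac, n_cameras, max_cameras, w_cam=0.3):
    """Reward target coverage while penalizing camera count (weights assumed)."""
    return covered_frac - w_cam * n_cameras / max_cameras

Q = np.zeros((100, len(ACTIONS)))        # 100 discretized traffic/coverage states
a = select_action(Q, state=0)
```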

25 pages, 5732 KiB  
Article
Analyzing the Impact of Binaural Beats on Anxiety Levels by a New Method Based on Denoised Harmonic Subtraction and Transient Temporal Feature Extraction
by Devika Rankhambe, Bharati Sanjay Ainapure, Bhargav Appasani, Avireni Srinivasulu and Nicu Bizon
Bioengineering 2024, 11(12), 1251; https://doi.org/10.3390/bioengineering11121251 - 10 Dec 2024
Viewed by 1866
Abstract
Anxiety is a widespread mental health issue, and binaural beats have been explored as a potential non-invasive treatment. EEG data reveal changes in neural oscillation and connectivity linked to anxiety reduction; however, harmonics introduced during signal acquisition and processing often distort these findings. Existing methods struggle to effectively reduce harmonics and capture the fine-grained temporal dynamics of EEG signals, leading to inaccurate feature extraction. Hence, a novel Denoised Harmonic Subtraction and Transient Temporal Feature Extraction is proposed to improve the analysis of the impact of binaural beats on anxiety levels. Initially, a novel Wiener Fused Convo Filter is introduced to capture spatial features and eliminate linear noise in EEG signals. Next, an Intrinsic Harmonic Subtraction Network is employed, utilizing the Attentive Weighted Least Mean Square (AW-LMS) algorithm to capture nonlinear summation and resonant coupling effects, effectively eliminating the misinterpretation of brain rhythms. To address the challenge of fine-grained temporal dynamics, an Embedded Transfo XL Recurrent Network is introduced to detect and extract relevant parameters associated with transient events in EEG data. Finally, EEG data undergo harmonic reduction and temporal feature extraction before classification with a cross-correlated Markov Deep Q-Network (DQN). This facilitates anxiety level classification into normal, mild, moderate, and severe categories. The model demonstrated a high accuracy of 95.6%, precision of 90%, sensitivity of 93.2%, and specificity of 96% in classifying anxiety levels, outperforming previous models. This integrated approach enhances EEG signal processing, enabling reliable anxiety classification and offering valuable insights for therapeutic interventions.
(This article belongs to the Special Issue Adaptive Neurostimulation: Innovative Strategies for Stimulation)

13 pages, 1769 KiB  
Article
Collaborative Beamforming with DQN for Interference Mitigation in 5G and Beyond Networks
by Alaelddin F. Y. Mohammed, Salman Md Sultan and Sakshi Patni
Telecom 2024, 5(4), 1192-1204; https://doi.org/10.3390/telecom5040060 - 3 Dec 2024
Viewed by 1957
Abstract
This paper addresses the problem of side lobe interference in 5G networks by proposing a unique collaborative beamforming strategy based on Deep Q-Network (DQN) reinforcement learning. Our method, which operates in the sub-6 GHz band, maximizes beam steering and power management by using a two-antenna system with DQN-controlled phase shifters. We provide an OFDM cellular network environment where inter-cell interference is managed while many base stations serve randomly dispersed customers. In order to reduce interference strength and improve signal-to-interference-plus-noise ratio (SINR), the DQN agent learns to modify the interference angle. Our model integrates experience replay memory with a long short-term memory (LSTM) recurrent neural network for time series prediction to enhance learning stability. The outcomes of our simulations show that our suggested DQN approach works noticeably better than current DQN and Q-learning methods. In particular, our technique reaches a maximum of 29.18 dB and a minimum of 5.15 dB, whereas the other approaches only manage 0.77–27.04 dB. Additionally, we significantly decreased the average interference level to 5.42 dB compared to competing approaches of 38.84 dB and 34.12 dB. The average sum-rate capacity is also increased to 3.90 by the suggested strategy, outperforming previous approaches. These findings demonstrate how well our cooperative beamforming method reduces interference and improves overall network performance in 5G systems.
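A minimal sketch of a recurrent (LSTM-based) Q-network over a short measurement history follows; the observation size, hidden width, and phase-shift action count are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """DQN head on top of an LSTM over a short history of interference/SINR
    measurements; the discrete actions are candidate phase-shift adjustments
    (all sizes are illustrative)."""
    def __init__(self, obs_dim=8, hidden=64, n_actions=16):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):               # obs_seq: (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1])          # Q-values from the last time step

q_values = RecurrentQNet()(torch.randn(4, 10, 8))   # batch of 4, 10-step history
```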

24 pages, 6852 KiB  
Article
Automatic Landing Control for Fixed-Wing UAV in Longitudinal Channel Based on Deep Reinforcement Learning
by Jinghang Li, Shuting Xu, Yu Wu and Zhe Zhang
Drones 2024, 8(10), 568; https://doi.org/10.3390/drones8100568 - 10 Oct 2024
Cited by 4 | Viewed by 2652
Abstract
The objective is to address the control problem associated with the landing process of unmanned aerial vehicles (UAVs), with a particular focus on fixed-wing UAVs. The Proportional–Integral–Derivative (PID) controller is a widely used control method, which requires the tuning of its parameters to account for the specific characteristics of the landing environment and the potential for external disturbances. In contrast, neural networks can be modeled to operate under given inputs, allowing for a more precise control strategy. In light of these considerations, a control system based on reinforcement learning is put forth, which is integrated with the conventional PID guidance law to facilitate the autonomous landing of fixed-wing UAVs and the automated tuning of PID parameters through the use of a Deep Q-learning Network (DQN). A traditional PID control system is constructed based on a fixed-wing UAV dynamics model, with the flight state being discretized. The landing problem is transformed into a Markov Decision Process (MDP), and the reward function is designed in accordance with the landing conditions and the UAV's attitude, respectively. The state vectors are fed into the neural network framework, and the optimized PID parameters are output by the reinforcement learning algorithm. The optimal policy is obtained through the training of the network, which enables the automatic adjustment of parameters and the optimization of the traditional PID control system. Furthermore, the efficacy of the control algorithms in actual scenarios is validated through the simulation of UAV state vector perturbations and ideal gliding curves. The results demonstrate that the controller modified by the DQN network exhibits a markedly superior convergence effect and maneuverability compared to the unmodified traditional controller.
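A minimal sketch of a PID controller whose gains are nudged by a discrete DQN action follows; the gain steps and controller form are assumptions for illustration, not the paper's guidance law.

```python
class PID:
    """Standard PID controller; the DQN agent re-tunes the gains online."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, err, dt):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Hypothetical discrete action set: nudge one gain up or down per decision step.
ACTIONS = [("kp", +0.1), ("kp", -0.1), ("ki", +0.01), ("ki", -0.01),
           ("kd", +0.05), ("kd", -0.05)]

def apply_action(pid, action_idx):
    gain, delta = ACTIONS[action_idx]
    setattr(pid, gain, max(getattr(pid, gain) + delta, 0.0))

pid = PID(kp=1.0, ki=0.1, kd=0.2)
apply_action(pid, action_idx=0)          # agent raises the proportional gain
u = pid.step(err=0.5, dt=0.01)           # control command for the pitch channel
```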

14 pages, 2905 KiB  
Article
An Adjustment Strategy for Tilted Moiré Fringes via Deep Q-Network
by Chuan Jin, Dajie Yu, Haifeng Sun, Junbo Liu, Ji Zhou and Jian Wang
Photonics 2024, 11(7), 666; https://doi.org/10.3390/photonics11070666 - 17 Jul 2024
Cited by 2 | Viewed by 1462
Abstract
Overlay accuracy, one of the three fundamental indicators of lithography, is directly influenced by alignment precision. During the alignment process based on the Moiré fringe method, a slight angular misalignment between the mask and wafer will cause the Moiré fringes to tilt, thereby affecting the alignment accuracy. This paper proposes a leveling strategy based on the DQN (Deep Q-Network) algorithm. This strategy involves using four consecutive frames of wafer tilt images as the input values for a convolutional neural network (CNN), which serves as the environment model. The environment model is divided into two groups: the horizontal plane tilt environment model and the vertical plane tilt environment model. After convolution through the CNN and training with the pooling operation, the Q-value consisting of n discrete actions is output. In the DQN algorithm, the main contributions of this paper lie in three points: the adaptive application of environmental model input, parameter optimization of the loss function, and the possibility of application in the actual environment to provide some ideas. The environment model input interface can be applied to different tilt models and more complex scenes. The optimization of the loss function can match the leveling of different tilt models. Considering the application of this strategy in actual scenarios, motion calibration and detection between the mask and the wafer provide some ideas. To verify the reliability of the algorithm, simulations were conducted to generate tilted Moiré fringes resulting from tilt angles of the wafer plate, and the phase of the tilted Moiré fringes was subsequently calculated. The angle of the wafer was automatically adjusted using the DQN algorithm, and then various angles were measured. Repeated measurements were also conducted at the same angle. The angle deviation accuracy of the horizontal plane tilt environment model reached 0.0011 degrees, and the accuracy of repeated measurements reached 0.00025 degrees. The angle deviation accuracy of the vertical plane tilt environment model reached 0.0043 degrees, and repeated measurements achieved a precision of 0.00027 degrees. Moreover, in practical applications, it also provides corresponding ideas to ensure the determination of the relative position between the mask and wafer and the detection of movement, offering the potential for its application in the industry.
(This article belongs to the Section Optoelectronics and Optical Materials)
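A minimal sketch of a CNN Q-network that takes the four stacked tilt frames and outputs one Q-value per discrete angular correction follows; the layer sizes, input resolution, and the action count n are assumptions.

```python
import torch
import torch.nn as nn

class TiltQNet(nn.Module):
    """CNN Q-network over four stacked tilt-image frames; the discrete actions
    are candidate angular corrections (all sizes are placeholders)."""
    def __init__(self, n_actions=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU())
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, frames):               # frames: (batch, 4, 84, 84)
        return self.head(self.conv(frames))

q_values = TiltQNet()(torch.randn(1, 4, 84, 84))   # one Q-value per tilt correction
```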

16 pages, 4157 KiB  
Article
Enhancing Autonomous Driving Navigation Using Soft Actor-Critic
by Badr Ben Elallid, Nabil Benamar, Miloud Bagaa and Yassine Hadjadj-Aoul
Future Internet 2024, 16(7), 238; https://doi.org/10.3390/fi16070238 - 4 Jul 2024
Cited by 5 | Viewed by 2501
Abstract
Autonomous vehicles have gained extensive attention in recent years, both in academia and industry. For these self-driving vehicles, decision-making in urban environments poses significant challenges due to the unpredictable behavior of traffic participants and intricate road layouts. While existing decision-making approaches based on Deep Reinforcement Learning (DRL) show potential for tackling urban driving situations, they suffer from slow convergence, especially in complex scenarios with high mobility. In this paper, we present a new approach based on the Soft Actor-Critic (SAC) algorithm to control the autonomous vehicle to enter roundabouts smoothly and safely and ensure it reaches its destination without delay. For this, we introduce a destination vector concatenated with extracted features using Convolutional Neural Networks (CNN). To evaluate the performance of our model, we conducted extensive experiments in the CARLA simulator and compared it with the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) models. Qualitative results reveal that our model converges rapidly and achieves a high success rate in scenarios with high traffic compared to the DQN and PPO models.
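For contrast with the DQN-style targets above, a minimal sketch of the entropy-regularized (SAC-style) critic target follows; the twin-critic minimum and the temperature value are standard SAC choices, not the paper's exact configuration.

```python
import torch

def sac_value_target(reward, next_q1, next_q2, next_logp, done, gamma=0.99, alpha=0.2):
    """Entropy-regularized (SAC-style) backup: take the min of two critics and
    add the policy-entropy bonus; the temperature alpha is an assumed value."""
    with torch.no_grad():
        soft_q = torch.min(next_q1, next_q2) - alpha * next_logp
        return reward + gamma * (1.0 - done) * soft_q
```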
