Search Results (435)

Search Parameters:
Keywords = deep reinforcement learning (DRL) method

19 pages, 12556 KiB  
Article
Energy Management for Microgrids with Hybrid Hydrogen-Battery Storage: A Reinforcement Learning Framework Integrated Multi-Objective Dynamic Regulation
by Yi Zheng, Jinhua Jia and Dou An
Processes 2025, 13(8), 2558; https://doi.org/10.3390/pr13082558 - 13 Aug 2025
Abstract
The integration of renewable energy resources (RES) into microgrids (MGs) poses significant challenges due to the intermittent nature of generation and the increasing complexity of multi-energy scheduling. To enhance operational flexibility and reliability, this paper proposes an intelligent energy management system (EMS) for MGs incorporating a hybrid hydrogen-battery energy storage system (HHB-ESS). The system model jointly considers the complementary characteristics of short-term and long-term storage technologies. Three conflicting objectives are defined: economic cost (EC), system response stability (SRS), and battery life loss (BLO). To address the challenges of multi-objective trade-offs and heterogeneous storage coordination, a novel deep reinforcement learning (DRL) algorithm, termed MOATD3, is developed based on a dynamic reward adjustment mechanism (DRAM). Simulation results under various operational scenarios demonstrate that the proposed method significantly outperforms baseline methods, achieving a maximum improvement of 31.4% in SRS and a reduction of 46.7% in BLO.
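
The dynamic reward adjustment the abstract describes can be pictured as a weighted scalarization of the three objectives, with the weights drifting toward whichever objective is lagging. A minimal sketch under that assumption — the weight-update rule, names, and numbers below are illustrative, not the paper's MOATD3:

```python
import numpy as np

def scalarized_reward(ec, srs, blo, w):
    """Combine economic cost (EC), system response stability (SRS), and
    battery life loss (BLO) into one scalar reward. EC and BLO are costs
    (lower is better); SRS is a benefit (higher is better)."""
    return -w[0] * ec + w[1] * srs - w[2] * blo

def update_weights(w, objective_gaps, lr=0.05):
    """Shift weight toward the objective currently farthest from its target
    (one plausible reading of a dynamic reward adjustment mechanism)."""
    w = w + lr * objective_gaps / (np.abs(objective_gaps).sum() + 1e-8)
    w = np.clip(w, 0.05, None)
    return w / w.sum()            # keep the weights on the simplex

w = np.ones(3) / 3
gaps = np.array([0.2, 0.5, 0.1])  # normalized distance-to-target per objective
w = update_weights(w, gaps)
print(scalarized_reward(ec=1.2, srs=0.9, blo=0.3, w=w))
```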

35 pages, 2799 KiB  
Article
GAPO: A Graph Attention-Based Reinforcement Learning Algorithm for Congestion-Aware Task Offloading in Multi-Hop Vehicular Edge Computing
by Hongwei Zhao, Xuyan Li, Chengrui Li and Lu Yao
Sensors 2025, 25(15), 4838; https://doi.org/10.3390/s25154838 - 6 Aug 2025
Abstract
Efficient task offloading for delay-sensitive applications, such as autonomous driving, presents a significant challenge in multi-hop Vehicular Edge Computing (VEC) networks, primarily due to high vehicle mobility, dynamic network topologies, and complex end-to-end congestion problems. To address these issues, this paper proposes a graph attention-based reinforcement learning algorithm, named GAPO. The algorithm models the dynamic VEC network as an attributed graph and utilizes a graph neural network (GNN) to learn a network state representation that captures the global topological structure and node contextual information. Building on this foundation, an attention-based Actor–Critic framework makes joint offloading decisions by intelligently selecting the optimal destination and collaboratively determining the ratios for offloading and resource allocation. A multi-objective reward function, designed to minimize task latency and to alleviate link congestion, guides the entire learning process. Comprehensive simulation experiments and ablation studies show that, compared to traditional heuristic algorithms and standard deep reinforcement learning methods, GAPO significantly reduces average task completion latency and substantially decreases backbone link congestion. In conclusion, by deeply integrating the state-aware capabilities of GNNs with the decision-making abilities of DRL, GAPO provides an efficient, adaptive, and congestion-aware solution to the resource management problems in dynamic VEC environments.
(This article belongs to the Section Vehicular Sensing)
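
The core mechanism named here, attention-weighted aggregation over an attributed graph, can be shown compactly. A minimal NumPy sketch of single-head masked graph attention (sizes and random features are illustrative; this is not the GAPO implementation):

```python
import numpy as np

def graph_attention(h, adj, Wq, Wk, Wv):
    """Single-head attention over an attributed graph: each node aggregates
    neighbor features weighted by softmax-normalized scores."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = np.where(adj > 0, scores, -1e9)    # mask non-edges
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ v                             # new node embeddings

rng = np.random.default_rng(0)
n, d = 5, 8                                      # 5 network nodes, 8 features each
h = rng.normal(size=(n, d))
adj = (rng.random((n, n)) < 0.4).astype(float) + np.eye(n)  # keep self-loops
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
print(graph_attention(h, adj, *W).shape)         # (5, 8)
```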

19 pages, 3116 KiB  
Article
Few-Shot Intelligent Anti-Jamming Access with Fast Convergence: A GAN-Enhanced Deep Reinforcement Learning Approach
by Tianxiao Wang, Yingtao Niu and Zhanyang Zhou
Appl. Sci. 2025, 15(15), 8654; https://doi.org/10.3390/app15158654 - 5 Aug 2025
Abstract
To address the small-sample training bottleneck and inadequate convergence efficiency of Deep Reinforcement Learning (DRL)-based communication anti-jamming methods in complex electromagnetic environments, this paper proposes a Generative Adversarial Network-enhanced Deep Q-Network (GA-DQN) anti-jamming method. The method constructs a Generative Adversarial Network (GAN) to learn the time–frequency distribution characteristics of short-period jamming and to generate high-fidelity mixed samples. Furthermore, it screens qualified samples using the Pearson correlation coefficient to form a sample set, which is input into the DQN network model for pre-training to expand the experience replay buffer, effectively improving the convergence speed and decision accuracy of DQN. Simulation results show that, under periodic jamming, the proposed algorithm significantly reduces the number of interference occurrences in the early communication stage and improves the convergence speed to a certain extent compared with the DQN algorithm. Under dynamic jamming and intelligent jamming, the algorithm significantly outperforms the DQN, Proximal Policy Optimization (PPO), and Q-learning (QL) algorithms.
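
The sample-screening step is the most concrete part of the pipeline: GAN outputs are kept only if they correlate strongly with a real jamming observation. A small sketch of such a Pearson screen, with a hypothetical threshold of 0.8:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two flattened time–frequency samples."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def screen_samples(generated, reference, threshold=0.8):
    """Keep GAN-generated jamming samples that correlate strongly with a
    real reference sample; survivors would seed the DQN replay buffer."""
    return [g for g in generated if pearson(g, reference) >= threshold]

rng = np.random.default_rng(1)
reference = rng.random((16, 16))                     # real time–frequency map
generated = [reference + rng.normal(0, s, (16, 16))  # fakes with varying noise
             for s in (0.05, 0.2, 1.0)]
kept = screen_samples(generated, reference)
print(f"{len(kept)} of {len(generated)} samples pass the Pearson screen")
```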

22 pages, 2030 KiB  
Article
A Deep Reinforcement Learning Framework for Cascade Reservoir Operations Under Runoff Uncertainty
by Jing Xu, Jiabin Qiao, Qianli Sun and Keyan Shen
Water 2025, 17(15), 2324; https://doi.org/10.3390/w17152324 - 5 Aug 2025
Abstract
Effective management of cascade reservoir systems is essential for balancing hydropower generation, flood control, and ecological sustainability, especially under increasingly uncertain runoff conditions driven by climate change. Traditional optimization methods, while widely used, often struggle with high dimensionality and fail to adequately address inflow variability. This study introduces a novel deep reinforcement learning (DRL) framework that tightly couples probabilistic runoff forecasting with adaptive reservoir scheduling. We integrate a Long Short-Term Memory (LSTM) neural network to model runoff uncertainty and generate probabilistic inflow forecasts, which are then embedded into a Proximal Policy Optimization (PPO) algorithm via Monte Carlo sampling. This unified forecast–optimize architecture allows for dynamic policy adjustment in response to stochastic hydrological conditions. A case study on China’s Xiluodu–Xiangjiaba cascade system demonstrates that the proposed LSTM-PPO framework achieves superior performance compared to traditional baselines, notably improving power output, storage utilization, and spillage reduction. The results highlight the method’s robustness and scalability, suggesting strong potential for supporting resilient water–energy nexus management under complex environmental uncertainty.
(This article belongs to the Section Hydrology)
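
One way to read "embedded into PPO via Monte Carlo sampling" is that scenario draws from the forecast distribution are summarized into the policy observation. A sketch under that assumption, with a Gaussian stand-in for the LSTM's forecast output:

```python
import numpy as np

def sample_inflow_scenarios(mean, std, n_samples=50, rng=None):
    """Monte Carlo draws from a probabilistic inflow forecast. The forecast
    is summarized here by a mean/std the LSTM might emit; the real model's
    output distribution may differ."""
    rng = rng or np.random.default_rng()
    return rng.normal(mean, std, size=(n_samples, len(mean)))

def augment_state(reservoir_state, scenarios):
    """Append scenario statistics (mean, spread, upper quantile) to the PPO
    observation so the policy can hedge against inflow uncertainty."""
    stats = np.concatenate([scenarios.mean(0), scenarios.std(0),
                            np.quantile(scenarios, 0.9, axis=0)])
    return np.concatenate([reservoir_state, stats])

mean = np.array([3200.0, 3100.0, 2900.0])  # forecast inflow, next 3 periods (m^3/s)
std = np.array([250.0, 300.0, 400.0])      # uncertainty grows with lead time
obs = augment_state(np.array([0.72, 0.65]), sample_inflow_scenarios(mean, std))
print(obs.shape)
```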

17 pages, 1152 KiB  
Article
PortRSMs: Learning Regime Shifts for Portfolio Policy
by Bingde Liu and Ryutaro Ichise
J. Risk Financial Manag. 2025, 18(8), 434; https://doi.org/10.3390/jrfm18080434 - 5 Aug 2025
Abstract
This study proposes a novel Deep Reinforcement Learning (DRL) policy network structure for portfolio management called PortRSMs. PortRSMs employs stacked State-Space Models (SSMs) for the modeling of multi-scale continuous regime shifts in financial time series, striking a balance between exploring consistent distribution properties over short periods and maintaining sensitivity to sudden shocks in price sequences. PortRSMs also performs cross-asset regime fusion through hypergraph attention mechanisms, providing a more comprehensive state space for describing changes in asset correlations and co-integration. Experiments conducted on two different trading frequencies in the stock markets of the United States and Hong Kong show the superiority of PortRSMs compared to other approaches in terms of profitability, risk–return balancing, robustness, and the ability to handle sudden market shocks. Specifically, PortRSMs achieves up to a 0.03 improvement in the annual Sharpe ratio in the U.S. market and up to a 0.12 improvement in the Hong Kong market compared to baseline methods.
(This article belongs to the Special Issue Machine Learning Applications in Finance, 2nd Edition)

16 pages, 3099 KiB  
Article
Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control with Spatio-Temporal Attention Mechanism
by Wenzhe Jia and Mingyu Ji
Appl. Sci. 2025, 15(15), 8605; https://doi.org/10.3390/app15158605 - 3 Aug 2025
Abstract
Traffic congestion in large-scale road networks significantly impacts urban sustainability. Traditional traffic signal control methods lack adaptability to dynamic traffic conditions. Recently, deep reinforcement learning (DRL) has emerged as a promising solution for optimizing signal control. This study proposes a Multi-Agent Deep Reinforcement Learning (MADRL) framework for large-scale traffic signal control. The framework employs spatio-temporal attention networks to extract relevant traffic patterns and a hierarchical reinforcement learning strategy for coordinated multi-agent optimization. The problem is formulated as a Markov Decision Process (MDP) with a novel reward function that balances vehicle waiting time, throughput, and fairness. We validate our approach on simulated large-scale traffic scenarios using SUMO (Simulation of Urban MObility). Experimental results demonstrate that our framework reduces vehicle waiting time by 25% compared to baseline methods while maintaining scalability across different road network sizes. The proposed spatio-temporal multi-agent reinforcement learning framework effectively optimizes large-scale traffic signal control, providing a scalable and efficient solution for smart urban transportation.
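
The reward structure described, balancing waiting time, throughput, and fairness, is easy to make concrete. A per-agent sketch with illustrative weights (not the paper's exact formulation):

```python
import numpy as np

def intersection_reward(wait_times, throughput, w=(1.0, 0.5, 0.5)):
    """Reward for one signal agent: penalize total waiting time, reward
    vehicles served, and penalize unfairness via the variance of
    per-lane waits (so no single approach starves)."""
    total_wait = np.sum(wait_times)
    fairness_penalty = np.var(wait_times)
    return -w[0] * total_wait + w[1] * throughput - w[2] * fairness_penalty

lane_waits = np.array([12.0, 3.0, 45.0, 8.0])  # seconds per lane this step
print(intersection_reward(lane_waits, throughput=14))
```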

24 pages, 3172 KiB  
Article
A DDPG-LSTM Framework for Optimizing UAV-Enabled Integrated Sensing and Communication
by Xuan-Toan Dang, Joon-Soo Eom, Binh-Minh Vu and Oh-Soon Shin
Drones 2025, 9(8), 548; https://doi.org/10.3390/drones9080548 - 1 Aug 2025
Abstract
This paper proposes a novel dual-functional radar-communication (DFRC) framework that integrates unmanned aerial vehicle (UAV) communications into an integrated sensing and communication (ISAC) system, termed the ISAC-UAV architecture. In this system, the UAV’s mobility is leveraged to simultaneously serve multiple single-antenna uplink users (UEs) and perform radar-based sensing tasks. A key challenge stems from the target position uncertainty due to movement, which impairs matched filtering and beamforming, thereby degrading both uplink reception and sensing performance. Moreover, UAV energy consumption associated with mobility must be considered to ensure energy-efficient operation. We aim to jointly maximize radar sensing accuracy and minimize UAV movement energy over multiple time steps, while maintaining reliable uplink communications. To address this multi-objective optimization, we propose a deep reinforcement learning (DRL) framework based on a long short-term memory (LSTM)-enhanced deep deterministic policy gradient (DDPG) network. By leveraging historical target trajectory data, the model improves prediction of target positions, enhancing sensing accuracy. The proposed DRL-based approach enables joint optimization of UAV trajectory and uplink power control over time. Extensive simulations validate that our method significantly improves communication quality and sensing performance, while ensuring energy-efficient UAV operation. Comparative results further confirm the model’s adaptability and robustness in dynamic environments, outperforming existing UAV trajectory planning and resource allocation benchmarks.
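
An LSTM-enhanced DDPG actor of the kind described can be sketched in a few lines of PyTorch; the dimensions and the action layout (heading, speed, uplink powers) are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Deterministic actor that encodes a window of past target-trajectory
    observations with an LSTM before emitting UAV controls. Sizes are
    illustrative."""
    def __init__(self, obs_dim=8, hidden=64, act_dim=5):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_seq):            # (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1])       # act on the latest hidden state

actor = LSTMActor()
print(actor(torch.randn(2, 10, 8)).shape)  # torch.Size([2, 5])
```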

18 pages, 3506 KiB  
Review
A Review of Spatial Positioning Methods Applied to Magnetic Climbing Robots
by Haolei Ru, Meiping Sheng, Jiahui Qi, Zhanghao Li, Lei Cheng, Jiahao Zhang, Jiangjian Xiao, Fei Gao, Baolei Wang and Qingwei Jia
Electronics 2025, 14(15), 3069; https://doi.org/10.3390/electronics14153069 - 31 Jul 2025
Abstract
Magnetic climbing robots hold significant value for operations in complex industrial environments, particularly for the inspection and maintenance of large-scale metal structures. High-precision spatial positioning is the foundation for enabling autonomous and intelligent operations in such environments. However, the existing literature lacks a systematic and comprehensive review of spatial positioning techniques tailored to magnetic climbing robots. This paper addresses this gap by categorizing and evaluating current spatial positioning approaches. Initially, single-sensor-based methods are analyzed with a focus on external sensor approaches. Then, multi-sensor fusion methods are explored to overcome the shortcomings of single-sensor-based approaches. Multi-sensor fusion methods include simultaneous localization and mapping (SLAM), integrated positioning systems, and multi-robot cooperative positioning. To address non-uniform noise and environmental interference, both analytical and learning-based approaches are reviewed. Common analytical methods include Kalman-type filtering, particle filtering, and correlation filtering, while typical learning-based approaches involve deep reinforcement learning (DRL) and neural networks (NNs). Finally, challenges and future development trends are discussed. Multi-sensor fusion and lightweight design are the future trends in the advancement of spatial positioning technologies for magnetic climbing robots.
(This article belongs to the Special Issue Advancements in Robotics: Perception, Manipulation, and Interaction)
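
As a concrete anchor for the "Kalman-type filtering" the review surveys, here is the scalar predict/update cycle in full — a textbook illustration, not any specific robot's filter:

```python
def kalman_update(x, p, z, q=1e-3, r=1e-2):
    """One predict/update cycle of a scalar Kalman filter: x is the position
    estimate, p its variance, z a new sensor measurement, q/r the process
    and measurement noise variances."""
    p = p + q            # predict: process noise inflates the variance
    k = p / (p + r)      # Kalman gain: trust the measurement more when p >> r
    x = x + k * (z - x)  # correct with the measurement residual
    p = (1 - k) * p      # updated (shrunken) estimate variance
    return x, p

x, p = 0.0, 1.0
for z in (0.9, 1.1, 1.0, 0.95):  # noisy measurements of a position near 1.0
    x, p = kalman_update(x, p, z)
print(round(x, 3), round(p, 5))
```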

24 pages, 1147 KiB  
Article
A Channel-Aware AUV-Aided Data Collection Scheme Based on Deep Reinforcement Learning
by Lizheng Wei, Minghui Sun, Zheng Peng, Jingqian Guo, Jiankuo Cui, Bo Qin and Jun-Hong Cui
J. Mar. Sci. Eng. 2025, 13(8), 1460; https://doi.org/10.3390/jmse13081460 - 30 Jul 2025
Abstract
Underwater sensor networks (UWSNs) play a crucial role in subsea operations like marine exploration and environmental monitoring. A major challenge for UWSNs is achieving effective and energy-efficient data collection, particularly in deep-sea mining, where energy limitations and long-term deployment are key concerns. This study introduces a Channel-Aware AUV-Aided Data Collection Scheme (CADC) that utilizes deep reinforcement learning (DRL) to improve data collection efficiency. It features an innovative underwater node traversal algorithm that accounts for unique underwater signal propagation characteristics, along with a DRL-based path planning approach to mitigate propagation losses and enhance data energy efficiency. CADC achieves a 71.2% increase in energy efficiency compared to existing clustering methods, a 0.08% improvement over the Deep Deterministic Policy Gradient (DDPG), 2.3% faster convergence than the Twin Delayed DDPG (TD3), and an energy cost of only 22.2% of that required by the TSP-based baseline. By combining channel-aware traversal with adaptive DRL navigation, CADC effectively optimizes data collection and energy consumption in underwater environments.
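
Channel awareness in underwater path planning typically enters through a frequency-dependent loss model. A sketch using Thorp's empirical absorption formula plus practical spreading loss; whether CADC uses this exact model is an assumption:

```python
import math

def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical absorption coefficient for seawater
    (dB/km, frequency in kHz)."""
    f2 = f_khz ** 2
    return (0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2)
            + 2.75e-4 * f2 + 0.003)

def transmission_loss_db(distance_m, f_khz, spreading_k=1.5):
    """Spreading loss k*10*log10(d) plus absorption over the path -- the
    kind of quantity a channel-aware planner could fold into its reward."""
    return (spreading_k * 10 * math.log10(max(distance_m, 1.0))
            + thorp_absorption_db_per_km(f_khz) * distance_m / 1000.0)

print(round(transmission_loss_db(2000, f_khz=20), 1), "dB at 2 km, 20 kHz")
```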

17 pages, 2007 KiB  
Article
Optimizing Pretrained Autonomous Driving Models Using Deep Reinforcement Learning
by Vasileios Kochliaridis and Ioannis Vlahavas
Appl. Sci. 2025, 15(15), 8411; https://doi.org/10.3390/app15158411 - 29 Jul 2025
Abstract
Vision-based end-to-end navigation systems have shown impressive capabilities, especially when combined with Imitation Learning (IL) and advanced Deep Learning architectures, such as Transformers. One such example is CIL++, a Transformer-based architecture that learns to map navigation states to vehicle controls based on expert demonstrations only. Nevertheless, reliance on experts’ datasets limits generalization and can lead to failures in unknown circumstances. Deep Reinforcement Learning (DRL) can address this issue by fine-tuning the pretrained policy, using a reward function that aims to improve its weaknesses through interaction with the environment. However, fine-tuning with DRL can lead to the Catastrophic Forgetting (CF) problem, where a policy forgets the expert behaviors learned from the demonstrations as it learns to optimize the new reward function. In this paper, we present CILRLv3, a DRL-based training method that is immune to CF, enabling pretrained navigation agents to improve their driving skills across new scenarios.
(This article belongs to the Section Computing and Artificial Intelligence)
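
The abstract does not detail how CILRLv3 avoids catastrophic forgetting, but one common family of remedies anchors the fine-tuned policy to the frozen pretrained one with a KL term. A generic sketch of that idea, not necessarily the paper's mechanism:

```python
import torch
import torch.nn.functional as F

def anti_forgetting_loss(policy_logits, expert_logits, rl_loss, beta=0.1):
    """RL fine-tuning loss plus a KL anchor to the frozen pretrained (IL)
    policy -- a common remedy for catastrophic forgetting. beta trades
    new-skill acquisition against retention of expert behavior."""
    kl = F.kl_div(F.log_softmax(policy_logits, dim=-1),
                  F.softmax(expert_logits, dim=-1),
                  reduction="batchmean")
    return rl_loss + beta * kl

rl_loss = torch.tensor(0.42)                  # placeholder actor loss
cur = torch.randn(4, 6)                       # current policy logits
old = cur.detach() + 0.1 * torch.randn(4, 6)  # frozen pretrained logits
print(anti_forgetting_loss(cur, old, rl_loss))
```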

52 pages, 3733 KiB  
Article
A Hybrid Deep Reinforcement Learning and Metaheuristic Framework for Heritage Tourism Route Optimization in Warin Chamrap’s Old Town
by Rapeepan Pitakaso, Thanatkij Srichok, Surajet Khonjun, Natthapong Nanthasamroeng, Arunrat Sawettham, Paweena Khampukka, Sairoong Dinkoksung, Kanya Jungvimut, Ganokgarn Jirasirilerd, Chawapot Supasarn, Pornpimol Mongkhonngam and Yong Boonarree
Heritage 2025, 8(8), 301; https://doi.org/10.3390/heritage8080301 - 28 Jul 2025
Abstract
Designing optimal heritage tourism routes in secondary cities involves complex trade-offs between cultural richness, travel time, carbon emissions, spatial coherence, and group satisfaction. This study addresses the Personalized Group Trip Design Problem (PGTDP) under real-world constraints by proposing DRL–IMVO–GAN—a hybrid multi-objective optimization framework that integrates Deep Reinforcement Learning (DRL) for policy-guided initialization, an Improved Multiverse Optimizer (IMVO) for global search, and a Generative Adversarial Network (GAN) for local refinement and solution diversity. The model operates within a digital twin of Warin Chamrap’s old town, leveraging 92 POIs, congestion heatmaps, and behaviorally clustered tourist profiles. The proposed method was benchmarked against seven state-of-the-art techniques, including PSO + DRL, Genetic Algorithm with Multi-Neighborhood Search (Genetic + MNS), Dual-ACO, ALNS-ASP, and others. Results demonstrate that DRL–IMVO–GAN consistently dominates across key metrics. Under equal-objective weighting, it attained the highest heritage score (74.2), shortest travel time (21.3 min), and top satisfaction score (17.5 out of 18), along with the highest hypervolume (0.85) and Pareto Coverage Ratio (0.95). Beyond performance, the framework exhibits strong generalization in zero- and few-shot scenarios, adapting to unseen POIs, modified constraints, and new user profiles without retraining. These findings underscore the method’s robustness, behavioral coherence, and interpretability—positioning it as a scalable, intelligent decision-support tool for sustainable and user-centered cultural tourism planning in secondary cities.
(This article belongs to the Special Issue AI and the Future of Cultural Heritage)

18 pages, 1127 KiB  
Article
Deep Reinforcement Learning Method for Wireless Video Transmission Based on Large Deviations
by Yongxiao Xie and Shian Song
Mathematics 2025, 13(15), 2434; https://doi.org/10.3390/math13152434 - 28 Jul 2025
Abstract
In scalable video transmission research, the video transmission process is commonly modeled as a Markov decision process, where deep reinforcement learning (DRL) methods are employed to optimize the wireless transmission of scalable videos. Furthermore, an adaptive DRL algorithm can address the energy shortage problem caused by the uncertainty of energy capture and accumulated storage, thereby reducing video interruptions and enhancing user experience. To further optimize resources in wireless energy transmission and tackle the challenge of balancing exploration and exploitation in DRL, this paper develops an adaptive DRL algorithm that extends classical DRL frameworks by integrating dropout techniques during both the training and prediction processes. Moreover, to address the issue of continuous negative rewards, which are often attributed to incomplete training in wireless video transmission DRL algorithms, this paper introduces the Cramér large deviation principle to discriminate such cases: it identifies the optimal negative-reward frequency boundary and minimizes the probability of misjudging runs of negative rewards. Finally, experimental validation is performed using the 2048-game environment, which simulates wireless scalable video transmission conditions. The results demonstrate that the adaptive DRL algorithm described in this paper achieves superior convergence speed and higher cumulative rewards compared to classical DRL approaches.
(This article belongs to the Special Issue Optimization Theory, Method and Application, 2nd Edition)
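
The large-deviations test can be made concrete: model "negative reward this step" as a Bernoulli event and bound the chance that a healthy policy shows a long negative run. A sketch of the Chernoff/Cramér bound for that indicator (the rates chosen below are illustrative):

```python
import math

def bernoulli_rate(a, p):
    """Cramér rate function for a Bernoulli(p) indicator of 'negative
    reward this step': I(a) = a*ln(a/p) + (1-a)*ln((1-a)/(1-p))."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def misjudge_prob_bound(n, a, p):
    """Chernoff/Cramér upper bound exp(-n*I(a)) on observing a
    negative-reward frequency of at least a over n steps when the
    true underlying rate is only p (valid for a > p)."""
    return math.exp(-n * bernoulli_rate(a, p))

# If a healthy policy goes negative 30% of the time, how suspicious is a
# 60%-negative window of 50 steps? The bound caps the false-alarm risk.
print(misjudge_prob_bound(n=50, a=0.6, p=0.3))
```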

33 pages, 4841 KiB  
Article
Research on Task Allocation in Four-Way Shuttle Storage and Retrieval Systems Based on Deep Reinforcement Learning
by Zhongwei Zhang, Jingrui Wang, Jie Jin, Zhaoyun Wu, Lihui Wu, Tao Peng and Peng Li
Sustainability 2025, 17(15), 6772; https://doi.org/10.3390/su17156772 - 25 Jul 2025
Abstract
The four-way shuttle storage and retrieval system (FWSS/RS) is an advanced automated warehousing solution for achieving green and intelligent logistics, and task allocation is crucial to its logistics efficiency. However, current research on task allocation in three-dimensional storage environments is mostly conducted in the single-operation mode that handles inbound or outbound tasks individually, with limited attention paid to the more prevalent composite operation mode where inbound and outbound tasks coexist. To bridge this gap, this study investigates the task allocation problem in an FWSS/RS under the composite operation mode, and deep reinforcement learning (DRL) is introduced to solve it. Initially, the FWSS/RS operational workflows and equipment motion characteristics are analyzed, and a task allocation model with the total task completion time as the optimization objective is established. Furthermore, the task allocation problem is transformed into a partially observable Markov decision process corresponding to reinforcement learning. Each shuttle is regarded as an independent agent that receives localized observations, including shuttle position information and task completion status, as inputs, and a deep neural network is employed to fit value functions to output action selections. Correspondingly, all agents are trained within an independent deep Q-network (IDQN) framework that facilitates collaborative learning through experience sharing while maintaining decentralized decision-making based on individual observations. To validate the efficiency and effectiveness of the proposed model and method, experiments were conducted across various problem scales and transport resource configurations. The experimental results demonstrate that the DRL-based approach outperforms conventional task allocation methods, including the auction algorithm and the genetic algorithm. Specifically, the proposed IDQN-based method reduces the task completion time by up to 12.88% compared to the auction algorithm and up to 8.64% compared to the genetic algorithm across multiple scenarios. In addition, task-related factors are found to have a more significant impact on the optimization objectives of task allocation than transport resource-related factors.
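
Experience sharing with decentralized decisions, as in the IDQN setup described, reduces to a replay buffer pooled across shuttle agents. A minimal sketch (the transition format and capacity are illustrative):

```python
import random
from collections import deque

class SharedReplay:
    """One buffer shared by all shuttle agents: each agent pushes its own
    (obs, action, reward, next_obs) transitions and samples from the pool,
    so experience is shared while decisions stay decentralized."""
    def __init__(self, capacity=50_000):
        self.buf = deque(maxlen=capacity)

    def push(self, agent_id, transition):
        self.buf.append((agent_id, transition))

    def sample(self, batch_size=32):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

replay = SharedReplay()
for shuttle in range(4):  # four independent DQN agents share one pool
    replay.push(shuttle, ("obs", 1, -0.2, "next_obs"))
print(len(replay.sample(8)), "transitions drawn from the shared pool")
```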

25 pages, 51196 KiB  
Article
Research on Robot Obstacle Avoidance and Generalization Methods Based on Fusion Policy Transfer Learning
by Suyu Wang, Zhenlei Xu, Peihong Qiao, Quan Yue, Ya Ke and Feng Gao
Biomimetics 2025, 10(8), 493; https://doi.org/10.3390/biomimetics10080493 - 25 Jul 2025
Abstract
In nature, organisms often rely on the integration of local sensory information and prior experience to flexibly adapt to complex and dynamic environments, enabling efficient path selection. This bio-inspired mechanism of perception and behavioral adjustment provides important insights for path planning in mobile robots operating under uncertainty. In recent years, the introduction of deep reinforcement learning (DRL) has empowered mobile robots to autonomously learn navigation strategies through interaction with the environment, allowing them to identify obstacle distributions and perform path planning even in unknown scenarios. To further enhance the adaptability and path planning performance of robots in complex environments, this paper develops a deep reinforcement learning framework based on the Soft Actor–Critic (SAC) algorithm. First, to address the limited adaptability of existing transfer learning methods, we propose an action-level fusion mechanism that dynamically integrates prior and current policies during inference, enabling more flexible knowledge transfer. Second, a bio-inspired radar perception optimization method is introduced, which mimics the biological mechanism of focusing on key regions while ignoring redundant information, thereby enhancing the expressiveness of sensory inputs. Finally, a reward function based on ineffective behavior recognition is designed to reduce unnecessary exploration during training. The proposed method is validated in both the Gazebo simulation environment and real-world scenarios. Experimental results demonstrate that the approach achieves faster convergence and superior obstacle avoidance performance in path planning tasks, exhibiting strong transferability and generalization across various obstacle configurations.
(This article belongs to the Section Biological Optimisation and Management)
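
Action-level fusion of a prior and a current policy can be as simple as an uncertainty-weighted blend at inference time. A sketch under that reading; the uncertainty signal and the weighting rule are assumptions, not the paper's exact mechanism:

```python
import numpy as np

def fuse_actions(a_prior, a_current, uncertainty, floor=0.1):
    """Action-level fusion at inference: blend the prior (source-task) and
    current policies, leaning on the prior when the current policy is
    uncertain. Both policies stay intact; only their outputs are mixed."""
    w = np.clip(1.0 - uncertainty, floor, 1.0 - floor)  # weight on current
    return w * a_current + (1.0 - w) * a_prior

a_prior = np.array([0.4, -0.1])   # (linear, angular) velocity from prior SAC
a_current = np.array([0.1, 0.6])  # freshly trained policy's proposal
print(fuse_actions(a_prior, a_current, uncertainty=0.7))
```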

18 pages, 4058 KiB  
Article
A Transferable DRL-Based Intelligent Secondary Frequency Control for Islanded Microgrids
by Sijia Li, Frede Blaabjerg and Amjad Anvari-Moghaddam
Electronics 2025, 14(14), 2826; https://doi.org/10.3390/electronics14142826 - 14 Jul 2025
Abstract
Frequency instability poses a significant challenge to the overall stability of islanded microgrid systems. Deep reinforcement learning (DRL)-based intelligent control strategies are drawing considerable attention for their ability to operate without prior knowledge of system dynamics and their capacity for autonomous learning. This paper proposes an intelligent secondary frequency compensation solution that divides traditional secondary frequency control into two layers: the first layer is based on a PID controller, and the second layer is an intelligent controller based on DRL. To address the typically extensive training durations associated with DRL controllers, this paper integrates transfer learning, which significantly expedites the training process. This scheme improves control accuracy and reduces computational redundancy. Simulation tests are executed on an islanded microgrid with four distributed generators, and an IEEE 13-bus system is utilized for further validation. Finally, the proposed method is validated on the OPAL-RT real-time test platform. The results demonstrate the superior performance of the proposed method.
(This article belongs to the Special Issue Recent Advances in Control and Optimization in Microgrids)
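
The transfer-learning warm start described here usually amounts to copying pretrained weights and freezing early layers before fine-tuning on the new grid. A generic PyTorch sketch (the layer names, sizes, and frozen set are illustrative):

```python
import torch
import torch.nn as nn

def warm_start(target_net: nn.Module, source_net: nn.Module,
               freeze_layers=("0",)):
    """Transfer learning for a DRL frequency controller: copy weights from
    a controller trained on a source microgrid, then freeze early layers
    so fine-tuning only adapts the task-specific head."""
    target_net.load_state_dict(source_net.state_dict())
    for name, param in target_net.named_parameters():
        if name.split(".")[0] in freeze_layers:
            param.requires_grad = False

source = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
target = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
warm_start(target, source)
print(sum(p.requires_grad for p in target.parameters()),
      "trainable tensors remain after freezing")
```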
