Search Results (127)

Search Parameters:
Keywords = actors and agent networks

20 pages, 621 KiB  
Article
Support Needs of Agrarian Women to Build Household Livelihood Resilience: A Case Study of the Mekong River Delta, Vietnam
by Tran T. N. Tran, Tanh T. N. Nguyen, Elizabeth C. Ashton and Sharon M. Aka
Climate 2025, 13(8), 163; https://doi.org/10.3390/cli13080163 (registering DOI) - 1 Aug 2025
Viewed by 40
Abstract
Agrarian women are at the forefront of rural livelihoods increasingly affected by the frequency and severity of climate change impacts. However, their household livelihood resilience (HLR) remains limited due to gender-blind policies, scarce sex-disaggregated data, and inadequate consideration of gender-specific needs in resilience-building efforts. Grounded in participatory feminist research, this study employed a multi-method qualitative approach, including semi-structured interviews and oral history narratives, with 60 women in two climate-vulnerable provinces. Data were analyzed through thematic coding, CATWOE (Customers, Actors, Transformation, Worldview, Owners, Environmental Constraints) analysis, and descriptive statistics. The findings identify nine major climate-related events disrupting livelihoods and reveal a limited understanding of HLR as a long-term, transformative concept. Adaptation strategies remain short-term and focused on immediate survival. Barriers to HLR include financial constraints, limited access to agricultural resources and technology, and entrenched gender norms restricting women’s leadership and decision-making. While local governments, women’s associations, and community networks provide some support, gaps in accessibility and adequacy persist. Participants expressed the need for financial assistance, vocational training, agricultural technologies, and stronger peer networks. Strengthening HLR among agrarian women requires gender-sensitive policies, investment in local support systems, and community-led initiatives. Empowering agrarian women as agents of change is critical for fostering resilient rural livelihoods and achieving inclusive, sustainable development. Full article

24 pages, 5286 KiB  
Article
Graph Neural Network-Enhanced Multi-Agent Reinforcement Learning for Intelligent UAV Confrontation
by Kunhao Hu, Hao Pan, Chunlei Han, Jianjun Sun, Dou An and Shuanglin Li
Aerospace 2025, 12(8), 687; https://doi.org/10.3390/aerospace12080687 (registering DOI) - 31 Jul 2025
Viewed by 143
Abstract
Unmanned aerial vehicles (UAVs) are widely used in surveillance and combat for their efficiency and autonomy, whilst complex, dynamic environments challenge the modeling of inter-agent relations and information transmission. This research proposes a novel UAV tactical choice-making algorithm utilizing graph neural networks to tackle these challenges. The proposed algorithm employs a graph neural network to process the observed state information, the convolved output of which is then fed into a reconstructed critic network incorporating a Laplacian convolution kernel. This research first enhances the accuracy of obtaining unstable state information in hostile environments. The proposed algorithm uses this information to train a more precise critic network. In turn, this improved critic network guides the actor network to make decisions that better meet the needs of the battlefield. Coupled with a policy transfer mechanism, this architecture significantly enhances the decision-making efficiency and environmental adaptability within the multi-agent system. Results from the experiments show that the average effectiveness of the proposed algorithm across the six planned scenarios is 97.4%, surpassing the baseline by 23.4%. In addition, the integration of transfer learning makes the network convergence speed three times faster than that of the baseline algorithm. This algorithm effectively improves the information transmission efficiency between the environment and the UAV and provides strong support for UAV formation combat. Full article
(This article belongs to the Special Issue New Perspective on Flight Guidance, Control and Dynamics)
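The paper's exact network is not reproduced in this listing; the following is a minimal PyTorch sketch of the general pattern it describes, a graph convolution with a normalized (Laplacian-style) propagation step over per-UAV observations feeding a centralized critic. All class names, dimensions, and the toy fully connected communication graph are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One propagation step: symmetric-normalized adjacency times node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                                # x: [N, in_dim], adj: [N, N]
        deg = adj.sum(-1).clamp(min=1.0)
        norm = deg.pow(-0.5)
        a_hat = norm.unsqueeze(1) * adj * norm.unsqueeze(0)   # D^-1/2 A D^-1/2
        return torch.relu(self.lin(a_hat @ x))

class GraphCritic(nn.Module):
    """Centralized critic that scores the joint state after graph message passing."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.gc1 = GraphConv(obs_dim, hidden)
        self.gc2 = GraphConv(hidden, hidden)
        self.value = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, adj):                              # obs: [N, obs_dim]
        h = self.gc2(self.gc1(obs, adj), adj)
        return self.value(h.mean(0))                          # pooled joint value

# toy usage: 4 UAVs with a fully connected communication graph
obs = torch.randn(4, 10)
adj = torch.ones(4, 4)
print(GraphCritic(obs_dim=10)(obs, adj))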

25 pages, 516 KiB  
Article
Exploring a Sustainable Pathway Towards Enhancing National Innovation Capacity from an Empirical Analysis
by Sylvia Novillo-Villegas, Ana Belén Tulcanaza-Prieto, Alexander X. Chantera and Christian Chimbo
Sustainability 2025, 17(15), 6922; https://doi.org/10.3390/su17156922 - 30 Jul 2025
Viewed by 162
Abstract
Innovation is a strategic driver of sustainable competitive advantage and long-term economic growth. This study proposes an empirical framework to support the sustained development of national innovation capacity by examining key enabling factors. Drawing on an extensive review of the literature, the research investigates the interrelationships among governmental support (GS), innovation agents (IA), university–industry R&D collaborations (UIRD), and innovation cluster development (ICD), and their influence on two critical innovation outcomes, knowledge creation (KC) and knowledge diffusion (KD). Using panel data from G7 countries spanning 2008 to 2018, sourced from international organizations such as the World Bank, the World Intellectual Property Organization, and the World Economic Forum, the study applies regression analysis to test the proposed conceptual model. Results highlight the foundational role of GS in providing a balanced framework to foster collaborative networks among IA and enhancing the effectiveness of UIRD. Furthermore, IA emerges as a pivotal actor in advancing innovation efforts, while the development of innovation clusters is shown to selectively enhance specific innovation outcomes. These findings offer theoretical and practical contributions for policymakers, researchers, and stakeholders aiming to design supportive ecosystems that strengthen sustainable national innovation capacity. Full article
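The study's model specification is not reproduced here; as a rough illustration of the kind of panel regression it applies, the sketch below fits a pooled OLS with country and year fixed effects on synthetic G7-style panel data using statsmodels. The variable names (GS, IA, UIRD, ICD, KC) follow the abstract, but the data, functional form, and fixed-effects choice are assumptions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative panel: G7 countries x 2008-2018; values are random placeholders.
rng = np.random.default_rng(0)
countries = ["CAN", "FRA", "DEU", "ITA", "JPN", "GBR", "USA"]
rows = [{"country": c, "year": y,
         "GS": rng.normal(), "IA": rng.normal(),
         "UIRD": rng.normal(), "ICD": rng.normal()}
        for c in countries for y in range(2008, 2019)]
df = pd.DataFrame(rows)
df["KC"] = 0.4 * df["GS"] + 0.3 * df["IA"] + 0.2 * df["UIRD"] + rng.normal(size=len(df))

# Pooled OLS with country and year fixed effects (one common panel specification).
model = smf.ols("KC ~ GS + IA + UIRD + ICD + C(country) + C(year)", data=df).fit()
print(model.params.filter(["GS", "IA", "UIRD", "ICD"]))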

30 pages, 4578 KiB  
Article
Unpacking Performance Variability in Deep Reinforcement Learning: The Role of Observation Space Divergence
by Sooyoung Jang and Ahyun Lee
Appl. Sci. 2025, 15(15), 8247; https://doi.org/10.3390/app15158247 - 24 Jul 2025
Viewed by 180
Abstract
Deep Reinforcement Learning (DRL) algorithms often exhibit significant performance variability across different training runs, even with identical settings. This paper investigates the hypothesis that a key contributor to this variability is the divergence in the observation spaces explored by individual learning agents. We conducted an empirical study using Proximal Policy Optimization (PPO) agents trained on eight Atari environments. We analyzed the collected agent trajectories by qualitatively visualizing and quantitatively measuring the divergence in their explored observation spaces. Furthermore, we cross-evaluated the learned actor and value networks, measuring the average absolute TD-error, the RMSE of value estimates, and the KL divergence between policies to assess their functional similarity. We also conducted experiments where agents were trained from identical network initializations to isolate the source of this divergence. Our findings reveal a strong correlation: environments with low-performance variance (e.g., Freeway) showed high similarity in explored observation spaces and learned networks across agents. Conversely, environments with high-performance variability (e.g., Boxing, Qbert) demonstrated significant divergence in both explored states and network functionalities. This pattern persisted even when agents started with identical network weights. These results suggest that differences in experiential trajectories, driven by the stochasticity of agent–environment interactions, lead to specialized agent policies and value functions, thereby contributing substantially to the observed inconsistencies in DRL performance. Full article
(This article belongs to the Special Issue Advancements and Applications in Reinforcement Learning)
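The cross-evaluation protocol is not reproduced here; the sketch below illustrates one of the measures the abstract names, the KL divergence between two trained agents' action distributions evaluated on the same batch of observations. The networks and dimensions are placeholders, not the study's PPO agents.

import torch
import torch.nn as nn

def policy_kl(actor_a: nn.Module, actor_b: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Mean KL(pi_a || pi_b) over a batch of shared observations (discrete actions)."""
    with torch.no_grad():
        log_p = torch.log_softmax(actor_a(obs), dim=-1)
        log_q = torch.log_softmax(actor_b(obs), dim=-1)
        return (log_p.exp() * (log_p - log_q)).sum(-1).mean()

# toy usage: two independently trained actors evaluated on the same states
obs_dim, n_actions = 128, 6
actor_a = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
actor_b = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
print(policy_kl(actor_a, actor_b, torch.randn(32, obs_dim)))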

19 pages, 2893 KiB  
Article
Reactive Power Optimization of a Distribution Network Based on Graph Security Reinforcement Learning
by Xu Zhang, Xiaolin Gui, Pei Sun, Xing Li, Yuan Zhang, Xiaoyu Wang, Chaoliang Dang and Xinghua Liu
Appl. Sci. 2025, 15(15), 8209; https://doi.org/10.3390/app15158209 - 23 Jul 2025
Viewed by 195
Abstract
With the increasing integration of renewable energy, the secure operation of distribution networks faces significant challenges, such as voltage limit violations and increased power losses. To address the issue of reactive power and voltage security under renewable generation uncertainty, this paper proposes a graph-based security reinforcement learning method. First, a graph-enhanced neural network is designed to extract both topological and node-level features from the distribution network. Then, a primal-dual approach is introduced to incorporate voltage security constraints into the agent’s critic network by constructing a cost critic to guide safe policy learning. Finally, a dual-critic framework is adopted to train the actor network and derive an optimal policy. Experiments conducted on real load profiles demonstrated that the proposed method reduced the voltage violation rate to 0%, compared to 4.92% with the Deep Deterministic Policy Gradient (DDPG) algorithm and 5.14% with the Twin Delayed DDPG (TD3) algorithm. Moreover, the average node voltage deviation was effectively controlled within 0.0073 per unit. Full article
(This article belongs to the Special Issue IoT Technology and Information Security)
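The paper's formulation is not given in this listing; the sketch below illustrates the primal-dual idea it describes: a cost critic estimates expected voltage-constraint violation, the actor's objective is penalized by a Lagrange multiplier times that estimate, and the multiplier is updated by dual ascent. All numbers, including the cost budget, are illustrative assumptions.

import torch

# Illustrative quantities from one training batch (placeholders, not the paper's data).
reward_q = torch.tensor(1.8)     # critic estimate of return under the current policy
cost_q = torch.tensor(0.12)      # cost-critic estimate of expected voltage violation
cost_budget = 0.05               # allowed violation level (assumption)

lam = torch.tensor(0.5)          # Lagrange multiplier (dual variable)
dual_lr = 0.1

# Primal objective seen by the actor: maximize reward minus lambda-weighted cost.
actor_objective = reward_q - lam * cost_q

# Dual ascent: increase lambda while the constraint is violated, decrease otherwise.
lam = torch.clamp(lam + dual_lr * (cost_q - cost_budget), min=0.0)

print(float(actor_objective), float(lam))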

22 pages, 2108 KiB  
Article
Deep Reinforcement Learning for Real-Time Airport Emergency Evacuation Using Asynchronous Advantage Actor–Critic (A3C) Algorithm
by Yujing Zhou, Yupeng Yang, Bill Deng Pan, Yongxin Liu, Sirish Namilae, Houbing Herbert Song and Dahai Liu
Mathematics 2025, 13(14), 2269; https://doi.org/10.3390/math13142269 - 15 Jul 2025
Viewed by 383
Abstract
Emergencies can occur unexpectedly and require immediate action, especially in aviation, where time pressure and uncertainty are high. This study focused on improving emergency evacuation in airport and aircraft scenarios using real-time decision-making support. A system based on the Asynchronous Advantage Actor–Critic (A3C) algorithm, an advanced deep reinforcement learning method, was developed to generate faster and more efficient evacuation routes compared to traditional models. The A3C model was tested in various scenarios, including different environmental conditions and numbers of agents, and its performance was compared with the Deep Q-Network (DQN) algorithm. The results showed that A3C achieved evacuations 43.86% faster on average and converged in fewer episodes (100 vs. 250 for DQN). In dynamic environments with moving threats, A3C also outperformed DQN in maintaining agent safety and adapting routes in real time. As the number of agents increased, A3C maintained high levels of efficiency and robustness. These findings demonstrate A3C’s strong potential to enhance evacuation planning through improved speed, adaptability, and scalability. The study concludes by highlighting the practical benefits of applying such models in real-world emergency response systems, including significantly faster evacuation times, real-time adaptability to evolving threats, and enhanced scalability for managing large crowds in high-density environments including airport terminals. The A3C-based model offers a cost-effective alternative to full-scale evacuation drills by enabling virtual scenario testing, supports proactive safety planning through predictive modeling, and contributes to the development of intelligent decision-support tools that improve coordination and reduce response time during emergencies. Full article
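The evacuation system itself is not reproduced here; the sketch below shows a single-worker version of the A3C-style loss, a policy gradient weighted by the advantage plus a value loss and an entropy bonus. In full A3C, several such workers update a shared network asynchronously; dimensions and coefficients here are assumptions.

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=16, n_actions=4, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)    # action logits
        self.value = nn.Linear(hidden, 1)              # state value

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a3c_loss(net, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    logits, values = net(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantage = returns - values.detach()                       # A(s, a) = R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantage).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * dist.entropy().mean()

# toy batch of evacuation-agent transitions
net = ActorCritic()
loss = a3c_loss(net, torch.randn(8, 16), torch.randint(0, 4, (8,)), torch.randn(8))
loss.backward()
print(float(loss))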

37 pages, 1029 KiB  
Article
Autonomous Reinforcement Learning for Intelligent and Sustainable Autonomous Microgrid Energy Management
by Iacovos Ioannou, Saher Javaid, Yasuo Tan and Vasos Vassiliou
Electronics 2025, 14(13), 2691; https://doi.org/10.3390/electronics14132691 - 3 Jul 2025
Viewed by 401
Abstract
Effective energy management in microgrids is essential for integrating renewable energy sources and maintaining operational stability. Machine learning (ML) techniques offer significant potential for optimizing microgrid performance. This study provides a comprehensive comparative performance evaluation of four ML-based control strategies: deep Q-networks (DQNs), proximal policy optimization (PPO), Q-learning, and advantage actor–critic (A2C). These strategies were rigorously tested using simulation data from a representative islanded microgrid model, with metrics evaluated across diverse seasonal conditions (autumn, spring, summer, winter). Key performance indicators included overall episodic reward, unmet load, excess generation, energy storage system (ESS) state-of-charge (SoC) imbalance, ESS utilization, and computational runtime. Results from the simulation indicate that the DQN-based agent consistently achieved superior performance across all evaluated seasons, effectively balancing economic rewards, reliability, and battery health while maintaining competitive computational runtimes. Specifically, DQN delivered near-optimal rewards by significantly reducing unmet load, minimizing excess renewable energy curtailment, and virtually eliminating ESS SoC imbalance, thereby prolonging battery life. Although the tabular Q-learning method showed the lowest computational latency, it was constrained by limited adaptability in more complex scenarios. PPO and A2C, while offering robust performance, incurred higher computational costs without additional performance advantages over DQN. This evaluation clearly demonstrates the capability and adaptability of the DQN approach for intelligent and autonomous microgrid management, providing valuable insights into the relative advantages and limitations of various ML strategies in complex energy management scenarios. Full article
(This article belongs to the Special Issue Artificial Intelligence-Driven Emerging Applications)
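The microgrid controller is not reproduced here; the sketch below shows the DQN temporal-difference update that the best-performing agent relies on, with a toy discrete action space standing in for ESS charge/discharge decisions. All dimensions, the discount factor, and the loss choice are assumptions.

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 8, 5, 0.99          # e.g. discretized ESS charge/discharge levels
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# one gradient step on a toy batch of (s, a, r, s', done) transitions
s, a = torch.randn(32, obs_dim), torch.randint(0, n_actions, (32,))
r, s_next, done = torch.randn(32), torch.randn(32, obs_dim), torch.zeros(32)

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a)
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s_next).max(1).values
loss = nn.functional.smooth_l1_loss(q_sa, target)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))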

30 pages, 4491 KiB  
Article
IoT-Enabled Adaptive Traffic Management: A Multiagent Framework for Urban Mobility Optimisation
by Ibrahim Mutambik
Sensors 2025, 25(13), 4126; https://doi.org/10.3390/s25134126 - 2 Jul 2025
Cited by 2 | Viewed by 624
Abstract
This study evaluates the potential of IoT-enabled adaptive traffic management systems for mitigating urban congestion, enhancing mobility, and reducing environmental impacts in densely populated cities. Using London as a case study, the research develops a multiagent simulation framework to assess the effectiveness of advanced traffic management strategies—including adaptive signal control and dynamic rerouting—under varied traffic scenarios. Unlike conventional models that rely on static or reactive approaches, this framework integrates real-time data from IoT-enabled sensors with predictive analytics to enable proactive adjustments to traffic flows. Distinctively, the study couples this integration with a multiagent simulation environment that models the traffic actors—private vehicles, buses, cyclists, and emergency services—as autonomous, behaviourally dynamic agents responding to real-time conditions. This enables a more nuanced, realistic, and scalable evaluation of urban mobility strategies. The simulation results indicate substantial performance gains, including a 30% reduction in average travel times, a 50% decrease in congestion at major intersections, and a 28% decline in CO2 emissions. These findings underscore the transformative potential of sensor-driven adaptive systems for advancing sustainable urban mobility. The study addresses critical gaps in the existing literature by focusing on scalability, equity, and multimodal inclusivity, particularly through the prioritisation of high-occupancy and essential traffic. Furthermore, it highlights the pivotal role of IoT sensor networks in real-time traffic monitoring, control, and optimisation. By demonstrating a novel and practical application of sensor technologies to traffic systems, the proposed framework makes a significant and timely contribution to the field and offers actionable insights for smart city planning and transportation policy. Full article
(This article belongs to the Special Issue Vehicular Sensing for Improved Urban Mobility: 2nd Edition)
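The simulation framework is not reproduced here; the sketch below illustrates one ingredient the abstract describes, an adaptive signal controller that reallocates green time from real-time queue estimates and applies a priority weight to high-occupancy approaches. The function, numbers, and weights are illustrative assumptions.

def allocate_green_time(queues, priority, cycle_s=90, min_green_s=10):
    """Split a signal cycle across approaches in proportion to weighted queue lengths.

    queues:   vehicles detected per approach (e.g. from IoT loop/camera sensors)
    priority: multiplier per approach (e.g. >1 for bus- or emergency-heavy approaches)
    """
    weights = [max(q, 1) * p for q, p in zip(queues, priority)]
    budget = cycle_s - min_green_s * len(queues)
    total = sum(weights)
    return [min_green_s + budget * w / total for w in weights]

# toy intersection: four approaches, the second carries a bus corridor
print(allocate_green_time(queues=[12, 30, 8, 15], priority=[1.0, 1.5, 1.0, 1.0]))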

20 pages, 5480 KiB  
Article
Model-Data Hybrid-Driven Real-Time Optimal Power Flow: A Physics-Informed Reinforcement Learning Approach
by Ximing Zhang, Xiyuan Ma, Yun Yu, Duotong Yang, Zhida Lin, Changcheng Zhou, Huan Xu and Zhuohuan Li
Energies 2025, 18(13), 3483; https://doi.org/10.3390/en18133483 - 1 Jul 2025
Viewed by 312
Abstract
With the rapid development of artificial intelligence technology, DRL has shown great potential in solving complex real-time optimal power flow problems of modern power systems. Nevertheless, traditional DRL methodologies confront dual bottlenecks: (a) suboptimal coordination between exploratory behavior policies and experience-based data exploitation in practical applications, compounded by (b) users’ distrust from the opacity of model decision mechanics. To address these, a model–data hybrid-driven physics-informed reinforcement learning (PIRL) algorithm is proposed in this paper. Specifically, the proposed methodology uses the proximal policy optimization (PPO) algorithm as the agent’s foundational framework and constructs a PI-actor network embedded with prior model knowledge derived from power flow sensitivity into the agent’s actor network via the PINN method, which achieves dual optimization objectives: (a) enhanced environmental perceptibility to improve experience utilization efficiency via gradient-awareness from model knowledge during actor network updates, and (b) improved user trustworthiness through mathematically constrained action gradient information derived from explicit model knowledge, ensuring actor updates adhere to safety boundaries. The simulation and validation results show that the PIRL algorithm outperforms the baseline PPO algorithm in terms of training stability, exploration efficiency, economy, and security. Full article
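The PI-actor construction is not reproduced here; the sketch below illustrates the general pattern of embedding model knowledge into an actor update: a placeholder linear voltage-sensitivity matrix maps the actor's setpoints to predicted voltages, and a differentiable bound-violation penalty feeds gradient information from that model back into the actor. The matrices, bounds, and coefficients are assumptions, and the reinforcement learning term is only a stand-in for the PPO surrogate objective.

import torch

torch.manual_seed(0)
n_buses, n_controls = 6, 3
sens = torch.randn(n_buses, n_controls) * 0.02      # placeholder dV/du sensitivity matrix
v_base = torch.full((n_buses,), 1.0)
v_min, v_max = 0.95, 1.05

actor = torch.nn.Sequential(torch.nn.Linear(n_buses, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, n_controls), torch.nn.Tanh())

obs = torch.randn(16, n_buses)                       # batch of grid states (illustrative)
u = actor(obs)                                       # reactive-power setpoints in [-1, 1]
v_pred = v_base + u @ sens.T                         # model-predicted bus voltages

# Physics penalty: differentiable violation of voltage bounds, backpropagated to the actor.
violation = torch.relu(v_pred - v_max) + torch.relu(v_min - v_pred)
physics_loss = violation.pow(2).mean()

rl_loss = -u.pow(2).mean()                           # stand-in for the PPO surrogate objective
(rl_loss + 10.0 * physics_loss).backward()
print(float(physics_loss))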

37 pages, 4400 KiB  
Article
Optimizing Weighted Fair Queuing with Deep Reinforcement Learning for Dynamic Bandwidth Allocation
by Mays A. Mawlood and Dhari Ali Mahmood
Telecom 2025, 6(3), 46; https://doi.org/10.3390/telecom6030046 - 1 Jul 2025
Viewed by 449
Abstract
The rapid growth of high-quality telecommunications demands enhanced queueing system performance. Traditional bandwidth distribution often struggles to adapt to dynamic changes in network conditions and erratic traffic patterns. Internet traffic fluctuates over time, causing resource underutilization. To address these challenges, this paper proposes a new adaptive algorithm called Weighted Fair Queues continual Deep Reinforcement Learning (WFQ continual-DRL), which integrates the advanced deep reinforcement learning Soft Actor-Critic (SAC) algorithm with the Elastic Weight Consolidation (EWC) approach. This technique is designed to overcome catastrophic forgetting in neural networks, thereby enhancing dynamic bandwidth allocation in network routers. The agent is trained to allocate bandwidth weights for multiple queues dynamically by interacting with the environment to observe queue lengths. The performance of the proposed adaptive algorithm was evaluated for eight queues and then extended to twelve-queue systems. The model achieved higher cumulative rewards compared to previous studies, indicating improved overall performance. The values of the Mean Squared Error (MSE) and Mean Absolute Error (MAE) decreased, suggesting effectively optimized bandwidth allocation. A reduced Root Mean Square Error (RMSE) indicated improved prediction accuracy, and fairness, as computed by Jain’s index, was enhanced. The proposed algorithm was validated using real-world network traffic data, ensuring a robust model under dynamic queuing requirements. Full article
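The WFQ continual-DRL agent is not reproduced here; the sketch below shows the Elastic Weight Consolidation term it adds to counter catastrophic forgetting: a quadratic penalty that anchors parameters important to the previous task, with importance approximated by a Fisher-information estimate. The network, the placeholder Fisher values, and the penalty strength are assumptions.

import torch
import torch.nn as nn

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """lam/2 * sum_i F_i * (theta_i - theta*_i)^2, summed over all parameters."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * loss

# toy actor: snapshot parameters and (placeholder) Fisher estimates after task 1
actor = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
old_params = {n: p.detach().clone() for n, p in actor.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in actor.named_parameters()}   # placeholder importance

task2_loss = actor(torch.randn(4, 8)).pow(2).mean()      # stand-in for the SAC loss on task 2
total = task2_loss + ewc_penalty(actor, fisher, old_params)
total.backward()
print(float(total))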

20 pages, 2579 KiB  
Article
ERA-MADDPG: An Elastic Routing Algorithm Based on Multi-Agent Deep Deterministic Policy Gradient in SDN
by Wanwei Huang, Hongchang Liu, Yingying Li and Linlin Ma
Future Internet 2025, 17(7), 291; https://doi.org/10.3390/fi17070291 - 29 Jun 2025
Viewed by 336
Abstract
To address the impact of network topology changes on routing performance, this paper proposes an Elastic Routing Algorithm based on Multi-Agent Deep Deterministic Policy Gradient (ERA-MADDPG), a deep reinforcement learning approach implemented within the MADDPG framework. The algorithm first builds a three-layer architecture based on Software-Defined Networking (SDN); from top to bottom, the layers are the multi-agent layer, the controller layer, and the data layer. The architecture’s processing flow, including real-time data-layer information collection and dynamic policy generation, enables the ERA-MADDPG algorithm to exhibit strong elasticity by quickly adjusting routing decisions in response to topology changes. The actor-critic framework, combined with Convolutional Neural Networks (CNNs) to implement the ERA-MADDPG routing algorithm, effectively improves training efficiency, enhances learning stability, facilitates collaboration, and improves algorithm generalization and applicability. Finally, simulation experiments demonstrate that the convergence speed of the ERA-MADDPG routing algorithm outperforms that of the Multi-Agent Deep Q-Network (MADQN) algorithm and the Smart Routing based on Deep Reinforcement Learning (SR-DRL) algorithm, with the training speed in the initial phase improved by approximately 20.9% and 39.1% compared to the MADQN and SR-DRL algorithms, respectively. The elasticity of ERA-MADDPG is quantified by re-convergence speed: under 5–15% topology node/link changes, its re-convergence speed is over 25% faster than that of MADQN and SR-DRL, demonstrating superior capability to maintain routing efficiency in dynamic environments. Full article

26 pages, 1093 KiB  
Article
Qualitatively Pre-Testing a Tailored Financial Literacy Measurement Instrument for Professional Athletes
by Jaco Moolman and Christina Cornelia Shuttleworth
J. Risk Financial Manag. 2025, 18(6), 317; https://doi.org/10.3390/jrfm18060317 - 10 Jun 2025
Viewed by 874
Abstract
The aim of this study was to qualitatively pre-test a research instrument to assess the financial literacy skills of professional athletes who compete in a team sport environment. Questions were developed based on a review of the current literature and an analysis of qualitative data from twelve structured expert interviews, selected using actor–network theory and purposive sampling. The findings showed how qualitative data can be considered and enumerated to guide the development of 28 validated questions to assess financial literacy within a specific group. This study helps to fill a gap in the literature since there is a paucity of qualitatively mediated research that focuses on specific target groups in the field of financial literacy. This research instrument could be of value to professional athletes, sports club management, players’ associations, educators, researchers, sports agents, and advisors by providing them with a greater understanding of their clients’ financial literacy skills and financial needs. Full article
(This article belongs to the Special Issue Behavioral Finance and Financial Management)

37 pages, 13864 KiB  
Article
LSTM-Enhanced Deep Reinforcement Learning for Robust Trajectory Tracking Control of Skid-Steer Mobile Robots Under Terra-Mechanical Constraints
by Jose Manuel Alcayaga, Oswaldo Anibal Menéndez, Miguel Attilio Torres-Torriti, Juan Pablo Vásconez, Tito Arévalo-Ramirez and Alvaro Javier Prado Romo
Robotics 2025, 14(6), 74; https://doi.org/10.3390/robotics14060074 - 29 May 2025
Viewed by 2187
Abstract
Autonomous navigation in mining environments is challenged by complex wheel–terrain interaction, traction losses caused by slip dynamics, and sensor limitations. This paper investigates the effectiveness of Deep Reinforcement Learning (DRL) techniques for the trajectory tracking control of skid-steer mobile robots operating under terra-mechanical constraints. Four state-of-the-art DRL algorithms, i.e., Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor–Critic (SAC), are selected to evaluate their ability to generate stable and adaptive control policies under varying environmental conditions. To address the inherent partial observability in real-world navigation, this study presents an original approach that integrates Long Short-Term Memory (LSTM) networks into DRL-based controllers. This allows control agents to retain and leverage temporal dependencies to infer unobservable system states. The developed agents were trained and tested in simulations and then assessed in field experiments under uneven terrain and dynamic model parameter changes that lead to traction losses in mining environments, targeting various trajectory tracking tasks, including lemniscate and squared-type reference trajectories. This contribution strengthens the robustness and adaptability of DRL agents by enabling better generalization of learned policies compared with their baseline counterparts, while also significantly improving trajectory tracking performance. In particular, LSTM-based controllers achieved reductions in tracking errors of 10%, 74%, 21%, and 37% for DDPG-LSTM, PPO-LSTM, TD3-LSTM, and SAC-LSTM, respectively, compared with their non-recurrent counterparts. Furthermore, DDPG-LSTM and TD3-LSTM reduced their control effort, measured as the total variation in the control input, by 15% and 20%, respectively, compared with their baseline controllers. Findings from this work provide valuable insights into the role of memory-augmented reinforcement learning for robust motion control in unstructured and high-uncertainty environments. Full article
(This article belongs to the Section Intelligent Robots and Mechatronics)
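The controllers themselves are not reproduced here; the sketch below shows the core recurrent idea, an actor whose observations pass through an LSTM so the hidden state can summarize recent slip and traction history that a single observation cannot reveal. Sizes, the action interpretation, and the rollout loop are assumptions.

import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Recurrent actor: maps an observation sequence to bounded wheel-velocity commands."""
    def __init__(self, obs_dim=12, act_dim=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, hc=None):
        out, hc = self.lstm(obs_seq, hc)                 # out: [batch, time, hidden]
        return torch.tanh(self.head(out[:, -1])), hc     # action from the latest hidden state

# toy rollout: keep (h, c) across control steps so the policy remembers recent dynamics
actor, hc = LSTMActor(), None
for _ in range(5):
    obs = torch.randn(1, 1, 12)                          # one new observation per control step
    action, hc = actor(obs, hc)
print(action)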

25 pages, 1528 KiB  
Article
A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points
by Suyu Wang, Quan Yue, Zhenlei Xu, Peihong Qiao, Zhentao Lyu and Feng Gao
Mathematics 2025, 13(11), 1738; https://doi.org/10.3390/math13111738 - 24 May 2025
Viewed by 1017
Abstract
Reinforcement learning has achieved significant success in sequential decision-making problems but exhibits poor adaptability in non-stationary environments with unknown dynamics, a challenge particularly pronounced in multi-agent scenarios. This study aims to enhance the adaptive capability of multi-agent systems in such volatile environments. We propose a novel cooperative Multi-Agent Reinforcement Learning (MARL) algorithm based on MADDPG, termed MACPH, which innovatively incorporates three mechanisms: a Composite Experience Replay Buffer (CERB) mechanism that balances recent and important historical experiences through a dual-buffer structure and mixed sampling; an Adaptive Parameter Space Noise (APSN) mechanism that perturbs actor network parameters and dynamically adjusts the perturbation intensity to achieve coherent and state-dependent exploration; and a Huber loss function mechanism to mitigate the impact of outliers in Temporal Difference errors and enhance training stability. The study was conducted in standard and non-stationary navigation and communication task scenarios. Ablation studies confirmed the positive contributions of each component and their synergistic effects. In non-stationary scenarios featuring abrupt environmental changes, experiments demonstrate that MACPH outperforms baseline algorithms such as DDPG, MADDPG, and MATD3 in terms of reward performance, adaptation speed, learning stability, and robustness. The proposed MACPH algorithm offers an effective solution for multi-agent reinforcement learning applications in complex non-stationary environments. Full article
(This article belongs to the Special Issue Application of Machine Learning and Data Mining, 2nd Edition)
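MACPH's training loop is not reproduced here; the sketch below illustrates two of its three mechanisms, a Huber loss on the critic's TD error and parameter-space noise whose scale adapts so that the induced action perturbation tracks a target magnitude (the composite replay buffer is omitted). Thresholds, sizes, and the adaptation constants are assumptions.

import copy
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 2), nn.Tanh())

# Huber loss on the TD error: quadratic near zero, linear for outliers.
q_pred = critic(torch.randn(32, 10)).squeeze(-1)
td_target = torch.randn(32)                          # placeholder for r + gamma * Q'(s', a')
critic_loss = nn.functional.huber_loss(q_pred, td_target, delta=1.0)
critic_loss.backward()

# Adaptive parameter-space noise: perturb a copy of the actor's weights, then grow or
# shrink the noise scale so the induced action change tracks a target magnitude.
noise_scale, target_action_dist = 0.1, 0.2
perturbed = copy.deepcopy(actor)
with torch.no_grad():
    for p in perturbed.parameters():
        p.add_(noise_scale * torch.randn_like(p))
    obs = torch.randn(64, 6)
    action_dist = (actor(obs) - perturbed(obs)).pow(2).mean().sqrt()
noise_scale *= 1.01 if action_dist < target_action_dist else 1 / 1.01
print(float(critic_loss), float(action_dist), noise_scale)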

34 pages, 5896 KiB  
Article
Networked Multi-Agent Deep Reinforcement Learning Framework for the Provision of Ancillary Services in Hybrid Power Plants
by Muhammad Ikram, Daryoush Habibi and Asma Aziz
Energies 2025, 18(10), 2666; https://doi.org/10.3390/en18102666 - 21 May 2025
Viewed by 442
Abstract
Inverter-based resources (IBRs) are becoming more prominent due to the increasing penetration of renewable energy sources that reduce power system inertia, compromising power system stability and grid support services. At present, optimal coordination among generation technologies remains a significant challenge for frequency control services. This paper presents a novel networked multi-agent deep reinforcement learning (N—MADRL) scheme for optimal dispatch and frequency control services. First, we develop a model-free environment consisting of a photovoltaic (PV) plant, a wind plant (WP), and an energy storage system (ESS) plant. The proposed framework uses a combination of multi-agent actor-critic (MAAC) and soft actor-critic (SAC) schemes for optimal dispatch of active power, mitigating frequency deviations, aiding reserve capacity management, and improving energy balancing. Second, frequency stability and optimal dispatch are formulated in the N—MADRL framework using the physical constraints under a dynamic simulation environment. Third, a decentralised coordinated control scheme is implemented in the HPP environment using communication-resilient scenarios to address system vulnerabilities. Finally, the practicality of the N—MADRL approach is demonstrated in a Grid2Op dynamic simulation environment for optimal dispatch, energy reserve management, and frequency control. Results demonstrated on the IEEE 14 bus network show that compared to PPO and DDPG, N—MADRL achieves 42.10% and 61.40% higher efficiency for optimal dispatch, along with improvements of 68.30% and 74.48% in mitigating frequency deviations, respectively. The proposed approach outperforms existing methods under partially, fully, and randomly connected scenarios by effectively handling uncertainties, system intermittency, and communication resiliency. Full article
(This article belongs to the Collection Artificial Intelligence and Smart Energy)
