Search Results (104)

Search Parameters:
Keywords = multi-agent reinforcement learning (RL)

25 pages, 3230 KB  
Article
Real-Time Cooperative Path Planning and Collision Avoidance for Autonomous Logistics Vehicles Using Reinforcement Learning and Distributed Model Predictive Control
by Mingxin Li, Hui Li, Yunan Yao, Yulei Zhu, Hailong Weng, Huabiao Jin and Taiwei Yang
Machines 2026, 14(1), 27; https://doi.org/10.3390/machines14010027 - 24 Dec 2025
Abstract
In industrial environments such as ports and warehouses, autonomous logistics vehicles face significant challenges in coordinating multiple vehicles while ensuring safe and efficient path planning. This study proposes a novel real-time cooperative control framework for autonomous vehicles, combining reinforcement learning (RL) and distributed model predictive control (DMPC). The RL agent dynamically adjusts the optimization weights of the DMPC to adapt to the vehicle’s real-time environment, while the DMPC enables decentralized path planning and collision avoidance. The system leverages multi-source sensor fusion, including GNSS, UWB, IMU, LiDAR, and stereo cameras, to provide accurate state estimates of the vehicles. Simulation results demonstrate that the proposed RL-DMPC approach outperforms traditional centralized control strategies in terms of tracking accuracy, collision avoidance, and safety margins. Furthermore, the proposed method significantly improves control smoothness compared to rule-based strategies. This framework is particularly effective in dynamic and constrained industrial settings, offering a robust solution for multi-vehicle coordination with minimal communication delays. The study highlights the potential of combining RL with DMPC to achieve real-time, scalable, and adaptive solutions for autonomous logistics.
(This article belongs to the Special Issue Control and Path Planning for Autonomous Vehicles)
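
A minimal sketch of the weight-adaptation idea in this abstract: an epsilon-greedy agent selects among candidate DMPC weight profiles, and the chosen weights shape a stub MPC stage cost. The profile names, reward model, and all numbers are illustrative assumptions, not the paper's method or values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate weightings of the DMPC objective: (tracking, control effort, safety margin).
WEIGHT_PROFILES = {
    "tracking_heavy": np.array([10.0, 1.0, 1.0]),
    "balanced":       np.array([5.0, 2.0, 5.0]),
    "safety_heavy":   np.array([1.0, 1.0, 10.0]),
}
actions = list(WEIGHT_PROFILES)
q = {a: 0.0 for a in actions}   # action-value estimates
n = {a: 0 for a in actions}     # visit counts

def dmpc_stage_cost(weights, track_err, u, clearance):
    """Stub stage cost a DMPC would minimize; here we only evaluate it."""
    w_track, w_effort, w_safety = weights
    return w_track * track_err**2 + w_effort * u**2 + w_safety / max(clearance, 1e-3)

for episode in range(500):
    # Epsilon-greedy choice of weight profile for this planning cycle.
    a = rng.choice(actions) if rng.random() < 0.1 else max(q, key=q.get)
    w = WEIGHT_PROFILES[a]
    # Toy environment outcome: safety-heavy weights raise clearance but hurt tracking.
    clearance = rng.normal(1.0 + 0.1 * w[2], 0.1)
    track_err = rng.normal(0.5 - 0.03 * w[0] + 0.02 * w[2], 0.05)
    reward = -dmpc_stage_cost(w, track_err, u=0.2, clearance=clearance)
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]  # incremental mean update

print({a: round(v, 2) for a, v in q.items()})
```
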
36 pages, 3105 KB  
Review
Reinforcement Learning for Industrial Automation: A Comprehensive Review of Adaptive Control and Decision-Making in Smart Factories
by Yasser M. Alginahi, Omar Sabri and Wael Said
Machines 2025, 13(12), 1140; https://doi.org/10.3390/machines13121140 - 15 Dec 2025
Viewed by 459
Abstract
The accelerating integration of Artificial Intelligence (AI) in Industrial Automation has established Reinforcement Learning (RL) as a transformative paradigm for adaptive control, intelligent optimization, and autonomous decision-making in smart factories. Despite the growing literature, existing reviews often emphasize algorithmic performance or domain-specific applications, neglecting broader links between methodological evolution, technological maturity, and industrial readiness. To address this gap, this study presents a bibliometric review mapping the development of RL and Deep Reinforcement Learning (DRL) research in Industrial Automation and robotics. Following the PRISMA 2020 protocol to guide the data collection procedures and inclusion criteria, 672 peer-reviewed journal articles published between 2017 and 2026 were retrieved from Scopus, ensuring high-quality, interdisciplinary coverage. Quantitative bibliometric analyses were conducted in R using Bibliometrix and Biblioshiny, including co-authorship, co-citation, keyword co-occurrence, and thematic network analyses, to reveal collaboration patterns, influential works, and emerging research trends. Results indicate that 42% of studies employed DRL, 27% focused on Multi-Agent RL (MARL), and 31% relied on classical RL, with applications concentrated in robotic control (33%), process optimization (28%), and predictive maintenance (19%). However, only 22% of the studies reported real-world or pilot implementations, highlighting persistent challenges in scalability, safety validation, interpretability, and deployment readiness. By integrating a review with bibliometric mapping, this study provides a comprehensive taxonomy and a strategic roadmap linking theoretical RL research with practical industrial applications. This roadmap is structured across four critical dimensions: (1) Algorithmic Development (e.g., safe, explainable, and data-efficient RL), (2) Integration Technologies (e.g., digital twins and IoT), (3) Validation Maturity (from simulation to real-world pilots), and (4) Human-Centricity (addressing trust, collaboration, and workforce transition). These insights can guide researchers, engineers, and policymakers in developing scalable, safe, and human-centric RL solutions, prioritizing research directions, and informing the implementation of Industry 5.0–aligned intelligent automation systems emphasizing transparency, sustainability, and operational resilience.

15 pages, 1506 KB  
Review
Towards LLM Enhanced Decision: A Survey on Reinforcement Learning Based Ship Collision Avoidance
by Yizhou Wu, Jin Liu, Xingye Li, Junsheng Xiao, Tao Zhang, Haitong Xu and Lei Zhang
J. Mar. Sci. Eng. 2025, 13(12), 2275; https://doi.org/10.3390/jmse13122275 - 28 Nov 2025
Viewed by 417
Abstract
This comprehensive review examines the application of reinforcement learning (RL) to ship collision avoidance (SCA) from 2014 to the present, analyzing methods designed for both single-agent and multi-agent collaborative paradigms. While prior research has demonstrated RL’s advantages in environmental adaptability, autonomous decision-making, and online optimization over traditional control methods, this study systematically addresses the algorithmic improvements, implementation challenges, and functional roles of RL techniques in SCA, such as Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Multi-Agent Reinforcement Learning (MARL). It also highlights how these technologies address critical challenges in SCA, including dynamic obstacle avoidance, compliance with the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs), and coordination in dense traffic scenarios, while underscoring persistent limitations such as idealized assumptions, scalability issues, and robustness in uncertain environments. Contributions include a structured analysis of recent technological evolution and a Large Language Model (LLM)-based hierarchical architecture, integrating perception, communication, decision-making, and execution layers for future SCA systems, that prioritizes the development of scalable, adaptive frameworks for robust and compliant autonomous navigation in complex, real-world maritime environments.

15 pages, 3102 KB  
Article
Physics-Informed Reinforcement Learning for Multi-Band Octagonal Fractal Frequency-Selective Surface Optimization
by Gaoya Dong, Ming Liu and Xin He
Electronics 2025, 14(23), 4656; https://doi.org/10.3390/electronics14234656 - 26 Nov 2025
Viewed by 278
Abstract
Diverse application scenarios demand frequency-selective surfaces (FSSs) with tailored center frequencies and bandwidths. However, their design traditionally relies on iterative full-wave simulations using tools such as the High-Frequency Structure Simulator (HFSS) and Computer Simulation Technology (CST), which are time-consuming and labor-intensive. To overcome these limitations, this work proposes an octagonal fractal frequency-selective surface (OF-FSS) composed of a square ring resonator and an octagonal fractal geometry, where the fractal configuration supports single-band and multi-band resonance. A physics-informed reinforcement learning (PIRL) algorithm is developed, enabling the RL agent to directly interact with CST and autonomously optimize key structural parameters. Using the proposed PIRL framework, the OF-FSS achieves both single-band and dual-band operation with the desired frequency responses. Full-wave simulations validate that the integration of OF-FSS and PIRL provides an efficient and physically interpretable strategy for designing advanced multi-band FSSs.
(This article belongs to the Special Issue Reinforcement Learning: Emerging Techniques and Future Prospects)
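
A sketch of the simulator-in-the-loop optimization this abstract describes. The real work drives CST through its scripting interface; here a toy resonator model stands in for the solver, and a simple cross-entropy search stands in for the paper's PIRL agent. The parameter names, resonance law, and target band are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
freqs = np.linspace(2.0, 12.0, 200)            # GHz sweep
TARGET_F0, TARGET_BW = 5.8, 0.8                # desired resonance (assumed)

def fake_solver(ring_len_mm, fractal_scale):
    """Stand-in for a CST run: maps geometry to a transmission dip."""
    f0 = 12.0 / ring_len_mm + 2.0 * fractal_scale   # toy resonance law
    bw = 0.3 + 0.5 * fractal_scale
    s21 = 1.0 - np.exp(-((freqs - f0) / bw) ** 2)   # Gaussian notch
    return f0, bw, s21

def reward(params):
    f0, bw, _ = fake_solver(*params)
    return -abs(f0 - TARGET_F0) - abs(bw - TARGET_BW)

mean = np.array([2.0, 0.5])                    # [ring_len_mm, fractal_scale]
std = np.array([0.5, 0.2])
for it in range(30):                           # cross-entropy iterations
    pop = rng.normal(mean, std, size=(64, 2)).clip(0.5, None)
    scores = np.array([reward(p) for p in pop])
    elite = pop[np.argsort(scores)[-8:]]       # keep the best designs
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3

f0, bw, _ = fake_solver(*mean)
print(f"design={mean.round(3)}  f0={f0:.2f} GHz  bw={bw:.2f} GHz")
```
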

35 pages, 26321 KB  
Article
DualSynNet: A Dual-Center Collaborative Space Network with Federated Graph Reinforcement Learning for Autonomous Task Optimization
by Xuewei Niu, Jiabin Yuan, Lili Fan and Keke Zha
Aerospace 2025, 12(12), 1051; https://doi.org/10.3390/aerospace12121051 - 26 Nov 2025
Viewed by 284
Abstract
Recent space exploration roadmaps from China, the United States, and Russia highlight the establishment of Mars bases as a major objective. Future deep-space missions will span the inner solar system and extend beyond the asteroid belt, demanding network control systems that sustain reliable communication and efficient scheduling across vast distances. Current centralized or regionalized technologies, such as the Deep-Space Network and planetary relay constellations, are limited by long delays, sparse visibility, and heterogeneous onboard resources, and thus cannot meet these demands. To address these challenges, we propose a dual-center architecture, DualSynNet, anchored at Earth and Mars and enhanced by Lagrange-point relays and a minimal heliocentric constellation to provide scalable multi-mission coverage. On this basis, we develop a federated multi-agent reinforcement learning framework with graph attention (Fed-GAT-MADDPG), integrating centralized critics, decentralized actors, and interplanetary parameter synchronization for adaptive, resource-aware scheduling. A unified metric system (Reachability, Rapidity, and Availability) is introduced to evaluate connectivity, latency, and resource sustainability. Simulation results demonstrate that our method increases task completion to 52.4%, reduces deadline expiration, constrains rover low-state-of-charge exposure to approximately 0.8%, and maintains consistently high hardware reliability across rover and satellite nodes. End-to-end latency is reduced, with a shorter tail distribution due to fewer prolonged buffering or stagnation periods. Ablation studies confirm the essential role of graph attention, as removing it reduces completion and raises expiration. These results indicate that the integration of a dual-center architecture with federated graph reinforcement learning yields a robust, scalable, and resource-efficient framework suitable for next-generation interplanetary exploration.
(This article belongs to the Section Astronautics & Space Science)
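
A minimal sketch of the federated synchronization layer named in this abstract: each control center trains its own actor locally and periodically averages parameters with its peer, standing in for Fed-GAT-MADDPG's interplanetary parameter exchange. The toy loss, parameter shapes, and sync period are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_gradient_step(params, lr=0.05):
    """Stand-in for one local actor update at a single center."""
    # Pretend each center descends a noisy quadratic loss toward params = 1.
    noise = rng.normal(0, 0.01, size=params.shape)
    return params - lr * (params - 1.0) + noise

centers = {"earth": rng.normal(size=4), "mars": rng.normal(size=4)}
SYNC_EVERY = 10  # rounds between parameter exchanges (delay-tolerant in the paper)

for step in range(1, 101):
    for name in centers:
        centers[name] = local_gradient_step(centers[name])
    if step % SYNC_EVERY == 0:
        # Federated averaging: both centers adopt the mean parameters.
        avg = (centers["earth"] + centers["mars"]) / 2.0
        centers = {"earth": avg.copy(), "mars": avg.copy()}

print({k: v.round(3) for k, v in centers.items()})
```
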

18 pages, 12842 KB  
Article
Progressive Policy Learning: A Hierarchical Framework for Dexterous Bimanual Manipulation
by Kang-Won Lee, Jung-Woo Lee, Seongyong Kim and Soo-Chul Lim
Mathematics 2025, 13(22), 3585; https://doi.org/10.3390/math13223585 - 8 Nov 2025
Viewed by 793
Abstract
Dexterous bimanual manipulation remains a challenging task in reinforcement learning (RL) due to the vast state–action space and the complex interdependence between the hands. Conventional end-to-end learning struggles to handle this complexity, and multi-agent RL often faces limitations in stably acquiring cooperative movements. To address these issues, this study proposes a hierarchical progressive policy learning framework for dexterous bimanual manipulation. In the proposed method, one hand’s policy is first trained to stably grasp the object, and, while maintaining this grasp, the other hand’s manipulation policy is progressively learned. This hierarchical decomposition reduces the search space for each policy and enhances both the connectivity and the stability of learning by training the subsequent policy on the stable states generated by the preceding policy. Simulation results show that the proposed framework outperforms conventional end-to-end and multi-agent RL approaches. The proposed method was demonstrated via sim-to-real transfer on a physical dual-arm platform and empirically validated on a bimanual cube manipulation task.
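
A toy sketch of the progressive schedule this abstract outlines: stage 1 trains a grasp policy; stage 2 freezes it and trains the manipulation policy only on states reached while the frozen grasp is held. The environment, policies, and improvement rule are stubs; the paper's networks, rewards, and simulator are not shown.

```python
import random

random.seed(0)

class StubPolicy:
    def __init__(self):
        self.skill = 0.0
    def update(self, reward):
        self.skill += 0.01 * (reward - self.skill)  # toy improvement rule

def rollout(grasp, manip=None):
    """Returns (grasp_reward, manip_reward); grasping gates manipulation."""
    grasp_ok = random.random() < 0.5 + grasp.skill
    if manip is None or not grasp_ok:
        return (1.0 if grasp_ok else 0.0), 0.0
    manip_ok = random.random() < 0.3 + manip.skill
    return 1.0, (1.0 if manip_ok else 0.0)

grasp, manip = StubPolicy(), StubPolicy()

# Stage 1: learn to grasp stably with one hand.
for _ in range(2000):
    r_grasp, _ = rollout(grasp)
    grasp.update(r_grasp)

# Stage 2: grasp policy frozen; the manipulation policy learns on the stable
# states the grasp produces, shrinking the search space it must explore.
for _ in range(2000):
    _, r_manip = rollout(grasp, manip)
    manip.update(r_manip)

print(f"grasp skill={grasp.skill:.2f}  manipulation skill={manip.skill:.2f}")
```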

16 pages, 2735 KB  
Article
From Invariance to Symmetry Breaking in FIM-Aware Cooperative Heterogeneous Agent Networks
by Jihua Dou, Kunpeng Ouyang, Zefei Wu, Zhixin Hu, Jianxin Lin and Huachuan Wang
Symmetry 2025, 17(11), 1899; https://doi.org/10.3390/sym17111899 - 7 Nov 2025
Cited by 1 | Viewed by 450
Abstract
We recast cooperative localization and scheduling in heterogeneous multi-agent systems through the lens of symmetry and symmetry breaking. On the geometric side, the Fisher Information Matrix (FIM) objective is invariant to rigid Euclidean transformations of the global frame, while its maximization admits symmetric optimal sensor formations; on the algorithmic side, heterogeneity and task constraints break permutation symmetry across agents, requiring policies that are sensitive to role asymmetries. We model communication as a random graph and quantify structural symmetry via topology metrics (average path length, clustering, betweenness) and graph automorphism-related indices, connecting these to estimation uncertainty. We then design a hybrid reward for reinforcement learning (RL) that is equivariant to agent relabeling within roles yet intentionally introduces asymmetry through distance/FIM terms to avoid degenerate symmetric configurations with poor observability. Simulations show that (i) symmetry-aware, FIM-optimized path planning reduces localization error versus symmetric but non-informative placements; and (ii) controlled symmetry breaking in policy learning improves robustness and data rate–reward trade-offs over baselines. Our results position symmetry/asymmetry as first-class design principles that unify estimation-theoretic invariances with learning-based coordination in complex heterogeneous networks. Under DDPG training, the total data rate (SDR) reaches 6.63 ± 0.97 and the average reward per step (ARPS) is 80.70 ± 6.94, representing improvements of approximately 11.8% over the baseline (5.93 ± 3.51) and 11.1% over SAC (5.97 ± 2.66), respectively. The network’s mean shortest-path length is L = 1.721, and the average betweenness centrality of the coordination nodes is approximately 0.098. Moreover, the FIM-optimized path-planning strategy achieves the lowest localization error among all evaluated policies.
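
A minimal sketch of the FIM-aware reward this abstract describes: for range-only sensing, each sensor at position p_i measuring a target at x contributes a rank-one term u_i u_i^T (with u_i the unit bearing) to the Fisher Information Matrix, and the reward mixes log-det(FIM) with a distance penalty. The measurement model and weights are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fim_range_only(sensors, target, sigma=0.1):
    """FIM of the target estimate given range measurements with noise sigma."""
    F = np.zeros((2, 2))
    for p in sensors:
        d = target - p
        u = d / np.linalg.norm(d)        # unit vector sensor -> target
        F += np.outer(u, u) / sigma**2   # rank-one information contribution
    return F

def hybrid_reward(sensors, target, w_info=1.0, w_dist=0.1):
    """Log-det (D-optimality) term plus a symmetry-breaking distance term."""
    sign, logdet = np.linalg.slogdet(fim_range_only(sensors, target))
    dist_penalty = sum(np.linalg.norm(target - p) for p in sensors)
    return w_info * (logdet if sign > 0 else -np.inf) - w_dist * dist_penalty

target = np.array([0.0, 0.0])
collinear = [np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([3.0, 0.0])]
spread = [np.array([1.0, 0.0]), np.array([-0.5, 0.9]), np.array([-0.5, -0.9])]

# Collinear sensors give a singular FIM (an unobservable direction): -inf reward,
# the degenerate configuration the hybrid reward is designed to avoid.
print("collinear:", hybrid_reward(collinear, target))
print("spread:   ", hybrid_reward(spread, target))
```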

25 pages, 1436 KB  
Article
Scaling Swarm Coordination with GNNs—How Far Can We Go?
by Gianluca Aguzzi, Davide Domini, Filippo Venturini and Mirko Viroli
AI 2025, 6(11), 282; https://doi.org/10.3390/ai6110282 - 1 Nov 2025
Viewed by 1003
Abstract
The scalability of coordination policies is a critical challenge in swarm robotics, where agent numbers may vary substantially between deployment scenarios. Reinforcement learning (RL) offers a promising avenue for learning decentralized policies from local interactions, yet a fundamental question remains: can policies trained on one swarm size transfer to different population scales without retraining? This zero-shot transfer problem is particularly challenging because traditional RL approaches learn fixed-dimensional representations tied to specific agent counts, making them brittle to population changes at deployment time. While existing work addresses scalability through population-aware training (e.g., mean-field methods) or multi-size curricula (e.g., population transfer learning), these approaches either impose restrictive assumptions or require explicit exposure to varied team sizes during training. Graph Neural Networks (GNNs) offer a fundamentally different path: their permutation invariance and ability to process variable-sized graphs suggest potential for zero-shot generalization across swarm sizes, where policies trained on a single population scale could deploy directly to larger or smaller teams. However, this capability remains largely unexplored in the context of swarm coordination. We therefore investigate this question empirically by combining GNNs with deep Q-learning in cooperative swarms, focusing on well-established 2D navigation tasks commonly used in the swarm robotics literature to study coordination and scalability, which provide a controlled yet meaningful setting for our analysis. We introduce Deep Graph Q-Learning (DGQL), which embeds agent-neighbor graphs into Q-learning and trains on fixed-size swarms. Across two benchmarks (goal reaching and obstacle avoidance), we deploy teams up to three times larger than the training size. DGQL preserves functional coordination without retraining, but efficiency degrades with size: the final distance to goal grows monotonically with team size (15–29 agents) and worsens beyond roughly twice the training size (20 agents), with task-dependent trade-offs. Our results quantify the scalability limits of GNN-enhanced DQL and suggest architectural and training strategies to better sustain performance across scales.
(This article belongs to the Section AI in Autonomous Systems)
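
A minimal sketch of the mechanism behind DGQL as summarized above: a one-round message-passing Q-network over the agent-neighbor graph. Because aggregation is a mean over neighbors, the same weights apply to any swarm size, which is what enables the zero-shot transfer the paper studies. Layer sizes, features, and the graph model are assumptions.

```python
import torch
import torch.nn as nn

class GraphQNet(nn.Module):
    def __init__(self, obs_dim=4, hidden=32, n_actions=5):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.message = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, adj):
        # obs: [n_agents, obs_dim]; adj: [n_agents, n_agents] 0/1 neighbor mask.
        h = self.encode(obs)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = adj @ h / deg                         # mean over neighbors
        h = self.message(torch.cat([h, agg], dim=-1))
        return self.q_head(h)                       # per-agent Q-values

net = GraphQNet()
for n_agents in (5, 10, 15):                        # same weights, any size
    obs = torch.randn(n_agents, 4)
    adj = (torch.rand(n_agents, n_agents) < 0.3).float()
    adj.fill_diagonal_(0)
    q = net(obs, adj)
    greedy = q.argmax(dim=-1)                       # decentralized action pick
    print(n_agents, "agents ->", tuple(q.shape), greedy.tolist())
```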

14 pages, 451 KB  
Article
Federated Decision Transformers for Scalable Reinforcement Learning in Smart City IoT Systems
by Laila AlTerkawi and Mokhled AlTarawneh
Future Internet 2025, 17(11), 492; https://doi.org/10.3390/fi17110492 - 27 Oct 2025
Viewed by 1236
Abstract
The rapid proliferation of devices on the Internet of Things (IoT) in smart city environments enables autonomous decision-making, but introduces challenges of scalability, coordination, and privacy. Existing reinforcement learning (RL) methods, such as Multi-Agent Actor–Critic (MAAC), depend on centralized critics and recurrent structures, which limit scalability and create single points of failure. This paper proposes a Federated Decision Transformer (FDT) framework that integrates transformer-based sequence modeling with federated learning. By replacing centralized critics with self-attention-driven trajectory modeling, the FDT preserves data locality, enhances privacy, and supports decentralized policy learning across distributed IoT nodes. We benchmarked the FDT against MAAC in a mobile edge computing (MEC) environment with identical hyperparameter configurations. The results demonstrate that the FDT achieves superior reward efficiency, scalability, and adaptability in dynamic IoT networks, although with slightly higher variance during early training. These findings highlight transformer-based federated RL as a robust and privacy-preserving alternative to critic-based methods for large-scale IoT systems.
(This article belongs to the Special Issue Internet of Things (IoT) in Smart City)
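
A sketch of the two ingredients this abstract combines: (1) return-conditioned trajectory tokens for a small decision-transformer-style model, and (2) federated averaging of model weights across IoT nodes so raw trajectories never leave a node. Model size, token scheme, and node count are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyDT(nn.Module):
    def __init__(self, state_dim=6, act_dim=3, d_model=32):
        super().__init__()
        # One token per step: [return-to-go, state, previous action].
        self.embed = nn.Linear(1 + state_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, prev_actions):
        tokens = self.embed(torch.cat([rtg, states, prev_actions], dim=-1))
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.act_head(self.encoder(tokens, mask=causal))

@torch.no_grad()
def fed_avg(models):
    """FedAvg: overwrite every node's weights with the element-wise mean."""
    avg = {k: torch.stack([m.state_dict()[k] for m in models]).mean(0)
           for k in models[0].state_dict()}
    for m in models:
        m.load_state_dict(avg)

nodes = [TinyDT() for _ in range(4)]        # one model per IoT node
# ... local training on each node's private trajectories would go here ...
fed_avg(nodes)                              # periodic privacy-preserving sync
out = nodes[0](torch.rand(1, 8, 1), torch.rand(1, 8, 6), torch.rand(1, 8, 3))
print(out.shape)                            # torch.Size([1, 8, 3])
```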

20 pages, 10806 KB  
Article
An Adaptive Exploration-Oriented Multi-Agent Co-Evolutionary Method Based on MATD3
by Suyu Wang, Zhentao Lyu, Quan Yue, Qichen Shang, Ya Ke and Feng Gao
Electronics 2025, 14(21), 4181; https://doi.org/10.3390/electronics14214181 - 26 Oct 2025
Viewed by 1049
Abstract
As artificial intelligence continues to evolve, reinforcement learning (RL) has shown remarkable potential for solving complex sequential decision problems and is now applied in diverse areas, including robotics, autonomous vehicles, and financial analytics. Among the various RL paradigms, multi-agent reinforcement learning (MARL) stands out for its ability to manage cooperative and competitive interactions within multi-entity systems. However, mainstream MARL algorithms still face critical challenges in training stability and policy generalization due to factors such as environmental non-stationarity, policy coupling, and inefficient sample utilization. To mitigate these limitations, this study introduces an enhanced algorithm named MATD3_AHD, developed by extending the MATD3 framework, which integrates TD3 and MADDPG principles. The goal is to improve the learning efficiency and overall policy effectiveness of agents operating in complex environments. The proposed method incorporates three key mechanisms: (1) an Adaptive Exploration Policy (AEP), which dynamically adjusts the perturbation magnitude based on TD error to improve both exploration capability and training stability; (2) a Hierarchical Sampling Policy (HSP), which enhances experience utilization through sample clustering and prioritized replay; and (3) a Dynamic Delayed Update (DDU), which adaptively modulates the actor update frequency based on critic network errors, thereby accelerating convergence and improving policy stability. Experiments conducted on multiple benchmark tasks within the Multi-Agent Particle Environment (MPE) show that MATD3_AHD outperforms baseline methods such as MADDPG and MATD3, by an average of 5% over MATD3 and 20% over MADDPG, achieving faster convergence, higher rewards, and more stable policy learning, thereby confirming its robustness and generalization capability.
(This article belongs to the Section Artificial Intelligence)
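
A minimal sketch of the Adaptive Exploration Policy (AEP) idea summarized above: exploration noise is scaled by a running estimate of TD-error magnitude, so agents explore more where value estimates are still poor. The scaling law, bounds, and constants are assumptions, not the paper's exact schedule.

```python
import numpy as np

rng = np.random.default_rng(3)

class AdaptiveNoise:
    def __init__(self, base_sigma=0.2, lo=0.05, hi=0.5, beta=0.99):
        self.sigma = base_sigma
        self.lo, self.hi, self.beta = lo, hi, beta
        self.td_ema = 1.0                     # running |TD error| estimate

    def perturb(self, action):
        return action + rng.normal(0.0, self.sigma, size=np.shape(action))

    def update(self, td_error):
        self.td_ema = self.beta * self.td_ema + (1 - self.beta) * abs(td_error)
        # Larger residual TD error -> wider exploration, within [lo, hi].
        self.sigma = float(np.clip(self.td_ema, self.lo, self.hi))

noise = AdaptiveNoise()
for step in range(2000):
    a = noise.perturb(np.array([0.0, 0.0]))   # policy action + noise
    td = 2.0 * np.exp(-step / 400)            # toy: TD error shrinks as the critic learns
    noise.update(td)
print(f"final sigma={noise.sigma:.3f} (annealed as TD error fell)")
```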

20 pages, 1343 KB  
Article
Hybrid CDN Architecture Integrating Edge Caching, MEC Offloading, and Q-Learning-Based Adaptive Routing
by Aymen D. Salman, Akram T. Zeyad, Asia Ali Salman Al-karkhi, Safanah M. Raafat and Amjad J. Humaidi
Computers 2025, 14(10), 433; https://doi.org/10.3390/computers14100433 - 13 Oct 2025
Cited by 1 | Viewed by 1592
Abstract
Content Delivery Networks (CDNs) have evolved to meet surging data demands and stringent low-latency requirements driven by emerging applications like high-definition video streaming, virtual reality, and IoT. This paper proposes a hybrid CDN architecture that synergistically combines edge caching, Multi-access Edge Computing (MEC) offloading, and reinforcement learning (Q-learning) for adaptive routing. In the proposed system, popular content is cached at radio access network edges (e.g., base stations) and computation-intensive tasks are offloaded to MEC servers, while a Q-learning agent dynamically routes user requests to the optimal service node (cache, MEC server, or origin) based on the network state. The study presents a detailed system design and a comprehensive simulation-based evaluation. The results demonstrate that the proposed hybrid approach significantly improves cache hit ratios and reduces end-to-end latency compared to traditional CDNs and simpler edge architectures. The Q-learning-enabled routing adapts to changing load and content popularity, converging to efficient policies that outperform static baselines. The proposed hybrid model has been tested against variants lacking MEC, edge caching, or the RL-based controller to isolate each component’s contributions. The paper concludes with a discussion on practical considerations, limitations, and future directions for intelligent CDN networking at the edge.
(This article belongs to the Special Issue Edge and Fog Computing for Internet of Things Systems (2nd Edition))
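
A minimal sketch of the Q-learning routing layer this abstract describes: a tabular agent routes each request to the edge cache, a MEC server, or the origin, with reward equal to negative observed latency. The latency model, state discretization, and hyperparameters are assumptions for illustration.

```python
import random

random.seed(4)
ACTIONS = ["cache", "mec", "origin"]
Q = {}  # (is_popular, is_loaded) -> {action: value}

def latency(state, action):
    popular, loaded = state
    if action == "cache":
        return 10 if popular else 120          # miss on unpopular content -> fetch cost
    if action == "mec":
        return 60 + (40 if loaded else 0)      # queueing under load
    return 150                                 # origin is always slow

def choose(state, eps=0.1):
    q = Q.setdefault(state, {a: 0.0 for a in ACTIONS})
    return random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)

ALPHA, GAMMA = 0.1, 0.9
state = (True, False)
for step in range(5000):
    a = choose(state)
    r = -latency(state, a)
    next_state = (random.random() < 0.7, random.random() < 0.3)  # demand drift
    nq = Q.setdefault(next_state, {a2: 0.0 for a2 in ACTIONS})
    # Standard Q-learning update toward the bootstrapped target.
    Q[state][a] += ALPHA * (r + GAMMA * max(nq.values()) - Q[state][a])
    state = next_state

for s, q in sorted(Q.items()):
    print(s, "->", max(q, key=q.get))
```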

17 pages, 1076 KB  
Article
Adaptive Cyber Defense Through Hybrid Learning: From Specialization to Generalization
by Muhammad Omer Farooq
Future Internet 2025, 17(10), 464; https://doi.org/10.3390/fi17100464 - 9 Oct 2025
Viewed by 619
Abstract
This paper introduces a hybrid learning framework that synergistically combines Reinforcement Learning (RL) and Supervised Learning (SL) to train autonomous cyber-defense agents capable of operating effectively in dynamic and adversarial environments. The proposed approach leverages RL for strategic exploration and policy development, while incorporating SL to distill high-reward trajectories into refined policy updates, enhancing sample efficiency, learning stability, and robustness. The framework first targets specialized agent training, where each agent is optimized against a specific adversarial behavior. Subsequently, it is extended to enable the training of a generalized agent that learns to counter multiple, diverse attack strategies through multi-task and curriculum learning techniques. Comprehensive experiments conducted in the CybORG simulation environment demonstrate that the hybrid RL–SL framework consistently outperforms pure RL baselines across both specialized and generalized settings, achieving higher cumulative rewards. Specifically, hybrid-trained agents achieve up to 23% higher cumulative rewards in specialized defense tasks and approximately 18% improvements in generalized defense scenarios compared to RL-only agents. Moreover, incorporating temporal context into the observation space yields a further 4–6% performance gain in policy robustness. Furthermore, we investigate the impact of augmenting the observation space with historical actions and rewards, revealing consistent, albeit incremental, gains in SL-based learning performance. Key contributions of this work include: (i) a novel hybrid learning paradigm that integrates RL and SL for effective cyber-defense policy learning, (ii) a scalable extension for training generalized agents across heterogeneous threat models, and (iii) empirical analysis on the role of temporal context in agent observability and decision-making. Collectively, the results highlight the promise of hybrid learning strategies for building intelligent, resilient, and adaptable cyber-defense systems in evolving threat landscapes.
(This article belongs to the Special Issue AI and Security in 5G Cooperative Cognitive Radio Networks)
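
A sketch of the hybrid loop this abstract describes: rollouts are scored, and only high-reward trajectories are distilled into the policy with a supervised (behavior-cloning) loss, complementing the RL update. The environment stub, elite threshold, and network sizes are assumptions; the CybORG setup is not shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def rollout():
    """Stub episode: random states, actions from the policy, scalar return."""
    states = torch.randn(16, 8)
    actions = policy(states).argmax(dim=-1)
    ret = float(torch.randn(()))          # stand-in for the episode return
    return states, actions, ret

replay = []
for _ in range(50):                       # ... RL updates would also run here ...
    replay.append(rollout())

# Keep the top quartile of episodes by return and distill them via SL.
replay.sort(key=lambda ep: ep[2], reverse=True)
elite = replay[: len(replay) // 4]
for states, actions, _ in elite:
    logits = policy(states)
    loss = loss_fn(logits, actions)       # imitate the agent's best behavior
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"distilled {len(elite)} elite episodes; last SL loss={loss.item():.3f}")
```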

34 pages, 2388 KB  
Article
Safe Reinforcement Learning for Buildings: Minimizing Energy Use While Maximizing Occupant Comfort
by Mohammad Esmaeili, Sascha Hammes, Samuele Tosatto, David Geisler-Moroder and Philipp Zech
Energies 2025, 18(19), 5313; https://doi.org/10.3390/en18195313 - 9 Oct 2025
Cited by 2 | Viewed by 2207
Abstract
With buildings accounting for 40% of global energy consumption, heating, ventilation, and air conditioning (HVAC) systems represent the single largest opportunity for emissions reduction, consuming up to 60% of commercial building energy while maintaining occupant comfort. This critical balance between energy efficiency and human comfort has traditionally relied on rule-based and model predictive control strategies. Given the multi-objective nature and complexity of modern HVAC systems, these approaches fall short in satisfying both objectives. Recently, reinforcement learning (RL) has emerged as a method capable of learning optimal control policies directly from system interactions without requiring explicit models. However, standard RL approaches frequently violate comfort constraints during exploration, making them unsuitable for real-world deployment where occupant comfort cannot be compromised. This paper addresses two fundamental challenges in HVAC control: the difficulty of constrained optimization in RL and the challenge of defining appropriate comfort constraints across diverse conditions. We adopt a safe RL framework with neural barrier certificates that (1) transforms the constrained HVAC problem into an unconstrained optimization and (2) constructs the certificates in a data-driven manner using neural networks, adapting to building-specific comfort patterns without manual threshold setting. This approach lets the agent improve energy efficiency while providing near-guaranteed satisfaction of the defined comfort limits. We validate our approach through seven experiments spanning residential and commercial buildings, from single-zone heat pump control to five-zone variable air volume (VAV) systems. Our safe RL framework achieves energy reduction compared to baseline operation while maintaining higher comfort compliance than unconstrained RL. The data-driven barrier construction discovers building-specific comfort patterns, enabling context-aware optimization impossible with fixed thresholds. While neural approximation prevents absolute safety guarantees, the reduction in catastrophic safety failures relative to unconstrained RL, together with the method’s adaptability, positions this approach as a developmental bridge between RL theory and real-world building automation. The considerable gap in both safety and energy performance relative to rule-based control, however, indicates that the method requires substantial improvement before practical deployment.
(This article belongs to the Special Issue Energy Efficiency and Energy Saving in Buildings)
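
A minimal sketch of the data-driven barrier-certificate construction this abstract describes: a network h(x) is trained so that h(x) >= 0 on comfortable states, h(x) < 0 on violating states, and h does not decrease along transitions from safe states. The toy state space, comfort band, and margins are assumptions; the HVAC simulator is not shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
h = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(h.parameters(), lr=1e-3)

def sample_batch(n=256):
    """Toy HVAC states [temp, humidity, co2]; comfort = temp in [20, 24] C."""
    x = torch.rand(n, 3) * torch.tensor([15.0, 1.0, 1.0]) + torch.tensor([15.0, 0.0, 0.0])
    safe = (x[:, 0] >= 20) & (x[:, 0] <= 24)
    x_next = x + 0.1 * torch.randn_like(x)     # stand-in for one control step
    return x, x_next, safe

for step in range(2000):
    x, x_next, safe = sample_batch()
    hx, hx_next = h(x).squeeze(-1), h(x_next).squeeze(-1)
    loss = (
        torch.relu(0.1 - hx[safe]).mean()              # h >= margin on safe states
        + torch.relu(0.1 + hx[~safe]).mean()           # h <= -margin on unsafe states
        + torch.relu(hx[safe] - hx_next[safe]).mean()  # h non-decreasing from safe states
    )
    opt.zero_grad()
    loss.backward()
    opt.step()

probe = torch.tensor([[22.0, 0.5, 0.5], [28.0, 0.5, 0.5]])
print(h(probe).squeeze(-1).detach())   # expected: positive for 22 C, negative for 28 C
```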

50 pages, 4498 KB  
Review
Reinforcement Learning for Electric Vehicle Charging Management: Theory and Applications
by Panagiotis Michailidis, Iakovos Michailidis and Elias Kosmatopoulos
Energies 2025, 18(19), 5225; https://doi.org/10.3390/en18195225 - 1 Oct 2025
Viewed by 2648
Abstract
The growing complexity of electric vehicle charging station (EVCS) operations—driven by grid constraints, renewable integration, user variability, and dynamic pricing—has positioned reinforcement learning (RL) as a promising approach for intelligent, scalable, and adaptive control. After outlining the core theoretical foundations, including RL algorithms, agent architectures, and EVCS classifications, this review presents a structured survey of influential research, highlighting how RL has been applied across various charging contexts and control scenarios. This paper categorizes RL methodologies from value-based to actor–critic and hybrid frameworks, and explores their integration with optimization techniques, forecasting models, and multi-agent coordination strategies. By examining key design aspects—including agent structures, training schemes, coordination mechanisms, reward formulation, data usage, and evaluation protocols—this review identifies broader trends across central control dimensions such as scalability, uncertainty management, interpretability, and adaptability. In addition, the review assesses common baselines, performance metrics, and validation settings used in the literature, linking algorithmic developments with real-world deployment needs. By bridging theoretical principles with practical insights, this work provides comprehensive directions for future RL applications in EVCS control, while identifying methodological gaps and opportunities for safer, more efficient, and sustainable operation.
(This article belongs to the Special Issue Advanced Technologies for Electrified Transportation and Robotics)

20 pages, 3181 KB  
Article
Integrating Reinforcement Learning and LLM with Self-Optimization Network System
by Xing Xu, Jianbin Zhao, Yu Zhang and Rongpeng Li
Network 2025, 5(3), 39; https://doi.org/10.3390/network5030039 - 16 Sep 2025
Viewed by 2736
Abstract
The rapid expansion of communication networks and increasingly complex service demands have presented significant challenges to the intelligent management of network resources. To address these challenges, we propose a network self-optimization framework integrating the predictive capabilities of the Large Language Model (LLM) with the decision-making capabilities of multi-agent Reinforcement Learning (RL). Specifically, historical network traffic data are converted into structured inputs to forecast future traffic patterns using a GPT-2-based prediction module. Concurrently, a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm leverages real-time sensor data, including link delay and packet loss rates collected by embedded network sensors, to dynamically optimize bandwidth allocation. This sensor-driven mechanism enables the system to perform real-time optimization of bandwidth allocation, ensuring accurate monitoring and proactive resource scheduling. We evaluate our framework in a heterogeneous network simulated using Mininet under diverse traffic scenarios. Experimental results show that the proposed method significantly reduces network latency and packet loss, and improves robustness and resource utilization, highlighting the effectiveness of integrating sensor-driven RL optimization with predictive insights from LLMs.
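
A sketch of the framework's data flow as summarized above: a traffic predictor (GPT-2 in the paper, a moving-average stub here) forecasts the next window, the forecast is appended to each agent's observation alongside sensed link delays, and a stub MADDPG actor maps observations to bandwidth shares. All interfaces and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N_LINKS = 3

def predict_traffic(history):
    """Stand-in for the GPT-2 module: per-link forecast from recent history."""
    return history[-8:].mean(axis=0)            # moving average over 8 steps

def actor(observation):
    """Stand-in for a trained MADDPG actor: softmax -> bandwidth shares."""
    logits = observation[:N_LINKS] + observation[N_LINKS:]  # delays + forecast
    z = np.exp(logits - logits.max())
    return z / z.sum()

history = rng.gamma(2.0, 1.0, size=(32, N_LINKS))   # past traffic per link
for step in range(5):
    forecast = predict_traffic(history)
    delays = rng.uniform(0.0, 1.0, size=N_LINKS)    # sensed link delays
    obs = np.concatenate([delays, forecast])        # sensor data + prediction
    shares = actor(obs)                             # proactive allocation
    history = np.vstack([history, rng.gamma(2.0, 1.0, size=(1, N_LINKS))])
    print(f"step {step}: shares={shares.round(2)}")
```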