Search Results (141)

Search Parameters:
Keywords = advantage actor–critic

25 pages, 516 KiB  
Article
Exploring a Sustainable Pathway Towards Enhancing National Innovation Capacity from an Empirical Analysis
by Sylvia Novillo-Villegas, Ana Belén Tulcanaza-Prieto, Alexander X. Chantera and Christian Chimbo
Sustainability 2025, 17(15), 6922; https://doi.org/10.3390/su17156922 - 30 Jul 2025
Viewed by 162
Abstract
Innovation is a strategic driver of sustainable competitive advantage and long-term economic growth. This study proposes an empirical framework to support the sustained development of national innovation capacity by examining key enabling factors. Drawing on an extensive review of the literature, the research investigates the interrelationships among governmental support (GS), innovation agents (IA), university–industry R&D collaborations (UIRD), and innovation cluster development (ICD), and their influence on two critical innovation outcomes: knowledge creation (KC) and knowledge diffusion (KD). Using panel data from G7 countries spanning 2008 to 2018, sourced from international organizations such as the World Bank, the World Intellectual Property Organization, and the World Economic Forum, the study applies regression analysis to test the proposed conceptual model. Results highlight the foundational role of GS in providing a balanced framework that fosters collaborative networks among IA and enhances the effectiveness of UIRD. Furthermore, IA emerge as pivotal actors in advancing innovation efforts, while the development of innovation clusters is shown to selectively enhance specific innovation outcomes. These findings offer theoretical and practical contributions for policymakers, researchers, and stakeholders aiming to design supportive ecosystems that strengthen sustainable national innovation capacity.
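
The regression design described above can be illustrated with a minimal sketch. The snippet below fits a pooled OLS with country and year fixed effects and country-clustered standard errors using statsmodels; the variable names follow the abstract’s abbreviations, while the data file and exact specification are illustrative assumptions, not the authors’ code.

```python
# Minimal sketch of a panel regression with country/year fixed effects,
# assuming a long-format DataFrame with one row per country-year.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("g7_innovation_panel.csv")  # hypothetical file: country, year, GS, IA, UIRD, ICD, KC, KD

# Knowledge creation (KC) regressed on the four enabling factors,
# with C() adding dummy (fixed-effect) terms for country and year.
model = smf.ols("KC ~ GS + IA + UIRD + ICD + C(country) + C(year)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["country"]})
print(result.summary())
```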

32 pages, 1432 KiB  
Article
From Carbon to Capability: How Corporate Green and Low-Carbon Transitions Foster New Quality Productive Forces in China
by Lili Teng, Yukun Luo and Shuwen Wei
Sustainability 2025, 17(15), 6657; https://doi.org/10.3390/su17156657 - 22 Jul 2025
Viewed by 393
Abstract
China’s national strategies emphasize both achieving carbon peaking and neutrality (“dual carbon” objectives) and fostering high-quality economic development. This dual focus highlights the critical importance of the Green and Low-Carbon Transition (GLCT) of the economy and the development of New Quality Productive Forces (NQPF). Firms are central actors in this transformation, prompting the core research question: How does corporate engagement in GLCT contribute to the formation of NQPF? We investigate this relationship using panel data comprising 33,768 firm-year observations for A-share listed companies across diverse industries in China from 2012 to 2022. Corporate GLCT is measured via textual analysis of annual reports, while an NQPF index, incorporating both tangible and intangible dimensions, is constructed using the entropy method. Our empirical analysis relies primarily on fixed-effects regressions, supplemented by various robustness checks and alternative econometric specifications. The results demonstrate a significantly positive relationship: corporate GLCT robustly promotes the development of NQPF, with dynamic lag structures suggesting delayed productivity realization. Mechanism analysis reveals that this effect operates through three primary channels: improved access to financing, stimulated collaborative innovation, and enhanced resource-allocation efficiency. Heterogeneity analysis indicates that the positive impact of GLCT on NQPF is more pronounced for state-owned enterprises (SOEs), firms operating in high-emission sectors, those in energy-efficient or environmentally friendly industries, technology-intensive sectors, and non-heavily polluting industries, as well as companies situated in China’s eastern regions. Overall, our findings suggest that corporate GLCT enhances NQPF by improving resource-utilization efficiency and fostering innovation, with these effects amplified by specific regional advantages and firm characteristics. This study offers implications for corporate strategy, highlighting how aligning GLCT initiatives with core business objectives can drive NQPF, and provides evidence relevant for policymakers aiming to optimize environmental governance and foster sustainable economic pathways.
(This article belongs to the Section Economic and Business Aspects of Sustainability)
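
The entropy method used to weight the NQPF index components is a standard construction; the sketch below assumes a matrix of min-max-normalized indicators (the actual indicator set and data are not reproduced here).

```python
import numpy as np

def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy-method weights for an (n_samples, n_indicators) matrix
    of non-negative, min-max-normalized indicators."""
    eps = 1e-12
    # Proportion of each sample under each indicator (eps avoids log(0)).
    P = X / (X.sum(axis=0, keepdims=True) + eps)
    # Information entropy of each indicator, scaled to [0, 1].
    k = 1.0 / np.log(X.shape[0])
    e = -k * np.sum(P * np.log(P + eps), axis=0)
    # Indicators with lower entropy (more dispersion) get higher weight.
    d = 1.0 - e
    return d / d.sum()

X = np.random.rand(100, 6)          # hypothetical normalized indicators
index = X @ entropy_weights(X)      # composite NQPF-style index per firm-year
```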

10 pages, 1207 KiB  
Proceeding Paper
Generalized Net Model for Analysis of Behavior and Efficiency of Intelligent Virtual Agents in Risky Environment
by Dilyana Budakova, Velyo Vasilev and Lyudmil Dakovski
Eng. Proc. 2025, 100(1), 56; https://doi.org/10.3390/engproc2025100056 - 17 Jul 2025
Viewed by 63
Abstract
In this article, two generalized net models (GNMs) are proposed to study the behavior and effectiveness of intelligent virtual agents (IVAs) working in a risky environment under different scenarios and training algorithms. The proposed GNMs allow for the selection of machine learning algorithms such as intensity of characteristics Q-learning (InCh-Q) and a modification of multi-plan reinforcement learning (RL), as well as proximal policy optimization (PPO), soft actor–critic (SAC), the generative adversarial imitation learning (GAIL) algorithm, and behavioral cloning (BC). The choice of action, the change in priorities, and the achievement of goals by the IVAs are studied under different scenarios, such as fire extinguishing, rescue operations, evacuation, patrolling, and training. Transitions in the GNMs represent the scenarios and learning algorithms. The tokens that pass through the GNMs can be the GNMs of the IVA architecture or the IVA memory model, which are enriched with knowledge and experience during the experiments as the scenarios develop. The proposed GNMs are formally correct and, at the same time, understandable, practically applicable, and convenient for interpretation. Achieving GNMs that meet these requirements is a complex problem; therefore, issues related to the design and use of GNMs for the reliable modeling and analysis of the behavior and effectiveness of IVAs operating in a dynamic and risky environment are discussed, along with some advantages and challenges of GNMs compared to other classical models used to study IVA behavior.

22 pages, 2108 KiB  
Article
Deep Reinforcement Learning for Real-Time Airport Emergency Evacuation Using Asynchronous Advantage Actor–Critic (A3C) Algorithm
by Yujing Zhou, Yupeng Yang, Bill Deng Pan, Yongxin Liu, Sirish Namilae, Houbing Herbert Song and Dahai Liu
Mathematics 2025, 13(14), 2269; https://doi.org/10.3390/math13142269 - 15 Jul 2025
Viewed by 383
Abstract
Emergencies can occur unexpectedly and require immediate action, especially in aviation, where time pressure and uncertainty are high. This study focused on improving emergency evacuation in airport and aircraft scenarios using real-time decision-making support. A system based on the Asynchronous Advantage Actor–Critic (A3C) algorithm, an advanced deep reinforcement learning method, was developed to generate faster and more efficient evacuation routes than traditional models. The A3C model was tested in various scenarios, including different environmental conditions and numbers of agents, and its performance was compared with the Deep Q-Network (DQN) algorithm. The results showed that A3C achieved evacuations 43.86% faster on average and converged in fewer episodes (100 vs. 250 for DQN). In dynamic environments with moving threats, A3C also outperformed DQN in maintaining agent safety and adapting routes in real time. As the number of agents increased, A3C maintained high levels of efficiency and robustness. These findings demonstrate A3C’s strong potential to enhance evacuation planning through improved speed, adaptability, and scalability. The study concludes by highlighting the practical benefits of applying such models in real-world emergency response systems, including significantly faster evacuation times, real-time adaptability to evolving threats, and enhanced scalability for managing large crowds in high-density environments such as airport terminals. The A3C-based model offers a cost-effective alternative to full-scale evacuation drills by enabling virtual scenario testing, supports proactive safety planning through predictive modeling, and contributes to the development of intelligent decision-support tools that improve coordination and reduce response time during emergencies.
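
The advantage actor–critic update at the heart of A3C can be summarized compactly. The PyTorch sketch below shows a single worker’s loss on a rollout batch; the evacuation environment, network sizes, and hyperparameters are placeholders, and a full A3C setup would run several such workers asynchronously against shared global parameters.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared-trunk policy (actor) and value (critic) heads."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """One worker's advantage actor-critic loss on a rollout batch."""
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()      # A(s,a) = R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = dist.entropy().mean()             # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```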

37 pages, 1029 KiB  
Article
Autonomous Reinforcement Learning for Intelligent and Sustainable Autonomous Microgrid Energy Management
by Iacovos Ioannou, Saher Javaid, Yasuo Tan and Vasos Vassiliou
Electronics 2025, 14(13), 2691; https://doi.org/10.3390/electronics14132691 - 3 Jul 2025
Viewed by 401
Abstract
Effective energy management in microgrids is essential for integrating renewable energy sources and maintaining operational stability. Machine learning (ML) techniques offer significant potential for optimizing microgrid performance. This study provides a comprehensive comparative performance evaluation of four ML-based control strategies: deep Q-networks (DQNs), proximal policy optimization (PPO), Q-learning, and advantage actor–critic (A2C). These strategies were rigorously tested using simulation data from a representative islanded microgrid model, with metrics evaluated across diverse seasonal conditions (autumn, spring, summer, winter). Key performance indicators included overall episodic reward, unmet load, excess generation, energy storage system (ESS) state-of-charge (SoC) imbalance, ESS utilization, and computational runtime. Results from the simulation indicate that the DQN-based agent consistently achieved superior performance across all evaluated seasons, effectively balancing economic rewards, reliability, and battery health while maintaining competitive computational runtimes. Specifically, DQN delivered near-optimal rewards by significantly reducing unmet load, minimizing excess renewable energy curtailment, and virtually eliminating ESS SoC imbalance, thereby prolonging battery life. Although the tabular Q-learning method showed the lowest computational latency, it was constrained by limited adaptability in more complex scenarios. PPO and A2C, while offering robust performance, incurred higher computational costs without additional performance advantages over DQN. This evaluation clearly demonstrates the capability and adaptability of the DQN approach for intelligent and autonomous microgrid management, providing valuable insights into the relative advantages and limitations of various ML strategies in complex energy management scenarios.
(This article belongs to the Special Issue Artificial Intelligence-Driven Emerging Applications)
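
As a rough illustration of the DQN mechanics behind the best-performing agent, the sketch below computes the standard temporal-difference target with a separate target network; the microgrid state encoding, replay buffer, and reward are placeholders rather than the paper’s implementation.

```python
import torch
import torch.nn as nn

def dqn_td_loss(q_net: nn.Module, target_net: nn.Module, batch, gamma: float = 0.99):
    """Standard DQN loss: Huber distance between Q(s,a) and the bootstrapped target.

    `batch` holds tensors (obs, actions, rewards, next_obs, dones) sampled from
    a replay buffer of microgrid transitions (placeholder state encoding)."""
    obs, actions, rewards, next_obs, dones = batch
    # Q-value of the action actually taken in each sampled transition.
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max_a' Q_target(s', a'); zeroed where the episode ended.
        next_q = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.smooth_l1_loss(q_sa, target)
```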

22 pages, 1150 KiB  
Article
Risk-Sensitive Deep Reinforcement Learning for Portfolio Optimization
by Xinyao Wang and Lili Liu
J. Risk Financial Manag. 2025, 18(7), 347; https://doi.org/10.3390/jrfm18070347 - 22 Jun 2025
Viewed by 1098
Abstract
Navigating the complexity of petroleum futures markets—marked by extreme volatility, geopolitical uncertainty, and macroeconomic shocks—demands adaptive and risk-sensitive strategies. This paper explores an Adaptive Risk-sensitive Transformer-based Deep Reinforcement Learning (ART-DRL) framework to improve portfolio optimization in commodity futures trading. While deep reinforcement learning (DRL) has been applied in equities and forex, its use in commodities remains underexplored. We evaluate DRL models, including Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), Advantage Actor–Critic (A2C), and Deep Deterministic Policy Gradient (DDPG), integrating dynamic reward functions and asset-specific optimization. Empirical results show improvements in risk-adjusted performance, with an annualized return of 1.353, a Sharpe Ratio of 4.340, and a Sortino Ratio of 57.766. Although this return is below that of DQN (1.476), the proposed model achieves better stability and risk control. Notably, the models demonstrate resilience by learning from historical periods of extreme volatility, including the COVID-19 pandemic (2020–2021) and geopolitical shocks such as the Russia–Ukraine conflict (2022), despite testing commencing in January 2023. This research offers a practical, data-driven framework for risk-sensitive decision-making in commodities, showing how machine learning can support portfolio management under volatile market conditions.
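
The risk-adjusted metrics reported above are standard. A minimal sketch, assuming a series of daily strategy returns, 252 trading periods per year, and a zero risk-free rate:

```python
import numpy as np

def risk_metrics(daily_returns: np.ndarray, periods: int = 252, rf: float = 0.0):
    """Annualized return, Sharpe ratio, and Sortino ratio from daily returns."""
    excess = daily_returns - rf / periods
    # Geometric annualization of the cumulative return.
    ann_return = (1 + daily_returns).prod() ** (periods / len(daily_returns)) - 1
    sharpe = np.sqrt(periods) * excess.mean() / excess.std()
    downside = excess[excess < 0].std()        # downside deviation only
    sortino = np.sqrt(periods) * excess.mean() / downside
    return ann_return, sharpe, sortino

rets = np.random.default_rng(0).normal(0.001, 0.01, 500)  # toy return series
print(risk_metrics(rets))
```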

17 pages, 379 KiB  
Article
Paradoxes of Language Policy in Morocco: Deconstructing the Ideology of Language Alternation and the Resurgence of French in STEM Instruction
by Brahim Chakrani, Adam Ziad and Abdenbi Lachkar
Languages 2025, 10(6), 135; https://doi.org/10.3390/languages10060135 - 9 Jun 2025
Viewed by 949
Abstract
Language-in-education policies often serve hidden political and economic agendas, and thus language policy research must examine policies beyond official state discourse. This article critically analyzes Morocco’s Language Alternation Policy (LAP), introduced in 2019, using the historical–structural approach, examining the broader historical context and structural factors that shape the adoption and implementation of LAP. While the official policy discourse frames LAP as an egalitarian reform aimed at promoting balanced multilingualism by alternating instructional media in science education, its de facto implementation reveals a stark contradiction: underpinning LAP ideologically is the resurgence of French as the exclusive medium of instruction in science and technology classrooms. This policy undercuts the decades-long Arabization of science and the promotion of the Amazigh language, and denies Moroccans the potential advantages of learning English. The disparity between official policy discourse and implementation reveals the influence of France’s neocolonial agenda, exercised through Francophonie, international clientelism, and financial patronage. By implementing LAP to align with France’s interests in Morocco, French-trained political actors undermine the country’s decolonization efforts and preserve the long-standing socioeconomic privileges of the francophone elite. We analyze how LAP functions ideologically to resolidify France’s cultural and linguistic hegemony and reinforce pre- and post-independence linguistic and social inequalities.
(This article belongs to the Special Issue Sociolinguistic Studies: Insights from Arabic)
19 pages, 2374 KiB  
Article
Vehicle Lateral Control Based on Augmented Lagrangian DDPG Algorithm
by Zhi Li, Meng Wang and Haitao Zhao
Appl. Sci. 2025, 15(10), 5463; https://doi.org/10.3390/app15105463 - 13 May 2025
Viewed by 432
Abstract
This paper studies the safe trajectory tracking control of intelligent vehicles, which remains an open and challenging problem. A deep reinforcement learning algorithm based on augmented Lagrangian safety constraints is proposed for the lateral control of vehicle trajectory tracking. First, the tracking control of intelligent vehicles is described as a reinforcement learning process based on the Constrained Markov Decision Process (CMDP). An actor–critic neural-network-based reinforcement learning framework is established, and the reinforcement learning environment is designed to include the vehicle model, tracking model, road model, and reward function. Second, the augmented Lagrangian Deep Deterministic Policy Gradient (DDPG) method is proposed for policy updating, in which a replay separation buffer addresses sample correlation and a copied network with the same structure addresses update divergence. Finally, a vehicle lateral control approach is obtained, whose effectiveness and advantages over existing results are verified through simulation.
(This article belongs to the Section Computing and Artificial Intelligence)
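
The augmented Lagrangian treatment of the CMDP can be sketched in a few lines: the actor optimizes a reward objective penalized by a multiplier term and a quadratic term on constraint violation, while the multiplier ascends on the violation. The symbols and step sizes below are illustrative, not the paper’s exact formulation.

```python
def augmented_lagrangian_step(lmbda, rho, constraint_value, limit, lr_lambda=0.01):
    """One dual update for a CMDP safety constraint J_c <= d.

    The penalized objective optimized by the actor would be
        L = J_reward - lmbda * (J_c - d) - (rho / 2) * max(0, J_c - d) ** 2,
    while the multiplier performs projected dual ascent on the violation."""
    violation = constraint_value - limit
    penalty = lmbda * violation + 0.5 * rho * max(0.0, violation) ** 2
    lmbda = max(0.0, lmbda + lr_lambda * violation)   # keep multiplier >= 0
    return lmbda, penalty

# Example: lateral tracking error of 0.35 m against a 0.30 m safety bound.
lmbda, penalty = augmented_lagrangian_step(lmbda=1.0, rho=10.0,
                                           constraint_value=0.35, limit=0.30)
```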

26 pages, 469 KiB  
Article
Research on Offloading and Resource Allocation for MEC with Energy Harvesting Based on Deep Reinforcement Learning
by Jun Chen, Junyu Mi, Chen Guo, Qing Fu, Weidong Tang, Wenlang Luo and Qing Zhu
Electronics 2025, 14(10), 1911; https://doi.org/10.3390/electronics14101911 - 8 May 2025
Cited by 1 | Viewed by 512
Abstract
Mobile edge computing (MEC) systems empowered by energy harvesting (EH) significantly enhance sustainable computing capabilities for mobile devices (MDs). This paper investigates a multi-user multi-server MEC network in which energy-constrained users dynamically harvest ambient energy to flexibly allocate resources among local computation, task offloading, or intentional task discarding. We formulate a stochastic optimization problem aiming to minimize the time-averaged weighted sum of execution delay, energy consumption, and task discard penalty. To address the energy causality constraints and temporal coupling effects, we develop a Lyapunov optimization-based drift-plus-penalty framework that decomposes the long-term optimization into sequential per-time-slot subproblems. Furthermore, to overcome the curse of dimensionality in high-dimensional action spaces, we propose hierarchical deep reinforcement learning (DRL) solutions incorporating both Q-learning with experience replay and asynchronous advantage actor–critic (A3C) architectures. Extensive simulations demonstrate that our DRL-driven approach achieves lower costs compared with conventional model predictive control methods, while maintaining robust performance under stochastic energy arrivals and channel variations.
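
The drift-plus-penalty decomposition reduces the long-term problem to a per-slot choice. The sketch below weighs immediate cost (scaled by the control parameter V) against a virtual-queue term that enforces energy causality on average; the cost model and action set are placeholders.

```python
# Sketch of per-slot drift-plus-penalty decision-making with a virtual
# energy queue; costs, actions, and the energy model are placeholders.
def choose_action(actions, queue, V):
    """Pick the action minimizing V*cost(a) plus the queue-weighted energy drift."""
    def slot_objective(a):
        # a["cost"]: weighted sum of delay, energy, and drop penalty (placeholder)
        return V * a["cost"] + queue * (a["energy_used"] - a["energy_harvested"])
    return min(actions, key=slot_objective)

def update_queue(queue, action):
    """Virtual queue update: grows with consumption, drains with harvesting,
    so time-averaged consumption cannot exceed time-averaged harvesting."""
    return max(queue + action["energy_used"] - action["energy_harvested"], 0.0)

actions = [
    {"cost": 1.0, "energy_used": 0.2, "energy_harvested": 0.1},  # local compute
    {"cost": 0.6, "energy_used": 0.5, "energy_harvested": 0.1},  # offload
    {"cost": 2.0, "energy_used": 0.0, "energy_harvested": 0.1},  # discard
]
a = choose_action(actions, queue=0.4, V=2.0)
```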

23 pages, 682 KiB  
Article
A Blockchain-Based Strategy for Certifying Timestamps in Distributed Healthcare Emergency Response Systems
by Daniele Marletta, Alessandro Midolo and Emiliano Tramontana
Future Internet 2025, 17(5), 210; https://doi.org/10.3390/fi17050210 - 7 May 2025
Viewed by 841
Abstract
A high level of data integrity is a strong requirement in systems where the lives of people depend on accurate and timely responses. In healthcare emergency response systems, a centralized authority that handles data related to occurring events is prone to challenges such as disputes over event timestamps and data authenticity. To address both the potential lack of trust among collaborating parties and the inability of an authority to clearly certify events by itself, this paper proposes a blockchain-based framework designed to provide proof of the integrity and authenticity of data in healthcare emergency response systems. The proposed solution integrates blockchain technology to certify the accuracy of events throughout their incident lifecycle. Critical events are timestamped and hashed using SHA-256; such hashes are then stored immutably on an EVM-compatible blockchain via smart contracts. The system combines blockchain technology with cloud storage to ensure scalability, security, and transparency. The blockchain eliminates the need for a trusted timestamping server, reducing costs by forgoing such a service. Experimental results using publicly available incident data demonstrate the feasibility and effectiveness of this approach. The system provides a cost-effective, scalable solution for managing incident data while keeping proof of their integrity. The proposed blockchain-based framework offers a reliable, transparent mechanism for certifying incident-related data, fostering trust among healthcare emergency response system actors.
(This article belongs to the Special Issue Security and Privacy in Blockchains and the IoT—3rd Edition)
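
The hashing step is straightforward to sketch: a canonical serialization of each incident event is hashed with SHA-256, and the resulting digest is what would be written on-chain. The event fields below are illustrative, and the smart-contract submission is only indicated in a comment.

```python
import hashlib
import json
import time

def event_digest(event: dict) -> str:
    """SHA-256 digest of a canonically serialized incident event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

event = {  # hypothetical incident record
    "incident_id": "INC-0042",
    "type": "dispatch",
    "timestamp": int(time.time()),
}
digest = event_digest(event)
# `digest` would then be submitted to an EVM smart contract (e.g., via web3.py),
# while the full record stays in cloud storage; re-hashing the stored record
# later and comparing against the on-chain digest proves integrity.
print(digest)
```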

24 pages, 8761 KiB  
Article
Interruption-Aware Computation Offloading in the Industrial Internet of Things
by Khoi Anh Bui and Myungsik Yoo
Sensors 2025, 25(9), 2904; https://doi.org/10.3390/s25092904 - 4 May 2025
Viewed by 598
Abstract
Designing an efficient task offloading system is essential in the Industrial Internet of Things (IIoT). Owing to the limited computational capability of IIoT devices, offloading tasks to edge servers enhances computational efficiency. When an edge server is overloaded, however, it may experience interruptions, preventing it from serving local devices. Existing studies mainly address interruptions by rerouting, rescheduling, or implementing reactive strategies to mitigate their impact. In this study, we introduce an interruption-aware proactive task offloading framework for IIoT. We develop a load-based interruption model in which the probability of server interruption is formulated as an exponential function of the total computational load, providing a more realistic estimation of service availability. The framework employs Multi-Agent Advantage Actor–Critic (MAA2C), a simple yet efficient approach that enables decentralized decision-making while handling large action spaces and maintaining coordination through a centralized critic, to make adaptive offloading decisions that take into account edge availability, resource limitations, device cooperation, and interruptions. Experimental results show that our approach effectively reduces the average total service delay by optimizing the tradeoff between system delay and availability in IIoT networks. Additionally, we investigate the impact of various system parameters on performance under an interruptible edge task offloading scenario, providing valuable insights into how these parameters influence overall system behavior and efficiency.
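
The load-based interruption model admits a one-line sketch. Below, the interruption probability approaches 1 exponentially as total computational load grows; the exact functional form and the sensitivity constant k are assumptions consistent with the abstract’s description.

```python
import math

def interruption_probability(total_load: float, k: float = 0.5) -> float:
    """P(interrupt) = 1 - exp(-k * load): near 0 when the server is idle,
    approaching 1 as its total computational load grows (k is a sensitivity knob)."""
    return 1.0 - math.exp(-k * total_load)

# Expected availability an offloading agent could factor into its decision:
availability = 1.0 - interruption_probability(total_load=2.0)
```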

16 pages, 614 KiB  
Article
A Comparative Study of a Deep Reinforcement Learning Solution and Alternative Deep Learning Models for Wildfire Prediction
by Cristian Vidal-Silva, Roberto Pizarro, Miguel Castillo-Soto, Ben Ingram, Claudia de la Fuente, Vannessa Duarte, Claudia Sangüesa and Alfredo Ibañez
Appl. Sci. 2025, 15(7), 3990; https://doi.org/10.3390/app15073990 - 4 Apr 2025
Cited by 3 | Viewed by 1067
Abstract
Wildfires pose an escalating threat to ecosystems and human settlements, making accurate forecasting essential for early mitigation. This study compared three deep learning models for wildfire prediction: Deep Reinforcement Learning (DRL) with Actor–Critic architecture, Convolutional Neural Network (CNN), and Transformer-based models. The models were trained and evaluated using historical data from Chile (2000–2023), including wildfire occurrences, meteorological variables, topography, and vegetation indices. After preprocessing and class balancing, each model was tested over 100 experimental runs. All models achieved outstanding performance, with F1-Scores exceeding 0.999 and perfect AUC-ROC scores. The Transformer model showed a slight advantage over the CNN (99.94%) and Actor–Critic DRL (99.93%) in accuracy. Feature importance analysis identified wind speed, temperature, and vegetation indices as the most influential variables. While DRL offers theoretical benefits for adaptive decision-making, Transformer architectures more effectively capture spatiotemporal dependencies in wildfire dynamics. The findings can support the integration of deep learning models into early warning systems, contributing to proactive wildfire risk management. Future work will include validation with diverse regional datasets, real-time deployment, and collaboration with emergency response agencies.
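
The reported comparison metrics are standard classification measures; a minimal scikit-learn sketch on synthetic labels and scores (the study’s actual data are not reproduced here):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)          # ground-truth fire occurrence
# Toy predicted probabilities correlated with the labels:
y_prob = np.clip(y_true * 0.9 + rng.normal(0.05, 0.1, 1000), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)            # hard predictions

print(f"F1={f1_score(y_true, y_pred):.4f}")
print(f"AUC-ROC={roc_auc_score(y_true, y_prob):.4f}")
print(f"accuracy={accuracy_score(y_true, y_pred):.4%}")
```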

17 pages, 2956 KiB  
Article
A3C-R: A QoS-Oriented Energy-Saving Routing Algorithm for Software-Defined Networks
by Sunan Wang, Rong Song, Xiangyu Zheng, Wanwei Huang and Hongchang Liu
Future Internet 2025, 17(4), 158; https://doi.org/10.3390/fi17040158 - 3 Apr 2025
Cited by 1 | Viewed by 473
Abstract
With the rapid growth of Internet applications and network traffic, existing routing algorithms often struggle to guarantee quality of service (QoS) indicators such as delay, bandwidth, and packet loss rate, as well as network energy consumption, for data flows with differing business characteristics; they suffer from unbalanced traffic scheduling and unreasonable network resource allocation. To address these problems, this paper proposes A3C-R, a QoS-oriented energy-saving routing algorithm for the software-defined network (SDN) environment. Building on the asynchronous updates of the asynchronous advantage Actor-Critic (A3C) algorithm and the independent interaction of multiple agents with the environment, A3C-R effectively improves convergence. The algorithm takes QoS indicators such as delay, bandwidth, and packet loss rate, together with the energy consumption of each link, as input. It then creates multiple agents for asynchronous training, continuously updating the Actor and Critic in each agent and periodically synchronizing the model parameters to the global model. After training converges, the algorithm outputs link weights for the network topology, from which intelligent routing strategies that meet QoS requirements and lower network energy consumption can be computed. The experimental results indicate that, compared to the baseline algorithms ECMP, I-DQN, and DDPG-EEFS, A3C-R reduces delay by approximately 9.4%, increases throughput by approximately 7.0%, decreases the packet loss rate by approximately 9.5%, and improves the energy-saving percentage by approximately 10.8%.
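
Once the converged A3C-R model has produced per-link weights, route computation reduces to a weighted shortest-path search. A minimal networkx sketch, with a placeholder topology and weights rather than the paper’s testbed:

```python
import networkx as nx

# Hypothetical SDN topology; each weight would come from the trained
# A3C-R model, encoding QoS indicators and link energy consumption.
G = nx.Graph()
G.add_weighted_edges_from([
    ("s1", "s2", 0.8), ("s2", "s4", 0.9),
    ("s1", "s3", 0.4), ("s3", "s4", 0.6),
])

# The SDN controller would install the minimum-weight path as flow rules.
path = nx.shortest_path(G, "s1", "s4", weight="weight")
print(path)  # ['s1', 's3', 's4'] for these placeholder weights
```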

26 pages, 4783 KiB  
Article
A Hybrid Decision-Making Framework for UAV-Assisted MEC Systems: Integrating a Dynamic Adaptive Genetic Optimization Algorithm and Soft Actor–Critic Algorithm with Hierarchical Action Decomposition and Uncertainty-Quantified Critic Ensemble
by Yu Yang, Yanjun Shi, Xing Cui, Jiajian Li and Xijun Zhao
Drones 2025, 9(3), 206; https://doi.org/10.3390/drones9030206 - 13 Mar 2025
Viewed by 1134
Abstract
With the continuous progress of UAV technology and the rapid development of mobile edge computing (MEC), UAV-assisted MEC systems have shown great application potential in fields such as disaster rescue and emergency response. However, traditional deep reinforcement learning (DRL) decision-making methods suffer from limitations such as difficulty in balancing multiple objectives and in training convergence when making mixed-action-space decisions for UAV path planning and task offloading. This article proposes a hybrid decision framework that combines an improved Dynamic Adaptive Genetic Optimization Algorithm (DAGOA) with a soft actor–critic (SAC) featuring hierarchical action decomposition, an uncertainty-quantified critic ensemble, and adaptive entropy temperature: DAGOA performs effective search and optimization in the discrete action space, while SAC performs fine-grained control and adjustment in the continuous action space. Combining the two algorithms achieves joint optimization of UAV path planning and task offloading, improving overall system performance. The experimental results show that the framework offers significant advantages in improving system performance, reducing energy consumption, and enhancing task completion efficiency. When the system adopts the hybrid decision framework, the reward score increases by up to 153.53% compared to pure deep reinforcement learning algorithms, with an average improvement of 61.09% over various reinforcement learning algorithms such as the proposed SAC, proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), and twin delayed deep deterministic policy gradient (TD3).
(This article belongs to the Special Issue Unmanned Aerial Vehicles for Enhanced Emergency Response)
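
The division of labor in the hybrid framework is easy to sketch: a genetic algorithm searches the discrete offloading assignments, while SAC would handle the continuous UAV control. The toy GA below evolves binary offloading vectors under a placeholder fitness function; the dynamic adaptive operators of DAGOA and the real system model are omitted.

```python
import random

def evolve(fitness, n_tasks=10, pop_size=30, generations=50,
           p_crossover=0.8, p_mutation=0.05):
    """Tiny GA over binary offloading vectors (1 = offload task to UAV-MEC)."""
    pop = [[random.randint(0, 1) for _ in range(n_tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]                                 # elitism: keep two best
        while len(next_pop) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2)   # mate among fitter half
            if random.random() < p_crossover:
                cut = random.randrange(1, n_tasks)
                a = a[:cut] + b[cut:]                      # one-point crossover
            child = [1 - g if random.random() < p_mutation else g for g in a]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Placeholder fitness: prefer offloading but penalize exceeding server capacity.
best = evolve(lambda x: sum(x) - 2 * max(0, sum(x) - 6))
```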

21 pages, 1959 KiB  
Article
Coverage Path Planning Using Actor–Critic Deep Reinforcement Learning
by Sergio Isahí Garrido-Castañeda, Juan Irving Vasquez and Mayra Antonio-Cruz
Sensors 2025, 25(5), 1592; https://doi.org/10.3390/s25051592 - 5 Mar 2025
Cited by 1 | Viewed by 1553
Abstract
One of the main capabilities a mobile robot must demonstrate is the ability to explore its environment. The core challenge in exploration lies in planning a route that fully covers the environment. Despite recent advances, this problem remains unsolved. This study proposes an approach to the coverage path planning problem in which a mobile robot is tasked with exploring and completely covering a terrain using a deep reinforcement learning framework. The environment is divided into cells, with obstacles designated as prohibited areas. The robot is trained using two state-of-the-art actor–critic reinforcement learning algorithms: Advantage Actor–Critic (A2C) and Proximal Policy Optimization (PPO). By defining a set of observations, states, and a reward function tailored to the characteristics of the environment and the desired behavior of the robot, the training process yields optimized policies for each algorithm. These policies are then evaluated to determine the most effective approach to the proposed task. Our findings demonstrate that actor–critic methods can produce policies capable of guiding a robot to efficiently explore and cover new environments.
(This article belongs to the Topic Advances in Mobile Robotics Navigation, 2nd Volume)
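
The cell-decomposed setup lends itself to a compact environment sketch. The grid-world step function below rewards newly covered cells and penalizes revisits and blocked moves; the reward shaping is illustrative, not the paper’s exact design.

```python
import numpy as np

FREE, OBSTACLE, COVERED = 0, 1, 2
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def step(grid: np.ndarray, pos: tuple, action: int):
    """One coverage step: reward new cells, penalize revisits and blocked moves."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]) or grid[r, c] == OBSTACLE:
        return grid, pos, -1.0, False              # blocked: stay put, penalty
    reward = 1.0 if grid[r, c] == FREE else -0.2   # new cell vs. revisit
    grid[r, c] = COVERED
    done = not (grid == FREE).any()                # every free cell covered
    return grid, (r, c), reward, done

grid = np.zeros((5, 5), dtype=int)
grid[2, 2] = OBSTACLE                              # toy map with one obstacle
grid, pos, reward, done = step(grid, (0, 0), action=3)
```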