Search Results (842)

Search Parameters:
Keywords = Markov Decision Process

23 pages, 5234 KB  
Article
Training Agents for Strategic Curling Through a Unified Reinforcement Learning Framework
by Yuseong Son, Jaeyoung Park and Byunghwan Jeon
Mathematics 2026, 14(3), 403; https://doi.org/10.3390/math14030403 - 23 Jan 2026
Abstract
Curling presents a challenging continuous-control problem in which shot outcomes depend on long-horizon interactions between complex physical dynamics, strategic intent, and opponent responses. Despite recent progress in applying reinforcement learning (RL) to games and sports, curling lacks a unified environment that jointly supports stable, rule-consistent simulation, structured state abstraction, and scalable agent training. To address this gap, we introduce a comprehensive learning framework for curling AI, consisting of a full-sized simulation environment, a task-aligned Markov decision process (MDP) formulation, and a two-phase training strategy designed for stable long-horizon optimization. First, we propose a novel MDP formulation that incorporates stone configuration, game context, and dynamic scoring factors, enabling an RL agent to reason simultaneously about physical feasibility and strategic desirability. Second, we present a two-phase curriculum learning procedure that significantly improves sample efficiency: Phase 1 trains the agent to master delivery mechanics by rewarding accurate placement around the tee line, while Phase 2 transitions to strategic learning with score-based rewards that encourage offensive and defensive planning. This staged training stabilizes policy learning and reduces the difficulty of direct exploration in the full curling action space. We integrate this MDP and training procedure into a unified Curling RL Framework, built upon a custom simulator designed for stability, reproducibility, and efficient RL training and a self-play mechanism tailored for strategic decision-making. Agent policies are optimized using Soft Actor–Critic (SAC), an entropy-regularized off-policy algorithm designed for continuous control. As a case study, we compare the learned agent’s shot patterns with elite match records from the men’s division of the Le Gruyère AOP European Curling Championships 2023, using 6512 extracted shot images. Experimental results demonstrate that the proposed framework learns diverse, human-like curling shots and outperforms ablated variants across both learning curves and head-to-head evaluations. Beyond curling, our framework provides a principled template for developing RL agents in physics-driven, strategy-intensive sports environments. Full article
(This article belongs to the Special Issue Applications of Intelligent Game and Reinforcement Learning)
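As a rough illustration of the two-phase curriculum described in the abstract, the sketch below switches from a placement-accuracy reward to a score-based reward. The tee coordinates, the exponential shaping, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

TEE = np.array([0.0, 38.405])  # assumed tee coordinates (metres); illustrative only

def phase1_reward(stone_xy: np.ndarray) -> float:
    """Phase 1: reward accurate placement around the tee (delivery mechanics)."""
    dist = np.linalg.norm(stone_xy - TEE)
    return float(np.exp(-dist))  # assumed shaping: closer to the tee -> reward near 1

def phase2_reward(score_delta: int) -> float:
    """Phase 2: strategic learning driven by the change in end score."""
    return float(score_delta)

def curriculum_reward(phase: int, stone_xy: np.ndarray, score_delta: int) -> float:
    return phase1_reward(stone_xy) if phase == 1 else phase2_reward(score_delta)

if __name__ == "__main__":
    print(curriculum_reward(1, np.array([0.3, 38.0]), 0))   # delivery-accuracy phase
    print(curriculum_reward(2, np.array([0.3, 38.0]), 2))   # score-driven phase
```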
17 pages, 26741 KB  
Article
Dual-Agent Deep Reinforcement Learning for Low-Carbon Economic Dispatch in Wind-Integrated Microgrids Based on Carbon Emission Flow
by Wenjun Qiu, Hebin Ruan, Xiaoxiao Yu, Yuhang Li, Yicheng Liu and Zhiyi He
Energies 2026, 19(2), 551; https://doi.org/10.3390/en19020551 - 22 Jan 2026
Abstract
High renewable penetration in microgrids makes low-carbon economic dispatch under uncertainty challenging, and single-agent deep reinforcement learning (DRL) often yields unstable cost–emission trade-offs. This study proposes a dual-agent DRL framework that explicitly balances operational economy and environmental sustainability. A Proximal Policy Optimization (PPO) agent focuses on minimizing operating cost, while a Soft Actor–Critic (SAC) agent targets carbon emission reduction; their actions are combined through an adaptive weighting strategy. The framework is supported by carbon emission flow (CEF) theory, which enables network-level tracing of carbon flows, and a stepped carbon pricing mechanism that internalizes dynamic carbon costs. Demand response (DR) is incorporated to enhance operational flexibility. The dispatch problem is formulated as a Markov Decision Process, allowing the dual-agent system to learn policies through interaction with the environment. Case studies on a modified PJM 5-bus test system show that, compared with a Deep Deterministic Policy Gradient (DDPG) baseline, the proposed method reduces total operating cost, carbon emissions, and wind curtailment by 16.8%, 11.3%, and 15.2%, respectively. These results demonstrate that the proposed framework is an effective solution for economical and low-carbon operation in renewable-rich power systems. Full article
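The adaptive weighting of the two agents' actions can be pictured with a minimal sketch like the one below; the blending rule, the weight-adaptation heuristic, and all names are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np

def combine_actions(a_cost: np.ndarray, a_carbon: np.ndarray, w: float) -> np.ndarray:
    """Blend the cost-oriented (PPO) and carbon-oriented (SAC) dispatch actions with weight w in [0, 1]."""
    return w * a_cost + (1.0 - w) * a_carbon

def adapt_weight(w: float, cost_norm: float, carbon_norm: float, lr: float = 0.05) -> float:
    """Illustrative adaptation: shift weight toward whichever objective is currently worse."""
    return float(np.clip(w + lr * (carbon_norm - cost_norm), 0.0, 1.0))

if __name__ == "__main__":
    w = 0.5
    a = combine_actions(np.array([0.8, 0.2]), np.array([0.4, 0.6]), w)
    w = adapt_weight(w, cost_norm=0.7, carbon_norm=0.3)
    print(a, w)
```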
32 pages, 2490 KB  
Article
SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks
by Hilary Kelechi Anabi, Samuel Frimpong and Sanjay Madria
Sensors 2026, 26(2), 730; https://doi.org/10.3390/s26020730 - 21 Jan 2026
Abstract
The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, while also formulating a SADQN approach with dual-variable updates to mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: increasing cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieving substantial gains over random beamforming and non-beamforming approaches. The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount. Full article
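A minimal sketch of the Lagrangian relaxation and dual-variable update mentioned above is given below, assuming a generic projected sub-gradient step; the constraint ordering, step size, and function names are illustrative, not taken from the paper.

```python
import numpy as np

def lagrangian_reward(reward: float, costs: np.ndarray, lambdas: np.ndarray) -> float:
    """Relaxed objective: reward minus multiplier-weighted constraint costs
    (fairness, minimum-energy floor, duty cycle, uplink utilization)."""
    return float(reward - np.dot(lambdas, costs))

def dual_update(lambdas: np.ndarray, avg_costs: np.ndarray, limits: np.ndarray, step: float = 0.01) -> np.ndarray:
    """Projected sub-gradient ascent: a multiplier grows while its constraint is violated, else decays toward zero."""
    return np.maximum(0.0, lambdas + step * (avg_costs - limits))

if __name__ == "__main__":
    lam = np.zeros(4)
    lam = dual_update(lam, avg_costs=np.array([0.2, 0.9, 0.4, 0.1]), limits=np.full(4, 0.5))
    print(lagrangian_reward(1.0, np.array([0.2, 0.9, 0.4, 0.1]), lam), lam)
```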
25 pages, 3073 KB  
Article
A Two-Stage Intelligent Reactive Power Optimization Method for Power Grids Based on Dynamic Voltage Partitioning
by Tianliang Xue, Xianxin Gan, Lei Zhang, Su Wang, Qin Li and Qiuting Guo
Electronics 2026, 15(2), 447; https://doi.org/10.3390/electronics15020447 - 20 Jan 2026
Abstract
Aiming at issues such as reactive power distribution fluctuations and insufficient local support caused by large-scale integration of renewable energy in new power systems, as well as the poor adaptability of traditional methods and bottlenecks of deep reinforcement learning in complex power grids, a two-stage intelligent optimization method for grid reactive power based on dynamic voltage partitioning is proposed. Firstly, a comprehensive indicator system covering modularity, regulation capability, and membership degree is constructed. Adaptive MOPSO is employed to optimize K-means clustering centers, achieving dynamic grid partitioning and decoupling large-scale optimization problems. Secondly, a Markov Decision Process model is established for each partition, incorporating a penalty mechanism for safety constraint violations into the reward function. The DDPG algorithm is improved through multi-experience pool probabilistic replay and sampling mechanisms to enhance agent training. Finally, an optimal reactive power regulation scheme is obtained through two-stage collaborative optimization. Simulation case studies demonstrate that this method effectively reduces solution complexity, accelerates convergence, accurately addresses reactive power dynamic distribution and local support deficiencies, and ensures voltage security and optimal grid losses. Full article
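The safety-penalty reward and the multi-experience-pool probabilistic replay can be illustrated roughly as follows; the voltage band, penalty weight, pool-selection probabilities, and helper names are assumed values for the sketch, not the authors' code.

```python
import random
import numpy as np

def penalised_reward(loss_reduction: float, voltage: np.ndarray,
                     v_min: float = 0.95, v_max: float = 1.05, penalty: float = 10.0) -> float:
    """Reward = loss-reduction term minus a penalty for any bus voltage outside its security band."""
    violation = np.sum(np.maximum(0.0, v_min - voltage) + np.maximum(0.0, voltage - v_max))
    return float(loss_reduction - penalty * violation)

def sample_multi_pool(pools: list, probs: list, batch: int) -> list:
    """Draw each transition from a pool chosen with the given probabilities (e.g. favouring safe or high-reward pools)."""
    out = []
    for _ in range(batch):
        pool = random.choices(pools, weights=probs, k=1)[0]
        if pool:
            out.append(random.choice(pool))
    return out

if __name__ == "__main__":
    print(penalised_reward(0.3, np.array([0.97, 1.06, 1.01])))
    print(sample_multi_pool([[("s", "a", 1.0)], [("s", "a", -1.0)]], [0.7, 0.3], 4))
```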
33 pages, 4465 KB  
Article
Environmentally Sustainable HVAC Management in Smart Buildings Using a Reinforcement Learning Framework SACEM
by Abdullah Alshammari, Ammar Ahmed E. Elhadi and Ashraf Osman Ibrahim
Sustainability 2026, 18(2), 1036; https://doi.org/10.3390/su18021036 - 20 Jan 2026
Abstract
Heating, ventilation, and air-conditioning (HVAC) systems dominate energy consumption in hot-climate buildings, where maintaining occupant comfort under extreme outdoor conditions remains a critical challenge, particularly under emerging time-of-use (TOU) electricity pricing schemes. While deep reinforcement learning (DRL) has shown promise for adaptive HVAC control, existing approaches often suffer from comfort violations, myopic decision making, and limited robustness to uncertainty. This paper proposes a comfort-first hybrid control framework that integrates Soft Actor–Critic (SAC) with a Cross-Entropy Method (CEM) refinement layer, referred to as SACEM. The framework combines data-efficient off-policy learning with short-horizon predictive optimization and safety-aware action projection to explicitly prioritize thermal comfort while minimizing energy use, operating cost, and peak demand. The control problem is formulated as a Markov Decision Process using a simplified thermal model representative of commercial buildings in hot desert climates. The proposed approach is evaluated through extensive simulation using Saudi Arabian summer weather conditions, realistic occupancy patterns, and a three-tier TOU electricity tariff. Performance is assessed against state-of-the-art baselines, including PPO, TD3, and standard SAC, using comfort, energy, cost, and peak demand metrics, complemented by ablation and disturbance-based stress tests. Results show that SACEM achieves a comfort score of 95.8%, while reducing energy consumption and operating cost by approximately 21% relative to the strongest baseline. The findings demonstrate that integrating comfort-dominant reward design with decision-time look-ahead yields robust, economically viable HVAC control suitable for deployment in hot-climate smart buildings. Full article
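One way to picture the CEM refinement layer sitting on top of the SAC action is the sketch below, which searches around a base action under a toy short-horizon cost; the sample counts, clipping range, and cost model are assumptions, not the SACEM implementation.

```python
import numpy as np

def cem_refine(base_action, rollout_cost, horizon_model,
               n_samples=64, n_elite=8, n_iters=3, sigma=0.1):
    """Refine a policy action by Cross-Entropy search around it, scoring candidates with a short-horizon cost model."""
    mu, action_dim = np.array(base_action, dtype=float), len(base_action)
    std = np.full(action_dim, sigma)
    for _ in range(n_iters):
        cands = np.clip(mu + std * np.random.randn(n_samples, action_dim), -1.0, 1.0)
        scores = np.array([rollout_cost(horizon_model, a) for a in cands])
        elite = cands[np.argsort(scores)[:n_elite]]        # lower cost is better
        mu, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

if __name__ == "__main__":
    # Toy stand-ins: the "model" is just a comfort set-point; cost = discomfort + a small energy proxy.
    cost = lambda model, a: abs(a[0] - model) + 0.1 * abs(a[0])
    print(cem_refine([0.0], cost, horizon_model=0.4))
```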
26 pages, 972 KB  
Article
Constructing Non-Markovian Decision Process via History Aggregator
by Yongyi Wang, Lingfeng Li and Wenxin Li
Appl. Sci. 2026, 16(2), 955; https://doi.org/10.3390/app16020955 - 16 Jan 2026
Abstract
In the domain of algorithmic decision-making, non-Markovian dynamics manifest as a significant impediment, especially for paradigms such as Reinforcement Learning (RL), thereby exerting far-reaching consequences on the advancement and effectiveness of the associated systems. Nevertheless, the existing benchmarks are deficient in comprehensively assessing the capacity of decision algorithms to handle non-Markovian dynamics. To address this deficiency, we have devised a generalized methodology grounded in category theory. Notably, we established the category of Markov Decision Processes (MDP) and the category of non-Markovian Decision Processes (NMDP), and proved the equivalence relationship between them. This theoretical foundation provides a novel perspective for understanding and addressing non-Markovian dynamics. We further introduced non-Markovianity into decision-making problem settings via the History Aggregator for State (HAS). With HAS, we can precisely control the state dependency structure of decision-making problems in the time series. Our analysis demonstrates the effectiveness of our method in representing a broad range of non-Markovian dynamics. This approach facilitates a more rigorous and flexible evaluation of decision algorithms by testing them in problem settings where non-Markovian dynamics are explicitly constructed. Full article
(This article belongs to the Special Issue Advances in Intelligent Decision-Making Systems)
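A toy version of the History Aggregator for State (HAS) idea, where the reward is made to depend on an aggregate of recent states so that the current observation alone no longer suffices, might look like this; the running-mean aggregator, window length, and class name are illustrative assumptions.

```python
from collections import deque
import numpy as np

class HistoryAggregatorEnv:
    """Wrap a Markovian step function so the reward depends on an aggregate of the last k states,
    deliberately breaking the Markov property with respect to the current observation alone."""

    def __init__(self, step_fn, init_state, k=4):
        self.step_fn, self.state = step_fn, init_state
        self.history = deque([init_state], maxlen=k)

    def step(self, action):
        next_state, base_reward = self.step_fn(self.state, action)
        self.history.append(next_state)
        aggregate = float(np.mean(self.history))      # assumed aggregator: running mean of recent states
        reward = base_reward + 0.5 * aggregate        # reward now hinges on unobserved history
        self.state = next_state
        return next_state, reward

if __name__ == "__main__":
    env = HistoryAggregatorEnv(lambda s, a: (s + a, -abs(s)), init_state=0.0, k=3)
    for a in (1.0, -0.5, 0.25):
        print(env.step(a))
```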
28 pages, 2028 KB  
Article
Dynamic Resource Games in the Wood Flooring Industry: A Bayesian Learning and Lyapunov Control Framework
by Yuli Wang and Athanasios V. Vasilakos
Algorithms 2026, 19(1), 78; https://doi.org/10.3390/a19010078 - 16 Jan 2026
Abstract
Wood flooring manufacturers face complex challenges in dynamically allocating resources across multi-channel markets, characterized by channel conflicts, demand uncertainty, and long-term cumulative effects of decisions. Traditional static optimization or myopic approaches struggle to address these intertwined factors, particularly when critical market states like brand reputation and customer base cannot be precisely observed. This paper establishes a systematic and theoretically grounded online decision framework to tackle this problem. We first model the problem as a Partially Observable Stochastic Dynamic Game. The core innovation lies in introducing an unobservable market position vector as the central system state, whose evolution is jointly influenced by firm investments, inter-channel competition, and macroeconomic randomness. The model further captures production lead times, physical inventory dynamics, and saturation/cross-channel effects of marketing investments, constructing a high-fidelity dynamic system. To solve this complex model, we propose a hierarchical online learning and control algorithm named L-BAP (Lyapunov-based Bayesian Approximate Planning), which innovatively integrates three core modules. It employs particle filters for Bayesian inference to nonparametrically estimate latent market states online. Simultaneously, the algorithm constructs a Lyapunov optimization framework that transforms long-term discounted reward objectives into tractable single-period optimization problems through virtual debt queues, while ensuring stability of physical systems like inventory. Finally, the algorithm embeds a game-theoretic module to predict and respond to rational strategic reactions from each channel. We provide theoretical performance analysis, rigorously proving the mean-square boundedness of system queues and deriving the performance gap between long-term rewards and optimal policies under complete information. This bound clearly quantifies the trade-off between estimation accuracy (determined by particle count) and optimization parameters. Extensive simulations demonstrate that our L-BAP algorithm significantly outperforms several strong baselines—including myopic learning and decentralized reinforcement learning methods—across multiple dimensions: long-term profitability, inventory risk control, and customer service levels. Full article
(This article belongs to the Section Analysis of Algorithms and Complexity Theory)
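The virtual-debt-queue construction behind the Lyapunov module can be sketched as a drift-plus-penalty choice over candidate single-period decisions; the trade-off parameter V, the budget, and the function names below are assumptions, and the particle-filter and game-theoretic modules of L-BAP are omitted.

```python
import numpy as np

def virtual_queue_update(q: float, spend: float, budget: float) -> float:
    """Virtual debt queue: grows when the period's spend exceeds its budget, drains otherwise."""
    return max(0.0, q + spend - budget)

def drift_plus_penalty(q: float, candidate_spends: np.ndarray, candidate_rewards: np.ndarray, v: float = 5.0) -> int:
    """Pick the single-period decision minimising (queue drift - V * reward), trading stability against reward."""
    objective = q * candidate_spends - v * candidate_rewards
    return int(np.argmin(objective))

if __name__ == "__main__":
    q, spends, rewards = 2.0, np.array([1.0, 3.0, 0.5]), np.array([1.2, 2.5, 0.4])
    idx = drift_plus_penalty(q, spends, rewards)
    q = virtual_queue_update(q, spend=float(spends[idx]), budget=1.5)
    print(idx, q)
```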
26 pages, 10192 KB  
Article
Multi-Robot Task Allocation with Spatiotemporal Constraints via Edge-Enhanced Attention Networks
by Yixiang Hu, Daxue Liu, Jinhong Li, Junxiang Li and Tao Wu
Appl. Sci. 2026, 16(2), 904; https://doi.org/10.3390/app16020904 - 15 Jan 2026
Abstract
Multi-Robot Task Allocation (MRTA) with spatiotemporal constraints presents significant challenges in environmental adaptability. Existing learning-based methods often overlook environmental spatial constraints, leading to spatial information distortion. To address this, we formulate the problem as an asynchronous Markov Decision Process over a directed heterogeneous graph and propose a novel heterogeneous graph neural network named the Edge-Enhanced Attention Network (E2AN). This network integrates a specialized encoder, the Edge-Enhanced Heterogeneous Graph Attention Network (E2HGAT), with an attention-based decoder. By incorporating edge attributes to effectively characterize path costs under spatial constraints, E2HGAT corrects spatial distortion. Furthermore, our approach supports flexible extension to diverse payload scenarios via node attribute adaptation. Extensive experiments conducted in simulated environments with obstructed maps demonstrate that the proposed method outperforms baseline algorithms in task success rate. Remarkably, the model maintains its advantages in generalization tests on unseen maps as well as in scalability tests across varying problem sizes. Ablation studies further validate the critical role of the proposed encoder in capturing spatiotemporal dependencies. Additionally, real-time performance analysis confirms the method’s feasibility for online deployment. Overall, this study offers an effective solution for MRTA problems with complex constraints. Full article
(This article belongs to the Special Issue Motion Control for Robots and Automation)
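A stripped-down view of edge-enhanced attention, where an edge attribute such as an obstructed-path cost enters the attention logit alongside the node embeddings, is sketched below; the projection matrices, logit form, and dimensions are illustrative assumptions, not the E2HGAT architecture.

```python
import numpy as np

def edge_attention(h_src, h_dst, edge_feat, W_n, W_e):
    """Attention logit that mixes the two node embeddings with an edge attribute (e.g. path cost under obstacles),
    so spatial constraints enter the message weighting directly."""
    z = np.concatenate([W_n @ h_src, W_n @ h_dst, W_e @ edge_feat])
    return float(np.tanh(z).sum())

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_n, W_e = rng.normal(size=(4, 4)), rng.normal(size=(4, 2))
    h = rng.normal(size=(3, 4))                      # one robot node and two task nodes
    logits = [edge_attention(h[0], h[i], rng.normal(size=2), W_n, W_e) for i in (1, 2)]
    print(softmax(np.array(logits)))                 # attention weights over the two candidate tasks
```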
19 pages, 6478 KB  
Article
An Intelligent Dynamic Cluster Partitioning and Regulation Strategy for Distribution Networks
by Keyan Liu, Kaiyuan He, Dongli Jia, Huiyu Zhan, Wanxing Sheng, Zukun Li, Yuxuan Huang, Sijia Hu and Yong Li
Energies 2026, 19(2), 384; https://doi.org/10.3390/en19020384 - 13 Jan 2026
Abstract
As distributed generators (DGs) and flexible adjustable loads (FALs) further penetrate distribution networks (DNs), to reduce regulation complexity compared with traditional centralized control frameworks, DGs and FALs in DNs should be packed into several clusters so that their dispatch can become standard in the industry. To mitigate the negative influence of DGs’ and FALs’ spatiotemporal distribution and uncertain output characteristics on dispatch, this paper proposes an intelligent dynamic cluster partitioning strategy for DNs, through which the DN’s resources and loads can be intelligently aggregated, organized, and regulated in a dynamic and optimal way with relatively high implementation efficiency. An environmental model based on the Markov decision process (MDP) technique is first developed for DN cluster partitioning, in which a continuous state space, a discrete action space, and a dispatching performance-oriented reward are designed. Then, a novel random forest Q-learning network (RF-QN) is developed to implement dynamic cluster partitioning by interacting with the proposed environmental model, whereby the generalization ability and robustness of the Q-function estimate are improved by combining deep learning and decision trees. Finally, a modified IEEE-33-node system is adopted to verify the effectiveness of the proposed intelligent dynamic cluster partitioning and regulation strategy; the results also indicate that the proposed RF-QN is superior to the traditional deep Q-learning (DQN) model in terms of renewable energy accommodation rate, training efficiency, and partitioning and regulation performance. Full article
(This article belongs to the Special Issue Advanced in Modeling, Analysis and Control of Microgrids)
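Since the abstract does not spell out how the random forest estimates the Q-function, the sketch below shows one plausible reading, a fitted-Q-iteration loop with a scikit-learn RandomForestRegressor; the toy transition data, forest size, and discount factor are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=5, gamma=0.95):
    """Approximate Q(s, a) with a random forest over (state, action) pairs, refit on Bellman targets each iteration."""
    s, a, r, s2 = (np.array(x) for x in zip(*transitions))
    X = np.column_stack([s, a])
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, r)  # Q0 ~ immediate reward
    for _ in range(n_iters):
        q_next = np.column_stack([model.predict(np.column_stack([s2, np.full(len(s2), b)]))
                                  for b in range(n_actions)])
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, r + gamma * q_next.max(axis=1))
    return model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [(float(x), int(x > 0.5), float(x), float(min(1.0, x + 0.1))) for x in rng.random(200)]
    q = fitted_q_iteration(data, n_actions=2)
    print(q.predict([[0.3, 0], [0.3, 1]]))   # Q-values of the two actions in a toy state
```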
27 pages, 5365 KB  
Article
Autonomous Maneuvering Decision-Making Method for Unmanned Aerial Vehicle Based on Soft Actor-Critic Algorithm
by Shiming Quan, Su Cao, Chang Wang and Huangchao Yu
Drones 2026, 10(1), 35; https://doi.org/10.3390/drones10010035 - 6 Jan 2026
Abstract
Focusing on continuous action space methods for autonomous maneuvering decision making in 1v1 unmanned aerial vehicle scenarios, this paper first establishes a UAV kinematic model and a decision-making framework under the Markov Decision Process. Second, a continuous control strategy based on the Soft Actor-Critic (SAC) reinforcement learning algorithm is developed to generate precise maneuvering commands. Then, a multi-dimensional situation-coupled reward function is designed, introducing a Health Point (HP) metric to assess situational advantages and simulate cumulative effects quantitatively. Finally, extensive simulations in a custom Gym environment validate the effectiveness of the proposed method and its robustness under both ideal and noisy observation conditions. Full article
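The situation-coupled reward with a Health Point term could be illustrated along these lines; the weights and advantage terms are invented for the example and do not reproduce the paper's reward design.

```python
def hp_advantage_reward(own_hp: float, enemy_hp: float, angle_adv: float, range_adv: float) -> float:
    """Illustrative situation-coupled reward: health-point margin plus weighted angle and range advantage terms."""
    return 0.5 * (own_hp - enemy_hp) + 0.3 * angle_adv + 0.2 * range_adv

if __name__ == "__main__":
    print(hp_advantage_reward(own_hp=0.9, enemy_hp=0.6, angle_adv=0.4, range_adv=-0.1))
```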
17 pages, 1664 KB  
Article
SBF-DRL: A Multi-Vehicle Safety Enhancement Framework Based on Deep Reinforcement Learning with Integrated Safety Barrier Function
by Yanfei Peng, Wei Yuan, Fei Miao and Wei Hao
World Electr. Veh. J. 2026, 17(1), 24; https://doi.org/10.3390/wevj17010024 - 5 Jan 2026
Abstract
Although deep reinforcement learning has achieved great success in the field of autonomous driving, it still faces technical obstacles, such as balancing safety and efficiency in complex driving environments. This paper proposes a deep reinforcement learning multi-vehicle safety enhancement framework that integrates a safety barrier function (SBF-DRL). SBF-DRL first provides independent monitoring assurance for each autonomous vehicle through redundant functions and maintains safety in local vehicles to ensure the safety of the entire multi-autonomous vehicle driving system. Secondly, combining the safety barrier function constraints and the deep reinforcement learning algorithm, a meta-control policy using Markov Decision Process modeling is proposed to provide a safe logic switching assurance mechanism. The experimental results show that SBF-DRL’s collision rate is controlled below 3% in various driving scenarios, which is far lower than other baseline algorithms, and achieves a more effective trade-off between safety and efficiency. Full article
(This article belongs to the Section Vehicle and Transportation Systems)
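A minimal version of the safety-barrier switching logic, keeping the RL action while a barrier margin stays positive and falling back to a safe maneuver otherwise, is sketched below; the headway-style barrier, reaction time, and fallback deceleration are assumptions rather than the SBF-DRL formulation.

```python
def barrier_value(gap: float, rel_speed: float, t_react: float = 1.0) -> float:
    """Simple control-barrier-style margin: headway left after the reaction-time closing distance."""
    return gap - rel_speed * t_react

def safe_switch(rl_accel: float, gap: float, rel_speed: float, fallback_accel: float = -3.0) -> float:
    """Meta-control: keep the RL action while the barrier stays positive, otherwise switch to the safe fallback."""
    return rl_accel if barrier_value(gap, rel_speed) > 0.0 else fallback_accel

if __name__ == "__main__":
    print(safe_switch(rl_accel=1.2, gap=25.0, rel_speed=5.0))   # barrier holds -> RL action
    print(safe_switch(rl_accel=1.2, gap=3.0, rel_speed=8.0))    # barrier violated -> brake
```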
26 pages, 2431 KB  
Article
Multi-Objective Deep Reinforcement Learning for Dynamic Task Scheduling Under Time-of-Use Electricity Price in Cloud Data Centers
by Xiao Liao, Yiqian Li, Luyao Liu, Lihao Deng, Jinlong Hu and Xiaofei Wu
Electronics 2026, 15(1), 232; https://doi.org/10.3390/electronics15010232 - 4 Jan 2026
Abstract
The high energy consumption and substantial electricity costs of cloud data centers pose significant challenges related to carbon emissions and operational expenses for service providers. The temporal variability of electricity pricing in real-world scenarios adds complexity to this problem while simultaneously offering novel opportunities for mitigation. This study addresses the task scheduling optimization problem under time-of-use pricing conditions in cloud computing environments by proposing an innovative task scheduling approach. To balance the three competing objectives of electricity cost, energy consumption, and task delay, we formulate a price-aware, multi-objective task scheduling optimization problem and establish a Markov decision process model. By integrating prioritized experience replay with a multi-objective preference vector selection mechanism, we design a dynamic, multi-objective deep reinforcement learning algorithm named TEPTS. The simulation results demonstrate that TEPTS achieves superior convergence and diversity compared to three other multi-objective optimization methods while exhibiting excellent scalability across varying test durations and system workload intensities. Specifically, under the TOU pricing scenario, the task migration rate during peak periods exceeds 33.90%, achieving a 13.89% to 36.89% reduction in energy consumption and a 14.09% to 45.33% reduction in electricity costs. Full article
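The preference-vector mechanism for trading off electricity cost, energy, and delay can be pictured as a simple scalarisation, as in the sketch below; the Dirichlet sampling and the sign convention are assumptions, and the prioritized experience replay part is not shown.

```python
import numpy as np

def scalarise(objectives: np.ndarray, preference: np.ndarray) -> float:
    """Weighted scalarisation of the (negated) cost, energy, and delay objectives with a preference vector."""
    preference = preference / preference.sum()
    return float(-np.dot(preference, objectives))   # all three objectives are to be minimised

def sample_preference(rng: np.random.Generator) -> np.ndarray:
    """Draw a random preference vector from the simplex, as in preference-conditioned multi-objective RL."""
    return rng.dirichlet(np.ones(3))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = sample_preference(rng)
    print(w, scalarise(np.array([0.4, 0.7, 0.2]), w))
```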
13 pages, 10544 KB  
Article
Stability-Guaranteed Grant-Free Access for Cyber–Physical System over Space–Air–Ground Integrated Networks
by Xiaoyang Wang, Wei Li, Zhiyu Li, Dan Liu, Guangchuan Pan and Yan Wu
Electronics 2026, 15(1), 193; https://doi.org/10.3390/electronics15010193 - 1 Jan 2026
Abstract
In this paper, we investigate the grant-free (GF) accessing for cyber–physical systems (CPSs) over space–air–ground integrated networks (SAGINs) by jointly considering system stability and power consumption. The problem of GF access for CPSs over SAGINs is modeled as a Markov decision process where preamble sequences are chosen to minimize power consumption while guaranteeing system stability. To solve this problem, a distributed multi-agent deep reinforcement learning framework based on factorization technology is proposed. In addition, a local network based on hierarchical reinforcement learning is designed to prevent the explosion of the dimension of the action space, in turn reducing the computational complexity of the proposed algorithm. Finally, the simulation results validate the performance superiority of the proposed scheme in terms of convergence, power consumption and stability compared with the baseline schemes. Full article
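The factorization idea behind the distributed multi-agent framework can be illustrated in its simplest additive (VDN-style) form below; the additive decomposition and the toy utility tables are assumptions, since the abstract does not specify the factorization used.

```python
import numpy as np

def factorised_q(per_agent_q: np.ndarray) -> float:
    """Value factorisation in its simplest additive form: the joint Q is the sum of per-agent utilities,
    so each device can pick its preamble greedily from its own head."""
    return float(per_agent_q.sum())

def greedy_joint_action(q_tables: np.ndarray) -> np.ndarray:
    """Each agent maximises its own utility; under additive factorisation this also maximises the joint Q."""
    return q_tables.argmax(axis=1)

if __name__ == "__main__":
    q_tables = np.array([[0.1, 0.7, 0.2],    # agent 0 utilities over 3 preambles
                         [0.5, 0.3, 0.9]])   # agent 1
    a = greedy_joint_action(q_tables)
    print(a, factorised_q(q_tables[np.arange(2), a]))
```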
25 pages, 3099 KB  
Article
Research on Improved PPO-Based Unmanned Surface Vehicle Trajectory Tracking Control Integrated with Pure Pursuit Guidance
by Hongyu Li, Runyu Yang, Yu Zhang, Yicheng Wen, Qunhong Tian, Weizhuang Ma, Zongsheng Wang and Shaobo Yang
J. Mar. Sci. Eng. 2026, 14(1), 70; https://doi.org/10.3390/jmse14010070 - 30 Dec 2025
Abstract
To address the low trajectory tracking accuracy and limited robustness of conventional reinforcement learning algorithms under complex marine environments involving wind, wave, and current disturbances, this study proposes a proximal policy optimization (PPO) algorithm incorporating an intrinsic curiosity mechanism to solve the unmanned surface vehicle (USV) trajectory tracking control problem. The proposed approach is developed on the basis of a three-degree-of-freedom (3-DOF) USV model and formulated within a Markov decision process (MDP) framework, where a multidimensional state space and a continuous action space are defined, and a multi-objective composite reward function is designed. By incorporating a pure pursuit guidance algorithm, the complexity of engineering implementation is reduced. Furthermore, an improved PPO algorithm integrated with an intrinsic curiosity mechanism is adopted as the trajectory tracking controller, in which the exploration incentives provided by the intrinsic curiosity module (ICM) guide the agent to explore the state space efficiently and converge rapidly to an optimal control policy. The final experimental results indicate that, compared with the conventional PPO algorithm, the improved PPO–ICM controller achieves a reduction of 54.2% in average lateral error and 47.1% in average heading error under simple trajectory conditions. Under the complex trajectory condition, the average lateral error and average heading error are reduced by 91.8% and 41.9%, respectively. These results effectively demonstrate that the proposed PPO–ICM algorithm attains high tracking accuracy and strong generalization capability across different trajectory scenarios, and can provide a valuable reference for the application of intelligent control algorithms in the USV domain. Full article
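The intrinsic curiosity bonus can be sketched as the prediction error of a small forward model, as below; the linear model, learning rate, and bonus scale are illustrative assumptions and stand in for the ICM's learned feature encoder.

```python
import numpy as np

class ForwardModelICM:
    """Minimal intrinsic-curiosity signal: a linear forward model predicts the next state feature,
    and its squared prediction error is paid out as an exploration bonus."""

    def __init__(self, state_dim, action_dim, lr=0.01, scale=1.0):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr, self.scale = lr, scale

    def intrinsic_reward(self, s, a, s_next):
        x = np.concatenate([s, a])
        pred = self.W @ x
        err = s_next - pred
        self.W += self.lr * np.outer(err, x)          # one SGD step on the forward model
        return self.scale * float(np.sum(err ** 2))   # bonus shrinks as the dynamics become predictable

if __name__ == "__main__":
    icm = ForwardModelICM(state_dim=3, action_dim=2)
    s, a = np.zeros(3), np.array([0.5, -0.2])
    print(icm.intrinsic_reward(s, a, s_next=np.array([0.1, 0.0, -0.05])))
```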
25 pages, 4363 KB  
Article
Demand Response Potential Evaluation Based on Multivariate Heterogeneous Features and Stacking Mechanism
by Chong Gao, Zhiheng Xu, Ran Cheng, Junxiao Zhang, Xinghang Weng, Huahui Zhang, Tao Yu and Wencong Xiao
Energies 2026, 19(1), 194; https://doi.org/10.3390/en19010194 - 30 Dec 2025
Abstract
Accurate evaluation of demand response (DR) potential at the individual user level is critical for the effective implementation and optimization of demand response programs. However, existing data-driven methods often suffer from insufficient feature representation, limited characterization of load profile dynamics, and ineffective fusion of heterogeneous features, leading to suboptimal evaluation performance. To address these challenges, this paper proposes a novel demand response potential evaluation method based on multivariate heterogeneous features and a Stacking-based ensemble mechanism. First, multidimensional indicator features are extracted from historical electricity consumption data and external factors (e.g., weather, time-of-use pricing), capturing load shape, variability, and correlation characteristics. Second, to enrich the information space and preserve temporal dynamics, typical daily load profiles are transformed into two-dimensional image features using the Gramian Angular Difference Field (GADF), the Markov Transition Field (MTF), and an Improved Recurrence Plot (IRP), which are then fused into a single RGB image. Third, a differentiated modeling strategy is adopted: scalar indicator features are processed by classical machine learning models (Support Vector Machine, Random Forest, XGBoost), while image features are fed into a deep convolutional neural network (SE-ResNet-20). Finally, a Stacking ensemble learning framework is employed to intelligently integrate the outputs of base learners, with a Decision Tree as the meta-learner, thereby enhancing overall evaluation accuracy and robustness. Experimental results on a real-world dataset demonstrate that the proposed method achieves superior performance compared to individual models and conventional fusion approaches, effectively leveraging both structured indicators and unstructured image representations for high-precision demand response potential evaluation. Full article
(This article belongs to the Section F1: Electrical Power System)
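The Stacking step over the scalar indicator branch can be sketched with scikit-learn as below; the synthetic data stands in for the real features, XGBoost and the SE-ResNet-20 image branch are omitted, and the hyperparameters are assumed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the scalar indicator features extracted from consumption data and external factors.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)), ("rf", RandomForestClassifier(n_estimators=100))],
    final_estimator=DecisionTreeClassifier(max_depth=3),   # decision-tree meta-learner, as named in the abstract
    cv=5,
)
stack.fit(X[:240], y[:240])
print("held-out accuracy:", stack.score(X[240:], y[240:]))
```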