Search Results (74)

Search Parameters:
Keywords = constrained Markov decision process

24 pages, 11332 KB  
Article
Intelligent Optimization Methods for Cloud–Edge Collaborative Vehicular Networks via the Integration of Bayesian Decision-Making and Reinforcement Learning
by Youjian Yu, Zhaowei Song, Sifeng Zhu and Qinghua Zhang
Future Internet 2026, 18(4), 215; https://doi.org/10.3390/fi18040215 - 17 Apr 2026
Viewed by 126
Abstract
To improve vehicle user service quality and address data privacy and security issues in intelligent transportation vehicle networking systems, a three-tier communication architecture with cloud-edge-end collaboration was designed in this paper. A Bayesian decision criterion was utilized to divide user data segments into fine-grained slices based on their privacy levels, and differential privacy techniques were applied to protect the offloaded data. To achieve multi-objective optimization between user service quality and data privacy and security, the problem was formulated as a constrained Markov decision process. A communication model, a caching model, a latency model, an energy consumption model, and a data-fragment privacy protection model were designed. Additionally, a deep reinforcement learning algorithm based on the actor–critic approach was proposed for the collaborative and centralized training of multiple intelligent agents (CTMA-AC), enabling multi-objective optimization decision-making for the protection of offloaded private user data. Simulation experiments demonstrate that the proposed multi-agent collaborative privacy data offloading protection strategy can effectively safeguard private user data while ensuring high service quality. Full article
(This article belongs to the Section Network Virtualization and Edge/Fog Computing)
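The abstract above applies differential privacy to the offloaded data fragments but does not specify the mechanism. As a hedged, minimal sketch assuming a standard Laplace mechanism (the function names and parameter values here are illustrative, not from the paper):

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Sample Laplace(0, scale) by inverse-CDF transform of a uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_release(value: float, sensitivity: float, epsilon: float,
               rng: random.Random) -> float:
    # Epsilon-differentially-private release of a numeric query whose
    # L1 sensitivity bounds how much one record can change the value.
    return value + laplace_noise(sensitivity / epsilon, rng)

noisy = dp_release(42.0, sensitivity=1.0, epsilon=0.5, rng=random.Random(0))
```

A smaller epsilon (stronger privacy) yields a larger noise scale, which is exactly the service-quality-versus-privacy tension the abstract's multi-objective formulation navigates.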
25 pages, 3942 KB  
Article
Deep Reinforcement Learning-Based Scheduling for an Electric–Hydrogen Integrated Station Using a Data-Driven Electrolyzer Model
by Dongdong Li, Liang Liu and Haiyu Liao
Appl. Sci. 2026, 16(7), 3605; https://doi.org/10.3390/app16073605 - 7 Apr 2026
Viewed by 352
Abstract
To address the inaccurate scheduling of electric–hydrogen integrated stations (EHISs) caused by the limited accuracy of conventional mechanistic models for proton exchange membrane (PEM) electrolyzers, this study proposes a deep reinforcement learning (DRL)-based scheduling strategy incorporating a data-driven electrolyzer model. First, a deep XGBoost model is developed to characterize the hydrogen production behavior of the PEM electrolyzer, thereby replacing the traditional mechanistic model and reducing prediction errors. Second, the EHIS scheduling problem is formulated as a constrained Markov decision process (CMDP) that explicitly considers user demand and carbon emission constraints. Third, an improved deep Q-network (DQN) algorithm integrating Lagrangian relaxation and the template policy-based reinforcement learning (TPRL) method is designed to solve the scheduling problem, which enhances convergence speed and generalization performance under similar operating scenarios. The simulation results demonstrate that the proposed method can effectively alleviate the decision-making risks introduced by model inaccuracies and significantly improve the operational profitability of the station while satisfying user demand and carbon emission constraints. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
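Several entries in this list, including the one above, solve a CMDP via Lagrangian relaxation. A hedged sketch of that generic mechanism (learning rate and budget are illustrative; this is not any paper's exact algorithm):

```python
def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    # Scalarize one CMDP step: penalize the constraint cost by the multiplier.
    return reward - lam * cost

def update_multiplier(lam: float, avg_cost: float, budget: float,
                      lr: float = 0.01) -> float:
    # Dual ascent: raise lam while the constraint is violated, lower it
    # otherwise, and project back onto lam >= 0.
    return max(0.0, lam + lr * (avg_cost - budget))
```

The RL agent then maximizes the shaped reward with any off-the-shelf algorithm (a DQN here, actor-critic methods elsewhere in this list), while the multiplier is updated on a slower timescale until the long-run cost meets the budget.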
27 pages, 4837 KB  
Article
AI-Driven Adaptive Encryption Framework for a Modular Hardware-Based Data Security Device: Conceptual Architecture, Formal Foundations, and Security Analysis
by Pruthviraj Pawar and Gregory Epiphaniou
Appl. Sci. 2026, 16(7), 3522; https://doi.org/10.3390/app16073522 - 3 Apr 2026
Viewed by 272
Abstract
This paper presents a conceptual architecture for an AI-Driven Adaptive Encryption Device (AI-AED), a tri-modular hardware platform embodied in a registered industrial design. The device integrates a Secure Input Module, an AI-Enhanced Central Processing Unit with biometric authentication, and a Secure Output Module connected by unidirectional buses. We formalise the adaptive encryption policy as a constrained Markov decision process (CMDP) over a discrete action space of 216 cryptographic configurations, with safety constraints that provably prevent convergence to insecure states. A formal threat model based on extended Dolev–Yao assumptions with four physical access tiers defines attacker capabilities, and anti-downgrade safeguards enforce a monotonically non-decreasing security floor during threat escalation. An information-theoretic analysis shows that adaptive algorithm selection contributes an additional entropy term H(α) to ciphertext uncertainty, upper-bounded by log2(|L_enc|) ≈ 1.58 bits, while noting this represents increased attacker uncertainty rather than a strengthening of any individual cipher. A component-level latency model estimates 0.91–1.00 ms pipeline latency under normal operation and 3.14–3.42 ms under active threat, including integration overhead. Simulation validation over 1000 episodes compares a tabular Q-learning baseline against the proposed Deep Q-Network operating on the continuous state space: the DQN achieves 82% fewer constraint violations, 6× faster threat response, and more stable policy switching, demonstrating the advantage of continuous-state reinforcement learning for safety-critical adaptive encryption. All claims are positioned as theoretical contributions requiring empirical validation through prototype implementation. Full article
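The entropy bound quoted in this abstract can be checked directly: with |L_enc| candidate encryption algorithms, a uniform algorithm choice adds at most

```latex
H(\alpha) \;=\; -\sum_{i=1}^{|L_{\mathrm{enc}}|} p_i \log_2 p_i
\;\le\; \log_2 |L_{\mathrm{enc}}|,
\qquad \log_2 3 \approx 1.585\ \text{bits},
```

so the stated bound of about 1.58 bits corresponds to three candidate algorithms, consistent with the abstract's caveat that this quantifies attacker uncertainty about which cipher was used, not the strength of any single cipher.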
31 pages, 1333 KB  
Article
Optimal Security Task Offloading in Cognitive IoT Networks: Provably Optimal Threshold Policies and Model-Free Learning
by Ning Wang and Yali Ren
IoT 2026, 7(2), 30; https://doi.org/10.3390/iot7020030 - 26 Mar 2026
Viewed by 436
Abstract
The proliferation of Internet of Things (IoT) devices has introduced significant security challenges. Resource-constrained devices face sophisticated threats but lack the computational capacity for advanced security analysis. This study investigates optimal security task allocation in Cognitive IoT (CIoT) networks. It specifically examines when IoT devices should process security tasks locally or offload them to Mobile Edge Computing (MEC) servers. The problem is formulated as a Continuous-Time Markov Decision Process (CTMDP). The study demonstrates that the optimal offloading policy has a threshold structure. Security tasks are offloaded to MEC servers when the offloading queue length is below a critical threshold, k. Otherwise, tasks are processed locally. This structural property is robust to changes in MEC server configurations and threat arrival patterns. It ensures an optimal and easily implementable security policy under the exponential model. Theoretical analysis establishes upper bounds on the performance of AI-based security controllers using the same models. The results also show that standard model-free Q-learning algorithms can recover optimal thresholds without any prior knowledge of the system parameters. Simulations across multiple reinforcement learning architectures, including Q-learning, State–Action–Reward–State–Action (SARSA), and Deep Q-networks (DQN), confirm that all methods converge to the predicted threshold. This empirically validates the analytical findings. The threshold structure remains effective under practical imperfections such as imperfect sensing and parameter estimation errors. Systems maintain 85% to 93% of their optimal performance. This work extends threshold Markov Decision Process (MDP) analysis from classical queuing theory to the context of CIoT security offloading. It provides optimal and practical policies and model-free algorithms for use by resource-constrained devices. Full article
37 pages, 2896 KB  
Article
Energy-Efficient Resilience Scheduling for Elevator Group Control via Queueing-Based Planning and Safe Reinforcement Learning
by Tingjie Zhang, Tiantian Zhang, Hao Zou, Chuanjiang Li and Jun Huang
Machines 2026, 14(3), 352; https://doi.org/10.3390/machines14030352 - 21 Mar 2026
Viewed by 337
Abstract
High-rise elevator group control systems operate under pronounced nonstationarity during commuting peaks, post-event surges, and capacity degradation, where the waiting time distribution becomes right-tail heavy and stresses service-level agreements (SLAs) defined by coverage and high-quantile targets. At the same time, time-of-use tariffs and carbon constraints sharpen the tension between peak-power control, energy savings, and service capacity. This paper proposes a two-layer resilience scheduling framework that integrates queueing-based planning with safe reinforcement learning (RL) fine-tuning. In the planning layer, parsimonious queueing approximations and scenario-based evaluation construct a finite set of implementable mode cards and emergency switching cards; Sample Average Approximation (SAA) combined with Conditional Value-at-Risk (CVaR) constraints filters candidates to enforce tail-risk-aware service limits while keeping power demand within a prescribed envelope. In the execution layer, online dispatch is formulated as a constrained Markov decision process; within the planning-layer limits, action masking and Lagrangian safe RL learn small adaptive adjustments to suppress tail-waiting risk and improve recovery dynamics without increasing peak-power commitments. Experiments under morning peaks and post-event surges confirm tail-risk reduction and accelerated recovery. For partial outages, the framework prioritizes SLA coverage and recovery speed, accepting a bounded increase in tail risk as a manageable trade-off. Throughout all tests, peak power remains within the prescribed limits. Improvements persist across random seeds and demand fluctuations, indicating distributional robustness and cross-scenario generalization. Ablation studies further reveal complementary roles: removing the planning-layer CVaR screening worsens tail performance, while removing the execution-layer action masking increases constraint violations and destabilizes recovery.
Full article
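Of the two execution-layer safeguards named above, action masking is the simpler: infeasible actions are set to negative infinity before the greedy argmax, so they can never be selected. A minimal sketch (the feasibility vector here is illustrative):

```python
import numpy as np

def masked_argmax(q_values: np.ndarray, feasible: np.ndarray) -> int:
    # Mask infeasible actions to -inf so argmax only sees safe choices.
    masked = np.where(feasible, q_values, -np.inf)
    return int(np.argmax(masked))

q = np.array([2.0, 5.0, 1.0])
feasible = np.array([True, False, True])  # action 1 would breach the power envelope
a = masked_argmax(q, feasible)            # selects action 0, not unsafe action 1
```

Because the mask is applied before selection, constraint satisfaction is enforced by construction rather than learned, which is why its removal in the ablation study increases violations.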
30 pages, 1414 KB  
Article
Graph-Attention Constrained DRL for Joint Task Offloading and Resource Allocation in UAV-Assisted Internet of Vehicles
by Peiying Zhang, Xiangguo Zheng, Konstantin Igorevich Kostromitin, Wei Zhang, Huiling Shi and Lizhuang Tan
Drones 2026, 10(3), 201; https://doi.org/10.3390/drones10030201 - 13 Mar 2026
Viewed by 494
Abstract
Unmanned aerial vehicles (UAVs) acting as mobile aerial edge platforms can deliver on-demand communication and computing for the Internet of Vehicles (IoV) via flexible deployment and line-of-sight (LoS) links, improving reliability and reducing latency. However, high vehicle mobility, time-varying channels, and limited onboard energy make task offloading and resource coordination challenging. This paper studies joint task offloading and resource allocation in a UAV-assisted IoV system, where the UAV selects its hovering position from discrete candidate sites each time slot and splits vehicular tasks between the UAV and a roadside unit (RSU) to relieve backhaul congestion and enhance edge resource utilization. Considering vehicle mobility, multi-stage queue dynamics, and UAV energy consumption for communication, computation, and movement, the online optimization of position selection, task splitting, and bandwidth allocation is formulated as a constrained Markov decision process (CMDP). The goal is to maximize the number of tasks completed within the latency deadlines while satisfying the UAV energy budget. To solve this CMDP, we propose a graph-attention-based constrained twin delayed deep deterministic policy gradient (GAT-CTD3) algorithm. A graph attention network captures spatial correlations and resource competition among active vehicles, while a Lagrangian TD3 framework enforces long-term energy constraints and improves learning stability via twin critics, delayed policy updates, and target smoothing. The simulation results demonstrate that it outperforms the comparative scheme in terms of task completion rate, delay, and energy consumption per completed task, and exhibits strong robustness in situations with dense traffic. Full article
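The graph-attention component of GAT-CTD3 is not specified beyond its name; as a hedged sketch of the core weighting idea only (a real GAT layer adds learned linear projections and LeakyReLU attention scoring, omitted here):

```python
import numpy as np

def attention_aggregate(query: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention over neighbor feature rows: score each
    # neighbor against the query, softmax the scores, return the weighted sum.
    scores = neighbors @ query / np.sqrt(query.size)
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ neighbors

rng = np.random.default_rng(1)
ego = rng.standard_normal(4)                    # the UAV agent's own features
nbrs = rng.standard_normal((3, 4))              # three nearby vehicles' features
context = attention_aggregate(ego, nbrs)        # aggregated context, shape (4,)
```

With learned projections, vehicles competing more strongly for the same bandwidth or compute would receive higher weights; in this stripped-down version the weights simply follow feature similarity.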
21 pages, 1796 KB  
Article
Research on Time Constraint Strategy of Flight Ground Support Operations Based on Causal Inference
by Xiaoqing Xing, Wenjing Wang, Hongyun Fan, Lei Xu and Mian Zhong
Aerospace 2026, 13(3), 272; https://doi.org/10.3390/aerospace13030272 - 13 Mar 2026
Viewed by 303
Abstract
To improve the punctuality of flight schedules, causal inference methods are introduced to model the potential causal structure and intervention effects among ground support operations of flights. The effectiveness of these methods in improving flight punctuality is verified under experimental conditions. When the causal relationship of Flight Ground Support (FGS) is determined, the research initiates from the perspective of FGS. A time-constrained strategy based on the Q-learning causal optimal strategy algorithm is proposed to transform causal effects into causal strategies. Initially, the influencing factors of FGS operations are classified into intervention groups. The causal effects of these influencing factors on their target support operations are calculated, and the influence degrees of the causes on the results within the causal relationship are investigated. Subsequently, the time constraint of the FGS process is characterized as a Markov decision process. The experimental results indicate that, compared with the traditional probability strategy, the causal strategy that considers the causal relationship enables over 51% of the flight plans to depart on time, with an average increase of 2.79%. The proposed method is not restricted to a specific airport or a single ground handling process configuration. Under the condition that ground handling operations are observable and sufficient historical operational data are available, it provides an interpretable optimization framework for time-constraint decision-making in flight ground handling operations across airports of different scales. Full article
(This article belongs to the Special Issue Emerging Trends in Air Traffic Flow and Airport Operations Control)
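The Q-learning machinery this abstract builds on reduces to one tabular update; a minimal sketch with illustrative state and action names (not the paper's causal-strategy variant):

```python
def q_update(Q: dict, s, a, r: float, s_next,
             alpha: float = 0.1, gamma: float = 0.95) -> None:
    # One tabular Q-learning step: move Q[s][a] toward the TD target
    # r + gamma * max_a' Q[s_next][a'].
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Toy table: two ground-support states, two actions (names hypothetical).
Q = {0: {"wait": 0.0, "start": 0.0}, 1: {"wait": 0.0, "start": 0.0}}
q_update(Q, s=0, a="start", r=1.0, s_next=1)
# Q[0]["start"] moves from 0.0 to 0.1 * (1.0 + 0.95 * 0.0) = 0.1
```

The paper's causal variant presumably differs in how rewards and time constraints are constructed from the estimated causal effects; the update rule itself is standard.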
21 pages, 4680 KB  
Article
Hierarchical Thermocline-Aware Navigation for Underwater Gliders via Multi-Objective Path Planning and Reinforcement Learning
by Zizhao Song, Mingsong Bao and Tingting Guo
J. Mar. Sci. Eng. 2026, 14(5), 498; https://doi.org/10.3390/jmse14050498 - 6 Mar 2026
Viewed by 413
Abstract
Navigation planning and execution for underwater gliders operating in thermocline-affected environments is challenging due to the coupled influence of energy constraints, spatially distributed environmental disturbances, and limited control authority. Spatially varying thermocline structures act as structured environmental disturbances that degrade motion efficiency and tracking accuracy, and therefore must be explicitly considered in both path planning and control design. This paper proposes a hierarchical control-oriented decision framework for underwater glider navigation in thermocline regions. At the planning layer, a thermocline-aware multi-objective optimization problem is formulated to regulate the trade-off between navigation efficiency and cumulative environmental disturbance, characterized by total path length and cumulative thermocline exposure, respectively. A multi-objective artificial bee colony (MOABC) algorithm is employed to generate a set of Pareto-optimal reference trajectories that explicitly reveal this trade-off. At the execution layer, pitch angle regulation is formulated as a stochastic tracking control problem under environmental uncertainty. A Markov Decision Process (MDP) is constructed to model the coupled effects of pitch control on energy consumption and trajectory deviation, and a deep deterministic policy gradient (DDPG) algorithm is adopted to synthesize a feedback control policy for adaptive pitch regulation during path execution. Simulation results demonstrate that the proposed framework effectively reduces cumulative thermocline exposure and overall energy consumption while maintaining improved trajectory consistency compared with representative benchmark methods. These results indicate that integrating multi-objective planning with learning-based control provides an effective control-oriented solution for constrained underwater glider navigation in thermally stratified environments. Full article
(This article belongs to the Section Ocean Engineering)
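Pareto optimality, which the MOABC planner above relies on, has a compact definition: a candidate trajectory is kept only if no other is at least as good on every objective and strictly better on one. A minimal filter, minimizing both objectives (e.g. path length and thermocline exposure):

```python
def pareto_front(points: list) -> list:
    # Keep points not dominated by any other: q dominates p when q is
    # <= p in every coordinate and differs from p (strictly better somewhere).
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[i] <= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# (length, exposure) pairs; (3, 3) is dominated by (2, 3) and is dropped
front = pareto_front([(1, 4), (2, 3), (3, 3), (4, 1)])
```

The surviving set is exactly the trade-off curve the abstract describes: no member can improve one objective without worsening the other.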
22 pages, 2733 KB  
Article
Attention-Enhanced Multi-Agent Deep Reinforcement Learning for Inverter-Based Volt-VAR Control in Active Distribution Networks
by Wenwen Chen, Hao Niu, Linbo Liu, Jianglong Lin and Huan Quan
Mathematics 2026, 14(5), 839; https://doi.org/10.3390/math14050839 - 1 Mar 2026
Viewed by 442
Abstract
The increasing penetration of inverter-interfaced photovoltaic (PV) generation in active distribution networks (ADNs) intensifies fast voltage violations and makes real-time Volt-VAR control (VVC) challenging, especially when each inverter has only partial and noisy measurements and communication is limited. Existing local droop-type strategies lack coordination, while fully centralized optimization/learning is often impractical for online deployment. To address these gaps, an attention-enhanced multi-agent deep reinforcement learning (MADRL) framework is developed for inverter-based VVC under the centralized training and decentralized execution (CTDE) paradigm. First, the voltage regulation problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP) to explicitly account for system stochasticity and temporal variability under partial observability. To solve this complex game, an attention-enhanced MADRL architecture is employed, where an agent-level attention mechanism is integrated into the centralized critic. Unlike traditional methods that treat all neighbor information equally, the proposed mechanism enables each inverter agent to dynamically prioritize and selectively focus on the most influential states from other agents, effectively capturing complex intercorrelations while enhancing training stability and learning efficiency. Operating under the CTDE paradigm, the framework realizes coordinated reactive power support using only local measurements, ensuring high scalability and practical implementability in communication-constrained environments. Simulations on the IEEE 33-bus system with six PV inverters show that the proposed method reduces the average voltage deviation on the test set from 0.0117 p.u. (droop control) and 0.0112 p.u. (MADDPG) to 0.0074 p.u., while maintaining millisecond-level execution time comparable to other MADRL baselines. 
Scalability tests with up to 12 agents further demonstrate robust performance of the proposed method under higher PV penetration. Full article
26 pages, 3134 KB  
Article
The Optimal Mining Strategy of Proof of Stake Consensus in Peercoin Blockchain
by Bolun Yang, Jiamin Hao, Yao Ma and Li Zhou
Electronics 2026, 15(5), 974; https://doi.org/10.3390/electronics15050974 - 27 Feb 2026
Viewed by 342
Abstract
Through the integration of distributed data storage, P2P networks, consensus mechanisms, cryptography, and other technologies, blockchain applications have expanded from the initial financial field to many other areas, such as logistics and auditing. The consensus mechanism is the soul of blockchain technology, and its rigorous mathematical analysis is of great significance. To the best of our knowledge, the Proof of Stake (PoS) consensus mechanism has so far been characterized only qualitatively, as "the rich get richer and the poor get poorer", without quantitative mathematical analysis. This paper presents a novel framework to analyze the PoS consensus mechanism quantitatively. Under the premise that no attack is carried out, we use the expected reward and the reward ratio as evaluation indicators, quantitatively analyze the optimal fund allocation strategy in a two-party game under the PoS consensus mechanism from the perspective of a rich miner, and construct the reward function as the objective function. The resulting inequality-constrained optimization problem is solved using the Karush-Kuhn-Tucker (KKT) conditions. We consider two schemes, an assignment strategy and a random strategy, derive the optimal fund allocation strategy for each, and compare it with a general strategy to quantify the improvement the optimal strategy achieves. We then compare the case in which both players adopt the optimal strategy. We find that under the assignment strategy, mining does not make the rich richer and the poor poorer, whereas under the random strategy, which is also the most common in practice, this protection does not hold. We also use a Markov decision process (MDP) to give a method for computing the optimal strategy in a game between rational miners, which extends to the n-party game.
This work helps blockchain developers analyze the PoS consensus mechanism; the choice between the assignment strategy and the random strategy is a promising direction for future research. Full article
(This article belongs to the Special Issue Data Privacy Protection in Blockchain Systems)
27 pages, 3414 KB  
Article
Efficiency-Optimized Hydrogen Production in PV–Battery–PEM Microgrids with Frequency Response Coordination
by Fan Yang, Ze Geng and Yifan Deng
Energies 2026, 19(5), 1181; https://doi.org/10.3390/en19051181 - 27 Feb 2026
Viewed by 500
Abstract
In industrial hydrogen production powered by high-penetration renewable energy, photovoltaic (PV) microgrids can provide low-carbon electricity for green hydrogen. However, the intermittency of PV generation and load uncertainties reduce the electrolyzer’s dwell time in its high-efficiency operating region, while imposing additional constraints on system frequency stability. This paper proposes a stochastic energy management strategy for a PV-battery-proton exchange membrane (PEM) microgrid to improve hydrogen production efficiency and ensure frequency stability. The proposed strategy uses an Efficiency-oriented Energy Allocation Markov Decision Process (EA-MDP) to model load uncertainties and incorporate available PV power into the decision-making process. The battery acts as a short-term buffer to smooth fluctuations in PV output and load demand, ensuring that the PEM electrolyzer operates efficiently despite PV intermittency. A two-timescale control strategy coordinates the Flex-MPPT with the PEM electrolyzer to maintain optimal efficiency. The strategy improves the PEM electrolyzer’s average efficiency by 14.3%, cumulative hydrogen production by 12.5% compared to traditional methods, and ensures frequency deviations are constrained within ±0.05 Hz. These results demonstrate enhanced dynamic stability, operational reliability, and efficient integration of renewable energy in the microgrid, supporting long-term sustainable hydrogen production. Full article
(This article belongs to the Section A: Sustainable Energy)
29 pages, 11326 KB  
Article
Constrained Soft Actor–Critic for Joint Computation Offloading and Resource Allocation in UAV-Assisted Edge Computing
by Nawazish Muhammad Alvi, Waqas Muhammad Alvi, Xiaolong Zhou, Jun Li and Yifei Wei
Sensors 2026, 26(4), 1149; https://doi.org/10.3390/s26041149 - 10 Feb 2026
Viewed by 670
Abstract
Unmanned Aerial Vehicle (UAV)-assisted edge computing supports latency-sensitive applications by offloading computational tasks to ground-based servers. However, determining optimal resource allocation under strict latency constraints and stochastic channel conditions remains challenging. This paper addresses the joint computation partitioning and power allocation problem for UAV-assisted edge computing systems. We formulate the problem as a Constrained Markov Decision Process (CMDP) that explicitly models latency constraints, rather than relying on implicit reward shaping. To solve this CMDP, we propose Constrained Soft Actor–Critic (C-SAC), a deep reinforcement learning algorithm that combines maximum-entropy policy optimization with Lagrangian dual methods. C-SAC employs a dedicated constraint critic network to estimate long-term constraint violations and an adaptive Lagrange multiplier that automatically balances energy efficiency against latency satisfaction without manual tuning. Extensive experiments demonstrate that C-SAC achieves an 18.9% constraint violation rate. This represents a 60.6-percentage-point improvement compared to unconstrained Soft Actor–Critic, with 79.5%, and a 22.4-percentage-point improvement over deterministic TD3-Lagrangian, achieving 41.3%. The learned policies exhibit strong channel-adaptive behavior with a correlation coefficient of 0.894 between the local computation ratio and channel quality, despite the absence of explicit channel modeling in the reward function. Ablation studies confirm that both adaptive mechanisms are essential, while sensitivity analyses show that C-SAC maintains robust performance with violation rates varying by less than 2 percentage points even as channel variability triples. These results establish constrained reinforcement learning as an effective approach for reliable UAV edge computing under stringent quality-of-service requirements. Full article
(This article belongs to the Special Issue Communications and Networking Based on Artificial Intelligence)
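The headline numbers in this abstract are internally consistent and can be checked with one subtraction per claim:

```python
# Constraint-violation rates (% of episodes) as reported in the abstract.
violation = {"SAC": 79.5, "TD3-Lagrangian": 41.3, "C-SAC": 18.9}

def improvement_pp(baseline: str) -> float:
    # Percentage-point reduction achieved by C-SAC relative to a baseline.
    return round(violation[baseline] - violation["C-SAC"], 1)

# improvement_pp("SAC") -> 60.6 and improvement_pp("TD3-Lagrangian") -> 22.4,
# matching the 60.6 and 22.4 percentage-point figures quoted above.
```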
38 pages, 2429 KB  
Article
Fairness-Constrained Dynamic Pricing via Shielded Deep Reinforcement Learning
by Wenchuan Qiao, Lincoln C. Wood, Shanshan Tang, Zeyu Teng and Min Huang
Mathematics 2026, 14(4), 600; https://doi.org/10.3390/math14040600 - 9 Feb 2026
Viewed by 491
Abstract
Firms increasingly develop dynamic pricing policies to maximize revenue for perishable products with limited inventory over a finite selling horizon. This trend is enabled by the growing availability of sales data and is observed across industries such as airlines, hotels, cruise lines, fashion, and seasonal retail. Given customer heterogeneity, firms may further adopt discriminatory pricing across customer groups. However, excessive price disparities can trigger legal risks and consumer backlash, motivating price fairness constraints that bound inter-group price differences in each selling period. We formulate this problem as an action-constrained Markov decision process (ACMDP) with unknown demand functions and adopt a model-free deep reinforcement learning (DRL) framework. However, standard DRL algorithms for unconstrained MDPs cannot directly handle these fairness constraints. Therefore, we introduce an optimization-based shielding mechanism. From the DRL pricing agent’s perspective, this mechanism converts the ACMDP into a shield-induced unconstrained MDP. Meanwhile, it guarantees constraint satisfaction for all executed prices. Building on this framework, we propose the Shield Soft Actor-Critic (Shield-SAC) algorithm. This is the first Shield-SAC method for fairness-aware pricing under instantaneous and hard price fairness constraints. We test it in two simulated markets of different scales and validate that Shield-SAC achieves strong revenue performance while consistently enforcing the price fairness constraints during both training and deployment. Full article
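The shielding idea above — override the agent's proposed prices only when they would violate fairness — can be sketched with a simple clamping projection. The paper's shield solves an optimization problem each period; this clamp is an illustrative stand-in, and the group names are hypothetical:

```python
def shield_prices(prices: dict, max_gap: float) -> dict:
    # If the widest inter-group price gap exceeds max_gap, clamp every
    # price into a band of width max_gap centered between the extremes.
    lo, hi = min(prices.values()), max(prices.values())
    if hi - lo <= max_gap:
        return dict(prices)              # already fair: pass through unchanged
    mid = (lo + hi) / 2.0
    half = max_gap / 2.0
    return {g: min(max(p, mid - half), mid + half) for g, p in prices.items()}

safe = shield_prices({"A": 10.0, "B": 16.0}, max_gap=2.0)
# safe == {"A": 12.0, "B": 14.0}: the 6-unit gap is reduced to exactly max_gap
```

From the agent's point of view such a projection is part of the environment, which is how the abstract's shield-induced unconstrained MDP arises: the learner optimizes freely while every executed price is guaranteed feasible.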
24 pages, 3006 KB  
Article
A Digital-Twin-Enabled AI-Driven Adaptive Planning Platform for Sustainable and Reliable Manufacturing
by Mingyuan Li, Chun-Ming Yang, Wei Lo and Yi-Wei Kao
Machines 2026, 14(2), 197; https://doi.org/10.3390/machines14020197 - 9 Feb 2026
Abstract
Manufacturing systems face growing demands from market volatility, stringent sustainability policies, and a high share of aging equipment, yet traditional planning frameworks are largely fixed and deterministic, so they cannot jointly optimize operational stability and environmental sustainability under uncertainty. This research proposes and empirically validates an artificial-intelligence-based adaptive planning platform that combines a physics-based Digital Twin (DT) with a Pareto-conditioned Multi-Objective Proximal Policy Optimization (MO-PPO) algorithm to co-optimize reliability and sustainability indicators in real time. The platform recasts manufacturing planning as a Constrained Multi-Objective Markov Decision Process (CMDP) that optimizes Overall Equipment Effectiveness (OEE), energy carbon intensity, and material waste while strictly adhering to operational constraints. The study employs a four-layer cyber-physical architecture comprising an edge-based data acquisition layer, a high-fidelity stochastic simulation engine calibrated via Bayesian inference, a graph attention network-based state-encoding layer, and a closed-loop execution layer running on 60 s planning cycles. Across 10,000 stochastic simulation experiments and a 12-week industrial pilot deployment, the platform achieved statistically significant improvements: 96.8% schedule performance, 84.7% OEE, a 16.5% cut in specific energy usage (2.38 kWh/kg), a 17.1% reduction in material-waste rate (6.8%), and a 21.4% improvement in carbon effectiveness, outperforming all baseline strategies (p = 0.001). The analysis also revealed a synergistic correlation between waste minimization and OEE enhancement (r = −0.73), with 34.1% of the overall OEE improvement attributable to sustainability strategies. This study provides a robust framework for adaptive, resilient, and eco-friendly manufacturing in line with Industry 5.0 principles. Full article
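The multi-objective optimization described above could be scalarized in a number of ways; the sketch below shows one common form, where a preference-weight vector conditions the policy on a point of the Pareto front and constraint violations are penalized as a CMDP surrogate. The function name, weight structure, and penalty value are illustrative assumptions, not the paper's reward design.

```python
def scalarized_reward(oee, energy_kwh_per_kg, waste_rate, weights, constraints_ok):
    """Illustrative scalarization of a multi-objective signal:
    maximize OEE while minimizing specific energy use and waste rate.
    `weights` conditions the policy on a preference over objectives.
    """
    w_oee, w_energy, w_waste = weights
    r = w_oee * oee - w_energy * energy_kwh_per_kg - w_waste * waste_rate
    # CMDP surrogate: heavily penalize operational-constraint violations.
    if not constraints_ok:
        r -= 10.0
    return r
```

Training the policy across many sampled weight vectors is one way a Pareto-conditioned agent can cover the trade-off surface rather than a single compromise point.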
(This article belongs to the Special Issue Digital Twins in Smart Manufacturing)

33 pages, 3714 KB  
Article
SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks
by Hilary Kelechi Anabi, Samuel Frimpong and Sanjay Madria
Sensors 2026, 26(2), 730; https://doi.org/10.3390/s26020730 - 21 Jan 2026
Abstract
The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy, to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, and we formulate a SADQN approach with dual-variable updates to mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP-hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: increasing cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieving substantial gains over random beamforming and non-beamforming approaches. The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount. Full article
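The Lagrangian relaxation with dual-variable updates mentioned in the abstract typically takes the following shape: the agent is trained on a penalized reward, while each multiplier is adjusted by projected gradient ascent whenever its constraint's running cost exceeds its budget. The function names, the budgets `d_k`, and the learning rate are illustrative assumptions, not the paper's exact update rule.

```python
def lagrangian_reward(reward, costs, lambdas):
    """Penalized reward r - sum_k lambda_k * c_k used to train the Q-network
    on the relaxed (unconstrained) objective."""
    return reward - sum(lam * c for lam, c in zip(lambdas, costs))


def dual_update(lambdas, costs, budgets, lr=0.01):
    """Projected gradient ascent on the dual variables:
    lambda_k <- max(0, lambda_k + lr * (c_k - d_k)).
    A multiplier grows while its constraint's cost c_k exceeds budget d_k,
    and decays toward zero once the constraint is satisfied."""
    return [max(0.0, lam + lr * (c - d))
            for lam, c, d in zip(lambdas, costs, budgets)]
```

Alternating these two steps (policy improvement on the penalized reward, then a dual update) is the standard primal-dual recipe for CMDPs with fairness, energy-threshold, duty-cycle, and utilization constraints.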
