MDPI - Publisher of Open Access Journals

23 pages, 1272 KB

Open Access Article

Dynamic Optimization of Incoming Quality Control Policies for Cost, Carbon, and Energy Reduction Using Bayesian Reinforcement Learning

by David Massetti, Mehdi Raoofi, Tiziano Miroglio, Marco Mosca and Flavio Tonelli

Sustainability 2026, 18(12), 6094; https://doi.org/10.3390/su18126094 (registering DOI) - 13 Jun 2026

Abstract

The transition towards sustainable manufacturing necessitates complex optimization that integrates economic goals with environmental factors, such as energy consumption and greenhouse gas emissions. This research addresses the critical challenge of optimizing the Incoming Quality Control (IQC) policy for raw material batches. The primary objective is formulated as a multi-criteria control problem that jointly minimizes the weekly final product cost, carbon footprint, and energy consumption. To handle sequential decision making under uncertainty, we adopt a scalarized reinforcement learning (RL) reward that combines these objectives into a single value function and explores different trade-offs through alternative weight configurations. To effectively handle the uncertainty in incoming quality and the sequential decision making required for dynamic control, the optimization problem is modeled as a Bayesian Adaptive Markov Decision Process (BAMDP). To maintain computational tractability despite the continuous belief space inherent in the BAMDP formulation, we employ a Deep Q-Network (DQN) architecture acting as an approximate dynamic programming solver. The Bayesian framework represents model uncertainty explicitly, updates beliefs as new inspection evidence becomes available, and allows prior domain knowledge on supplier quality to be incorporated into the learning process. The BAMDP formulation is used to learn a set of adaptive inspection policies that adjust the IQC strategy over time to achieve conflicting goals: reducing inspection costs while maintaining standard quality, minimizing energy consumption, and lowering CO₂-equivalent emissions. The goal is to find robust policies that balance these trade-offs under different quality and demand conditions. 
This methodology aligns with the principles of Industry 5.0 by leveraging advanced artificial intelligence (AI) methods, such as reinforcement learning (RL), coupled with a stochastic simulation of the production system, based on a geometric/physical model of the component’s tolerance chains, to support decision-makers in designing and assessing sustainable IQC strategies. Comparative simulations on the case study, including a benchmark against ISO 2859-1 sampling plans, confirm that this dynamic and risk-aware optimization paradigm can reduce overall cost, energy use, and environmental impact across various quality conditions, while preserving outgoing quality.

(This article belongs to the Special Issue Leveraging AI in Industry 4.0: Overcoming Challenges and Seizing Opportunities for Sustainable Operations Management)
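
The belief update and scalarized reward described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the Beta-Binomial model of the supplier defect rate, the class name `BetaBelief`, the function `scalarized_reward`, and every number below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over a supplier's defect rate (assumed model)."""

    alpha: float
    beta: float

    def update(self, inspected: int, defective: int) -> "BetaBelief":
        # Conjugate update: defective parts add to alpha, good parts to beta.
        if not 0 <= defective <= inspected:
            raise ValueError("defective must lie between 0 and inspected")
        return BetaBelief(self.alpha + defective,
                          self.beta + inspected - defective)

    @property
    def mean(self) -> float:
        # Posterior mean of the defect rate.
        return self.alpha / (self.alpha + self.beta)


def scalarized_reward(cost, co2, energy, weights):
    """Negative weighted sum, so lower cost, CO2, and energy give higher reward."""
    w_cost, w_co2, w_energy = weights
    return -(w_cost * cost + w_co2 * co2 + w_energy * energy)


# Hypothetical prior domain knowledge: roughly 5% defective parts.
prior = BetaBelief(alpha=1.0, beta=19.0)
# Hypothetical inspection evidence: 5 defects found in a sample of 50.
posterior = prior.update(inspected=50, defective=5)
# One hypothetical weight configuration over cost, CO2, and energy.
reward = scalarized_reward(cost=100.0, co2=20.0, energy=50.0,
                           weights=(0.5, 0.3, 0.2))
```

Changing `weights` corresponds to exploring a different trade-off; in a BAMDP-style agent the pair `(alpha, beta)` would be part of the state fed to the Q-network.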

40 pages, 2120 KB

Open Access Article

Transformer–DDQN-Based Explainable and Active Intrusion Detection Architecture for Network Traffic Analysis

by Ayşe Okutan Kara and Aytuğ Boyacı

Appl. Sci. 2026, 16(12), 5912; https://doi.org/10.3390/app16125912 - 11 Jun 2026

Viewed by 65

Abstract

This study proposes a novel intrusion detection and response architecture that formulates network traffic analysis as a sequential decision-making problem rather than a static classification task. The architecture integrates a Transformer Encoder for temporal feature extraction with a Dueling Double Deep Q-Network (DDQN) to enable autonomous and risk-aware security decisions. Network flows are modeled within a Markov Decision Process, where the agent learns an optimal policy over a hierarchical action space consisting of IGNORE, LOG, ESCALATE, and BLOCK actions. To evaluate generalization capability, a transfer learning-based cross-domain adaptation strategy was employed. The CICIDS2018 and CICIoT2023 datasets were re-partitioned using a stratified 70/15/15 train/validation/test split. The proposed model achieved high detection performance on these datasets with F1-scores of 99.48% and 99.13%, respectively. After transfer learning to the AWID3 dataset, the model preserved strong generalization capability with F1-scores of 96.76% and 96.61%, demonstrating its robustness across wired, IoT, and wireless network environments. A risk-aware reward function is designed to balance detection accuracy and operational cost, while Integrated Gradients-based explainability is incorporated to analyze decision behavior. Experimental results further show that the proposed Transformer–DDQN framework achieves more stable learning, lower optimization loss, and more consistent action policies compared to alternative reinforcement learning-based approaches. The model operates with high computational efficiency while maintaining real-time processing capability in high-throughput network environments.

(This article belongs to the Section Computing and Artificial Intelligence)
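
To make the idea of a risk-aware reward over the IGNORE, LOG, ESCALATE, and BLOCK actions concrete, here is a small sketch. The reward values in `REWARD` are hypothetical and are not taken from the paper; the helper `greedy_action` is also an assumption, showing only how asymmetric rewards shift the preferred action as the attack probability grows.

```python
from enum import IntEnum


class Action(IntEnum):
    IGNORE = 0
    LOG = 1
    ESCALATE = 2
    BLOCK = 3


# Hypothetical reward table: outer key is the ground truth (is the flow an
# attack?), inner key is the chosen action. Missing an attack is penalized
# more heavily than blocking benign traffic.
REWARD = {
    False: {Action.IGNORE: 1.0, Action.LOG: 0.5,
            Action.ESCALATE: -1.0, Action.BLOCK: -3.0},
    True: {Action.IGNORE: -5.0, Action.LOG: -2.0,
           Action.ESCALATE: 2.0, Action.BLOCK: 4.0},
}


def reward(is_attack: bool, action: Action) -> float:
    """Reward received for taking `action` on a flow with known ground truth."""
    return REWARD[is_attack][action]


def greedy_action(p_attack: float) -> Action:
    """Action with the highest expected reward for a given attack probability."""

    def expected(action: Action) -> float:
        return (p_attack * REWARD[True][action]
                + (1 - p_attack) * REWARD[False][action])

    return max(Action, key=expected)
```

With these illustrative values, the preferred action moves from IGNORE through LOG and ESCALATE to BLOCK as `p_attack` rises from 0 to 1, which is the graded response a hierarchical action space is meant to allow.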

16 pages, 19013 KB

Open Access Article

Risk-Prioritized Experience Replay for Stable In-Hand Manipulation

by Yunsik Jung, Lingfeng Tao, Michael Bowman, Jiucai Zhang and Xiaoli Zhang

Sensors 2026, 26(12), 3633; https://doi.org/10.3390/s26123633 - 7 Jun 2026

Viewed by 205

Abstract

Deep reinforcement learning (DRL) has shown strong capability for multi-finger dexterous in-hand manipulation, where high-dimensional control and complex object interactions make policy learning challenging. However, many existing DRL approaches emphasize task completion and learning efficiency without explicitly accounting for manipulation risk, which can lead to overly aggressive behaviors and unstable object handling. This study proposes Risk-Prioritized Experience Replay (Risk-PER), a replay-sampling strategy that incorporates task-specific risk scores derived from prior transitions. The proposed method assigns each transition a risk score based on three binary indicators related to manipulation instability and then biases replay toward lower-risk experiences while still allowing the agent to learn from risk-related events. Risk-PER is integrated with Deep Deterministic Policy Gradient (DDPG) and evaluated in MuJoCo simulation on two Allegro Hand in-hand manipulation tasks involving a block and an egg. Across the evaluated settings, Risk-PER achieves higher success rates, lower manipulation risk, and more stable learning behavior than HER and reward–penalty-based risk-averse baselines. These results suggest that incorporating task-specific risk awareness into replay prioritization can improve both learning efficiency and manipulation stability in dexterous in-hand manipulation.

(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)
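
The replay-sampling idea in the abstract above can be sketched as follows. The abstract states only that the risk score comes from three binary indicators and that sampling is biased toward lower-risk transitions; the counting rule in `risk_score`, the exponential weighting in `sampling_weights`, and the `temperature` parameter are assumptions made for this illustration.

```python
import random


def risk_score(indicators):
    """Risk score as the number of active binary instability indicators (0 to 3)."""
    if len(indicators) != 3:
        raise ValueError("expected three binary indicators")
    return sum(1 for flag in indicators if flag)


def sampling_weights(risks, temperature=1.0):
    """Normalized weights that decay with risk but never reach zero.

    The halving-per-risk-unit form is an assumed choice; any strictly
    positive, decreasing function of risk would serve the same purpose.
    """
    raw = [2.0 ** (-risk / temperature) for risk in risks]
    total = sum(raw)
    return [weight / total for weight in raw]


def sample_batch(buffer, risks, batch_size, rng):
    """Draw a replay batch with replacement, favoring lower-risk transitions."""
    return rng.choices(buffer, weights=sampling_weights(risks), k=batch_size)
```

Because every weight stays positive, high-risk transitions are still replayed occasionally, which matches the stated goal of letting the agent learn from risk-related events.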

21 pages, 1251 KB

Open Access Article

Robust Fast 3D Beam Alignment for UAV-Assisted mmWave and Terahertz Communications

by Loubna Gafari, Wissal Attaoui, Essaid Sabir and Elmahdi Driouch

Sensors 2026, 26(11), 3612; https://doi.org/10.3390/s26113612 - 5 Jun 2026

Viewed by 307

Abstract

Unmanned aerial vehicle (UAV)-assisted millimeter-wave (mmWave) and terahertz (THz) communications are promising enablers of ultra-reliable and low-latency communication in next-generation wireless networks. However, the initial access and beam alignment process remains challenging because highly directional beams must be rapidly aligned in a three-dimensional environment. In this paper, we investigate a risk-aware beam alignment framework for UAV-assisted mmWave/THz systems, where user equipment scans a 3D spherical region to detect UAV base stations. The objective is to jointly minimize the expected cell-search latency and its variance while satisfying detection-failure and link-quality constraints. To solve this non-convex optimization problem efficiently, we employ the Lévy Self-Renewable Flow Direction Algorithm (LSRFDA), which combines Lévy-flight exploration with self-renewal to improve convergence robustness. A unified propagation model is adopted to cover both mmWave and THz regimes by incorporating free-space spreading loss and frequency-dependent molecular absorption. Extensive Monte Carlo simulations compare the proposed approach with Particle Swarm Optimization, Random Search, Reinforcement Learning, and PPO-Lagrangian methods. The results show that LSRFDA achieves lower latency, lower latency variation, more reliable detection, and lower energy consumption across a wide range of UAV densities and coverage radii. These outcomes highlight the effectiveness of risk-aware geometric optimization for fast and dependable initial access in UAV-assisted 5G mmWave and 6G THz networks.

(This article belongs to the Special Issue Integrated Sensing, Control, and Communication (ISC²) for Low-Altitude Intelligent Networks)
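
A mean-variance latency objective with a detection-failure constraint, as outlined in the abstract above, might be evaluated from Monte Carlo samples roughly as follows. The penalty formulation, the function name `risk_aware_objective`, and all parameter defaults are assumptions; the link-quality constraint is omitted for brevity, and the paper's actual formulation may differ.

```python
from statistics import fmean, pvariance


def risk_aware_objective(latencies, detected, risk_weight=0.5,
                         max_failure_rate=0.1, penalty=1e3):
    """Mean latency plus weighted variance, with a penalty on missed detections.

    latencies: cell-search latency samples for one candidate scan configuration.
    detected:  one boolean per trial, True if the UAV base station was found.
    """
    mean_latency = fmean(latencies)
    latency_variance = pvariance(latencies)
    failure_rate = 1.0 - sum(detected) / len(detected)
    # Only the part of the failure rate above the allowed limit is penalized.
    violation = max(0.0, failure_rate - max_failure_rate)
    return mean_latency + risk_weight * latency_variance + penalty * violation
```

A metaheuristic such as the one named in the abstract would call a function of this kind to score each candidate; `risk_weight` controls how strongly latency variation is discouraged relative to average latency.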

29 pages, 2484 KB

Open Access Article

SafeCodeRL: Security-Constrained Multi-Agent Reinforcement Learning for Trustworthy LLM-Generated IoT/CPS Software

by Zhihua Wang, Junfan Chen, Zixiang Wei, Lan Lin and Guoxiang Tong

Sensors 2026, 26(11), 3502; https://doi.org/10.3390/s26113502 - 2 Jun 2026

Viewed by 274

Abstract

Internet of Things (IoT), sensor-network, and cyber-physical system (CPS) software increasingly relies on large language models (LLMs) and autonomous agents for code generation, maintenance, and vulnerability repair. However, LLM-generated edge services, telemetry APIs, configuration handlers, and data-aggregation routines can introduce SQL injection, path traversal, command injection, hard-coded credentials, and unsafe device-control logic, which may compromise sensing data integrity and system safety. Existing approaches largely rely on static post hoc analysis and lack unified modeling of the generation process, making it difficult to achieve a principled trade-off between functionality and security. To address this challenge, we propose SafeCodeRL, a framework that integrates multi-agent collaboration with constrained reinforcement learning for trustworthy LLM-generated IoT/CPS software. SafeCodeRL models code generation as a security-aware sequential decision process, where Planner, Code, Security, Test, and Critic agents jointly optimize task decomposition, code synthesis, vulnerability auditing, and sandbox-based validation. We design a constraint-aware policy based on Proximal Policy Optimization, augmented with a Lagrangian mechanism and a shielding strategy to explicitly enforce security constraints. Experiments on real-world engineering and security benchmarks, including SWE-bench, SecurityEval, and CyberSecEval, show that SafeCodeRL reduces high-risk vulnerabilities by over 60% while maintaining high functional correctness. A scenario-level IoT/CPS case study further demonstrates that SafeCodeRL substantially improves secure pass rates for sensor telemetry, edge gateway, configuration-management, and data-aggregation tasks, providing a practical path toward trustworthy AI-assisted software development for sensor-driven systems.

(This article belongs to the Section Internet of Things)
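
The Lagrangian mechanism and shielding strategy mentioned in the abstract above follow a generic pattern in constrained policy optimization, sketched below. This is a textbook-style illustration rather than SafeCodeRL's code: the function names, learning rate, and the toy safety check are all assumptions.

```python
def update_multiplier(lmbda, avg_cost, cost_limit, lr=0.05):
    """Dual ascent step: raise lambda when average cost exceeds the limit.

    The multiplier is floored at zero so a satisfied constraint adds no penalty.
    """
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def penalized_advantage(reward_adv, cost_adv, lmbda):
    """Advantage used in the policy loss, trading reward against security cost.

    Dividing by (1 + lambda) keeps the scale stable as lambda grows; this
    normalization is a common convention, assumed here.
    """
    return (reward_adv - lmbda * cost_adv) / (1.0 + lmbda)


def shield(candidate, is_safe, fallback):
    """Replace a candidate action that fails the safety check with a fallback."""
    return candidate if is_safe(candidate) else fallback


# Toy, hypothetical safety check: reject snippets with a hard-coded credential.
def no_hardcoded_password(code: str) -> bool:
    return "password=" not in code


safe_output = shield('connect(password="hunter2")', no_hardcoded_password,
                     fallback="# generation rejected by shield")
```

The multiplier handles the constraint in expectation during training, while the shield acts on individual outputs; the two mechanisms are complementary rather than interchangeable.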

29 pages, 2602 KB

Open Access Article

Transition-Sensitive Congestion Dynamics in Heterogeneous Urban Traffic Networks Under Coordinated Reinforcement Learning

by Zhenghan Ouyang, Chenxin Li, Yifeng Tang, Yuqingyun Shu, Zhiling Wang, Yuhang Ma and Tongqiang Ding

Sustainability 2026, 18(11), 5561; https://doi.org/10.3390/su18115561 - 1 Jun 2026

Viewed by 152

Abstract

Urban traffic networks under high-demand and incident-like perturbations can evolve from stable operation to cascading congestion, increasing delay, stop-and-go traffic, fuel or energy consumption, and traffic-related emissions. These effects make congestion regulation an important component of sustainable urban traffic management. Existing signal control methods still focus mainly on local delay reduction or short-horizon response, limiting their ability to regulate congestion propagation and stress-induced network degradation. This paper proposes Mamba-PTC, a coordinated reinforcement learning framework for urban signal control in heterogeneous traffic networks. The framework combines centralized multi-intersection control with a simplified Mamba-style sequence encoder and a transition-aware objective optimized by PPO. To connect control with network-level traffic dynamics, we introduce a transition risk indicator for online regulation and macroscopic observables for evaluation, including a composite congestion measure and an instability-amplification proxy. Experiments on stressed heterogeneous urban networks show that Mamba-PTC improves the throughput–duration profile while reducing congestion degradation indicators under heavy load and perturbation. Matched control comparisons, ablation analysis, and cross-network validation further show that these gains arise from the joint effect of temporal representation, transition-aware objective design, and coordinated control. The results suggest that coordinated reinforcement learning can support sustainable network operation by regulating congestion growth in stressed urban traffic networks. The findings provide a basis for designing congestion-aware signal control strategies, robustness evaluation protocols, and future intelligent traffic management systems for stressed urban networks.

(This article belongs to the Section Sustainable Transportation)

31 pages, 2000 KB

Open Access Article

Adaptive Constraint Regulation for Human Preference-Aware Safe Reinforcement Learning of On-Ramp Merging

by Jingjia Teng, Wenjie Huang, Shijie Yuan, Manjiang Hu, Hongmao Qin, Yang Li, Yougang Bian and Bai Li

Machines 2026, 14(6), 605; https://doi.org/10.3390/machines14060605 - 28 May 2026

Viewed by 347

Abstract

Reinforcement learning (RL) has been widely utilized for decision-making in highway on-ramp merging scenarios. However, most existing methods incorporate safety through reward functions, which may allow autonomous vehicles to trade safety for higher cumulative rewards. Moreover, personalized human risk preferences are rarely considered, making the learned policies difficult to adapt to heterogeneous user-specific risk requirements and potentially resulting in overly conservative or insufficiently cautious behaviors. To address these issues, this paper proposes a Risk-Aware Personal Preference-Based Safe Reinforcement Learning framework (RAPRL) for autonomous decision-making in on-ramp merging scenarios. Specifically, the high-level decision-making problem is formulated as a constrained Markov decision process (CMDP), in which safety requirements are explicitly represented as constraints rather than reward terms. To enable personalized safety regulation, a fuzzy logic mechanism is developed to adaptively determine the constraint cost limit according to the driver’s risk preference and the surrounding traffic density. The resulting safe RL problem is solved using a Lagrangian-based soft actor-critic (SAC) algorithm. Furthermore, an Action Shielding Mechanism is designed to assess the potential risk of candidate actions before execution and replace unsafe or infeasible actions, thereby improving safety during both policy learning and execution. Theoretical analysis shows that the proposed shielding mechanism can reduce unsafe exploration and improve sample efficiency. Extensive simulations in on-ramp merging scenarios demonstrate that RAPRL effectively reduces safety violations while maintaining driving efficiency. Compared with the SAC Discrete method, the proposed method improves the success rate by 4.76% and reduces the collision ratio by 70%, indicating a better safety–efficiency trade-off.

(This article belongs to the Special Issue Optimization-Based Motion Planning & Control for Autonomous Driving in Dynamic Environments)
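
One simple way a fuzzy mechanism could map risk preference and traffic density to a constraint cost limit, as described in the abstract above, is a zero-order Sugeno-style rule base. The sketch below is an assumption throughout: the linear membership functions, the four rules, and the output levels `LOOSE`, `MEDIUM`, and `TIGHT` are illustrative and are not the paper's design.

```python
# Hypothetical cost-limit levels attached to the rule outputs.
LOOSE, MEDIUM, TIGHT = 0.10, 0.05, 0.01


def clamp01(x: float) -> float:
    """Restrict an input to the unit interval."""
    return min(1.0, max(0.0, x))


def cost_limit(risk_preference: float, traffic_density: float) -> float:
    """Cost limit from a four-rule fuzzy system (assumed design).

    risk_preference: 0 = very cautious driver, 1 = very risk-tolerant driver.
    traffic_density: 0 = empty road, 1 = dense traffic.
    Memberships are linear (high(x) = x, low(x) = 1 - x), AND is min, and the
    output is the strength-weighted average of the rule outputs.
    """
    p = clamp01(risk_preference)
    d = clamp01(traffic_density)
    rules = [
        (min(p, 1 - d), LOOSE),       # tolerant driver, light traffic
        (min(p, d), MEDIUM),          # tolerant driver, dense traffic
        (min(1 - p, 1 - d), MEDIUM),  # cautious driver, light traffic
        (min(1 - p, d), TIGHT),       # cautious driver, dense traffic
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * level for strength, level in rules) / total
```

The returned value would play the role of the CMDP cost limit, so a cautious driver in dense traffic tightens the constraint that the Lagrangian multiplier then enforces.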

36 pages, 11622 KB

Open Access Article

Explainable Hybrid Intelligence for Predicting Tunnel Water Inrush Quantity Under Small-Sample, High-Heterogeneity Conditions: GAN Augmentation and Swarm-Optimized CatBoost

by Rui Huang, Yige Chen, Lanjing Wang, Jing Zhan, Yuanfan Ji, Tingyu Huang and Yanbo Yang

Infrastructures 2026, 11(6), 183; https://doi.org/10.3390/infrastructures11060183 - 25 May 2026

Viewed by 239

Abstract

This study aims to explore a leakage-aware and explainable machine learning framework for predicting tunnel water inrush quantity (WIQ) under small-sample and high-heterogeneity geological conditions. A project-level dataset was compiled at a fixed spatial granularity of 30 m per excavation segment by integrating forward prospecting outputs, construction-face observations, and geological reports, and six hydrogeological–structural indicators were used to predict the water inflow rate in cubic meters per hour. To overcome data scarcity and improve generalization, a tabular generative adversarial network (GAN) was introduced to augment the training distribution while preserving marginal statistics and inter-variable dependence, and a swarm-intelligence optimizer was employed to tune a Categorical Boosting (CatBoost) regressor for stable performance. In addition, six mainstream tree-based learners were benchmarked under a unified protocol, and model transparency was ensured through a multi-level interpretability suite combining SHapley Additive exPlanations (SHAP) attribution, partial dependence with individual conditional expectation (ICE) diagnostics, and interaction surfaces. Results show that, under the present fixed split, training-set augmentation was associated with improved performance for the evaluated baseline learners, and the proposed hybrid model achieved encouraging hold-out accuracy. However, because the dataset contains only 55 real samples and the test set contains only 11 real samples, the reported performance should be interpreted as an initial project-specific indication rather than robust evidence of generalizable reliability. Interpretability analyses further identify lithologic and reflector-related factors as dominant drivers, and reveal nonlinear response patterns and interaction-sensitive high-risk regions. 
Overall, the proposed framework shows potential to improve predictive performance and engineering interpretability for the studied project, and may provide a useful reference for drainage and reinforcement planning. Further confirmation through repeated data splitting, additional samples, and external validation is still needed before broader application.

(This article belongs to the Special Issue Advances in Artificial Intelligence for Geotechnical Engineering)

24 pages, 1138 KB

Open Access Article

RIB-Guard: A Risk-Aware Information Bottleneck Defense for Black-Box Large Language Models

by Muen Cai, Yuan Shen, Xiong Luo and Jian Hu

Entropy 2026, 28(6), 585; https://doi.org/10.3390/e28060585 - 24 May 2026

Viewed by 192

Abstract

Large language models (LLMs) remain vulnerable to jailbreak attacks, especially in black-box settings where target-model gradients and internal tokenization are inaccessible. Recent information bottleneck-based defenses cast prompt protection as a compression problem, but existing methods still rely heavily on white-box optimization and the intrinsic alignment strength of the protected model. To address these limitations, we propose RIB-Guard, a safety-aware information bottleneck defense for black-box LLMs. RIB-Guard learns a token-level masking policy that extracts a minimally safety-sufficient prompt via reinforcement learning using only black-box feedback. In addition, it introduces an independent lightweight safety head to estimate residual jailbreak risk and provide model-agnostic safety guidance during training. The proposed framework jointly balances prompt compactness, benign utility preservation, and residual risk suppression within a unified objective. Experimental results on direct single-turn harmful and benign prompt settings show that RIB-Guard improves jailbreak robustness while maintaining competitive benign utility. By extending information bottleneck-based prompt protection from white-box to black-box settings, RIB-Guard provides a step toward safety-aware information-theoretic front-end defense for black-box LLMs.

(This article belongs to the Special Issue The Information Bottleneck Method: Theory and Applications)

19 pages, 307 KB

Open Access Article

Parenting in the Digital Era: Quantitative and Qualitative Insights from Families of Children with Neurodevelopmental Disorders

by Niccolò Butti, Eleonora Mascheroni, Vittoria Maucci, Roberta Nossa, Lucia Scaccia, Francesca Masserano, Emilia Biffi and Rosario Montirosso

Children 2026, 13(6), 716; https://doi.org/10.3390/children13060716 - 22 May 2026

Viewed by 218

Abstract

Background/Objectives: This study explored parents’ perspectives regarding digital media use in children and adolescents with neurodevelopmental disorders (NDs) and examined how these views vary according to family and clinical characteristics. Methods: Data were collected from an Italian survey involving 352 families. Items assessed the perceived effects of digital devices on child development and parenting, awareness of screen time guidelines, and use of time- and content-limiting tools. Quantitative analyses were complemented by a reflexive thematic analysis of open-ended responses describing how digital media influenced parenting. Results: Parents expressed divergent attitudes towards digital media, with broadly similar proportions reporting positive, neutral, and negative views regarding both child development and parenting. More favourable views were associated with greater perceived benefits for children and were more frequent among parents of children with more severe functional disabilities. About half had discussed screen use with health professionals, and most were aware of existing guidelines. Thematic analysis identified six themes related to digital parenting: educational means (digital devices as tools for communication, learning, and socialisation), entertainment (screens as a source of leisure or behavioural management), reward (digital media used as reinforcement), screen time as a “necessity” (technology as an integral and sometimes rehabilitative part of daily life), negative effects on the child (concerns about detachment, reduced social interaction, and mood dysregulation), and parental behaviour and attitudes (reflecting the emotional burden of regulation and broader beliefs about digital media). Conclusions: Parents of children with NDs navigate digital media use through a complex balance of perceived risks and benefits. 
Findings highlight the need for family-centred guidance and assistive technology approaches that promote digital inclusion while addressing parental stress and regulatory challenges.

(This article belongs to the Special Issue Screen Time in Childhood: Risks, Benefits, and Outcomes)

33 pages, 521 KB

Open Access Article

Multi-Shift Scheduling of Electric Service Operations Under Fuzzy Uncertainty via Preference-Guided Deep Learning: The Single-Vehicle Case

by Francesco Nucci

Eng 2026, 7(5), 244; https://doi.org/10.3390/eng7050244 - 16 May 2026

Viewed by 348

Abstract

The electrification of field service fleets introduces complex constraints: shift limits, overtime fairness, and battery–range feasibility. This paper proposes the Multi-Shift Single Electric Vehicle Routing Problem under Possibilistic Uncertainty (MS-SEVRP-PU), a formulation focused on a single-vehicle multi-shift planning unit and capturing imprecise travel/service times and state-of-charge dynamics. Travel durations and energy consumption are modelled as triangular fuzzy numbers to reflect expert knowledge when probabilistic data is limited. A closed-form credibility function evaluates overtime risk, while an Ordered Weighted Averaging (OWA) aggregation of per-shift risks ensures fairness by discouraging systematic overload on specific shifts. To solve this multi-objective problem, we develop a Pareto-Conditioned Transformer with risk-aware and battery-conscious large neighbourhood search (PCT-RABLNS), combining a preference-conditioned attention policy with targeted local search. Computational experiments on calibrated municipal maintenance case studies indicate that PCT-RABLNS improves hypervolume by 2–5% over strong baselines and reduces maximum shift overtime risk by 15–25%, with a marginal makespan overhead of only 1–3%. The results demonstrate that the proposed framework is a promising decision-support approach for energy-aware, risk-fair, and operationally compliant planning of single-vehicle, multi-shift electric service operations, jointly integrating multi-shift routing, fuzzy uncertainty, and preference-conditioned reinforcement learning. The paper also discusses how the framework can be extended to multi-vehicle settings.

(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research 2026)
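
The closed-form credibility and OWA aggregation named in the abstract above can be illustrated with the standard credibility-theory expression for a triangular fuzzy number. Whether the paper uses exactly this form is an assumption, and the shift durations, shift limit, and OWA weights in the usage note are hypothetical.

```python
def credibility_leq(x, a, b, c):
    """Cr{xi <= x} for a triangular fuzzy number (a, b, c) with a < b < c."""
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) / (2 * (b - a))
    if x <= c:
        return (x + c - 2 * b) / (2 * (c - b))
    return 1.0


def overtime_risk(shift_limit, a, b, c):
    """Credibility that a fuzzy shift duration exceeds the shift limit."""
    return 1.0 - credibility_leq(shift_limit, a, b, c)


def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

For example, a shift whose duration is the fuzzy number (6, 8, 10) hours has an overtime risk of 0.25 against a 9-hour limit. Aggregating per-shift risks of 0.25, 0.0, and 0.5 with OWA weights (0.6, 0.3, 0.1) gives 0.375; because the largest weight falls on the worst shift, concentrating overload on one shift is penalized more than spreading it.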

30 pages, 2075 KB

Open Access Systematic Review

Human–AI Collaboration in Risk- and Uncertainty-Aware Portfolio Reinforcement Learning: A Critical Review

by Firdaous Khemlichi, Youness Idrissi Khamlichi and Safae Elhaj Ben Ali

Information 2026, 17(5), 476; https://doi.org/10.3390/info17050476 - 13 May 2026

Viewed by 403

Abstract

Financial markets are characterized by non-stationarity, regime shifts, and complex cross-asset interactions, which challenge traditional portfolio optimization and motivate reinforcement learning (RL) for adaptive decision-making. However, many RL-based approaches remain predominantly return-centric, with risk, uncertainty, and human oversight only weakly integrated, limiting robustness and practical applicability. This review provides a critical synthesis of risk-aware and uncertainty-sensitive reinforcement learning for portfolio optimization from a human–AI collaboration perspective. We analyze major architectural paradigms (including single-agent, hierarchical, multi-agent, and modular systems) together with risk modeling strategies (e.g., reward shaping, constraint-based optimization, and downside risk measures such as CVaR) and probabilistic approaches to uncertainty estimation (e.g., Bayesian neural networks, Monte Carlo dropout, and ensembles). A structured analysis of 57 fully assessed studies reveals that only 5 (9%) explicitly couple uncertainty estimation with risk constraint mechanisms, while 38 (69%) treat risk and uncertainty as structurally independent components. We identify a central structural limitation: risk objectives are rarely conditioned on epistemic uncertainty, while uncertainty estimates seldom influence constraint mechanisms or capital allocation. This decoupling leads to fragmented frameworks that remain difficult to deploy in real financial environments. By integrating architectural design, risk modeling, uncertainty estimation, and evaluation practices, this review proposes a unified, deployment-oriented perspective for developing governance-aligned portfolio decision-support systems.

(This article belongs to the Special Issue Decision Models for Economics and Business Management)
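
CVaR, one of the downside risk measures named in the abstract above, can be estimated from a sample of returns as the average loss in the worst tail. The sketch below uses a simple empirical estimator; the function name, the tail-size rounding rule, and the return series in the usage note are assumptions for illustration.

```python
import math


def cvar(returns, alpha=0.95):
    """Empirical CVaR: mean loss over the worst (1 - alpha) share of outcomes.

    Losses are negated returns, so a larger CVaR means a worse tail.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted((-r for r in returns), reverse=True)
    # Round before ceil to avoid floating-point overshoot such as 1.0000000001.
    tail_size = max(1, math.ceil(round(len(losses) * (1 - alpha), 9)))
    return sum(losses[:tail_size]) / tail_size
```

For the hypothetical return series [0.02, -0.05, 0.01, -0.10, 0.03, 0.00, -0.02, 0.04, 0.01, -0.01] and alpha = 0.8, the two worst outcomes are losses of 0.10 and 0.05, giving a CVaR of 0.075. In a risk-aware RL reward this quantity would typically be subtracted from the return term with a weight, or bounded by a constraint.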

25 pages, 694 KB

Open AccessArticle

A New Hybrid Method: CDRL-QNN for Stable IoT Intrusion Detection

by Muhammed Yusuf Küçükkara, Furkan Atban and Cüneyt Bayılmış

Mathematics 2026, 14(10), 1608; https://doi.org/10.3390/math14101608 - 9 May 2026

Viewed by 266

Abstract

The rapid expansion of the Internet of Things (IoT) has increased the risk of large-scale Distributed Denial-of-Service (DDoS) attacks. In high-availability IoT environments, the operational costs of false positives and false negatives are asymmetric, whereas conventional deep learning models usually optimize static accuracy-based objectives. To address this, we propose CDRL-QNN, a cost-aware and chaos-driven reinforcement learning quantum neural network framework in which a parameterized quantum circuit serves as the action-value function approximator within a Deep Q-Network (DQN) agent. The framework incorporates asymmetric operational penalties through both the reward function and sample-wise weighted Bellman optimization, while a logistic-map-based deterministic perturbation mechanism is used to promote exploration under constrained quantum-circuit training conditions. Evaluated on a computationally constrained balanced subset of the CIC-DDoS2019 dataset, the proposed framework reduced false negatives from 49 to 33 without increasing false positives, improving recall from 0.9673 to 0.9780 and F1-score from 0.9738 to 0.9793 while lowering operational cost. These findings suggest that hybrid quantum representations can be integrated into cost-sensitive reinforcement learning pipelines for IoT intrusion detection under constrained experimental conditions. Full article
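The logistic-map-based perturbation mentioned in this abstract can be pictured with a minimal sketch in which a chaotic but fully deterministic sequence replaces the pseudo-random draws of epsilon-greedy action selection. This is an assumption about one way such a mechanism could work, not the authors' implementation; `logistic_map`, `select_action`, the seed `x0`, and the parameter `r` are hypothetical names and values.

```python
def logistic_map(x0=0.37, r=4.0):
    """Yield the logistic-map sequence x <- r * x * (1 - x).

    With r = 4 and a seed in (0, 1) the values stay in [0, 1].
    Seed and parameter are illustrative assumptions.
    """
    x = x0
    while True:
        x = r * x * (1.0 - x)
        yield x


def select_action(q_values, chaos, epsilon=0.1):
    """Epsilon-greedy choice driven by a chaotic sequence instead of an RNG."""
    u = next(chaos)
    if u < epsilon:
        # Explore: map the next chaotic value onto an action index.
        v = next(chaos)
        return min(int(v * len(q_values)), len(q_values) - 1)
    # Exploit: pick the action with the largest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


chaos = logistic_map()
actions = [select_action([0.2, 0.8], chaos, epsilon=0.3) for _ in range(10)]
```

Because the sequence is deterministic, the same seed reproduces the same exploration pattern, which may be convenient when circuit evaluations are expensive.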

(This article belongs to the Special Issue Cybersecurity and Data Protection: Modern Methods and New Applications)

24 pages, 8395 KB

Open AccessArticle

Energy-Aware Multi-Agent Proximal Policy Optimization with Depletion Safety Constraints for Multi-Robot Coordination

by Yassin Abdelmeguid and Ammar Hasan

Robotics 2026, 15(5), 95; https://doi.org/10.3390/robotics15050095 - 8 May 2026

Viewed by 640

Abstract

Multi-robot systems operating on battery power face fundamental constraints through which energy limitations directly impact mission success. The existing multi-agent reinforcement learning approaches optimize for task performance without explicit energy consideration, leading to inefficient consumption and depletion risk. This paper presents a framework for energy-aware multi-agent coordination that treats battery management as a safety constraint, rather than an optimization objective. We introduce Energy-Aware Multi-Agent Proximal Policy Optimization (EA-MAPPO) with energy-augmented observations and shaped rewards and extend it to Safe Energy-Aware MAPPO (SEA-MAPPO) combining predictive action masking with safety-oriented reward shaping. An experimental validation on the Georgia Tech Robotarium with 7 agents demonstrates that SEA-MAPPO reaches 95% goal completion 19× faster than standard MAPPO, requiring only 0.5 M environment steps versus 9.4 M. Throughout training, SEA-MAPPO reduces cumulative depletion events by 93% compared to MAPPO while maintaining superior energy efficiency. SEA-MAPPO achieves 100% goal completion versus 81.5% for MAPPO at the same training budget. Physical deployment on GTernal robots without fine-tuning achieves 100% goal completion with zero depletion events across 70 robot-trials, with the energy predictor achieving $R^{2} = 0.89$ with measured power consumption. Full article
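One way to picture the predictive action masking described in this abstract is the sketch below, which removes actions whose predicted energy cost would push the battery under a reserve level before the policy samples. It is a simplified assumption, not the SEA-MAPPO implementation: `mask_unsafe_actions`, the `reserve` threshold, and the fallback to the cheapest action are all hypothetical.

```python
import numpy as np


def mask_unsafe_actions(logits, battery, predicted_cost, reserve=0.1):
    """Return action probabilities with energy-unsafe actions masked out.

    battery and predicted_cost are fractions of full charge (assumed units).
    """
    logits = np.asarray(logits, dtype=float)
    cost = np.asarray(predicted_cost, dtype=float)
    # An action is safe if the predicted remaining charge stays above the reserve.
    safe = battery - cost >= reserve
    if not safe.any():
        # Fallback (assumption): allow only the cheapest action.
        safe = cost == cost.min()
    # Masked actions get -inf logits, hence zero probability after softmax.
    masked = np.where(safe, logits, -np.inf)
    exp = np.exp(masked - masked[safe].max())
    return exp / exp.sum()


# Made-up numbers: the third action would drain the battery below the reserve.
probs = mask_unsafe_actions([1.0, 2.0, 3.0], battery=0.3,
                            predicted_cost=[0.05, 0.10, 0.50])
```

Masking at the logit level keeps the policy network unchanged; only the sampling distribution is restricted to actions the energy predictor considers safe.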

(This article belongs to the Section AI in Robotics)

48 pages, 10103 KB

Open AccessReview

A Survey of Risk-Calibrated Certifiably Safe and Resource-Aware (RCSR) Path Planning for Unmanned Aerial Vehicles

by Nathan Johnson, Sima Shafaei, Andrew Karem and Sayani Sarkar

Drones 2026, 10(5), 351; https://doi.org/10.3390/drones10050351 - 7 May 2026

Viewed by 731

Abstract

Effective mission planning, path search, and path following are critical for unmanned aerial vehicles (UAVs) operating in complex, dynamic, and resource-constrained environments. Classical path planning approaches, including graph-based search, sampling-based methods, and trajectory optimization, provide structured solutions with performance guarantees but often exhibit limited adaptability to uncertainty, environmental disturbances, and evolving mission constraints. Reinforcement learning (RL) offers a complementary capability by enabling adaptive decision-making and online response to dynamic obstacles and partial observability. This paper examines UAV path planning and navigation within a Risk-Calibrated, Certifiably Safe, and Resource-Aware (RCSR) framework, with emphasis on its implications for mission planning, path search, and path following. Classical planning techniques are reviewed alongside recent advances in RL-based navigation for single-UAV and multi-UAV systems. Particular attention is given to safe reinforcement learning, constrained optimization, and runtime assurance mechanisms that address safety, regulatory compliance, and resource limitations in real-world deployments. Through a comparative analysis of classical, learning-based, and hybrid planning architectures, this work highlights key trade-offs among adaptability, safety, computational cost, and energy efficiency. The paper concludes by identifying hybrid learning–planning approaches as a practical direction for scalable, reliable, and deployable UAV mission planning systems. Full article

(This article belongs to the Special Issue Advances in Cartography, Mission Planning, Path Search, and Path Following for Drones: 2nd Edition)
