MDPI - Publisher of Open Access Journals

30 pages, 11873 KB

Open Access Article

Unsupervised Oil Spill Detection in Shipborne Radar Imagery Using Autoencoder-Enhanced Q-Learning and Improved Bat Optimization

by Jin Yan, Binghui Chen, Jin Xu, Zekun Guo, Minghao Yan, Mengxin Sun and Lin Qiao

Remote Sens. 2026, 18(12), 1876; https://doi.org/10.3390/rs18121876 - 7 Jun 2026

Abstract

Marine oil spill accidents pose a serious threat to the marine ecological environment, so efficient and accurate oil spill detection is of great significance for emergency response. To address blurred oil-slick boundaries, prominent co-frequency interference, and severe speckle noise in shipborne radar images, this study proposed an oil spill detection method developed on radar data collected during a real oil spill event at a terminal in Dalian Bay. The proposed method integrates an autoencoder, feature dimensionality reduction, pseudo-labeling, reinforcement learning, and an improved intelligent optimization algorithm. First, an autoencoder was adopted to extract compact nonlinear local features from the radar images, and principal component analysis (PCA) was employed to reduce their dimensionality. Subsequently, K-Means clustering was used to construct pseudo-labels, and the reduced features were discretized to build the state space for reinforcement learning. On this basis, the Q-learning algorithm was introduced to extract the region of interest (ROI) automatically. Finally, within the ROI, an improved bat algorithm incorporating a dynamic weighting factor and a multi-constraint fitness function was designed to achieve fine segmentation of the oil-slick target. The experimental results showed that the proposed method outperformed classic intelligent optimization algorithms and the conventional bat algorithm in oil-slick segmentation performance. Ablation experiments further verified the effectiveness of autoencoder-based feature learning, K-Means pseudo-labeling, and Q-learning-based ROI localization. This method may provide a new technical approach for timely offshore oil spill monitoring and emergency analysis.
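The pseudo-labeling and Q-learning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single 1-D reduced feature per sample instead of a PCA vector, equal-width bins for the discretized state space, a two-action "inside ROI" / "background" decision, and a reward of +1 when the action agrees with the K-Means pseudo-label and -1 otherwise (all hypothetical choices). The helper names `kmeans_1d`, `discretize`, and `train_q` are illustrative.

```python
import numpy as np


def kmeans_1d(x, iters=20):
    """Two-cluster K-Means on a 1-D feature; returns pseudo-labels in {0, 1}."""
    centroids = np.array([x.min(), x.max()], dtype=float)  # deterministic init
    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = x[labels == k].mean()
    return labels


def discretize(x, n_bins):
    """Quantize a 1-D feature into integer states 0 .. n_bins - 1 (equal-width bins)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]  # interior edges only
    return np.digitize(x, edges)


def train_q(states, pseudo, n_states, epochs=50, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning; action 1 = 'inside ROI', action 0 = 'background' (hypothetical).

    Each sample is treated as a one-step (terminal) episode, so the TD target
    is just the reward: +1 if the action agrees with the pseudo-label, else -1.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(epochs):
        for s, y in zip(states, pseudo):
            if rng.random() < eps:  # epsilon-greedy exploration
                a = int(rng.integers(2))
            else:
                a = int(np.argmax(Q[s]))
            r = 1.0 if a == y else -1.0
            Q[s, a] += alpha * (r - Q[s, a])  # Q-learning update with terminal s'
    return Q
```

Because every sample is a terminal one-step episode here, the update reduces to Q(s, a) ← Q(s, a) + α(r - Q(s, a)); a sequential ROI search over image windows would add the discounted γ·max Q(s', a') term.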

(This article belongs to the Special Issue Advances in Deep Learning and Machine Learning for Remote Sensing Image Analysis)

28 pages, 22349 KB

Open Access Article

Real-Time Elevation and Orientation-Aware Visual Localization for GNSS-Denied Drone Navigation

by Hadi Fares, Ammar Mohanna and Bilal Kaddouh

Drones 2026, 10(6), 445; https://doi.org/10.3390/drones10060445 - 6 Jun 2026

Abstract

Global Navigation Satellite System (GNSS)-denied environments pose significant challenges for autonomous drone navigation, requiring robust visual localization systems capable of real-time performance. Existing approaches either sacrifice accuracy for speed or fail to adapt to varying flight altitudes and orientations, limiting their practical deployment. We present the Real-Time Elevation and Orientation-Aware Localization Architecture (REOLA), a visual localization system that combines similarity-driven autonomous window sizing, element-wise correlation-based orientation detection, and reinforcement learning with human feedback (RLHF) enhancement for publicly available satellite imagery. On desktop hardware (i7-10700K + RTX 3070), REOLA achieved approximately 59 FPS with sub-5 m accuracy across diverse flight conditions by combining similarity-based matching with efficient MobileNet-V3 embeddings and FAISS similarity search. For embedded deployment on an NVIDIA Jetson Orin Nano, the system achieved 22.5 FPS, meeting real-time requirements for autonomous drone localization. The system autonomously selects the window size corresponding to the current elevation and determines drone orientation through element-wise correlation scoring across discrete rotation angles. Enhanced through RLHF, REOLA achieved a 97.1% success rate (sub-5 m localization) while processing frames in 17 ms on desktop hardware (44.4 ms on embedded hardware), providing a substantial margin over real-time requirements. The approach is particularly advantageous over traditional keypoint-based methods in challenging environments with repetitive patterns, such as agricultural fields, rocky mountains, dense forests, and grasslands, where conventional keypoint detection struggles. We explicitly identify featureless sand dune deserts and open-sea or coastal water flights as out of scope, since the reference satellite imagery in those regimes does not contain stable landmarks.
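Orientation detection by element-wise correlation across discrete rotation angles can be sketched as follows. For simplicity, this assumes square image patches and rotations in 90° steps so that `np.rot90` is exact, and it uses zero-mean normalized correlation as the score. REOLA's actual angle set, embeddings, and scoring function are not given in the abstract, so `zncc` and `estimate_rotation` are hypothetical helpers.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized element-wise correlation of two equal-shape arrays."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))  # 1.0 for identical patches, near 0 for unrelated ones


def estimate_rotation(query, reference, angles_deg=(0, 90, 180, 270)):
    """Return the discrete angle whose rotation of `query` best matches `reference`.

    Assumes square patches and multiples of 90 degrees (hypothetical angle set).
    """
    scores = {}
    for ang in angles_deg:
        k = ang // 90  # np.rot90 rotates counterclockwise in 90-degree steps
        scores[ang] = zncc(np.rot90(query, k), reference)
    best = max(scores, key=scores.get)
    return best, scores
```

A finer angle grid would need interpolated rotation (for example an image-processing library), at the cost of resampling artifacts that the 90° case avoids.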

(This article belongs to the Topic Advanced Methods in Unmanned Aerial Vehicle Control, Navigation, and Safety)

22 pages, 826 KB

Open Access Article

Hamilton–Jacobi–Bellman Equations and Reinforcement Learning: A Theoretical Framework and Empirical Study for Dynamic Credit Decision-Making

by Lei Jin and Runchi Zhang

Mathematics 2026, 14(11), 2004; https://doi.org/10.3390/math14112004 - 4 Jun 2026

Abstract

Traditional credit scoring models treat lending decisions as static classification, ignoring the dynamic evolution of borrower risk and long-term profit optimisation. This paper reinterprets credit risk management as a discrete-time stochastic optimal control problem and integrates the Hamilton–Jacobi–Bellman (HJB) framework with deep reinforcement learning. Theoretically, we establish the equivalence between a discrete Markov decision process and the HJB equation, prove the existence and uniqueness of the optimal value function, derive the closed-form Riccati solution under linear-quadratic assumptions, and provide a convergence analysis of neural network value iteration. Empirically, using LendingClub loan data (2016–2018), we implement a PPO-based dynamic credit policy. The proposed model achieves an average reward of 1.6726 and a total reward of 867,613, significantly outperforming static baselines as well as a DQN baseline. Ablation experiments show that replacing the policy network with a linear mapping reduces the average reward by 40.8%, confirming the necessity of nonlinear function approximation. Sensitivity analysis and statistical tests (p < 0.001) confirm the robustness and significance of the gains. This work provides a rigorous mathematical foundation and empirical evidence for shifting credit scoring from static classification to dynamic optimisation.
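Under linear-quadratic assumptions, the Bellman (discrete HJB) recursion for a quadratic value function collapses to the discrete-time Riccati equation. The sketch below solves that equation numerically by value iteration rather than reproducing the paper's closed-form derivation; it is written in cost-minimization form without a discount factor, and `solve_riccati` is a hypothetical helper.

```python
import numpy as np


def solve_riccati(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Value iteration on the discrete-time algebraic Riccati equation.

    For dynamics x_{t+1} = A x_t + B u_t and stage cost x^T Q x + u^T R u,
    the optimal cost-to-go is V(x) = x^T P x and the optimal control is
    u = -K x. Starting from P = Q, each step applies one Bellman backup.
    """
    A, B, Q, R = (np.asarray(M, dtype=float) for M in (A, B, Q, R))
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy gain for the current P
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K          # Riccati (Bellman) backup
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K
```

For scalar A = B = Q = R = 1, the fixed point satisfies P² - P - 1 = 0, so P is the golden ratio and K = P / (1 + P) = 1 / P, which gives a quick sanity check of the iteration.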

(This article belongs to the Special Issue Advanced Methods, Modeling and Optimization for Financial Engineering and Risk Management)

21 pages, 4328 KB

Open Access Article

Reinforcement Learning-Based Policy for Haul-Truck Dispatch: A Framework for Earthmoving and Quarry Operations

by Mohsen Hatami, Ian Flood and Forough Foroutan

Buildings 2026, 16(11), 2274; https://doi.org/10.3390/buildings16112274 - 4 Jun 2026

Abstract

Truck-to-excavator assignment is a time-critical control problem in open-pit earthmoving systems (mines, quarries, and large cut-and-fill construction sites) where stochastic travel and service times, changing queues, and equipment outages continually alter the best dispatch decision. A deep reinforcement learning (DRL) dispatch policy is developed and trained using a discrete-event simulation (DES) digital twin of the Sungun copper mine haulage system. The dispatch task is formulated as a Markov decision process using state features that represent fleet locations, excavator and dump queues, and short-term congestion conditions. The resulting deep artificial neural network (DANN) policy is tuned via systematic hyperparameter optimisation and evaluated against a priority-based rule-of-thumb dispatch baseline under long-horizon operating tracks. Results show that the final trained policy improves the average production rate per truck cycle by approximately 17% while reducing avoidable waiting and maintaining stable performance over extended operation, with inference fast enough for real-time dispatch use. Model fidelity is supported by close agreement between simulated and observed daily completed-cycle counts. Robustness is assessed through controlled truck load-capacity perturbations, and scalability is examined through fleet-size sensitivity, which reveals diminishing returns as additional trucks are added under a fixed excavation–haulage configuration. Practical deployment considerations and implications for construction earthmoving logistics are discussed.
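For context on what a learned dispatch policy is compared against, a greedy dispatch heuristic fits in a few lines. This is a hypothetical rule (send the truck to the excavator where it will finish loading soonest), not the paper's priority-based baseline. It assumes that trucks already in each queue are served back-to-back starting now, and it ignores other trucks still travelling to the same excavator.

```python
from dataclasses import dataclass


@dataclass
class Excavator:
    """Hypothetical snapshot of one loading point."""
    name: str
    travel_time: float   # minutes for the dispatched truck to reach this excavator
    queue_length: int    # trucks already waiting to be loaded
    service_time: float  # mean loading time per truck, minutes


def time_to_loaded(e: Excavator) -> float:
    """Loading starts when the truck has arrived AND the queue ahead has cleared."""
    start = max(e.travel_time, e.queue_length * e.service_time)
    return start + e.service_time


def dispatch(excavators):
    """Greedy rule: choose the excavator with the earliest expected loading completion."""
    return min(excavators, key=time_to_loaded).name
```

A DRL policy replaces this one-step myopic score with a learned value over longer horizons, which is where queue interactions between trucks can be exploited.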

(This article belongs to the Special Issue Selected Papers from the 20th International Conference on Computing in Civil and Building Engineering (ICCCBE 2024))

8 pages, 1842 KB

Open Access Proceeding Paper

Machine Learning-Based Resolution of Strategic Conflicts in U-Space Airspaces

by Manuel González, Sandra Amarillo, Juan Vicente Balbastre and Alex Sanchis

Eng. Proc. 2026, 133(1), 186; https://doi.org/10.3390/engproc2026133186 - 2 Jun 2026

Abstract

The rapid expansion of Unmanned Aircraft System (UAS) operations has created an urgent need for scalable strategic conflict resolution methods within the U-space framework. When a requested 4D flight plan overlaps with previously authorised ones, the Flight Authorisation Service (FAS) denies the request and can provide the UAS operator with an alternative, conflict-free route. This work introduces a Machine Learning-based tool designed to support this process, which consists of three sequential phases. First, an Octree spatial partitioning technique is proposed to discretise the airspace, identify the previously occupied cells, and visualise the occupied airspace, so that the UAS operator can manually find an alternative route. Then, the widely known A* pathfinding algorithm is applied to this discretised airspace to find the shortest or otherwise optimal conflict-free alternative route. Finally, the methodology integrates a Machine Learning (Reinforcement Learning) model, created from scratch and trained with realistic flight trajectories from a PX4 simulator, to further optimise flight paths while explicitly accounting for operational constraints such as distance and battery consumption. Both methods are compared in this work, addressing the limitations of traditional algorithms with Machine Learning (ML) techniques and showing that near-optimal behaviour can be achieved with the ML approach at a fraction of the computation time.
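A* on a discretised airspace can be sketched in Python. For brevity, the sketch replaces octree leaves with a uniform voxel grid (an assumption), uses unit-cost 6-connected moves with a Manhattan heuristic, and treats cells reserved by previously authorised flight plans as obstacles; the time dimension of 4D flight plans is ignored. `astar_3d` is a hypothetical helper.

```python
import heapq
from itertools import count


def astar_3d(occupied, size, start, goal):
    """A* over a uniform 3-D voxel grid with 6-connected, unit-cost moves.

    occupied: set of (x, y, z) cells already reserved by authorised plans.
    size: (nx, ny, nz) grid dimensions. Returns a list of cells, or None.
    """
    def h(c):  # Manhattan distance: admissible and consistent for these moves
        return sum(abs(a - b) for a, b in zip(c, goal))

    if start in occupied or goal in occupied:
        return None
    tie = count()  # tie-breaker so the heap never compares cells directly
    open_heap = [(h(start), next(tie), start)]
    g = {start: 0}
    parent = {start: None}
    closed = set()
    moves = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        closed.add(cur)
        for dx, dy, dz in moves:
            nxt = (cur[0] + dx, cur[1] + dy, cur[2] + dz)
            if not all(0 <= v < n for v, n in zip(nxt, size)) or nxt in occupied:
                continue
            ng = g[cur] + 1
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                parent[nxt] = cur
                heapq.heappush(open_heap, (ng + h(nxt), next(tie), nxt))
    return None  # no conflict-free route exists in the grid
```

An octree variant would run the same search over leaf cells of varying size, with edge costs equal to distances between leaf centres.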

(This article belongs to the Proceedings of The 15th EASN International Conference on “Innovation in Aviation & Space Towards Sustainability Today & Tomorrow”)

24 pages, 5195 KB

Open Access Article

Hierarchical Decision-Making for UAV Close-Range Dynamic Tracking Using a Pursuit-Strategy Action Space

by Yu Lai, Yong Chen, Yang Yang, Jialong Jian and Yuanfei Liu

Aerospace 2026, 13(6), 508; https://doi.org/10.3390/aerospace13060508 - 29 May 2026

Abstract

In close-range dynamic UAV tracking, the sharp decrease in relative distance and rapidly changing relative-motion conditions require UAVs to execute highly dynamic maneuvers. Traditional autonomous decision-making systems either struggle with the curse of dimensionality in continuous action spaces or suffer from strategy-level rigidity when using predefined discrete maneuver primitives. This paper aims to resolve these limitations by developing a dimension-reduced yet highly continuous decision-making framework. We propose a hierarchical deep reinforcement learning architecture based on a geometric pursuit-strategy action space. The top-level Proximal Policy Optimization (PPO) agent evaluates the relative-motion state to output discrete guidance-mode commands: lag pursuit, lead pursuit, or pure pursuit. A mid-level guidance translator converts these intents into continuous flight reference commands based on angular geometry and energy maneuverability. The bottom level uses a high-fidelity JSBSim fixed-wing aircraft flight-dynamics model for precise aerodynamic control. Monte Carlo simulations and comparative experiments across representative initial postures show that the proposed framework improves training convergence compared with a conventional continuous-control PPO baseline and achieves more stable high-level guidance-mode selection than a Double-DQN baseline. In simulation tests under predefined geometric tracking-success criteria, the model achieved a 91.5% success rate in initially favorable configurations and a 64.0% success rate when starting from a challenging configuration. By abstracting complex maneuvers into geometric pursuit strategies, this hierarchical framework lowers exploration dimensionality while maintaining the continuous kinematic logic of flight trajectories, providing an interpretable, simulation-validated decision-making framework for UAV close-range dynamic tracking and autonomous flight control.
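The three guidance modes selected by the top-level agent differ mainly in where the pursuer aims relative to the target. The planar sketch below makes that geometry concrete under simplifying assumptions (2-D kinematics and a fixed look-ahead time `tau`); it is not the paper's mid-level translator, which also accounts for energy maneuverability, and `pursuit_heading` is a hypothetical helper.

```python
import math


def pursuit_heading(p_pos, t_pos, t_vel, mode, tau=2.0):
    """Commanded heading (rad) for a 2-D pursuit-guidance mode.

    pure: aim at the target's current position.
    lead: aim ahead of the target along its velocity (tau seconds, hypothetical).
    lag:  aim behind the target (tau seconds, hypothetical).
    """
    offset = {"pure": 0.0, "lead": tau, "lag": -tau}[mode]
    aim_x = t_pos[0] + offset * t_vel[0]
    aim_y = t_pos[1] + offset * t_vel[1]
    return math.atan2(aim_y - p_pos[1], aim_x - p_pos[0])
```

For a target directly ahead and crossing to the left, lead pursuit turns the pursuer into the target's path, lag pursuit turns it behind the target, and pure pursuit points straight at it.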

31 pages, 2000 KB

Open Access Article

Adaptive Constraint Regulation for Human Preference-Aware Safe Reinforcement Learning of On-Ramp Merging

by Jingjia Teng, Wenjie Huang, Shijie Yuan, Manjiang Hu, Hongmao Qin, Yang Li, Yougang Bian and Bai Li

Machines 2026, 14(6), 605; https://doi.org/10.3390/machines14060605 - 28 May 2026

Abstract

Reinforcement learning (RL) has been widely used for decision-making in highway on-ramp merging scenarios. However, most existing methods incorporate safety through reward functions, which may allow autonomous vehicles to trade safety for higher cumulative rewards. Moreover, personalized human risk preferences are rarely considered, making the learned policies difficult to adapt to heterogeneous user-specific risk requirements and potentially resulting in overly conservative or insufficiently cautious behaviors. To address these issues, this paper proposes a Risk-Aware Personal Preference-Based Safe Reinforcement Learning framework (RAPRL) for autonomous decision-making in on-ramp merging scenarios. Specifically, the high-level decision-making problem is formulated as a constrained Markov decision process (CMDP), in which safety requirements are explicitly represented as constraints rather than reward terms. To enable personalized safety regulation, a fuzzy logic mechanism is developed to adaptively determine the constraint cost limit according to the driver’s risk preference and the surrounding traffic density. The resulting safe RL problem is solved using a Lagrangian-based soft actor-critic (SAC) algorithm. Furthermore, an Action Shielding Mechanism is designed to assess the potential risk of candidate actions before execution and replace unsafe or infeasible actions, thereby improving safety during both policy learning and execution. Theoretical analysis shows that the proposed shielding mechanism can reduce unsafe exploration and improve sample efficiency. Extensive simulations in on-ramp merging scenarios demonstrate that RAPRL effectively reduces safety violations while maintaining driving efficiency. Compared with the SAC-Discrete method, the proposed method improves the success rate by 4.76% and reduces the collision ratio by 70%, indicating a better safety–efficiency trade-off.
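Treating safety as a constraint rather than a reward term is commonly implemented with a Lagrange multiplier updated by projected dual ascent, combined with a shield that filters candidate actions. The sketch below shows these pieces in isolation; the learning rate, safety predicate, and action names are hypothetical, and in RAPRL the cost limit would come from the fuzzy-logic mechanism rather than a constant.

```python
def update_lambda(lam, episode_cost, cost_limit, lr=0.05):
    """Projected dual ascent: raise lambda when the constraint is violated, keep it >= 0."""
    return max(0.0, lam + lr * (episode_cost - cost_limit))


def lagrangian_reward(reward, cost, lam):
    """Scalarized signal the actor maximizes for a fixed multiplier."""
    return reward - lam * cost


def shield(candidates, q_values, is_safe, fallback):
    """Return the highest-valued candidate that passes the safety check.

    `is_safe` is a hypothetical predicate (e.g. a time-headway test);
    `fallback` is executed if every candidate is judged unsafe.
    """
    for a in sorted(candidates, key=lambda a: q_values[a], reverse=True):
        if is_safe(a):
            return a
    return fallback
```

The multiplier grows while the observed episode cost exceeds the limit, so constraint violations are penalized progressively more, instead of being traded off at a fixed exchange rate as with a hand-tuned reward term.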

(This article belongs to the Special Issue Optimization-Based Motion Planning & Control for Autonomous Driving in Dynamic Environments)

23 pages, 5064 KB

Open Access Article

Delay and Energy Optimization in Heterogeneous GEO–LEO Satellite Networks: A GNN-Enhanced Game-Theoretic and DRL Approach

by Yiyu Wang, Zhufang Kuang and Mingxiao Lei

Future Internet 2026, 18(6), 288; https://doi.org/10.3390/fi18060288 - 27 May 2026

Abstract

As 6G mobile communications evolve, Low Earth Orbit (LEO) satellite mobile edge computing (MEC) enables globally seamless computing. However, the high mobility of LEO satellites disrupts service continuity and resource stability. Existing approaches often use oversimplified models that ignore multi-beam interference and dynamic task queueing. To address this, we establish a hierarchical Geostationary Earth Orbit (GEO)–LEO synergistic architecture in which GEO satellites serve as stability anchors and remote cloud relays, while LEO satellites provide low-latency edge processing. We formulate fine-grained models for two-level beam-centric communication and preemptive dynamic queueing. The resulting joint task offloading and resource allocation problem is a complex mixed-integer nonlinear program (MINLP). To solve this MINLP effectively, we propose a novel framework termed G²DRL (GNN-enhanced game-theoretic and deep reinforcement learning), which decouples the problem hierarchically: discrete offloading decisions are determined first, and continuous resource allocations are then optimized given those decisions. Simulation results demonstrate that G²DRL significantly reduces the weighted sum of system delay and energy, showing superior convergence stability and performance over state-of-the-art DRL baselines.
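The hierarchical decoupling (discrete offloading decision first, continuous resource allocation second) can be illustrated on a single task. This sketch assumes a textbook local-computing model, with delay C/f and energy κCf² for C CPU cycles at frequency f, for which the inner weighted-sum problem has a closed-form optimum; the offloading cost is simply passed in. It does not implement G²DRL's GNN, game-theoretic, or DRL components, and all function names are hypothetical.

```python
def optimal_local_freq(w, kappa, f_max):
    """Minimize w*C/f + (1-w)*kappa*C*f**2 over f in (0, f_max] (assumed cost model).

    Setting the derivative to zero gives f* = (w / (2 (1-w) kappa))^(1/3);
    the objective is convex in f, so clipping to f_max stays optimal.
    """
    f_star = (w / (2.0 * (1.0 - w) * kappa)) ** (1.0 / 3.0)
    return min(f_star, f_max)


def local_cost(C, w, kappa, f_max):
    """Inner (continuous) step: weighted delay-energy cost at the optimal frequency."""
    f = optimal_local_freq(w, kappa, f_max)
    return w * C / f + (1.0 - w) * kappa * C * f ** 2


def decide(C, w, kappa, f_max, offload_cost):
    """Outer (discrete) step: keep the task local or offload, whichever is cheaper."""
    lc = local_cost(C, w, kappa, f_max)
    return ("local", lc) if lc <= offload_cost else ("offload", offload_cost)
```

With many tasks and shared satellite resources, the outer step becomes a combinatorial assignment rather than a single comparison, which is where learned or game-theoretic decision makers come in.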

57 pages, 9973 KB

Open Access Review

Digital Twin- and AI-Enabled Intelligent Optimisation Design of Agricultural Machinery: A Review

by Pengsheng Ding and Jianmin Gao

Agronomy 2026, 16(11), 1038; https://doi.org/10.3390/agronomy16111038 - 24 May 2026

Abstract

The optimisation design of agricultural machinery is shifting from offline, experience-driven engineering towards adaptive, data-driven, and closed-loop intelligent optimisation. Conventional approaches based on computer-aided engineering (CAE), empirical testing, mathematical modelling, and static multi-objective optimisation have provided an important engineering foundation, but they remain limited under unstructured field conditions involving soil heterogeneity, crop variability, climatic disturbance, and nonlinear machinery–environment interactions. This review systematically examines the evolution of intelligent optimisation design for agricultural machinery from conventional simulation-based methods to artificial intelligence (AI)- and digital twin (DT)-enabled paradigms. First, mathematical modelling, response surface methodology, the discrete element method (DEM), computational fluid dynamics (CFD), multi-body dynamics (MBD), heuristic algorithms, and early AI-assisted surrogate optimisation are reviewed to clarify their contributions and limitations. Second, frontier enabling technologies are analysed, including agriculture-specific large models, generative AI, lightweight edge intelligence, deep reinforcement learning (DRL), embodied AI, federated learning (FL), and privacy-preserving computing. Third, system-level applications integrating DT and AI are discussed, with emphasis on full-lifecycle machinery optimisation, device–edge–cloud collaborative control, multi-agent fleet coordination, predictive maintenance, and Agriculture 5.0-oriented intelligent equipment systems. Key deployment bottlenecks are further identified, including sim-to-real inconsistency; virtual–physical mismatch in DTs; edge-side trade-offs among accuracy, latency, energy consumption, and cost; insufficient validation standards; and economic adoption barriers.
Finally, a 2025–2030 roadmap is proposed, highlighting large-model–DT closed loops, control biomimetics, green low-carbon optimisation, and trustworthy human–machine symbiosis for sustainable Agriculture 5.0.

(This article belongs to the Special Issue Digital Twin and AI-Enhanced Simulation in Agricultural Systems)

24 pages, 1774 KB

Open Access Article

Block-Wise State Encoding for Action-Masked Reinforcement Learning in Flexible Job-Shop Scheduling

by Kostiantyn Hrishchenko and Oleksii Pysarchuk

Algorithms 2026, 19(6), 423; https://doi.org/10.3390/a19060423 - 23 May 2026

Abstract

This paper addresses the flexible job-shop scheduling problem (FJSP) as a constrained combinatorial optimization task with a large discrete action space. Although action-masked reinforcement learning has shown promise for such problems, the effect of structured vector-state encoding in scheduling has received less attention. The main contribution of this work is a structured block-wise state representation and a multi-branch feature extraction module for action-masked Proximal Policy Optimization (PPO). The proposed representation decomposes the scheduling state into three heterogeneous components capturing resource availability, operation readiness, and temporal attributes of operation–machine alternatives. Instead of flattening these signals into a single vector, the proposed encoder processes each block separately before aggregation, with the aim of preserving semantic structure during policy learning. To isolate the effect of representation design, we compare the proposed multi-branch encoder with a baseline single-branch multilayer perceptron under identical PPO hyperparameters and training conditions. Experiments on the Brandimarte MK benchmark suite show that the proposed architecture yields a lower best-achieved makespan on nine of ten instances and improves on the best baseline result by up to 27.84%. Additional validation on selected Behnke and Geiger instances indicates that the advantage of the proposed block-wise (BR) encoder extends to larger FJSP cases while preserving sub-second inference.
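The block-wise encoding and the action mask can be illustrated independently of PPO training. In this NumPy sketch, each state block (resource availability, operation readiness, temporal attributes) passes through its own single ReLU layer before concatenation, and infeasible operation–machine pairs receive zero probability. Block sizes, layer widths, and random weights are hypothetical, no learning takes place, and the helper names are illustrative.

```python
import numpy as np


def block_encoder(blocks, params):
    """Multi-branch encoder: one ReLU layer per state block, then concatenation.

    blocks: list of 1-D state blocks; params: list of (W, b) pairs, one per block.
    """
    feats = [np.maximum(0.0, W @ x + b) for x, (W, b) in zip(blocks, params)]
    return np.concatenate(feats)


def masked_softmax(logits, mask):
    """Action distribution that gives zero probability to masked (infeasible) actions."""
    z = np.where(mask, logits, -np.inf)
    z = z - z[mask].max()  # shift by the largest valid logit for numerical stability
    p = np.exp(z)          # exp(-inf) == 0 for masked actions
    return p / p.sum()
```

A single-branch baseline would instead apply one weight matrix to `np.concatenate(blocks)`, letting every hidden unit mix all three kinds of signal from the first layer onward.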

(This article belongs to the Special Issue Machine Learning for Planning and Logistics)

36 pages, 4636 KB

Open Access Review

Optimal Plastic Design of Reinforced Concrete Structures: A State-of-the-Art Review from Steel Plasticity to Modern RC Applications

by Zahraa Saleem Sharhan and Majid Movahedi Rad

Buildings 2026, 16(10), 1981; https://doi.org/10.3390/buildings16101981 - 17 May 2026

Abstract

Plastic design enables efficient structural systems by exploiting controlled inelastic deformation and force redistribution. While mature in steel structures due to stable ductility and well-defined yielding, its extension to reinforced concrete (RC) remains challenging because cracking, stiffness degradation, confinement dependency, and progressive damage govern deformation capacity and collapse mechanisms. This paper presents a state-of-the-art review of optimal plastic design methodologies for RC structures by tracing the evolution from classical plasticity theory to modern damage-informed, reliability-oriented, and sustainability-driven formulations. A systematic and structured literature review of more than 90 peer-reviewed journal articles (1990–2025) was conducted using Scopus, Web of Science, and ScienceDirect. The selected studies are classified by structural system type, plastic analysis approach, constitutive modeling strategy, strengthening technique (including CFRP and hybrid fiber systems), optimization framework, and uncertainty treatment. The review highlights how nonlinear elasto-plastic and damage–plasticity models improve the prediction of plastic hinge development, redistribution, and failure-mode transitions, and how metaheuristic optimization, topology optimization, surrogate modeling, and machine learning are increasingly used to manage discrete design variables and computational cost. Reliability-based methods (e.g., FORM/SORM and simulation) are shown to be essential for quantifying deformation-capacity uncertainty and ensuring consistent collapse-prevention performance. A comparative assessment of nine plastic design methodologies is also provided, identifying their core assumptions, limitations, and domains of applicability within a structured evaluative framework.
Remaining challenges include robust deformation-capacity prediction, reproducible calibration of damage models, and integration of life-cycle sustainability criteria within reliability-constrained plastic optimization. Future research directions are proposed toward multi-objective reliability-based design, durability-informed plastic modeling, and hybrid physics-informed AI-assisted workflows.

(This article belongs to the Special Issue Resilient and Sustainable Buildings: Advances in Architecture and Structural Systems)

24 pages, 10404 KB

Open Access Article

Experience Extractor for Adaptive Tradeoff Between Exploration and Exploitation in Reinforcement Learning

by Zhi Yi, Zhongmin Wu, Yongming Xie, Ming Chen and Yinglong Dai

Mathematics 2026, 14(10), 1624; https://doi.org/10.3390/math14101624 - 11 May 2026

Abstract

In Reinforcement Learning (RL), the agent cannot distinguish between exploratory and exploitative experience. Not all sequential experiences contribute equally to the agent’s optimization, and the same experience holds different importance at different learning stages. We propose the Extractor for Adaptive Tradeoff Between Exploration and Exploitation (EATBEE), a task-oriented tool that is tightly coupled with the agent’s current knowledge and enables adaptive knowledge acquisition. We compare the originally sampled data with the task-driven data distribution to illustrate the deviation between them. We then show how EATBEE identifies and extracts beneficial data for the agent. Monotonic policy improvement is established theoretically under the assumption that the experience trajectory retains a high degree of similarity after extraction. EATBEE serves as an independent module that can be seamlessly integrated with most existing RL algorithms. We substantiate the efficacy and practical applicability of the EATBEE method through experiments in both discrete and continuous environments.

(This article belongs to the Special Issue Applications of Intelligent Game and Reinforcement Learning)

24 pages, 475 KB

Open Access Article

Multi-Strategy Market Dynamics Analysis: A Novel Framework for Agent-Based Economic Modeling with Reinforcement Learning

by Yuhang Du and Yuhan Zhao

Mathematics 2026, 14(10), 1621; https://doi.org/10.3390/math14101621 - 11 May 2026

Abstract

This paper presents a Multi-Strategy Market Dynamics Analysis (MSMDA) framework for agent-based economic modeling with reinforcement learning. The primary methodological contribution is an integrated strategy–stability–macro inference pipeline that links population-level strategy evolution to dynamic market stability and model-internal counterfactual policy analysis. The framework is organized into six analytical components: Strategy Temporal Pattern Recognition (STPR), Strategy Transition Detection and Analysis (STDA), Strategy-Macro Causality Analysis (SMCA), the Dynamic Market Stability Index (DMSI), the Adaptive Rationality Equilibrium (ARE), and the Information Asymmetry Propagation (IAP) metric. The method is evaluated on a simulation dataset comprising 447,129 records across four experimental scenarios, 1500 discrete time periods, and 200 heterogeneous firms governed by proximal policy optimization. Results show that competitive strategies dominate market emergence patterns, accounting for 60.8% of all observations, and achieve superior average profitability of 28.07 monetary units per period, compared with $-4.49$ for dumping strategies and 7.83 for market power strategies. The DMSI has a mean of 0.372 with a standard deviation of 0.097, peaking at 0.780 during strategic consolidation and collapsing to zero during a major demand shock. Within the simulated economy, doubly robust counterfactual analysis projects a 28.4% GDP increase from a market power-to-competition intervention and a 31.2% increase under full ARE optimization at $\rho^{*} = 0.6$. The ARE further identifies a Pareto-optimal market configuration that jointly maximizes per-firm profit at 229.82 monetary units per period and systemic stability at $\mathrm{DMSI} = 0.67$, indicating that efficiency and resilience need not conflict in the calibrated simulation environment. To address time-series autocorrelation in bootstrap inference throughout the framework, we employ a moving block bootstrap with data-adaptive block length selection based on the spectral density at frequency zero, providing finite-sample confidence intervals for the reported test statistics and counterfactual projections.

(This article belongs to the Section E5: Financial Mathematics)
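The moving block bootstrap mentioned in the MSMDA abstract resamples overlapping blocks of consecutive observations, so that short-range autocorrelation is preserved inside each block. The minimal NumPy sketch below assumes a 1-D series and a percentile interval; it substitutes a simple n^(1/3) rule of thumb for the paper's spectral-density-based block-length selector, and the function names are illustrative.

```python
import numpy as np

def moving_block_bootstrap(x, stat=np.mean, block_len=None, n_boot=2000, rng=None):
    """Bootstrap distribution of `stat` for a 1-D time series via overlapping blocks."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if block_len is None:
        # Simplification: n^(1/3) rule of thumb, NOT the spectral-density selector.
        block_len = max(1, int(round(n ** (1 / 3))))
    block_len = min(block_len, n)
    rng = np.random.default_rng() if rng is None else rng
    n_blocks = int(np.ceil(n / block_len))
    n_starts = n - block_len + 1  # number of overlapping blocks available
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)  # upper bound exclusive
        # Concatenate whole blocks, then trim to the original series length.
        sample = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        stats[b] = stat(sample)
    return stats

def percentile_ci(boot_stats, level=0.95):
    """Two-sided percentile confidence interval from bootstrap replicates."""
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(boot_stats, [tail, 100 - tail])
    return lo, hi
```

Longer blocks capture longer-range dependence but leave fewer effectively independent blocks to resample, which is the trade-off a data-adaptive selector is designed to balance.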


25 pages, 4053 KB

Open Access Article

Resource Allocation for D2D Communications in Multi-Slice NOMA-Based Cellular Networks

by Lijun Dong, Jingjing Wu and Yitong Yang

Future Internet 2026, 18(5), 246; https://doi.org/10.3390/fi18050246 - 6 May 2026

Viewed by 226

Abstract


Next-generation cellular networks face significant challenges in simultaneously achieving high spectral efficiency (SE) and satisfying diverse quality of service (QoS) requirements, particularly under stringent bandwidth and power budgets in highly dynamic, dense topologies. To address these challenges, we formulate an optimization problem for a multi-slice non-orthogonal multiple access (NOMA) system with underlay device-to-device (D2D) communications. The problem aims to maximize SE while satisfying user QoS demands by jointly optimizing power allocation and resource block (RB) assignment. To solve this non-convex, NP-hard problem, we propose a resource allocation mechanism that combines joint optimization with cooperative multi-agent deep reinforcement learning (MADRL). Specifically, we construct an optimization framework based on successive convex approximation (SCA) and the Lagrange duality method that yields an analytical iterative solution for the optimal power allocation under a given RB assignment, thereby avoiding the action-space discretization error inherent in purely learning-based methods. Furthermore, we propose a cooperative multi-agent algorithm based on the dueling double deep Q-network (CMAD3QN) to solve the discrete RB assignment problem. Simulation results demonstrate that, compared with benchmark schemes, the proposed scheme converges faster and significantly enhances system spectral efficiency while maintaining slice isolation and satisfying resource constraints.

(This article belongs to the Special Issue 6G Wireless Network Technologies)
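CMAD3QN, as its name indicates, builds on two standard DQN refinements: the dueling architecture, which splits Q into a state value and per-action advantages, and the double-DQN target, in which the online network selects the next action and the target network evaluates it. The NumPy sketch below shows only these two computations, assuming each agent's discrete action is an RB index; the networks, the cooperation scheme between agents and the SCA-based power allocation are omitted, and the function names are hypothetical.

```python
import numpy as np

def dueling_q(value, advantage):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    value: shape (batch,); advantage: shape (batch, n_actions), e.g. one column per RB.
    """
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double-DQN targets: the online net picks a', the target net scores it."""
    best_actions = np.argmax(q_next_online, axis=1)
    q_eval = q_next_target[np.arange(len(rewards)), best_actions]
    # (1 - dones) removes the bootstrap term for terminal transitions.
    return rewards + gamma * (1.0 - dones) * q_eval
```

Decoupling action selection from evaluation reduces the overestimation bias of taking a max over noisy Q-estimates, and subtracting the mean advantage makes the V/A split identifiable.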


44 pages, 10357 KB

Open Access Article

An Adaptive QAPF Framework with a Discrete CBF-Inspired Safety Filter and Adaptive Reward Shaping for Safe Mobile Robot Navigation

by Elizabeth Isaac, Asha J. George, Iacovos Ioannou, Jisha P. Abraham, Suresh Kallam, G. S. Pradeep Ghantasala, Pellakuri Vidyullatha and Vasos Vassiliou

Electronics 2026, 15(9), 1945; https://doi.org/10.3390/electronics15091945 - 3 May 2026

Viewed by 426

Abstract


Mobile robot navigation remains challenging when fast convergence, collision avoidance and deployability must be satisfied simultaneously. This paper extends the original Q-learning with Artificial Potential Field (QAPF) paradigm with three coordinated mechanisms that together yield a reported-horizon convergence reduction of approximately four orders of magnitude (from $\sim 3 \times 10^{6}$ episodes to approximately 200 to 230 episodes under the present protocol) and an internal-ablation collision-rate reduction of approximately one order of magnitude (6.2% to 0.3%), and that open a new capability frontier covering dynamic obstacles, multi-robot coordination, energy-aware velocity modulation and embedded-deployable inference timing. The first mechanism is a potential-based reward-shaping schedule whose unclipped, fixed-weight form follows the policy-invariance theorem for potential-based shaping, while the implemented clipped, time-varying form is used as an empirically stable approximation. Under the present experimental protocol, the reported convergence horizon falls from the $\sim 3 \times 10^{6}$ episodes reported for the original QAPF formulation to approximately 200 to 230 episodes; this comparison is protocol-dependent and is not claimed as a controlled one-to-one runtime speedup. The second mechanism is a discrete action filter inspired by Control Barrier Functions (CBFs), with per-episode visit memory, which reduces the held-out collision rate from 6.2% for QAPF alone to 0.3% while maintaining 93.8% task completion. The filter draws on the continuous-time CBF literature but carries no forward-invariance proof; it serves as an empirical safety mechanism rather than as a formal CBF in the continuous-time sense. This collision-rate comparison is internal to the QAPF ablation because the prior QAPF reference does not report a comparable held-out collision metric. The third mechanism is a set of extensions: dynamic obstacles, two-robot cooperative navigation under a centralized scheme (with an explicit $O(N^{2})$ scaling-cost analysis and three decentralization strategies for fleets beyond the small-$N$ regime), curriculum learning and energy-aware velocity modulation. The revised evaluation adds disturbance robustness tests, empirical timeout/stagnation detection for unreachable-goal cases, i7 reference inference timing with projected embedded-device latencies, multi-axis generalization over obstacle density and grid size, scalability analysis for centralized multi-robot coordination and a scope comparison against A* and RRT*. Across 30 independent seeds on held-out static maps, adaptive QAPF achieves 94.5 ± 2.1% success, while QAPF+CBF achieves 93.8 ± 2.3% success with 0.3 ± 0.4% collisions. Under a separate finite robustness suite, QAPF+CBF retains 85.0 ± 4.1% success in the combined disturbance regime. The timing study indicates that all methods comfortably exceed the 20 Hz real-time threshold on the measured i7 reference platform, as do all projected embedded-device equivalents. The results show that APF-guided tabular reinforcement learning, paired with a discrete safety filter and a clarified energy and robustness analysis, can provide a lightweight, safety-oriented navigation policy for grid-based mobile-robot settings.

(This article belongs to the Special Issue AI for Industry)
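The first two QAPF mechanisms can be illustrated on a toy grid. The sketch below is a minimal tabular Q-learning loop using the unclipped, fixed-weight form of potential-based shaping, $F = \gamma\Phi(s') - \Phi(s)$, together with a simple discrete action filter with per-episode visit memory. It is not the authors' implementation: the negative-Manhattan-distance potential, the reward values, the filter rule (reject moves off the grid or into obstacles, prefer unvisited cells) and all function names are assumptions, and like the paper's filter it carries no formal CBF guarantee.

```python
import numpy as np

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def potential(cell, goal, k=1.0):
    """Assumed attractive potential: negative scaled Manhattan distance (0 at the goal)."""
    return -k * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

def step(cell, a):
    return (cell[0] + ACTIONS[a][0], cell[1] + ACTIONS[a][1])

def valid_actions(cell, grid):
    """Actions that stay inside the grid and do not enter an obstacle (grid == 1)."""
    rows, cols = grid.shape
    out = []
    for a in range(len(ACTIONS)):
        r, c = step(cell, a)
        if 0 <= r < rows and 0 <= c < cols and grid[r, c] == 0:
            out.append(a)
    return out

def filtered_actions(cell, grid, visited):
    """Hypothetical discrete safety filter with per-episode visit memory:
    keep valid actions, preferring those that lead to unvisited cells."""
    safe = valid_actions(cell, grid)
    fresh = [a for a in safe if step(cell, a) not in visited]
    return fresh or safe

def train(grid, start, goal, episodes=300, alpha=0.5, gamma=0.95,
          eps=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(grid.shape + (len(ACTIONS),))
    for _ in range(episodes):
        cell, visited = start, {start}
        for _ in range(max_steps):
            allowed = filtered_actions(cell, grid, visited)
            if not allowed:  # completely boxed in
                break
            if rng.random() < eps:  # epsilon-greedy over filtered actions only
                a = int(rng.choice(allowed))
            else:
                a = max(allowed, key=lambda i: q[cell[0], cell[1], i])
            nxt = step(cell, a)
            done = nxt == goal
            r = 10.0 if done else -0.1
            # Unclipped, fixed-weight potential-based shaping: F = gamma*Phi(s') - Phi(s).
            shaped = r + gamma * potential(nxt, goal) - potential(cell, goal)
            bootstrap = 0.0 if done else max(
                q[nxt[0], nxt[1], i] for i in valid_actions(nxt, grid))
            q[cell[0], cell[1], a] += alpha * (shaped + gamma * bootstrap
                                               - q[cell[0], cell[1], a])
            cell = nxt
            visited.add(cell)
            if done:
                break
    return q
```

For example, `train(grid, (0, 0), (4, 4))` on a 5×5 grid with a few blocked cells returns a Q-table of shape `(5, 5, 4)`; actions the filter always rejects keep their initial value of zero. Because $\Phi$ is zero at the goal, the discounted sum of shaping terms over an episode that ends at the goal reduces to $-\Phi(s_0)$, a constant that does not depend on the actions taken.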


Search Results (243)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

