MDPI - Publisher of Open Access Journals

23 pages, 1272 KB

Open Access Article

Dynamic Optimization of Incoming Quality Control Policies for Cost, Carbon, and Energy Reduction Using Bayesian Reinforcement Learning

by David Massetti, Mehdi Raoofi, Tiziano Miroglio, Marco Mosca and Flavio Tonelli

Sustainability 2026, 18(12), 6094; https://doi.org/10.3390/su18126094 (registering DOI) - 13 Jun 2026

Abstract

The transition towards sustainable manufacturing necessitates complex optimization that integrates economic goals with environmental factors, such as energy consumption and greenhouse gas emissions. This research addresses the critical challenge of optimizing the Incoming Quality Control (IQC) policy for raw material batches. The primary objective is formulated as a multi-criteria control problem that jointly minimizes the weekly final product cost, carbon footprint, and energy consumption. We adopt a scalarized reinforcement learning (RL) reward that combines these objectives into a single value function, and we explore different trade-offs through alternative weight configurations. To capture the uncertainty in incoming quality and the sequential decision making required for dynamic control, the optimization problem is modeled as a Bayesian Adaptive Markov Decision Process (BAMDP). To maintain computational tractability despite the continuous belief space inherent in the BAMDP formulation, we employ a Deep Q-Network (DQN) architecture acting as an approximate dynamic programming solver. The Bayesian framework represents model uncertainty explicitly, updates beliefs as new inspection evidence becomes available, and allows prior domain knowledge about supplier quality to be incorporated into the learning process. The BAMDP formulation is used to learn a set of adaptive inspection policies that adjust the IQC strategy over time to pursue conflicting goals: reducing inspection costs while maintaining standard quality, minimizing energy consumption, and lowering CO₂-equivalent emissions. The goal is to find robust policies that balance these trade-offs under different quality and demand conditions.
This methodology aligns with the principles of Industry 5.0 by leveraging advanced artificial intelligence (AI) methods, such as reinforcement learning (RL), coupled with a stochastic simulation of the production system, based on a geometric/physical model of the component’s tolerance chains, to support decision-makers in designing and assessing sustainable IQC strategies. Comparative simulations on the case study, including a benchmark against ISO 2859-1 sampling plans, confirm that this dynamic and risk-aware optimization paradigm can reduce overall cost, energy use, and environmental impact across various quality conditions, while preserving outgoing quality.

(This article belongs to the Special Issue Leveraging AI in Industry 4.0: Overcoming Challenges and Seizing Opportunities for Sustainable Operations Management)
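
The Bayesian belief update and scalarized reward described in this abstract can be illustrated with a minimal Python sketch. It assumes a Beta-Bernoulli model for a supplier's defect rate and a linear weighted sum of the three objectives; the abstract does not specify either form, and the prior, weights, and batch figures are hypothetical. In practice the weights would also have to normalize the differing units of cost, CO₂e, and energy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over a supplier's batch defect rate (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of defective items
    beta: float = 1.0   # prior pseudo-count of conforming items

    def update(self, defects: int, inspected: int) -> "BetaBelief":
        # Conjugate Bayesian update: add the inspection evidence to the pseudo-counts.
        return BetaBelief(self.alpha + defects, self.beta + (inspected - defects))

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def scalarized_reward(cost: float, co2e: float, energy: float,
                      weights: Tuple[float, float, float]) -> float:
    """Negated weighted sum of weekly cost, CO2e and energy (all to be minimized)."""
    w_cost, w_co2e, w_energy = weights
    return -(w_cost * cost + w_co2e * co2e + w_energy * energy)


# Hypothetical prior encoding "about 5% defective" (mean 1/20) for a known supplier.
belief = BetaBelief(alpha=1.0, beta=19.0)
belief = belief.update(defects=3, inspected=50)   # evidence from one sampled batch
print(round(belief.mean(), 4))                     # 0.0571 (= 4 / 70)
# Hypothetical weight configuration; other weightings explore other trade-offs.
print(scalarized_reward(1000.0, 200.0, 500.0, weights=(0.5, 0.3, 0.2)))
```

The conjugate update is what keeps the belief state compact (two numbers per supplier), although the resulting belief space is still continuous, which is why the abstract turns to a DQN as an approximate solver.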

40 pages, 2120 KB

Open Access Article

Transformer–DDQN-Based Explainable and Active Intrusion Detection Architecture for Network Traffic Analysis

by Ayşe Okutan Kara and Aytuğ Boyacı

Appl. Sci. 2026, 16(12), 5912; https://doi.org/10.3390/app16125912 - 11 Jun 2026

Viewed by 42

Abstract

This study proposes a novel intrusion detection and response architecture that formulates network traffic analysis as a sequential decision-making problem rather than a static classification task. The architecture integrates a Transformer Encoder for temporal feature extraction with a Dueling Double Deep Q-Network (DDQN) to enable autonomous and risk-aware security decisions. Network flows are modeled within a Markov Decision Process, where the agent learns an optimal policy over a hierarchical action space consisting of IGNORE, LOG, ESCALATE, and BLOCK actions. To evaluate generalization capability, a transfer learning-based cross-domain adaptation strategy was employed. The CICIDS2018 and CICIoT2023 datasets were re-partitioned using a stratified 70/15/15 train/validation/test split. The proposed model achieved high detection performance on these datasets with F1-scores of 99.48% and 99.13%, respectively. After transfer learning to the AWID3 dataset, the model preserved strong generalization capability with F1-scores of 96.76% and 96.61%, demonstrating its robustness across wired, IoT, and wireless network environments. A risk-aware reward function is designed to balance detection accuracy and operational cost, while Integrated Gradients-based explainability is incorporated to analyze decision behavior. Experimental results further show that the proposed Transformer–DDQN framework achieves more stable learning, lower optimization loss, and more consistent action policies compared to alternative reinforcement learning-based approaches. The model operates with high computational efficiency while maintaining real-time processing capability in high-throughput network environments.

(This article belongs to the Section Computing and Artificial Intelligence)
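
The abstract does not detail the Q-network, but the two standard ingredients behind a Dueling Double DQN can be sketched in a few lines of NumPy: the dueling aggregation of a state value and per-action advantages, and the Double DQN target that decouples action selection from action evaluation. Only the four action names come from the abstract; the Q-values, reward, and discount factor are made up for illustration.

```python
import numpy as np

ACTIONS = ["IGNORE", "LOG", "ESCALATE", "BLOCK"]  # action space named in the abstract


def dueling_q(value: float, advantages: np.ndarray) -> np.ndarray:
    """Dueling head aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    return value + advantages - advantages.mean()


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: np.ndarray, q_target_next: np.ndarray) -> float:
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    best_next = int(np.argmax(q_online_next))
    return float(reward + gamma * q_target_next[best_next])


# Made-up head outputs for the next state s' from the online and target networks.
q_online = dueling_q(1.0, np.array([0.1, 0.2, 0.5, 0.9]))
q_target = dueling_q(0.8, np.array([0.3, 0.1, 0.6, 0.4]))
print(ACTIONS[int(np.argmax(q_online))])  # BLOCK
y = double_dqn_target(-0.1, False, 0.99, q_online, q_target)
print(round(y, 4))  # 0.7415
```

With these numbers, a plain DQN target would take the maximum of `q_target` (1.05, the ESCALATE entry) and return 0.9395; using the online network's choice (BLOCK) gives the lower 0.7415, which is the overestimation-curbing effect Double DQN is designed for.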

19 pages, 5656 KB

Open Access Article

Deep Reinforcement-Learning-Optimized Adaptive EKF for Robust Utility Harmonic Impedance Estimation

by Zhirong Tang, Xin Wei, Zhaobin Wei, Fei Tan, Cong Tian, Ying Tang and Xuedou Xiong

Electronics 2026, 15(12), 2557; https://doi.org/10.3390/electronics15122557 - 10 Jun 2026

Viewed by 154

Abstract

Accurate estimation of the utility harmonic impedance at the Point of Common Coupling (PCC) is critical for harmonic pollution management in industrial power grids. Existing non-invasive methods rely heavily on restrictive assumptions that are rarely satisfied in practice, and conventional filtering-based approaches suffer from accuracy degradation in dynamic scenarios due to fixed-rule updates of the noise covariance. This paper proposes a deep reinforcement learning (RL)-optimized adaptive extended Kalman filter (AEKF) method for robust harmonic impedance estimation. A state-space model is established without restrictive assumptions, and a deep Q-network (DQN) framework is designed to optimize noise covariance updates adaptively. Simulation results show that the method achieves reliable estimation under normal conditions. Although errors rise under strong noise, it remains stable and exhibits better noise robustness than conventional methods. Field measurements in actual power grid environments further verified the feasibility and application potential of the proposed method in field engineering.

(This article belongs to the Special Issue Reinforcement Learning: Emerging Techniques and Future Prospects)
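
To make the idea of optimizing noise covariance updates concrete, here is a minimal sketch of a Kalman filter step in which a discrete action index rescales the measurement-noise covariance R. It uses a linear measurement model (the case where the EKF Jacobian is simply H) and a random-walk state for a constant impedance; the action set, the scaling scheme, and all numbers are assumptions rather than the paper's formulation, and the DQN that would choose `action` is not shown.

```python
import numpy as np

# Hypothetical discrete actions: multipliers a trained DQN could pick to rescale R.
R_SCALES = np.array([0.25, 0.5, 1.0, 2.0, 4.0])


def kf_step(x, P, z, H, Q, R_base, action):
    """One predict/update step for a random-walk state observed as z = H x + v.
    The measurement-noise covariance is set to R = R_SCALES[action] * R_base."""
    P = P + Q                                   # predict (state mean is unchanged)
    R = R_SCALES[action] * R_base
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ (z - H @ x)                     # measurement update
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


rng = np.random.default_rng(0)
truth = np.array([2.0, 5.0])   # hypothetical harmonic resistance and reactance (ohm)
H, Q, R_base = np.eye(2), 1e-6 * np.eye(2), 0.5 * np.eye(2)
x, P = np.zeros(2), 10.0 * np.eye(2)
for _ in range(200):
    z = truth + rng.normal(0.0, np.sqrt(0.5), size=2)
    x, P = kf_step(x, P, z, H, Q, R_base, action=2)  # fixed action in place of a policy
print(np.round(x, 2))
```

Larger multipliers shrink the Kalman gain, so the filter trusts each new measurement less; that is the lever an RL agent could adjust when noise conditions change, in place of a fixed update rule.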

49 pages, 37729 KB

Open Access Feature Paper Article

Comparative Evaluation of Classical, Hybrid, and RL-Based 3D Trajectory Planning for Multi-UAV Systems

by Ilya Mashkov, Angelika Kochetkova, Valerii Serpiva, Grigoriy Yashin and Pavel Golikov

Drones 2026, 10(6), 452; https://doi.org/10.3390/drones10060452 (registering DOI) - 9 Jun 2026

Viewed by 154

Abstract

This study investigates offline trajectory planning strategies for multi-UAV missions in complex 3D environments, with the aim of systematically comparing classical, hybrid, and reinforcement learning-based approaches under unified evaluation conditions. Two simulation scenarios were considered: an uneven terrain environment with elevation-induced constraints and a planar obstacle-rich environment. The evaluated planners include graph-based (A*), sampling-based (RRT, RRT*), gradient-based (APF), a hybrid APF B-RRT* method, and a DQN-based reinforcement learning planner with spatial attention and reward shaping. Performance was assessed using geometric, safety, energetic, and computational metrics. The results show that A* consistently produces the shortest and most stable trajectories with low energy consumption, but at an increased computational cost in high-resolution environments. Sampling-based planners exhibit higher variability and planning time, while APF achieves computational efficiency but may violate safety margins. The hybrid planner provides improved robustness across scenarios. The reinforcement learning planner demonstrates consistent safety compliance and strong inter-UAV separation in both environments, albeit with longer trajectories and higher energy usage. Overall, the study highlights trade-offs between determinism, scalability, safety, and adaptability across planning paradigms.

(This article belongs to the Special Issue Advances in Cartography, Mission Planning, Path Search, and Path Following for Drones: 2nd Edition)
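
For reference, the graph-based baseline in this comparison, A*, can be sketched on a 2D occupancy grid with 4-connected moves, unit step costs, and an admissible Manhattan heuristic. The paper plans in 3D with its own cost model; extending this sketch would mean adding vertical neighbors and a matching heuristic, which is not shown here.

```python
import heapq


def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = free, 1 = obstacle), 4-connected, unit step cost.
    Returns the list of cells from start to goal, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible and consistent for unit 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > g[cell]:
            continue  # stale heap entry superseded by a cheaper route
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(astar(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

Because the heuristic never overestimates, the returned path is optimal, which matches the determinism the abstract attributes to A*; the cost is that the number of expanded cells grows with grid resolution.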

22 pages, 10692 KB

Open Access Article

Research on Auxiliary Decision-Making System for Manned Underwater Vehicle Damage Management Based on Deep Reinforcement Learning

by Qingchao Xu, Hui Feng, Haixiang Xu, Fang Tang, Yong Wang, Yifeng Chen and Liping Zhou

Sensors 2026, 26(12), 3678; https://doi.org/10.3390/s26123678 - 9 Jun 2026

Viewed by 183

Abstract

In underwater navigation, manned underwater vehicles (MUVs) risk damage from obstacles and equipment. Effective damage management supports timely decisions and maximizes functionality recovery. Existing approaches can be roughly categorized into rule-based reasoning, case-based reasoning and expert systems. However, the primary limitation of the existing approaches is their inability to adapt to dynamically changing scenarios. In this paper, an auxiliary decision-making system (ADMS) for MUV damage management based on deep reinforcement learning (DRL) is proposed to address the problem of cabin flooding. This system is designed to provide auxiliary decision-making in emergency situations and help preserve MUV vitality. Furthermore, a comprehensive States–Actions cluster encompassing various damage management measures for real damage scenarios is constructed and digitized. Moreover, several novel reward functions are developed to ensure that the DRL model learns a safe strategy for ADMS operations. Finally, the MUV buoyancy and stability vitality evaluation criteria are defined and analyzed. The simulation results show that the auxiliary decision-making measures given by the ADMS in the damage state are effective and rational. The evaluation criterion for buoyancy vitality can exceed 38%, while the criterion for stability vitality can surpass 92%, with an optimal value exceeding 99%.

(This article belongs to the Section Intelligent Sensors)

30 pages, 11873 KB

Open Access Article

Unsupervised Oil Spill Detection in Shipborne Radar Imagery Using Autoencoder-Enhanced Q-Learning and Improved Bat Optimization

by Jin Yan, Binghui Chen, Jin Xu, Zekun Guo, Minghao Yan, Mengxin Sun and Lin Qiao

Remote Sens. 2026, 18(12), 1876; https://doi.org/10.3390/rs18121876 - 7 Jun 2026

Viewed by 217

Abstract

Marine oil spill accidents pose a serious threat to the marine ecological environment. Therefore, efficient and accurate oil spill detection is of great significance for emergency response. To address the issues of blurred oil-slick boundaries, prominent co-frequency interference and severe speckle noise in shipborne radar images, this study proposed an oil spill detection method based on radar data collected from a real oil spill event at a terminal in Dalian Bay. The proposed method integrates an autoencoder, feature dimensionality reduction, pseudo-labeling, reinforcement learning and an improved intelligent optimization algorithm. First, an autoencoder was adopted to extract compact nonlinear local features from the radar images, and principal component analysis (PCA) was employed for feature dimensionality reduction. Subsequently, K-Means clustering was used to construct pseudo-labels, and the reduced features were discretized to build the state space for reinforcement learning. Based on this, the Q-learning algorithm was introduced to automatically extract the region of interest (ROI). Finally, for the ROI, an improved bat algorithm incorporating a dynamic weighting factor and a multi-constraint fitness function was designed to achieve fine segmentation of the oil-slick target. The experimental results showed that the proposed method outperformed classic intelligent optimization algorithms and the conventional bat optimization algorithm in oil-slick segmentation performance. Ablation experiments further verified the effectiveness of autoencoder-based feature learning, K-Means pseudo-labeling, and Q-learning-based ROI localization. This method may provide a new technical approach for timely offshore oil spill monitoring and emergency analysis.

(This article belongs to the Special Issue Advances in Deep Learning and Machine Learning for Remote Sensing Image Analysis)
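
The step that turns autoencoder features into a reinforcement-learning state space can be sketched as PCA followed by per-dimension quantile binning, so that each reduced feature vector maps to one row of a Q-table. This is an illustrative assumption about how the discretization might work; the random 16-dimensional codes stand in for real autoencoder outputs, the three-action Q-table is hypothetical, and K-Means pseudo-labeling, the Q-learning ROI search, and the improved bat algorithm are omitted.

```python
import numpy as np


def pca_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project row-wise feature vectors onto their top-k principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def discretize_states(reduced: np.ndarray, bins_per_dim: int) -> np.ndarray:
    """Map each reduced vector to one integer state using per-dimension quantile bins."""
    n, k = reduced.shape
    state = np.zeros(n, dtype=int)
    for d in range(k):
        inner_edges = np.quantile(reduced[:, d], np.linspace(0, 1, bins_per_dim + 1)[1:-1])
        idx = np.digitize(reduced[:, d], inner_edges)   # bin index 0 .. bins_per_dim - 1
        state = state * bins_per_dim + idx              # mixed-radix encoding across dims
    return state


rng = np.random.default_rng(0)
codes = rng.normal(size=(500, 16))                      # stand-in for autoencoder codes
reduced = pca_reduce(codes, k=2)
states = discretize_states(reduced, bins_per_dim=4)     # 4 x 4 = 16 discrete states
q_table = np.zeros((4 ** 2, 3))                         # hypothetical: 3 actions per state
print(reduced.shape, states.min(), states.max(), q_table.shape)
```

Quantile edges keep the bins roughly equally populated, which helps every Q-table row receive updates; the price is that the table grows as `bins_per_dim ** k`, which is one reason to reduce dimensionality first.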

26 pages, 628 KB

Open Access Article

A Two-Stage PPO–RLMPA Framework for Dynamic Economic Dispatch with Renewable Energy and Storage Integration

by Kemal Keskin

Biomimetics 2026, 11(6), 400; https://doi.org/10.3390/biomimetics11060400 - 6 Jun 2026

Viewed by 186

Abstract

The Dynamic Economic Dispatch (DED) problem underpins the cost-efficient and reliable operation of modern power systems, yet valve-point loading, ramp-rate coupling, and the growing share of intermittent wind, photovoltaic, and pumped-storage hydro (PSH) resources render it highly non-convex. Metaheuristic methods typically require large computational budgets and hand-crafted constraint-handling rules, whereas deep reinforcement learning agents rarely guarantee the feasibility of the schedules they produce. To address both limitations, this paper proposes a Two-Stage PPO–RLMPA framework that couples data-driven policy learning with a biomimetic metaheuristic search inspired by marine predator–prey dynamics. In the first stage, a Proximal Policy Optimization (PPO) agent is trained on a Markov Decision Process reformulation of DED in which a deterministic Safety Layer projects every raw action onto the feasible set defined by capacity, ramp-rate, and power-balance constraints, so the policy only observes physically viable transitions. In the second stage, the PPO dispatch is refined by the RLMPA module, a Marine Predators Algorithm (MPA) whose exploration–exploitation balance, Lévy-flight foraging, and Fish Aggregating Devices (FADs) attraction mechanisms emulate strategies documented in marine ecosystems; its step-size factor and FADs probability are further adapted online by a Deep Q-Network. This biomimetics-informed refinement translates predator–prey foraging intelligence into economically efficient thermal dispatch under valve-point non-convexity. Across 30 independent runs on ten- and twenty-unit benchmark systems with wind, PV, and PSH integration, the framework attains best costs of USD 368,763 and USD 737,348 on Test Systems 1 and 2, corresponding to reductions of approximately 1.1% and 4.4% over the CFCEP baseline, with zero post-repair constraint violations in every run.

(This article belongs to the Special Issue Nature-Inspired Sustainable Engineering)
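
A Safety Layer of the kind described in this abstract can be approximated by a simple clip-and-repair rule: clip each unit to the tighter of its capacity and ramp-rate limits, then spread any power-balance residual over units in proportion to their remaining headroom. This sketch is an assumption, not the paper's layer (in particular it is not an exact Euclidean projection), `net_demand` is assumed to be the load minus the wind, PV, and PSH contributions, and the three-unit data are hypothetical.

```python
import numpy as np


def safety_layer(raw, p_prev, p_min, p_max, ramp_up, ramp_down, net_demand):
    """Map a raw thermal dispatch (MW) to one satisfying capacity, ramp-rate and
    power-balance constraints. Clip-and-repair sketch, not an exact projection."""
    lo = np.maximum(p_min, p_prev - ramp_down)  # tightest lower bound per unit
    hi = np.minimum(p_max, p_prev + ramp_up)    # tightest upper bound per unit
    if net_demand < lo.sum() or net_demand > hi.sum():
        raise ValueError("net demand cannot be met within capacity and ramp limits")
    p = np.clip(raw, lo, hi)
    residual = net_demand - p.sum()
    # Spread the balance residual over units in proportion to their remaining headroom;
    # every unit stays inside [lo, hi] because total headroom >= |residual|.
    room = hi - p if residual > 0 else p - lo
    if residual != 0 and room.sum() > 0:
        p = p + np.sign(residual) * room * (abs(residual) / room.sum())
    return p


p_prev = np.array([100.0, 150.0, 200.0])
dispatch = safety_layer(raw=np.array([150.0, 140.0, 250.0]), p_prev=p_prev,
                        p_min=np.full(3, 50.0), p_max=np.array([200.0, 250.0, 300.0]),
                        ramp_up=np.full(3, 30.0), ramp_down=np.full(3, 30.0),
                        net_demand=480.0)
print(dispatch.round(2), round(dispatch.sum(), 6))
```

Here the clipped dispatch totals 500 MW, so the 20 MW surplus is removed in proportion to each unit's distance from its lower bound (60, 20, and 60 MW of room), keeping all three units within their ramp windows.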

32 pages, 4524 KB

Open Access Article

An Anomaly-Aware, Q-Learning Framework for Real-Time Scheduling in Multi-Station EV Charging Networks

by Md Sabbir Hossen, Gobbi Ramasamy, Ngu Eng Eng and Marran Al Qwaid

Electronics 2026, 15(11), 2494; https://doi.org/10.3390/electronics15112494 - 5 Jun 2026

Viewed by 135

Abstract

Electric vehicle (EV) charging networks face major operational challenges, including demand uncertainty, peak-load congestion, and anomalous charging behavior, particularly in multi-station environments. This study proposes an anomaly-aware Q-learning framework for real-time scheduling in multi-station EV charging systems by integrating short-term load forecasting, anomaly detection, and intelligent scheduling within a unified operational pipeline. The framework combines Prophet, XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models for short-term demand forecasting, while Convolutional Neural Networks (CNN), Autoencoders, and Isolation Forests are employed for anomaly detection. Forecasting and anomaly information are incorporated into a Q-learning scheduler to support adaptive charger allocation and congestion management. Evaluation using a four-year, real-world dataset comprising more than 2000 EV charging sessions demonstrates improved scheduling performance, achieving reductions in peak load and waiting time while improving energy delivery consistency. The framework further demonstrates low scheduling latency, supporting suitability for real-time deployment in OCPP-compliant smart charging infrastructures.

(This article belongs to the Section Systems & Control Engineering)
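
The core of a tabular Q-learning scheduler is the epsilon-greedy action choice and the one-step temporal-difference update. In the minimal sketch below, the state tuple (a forecast load bucket plus an anomaly flag) and the station-assignment actions are hypothetical stand-ins for how forecasting and anomaly outputs might enter the state; the abstract does not specify the state encoding or the reward design.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2]  # hypothetical: index of the station the next EV is assigned to


class QScheduler:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))  # unseen states start at 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)              # explore
        values = self.q[state]
        return max(ACTIONS, key=lambda a: values[a])     # exploit (first max on ties)

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


sched = QScheduler(epsilon=0.0)          # greedy, so the example is deterministic
s = ("high_load", False)                 # (forecast bucket, anomaly flag), hypothetical
a = sched.act(s)                         # 0: all Q-values tie at zero
sched.update(s, a, reward=-1.0, next_state=("low_load", False))  # e.g. a waiting penalty
print(sched.q[s])                        # [-0.1, 0.0, 0.0]
```

Folding the anomaly flag into the state lets the learned table keep separate action values for normal and anomalous conditions, at the cost of multiplying the number of states the scheduler must visit.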

30 pages, 1967 KB

Open Access Article

Optimizing Spatial State Representation in Reinforcement Learning for Coverage Path Planning in UAV Search Missions

by Hu Yuan, Shengkai Yan, Zhuzhi Liu, Suli Wang, Qiang Wang and Gaocheng Chen

Drones 2026, 10(6), 442; https://doi.org/10.3390/drones10060442 - 5 Jun 2026

Viewed by 229

Abstract

To enhance path planning efficiency in unmanned aerial vehicle (UAV) search missions in complex environments, this paper proposes a coverage path planning (CPP) algorithm for a UAV that integrates the deep Q-network (DQN) with the A* algorithm (DQN-A*). In the proposed DQN-A* algorithm, a dual-driven reward mechanism is established, comprising a probability-weighted reward and a step-dependent reward, steering the UAV toward high-probability regions. Furthermore, to handle previously unknown obstacles in real time, the algorithm employs a multi-stage obstacle-identification strategy, enabling the UAV to improve coverage of traversable cells by dynamically adjusting its local path when newly detected obstacles are encountered. A theoretical analysis derives a principled recommended range for the UAV positional identifier based on statistical feature analysis; this range is then validated through extensive simulations. Additionally, Hamiltonian path pre-training is introduced to accelerate convergence. Comparative simulations demonstrate that the proposed DQN-A* algorithm achieves higher area-coverage and target-detection probabilities than benchmark algorithms in environments with unknown obstacles, offering valuable insights for positional encoding in deep reinforcement learning (DRL)-based robotic coverage problems.

9 pages, 1097 KB

Open Access Proceeding Paper

A Reinforcement Learning-Based Adaptive Voltage Regulation Strategy for Wind Energy Integrated Distribution Networks

by Ramesh Kumar Behara and Akshay Kumar Saha

Eng. Proc. 2026, 140(1), 56; https://doi.org/10.3390/engproc2026140056 - 5 Jun 2026

Viewed by 124

Abstract

The inherent variability of wind power generation poses major challenges for maintaining voltage stability and power quality in modern distribution networks. Conventional rule-based and optimisation-driven control strategies often fail to respond effectively to these rapid fluctuations. To address this limitation, this paper introduces an adaptive reinforcement learning (RL) framework that autonomously optimises reactive power compensation and on-load tap changer (OLTC) operations in real time. The proposed deep Q-network (DQN) agent learns optimal control policies through continuous interaction with the grid environment, minimising voltage deviations and network losses under dynamic wind conditions. Using the IEEE 33-bus distribution test system, the trained DQN achieved a substantial improvement in voltage regulation, reducing the average deviation from 0.041 p.u. (rule-based) to 0.014 p.u. and lowering power losses by 24.6% compared to traditional optimisation techniques such as Particle Swarm Optimisation (PSO) and static rule-based control. Furthermore, the DQN controller showed the fastest learning convergence, within 120 episodes, validating its potential for real-time adaptive voltage control. Overall, the study highlights RL as a promising, scalable solution for autonomous voltage regulation in smart grids integrated with renewables.
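
The voltage-deviation figures in this abstract correspond to a relative reduction of about 65.9%, and a DQN for joint OLTC and reactive-power control needs a finite, indexed action set. Both are sketched below; the tap moves and compensator setpoints are hypothetical values chosen for illustration, not those of the paper.

```python
from itertools import product

# Average voltage deviation reported in the abstract (p.u.).
rule_based_dev, dqn_dev = 0.041, 0.014
relative_reduction = (rule_based_dev - dqn_dev) / rule_based_dev
print(f"{relative_reduction:.1%}")  # 65.9%

# Hypothetical discrete action space for a DQN output head: every combination of an
# OLTC tap move and a reactive-power setpoint becomes one action index.
TAP_MOVES = (-1, 0, 1)                    # lower / hold / raise by one tap step
Q_SETPOINTS_MVAR = (0.0, 0.1, 0.2, 0.3)   # illustrative compensator setpoints
ACTIONS = list(product(TAP_MOVES, Q_SETPOINTS_MVAR))
print(len(ACTIONS), ACTIONS[5])           # 12 (0, 0.1)
```

Enumerating the Cartesian product keeps the problem compatible with a standard DQN (one output per joint action), but the head size grows multiplicatively with each added control device.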

58 pages, 22507 KB

Open Access Article

Adaptive Traffic Signal Control Using Multi-Agent Reinforcement Learning: A Comparison of Control Strategies

by Mahmoud Owais, Badr O. Mohammed, Abdulrahman A. Kamal, Abdulrahman Shaban, Ahmed H. Mostafa, Kareem Hatem, John Emad, Salah T. Younis, Samia A. Ali, Alaa E. Abdel-Hakim and Islam M. Alkabbany

Sustainability 2026, 18(11), 5702; https://doi.org/10.3390/su18115702 - 4 Jun 2026

Viewed by 1088

Abstract

Urban traffic congestion remains a persistent challenge for conventional fixed-time signal control, particularly under fluctuating and asymmetric demand. Although multi-agent reinforcement learning (MARL) has shown promise for adaptive traffic signal control, previous studies have often focused on isolated intersections, simplified synthetic networks, or deep-learning-based controllers without systematically comparing tabular and deep-value-based multi-agent approaches under equivalent operating conditions. This study addresses this gap by comparing three traffic signal control strategies: fixed-time control, Multi-Agent Tabular Q-Learning, and multi-agent Deep Q-Network control (MADQN). The evaluation was conducted in a microscopic traffic simulation environment using two complementary testbeds: a synthetic two-intersection corridor, which enables controlled analysis of multi-agent coordination, and a real-world digital twin of the 25 January Corridor in Assiut, Egypt, which tests controller robustness under asymmetric geometry and realistic turning movements. The controllers are assessed under low-, medium-, and high-demand scenarios using queue length, cumulative delay, and Time-To-Collision as operational and safety-related indicators. The results show that MARL-based controllers generally outperform fixed-time control, but their relative performance depends on demand intensity and network complexity. MADQN provides stronger generalization in low-demand and queue-dissipation conditions, whereas Tabular Q-Learning remains highly competitive and can achieve superior delay reduction in several medium- and high-demand cases. These findings indicate that deeper MARL architectures are not universally superior; rather, adaptive signal control deployment should match the controller architecture to the operational objective, traffic demand regime, and practical complexity of the target corridor.

(This article belongs to the Special Issue Sustainable and Smart Transportation Systems)
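
Time-To-Collision, one of the safety indicators used in this study, is commonly computed for a car-following pair as the gap divided by the closing speed, and is undefined (treated here as infinite) when the follower is not closing in on the leader. The sketch below applies it to hypothetical vehicle pairs; the 1.5 s critical threshold is an assumption, since the abstract does not state which threshold was used.

```python
import math


def time_to_collision(gap_m: float, v_follower: float, v_leader: float) -> float:
    """TTC (s) for a car-following pair: bumper-to-bumper gap / closing speed.
    Returns infinity when the follower is not closing in on the leader."""
    closing_speed = v_follower - v_leader
    return gap_m / closing_speed if closing_speed > 0 else math.inf


# Hypothetical (gap m, follower m/s, leader m/s) pairs sampled from one simulation step.
pairs = [(20.0, 15.0, 10.0), (5.0, 12.0, 8.0), (30.0, 10.0, 12.0)]
ttcs = [time_to_collision(*p) for p in pairs]
print(ttcs)  # [4.0, 1.25, inf]
THRESHOLD_S = 1.5  # assumed critical-TTC threshold; studies choose their own value
print(sum(t < THRESHOLD_S for t in ttcs))  # 1
```

Counting pairs below the threshold at each simulation step turns TTC into an aggregate conflict measure that can be compared across controllers alongside queue length and delay.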

39 pages, 3075 KB

Open Access Article

From Statistical Filtering to Adaptive Reinforcement Learning: A Progressive Framework for IoT Time-Series Anomaly Detection

by Luis Miguel Pires and Vitor Fialho

Appl. Sci. 2026, 16(11), 5608; https://doi.org/10.3390/app16115608 - 3 Jun 2026

Viewed by 181

Abstract

This paper proposes a lightweight and adaptive anomaly detection framework for Internet of Things (IoT) time-series data that progressively combines statistical filtering with reinforcement learning (RL)-based decision mechanisms. Three classical statistical filters, Hampel, interquartile range (IQR), and Z-score, are initially evaluated under controlled IoT anomaly scenarios. While fixed-parameter configurations perform well under specific conditions, their performance degrades in non-stationary and heterogeneous environments. To address this limitation, a tabular Q-learning agent is introduced to dynamically select both filtering methods and their associated parameters according to scenario-specific signal characteristics. By extending the action space to include joint filter and parameter selection, the framework improves adaptability while reducing the need for manual tuning. A multi-agent reinforcement learning (MARL) formulation is further introduced to support distributed learning across heterogeneous IoT environments. The framework is additionally evaluated using real-world IoT temperature data augmented with controlled anomaly injection, enabling reproducible benchmarking under partially realistic sensing conditions. Experimental results show that both RL and MARL maintain stable detection performance across heterogeneous sensor streams. While MARL does not systematically outperform the single-agent approach in detection accuracy, it improves scalability and supports scenario-specific policy specialization, which is particularly relevant for distributed IoT deployments. Overall, the proposed approach provides a lightweight, interpretable, and computationally efficient solution for adaptive anomaly detection in resource-constrained IoT systems.

(This article belongs to the Special Issue Software Engineering: Computer Science and System 2026)
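
The three baseline filters named in this abstract can be sketched with NumPy as follows; the thresholds (3 standard deviations, 1.5 × IQR, a 7-sample Hampel window with 3 scaled MADs) are common defaults assumed here, not the paper's settings, and the temperature series is made up. In the framework described, a Q-learning agent's action would be a (filter, parameter) pair chosen from a set like this.

```python
import numpy as np


def zscore_flags(x, threshold=3.0):
    """Flag points more than `threshold` population standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold


def iqr_flags(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def hampel_flags(x, half_window=3, n_sigmas=3.0):
    """Flag points deviating from the rolling median by more than n_sigmas * scaled MAD."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))  # MAD scaled to a Gaussian sigma
        flags[i] = mad > 0 and abs(x[i] - med) > n_sigmas * mad
    return flags


readings = np.array([21.0, 21.2, 20.9, 21.1, 21.3, 35.0, 21.0, 20.8, 21.2, 21.1])
for name, flags in [("hampel", hampel_flags(readings)), ("iqr", iqr_flags(readings)),
                    ("zscore", zscore_flags(readings)),
                    ("zscore@2.5", zscore_flags(readings, 2.5))]:
    print(name, np.flatnonzero(flags).tolist())
```

On this short series the Hampel and IQR filters both isolate the 35.0 spike, whereas the global Z-score at threshold 3 flags nothing: with ten samples, no point can lie more than √9 = 3 population standard deviations from the mean (Samuelson's inequality), so the spike stays masked until the threshold is lowered, for example to 2.5. This is the kind of parameter sensitivity that motivates adaptive filter and parameter selection.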

38 pages, 2515 KB

Open Access Feature Paper Article

Replacing the Genetic Algorithm with Multi-Objective Bacterial Foraging Optimization in XCS

by Damijan Novak, Iztok Fister and Jani Dugonik

Mathematics 2026, 14(11), 1947; https://doi.org/10.3390/math14111947 - 2 Jun 2026

Viewed by 154

Abstract

This article is positioned at the intersection of machine learning, cybersecurity, and nature-inspired domains. Its main objective is to take the eXtended Classifier System (XCS), a well-known adaptive Reinforcement Learning (RL) algorithm, and alter it to use the Bacterial Foraging Optimization Algorithm (BFOA) instead of its original Genetic Algorithm component. This modification transforms XCS into a multi-criteria optimization system (BFOA-XCS) by evaluating classifier fitness on accuracy, stability, and variance reduction, combined through weighted-sum scalarization. In this way, the method leverages BFOA’s chemotactic search and population dynamics. The proposed BFOA-XCS integration was validated in two experimental phases. First, evaluations across 19 benchmark machine learning datasets demonstrated that Improved BFOA (IBFOA)-XCS achieves the best Friedman ranking among all XCS variants (marginally significant at α = 0.10, supported by medium-to-large effect sizes), with notable variance reduction (15.2 percent) over standard GA-XCS. Second, in a dynamic cybersecurity simulation environment with six attack scenarios, all XCS variants significantly outperformed three of the five deep RL baselines (Deep Q-Network (DQN), Q-Learning, and Policy Gradient (REINFORCE)) with large statistical effect sizes. Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC) achieved higher overall rewards but at substantially greater computational cost: PPO at 5.3× and SAC at 26.1× the XCS compute time per run (2 min 8 s and 10 min 26 s, respectively, vs. 24 s for XCS). The results demonstrate that rule-based XCS with BFOA optimization offers a compelling alternative to neural approaches for cybersecurity defense, combining competitive performance with interpretable policies and substantially lower computational requirements.

(This article belongs to the Special Issue Swarm Intelligence and Optimization: Algorithms and Applications)
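
To show what chemotactic search means, here is a minimal sketch of one chemotaxis pass in the standard BFOA: each bacterium tumbles one step in a random unit direction and then keeps swimming that way while its cost improves. The reproduction and elimination-dispersal phases, the improved (IBFOA) variant, and the integration into XCS's rule discovery are all omitted; the sphere cost function and every parameter value are illustrative assumptions.

```python
import numpy as np


def chemotaxis(cost, positions, step=0.1, swim_length=4, rng=None):
    """One chemotactic pass of BFOA over a population of bacteria (rows of `positions`).
    Each bacterium tumbles (one step in a random unit direction), then keeps swimming
    in that direction while its cost improves, for at most `swim_length` extra steps."""
    rng = np.random.default_rng() if rng is None else rng
    new_positions = positions.astype(float)  # astype returns a copy; input is untouched
    for i in range(len(new_positions)):
        x = new_positions[i]
        j_last = cost(x)
        direction = rng.normal(size=x.shape)
        direction /= np.linalg.norm(direction)
        x = x + step * direction             # tumble
        j = cost(x)
        for _ in range(swim_length):         # swim while the cost keeps improving
            if j >= j_last:
                break
            j_last = j
            x = x + step * direction
            j = cost(x)
        new_positions[i] = x
    return new_positions


def sphere(x):
    return float(np.sum(x ** 2))


rng = np.random.default_rng(1)
population = rng.uniform(-2.0, 2.0, size=(6, 2))
moved = chemotaxis(sphere, population, step=0.1, swim_length=4, rng=rng)
print(np.round(moved - population, 3))
```

Each bacterium therefore moves between one and `swim_length + 1` steps along a single direction, which gives BFOA its characteristic mix of random exploration (tumbles) and greedy exploitation (swims).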

33 pages, 3694 KB

Open Access Article

Spectral Efficiency Enhancement in V2X Communications via Joint Subcarrier Assignment and Power Allocation: A Multi-DQN Agent Approach

by Ahmed Ali Al-Masry, Michael Ibrahim, Hesham Elbadawy, Hadia El-Hennawy and Mehaseb Ahmed

Telecom 2026, 7(3), 66; https://doi.org/10.3390/telecom7030066 - 2 Jun 2026

Viewed by 256

Abstract

The rapid growth of interest in Vehicle-to-Everything (V2X) networks has created significant challenges for efficient radio resource management. This paper addresses the problem of joint subcarrier assignment and power allocation to maximize system spectral efficiency. First, the paper formulates joint subcarrier (resource) assignment and power allocation as a mathematical optimization problem, which is solved with conventional optimization methods to establish a baseline for performance benchmarking. To overcome the high computational complexity of traditional optimization, we then propose a Multi-Agent Deep Q-Network (Multi-DQN) framework based on deep reinforcement learning (DRL). The proposed agents learn optimal allocation strategies through interaction with the environment, enabling adaptive, real-time decision-making. System performance is evaluated in different environments under both line-of-sight (LOS) and non-line-of-sight (NLOS) conditions, addressing a gap in prior approaches. Simulation results show that the proposed Multi-DQN approach significantly outperforms the enhanced conventional benchmark, achieving higher spectral efficiency (SE) while substantially reducing computational complexity.
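As an illustration of the objective being maximized, the sketch below computes a sum spectral efficiency for a given subcarrier assignment and power allocation as the sum of per-subcarrier Shannon rates, SE = Σ log2(1 + p·g/σ²), in bit/s/Hz. It assumes orthogonal subcarriers with no co-channel interference and at most one user per subcarrier; the paper's actual V2X channel and interference model (including LOS/NLOS path loss) is not reproduced, and the function name and inputs are hypothetical.

```python
import numpy as np


def spectral_efficiency(assign, power, gain, noise_power):
    """Sum SE (bit/s/Hz, per-subcarrier bandwidth) of a subcarrier assignment.

    assign[u, k] = 1 if subcarrier k is assigned to user u, else 0
    power[u, k]  = transmit power of user u on subcarrier k (W)
    gain[u, k]   = channel power gain of user u on subcarrier k
    """
    assign = np.asarray(assign, dtype=float)
    if np.any(assign.sum(axis=0) > 1):
        raise ValueError("each subcarrier may be assigned to at most one user")
    # Unassigned (user, subcarrier) pairs get SNR 0 and contribute log2(1) = 0
    snr = assign * np.asarray(power) * np.asarray(gain) / noise_power
    return float(np.sum(np.log2(1.0 + snr)))


# Two users, three subcarriers (hypothetical values)
assign = np.array([[1, 0, 1],
                   [0, 1, 0]])
power = np.full((2, 3), 0.5)
gain = np.array([[2.0, 1.0, 6.0],
                 [1.0, 14.0, 1.0]])
se = spectral_efficiency(assign, power, gain, noise_power=1.0)
```

Here the assigned links have SNRs of 1, 7, and 3, giving 1 + 3 + 2 = 6 bit/s/Hz; an allocation agent's job is to choose `assign` and `power` so that this sum is as large as possible.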

(This article belongs to the Special Issue Wireless Communications for UAVs, IoT, 5G Technologies, Information and Coding Theory)
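One common way to let a DQN agent make a joint subcarrier-and-power decision is to flatten the pair into a single discrete action index and select it epsilon-greedily from the agent's Q-values. The sketch below assumes that encoding, one Q-vector per agent, and a hypothetical set of four power levels; random numbers stand in for Q-network outputs, and collision handling between agents that pick the same subcarrier is not shown. None of these details are taken from the paper.

```python
import numpy as np

# Hypothetical discrete transmit-power levels available to each agent (dBm)
POWER_LEVELS_DBM = (5.0, 10.0, 15.0, 23.0)


def action_to_allocation(index: int, num_subcarriers: int) -> tuple[int, float]:
    """Decode a flat action index into (subcarrier, transmit power in dBm)."""
    num_levels = len(POWER_LEVELS_DBM)
    if not 0 <= index < num_subcarriers * num_levels:
        raise ValueError("action index out of range")
    subcarrier, level = divmod(index, num_levels)
    return subcarrier, POWER_LEVELS_DBM[level]


def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Explore with probability epsilon, otherwise exploit the highest Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Each agent scores every joint (subcarrier, power) action with its own Q-network;
# random scores stand in for those network outputs here.
rng = np.random.default_rng(42)
num_agents, num_subcarriers = 3, 4
q_per_agent = rng.normal(size=(num_agents, num_subcarriers * len(POWER_LEVELS_DBM)))
decisions = [action_to_allocation(epsilon_greedy(q, 0.1, rng), num_subcarriers)
             for q in q_per_agent]
```

Flattening keeps the action space discrete, which is what a standard DQN output layer requires, at the cost of an action count that grows as (number of subcarriers) × (number of power levels).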

32 pages, 8108 KB

Open Access Article

Deep Q-Network-Based Backstepping Controller for Synchronization of Fractional-Order Chaotic System

by Murat Erhan Çimen

Appl. Sci. 2026, 16(11), 5536; https://doi.org/10.3390/app16115536 - 2 Jun 2026

Viewed by 116

Abstract

This study presents a reinforcement learning-based approach for the adaptive control of chaotic systems, in which a Deep Q-Network (DQN) adjusts the parameters of a nonlinear backstepping controller to maximize a predefined reward function. The proposed method is applied to a fractional-order version of a chaotic system previously introduced in the literature as an integer-order system. A comprehensive dynamical analysis of the system is conducted for different fractional orders, including phase portraits, bifurcation diagrams, and Lyapunov exponents. A nonlinear backstepping controller is then designed for the secondary system to achieve primary–secondary synchronization. The main novelty of this study lies in integrating a DQN-assisted backstepping controller to achieve synchronization across several fractional orders, specifically 0.999, 0.99, and 0.95. The results demonstrate that the DQN-based backstepping controller successfully achieves synchronization despite the challenges posed by the different fractional-order chaotic dynamics and outperforms the conventional backstepping controller. Furthermore, the chattering amplitude inherent in the conventional control law is significantly reduced. The evolution of the controller parameters, reward values, cumulative rewards, and loss values during training is presented and discussed in detail. Future studies will extend this approach to other reinforcement learning algorithms and nonlinear adaptive control systems.
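Simulating a fractional-order system requires a numerical fractional derivative. The abstract does not state which scheme is used; the sketch below assumes the widely used Grünwald–Letnikov (GL) approximation, D^α f(t_n) ≈ h^(−α) Σ_j w_j f(t_n − jh) with w_j = (−1)^j C(α, j), and checks it against the closed form D^α t = t^(1−α)/Γ(2−α) at the three fractional orders named in the abstract. Function names are hypothetical.

```python
import numpy as np
from math import gamma


def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grünwald–Letnikov weights w_j = (-1)^j * C(alpha, j), j = 0..n, via recurrence."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w


def gl_derivative(samples: np.ndarray, alpha: float, h: float) -> float:
    """Approximate D^alpha f(t_n), given f sampled at t_0 = 0, h, ..., t_n = n*h."""
    n = len(samples) - 1
    w = gl_weights(alpha, n)
    # Pair w_j with f(t_n - j*h) by reversing the sample history
    return float(np.dot(w, samples[::-1])) / h**alpha


# Compare with the closed form D^alpha t = t^(1 - alpha) / Gamma(2 - alpha) at t = 1
h = 1e-3
t = np.arange(0, 1001) * h
for alpha in (0.999, 0.99, 0.95):
    approx = gl_derivative(t, alpha, h)
    exact = 1.0 / gamma(2.0 - alpha)
    print(f"alpha={alpha}: GL={approx:.5f}, exact={exact:.5f}")
```

The GL sum uses the whole sample history (the "memory" of a fractional derivative), so its cost grows with simulation length; for α = 1 the weights reduce to (1, −1, 0, …) and the formula collapses to an ordinary backward difference.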

(This article belongs to the Section Electrical, Electronics and Communications Engineering)
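A DQN outputs discrete actions, so one plausible way for it to "adjust the parameters" of a backstepping controller is to map each action to a small increment of one gain and to reward small synchronization error. The sketch below assumes two hypothetical gains (k1, k2), a step of 0.1, bounds that keep the gains positive (as standard backstepping designs require), and a reward equal to the negative squared error norm; the paper's actual action set, gain bounds, and reward function are not given in the abstract.

```python
import numpy as np

# Hypothetical discrete action set: each action nudges one backstepping gain
GAIN_DELTAS = np.array([
    [0.0, 0.0],    # 0: keep current gains
    [+0.1, 0.0],   # 1: increase k1
    [-0.1, 0.0],   # 2: decrease k1
    [0.0, +0.1],   # 3: increase k2
    [0.0, -0.1],   # 4: decrease k2
])


def apply_gain_action(gains: np.ndarray, action: int,
                      k_min: float = 0.1, k_max: float = 10.0) -> np.ndarray:
    """Apply a discrete DQN action to the gains, keeping them positive and bounded."""
    return np.clip(gains + GAIN_DELTAS[action], k_min, k_max)


def sync_reward(error) -> float:
    """Hypothetical reward: negative squared norm of the primary–secondary error."""
    e = np.asarray(error, dtype=float)
    return -float(e @ e)


# A short sequence of actions, as a trained agent might choose them
gains = np.array([1.0, 1.0])
for action in (1, 1, 3):
    gains = apply_gain_action(gains, action)
```

Clipping rather than rejecting out-of-range actions keeps every action valid, so the Q-network never has to learn which actions are illegal in a given state.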