Search Results (2,989)

Search Parameters:
Keywords = deep learning training optimization

23 pages, 2274 KB  
Article
A Modular Reinforcement Learning Framework for Iterative FPS Agent Development
by Soohwan Lee and Hanul Sung
Electronics 2026, 15(3), 519; https://doi.org/10.3390/electronics15030519 - 26 Jan 2026
Abstract
Deep reinforcement learning (DRL) has been widely adopted to solve decision-making problems in complex environments, demonstrating high performance across various domains. However, DRL-based FPS agents are typically trained with a traditional, monolithic policy that integrates heterogeneous functionalities into a single network. This design hinders policy interpretability and severely limits structural flexibility, since even minor design changes in the action space often necessitate complete retraining of the entire network. These constraints are particularly problematic in game development, where behavioral characteristics are distinct and design updates are frequent. To address these issues, this study proposes a Modular Reinforcement Learning (MRL) framework. Unlike monolithic approaches, this framework decomposes complex agent behaviors into semantically distinct action modules, such as movement and attack, which are optimized in parallel with specialized reward structures. Each module learns a policy specialized for its own behavioral characteristics, and the final agent behavior is obtained by combining the outputs of these modules. This modular design enhances structural flexibility by allowing selective modification and retraining of specific functions, thereby reducing the inefficiency associated with retraining a monolithic policy. Experimental results on the 1-vs-1 training map show that the proposed modular agent achieves a maximum win rate of 83.4% against a traditional monolithic policy agent, demonstrating superior in-game performance. In addition, the retraining time required for modifying specific behaviors is reduced by up to 30%, confirming improved efficiency for development environments that require iterative behavioral updates. Full article
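
The modular decomposition described in this abstract can be illustrated with a minimal sketch (not the authors' implementation): separate policy modules for movement and attack, each with its own small network, are queried independently and their outputs combined into one composite action. Module names, action counts, and the observation size are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ActionModule(nn.Module):
    """One behaviour module with its own small policy head and reward stream."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

class ModularAgent:
    """Combines per-module actions into one composite agent action."""
    def __init__(self, obs_dim: int):
        self.action_modules = {
            "movement": ActionModule(obs_dim, n_actions=5),  # assumed: 4 directions + stay
            "attack": ActionModule(obs_dim, n_actions=2),    # assumed: hold fire / fire
        }

    def act(self, obs: torch.Tensor) -> dict:
        # Each module is queried independently, so a change to one behaviour only
        # requires retraining that module rather than the whole monolithic policy.
        return {name: m(obs).sample().item() for name, m in self.action_modules.items()}

agent = ModularAgent(obs_dim=16)
print(agent.act(torch.randn(16)))
```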

26 pages, 2618 KB  
Article
A Cascaded Batch Bayesian Yield Optimization Method for Analog Circuits via Deep Transfer Learning
by Ziqi Wang, Kaisheng Sun and Xiao Shi
Electronics 2026, 15(3), 516; https://doi.org/10.3390/electronics15030516 - 25 Jan 2026
Abstract
In nanometer integrated-circuit (IC) manufacturing, advanced technology scaling has intensified the effects of process variations on circuit reliability and performance. Random fluctuations in parameters such as threshold voltage, channel length, and oxide thickness further degrade design margins and increase the likelihood of functional failures. These variations often lead to rare circuit failure events, underscoring the importance of accurate yield estimation and robust design methodologies. Conventional Monte Carlo yield estimation is computationally infeasible as millions of simulations are required to capture failure events with extremely low probability. This paper presents a novel reliability-based circuit design optimization framework that leverages deep transfer learning to improve the efficiency of repeated yield analysis in optimization iterations. Based on pre-trained neural network models from prior design knowledge, we utilize model fine-tuning to accelerate importance sampling (IS) for yield estimation. To improve estimation accuracy, adversarial perturbations are introduced to calibrate uncertainty near the model decision boundary. Moreover, we propose a cascaded batch Bayesian optimization (CBBO) framework that incorporates a smart initialization strategy and a localized penalty mechanism, guiding the search process toward high-yield regions while satisfying nominal performance constraints. Experimental validation on SRAM circuits and amplifiers reveals that CBBO achieves a computational speedup of 2.02×–4.63× over state-of-the-art (SOTA) methods, without compromising accuracy and robustness. Full article
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)
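
As a hedged illustration of why importance sampling is used for yield estimation (a toy 1-D model, not the paper's neural-network-accelerated method), the sketch below compares plain Monte Carlo with a mean-shifted importance-sampling estimator for a rare failure event; the failure threshold and proposal distribution are assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
shift = 4.26                              # assumed failure boundary of a 1-D toy "circuit"
fail = lambda x: x > shift                # rare failure event, P ~ 1e-5 under N(0, 1)

# Plain Monte Carlo: with 1e5 samples we expect only about one failure hit.
x_mc = rng.standard_normal(100_000)
p_mc = fail(x_mc).mean()

# Importance sampling: draw from a proposal centred on the failure boundary
# and reweight each sample by the density ratio N(0, 1) / N(shift, 1).
x_is = rng.normal(loc=shift, scale=1.0, size=100_000)
weights = norm.pdf(x_is) / norm.pdf(x_is, loc=shift)
p_is = np.mean(fail(x_is) * weights)

print(f"exact tail probability  {norm.sf(shift):.3e}")
print(f"plain Monte Carlo       {p_mc:.3e}")
print(f"importance sampling     {p_is:.3e}")
```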

20 pages, 1978 KB  
Article
UAV-Based Forest Fire Early Warning and Intervention Simulation System with High-Accuracy Hybrid AI Model
by Muhammet Sinan Başarslan and Hikmet Canlı
Appl. Sci. 2026, 16(3), 1201; https://doi.org/10.3390/app16031201 - 23 Jan 2026
Viewed by 155
Abstract
In this study, a hybrid deep learning model that combines the VGG16 and ResNet101V2 architectures is proposed for image-based fire detection. In addition, a balanced drone guidance algorithm is developed to efficiently assign tasks to available UAVs. In the fire detection phase, the hybrid model created by combining the VGG16 and ResNet101V2 architectures has been optimized with Global Average Pooling and layer merging techniques to increase classification success. The DeepFire dataset was used throughout the training process, achieving an extremely high accuracy rate of 99.72% and 100% precision. After fire detection, a task assignment algorithm was developed to assign existing drones to fire points at minimum cost and with balanced load distribution. This algorithm performs task assignments using the Hungarian (Kuhn–Munkres) method and cost optimization, and is adapted to direct approximately equal numbers of drones to each fire when the number of fires is less than the number of drones. The developed system was tested in a Python-based simulation environment and evaluated using performance metrics such as total intervention time, energy consumption, and task balance. The results demonstrate that the proposed hybrid model provides highly accurate fire detection and that the task assignment system creates balanced and efficient intervention scenarios. Full article
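
The assignment step named in this abstract, the Hungarian (Kuhn–Munkres) method over a cost matrix, can be sketched directly with SciPy; the drone and fire coordinates below are assumed, and the paper's energy model and balanced-load adaptation are not shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

drones = np.array([[0.0, 0.0], [10.0, 2.0], [3.0, 8.0]])   # assumed drone positions
fires = np.array([[9.0, 1.0], [1.0, 7.0]])                 # assumed detected fire points

# cost[i, j] = Euclidean distance from drone i to fire j
cost = np.linalg.norm(drones[:, None, :] - fires[None, :, :], axis=-1)

row, col = linear_sum_assignment(cost)      # minimum-cost one-to-one assignment
for d, f in zip(row, col):
    print(f"drone {d} -> fire {f} (cost {cost[d, f]:.2f})")
```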

16 pages, 993 KB  
Article
TSS GAZ PTP: Towards Improving Gumbel AlphaZero with Two-Stage Self-Play for Multi-Constrained Electric Vehicle Routing Problems
by Hui Wang, Xufeng Zhang and Chaoxu Mu
Smart Cities 2026, 9(2), 21; https://doi.org/10.3390/smartcities9020021 - 23 Jan 2026
Viewed by 62
Abstract
Deep reinforcement learning (DRL) with self-play has emerged as a promising paradigm for solving combinatorial optimization (CO) problems. The recently proposed Gumbel AlphaZero Plan-to-Play (GAZ PTP) framework adopts a competitive training setup between a learning agent and an opponent to tackle classical CO tasks such as the Traveling Salesman Problem (TSP). However, in complex and multi-constrained environments like the Electric Vehicle Routing Problem (EVRP), standard self-play often suffers from opponent mismatch: when the opponent is either too weak or too strong, the resulting learning signal becomes ineffective. To address this challenge, we introduce Two-Stage Self-Play GAZ PTP (TSS GAZ PTP), a novel DRL method designed to maintain adaptive and effective learning pressure throughout the training process. In the first stage, the learning agent, guided by Gumbel Monte Carlo Tree Search (MCTS), competes against a greedy opponent that follows the best historical policy. As training progresses, the framework transitions to a second stage in which both agents employ Gumbel MCTS, thereby establishing a dynamically balanced competitive environment that encourages continuous strategy refinement. The primary objective of this work is to develop a robust self-play mechanism capable of handling the high-dimensional constraints inherent in real-world routing problems. We first validate our approach on the TSP, a benchmark used in the original GAZ PTP study, and then extend it to the multi-constrained EVRP, which incorporates practical limitations including battery capacity, time windows, vehicle load limits, and charging infrastructure availability. The experimental results show that TSS GAZ PTP consistently outperforms existing DRL methods, with particularly notable improvements on large-scale instances. Full article

25 pages, 5757 KB  
Article
Heatmap-Assisted Reinforcement Learning Model for Solving Larger-Scale TSPs
by Guanqi Liu and Donghong Xu
Electronics 2026, 15(3), 501; https://doi.org/10.3390/electronics15030501 - 23 Jan 2026
Viewed by 73
Abstract
Deep reinforcement learning (DRL)-based algorithms for solving the Traveling Salesman Problem (TSP) have demonstrated competitive potential compared to traditional heuristic algorithms on small-scale TSP instances. However, as the problem size increases, the NP-hard nature of the TSP leads to exponential growth in the combinatorial search space, state–action space explosion, and sharply increased sample complexity, which together cause significant performance degradation for most existing DRL-based models when directly applied to large-scale instances. This research proposes a two-stage reinforcement learning framework, termed GCRL-TSP (Graph Convolutional Reinforcement Learning for the TSP), which consists of a heatmap generation stage based on a graph convolutional neural network, and a heatmap-assisted Proximal Policy Optimization (PPO) training stage, where the generated heatmaps are used as auxiliary guidance for policy optimization. First, we design a divide-and-conquer heatmap generation strategy: a graph convolutional network infers m-node sub-heatmaps, which are then merged into a global edge-probability heatmap. Second, we integrate the heatmap into PPO by augmenting the state representation and restricting the action space toward high-probability edges, improving training efficiency. On standard instances with 200/500/1000 nodes, GCRL-TSP achieves a Gap% of 4.81/4.36/13.20 (relative to Concorde) with runtimes of 36 s/1.12 min/4.65 min. Experimental results show that GCRL-TSP achieves more than twice the solving speed compared to other TSP solving algorithms, while obtaining solution quality comparable to other algorithms on TSPs ranging from 200 to 1000 nodes. Full article
(This article belongs to the Section Artificial Intelligence)
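
A minimal sketch of the heatmap-assisted action restriction idea (not the GCRL-TSP code): a placeholder edge-probability heatmap stands in for the graph convolutional network's output, the action space at each step is restricted to the top-k unvisited cities by heatmap score, and a greedy rule stands in for the PPO policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, top_k = 20, 5
coords = rng.random((n, 2))
heatmap = rng.random((n, n))                # placeholder edge-probability heatmap
np.fill_diagonal(heatmap, 0.0)

tour, visited = [0], {0}
while len(tour) < n:
    cur = tour[-1]
    scores = heatmap[cur].copy()
    scores[list(visited)] = -np.inf                       # visited cities are invalid actions
    candidates = np.argsort(scores)[-top_k:]              # restricted action space
    nxt = int(max(candidates, key=lambda j: scores[j]))   # greedy stand-in for the PPO policy
    tour.append(nxt)
    visited.add(nxt)

length = sum(np.linalg.norm(coords[tour[i]] - coords[tour[(i + 1) % n]]) for i in range(n))
print(tour, round(float(length), 3))
```
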
47 pages, 2601 KB  
Review
A Review of AI-Driven Engineering Modelling and Optimization: Methodologies, Applications and Future Directions
by Jian-Ping Li, Nereida Polovina and Savas Konur
Algorithms 2026, 19(2), 93; https://doi.org/10.3390/a19020093 (registering DOI) - 23 Jan 2026
Viewed by 63
Abstract
Engineering is undergoing a significant transformation driven by the integration of artificial intelligence (AI) into engineering optimization, improving design, analysis, and operational efficiency across numerous disciplines. This review synthesizes the current landscape of AI-driven optimization methodologies and their impacts on engineering applications. In the literature, several frameworks for AI-based engineering optimization have been identified: (1) machine learning models are trained as objective and constraint functions for optimization problems; (2) machine learning techniques are used to improve the efficiency of optimization algorithms; (3) neural networks approximate complex simulation models such as finite element analysis (FEA) and computational fluid dynamics (CFD), making it possible to optimize complex engineering systems; and (4) machine learning predicts design parameters or initial solutions that are subsequently optimized. Fundamental AI technologies, such as artificial neural networks and deep learning, are examined in this paper, along with commonly used AI-assisted optimization strategies. Representative applications of AI-driven engineering optimization are also surveyed across multiple fields, including mechanical and aerospace engineering, civil engineering, electrical and computer engineering, chemical and materials engineering, energy, and management. These studies demonstrate how AI enables significant improvements in computational modelling, predictive analytics, and generative design while effectively handling complex multi-objective constraints. Despite these advancements, challenges remain in areas such as data quality, model interpretability, and computational cost, particularly in real-time environments. Through a systematic analysis of recent case studies and emerging trends, this paper provides a critical assessment of the state of the art and identifies promising research directions, including physics-informed neural networks, digital twins, and human–AI collaborative optimization frameworks. The findings highlight AI’s potential to redefine engineering optimization paradigms, while emphasizing the need for robust, scalable, and ethically aligned implementations. Full article
(This article belongs to the Special Issue AI-Driven Engineering Optimization)
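
Framework (1) above, training a machine-learning model to serve as the objective function, can be illustrated with a short sketch under stated assumptions: a Gaussian-process surrogate is fitted to a handful of evaluations of a toy "expensive simulation" and then minimized with a standard optimizer in place of the simulator.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulation(x):                # stand-in for an FEA/CFD run
    return (x - 1.7) ** 2 + 0.1 * np.sin(5 * x)

rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 4.0, size=(12, 1))
y_train = expensive_simulation(X_train).ravel()

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
surrogate.fit(X_train, y_train)

# Optimize the cheap surrogate in place of the expensive simulator.
res = minimize(lambda x: surrogate.predict(x.reshape(1, -1))[0],
               x0=np.array([0.0]), bounds=[(-2.0, 4.0)])
print("surrogate optimum x* =", res.x, " true objective there:", expensive_simulation(res.x))
```
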
43 pages, 9628 KB  
Article
Comparative Analysis of R-CNN and YOLOv8 Segmentation Features for Tomato Ripening Stage Classification and Quality Estimation
by Ali Ahmad, Jaime Lloret, Lorena Parra, Sandra Sendra and Francesco Di Gioia
Horticulturae 2026, 12(2), 127; https://doi.org/10.3390/horticulturae12020127 - 23 Jan 2026
Viewed by 82
Abstract
Accurate classification of tomato ripening stages and quality estimation is pivotal for optimizing post-harvest management and ensuring market value. This study presents a rigorous comparative analysis of morphological and colorimetric features extracted via two state-of-the-art deep learning-based instance segmentation frameworks—Mask R-CNN and YOLOv8n-seg—and their efficacy in machine learning-driven ripening stage classification and quality prediction. Using 216 fresh-market tomato fruits across four defined ripening stages, we extracted 27 image-derived features per model, alongside 12 laboratory-measured physio-morphological traits. Multivariate analyses revealed that R-CNN features capture nuanced colorimetric and structural variations, while YOLOv8 emphasizes morphological characteristics. Machine learning classifiers trained with stratified 10-fold cross-validation achieved up to 95.3% F1-score when combining both feature sets, with R-CNN and YOLOv8 alone attaining 96.9% and 90.8% accuracy, respectively. These findings highlight a trade-off between the superior precision of R-CNN and the real-time scalability of YOLOv8. Our results demonstrate the potential of integrating complementary segmentation-derived features with laboratory metrics to enable robust, non-destructive phenotyping. This work advances the application of vision-based machine learning in precision agriculture, facilitating automated, scalable, and accurate monitoring of fruit maturity and quality. Full article
(This article belongs to the Special Issue Sustainable Practices in Smart Greenhouses)
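
The evaluation protocol described here, combining both feature sets and scoring a classifier with stratified 10-fold cross-validation, is sketched below on synthetic arrays that only mimic the reported shapes (216 fruits, 27 features per model, four ripening stages); the real segmentation features and the specific classifiers are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
rcnn_feats = rng.random((216, 27))          # placeholder Mask R-CNN features
yolo_feats = rng.random((216, 27))          # placeholder YOLOv8n-seg features
stages = rng.integers(0, 4, size=216)       # four ripening-stage labels

X = np.hstack([rcnn_feats, yolo_feats])     # the "combined feature set"
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, stages,
                         cv=cv, scoring="f1_macro")
print(f"macro F1 over 10 folds: {scores.mean():.3f} +/- {scores.std():.3f}")
```
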
45 pages, 1326 KB  
Article
Cross-Domain Deep Reinforcement Learning for Real-Time Resource Allocation in Transportation Hubs: From Airport Gates to Seaport Berths
by Zihao Zhang, Qingwei Zhong, Weijun Pan, Yi Ai and Qian Wang
Aerospace 2026, 13(1), 108; https://doi.org/10.3390/aerospace13010108 - 22 Jan 2026
Viewed by 31
Abstract
Efficient resource allocation is critical for transportation hub operations, yet current scheduling systems require substantial domain-specific customization when deployed across different facilities. This paper presents a domain-adaptive deep reinforcement learning (DADRL) framework that learns transferable optimization policies for dynamic resource allocation across structurally similar transportation scheduling problems. The framework integrates dual-level heterogeneous graph attention networks for separating constraint topology from domain-specific features, hypergraph-based constraint modeling for capturing high-order dependencies, and hierarchical policy decomposition that reduces computational complexity from O(mnT) to O(m+n+T). Evaluated on realistic simulators modeling airport gate assignment (Singapore Changi: 50 gates, 300–400 daily flights) and seaport berth allocation (Singapore Port: 40 berths, 80–120 daily vessels), DADRL achieves 87.3% resource utilization in airport operations and 86.3% in port operations, outperforming commercial solvers under strict real-time constraints (Gurobi-MIP with 300 s time limit: 85.1%) while operating 270 times faster (1.1 s versus 298 s per instance). Given unlimited time, Gurobi achieves provably optimal solutions, but DADRL reaches 98.7% of this optimum in 1.1 s, making it suitable for time-critical operational scenarios where exact solvers are computationally infeasible. Critically, policies trained exclusively on airport scenarios retain 92.4% performance when applied to ports without retraining, requiring only 800 adaptation steps compared to 13,200 for domain-specific training. The framework maintains 86.2% performance under operational disruptions and scales to problems three times larger than training instances with only 7% degradation. These results demonstrate that learned optimization principles can generalize across transportation scheduling problems sharing common constraint structures, enabling rapid deployment of AI-based scheduling systems across multi-modal transportation networks with minimal customization and reduced implementation costs. Full article
(This article belongs to the Special Issue Emerging Trends in Air Traffic Flow and Airport Operations Control)

25 pages, 4209 KB  
Article
Stability-Oriented Deep Learning for Hyperspectral Soil Organic Matter Estimation
by Yun Deng and Yuxi Shi
Sensors 2026, 26(2), 741; https://doi.org/10.3390/s26020741 (registering DOI) - 22 Jan 2026
Viewed by 21
Abstract
Soil organic matter (SOM) is a key indicator for evaluating soil fertility and ecological functions, and hyperspectral technology provides an effective means for its rapid and non-destructive estimation. However, in practical soil systems, the spectral response of SOM is often highly covariant with mineral composition, moisture conditions, and soil structural characteristics. Under small-sample conditions, hyperspectral SOM modeling results are usually highly sensitive to spectral preprocessing methods, sample perturbations, and model architecture and parameter configurations, leading to fluctuations in predictive performance across independent runs and thereby limiting model stability and practical applicability. To address these issues, this study proposes a multi-strategy collaborative deep learning modeling framework for small-sample conditions (SE-EDCNN-DA-LWGPSO). Under unified data partitioning and evaluation settings, the framework integrates spectral preprocessing, data augmentation based on sensor perturbation simulation, multi-scale dilated convolution feature extraction, an SE channel attention mechanism, and a linearly weighted generalized particle swarm optimization algorithm. Subtropical red soil samples from Guangxi were used as the study object. Samples were partitioned using the SPXY method, and multiple independent repeated experiments were conducted to evaluate the predictive performance and training consistency of the model under fixed validation conditions. The results indicate that the combination of Savitzky–Golay filtering and first-derivative transformation (SG–1DR) exhibits superior overall stability among various preprocessing schemes. In model structure comparison and ablation analysis, as dilated convolution, data augmentation, and channel attention mechanisms were progressively introduced, the fluctuations of prediction errors on the validation set gradually converged, and the performance dispersion among different independent runs was significantly reduced. Under ten independent repeated experiments, the final model achieved R2 = 0.938 ± 0.010, RMSE = 2.256 ± 0.176 g·kg−1, and RPD = 4.050 ± 0.305 on the validation set, demonstrating that the proposed framework has good modeling consistency and numerical stability under small-sample conditions. Full article
(This article belongs to the Section Environmental Sensing)
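
The SG–1DR preprocessing reported as most stable can be reproduced in outline with SciPy's Savitzky–Golay filter applied as a first derivative; the window length, polynomial order, and synthetic spectra below are assumptions, and the SPXY split, augmentation, and the SE-EDCNN model are not shown.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 2401, 2)        # nm, assumed spectral band grid
spectra = rng.random((5, wavelengths.size))  # placeholder reflectance spectra

# Savitzky-Golay smoothing combined with a first-derivative transform along the spectral axis.
sg_1dr = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)
print(spectra.shape, "->", sg_1dr.shape)
```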

26 pages, 4614 KB  
Article
CHARMS: A CNN-Transformer Hybrid with Attention Regularization for MRI Super-Resolution
by Xia Li, Haicheng Sun and Tie-Qiang Li
Sensors 2026, 26(2), 738; https://doi.org/10.3390/s26020738 (registering DOI) - 22 Jan 2026
Viewed by 17
Abstract
Magnetic resonance imaging (MRI) super-resolution (SR) enables high-resolution reconstruction from low-resolution acquisitions, reducing scan time and easing hardware demands. However, most deep learning-based SR models are large and computationally heavy, limiting deployment in clinical workstations, real-time pipelines, and resource-restricted platforms such as low-field and portable MRI. We introduce CHARMS, a lightweight convolutional–Transformer hybrid with attention regularization optimized for MRI SR. CHARMS employs a Reverse Residual Attention Fusion backbone for hierarchical local feature extraction, Pixel–Channel and Enhanced Spatial Attention for fine-grained feature calibration, and a Multi-Depthwise Dilated Transformer Attention block for efficient long-range dependency modeling. Novel attention regularization suppresses redundant activations, stabilizes training, and enhances generalization across contrasts and field strengths. Across IXI, Human Connectome Project Young Adult, and paired 3T/7T datasets, CHARMS (~1.9M parameters; ~30 GFLOPs for 256 × 256) surpasses leading lightweight and hybrid baselines (EDSR, PAN, W2AMSN-S, and FMEN) by 0.1–0.6 dB PSNR and up to 1% SSIM at ×2/×4 upscaling, while reducing inference time ~40%. Cross-field fine-tuning yields 7T-like reconstructions from 3T inputs with ~6 dB PSNR and 0.12 SSIM gains over native 3T. With near-real-time performance (~11 ms/slice, ~1.6–1.9 s per 3D volume on RTX 4090), CHARMS offers a compelling fidelity–efficiency balance for clinical workflows, accelerated protocols, and portable MRI. Full article
(This article belongs to the Special Issue Sensing Technologies in Digital Radiology and Image Analysis)
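
A minimal sketch of the PSNR fidelity metric quoted in this abstract, computed on synthetic arrays; the CHARMS network, SSIM, and the MRI datasets are not shown, and the image size and intensity range are assumptions.

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    mse = np.mean((reference - reconstruction) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

rng = np.random.default_rng(0)
hr = rng.random((256, 256))                                     # stand-in high-resolution slice
sr = np.clip(hr + 0.01 * rng.standard_normal(hr.shape), 0, 1)   # stand-in SR reconstruction
print(f"PSNR = {psnr(hr, sr):.2f} dB")
```
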
26 pages, 2272 KB  
Article
A Reinforcement Learning Approach for Automated Crawling and Testing of Android Apps
by Chien-Hung Liu, Shu-Ling Chen and Kun-Cheng Chan
Appl. Sci. 2026, 16(2), 1093; https://doi.org/10.3390/app16021093 - 21 Jan 2026
Viewed by 76
Abstract
With the growing global popularity of Android apps, ensuring their quality and reliability has become increasingly important, as low-quality apps can lead to poor user experiences and potential business losses. A common approach to testing Android apps involves automatically generating event sequences that interact with the app’s graphical user interface (GUI) to detect crashes. To support this, we developed ACE (Android Crawler), a tool that systematically generates events to test Android apps by automatically exploring their GUIs. However, ACE’s original heuristic-driven exploration can be inefficient in complex application states. To address this, we extend ACE with a deep reinforcement learning-based crawling strategy, called Reinforcement Learning Strategy (RLS), which tightly integrates with ACE’s GUI exploration process by learning to intelligently select GUI components and interaction actions. RLS leverages the Proximal Policy Optimization (PPO) algorithm for stable and efficient learning and incorporates an action mask to filter invalid actions, thereby reducing training time. We evaluate RLS on 15 real-world Android apps and compare its performance against the original ACE and three state-of-the-art Android testing tools. Results show that RLS improves code coverage by an average of 2.1% over ACE’s Nearest unvisited event First Search (NFS) strategy and outperforms all three baseline tools in terms of code coverage. Paired t-test analyses further confirm that these improvements are statistically significant, demonstrating its effectiveness in enhancing automated Android GUI testing. Full article
(This article belongs to the Topic Electronic Communications, IOT and Big Data, 2nd Volume)
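
The action-mask idea named in this abstract is a standard trick that can be sketched in a few lines: logits of invalid GUI actions are set to a large negative value before sampling, so the policy assigns them zero probability. The logits and validity mask below are placeholders; ACE's state encoding and the full PPO update are not shown.

```python
import torch

logits = torch.randn(8)                     # one logit per candidate GUI action (placeholder)
valid = torch.tensor([1, 1, 0, 1, 0, 0, 1, 1], dtype=torch.bool)   # assumed validity mask

masked_logits = logits.masked_fill(~valid, float("-inf"))
dist = torch.distributions.Categorical(logits=masked_logits)
action = dist.sample()
assert valid[action]                        # masked actions receive zero probability
print(action.item(), dist.probs)
```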

17 pages, 1555 KB  
Article
Path Planning in Sparse Reward Environments: A DQN Approach with Adaptive Reward Shaping and Curriculum Learning
by Hongyi Yang, Bo Cai and Yunlong Li
Algorithms 2026, 19(1), 89; https://doi.org/10.3390/a19010089 - 21 Jan 2026
Viewed by 166
Abstract
Deep reinforcement learning (DRL) has shown great potential in path planning tasks. However, in sparse reward environments, DRL still faces significant challenges such as low training efficiency and a tendency to converge to suboptimal policies. Traditional reward shaping methods can partially alleviate these issues, but they typically rely on hand-crafted designs, which often introduce complex reward coupling, make hyperparameter tuning difficult, and limit generalization capability. To address these challenges, this paper proposes Curriculum-guided Learning with Adaptive Reward Shaping for Deep Q-Network (CLARS-DQN), a path planning algorithm that integrates Adaptive Reward Shaping (ARS) and Curriculum Learning (CL). The algorithm consists of two key components: (1) ARS-DQN, which augments the DQN framework with a learnable intrinsic reward function to reduce reward sparsity and dependence on expert knowledge; and (2) a curriculum strategy that guides policy optimization through a staged training process, progressing from simple to complex tasks to enhance generalization. Training also incorporates Prioritized Experience Replay (PER) to improve sample efficiency and training stability. CLARS-DQN outperforms baseline methods in task success rate, path quality, training efficiency, and hyperparameter robustness. In unseen environments, the method improves task success rate and average path length by 12% and 26%, respectively, demonstrating strong generalization. Ablation studies confirm the critical contribution of each module. Full article
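
For contrast with the learnable intrinsic reward (ARS) described here, the sketch below shows the classic hand-crafted alternative that such methods aim to automate: potential-based reward shaping with a negative-distance potential. This is not the paper's ARS; the grid size, discount factor, and goal position are assumptions.

```python
import numpy as np

GOAL = np.array([9, 9])
GAMMA = 0.99

def potential(state: np.ndarray) -> float:
    # Negative Manhattan distance to the goal: potential increases as the agent gets closer.
    return -float(np.linalg.norm(state - GOAL, ord=1))

def shaped_reward(state, next_state, env_reward: float) -> float:
    # Potential-based shaping preserves the optimal policy (Ng et al., 1999).
    return env_reward + GAMMA * potential(next_state) - potential(state)

s, s_next = np.array([2, 3]), np.array([3, 3])      # one step toward the goal
print(shaped_reward(s, s_next, env_reward=0.0))     # small positive shaping bonus
```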

21 pages, 8669 KB  
Article
LLM4FB: A One-Sided CSI Feedback and Prediction Framework for Lightweight UEs via Large Language Models
by Xinxin Xie, Xinyu Ning, Yitong Liu, Hanning Wang, Jing Jin and Hongwen Yang
Sensors 2026, 26(2), 691; https://doi.org/10.3390/s26020691 - 20 Jan 2026
Viewed by 112
Abstract
Massive MIMO systems can substantially enhance spectral efficiency, but such gains rely on the availability of accurate channel state information (CSI). However, the increase in the number of antennas leads to a significant growth in feedback overhead, while conventional deep-learning-based CSI feedback methods also impose a substantial computational burden on the user equipment (UE). To address these challenges, this paper proposes LLM4FB, a one-sided CSI feedback framework that leverages a pre-trained large language model (LLM). In this framework, the UE performs only low-complexity linear projections to compress CSI. In contrast, the BS leverages a pre-trained LLM to accurately reconstruct and predict CSI. By utilizing the powerful modeling capabilities of the pre-trained LLM, only a small portion of the parameters needs to be fine-tuned to improve CSI recovery accuracy with low training cost. Furthermore, a multiobjective loss function is designed to simultaneously optimize normalized mean square error (NMSE) and spectral efficiency (SE). Simulation results show that LLM4FB outperforms existing methods across various compression ratios and mobility levels, achieving high-precision CSI feedback with minimal computational capability from terminal devices. Therefore, LLM4FB presents a highly promising solution for next-generation wireless sensor networks and industrial IoT applications, where terminal devices are often strictly constrained by energy and hardware resources. Full article
(This article belongs to the Section Communications)
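
A hedged sketch of the one-sided compression step: the UE applies only a fixed linear projection to the CSI vector, and reconstruction is left to the receiver (a fine-tuned LLM in the paper; a naive least-squares decoder here, whose poor accuracy is exactly what motivates the learned decoder). The antenna and subcarrier counts, the projection, and the compression ratio are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_sub = 32, 256                       # assumed antennas x subcarriers
csi = rng.standard_normal(n_tx * n_sub) + 1j * rng.standard_normal(n_tx * n_sub)

compression_ratio = 16                      # assumed
m = csi.size // compression_ratio
A = (rng.standard_normal((m, csi.size))     # fixed projection known to both sides
     + 1j * rng.standard_normal((m, csi.size))) / np.sqrt(2 * m)

feedback = A @ csi                          # UE side: a single matrix-vector product

# Receiver-side baseline: minimum-norm least squares. Its poor NMSE is why the
# paper reconstructs with a pre-trained, fine-tuned LLM instead.
csi_hat, *_ = np.linalg.lstsq(A, feedback, rcond=None)
nmse = np.sum(np.abs(csi - csi_hat) ** 2) / np.sum(np.abs(csi) ** 2)
print(f"compressed {csi.size} -> {feedback.size} coefficients, linear-baseline NMSE = {nmse:.2f}")
```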

23 pages, 4564 KB  
Article
Control of Wave Energy Converters Using Reinforcement Learning
by Odai R. Bani Hani, Zeiad Khafagy, Matthew Staber, Ashraf Gaffar and Ossama Abdelkhalik
J. Mar. Sci. Eng. 2026, 14(2), 211; https://doi.org/10.3390/jmse14020211 - 20 Jan 2026
Viewed by 177
Abstract
Efficient control of wave energy converters (WECs) is crucial for maximizing energy capture and reducing the Levelized Cost of Energy (LCoE). In this study, we employ a deep reinforcement learning (DRL) framework based on the Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms for WEC control. Our approach leverages a novel decoupled co-simulation architecture, training agents episodically in MATLAB to export a robust policy within the WEC-Sim environment. Furthermore, we utilize a rigorous benchmarking protocol to compare the SAC and DDPG agents against a classical Bang-Singular-Bang (BSB) optimal control benchmark. Evaluation under realistic, irregular Pierson-Moskowitz sea states demonstrates that the performance of the RL agents is very close to that of the BSB optimal control baseline. Monte Carlo simulations show that both the DDPG and SAC agents can perform even better than the BSB when the model of the BSB is different from the simulation environment. Full article

25 pages, 3073 KB  
Article
A Two-Stage Intelligent Reactive Power Optimization Method for Power Grids Based on Dynamic Voltage Partitioning
by Tianliang Xue, Xianxin Gan, Lei Zhang, Su Wang, Qin Li and Qiuting Guo
Electronics 2026, 15(2), 447; https://doi.org/10.3390/electronics15020447 - 20 Jan 2026
Viewed by 69
Abstract
Aiming at issues such as reactive power distribution fluctuations and insufficient local support caused by large-scale integration of renewable energy in new power systems, as well as the poor adaptability of traditional methods and bottlenecks of deep reinforcement learning in complex power grids, a two-stage intelligent optimization method for grid reactive power based on dynamic voltage partitioning is proposed. Firstly, a comprehensive indicator system covering modularity, regulation capability, and membership degree is constructed. Adaptive MOPSO is employed to optimize K-means clustering centers, achieving dynamic grid partitioning and decoupling large-scale optimization problems. Secondly, a Markov Decision Process model is established for each partition, incorporating a penalty mechanism for safety constraint violations into the reward function. The DDPG algorithm is improved through multi-experience pool probabilistic replay and sampling mechanisms to enhance agent training. Finally, an optimal reactive power regulation scheme is obtained through two-stage collaborative optimization. Simulation case studies demonstrate that this method effectively reduces solution complexity, accelerates convergence, accurately addresses reactive power dynamic distribution and local support deficiencies, and ensures voltage security and optimal grid losses. Full article
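
The first stage described here, partitioning the grid by clustering, can be loosely illustrated with a plain K-means sketch on placeholder per-bus features; the adaptive-MOPSO tuning of cluster centres, the modularity and membership indicators, and the improved DDPG stage are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_buses = 57                                # assumed system size (e.g. an IEEE test case)
features = rng.random((n_buses, 3))         # placeholder per-bus voltage-sensitivity features

partitions = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
for k in range(4):
    print(f"partition {k}: buses {np.flatnonzero(partitions == k).tolist()}")
```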