Article

A DQN-Based Intelligent Voltage Control Framework for Enhancing Renewable Integration and Energy Sustainability in Wind-Penetrated Distribution Networks

by Ramesh Kumar Behara and Akshay Kumar Saha *
Electrical, Electronic, and Computer Engineering, University of KwaZulu-Natal, Durban 4041, South Africa
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(24), 11164; https://doi.org/10.3390/su172411164
Submission received: 26 November 2025 / Accepted: 8 December 2025 / Published: 12 December 2025

Abstract

The increasing penetration of renewable energy resources is central to global sustainability and decarbonisation goals, yet it introduces intermittency and voltage instability in modern distribution networks. Ensuring stable operation while maximising renewable utilisation is critical for achieving long-term energy sustainability, reduced carbon emissions, and efficient grid performance. This study proposes a sustainability-oriented, Reinforcement Learning (RL)-driven voltage control framework that enables reliable and energy-efficient operation of wind-integrated distribution systems. A Deep Q-Network (DQN) agent formulates voltage regulation as a Markov Decision Process (MDP) and autonomously learns optimal control policies for on-load tap changers (OLTCs) and capacitor banks under highly variable wind and load conditions. Using the IEEE 33-bus test system with realistic stochastic wind and ZIP-load models, the results show that the proposed controller maintains voltages within statutory limits, reduces total active power losses by up to 18%, and enhances the network’s capacity to host renewable energy. These improvements translate to increased energy efficiency, reduced technical losses, and greater operational resilience, key enablers of sustainable energy distribution. The findings demonstrate that intelligent RL-based frameworks offer a scalable and model-free tool for advancing sustainable, low-carbon, and resilient power systems.

1. Introduction

Rapid growth in renewable energy, especially wind and solar, has transformed distribution networks from passive, demand-following systems into active, stochastic grids. While this transition supports decarbonization targets, it also introduces pronounced intermittency and uncertainty that degrade voltage profiles, increase technical losses, and complicate protection coordination at the feeder level [1,2,3]. Wind energy has emerged as one of the most promising renewable sources to meet the growing global demand for clean and sustainable electricity [4,5]. Among various wind generator technologies, the Doubly Fed Induction Generator (DFIG) stands out as the preferred option for large-scale wind farms due to its capability to operate efficiently across a broad range of wind speeds [6]. The DFIG configuration enables independent control of active and reactive power, enhancing system flexibility and voltage stability. Furthermore, it requires only about 25–30% of the converter capacity compared to fully rated converter systems, which significantly reduces capital costs, converter losses, and overall system complexity [7]. These advantages make DFIG-based wind turbines a cost-effective and high-performance solution for grid-connected wind energy applications, promoting their widespread adoption in modern renewable-integrated power systems [8,9].
In wind energy conversion systems, the back-to-back converter, consisting of the Rotor-Side Converter (RSC) and Grid-Side Converter (GSC), is crucial for ensuring stable power transfer, regulating the DC-link voltage, and maintaining reliable grid connectivity [10,11]. These converters commonly utilise Proportional–Integral (PI) controllers due to their simple structure and fast transient response [12]. However, the effectiveness of PI controllers depends heavily on precise parameter tuning. When tuned through conventional methods, their performance often deteriorates under the nonlinear, high-dimensional, and time-varying dynamics characteristic of DFIG-based systems. Consequently, such controllers may experience efficiency degradation, slower transient recovery, or instability during wind speed fluctuations, grid faults, and component ageing [13,14].
Traditional voltage regulation strategies, such as rule-based Volt/VAR control, Optimal Power Flow (OPF), and metaheuristic optimisation algorithms, including Simulated Annealing (SA) [15], Genetic Algorithm (GA) [16], and Particle Swarm Optimisation (PSO) [17], have provided valuable benchmarks but exhibit fundamental constraints in dynamic grid environments. Rule-based schemes rely on static threshold settings and are unable to anticipate rapid changes in voltage. OPF approaches demand accurate system models and frequent re-optimisation. In contrast, metaheuristic methods yield one-time optimal set points that lack closed-loop adaptability, rendering them inefficient to retune when system conditions evolve [18,19,20,21,22]. Furthermore, these conventional methods are inherently reactive, addressing voltage violations only after they occur, rather than proactively mitigating them in real time. In practice, rapid fluctuations in wind power and load drive spatiotemporal voltage excursions, forcing operators to frequently actuate OLTCs and capacitor banks to remain within statutory limits, often with delayed or suboptimal responses under conventional schemes [23,24,25].
Recent progress in RL offers a complementary path: model-free, data-driven control that learns policies directly from interactions with a simulated or digital twin environment and then executes real-time decisions with millisecond inference latency [26]. Early studies show that DQNs and related variants can regulate distribution-level voltages and reactive power more effectively than static heuristics under uncertainty, while accommodating nonlinearity, topology changes, and partial modelling errors [27]. Extensions to safe RL and graph/topology-aware encoders further improve constraint satisfaction and sample efficiency, addressing practical concerns for deployment in safety-critical grids [28,29]. Nevertheless, significant gaps persist: (i) many works emphasise photovoltaic-dominant feeders rather than wind-rich networks with different variability spectra; (ii) reward functions often ignore actuation costs and switching wear; (iii) evaluations rarely include robustness tests (sensor noise, unseen wind regimes) or scalability across multiple benchmark feeders; and (iv) interpretability of RL decisions remains limited, hindering operator trust and integration with SCADA/EMS workflows [27,28,29,30].
This paper proposes an Intelligent Reinforcement Learning Controller that frames distribution-level voltage regulation as an MDP and trains a DQN agent to coordinate OLTC tap steps and capacitor bank switching under stochastic wind injections. The controller observes feeder states (bus voltages, reactive power flows, device positions, and wind power), selects discrete actions, and receives rewards that penalise voltage deviation and active power losses while optionally accounting for actuation effort. The training loop runs in a co-simulation environment (e.g., MATLAB/Simulink 2022b), exposing the agent to diverse wind/load trajectories, measurement noise, and device limits to learn generalizable policies. Once trained, the policy executes in real time, enabling proactive, closed-loop regulation without the need for repeated optimisation solutions.
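As a concrete illustration of this MDP formulation, the observed state and the penalty-style reward described above can be sketched in a few lines. This is a minimal Python sketch with illustrative field names and weights, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GridState:
    """Feeder observation: the quantities the agent is described as seeing."""
    bus_voltages_pu: List[float]   # per-unit voltages at monitored buses
    oltc_tap: int                  # discrete OLTC tap position
    cap_on: bool                   # capacitor bank status
    wind_power_mw: float           # instantaneous wind injection

def reward(state: GridState, p_loss_mw: float,
           w_v: float = 1.0, w_loss: float = 0.1) -> float:
    """Penalise total deviation from 1.0 p.u. plus active power losses.
    The weights w_v and w_loss are illustrative assumptions."""
    v_dev = sum(abs(v - 1.0) for v in state.bus_voltages_pu)
    return -(w_v * v_dev + w_loss * p_loss_mw)
```

A flat voltage profile with the same losses should always score higher than a sagging one, which is the property the agent exploits during training.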

1.1. Research Motivation

The increasing integration of renewable energy sources, particularly wind power, has transformed the operational dynamics of modern distribution networks. While this transition supports global decarbonization targets, it also introduces high stochasticity and intermittency, posing challenges for maintaining voltage stability, reactive power balance, and power quality. Traditional control methods, such as rule-based Volt/VAR schemes and optimisation-based voltage regulation, are reactive in nature, relying on offline computations and pre-tuned parameters that fail to adapt to the rapid, nonlinear fluctuations induced by renewable variability [20,30]. As distribution grids evolve into cyber–physical energy systems, there is an urgent need for autonomous, learning-based controllers that can adapt in real time and make decisions without explicit system models.
Recent progress in artificial intelligence (AI) and RL provides an opportunity to address these limitations. Unlike static optimisation or heuristic algorithms, RL enables model-free adaptive control, where an agent learns to make optimal decisions through continuous interaction with the environment [1,2]. Specifically, DQNs can approximate complex value functions and derive optimal reactive power and tap-changing policies in high-dimensional, uncertain systems [24,26]. However, most existing RL-based power control studies are either confined to small-scale microgrids or focus on transmission-level dispatch, leaving a research gap in distribution-level voltage control for wind-integrated feeders [1,21].
Existing voltage control frameworks often lack interpretability, safety constraints, and scalability, features essential for real-world grid deployment. As modern distribution networks become increasingly decentralised, integrating multiple reactive power devices such as OLTCs, capacitor banks, and STATCOMs, there is a growing need for a coordinated and explainable reinforcement learning (RL) strategy that ensures both reliability and transparency for operators [31]. To address these challenges, this study introduces an Intelligent Reinforcement Learning Controller (IRLC) to enhance dynamic voltage stability in wind-integrated distribution systems. The IRLC learns adaptive control policies through model-free interaction with the grid, coordinates multiple voltage regulation devices to minimise deviations and losses, incorporates safety and interpretability through stable reward design, and ensures scalability under stochastic wind and load conditions. By integrating deep RL with data-driven intelligence, the proposed framework establishes a foundation for autonomous, resilient, and transparent control in renewable-rich smart grids.

1.2. Research Gap

Despite significant progress in renewable energy integration and intelligent grid control, several critical research gaps persist in achieving adaptive, real-time voltage regulation for distribution systems rich in renewable energy.
First, most existing voltage control strategies rely on static or model-dependent approaches, such as OPF, sensitivity-based compensation, or metaheuristic optimisation [20,21]. While these techniques can optimise reactive power dispatch under steady-state conditions, they struggle to adapt to rapid and stochastic fluctuations in wind generation and load demand [27,30]. Their reliance on precise mathematical models and repeated re-optimisation makes them unsuitable for online, data-driven control in systems with nonlinear and time-varying dynamics [29].
Second, the application of machine learning (ML) and deep learning (DL) in distribution system management has primarily focused on forecasting tasks, such as wind or load prediction, rather than control or decision-making [1,20,32]. Supervised learning methods require large labelled datasets and cannot autonomously adapt to unseen operating conditions or contingencies [22,27]. Consequently, such models fail to handle dynamic interactions between wind variability, reactive power compensation, and voltage stability in real time.
Third, although RL has emerged as a promising solution for adaptive control in smart grids, most studies remain limited to microgrids or transmission-level problems, such as economic dispatch and demand response [2,28]. Only a few works have applied single-agent RL for voltage regulation in distribution feeders, and even fewer have specifically addressed wind-dominated networks [21]. Furthermore, existing RL-based voltage control studies often (i) use simplified reward functions that overlook power losses and switching costs, (ii) lack coordinated operation between multiple reactive power devices (OLTCs, capacitor banks, STATCOMs), and (iii) provide limited validation on benchmark IEEE test feeders with realistic wind data [1,3].
Fourth, the interpretability and safety of RL decisions remain underexplored. Most studies treat the RL policy as a black box without incorporating Explainable AI (XAI) or safe learning mechanisms, which are essential for deployment in operational environments [33,34]. The absence of such features limits operator trust and makes real-world implementation challenging, as constraint violations during training or execution can lead to unacceptable voltage deviations or device wear [31].
Finally, few studies evaluate robustness and scalability across different network configurations, measurement uncertainties, or unseen weather conditions. Without these validations, existing models may perform well in simulations but fail under real-world uncertainty and network variability [35,36]. Addressing these limitations requires a framework that unifies adaptive control, robustness, and explainability, bridging the gap between theoretical learning models and practical grid operation.
Therefore, this research introduces an intelligent, reinforcement learning-based voltage control framework designed to operate in renewable-rich, wind-integrated distribution networks. The proposed model offers model-free adaptivity, multi-device coordination, and interpretable decision support, providing a significant advancement toward autonomous, explainable, and resilient voltage regulation in smart power systems. Table 1 presents a summary of research gaps in existing studies on voltage regulation.

1.3. Research Contribution

This research presents an Intelligent Reinforcement Learning Controller (IRLC) for adaptive and explainable voltage regulation in wind-integrated distribution networks. Leveraging Deep Reinforcement Learning (DRL), the proposed approach enables real-time, model-free decision-making to maintain voltage stability amid renewable energy uncertainty. The key contributions are as follows: the study formulates voltage control as an MDP and develops a DQN-based controller that autonomously coordinates OLTC and capacitor operations, adapting continuously to stochastic wind and load variations without explicit models. The framework integrates realistic wind dynamics using the IEEE 33-bus test system with DFIG-based turbines to ensure practical validation. A composite reward function balances voltage deviation minimisation and power loss reduction, improving learning stability and energy efficiency. The study benchmarks the proposed controller against PI, MPC, rule-based, PSO, and hybrid RL–PSO methods, showing superior convergence, voltage regulation, and lower losses. Furthermore, the framework incorporates Explainable AI (XAI) via SHAP analysis to interpret RL decisions and includes safety constraints to ensure secure operation. Finally, the modular design supports SCADA integration and scalability for multi-feeder smart grids, advancing the pathway toward autonomous, self-learning energy systems.

1.4. Sustainability Motivation and Renewable Integration Needs

The integration of renewable energy resources, particularly wind power, is not only a technological progression but also a core pillar of global sustainable development efforts. As countries work toward decarbonisation and long-term energy security, enhancing the hosting capacity of distribution networks becomes critical, and maintaining voltage stability is a key enabler for accommodating higher levels of renewable penetration. Unstable voltage conditions limit renewable uptake and force grid operators to curtail clean energy, undermining sustainability objectives. Furthermore, improving network efficiency by minimising technical losses directly supports the targets of Sustainable Development Goal 7, which emphasises reliable, efficient, and environmentally responsible electricity supply. By enabling stable, loss-reduced, and renewable-friendly grid operation, advanced intelligent control frameworks such as the proposed DQN-based approach contribute meaningfully to the transition toward sustainable, low-carbon energy systems.

2. Background: Review of Comparative Control Methods

To comprehensively validate the effectiveness of the proposed DQN-based intelligent voltage and reactive power control framework, six control strategies were implemented and tested under identical wind, load, and network conditions. These controllers represent the principal paradigms used in distribution system control, from classical deterministic methods to advanced intelligent optimisation schemes. Each technique is summarised and analysed below, with references to recent studies [37,38,39,40,41,42,43,44,45,46] that highlight their capabilities and limitations in renewable-rich smart grids.

2.1. Linear Control (PI-Based)

The PI controller represents the traditional linear control strategy employed in most utility-operated OLTC and reactive power management systems. It adjusts the control signal using the proportional and integral terms of the voltage error to keep the voltage near a reference value. While PI control offers simplicity, reliability, and low computational demand, its efficiency depends heavily on accurate parameter tuning and assumes quasi-static operating conditions. Under nonlinear or time-varying scenarios, such as fluctuating wind generation, its response becomes sluggish or oscillatory due to fixed gain settings.
Recent literature [37,38] reports that while PI controllers perform adequately in low-disturbance environments, they often fail to maintain voltage quality in dynamic grids where renewable output rapidly varies. Thus, PI control is best suited for steady-state voltage regulation rather than adaptive smart grid operation.
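A minimal discrete-time sketch of such a PI voltage regulator follows; the gains and time step are illustrative, not a utility implementation:

```python
class PIController:
    """Discrete PI regulator tracking a 1.0 p.u. voltage reference.
    Fixed gains kp/ki are the source of the sluggish/oscillatory
    behaviour noted above when operating conditions drift."""
    def __init__(self, kp: float, ki: float, dt: float):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, v_meas: float, v_ref: float = 1.0) -> float:
        error = v_ref - v_meas
        self.integral += error * self.dt   # accumulated (integral) term
        return self.kp * error + self.ki * self.integral
```

Under a sustained undervoltage, the integral term keeps growing the corrective output, which restores the setpoint in steady state but reacts slowly to fast renewable-driven swings.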

2.2. Nonlinear Control (Model Predictive Control, MPC)

Model Predictive Control (MPC) utilises predictive optimisation to maintain voltages within desired limits by forecasting system behaviour over a finite time horizon. It minimises a multi-objective cost function that typically includes voltage deviation, switching frequency, and reactive power mismatch. The MPC framework utilises system models, derived from linearised power flow equations, to forecast future states. It then computes control actions that minimise the cost function over the prediction window. However, MPC’s dependency on accurate modelling and parameter estimation limits its adaptability in the face of uncertain and nonlinear dynamics caused by renewable intermittency.
Studies such as [39,40,41,42] demonstrate that while MPC provides precise short-term control, its computational burden and model sensitivity make it less suitable for large-scale real-time applications with fast wind variability or high stochasticity. Additionally, frequent re-optimisation increases latency, which may result in suboptimal voltage recovery.
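As a toy illustration of the receding-horizon idea (not the authors' MPC formulation), a brute-force search over discrete tap sequences can stand in for the optimiser. The one-line voltage model (each tap step adds a fixed per-unit increment) and the cost weights are assumptions:

```python
from itertools import product

def mpc_tap_plan(v_now: float, taps=(-1, 0, 1), horizon: int = 2,
                 step_pu: float = 0.0125, switch_cost: float = 0.001) -> int:
    """Enumerate all tap sequences over a short horizon, score each plan by
    accumulated voltage deviation plus a switching penalty, and apply only
    the first action of the cheapest plan (the receding-horizon principle)."""
    best_plan, best_cost = None, float("inf")
    for plan in product(taps, repeat=horizon):
        v, cost = v_now, 0.0
        for a in plan:
            v += step_pu * a                               # predicted voltage
            cost += (v - 1.0) ** 2 + switch_cost * abs(a)  # stage cost
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan[0]
```

The exhaustive enumeration also makes the scalability complaint concrete: the search space grows as |taps|^horizon, which is why practical MPC relies on fast QP solvers and accurate models.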

2.3. Rule-Based Controller (RBC)

The Rule-Based Control (RBC) strategy implements a heuristic Volt/VAR control logic, where fixed threshold values dictate OLTC and capacitor switching operations. The rules are designed based on expert knowledge or historical system data, for example, increasing the OLTC tap when voltage drops below 0.95 p.u., or switching in a capacitor bank when reactive demand exceeds a set limit. While RBC methods are easy to implement and require minimal computation, they lack adaptability to real-time disturbances. The fixed thresholds fail to accommodate dynamic grid behaviour, often resulting in delayed, oscillatory, or redundant control actions during rapid load or generation transitions.
Recent works [43,44] show that rule-based approaches perform satisfactorily in static environments but suffer from excessive OLTC operations and reduced power quality in systems with high renewable penetration. Their deterministic nature limits their learning and self-correction capabilities, making them less efficient under uncertain conditions.
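The fixed-threshold logic described above (e.g., tapping up when voltage drops below 0.95 p.u.) can be sketched as follows; the threshold values are illustrative:

```python
def rule_based_voltvar(v_pu: float, q_demand_mvar: float,
                       tap: int, cap_on: bool,
                       v_low: float = 0.95, v_high: float = 1.05,
                       q_limit_mvar: float = 1.5):
    """Heuristic Volt/VAR logic: tap up on undervoltage, tap down on
    overvoltage, and switch the capacitor bank on high reactive demand.
    The static thresholds are exactly what limits adaptability."""
    if v_pu < v_low:
        tap += 1
    elif v_pu > v_high:
        tap -= 1
    if q_demand_mvar > q_limit_mvar and not cap_on:
        cap_on = True
    elif q_demand_mvar <= q_limit_mvar and cap_on:
        cap_on = False
    return tap, cap_on
```

Because every decision depends only on instantaneous thresholds, rapid wind swings can toggle the same rules repeatedly, producing the excessive OLTC operations reported in [43,44].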

2.4. PSO-Machine Model Control

PSO-based control utilises swarm intelligence to optimise reactive power dispatch and OLTC settings, thereby minimising voltage deviation and total active power losses. The algorithm iteratively updates the position of each “particle” (candidate solution) based on its own and its neighbours’ best performances. PSO achieves near-global optimality for static operating points and is effective in multi-objective voltage and VAR optimisation. However, its reliance on iterative re-computation limits real-time applications, as the optimisation must restart for each new operating condition. Moreover, PSO lacks dynamic adaptability; it cannot continuously learn or adjust to rapid changes in wind power without re-initialisation.
PSO has been widely applied to renewable energy control problems owing to its fast convergence and simplicity [15]. Although hybridisation with machine learning improves its adaptivity, PSO alone remains computationally intensive and reactive rather than proactive. In the context of DFIG systems, PSO-tuned controllers have shown improvements in energy capture and stability under varying wind conditions. Nevertheless, PSO suffers from premature convergence, often becoming trapped in local minima, which limits its ability to sustain optimal control under highly dynamic conditions such as wind gusts or grid faults [13,20].
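For reference, the canonical PSO velocity/position update described above can be sketched in one dimension; the inertia and acceleration coefficients are standard illustrative choices:

```python
import random

def pso_minimise(f, lo: float, hi: float, n_particles: int = 20,
                 iters: int = 80, w: float = 0.7,
                 c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> float:
    """Each particle's velocity blends inertia (w), attraction to its own
    best position (c1), and attraction to the swarm best (c2)."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]
    pbest_f = [f(xi) for xi in x]
    g = pbest[pbest_f.index(min(pbest_f))]   # swarm-best position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (g - x[i])
            x[i] += v[i]
            fx = f(x[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i], fx
                if fx < f(g):
                    g = x[i]
    return g
```

Note that the whole swarm iterates for one fixed objective; as soon as the operating point changes, the search must restart from scratch, which is the re-computation limitation discussed above.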

2.5. Hybrid RL–PSO Control

The Hybrid Reinforcement Learning–Particle Swarm Optimisation (RL–PSO) method combines the adaptability of RL with the optimisation efficiency of PSO. In this approach, the RL agent learns an approximate control policy, while PSO fine-tunes its parameters or action space to accelerate convergence toward near-optimal solutions. This hybrid structure enhances adaptivity compared to standalone PSO or rule-based methods, resulting in faster learning and reduced voltage deviation. However, the RL–PSO approach suffers from increased computational overhead and slower inference time, particularly in large-scale systems with high dimensionality.
Recent comparative studies [45,46] confirm that RL–PSO controllers outperform classical and metaheuristic approaches in maintaining voltage profiles and minimising losses. Still, their real-time feasibility remains limited by the iterative optimisation embedded in PSO. Thus, while hybridisation enhances learning capability, it lacks the complete autonomy and scalability of deep reinforcement learning frameworks.

2.6. Proposed DQN-Based Control

The DQN controller represents the next generation of intelligent voltage and reactive power control systems. By integrating reinforcement learning with deep neural function approximation, DQNs autonomously learn optimal control policies through interactions with the grid environment. The agent observes system states, such as bus voltages, reactive power injections, and OLTC/capacitor statuses, and selects actions that maximise cumulative rewards.
Key features of DQNs include experience replay, which improves learning stability, and a target network, which mitigates oscillations in policy updates. Unlike PI and MPC, DQNs require no explicit system model; they learn directly from real-time data and adapt to nonstationary operating conditions.
Recent research [47,48,49] demonstrates that DQN-based controllers outperform both traditional and hybrid methods by providing smoother voltage regulation, lower energy losses, and faster transient recovery under stochastic renewable scenarios. In this study, the DQN agent successfully coordinated OLTC tap adjustments and capacitor switching, achieving the lowest Voltage Deviation Index (0.005 p.u.), minimum total losses (145 kW), and highest energy efficiency among all tested approaches.
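The two training ingredients named above, experience replay and a periodically synced target network, can be sketched compactly. The following Python toy (illustrative, not the authors' MATLAB/Simulink implementation) substitutes a linear Q-function for the deep network so the mechanics stay visible:

```python
import random
from collections import deque

class TinyDQN:
    """Minimal DQN skeleton: an online Q-estimator, a frozen target copy
    used for bootstrapped targets, and a replay buffer sampled i.i.d.
    A linear Q(s, a) stands in for the deep network for brevity."""
    def __init__(self, n_features: int, n_actions: int,
                 lr: float = 0.01, gamma: float = 0.95):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.w_target = [row[:] for row in self.w]
        self.replay = deque(maxlen=1000)
        self.lr, self.gamma = lr, gamma

    def q(self, w, s):
        """Q-values for every action at state s under weights w."""
        return [sum(wi * si for wi, si in zip(row, s)) for row in w]

    def act(self, s, eps: float) -> int:
        """Epsilon-greedy action selection."""
        if random.random() < eps:
            return random.randrange(len(self.w))
        qs = self.q(self.w, s)
        return qs.index(max(qs))

    def remember(self, transition):
        self.replay.append(transition)   # (s, a, r, s2, done)

    def train_step(self, batch_size: int = 16):
        batch = random.sample(self.replay, min(batch_size, len(self.replay)))
        for s, a, r, s2, done in batch:
            # Bootstrapped target uses the frozen copy, not the online weights
            target = r if done else r + self.gamma * max(self.q(self.w_target, s2))
            td = target - self.q(self.w, s)[a]
            self.w[a] = [wi + self.lr * td * si for wi, si in zip(self.w[a], s)]

    def sync_target(self):
        self.w_target = [row[:] for row in self.w]
```

On a trivial one-state task where one action is rewarded and the other is not, the replayed updates drive the learned Q-values apart, which is the same mechanism that lets the full agent rank OLTC/capacitor actions.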
Table 2 summarises the key comparative insights across all six control strategies. It is evident that while conventional methods (PI, MPC, RBC) are limited by fixed logic or model dependency, metaheuristic and hybrid schemes (PSO, RL–PSO) improve adaptivity but at the cost of higher computational effort. The proposed DQN-based control framework effectively balances adaptivity, scalability, and computational efficiency, making it a robust and data-driven solution for real-time voltage and reactive power management in renewable-integrated smart grids.

3. Modelling and Controlling of DFIG

3.1. Power Control

The DFIG used in wind turbines enables stable electrical power generation at a constant grid frequency. Figure 1 [50] illustrates how the induction machine operates based on the interaction between the stator and rotor magnetomotive forces (MMFs). The stator winding currents create a magnetic field that rotates synchronously with the grid frequency, inducing currents in the rotor circuit in turn. These rotor currents generate a secondary magnetic field that interacts with the stator field. The resulting rotor MMF rotates at the slip frequency, defined as the difference between the synchronous speed and the mechanical rotor speed [51]. This interaction enables the DFIG to exchange both active and reactive power with the grid efficiently, ensuring frequency stability under variable wind conditions.
\omega_{slip} = \omega_{mmf,rotor} = \omega_{mmf,stator} - \omega_{rotor}
The rotor speed remains slightly below the rotational speed of the stator magnetic field. Here, \omega_{slip} represents the slip frequency, which corresponds to the frequency of the rotor-side current and voltage; \omega_{mmf,stator} denotes the stator MMF angular frequency (equal to the grid frequency) in radians per second; and \omega_{rotor} refers to the rotor's electrical angular frequency in radians per second, defined as the product of the mechanical speed and the number of magnetic pole pairs.
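As a quick numerical check of this relation, the slip frequency can be computed directly from the definitions above (the grid frequency, rotor speed, and pole-pair count below are illustrative):

```python
import math

def slip_frequency(f_grid_hz: float, rotor_rpm: float, pole_pairs: int) -> float:
    """omega_slip = omega_mmf_stator - omega_rotor, in electrical rad/s.
    omega_rotor is the mechanical speed scaled by the pole-pair count."""
    omega_s = 2.0 * math.pi * f_grid_hz
    omega_r = pole_pairs * rotor_rpm * 2.0 * math.pi / 60.0
    return omega_s - omega_r
```

At exactly synchronous speed (1500 rpm for a 2-pole-pair machine on a 50 Hz grid) the slip frequency is zero; sub-synchronous operation gives a positive slip frequency.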
Figure 2 [52] illustrates the equivalent circuit of the DFIG in the d–q reference frame, facilitating the dynamic analysis of electrical interactions among the stator, rotor, and converters. The study [52] modelled the DFIG’s behaviour in the rotating reference frame by formulating the voltage, flux linkage, and power equations for both the rotor-side converter (RSC) and the grid-side converter (GSC). This representation enables precise evaluation of active and reactive power exchange, torque generation, and voltage regulation under variable wind speeds. The complete set of governing equations describing the DFIG voltage, flux, and electromagnetic interactions in the d–q reference frame can thus be formulated according to standard vector control theory [53].
V_{ds} = R_s I_{ds} + \dot{\varphi}_{ds} - \omega_s \varphi_{qs}
V_{qs} = R_s I_{qs} + \dot{\varphi}_{qs} + \omega_s \varphi_{ds}
V_{dr} = R_r I_{dr} + \dot{\varphi}_{dr} - \omega_r \varphi_{qr}
V_{qr} = R_r I_{qr} + \dot{\varphi}_{qr} + \omega_r \varphi_{dr}
\varphi_{ds} = L_s I_{ds} + L_m I_{dr}
\varphi_{qs} = L_s I_{qs} + L_m I_{qr}
\varphi_{dr} = L_r I_{dr} + L_m I_{ds}
\varphi_{qr} = L_r I_{qr} + L_m I_{qs}
In this model, R_s and R_r denote the stator and rotor resistances, respectively, while L_s and L_r represent the self-inductances of the stator and rotor windings. The mutual inductance between the stator and rotor circuits is denoted by L_m, which governs the electromagnetic coupling essential for torque production. The variables V_{ds}, V_{qs}, I_{ds}, and I_{qs} correspond to the stator-side voltage and current components, whereas V_{dr}, V_{qr}, I_{dr}, and I_{qr} represent the rotor-side quantities, all expressed within the d–q Park reference frame [54]. The per-unit electromagnetic torque (T_e) in this frame is formulated as a function of the stator and rotor current interactions, as described in [53].
T_e = \varphi_{ds} I_{qs} - \varphi_{qs} I_{ds} = \varphi_{qr} I_{dr} - \varphi_{dr} I_{qr} = L_m \left( I_{qs} I_{dr} - I_{ds} I_{qr} \right)
By considering only the resistive effects of the stator and rotor, the active and reactive power components on the stator side of the DFIG can be expressed as follows [53]:
P_s = \frac{3}{2} \left( V_{ds} I_{ds} + V_{qs} I_{qs} \right)
Q_s = \frac{3}{2} \left( V_{qs} I_{ds} - V_{ds} I_{qs} \right)
The active and reactive rotor powers of the DFIG are:
P_r = \frac{3}{2} \left( V_{dr} I_{dr} + V_{qr} I_{qr} \right)
Q_r = \frac{3}{2} \left( V_{qr} I_{dr} - V_{dr} I_{qr} \right)
Rewriting the system Equations (11)–(14) to account for rotating frames [50]:
P_T = P_s + P_r = \frac{3}{2} \left( V_{qr} I_{qr} + V_{dr} I_{dr} + V_{ds} I_{ds} + V_{qs} I_{qs} \right)
Q_T = Q_s + Q_r = \frac{3}{2} \left( V_{qr} I_{dr} - V_{dr} I_{qr} + V_{qs} I_{ds} - V_{ds} I_{qs} \right)
The reactive power of the stator and the electromagnetic torque, which serve as the primary control objectives for the RSC, can be formulated as shown in [55]. Here, I_{qs} and I_{qr} denote the q-axis components of the stator and rotor currents, respectively, while I_{ds} and I_{dr} represent their corresponding d-axis components. Similarly, V_{qs} and V_{ds} indicate the q- and d-axis components of the stator voltage. The parameter p represents the number of pole pairs of the generator. The flux linkages on both stator and rotor sides, as well as the resulting electromagnetic torque, are expressed in terms of their d–q components within the synchronous rotating reference frame, facilitating dynamic control of active and reactive power exchange [56].
\Psi_s = L_s I_s + L_m I_r
\Psi_r = L_m I_s + L_r I_r
T_m = \frac{3}{2} p \frac{L_m}{L_s} \left( \Psi_{qs} I_{dr} - \Psi_{ds} I_{qr} \right)

3.2. DFIG’s Rotor-Side Control

The Rotor-Side Converter (RSC) in a DFIG enables independent and continuous control of active and reactive power, ensuring optimal power extraction from the available wind resource. In a conventional grid-connected setup, the system typically maintains a constant stator flux (φs). Additionally, for medium- and high-rated DFIG systems, the stator resistance is relatively small and can be reasonably neglected without significant loss of accuracy [57]. Under these assumptions, the stator voltage and flux relationships, as expressed in Equations (2) and (3), can be simplified for a more straightforward interpretation of the machine’s dynamic behaviour [58].
V_{ds} = 0
V_{qs} = V_s = \omega_s \varphi_{ds}
\varphi_{ds} = \varphi_s = L_s i_{ds} + L_m i_{dr}
\varphi_{qs} = 0 = L_s i_{qs} + L_m i_{qr}
The stator (Ls), magnetising (Lm), and rotor (Lr) inductances define the electromagnetic coupling within the DFIG. Using these parameters, we can formulate the corresponding expressions for active and reactive power as follows [58]:
P_s = \frac{3}{2} \left( V_{ds} I_{ds} + V_{qs} I_{qs} \right) = -\frac{3}{2} \frac{L_m}{L_s} V_s I_{qr}
Q_s = \frac{3}{2} \left( V_{qs} I_{ds} - V_{ds} I_{qs} \right) = \frac{3}{2} \left( \frac{V_s^2}{\omega_s L_s} - \frac{L_m}{L_s} V_s i_{dr} \right)
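The decoupling in the two expressions above can be checked numerically: varying the q-axis rotor current moves only the active power, and varying the d-axis rotor current moves only the reactive power. The machine parameters below are illustrative, not taken from the paper:

```python
def stator_powers(v_s: float, i_qr: float, i_dr: float,
                  l_m: float, l_s: float, omega_s: float):
    """Stator active/reactive powers under stator-flux orientation:
    P_s depends only on the q-axis rotor current, Q_s only on the d-axis."""
    p_s = -1.5 * (l_m / l_s) * v_s * i_qr
    q_s = 1.5 * (v_s ** 2 / (omega_s * l_s) - (l_m / l_s) * v_s * i_dr)
    return p_s, q_s
```

This independence is exactly what allows the RSC to run two separate PI current loops, one per axis.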
Equations (23) and (24) demonstrate that the stator-side active and reactive powers are decoupled, allowing for independent regulation of each component. The stator active and reactive powers are directly proportional to the q-axis and d-axis rotor current components, respectively. In conventional control schemes, Proportional–Integral (PI) controllers are employed to regulate these current components along the d–q axes. The reference current serves as the input to the PI current controller, which generates the corresponding reference voltage and adjusts the rotor current to maintain a constant stator flux linkage. The open-loop transfer function of the rotor current dynamics is:
G(p) = \frac{I_{rq}(p)}{V_{rq}(p)} = \frac{I_{rd}(p)}{V_{rd}(p)} = \frac{1}{R_r + L_r \sigma p}
The closed-loop transfer function:
H(p) = \frac{I_{rq}(p)}{I_{rq}^{*}(p)} = \frac{I_{rd}(p)}{I_{rd}^{*}(p)} = \frac{K_p p + K_i}{L_r \sigma \, S(p)}
The characteristic polynomial S(p) is given by:
S(p) = p^2 + \frac{R_r + K_p}{L_r \sigma} p + \frac{K_i}{L_r \sigma}
The PI controller’s gains for the rotor dynamics are:
K_i = 2 L_r \sigma \mu^2
K_p = 2 L_r \sigma \mu - R_r
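Assuming the standard pole-placement reading of these expressions, the gains set the desired characteristic polynomial to p^2 + 2\mu p + 2\mu^2 (closed-loop poles at -\mu(1 \pm j), i.e., damping ratio 1/\sqrt{2}), and this can be verified numerically with illustrative parameter values:

```python
def rotor_pi_gains(l_r: float, sigma: float, r_r: float, mu: float):
    """PI gains matching S(p) = p^2 + 2*mu*p + 2*mu^2, a common reading
    of the design above (an assumption, reconstructed from the source)."""
    k_p = 2.0 * l_r * sigma * mu - r_r
    k_i = 2.0 * l_r * sigma * mu ** 2
    return k_p, k_i
```

Substituting the gains back into S(p) recovers the target coefficients, confirming the placement.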

3.3. DFIG’s Grid-Side Control

The MATLAB simulation environment models the Grid-Side Converter (GSC) mathematically. It operates as a three-phase voltage-source inverter (VSI) employing a dual-pulse-width modulation (PWM) control scheme. The converter consists of six Insulated Gate Bipolar Transistors (IGBTs), labelled T1 through T6, where each complementary pair corresponds to one of the three output phases. The DC link, composed of parallel-connected capacitors, ensures a stable DC voltage across the converter terminals. The PWM mechanism modulates the DC-Link voltage into sinusoidal AC waveforms, which the system then filters and connects to the grid through a coupling transformer [59].
As illustrated in Figure 3, the GSC topology comprises the DC link, inverter bridge, and grid filter. The rotor-side converter rectifies the DC voltage, which the grid-side converter then converts into a controlled AC output synchronised with the grid’s frequency and phase. The filtering stage minimises harmonics and ensures compliance with grid interconnection standards. The control of the GSC serves two primary objectives:
  • Regulating the DC-Link voltage to maintain energy balance between converters.
  • Controlling the grid-side current to manage power exchange and reactive support.
The active power output of the GSC, P_GSC, varies with the cube of the instantaneous wind speed, as expressed by the following relation [1]:
P_GSC = (1/2) ρ A C_p k v³
In this equation, ρ represents the air density, A denotes the swept area of the turbine blades, C_p is the power coefficient that defines the aerodynamic efficiency, k is a conversion constant, and v corresponds to the wind speed. This relationship reveals the nonlinear dependence of phase current magnitudes on wind velocity, which complicates the accurate detection of faults and the implementation of dynamic control. Moreover, as shown in Equations (23) and (24), the stator-side active and reactive power components are directly linked to the q-axis and d-axis rotor current components, respectively. Under voltage-oriented control (VOC), the d-axis is aligned with the grid voltage vector, so the d-axis voltage equals the voltage magnitude and the q-axis voltage is zero. Consequently, the grid-side active and reactive powers can be expressed in terms of the d–q current components:
P_s = (3/2) V_ds I_ds
Q_s = −(3/2) V_ds I_qs

4. Proposed Methodology

Figure 4 illustrates that the proposed IRLC utilises a DQN architecture to perform adaptive voltage regulation in a wind-integrated IEEE 33-bus distribution network. The methodology consists of four primary stages: (1) system modelling, (2) RL problem formulation, (3) DQN architecture design, and (4) training and control process.

4.1. System Modelling

This study utilises the IEEE 33-bus radial distribution network as the test environment, serving as a standard benchmark for evaluating voltage control and reactive power management algorithms due to its moderate complexity and nonlinear load characteristics [32]. The network is modelled in the MATLAB/Simulink 2022b platform to accurately capture steady-state and dynamic responses under variable renewable generation.

4.1.1. Wind Energy Subsystem

The integration of Doubly Fed Induction Generator (DFIG)-based Wind Turbine Generators (WTGs) at selected buses (e.g., Bus 6 and Bus 18) introduces dynamic and stochastic power injection into the distribution system. The turbine extracts mechanical power from the wind, as follows:
P_m = (1/2) ρ A C_p(λ, β) v³
where
ρ is the air density (1.225 kg/m³),
A = πR² is the rotor swept area (m²),
C_p(λ, β) is the power coefficient,
v is the wind speed (m/s),
λ = ω_r R / v is the tip-speed ratio, and
β is the blade pitch angle (°).
Here, P_m is the mechanical power from the turbine, P_s and Q_s are the stator powers, and P_r and Q_r are the rotor powers, as explained in Section 3.1 and Section 3.2. The GSC maintains the DC-link voltage V_DC using a PI-based control loop:
C_dc (dV_dc/dt) = I_dc − P_GSC / V_dc
where C d c is the DC-link capacitance, and P G S C is the GSC output power. The dynamic coupling between rotor and grid voltages introduces nonlinearity, which motivates the need for adaptive RL-based control rather than fixed-parameter PI controllers [60,61].
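The turbine power relation and the DC-link dynamics above can be sketched together in code. The parameter values below (rotor radius, operating-point C_p, capacitance) are illustrative assumptions, not values from the paper, and the DC-link ODE is advanced with a simple forward-Euler step.

```python
import math

RHO = 1.225       # air density (kg/m^3)
R_BLADE = 40.0    # rotor radius (m) -- illustrative assumption
C_P = 0.45        # power coefficient at the operating point -- assumption

def mech_power(v, rho=RHO, radius=R_BLADE, c_p=C_P):
    """Mechanical power P_m = 0.5 * rho * A * C_p * v^3, with A = pi * R^2."""
    area = math.pi * radius**2
    return 0.5 * rho * area * c_p * v**3

def dc_link_step(v_dc, i_dc, p_gsc, c_dc, dt):
    """One forward-Euler step of C_dc * dV_dc/dt = I_dc - P_GSC / V_dc."""
    dv = (i_dc - p_gsc / v_dc) / c_dc
    return v_dc + dt * dv
```

In steady state the current term I_dc exactly balances P_GSC / V_dc, so the DC-link voltage holds constant, which is precisely the condition the GSC's PI loop enforces.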

4.1.2. Distribution Network Model

The distribution network is modelled as a radial power flow system, where the complex power at each bus i is given by:
S_i = P_i + jQ_i = V_i Σ_{k=1}^{N} [V_k (G_ik + jB_ik)]*
where G i k and B i k are the conductance and susceptance between buses i and k, respectively. The voltage magnitude at each bus is updated iteratively using the backwards–forward sweep (BFS) algorithm:
V_i(t+1) = V_i(t) + ΔV_i(t) = V_i(t) + Z_i (P_i − jQ_i) / V_i*(t)
where Z i is the line impedance between buses. The BFS method provides a computationally efficient solution for radial feeders and is widely adopted in RL-based power system simulations [62,63].
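A minimal sketch of the per-bus voltage update used in this iteration, assuming per-unit quantities and treating each bus independently (a full BFS would additionally accumulate branch currents from the feeder end back toward the substation):

```python
def bfs_voltage_update(v, z, p, q):
    """Single simplified BFS voltage-update step, per bus:
    V_i <- V_i + Z_i * (P_i - 1j*Q_i) / conj(V_i), all in per unit."""
    return [vi + zi * (pi - 1j * qi) / vi.conjugate()
            for vi, zi, pi, qi in zip(v, z, p, q)]
```

Repeating this update until the maximum voltage change falls below a tolerance gives the converged feeder voltage profile.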

4.1.3. Reactive Power Control Devices

The environmental model incorporates two reactive power compensation devices:
a.
On-Load Tap Changer (OLTC):
Distribution Network Operators (DNOs) must maintain supply voltages within statutory limits. The OLTC transformer, equipped with an Automatic Voltage Control (AVC) relay, remains the primary device for voltage regulation. However, increasing renewable energy integration causes bidirectional power flow, which can lead to potential overvoltage at connection points when local demand exceeds generation. The intermittent and uncertain nature of renewable energy sources further complicates voltage management. As a result, traditional OLTC-based control strategies are inadequate for modern networks. Advanced control methods are needed to ensure voltage stability and address operational constraints on renewable energy output. The OLTC regulates secondary voltage through discrete tap positions n, defined as follows [64]:
V_out = V_in (1 + nT/100)
where T is the tap step percentage (typically ±1.25%) and n ∈ [−16, 16]. OLTC operation maintains the feeder-head voltage but is limited by mechanical wear and switching delay [64].
b.
Shunt Capacitor Banks:
Utilities in distribution systems maintain the overall power factor near unity, usually within 5%, to ensure efficient operation and stable voltage levels. Reactive power compensation is provided through local VAR sources, with capacitor banks being the most common devices for voltage and VAR control. These banks improve voltage profiles, reduce losses, and enhance power factors. Fixed capacitor banks support constant inductive loads such as motors, while switched banks operate dynamically based on voltage or power factor thresholds. Automated controllers send switching commands as conditions change, although operators often lack real-time visibility due to limited monitoring capabilities. The reactive power support from capacitor banks is modelled as follows:
Q_cap = V² ω C
where C is the capacitance (F) and ω is the system angular frequency (rad/s). Capacitors are switched discretely in the RL environment to emulate real hardware operation [65].
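Both device models reduce to one-line formulas. The sketch below combines the OLTC ratio equation with tap clamping to the ±16 range and the capacitor reactive-power relation; the 50 Hz default is an assumption.

```python
import math

def oltc_output(v_in, tap, step_pct=1.25, tap_min=-16, tap_max=16):
    """OLTC secondary voltage V_out = V_in * (1 + n*T/100),
    with the tap position n clamped to [tap_min, tap_max]."""
    n = max(tap_min, min(tap_max, tap))
    return v_in * (1.0 + n * step_pct / 100.0)

def cap_reactive_power(v, c, omega=2 * math.pi * 50):
    """Reactive power Q = V^2 * omega * C of a switched shunt capacitor.
    Default omega assumes a 50 Hz system (an assumption, not from the paper)."""
    return v**2 * omega * c
```

For example, four tap-up steps at a 1.25% step raise a 1.0 p.u. input to 1.05 p.u., and requests beyond ±16 taps are silently clamped, mirroring the mechanical end stops of a real tap changer.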

4.1.4. ZIP Load Modelling for the IEEE 33-Bus System

The ZIP load model provides a comprehensive framework for characterising how electrical loads respond to variations in bus voltage, making it particularly valuable for voltage stability and control analysis in renewable-integrated distribution networks. As shown in Figure 5, the active and reactive power demands exhibit distinct voltage-dependent behaviours, depending on the composition of constant impedance (Z), constant current (I), and constant power (P) components.
In the proposed IEEE 33-bus test system, load modelling plays a crucial role in accurately representing the voltage-dependent demand characteristics of residential, industrial, and commercial consumers. To capture these nonlinear behaviours, the study modelled loads using the ZIP (Impedance–Current–Power) formulation, which combines three fundamental load types:
a.
Constant Impedance (Z):
With the constant impedance, the power varies with the square of the voltage: P , Q V 2 . This load type means that a 10% drop in voltage results in approximately a 19% decrease in power consumption. Typical examples include heaters, incandescent lamps, and resistive loads. Such loads naturally mitigate voltage instability, as lower voltage automatically reduces power demand, thereby helping to maintain grid equilibrium. Consequently, impedance-dominant systems are generally self-stabilising and beneficial for maintaining steady operation.
b.
Constant Current (I):
With the constant current, the power varies linearly with voltage: P , Q V . These loads include discharge lighting and some types of electronic devices. They provide moderate voltage dependency, not as stabilising as impedance loads, but more adaptive than constant power loads. Under low voltage, current-type loads slightly reduce consumption, offering limited damping against voltage fluctuations.
c.
Constant Power (P):
With constant power, the power remains independent of voltage: P = Q = c o n s t a n t . This behaviour is typical in motor drives, power electronic converters, and data centre equipment. To maintain constant power under low voltage, these loads draw higher current, which can exacerbate voltage instability and increase system losses. Such negative damping characteristics can trigger voltage collapse under severe disturbances or rapid fluctuations in renewable energy sources.
d.
Composite ZIP load:
In practical systems, loads are composed of a mixture of Z, I, and P components, resulting in a combined nonlinear response. The ZIP coefficients ( a p , b P , c P for active power and a q , b q , c q for reactive power) determine the relative weight of each component. The active and reactive power demands at each load bus are expressed as nonlinear functions of the voltage magnitude:
P_L = P_O [a_p (V/V_O)² + b_p (V/V_O) + c_p]
Q_L = Q_O [a_q (V/V_O)² + b_q (V/V_O) + c_q]
where
  • P L and Q L denote the instantaneous active and reactive load powers,
  • P O and Q O represent the nominal active and reactive powers at the reference voltage V O ,
  • V is the actual voltage magnitude at the load bus,
  • a_p, b_p, c_p and a_q, b_q, c_q are the ZIP coefficients, which satisfy:
a_p + b_p + c_p = 1,   a_q + b_q + c_q = 1
For instance:
  • Residential networks: 20% Z, 30% I, 50% P (sensitive to voltage changes),
  • Industrial systems: 40% Z, 20% I, 40% P (balanced),
  • Commercial complexes: 10% Z, 20% I, 70% P (more constant power behaviour).
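The composite ZIP equations and the illustrative class mixes above can be sketched as a small helper; when the reactive coefficients are not supplied, they are assumed equal to the active ones, which is a simplifying assumption rather than the paper's choice.

```python
def zip_load(p0, q0, v, v0, zp, ip, pp, zq=None, iq=None, pq=None):
    """Composite ZIP load:
    P = P0*(a*(V/V0)^2 + b*(V/V0) + c), likewise for Q; coefficients sum to 1.
    If the reactive coefficients are omitted, reuse the active ones (assumption)."""
    if zq is None:
        zq, iq, pq = zp, ip, pp
    r = v / v0
    p = p0 * (zp * r**2 + ip * r + pp)
    q = q0 * (zq * r**2 + iq * r + pq)
    return p, q

# Illustrative (Z, I, P) fractions for each consumer class, from the text
MIXES = {"residential": (0.2, 0.3, 0.5),
         "industrial":  (0.4, 0.2, 0.4),
         "commercial":  (0.1, 0.2, 0.7)}
```

A quick check reproduces the text's observation: a pure constant-impedance load at 0.9 p.u. voltage draws 0.9² = 81% of nominal power, i.e. roughly a 19% reduction.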
Figure 6 illustrates that as the constant-power coefficient (c_p) increases within the ZIP load composition, both the Voltage Deviation Index (VDI) and total system losses rise significantly. This behaviour stems from the fact that constant-power loads maintain their power consumption despite voltage dips, thereby increasing current draw and accentuating voltage drops along the feeder. Consequently, feeders experience greater reactive stress and higher I²R losses, particularly under scenarios involving heavy wind fluctuations. This sensitivity highlights the inherent vulnerability of distribution systems with a high share of constant-power characteristics (e.g., motor loads, electronic converters), where voltage damping capability is reduced and overall voltage regulation becomes more challenging [62].
Figure 7 presents a ternary plot of ZIP load mixtures (a_p + b_p + c_p = 1), which vividly illustrates the influence of load composition on voltage stability. Regions near the c_p = 1 vertex (pure constant-power dominance) correspond to poor voltage stability and maximum power losses, signifying operating points that are prone to instability and increased feeder heating. Conversely, regions near the a_p = 1 vertex (impedance-dominant loads) exhibit the most stable voltage behaviour, as the voltage-dependent current draw naturally mitigates voltage deviations. Constant-current regions (b_p-dominant) show intermediate behaviour, more stable than constant-power loads but with less damping than impedance-based loads [66,67,68].
e.
Control Strategy Implication:
Controllers with adaptive volt–VAR coordination, such as the proposed DQN framework, effectively mitigate the adverse effects of high constant-power fractions by proactively supplying reactive power and optimally adjusting OLTC tap positions. This results in operation trajectories that shift toward the low-VDI, low-loss regions of the ternary space. Compared to static or rule-based control methods, the DQN agent dynamically learns to counteract voltage sag tendencies induced by constant-power loads, maintaining overall system voltage within ±5% statutory limits even under significant wind speed variability. This strategy demonstrates the controller’s ability to generalise nonlinear load–voltage interactions and stabilise the feeder under diverse operating conditions. In reinforcement learning-based voltage regulation, the ZIP model offers a dynamic and nonlinear testing environment that mirrors real-world variability. When wind generation fluctuates, bus voltages vary, which alters power demand according to the ZIP characteristics. The DQN agent must therefore learn adaptive control actions (e.g., OLTC tap changes, capacitor switching) to maintain voltage stability across these nonlinear load responses. This makes the ZIP model essential for training and validating intelligent control policies that can generalise to unpredictable operating conditions.

4.2. Reinforcement Learning Problem Formulation

The proposed Intelligent Reinforcement Learning Controller (IRLC) formulates the voltage regulation problem in the IEEE 33-bus distribution network as a Markov Decision Process (MDP) [69,70]. The MDP formulation enables the controller to autonomously learn optimal control policies through sequential interactions with the power system environment, without requiring explicit mathematical models of the grid dynamics. The tuple defines the MDP [69]:
M = (S, A, P, R, γ)
where S denotes the state space, A the action space, P the state-transition probability, and R the reward function; the discount factor γ ∈ [0, 1] controls the trade-off between immediate and future rewards. The following subsections provide detailed explanations of the state space, action space, and state-transition probability.

4.2.1. State Space

At each discrete time step t, the agent observes the system state s t S , comprising the measurable electrical quantities that represent the grid’s operating condition [71]:
s_t = [V_1(t), V_2(t), …, V_N(t), Q_1(t), Q_2(t), …, Q_N(t), n_OLTC(t), C_bank(t), P_wind(t)]
where V_i and Q_i represent the per-unit voltage magnitude and reactive power injection at bus i, n_OLTC is the tap position of the on-load tap changer, C_bank the switching state of the capacitor bank (0 = OFF, 1 = ON), and P_wind the total real power output from the DFIG-based wind turbines.
These state variables capture the nonlinear coupling between renewable injections, voltage magnitudes, and reactive power flow, enabling the learning agent to infer the instantaneous voltage stability status of the feeder [71,72].

4.2.2. Action Space

The DQN agent determines control actions that modify system inputs to maintain voltages within statutory limits (±5% of the nominal value). The discrete action vector a t A is defined as:
a_t = [Δn_OLTC(t), ΔC_bank(t)]
where Δn_OLTC ∈ {−1, 0, +1} represents a tap-down, no-change, or tap-up operation of the OLTC, and ΔC_bank ∈ {−1, 0, +1} indicates the disconnection, holding, or connection of a capacitor step.
Each action triggers a corresponding adjustment in the distribution network model within MATLAB/Simulink, after which the environment returns the new voltage and power flow states s t + 1 .
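Since a DQN emits a single discrete index, the joint OLTC/capacitor action space can be enumerated as a 3 × 3 grid. The decode-and-clamp helper below is a sketch; the index ordering is an assumption for illustration, not the paper's encoding.

```python
from itertools import product

# Enumerate the 3 x 3 discrete joint action space: (OLTC delta, capacitor delta)
ACTIONS = list(product((-1, 0, 1), repeat=2))   # 9 joint actions

def apply_action(tap, cap_state, action_idx, tap_min=-16, tap_max=16):
    """Decode a DQN output index into tap / capacitor changes and clamp
    to physical limits (tap range, capacitor step ON/OFF)."""
    d_tap, d_cap = ACTIONS[action_idx]
    tap = max(tap_min, min(tap_max, tap + d_tap))
    cap_state = max(0, min(1, cap_state + d_cap))
    return tap, cap_state
```

Clamping inside the decoder means out-of-range requests (e.g. tap-up at the top tap) degenerate to "no change", so every index remains a valid action in every state.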

4.2.3. Reward Function

The reward function quantitatively evaluates the desirability of each action, driving the agent to minimise voltage deviation, total system losses, and switching wear. The scalar reward at time t is formulated as:
R_t = −(λ_1 J_V(t) + λ_2 J_P(t) + λ_3 J_S(t))
where
  • J_V(t) = Σ_{i=1}^{N} (V_i(t) − V_ref)² is the voltage deviation index,
  • J_P(t) = P_loss(t) is the total active power loss,
  • J_S(t) is the switching penalty representing the mechanical wear of the OLTC and capacitors, and
  • λ_1, λ_2, λ_3 are positive weighting coefficients satisfying λ_1 + λ_2 + λ_3 = 1.
The reward function hence promotes voltage regulation while discouraging unnecessary control operations:
R_t = −[λ_1 Σ_{i=1}^{N} (V_i − 1)² + λ_2 P_loss + λ_3 (|Δn_OLTC| + |ΔC_bank|)]
Maximising the cumulative discounted reward drives the learning process toward globally optimal voltage regulation and minimal system loss [73,74].
J = E[Σ_{t=0}^{T} γ^t R_t]
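The per-step reward can be sketched directly from its three terms. The weighting coefficients below are illustrative placeholders satisfying the unit-sum constraint, not the values used in the study.

```python
def reward(voltages, v_ref, p_loss, d_tap, d_cap,
           w1=0.6, w2=0.3, w3=0.1):
    """Negative weighted sum of voltage deviation, active power loss, and
    switching wear. Weights (w1, w2, w3) are illustrative and sum to 1."""
    j_v = sum((v - v_ref)**2 for v in voltages)   # voltage deviation index
    j_s = abs(d_tap) + abs(d_cap)                 # switching penalty
    return -(w1 * j_v + w2 * p_loss + w3 * j_s)
```

Because every term is non-positive after negation, the reward peaks at zero only when all bus voltages sit at the reference, losses vanish, and no device switches, so the agent is pushed toward exactly the operating points the paper targets.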

4.2.4. Transition Dynamics

The environment evolves according to the nonlinear distribution network, as shown in Equations (37)–(42). The transition probability P S | s , a is implicitly captured by the MATLAB/Simulink co-simulation engine, which updates bus voltages, reactive power flows, and wind outputs upon each action. Because these dynamics are highly nonlinear and stochastic, the model-free RL paradigm is particularly suitable [75].

4.2.5. Policy and Value Function

The objective of the agent is to learn an optimal policy π * a | s , that maximises the expected cumulative reward. The state–action value function (Q-function) is defined as:
Q^π(s_t, a_t) = E[Σ_{k=0}^{∞} γ^k R_{t+k+1} | s_t, a_t, π]
and the optimal Q-function satisfies the Bellman optimality equation:
Q*(s_t, a_t) = R_t + γ max_{a_{t+1}} Q*(s_{t+1}, a_{t+1})
The DQN approximates Q*(s, a) with a deep neural network Q(s, a; θ) parameterised by weights θ. The network is trained by minimising the mean-squared temporal-difference loss:
L(θ) = E_{(s,a,r,s′)∼D}[(r + γ max_{a′} Q(s′, a′; θ⁻) − Q(s, a; θ))²]
where θ⁻ denotes the target-network parameters and D is the experience replay buffer. Gradient descent is applied to update θ iteratively until convergence [76,77].
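The temporal-difference target and loss can be sketched with NumPy for a minibatch; the target network's Q-values enter only through `next_q`, and the optional `dones` mask (an assumption, common in episodic tasks) disables bootstrapping on terminal transitions.

```python
import numpy as np

def td_targets(rewards, next_q, gamma=0.95, dones=None):
    """Batched TD target y = r + gamma * max_a' Q(s', a'; theta^-).
    `next_q` holds the target network's Q-values, shape (batch, n_actions);
    `dones`, if given, zeroes the bootstrap term on terminal transitions."""
    next_max = next_q.max(axis=1)
    if dones is not None:
        next_max = next_max * (1.0 - dones)
    return rewards + gamma * next_max

def td_loss(q_pred, targets):
    """Mean-squared temporal-difference loss over the minibatch."""
    return float(np.mean((targets - q_pred)**2))
```

In a full training loop, the gradient of this loss with respect to θ flows only through `q_pred`; the targets are treated as constants, which is exactly the role of the frozen target network.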

4.2.6. Block Diagram of the Reinforcement Learning Environment

Figure 8 illustrates the conceptual block diagram of the RL-based voltage control framework. The reinforcement learning environment for adaptive voltage control comprises several interconnected modules that collaborate to facilitate intelligent decision-making in a renewable-integrated distribution network. The MATLAB/Simulink environment model represents the IEEE 33-bus distribution system, including DFIG-based wind subsystems, ZIP loads, on-load tap changers (OLTCs), and shunt capacitor banks.
These components simulate real-world electrical dynamics and stochastic variations in renewable generation and load demand. The agent, implemented as a DQN controller, receives the system state vector, which includes measurements such as bus voltages, reactive power injections, OLTC tap positions, and capacitor switching states. Based on this data, the agent generates control actions that adjust the OLTC tap steps and capacitor bank operations to maintain voltage levels within statutory limits.
The reward computation module evaluates the effectiveness of each control action by calculating a scalar reward, R t , derived from voltage deviation, total active power loss, and switching penalties as defined in Equation (47). This feedback signal guides the agent toward minimising energy losses and maintaining voltage regulation. The experience replay mechanism stores past experiences, s t , a t , r t , s t + 1 , in a buffer, allowing the DQN to sample from prior interactions to break correlation between consecutive experiences and enhance learning efficiency. Simultaneously, a target network periodically updates Q-value estimates to improve training stability and prevent oscillations. Once training converges, the policy execution phase commences, wherein the trained DQN model generates real-time control commands with inference latency on the order of milliseconds. During continuous operation, the agent interacts dynamically with the simulated environment, iteratively applying actions, receiving rewards, and refining its internal policy parameters. This closed-loop interaction enables the controller to achieve robust voltage regulation and minimal energy loss under highly stochastic wind and load conditions, demonstrating adaptive and model-free learning behaviour suitable for practical deployment in smart grids.

5. Simulation Design and Implementation

The study utilises the standard IEEE 33-bus radial distribution feeder, modelled in MATLAB/Simulink (R2022b), with a backwards–forward sweep load flow. The RL agent is implemented in Python (version 3.8). The study uses IEEE 33-bus base quantities of 100 MVA and 11 kV (L-L). This work utilises a lightweight bridge to exchange signals at each control step, employing the MATLAB Engine for Python (synchronous call). At every decision instant, the Simulink model exposes the observed state vector s_t (bus voltages, reactive powers, OLTC tap, capacitor status, wind output). The agent returns the action a_t = [Δn_OLTC, ΔC_bank], and the environment advances one step by solving the nonlinear distribution equations and ZIP loads (Equations (35)–(40)) before computing the scalar reward R_t (voltage deviation/loss/switching penalty). These components and data flows are consistent with the framework described in Section 4.1.1, Section 4.1.2, Section 4.1.3 and Section 4.1.4 (DFIG, ZIP, OLTC, capacitor) and the DQN controller with experience replay/target network introduced earlier.
A grid-connected DFIG-based wind farm sits at Bus 18, the distribution-level point of common coupling, and its aggregate machine rating allows it to supply about 20% of the peak system demand under nominal wind conditions; the instantaneous active power follows a Weibull wind profile with speeds ranging from 4 to 15 m/s, yielding an output of 0.2–1.2 MW over the day. The RSC regulates stator active/reactive exchange; the GSC maintains DC-link voltage as described in Section 3 (RSC/GSC vector control).
Loads are grouped by end-use and assigned per-bus (Z, I, P) fractions that sum to 1, as per Section 4.1.4. Active and reactive powers track voltage via Equations (39) and (40). One substation OLTC (Bus 1) regulates feeder-head voltage with a tap step of ±1.25% and a range of n ∈ [−16, 16]; switching wear and relay delay are accounted for in the reward/constraints. Three shunt capacitor banks are installed at buses 12, 25, and 30, each with a capacity of ±300 kVAr, switched discretely to emulate field hardware. The study enforces minimum dwell times of (OLTC ≥ 60 s, capacitor step ≥ 30 s) to avoid chattering. Simulations use a fixed integration step of 1 ms for electrical dynamics and a control decision interval of Δt = 1 s. Wind and load traces are time-synchronised. Test scenarios cover both nominal and stressful conditions (with varying wind and loads) to demonstrate robustness.

5.1. System Specifications and Test Scenarios

Table 3 represents the simulation system specifications and test scenarios used in the study.

5.2. Hyperparameter, Training, and Convergence

The study used a DQN controller and tuned the default hyperparameters to achieve stable convergence. The work set the learning rate to α = 1 × 10⁻³ with the Adam optimiser, and the discount factor was γ = 0.95. The exploration rate ε followed a linear decay from 0.90 to 0.05 throughout training, resetting at the start of each episode. A replay buffer of N_replay = 1.0 × 10⁵ transitions was used with a minibatch size of 64. The algorithm updated the target network every 1000 steps using a hard update, or through soft updates with τ = 5.0 × 10⁻³. The Q-network architecture consisted of a multilayer perceptron with two hidden layers (256 and 128 neurons, ReLU activation) and output neurons equal to the number of available actions, with optional layer normalisation for stability. Regularisation included gradient clipping (‖g‖ ≤ 10) and L2 weight decay of 1.0 × 10⁻⁵. The model used a 1 s control step, and the electrical solver operated at 1 ms to ensure tight coupling between the control and physical dynamics. Training comprised 1000 episodes, each simulating 24 h with variable wind and load profiles. Convergence was declared when the moving average of episodic rewards plateaued (<1% change over 50 episodes) and the performance metrics, voltage deviation index ≤ 0.007 p.u., total loss reduction ≥ 10%, and stable switching counts, were satisfied. The trained model’s inference latency (a few milliseconds) and performance were then validated on unseen stochastic scenarios, confirming robustness and reproducibility within the DQN–Simulink framework.
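Two of the training-loop ingredients described above, the linear ε-decay and the plateau-based convergence test, can be sketched as follows; the relative-change normalisation inside `converged` is an illustrative choice, not necessarily the paper's exact criterion.

```python
def epsilon(step, total_steps, eps_start=0.90, eps_end=0.05):
    """Linear decay of the exploration rate from eps_start to eps_end."""
    frac = min(1.0, step / total_steps)
    return eps_start + frac * (eps_end - eps_start)

def converged(episode_rewards, window=50, tol=0.01):
    """Declare convergence when the moving-average reward changes by less
    than tol (1%) between the previous and current `window`-episode blocks."""
    if len(episode_rewards) < 2 * window:
        return False
    prev = sum(episode_rewards[-2 * window:-window]) / window
    last = sum(episode_rewards[-window:]) / window
    return abs(last - prev) <= tol * max(abs(prev), 1e-9)
```

In use, ε would gate a random-action branch in the agent's action selection, while `converged` would be checked once per episode to decide when to stop training and switch to pure policy execution.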

Performance Observation

a.
Voltage Profile Performance
The results illustrated in Figure 9 compare the voltage magnitude profiles along the IEEE 33-bus distribution network under different controller configurations. The voltage profile indicates how effectively each control strategy maintains bus voltages near the nominal value (1.0 p.u.) under typical operating conditions.
The Proposed DQN controller demonstrates the flattest and most stable voltage profile, maintaining voltages within the range of 0.985–1.0 p.u. across all buses. This result highlights the DQN agent’s superior ability to regulate voltage levels dynamically by learning optimal control actions for OLTC tap positions and capacitor switching through continuous interaction with the nonlinear network environment. The DQN controller minimises both the voltage deviation index (VDI) and the voltage drop toward the feeder’s remote end, ensuring uniform voltage quality across the network.
The RL–PSO hybrid controller exhibits slightly lower voltage levels, maintaining voltages around 0.97–0.985 p.u., which reflects improved adaptability but with mild oscillations due to stochastic policy tuning. The PSO and MPC (Nonlinear) controllers demonstrate moderate regulation performance, each capable of partial compensation but unable to sustain consistent voltage levels along the feeder under varying load and wind generation conditions.
The PI controller exhibits the poorest performance, with bus voltages dropping below 0.93 p.u. at the most distant nodes, indicating insufficient reactive support and a slow corrective response to dynamic disturbances.
Overall, the DQN-based control scheme achieves the best voltage regulation performance, maintaining bus voltages well within statutory limits (±5%) while minimising both voltage sag and overcompensation. This superior profile confirms the effectiveness of deep reinforcement learning in managing distributed voltage control in renewable-integrated distribution networks, offering enhanced adaptability, reduced energy loss, and improved power quality compared to conventional and heuristic-based methods.
b.
Total Active Power Loss Performance
The results in Figure 10 compare the total active power losses (kW) under different operating scenarios, Nominal, High Wind, Heavy Load, and Variable Conditions, for all tested controllers. This analysis quantifies the effectiveness of each control strategy in minimising resistive losses in the distribution feeder while maintaining voltage stability under diverse system states.
Across all scenarios, the Proposed DQN controller achieves the lowest active power losses, demonstrating superior efficiency in real-time power flow management. Under nominal conditions, the DQN reduces total losses to approximately 140 kW, a reduction of about 25–30% compared to the PI baseline. During high-wind conditions, when power injections from DFIG units fluctuate significantly, the DQN maintains stable losses around 150 kW, confirming its adaptability to renewable intermittency. Under the heavy-load scenario, the DQN’s performance remains consistent, limiting losses to around 170 kW, whereas conventional PI and MPC controllers record significantly higher losses exceeding 210 kW. Even in variable conditions, where both wind and load fluctuate simultaneously, the DQN controller sustains efficient operation with minimal deviation from optimal performance.
The RL–PSO controller ranks second in performance, offering a substantial improvement over PSO and MPC due to its adaptive exploration. In contrast, PSO and MPC provide moderate reductions through periodic optimisation but lack dynamic responsiveness. The PI controller consistently exhibits the highest power losses, attributable to its inability to coordinate reactive compensation and OLTC operations efficiently.
Overall, these results confirm that the DQN-based controller effectively minimises network energy losses, optimising both reactive support and voltage profiles under diverse grid conditions. Its learning-based control policy ensures a robust, real-time response to nonlinear and stochastic behaviours, making it a superior solution for reducing losses and improving energy efficiency in active distribution networks.
c.
Reactive Power Profile Along the Feeder
The results depicted in Figure 11 illustrate the reactive power distribution along the IEEE 33-bus feeder under different control strategies. The profile demonstrates each controller’s ability to manage reactive power compensation and maintain voltage stability across the distribution network. A flatter and less negative reactive power curve indicates improved reactive support and reduced line losses.
The Proposed DQN controller achieves the best performance, maintaining the highest reactive power levels (around −120 kVAr near the feeder end), which indicates adequate compensation and minimal reactive power draw from upstream buses. This study highlights the DQN agent’s ability to optimally coordinate OLTC and capacitor operations while learning dynamic relationships between load variations and voltage-reactive interactions. Its adaptive learning mechanism allows real-time adjustments to balance reactive flow, ensuring uniform voltage regulation across all buses.
The RL–PSO controller performs relatively well, improving reactive balance compared to conventional methods, but still exhibits small fluctuations due to the variability in heuristic-based decisions. The PSO and MPC (Nonlinear) controllers demonstrate moderate compensation capabilities, as they rely on periodic optimisation or static objective functions that lack adaptability under fast-changing load or wind conditions. Conversely, the PI controller exhibits the steepest and most negative reactive power profile (below −170 kVAr), reflecting poor compensation and a higher reactive burden on the upstream network.
Overall, the DQN-based approach yields the most efficient reactive power management, achieving smoother distribution, reduced voltage drops, and lower network losses. This study confirms that reinforcement learning-based controllers can effectively enhance system voltage-reactive power coordination and power quality in active distribution systems with renewable integration.
d.
Reward Convergence Comparison Across Controllers
The simulation results presented in Figure 12 depict the reward convergence trends for different controllers over 1000 training episodes. The cumulative reward represents the overall control performance, integrating factors such as voltage deviation minimisation, loss reduction, and switching efficiency. Higher rewards correspond to better voltage regulation and control stability.
The Proposed DQN controller exhibits a smooth and stable learning curve, steadily increasing its cumulative reward until convergence around episode 600. This behaviour reflects efficient policy learning and consistent improvement in maintaining system voltage within nominal limits while minimising control effort. The gradual convergence trend of the DQN indicates an effective balance between exploration and exploitation, aided by its experience replay and target network stabilisation mechanisms.
In contrast, the RL–PSO controller achieves the highest initial rewards but exhibits mild instability and saturation near convergence, indicating that while it quickly identifies near-optimal regions through heuristic exploration, it lacks long-term policy refinement and adaptability to dynamic disturbances. The PSO and MPC (Nonlinear) controllers exhibit intermediate reward levels: PSO converges rapidly but plateaus early due to its limited adaptability, while MPC achieves moderate convergence, constrained by its computational horizon and dependency on accurate modelling. The PI controller yields the lowest and flattest reward curve, indicating poor adaptability and limited capability in handling nonlinear stochastic system dynamics.
Overall, the DQN model demonstrates robust and stable learning behaviour, achieving high cumulative rewards through intelligent state-action mapping. This confirms its superiority in adaptive voltage regulation under uncertain wind and load conditions, combining optimality with generalisation across varying operating scenarios. The convergence behaviour validates that reinforcement learning provides a sustainable control policy capable of outperforming classical and heuristic methods in complex distribution networks.
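The experience replay and target-network mechanisms credited above for the DQN's stable convergence can be illustrated with a minimal sketch. This is not the authors' implementation: a linear Q-function stands in for the deep network, the transitions are synthetic, and all dimensions and hyperparameters (state size, action count, learning rate, soft-update rate) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_ACTION = 4, 5           # illustrative: voltage features / tap-capacitor actions
GAMMA, LR, TAU = 0.95, 0.01, 0.05  # discount, learning rate, soft-update rate

# A linear Q-function stands in for the deep network in this sketch.
W_online = rng.normal(scale=0.1, size=(N_ACTION, N_STATE))
W_target = W_online.copy()

replay = []                        # experience replay buffer

def q_values(W, s):
    return W @ s

def step_update(batch):
    """One DQN update: the TD target comes from the frozen target network."""
    global W_online, W_target
    for s, a, r, s_next in batch:
        td_target = r + GAMMA * q_values(W_target, s_next).max()
        td_error = td_target - q_values(W_online, s)[a]
        W_online[a] += LR * td_error * s          # gradient step for linear Q
    # Soft target-network update stabilises the bootstrap targets.
    W_target = (1 - TAU) * W_target + TAU * W_online

# Fill the buffer with synthetic transitions, then train on uniform minibatches.
for _ in range(500):
    s = rng.normal(size=N_STATE)
    a = int(rng.integers(N_ACTION))
    r = -abs(s[0])                 # toy reward: penalise a voltage-deviation feature
    replay.append((s, a, r, rng.normal(size=N_STATE)))

for _ in range(200):
    idx = rng.choice(len(replay), size=32)       # uniform replay sampling
    step_update([replay[i] for i in idx])
```

Sampling minibatches from the replay buffer breaks the temporal correlation of consecutive grid states, which is one reason the DQN's reward curve in Figure 12 rises smoothly rather than oscillating.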
e. Sensitivity of Voltage Regulation to Wind Speed Variability
The results shown in Figure 13 illustrate the sensitivity of various controllers to wind power fluctuations, expressed in terms of the Mean Voltage Deviation Index (VDI) as a function of wind power variation levels ranging from 0% to 40%. This analysis quantifies the robustness of each control strategy under stochastic renewable generation conditions typical of DFIG-based wind energy systems.
As expected, all controllers exhibit an increase in VDI with higher wind variability, signifying greater voltage instability induced by fluctuating reactive power injections. However, the proposed DQN controller consistently maintains the lowest VDI across all variation levels, starting at approximately 0.006 p.u. for 0% fluctuation and rising only marginally to 0.018 p.u. at 40%. This confirms the DQN agent's ability to adaptively learn and compensate for dynamic wind disturbances through real-time policy updates and predictive action selection.
The RL–PSO controller performs second-best, leveraging heuristic search to partially adapt to uncertainty, but still exhibits higher deviation due to its limited temporal learning capabilities. The PSO and MPC (Nonlinear) controllers show intermediate performance, with VDIs increasing linearly with fluctuation intensity, reflecting their reliance on predefined models or fixed optimisation horizons that cannot fully adapt to real-time stochastic variations. The PI controller exhibits the poorest robustness, with VDI values nearly 2.5 times higher than those of the DQN at high wind variability, underscoring its inability to effectively handle nonlinear and time-varying system dynamics.
Overall, the results demonstrate that the DQN-based controller provides superior resilience and voltage stability, maintaining consistent regulation performance even under volatile wind conditions. This robustness stems from its reinforcement learning architecture, which enables continuous self-tuning and optimal control in the presence of uncertain renewable energy dynamics, making it the most reliable approach for adaptive voltage regulation in modern active distribution networks.
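The Mean Voltage Deviation Index used throughout this comparison is a simple average deviation from the reference voltage. The sketch below shows one plausible formulation consistent with the p.u. values reported above; the exact averaging (per bus, per time step, or both) used in the study is not restated here, so the helper and the toy voltage trace are illustrative.

```python
import numpy as np

def mean_vdi(voltages, v_ref=1.0):
    """Mean Voltage Deviation Index: average |V - V_ref| over the samples (p.u.)."""
    v = np.asarray(voltages, dtype=float)
    return float(np.mean(np.abs(v - v_ref)))

# Toy 24-sample daily voltage trace oscillating around nominal.
profile = 1.0 + 0.01 * np.sin(np.linspace(0, 2 * np.pi, 24))
vdi = mean_vdi(profile)
```

Under this definition, a controller that keeps every bus within ±0.018 p.u. at 40% wind variability, as the DQN does, bounds the index well inside the 0.95–1.05 p.u. statutory band.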
f. Tap and Capacitor Switching Frequency Comparison
The results presented in Figure 14 provide a comparative evaluation of the OLTC tap and capacitor bank switching frequencies across various control strategies. These metrics reflect the controllers’ operational efficiency and their impact on equipment lifespan, as excessive switching leads to mechanical wear and increased maintenance costs.
The proposed DQN controller demonstrates the lowest switching frequency, with approximately 18 tap operations and 10 capacitor switching events per day. This indicates that the DQN-based approach learns an optimal control policy that minimises unnecessary actuation while maintaining voltage stability. Its reinforcement learning structure enables predictive decision-making, ensuring control actions are triggered only when voltage deviations exceed meaningful thresholds, thereby reducing control chatter.
In contrast, the RL–PSO hybrid method shows slightly higher switching activity (25 taps and 16 capacitor operations), attributed to its partial reliance on heuristic exploration, which introduces mild oscillations in control decisions. The PSO and MPC (Nonlinear) controllers display further increases in switching counts due to their periodic re-optimisation or sensitivity to local fluctuations in network conditions. The conventional PI controller exhibits the highest switching frequency (48 taps and 32 capacitor operations), as it responds directly to instantaneous voltage deviations without predictive smoothing or coordination between OLTC and capacitor actions.
Overall, the results confirm that the DQN controller achieves the best trade-off between voltage regulation accuracy and actuator endurance, effectively minimising mechanical stress on switching devices while maintaining compliance with network voltage limits. This behaviour underscores the superiority of learning-based adaptive control over traditional and optimisation-based approaches for real-time voltage regulation in active distribution systems.
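The idea of acting only when deviations exceed a meaningful threshold can be sketched as a simple deadband rule around the OLTC. This is a hand-written illustration of the chatter-suppression principle, not the learned DQN policy; the deadband width, tap limits, and voltage samples are assumed values.

```python
def tap_command(v_meas, tap, deadband=0.01, v_ref=1.0, tap_min=-8, tap_max=8):
    """Change the OLTC tap only when |V - V_ref| exceeds the deadband,
    suppressing control chatter around the nominal voltage."""
    err = v_meas - v_ref
    if err < -deadband and tap < tap_max:
        return tap + 1      # raise the regulated voltage by one step
    if err > deadband and tap > tap_min:
        return tap - 1      # lower the regulated voltage by one step
    return tap              # inside the deadband: no switching

# Small excursions inside the deadband cause no actuation.
taps, tap = [], 0
for v in [1.002, 0.995, 0.985, 0.97, 1.03, 1.005]:
    tap = tap_command(v, tap)
    taps.append(tap)
```

In the trace above, only the three samples outside the ±0.01 p.u. band trigger tap movements; a learned policy additionally weighs switching cost against future deviations, which is how the DQN reaches roughly a third of the PI controller's daily operations.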
g. Temporal Voltage Evolution at Bus 18
The simulation result depicted in Figure 15 illustrates the temporal voltage evolution at Bus 18 under varying controller configurations over a 24 h operating period. The voltage trajectories demonstrate each controller’s ability to maintain the bus voltage within the acceptable range (0.95–1.05 p.u.) in response to stochastic wind and load variations modelled through the DFIG–ZIP dynamic environment.
The proposed DQN controller exhibits the most stable and tightly regulated voltage profile, maintaining an average voltage near 1.0 p.u. with minimal fluctuation amplitude (±0.002 p.u.). This profile demonstrates superior adaptability and learning efficiency in mitigating voltage deviations caused by renewable intermittency and load variability. The DQN agent effectively leverages real-time state feedback and adaptive reward optimisation to reduce voltage overshoot and steady-state error, outperforming all benchmark methods.
In comparison, the RL–PSO hybrid controller achieves moderate voltage stability, benefiting from heuristic exploration but exhibiting slower adaptation to rapidly changing conditions. The PSO controller demonstrates acceptable regulation but with noticeable voltage oscillations, highlighting its static nature and limited responsiveness to dynamic disturbances. The MPC (Nonlinear) controller maintains a smoother trajectory than conventional PI control but suffers from model dependency and computational latency, which restricts its real-time effectiveness. The PI controller, although simple, exhibits the poorest performance, with significant under-voltage periods, especially during high-load or low-wind intervals.
Overall, the DQN-based voltage regulator achieves the lowest voltage deviation index (VDI ≈ 0.0045 p.u.), provides a fast response, and consistently complies with operational voltage standards, validating its superiority for adaptive voltage control in active distribution networks.
h. Computational Efficiency Comparison
The Computational Efficiency analysis in Figure 16 evaluates both inference latency and training time to convergence for different control algorithms, providing insight into their practicality for real-time voltage regulation in active distribution systems.
The results show that the proposed DQN controller achieves a low inference latency (~1.2 ms), demonstrating its suitability for real-time operation. However, it requires the longest training time (~10 h) due to the iterative nature of reinforcement learning and the need for extensive interaction with the environment to learn optimal control policies. Once trained, the DQN executes decisions almost instantaneously, making it ideal for deployment in fast-reacting control loops.
In contrast, the PI controller has negligible training time and the lowest latency (<0.3 ms), but it lacks adaptability to nonlinear and stochastic operating conditions. The MPC (Nonlinear) controller shows the highest inference latency (~8 ms) due to real-time optimisation solving at every step. However, its training effort is minimal since it does not rely on iterative learning. PSO and RL–PSO present moderate computational loads. PSO requires more computation during optimisation but shorter training time than RL-based methods, while RL–PSO offers a balanced trade-off between adaptation and computational demand.
Overall, the DQN controller demonstrates excellent online efficiency and decision-making speed, despite incurring higher offline training costs. Its inference speed and adaptability make it the most computationally efficient solution for continuous real-time control, significantly outperforming conventional optimisation and rule-based schemes once deployed.
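The inference-latency figures above are per-decision averages over many forward passes. The sketch below shows one way such a measurement is typically taken; the stand-in policy is a small linear map rather than the trained network, and the dimensions and call count are arbitrary.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 12))       # stand-in for the trained Q-network weights

def policy(state):
    """Greedy action from the (stand-in) trained Q-function."""
    return int(np.argmax(W @ state))

# Average per-decision latency over many repeated calls.
state = rng.normal(size=12)
n = 10_000
t0 = time.perf_counter()
for _ in range(n):
    policy(state)
latency_ms = (time.perf_counter() - t0) / n * 1e3
```

Averaging over thousands of calls with a monotonic clock (`time.perf_counter`) smooths out scheduler jitter, which matters when the quantity of interest is on the order of a millisecond, as with the DQN's ~1.2 ms inference time.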

6. Discussion and Comparison of the Simulation Studies

The comprehensive simulation analysis conducted on the IEEE 33-bus wind-integrated distribution system provides an in-depth evaluation of six different voltage and reactive power control strategies. The authors assessed each control framework under identical operating conditions, including fluctuating wind generation, dynamic load variations, and multiple reactive compensation devices. The results, validated through multiple performance indices and dynamic response studies, collectively highlight the superior adaptability and robustness of the DQN-based intelligent control framework.

6.1. Voltage Profile Performance

Across the 33-bus feeder, the proposed DQN controller maintains voltages closest to the nominal value (≈0.985–1.00 p.u.) with the flattest spatial profile. In contrast, the PI controller exhibits the steepest drop (<0.93 p.u. at remote buses), with MPC, PSO, and RL–PSO in between. This demonstrates the DQN's superior coordination of the OLTC and shunt capacitors in counteracting feeder R–X voltage drops under varying operating conditions.

6.2. Active Power Loss Performance

For the Nominal, High-Wind, Heavy-Load, and Variable cases, the DQN consistently yields the lowest losses (approximately 140–170 kW) compared to PI (approximately 195 to over 230 kW), with MPC and PSO intermediate. The loss reductions reflect both improved voltage regulation and enhanced reactive support, thereby reducing I²R losses on downstream lines.
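The link between voltage support and I²R losses follows directly from the branch-current relation I = S/(√3·V): for the same power transfer, a higher bus voltage means lower current and quadratically lower losses. The sketch below illustrates this on two hypothetical branches; the power flows and resistance are invented figures, with only the 12.66 kV base of the IEEE 33-bus system taken from the test network.

```python
import numpy as np

def line_losses_kw(p_kw, q_kvar, v_pu, r_ohm, v_base_kv=12.66):
    """Per-branch I^2*R losses for a three-phase line.
    I = S / (sqrt(3) * V_LL); loss = 3 * I^2 * R."""
    s_kva = np.hypot(p_kw, q_kvar)           # apparent power through the branch
    v_kv = np.asarray(v_pu) * v_base_kv      # line-to-line voltage in kV
    i_amp = s_kva / (np.sqrt(3) * v_kv)      # branch current in A
    return 3 * i_amp**2 * np.asarray(r_ohm) / 1e3   # losses in kW

# Same power flow at a depressed vs. well-supported sending voltage.
loss_low_v = line_losses_kw(np.array([3000.0]), np.array([1500.0]),
                            np.array([0.94]), np.array([0.5]))
loss_high_v = line_losses_kw(np.array([3000.0]), np.array([1500.0]),
                             np.array([0.99]), np.array([0.5]))
```

Raising the operating voltage from 0.94 to 0.99 p.u. in this toy case cuts the branch loss by roughly 10%, which is the mechanism behind the DQN's lower aggregate losses across all four scenarios.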

6.3. Reactive Power Profile Performance

The DQN maintains the least negative reactive power profile (higher and flatter kVAr values along the buses), indicating effective VAR support, lower reactive draw from the substation, and better utilisation of local compensation.

6.4. Reward Convergence and Learning Behaviour

The DQN’s cumulative reward rises smoothly and stabilises by ~600 episodes, evidencing effective exploration–exploitation, replay, and target-network stabilisation. RL-PSO starts high but plateaus/oscillates; PSO/MPC converge faster but to lower asymptotes; PI remains lowest, consistent with their limited adaptivity.

6.5. Sensitivity of Voltage Regulation and Robustness

As wind variability increases (0–40%), all methods show higher mean VDI, but the slope for DQN is the smallest. The DQN thus delivers the best robustness to renewable intermittency, while PI’s VDI grows ~2–3 times faster, confirming the limited adaptability of fixed-gain control to stochastic injections.

6.6. Tap and Capacitor Switching Frequency

The DQN triggers the fewest operations (~18 OLTC taps and ~10 capacitor steps/day) vs. PI (~48/32). Fewer actions imply reduced wear, lower maintenance, and a lower risk of nuisance operations, all achieved without sacrificing voltage quality.

6.7. Temporal Voltage at Bus 18

Time-series traces show that DQN tracks near 1.0 p.u. with the smallest ripple, correcting disturbances faster than the benchmarks, which is essential for power-quality compliance under real wind/load dynamics.

6.8. Computational Efficiency

Once trained, the DQN’s inference latency is low (~1.2 ms), well within real-time control budgets, while MPC incurs the largest runtime burden (~8 ms) due to online optimisation. The DQN’s main cost is offline training time (~10 h in our setup), which is a one-time or infrequent expense; after deployment, it is computationally light.
Table 4 presents a comparative analysis, summarising the performance of all controllers across the major technical indicators evaluated in the simulation study.
Reinforcement learning-based control supports sustainable grid operation by improving energy efficiency, reducing technical losses, and enabling higher renewable penetration. By learning optimal voltage and reactive power actions in real time, the DQN framework minimises power dissipation and unnecessary switching, allowing greater utilisation of wind energy while lowering reliance on carbon-intensive generation. These efficiency gains collectively reduce the grid’s carbon footprint and strengthen the long-term sustainability of renewable-integrated distribution systems.

7. Conclusions and Future Work

7.1. Conclusions

This paper presented a DQN-based adaptive voltage regulation framework for active distribution networks with DFIG-integrated wind generation, ZIP loads, and discrete control devices (OLTC and capacitor banks). The proposed DQN-based voltage regulation framework not only enhances technical performance but also contributes substantively to sustainable energy development. By reducing active power losses by up to 35–40% and improving voltage stability under high renewable penetration, the controller increases grid energy efficiency and supports greater integration of clean wind energy. These improvements translate into lower carbon emissions, reduced operational costs, and enhanced grid resilience, key pillars of sustainability. The DQN agent’s capacity to autonomously learn optimal control strategies directly from the nonlinear environment, without relying on explicit system models, supports long-term adaptability as renewable penetration increases, offering a scalable foundation for future sustainable smart grids. Moreover, the DQN demonstrated strong resilience to stochastic wind variations, maintaining voltage deviations within ±5% across all scenarios, and achieving smooth reward convergence with consistent improvements in learning metrics, including voltage deviation, system losses, and device actuation frequency. In addition, the DQN’s inference latency (~1.2 ms) makes it feasible for real-time deployment, while its training overhead remains a one-time offline process.
Overall, the study demonstrates that intelligent, data-driven voltage control frameworks are essential tools for achieving energy sustainability targets, ensuring reliable operation of renewable-integrated distribution networks, and supporting global climate and decarbonisation commitments.

7.2. Future Work

Future research will further enhance the practicality and generalisability of the proposed DQN-based voltage control framework by exploring several key directions. First, the approach can be extended to a multi-agent reinforcement learning (MARL) paradigm, enabling coordinated control across multiple feeders, voltage regulators, and distributed energy resources (DERs) for system-wide voltage optimisation. Second, integrating safe and constrained reinforcement learning (RL) will allow operational limits, such as voltage bounds, equipment ratings, and switching constraints, to be embedded directly within the learning process, ensuring safe behaviour during both training and deployment. Third, transfer learning and meta-reinforcement learning (meta-RL) techniques will be applied to accelerate adaptation to evolving grid conditions, including topology changes, component ageing, and seasonal demand variations, thereby reducing the need for extensive retraining. Additional work will include hardware-in-the-loop (HIL) validation to examine real-time performance, latency, and interoperability with SCADA systems under realistic disturbances. The study will also investigate hybrid DQN–MPC frameworks that combine RL’s adaptive policies with MPC’s constraint handling and short-term forecasting to improve reliability under uncertainty. Finally, enhancing cyber–physical and communication resilience will remain a priority to ensure robust operation under latency, data loss, and potential cyber threats in future smart distribution networks.
Future work will further include evaluation of the proposed framework against policy-driven sustainability targets and renewable integration mandates. Moreover, examining the socio-economic benefits of enhanced renewable hosting capacity, such as lower energy costs and improved access to clean electricity, will help demonstrate its broader contribution to sustainable development.

Author Contributions

The authors jointly planned the study and contributed to its conception and scope. Introduction, R.K.B. and A.K.S.; methodology, R.K.B. and A.K.S.; investigation, R.K.B. and A.K.S.; resources, R.K.B. and A.K.S.; data curation, R.K.B. and A.K.S.; writing—original draft preparation, R.K.B. and A.K.S.; writing—review and editing, R.K.B. and A.K.S.; visualisation, R.K.B. and A.K.S.; supervision, A.K.S.; project administration, A.K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

My sincere gratitude goes to my supervisor, Akshay Kumar Saha, for introducing me to the topic of wind renewable energy, for his outstanding supervision, and for helping me pay close attention to detail. I would also like to express my gratitude to my family for their encouragement and support. The sources listed in this paper provided the data for this analysis. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AVC	Automatic Voltage Control
DFIG	Doubly Fed Induction Generator
DNOs	Distribution Network Operators
DQN	Deep Q-Network
GA	Genetic Algorithm
GCC	Grid Code Compliance
GSC	Grid-Side Converter
IEA	International Energy Agency
IGBTs	Insulated Gate Bipolar Transistors
IRLC	Intelligent Reinforcement Learning Controller
IRP	Integrated Resource Plan
MDP	Markov Decision Process
MMFs	Magnetomotive Forces
MPC	Model Predictive Control
OLTC	On-Load Tap Changer
OPF	Optimal Power Flow
PCC	Point of Common Coupling
PF	Power Factor
PSO	Particle Swarm Optimisation
PWM	Pulse-Width Modulation
RBC	Rule-Based Control
RL	Reinforcement Learning
RPPs	Renewable Power Plants
RSC	Rotor-Side Converter
SA	Simulated Annealing
VDI	Voltage Deviation Index
VOC	Vector-Oriented Control
XAI	Explainable AI

References

  1. Petrusev, A.; Putratama, M.A.; Rigo-Mariani, R.; Debusschere, V.; Reignier, P.; Hadjsaid, N. Reinforcement learning for robust voltage control in distribution grids under uncertainties. Sustain. Energy Grids Netw. 2023, 33, 100959. [Google Scholar] [CrossRef]
  2. Nematshahi, S.; Shi, D.; Wang, F.; Yan, B.; Nair, A. Deep reinforcement learning based voltage control revisited. IET Gener. Transm. Distrib. 2023, 17, 4826–4835. [Google Scholar] [CrossRef]
  3. Jacob, R.A.; Paul, S.; Chowdhury, S.; Gel, Y.R.; Zhang, J. Real-time outage management in active distribution networks using reinforcement learning over graphs. Nat. Commun. 2024, 15, 4766. [Google Scholar] [CrossRef]
  4. Nwagu, C.N.; Ujah, C.O.; Kallon, D.V.V.; Aigbodion, V.S. Integrating solar and wind energy into the electricity grid for improved power accessibility. Unconv. Resour. 2025, 5, 100129. [Google Scholar] [CrossRef]
  5. Duranay, Z.B.; Güldemir, H.; Coşkun, B. The Role of Wind Turbine Siting in Achieving Sustainable Energy Goals. Processes 2024, 12, 2900. [Google Scholar] [CrossRef]
  6. Ullah, F.; Zhang, X.; Khan, M.; Mastoi, M.S.; Munir, H.M.; Flah, A.; Said, Y. A comprehensive review of wind power integration and energy storage technologies for modern grid frequency regulation. Heliyon 2024, 10, e30466. [Google Scholar] [CrossRef]
  7. Li, S. Reactive power limit of wind farm with doubly-fed induction generators and its asymmetric P-Q dependence. Int. J. Electr. Power Energy Syst. 2025, 169, 110819. [Google Scholar] [CrossRef]
  8. Kebede, M.G.; Tuka, M.B. Power Control of Wind Energy Conversion System with Doubly Fed Induction Generator. J. Energy 2022, 2022, 8679053. [Google Scholar] [CrossRef]
  9. Alhato, M.M.; Ibrahim, M.N.; Rezk, H.; Bouallègue, S. An Enhanced DC-Link Voltage Response for Wind-Driven Doubly Fed Induction Generator Using Adaptive Fuzzy Extended State Observer and Sliding Mode Control. Mathematics 2021, 9, 963. [Google Scholar] [CrossRef]
  10. Izanlo, A.; Abdollahi, S.E.; Gholamian, S.A. A New Method for Design and Optimization of DFIG for Wind Power Applications. Electr. Power Compon. Syst. 2020, 48, 1523–1536. [Google Scholar] [CrossRef]
  11. Mustafa, F.E.; Ahmed, I.; Basit, A.; Alqahtani, M.; Khalid, M. An adaptive metaheuristic optimization approach for Tennessee Eastman process for an industrial fault tolerant control system. PLoS ONE 2024, 19, e0296471. [Google Scholar] [CrossRef]
  12. Xiang, X.; Diao, R.; Bernadin, S.; Foo, S.Y.; Sun, F.; Ogundana, A.S. An Intelligent Parameter Identification Method of DFIG Systems Using Hybrid Particle Swarm Optimization and Reinforcement Learning. IEEE Access 2024, 12, 44080–44090. [Google Scholar] [CrossRef]
  13. Manjunath, T.G.; Kusagur, A. Analysis of Different Meta Heuristics Method in Intelligent Fault Detection of Multilevel Inverter with Photovoltaic Power Generation Source. Int. J. Power Electron. Drive Syst. 2018, 9, 1214. [Google Scholar] [CrossRef]
  14. Huang, W.; Hu, B.; Shao, C.; Li, W.; Wang, X.; Xie, K.; Chung, C.Y. Power System Reliability Evaluation Based on Sequential Monte Carlo Simulation Considering Multiple Failure Modes of Components. J. Mod. Power Syst. Clean. Energy 2024, 13, 202–214. [Google Scholar] [CrossRef]
  15. Yang, K.; Cho, K. Simulated Annealing Algorithm for Wind Farm Layout Optimization: A Benchmark Study. Energies 2019, 12, 4403. [Google Scholar] [CrossRef]
  16. Guediri, A.; Touil, S. Optimization Using a Genetic Algorithm Based on DFIG Power Supply for the Electrical Grid. Int. J. Eng. 2022, 35, 121–129. [Google Scholar] [CrossRef]
  17. Barrios Aguilar, M.E.; Coury, D.V.; Reginatto, R.; Monaro, R.M. Multi-objective PSO applied to PI control of DFIG wind turbine under electrical fault conditions. Electr. Power Syst. Res. 2020, 180, 106081. [Google Scholar] [CrossRef]
  18. Puchalapalli, S.; Singh, B. A Novel Control Scheme for Wind Turbine Driven DFIG Interfaced to Utility Grid. IEEE Trans. Ind. Appl. 2020, 56, 2925–2937. [Google Scholar] [CrossRef]
  19. Gholami, A.; Sahab, A.; Tavakoli, A.; Alizadeh, B. DFIG-Based Wind Energy System Robust Optimal Control by Using of Novel LMI-Based Adaptive MPC. IETE J. Res. 2023, 69, 3467–3476. [Google Scholar] [CrossRef]
  20. Karagiannopoulos, S.; Aristidou, P.; Hug, G.; Botterud, A. Decentralized control in active distribution grids via supervised and reinforcement learning. Energy AI 2024, 16, 100342. [Google Scholar] [CrossRef]
  21. Huang, J.; Zhang, H.; Tian, D.; Zhang, Z.; Yu, C.; Hancke, G.P. Multi-agent deep reinforcement learning with enhanced collaboration for distribution network voltage control. Eng. Appl. Artif. Intell. 2024, 134, 108677. [Google Scholar] [CrossRef]
  22. Aoxiang, M.A.; Cao, J.; Cortes, P.R. Graph Neural Network Based Deep Reinforcement Learning for Volt-Var Control in Distribution Grids. In Proceedings of the 2024 IEEE 15th International Symposium on Power Electronics for Distributed Generation Systems (PEDG), Luxembourg, 23–26 June 2024; IEEE: New York, NY, USA, 2024; pp. 1–5. [Google Scholar] [CrossRef]
  23. Battu, N.R.; Senroy, N.; Abhyankar, A.R. Effect of Capacitors on Frequency of OLTC Tap Changes in the Presence of Wind Generation. Int. J. Emerg. Electr. Power Syst. 2015, 16, 385–395. [Google Scholar] [CrossRef]
  24. Otchere, I.K.; Ampofo, D.; Dantuo, J.; Frimpong, E.A. The Influence of Voltage Regulation Devices Based Distributed Generation in Distribution Networks. Electr. Electron. Eng. 2023, 2023, 1–5. [Google Scholar]
  25. Petinrin, J.O.; Petinrin, O.O.; Petinrin, M.O.; Ogedengbe, I.I. Coordinated Framework for Voltage Control in Distribution System with Wind Energy. IAENG Int. J. Comput. Sci. 2025, 52, 3159–3166. [Google Scholar]
  26. Chen, Y.; Liu, Y.; Zhao, J.; Qiu, G.; Yin, H.; Li, Z. Physical-assisted multi-agent graph reinforcement learning enabled fast voltage regulation for PV-rich active distribution network. Appl. Energy 2023, 351, 121743. [Google Scholar] [CrossRef]
  27. Yu, P.; Zhang, H.; Song, Y.; Wang, Z.; Dong, H.; Ji, L. Safe reinforcement learning for power system control: A review. Renew. Sustain. Energy Rev. 2025, 223, 116022. [Google Scholar] [CrossRef]
  28. Hossain, R.; Gautam, M.; Olowolaju, J.; Livani, H.; Benidris, M. Multi-agent voltage control in distribution systems using GAN-DRL-based approach. Electr. Power Syst. Res. 2024, 234, 110528. [Google Scholar] [CrossRef]
  29. Kilembe, A.B.; Hamilton, R.I.; Papadopoulos, P.N. Explainable Machine Learning: A SHAP value-based approach to locational frequency stability. Int. J. Electr. Power Energy Syst. 2025, 170, 110885. [Google Scholar] [CrossRef]
  30. Hassouna, M.; Holzhüter, C.; Lytaev, P.; Thomas, J.; Sick, B.; Scholz, C. Graph Reinforcement Learning for Power Grids: A Comprehensive Survey. arXiv 2024, arXiv:2407.04522. [Google Scholar]
  31. Zhang, T.; Yu, L.; Yue, D.; Dou, C.; Xie, X.; Shi, T. Explainable deep reinforcement learning approach for smart voltage regulation of high renewable-penetrated distribution networks considering hydrogen-storage system. Electr. Power Syst. Res. 2025, 246, 111654. [Google Scholar] [CrossRef]
  32. Cai, X.; Yang, Z.; Liu, P.; Lian, X.; Li, Z.; Zhu, G.; Geng, H. Voltage Control Strategy for Large-Scale Wind Farm with Rapid Wind Speed Fluctuation. Energies 2024, 17, 2220. [Google Scholar] [CrossRef]
  33. Shadi, M.R.; Mirshekali, H.; Shaker, H.R. Explainable artificial intelligence for energy systems maintenance: A review on concepts, current techniques, challenges, and prospects. Renew. Sustain. Energy Rev. 2025, 216, 115668. [Google Scholar] [CrossRef]
  34. Jerström, N.; Selin, T. Decentralized Voltage Regulation in Low-Voltage Grids Using Projection-Constrained Multi-Agent Deep Reinforcement Learning: A Case Study in E.ON’s Swedish Distribution Grid. Master’s Thesis, Uppsala University, Uppsala, Sweden, 2025. Available online: https://uu.diva-portal.org/smash/get/diva2%3A1989752/FULLTEXT02.pdf (accessed on 15 February 2025).
  35. Yeh, C.; Yu, J.; Shi, Y.; Wierman, A. Online Learning for Robust Voltage Control Under Uncertain Grid Topology. IEEE Trans. Smart Grid 2024, 15, 4754–4764. [Google Scholar] [CrossRef]
  36. Su, T.; Wu, T.; Zhao, J.; Scaglione, A.; Xie, L. A Review of Safe Reinforcement Learning Methods for Modern Power Systems. Proc. IEEE 2025, 113, 213–255. [Google Scholar] [CrossRef]
  37. Jung, Y.; Han, C.; Lee, D.; Song, S.; Jang, G. Adaptive Volt–Var Control in Smart PV Inverter for Mitigating Voltage Unbalance at PCC Using Multiagent Deep Reinforcement Learning. Appl. Sci. 2021, 11, 8979. [Google Scholar] [CrossRef]
  38. Tamaarat, A. Active and Reactive Power Control for DFIG Using PI, Fuzzy Logic and Self-Tuning PI Fuzzy Controllers. Adv. Model. Anal. C 2019, 74, 95–102. [Google Scholar] [CrossRef]
  39. Osama abed el-Raouf, M.; A. Mageed, S.A.; Salama, M.M.; Mosaad, M.I.; AbdelHadi, H.A. Performance Enhancement of Grid-Connected Renewable Energy Systems Using UPFC. Energies 2023, 16, 4362. [Google Scholar] [CrossRef]
  40. Aljohani, M. Enhancing the Performance of Grid-Tied Renewable Power Systems Using an Optimized PI Controller for STATCOM. Int. J. Robot. Control Syst. 2025, 5, 748–762. [Google Scholar] [CrossRef]
  41. Amar, A.; Yusupov, Z. Real-Time Capable MPC-Based Energy Management of Hybrid Microgrid. Processes 2025, 13, 2883. [Google Scholar] [CrossRef]
  42. Behara, R.K.; Saha, A.K. Neural Network Predictive Control for Improved Reliability of Grid-Tied DFIG-Based Wind Energy System under the Three-Phase Fault Condition. Energies 2023, 16, 4881. [Google Scholar] [CrossRef]
  43. Murray, W.; Adonis, M.; Raji, A. Voltage control in future electrical distribution networks. Renew. Sustain. Energy Rev. 2021, 146, 111100. [Google Scholar] [CrossRef]
  44. Lee, D.; Han, C.; Jang, G. Stochastic Analysis-Based Volt–Var Curve of Smart Inverters for Combined Voltage Regulation in Distribution Networks. Energies 2021, 14, 2785. [Google Scholar] [CrossRef]
  45. Akinwola, A.B.; Alkuhayli, A. Hybrid PSO–Reinforcement Learning-Based Adaptive Virtual Inertia Control for Frequency Stability in Multi-Microgrid PV Systems. Electronics 2025, 14, 3349. [Google Scholar] [CrossRef]
  46. Cestero, J.; Delle Femine, C.; Muro, K.S.; Quartulli, M.; Restelli, M. Optimizing energy management of smart grid using reinforcement learning aided by surrogate models built using physics-informed neural networks. Appl. Energy 2025, 401, 126750. [Google Scholar] [CrossRef]
  47. Gao, X.; Zhang, J.; Sun, H.; Liang, Y.; Wei, L.; Yan, C.; Xie, Y. A Review of Voltage Control Studies on Low Voltage Distribution Networks Containing High Penetration Distributed Photovoltaics. Energies 2024, 17, 3058. [Google Scholar] [CrossRef]
  48. Yu, W.; Huawei, G.; Xiaohai, Z.; Quansheng, C.; Peng, Z. DQN-Based Voltage Regulation for Active Distribution Network with Distributed Energy Storage System. In Proceedings of the 2020 International Top-Level Forum on Engineering Science and Technology Development Strategy and The 5th PURPLE MOUNTAIN FORUM (PMF2020), Nanjing, China, 15–16 August 2020; Springer: Singapore, 2021; pp. 749–762. [Google Scholar] [CrossRef]
  49. Kumar Behara, R.; Kumar Saha, A. Deep Q-Network Reinforcement Learning-Based Rotor Side Control System of a Grid Integrated DFIG Wind Energy System Under Variable Wind Speed Conditions. IEEE Access 2024, 12, 184179–184205. [Google Scholar] [CrossRef]
  50. Fournier, J. Modeling, Control and Experimental Validation of a DFIG-Based Wind Turbine Test Bench. Master’s Thesis, Universitat Politecnica de Catalunya, Barcelona, Catalonia Institute for Energy Research (IREC), Barcelona, Spain, 2013. Available online: http://upcommons.upc.edu//handle/2099.1/18983 (accessed on 15 February 2025).
  51. Behara, R.K.; Saha, A.K. Comparative Performance Analysis of Deep Learning-Based Diagnostic and Predictive Models in Grid-Integrated Doubly Fed Induction Generator Wind Turbines. Energies 2025, 18, 4725. [Google Scholar] [CrossRef]
  52. Hamon, C.; Elkington, K.; Ghandhari, M. Doubly-fed induction generator modeling and control in DigSilent PowerFactory. In Proceedings of the 2010 International Conference on Power System Technology, Zhejiang, China, 24–28 October 2010. [Google Scholar] [CrossRef]
  53. Badreldien, M.; Usama, R.; El-wakeel, A.; Abdelaziz, A.Y. Modeling, Analysis and Control of Doubly Fed Induction Generators for Wind Turbines. In Proceedings of the ICEENG Conference ICEENG-9, Cairo, Egypt, 27–29 May 2014; pp. 1–18. [Google Scholar] [CrossRef]
  54. Mazari, S. Control Design and Analysis of Doubly-Fed Induction Generator in Wind Power Application. Master’s Thesis, The University of Alabama, Tuscaloosa, AL, USA, 2009. [Google Scholar]
  55. Junyent-ferré, A.; Gomis-bellmunt, O.; Sumper, A.; Sala, M. Simulation Modelling Practice and Theory Modeling and control of the doubly fed induction generator wind turbine. Simul. Model. Pract. Theory 2010, 18, 1365–1381. [Google Scholar] [CrossRef]
  56. Arnaltes, S.; Rodriguez-Amenedo, J.L.; Montilla-DJesus, R.-A. Control of Variable Speed Wind Turbines with Doubly Fed Asynchronous Generators for Stand-Alone Applications. Energies 2018, 11, 26. [Google Scholar] [CrossRef]
  57. Hamane, B.; Doumbia, M.L.; Bouhamida, A.M.; Benghanem, M. Direct active and reactive power control of DFIG based WECS using PI and sliding mode controllers. In Proceedings of the IECON 2014—40th Annual Conference of the IEEE Industrial Electronics Society, Dallas, TX, USA, 29 October–1 November 2014; pp. 2050–2055. [Google Scholar] [CrossRef]
  58. Alhato, M.M.; Bouallègue, S. Direct Power Control Optimization for Doubly Fed Induction Generator Based Wind Turbine Systems. Math. Comput. Appl. 2019, 24, 77. [Google Scholar] [CrossRef]
  59. Ntuli, K.; Kabeya, M.; Sharma, G. A Comparative Study of Wind Energy Conversion System incorporating the Doubly Fed Induction Generator and the Permanent Magnet Synchronous Generator. Int. Conf. Intell. Innov. Comput. Appl. 2022, 2022, 238–244. [Google Scholar] [CrossRef]
  60. Jiao, T.; Li, Z.; Zhu, Z. Advanced Control Solutions for DFIG Wind Turbines: From Vector to Predictive Control. J. Electr. Syst. 2024, 20, 2289–2322. [Google Scholar] [CrossRef]
  61. Elenga Baningobera, B.; Oleinikova, I.; Uhlen, K.; Pokhrel, B.R. Challenges and solutions in low-inertia power systems with high wind penetration. IET Gener. Transm. Distrib. 2024, 18, 4221–4244. [Google Scholar] [CrossRef]
  62. Ghatak, U.; Mukherjee, V.; Abdelaziz, A.Y.; Abdel Aleem, S.H.E.; Abdel Mageed, H.M. Time-Efficient Load Flow Technique for Radial Distribution Systems with Voltage-Dependent Loads. Int. J. Energy Convers. 2018, 6, 196. [Google Scholar] [CrossRef]
  63. Zhang, Z.; da Silva, F.F.; Guo, Y.; Bak, C.L.; Chen, Z. Double-layer stochastic model predictive voltage control in active distribution networks with high penetration of renewables. Appl. Energy 2021, 302, 117530. [Google Scholar] [CrossRef]
  64. Gao, C. Voltage Control in Distribution Networks Using On-Load Tap Changer Transformers. Ph.D. Thesis, University of Bath, Bath, UK, 2013; p. 226. [Google Scholar]
  65. Shahsavari, A.; Farajollahi, M.; Stewart, E.; Von Meier, A.; Alvarez, L.; Cortez, E.; Mohsenian-Rad, H. A data-driven analysis of capacitor bank operation at a distribution feeder using micro-PMU data. In Proceedings of the 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 23–26 April 2017. [Google Scholar] [CrossRef]
  66. Kosuru, R.; Liu, S.; Shi, W. Deep Reinforcement Learning for Stability Enhancement of a Variable Wind Speed DFIG System. Actuators 2022, 11, 203. [Google Scholar] [CrossRef]
  67. Sarkar, M.; Sørensen, M.; Sørensen, P.E.; Hansen, A.D. Impact of different load types on voltage stability of power system considering wind power support. In Proceedings of the 18th Wind Integration Workshop, Dublin, Ireland, 16–18 October 2019; p. 5. Available online: https://www.iea-isgan.org/wp-content/uploads/2020/02/WIW19_258_posterpaper_Sakar_Moumita.pdf (accessed on 15 September 2025).
  68. Li, S.; Zhang, F.; Wu, W.; Hu, W. Data-driven Stochastic Model Predictive Control Method for Voltage and Power Regulation in Active Distribution Networks. In Proceedings of the 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA, 21–25 July 2024; IEEE: New York, NY, USA, 2024; pp. 1–5. [Google Scholar] [CrossRef]
  69. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar] [CrossRef]
  70. Morales, E.F.; Zaragoza, J.H. An introduction to reinforcement learning. In Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions; IGI Global: Hershey, PA, USA, 2011; pp. 63–80. [Google Scholar] [CrossRef]
  71. Zhang, Y.; Wang, X.; Wang, J.; Zhang, Y. Deep Reinforcement Learning Based Volt-VAR Optimization in Smart Distribution Systems. IEEE Trans. Smart Grid 2021, 12, 361–371. [Google Scholar] [CrossRef]
  72. Toubeau, J.-F.; Bakhshideh Zad, B.; Hupez, M.; De Grève, Z.; Vallée, F. Deep Reinforcement Learning-Based Voltage Control to Deal with Model Uncertainties in Distribution Networks. Energies 2020, 13, 3928. [Google Scholar] [CrossRef]
  73. Shi, D.; Zhang, Q.; Hong, M.; Wang, F.; Maslennikov, S.; Luo, X.; Chen, Y. Implementing Deep Reinforcement Learning-Based Grid Voltage Control in Real-World Power Systems: Challenges and Insights. In Proceedings of the 2024 IEEE PES Innovative Smart Grid Technologies Europe (ISGT EUROPE), Dubrovnik, Croatia, 14–17 October 2024. [Google Scholar] [CrossRef]
  74. Vallam Kondu, S.C.; Ravikumar, G. Adaptive Deep Reinforcement Learning for VVC in High DER-Integrated Distribution Grids. In Proceedings of the 2025 IEEE Texas Power and Energy Conference (TPEC), College Station, TX, USA, 10–11 February 2025; IEEE: New York, NY, USA, 2025; pp. 1–6. [Google Scholar] [CrossRef]
  75. Ginzburg-Ganz, E.; Segev, I.; Balabanov, A.; Segev, E.; Kaully Naveh, S.; Machlev, R.; Belikov, J.; Katzir, L.; Keren, S.; Levron, Y. Reinforcement Learning Model-Based and Model-Free Paradigms for Optimal Control Problems in Power Systems: Comprehensive Review and Future Directions. Energies 2024, 17, 5307. [Google Scholar] [CrossRef]
  76. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  77. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
Figure 1. Induction machine winding diagram [50].
Figure 2. Circuit signifying the DFIG’s d-q frame of reference [52].
Figure 3. GSC Modelling.
Figure 4. Methodology Overview.
Figure 5. ZIP load model behaviour.
Figure 6. Sensitivity of voltage stability to ZIP constant power fraction.
Figure 7. Sensitivity of total losses to ZIP constant power fraction.
Figure 8. RL-based voltage control framework.
Figure 9. The voltage profile under wind variability.
Figure 10. The total active power loss comparison across controllers.
Figure 11. The reactive power profile along the feeder.
Figure 12. The convergence behaviour of different controllers.
Figure 13. Sensitivity of voltage regulation to wind speed variability.
Figure 14. Tap and capacitor switching frequency comparison.
Figure 15. The temporal voltage evolution at Bus 18.
Figure 16. Voltage stability index comparison across control methods.
Table 1. Summary of research gaps in existing voltage regulation studies.
| Research Gap | Description | Limitations of Existing Approaches | Recent References (2023–2025) |
|---|---|---|---|
| Static and Model-Dependent Control | Traditional OPF, Volt/VAR control, and metaheuristics rely on fixed models and steady-state analysis. | Lacks adaptability to stochastic wind and load variations; requires accurate system models and frequent re-optimisation. | [6,16,17,18,19,20] |
| Forecasting-Oriented ML/DL Methods | Most ML/DL research focuses on renewable generation forecasting rather than real-time control. | Supervised models cannot autonomously adjust to unseen conditions, as they are limited in their interaction with voltage/reactive power dynamics. | [7,8,9,10,11] |
| Limited RL Applications in Distribution Systems | Researchers primarily apply the RL model to economic dispatch and microgrid energy management. | Few studies explore wind-specific voltage control; reward design often ignores power losses and switching costs. | [12,21,22,23,24] |
| Lack of Explainability and Safety | RL models often operate as black boxes, lacking interpretability and safety guarantees. | Absence of Explainable AI (XAI) and constraint handling reduces operator trust and hinders deployment in live grids. | [25,26,27] |
| Poor Robustness and Scalability Validation | Few works evaluate model robustness under uncertainty or changes in network configuration. | Performance may degrade under varying topologies, measurement noise, or unseen disturbances. | [27,28] |
Table 2. Summary of the key comparative insights across all six control strategies.
| Control Method | Key Characteristics | Advantages | Limitations | Representative Studies |
|---|---|---|---|---|
| PI Control (Linear) | Classical proportional–integral feedback loop controlling OLTC voltage and reactive power compensation. | Simple structure, fast steady-state response, low computational demand. | Requires precise tuning; poor adaptability under nonlinear or stochastic conditions; prone to overshoot and oscillation during wind fluctuations. | [37,38] |
| MPC (Nonlinear) | Predictive controller minimising voltage deviation and switching penalties using model-based optimisation. | Anticipates future states; handles constraints effectively; suitable for short-term forecasting. | Relies on accurate linearised models; high computational complexity; reduced performance under rapid stochastic variation. | [39,40,41,42] |
| Rule-Based Control (RBC) | Fixed threshold-based heuristic logic for OLTC and capacitor bank operation. | Robust, easy to implement, and requires no system model. | Lacks adaptivity; causes delayed or oscillatory actions under fast load or wind changes; inefficient switching. | [43,44] |
| PSO-Based Control | Metaheuristic optimisation minimising voltage deviation and power losses via swarm intelligence. | Finds near-global optima; good static optimisation performance. | Requires re-execution at each operating condition; non-adaptive; unsuitable for real-time dynamic control. | [13,15,20] |
| Hybrid RL–PSO Control | Combines RL's adaptability with PSO's optimisation efficiency for improved convergence. | Adaptive learning and improved accuracy, outperforming PSO alone in non-stationary environments. | Increased computational cost, limited scalability in real-time applications, and slower inference time. | [45,46] |
| Proposed DQN Control | Deep Q-Network with experience replay and target networks for model-free learning and adaptive control. | Fully adaptive; minimises voltage deviation and losses; fast convergence; real-time capability; scalable. | Requires extensive training; performance depends on the design of the reward function. | [47,48,49] |
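The last row of Table 2 names the three core DQN ingredients: a Q-function approximator, an experience-replay buffer, and a periodically synchronised target network. The sketch below illustrates how these fit together for a discrete action set such as tap up/down/hold and capacitor on/off. A linear Q-function stands in for the deep network so the example stays dependency-free; all class, parameter, and method names here are illustrative, not the authors' implementation.

```python
import random
from collections import deque

class LinearDQN:
    """Minimal DQN skeleton: epsilon-greedy action selection,
    experience replay, and a periodically hard-synced target network.
    A linear Q-function replaces the deep network for brevity."""

    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.95,
                 buffer_size=1000, batch_size=16, sync_every=50):
        self.n_actions = n_actions
        self.lr, self.gamma = lr, gamma
        # Q(s, a) = w[a] . s  -- one weight vector per discrete action
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.w_target = [row[:] for row in self.w]   # frozen target copy
        self.replay = deque(maxlen=buffer_size)      # experience replay
        self.batch_size, self.sync_every = batch_size, sync_every
        self.steps = 0

    def q(self, w, s):
        # Q-values for every action under weight set w
        return [sum(wi * si for wi, si in zip(row, s)) for row in w]

    def act(self, s, eps=0.1):
        # epsilon-greedy over discrete control actions
        if random.random() < eps:
            return random.randrange(self.n_actions)
        qs = self.q(self.w, s)
        return qs.index(max(qs))

    def store(self, s, a, r, s2, done):
        self.replay.append((s, a, r, s2, done))

    def train_step(self):
        if len(self.replay) < self.batch_size:
            return
        for s, a, r, s2, done in random.sample(self.replay, self.batch_size):
            # TD target uses the frozen target weights, not the live ones
            target = r if done else r + self.gamma * max(self.q(self.w_target, s2))
            td_err = target - self.q(self.w, s)[a]
            self.w[a] = [wi + self.lr * td_err * si
                         for wi, si in zip(self.w[a], s)]
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.w_target = [row[:] for row in self.w]  # periodic hard sync
```

Decoupling the TD target from the live weights (the target network) and breaking temporal correlations in the data (replay sampling) are the two stabilisation tricks that distinguish DQN from plain Q-learning with function approximation.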
Table 3. System specifications and test scenarios.
| Category | Parameter | Value/Setting | Source |
|---|---|---|---|
| Bases | S_base, V_base | 100 MVA, 12.66 kV | Standard 33-bus |
| Network equations | Power balance, BFS | Equations (37) and (38) | |
| Load model | ZIP (per bus) | Residential (0.20, 0.30, 0.50); Industrial (0.40, 0.20, 0.40); Commercial (0.10, 0.20, 0.70) | |
| Wind farm | Location, rating | Bus 18; 0.2–1.2 MW over 4–15 m/s wind | |
| DFIG control | RSC/GSC roles | RSC: P/Q control; GSC: DC-link | |
| OLTC | Step, range, location | ±1.25% per step; n ∈ [−16, 16]; Bus 1 | |
| Capacitors | Locations, ratings | Buses 12/25/30; ±300 kVAr each | |
| Solver/control | Electrical step; decision step | 1 ms; 1 s | This work |
| Test A (Nominal) | Wind & load | Mean wind, base loads | |
| Test B (High-wind) | Wind | Upper-quartile wind; check overvoltage | |
| Test C (Heavy load) | Load | +20% peak; undervoltage stress | |
| Test D (Variability) | Wind & load | Fast (±10%) fluctuations; noise injected | |
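The ZIP coefficient triples in Table 3 can be read through the standard ZIP load equation, P(V) = P0 · (aZ·(V/V0)² + aI·(V/V0) + aP), with the reactive power treated analogously. The sketch below assumes the triples are ordered (aZ, aI, aP), i.e. constant-impedance, constant-current, constant-power fractions summing to one; that ordering is an assumption, not confirmed by the table itself.

```python
def zip_power(p0, v_pu, coeffs):
    """ZIP load model: P(V) = P0 * (aZ * v^2 + aI * v + aP),
    with v the per-unit voltage V/V0.
    `coeffs` assumed ordered (aZ, aI, aP) and summing to 1."""
    a_z, a_i, a_p = coeffs
    return p0 * (a_z * v_pu ** 2 + a_i * v_pu + a_p)

# Residential triple from Table 3, read as (aZ, aI, aP)
residential = (0.20, 0.30, 0.50)

# At nominal voltage the ZIP model returns the base power
print(zip_power(100.0, 1.0, residential))          # 100.0 kW

# At 0.95 p.u. the Z and I components relieve some demand
print(round(zip_power(100.0, 0.95, residential), 2))  # 96.55 kW
```

The larger the constant-power fraction aP, the less the demand relaxes as voltage sags, which is why Figures 6 and 7 study sensitivity to that fraction specifically.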
Table 4. Comparative analysis table summarising all controllers’ performance.
| Performance Metric | Linear Control (PI) | Nonlinear Control (MPC) | PSO Control | Hybrid RL–PSO Control | Proposed DQN Control | Remarks |
|---|---|---|---|---|---|---|
| Mean Voltage Deviation Index (VDI) | 0.045 p.u. | 0.032 p.u. | 0.028 p.u. | 0.021 p.u. | 0.015 p.u. | DQN maintains the tightest voltage regulation across all buses. |
| Minimum Bus Voltage (p.u.) | 0.917 | 0.941 | 0.951 | 0.963 | 0.981 | DQN achieves voltages closest to the nominal value of 1.0 p.u. |
| Total Active Power Loss (kW) | 230 | 215 | 185 | 165 | 145 | DQN yields a ~35–40% reduction in losses compared to PI. |
| Reactive Power Range (kVAr) | −175 to −140 | −165 to −145 | −155 to −135 | −145 to −125 | −135 to −115 | DQN delivers superior VAR support along the feeder. |
| Tap Operation Frequency (counts/day) | 48 | 38 | 30 | 25 | 18 | DQN minimises OLTC wear by optimal tap scheduling. |
| Capacitor Operation Frequency (counts/day) | 32 | 26 | 20 | 16 | 10 | Fewer capacitor switchings with DQN, enhancing device longevity. |
| Voltage Sensitivity to Wind Variation (∆VDI per 10%) | 0.010 | 0.008 | 0.006 | 0.005 | 0.004 | DQN is most robust to wind fluctuations. |
| Reward Convergence Stability | Poor/Flat | Moderate | Good | Very Good | Excellent | DQN achieves smooth, monotonic convergence by ~600 episodes. |
| Learning Convergence Speed (episodes) | N/A | N/A | 400 | 300 | 600 (stable) | DQN takes longer but achieves a stable high reward. |
| Inference Latency (ms) | 0.2 | 8.0 | 3.0 | 4.0 | 1.2 | DQN operates near real-time; the optimisation solver limits MPC. |
| Training Time (hours) | N/A | N/A | 2.0 | 6.0 | 10.0 | One-time offline cost for DQN training. |
| Overall Performance Rank | 6th | 4th | 3rd | 2nd | 1st | DQN consistently outperforms others across all metrics. |
“N/A” in Table 4 marks metrics that are not applicable to the corresponding controllers. Specifically, the PI and MPC methods involve no iterative learning or episodic training, so learning convergence speed and training time are undefined for these model-based controllers.
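The first row of Table 4 reports the mean Voltage Deviation Index. One common definition, the mean absolute per-unit deviation of all bus voltages from the 1.0 p.u. reference, is sketched below; the paper's exact formula may differ, and the sample profile is purely illustrative.

```python
def voltage_deviation_index(voltages_pu, v_ref=1.0):
    """Mean absolute per-unit deviation from the reference voltage.
    One common VDI definition; assumed here, not taken from the paper."""
    return sum(abs(v - v_ref) for v in voltages_pu) / len(voltages_pu)

# Illustrative four-bus voltage profile (p.u.)
profile = [0.998, 0.985, 1.012, 0.981]
print(round(voltage_deviation_index(profile), 4))   # 0.012
```

Under this reading, the DQN column's 0.015 p.u. VDI means bus voltages deviate from nominal by 1.5% on average, versus 4.5% for PI control.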
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
