1. Introduction
In recent years, Renewable Energy Sources (RESs) have gained worldwide attention due to environmental concerns, leading to significant progress in the development of Distributed Energy Resources (DERs) in power grids. Among these, photovoltaic (PV) power generation systems, a representative of RESs, are experiencing a period of rapid growth [
1]. PV systems are installed not only as relatively small-scale sources, such as rooftop PV connected to distribution grids, but also as large-scale plants in sub-transmission grids to reduce costs and improve generation efficiency [
2,
3]. Massive PV interconnection leads to voltage rises due to reverse power flow and fast voltage fluctuations caused by changing weather conditions, complicating the maintenance of proper voltage ranges [
4]. Furthermore, increased reverse power flow can cause not only voltage rises, but also voltage drops in the grid [
5]. Consequently, the voltage distribution becomes significantly more complex with the introduction of PV systems.
Conventional voltage control in sub-transmission grids typically relies on devices such as Load Ratio Control Transformers (LRTs) and Static Capacitors (SCs). However, the mechanical operation of these devices slows down the control cycle, and their frequent usage further reduces their operational lifespan. To address these limitations, the reactive power capabilities of Power Conditioning Systems (PCSs) for PV systems have emerged as a promising solution to mitigate voltage fluctuations [
6,
7]. As a type of power electronics device, PCSs offer the advantage of a very fast response time. Consequently, there has been significant research on reactive power control methods for PCSs. In [
8], constant power factor control of the PCS maintained the grid voltages within a proper range in an example grid based on the actual distribution grid in Japan. However, the constant power factor control method cannot handle the voltage fluctuations caused by weather conditions. In [
9], each PCS mitigated voltage fluctuations through reactive power compensation according to the volt-var curve. By updating the control parameters of the volt-var curve given to the PCS in real time, the PCSs are able to adapt to disturbances such as weather conditions and voltage changes at the substation. In [
10], a global standard is also defined for the volt-var curve, which is a major control method. However, these methods provide local control rather than optimal control of the whole grid, and they require coordination with LRTs and other control devices that have different control cycles. Coordinated control of inverter-based devices, such as PCSs, and mechanical devices, such as LRTs, involves a multi-timescale optimization challenge due to their varying control cycles and characteristics [
11,
12]. Methods based on Optimal Power Flow (OPF) can maintain proper voltage levels across the grid by improving coordination between multi-timescale devices, including long-term On-Load Tap Changers (OLTCs) or Capacitor Banks (CBs) and short-term inverter-based devices. However, the effectiveness of these methods depends heavily on the accuracy of the grid model and precise forecasts of RES output and load demand, both of which are difficult to achieve [
13]. Moreover, recalculating and determining an optimal solution in real time to address rapid PV output fluctuations is a significant challenge, as PV output varies much faster than the time required for optimization computations [
13].
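The volt-var control discussed above maps a locally measured voltage to a reactive power command through a piecewise-linear curve. The following is a minimal sketch, assuming a symmetric curve with a deadband around 1.0 p.u.; the breakpoints `v1` to `v4`, the limit `q_max`, and the sign convention are illustrative assumptions and are not taken from the cited works.

```python
def volt_var(v_pu, v1=0.95, v2=0.99, v3=1.01, v4=1.05, q_max=0.44):
    """Return a reactive power command (p.u. of rated capacity) for a voltage in p.u.

    Positive values inject reactive power (raising voltage); negative values absorb it.
    All breakpoints and q_max are hypothetical illustration values.
    """
    if v_pu <= v1:
        return q_max
    if v_pu < v2:
        # Linear ramp from full injection at v1 down to zero at v2.
        return q_max * (v2 - v_pu) / (v2 - v1)
    if v_pu <= v3:
        return 0.0  # deadband: no reactive power exchange
    if v_pu < v4:
        # Linear ramp from zero at v3 down to full absorption at v4.
        return -q_max * (v_pu - v3) / (v4 - v3)
    return -q_max


# Example: a voltage rise caused by reverse power flow triggers absorption.
q_command = volt_var(1.03)
```

Updating the curve parameters in real time, as described for the adaptive scheme, would correspond to passing different breakpoint values on each call.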
Recently, the application of Deep Reinforcement Learning (DRL) to voltage control has gained significant attention as a potential solution to this problem [
14]. DRL-based voltage control methods develop optimal control policies by interacting with a grid model during offline training. Once deployed online, the trained DRL agents act as controllers for voltage control devices, making real-time decisions based on observed conditions. Unlike model-based methods, such as those relying on OPF, DRL-based control is a model-free approach that can operate effectively without requiring an accurate grid model during operation [
15]. In [
16], an Automatic Voltage Control (AVC) method for controlling generator terminal voltage using DRL demonstrated better performance with the Deep Deterministic Policy Gradient (DDPG) algorithm compared to the Deep Q-Network (DQN) algorithm. This highlights that actor–critic DRL methods, such as DDPG, are well suited for continuous-valued control outputs like AVC, as they can directly handle continuous-valued actions. In [
17], a safe off-policy DRL algorithm was proposed to minimize the switching costs of slow-timescale discrete devices while maintaining voltage constraints on an hourly basis. Additionally, in [
18], a voltage control method using Multi-Agent DDPG (MADDPG) with nine PV inverters was introduced, incorporating cooperation between agents through an attention mechanism.
These voltage control frameworks can be broadly categorized into centralized [
16,
17] and decentralized control [
18]. Decentralized control includes local and distributed control and assumes that communication is possible within a specific area [
19]. Because communication costs are lower in decentralized control compared to centralized control, DRL applications have increasingly been explored for multi-timescale voltage control. For example, in [
20], a bi-level DRL-based algorithm for multi-timescale voltage control was proposed. The multi-discrete Soft Actor Critic (SAC) algorithm was used to control long-term discrete devices, while the SAC algorithm was used to control short-term continuous devices. However, this approach relied on centralized control for both long-term and short-term devices. In [
21], a multi-timescale voltage control method that differs from [
20] was introduced. The proposed method utilized a centralized SAC algorithm for long-term agents with centralized control and a Multi-Agent SAC (MASAC) algorithm for short-term agents with decentralized control. In [
22], the concept of multi-timescale voltage control was further advanced, and a framework was proposed to implement the MASAC algorithm for both long-term and short-term control. As outlined above, multi-timescale voltage control has evolved from centralized to decentralized control to minimize communication costs. However, because the OLTC of the substation has the capability to regulate the voltage of entire lower grids connected to it, it relies heavily on grid-wide voltage information [
22]. In addition, most decentralized control methods rely on voltage information from the surrounding area.
Real-time voltage information is critical for optimal voltage control, as power flow states change dynamically due to load fluctuations, PV output variability, and PCS control. However, acquiring this information in real time incurs high communication costs, even when limited to the surrounding area. To address this challenge, it is essential to develop a framework that further reduces communication costs while enhancing the sophistication of voltage control. To this end, the authors propose a real-time voltage estimation method using Machine Learning (ML) to estimate voltages at all grid nodes based on limited measurable information. The authors also introduce a centralized multi-timescale voltage control method with DRL that uses these real-time estimates. Consequently, real-time State Estimation (SE) of the voltage at all nodes using only limited measurable information is essential.
SE in power grids is a critical factor for operational enhancement, and significant research is ongoing in this area. According to the review by [
23], most studies focus on modified SE methods aimed at improving data efficiency. However, as noted in [
24], it is challenging to use smart meter data as real-time observations for SE because these data update slowly (approximately every 15 min) and often involve delays of up to one day for collection. As a result, real-time measurements in power grids are limited, making real-time SE impossible without the use of pseudo-observations. To address this limitation, various methods have been proposed for generating pseudo-observations for load, including statistical approaches and ML techniques. In [
25], a neural network (NN) was employed to enhance the accuracy of load power consumption estimation, which serves as a pseudo-observation value. This approach used three types of input, namely actual load information, weather data, and time information, to predict load power consumption for the next hour. Research on SE has thus leveraged ML to accurately estimate pseudo-observations. Building on this, this study aims to estimate voltage directly as a pseudo-observation value in real time using ML for real-time voltage control. However, there are few studies on real-time voltage estimation in power grids using only limited real-time information. To address this gap, the authors developed a voltage estimation method using regression trees, a type of ML algorithm, for real-time voltage estimation [
26]. This approach uses regression trees to learn the relationship between node voltages (output data) and measurable input data, such as substation secondary-side bus voltage as well as the active and reactive power supplied to each line, which can be obtained in real time at the substation. In [
27,
28], the authors demonstrated that comprehensive training on assumed PV output and load power data enables accurate voltage estimation under real-time grid conditions. However, previous studies did not consider scenarios where PCS reactive power control for PV systems is implemented. The PCS reactive power output adds complexity to voltage estimation using regression trees. Furthermore, the growing number of PV interconnection points necessitates more accurate voltage estimation while minimizing the size of the training dataset. Recently, Deep Neural Networks (DNNs) have been utilized for estimating pseudo-observations in SE [
25] and for power flow calculations [
29]. DNNs have gained significant attention due to their effectiveness in addressing highly nonlinear problems. Building on this, this study developed a method for estimating voltages at all grid nodes using DNNs, leveraging real-time information measurable at substations to minimize communication costs. This study further proposed a multi-timescale voltage control method that uses DRL and limited measurable information. In this approach, the estimated voltages serve as input data for the DRL agents. By employing ML-based voltage estimation, the proposed method enables centralized control despite the constraints of limited measurable information. This innovation reduces communication costs while enhancing the sophistication of voltage control. Simulations were then conducted to evaluate the accuracy of the voltage estimation method and to assess the effectiveness of the multi-timescale voltage control with DRL based on the estimated voltages.
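The input-output structure of the voltage estimator described above can be illustrated with a small feed-forward network. This is only a sketch of the mapping from substation measurements to node voltages: the grid size, layer widths, random untrained weights, and snapshot values are all hypothetical, and the authors' actual DNN structure is not reproduced here.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Feed-forward pass: ReLU on hidden layers, linear output layer."""
    h = features
    for k, (weights, bias) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


rng = random.Random(0)
n_lines, n_nodes = 2, 10        # hypothetical grid size
n_inputs = 1 + 2 * n_lines      # bus voltage plus (P, Q) for each line
layers = [
    init_layer(n_inputs, 16, rng),
    init_layer(16, 16, rng),
    init_layer(16, n_nodes, rng),
]

# One substation snapshot: [V_bus, P_A, Q_A, P_B, Q_B] in p.u. (made-up values).
snapshot = [1.02, 0.35, 0.05, -0.20, 0.02]
node_voltages = forward(snapshot, layers)  # untrained, so the values are meaningless
```

In practice the weights would be fitted to power flow results for assumed PV output and load conditions; the point here is only that a single model takes a handful of substation quantities and returns one voltage per node.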
DRL-based control faces challenges related to the explainability and transparency of its decision-making processes. The “black box” nature of DRL models can be a significant obstacle to their adoption. To address this issue, Explainable AI (XAI) techniques have been developed in recent years to enhance the interpretability of ML models and make their outputs more understandable. The primary goal of XAI is to enable users to better comprehend the behavior of ML models while maintaining their high performance. The application of XAI in the energy field is quite new, beginning around 2020 [
30]. According to [
30], the most common XAI techniques in the energy field are Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), both of which are compatible with any ML model. In [
31], XAI techniques, including Explain Like I’m 5 (ELI5), LIME, and SHAP, were applied to solar power forecasting. Among these, SHAP stands out for its ability to provide both global and local interpretability and is the only method offering a complete explanation of model behavior. In [
32], SHAP was also applied to an emergency control scheme for power grids. Few studies in the energy field have explored the interpretability issues of DRL. However, one notable work [
32] implemented SHAP in a DQN model for load shedding. Using SHAP, the average influence of all features on all actions was calculated as a global explanation, while the impact of individual features on specific data points was visualized and evaluated as a local explanation. However, the outputs of the DQN are Q-values, and actions are selected based on the relative evaluation of these Q-values, yet a detailed analysis that takes this into account is lacking. Furthermore, there is limited research on applications to the actor–critic method, which is a major DRL algorithm. The interpretability of actor–critic DRL methods should be explored, as the network structure of actor–critic differs from that of Q-learning DRL methods such as DQN. Additionally, there is ample scope to use SHAP to compare the action decision criteria with those of other control methods. There are also few examples of SHAP being applied in the field of voltage control, making it crucial to clarify the criteria for action decisions in this field. Therefore, the authors applied the SHAP method to multi-timescale voltage control using DRL and conducted a detailed analysis to elucidate the criteria for action decisions in voltage control. Moreover, the SHAP method reveals how the estimated voltage affects the criteria for action decisions in voltage control.
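The local and global explanations mentioned above both rest on Shapley values: a local explanation attributes one output to the features of one sample, and a global explanation averages the magnitude of those attributions over many samples. The sketch below computes exact Shapley values by brute-force enumeration for a toy function. It does not use the SHAP library, and `toy_policy`, the feature meanings, the baseline, and the samples are hypothetical stand-ins rather than the agents analyzed in this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_policy(state):
    """Hypothetical stand-in for an actor output; state = [voltage, pv_power, tap]."""
    v, p, tap = state
    return -5.0 * (v - 1.0) - 0.5 * p * (v - 1.0) + 0.1 * tap


baseline = [1.0, 0.0, 0.0]
samples = [[1.04, 0.8, 1.0], [0.97, 0.1, -1.0], [1.01, 0.5, 0.0]]

# Local explanation: one attribution vector per sample.
local = [shapley_values(toy_policy, s, baseline) for s in samples]

# Global explanation: mean absolute attribution of each feature over the samples.
global_importance = [sum(abs(phi[i]) for phi in local) / len(local) for i in range(3)]
```

For each sample the attributions sum to the difference between the output at that sample and the output at the baseline, which is the property that makes the per-feature contributions directly comparable.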
This study proposes an XAI-based multi-timescale voltage control framework that uses only limited measurable information. Specifically, the framework estimates the voltages of all nodes in the grid in real time using data measured at the substation, incorporates the estimated voltages into the states of each agent, and provides control commands based on the estimated voltages of all nodes. The main contributions are as follows:
Voltage estimation for real-time voltage control: A DNN model was developed using real-time measurable data from substations as input variables, with the voltage of all nodes in the grid as the output. Compared to the conventional voltage estimation method using regression trees, proposed by [
26], the number of training datasets was significantly reduced, and the estimation accuracy improved by optimizing the DNN structure. Unlike [
25], which relies on more extensive data, this method uses only limited real-time measurable information to estimate the voltage values required for voltage control. Simulation studies demonstrate the accuracy of the voltage estimation.
Multi-timescale voltage control using DRL based on real-time measurable data combined with voltage estimation: This framework leverages real-time voltage estimation to enable each agent to make coordinated decisions that consider the voltage conditions across the grid. Unlike conventional methods that rely on partial observation, this approach optimizes voltage control for the entire grid with lower communication costs. The multi-timescale control strategy, trained using DRL algorithms, ensures coordination between control actions at different timescales. Specifically, the reward functions are designed to ensure effective coordination between long-term LRT adjustments and short-term PCS operations. By combining ML techniques for both voltage estimation and control, this model-free approach enables real-time implementation. Simulation studies demonstrate that the proposed framework achieves comprehensive voltage control across the grid, effectively addressing challenges that conventional control methods, limited by partial observation, cannot address.
Application of XAI to multi-timescale voltage control with DRL: XAI methodologies were applied to understand the factors influencing voltage control with DRL. Specifically, the importance of input features for both the LRT and PCS agents in the proposed voltage control method was visualized, and both global and local explanations were provided. For the local explanation, the authors analyzed a sample in which voltage deviations occurred under a benchmark DRL-based voltage control method, highlighting the effectiveness of the proposed method.
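The overall flow of the contributions above can be outlined as a control loop in which voltages estimated from substation data feed a slow LRT agent and a fast PCS agent. The sketch below is structural only. The estimator and both policies are simple rule-based placeholders standing in for the trained DNN and DRL agents, and the step interval, gains, thresholds, and names are all hypothetical. No grid model is included, so the commands do not feed back into the voltages.

```python
def estimate_voltages(meas):
    """Placeholder for the DNN estimator: substation measurements -> node voltages (p.u.)."""
    # Hypothetical linear profile along a three-node feeder.
    return [meas["v_bus"] - 0.01 * meas["p_line"] * k for k in range(1, 4)]


def lrt_policy(voltages, tap):
    """Placeholder slow agent: move the tap one step against the mean deviation."""
    mean_dev = sum(voltages) / len(voltages) - 1.0
    if mean_dev > 0.02:
        return tap - 1
    if mean_dev < -0.02:
        return tap + 1
    return tap


def pcs_policy(voltages):
    """Placeholder fast agent: reactive power opposing the worst deviation."""
    worst = max(voltages, key=lambda v: abs(v - 1.0))
    return -2.0 * (worst - 1.0)


def run(measurements, lrt_interval=6):
    """Run the two-timescale loop and return (step, tap, q_command) tuples."""
    tap, log = 0, []
    for t, meas in enumerate(measurements):
        voltages = estimate_voltages(meas)   # every step, from substation data only
        if t % lrt_interval == 0:            # slow timescale: tap changer
            tap = lrt_policy(voltages, tap)
        q_cmd = pcs_policy(voltages)         # fast timescale: PV inverter
        log.append((t, tap, q_cmd))
    return log


# Twelve identical snapshots with reverse power flow (negative line power).
log = run([{"v_bus": 1.05, "p_line": -1.0}] * 12)
```

The design point illustrated is that both agents read the same grid-wide estimate, while only the fast agent acts at every step.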
4. Conclusions
In this study, the authors proposed an XAI-based multi-timescale voltage control framework that utilizes only limited measurable information to achieve real-time grid-wide voltage control. Specifically, a method to estimate the voltage at all nodes in the grid using real-time measurements available at the substation was developed. This estimated voltage was then incorporated into each agent’s observed state, allowing for a comprehensive voltage control strategy based on the estimated voltage of the entire grid.
The proposed “ALL Model”, which consolidates all nodes into a single DNN model, achieved high accuracy in voltage estimation. It showed notable improvements in reducing RMSE and maximum estimation errors across the test dataset. In particular, the maximum estimation error was 0.0024 p.u., which is small enough that the estimation error does not affect the voltage control. The DNN ALL Model effectively captured nonlinear relationships, substantially outperforming the regression tree approach. The voltage estimation method demonstrated high accuracy, particularly at challenging locations such as the end nodes of line A, where the estimation errors were notably reduced. In terms of voltage control, the proposed method successfully maintained proper voltage across all nodes while reducing the frequency of tap changes in response to fluctuating PV output. The integration of voltage estimation allowed for comprehensive grid monitoring and control, even at nodes traditionally difficult to regulate using benchmark methods. Notably, the voltage deviation observed in the benchmark model at node A8 was mitigated, highlighting the effectiveness of the proposed model’s policy. The proposed voltage control method has low communication costs because it uses only limited measurable information. By implementing control based on estimated voltage, it also coordinates voltage regulators operating on different timescales in the grid to maintain proper voltage.
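The two accuracy metrics referred to above, RMSE and maximum estimation error, can be computed as in the following sketch. The voltage values are made-up illustration data and are not results from this study.

```python
from math import sqrt


def rmse(true_v, est_v):
    """Root-mean-square error between true and estimated voltages (p.u.)."""
    return sqrt(sum((t - e) ** 2 for t, e in zip(true_v, est_v)) / len(true_v))


def max_abs_error(true_v, est_v):
    """Largest absolute estimation error over all nodes and samples (p.u.)."""
    return max(abs(t - e) for t, e in zip(true_v, est_v))


# Hypothetical true and estimated node voltages in p.u.
true_v = [1.000, 1.020, 0.980]
est_v = [1.001, 1.018, 0.981]

error_rmse = rmse(true_v, est_v)
error_max = max_abs_error(true_v, est_v)
```

RMSE summarizes the typical error, whereas the maximum error bounds the worst case, which is the quantity that matters when checking whether an estimate could mask a voltage violation.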
Furthermore, the use of SHAP values offered valuable insights into the decision-making process of the DRL agent. By evaluating feature importance at both global and local levels, the analysis revealed key factors influencing control decisions, including tap position, PV active power output, and PCS reactive power output. The results demonstrated that the DNN-based model effectively utilized voltage information from other nodes, facilitating coordinated control across different lines. Additionally, the SHAP analysis provided an explanation for the effectiveness of the proposed method in maintaining proper voltage, in contrast to the benchmark model. The SHAP values offered a transparent mechanism for understanding the model’s decision-making process, providing important guidance for future improvements in the model.
In conclusion, the proposed DRL-based voltage control framework with DNN-based voltage estimation offered a significant advancement in grid voltage control by reducing deviations and enhancing accuracy. Combined with SHAP-based interpretability, it also provided a transparent mechanism for decision-making insights.
In the future, the proposed framework should be validated on larger and more complex grid models that more closely resemble real-world systems. This would allow for an assessment of the method’s scalability and robustness under more realistic conditions. Additionally, while this study utilized SHAP analysis to evaluate and explain the model’s performance, future research should focus on leveraging insights gained from SHAP values to further refine and improve the model. By incorporating these insights, it may be possible to enhance the model’s decision-making capabilities and achieve even better voltage control performance. These advancements will contribute to the continuous development of intelligent voltage control systems that are both practical and reliable for modern power grids.