A Model-Free Deep Reinforcement Learning-Based Approach for Assessment of Real-Time PV Hosting Capacity

Assessments of the hosting capacity of electricity distribution networks are of paramount importance, as they facilitate the seamless integration of rooftop photovoltaic systems into the grid, accelerating the transition towards a more carbon-neutral and sustainable system. This paper employs a deep reinforcement learning-based approach to evaluate the real-time hosting capacity of low-voltage distribution networks in a model-free manner. The proposed approach requires only real-time customer voltage data and solar irradiation data to provide a fast and accurate estimate of the real-time hosting capacity at each customer connection point. This study addresses the need for accurate electrical models, which are frequently unavailable, when evaluating the hosting capacity of electricity distribution networks. To meet this challenge, the proposed approach uses a deep neural network-based, data-driven model of a low-voltage electricity distribution network. This model-free element enhances the adaptability and robustness of the proposed methodology. In addition, a comparative analysis of model-based and model-free hosting capacity assessment methods is presented, highlighting their respective strengths and weaknesses. The proposed hosting capacity estimation model enables distribution network service providers to make well-informed grid planning decisions, leading to cost minimization.


Introduction
Rapid integration of rooftop photovoltaic (PV) systems into low-voltage (LV) electricity distribution networks has raised concerns among distribution network service providers (DNSPs) due to its impact on power quality and reliability. High levels of PV penetration without proper consideration of integration planning may lead to adversities such as over-voltage, unnecessary curtailments, high imbalance, and thermal overloading of network elements.
The hosting capacity (HC) of an electricity distribution network may be defined as the maximum distributed generation that can be safely and reliably integrated without causing any adverse impacts to the grid. Undertaking an HC assessment allows DNSPs to plan investments more efficiently and integrate more distributed energy resources (DERs) into the grid, ensuring a cost-effective grid expansion. Traditional HC assessment strategies are commonly classified into three groups: deterministic methods, stochastic methods, and quasi-static time-series (QSTS) methods [1,2]. Deterministic methods have lower computational complexity and provide a fast but less accurate estimate of HC [3]. Stochastic methods use probabilistic power flow to model uncertainties in the LV distribution network and quantify the HC [4]. QSTS methods use a series of steady-state power flows to accurately analyze the HC of distribution networks [5]. In contrast to other methods, QSTS simulations offer a partial representation of the system dynamics associated

• Development of a DNN-based surrogate model to perform voltage calculations using smart meter data, integrating the model-free aspects in the proposed methodology.
• Evaluation of the real-time HC using the SAC algorithm. The proposed approach only requires real-time customer voltages and solar irradiation data to provide a fast and accurate estimate of real-time HC at each customer connection point.
• A comparative analysis is presented between the model-based and model-free HC assessments, highlighting advantages and disadvantages of both approaches.

Problem Formulation
2.1. System Model and Constraints
Voltage calculations at customer connection points (CCPs) using power flow or equivalent methods are at the heart of most well-recognized HC assessment strategies. Determining the likely voltages at the CCPs can assist in electricity distribution network planning, such as assessing the feasibility of new connection requests, managing DERs, and evaluating the impact of grid augmentations. The conventional method for voltage calculations in electricity distribution networks is power flow analysis, which relies on an accurate model of the network that accounts for the complex network topology.
Provided that LV distribution network data are readily available, power flow calculations can easily be performed using appropriate software. The HC assessment presented in this study uses a QSTS simulation that accounts for the temporal variability of the electricity distribution network by considering changing load patterns and uncertain weather conditions (solar irradiation). Historical time-series data gathered from smart meters for customer load active power (P_load), customer load reactive power (Q_load), and voltage at the CCP (V_CCP) are used in the proposed HC assessment. The voltage constraints considered are based on the AS/NZS 4777.2:2020 standard [15], which states that active power export must cease if the local voltage exceeds 258 V.
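The constraint check itself is simple to express. The sketch below applies the 258 V export-cessation threshold to a single snapshot of CCP voltages; the function names and the plain-list input are illustrative assumptions, not part of the original study.

```python
# Illustrative check of the 258 V limit cited from AS/NZS 4777.2:2020.
# Function names and list-based inputs are hypothetical.
V_LIMIT = 258.0  # volts


def violating_ccps(v_ccp: list[float], limit: float = V_LIMIT) -> list[int]:
    """Return the indices of CCPs whose voltage exceeds the limit."""
    return [i for i, v in enumerate(v_ccp) if v > limit]


def constraints_satisfied(v_ccp: list[float], limit: float = V_LIMIT) -> bool:
    """True when no CCP voltage exceeds the limit."""
    return not violating_ccps(v_ccp, limit)


if __name__ == "__main__":
    snapshot = [247.9, 255.3, 258.4, 252.0]
    print(violating_ccps(snapshot))         # [2]
    print(constraints_satisfied(snapshot))  # False
```

In a QSTS setting, the same check would simply be repeated for the voltage vector of every time step.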

Surrogate Model of the Network
Due to the vast number of LV networks, the topological and electrical component data of every LV distribution network are not always readily available to DNSPs. The absence of an accurate LV network model presents a significant challenge when undertaking HC assessment. However, the widespread deployment of smart meters presents an opportunity to use P_load, Q_load, and V_CCP measurements at the customer level to develop equivalent LV feeder models. The study presented in this paper uses a DNN-based surrogate model to approximate voltages at the CCPs. The proposed method for voltage estimation is a model-free approach that relies only on historical smart meter data and does not require an electrical model representing the intricate details of the distribution network. A graphical illustration of the proposed DNN-based surrogate model of the LV distribution network is given in Figure 1.
The surrogate model is a nonlinear regression method that maps the active power exports (P_gen,i), active power load (P_load,i), and reactive power load (Q_load,i) of N customers to their respective voltages at the CCP (V_CCP,i), where i = 1, 2, 3, …, N.
The input layer of the DNN requires three inputs {P_gen, P_load, Q_load} for each of the N customers, resulting in 3N inputs in total. The output layer of the DNN provides N outputs, one voltage V_CCP for each customer. The relationship between the inputs x_l,k and the output O_l of a single neurone l is given in (1).

$$O_l = F_l\left(\sum_{k} w_{l,k}\, x_{l,k} + b\right) \tag{1}$$
Energies 2024, 17, 2075
where k indexes the inputs from other neurones, w_l,k are the corresponding weights for each input, F_l is the activation function, and b is the bias term. The activation function introduces nonlinearity into the DNN and determines the threshold at which a neurone activates. For this study, a rectified linear unit (ReLU) activation function with its output bounded to the range [0, 1] is used for all the hidden layers. The bias term is a learnable parameter that offsets the neurone output, allowing each neurone to have an individual response characteristic. The relationship between the DNN input {P_gen, P_load, Q_load} and each output V_CCP,i is described in (2).
where Z_i,k(P_gen, P_load, Q_load) is the hierarchical transformation of the input data through the nonlinear mappings of multiple hidden layers. The surrogate model is trained in a supervised manner, and the network parameters θ (i.e., weights and biases) are updated using stochastic gradient descent by minimizing the loss function L(θ). During each epoch, the loss is calculated as the mean squared error given in (3) using a sampled batch B from the training data.

$$L(\theta) = \frac{1}{|B|}\sum_{n \in B}\left(V_n - \hat{V}_n\left(P_{gen,n}, P_{load,n}, Q_{load,n}\right)\right)^2 \tag{3}$$
where V_n is a sample of the measured CCP voltages (V_CCP,i) and V̂_n(P_gen,n, P_load,n, Q_load,n) is the voltage estimated by the surrogate model. The old network parameters θ_old are updated to θ_new by applying stochastic gradient descent as given in (4).

$$\theta_{new} = \theta_{old} - \alpha\, \nabla_{\theta} L(\theta_{old}) \tag{4}$$
where α is the learning-rate hyperparameter that regulates the rate at which the network parameters are updated. For this study, a learning rate of 0.001 is used.
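To make Eqs. (1), (3) and (4) concrete, the sketch below implements a single neurone with a [0, 1]-bounded ReLU, the batch mean squared error, and one gradient-descent update in plain Python. It is a minimal illustration under stated assumptions: a single linear output stands in for the full five-hidden-layer DNN, gradients are derived analytically for that linear model, and the toy data and the demo learning rate are hypothetical.

```python
# Plain-Python sketch of Eqs. (1), (3) and (4). A single linear output stands
# in for the full DNN; the toy data and demo learning rate are assumptions.


def clipped_relu(z: float) -> float:
    """ReLU bounded to [0, 1], the output range stated for the hidden layers."""
    return min(max(z, 0.0), 1.0)


def neurone_output(x: list[float], w: list[float], b: float) -> float:
    """Eq. (1): O_l = F_l(sum_k w_lk * x_lk + b)."""
    return clipped_relu(sum(wk * xk for wk, xk in zip(w, x)) + b)


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Linear stand-in for the surrogate's voltage estimate V_hat."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def mse(batch, w, b) -> float:
    """Eq. (3): mean squared error over a batch of (inputs, voltage) pairs."""
    return sum((v - predict(w, b, x)) ** 2 for x, v in batch) / len(batch)


def sgd_step(batch, w, b, lr=0.001):
    """Eq. (4): theta_new = theta_old - lr * grad L(theta_old)."""
    n = len(batch)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, v in batch:
        err = predict(w, b, x) - v  # dL/dV_hat for this sample is 2 * err / n
        for k, xk in enumerate(x):
            grad_w[k] += 2.0 * err * xk / n
        grad_b += 2.0 * err / n
    return [wk - lr * gk for wk, gk in zip(w, grad_w)], b - lr * grad_b


if __name__ == "__main__":
    batch = [([1.0], 2.0), ([2.0], 4.0)]
    w, b = [0.0], 0.0
    print(mse(batch, w, b))  # 10.0
    for _ in range(2000):
        w, b = sgd_step(batch, w, b, lr=0.05)
    print(mse(batch, w, b))
```

In practice a framework such as TensorFlow computes these gradients automatically through all hidden layers; the hand-derived version only shows what each update does.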

Hosting Capacity Assessment Framework
The real-time HC of electricity distribution networks is directly affected by the instantaneous voltages observed at CCPs and by the solar irradiation levels that govern the present active power exports of DERs. The proposed HC assessment framework uses the SAC deep reinforcement learning algorithm to approximate the real-time HC from real-time observations of voltage levels at CCPs and solar irradiation levels. The strong performance of the SAC algorithm has attracted significant attention among researchers, and SAC has emerged as a favored option for complex deep reinforcement learning tasks that require continuous actions. SAC relies on the Markov decision process (MDP) to formalize sequential decision making in reinforcement learning tasks. The following sections describe the formulation of the MDP and the application of the SAC algorithm to estimate the real-time HC.

Formulation of Markov Decision Process
In this study, the assessment of HC is formulated as an MDP with infinite time steps. An MDP is a powerful mathematical framework that formalizes the sequential decision-making process of a decision maker, otherwise known as an agent, in an uncertain environment while adhering to the Markov property. The key components of an MDP can be described as a 5-tuple {S, A, P, R, γ}, where S is the state space (s ∈ S), the set of all feasible conditions (states) of the environment accessible to the decision maker; A is the action space (a ∈ A), which defines all permissible decisions (actions) that an agent can take in the environment; P: S × A × S → ℝ⁺ is the transition probability function, which gives the conditional probability of transitioning to a new state given the current state and action; R: S × A × S → ℝ is the reward function, which evaluates the agent's performance with a numerical reward based on the action executed in a particular state; and γ ∈ (0, 1) is the discount factor.
At each time step t ∈ {0, 1, …, T}, given the state of the environment s_t, the agent takes an action a_t, interacts with the environment, and receives an immediate reward r_t. The environment then transitions to its next state s_t+1. The policy function π governs the sequential decision-making process of an agent, dictating the agent's behaviour in the environment. Given a particular state (s ∈ S), a stochastic policy defines a probability distribution π(a|s) over all feasible actions (a ∈ A) that the agent can undertake. The goal of an agent in reinforcement learning is to maximize its discounted cumulative reward R(s_t, a_t) = r_t + γr_t+1 + ⋯ + γ^(T−t) r_T by optimizing the policy π. In reinforcement learning, the action value function Q^π(s, a) is used to evaluate the policy π under uncertain environment transition dynamics and to improve it towards an optimal policy. From the Markov property and the Bellman equation, the action value function is given in (5).

$$Q^{\pi}(s,a) = \mathbb{E}_{s' \sim P(\cdot|s,a)}\left[r(s,a,s') + \gamma\, \mathbb{E}_{a' \sim \pi(\cdot|s')}\left[Q^{\pi}(s',a')\right]\right] \tag{5}$$
where s ′ is the next state and a ′ is the next action.
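The discounted return and the one-step backup in (5) can be sketched in a few lines of Python. This is an illustrative simplification: the expectation over next actions is taken over a small discrete set of candidate actions with given probabilities, whereas the HC problem has a continuous action space.

```python
# Sketch of the discounted return and a one-step Bellman backup, Eq. (5).
# The finite set of candidate next actions is a simplification: the HC
# problem has a continuous action space.


def discounted_return(rewards: list[float], gamma: float) -> float:
    """r_t + gamma*r_{t+1} + ... + gamma^(T-t)*r_T, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def bellman_backup(r: float, q_next: list[float], pi_next: list[float],
                   gamma: float, terminal: bool = False) -> float:
    """r + gamma * E_{a'~pi(.|s')}[Q(s', a')] for discrete next actions."""
    if terminal:
        return r
    expected_q = sum(p * q for p, q in zip(pi_next, q_next))
    return r + gamma * expected_q


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))            # 1.75
    print(bellman_backup(1.0, [2.0, 4.0], [0.5, 0.5], gamma=0.5))  # 2.5
```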
The formulation of the HC quantification problem as an MDP is detailed as follows.
• Environment: the environment that the agent interacts with is the actual LV distribution network.
• Agent: the agent is the controller that estimates the rated capacity (S_PV,i) of the customer PV inverters.
• State: the state of the environment at time t consists of two observations (V_CCP,i, GHI_i), where GHI_i is the global horizontal irradiation at customer i.
• Action: the action taken by the agent is the estimated real-time HC of each customer i, denoted by the rated capacity S_PV,i. To reduce the search space and prevent the S_PV,i estimates of the SAC algorithm from reaching unrealistically high values during periods of low solar irradiation, the action is clipped between 0 and MaxHC, a ← clip(S_PV,i, 0, MaxHC), where MaxHC is an upper limit on PV capacity that is unlikely to be reached even during periods of high solar irradiation.
• Reward function: the immediate reward r_t that an agent receives for taking an action a_t in state s_t while satisfying the voltage constraints is given in (6).
If the voltage constraints are violated at any CCP, the reward r_t is assigned the penalty value, which is a large negative integer. The Markov property stipulates that the next state depends only on the current state and action, so the agent's earlier states and actions are irrelevant for predicting future outcomes. As a result, the modeling process is simplified, and the SAC algorithm can be used to determine optimal policies that maximize the expected reward.
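The action clipping and the penalty logic described above can be sketched as follows. This is a hypothetical illustration: Eq. (6) is not reproduced here, so the reward returned when all constraints hold (the total estimated capacity) is a placeholder, and the penalty value is assumed. The voltages passed to `reward` would come from the environment (the PowerFactory model or the surrogate) after applying the clipped action.

```python
# Hypothetical environment step logic for the MDP above. The reward used when
# constraints hold is a placeholder (total estimated capacity), NOT Eq. (6),
# and PENALTY is an assumed value.
MAX_HC = 50.0    # kVA, the MaxHC value reported for the numerical study
V_LIMIT = 258.0  # V, AS/NZS 4777.2:2020 limit
PENALTY = -1000  # assumed large negative integer


def clip_action(s_pv: list[float], max_hc: float = MAX_HC) -> list[float]:
    """a <- clip(S_PV,i, 0, MaxHC) for every customer i."""
    return [min(max(a, 0.0), max_hc) for a in s_pv]


def reward(s_pv: list[float], v_ccp: list[float],
           v_limit: float = V_LIMIT, penalty: int = PENALTY) -> float:
    """Penalty if any CCP voltage exceeds the limit, else a placeholder reward."""
    if any(v > v_limit for v in v_ccp):
        return float(penalty)
    return sum(s_pv)  # placeholder for Eq. (6)


if __name__ == "__main__":
    action = clip_action([-3.0, 20.0, 75.0])
    print(action)                                 # [0.0, 20.0, 50.0]
    print(reward(action, [251.0, 256.5, 257.9]))  # 70.0
    print(reward(action, [251.0, 259.2, 257.9]))  # -1000.0
```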

Soft Actor-Critic Algorithm
Soft actor-critic is an off-policy algorithm that optimizes a stochastic policy using an actor-critic framework. The actor represents the stochastic policy π(·|s), which is a probability distribution over actions for a given state. The critic represents the action value function Q^π(s, a), which provides an estimate of the expected cumulative reward. An off-policy algorithm updates its current policy from experience samples generated by a different policy, which reduces the number of interactions with the environment and improves sample efficiency. Over the years, SAC has undergone several iterations and enhancements [10,11]. For this study, however, the SAC algorithm follows the architecture presented in [10], which uses a total of five feed-forward neural networks: one actor network (π_ϕ), two critic networks (Q_θ1 and Q_θ2), and two target critic networks (Q_θ1′ and Q_θ2′). The proposed framework for HC assessment using a SAC agent is illustrated in Figure 2.
The key feature of the SAC algorithm is entropy regularization, which is designed to encourage exploration and regulate the exploitation-exploration trade-off during the learning process. The entropy H(π(·|s)) of an agent's policy represents the randomness of the agent's actions, as given in (7).

$$H\big(\pi(\cdot|s)\big) = \mathbb{E}_{a \sim \pi(\cdot|s)}\left[-\log \pi(a|s)\right] \tag{7}$$

A high entropy implies a more exploratory policy with less exploitation, whereas a low entropy implies a more deterministic policy with less exploration. The Bellman equation for the entropy-regularized action value function Q^π(s, a) of the SAC algorithm is given in (8).

$$Q^{\pi}(s,a) = \mathbb{E}_{s' \sim P,\; a' \sim \pi}\left[r(s,a,s') + \gamma\left(Q^{\pi}(s',a') + \alpha\, H\big(\pi(\cdot|s')\big)\right)\right] \tag{8}$$
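For intuition about (7), the sketch below compares the closed-form entropy of a one-dimensional Gaussian policy, 0.5 ln(2πeσ²), with a Monte Carlo estimate of E[−log π(a|s)]. The Gaussian policy head is an assumption (a common choice for continuous-action SAC, not stated in the text); the point is that a wider σ means a more exploratory policy with higher entropy.

```python
# Entropy of a 1-D Gaussian policy: closed form vs. a Monte Carlo estimate
# of Eq. (7), E[-log pi(a|s)]. The Gaussian policy is an assumption.
import math
import random


def gaussian_log_prob(a: float, mu: float, sigma: float) -> float:
    """log N(a; mu, sigma^2)."""
    return (-0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def gaussian_entropy(sigma: float) -> float:
    """Closed-form entropy 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def mc_entropy(mu: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_{a~pi}[-log pi(a)]."""
    rng = random.Random(seed)
    total = sum(gaussian_log_prob(rng.gauss(mu, sigma), mu, sigma)
                for _ in range(n))
    return -total / n


if __name__ == "__main__":
    for sigma in (0.5, 2.0):
        print(sigma, round(gaussian_entropy(sigma), 3),
              round(mc_entropy(0.0, sigma), 3))
```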
SAC employs two critic functions, Q_θ1 and Q_θ2, and uses the minimum of the two critics for the policy updates; this reduces the overestimation bias and improves learning stability. The target networks of SAC yield more stable and reliable value estimates during the learning process. The training process of the SAC algorithm, including the network parameter updates, is summarized in Algorithm 1. It should be noted that the reparameterization trick is not used in the proposed SAC algorithm, to avoid adding further complexity to the HC quantification problem, since the algorithm already performs very well without it. For the HC assessment, the learning rates used for the actor and the critic networks were 0.001 and 0.002, respectively. All actor and critic networks used five hidden layers of [256, 512, 1024, 512, 256] nodes. A fixed entropy coefficient of α = 0.2 and a batch size of 750 samples were used in the final SAC design; both values were selected through a sensitivity analysis.
Algorithm 1 (excerpt):
• Compute the target y(r, s′) for the critic network updates.
• Update the critics Q_θ1 and Q_θ2 by gradient descent.
• Update the policy ϕ by gradient ascent.
• Update the target networks with ρ ≪ 1.
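The critic target and the target-network update listed in Algorithm 1 can be sketched as follows, using the standard clipped double-Q form y(r, s′) = r + γ(1 − d)(min(Q_θ1′, Q_θ2′) − α log π(a′|s′)). This is a minimal sketch operating on scalars: γ = 0.99 and ρ = 0.005 are assumed values not given in the text, α = 0.2 is the entropy coefficient reported above, and the policy update is not shown.

```python
# Sketch of the clipped double-Q soft target and the soft (Polyak) target
# update in SAC. gamma = 0.99 and rho = 0.005 are assumed values; alpha = 0.2
# is the entropy coefficient reported above.


def soft_q_target(r: float, done: bool, q1_next: float, q2_next: float,
                  logp_next: float, gamma: float = 0.99,
                  alpha: float = 0.2) -> float:
    """y(r, s') = r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next, q2_next: target-critic values at (s', a'), with a' ~ pi(.|s').
    """
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return r + gamma * (1.0 - float(done)) * soft_value


def polyak_update(target: list[float], online: list[float],
                  rho: float = 0.005) -> list[float]:
    """theta' <- rho * theta + (1 - rho) * theta', with rho << 1."""
    return [rho * o + (1.0 - rho) * t for t, o in zip(target, online)]


if __name__ == "__main__":
    y = soft_q_target(r=1.0, done=False, q1_next=3.0, q2_next=2.0,
                      logp_next=-1.0, gamma=0.5)
    print(y)  # approximately 2.1
    print(polyak_update([0.0, 1.0], [1.0, 1.0], rho=0.1))
```

Taking the smaller of the two target critics is what counteracts the overestimation bias mentioned above, while a small ρ makes the target networks trail the online critics slowly.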

Experimental Setup
A single-line diagram of the developed DIgSILENT PowerFactory LV feeder model, which consists of 28 customer connections, is given in Figure 3. The MV segment of the distribution network is represented as a Thevenin-equivalent model with a voltage source and a series impedance. The main feeders are 3-phase with a neutral conductor, and the service feeders that tie the main 3-phase busbars to the CCPs are single-phase with a neutral. The real-world LV network selected for the numerical study displays a significant level of phase unbalance, which is a typical feature of most LV networks. Operational constraints and the electrical characteristics of the modeled network are detailed in Table 1.

For the HC assessment, a 100% PV penetration scenario is considered, and each customer is given the opportunity to export active power to the network. The numerical study uses different data sets of historical smart meter data, which are detailed in Table 2. Data Set 1 and Data Set 2 represent yearly time-series data that capture the full range of seasonal variations and are well suited for training and evaluating the proposed DNN-based models. Data Set 3 consists of high-resolution time-series data for a single day and is therefore ideal for the HC assessment, yielding more accurate results.

Surrogate Model Performance Evaluation
All the proposed DNN-based models in this paper are implemented using TensorFlow 2, which provides a high-level API and simplifies the process of deploying deep learning models. The DNN-based surrogate model is trained for 3000 epochs with a batch size of 48 to capture the complex mapping between the inputs (P_gen, P_load, Q_load) and the output (V_CCP). As illustrated in Figure 1, and given that the LV network has 28 customer connections, the input layer of the surrogate model consists of 28 × 3 = 84 nodes. The hidden layers consist of five fully connected dense layers of size [256, 512, 1024, 512, 256], and the output layer consists of 28 nodes providing V_CCP for each customer. Hyperparameters of the surrogate model (i.e., hidden layer size, batch size, and learning rate) were optimized by conducting a sensitivity analysis.
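The layer sizes above fully determine the size of the surrogate. The sketch below derives the layer widths and counts the trainable parameters (weights plus biases per dense layer); the resulting count is derived here from the stated architecture and is not a figure reported in the study.

```python
# Counting trainable parameters of the fully connected surrogate described
# above (weights plus biases per dense layer). The count is derived, not
# reported in the study.
N_CUSTOMERS = 28
HIDDEN = [256, 512, 1024, 512, 256]


def layer_sizes(n_customers: int, hidden: list[int]) -> list[int]:
    """Input: 3 features per customer; output: one voltage per customer."""
    return [3 * n_customers, *hidden, n_customers]


def dense_param_count(sizes: list[int]) -> int:
    """Sum of n_in * n_out weights + n_out biases over consecutive layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    sizes = layer_sizes(N_CUSTOMERS, HIDDEN)
    print(sizes)                     # [84, 256, 512, 1024, 512, 256, 28]
    print(dense_param_count(sizes))  # 1341980
```

Roughly 1.34 million parameters are therefore fitted from smart meter data, which is one reason the yearly training data set matters.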
The training process of the surrogate model follows the methodology described in Section 2.2; upon completion of the training iterations, the performance of the trained model is evaluated. Data Set 2, which consists of entirely different data samples from Data Set 1 (used for training), is used for the performance evaluation. The evaluation metric is the voltage deviation, defined as V_target(i, t) − V_CCP(i, t) for each customer i at time step t, where V_target is the actual smart meter voltage at the CCP according to Data Set 2. The voltage deviations of the surrogate model for all customers of the LV network are illustrated as violin plots in Figure 4. The calculated voltage deviations remain within ±3 V across all customers, which indicates that the trained surrogate model delivers accurate estimates of the voltages at the CCPs.
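The deviation metric can be computed as sketched below, with the measured and estimated voltages arranged as customers × time steps. The nested-list layout and the sample values are illustrative assumptions; the ±3 V band is the bound reported above.

```python
# Voltage-deviation metric V_target(i, t) - V_CCP(i, t), arranged as
# customers x time steps. Nested lists and sample values are illustrative.


def voltage_deviations(v_target, v_est):
    """Per-customer, per-time-step deviation (measured minus estimated)."""
    return [[vt - ve for vt, ve in zip(row_t, row_e)]
            for row_t, row_e in zip(v_target, v_est)]


def max_abs_deviation(dev) -> float:
    """Largest absolute deviation over all customers and time steps."""
    return max(abs(d) for row in dev for d in row)


def within_band(dev, band: float = 3.0) -> bool:
    """True when every deviation lies within +/- band volts."""
    return max_abs_deviation(dev) <= band


if __name__ == "__main__":
    measured = [[240.0, 250.0], [245.0, 255.0]]
    estimated = [[241.5, 248.0], [245.0, 257.5]]
    dev = voltage_deviations(measured, estimated)
    print(dev)                     # [[-1.5, 2.0], [0.0, -2.5]]
    print(max_abs_deviation(dev))  # 2.5
    print(within_band(dev))        # True
```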

Hosting Capacity Assessment Results
The LV network PowerFactory model and the surrogate model were used to train two distinct SAC agents (model-based SAC and model-free SAC, respectively) for the HC assessments. The two agents have identical designs and hyperparameters and differ only in the environment they interact with. Data Set 1 is used to train both SAC agents; 12 episodes were considered for each time step, giving a total of 5760 × 12 = 69,120 training episodes. The learning curves of the model-free and model-based SAC agents are illustrated in Figure 5. Both agents converge to a similar reward and display similar learning efficiency, as expected given their identical design. Minor deviations between the two learning curves can be attributed to the disparity between the PowerFactory model and the surrogate model. Upon completion of training, the network parameters of both SAC agents were saved and used for the HC assessment. Data Set 3 was used to run two high-resolution QSTS simulations, one per SAC agent, to analyze the real-time HC of the LV distribution network. The trained SAC agents estimate the real-time HC for each customer within milliseconds, using only customer voltages V_CCP,i from smart meter measurements and live solar irradiation GHI data.
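The training schedule and the agent's observation can be sketched as below: 12 episodes for each of the 5760 time steps, with the state built from the CCP voltages and GHI values. The helper names are hypothetical, and the agent-environment interaction that would run inside each episode is deliberately omitted.

```python
# Sketch of the training schedule: 12 episodes per time step of Data Set 1,
# with the state built from CCP voltages and GHI. Helper names are
# hypothetical; the agent-environment interaction is omitted.
from collections.abc import Iterator

TIME_STEPS = 5760
EPISODES_PER_STEP = 12


def observation(v_ccp: list[float], ghi: list[float]) -> list[float]:
    """Flattened state s_t = (V_CCP,i, GHI_i) for all customers i."""
    return [*v_ccp, *ghi]


def training_schedule(time_steps: int,
                      episodes_per_step: int) -> Iterator[tuple[int, int]]:
    """Yield (time step, episode) pairs in training order."""
    for t in range(time_steps):
        for episode in range(episodes_per_step):
            yield t, episode


if __name__ == "__main__":
    n_episodes = sum(1 for _ in training_schedule(TIME_STEPS, EPISODES_PER_STEP))
    print(n_episodes)                                   # 69120
    print(len(observation([240.0] * 28, [0.0] * 28)))  # 56
```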


It should be noted that HC is evaluated as the maximum allowed PV rating for each customer installation. In Figure 6, HC is defined only for periods when GHI is present, with HC assigned a value of zero during nighttime. This explains the fluctuation of HC between zero and MaxHC (where MaxHC = 50 kVA for this numerical study and is an arbitrary value, as detailed in Section 3.1) around sunrise and sunset, when GHI values are close to or at zero. To ensure fairness between customers for active power exports, the HC estimates of customers by the SAC agent are clipped to within ±10% of the mean HC among customers at each instant. This characteristic is evident in Figure 6, where the HC across all customers does not deviate by more than ±10% at any point in time. Taking the model-based HC values as the benchmark, the results show that the model-free SAC agent slightly overestimates the real-time HC during periods of high GHI. Overall, however, the model-free and model-based HC values are broadly similar.
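The post-processing described above (zero HC without irradiation, the [0, MaxHC] bound, and the ±10% fairness band around the mean) can be sketched for one time instant as follows. A single GHI value per instant and the order in which the clipping steps are applied are assumptions, not details confirmed by the text.

```python
# Hypothetical post-processing of one instant of SAC HC estimates: zero HC when
# there is no irradiation, clip to [0, MaxHC], then keep every customer within
# +/-10 % of the mean HC. A single GHI value per instant and this ordering of
# the steps are assumptions.
MAX_HC = 50.0  # kVA


def postprocess_hc(raw_hc: list[float], ghi: float,
                   max_hc: float = MAX_HC, band: float = 0.10) -> list[float]:
    if ghi <= 0.0:                     # nighttime: HC defined as zero
        return [0.0] * len(raw_hc)
    hc = [min(max(h, 0.0), max_hc) for h in raw_hc]
    mean_hc = sum(hc) / len(hc)
    lo, hi = (1.0 - band) * mean_hc, (1.0 + band) * mean_hc
    return [min(max(h, lo), hi) for h in hc]


if __name__ == "__main__":
    print(postprocess_hc([10.0, 20.0, 30.0], ghi=0.0))    # [0.0, 0.0, 0.0]
    print(postprocess_hc([10.0, 20.0, 30.0], ghi=650.0))  # [18.0, 20.0, 22.0], up to rounding
```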

Discussion
The proposed method for real-time HC evaluation is superior to traditional HC evaluation methods in several respects. The dynamic and adaptive nature of the proposed real-time HC evaluation strategy enables DNSPs to make informed decisions on grid planning and expansion while responding to grid constraints. To evaluate the real-time HC accurately using the proposed model-free SAC algorithm with a surrogate model, the actual LV distribution network must already exhibit some level of PV penetration. Since the proposed DNN-based surrogate model is a regression model, low PV penetration levels result in sparse training data for active power exports and ultimately lead to a suboptimal mapping of active and reactive powers to voltages. Based on a sensitivity analysis conducted using the surrogate model for HC evaluation, a minimum of roughly 30% PV penetration should exist in the current LV distribution network to yield accurate results.
Model-based high-resolution QSTS simulations generally take a significant amount of time to complete. This is mainly due to the time required for the power flow calculations themselves and the delay caused by data transfer between the power flow software and the scripting software. The DNN-based surrogate model bypasses this delay and significantly reduces the runtime of the QSTS simulations. After the SAC agents are trained, any persisting exploratory actions arising from entropy regularization may introduce slight errors in the quantified HC. However, this error can be eliminated by using SAC as a deterministic agent for HC assessment after training, i.e., by setting the entropy coefficient to α = 0. Overall, SAC is a powerful algorithm that is relatively insensitive to hyperparameters and performs strongly in high-dimensional, continuous action spaces.
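In common SAC implementations, the exploration noise of a trained actor is removed at evaluation time by using the mean of its tanh-squashed Gaussian policy instead of sampling from it. The sketch below contrasts the two modes for a single scalar action. It is an illustration under stated assumptions, not the authors' network: the linear mapping from an action in [-1, 1] to an HC value in [0, MaxHC] (`action_to_hc`) is a hypothetical choice.

```python
import math
import random

MAX_HC_KVA = 50.0  # MaxHC from the numerical study


def squashed_gaussian_action(mu: float, log_std: float, deterministic: bool,
                             rng: random.Random) -> float:
    """Action of a tanh-squashed Gaussian policy head, as commonly used in SAC."""
    if deterministic:
        # Evaluation mode: no exploration noise, use the squashed mean.
        return math.tanh(mu)
    # Training mode: reparameterised sample mu + sigma * eps with eps ~ N(0, 1).
    eps = rng.gauss(0.0, 1.0)
    return math.tanh(mu + math.exp(log_std) * eps)


def action_to_hc(action: float) -> float:
    """Hypothetical linear map from an action in [-1, 1] to HC in [0, MaxHC] kVA."""
    return 0.5 * (action + 1.0) * MAX_HC_KVA
```

Repeated deterministic calls with the same policy outputs return the same HC, whereas stochastic calls scatter around it; this scatter is the source of the small HC errors mentioned above.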

Conclusions
An electrical model of a real-world 3-phase LV distribution network was developed, and a DNN-based surrogate model of the same LV network was designed and its performance evaluated. A model-based SAC agent and a model-free SAC agent were trained using the electrical model and the DNN-based surrogate model, respectively. In this paper, the real-time HC of the LV distribution network was evaluated using both the trained model-based SAC agent and the trained model-free SAC agent, and a comparative analysis of the model-based and model-free HC assessments was presented. The experimental results demonstrate the strong performance of the proposed real-time HC quantification strategy.
The proposed methodology represents a notable advancement over traditional HC quantification methods, which typically yield static estimates. By contrast, the proposed approach uses trained neural networks to provide HC estimates within milliseconds, eliminating the lengthy calculations inherent in traditional methods. It leverages artificial intelligence and machine learning to apply advanced algorithms that can address complex, nonlinear, and nonconvex optimization problems more effectively than conventional techniques.
Future work will extend the presented HC quantification methodology into an advanced coordinated control strategy that regulates the active and reactive power dispatched by customer PV systems to enhance the overall HC of the grid. Further investigation will also aim to develop a more precise surrogate model of the LV distribution network that can adapt to network variations without significant changes to the neural network architecture or extensive retraining.

Figure 1. DNN-based surrogate model of the LV distribution network.

Figure 2. Diagrammatic representation of the proposed real-time HC assessment framework.

Figure 3. Single-line diagram of the modeled LV distribution network.

Figure 4. Voltage deviation of the LV network customer voltages evaluated by the surrogate model.

Figure 5. Learning curve of the model-free and model-based SAC agents.

Data Set 3 was used to run two high-resolution QSTS simulations and to analyze the real-time HC of the LV distribution network with each SAC agent. The trained SAC agents estimate the real-time HC for each customer within milliseconds, using only customer voltages from smart meter measurements and live solar irradiation (GHI) data.
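The per-step evaluation described above can be sketched as a simple loop, assuming each trained agent is wrapped as a callable that takes the customer voltages and GHI for one time step and returns per-customer HC estimates. The `Policy` interface and `run_realtime_hc` are hypothetical names. The loop also records the mean HC across customers (the quantity plotted in Figure 6) and the wall-clock time of each call, which is one way to check millisecond-scale inference on given hardware.

```python
import time
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical agent interface: (customer voltages in p.u., GHI) -> HC per customer in kVA.
Policy = Callable[[Sequence[float], float], List[float]]


def run_realtime_hc(
    policy: Policy,
    voltage_series: Sequence[Sequence[float]],
    ghi_series: Sequence[float],
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Query the agent once per time step and record HC and inference latency.

    Returns the per-customer HC trace, the mean HC across customers at each
    step, and the wall-clock time (seconds) spent in each policy call.
    """
    hc_trace: List[List[float]] = []
    avg_hc: List[float] = []
    latency_s: List[float] = []
    for voltages, ghi in zip(voltage_series, ghi_series):
        start = time.perf_counter()
        hc = policy(voltages, ghi)  # one inference per time step
        latency_s.append(time.perf_counter() - start)
        hc_trace.append(hc)
        avg_hc.append(mean(hc))
    return hc_trace, avg_hc, latency_s
```

A real run would iterate over the time steps of Data Set 3 and pass in the trained model-free or model-based SAC actor; comparing the two resulting `avg_hc` series gives the kind of comparison shown in Figure 6.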

Figure 6. Average real-time HC across customers in the LV distribution network.

Table 1. LV network electrical characteristics and constraints.

Table 2. Data sets used in the numerical study.