1. Introduction
Over the last decade, the deployment of behind-the-metre (BTM) rooftop solar photovoltaics (PVs) in electricity distribution networks (DNs) has grown significantly across the world [
1]. Australia in particular has advanced remarkably, with a total installed rooftop solar PV capacity of 24.4 GW by the midpoint of 2024, making rooftop solar the second-largest source of renewable energy generation, just behind wind energy, and contributing 11.3% of total Australian electricity generation in the first half of the same year [
2]. New South Wales has maintained the record for highest annual rooftop solar PV installed capacity of any Australian State with 454 MW of new installations in the first half of 2024. Average system size has also grown notably to 9.9 kW in June 2024, from an average of 7.4 kW five years ago, and 4.3 kW a decade ago [
2].
This increased integration of rooftop solar PVs into DNs results in a range of technical complications including violation of voltage limits, curtailment of existing solar PVs, maloperation of protection systems, and issues with power quality, among which voltage limit violations are considered the most prominent [
3]. Thus, voltage regulation is of paramount importance for addressing voltage-related complications and ensuring uninterrupted operation of modern power systems [
4,
5]. Consequently, compliance with established voltage regulation standards is essential to maintain secure and reliable operation of electricity systems as rooftop solar PV penetration continues to increase in DNs. Example low voltage (LV) regulation standards currently in place internationally are provided in
Table 1 [
6].
While this study is applicable to any LV standard maintained in the world, the context of Australia is considered as the example case. Nominal system voltage, the appropriate estimated voltage value used to designate or identify an Australian LV DN, is 230 V. Australian Standard
AS 61000.3.100 [
7] presents statistical limits for nominal system voltage with 1st percentile ($V_{1\%}$) of −6% and 99th percentile ($V_{99\%}$) of +10%, where voltage percentile refers to the voltage value below which $x$% of measurements fall over the period of a survey. Accordingly, the steady state voltage limits to be maintained at the connection point of a customer are 216 V (230 V − 6%) minimum and 253 V (230 V + 10%) maximum. As per Australian Standard
AS IEC 60038 [
8], it should be maintained within ±10% of nominal system voltage corresponding to a minimum of 207 V (230 V − 10%) and a maximum of 253 V (230 V + 10%) under normal operating conditions.
AS IEC 60038 [
8] also defines the Australian utilisation voltage range, the voltage range to be maintained at outlets or equipment terminals. Utilisation voltage may rise due to increased supply voltage or internal voltage rises caused by distributed generation such as rooftop solar PVs, whereas it may fall due to decreased supply voltage or excess demand from customers. It is mandatory for the utilisation voltage to be maintained within the stipulated range at all times to ensure safe operation of all equipment. Australian Standard
AS 4777.2 [
9] states that all types of inverters, including rooftop solar PV inverters, installed in the DNs need to assist in maintaining voltage limits compatible with
AS 61000.3.100 [
7] through curtailment of generation via Volt-Watt and Volt-VAr modes. Moreover, to ensure uninterrupted inverter services, Australian Standard
AS 4777.1 [
10] requires suitably designed wiring of solar PV installations in order to maintain the overall voltage rise from the network point of supply to the inverter terminals to below 2% of nominal voltage.
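As a quick check on the figures quoted above, the following minimal sketch recomputes the limits from the 230 V nominal voltage and evaluates the 1st and 99th percentiles of a voltage series. The synthetic voltage samples and all variable names are illustrative assumptions, not data or code from this study.

```python
import numpy as np

V_NOMINAL = 230.0  # nominal LV system voltage in volts (from the text)


def limit(nominal, percent):
    """Return the voltage at a given percentage offset from nominal."""
    return nominal * (1.0 + percent / 100.0)


v_min_61000 = limit(V_NOMINAL, -6.0)    # 216.2 V, quoted as 216 V in the text
v_max = limit(V_NOMINAL, 10.0)          # 253 V
v_min_60038 = limit(V_NOMINAL, -10.0)   # 207 V
max_rise_4777 = V_NOMINAL * 2.0 / 100   # 2% voltage rise allowance = 4.6 V

# Hypothetical voltage survey: synthetic samples, for illustration only.
rng = np.random.default_rng(0)
survey = rng.normal(loc=240.0, scale=4.0, size=2016)

# Percentile voltages: the values below which 1% and 99% of samples fall.
v1, v99 = np.percentile(survey, [1, 99])
compliant = (v1 >= v_min_61000) and (v99 <= v_max)
print(f"V1% = {v1:.1f} V, V99% = {v99:.1f} V, compliant = {compliant}")
```

Checking percentiles rather than the absolute minimum and maximum mirrors the statistical form of the limits: a small share of samples may fall outside the band without the survey failing.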
To ensure that these standards are effectively followed, it is important for the distribution network service providers (DNSPs) to have efficient, reliable, and accurate ways of estimating the voltage variations in DNs that will happen due to increased integration of rooftop solar PV systems. Having sound knowledge about these voltage variations will help the DNSPs to understand rooftop solar photovoltaic hosting capacity (PVHC) of their DNs. PVHC is defined as the maximum quantity of rooftop solar PVs that can be installed in a given electricity network without imposing any changes to existing infrastructure and without violating any network performance limits [
11]. Understanding PVHC will enable the DNSPs to make informed decisions regarding solar PV installation requests and provide them with the confidence to allow safe integration of more rooftop solar PVs to their DNs. In the future, this will also facilitate dynamic control of installed PV systems by the DNSPs, ensuring optimal utilisation of available PVHC within the DNs.
The most common voltage estimation approach adopted by many DNSPs is simulation of network models using power system analysis software [
12]. This requires accurate and up-to-date network models, which are often not readily available to DNSPs. Even if they are available, the model-based voltage estimation methods involve complex computations leading to time-intensive simulations [
13]. Therefore, finding a more efficient way of identifying the operational impacts caused by changes in grid-connected BTM rooftop solar PV installations has become a serious challenge for many DNSPs. There is growing interest within the research community in smart metre data-driven, model-free methods of voltage estimation as a promising alternative to conventional model-based approaches. For this, it is necessary to determine suitable ways of extracting voltage sensitivities to power injections from available smart metre data [
14], which can either be undertaken by incorporating linearised power flow approximations [
11,
14], or non-linear power flow dynamics [
12,
13].
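The linearised route can be motivated with the standard two-bus voltage drop approximation. It is included here as a textbook illustration, not as the formulation used in the cited works. For a line with resistance $R$ and reactance $X$ delivering active power $P$ and reactive power $Q$ to a load at voltage $V$, and neglecting the quadrature component of the drop:

```latex
\Delta V \approx \frac{R\,P + X\,Q}{V}
```

With $V$ held near its nominal value, $\Delta V$ is linear in $P$ and $Q$, so the coefficients $R/V$ and $X/V$ act as voltage sensitivities that a regression could estimate from smart metre data. The non-linear route instead keeps the dependence on $V$ itself.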
The objective of this paper is to compare the independent application of linearised power flow approximations and non-linear power flow relationships in the development of model-free methods to be effectively employed in place of model-based methods for estimating phase voltages at user connection points in LV DNs. To achieve this objective, this study proposes two distinct model-free voltage estimation approaches. One approach is based on linearised power flow approximations and utilises linear regression (LR), while the other approach is focused on capturing non-linear power flow dynamics and employs neural networks (NNs).
LR is a supervised machine learning (ML) technique that makes predictions using a linear relationship between dependent and independent variables. LR provides simple mathematical interpretations of relatively complex relationships and, where those relationships are close to linear, can yield highly accurate predictions [
15]. Linear approximations of power flow equations are commonly used to simplify the complexity and computational demands of non-linear power flow analysis [
16]. Similar linearisation approaches are also applied in other complex energy system problems, such as electricity–water nexus dispatch, to improve computational tractability [
17]. However, the accuracy of this linearisation is crucial to ensure the quality of proposed solutions. Some recent studies employing LR in power engineering applications are presented in
Table 2.
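To make the idea of regression-based voltage estimation concrete, the following minimal sketch fits a linear model of voltage against active and reactive power at a single connection point using ordinary least squares. The data are synthetic, and the coefficients used to generate them are arbitrary assumptions; this is not the model structure developed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic net power at one user phase (negative P means export).
p_kw = rng.uniform(-5.0, 5.0, n)
q_kvar = rng.uniform(-1.0, 1.0, n)

# Hypothetical "true" relationship used only to generate example voltages.
v0, kp, kq = 240.0, -1.2, -0.4
v = v0 + kp * p_kw + kq * q_kvar + rng.normal(0.0, 0.05, n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), p_kw, q_kvar])
coef, *_ = np.linalg.lstsq(X, v, rcond=None)
intercept, dv_dp, dv_dq = coef

# Mean absolute error of the fitted model on the same data.
v_hat = X @ coef
mae = np.mean(np.abs(v - v_hat))
print(f"dV/dP = {dv_dp:.3f} V/kW, dV/dQ = {dv_dq:.3f} V/kvar, MAE = {mae:.3f} V")
```

The fitted slopes can be read directly as voltage sensitivities to power injections, which is the interpretability advantage usually attributed to LR.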
NNs are a foundational component of modern artificial intelligence, designed to simulate the way the human brain processes information. They consist of layers of interconnected nodes (or neurons) that transform input data through weighted connections and activation functions. NNs are particularly effective in capturing complex, non-linear relationships, making them suitable for a wide range of applications including image classification, speech recognition, natural language processing, and time series forecasting [
28]. Some recent studies employing NNs in power engineering applications are provided in
Table 3.
A comparative analysis of model-free voltage estimation techniques independently employing LR and NNs against model-based simulations conducted using OpenDSS is presented in [
38]. The study focused on estimating the voltage at the customer located furthest from the transformer. These model-free approaches were developed using only historical active power (
P) and voltage (
V) measurements obtained from smart metres, whereas the model-based method also incorporated reactive power (
Q) data. The LR and NN models were trained with 10% PV and EV penetration and evaluated with the same penetration level, as well as with increased PV and EV penetration. The evaluation was conducted using data from 127 real LV feeders, with smart metre datasets generated based on statistical data from the United Kingdom. The results indicated that LR demonstrated greater generalisability than NNs under high PV and EV penetration scenarios. However, the maximum mean absolute errors across all feeders for the LR and NN methods were 6.23 V and 77.37 V respectively, substantially higher than those achieved by the methods proposed in this paper, as detailed in
Section 7.
The study in [
39] compares model-free voltage estimation methods independently built using LR and NNs against model-based simulations conducted in OpenDSS. Both scenarios of within (in-domain) and beyond (out-of-domain) historical data ranges were evaluated using synthetic data from an Australian LV DN with 31 single-phase customers and 25% PV penetration. LR and NN models were built with various input configurations, including customer
P,
Q, and
V, aggregated power, and transformer secondary side voltage or proxies, and the voltage analysis was performed at a single customer. Results showed that LR outperformed NNs, particularly in out-of-domain scenarios that had more exports or imports. For instance, under high exports, when the active power range at a customer was extended from [−4.5, 5.3] kW to [−8, 8] kW, the best-performing LR model resulted in a mismatch of 0.82 V, while the best-performing NN model yielded a mismatch of 3.69 V. However, the proposed method in this paper demonstrates superior accuracy, as detailed in
Section 7.
This paper presents a detailed comparative analysis of two model-free and one model-based method of voltage estimation through an extensive evaluation conducted on a real Australian LV DN. The LR and NN models developed using
P and
Q data from smart metres at real locations of the case study DN estimated the voltages at all users with smart metres incorporated in the LR and NN models, distinguishing this study from the work in [
38,
39]. The comparative model-based simulations in this study are undertaken in Pandapower V3.3.0 (PP), an open-source Python tool for electrical power system modelling, analysis, and optimisation [
40]. The study examines how well LR and NN models perform relative to PP power flow simulations by analysing the performance of 16 distinct LR models and 648 distinct NN models in estimating phase voltages at users of the case study DN. By training the LR and NN models under existing DN conditions with limited rooftop solar PV penetration and testing them under modified DN conditions with full PV penetration, the study explores whether the proposed model-free approaches can accurately capture the voltage variations caused by new PV installations in the case study DN. The uniqueness of this study lies in its comprehensive evaluation framework, which assesses each method based on performance accuracy, computational efficiency (in terms of processing times), data requirements for model development (as applicable to the developed models), and practicality and interpretability. The strong performance demonstrated by the proposed LR- and NN-based voltage estimation methods, combined with their high efficiency compared to the PP simulations, emphasises the practical relevance and contribution of this research.
This paper is structured as follows. The DN utilised as a case study for the work undertaken in this paper is introduced in
Section 2. Pre-processing of data, selection of swing bus voltage and assessment of PP model are explained in
Section 3. Formation of load profiles for LR and NN models is discussed in
Section 4. The proposed methodologies for LR model development and NN model development are presented in
Section 5 and
Section 6 respectively. The results of the work performed are discussed in
Section 7, and the conclusion is given in
Section 8.
2. Case Study Distribution Network
The case study analysis in this paper was performed on a model of a real underground LV DN located in an urban area of New South Wales, Australia, shown in
Figure 1. The DN consisted of one distribution transformer and two radial LV feeders. The DN had a total of 28 users (customers) distributed along two LV feeders, with
user01 through
user14 on
Feeder1, and
user15 through
user28 on
Feeder2. Here, 21 users possessed smart power quality enabled electricity metres while seven users possessed electromechanical electricity metres, accounting for a 75% smart metre penetration. Throughout this paper, the smart power quality enabled electricity metres are referred to as
smart metres.
Traditional electromechanical electricity metres only record electricity consumption in kWh units, which are manually read through visits taken by metre readers. Smart metres record fine-grained measurements on electricity consumption including but not limited to active power, reactive power, current and voltage magnitude in near real-time [
41]. This smart metre data is directly transmitted to the metering service providers via wireless communication networks so that it can be read remotely rather than through manual visits. However, this high-dimensional and massive smart metre dataset comes with its own challenges, such as bad or null measurements, costly data communication and storage, and data privacy and security issues [
42]. Nevertheless, this study, along with many other existing studies, demonstrates that smart metre data can be effectively leveraged to develop reliable solutions for numerous power engineering problems, thereby supporting the global trend of increased smart metre deployment.
The global smart metre penetration reached 43% at the end of 2023 with 77% penetration in North America, 49% penetration in Asia-Pacific region, and 47% penetration in Europe, and is forecasted to reach 54% by 2030 [
43]. When considering the Australian context, the Australian Energy Market Commission has recommended a target of 100% penetration of smart metres by 2030 in National Electricity Market jurisdictions [
44]. By 2023, the states and territories of Queensland, New South Wales, Australian Capital Territory and South Australia had an average of 30% smart metre penetration, while Victoria had already achieved near 100% smart metre penetration. Tasmania, on the other hand, has put in place an acceleration programme with a target of 100% smart metre deployment by 2026. Thus, improved inclusion of smart metre data in Australian power engineering research and applications can be expected in the near future.
In this case study DN, the 21 users with smart metres included 14 single-phase users and seven three-phase users, corresponding to a total of 35 smart metre channels referred to as
user phases in this paper. This study was performed on these 35 user phases. Further, for this study, smart metre data was available across 11 months spanning from November 2022 to October 2023 at intervals of 5 min. Data at a few instances within this timeframe was missing and was handled as explained in
Section 3.
4. Load Profiles for Linear Regression and Neural Network Models
The case study DN originally had a residential rooftop solar PV penetration of 43%. LR and NN model training was conducted considering the original DN conditions. To test the proposed model-free voltage estimation approaches on a DN with full PV penetration of 100%, LR and NN model testing was undertaken on the case study DN after modifying it to have 100% residential rooftop solar PV penetration. This modification was accomplished by assigning PV installations to those user phases which did not originally have PV installations, and the process is discussed in detail in the latter part of this section. Accordingly, in this study, two datasets named ‘
Partial PV penetration’ and ‘
Full PV penetration’ were created. The ‘
Partial PV penetration’ dataset was formed considering the original case study DN with 43% of PV penetration using
P,
Q load profiles at the user phases from smart metre measurements and
V profiles at the user phases from PP simulations. The ‘
Full PV penetration’ dataset was formed considering the modified version of the case study DN with 100% of PV penetration using
P,
Q load profiles at the user phases from smart metre measurements together with solar profiles obtained from
pvlib python [
46] as needed to meet 100% PV penetration of the DN and
V profiles at the user phases from PP simulations.
The proposed model-free approaches were trained on data corresponding to limited PV penetration (43%) and subsequently tested under modified DN conditions representing increased PV penetration (100%). This enabled the study to evaluate whether the developed LR and NN models can adapt to more demanding DN conditions and capture voltage variations resulting from increased PV integration, which is particularly relevant for DNSPs when assessing future network planning. In practice, a real DN operates at one PV penetration level during a given time period, and large changes in penetration (e.g., from 43% to 100%) cannot occur within a short time window such as the one-month period considered in this study. Therefore, when analysing the impact of future PV integration scenarios, it is necessary to synthesise modified DN conditions. Hence, in this study, the PP model was used as a consistent benchmark against which the proposed model-free methods are trained as well as tested. Moreover, when developing the modified DN conditions with full PV penetration, pvlib python was utilised to create the synthesised PV generation profiles of the newly added PV systems.
As stated in
Section 2, for this study, 11 months of smart metre measurements were available. For the development of LR and NN models, one month of data was considered. Data from the ‘
Partial PV penetration’ dataset corresponding to the first 3 weeks of the one-month time period was taken for LR and NN model training, while data from the ‘
Full PV penetration’ dataset corresponding to the rest of the one-month time period (referred to as
last week) was taken for LR and NN model testing. This train-test split allowed the LR and NN models to learn from continuous variations in user load and solar generation. Here, one month of data was considered to analyse how accurately the developed LR and NN models can perform when trained over a considerably short period of time, e.g., 3 weeks.
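The chronological split described above can be sketched as follows. This is a minimal illustration assuming a hypothetical 28-day month at the 5 min resolution of the smart metre data; the start date, column names and random values are placeholders, and the 'full' frame simply copies the 'partial' one where the study would use the modified profiles.

```python
import numpy as np
import pandas as pd

STEPS_PER_DAY = 24 * 60 // 5  # 288 samples per day at 5 min intervals

# Hypothetical month: 28 days starting on an arbitrary, assumed date.
index = pd.date_range("2023-02-01", periods=28 * STEPS_PER_DAY, freq="5min")

rng = np.random.default_rng(1)
partial = pd.DataFrame(
    {"P": rng.normal(size=len(index)), "Q": rng.normal(size=len(index))},
    index=index,
)
# Placeholder for the 'Full PV penetration' dataset (same shape and index).
full = partial.copy()

# First 3 weeks of the partial dataset for training,
# remainder of the month from the full dataset for testing.
split = index[0] + pd.Timedelta(weeks=3)
train = partial.loc[partial.index < split]
test = full.loc[full.index >= split]

print(len(train), len(test))
```

Splitting by time rather than by random sampling keeps the test period strictly after the training period, so no test-week samples are seen during training.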
The dependent and independent variables utilised for LR and NN model development are briefly introduced in
Table 6, where the derivation and application of these variables are discussed in detail below.
The actual
P,
Q load profiles of all user phases for the selected month were obtained after pre-processing the smart metre measurements as outlined in
Section 3 and included in the ‘
Partial PV penetration’ dataset representing the original load scenario of the DN with 43% of residential solar penetration. The new
P,
Q load profiles were formed by assigning solar PV installations to the user phases with no original solar PV installations and included in the ‘
Full PV penetration’ dataset representing the modified load scenario of the DN with 100% of residential solar penetration.
As mentioned earlier, this study explored how LR and NN models can accurately capture the voltage variations caused by new PV installations. The most extreme configuration of a DN would be when all users have solar PV installed across all of their phases (i.e., 100% solar PV penetration). As such, the new P, Q load profiles were created to mimic a 100% solar PV penetration scenario for the case study DN. For this, it was required to assign new solar PV systems to the user phases originally with no solar PV installations. Solar PV penetration of 100% may not be compatible with the actual PV hosting capacity of the DN (depending on the ratings of installed solar panels); however, that practicality was not relevant in characterising the proposed LR and NN models. The objective of this paper was neither to estimate the PVHC of the DN, nor to establish the potential curtailment of local solar PV resources, but these will be undertaken in future studies.
To form the new
P,
Q load profiles, firstly, it was required to identify the user phases originally having installed solar PV systems. This was undertaken by following the solar PV identification approach presented in authors’ previous work [
47], based on the facts that BTM generation usually results in net negative
P measurements (assuming that the full generation is not locally consumed all the time) and that rooftop solar PV systems exhibit a unique pattern in their generation profiles, distinguishing them from other BTM generators such as small wind and BESSs. The utilised algorithm correctly identified all the user phases with originally installed solar PV systems when compared to the information available about the actual DN configuration.
Next, the sizes of originally installed solar PV systems were estimated by applying the PV size estimation approach developed in authors’ previous work [
47]. The solar PV system sizes (PV panel ratings) estimated by the utilised approach are compared against the PV panel ratings as detected by network explorer software utilised by the DNSP in
Table 7.
As can be seen, the solar PV size estimates closely matched those detected by the network explorer software utilised by the DNSP, except for user14. The maximum negative P measured by the smart metre at user14 was 8755 W, leading to a size estimate of 8.8 kW, which differs significantly from the 7 kW PV panel rating detected by the network explorer software. It is important to note that the utilised algorithm depended entirely on the maximum negative P measured by the smart metres, so any fault in the smart metres could directly affect the accuracy of the size estimation. In addition, the reference ratings detected by the network explorer software could contain their own errors. Nevertheless, the estimated solar PV sizes were utilised for the rest of this study.
Each user listed in
Table 7 originally had solar PV installed on one phase, except for
user14 who had solar PV originally installed on all three phases. Therefore, 15 out of 35 user phases, corresponding to 43% of total user phases, in the case study DN had original solar PV installations.
Then, the sizes of newly installed solar PV systems were to be identified. These were determined in a way such that the existing solar size distribution across user phases of the original DN was maintained in the modified DN as well. Accordingly, the exact per-phase PV system sizes existing in the original DN (2.6 kW, 3.2 kW, 4.4 kW, 5 kW, 6 kW and 8.2 kW) were considered when assigning the new solar PV systems.
When phase-wise solar PV sizes for
user14 were found, the total PV size became 9 kW (as opposed to 8.8 kW in
Table 7). This was because the maximum negative
P values on each phase of
user14 were 2591 W, 3163 W and 3001 W corresponding to solar PV sizes of 2.6 kW, 3.2 kW and 3.2 kW respectively. In a typical three-phase solar PV setup using a three-phase inverter, generation is evenly distributed across all three phases. However, the actual amount of power injected into the grid from each phase depends on how local consumption is balanced across them. Since this study analyses each user phase independently, solar PV installations were assigned separately to each identified phase. Assigning the same PV size to all three phases could result in unrealistically large systems for users who already have existing installations on one phase. For example,
user22 originally had a 6 kW system on phase
b. Assigning 6 kW to each phase would lead to a total of 18 kW, which is uncommon in practice. Therefore, when assigning new PV systems to three-phase users, the combined capacity across all phases was kept realistic, even though this led to different PV sizes across the phases of a three-phase user. As mentioned earlier, this was not an issue for this study, since each user phase was treated individually.
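The difference between the 8.8 kW whole-user estimate and the 9 kW phase-wise total for user14 can be reproduced with a short calculation. The sketch below assumes that each estimate is the maximum export rounded up to the next 0.2 kW step. That rounding rule is an inference that happens to match the values quoted in the text; it is not stated in the source, and the function name is hypothetical.

```python
import math

STEP_W = 200  # assumed rounding step in watts (not stated in the source)


def estimate_pv_size_kw(max_export_w, step_w=STEP_W):
    """Round the peak export up to the next step and return kW."""
    return math.ceil(max_export_w / step_w) * step_w / 1000.0


# Per-phase maximum negative P at user14, as quoted in the text (W).
phase_peaks_w = (2591, 3163, 3001)

# The per-phase peaks sum to the 8755 W quoted for the whole user.
total_peak_w = sum(phase_peaks_w)

total_estimate = estimate_pv_size_kw(total_peak_w)
phase_estimates = [estimate_pv_size_kw(w) for w in phase_peaks_w]
phase_total = round(sum(phase_estimates), 1)

print(total_estimate, phase_estimates, phase_total)
```

Under this assumption, rounding each phase separately rounds up three times instead of once, which is why the phase-wise total exceeds the whole-user estimate.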
The distribution of solar PV system sizes utilised in the creation of the ‘
Partial PV penetration’ and ‘
Full PV penetration’ datasets is given in
Table 8, and the newly added solar PV systems are coloured in respective phase colours (i.e., red, green, blue).
Then, pvlib python was utilised to obtain the solar irradiance information over the real location of the case study DN and to simulate the performance of solar PV systems. Here, the generation profile of a solar module at the real location of each user was obtained using pvlib python over the last week of the selected month, taking ‘Australia/NSW’ as the time zone and 245 m as the altitude, which combines a generalised altitude of the DN location (242 m) with the average height of a single-storey house (3 m).
The parameters and other criteria configured in
pvlib python for solar PV system characterisation are given in
Table 9.
Here, solar modules of size 220 W with individual micro inverters were considered. The solar panels were placed on rooftops with a surface azimuth of 0° (North direction). The surface tilt of the solar panels was kept at 22° from the horizontal. Rooftop solar PV system output depends on the amount of solar irradiance received by the surface of the solar panels, which is influenced by many factors such as solar PV system location, solar panel surface azimuth and surface tilt. Different buildings belonging to the same DN will have rooftop solar panels installed facing different directions with different surface tilts depending on their location and roof design. To get the exact solar panel configuration of each building in a DN, it is possible to observe every rooftop via a tool such as Google Maps. However, that was beyond the scope of this research. As this study was focused on investigating how well the proposed model-free voltage estimation methods can mimic the model-based method, the selection of solar panel parameters did not significantly affect the results, since the same solar panel parameters were used with all methods.
The ultimate application of this study is to estimate the rooftop solar PVHC of a DN. For that, a generic constant value for the surface azimuth and surface tilt of all solar panels is adequate. However, the consequent result will depend on the selection of the solar panel parameters. For the solar panel settings above, 16 February 2023 was found to be the day with the highest instantaneous solar generation over the 11-month timespan considered for the study. This is a summer day on which the sun is high overhead around midday. Thus, a solar panel with a slightly lower surface tilt will receive more insolation throughout the day and generate more, which will reduce the maximum amount of rooftop solar PV that can be installed in the DN without compromising any network performance limits; in other words, it will reduce the PVHC of the DN.
After obtaining the solar generation profiles of 220 W solar modules at real locations of users with the help of
pvlib python, they were multiplied by the respective number of solar modules to get the solar generation profiles corresponding to the sizes of newly added solar PV systems at each user phase, which are coloured in
Table 8. Finally, to create the new
P,
Q load profiles, these solar generation profiles were added on top of the respective actual
P,
Q load profiles, over the last week of the selected month, of the user phases with newly assigned solar PV systems. Here, the load profiles of user phases that originally had solar PV systems were not changed and thus, their new
P,
Q load profiles were the same as their actual
P,
Q load profiles over the last week of the selected month.
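The scaling and superposition steps described above can be sketched as follows. The per-module generation values and load values are made-up placeholders, export is taken as negative P in line with the smart metre convention mentioned earlier, and leaving Q unchanged (unity power factor inverters) is an assumption that the text does not confirm.

```python
import numpy as np

MODULE_W = 220.0  # module rating from the text

# Hypothetical per-module generation profile (W) over a few time steps.
module_gen_w = np.array([0.0, 55.0, 150.0, 198.0, 120.0, 0.0])

# Hypothetical actual net load at a user phase without original PV.
p_load_w = np.array([300.0, 450.0, 500.0, 400.0, 600.0, 350.0])
q_load_var = np.array([50.0, 60.0, 55.0, 45.0, 70.0, 40.0])

# Newly assigned system size for this phase, e.g. 4.4 kW -> 20 modules.
pv_size_w = 4400.0
n_modules = pv_size_w / MODULE_W

# Scale the single-module profile up to the assigned system size.
pv_gen_w = n_modules * module_gen_w

# Superimpose generation on the load: generation offsets consumption,
# so net P falls and becomes negative when the phase exports.
p_new_w = p_load_w - pv_gen_w

# Assumption: inverters operate at unity power factor, so Q is unchanged.
q_new_var = q_load_var.copy()

print(n_modules, p_new_w)
```

Phases that already had PV would skip this step entirely, so their new profiles equal their measured ones.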
The actual P and Q load profiles from smart metre measurements at all user phases (included in the ‘Partial PV penetration’ dataset) over the first 3 weeks of the selected month formed the training active power and reactive power data. These were fed to the PP model, and the resultant user phase voltages from PP simulations formed the training voltage data. These training active power, reactive power and voltage data were utilised for LR and NN model training. It should be remembered that 15 out of 35 user phases, accounting for 43% of total user phases in the case study DN, originally had solar PV installed. It was important to ensure that the LR and NN models were capable of capturing voltage variation behaviours induced by solar PV generation during the training phase, despite being tested under extrapolated conditions. Thus, the LR and NN models were supplied with sufficient and representative data to effectively learn the key underlying relationships within the DN. The new P and Q load profiles at all user phases (included in the ‘Full PV penetration’ dataset) over the last week of the selected month, obtained by increasing the solar PV installations across the case study DN from the original 43% to 100%, formed the testing active power and reactive power data. These datasets were formed in this manner to examine how accurately the proposed LR and NN models can predict on unseen data including new solar PV installations. They were fed to the PP model, and the resultant user phase voltages from PP simulations formed the testing voltage dataset. These testing active power, reactive power and voltage datasets were utilised for LR and NN model testing.
6. Neural Network Models
An NN begins with an input layer, which receives features from the dataset, and ends with an output layer that produces predictions tailored to a specific task such as class labels in classification or continuous values in regression. Between these layers lie the hidden layers, each composed of a configurable number of artificial neurons that enable the NN to learn complex patterns [
50]. The output of an artificial neuron is determined by applying an activation function to the weighted sum of its inputs offset by a bias parameter, as illustrated in
Figure 4 and given in Equation (2).
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \qquad (2)$$
where
$n$: number of inputs to the artificial neuron
$x_i$: $i$th input of the artificial neuron
$w_i$: $i$th weight of the artificial neuron
$b$: bias of the artificial neuron
$f$: activation function of the artificial neuron
$y$: output of the artificial neuron
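The computation performed by a single artificial neuron, an activation function applied to the weighted sum of the inputs plus a bias, can be sketched in a few lines. The input, weight and bias values below are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return np.maximum(z, 0.0)


def neuron_output(inputs, weights, bias, activation=np.tanh):
    """Apply the activation to the weighted sum of inputs plus bias."""
    z = np.dot(weights, inputs) + bias
    return activation(z)


# Arbitrary example values for a neuron with three inputs.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

# Weighted sum: 0.2 - 0.3 - 0.4 + 0.1 = -0.4
y_tanh = neuron_output(x, w, b, activation=np.tanh)
y_relu = neuron_output(x, w, b, activation=relu)
print(y_tanh, y_relu)
```

The same pre-activation value gives different outputs under different activation functions, which is why the activation choice is treated as a hyperparameter.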
The depth (number of hidden layers) and width (number of neurons per layer) of an NN significantly influence its capacity to capture relationships within the dataset, with deeper or wider architectures generally being more powerful yet computationally demanding [
50]. The performance and generalisation ability of NN models, however, depend not only on the architecture but also on a set of configurable parameters known as hyperparameters. These are not learnt from data during training but must be specified or tuned beforehand. Key hyperparameters include the number of hidden layers, the number of neurons per layer, activation functions, learning rate, optimiser choice, batch size, epochs, regularisation methods, and input data scaling techniques among others. Together, these hyperparameters form the backbone of NN design and training, and their careful configuration is critical for achieving high performance and robust generalisation [
50].
To introduce non-linearity and enhance learning capacity of an NN model, activation functions such as
ReLU,
Tanh,
Swish or
Sigmoid are applied in hidden layers, while the output layer typically uses a
Softmax function for classification or a
Linear function for regression. NN learning is guided by an error function (also known as a loss function), which quantifies the difference between predicted and actual outputs. Common error functions include categorical cross-entropy for classification and mean squared error (MSE) for regression. NN weights are updated during training by an optimiser, such as stochastic gradient descent or Adam, which relies on gradients derived from the error function. Another critical hyperparameter in NNs is the learning rate, which controls the magnitude of weight adjustments. If the learning rate is too high, training may become unstable; if it is too low, learning may become slow or stagnant [50]. NN training is typically conducted over multiple epochs, with each epoch representing a full pass through the training dataset. Data is typically divided into smaller subsets called batches, whose size (the batch size) affects computational efficiency as well as the quality of gradient estimates. Data normalisation using methods such as MinMaxScaler ensures balanced feature influence, improves gradient flow, and leads to faster, more stable training with better convergence and accuracy. Moreover, regularisation techniques such as L1 or L2 penalties or dropout are employed to reduce overfitting by constraining NN model complexity [50]. How these hyperparameters were selected and tuned for this study is discussed in the following section.
Hyperparameter Selection
NN parameters such as weights and biases are learnt and optimised from data during the NN training process, whereas NN hyperparameters are configured beforehand and not learnt from data. Even though NNs are powerful, they are prone to memorising the training data rather than learning generalisable patterns if they are not carefully designed. Thus, it is necessary to carefully select and tune hyperparameters to achieve a desired level of NN performance. The hyperparameters for this study were selected with close reference to the related literature [13]. In this study, several fixed and varying hyperparameters were incorporated in NN modelling and their impact on NN performance was investigated. The
P and
Q load profiles at user phases formed the NN inputs, while the
V profiles at user phases formed the NN outputs. Two conditions ‘
Including Vtx’ and ‘
Excluding Vtx’ respectively corresponding to the inclusion and exclusion of
Vtx in NN input space were also considered in this study. A summary of the fixed and varying hyperparameters investigated in this study is provided in
Table 11.
NN models in this study considered two sets of inputs corresponding to the two conditions ‘
Including Vtx’ and ‘
Excluding Vtx’. Under condition ‘
Including Vtx’, the NN input space consisted of 73 inputs including instantaneous
P,
Q data at the 35 user phases and instantaneous
Vtx on the three phases, while under condition ‘
Excluding Vtx’, the NN input space consisted of 70 inputs made of instantaneous
P,
Q data at the 35 user phases. The NN output space comprised 35 outputs specifying the instantaneous V estimations at the 35 user phases. The NNs were designed with only one hidden layer to limit complexity and increase computational efficiency; deeper architectures were unnecessary, as the desired accuracy could be achieved with a single hidden layer. The number of neurons in the hidden layer was varied among 210, 245 and 280, corresponding to six, seven and eight times the number of NN outputs. These numbers were chosen with due reference to [13], while ensuring that the hidden layer of the NN models was sufficiently wide to effectively capture the underlying relationships between inputs and outputs.
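The input, output and hidden-layer dimensions described above follow from simple arithmetic, which can be sketched as below. The function names are hypothetical, and the trainable-parameter count assumes fully connected layers with one bias per neuron, which is an assumption made here for illustration rather than a figure reported in this study.

```python
N_USER_PHASES = 35   # user phases in the case study DN (from the text)
N_TX_PHASES = 3      # transformer LV phases contributing Vtx (from the text)

def input_size(include_vtx):
    """P and Q at every user phase, plus Vtx per transformer phase if included."""
    return 2 * N_USER_PHASES + (N_TX_PHASES if include_vtx else 0)

def hidden_widths(n_outputs, multipliers=(6, 7, 8)):
    """Candidate hidden-layer widths as multiples of the number of outputs."""
    return [m * n_outputs for m in multipliers]

def trainable_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected NN with one hidden layer."""
    hidden_layer = n_in * n_hidden + n_hidden
    output_layer = n_hidden * n_out + n_out
    return hidden_layer + output_layer

sizes = {
    "Including Vtx": input_size(include_vtx=True),
    "Excluding Vtx": input_size(include_vtx=False),
}
widths = hidden_widths(N_USER_PHASES)
```

Calling `trainable_parameters(sizes["Including Vtx"], widths[0], N_USER_PHASES)` gives a feel for how quickly the number of weights grows with hidden-layer width, even for a single hidden layer.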
In line with [13], the three activation functions ReLU, Tanh and Swish were explored in the hidden layer and the Linear activation function was applied in the output layer to accomplish the regression task of the NN model. ReLU (Rectified Linear Unit) is simple, computationally efficient and commonly used in NNs: it outputs zero for negative inputs and the input itself for positive inputs. Tanh (Hyperbolic Tangent) outputs values between −1 and 1, which keeps activations centred around zero. However, Tanh saturates for inputs of large magnitude, which can slow down learning. Swish is a more recent activation function, whose curve dips slightly below zero for negative inputs. Swish is smoother and often performs better in deep NNs but comes at a slightly higher computational cost. The Linear activation function outputs the input directly without any transformation. It is often used in the output layer of regression models, where the goal is to predict continuous values. The NN models in this study employed MSE as the error function, since it is well suited to regression tasks where penalisation of large errors is vital. MSE quantifies the difference between predicted and actual outputs by calculating the average of the squares of the differences between them. Adam (Adaptive Moment Estimation), a widely adopted optimiser in NNs known for its efficiency and robustness, was selected as the optimiser of the NN models in this study. Adam promotes faster convergence and improved performance on complex and noisy datasets by adaptively adjusting the learning rate for each parameter depending on the gradients derived from the error function. To scale these adjustments, Adam relies on a base learning rate, which was manually set to a value among $10^{-3}$, $10^{-4}$ and $10^{-5}$ in this study. These values for learning rate were chosen in direct reference to [13], and NNs with a learning rate of $10^{-4}$ performed best under both conditions, as observable in Section 7.
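The activation functions and the MSE error function described above can be written out in a few lines of plain Python. This is a sketch for illustration only: Swish is assumed here in its common form z · sigmoid(z), and the function names are chosen for this example rather than taken from the study's code.

```python
import math

def relu(z):
    """Zero for negative inputs, the input itself for positive inputs."""
    return max(0.0, z)

def tanh(z):
    """Hyperbolic tangent, bounded between -1 and 1."""
    return math.tanh(z)

def swish(z):
    """Swish assumed as z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return z * exp_z / (1.0 + exp_z)

def linear(z):
    """Identity, as used in a regression output layer."""
    return z

def mse(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

Evaluating `swish(-1.0)` returns a small negative value, which is the slight dip below zero mentioned in the text, whereas `relu(-1.0)` returns exactly zero.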
In NN models, the number of epochs determines the duration of training, with too few potentially leading to underfitting and too many increasing the risk of overfitting. In this study, the number of epochs was capped at 1000, allowing for up to 1000 complete passes over the entire training dataset. With early stopping enabled, training could terminate earlier if the NN performance ceased to improve for a predefined number of consecutive epochs, referred to as patience. Early stopping is a regularisation technique that helps to optimise NN generalisation and efficiency by preventing overtraining and reducing unnecessary computational effort. Early stopping monitors a metric such as loss or accuracy and stops training if the metric does not improve within the patience [50]. In this study, early stopping terminated training if the loss did not improve (that is, did not decrease) for 25 consecutive epochs (monitor = ‘loss’, mode = ‘min’, patience = 25, with min_delta left at its default of 0). Batches were introduced to update NN weights more efficiently and effectively by dividing the data into manageable chunks, improving both training speed and learning stability. The batch sizes 24, 48 and 72, corresponding respectively to 2, 4 and 6 h of data at 5 min resolution, were considered. As evident from Section 7, under both conditions examined in this study, the NNs achieved their best performance with the smallest batch size considered, which was 24. However, the batch size was not lowered further, despite the potential for improved generalisation with smaller batch sizes, as doing so could decrease the training efficiency and the desired performance had already been attained.
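The patience-based stopping rule described above can be illustrated with a short, framework-independent sketch. The function `early_stopping_epoch` is a hypothetical helper written for this example; it mimics the rule "stop once the monitored loss has not decreased for 25 consecutive epochs" on a precomputed list of per-epoch losses and is not the training code used in this study.

```python
def early_stopping_epoch(losses, patience=25, min_delta=0.0, max_epochs=1000):
    """Return the number of epochs run before training stops.

    Training stops once the monitored loss has failed to decrease by more
    than min_delta for `patience` consecutive epochs, or at max_epochs.
    """
    best = float("inf")
    wait = 0
    epochs_run = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                break
    return epochs_run

# Illustrative loss curve: improves for three epochs, then stays flat
flat_after_three = [1.0, 0.5, 0.4] + [0.4] * 30
stopped_at = early_stopping_epoch(flat_after_three)
```

With the illustrative curve above, the last improvement occurs at epoch 3, so training halts 25 epochs later instead of running to the 1000-epoch cap.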
To ensure numerical stability and improve learning efficiency in NN models, the MinMaxScaler, a normalisation technique that transforms data to a fixed range, typically [0, 1] or [−1, 1], was employed. In this study, the range [0, 1] was used and the scaling was performed on both inputs and outputs, with the scalers fitted only on the training dataset to avoid data leakage. After prediction, the outputs were inverse transformed to the original scale so that the predictions were interpretable in the real-world context. The influence of the L2 regularisation technique (also known as Ridge) was evaluated in this study by implementing the NNs both with and without its application. Regularisation in NNs refers to techniques used to reduce overfitting, improve generalisation, and ensure that the NN performs well on unseen data. L2 regularisation adds the sum of squared weights, scaled by a regularisation factor, to the loss function and thereby encourages the NN to keep weights small, which can reduce overfitting. The L2 regularisation factor directly controls how strongly the NN penalises large weights and requires careful tuning: too small a factor results in almost no regularisation and a potentially overfitted NN, while too large a factor results in strong regularisation and a potentially underfitted NN. In this study, three L2 regularisation factors, $10^{-5}$, $10^{-6}$ and $10^{-7}$, were considered for the NN hidden layer with due reference to [13], and the NNs with an L2 regularisation factor of $10^{-7}$ performed best under both conditions investigated in this study, as observable in Section 7. The L2 regularisation factor was not lowered further, because doing so could weaken regularisation and result in an overfitted NN.
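The effect of the L2 regularisation factor on the training objective can be sketched as follows. The helper names and the example weights are assumptions made for illustration; the penalty is taken here as the factor multiplied by the sum of squared hidden-layer weights, added to the MSE.

```python
def l2_penalty(weights, factor):
    """L2 (Ridge) penalty: the factor times the sum of squared weights."""
    return factor * sum(w * w for w in weights)

def regularised_loss(mse_value, hidden_weights, factor):
    """Training objective: data-fit term (MSE) plus the L2 penalty."""
    return mse_value + l2_penalty(hidden_weights, factor)

# Hypothetical weights and MSE, used only to compare the three factors
example_weights = [0.8, -1.2, 0.3, 2.0]
example_mse = 1.0e-4
losses_by_factor = {
    factor: regularised_loss(example_mse, example_weights, factor)
    for factor in (1e-5, 1e-6, 1e-7)
}
```

For a fixed set of weights, the penalty shrinks in proportion to the factor, so the smallest factor leaves the objective closest to the plain MSE and applies the weakest regularisation.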
Unlike the LR models, which were evaluated separately across four different months, the NN models in this study were assessed only for May 2023. This approach was justified by findings from the LR analysis, which showed that model-free voltage estimation was not significantly influenced by solar insolation variations. Additionally, unlike LR models, NN models did not rely on linearised power flow approximations, so testing them on a single generic power consumption scenario was considered sufficient. As mentioned earlier, this study examined two conditions (‘Including Vtx’ and ‘Excluding Vtx’) with NNs derived from two distinct input sets. Under each condition, 324 hyperparameter combinations were evaluated. These combinations comprised fixed settings (number of outputs, number of hidden layers, output activation function, error function, optimiser, maximum number of epochs and normalisation) and varying settings: three counts of hidden layer neurons, three hidden layer activation functions, three learning rates, three batch sizes, and four L2 regularisation settings (disabled, or enabled with one of three regularisation factors), giving 3 × 3 × 3 × 3 × 4 = 324 combinations. Thus, overall this study investigated 648 (= 324 × 2) distinct NN models.
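The count of hyperparameter combinations can be checked by enumerating the grid directly. The sketch below uses only the varying hyperparameters listed in the text; the dictionary keys and the use of `None` to encode "L2 regularisation disabled" are choices made for this example.

```python
from itertools import product

# Varying hyperparameters as listed in the text
HIDDEN_NEURONS = (210, 245, 280)
HIDDEN_ACTIVATIONS = ("relu", "tanh", "swish")
LEARNING_RATES = (1e-3, 1e-4, 1e-5)
BATCH_SIZES = (24, 48, 72)
# None stands for "L2 regularisation disabled" (an encoding chosen here)
L2_FACTORS = (None, 1e-5, 1e-6, 1e-7)

def hyperparameter_grid():
    """Return every combination of the varying hyperparameters as a dict."""
    keys = ("neurons", "activation", "learning_rate", "batch_size", "l2_factor")
    values = (HIDDEN_NEURONS, HIDDEN_ACTIVATIONS, LEARNING_RATES,
              BATCH_SIZES, L2_FACTORS)
    return [dict(zip(keys, combo)) for combo in product(*values)]

grid = hyperparameter_grid()
conditions = ("Including Vtx", "Excluding Vtx")
total_models = len(grid) * len(conditions)
```

Each entry of `grid` could then be passed to a model-building routine once per condition, which is how the overall number of distinct NN models arises.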
To assess the performance of these NN models,
k-fold cross-validation was employed in this study.
K-fold cross-validation is a statistical technique commonly employed to evaluate the generalisation performance of NNs. Here, the original dataset is partitioned into
k equally sized subsets or folds. The NN model is then trained and validated
k times, each time using a different fold as the validation set while the remaining
k–1 folds are used for training. This process ensures that every data point is used for both training and validation, thereby reducing the bias associated with a single train-test split. The results from each iteration are aggregated, typically by averaging, to produce a more reliable estimate of the NN performance. In this study, three-fold cross-validation was independently applied to the training dataset of each NN model, and the MSE was averaged across the three iterations for each NN model. Then, under each condition, the ten hyperparameter combinations corresponding to the ten NN models (five with L2 regularisation enabled and five with L2 regularisation disabled) with the lowest averaged MSE from three-fold cross-validation were identified. After that, under each condition, ten new NN models were generated using the ten identified hyperparameter combinations. Here, the original training dataset of each condition was randomly split into training (80%) and validation (the remaining 20%) subsets, and the entire testing dataset was used for testing. Then, under each condition, the NN model with the lowest root mean squared error (RMSE) and lowest maximum absolute error (MaxAE) was identified as the best-performing NN model under the respective condition.
Figure 5 illustrates how the MSE loss improved during training and validation of these two NN models.
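The k-fold procedure described above can be sketched without any machine learning library. In the snippet below, the folds are assumed to be contiguous and unshuffled, and `fit_and_score` is a hypothetical callback that would train a model on the training indices and return the validation MSE; neither detail is specified in the text.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    remainder = n_samples % k
    fold_sizes = [n_samples // k + (1 if i < remainder else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validated_mse(n_samples, fit_and_score, k=3):
    """Average the validation MSE returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train_idx, val_idx)
              for train_idx, val_idx in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, k=3))
```

Ranking hyperparameter combinations by the value returned from `cross_validated_mse` corresponds to the selection step in which the combinations with the lowest averaged MSE are retained.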
7. Results and Discussion
This section evaluates the accuracy and efficiency of the 16 LR models and the two identified NN models, which were trained with the ‘Partial PV penetration’ dataset and tested with the ‘Full PV penetration’ dataset as discussed in the previous sections. This section also provides an overall comparison of the proposed model-free voltage estimation methods using LR and NNs against the examined model-based voltage estimation method using PP in terms of data requirements, efficiency and accuracy. The distinctive advantages of LR over NNs in model-free voltage estimation, specifically with regard to practicality and interpretability, are highlighted as well. Furthermore, how the proposed model-free voltage estimation approaches can contribute to sustainable benefits is also discussed in this section.
7.1. Accuracy of Proposed Model-Free Voltage Estimation Methods
The predicted voltages from the LR and NN models were compared against the PP simulated voltages. The accuracy of the LR and NN models was investigated using three evaluation metrics: coefficient of determination ($R^2$), root mean squared error (RMSE) and maximum absolute error (MaxAE). $R^2$ determines the proportion of variance in the dependent variable (voltages at user connection points) that can be explained by a model. Therefore, $R^2$ provides a measure of the goodness of fit of a model. $R^2$ typically takes a value between zero and one, and the higher the $R^2$, the better the fit of a model to the target values (PP simulated voltages). For example, a model with an $R^2$ of 0.90 accounts for 90% of the variance in the dependent variable. RMSE is the square root of the mean squared difference between model predictions and target values and indicates how accurately a model can predict the target values. The lower the RMSE, the better the model performance. Thus, a perfect model (a hypothetical model that always exactly predicts the target values) would have an RMSE of zero. MaxAE represents the largest absolute error in model predictions when compared against the target values and thus indicates the worst case of model predictions. A large MaxAE suggests that the respective data point is an outlier, or that the model is not capable of accurately capturing the underlying relationship between dependent and independent variables at the instance under consideration.
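The three evaluation metrics can be computed as in the following sketch, where the target values play the role of the PP simulated voltages and the predictions play the role of the LR or NN outputs. The function names and the example voltages are illustrative assumptions, and the coefficient of determination is undefined in this sketch when all target values are identical.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def max_abs_error(y_true, y_pred):
    """Largest absolute difference, i.e. the worst-case prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

# Hypothetical voltages in volts, for illustration only
simulated = [240.0, 242.0, 244.0]
predicted = [241.0, 242.0, 243.0]
metrics = {
    "R2": r2_score(simulated, predicted),
    "RMSE": rmse(simulated, predicted),
    "MaxAE": max_abs_error(simulated, predicted),
}
```

RMSE summarises the typical error over all samples, while MaxAE is driven by a single worst sample, which is why the two are reported together.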
7.1.1. Linear Regression Models
The $R^2$, RMSE and MaxAE of all 16 LR models during testing are provided in Table 12. An $R^2$ close to one, along with an RMSE and a MaxAE close to zero, indicates that the LR predictions are close to the PP simulation results, and thus that the respective LR model is effectively mimicking the PP simulations.
It is observable from
Table 12 that the best performance is shown by the ‘
Three-phase, original coefficients’ LR model, with the highest $R^2$, lowest RMSE and lowest MaxAE across all four months. It is also visible that the three-phase LR models produce more accurate results than the per-phase LR models. The user phase voltages estimated by the LR models are plotted against those simulated by the PP model for the month with maximum solar generation variation (May 2023) in Figure 6, where the PP simulated voltages increase along the x axes and the LR predicted voltages increase along the y axes. When the LR model predictions were the same as or close to the PP simulations, the markers lay on or close to the diagonal plotted in grey. Accordingly, the ‘Three-phase, original coefficients’ LR model, whose markers follow the diagonal in Figure 6, produced the best voltage predictions. For the LR models ‘Per-phase, original coefficients’, ‘Per-phase, forced-positive coefficients’ and ‘Three-phase, forced-positive coefficients’, the markers became more dispersed from the diagonals as the voltages increased, indicating that their predictions deviated more from the PP simulation results as the voltages increased.
From
Table 12, it is evident that the performance of LR models is not significantly affected by the level of solar insolation, because all LR models developed independently on separate months produced approximately the same outcome. This is further illustrated in the boxplots of
Figure 7, which depict the spread of errors between simulated PP voltages and predicted LR voltages (PP voltages − LR voltages) at each user phase for the ‘Three-phase, original coefficients’ LR model developed for each month. In
Figure 7, the user phases on phase
a, phase
b and phase
c are coloured in red, green, and blue respectively, and the user phases on
Feeder1 are shown in respective dark colours, while the user phases on
Feeder2 are shown in respective light colours.
As mentioned earlier, the voltage variation at a user is proportional to the corresponding line impedance, which is determined by the distance from the distribution transformer to the user. Thus, the voltage variations at users tend to increase along a feeder. This means users located close to the distribution transformer will have lower voltage variation than those located further down the feeder. From
Figure 7, it is visible that on each phase the error spreads increase along the feeders, approximately following the geographical location of the users applied in labelling user phases in
Figure 1. This indicates that, as the voltage variation at a user increases or as the line impedance increases down the feeders, the voltage prediction accuracy of proposed LR models tends to reduce. It is also observable that the errors computed by subtracting LR voltages from PP voltages are predominantly negative. This indicates that the majority of the estimated voltages from LR exceed the corresponding PP voltages, given that both voltage values are strictly positive.
7.1.2. Neural Network Models
An overview of an NN model developed in this study is presented in
Figure 8. Each neuron in the hidden and output layers operates as depicted in
Figure 4. To maintain visual clarity, individual weights are not shown in
Figure 8. The notation in the figure covers the number of inputs to the NN model, the individual inputs of the NN model, the sets of biases for the hidden layer and the output layer, the activation functions of the hidden layer and the output layer, the number of outputs/targets of the NN model, and the individual outputs and targets of the NN model.
Table 13 provides the varying hyperparameters and the evaluation metrics (RMSE, MaxAE and $R^2$) of the NN models that were identified as performing best (with the lowest RMSE and lowest MaxAE) during testing (at ‘Full PV penetration’) under each condition (‘Including Vtx’ and ‘Excluding Vtx’) investigated in this study. Figure 9 plots the voltage estimations from the two NN models developed under the two conditions against the PP simulated voltages. Here, the PP simulated voltages increase along the x axes, while the NN predicted voltages increase along the y axes. When the NN model predictions were the same as or close to the PP simulations, the markers lay on or close to the diagonal plotted in grey.
As observable in
Table 13 and
Figure 9, the NN models were equally effective under both conditions, with almost the same accuracy. When multiple iterations were undertaken to choose the best-performing NN model, the number of neurons in the hidden layer, the activation function of the hidden layer and the number of epochs varied. However, the best performance was always seen with NN models that had L2 regularisation enabled, with the same regularisation factor for a given condition at all iterations. Further, the identified NN models always had a learning rate of $10^{-4}$ and the smallest batch size of 24. It can be seen that, under the ‘Including Vtx’ condition, early stopping was activated and the number of epochs was limited, making the process more efficient.
It is evident from the condition ‘Excluding Vtx’ that the NN models could capture the underlying relationships between inputs and outputs even without the insights from transformer voltage. However, if the actual transformer voltage (with variance) was available, the NN models could have performed even better, as it would have provided more insights into the underlying relationships rather than the constant value assumed in this study. Since a constant voltage of 1.05 pu was assumed at the distribution transformer LV bus under the condition ‘Including Vtx’, applying MinMaxScaler normalisation resulted in this constant voltage being transformed to zero across all time instances. It is vital not to be misled by these results when deciding feature importance and be aware that this feature (Vtx) was constant by design and could have provided deeper insights if the actual variance was incorporated. Nevertheless, even in the absence of this feature, NN models could provide sufficient accuracy, whereas with the LR approach, this feature was essential either as an approximate constant value, as adopted in this study, or as actual measurements obtained from a transformer monitor when available.
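The behaviour of min-max scaling on a constant feature such as the assumed Vtx of 1.05 pu can be illustrated with a small sketch. The helper functions below are written for this example in plain Python; they follow the convention, also used by scikit-learn's MinMaxScaler, of treating a zero data range as one to avoid division by zero, and they learn the scaling bounds from training data only.

```python
def fit_min_max(train_column):
    """Learn the minimum and span of a feature from the training data only."""
    lo, hi = min(train_column), max(train_column)
    span = hi - lo
    # A constant feature has zero span; use 1.0 to avoid division by zero
    return lo, (span if span != 0 else 1.0)

def transform(column, lo, span):
    """Scale values to [0, 1] using bounds learnt on the training data."""
    return [(v - lo) / span for v in column]

def inverse_transform(scaled, lo, span):
    """Map scaled values back to the original units."""
    return [s * span + lo for s in scaled]

# Constant transformer voltage (pu), as assumed under 'Including Vtx'
vtx_lo, vtx_span = fit_min_max([1.05, 1.05, 1.05, 1.05])
vtx_scaled = transform([1.05, 1.05], vtx_lo, vtx_span)

# Hypothetical user voltages (V), to show the round trip for a varying feature
v_lo, v_span = fit_min_max([230.0, 240.0, 250.0])
v_scaled = transform([235.0], v_lo, v_span)
v_restored = inverse_transform(v_scaled, v_lo, v_span)
```

Because every scaled Vtx value is zero, the feature carries no information that could distinguish one timestamp from another, which is consistent with the similar accuracy observed under the two conditions.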
The boxplots in
Figure 10 depict the spread of errors between simulated PP voltages and predicted NN voltages (PP voltages–NN voltages) encountered by each user phase under the conditions ‘
Including Vtx’ and ‘
Excluding Vtx’ during NN testing (at ‘
Full PV penetration’). The user phases on phase
a, phase
b and phase
c are coloured in red, green, and blue respectively. The user phases on
Feeder1 are shown in respective dark colours while the user phases on
Feeder2 are shown in respective light colours. No significant pattern in error distribution across the user phases or along the LV feeders can be identified from these plots for NN results, unlike with those for LR results. However, it can be observed that the errors from NN models for all user phases were small and acceptable under both conditions examined in this study.
7.2. Efficiency of Proposed Model-Free Voltage Estimation Methods
All model-based and model-free simulations were implemented using the Python programming language in a Databricks workspace built on Amazon Web Services cloud infrastructure. The Databricks cluster comprised one driver node and two worker nodes, each with 32 GB memory and four CPU cores. NN models were constructed using the
Keras [
51] interface within the
TensorFlow [
52] framework, with additional data pre-processing and model evaluation tasks conducted using
the scikit-learn library.
To estimate voltages at all user phases in the DN at one timestamp, the model-based method using PP took 607 ms, while the proposed model-free method using a selected LR model took only 0.001 ms on average. The approximate execution time per timestamp for the NN models under the ‘Including Vtx’ and ‘Excluding Vtx’ conditions was 0.060 ms and 0.058 ms respectively. With optimised coding and more powerful hardware, the execution time of all methods could be further improved. Nevertheless, this aspect was not critical for this study, which focused on comparing different approaches to voltage estimation. However, it is clear that the proposed model-free approaches are significantly faster than the tested model-based method and have stronger potential for application in real-time DN estimations. For example, consider the allocation of dynamic operating envelopes (DOEs), which determines the upper and lower limits of power imports and/or exports of distributed energy resources (DERs) within a given time interval, where a time interval usually spans from 5 to 30 min [
53]. This involves identification of optimal inverter control settings by estimating voltages at DER connection points every 5 to 30 min, which will be possible with all approaches examined in this study, while it will be more practically feasible with the proposed model-free methods.
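To put the reported execution times into perspective, the speed-up of each model-free method relative to PP can be computed from the per-timestamp figures quoted above. The snippet is a simple arithmetic sketch; the variable and function names are chosen for this example, and the ratios are derived here rather than reported in the study.

```python
# Per-timestamp execution times in milliseconds, as quoted in the text
PP_MS = 607.0
MODEL_FREE_MS = {
    "LR": 0.001,
    "NN, Including Vtx": 0.060,
    "NN, Excluding Vtx": 0.058,
}

def speed_up(baseline_ms, method_ms):
    """How many times faster a method is than the baseline."""
    return baseline_ms / method_ms

speed_ups = {name: speed_up(PP_MS, t) for name, t in MODEL_FREE_MS.items()}

# Number of single-timestamp estimations that fit in a 5 min DOE interval
INTERVAL_MS = 5 * 60 * 1000
estimations_per_interval = {"PP": INTERVAL_MS / PP_MS}
estimations_per_interval.update(
    {name: INTERVAL_MS / t for name, t in MODEL_FREE_MS.items()}
)
```

The second dictionary indicates how many candidate scenarios, such as alternative inverter settings, could be evaluated within one interval by each method on the same hardware.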
7.3. Overall Comparison of PP, LR and NN Approaches of Voltage Estimation
Table 14 provides an overall comparison across PP, LR and NN approaches in terms of requirements for model development and performance efficiency and accuracy. The LR model ‘
Three-phase, original coefficients’ developed using data of the month with maximum solar generation variation (May 2023) was taken for the comparison.
The development of the PP model required up-to-date DN topology, geographic coordinates of DN components, as well as transformer and line parameters, which are typically unavailable at the DNSPs. Even if those data were available at the DNSPs, allowing the creation of detailed network models in a chosen power system analysis software, the execution of those models would be extremely time-intensive compared to the proposed model-free methods. The development of LR and NN models (except for NN models under the ‘Excluding Vtx’ condition) required knowledge of
Vtx, either in the form of an approximate constant value or actual historical measurements from a transformer monitor, and historical
P,
Q and
V measurements at users from smart metres. Since this study was a comparison of model-free methods against PP simulations,
V from PP simulations were employed in the LR and NN model development, so that any errors inherently present within PP simulations could be disregarded. However, when the proposed model-free methods are applied in the real world,
V measurements from smart metres will be required. It is observable from
Table 14 that the proposed model-free approaches could provide sufficiently accurate voltage estimations in a significantly faster manner when compared with the model-based approach using PP. Further, the strong performance of proposed model-free approaches in the extrapolated test environment (‘
Full PV penetration’) demonstrates that both LR and NN models could effectively capture the voltage variations associated with increased solar PV generation.
The accuracy of voltage estimation from the proposed model-free approaches is further illustrated by the voltage variations during the testing phase of LR and NN models (at ‘
Full PV penetration’) plotted in
Figure 11. These plots clearly show how closely the proposed model-free estimation methods align with the model-based simulation results.
It is evident from
Table 14 that NNs could provide similarly accurate results as LR, even without the input of
Vtx. However, NNs took significantly longer to train (approximately 7 min) than LR (approximately 1 s), making them comparatively less practical, particularly for much larger DNs where NN training would take even longer. Furthermore, the straightforward and transparent nature of LR, in contrast with the complex, black-box characteristics of NNs that require extensive hyperparameter tuning, makes LR inherently more interpretable than NNs. It can be concluded that both proposed model-free approaches, LR and NNs, can be effective and efficient alternatives to model-based voltage estimation approaches, and that LR, owing to its simplicity and quick training capability, is more practical and interpretable than NNs.
It is important to emphasise that the LR and NN models were trained and tested using voltages generated via PP simulations. Consequently, the reported errors primarily reflect the ability of the proposed model-free approaches to replicate the behaviour of the model-based power flow solution. This is consistent with the primary objective of the paper, which was to examine whether model-free methods can serve as practical alternatives to conventional power flow simulations for voltage estimation tasks. Although real measurements may include additional uncertainties such as measurement noise, the proposed model-free methods are expected to adequately capture the underlying non-linear relationships in the DN, similar to how they successfully reproduced the behaviour of the PP simulations.
7.4. Benefits of Proposed Model-Free Voltage Estimation Methods
Table 15 gives monthly solar generation of the case study DN obtained using
pvlib python for the cases of ‘
Partial PV penetration’ and ‘
Full PV penetration’ along with the additional monthly solar generation of the DN due to increase in rooftop solar PV penetration to 100%. It is observable that by increasing installation of rooftop solar PV systems in the DN from 43% to 100%, the total solar generation of DN is approximately doubled for each selected month. This increase in solar generation will cause more variations in system voltage than before, and it is paramount to have an accurate and efficient way of capturing those variations. From the accuracy and efficiency seen in the results of this study, it is evident that the proposed model-free voltage estimation methods are promising ways of estimating those voltage variations. Having such reliable tools will help the DNSPs in confidently embracing the ongoing renewable revolution without unnecessarily restricting installation of new rooftop solar PV systems or limiting enlargement of existing systems. This will ultimately help efficient utilisation of existing grid infrastructure, enhance financial gains of rooftop solar PV system owners, and reduce overall carbon footprint of the DN.
According to
Table 15, 100% installation of rooftop solar PV systems in the DN results in an average of 13.23 MWh of additional solar generation per month, amounting to approximately 158.76 MWh of additional annual solar generation in the DN. This additional solar energy avoids an estimated 122 tons of CO2 emissions annually, equivalent to burning 54.6 t of coal per year, according to the
Greenhouse Gas Equivalencies Calculator developed by Environmental Protection Agency, United States [
54]. This calculator uses United States national average emission factors for electricity generation, which may not be the same in the context of Australia. However, to obtain a reasonable understanding, results from this calculator are useful.
As mentioned earlier, model-based non-linear power flow simulations are highly complex and time-intensive, and their complexity increases exponentially with longer time horizons and larger DNs. Further, model-based non-linear power flow requires highly accurate DN data, which is typically unavailable to the DNSPs. On the other hand, the model-free approaches proposed in this study can be trained offline using historical smart metre data. Although training may require some time for larger networks, it will still be beneficial because, once the LR or NN models are trained, they can be deployed at any later time as long as the DN topology remains unchanged. In the event of any changes to the DN topology, retraining of the LR and NN models would be necessary. Similarly, any models developed using a power system analysis software would also require updating to reflect the new topology.
For model-based approaches, updating the network models continues to rely on access to accurate and detailed network information, as well as the technical expertise required to revise and validate the system representation. Acquiring the necessary field data to support such updates may be time-consuming, particularly when dependent on scheduled site visits, unless network modifications are recorded at the time they occur. In contrast, model-free methods depend only on the availability of updated smart metre data rather than precise network parameters or specialised modelling expertise. While time is needed to gather sufficient post-change smart metre data before retraining the LR and NN models, the absence of dependence on precise network modelling enhances the practical flexibility of the model-free approaches. Furthermore, although the LR and NN models in this paper were trained using one month of smart metre data, this duration is not strictly mandatory. A shorter data period may still provide reliable results, provided that the data is of sufficient quality and representativeness.
As shown earlier, the proposed model-free voltage estimation methods are substantially faster than the model-based method, making them particularly suitable for real-time network estimation. Thus, the real benefits of the proposed model-free voltage estimation methods can be harnessed when DNSPs seek real-time estimation of the hosting capacity of DERs such as rooftop solar PVs and EVs.
8. Conclusions
This paper independently employed LR and NNs for model-free voltage estimation of LV DNs with high penetration of residential rooftop solar. Here, 16 distinct LR models and 648 NN models were trained utilising measurements from smart metres in a real Australian DN with 43% solar PV penetration and were tested under modified DN conditions with 100% solar PV penetration. The proposed model-free approaches of voltage estimation were compared against model-based non-linear power flow simulated in PP. The results demonstrated that both proposed model-free approaches are capable of estimating voltages at user connection points with comparable accuracy, while offering greater efficiency with faster performance than conventional model-based methods.
The LR model ‘Three-phase, original coefficients’, developed on a three-phase basis without forcing coefficients to be positive, produced the best voltage predictions among the proposed LR models. It accurately followed the PP simulations with almost negligible errors for all four selected months. For example, it had an R² of 0.9998, RMSE of 0.01 V and MaxAE of 0.11 V for the month of May 2023. It was evident that, as the voltage variation at users increased or as the line impedance increased down the feeders, the voltage prediction accuracy of the proposed LR models reduced slightly. Moreover, the two NN models identified as best-performing under the conditions ‘Including Vtx’ and ‘Excluding Vtx’ were able to effectively follow the PP simulations with respective MaxAEs of 0.15 V and 0.13 V and the same RMSE of 0.02 V. Additionally, the LR and NN models were significantly more computationally efficient than the PP model.
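For readers wishing to reproduce this kind of comparison, the three reported error measures can be computed as in the minimal sketch below. It assumes the standard definitions of R², RMSE and maximum absolute error (MaxAE); the function name `voltage_error_metrics` and the sample voltages are hypothetical and are not taken from the study.

```python
import math


def voltage_error_metrics(reference, estimate):
    """Return (R2, RMSE, MaxAE) of estimated voltages against reference voltages.

    `reference` would hold benchmark voltages (e.g. from a power flow run) and
    `estimate` the model-free predictions, both in volts and equally long.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]

    ss_res = sum(err ** 2 for err in errors)               # residual sum of squares
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)   # total sum of squares

    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    max_ae = max(abs(err) for err in errors)
    return r2, rmse, max_ae


if __name__ == "__main__":
    # Hypothetical values purely for illustration.
    ref = [230.0, 232.0, 234.0, 236.0]
    est = [230.1, 231.9, 234.0, 236.2]
    print(voltage_error_metrics(ref, est))
```

Reporting MaxAE alongside RMSE is useful here because a voltage limit violation depends on the worst-case deviation rather than on the average error.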
It is suggested that the proposed model-free methods can be utilised as potential alternatives to existing model-based methods for estimating phase voltages at users, allowing reliable decision making related to the safe accommodation of solar PV installations in DNs. Thus, it is proposed that LR and NN models can be effectively deployed for efficient decision making in advanced power engineering applications such as model-free DER hosting capacity estimation and DOE allocation, provided that instantaneous smart metre data is available.
It is also suggested that LR models can be more practical and interpretable compared to NN models when estimating voltages in LV DNs. LR models are straightforward to implement and do not require architecture design or hyperparameter tuning, whereas NN models typically require careful selection of architectures and fine-tuning of hyperparameters and often operate as black-box models. In addition, LR directly provides coefficients that represent voltage-to-power sensitivities, making the relationships captured by the model easier to analyse and interpret. Furthermore, it was evident that LR training is faster compared to NN training under the studied scenarios, which can be advantageous in applications where models need to be retrained frequently. However, this does not imply that LR will always outperform NN models in predictive accuracy. More complex or highly non-linear systems, larger datasets, or architectures that explicitly incorporate network structure may benefit from more advanced NN models.
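The interpretability argument can be illustrated with a minimal sketch on synthetic data. It assumes a simplified linear form in which a user voltage equals a multiple of the transformer voltage plus a weighted sum of user net demands; this form, the variable names and all numerical values are assumptions for illustration only and do not reproduce the LR models trained in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic inputs (hypothetical): transformer voltage in V and net demand of
# three users in kW, where negative values denote solar PV export.
v_tx = 240.0 + rng.normal(0.0, 1.0, n_samples)
p_net = rng.uniform(-5.0, 5.0, size=(n_samples, 3))

# Assumed "true" voltage-to-power sensitivities in V per kW.
true_sens = np.array([-0.40, -0.25, -0.10])
v_user = v_tx + p_net @ true_sens

# Ordinary least squares: v_user ~ c0 * v_tx + sum_j c_j * p_net[:, j]
X = np.column_stack([v_tx, p_net])
coef, *_ = np.linalg.lstsq(X, v_user, rcond=None)

print("transformer voltage coefficient:", coef[0])
print("estimated sensitivities (V/kW):", coef[1:])
```

Because the synthetic data are noise-free and exactly linear, the fitted coefficients recover the assumed sensitivities; with real smart metre data they would only approximate the underlying relationships. Note that negative coefficients arise naturally in this setting, which is consistent with not forcing the coefficients to be positive.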
In this study, the NN architecture was intentionally limited to a single hidden layer to reduce model complexity and improve computational efficiency, as this configuration was sufficient to achieve highly accurate results under the considered conditions. The aim of including NNs was to investigate whether introducing non-linear modelling capability provides a meaningful improvement over LR for voltage estimation in LV DNs. Although the DN considered in this study is modest in size, it is a real Australian LV DN and therefore provides a realistic test case representative of typical Australian LV distribution systems. As mentioned earlier, more advanced NN architectures may become beneficial under conditions where the system behaviour exhibits stronger non-linearities. In such cases, architectures that explicitly incorporate system structure, such as graph neural networks that encode network topology or recurrent neural networks that capture temporal dependencies, may provide additional modelling capability. This highlights a promising avenue for future research, including the investigation of more advanced NN architectures as well as larger and more non-linear networks.
Among the model-free voltage estimation methods proposed in this paper, the LR-based approach explicitly requires the transformer voltage as an input variable, and therefore, its performance may be sensitive to inaccuracies in the assumed swing bus voltage. Additionally, in DNs with higher transformer impedance, significant upstream voltage variability, or active tap-changing transformers, deviations in transformer voltage may propagate along the feeders and affect voltage estimation accuracy. However, since the objective of this paper was a comparative evaluation of different model-free techniques (LR and NNs) that can serve as effective alternatives to model-based approaches (PP) of voltage estimation in LV DNs, applying the same swing bus voltage assumption across all methods ensured consistency and enabled a fair comparison of their relative performance. Further, with the large-scale rollout of distribution transformer monitors anticipated globally in electricity networks, the transformer voltage measurements required by the proposed LR-based voltage estimation approach are expected to become readily available in the near future.
The primary objective of this paper was a comparative evaluation of model-free techniques as potential alternatives to model-based approaches for voltage estimation in PV-rich LV DNs. Accordingly, PP was used as the benchmark model-based power flow framework, and the LR and NN models were evaluated against PP-generated voltages to assess their ability to reproduce the behaviour of a model-based voltage estimation method. It is important to note that PP simulations do not represent perfect ground truth, as power flow models themselves may exhibit deviations from real measurements due to modelling assumptions and parameter uncertainties. However, the use of PP as a benchmark did not hinder the objective of evaluating whether the proposed model-free approaches can effectively replicate model-based voltage estimation results. Although real measurements may include additional uncertainties such as measurement noise, the proposed model-free methods are expected to adequately capture the underlying non-linear relationships in the DNs, similar to how they successfully reproduced the behaviour of the PP simulations in this study.
When considering higher PV penetration levels, pvlib python was utilised in this study to synthesise the modified DN conditions. The pvlib python framework incorporated geographical location and irradiance modelling, allowing the created PV profiles to reflect physically consistent solar generation patterns. Further, when assigning new PV systems, the existing PV size distribution in the DN was preserved, ensuring consistency with the original network structure. However, this study assumed a uniform panel configuration for all newly added PV systems. It is important to note that this assumption did not adversely affect the focus of this study, which was to evaluate the capability of the proposed model-free voltage estimation approaches to reproduce the behaviour of the model-based method under increased PV penetration conditions. The authors acknowledge that real-world deployment conditions may involve additional variability associated with diverse orientations, roof tilts, partial shading conditions, inverter characteristics, and short-term weather variability such as cloud transients, which could influence model generalisation. Investigating these aspects, along with larger and more diverse networks, was outside the scope of this paper and represents a valuable direction for future work.
The proposed model-free voltage estimation methods are primarily intended for application at the level of individual LV DNs, rather than large-scale integrated systems. In practice, LV DNs are inherently limited in size by the capacity of their associated distribution transformers. Therefore, while the case study network is relatively small, it is representative of a typical Australian LV DN, and extreme scaling to very large numbers of user phases is generally not encountered within a single LV DN. For larger systems, such as those involving multiple LV DNs connected to a medium voltage (MV) substation, the proposed approaches can be applied in a decentralised manner, where each LV DN is modelled independently using its own LR or NN model. These individual models can be executed in parallel, enabling efficient computation and mitigating scalability concerns. The aggregated impact at the MV level can then be assessed by combining the outputs of these individual models, providing a practical and computationally tractable approach for larger systems. However, if the proposed model-free methods are to be extended to model an entire MV network as a single unified system, additional considerations would be required. In such cases, more advanced techniques such as dimensionality reduction, clustering, or hierarchical modelling may be necessary to ensure computational tractability, and these represent a promising future research direction. Nevertheless, the proposed fundamental concept of model-free voltage estimation using data-driven approaches will remain valid and fully applicable.
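The decentralised application described above can be sketched as follows. This is an illustrative sketch only: the per-network models are reduced to hypothetical linear sensitivity lists, and the network identifiers, function names and input values are assumptions rather than elements of the study.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-network linear models: one sensitivity (V per kW) per user.
# In practice each entry would be a trained LR or NN model for one LV DN.
MODELS = {
    "lv_dn_a": [-0.40, -0.25],
    "lv_dn_b": [-0.30, -0.20, -0.10],
}


def estimate_voltage(dn_id, v_tx, p_net):
    """Estimate one voltage (V) in network `dn_id` from its own model."""
    sens = MODELS[dn_id]
    if len(sens) != len(p_net):
        raise ValueError(f"{dn_id}: expected {len(sens)} demand values")
    return dn_id, v_tx + sum(s * p for s, p in zip(sens, p_net))


def estimate_all(inputs):
    """Run every LV DN model independently and collect results by network."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(estimate_voltage, dn_id, v_tx, p_net)
            for dn_id, (v_tx, p_net) in inputs.items()
        ]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # Transformer voltage (V) and user net demands (kW, negative = export).
    inputs = {
        "lv_dn_a": (240.0, [-5.0, 2.0]),
        "lv_dn_b": (238.0, [1.0, 1.0, 1.0]),
    }
    print(estimate_all(inputs))
```

Threads are used here only to keep the sketch short. For computationally heavier models, a process pool or separate machines per LV DN may be more appropriate, since the networks do not share state.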