Comparison of Linear Regression and Neural Networks for Model-Free Voltage Estimation of Low Voltage Distribution Networks with High Penetration of Residential Rooftop Solar

Kalinga, Tharushi; Banfield, Brendan; Knott, Jonathan C.; Robinson, Duane A.

doi:10.3390/electronics15071467


¹ Australian Power Quality Research Centre, University of Wollongong, Wollongong, NSW 2522, Australia

² Gridsight, Innovation Campus, North Wollongong, NSW 2500, Australia

* Author to whom correspondence should be addressed.

Electronics 2026, 15(7), 1467; https://doi.org/10.3390/electronics15071467

Submission received: 3 March 2026 / Revised: 30 March 2026 / Accepted: 30 March 2026 / Published: 1 April 2026

(This article belongs to the Special Issue Applications of Machine Learning and Artificial Intelligence in Modern Power and Energy Systems, 2nd Edition)


Abstract

The widespread integration of rooftop solar photovoltaic systems into electricity distribution networks often leads to poor voltage regulation at user connection points, potentially breaching system voltage standards. It is therefore important for distribution network service providers to thoroughly assess such real and potential impacts to ensure compliant and safe operation of their power systems. The conventional approach of model-based voltage estimation using non-linear power flow is complex and time-intensive. Consequently, research interest in model-free voltage estimation methods as a reliable alternative is increasing. This paper proposes and compares two distinct model-free voltage estimation approaches that can be used for effective hosting capacity estimation of residential solar photovoltaic systems in low voltage distribution networks. One approach uses linear regression based on linearised power flow equations, while the other employs neural networks to capture non-linear power flow dynamics. The study developed 16 linear regression models and 648 neural network models using historical data collected from residential smart electricity metres in a real low voltage distribution network, and compared their efficacy against conventional model-based, non-linear power flow simulations. Results indicate that the proposed model-free approaches estimate voltages at user connection points with accuracy similar to that of the model-based approach, but faster. The proposed linear regression-based approach is also found to be superior to the proposed neural network-based approach in terms of interpretability and practicality.

Keywords:

model-free voltage estimation; linear regression; neural networks; solar photovoltaic; smart metre data; low voltage distribution networks

1. Introduction

Over the last decade, deployment of behind-the-metre (BTM) rooftop solar photovoltaics (PVs) in electricity distribution networks (DNs) has grown significantly across the world [1]. Australia in particular has advanced remarkably. By mid-2024 it had a total installed rooftop solar PV capacity of 24.4 GW, making rooftop solar the second-largest source of renewable energy generation after wind, and rooftop solar contributed 11.3% of total Australian electricity generation in the first half of that year [2]. New South Wales has maintained its record for the highest annual installed rooftop solar PV capacity of any Australian state, with 454 MW of new installations in the first half of 2024. Average system size has also grown notably, reaching 9.9 kW in June 2024, up from an average of 7.4 kW five years earlier and 4.3 kW a decade earlier [2].

This increased integration of rooftop solar PVs into DNs results in a range of technical complications. These include violation of voltage limits, curtailment of existing solar PVs, maloperation of protection systems, and power quality issues, among which voltage limit violations are considered the most prominent [3]. Voltage regulation is therefore of paramount importance for addressing voltage-related complications and ensuring uninterrupted operation of modern power systems [4,5]. Compliance with established voltage regulation standards is essential to maintain secure and reliable operation of electricity systems as rooftop solar PV penetration continues to increase in DNs. Examples of low voltage (LV) regulation standards currently in place internationally are provided in Table 1 [6].

While this study is applicable to any LV standard in use around the world, the Australian context is considered as the example case. The nominal system voltage, i.e., the approximate value used to designate or identify an Australian LV DN, is 230 V. Australian Standard AS 61000.3.100 [7] presents statistical limits on the nominal system voltage, with a 1st percentile (V_1%) of −6% and a 99th percentile (V_99%) of +10%. Here, the x-th voltage percentile is the voltage value below which x% of measurements fall over the survey period. Accordingly, the steady-state voltage limits to be maintained at a customer's connection point are 216 V (230 V − 6%) minimum and 253 V (230 V + 10%) maximum. Under Australian Standard AS IEC 60038 [8], the voltage should be maintained within ±10% of the nominal system voltage under normal operating conditions, corresponding to a minimum of 207 V (230 V − 10%) and a maximum of 253 V (230 V + 10%).
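The sketch below shows one way to check the percentile limits above against a series of measured voltages. It assumes the readings for one connection point are available as a NumPy array. The function and constant names are illustrative, not part of the paper. The lower limit is computed exactly as 230 V × 0.94 = 216.2 V, which the text rounds to 216 V.

```python
import numpy as np

V_NOM = 230.0                  # Australian nominal LV phase voltage (V)
V1_MIN = V_NOM * (1 - 0.06)    # lower limit on the 1st percentile (216.2 V, ~216 V)
V99_MAX = V_NOM * (1 + 0.10)   # upper limit on the 99th percentile (253 V)

def percentile_compliance(voltages):
    """Return (V_1%, V_99%, compliant) for a series of voltage readings in volts."""
    v = np.asarray(voltages, dtype=float)
    v1, v99 = np.percentile(v, [1, 99])
    # compliant only if both statistical limits are respected
    return v1, v99, bool(v1 >= V1_MIN and v99 <= V99_MAX)
```

A survey-period series is passed in directly. Because only the 1st and 99th percentiles are tested, occasional excursions beyond 216 V or 253 V still pass, which is the intent of a statistical limit.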

AS IEC 60038 [8] also defines the Australian utilisation voltage range, i.e., the voltage range to be maintained at outlets or equipment terminals. Utilisation voltage may rise due to increased supply voltage or internal voltage rise caused by distributed generation such as rooftop solar PVs. It may fall due to decreased supply voltage or excess customer demand. The utilisation voltage must be maintained within the stipulated range at all times to ensure safe operation of all equipment. Australian Standard AS 4777.2 [9] requires all types of inverters installed in DNs, including rooftop solar PV inverters, to assist in maintaining voltage limits compatible with AS 61000.3.100 [7]. They do so through Volt-Watt (active power curtailment) and Volt-VAr (reactive power) response modes. Moreover, to ensure uninterrupted inverter operation, Australian Standard AS 4777.1 [10] requires solar PV installations to be wired so that the overall voltage rise from the network point of supply to the inverter terminals stays below 2% of nominal voltage (4.6 V for a 230 V system).
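As an illustration of how a Volt-Watt response limits generation, the sketch below interpolates a piecewise-linear curve. The set points (full output up to 253 V, falling linearly to 20% of rated power at 260 V) are assumed values, similar to the "Australia A" settings commonly cited for AS/NZS 4777.2:2020. They are not taken from this paper and should be verified against the standard and the DNSP's connection rules.

```python
import numpy as np

# Assumed, illustrative Volt-Watt set points; check against AS/NZS 4777.2 before use.
VW_VOLTS = [253.0, 260.0]    # terminal voltage (V)
VW_P_LIMIT = [1.0, 0.2]      # maximum active power as a fraction of rated power

def volt_watt_limit(v_terminal):
    """Maximum allowed active power output (fraction of rating) at a terminal voltage."""
    # np.interp holds the end values outside the set-point range:
    # uncurtailed (1.0) below 253 V and held at 0.2 above 260 V.
    return np.interp(v_terminal, VW_VOLTS, VW_P_LIMIT)
```

For example, a terminal voltage of 256.5 V, halfway between the two set points, limits output to 60% of rated power.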

To ensure that these standards are followed, distribution network service providers (DNSPs) need efficient, reliable, and accurate ways of estimating the voltage variations that increased integration of rooftop solar PV systems will cause in DNs. Sound knowledge of these voltage variations helps DNSPs understand the rooftop solar photovoltaic hosting capacity (PVHC) of their DNs. PVHC is defined as the maximum quantity of rooftop solar PVs that can be installed in a given electricity network without imposing any changes to existing infrastructure and without violating any network performance limits [11]. Understanding PVHC enables DNSPs to make informed decisions on solar PV installation requests. It also gives them the confidence to allow safe integration of more rooftop solar PVs into their DNs. In future, this will also facilitate dynamic control of installed PV systems by DNSPs, ensuring optimal utilisation of the available PVHC within the DNs.
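To make the PVHC definition concrete, the sketch below applies it at a single connection point. It assumes that voltage rises linearly with PV export, with a known sensitivity in V/kW. That sensitivity is a hypothetical parameter, not one estimated in this paper. Under this assumption, the additional PV a point can host is simply its voltage headroom divided by the sensitivity. The sketch ignores interactions between multiple PV systems and every network limit other than the 253 V upper voltage limit.

```python
def pv_headroom_kw(v_base, dv_dp, v_max=253.0):
    """Extra PV export (kW) at one connection point before its voltage reaches v_max.

    v_base: connection-point voltage without the extra PV (V)
    dv_dp:  hypothetical linear voltage-rise sensitivity to PV export (V per kW, > 0)
    v_max:  upper steady-state voltage limit (253 V in the Australian context)
    """
    if dv_dp <= 0:
        raise ValueError("sensitivity must be positive")
    headroom_v = max(v_max - v_base, 0.0)  # no headroom if already at or above the limit
    return headroom_v / dv_dp
```

For example, a point sitting at 247 V with a sensitivity of 0.8 V/kW could host about 7.5 kW more export under these assumptions.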

The most common voltage estimation approach adopted by many DNSPs is simulation of network models using power system analysis software [12]. This requires accurate and up-to-date network models, which are often not readily available to DNSPs. Even when they are available, model-based voltage estimation methods involve complex computations that lead to time-intensive simulations [13]. Efficiently identifying the operational impacts of changes in grid-connected BTM rooftop solar PV installations has therefore become a serious challenge for many DNSPs. Within the research community, interest is growing in model-free voltage estimation methods driven by smart metre data, as a promising alternative to conventional model-based approaches. These methods require suitable ways of extracting voltage sensitivities to power injections from available smart metre data [14]. This can be done either by incorporating linearised power flow approximations [11,14] or by capturing non-linear power flow dynamics [12,13].

The objective of this paper is to compare the independent application of linearised power flow approximations and non-linear power flow relationships in the development of model-free methods to be effectively employed in place of model-based methods for estimating phase voltages at user connection points in LV DNs. To achieve this objective, this study proposes two distinct model-free voltage estimation approaches. One approach is based on linearised power flow approximations and utilises linear regression (LR), while the other approach is focused on capturing non-linear power flow dynamics and employs neural networks (NNs).

LR is a supervised machine learning (ML) technique that makes predictions using a linear relationship between dependent and independent variables. LR provides simple mathematical interpretations of relatively complex relationships and can yield near-perfect predictions [15]. Linear approximations of power flow equations are commonly used to reduce the complexity and computational demands of non-linear power flow analysis [16]. Similar linearisation approaches are also applied in other complex energy system problems, such as electricity–water nexus dispatch, to improve computational tractability [17]. However, the accuracy of the linearisation is crucial to the quality of the resulting solutions. Some recent studies employing LR in power engineering applications are presented in Table 2.
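A linearised power flow suggests modelling the voltage at a user phase as an intercept plus a weighted sum of the P and Q values at all user phases. The weights act as voltage sensitivities. The sketch below fits that generic form by ordinary least squares with NumPy. It is an illustration under this assumption only. The paper's own LR formulations are described in Section 5 and may differ in their inputs and structure.

```python
import numpy as np

def fit_voltage_lr(P, Q, v):
    """Least-squares fit of v ≈ c + P @ a + Q @ b for one target user phase.

    P, Q: (T, N) active/reactive power at N user phases (kW, kVAr; import positive)
    v:    (T,) voltage at the target user phase (V)
    Returns intercept c and sensitivity vectors a (V/kW) and b (V/kVAr).
    """
    X = np.column_stack([np.ones(len(v)), P, Q])  # design matrix with intercept column
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    n = P.shape[1]
    return coef[0], coef[1:n + 1], coef[n + 1:]

def predict_voltage_lr(c, a, b, P, Q):
    """Voltage estimate from the fitted linear sensitivities."""
    return c + P @ a + Q @ b
```

With import-positive P and Q, the fitted sensitivities are typically negative: more import lowers voltage, and PV export (negative P) raises it. The intercept plays a role similar to the swing bus voltage.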

NNs are a foundational component of modern artificial intelligence, inspired by the way the human brain processes information. They consist of layers of interconnected nodes (neurons) that transform input data through weighted connections and activation functions. NNs are particularly effective at capturing complex, non-linear relationships, making them suitable for a wide range of applications including image classification, speech recognition, natural language processing, and time series forecasting [28]. Some recent studies employing NNs in power engineering applications are provided in Table 3.
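The sketch below shows how weighted connections and activation functions produce a non-linear mapping. It is a one-hidden-layer network with tanh activations, trained by full-batch gradient descent on mean squared error, written in plain NumPy. It is a toy under stated assumptions (layer size, learning rate, and epoch count are arbitrary choices). It is not one of the 648 NN configurations developed in this paper, which are described in Section 6.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations, shape (T, hidden)
        y_hat = H @ W2 + b2                 # linear output layer, shape (T, 1)
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # backpropagate the mean squared error
        g_out = 2 * err / len(X)            # dL/dy_hat
        gW2, gb2 = H.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict_mlp(params, X):
    """Forward pass of the trained network."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

In practice, inputs such as P and Q would usually be standardised before training, and a library such as PyTorch or scikit-learn would replace this hand-written loop.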

A comparative analysis in [38] independently employs LR and NNs for model-free voltage estimation and compares them against model-based simulations conducted using OpenDSS. The study focused on estimating the voltage at the customer located furthest from the transformer. The model-free approaches were developed using only historical active power (P) and voltage (V) measurements from smart metres, whereas the model-based method also incorporated reactive power (Q) data. The LR and NN models were trained with 10% PV and electric vehicle (EV) penetration. They were evaluated at the same penetration level as well as with increased PV and EV penetration. The evaluation was conducted on 127 real LV feeders, with smart metre datasets generated from statistical data for the United Kingdom. The results indicated that LR generalised better than NNs under high PV and EV penetration scenarios. However, the maximum mean absolute errors across all feeders for the LR and NN methods were 6.23 V and 77.37 V, respectively. These are substantially higher than the errors achieved by the methods proposed in this paper, as detailed in Section 7.

The study in [39] compares model-free voltage estimation methods built independently using LR and NNs against model-based simulations conducted in OpenDSS. Both in-domain (within historical data ranges) and out-of-domain (beyond historical data ranges) scenarios were evaluated using synthetic data from an Australian LV DN with 31 single-phase customers and 25% PV penetration. LR and NN models were built with various input configurations, including customer P, Q, and V, aggregated power, and transformer secondary side voltage or its proxies. The voltage analysis was performed at a single customer. Results showed that LR outperformed NNs, particularly in out-of-domain scenarios with higher exports or imports. For instance, under high exports, the active power range at a customer was extended from [−4.5, 5.3] kW to [−8, 8] kW. In this case, the best-performing LR model produced a mismatch of 0.82 V, while the best-performing NN model produced a mismatch of 3.69 V. The methods proposed in this paper, however, demonstrate superior accuracy, as detailed in Section 7.

This paper presents a detailed comparative analysis of two model-free methods and one model-based method of voltage estimation through an extensive evaluation on a real Australian LV DN. The LR and NN models are developed using P and Q data from smart metres at their real locations in the case study DN. They estimate the voltages at all users whose smart metres are incorporated in the models, rather than at a single customer, which distinguishes this study from the work in [38,39]. The comparative model-based simulations are undertaken in Pandapower V3.3.0 (PP), an open-source Python tool for electrical power system modelling, analysis, and optimisation [40]. The study examines how well LR and NN models perform relative to PP power flow simulations by analysing the performance of 16 distinct LR models and 648 distinct NN models in estimating phase voltages at users of the case study DN. The LR and NN models are trained under existing DN conditions with limited rooftop solar PV penetration and tested under modified DN conditions with full PV penetration. This allows the study to explore thoroughly whether the proposed model-free approaches can accurately capture the voltage variations caused by new PV installations. The uniqueness of this study lies in its comprehensive evaluation framework. Each method is assessed on performance accuracy, computational efficiency (in terms of processing times), data requirements for model development (as applicable to the developed models), and practicality and interpretability. The proposed LR- and NN-based voltage estimation methods show strong performance combined with high efficiency compared to PP simulations, which emphasises the practical relevance and contribution of this research.

This paper is structured as follows. Section 2 introduces the DN used as the case study. Section 3 explains data pre-processing, selection of the swing bus voltage, and assessment of the PP model. Section 4 discusses the formation of load profiles for the LR and NN models. The proposed methodologies for LR and NN model development are presented in Section 5 and Section 6, respectively. The results are discussed in Section 7, and the conclusion is given in Section 8.

2. Case Study Distribution Network

The case study analysis in this paper was performed on a model of a real underground LV DN located in an urban area of New South Wales, Australia, shown in Figure 1. The DN consisted of one distribution transformer and two radial LV feeders. The DN had a total of 28 users (customers) distributed along two LV feeders, with user01 through user14 on Feeder1, and user15 through user28 on Feeder2. Here, 21 users possessed smart power quality enabled electricity metres while seven users possessed electromechanical electricity metres, accounting for a 75% smart metre penetration. Throughout this paper, the smart power quality enabled electricity metres are referred to as smart metres.

Traditional electromechanical electricity metres only record electricity consumption in kWh, which metre readers must read manually during site visits. Smart metres record fine-grained electricity consumption measurements in near real-time [41]. These include, but are not limited to, active power, reactive power, current, and voltage magnitude. This data is transmitted directly to metering service providers via wireless communication networks, so metres can be read remotely rather than through manual visits. However, these large, high-dimensional smart metre datasets come with their own challenges, such as bad or null measurements, costly data communication and storage, and data privacy and security issues [42]. Nevertheless, this study, along with many existing studies, demonstrates that smart metre data can be effectively leveraged to develop reliable solutions for numerous power engineering problems. This supports the global trend of increased smart metre deployment.

Global smart metre penetration reached 43% at the end of 2023, with 77% penetration in North America, 49% in the Asia-Pacific region, and 47% in Europe, and is forecast to reach 54% by 2030 [43]. In the Australian context, the Australian Energy Market Commission has recommended a target of 100% smart metre penetration by 2030 in National Electricity Market jurisdictions [44]. By 2023, Queensland, New South Wales, the Australian Capital Territory, and South Australia had an average smart metre penetration of 30%, while Victoria had already achieved near 100% penetration. Tasmania, on the other hand, has put in place an accelerated programme targeting 100% smart metre deployment by 2026. Greater use of smart metre data in Australian power engineering research and applications can therefore be expected in the near future.

In this case study DN, the 21 users with smart metres comprised 14 single-phase users and seven three-phase users. This gives a total of 35 smart metre channels (14 + 7 × 3), referred to as user phases in this paper, and this study was performed on these 35 user phases. Smart metre data at 5 min intervals was available for 11 months, spanning November 2022 to October 2023. Data was missing at a few instances within this timeframe; these were handled as explained in Section 3.

3. Pre-Processing

3.1. P, Q Load Measurements

The study in this paper required instantaneous P and Q measurements at all smart metres. Therefore, only the time instances with both P and Q measurements available at all smart metres were considered. In other words, if either measurement was missing at any of the 21 smart metres at a particular instance in time, that instance was excluded from the study. Instead of excluding these instances, data interpolation could be performed to approximate the missing measurements; however, that process was outside the scope of this study.
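The exclusion rule above amounts to complete-case filtering across all metres. The sketch below assumes P and Q are held as time-by-metre NumPy arrays with np.nan marking missing readings. That data layout is an assumption for illustration.

```python
import numpy as np

def complete_instances(P, Q):
    """Boolean mask of time instances where every smart metre reported both P and Q.

    P, Q: (T, M) arrays for M smart metres, with np.nan marking missing readings.
    """
    missing = np.isnan(P).any(axis=1) | np.isnan(Q).any(axis=1)
    return ~missing  # keep a row only if nothing is missing at any metre
```

Rows are then selected with P[mask] and Q[mask]. A single missing value at one metre removes that instant for all metres, which keeps the inputs time-aligned.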

The models under investigation required accurate identification of the grid phase to which each smart metre channel was connected, so a suitable phase identification algorithm had to be applied to the available data. For this, the k-means clustering-based user phase identification approach developed in the authors' previous work [45] was used. The identified phases are illustrated in Figure 1 using the phase colours red, green, and blue, or a combination thereof.
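The approach of [45] is not reproduced here. The sketch below is a generic illustration of the underlying idea: user phases connected to the same grid phase tend to have strongly correlated voltage profiles, so clustering z-scored profiles into three groups can separate them. It is a plain k-means with a deterministic farthest-point initialisation, written in NumPy. The data layout and the choice of initialisation are assumptions.

```python
import numpy as np

def kmeans_phase_groups(V, k=3, n_iter=100):
    """Group user-phase voltage profiles into k clusters (ideally one per grid phase).

    V: (U, T) array with one voltage time series per user phase.
    Returns an integer cluster label for each of the U profiles.
    """
    # z-score each profile so distances reflect shape (correlation), not voltage level
    Z = (V - V.mean(axis=1, keepdims=True)) / V.std(axis=1, keepdims=True)
    # deterministic farthest-point initialisation of the k centres
    centres = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Z[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # assignment step: nearest centre for every profile
        dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update step: centre = mean of its members (unchanged if a cluster is empty)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels
```

Cluster labels are arbitrary. Mapping them to the physical phases (red, green, blue) still requires at least one user phase whose connection is known.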

As the model-free voltage estimation approaches developed in this paper were based on P and Q smart metre measurements, any outliers in these measurements needed to be handled. Significant variations in P and Q over time were required to train the proposed model-free approaches effectively. It was therefore important to ensure that P and Q on all phases of all users showed sufficient standard deviation. Any user whose standard deviation in P was less than 10 W, or in Q was less than 5 VAr, on any of its phases was to be excluded from the study. These thresholds were selected after observing typical residential P and Q profiles. In this case study, no user had to be excluded, because all user phases showed standard deviations in P and Q above these thresholds.
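The screening rule above can be sketched as follows. A user is flagged if any of its phases has a standard deviation below 10 W in P or below 5 VAr in Q. The time-by-phase array layout and the phase-to-user mapping are assumptions for illustration.

```python
import numpy as np

P_STD_MIN_W = 10.0    # minimum standard deviation of P per user phase (W)
Q_STD_MIN_VAR = 5.0   # minimum standard deviation of Q per user phase (VAr)

def low_variation_users(P, Q, user_of_phase):
    """Users having any phase whose P or Q standard deviation falls below threshold.

    P, Q: (T, N) arrays in W and VAr for N user phases.
    user_of_phase: length-N list mapping each column to a user id.
    """
    weak = (P.std(axis=0) < P_STD_MIN_W) | (Q.std(axis=0) < Q_STD_MIN_VAR)
    # a single weak phase excludes the whole user
    return sorted({u for u, w in zip(user_of_phase, weak) if w})
```

An empty result corresponds to the case reported for this DN, in which no user needed to be excluded.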

3.2. Swing Bus Voltage Selection

The distribution transformer of the case study DN had no direct monitoring hardware. The smart metre voltage measurements at the user phases located closest to the transformer were therefore used to approximate the voltage at the transformer secondary terminals. The user located closest to the transformer on each phase was selected as the one showing the minimum standard deviation in night-time voltage. Voltage variation at a user depends on the line impedance between the transformer and that user, which increases with distance from the transformer. Users located closest to the transformer therefore usually show significantly lower voltage standard deviations than those located further away. Night-time voltages were used to avoid interference from daytime solar PV generation. The users identified as closest to the transformer in this manner matched the actual DN configuration.
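A sketch of this selection rule follows. For each phase, it picks the user-phase column with the lowest voltage standard deviation over night-time samples. The night window (00:00 to 05:00) and the array layout are assumptions, since the paper does not state which hours were treated as night-time.

```python
import numpy as np

def closest_user_per_phase(V, hours, phase_of_column, night=(0, 5)):
    """For each grid phase, the user-phase column with the lowest night-time voltage std.

    V: (T, N) voltages for N user phases; hours: (T,) hour of day of each row.
    phase_of_column: length-N phase labels, e.g. 'a', 'b', 'c'.
    night: [start, end) hours treated as night-time (assumed window).
    """
    mask = (hours >= night[0]) & (hours < night[1])
    std = V[mask].std(axis=0)  # night-time standard deviation per column
    best = {}
    for col, ph in enumerate(phase_of_column):
        if ph not in best or std[col] < std[best[ph]]:
            best[ph] = col
    return best
```

The returned column indices identify the proxies used to stand in for the transformer LV terminal voltage on each phase.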

To closely examine voltage variations across the three phases at the distribution transformer, a day was selected with high instantaneous solar generation and ample smart metre data at the user located closest to the transformer. High instantaneous solar generation typically induces significant voltage fluctuations. Detailed smart metre recordings improve visibility, enabling a more effective analysis of transformer voltage behaviour throughout the day. Approximate instantaneous solar generation at the transformer was estimated for the days covered by the available smart metre data using the open-source Python toolbox pvlib python V0.12.0 [46]. Accordingly, 16 February 2023 was identified as the day with the highest instantaneous solar generation at the transformer. However, on that day, smart metre data at the user located closest to the transformer was available only until 2:00 pm. Therefore, 17 February 2023 was used to examine the voltage variation at the distribution transformer. This day had the second-highest instantaneous solar generation at the transformer and full-day smart metre data at the user located closest to the transformer. Figure 2 shows the voltage variation at the distribution transformer over 17 February 2023, obtained from the user phases located closest to the transformer.
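The day-selection logic above can be expressed as a small function. It returns the day with the highest estimated peak solar generation among days with sufficient smart metre coverage. The daily estimates would come from a pvlib-based model. In this sketch, they and the coverage fractions are plain dictionaries with hypothetical values. The code does not reproduce how the paper computed them.

```python
def select_analysis_day(peak_gen_kw, coverage, min_coverage=1.0):
    """Day with the highest estimated peak solar generation among days whose
    smart metre data coverage (fraction of the day, 0 to 1) is sufficient.

    peak_gen_kw: dict day -> estimated peak instantaneous solar generation (kW)
    coverage:    dict day -> fraction of the day with smart metre data
    """
    eligible = [d for d in peak_gen_kw if coverage.get(d, 0.0) >= min_coverage]
    return max(eligible, key=peak_gen_kw.get) if eligible else None
```

With min_coverage=1.0, a day whose data ends at 2:00 pm is skipped even if it has the highest generation, mirroring the choice of 17 February over 16 February 2023.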

The standard deviations and means of the voltages at each phase of the distribution transformer are provided in Table 4 and Table 5, respectively. Both are given over the 11-month period and on 17 February 2023, and are estimated from the user phases located closest to the transformer.

Following analysis of the results in Table 4 and Table 5, a constant voltage of 1.05 pu was adopted as the swing bus voltage of the case study DN. The maximum standard deviation of 2.88 V found across Table 4 and Table 5 corresponds to a voltage deviation of 1.25% (0.0125 pu), which may be considered negligible. The mean voltages across all phases at the distribution transformer over the full 11-month period and over 17 February 2023 are 241.44 V and 240.91 V, equating to approximately 1.050 pu and 1.047 pu, respectively. Thus, assuming a constant voltage of 1.05 pu as the distribution transformer LV bus phase voltage ($V_{tx}$) of the case study DN is sufficiently accurate. This assumption may, however, require further investigation in future work. This applies especially where large voltage regulation variations exist due to large loads or high network impedance. Alternatively, the constant may be replaced with representative distribution transformer voltage data when suitable case studies with transformer monitors become available. The constant voltage of 1.05 pu should not be interpreted as a universal value; if $V_{tx}$ is treated as a constant, it should ideally be tuned or estimated based on the characteristics of the specific LV DN.
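The per-unit values quoted above follow from dividing by the 230 V nominal phase voltage. The short computation below reproduces them.

```python
V_BASE = 230.0  # nominal phase voltage used as the per-unit base (V)

def to_pu(volts):
    """Convert a voltage (or voltage deviation) in volts to per unit of V_BASE."""
    return volts / V_BASE

max_std_pu = to_pu(2.88)           # maximum standard deviation in Tables 4 and 5
mean_11_months_pu = to_pu(241.44)  # mean over the 11-month period
mean_17_feb_pu = to_pu(240.91)     # mean over 17 February 2023
print(f"{max_std_pu:.4f} {mean_11_months_pu:.3f} {mean_17_feb_pu:.3f}")
# prints: 0.0125 1.050 1.047
```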

3.3. Pandapower Model Assessment

The model-free voltage estimation approaches proposed in this paper were compared against a model-based benchmark using power flow simulations of a network modelled in Pandapower (PP). To assess the PP simulation results, the day used for the swing bus voltage analysis in Section 3.2 (17 February 2023) was considered, as it had high instantaneous solar generation and ample smart metre data. High instantaneous solar generation often induces significant voltage fluctuations, which can lead to notable discrepancies between PP simulation results and smart metre measurements. Substantial smart metre recordings, in turn, enhance visibility into the DN.
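To illustrate what a model-based, non-linear power flow computes, the sketch below uses a backward/forward sweep on a single-phase radial feeder with a fixed swing bus voltage. This is a swapped-in textbook technique for illustration only. It is not Pandapower's solver and not the paper's network model. It assumes per-phase complex impedances between consecutive buses and constant-power loads, with import-positive P and Q.

```python
import numpy as np

def radial_feeder_voltages(v_swing, z_branch, s_load, tol=1e-9, max_iter=100):
    """Backward/forward sweep power flow for a single-phase radial feeder.

    v_swing:  swing (transformer LV) bus voltage (V), held constant
    z_branch: (n,) complex impedance of branch k between bus k and bus k+1 (ohm)
    s_load:   (n,) complex power consumed at bus k+1 (VA; negative real part = PV export)
    Returns the (n,) complex voltages at buses 1..n.
    """
    n = len(s_load)
    v = np.full(n, v_swing, dtype=complex)  # flat start
    for _ in range(max_iter):
        i_load = np.conj(s_load / v)                # constant-power load currents
        i_branch = np.cumsum(i_load[::-1])[::-1]    # backward sweep: downstream current sum
        v_new = v_swing - np.cumsum(z_branch * i_branch)  # forward sweep: voltage drops
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("power flow did not converge")
```

With the swing bus at 1.05 pu of 230 V (241.5 V), uniform loads produce a falling voltage profile along the feeder, while uniform PV export produces a rising one. Tools such as Pandapower solve the same non-linear equations for meshed and unbalanced three-phase networks.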

This PP assessment was undertaken by comparison against smart metre measurements using two evaluation metrics, maximum absolute error (MaxAE) and root mean squared error (RMSE), defined in Section 7.1. The highest MaxAE at any user was 3.15 V (~1.4% of the nominal Australian LV), at user24. The highest RMSE was 1.14 V (~0.5% of the nominal Australian LV), at user08. Both can be considered negligible. A comparison of PP simulated and smart metre measured voltages on each phase of user24 over 17 February 2023 is shown in Figure 3.
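The metrics referred to here and in the literature review (MAE, MaxAE, RMSE) are sketched below in their standard forms. This assumes the definitions in Section 7.1 follow these standard formulas.

```python
import numpy as np

def error_metrics(v_est, v_ref):
    """MAE, MaxAE and RMSE (V) between estimated and reference voltages."""
    e = np.asarray(v_est, dtype=float) - np.asarray(v_ref, dtype=float)
    return {
        "MAE": float(np.mean(np.abs(e))),         # mean absolute error
        "MaxAE": float(np.max(np.abs(e))),        # worst single-instant error
        "RMSE": float(np.sqrt(np.mean(e ** 2))),  # penalises large errors more than MAE
    }
```

MaxAE captures the single worst instant, while RMSE summarises typical error. This is why the worst users under the two metrics (user24 and user08) can differ.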

The PP simulations demonstrated strong alignment with the real-world data, making PP an acceptable benchmark for evaluating the model-free voltage estimation approaches proposed in this paper. However, the PP simulations required an accurate network model constructed from detailed information. For the DN studied in this paper, these details were obtained from the corresponding DNSP. This level of detail is not commonly accessible to DNSPs, which presents a significant barrier to the practical deployment of model-based approaches. This limitation underscores the relevance and necessity of model-free methods such as those proposed in this paper.

4. Load Profiles for Linear Regression and Neural Network Models

The case study DN originally had a residential rooftop solar PV penetration of 43%. LR and NN model training was conducted under these original DN conditions. To test the proposed model-free approaches under full PV penetration, LR and NN model testing was undertaken after modifying the case study DN to have 100% residential rooftop solar PV penetration. This modification was accomplished by assigning PV installations to the user phases that did not originally have them, as discussed in detail later in this section. Accordingly, two datasets, named ‘Partial PV penetration’ and ‘Full PV penetration’, were created. The ‘Partial PV penetration’ dataset represents the original case study DN with 43% PV penetration. It combines P, Q load profiles at the user phases from smart metre measurements with V profiles at the user phases from PP simulations. The ‘Full PV penetration’ dataset represents the modified case study DN with 100% PV penetration. It combines P, Q load profiles at the user phases from smart metre measurements, together with the solar profiles from pvlib python [46] needed to reach 100% PV penetration, with V profiles at the user phases from PP simulations.

The proposed model-free approaches were trained on data corresponding to limited PV penetration (43%) and subsequently tested under modified DN conditions representing increased PV penetration (100%). This enabled the study to evaluate whether the developed LR and NN models can adapt to more demanding DN conditions and capture voltage variations resulting from increased PV integration. This is particularly relevant for DNSPs assessing future network planning. In practice, a real DN operates at a single PV penetration level over a given period. Large changes in penetration (e.g., from 43% to 100%) cannot occur within a short window such as the one-month period considered in this study. Analysing the impact of future PV integration scenarios therefore requires synthesised, modified DN conditions. Hence, the PP model was used as a consistent benchmark, providing the voltages against which the proposed model-free methods were both trained and tested. Moreover, when developing the modified DN conditions with full PV penetration, pvlib python was used to create the synthesised PV generation profiles of the newly added PV systems.

As stated in Section 2, 11 months of smart metre measurements were available for this study. One month of data was considered for the development of LR and NN models. Data from the ‘Partial PV penetration’ dataset corresponding to the first 3 weeks of the month was used for LR and NN model training. Data from the ‘Full PV penetration’ dataset corresponding to the remainder of the month (referred to as the last week) was used for LR and NN model testing. This chronological train-test split allowed the LR and NN models to learn from continuous variations in user load and solar generation. One month of data was chosen to analyse how accurately the developed LR and NN models can perform when trained over a considerably short period, i.e., 3 weeks.
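The chronological split can be sketched with NumPy datetime arithmetic. Rows from the first 21 days form the training mask and the remaining rows form the test mask. The timestamps are assumed to be a datetime64 array at 5 min resolution; the specific month used in the paper is not stated here.

```python
import numpy as np

def chronological_split(timestamps, train_days=21):
    """Boolean masks: rows in the first `train_days` days train, the remainder test."""
    cutoff = timestamps.min() + np.timedelta64(train_days, "D")
    train = timestamps < cutoff
    return train, ~train
```

For a hypothetical 28-day month sampled every 5 min, `np.arange(np.datetime64("2023-02-01T00:00"), np.datetime64("2023-03-01T00:00"), np.timedelta64(5, "m"))` yields 21 × 288 training rows and 7 × 288 test rows. Training features would then be taken from the ‘Partial PV penetration’ arrays using the train mask, and test features from the ‘Full PV penetration’ arrays using the test mask.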

The dependent and independent variables used for LR and NN model development are briefly introduced in Table 6; their derivation and application are discussed in detail below.

The actual P, Q load profiles of all user phases for the selected month were obtained by pre-processing the smart metre measurements as outlined in Section 3. They form part of the ‘Partial PV penetration’ dataset, which represents the original load scenario of the DN with 43% residential solar penetration. The new P, Q load profiles were formed by assigning solar PV installations to the user phases with no original solar PV installations. They form part of the ‘Full PV penetration’ dataset, which represents the modified load scenario of the DN with 100% residential solar penetration.
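The sketch below shows one plausible way of forming a new net P profile: subtracting a scaled PV generation profile from the measured net load. It assumes import-positive P (so PV generation lowers net P, consistent with exports appearing as negative P). It also assumes a normalised per-kW PV profile, for example from a pvlib simulation, and a unity power factor inverter that leaves Q unchanged. These are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def add_pv_to_profile(p_measured_kw, pv_size_kw, pv_norm):
    """Net active power after adding a PV system to a user phase.

    p_measured_kw: (T,) measured net load (kW, import positive)
    pv_size_kw:    PV rating assigned to the phase (kW)
    pv_norm:       (T,) normalised PV output (kW per kW of rating), e.g. from pvlib
    Assumes unity power factor, so the reactive power profile is left unchanged.
    """
    return np.asarray(p_measured_kw) - pv_size_kw * np.asarray(pv_norm)
```

For example, a constant 1 kW load with a newly assigned 5 kW system at 80% of rated output becomes a net export of 3 kW (−3 kW).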

As mentioned earlier, this study explored how accurately LR and NN models can capture the voltage variations caused by new PV installations. The most extreme configuration of a DN would be when all users have solar PV installed on all of their phases (i.e., 100% solar PV penetration). The new P, Q load profiles were therefore created to mimic a 100% solar PV penetration scenario for the case study DN. This required assigning new solar PV systems to the user phases originally without solar PV installations. A solar PV penetration of 100% may not be compatible with the actual PV hosting capacity of the DN, depending on the ratings of the installed solar panels. However, this was not relevant to characterising the proposed LR and NN models. The objective of this paper was neither to estimate the PVHC of the DN nor to establish the potential curtailment of local solar PV resources; these will be addressed in future studies.

To form the new P, Q load profiles, the user phases with originally installed solar PV systems first had to be identified. This was done using the solar PV identification approach presented in the authors' previous work [47], which relies on two observations. First, BTM generation usually results in net negative P measurements (assuming the full generation is not always consumed locally). Second, rooftop solar PV systems exhibit a unique generation pattern that distinguishes them from other BTM generators such as small wind turbines and battery energy storage systems (BESSs). The algorithm correctly identified all user phases with originally installed solar PV systems when compared with the available information on the actual DN configuration.
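The sketch below is a deliberately simplified heuristic based on the two observations above. It is not the algorithm of [47]. A phase is flagged as having rooftop PV if it shows net export and nearly all export samples fall within daylight hours. The export threshold, daylight window, and daylight share are hypothetical parameters chosen for illustration.

```python
import numpy as np

def has_rooftop_pv(p_w, hours, min_export_w=-100.0, day=(7, 18), min_day_share=0.9):
    """Simplified heuristic: flag rooftop PV if a phase exports (P below min_export_w)
    and at least min_day_share of its export samples fall within daylight hours.

    p_w: (T,) net active power (W, import positive); hours: (T,) hour of day.
    """
    export = p_w < min_export_w
    if not export.any():
        return False  # no net negative P, so no visible BTM generation
    daylight = (hours >= day[0]) & (hours < day[1])
    share_in_day = export[daylight].sum() / export.sum()
    return bool(share_in_day >= min_day_share)
```

A BESS discharging in the evening also produces negative P, but outside daylight hours, so the daylight-share condition is what separates it from PV in this simplified rule.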

Next, the sizes of the originally installed solar PV systems were estimated by applying the PV size estimation approach developed in the authors’ previous work [47]. In Table 7, the solar PV system sizes (PV panel ratings) estimated by the utilised approach are compared against the PV panel ratings detected by the network explorer software utilised by the DNSP.

As observable in Table 7, the solar PV size estimations were almost the same as those detected by the network explorer software utilised by the DNSP, except for user14. The maximum negative P measured by the smart metre at user14 is 8755 W, leading to a size estimation of 8.8 kW, which is significantly different from the 7 kW PV panel rating detected by the network explorer software. Two caveats should be noted here. First, the utilised algorithm depended entirely on the maximum negative P measured by the smart metres, so any fault in the smart metres could directly affect the accuracy of the solar size estimation. Second, these estimations were compared against those detected by the network explorer software utilised by the DNSP, which could have its own errors. Nevertheless, the estimated solar PV sizes were utilised for the rest of this study.

Each user listed in Table 7 originally had solar PV installed on one phase, except for user14, who originally had solar PV installed on all three phases. Therefore, 15 out of 35 user phases, corresponding to 43% of the total user phases in the case study DN, had original solar PV installations.

Then, the sizes of the newly installed solar PV systems had to be determined. These were chosen such that the existing distribution of solar PV sizes across the user phases of the original DN was maintained in the modified DN. Accordingly, the exact per-phase PV system sizes existing in the original DN (2.6 kW, 3.2 kW, 4.4 kW, 5 kW, 6 kW and 8.2 kW) were considered when assigning the new solar PV systems.

When phase-wise solar PV sizes were determined for user14, the total PV size became 9 kW (as opposed to 8.8 kW in Table 7). This was because the maximum negative P values on the three phases of user14 were 2591 W, 3163 W and 3001 W, corresponding to solar PV sizes of 2.6 kW, 3.2 kW and 3.2 kW, respectively. In a typical three-phase solar PV setup using a three-phase inverter, generation is evenly distributed across all three phases. However, the actual amount of power injected into the grid from each phase depends on how local consumption is balanced across them. Since this study analyses each user phase independently, solar PV installations were assigned separately to each identified phase. Assigning the same PV size to all three phases could result in unrealistically large systems for users who already have an existing installation on one phase. For example, user22 originally had a 6 kW system on phase b; assigning 6 kW to each phase would lead to a total of 18 kW, which is uncommon in practice. Therefore, when assigning new PV systems to three-phase users, care was taken to keep the combined capacity across all phases realistic, even though this led to different PV sizes across the phases of a three-phase user. As mentioned earlier, this was not an issue for this study, since each user phase was treated individually.

The distribution of solar PV system sizes utilised in the creation of the ‘Partial PV penetration’ and ‘Full PV penetration’ datasets is given in Table 8, where the newly added solar PV systems are coloured in the respective phase colours (i.e., red, green, blue).

Then, pvlib python was utilised to obtain the solar irradiance information over the real location of the case study DN and to simulate the performance of the solar PV systems. The generation profile of a solar module at the real location of each user was obtained over the last week of the selected month, using ‘Australia/NSW’ as the time zone and 245 m as the altitude, i.e., the generalised ground altitude of the DN location (242 m) plus the average height of a single-storey house (3 m).

The parameters and other criteria configured in pvlib python for solar PV system characterisation are given in Table 9.

Here, 220 W solar modules with individual micro inverters were considered. The solar panels were placed on the rooftops with a surface azimuth of 0° (North direction) and a surface tilt of 22° from the horizontal. Rooftop solar PV system output depends on the amount of solar irradiance received by the surface of the solar panels, which is influenced by many factors such as the solar PV system location and the surface azimuth and surface tilt of the solar panels. Different buildings in the same DN will have rooftop solar panels facing different directions with different surface tilts, depending on their location and roof design. To obtain the exact solar panel configuration of each building in a DN, it is possible to inspect every rooftop via a tool such as Google Maps; however, that was beyond the scope of this research. As this study focused on investigating how well the proposed model-free voltage estimation methods can mimic the model-based method, the selection of solar panel parameters did not significantly affect the results, since the same solar panel parameters were used with all methods.

The ultimate application of this study is to estimate the rooftop solar PVHC of a DN. For that purpose, a generic constant value for the surface azimuth and surface tilt of all solar panels is adequate; however, the resulting PVHC will depend on the selection of the solar panel parameters. For the solar panel settings above, 16 February 2023 was found to be the day with the highest instantaneous solar generation over the 11-month period considered in this study. This is a summer day on which the sun is high in the sky around midday. Thus, a solar panel with a slightly lower surface tilt would receive more solar insolation throughout the day and produce more solar generation, which would reduce the maximum amount of rooftop solar PV panels that can be installed in the DN without compromising any network performance limits or, in other words, would reduce the PVHC of the DN.

After the solar generation profiles of 220 W solar modules at the real locations of the users were obtained with pvlib python, they were multiplied by the respective number of solar modules to obtain the solar generation profiles corresponding to the sizes of the newly added solar PV systems at each user phase, which are coloured in Table 8. Finally, to create the new P, Q load profiles, these solar generation profiles were superimposed on the respective actual P, Q load profiles of the user phases with newly assigned solar PV systems over the last week of the selected month. The load profiles of user phases that originally had solar PV systems were not changed; thus, their new P, Q load profiles were the same as their actual P, Q load profiles over the last week of the selected month.
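The profile construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code, and it rests on assumptions not stated in the text: P is taken as positive for consumption and negative for export (consistent with net negative P indicating BTM generation), the micro inverters are assumed to operate at unity power factor so Q is left unchanged, and the module count is the PV size divided by 220 W, which need not be an integer. The function name `add_new_pv` is hypothetical.

```python
import numpy as np

MODULE_RATING_W = 220.0  # module size stated in the text


def add_new_pv(p_load_w, q_load_var, p_module_w, pv_size_kw):
    """Return (P, Q) profiles of a user phase after adding a new PV system.

    Assumptions (illustrative only): P > 0 means consumption and P < 0 export;
    the micro inverters run at unity power factor, so Q is not modified.
    """
    # Scaling factor = number of 220 W modules (may be non-integer, e.g. 2.6 kW)
    n_modules = pv_size_kw * 1000.0 / MODULE_RATING_W
    p_gen_w = n_modules * np.asarray(p_module_w, dtype=float)
    # Generation offsets consumption, so it is subtracted from the net load
    p_new = np.asarray(p_load_w, dtype=float) - p_gen_w
    q_new = np.asarray(q_load_var, dtype=float).copy()
    return p_new, q_new


# Usage: three 5 min samples, a 4.4 kW system (20 modules of 220 W)
p_new, q_new = add_new_pv([500.0, 800.0, 600.0], [50.0, 60.0, 70.0],
                          [0.0, 110.0, 220.0], 4.4)
```

User phases that already had PV would simply keep their measured profiles, as described in the text.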

The actual P and Q load profiles from smart metre measurements at all user phases (contained in the ‘Partial PV penetration’ dataset) over the first three weeks of the selected month formed the $P_{train}$ and $Q_{train}$ data. These were fed to the PP model, and the resulting user phase voltages from the PP simulations formed the $V_{train}$ data. These $P_{train}$, $Q_{train}$ and $V_{train}$ data were utilised for LR and NN model training. It should be remembered that 15 out of 35 user phases, accounting for 43% of the total user phases in the case study DN, originally had solar PV installed. It was important to ensure that the LR and NN models could capture the voltage variation induced by solar PV generation during the training phase, even though they were tested under extrapolated conditions. Thus, the LR and NN models were supplied with sufficient and representative data to effectively learn the key underlying relationships within the DN. The new P and Q load profiles at all user phases (contained in the ‘Full PV penetration’ dataset) over the last week of the selected month, obtained by increasing the solar PV penetration across the case study DN from the original 43% to 100%, formed the $P_{test}$ and $Q_{test}$ data. The $P_{test}$ and $Q_{test}$ datasets were formed in this manner to examine how accurately the proposed LR and NN models can predict on unseen data that include new solar PV installations. These were fed to the PP model, and the resulting user phase voltages from the PP simulations formed the $V_{test}$ dataset. These $P_{test}$, $Q_{test}$ and $V_{test}$ datasets were utilised for LR and NN model testing.

5. Linear Regression Models

To examine the robustness of the linearised power flow approximation in the proposed LR-based model-free voltage estimation approach, four representative months (December 2022 and January, May and June 2023) were selected, and the LR models were built on one month of data at a time. In total, 16 LR models were built in this study, four for each selected month, as introduced later in this section. According to the Australian Bureau of Meteorology, December recorded the highest and June the lowest monthly solar insolation within the 11 months. January and May, which are adjacent to December and June, respectively, and whose datasets reflected intermediate solar insolation conditions lying between these two bounds, were additionally analysed to enable a more rigorous assessment.

LR models were developed by applying the linearised approximation of power flow given in Equation (1) [48].

V_{h, t} \approx V_{t}^{0} + \sum_{\bar{h} = 1}^{N} a_{h, \bar{h}} P_{\bar{h}, t} + \sum_{\bar{h} = 1}^{N} b_{h, \bar{h}} Q_{\bar{h}, t}

(1)

$V_{h, t}$ : V of user phase $h$ at time $t$
$V_{t}^{0}$ : V at distribution transformer LV bus at time $t$
$P_{\bar{h}, t}$ : P of user phase $\bar{h}$ at time $t$
$Q_{\bar{h}, t}$ : Q of user phase $\bar{h}$ at time $t$
$a_{h, \bar{h}} = (\frac{\partial V_{h}}{\partial P_{\bar{h}}})$ : influence of P of user phase $\bar{h}$ on V of user phase $h$
$b_{h, \bar{h}} = (\frac{\partial V_{h}}{\partial Q_{\bar{h}}})$ : influence of Q of user phase $\bar{h}$ on V of user phase $h$
$N :$ number of user phases considered in the LR model

Here, both per-phase and three-phase LR models were built with no intercept in the linear relationship given in Equation (1), that is, by setting the parameter fit_intercept to False when fitting the LR model in scikit-learn [49]. Further, it is important to note the expected nature of the LR coefficients. According to the linear relationship given in Equation (1), the coefficient of $V_{tx}$ should be approximately one. The coefficients $a$ and $b$ of a considered phase should be mostly negative, as consumption of power by a user phase generally lowers the system voltage on that phase. Thus, in addition to developing LR models with the default setting positive=False, LR models with forced-positive coefficients were also built by setting the parameter positive to True and reversing the signs of the P and Q loads. Therefore, four distinct LR models, as given in Table 10, were examined for each of the four selected months, resulting in an overall analysis of 16 LR models in this study.
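The two fitting options can be illustrated with scikit-learn’s `LinearRegression`, which does expose the `fit_intercept` and `positive` parameters. The sketch below is illustrative only: it fits one user phase on synthetic, noise-free data generated from Equation (1) with four hypothetical user phases and all-negative sensitivities, so both fits should recover the generating coefficients. In real networks some cross-phase sensitivities may be positive, in which case the forced-positive constraint would bias them; the synthetic data here deliberately avoid that case. It does not reproduce the per-phase versus three-phase feature selection of Table 10.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T, N = 500, 4  # time steps and user phases (synthetic, illustrative only)

# Hypothetical "true" sensitivities, all negative as expected for Equation (1)
a_true = -rng.uniform(1e-6, 5e-6, N)  # V per W
b_true = -rng.uniform(1e-6, 3e-6, N)  # V per var
v_tx = 240.0 + rng.normal(0.0, 1.0, T)  # transformer LV bus voltage (V)
P = rng.uniform(-3000.0, 3000.0, (T, N))
Q = rng.uniform(-500.0, 500.0, (T, N))
V = v_tx + P @ a_true + Q @ b_true  # Equation (1), noise-free

# Original coefficients: features [V_tx, P, Q], no intercept
X = np.column_stack([v_tx, P, Q])
lr_orig = LinearRegression(fit_intercept=False).fit(X, V)

# Forced-positive coefficients: reverse the signs of P and Q, then constrain
# every coefficient (including the one on V_tx) to be non-negative
X_rev = np.column_stack([v_tx, -P, -Q])
lr_pos = LinearRegression(fit_intercept=False, positive=True).fit(X_rev, V)

print(lr_orig.coef_[0])  # coefficient of V_tx, expected to be close to one
```

With sign-reversed loads, the forced-positive model’s coefficients are simply $1$, $-a$ and $-b$, so both variants describe the same linear relationship whenever the true sensitivities are non-positive.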

6. Neural Network Models

An NN begins with an input layer, which receives features from the dataset, and ends with an output layer that produces predictions tailored to a specific task such as class labels in classification or continuous values in regression. Between these layers lie the hidden layers, each composed of a configurable number of artificial neurons that enable the NN to learn complex patterns [50]. The output of an artificial neuron is determined by applying an activation function to the weighted sum of its inputs offset by a bias parameter, as illustrated in Figure 4 and given in Equation (2).

\hat{y} = g ((\sum_{i = 1}^{n} x_{i} w_{i}) + b)

(2)

$n :$ number of inputs to the artificial neuron
$x_{i} :$ $i$th input of the artificial neuron
$w_{i} :$ $i$th weight of the artificial neuron
$b :$ bias of the artificial neuron
$g :$ activation function of the artificial neuron
$\hat{y} :$ output of the artificial neuron
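Equation (2) can be evaluated directly. The following minimal sketch uses ReLU as an example activation $g$; the function name `neuron_output` and the input values are illustrative and not taken from the paper.

```python
import numpy as np


def relu(z):
    # ReLU: zero for negative inputs, identity for positive inputs
    return np.maximum(z, 0.0)


def neuron_output(x, w, b, g=relu):
    """Equation (2): y_hat = g(sum_i x_i * w_i + b)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return g(np.dot(x, w) + b)


x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
y_pos = neuron_output(x, w, b=0.1)   # weighted sum 0.1, plus bias 0.1
y_zero = neuron_output(x, w, b=-0.5)  # weighted sum plus bias is negative, ReLU clips it
```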

The depth (number of hidden layers) and width (number of neurons per layer) of an NN significantly influence its capacity to capture relationships within the dataset, with deeper or wider architectures generally being more powerful yet computationally demanding [50]. The performance and generalisation ability of NN models, however, depend not only on the architecture but also on a set of configurable parameters known as hyperparameters. These are not learnt from data during training but must be specified or tuned beforehand. Key hyperparameters include the number of hidden layers, the number of neurons per layer, activation functions, learning rate, optimiser choice, batch size, epochs, regularisation methods, and input data scaling techniques among others. Together, these hyperparameters form the backbone of NN design and training, and their careful configuration is critical for achieving high performance and robust generalisation [50].

To introduce non-linearity and enhance the learning capacity of an NN model, activation functions such as ReLU, Tanh, Swish or Sigmoid are applied in the hidden layers, while the output layer typically uses a Softmax function for classification or a Linear function for regression. NN learning is guided by an error function (also known as a loss function), which quantifies the difference between predicted and actual outputs. Common error functions include categorical cross-entropy for classification and mean squared error (MSE) for regression. NN weight updates during training are performed by an optimiser, such as stochastic gradient descent or Adam, which relies on gradients derived from the error function. Another critical hyperparameter in NNs is the learning rate, which controls the magnitude of weight adjustments. If the learning rate is set too high, training may become unstable, and if it is set too low, learning may become slow or stagnant [50]. NN training is typically conducted over multiple epochs, with each epoch representing a full pass through the training dataset. The data are typically divided into smaller subsets called batches, defined by the batch size, which affects computational efficiency as well as the quality of gradient estimates. Data normalisation in NNs using methods such as MinMaxScaler ensures balanced feature influence, improves gradient flow, and leads to faster, more stable training with better convergence and accuracy. Moreover, regularisation techniques such as L1 or L2 penalties or dropout are employed to reduce overfitting by constraining NN model complexity [50]. How these hyperparameters were selected and tuned for this study is discussed in the following section.

Hyperparameter Selection

NN parameters such as weights and biases are learnt and optimised from data during the NN training process, whereas NN hyperparameters are configurable and not learnt from data. Even though NNs are powerful, they are prone to memorising the training data rather than learning generalisable relationships if they are not carefully designed. Thus, it is necessary to carefully select and tune hyperparameters to achieve a desired level of NN performance. When selecting the hyperparameters for this study, the related literature [13] was closely consulted. In this study, several fixed and varying hyperparameters were incorporated in NN modelling, and their impact on NN performance was investigated. The P and Q load profiles at user phases formed the NN inputs, while the V profiles at user phases formed the NN outputs. Two conditions, ‘Including V_tx’ and ‘Excluding V_tx’, corresponding respectively to the inclusion and exclusion of V_tx in the NN input space, were also considered in this study. A summary of the fixed and varying hyperparameters investigated in this study is provided in Table 11.

NN models in this study considered two sets of inputs corresponding to the two conditions ‘Including V_tx’ and ‘Excluding V_tx’. Under condition ‘Including V_tx’, the NN input space consisted of 73 inputs, namely instantaneous P, Q data at the 35 user phases and instantaneous V_tx on the three phases, while under condition ‘Excluding V_tx’, the NN input space consisted of 70 inputs, namely instantaneous P, Q data at the 35 user phases. The NN output space comprised 35 outputs specifying the instantaneous V estimations at the 35 user phases. The NNs were designed to have only one hidden layer to decrease complexity and increase computational efficiency; multiple hidden layers were not used, as the desired accuracy could be achieved with a single hidden layer. The number of neurons in the hidden layer was varied among 210, 245 and 280, corresponding to six, seven and eight times the number of NN outputs. These numbers were chosen with due reference to [13], while ensuring that the hidden layer of the NN models was sufficiently wide to effectively capture the underlying relationships between inputs and outputs.

In line with [13], the three activation functions ReLU, Tanh and Swish were explored in the hidden layer, and the Linear activation function was applied in the output layer to accomplish the regression task of the NN model. ReLU (Rectified Linear Unit) is simple and commonly used in NNs. ReLU is highly efficient, as it outputs zero for negative inputs and the input itself for positive inputs. Tanh (Hyperbolic Tangent) outputs zero-centred values between −1 and 1, which helps balance the data. However, Tanh can slow down learning when the inputs are large in magnitude, because it saturates and its gradient approaches zero. Swish is a more recent activation function whose curve dips slightly below zero for negative inputs. Swish is smoother and often performs better in deep NNs but comes at a slightly higher computational cost. The Linear activation function outputs its input directly without any transformation and is often used in the output layer of regression models, where the goal is to predict continuous values. The NN models in this study employed MSE as the error function, since it is well suited to regression tasks in which large errors must be heavily penalised. MSE quantifies the difference between predicted and actual outputs as the average of the squared differences between them. Adam (Adaptive Moment Estimation), a widely adopted optimiser known for its efficiency and robustness, was selected as the optimiser of the NN models in this study. Adam promotes faster convergence and improved performance on complex and noisy datasets by adaptively adjusting the learning rate for each parameter based on the gradients derived from the error function. To scale these adjustments, Adam relies on a base learning rate, which was manually set to one of 10⁻³, 10⁻⁴ and 10⁻⁵ in this study. These learning rate values were chosen in direct reference to [13], and NNs with a learning rate of 10⁻⁴ performed best under both conditions, as observable in Section 7.
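The activation functions and the MSE error function discussed above can be written out explicitly as a small NumPy sketch. Swish is shown in its common form with $\beta = 1$, i.e. $z \cdot \mathrm{sigmoid}(z)$; whether the study’s framework used exactly this variant is an assumption.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def tanh(z):
    # Zero-centred output in (-1, 1); saturates for large |z|
    return np.tanh(z)


def swish(z):
    # Swish with beta = 1: z * sigmoid(z); dips slightly below zero for z < 0
    return z / (1.0 + np.exp(-z))


def linear(z):
    # Identity, as used in the output layer for regression
    return z


def mse(y_true, y_pred):
    """Mean of the squared differences between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)


z = np.array([-2.0, 0.0, 2.0])
activations = {f.__name__: f(z) for f in (relu, tanh, swish, linear)}
```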

In NN models, the number of epochs determines the duration of training, with too few potentially leading to underfitting and too many increasing the risk of overfitting. In this study, the number of epochs was capped at 1000, allowing for up to 1000 complete passes over the entire training dataset. With early stopping enabled, training could terminate earlier if the NN performance ceased to improve for a predefined number of consecutive epochs, referred to as patience. Early stopping is a regularisation technique that helps to optimise NN generalisation and efficiency by preventing overtraining and reducing unnecessary computational effort. Early stopping monitors a metric such as loss or accuracy and stops training if the metric does not improve within the patience [50]. In this study, early stopping terminated training if the loss did not improve for 25 consecutive epochs (monitor = ‘loss’, mode = ‘min’, patience = 25, min_delta = 0 (default)). Batches were introduced to update the NN weights more efficiently and effectively by dividing the data into manageable chunks, improving both training speed and learning stability. Batch sizes of 24, 48 and 72, corresponding to 2, 4 and 6 h of data at 5 min resolution, respectively, were considered. As evident from Section 7, under both conditions examined in this study, the NNs achieved their best performance with the smallest batch size considered, which was 24. However, the batch size was not lowered further, despite the potential for improved generalisation with smaller batch sizes, as doing so could decrease the training efficiency and the desired performance had already been attained.
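The early-stopping rule and the batch-size arithmetic described above can be sketched in plain Python. The function `stopping_epoch` is a hypothetical helper that reproduces the stated rule (mode ‘min’, patience 25, min_delta 0, cap of 1000 epochs) on a precomputed loss history; it is not the callback implementation of any particular framework.

```python
def stopping_epoch(losses, patience=25, min_delta=0.0, max_epochs=1000):
    """Return how many epochs run before early stopping halts training.

    Training stops once the monitored loss (mode 'min') has not improved by
    more than min_delta for `patience` consecutive epochs, or at max_epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best = loss  # improvement: reset the patience counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return min(len(losses), max_epochs)


# Batch sizes and the time span they cover at 5 min resolution
BATCH_SIZES = (24, 48, 72)
batch_hours = [b * 5 / 60 for b in BATCH_SIZES]

# Loss improves for 10 epochs, then plateaus: training stops at epoch 10 + 25
history = [1.0 / (k + 1) for k in range(10)] + [0.5] * 100
```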

To ensure numerical stability and improve learning efficiency in NN models, the MinMaxScaler, a normalisation technique that transforms data to a fixed range, typically [0, 1] or [−1, 1], was employed. In this study, the fixed range of [0, 1] was utilised, and the scaling was performed on both inputs and outputs by fitting the scalers only on the training dataset to avoid data leakage. After prediction, the outputs were inverse transformed to the original scale to ensure that the predictions were interpretable in the real-world context. The influence of the L2 regularisation technique (also known as Ridge) was evaluated in this study by implementing the NNs both with and without its application. Regularisation in NNs refers to techniques used to reduce overfitting, improve generalisation, and ensure that the NN performs well on unseen data. L2 regularisation adds the sum of squared weights to the loss function and encourages the NN to keep its weights small, which can reduce overfitting. The L2 regularisation factor directly controls how strongly the NN penalises large weights and requires careful tuning: a factor that is too small results in almost no regularisation and an overfitted NN, whereas one that is too large results in strong regularisation and an underfitted NN. In this study, three L2 regularisation factors, 10⁻⁵, 10⁻⁶ and 10⁻⁷, were considered for the NN hidden layer with due reference to [13], and the NNs with an L2 regularisation factor of 10⁻⁷ performed best under both conditions investigated in this study, as observable in Section 7. The L2 regularisation factor was not lowered further, because doing so could reduce regularisation and result in an overfitted NN.
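The scaling procedure and the L2 penalty can be illustrated as follows. The sketch uses scikit-learn’s `MinMaxScaler` with a hypothetical two-feature dataset: one P-like feature and one constant V_tx column of 1.05 pu, mirroring the constant transformer voltage assumed under ‘Including V_tx’. It shows that fitting on training data only can map unseen test values (here, a larger export than seen in training) outside [0, 1], and that a constant feature is mapped to zero. The helper `l2_penalised_mse` is illustrative and simply adds the factor times the sum of squared weights to the MSE.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# Hypothetical training data: a P feature (W) and a constant V_tx (pu)
X_train = np.column_stack([rng.uniform(-3000.0, 3000.0, 100),
                           np.full(100, 1.05)])
# Hypothetical test data with a larger export than any training sample
X_test = np.array([[-4500.0, 1.05], [0.0, 1.05], [2000.0, 1.05]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_s = scaler.fit_transform(X_train)  # fit on training data only
X_test_s = scaler.transform(X_test)        # reuse training min/max (no leakage)
X_test_back = scaler.inverse_transform(X_test_s)  # back to physical units


def l2_penalised_mse(y_true, y_pred, weights, l2_factor):
    """MSE plus an L2 penalty: l2_factor * sum of squared weights."""
    mse = np.mean((np.asarray(y_pred, float) - np.asarray(y_true, float)) ** 2)
    return mse + l2_factor * sum(np.sum(np.square(w)) for w in weights)
```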

Unlike the LR models, which were evaluated separately across four different months, the NN models in this study were assessed only for May 2023. This approach was justified by the findings from the LR analysis, which showed that model-free voltage estimation was not significantly influenced by solar insolation variations. Additionally, unlike the LR models, the NN models are not constrained by a linearised power flow approximation, so testing them on a single generic scenario of power consumption was considered sufficient. As mentioned earlier, this study examined two conditions (‘Including V_tx’ and ‘Excluding V_tx’) with NNs derived from two distinct input sets. Under each condition, 324 hyperparameter combinations were evaluated. These hyperparameter combinations comprise a fixed number of outputs, fixed number of hidden layers, fixed output activation function, fixed error function, fixed optimiser, fixed number of epochs, fixed normalisation, three counts of hidden layer neurons, three hidden layer activation functions, three learning rates, three batch sizes, and activation or deactivation of L2 regularisation with three L2 regularisation factors (where L2 regularisation was enabled). Thus, overall, this study investigated 648 (= 324 × 2) distinct NN models.
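The count of 324 combinations follows from 3 × 3 × 3 × 3 varying values multiplied by four L2 options (disabled, or enabled with one of three factors). The grid can be enumerated as below; the string labels are illustrative and not identifiers from the study.

```python
from itertools import product

neurons = [210, 245, 280]               # 6, 7 and 8 times the 35 outputs
activations = ["relu", "tanh", "swish"]
learning_rates = [1e-3, 1e-4, 1e-5]
batch_sizes = [24, 48, 72]
l2_options = [None, 1e-5, 1e-6, 1e-7]   # None = L2 regularisation disabled
conditions = ["Including V_tx", "Excluding V_tx"]

grid = list(product(neurons, activations, learning_rates, batch_sizes, l2_options))

# Input sizes: P and Q at 35 user phases, plus V_tx on 3 phases if included
n_inputs = {"Including V_tx": 2 * 35 + 3, "Excluding V_tx": 2 * 35}

total_models = len(grid) * len(conditions)
```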

To assess the performance of these NN models, k-fold cross-validation was employed in this study. K-fold cross-validation is a statistical technique commonly employed to evaluate the generalisation performance of NNs. Here, the original dataset is partitioned into k equally sized subsets or folds. The NN model is then trained and validated k times, each time using a different fold as the validation set while the remaining k–1 folds are used for training. This process ensures that every data point is used for both training and validation, thereby reducing bias associated with a single train-test split. The results from each iteration are aggregated, typically by averaging, to produce a more reliable estimate of the NN performance. In this study, three-fold cross-validation was independently applied to the training dataset of each NN model, and the MSE was averaged across the three iterations for each NN model. Then, under each condition, the ten hyperparameter combinations corresponding to the ten NN models (five with L2 regularisation enabled and five with L2 regularisation disabled) with least averaged MSE from three-fold cross-validation were identified. After that, under each condition, using the ten identified hyperparameter combinations, ten new NN models were generated. Here, the original training dataset of each condition was randomly separated for training (80% of original training dataset) and validation (remaining 20% of original training dataset) and the entire testing dataset was used for testing. Then, under each condition, the NN model with least root mean squared error (RMSE) and least maximum absolute error (MaxAE) was identified as the best-performing NN model under the respective condition. Figure 5 illustrates how the MSE loss improved during training and validation of these two NN models.
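The three-fold cross-validation and the subsequent shortlisting can be sketched as follows. The text does not state whether the folds were shuffled, so this sketch assumes contiguous folds. To keep it self-contained, a least-squares model stands in for the NN; the helpers `kfold_indices`, `cv_mse` and `top_candidates` are hypothetical names.

```python
import numpy as np


def kfold_indices(n_samples, k=3):
    """Split indices 0..n_samples-1 into k contiguous, nearly equal folds."""
    return np.array_split(np.arange(n_samples), k)


def cv_mse(X, Y, fit, predict, k=3):
    """Average validation MSE over k folds (each fold validates once)."""
    folds = kfold_indices(len(X), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], Y[train_idx])
        scores.append(np.mean((predict(model, X[val_idx]) - Y[val_idx]) ** 2))
    return float(np.mean(scores))


def top_candidates(results, n_each=5):
    """results: (config, mean_mse, l2_enabled) tuples.

    Return the n_each best with L2 enabled and the n_each best without.
    """
    with_l2 = sorted((r for r in results if r[2]), key=lambda r: r[1])[:n_each]
    without = sorted((r for r in results if not r[2]), key=lambda r: r[1])[:n_each]
    return with_l2 + without


# Stand-in model: ordinary least squares instead of an NN
def fit(X_tr, Y_tr):
    return np.linalg.lstsq(X_tr, Y_tr, rcond=None)[0]


def predict(W, X_val):
    return X_val @ W


rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
Y = X @ rng.normal(size=(5, 3))  # noise-free linear targets
score = cv_mse(X, Y, fit, predict, k=3)
```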

7. Results and Discussion

This section evaluates the performance accuracy and efficiency of the 16 LR models and the two identified NN models, which were trained with the ‘Partial PV penetration’ dataset and tested with the ‘Full PV penetration’ dataset, as discussed in the previous sections. This section also provides an overall comparison of the proposed model-free voltage estimation methods using LR and NNs against the examined model-based voltage estimation method using PP in terms of data requirement, efficiency and accuracy. The distinctive advantages of LR over NNs in model-free voltage estimation, specifically with regard to practicality and interpretability, are highlighted as well. Furthermore, how the proposed model-free voltage estimation approaches can contribute to sustainable benefits is also discussed in this section.

7.1. Accuracy of Proposed Model-Free Voltage Estimation Methods

The predicted voltages from the LR and NN models were compared against the PP simulated voltages. The performance accuracy of the LR and NN models was investigated using three evaluation metrics: the coefficient of determination (R²), root mean squared error (RMSE) and maximum absolute error (MaxAE). R² is the proportion of variance in the dependent variable (voltages at user connection points) that can be explained by a model and therefore provides a measure of the goodness of fit. R² is at most one, and the higher the R², the better a model fits the target values (PP simulated voltages); on unseen test data, R² can even be negative if a model performs worse than simply predicting the mean of the targets. For example, a model with an R² of 0.90 accounts for 90% of the variance in the dependent variable. RMSE is the square root of the mean squared difference between model predictions and target values and indicates how accurately a model can predict the target values. The lower the RMSE, the better the model performance; a perfect model (a hypothetical model that always predicts the target values exactly) would have an RMSE of zero. MaxAE is the largest absolute error in model predictions when compared against the target values and thus indicates the worst-case prediction of a model. A large MaxAE suggests that the respective data point is an outlier, or that the model cannot accurately capture the underlying relationship between the dependent and independent variables at the instance under consideration.
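The three metrics can be computed as follows. This is a minimal NumPy sketch; how the study aggregated the metrics across user phases (for example, pooling all phases and time steps) is not stated here, so the functions simply operate on whatever arrays they are given.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def max_ae(y_true, y_pred):
    """Largest absolute error, i.e. the worst-case prediction."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(diff)))


targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.0, 3.0, 5.0]  # one prediction off by 1
```

For these values, one error of 1 out of four samples gives an RMSE of 0.5 and a MaxAE of 1.0, while a constant prediction that is worse than the target mean yields a negative R².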

7.1.1. Linear Regression Models

The R², RMSE and MaxAE of all 16 LR models during testing are provided in Table 12. R² close to one along with RMSE and MaxAE close to zero will reflect that the LR predictions are close to PP simulation results, and thus the respective LR model is effectively mimicking the PP simulations.

It is observable from Table 12 that the ‘Three-phase, original coefficients’ LR model shows the best performance, with the highest R², the lowest RMSE and the lowest MaxAE across all four months. It is also visible that the three-phase LR models are more accurate than the per-phase LR models. The user phase voltages estimated by the LR models when tested against those simulated by the PP model for the month with maximum solar generation variation (May 2023) are plotted in Figure 6, where the PP simulated voltages increase along the x axes and the LR predicted voltages increase along the y axes. When the LR model predictions were the same as or close to the PP simulations, the markers lay on or close to the diagonal plotted in grey. Accordingly, the ‘Three-phase, original coefficients’ LR model, whose markers lie closest to the diagonal in Figure 6, produced the best voltage predictions. For the LR models ‘Per-phase, original coefficients’, ‘Per-phase, forced-positive coefficients’ and ‘Three-phase, forced-positive coefficients’, the markers became more dispersed from the diagonals as the voltages increased, indicating that their predictions deviated more from the PP simulation results at higher voltages.

From Table 12, it is evident that the performance of the LR models is not significantly affected by the level of solar insolation, because all LR models developed independently on separate months produced approximately the same outcome. This is further illustrated by the boxplots in Figure 7, which depict the spread of errors between the simulated PP voltages and the predicted LR voltages (PP voltages − LR voltages) at each user phase for the ‘Three-phase, original coefficients’ LR model developed for each month. In Figure 7, the user phases on phase a, phase b and phase c are coloured in red, green and blue, respectively; the user phases on Feeder1 are shown in the respective dark colours, while the user phases on Feeder2 are shown in the respective light colours.

As mentioned earlier, the voltage variation at a user is proportional to the corresponding line impedance, which is determined by the distance from the distribution transformer to the user. Thus, the voltage variations at users tend to increase along a feeder, meaning that users located close to the distribution transformer will have lower voltage variation than those located further down the feeder. From Figure 7, it is visible that on each phase the error spreads increase along the feeders, approximately following the geographical location of the users applied in labelling the user phases in Figure 1. This indicates that, as the voltage variation at a user increases, or as the line impedance increases down the feeders, the voltage prediction accuracy of the proposed LR models tends to reduce. It is also observable that the errors, computed by subtracting the LR voltages from the PP voltages, are predominantly negative. This indicates that the LR models mostly overestimate the voltages, i.e., the majority of the LR-estimated voltages exceed the corresponding PP voltages.

7.1.2. Neural Network Models

An overview of an NN model developed in this study is presented in Figure 8. Each neuron in the hidden and output layers operates as depicted in Figure 4. To maintain visual clarity, individual weights are not shown in Figure 8. Here, $n$ refers to the number of inputs to the NN model, $x_i$ refers to the $i$th input of the NN model, $\mathit{Bias}_H$ refers to the set of biases for the hidden layer, $\mathit{Bias}_O$ refers to the set of biases for the output layer, $g_H$ refers to the activation function of the hidden layer, $g_O$ refers to the activation function of the output layer, $m$ refers to the number of outputs/targets of the NN model, $\hat{y}_i$ refers to the $i$th output of the NN model and $y_i$ refers to the $i$th target of the NN model.
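The single-hidden-layer structure of Figure 8 can be illustrated by a NumPy forward pass. This is a minimal sketch under assumptions: the weight matrices `W_H` and `W_O` are names introduced here (the weights are not shown in Figure 8), ReLU for $g_H$ and a linear $g_O$ are illustrative choices (the study varied the hidden-layer activation), and the layer sizes are hypothetical. The models in this study were built with Keras.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def forward(x, W_H, bias_H, W_O, bias_O, g_H=relu, g_O=identity):
    """Forward pass of a single-hidden-layer NN in the notation of Figure 8.

    x:      inputs x_1..x_n, shape (batch, n)
    W_H:    hidden-layer weights, shape (n, hidden)   (hypothetical name)
    bias_H: hidden-layer biases Bias_H, shape (hidden,)
    W_O:    output-layer weights, shape (hidden, m)   (hypothetical name)
    bias_O: output-layer biases Bias_O, shape (m,)
    Returns the outputs y_hat_1..y_hat_m, shape (batch, m).
    """
    h = g_H(x @ W_H + bias_H)     # each hidden neuron: weighted sum plus bias, then g_H
    return g_O(h @ W_O + bias_O)  # each output neuron: weighted sum plus bias, then g_O

# Hypothetical sizes: n = 6 inputs, 8 hidden neurons, m = 3 outputs
rng = np.random.default_rng(42)
n, hidden, m = 6, 8, 3
W_H, bias_H = rng.normal(size=(n, hidden)), np.zeros(hidden)
W_O, bias_O = rng.normal(size=(hidden, m)), np.zeros(m)
y_hat = forward(rng.normal(size=(5, n)), W_H, bias_H, W_O, bias_O)
print(y_hat.shape)  # (5, 3)
```

During training, each $\hat{y}_i$ would be compared with the corresponding target $y_i$ through a loss such as the mean squared error.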

Table 13 provides the varying hyperparameters and the evaluation metrics (RMSE, MaxAE and R²) of the NN models identified as the best performing (with the lowest RMSE and lowest MaxAE) during testing (at ‘Full PV penetration’) under each condition investigated in this study (‘Including V_tx’ and ‘Excluding V_tx’). Figure 9 plots the voltage estimates from the two NN models developed under these two conditions against the PP simulated voltages, with the PP simulated voltages on the x axes and the NN predicted voltages on the y axes. Where the NN predictions matched or closely approximated the PP simulations, the markers lie on or close to the grey diagonal.
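For reference, the three evaluation metrics can be computed as follows. This sketch assumes the conventional definitions: RMSE as the square root of the mean squared error, MaxAE as the largest absolute error, and R² as one minus the ratio of the residual to the total sum of squares. Whether the errors in this study were pooled across all user phases or computed per phase is not stated here, so the function simply pools whatever array it is given; the example voltages are made up.

```python
import numpy as np

def evaluate(v_true, v_pred):
    """Return (RMSE, MaxAE, R^2) between reference (e.g. PP) and predicted voltages."""
    v_true = np.asarray(v_true, dtype=float).ravel()
    v_pred = np.asarray(v_pred, dtype=float).ravel()
    err = v_true - v_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    max_ae = float(np.max(np.abs(err)))
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((v_true - v_true.mean()) ** 2)  # total sum of squares
    r2 = float(1.0 - ss_res / ss_tot)
    return rmse, max_ae, r2

rmse, max_ae, r2 = evaluate([230.0, 232.0, 234.0, 236.0],
                            [230.1, 231.9, 234.1, 235.9])
# rmse ≈ 0.1, max_ae ≈ 0.1, r2 ≈ 0.998
```

Reporting MaxAE alongside RMSE matters here because a model can have a small average error while still misestimating a few user phases by a larger margin.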

As Table 13 and Figure 9 show, the NN models achieved almost the same accuracy under both conditions. Across the multiple iterations undertaken to select the best-performing NN model, the number of neurons in the hidden layer, the activation function of the hidden layer and the number of epochs varied. However, the best performance was always obtained by NN models with L2 regularisation enabled, with the same regularisation factor for a given condition in all iterations. Further, the identified NN models always had a learning rate of 10⁻⁴ and the smallest batch size considered, 24. Under the ‘Including V_tx’ condition, early stopping was triggered and limited the number of epochs, making training more efficient.
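Early stopping, as referred to above, halts training once the validation loss stops improving. The pure-Python sketch below implements the usual patience-based rule, similar in spirit to Keras's `EarlyStopping` callback monitoring validation loss. The `patience` and `min_delta` values and the loss sequence are hypothetical, since the settings used in the study are not reported here.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    val_losses: validation loss after each epoch (epochs counted from 0).
    Training stops once `patience` consecutive epochs fail to improve the best
    loss by more than `min_delta`; otherwise it runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Validation loss improves for four epochs and then plateaus
losses = [1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```

With the plateau in this example, training ends at epoch 6 instead of running all nine epochs, which is the efficiency gain described above; the weights from the best epoch (epoch 3) would typically be retained.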

The ‘Excluding V_tx’ results show that the NN models could capture the underlying relationships between inputs and outputs even without information on the transformer voltage. However, had actual (time-varying) transformer voltage measurements been available, the NN models might have performed even better, since such measurements carry more information about the underlying relationships than the constant value assumed in this study. Because a constant voltage of 1.05 pu was assumed at the distribution transformer LV bus under the ‘Including V_tx’ condition, MinMaxScaler normalisation transformed this constant voltage to zero at all time instances. These results should therefore not be used to judge the importance of V_tx as a feature: it was constant by design and could have provided deeper insights had its actual variation been incorporated. Nevertheless, even without this feature, the NN models provided sufficient accuracy, whereas the LR approach required it, either as an approximate constant value, as adopted in this study, or as actual measurements from a transformer monitor when available.
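Why a constant feature collapses to zero under min–max scaling follows from the scaling formula $x' = (x - x_{\min})/(x_{\max} - x_{\min})$: when $x_{\max} = x_{\min}$ the denominator is zero. scikit-learn's `MinMaxScaler` handles a zero range by treating it as one, so with the default [0, 1] range the scaled feature becomes $x - x_{\min} = 0$ everywhere. The NumPy sketch below reproduces that behaviour; it is an illustration rather than scikit-learn's own code, and the input values are hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1]; zero-range (constant) columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0.0] = 1.0  # avoid division by zero for constant features
    return (X - col_min) / col_range

# Columns: hypothetical V_tx fixed at 1.05 pu, and a varying power input in kW
X = np.array([[1.05, 0.0],
              [1.05, 2.5],
              [1.05, 5.0]])
X_scaled = min_max_scale(X)
# Column 0 (V_tx) becomes all zeros; column 1 becomes 0, 0.5, 1
```

A column that is identically zero contributes nothing beyond what the biases already provide, which is why the constant V_tx offers the NN no additional information.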

The boxplots in Figure 10 depict the spread of errors between simulated PP voltages and predicted NN voltages (PP voltages − NN voltages) at each user phase under the ‘Including V_tx’ and ‘Excluding V_tx’ conditions during NN testing (at ‘Full PV penetration’). User phases on phase a, phase b and phase c are coloured red, green and blue, respectively; user phases on Feeder1 are shown in the respective dark colours, while those on Feeder2 are shown in the respective light colours. Unlike the LR results, these plots show no significant pattern in error distribution across the user phases or along the LV feeders. However, the NN errors for all user phases were small and acceptable under both conditions examined in this study.

7.2. Efficiency of Proposed Model-Free Voltage Estimation Methods

All model-based and model-free simulations were implemented in Python in a Databricks workspace built on Amazon Web Services cloud infrastructure. The Databricks cluster comprised one driver node and two worker nodes, each with 32 GB of memory and four CPU cores. The NN models were constructed using the Keras [51] interface within the TensorFlow [52] framework, and additional data pre-processing and model evaluation tasks were conducted using the scikit-learn library.

To estimate voltages at all user phases in the DN for one timestamp, the model-based method using PP took 607 ms, whereas the proposed model-free method took only 0.001 ms on average with a selected LR model. The corresponding execution times of the NN models were approximately 0.060 ms and 0.058 ms per timestamp under the ‘Including V_tx’ and ‘Excluding V_tx’ conditions, respectively. Optimised code and more powerful hardware could further reduce the execution time of all methods; however, this was not critical for this study, which focused on comparing different approaches to voltage estimation. What is clear is that the proposed model-free approaches are substantially faster (by more than three orders of magnitude in these tests) than the tested model-based method and therefore have stronger potential for real-time DN estimation. For example, consider the allocation of dynamic operating envelopes (DOEs), which determine the upper and lower limits of power imports and/or exports of distributed energy resources (DERs) within a given time interval, typically between 5 and 30 min [53]. DOE allocation involves identifying optimal inverter control settings by estimating voltages at DER connection points every 5 to 30 min; this is possible with all approaches examined in this study, but it is more practically feasible with the proposed model-free methods.
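Part of the LR speed advantage comes from the fact that, once trained, an LR model reduces voltage estimation to a single matrix–vector product per timestamp, or one matrix–matrix product for many timestamps. The sketch below times such a batched prediction with NumPy. The coefficient matrix, intercepts and problem sizes (including 8640 timestamps, i.e. 30 days at 5 min resolution) are random or hypothetical placeholders rather than the study's fitted model, and absolute timings depend on hardware.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_user_phases, n_timestamps = 60, 30, 8640  # hypothetical sizes

coef = rng.normal(size=(n_inputs, n_user_phases))  # placeholder LR coefficients
intercept = np.full(n_user_phases, 230.0)          # placeholder intercepts (V)
X = rng.normal(size=(n_timestamps, n_inputs))      # placeholder P/Q inputs

start = time.perf_counter()
V_hat = X @ coef + intercept                       # all timestamps in one product
elapsed = time.perf_counter() - start

print(V_hat.shape)  # (8640, 30)
print(f"{1e3 * elapsed / n_timestamps:.6f} ms per timestamp")
```

A non-linear power flow, by contrast, has to solve an iterative problem for every timestamp, so its cost cannot be collapsed into one vectorised operation in the same way.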

7.3. Overall Comparison of PP, LR and NN Approaches of Voltage Estimation

Table 14 provides an overall comparison across PP, LR and NN approaches in terms of requirements for model development and performance efficiency and accuracy. The LR model ‘Three-phase, original coefficients’ developed using data of the month with maximum solar generation variation (May 2023) was taken for the comparison.

Developing the PP model required an up-to-date DN topology, geographic coordinates of DN components, and transformer and line parameters, which are typically unavailable to DNSPs. Even if such data were available, allowing detailed network models to be created in a chosen power system analysis software, executing those models would be far more time-intensive than the proposed model-free methods. The development of LR and NN models (except for NN models under the ‘Excluding V_tx’ condition) required knowledge of V_tx, either as an approximate constant value or as actual historical measurements from a transformer monitor, together with historical P, Q and V measurements at users from smart meters. Since this study compared model-free methods against PP simulations, V values from the PP simulations were used in LR and NN model development, so that any errors inherent in the PP simulations could be disregarded. However, when the proposed model-free methods are applied in practice, V measurements from smart meters will be required. Table 14 shows that the proposed model-free approaches provided sufficiently accurate voltage estimates substantially faster than the model-based approach using PP. Further, the strong performance of the proposed model-free approaches in the extrapolated test environment (‘Full PV penetration’) demonstrates that both LR and NN models could effectively capture the voltage variations associated with increased solar PV generation.
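As a rough illustration of how an LR model of this kind could be fitted, the sketch below regresses each user-phase voltage on V_tx and the active and reactive power of all user phases using ordinary least squares (NumPy's `lstsq`), in the spirit of a linearised power flow in which voltages deviate from the transformer voltage in proportion to power flows. The feature layout, the synthetic "true" sensitivity matrices `A_true` and `B_true`, and the noise-free data are all assumptions made for this example; the paper's exact LR formulation may differ. V_tx is varied here so that its coefficient is identifiable; if it were held constant, as in this study, its column would act as a constant offset and its coefficient could not be interpreted separately from that offset.

```python
import numpy as np

rng = np.random.default_rng(7)
T, k = 500, 4  # hypothetical: 500 timestamps, 4 user phases

# Synthetic "true" voltage-to-power sensitivities (placeholders)
A_true = rng.normal(0.0, 0.1, size=(k, k))   # V per kW
B_true = rng.normal(0.0, 0.05, size=(k, k))  # V per kvar

V_tx = 240.0 + rng.normal(0.0, 1.0, size=T)  # transformer LV voltage (V)
P = rng.normal(0.0, 3.0, size=(T, k))        # active power per user phase (kW)
Q = rng.normal(0.0, 1.0, size=(T, k))        # reactive power per user phase (kvar)
V = V_tx[:, None] + P @ A_true + Q @ B_true  # synthetic user-phase voltages (V)

X = np.column_stack([V_tx, P, Q])            # features: [V_tx, P_1..P_k, Q_1..Q_k]
coef, *_ = np.linalg.lstsq(X, V, rcond=None) # one column of coefficients per user phase

A_hat = coef[1:1 + k]  # estimated voltage sensitivities to P
B_hat = coef[1 + k:]   # estimated voltage sensitivities to Q
print(np.allclose(coef[0], 1.0), np.allclose(A_hat, A_true), np.allclose(B_hat, B_true))
# True True True
```

The fitted coefficients are directly readable as sensitivities, which is the interpretability advantage of LR noted in the comparison.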

The accuracy of voltage estimation from the proposed model-free approaches is further illustrated by the voltage variations during the testing phase of LR and NN models (at ‘Full PV penetration’) plotted in Figure 11. These plots clearly show how closely the proposed model-free estimation methods align with the model-based simulation results.

Table 14 shows that NNs could provide results as accurate as those of LR, even without V_tx as an input. However, NNs took considerably longer to train (approximately 7 min) than LR (approximately 1 s), making them comparatively less practical, particularly for larger DNs, where NN training would take even longer. Furthermore, LR is straightforward and transparent, whereas NNs are complex black-box models that require extensive hyperparameter tuning, which makes LR inherently more interpretable. It can be concluded that both proposed model-free approaches, LR and NNs, can be effective and efficient alternatives to model-based voltage estimation, and that LR, owing to its simplicity and fast training, is more practical and interpretable than NNs.

It is important to emphasise that the LR and NN models were trained and tested using voltages generated via PP simulations. Consequently, the reported errors primarily reflect the ability of the proposed model-free approaches to replicate the behaviour of the model-based power flow solution. This is consistent with the primary objective of the paper, which was to examine whether model-free methods can serve as practical alternatives to conventional power flow simulations for voltage estimation tasks. Although real measurements may include additional uncertainties such as measurement noise, the proposed model-free methods are expected to adequately capture the underlying non-linear relationships in the DN, similar to how they successfully reproduced the behaviour of the PP simulations.

7.4. Benefits of Proposed Model-Free Voltage Estimation Methods

Table 15 gives the monthly solar generation of the case study DN, obtained using pvlib python, for the ‘Partial PV penetration’ and ‘Full PV penetration’ cases, along with the additional monthly solar generation due to increasing rooftop solar PV penetration to 100%. Increasing the installation of rooftop solar PV systems in the DN from 43% to 100% approximately doubles the total solar generation of the DN in each selected month. This increase in solar generation will cause larger voltage variations than before, so an accurate and efficient way of capturing those variations is essential. Given the accuracy and efficiency observed in this study, the proposed model-free voltage estimation methods are promising tools for estimating those voltage variations. Such reliable tools will help DNSPs confidently embrace the ongoing renewable transition without unnecessarily restricting the installation of new rooftop solar PV systems or the enlargement of existing systems. This will ultimately support more efficient utilisation of existing grid infrastructure, increase the financial gains of rooftop solar PV system owners, and reduce the overall carbon footprint of the DN.

According to Table 15, 100% installation of rooftop solar PV systems in the DN yields an average of 13.23 MWh of additional solar generation per month, or approximately 158.76 MWh of additional solar generation per year (13.23 MWh × 12). According to the Greenhouse Gas Equivalencies Calculator developed by the United States Environmental Protection Agency [54], this additional solar energy avoids 122 t of CO₂ emissions annually, equivalent to the emissions from burning 54.6 t of coal. This calculator uses United States national average emission factors for electricity generation, which may differ from those in Australia; nevertheless, its results give a reasonable indication of the scale of the benefit.
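The annual figure, and the emission factor implied by the calculator output, can be checked with a few lines of arithmetic. The implied factor (about 0.77 t CO₂ per MWh) is derived here only from the two numbers quoted above; it is not a value reported by the calculator or the paper, and an Australian grid emission factor may differ.

```python
monthly_additional_mwh = 13.23  # average additional generation per month (Table 15)
annual_additional_mwh = 12 * monthly_additional_mwh
avoided_co2_t = 122.0           # calculator result quoted in the text [54]

implied_factor_t_per_mwh = avoided_co2_t / annual_additional_mwh
print(round(annual_additional_mwh, 2))     # 158.76
print(round(implied_factor_t_per_mwh, 3))  # 0.768
```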

As mentioned earlier, model-based non-linear power flow simulations are highly complex and time-intensive, and their computational burden grows with longer time horizons and larger DNs. Further, model-based non-linear power flow requires highly accurate DN data, which are typically unavailable to DNSPs. In contrast, the model-free approaches proposed in this study can be trained offline using historical smart meter data. Although training may take some time for larger networks, it remains worthwhile: once the LR or NN models are trained, they can be reused whenever needed as long as the DN topology remains unchanged. If the DN topology changes, the LR and NN models would need to be retrained; similarly, any models developed in a power system analysis software would also require updating to reflect the new topology.

For model-based approaches, updating the network models still relies on access to accurate and detailed network information, as well as the technical expertise required to revise and validate the system representation. Acquiring the field data needed for such updates may be time-consuming, particularly when it depends on scheduled site visits, unless network modifications are recorded when they occur. In contrast, model-free methods depend only on the availability of updated smart meter data rather than on precise network parameters or specialised modelling expertise. Although time is needed to gather sufficient post-change smart meter data before retraining the LR and NN models, not depending on precise network modelling makes the model-free approaches more flexible in practice. Furthermore, although the LR and NN models in this paper were trained using one month of smart meter data, this duration is not strictly mandatory; a shorter data period may still provide reliable results, provided that the data are of sufficient quality and representativeness.

As shown earlier, the proposed model-free voltage estimation methods are substantially faster than the model-based method, making them particularly suitable for real-time network estimation. Thus, their benefits can be fully realised when DNSPs require real-time estimation of the hosting capacity of DERs such as rooftop solar PV systems and EVs.

8. Conclusions

This paper independently employed LR and NNs for model-free voltage estimation of LV DNs with high penetration of residential rooftop solar. Sixteen distinct LR models and 648 NN models were trained using measurements from smart meters in a real Australian DN with 43% solar PV penetration and were tested under modified DN conditions with 100% solar PV penetration. The proposed model-free approaches were compared against model-based non-linear power flow simulations in PP. The results demonstrated that both proposed model-free approaches can estimate voltages at user connection points with accuracy comparable to the conventional model-based method while being considerably faster.

The LR model ‘Three-phase, original coefficients’, developed on a three-phase basis without forcing the coefficients to be positive, produced the best voltage predictions among the proposed LR models. It closely followed the PP simulations with almost negligible errors for all four selected months; for example, for May 2023 it had an R² of 0.9998, an RMSE of 0.01 V and a MaxAE of 0.11 V. As the voltage variation at users or the line impedance increased down the feeders, the voltage prediction accuracy of the proposed LR models decreased slightly. Moreover, the two NN models identified as best performing under the ‘Including V_tx’ and ‘Excluding V_tx’ conditions effectively followed the PP simulations, with MaxAEs of 0.15 V and 0.13 V, respectively, and the same RMSE of 0.02 V. Additionally, both the LR and NN models executed substantially faster than the PP model.

The proposed model-free methods can therefore be considered potential alternatives to existing model-based methods for estimating phase voltages at users, supporting reliable decisions on the safe accommodation of solar PV installations in DNs. In particular, LR and NN models could be deployed for efficient decision making in advanced power engineering applications such as model-free DER hosting capacity estimation and DOE allocation, provided that instantaneous smart meter data are available.

It is also suggested that LR models can be more practical and interpretable than NN models for estimating voltages in LV DNs. LR models are straightforward to implement and do not require architecture design or hyperparameter tuning, whereas NN models typically require careful selection of architectures and fine-tuning of hyperparameters and often operate as black-box models. In addition, LR directly provides coefficients that represent voltage-to-power sensitivities, making the relationships captured by the model easier to analyse and interpret. Furthermore, LR training was faster than NN training in the studied scenarios, which can be advantageous in applications where models need to be retrained frequently. However, this does not imply that LR will always outperform NN models in predictive accuracy: more complex or highly non-linear systems, larger datasets, or architectures that explicitly incorporate network structure may benefit from more advanced NN models.

In this study, the NN architecture was intentionally limited to a single hidden layer to reduce model complexity and improve computational efficiency, as this configuration was sufficient to achieve highly accurate results under the considered conditions. The aim of including NNs was to investigate whether introducing non-linear modelling capability provides a meaningful improvement over LR for voltage estimation in LV DNs. Although the DN considered in this study is modest in size, it is a real Australian LV DN and therefore provides a realistic test case representative of typical Australian LV distribution systems. As mentioned earlier, more advanced NN architectures may become beneficial where the system behaviour exhibits stronger non-linearities. In such cases, architectures that explicitly incorporate system structure, such as graph neural networks that encode network topology or recurrent neural networks that capture temporal dependencies, may provide additional modelling capability. This highlights a promising avenue for future research, including the investigation of more advanced NN architectures as well as larger and more non-linear networks.

Among the model-free voltage estimation methods proposed in this paper, the LR-based approach explicitly requires the transformer voltage as an input variable; therefore, its performance may be sensitive to inaccuracies in the assumed swing bus voltage. Additionally, in DNs with higher transformer impedance, significant upstream voltage variability, or active tap-changing transformers, deviations in transformer voltage may propagate along the feeders and affect voltage estimation accuracy. However, since the objective of this paper was a comparative evaluation of model-free techniques (LR and NNs) as alternatives to a model-based approach (PP) for voltage estimation in LV DNs, applying the same swing bus voltage assumption across all methods ensured consistency and enabled a fair comparison of their relative performance. Further, with the widespread rollout of distribution transformer monitors anticipated in electricity networks globally, the transformer voltage information required by the proposed LR-based approach is likely to become readily available in the near future.

The primary objective of this paper was a comparative evaluation of model-free techniques as potential alternatives to model-based approaches for voltage estimation in PV-rich LV DNs. Accordingly, PP was used as the benchmark model-based power flow framework, and the LR and NN models were evaluated against PP-generated voltages to assess their ability to reproduce the behaviour of a model-based voltage estimation method. It is important to note that PP simulations do not represent perfect ground truth, as power flow models themselves may exhibit deviations from real measurements due to modelling assumptions and parameter uncertainties. However, the use of PP as a benchmark did not hinder the objective of evaluating whether the proposed model-free approaches can effectively replicate model-based voltage estimation results. Although real measurements may include additional uncertainties such as measurement noise, the proposed model-free methods are expected to adequately capture the underlying non-linear relationships in the DNs, similar to how they successfully reproduced the behaviour of the PP simulations in this study.

When considering higher PV penetration levels, pvlib python was used in this study to synthesise the modified DN conditions. The pvlib python framework incorporates geographical location and irradiance modelling, allowing the created PV profiles to reflect physically consistent solar generation patterns. Further, when assigning new PV systems, the existing PV size distribution in the DN was preserved, ensuring consistency with the original network structure. However, this study assumed a uniform panel configuration for all newly added PV systems. This assumption did not adversely affect the focus of this study, namely evaluating the capability of the proposed model-free voltage estimation approaches to reproduce the behaviour of the model-based method under increased PV penetration. The authors acknowledge that real-world deployment conditions may involve additional variability associated with diverse orientations, roof tilts, partial shading, inverter characteristics, and short-term weather variability such as cloud transients, which could influence model generalisation. Investigating these aspects, along with larger and more diverse networks, was beyond the scope of this paper and represents a valuable direction for future work.

The proposed model-free voltage estimation methods are primarily intended for application to individual LV DNs rather than large-scale integrated systems. In practice, LV DNs are inherently limited in size by the capacity of their associated distribution transformers. Therefore, while the case study network is relatively small, it is representative of a typical Australian LV DN, and extreme scaling to very large numbers of user phases is generally not encountered within a single LV DN. For larger systems, such as those involving multiple LV DNs connected to a medium voltage (MV) substation, the proposed approaches can be applied in a decentralised manner, where each LV DN is modelled independently using its own LR or NN model. These individual models can be executed in parallel, enabling efficient computation and mitigating scalability concerns. The aggregated impact at the MV level can then be assessed by combining the outputs of these individual models, providing a practical and computationally tractable approach for larger systems. However, if the proposed model-free methods are to be extended to model an entire MV network as a single unified system, additional considerations would be required. In such cases, more advanced techniques such as dimensionality reduction, clustering, or hierarchical modelling may be necessary to ensure computational tractability, which represents a promising direction for future research. Nevertheless, the fundamental concept of model-free voltage estimation using data-driven approaches remains valid and fully applicable.
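The decentralised scheme described above, in which each LV DN has its own model and the models run in parallel before their outputs are combined, can be sketched with Python's standard `concurrent.futures` module. The per-DN models here are hypothetical linear maps with placeholder coefficients and random inputs. Summarising each LV DN by its worst-case estimated voltage and checking it against an illustrative 253 V upper limit (230 V + 10%) is only one possible way of combining outputs; the paper does not prescribe a specific aggregation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def make_lr_model(seed, n_inputs=8, n_user_phases=5):
    """Return a hypothetical per-DN LR model with placeholder coefficients."""
    rng = np.random.default_rng(seed)
    coef = rng.normal(0.0, 0.5, size=(n_inputs, n_user_phases))
    intercept = np.full(n_user_phases, 240.0)
    return lambda X: X @ coef + intercept

def estimate_dn(item):
    """Run one LV DN's model on its own inputs and summarise the result."""
    name, model, X = item
    V_hat = model(X)                # shape (timestamps, user phases)
    return name, float(V_hat.max())  # worst-case estimated voltage in this DN

# Four hypothetical LV DNs, each with one day of 5 min inputs
rng = np.random.default_rng(0)
dns = [(f"LV_DN_{i}", make_lr_model(i), rng.normal(size=(288, 8))) for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(estimate_dn, dns))  # independent models run concurrently

UPPER_LIMIT_V = 253.0  # illustrative upper voltage limit (230 V + 10%)
flagged = [name for name, v_max in results.items() if v_max > UPPER_LIMIT_V]
print(results)
print("LV DNs with estimated over-voltage:", flagged)
```

Because each LV DN model depends only on its own inputs, adding more LV DNs increases the number of independent tasks rather than the size of any single model.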

Author Contributions

Conceptualisation, T.K., B.B., J.C.K. and D.A.R.; Methodology, T.K. and B.B.; Writing―original draft, T.K.; Writing―review and editing, B.B., J.C.K. and D.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Gridsight through a Higher Degree Research Scholarship at the University of Wollongong.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors wish to acknowledge the contributions of Gridsight and Endeavour Energy for their technical support and access to field data.

Conflicts of Interest

Author Brendan Banfield was employed by the company Gridsight. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Torquato, R.; Salles, D.; Pereira, C.O.; Meira, P.C.M.; Freitas, W. A Comprehensive Assessment of PV Hosting Capacity on Low-Voltage Distribution Systems. IEEE Trans. Power Deliv. 2018, 33, 1002–1012.
Clean Energy Council. Rooftop Solar and Storage Report: January–June 2024. Available online: https://cleanenergycouncil.org.au/getmedia/290496f3-a240-4a12-bc63-7b374f8c71c4/rooftop-solar-and-storage-report_jan-june-2024.pdf (accessed on 4 June 2025).
Abad, M.S.S.; Ma, J.; Zhang, D.; Ahmadyar, A.S.; Marzooghi, H. Probabilistic Assessment of Hosting Capacity in Radial Distribution Systems. IEEE Trans. Sustain. Energy 2018, 9, 1935–1947.
Sadabadi, M.S.; Meng, X.; Liu, Z. Resilient and Robust Voltage Regulation in Shipboard DC Microgrids with ZIP Loads Under Actuator and Parameter Uncertainties. IEEE Trans. Transp. Electrif. 2026, 12, 578–588.
Meng, X.; Xie, D.; Lin, H.; Lin, C.; Ge, X.; Liu, Z. Dissipativity-Based Multiport Stability Root-Cause Identification and Mitigation for Solid-State Transformers. IEEE Trans. Ind. Electron. 2026, 1–13.
Chathurangi, D.; Jayatunga, U.; Perera, S. Recent investigations on the evaluation of solar PV hosting capacity in LV distribution networks constrained by voltage rise. Renew. Energy 2022, 199, 11–20.
AS 61000.3.100; Electromagnetic Compatibility (EMC) Part 3.100: Limits—Steady State Voltage Limits in Public Electricity Systems. Standards Australia: Sydney, NSW, Australia, 2011.
AS IEC 60038:2022; Standard Voltages. Standards Australia: Sydney, NSW, Australia, 2022.
AS/NZS 4777.2; Grid Connection of Energy Systems Via Inverters. Part 2: Inverter Requirements. Standards Australia: Sydney, NSW, Australia, 2020.
AS/NZS 4777.1:2016; Grid Connection of Energy Systems Via Inverters. Part 1: Installation Requirements. Standards Australia: Sydney, NSW, Australia, 2016.
Abad, M.S.S.; Ma, J. Photovoltaic Hosting Capacity Sensitivity to Active Distribution Network Management. IEEE Trans. Power Syst. 2021, 36, 107–117.
Bassi, V.; Ochoa, L.; Alpcan, T. Model-Free Voltage Calculations for PV-Rich LV Networks: Smart Meter Data and Deep Neural Networks. In Proceedings of the 2021 IEEE Madrid PowerTech, Madrid, Spain, 28 June–2 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
Bassi, V.; Ochoa, L.F.; Alpcan, T.; Leckie, C. Electrical Model-Free Voltage Calculations Using Neural Networks and Smart Meter Data. IEEE Trans. Smart Grid 2023, 14, 3271–3282.
Azzolini, J.A.; Reno, M.J.; Yusuf, J.; Talkington, S.; Grijalva, S. Calculating PV Hosting Capacity in Low-Voltage Secondary Networks Using Only Smart Meter Data. In Proceedings of the 2023 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 16–19 January 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
Mirtaheri, S.L.; Shahbazian, R. Machine Learning: Theory to Applications; CRC Press: Boca Raton, FL, USA, 2022.
Fazio, A.R.D.; Perna, S.; Russo, M.; De Santis, M. Linear Power Flow Method for Radial Distribution Systems Including Voltage Control Devices. IEEE Trans. Ind. Appl. 2024, 60, 4749–4761.
Hua, Z.; Zhou, B.; Chan, K.W.; Zhang, C.; Cao, Y.; Wang, P.; Xia, M. A Progressive Polyhedral Approximation Method for Nonlinear PDE-Constrained Electricity-Water Nexus Dispatch. IEEE Trans. Smart Grid 2025, 16, 2703–2706.
Wang, X.; Zhao, Y.; Zhou, Y. A Data-Driven Topology and Parameter Joint Estimation Method in Non-PMU Distribution Networks. IEEE Trans. Power Syst. 2024, 39, 1681–1692.
Shi, Z.; Xu, Q.; Liu, Y.; Wu, C.; Yang, Y. Line parameter, topology and phase estimation in three-phase distribution networks with non-μPMUs. Int. J. Electr. Power Energy Syst. 2024, 155, 109658.
Karunarathne, E.; Liu, M.Z.; Ochoa, L.F.; Alpcan, T. Using Real Smart Meter Data to Construct Three-Phase Low Voltage Network Models. IEEE Trans. Power Syst. 2024, 40, 2465–2477.
Kalinga, T.; Banfield, B.; Knott, J.C.; Robinson, D.A. Linear regression for model-free voltage estimation of LV distribution networks with high penetration of electric vehicles. In Proceedings of the 2025 IEEE PES 17th Asia-Pacific Power and Energy Engineering Conference (APPEEC), Auckland, New Zealand, 2–5 December 2025; IEEE: New York, NY, USA, 2025; pp. 1–6.
Yi, Y.; Liu, S.; Zhang, Y.; Xue, Y.; Deng, W.; Li, Q. Phase Identification of Low-voltage Distribution Network Based on Stepwise Regression Method. J. Mod. Power Syst. Clean Energy 2023, 11, 1224–1234.
Li, P.; Wu, W.; Wang, Y.; Hu, Y.; Wu, Z.; Li, Y.; Yuan, Y. A Data-Driven Linear Robust Optimal Power Flow Model. In Proceedings of the 2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2), Hangzhou, China, 15–18 December 2023; IEEE: New York, NY, USA, 2024; pp. 650–654.
Ang, E.Y.M.; Paw, Y.C. Linear Model for Online State of Health Estimation of Lithium-Ion Batteries Using Segmented Discharge Profiles. IEEE Trans. Transp. Electrif. 2023, 9, 2464–2471.
Selvi, M.V.; Mishra, S. Investigation of Performance of Electric Load Power Forecasting in Multiple Time Horizons with New Architecture Realized in Multivariate Linear Regression and Feed-Forward Neural Network Techniques. IEEE Trans. Ind. Appl. 2020, 56, 5603–5612.
Wahbah, M.; Feng, S.; EL-Fouly, T.H.M.; Zahawi, B. Root-Transformed Local Linear Regression for Solar Irradiance Probability Density Estimation. IEEE Trans. Power Syst. 2020, 35, 652–661.
Patel, A.M.; Singal, S.K. Optimal component selection of integrated renewable energy system for power generation in stand-alone applications. Energy 2019, 175, 481–504.
Fergus, P.; Chalmers, C. Applied Deep Learning: Tools, Techniques, and Implementation; Computational Intelligence Methods and Applications; Springer International Publishing: Cham, Switzerland, 2022.
Simonovska, A.; Bassi, V.; Givisiez, A.G.; Ochoa, L.F.; Alpcan, T. An electrical model-free three-phase OPF for PV-rich LV networks using smart meter and transformer data. Electr. Power Syst. Res. 2025, 240, 111284.
Liu, L.; Shi, N.; Wang, D.; Ma, Z.; Wang, Z.; Reno, M.J.; Azzolini, J.A. Voltage Calculations in Secondary Distribution Networks via Physics-Inspired Neural Network Using Smart Meter Data. IEEE Trans. Smart Grid 2024, 15, 5205–5218.
Yildiz, T.; Abur, A. Convolutional Neural Network-assisted fault detection and location using few PMUs. Electr. Power Syst. Res. 2024, 235, 110705.
Hernandez-Robles, I.A.; González-Ramírez, X.; Álvarez-Jaime, J.A. Effectiveness of forecasters based on Neural Networks for Energy Management in Zero Energy Buildings. Energy Build. 2024, 316, 114372.
Cao, Z.; Wang, J.; Xia, Y. Combined electricity load-forecasting system based on weighted fuzzy time series and deep neural networks. Eng. Appl. Artif. Intell. 2024, 132, 108375.
Qiu, H.; Shi, K.; Wang, R.; Zhang, L.; Liu, X.; Cheng, X. A novel temporal–spatial graph neural network for wind power forecasting considering blockage effects. Renew. Energy 2024, 227, 120499.
Brester, C.; Kallio-Myers, V.; Lindfors, A.V.; Kolehmainen, M.; Niska, H. Evaluating neural network models in site-specific solar PV forecasting using numerical weather prediction data and weather observations. Renew. Energy 2023, 207, 266–274.
El Fallah, S.; Kharbach, J.; Hammouch, Z.; Rezzouk, A.; Jamil, M.O. State of charge estimation of an electric vehicle’s battery using Deep Neural Networks: Simulation and experimental results. J. Energy Storage 2023, 62, 106904.
Bassi, V.; Ochoa, L.F.; Alpcan, T.; Leckie, C.; Liu, M.Z. Smart Meter Data and Operating Envelopes in LV Networks: A Model-Free Approach. In Proceedings of the 2023 IEEE PES Innovative Smart Grid Technologies Latin America (ISGT-LA), San Juan, Puerto Rico, 6–9 November 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
O’Malley, A.; Palacios-Garcia, E.J.; Hayes, B.P. Model-Free Voltage Estimation of Low Voltage Electrical Power Distribution Systems using Smart Meter Data. In Proceedings of the 2024 IEEE PES Innovative Smart Grid Technologies Europe (ISGT EUROPE), Dubrovnik, Croatia, 14–17 October 2024; IEEE: New York, NY, USA, 2025; pp. 1–5.
Pereira, O.; Bassi, V.; Alpcan, T.; Ochoa, L.F. Assessing the robustness of machine learning-based voltage calculations for LV networks. Sustain. Energy Grids Netw. 2025, 42, 101716.
Thurner, L.; Scheidler, A.; Schäfer, F.; Menke, J.H.; Dollichon, J.; Meier, F.; Meinecke, S.; Braun, M. Pandapower—An Open-Source Python Tool for Convenient Modeling, Analysis, and Optimization of Electric Power Systems. IEEE Trans. Power Syst. 2018, 33, 6510–6521.
Wen, L.; Zhou, K.; Yang, S.; Li, L. Compression of smart meter big data: A survey. Renew. Sustain. Energy Rev. 2018, 91, 59–69. [Google Scholar] [CrossRef]
Wang, Y.; Chen, Q.; Hong, T.; Kang, C. Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges. IEEE Trans. Smart Grid 2019, 10, 3125–3148. [Google Scholar] [CrossRef]
IoT Analytics. Smart Electricity Meter Market 2024: Global Adoption Landscape. Available online: https://iot-analytics.com/smart-meter-adoption/ (accessed on 22 August 2024).
Australian Energy Market Commission. Review of the Regulatory Framework for Metering Services, Final Report; AEMC: Sydney, NSW, Australia, 2023.
Kalinga, T.; Banfield, B.; Knott, J.C.; Robinson, D.A. K-Means Clustering and Linear Regression for User Phase Identification, Verification, and Topology Determination Under Varied Smart Meter Penetration. Energies 2025, 19, 183.
Holmgren, W.; Anderson, K.; Hansen, C.; Mikofski, M.; Jensen, A.R.; Lorenzo, A.; Krien, U.; Driesse, A.; Stark, C.; Luis, E.; et al. pvlib/pvlib-python: V0.10.2. Zenodo 2023.
Kalinga, T.; Banfield, B.; Knott, J.C.; Robinson, D.A. Smart Meter Data-Driven Characterization of LV Electricity Distribution Networks. In Proceedings of the 2023 IEEE International Conference on Energy Technologies for Future Grids (ETFG), Wollongong, NSW, Australia, 3–6 December 2023; IEEE: New York, NY, USA, 2024; pp. 1–6.
Weckx, S.; D’Hulst, R.; Driesen, J. Voltage Sensitivity Analysis of a Laboratory Distribution Grid with Incomplete Data. IEEE Trans. Smart Grid 2015, 6, 1271–1280.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Zollanvari, A. Machine Learning with Python: Theory and Implementation; Springer International Publishing: Cham, Switzerland, 2023.
Keras. Keras: Deep Learning for Humans. Available online: https://keras.io/ (accessed on 9 June 2025).
TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 9 June 2025).
Blackhall, L. On the Calculation and Use of Dynamic Operating Envelopes, Evolve Project M4 Knowledge Sharing Report; The Australian National University: Canberra, ACT, Australia, 2020.
United States Environmental Protection Agency. Greenhouse Gas Equivalencies Calculator. Available online: https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator (accessed on 10 July 2024).

Figure 1. Case study LV underground distribution network.

Figure 2. Distribution transformer voltage variation over 24-h period on 17 February 2023.

Figure 3. PP-simulated and smart metre-measured voltages at user24 on 17 February 2023.

Figure 4. Illustration of an artificial neuron.

Figure 5. MSE loss improvement during training and validation of the NN models.

Figure 6. LR vs. PP voltages for the LR models, May 2023.

Figure 7. Errors (PP voltages − LR voltages) of the ‘Three-phase, original coefficients’ LR model.

Figure 8. Overview of an NN model developed in this study.

Figure 9. NN vs. PP voltages.

Figure 10. Errors (PP voltages − NN voltages) obtained under the conditions ‘Including V_tx’ and ‘Excluding V_tx’.

Figure 11. PP, LR and NN voltages at ‘Full PV penetration’.

Table 1. Low voltage standards in practice internationally.

Standard	Voltage Range	Nominal Voltage (1/2/3 Phase)
American Standard ANSI C84.1	±5% of nominal system voltage	120 V/240 V/480 V, 60 Hz
Canadian Standards Association	±6% of nominal system voltage	120 V/240 V/480 V, 60 Hz
European Standard EN-50160	±10% of nominal system voltage	230 V/400 V, 50 Hz
Australian Standard AS 61000.3.100	−6/+10% of nominal system voltage	230 V/400 V, 50 Hz
Australian Standard AS IEC 60038	±10% of nominal system voltage	230 V/400 V, 50 Hz
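
Table 1 expresses the limits as percentages of nominal voltage. As a minimal sketch, assuming a 230 V nominal phase voltage and using only the 230 V/400 V rows of Table 1, the limits translate into volts as shown below. The names `TOLERANCES`, `voltage_band` and `within_band` are hypothetical. Compliance under these standards is normally assessed statistically (for example, on percentiles of 10-min average voltages), which this pointwise check deliberately ignores.

```python
"""Convert the 230 V voltage ranges of Table 1 into absolute limits (illustrative sketch)."""

NOMINAL_V = 230.0  # nominal phase voltage of the 230 V/400 V rows in Table 1

# Hypothetical lookup: (lower, upper) tolerance as a fraction of nominal voltage
TOLERANCES = {
    "EN 50160": (-0.10, 0.10),
    "AS 61000.3.100": (-0.06, 0.10),
    "AS IEC 60038": (-0.10, 0.10),
}


def voltage_band(standard, nominal=NOMINAL_V):
    """Return (minimum, maximum) allowed voltage in volts for the given standard."""
    lower, upper = TOLERANCES[standard]
    return nominal * (1 + lower), nominal * (1 + upper)


def within_band(voltage, standard):
    """Pointwise check of one voltage sample; ignores the standards' statistical criteria."""
    v_min, v_max = voltage_band(standard)
    return v_min <= voltage <= v_max


if __name__ == "__main__":
    for name in TOLERANCES:
        v_min, v_max = voltage_band(name)
        print(f"{name}: {v_min:.1f} V to {v_max:.1f} V")
    # 1.05 pu on a 230 V base, i.e. the transformer voltage listed for V_tx in Table 6
    print(within_band(1.05 * NOMINAL_V, "AS 61000.3.100"))
```

On these figures, a 1.05 pu transformer voltage (241.5 V) leaves 11.5 V of headroom below the common 253 V upper limit.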

Table 2. Recent literature employing LR in power engineering applications.

Reference	Year	Application	Description
[18]	2024	Topology and parameter estimation	Develops an LR model for topology and parameter estimation of radial DNs using only historical smart metre measurements.
[19]	2024	Line parameter, topology and phase estimation	Presents a method for joint estimation of line parameters, topology and phase labels of DNs based on LR using only smart metre measurements.
[20]	2024	Network model construction	Employs multi-variable LR for three-phase LV network model construction using only smart metre data.
[21]	2025	Voltage estimation	Presents a smart metre data-driven LR-based approach for voltage estimation of LV DNs with high penetration of electric vehicles (EVs).
[22]	2023	Phase identification	Establishes a multi-variable LR model based on the principle of energy conservation for phase identification of LV DNs.
[23]	2023	Optimal power flow calculation	Proposes a data-driven linear optimal power flow by modelling branch power flows and squared node voltage magnitudes as an LR model.
[24]	2023	Battery state of health estimation	Performs online state of health estimation of lithium-ion batteries using multiple LR.
[25]	2020	Daily load forecasting	Compares multivariate LR and feed-forward neural networks for daily load forecasting and concludes that LR performs better at longer lead times.
[26]	2020	Solar irradiance estimation	Applies local LR in combination with a root transformation technique for solar irradiance probability density estimation.
[27]	2019	Optimal component selection	Uses multi-variable LR for optimal component selection based on overall cost and power reliability of off-grid renewable energy systems.

Table 3. Recent literature employing NNs in power engineering applications.

Reference	Year	Application	Description
[29]	2025	Optimal power flow calculation	Proposes a three-phase OPF approach using an NN model trained with historical smart metre and transformer data to calculate setpoints of PV inverters.
[30]	2024	Voltage calculation	Develops an NN model, trained on smart metre data and inspired by a coupled power flow model of primary–secondary DNs, for voltage calculation of secondary DNs.
[31]	2024	Fault detection and location	Applies convolutional NNs to identify and locate faults in DNs using data from a minimal number of strategically placed phasor measurement units.
[32]	2024	Energy management	Develops NN-based models for forecasting solar PV generation and energy consumption to support energy management in zero energy buildings.
[33]	2024	Load forecasting	Proposes a load forecasting approach integrating deep NNs with weighted fuzzy time series models, utilising actual seasonal load data.
[34]	2024	Wind power forecasting	Introduces an NN model for wind power forecasting that simultaneously analyses temporal and spatial features of wind energy using real-world data.
[35]	2023	Solar PV forecasting	Develops an NN model for solar PV forecasting using solar PV output data, historical weather observations and numerical weather predictions.
[36]	2023	Battery state of charge estimation	Uses deep NNs to compare simulation-based and experimental battery state of charge estimation for EVs.
[13]	2023	Voltage calculation	Presents a voltage calculation method for LV networks using NN models to capture relationships among historical smart metre data.
[37]	2023	Operating envelope determination	Employs NN models developed with historical smart metre data to calculate voltages to be used for OE determination in near real-time or in advance.

Table 4. Standard deviation and mean of voltages at the distribution transformer over 11 months.

	Standard Deviation in Voltage	Mean Voltage
Phase a (red)	2.78 V	241.72 V
Phase b (green)	2.41 V	241.37 V
Phase c (blue)	2.33 V	241.24 V

Table 5. Standard deviation and mean of voltages at the distribution transformer on 17 February 2023.

	Standard Deviation in Voltage	Mean Voltage
Phase a (red)	2.88 V	241.29 V
Phase b (green)	2.22 V	240.77 V
Phase c (blue)	2.33 V	240.66 V

Table 6. Variables utilised for LR and NN model development.

Variable	Description
V_tx	Distribution transformer LV bus phase voltage (1.05 pu)
P_train	Actual P load profiles of user phases over the first 3 weeks of the selected month
Q_train	Actual Q load profiles of user phases over the first 3 weeks of the selected month
V_train	Pandapower-simulated voltages of user phases for P_train and Q_train
P_test	New P load profiles of user phases over the last week of the selected month
Q_test	New Q load profiles of user phases over the last week of the selected month
V_test	Pandapower-simulated voltages of user phases for P_test and Q_test

Table 7. Comparison of solar PV panel ratings estimated by the approach used with those detected by the DNSP.

User	Maximum Negative P (W)	Estimated Solar PV Panel Rating (kW)	Solar PV Panel Rating as Detected by DNSP (kW)
user04	4880	5	5
user05	4819	5	5
user06	4965	5	5
user07	4892	5	5
user08	4903	5	5
user10	8057	8.2	8
user14	8755	8.8	7
user17	4824	5	5
user18	4880	5	4.5
user22	5912	6	6
user23	4206	4.4	4
user24	4966	5	5
user28	8070	8.2	8
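
The ‘Estimated’ column of Table 7 happens to be reproduced by a simple rule: take the smallest size, from a list of ratings that appear in Tables 7 and 8, that is not below the user's maximum export. This is a hypothetical reconstruction for illustration only and is not necessarily the estimation approach used in the study; the candidate list `CANDIDATE_SIZES_KW` and the function `estimate_rating_kw` are assumptions.

```python
"""Hypothetical rule reproducing the 'Estimated' column of Table 7."""
from bisect import bisect_left

# Assumed candidate system sizes (kW): the ratings that appear in Tables 7 and 8
CANDIDATE_SIZES_KW = [2.6, 3.2, 4.4, 5.0, 6.0, 8.2, 8.8]

# Maximum negative P (W), i.e. peak export, per user from Table 7
MAX_EXPORT_W = {
    "user04": 4880, "user05": 4819, "user06": 4965, "user07": 4892,
    "user08": 4903, "user10": 8057, "user14": 8755, "user17": 4824,
    "user18": 4880, "user22": 5912, "user23": 4206, "user24": 4966,
    "user28": 8070,
}


def estimate_rating_kw(max_export_w):
    """Smallest candidate size (kW) that is not below the observed peak export."""
    idx = bisect_left(CANDIDATE_SIZES_KW, max_export_w / 1000.0)
    if idx == len(CANDIDATE_SIZES_KW):
        raise ValueError("peak export exceeds the largest candidate size")
    return CANDIDATE_SIZES_KW[idx]


estimates = {user: estimate_rating_kw(p) for user, p in MAX_EXPORT_W.items()}
print(estimates)
```

Comparing these estimates with the DNSP column shows agreement for 8 of the 13 users; in the five remaining cases (user10, user14, user18, user23 and user28) the estimate is higher than the DNSP record.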

Table 8. Solar PV system sizes utilised in the creation of ‘Partial PV penetration’ and ‘Full PV penetration’ datasets.

User	Phase	Solar PV Size, ‘Partial PV Penetration’ (kW)	Solar PV Size, ‘Full PV Penetration’ (kW)	User	Phase	Solar PV Size, ‘Partial PV Penetration’ (kW)	Solar PV Size, ‘Full PV Penetration’ (kW)
user01	c		5	user15	c		8.2
user02	a		6	user16	a		3.2
user02	b		2.6	user16	b		2.6
user02	c		2.6	user16	c		3.2
user04	c	5	5	user17	c	5	5
user05	a	5	5	user18	b	5	5
user06	b	5	5	user21	b		5
user07	a		3.2	user22	a		2.6
user07	b	5	5	user22	b	6	6
user07	c		3.2	user22	c		2.6
user08	c	5	5	user23	a	4.4	4.4
user10	b	8.2	8.2	user24	a		3.2
user12	a		4.4	user24	b	5	5
user12	b		3.2	user24	c		3.2
user12	c		3.2	user26	a		8.2
user14	a	2.6	2.6	user27	c		5
user14	b	3.2	3.2	user28	b	8.2	8.2
user14	c	3.2	3.2

Table 9. PV system characterisation in pvlib python.

Parameter/Criteria	Setting/s
Irradiance model	‘clearsky’
Solar module	‘Canadian_Solar_CS5P_220M___2009_’ from ‘SandiaMod’ module collection
Inverter	‘ABB__MICRO_0_25_I_OUTD_US_208__208V_’ from ‘CECInverter’ inverter collection
Temperature model	‘sapm’, ‘close_mount_glass_glass’
Installation	surface_azimuth = 0, surface_tilt = 22, modules_per_string = 1, strings_per_inverter = 1

Table 10. LR models developed for a given month.

LR Model	Per-Phase/Three-Phase	Parameter Positive
Per-phase, original coefficients	Per-phase	False
Per-phase, forced-positive coefficients	Per-phase	True
Three-phase, original coefficients	Three-phase	False
Three-phase, forced-positive coefficients	Three-phase	True
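
Table 10 distinguishes per-phase models, whose inputs are restricted to loads on the same phase, from three-phase models that use all user phases. The numpy sketch below illustrates on synthetic data why this can matter: it assumes a linear ground truth with made-up sensitivities, including a small cross-phase term (for example, from neutral-point shift), and fits both model types by ordinary least squares. The phase labels, sensitivity values and helper names are hypothetical. The ‘forced-positive’ variants are not reproduced; scikit-learn's `LinearRegression` has a `positive=True` option that constrains coefficients to be non-negative, and the ‘Parameter Positive’ column of Table 10 suggests, but does not confirm, that this option was used.

```python
"""Per-phase vs three-phase linear regression on synthetic data (illustrative sketch)."""
import numpy as np

rng = np.random.default_rng(seed=1)
n_samples, n_users = 2016, 6            # one week of 5-min samples, 6 user phases (assumed)
phase = np.array([0, 0, 1, 1, 2, 2])    # hypothetical phase (a, b, c) of each user phase
same_phase = phase[:, None] == phase[None, :]

# Assumed "true" sensitivities: drop from own-phase load, small rise from cross-phase load
K_p = np.where(same_phase, -0.40, 0.10)   # V per kW
K_q = np.where(same_phase, -0.20, 0.05)   # V per kvar

P = rng.uniform(-5.0, 3.0, size=(n_samples, n_users))   # kW, negative values = solar export
Q = rng.uniform(-0.5, 0.5, size=(n_samples, n_users))   # kvar
V = 241.5 + P @ K_p.T + Q @ K_q.T + rng.normal(0.0, 0.01, size=(n_samples, n_users))

split = 3 * n_samples // 4   # chronological split: first 75% for training, last 25% for testing


def fit_ols(X, y):
    """Least-squares fit with an intercept; row 0 of the result is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply intercept and coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Three-phase model: P and Q of every user phase explain every voltage (multi-output fit)
X_all = np.hstack([P, Q])
coef_three = fit_ols(X_all[:split], V[:split])
rmse_three = rmse(predict(coef_three, X_all[split:]), V[split:])

# Per-phase model: each voltage is explained only by P and Q on the same phase
V_per_phase = np.empty_like(V[split:])
for i in range(n_users):
    cols = np.flatnonzero(phase == phase[i])
    X_i = np.hstack([P[:, cols], Q[:, cols]])
    coef_i = fit_ols(X_i[:split], V[:split, i])
    V_per_phase[:, i] = predict(coef_i, X_i[split:])
rmse_per_phase = rmse(V_per_phase, V[split:])

print(f"three-phase RMSE: {rmse_three:.3f} V, per-phase RMSE: {rmse_per_phase:.3f} V")
```

The gap qualitatively mirrors the difference between the per-phase and three-phase models in Table 12, but the synthetic numbers say nothing about the case study network itself.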

Table 11. Hyperparameters investigated in the study.

Hyperparameter	Setting
Inputs	(35 × 2) + 3 = 73 under ‘Including V_tx’ and 35 × 2 = 70 under ‘Excluding V_tx’
Outputs	35
No. of hidden layers	1
No. of neurons in hidden layer	[6, 7, 8] × 35
Activation function of hidden layer	[ReLU, Tanh, Swish]
Activation function of output layer	Linear
Error function	MSE
Optimiser	Adam
Learning rate	[10⁻³, 10⁻⁴, 10⁻⁵]
Epochs	1000
Batch size	[24, 48, 72] samples, corresponding to [2, 4, 6] h
Normalisation	MinMaxScaler with fixed range [0, 1]
L2 regularisation	[enabled, disabled]
L2 regularisation factor of hidden layer	[10⁻⁵, 10⁻⁶, 10⁻⁷]
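
Crossing the settings in Table 11 yields the 648 NN models reported in the abstract, provided ‘L2 regularisation disabled’ is counted once rather than once per regularisation factor. The sketch below enumerates that reading of the grid; the dictionary keys are hypothetical names, and the 5-min sampling interval is inferred from the batch-size row (24 samples corresponding to 2 h).

```python
"""Enumerate the hyperparameter grid of Table 11 (one consistent reading)."""
from itertools import product

N_USER_PHASES = 35
SAMPLE_MINUTES = 5  # inferred: a batch of 24 samples spans 2 h

conditions = {"Including V_tx": N_USER_PHASES * 2 + 3, "Excluding V_tx": N_USER_PHASES * 2}
hidden_neurons = [k * N_USER_PHASES for k in (6, 7, 8)]
activations = ["relu", "tanh", "swish"]
learning_rates = [1e-3, 1e-4, 1e-5]
batch_sizes = [24, 48, 72]
# L2 disabled is a single setting; L2 enabled contributes one setting per factor
l2_factors = [None, 1e-5, 1e-6, 1e-7]

grid = [
    {
        "condition": cond, "n_inputs": n_inputs, "n_outputs": N_USER_PHASES,
        "n_hidden": n_hidden, "activation": act, "learning_rate": lr,
        "batch_size": bs, "l2": l2,
    }
    for (cond, n_inputs), n_hidden, act, lr, bs, l2 in product(
        conditions.items(), hidden_neurons, activations, learning_rates, batch_sizes, l2_factors
    )
]

# Time span covered by each batch size, in hours
batch_hours = {bs: bs * SAMPLE_MINUTES / 60 for bs in batch_sizes}
print(len(grid), batch_hours)
```

That is 3 × 3 × 3 × 3 × 4 = 324 configurations per input condition, or 648 in total.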

Table 12. R², RMSE and MaxAE of LR models.

	‘Per-phase, original coefficients’ LR model			‘Per-phase, forced-positive coefficients’ LR model			‘Three-phase, original coefficients’ LR model			‘Three-phase, forced-positive coefficients’ LR model
Month	R²	RMSE (V)	MaxAE (V)	R²	RMSE (V)	MaxAE (V)	R²	RMSE (V)	MaxAE (V)	R²	RMSE (V)	MaxAE (V)
December 2022	a: 0.3458 b: 0.8738 c: 0.9315	a: 0.51 b: 0.30 c: 0.27	a: 2.54 b: 1.58 c: 1.97	a: 0.2202 b: 0.8517 c: 0.9481	a: 0.51 b: 0.33 c: 0.23	a: 2.49 b: 1.59 c: 1.50	0.9997	0.01	0.10	0.6359	0.36	2.93
January 2023	a: 0.1393 b: 0.8436 c: 0.9807	a: 0.52 b: 0.35 c: 0.12	a: 2.30 b: 1.73 c: 1.40	a: −0.1539 b: 0.8411 c: 0.9456	a: 0.60 b: 0.36 c: 0.22	a: 2.64 b: 1.72 c: 2.07	0.9996	0.02	0.08	0.3839	0.46	3.03
May 2023	a: 0.4814 b: 0.8765 c: −2.8214	a: 0.61 b: 0.29 c: 1.67	a: 2.75 b: 1.46 c: 6.30	a: 0.4617 b: 0.8716 c: 0.9160	a: 0.62 b: 0.29 c: 0.24	a: 2.91 b: 1.37 c: 1.31	0.9998	0.01	0.11	0.6822	0.39	2.78
June 2023	a: 0.8178 b: 0.8742 c: 0.9756	a: 0.38 b: 0.31 c: 0.14	a: 1.97 b: 1.52 c: 1.13	a: 0.8451 b: 0.8822 c: 0.9709	a: 0.35 b: 0.30 c: 0.15	a: 1.74 b: 1.55 c: 1.20	0.9998	0.01	0.08	0.9024	0.26	1.67
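
Tables 12 to 14 report R², RMSE and MaxAE against the PP-simulated voltages. A minimal numpy sketch of the three metrics for a single voltage series follows, using the usual definition R² = 1 − SS_res/SS_tot. It also shows why some per-phase entries in Table 12 are negative: R² falls below zero whenever a model's squared error exceeds that of simply predicting the mean of the reference voltages. The sample voltages are made up.

```python
"""R², RMSE and maximum absolute error for one voltage series (illustrative)."""
import numpy as np


def r2(v_ref, v_est):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    v_ref, v_est = np.asarray(v_ref, float), np.asarray(v_est, float)
    ss_res = np.sum((v_ref - v_est) ** 2)
    ss_tot = np.sum((v_ref - v_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(v_ref, v_est):
    """Root-mean-square error, in the units of the inputs (V)."""
    diff = np.asarray(v_ref, float) - np.asarray(v_est, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def max_ae(v_ref, v_est):
    """Largest absolute error (V)."""
    diff = np.asarray(v_ref, float) - np.asarray(v_est, float)
    return float(np.max(np.abs(diff)))


v_pp = [240.0, 241.0, 242.0, 243.0]        # hypothetical reference (PP) voltages
v_model = [240.1, 240.9, 242.0, 243.2]     # hypothetical model estimates
v_biased = [243.0, 243.0, 243.0, 243.0]    # a poor, constant estimate

print(r2(v_pp, v_model), rmse(v_pp, v_model), max_ae(v_pp, v_model))
print(r2(v_pp, v_biased))  # negative: worse than predicting the mean
```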

Table 13. Best-performing NN models during testing under the conditions ‘Including V_tx’ and ‘Excluding V_tx’.

	‘Including V_tx’	‘Excluding V_tx’
Hidden layer neurons	6 × 35	8 × 35
Hidden layer activation function	Tanh	Swish
L2 regularisation	enabled	enabled
Hidden layer L2 regularisation factor	10⁻⁷	10⁻⁷
Learning rate	10⁻⁴	10⁻⁴
Batch size	24	24
Epochs	738	1000
Test RMSE (V)	0.02	0.02
Test MaxAE (V)	0.15	0.13
Test R²	0.9996	0.9995
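
For orientation, the best ‘Including V_tx’ network in Table 13 maps 73 inputs through one hidden layer of 6 × 35 = 210 tanh neurons to 35 linear outputs, while the best ‘Excluding V_tx’ network uses 70 inputs and 8 × 35 = 280 swish neurons. The numpy sketch below shows the corresponding forward pass, the [0, 1] min-max scaling from Table 11, and an L2 penalty of the form λΣw² on the hidden-layer weights (the form used by the Keras L2 regulariser). The weights are random placeholders rather than trained values, the reference list points to Keras and TensorFlow for the actual implementation, and all names here are hypothetical.

```python
"""Forward pass of a single-hidden-layer voltage estimator (untrained sketch)."""
import numpy as np


def swish(x):
    """Swish activation, x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


ACTIVATIONS = {"tanh": np.tanh, "swish": swish, "relu": lambda x: np.maximum(x, 0.0)}


def init_params(n_in, n_hidden, n_out, rng):
    """Random placeholder weights; a trained model would supply these instead."""
    return {
        "W1": rng.normal(0.0, n_in ** -0.5, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, n_hidden ** -0.5, size=(n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def forward(params, X, activation):
    """Hidden layer with the chosen activation, followed by a linear output layer."""
    hidden = ACTIVATIONS[activation](X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def n_params(params):
    """Total number of trainable weights and biases."""
    return sum(p.size for p in params.values())


def l2_penalty(params, factor):
    """L2 term added to the MSE loss, applied to the hidden-layer weights only."""
    return factor * float(np.sum(params["W1"] ** 2))


rng = np.random.default_rng(0)
X_raw = rng.uniform(-5.0, 5.0, size=(24, 73))   # one batch of 24 samples (hypothetical values)
# Min-max scaling to [0, 1]; fitted on this batch for brevity, normally on training data only
x_min, x_max = X_raw.min(axis=0), X_raw.max(axis=0)
X = (X_raw - x_min) / (x_max - x_min)

incl = init_params(73, 6 * 35, 35, rng)   # shape of the best 'Including V_tx' model
excl = init_params(70, 8 * 35, 35, rng)   # shape of the best 'Excluding V_tx' model
V_hat = forward(incl, X, "tanh")
print(V_hat.shape, n_params(incl), n_params(excl), l2_penalty(incl, 1e-7))
```

The layer sizes alone fix the parameter counts at 22,925 for the ‘Including V_tx’ model and 29,715 for the ‘Excluding V_tx’ model.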

Table 14. Comparison of PP, LR and NN approaches (May 2023).

	PP	LR (‘Three-Phase, Original Coefficients’)	NN (‘Including V_tx’)	NN (‘Excluding V_tx’)
Data required for model development	DN topology; geographic coordinates of DN components; transformer parameters; line parameters	V_tx; P, Q, V at users	V_tx; P, Q, V at users	P, Q, V at users
Approximate time for training and testing	N/A	1 s	7 min	7 min
Approximate time for execution of one timestamp	607 ms	0.001 ms	0.060 ms	0.058 ms
RMSE against PP during testing	N/A	0.01 V	0.02 V	0.02 V
MaxAE against PP during testing	N/A	0.11 V	0.15 V	0.13 V
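
To put the per-timestamp execution times of Table 14 in perspective, the arithmetic below scales them to one year of 5-min timestamps (365 × 288 = 105,120). It assumes execution time grows linearly with the number of timestamps and ignores training time and data handling, so the results are indicative only.

```python
"""Scale the per-timestamp execution times of Table 14 to one year (indicative arithmetic)."""

TIMESTAMPS_PER_YEAR = 365 * 24 * 60 // 5   # 5-min resolution -> 105,120 timestamps

# Approximate execution time for one timestamp, in milliseconds (Table 14)
MS_PER_TIMESTAMP = {
    "PP": 607.0,
    "LR ('Three-Phase, Original Coefficients')": 0.001,
    "NN ('Including V_tx')": 0.060,
    "NN ('Excluding V_tx')": 0.058,
}

annual_seconds = {name: ms * TIMESTAMPS_PER_YEAR / 1000.0 for name, ms in MS_PER_TIMESTAMP.items()}

for name, seconds in annual_seconds.items():
    print(f"{name}: {seconds:,.2f} s ({seconds / 3600:.2f} h)")
```

On these assumptions, a year of PP simulations takes roughly 17.7 h, against about 0.1 s for the LR model and about 6 s for either NN model.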

Table 15. Comparison of monthly solar generation of the case study DN using pvlib python.

Month	Monthly Solar Generation of DN, ‘Partial PV Penetration’ (MWh)	Monthly Solar Generation of DN, ‘Full PV Penetration’ (MWh)	Additional Monthly Solar Generation of DN (MWh)
December 2022	15.02	31.04	16.02
January 2023	14.91	30.81	15.90
May 2023	10.42	21.53	11.11
June 2023	9.27	19.16	9.89
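
The ‘Additional’ column of Table 15 is the difference between the two scenarios. The short check below recomputes it from the other two columns, totals it over the four simulated months, and computes the full-to-partial ratio; it uses only the figures in the table.

```python
"""Recompute the 'Additional' column of Table 15, its four-month total and the scenario ratio."""

# Monthly solar generation of the DN (MWh): (partial PV penetration, full PV penetration)
GENERATION_MWH = {
    "December 2022": (15.02, 31.04),
    "January 2023": (14.91, 30.81),
    "May 2023": (10.42, 21.53),
    "June 2023": (9.27, 19.16),
}

additional = {month: round(full - partial, 2) for month, (partial, full) in GENERATION_MWH.items()}
total_additional = round(sum(additional.values()), 2)
ratio = {month: round(full / partial, 3) for month, (partial, full) in GENERATION_MWH.items()}

print(additional, total_additional, ratio)
```

The additional generation totals 52.92 MWh over the four months, and the ‘Full PV penetration’ scenario generates about 2.07 times the ‘Partial PV penetration’ output in every month.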

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kalinga, T.; Banfield, B.; Knott, J.C.; Robinson, D.A. Comparison of Linear Regression and Neural Networks for Model-Free Voltage Estimation of Low Voltage Distribution Networks with High Penetration of Residential Rooftop Solar. Electronics 2026, 15, 1467. https://doi.org/10.3390/electronics15071467


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.


Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI