An Analysis of the Operation of Distribution Networks Using Kernel Density Estimators

Abstract: Efficiency in the operation of distribution networks is one of the commonly recognised goals of the Smart Grid aspect. Novel approaches are needed to assess the level of energy loss and reliability in electricity distribution. Transmission of electricity in the power system is invariably accompanied by certain physical phenomena and random events causing losses. Identifying areas where excessive energy losses or excessive grid failure occur is a key element for energy companies in resource management. The study presented in the article is based on data obtained from distribution system operators concerning 41 distribution regions in Poland for a period of 5 years. The first part of the article presents an analysis of the distribution of values for the introduced energy density and energy losses in the lines of medium- and low-voltage networks and in transformers supplying the low-voltage network. The second part of the article presents the assessment of the network reliability of the same distribution regions based on analysis of the distributions of System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) values for planned and unplanned outages. Data analysis is performed by non-parametric methods by means of kernel estimators.


Introduction
The implementation of European Union provisions in the field of energy efficiency is focused on increasing energy security and counteracting climate change, and should also have a positive impact on the economy through the development of the market for new services and innovative energy technologies [1]. Energy efficiency can be improved in many ways, for example by reducing electricity losses in distribution grids [2,3], constructing smart micro-grids [4,5], carrying out a stochastic loss analysis for distribution systems [6], assessing the reliability of renewable-energy-interfaced distribution systems while taking loss minimisation into account [7], or integrating wind power and energy storage with the bulk power system [8].
Distribution grids are the final element of a complex power system. Their work is influenced by a huge number of factors, many of which, including the most important ones, are random. It is therefore advisable to analyse the operation of distribution networks using appropriate statistical methods, taking all these factors as random variables. Such analyses are necessary, for example, in order to define a strategy for the development and modernisation of distribution networks. Due to the complexity of the problem, its degree of difficulty and labour intensity, the works on this issue are usually academic in nature. These studies often discuss theoretical foundations and present mathematical models without reference to specific networks [9].
In addition, it should be noted that distribution networks are dynamic, both in terms of their structure and installed devices, as well as in terms of their load or the environmental conditions in which they operate. Therefore, modelling the entire distribution system does not make much sense because the obtained results will always be burdened with many simplifying assumptions, and accordingly, also with significant errors. It is advisable for distribution companies to carry out practical analyses of real networks based on the available data so that actions can be taken to improve their functioning.
Many tasks in designing the development of the distribution network or assessing its working conditions are generally based on estimates determined for the most difficult working conditions. State estimation in power systems consists in estimating the complete set of information about the system operation regarding generation, voltage and load values in all nodes of the distribution network. Apart from the state estimation task [10–14], there is also a narrower problem concerning the estimation of loads [15,16]. The estimation methods most often use statistical models [17,18], the theory of fuzzy sets [19], artificial neural networks [20] or hybrid algorithms combining various computational techniques, such as neural networks and fuzzy logic [21], as well as particle swarm optimisation with elements of genetic algorithms [12,22]. Between the analytical and statistical approaches, there are methods that combine elements of both. They constitute a practical compromise, often adequate to the current national realities of distribution systems, between the amount of input data needed for calculations and the accuracy of the calculated power and energy losses [23–25]. These methods, which could be described as hybrid, use the information available in energy reporting on energy flows through individual groups of distribution network elements, together with substitute network models. Based on such methods, it is possible to build a methodology supporting the analysis of the operation of distribution networks [20,26,27].
The issues related to the reliability of power system operations are currently generating great interest all over the world. Many works, both experimental and theoretical, have been devoted to this subject. In the subject literature, the reliability of the power system is closely associated with the name of Roy Billinton, who together with a group of co-workers published many articles and books, e.g., [28–32]. Usually, the reliability of the generation, transmission or distribution subsystem is analysed independently [33–36]. Reliability calculation methods can be generally divided into analytical methods, consisting in the analysis of random events or processes; simulation methods (Monte Carlo, statistical modelling), consisting in simulating random events and processes; and mixed (combined) methods, which combine analytical and simulation methods [37].
It is often necessary to select the appropriate method for a given task. For some problems one method may turn out to be insufficient, and other methods, more effective for a given class of problems, are needed. A number of problems and challenges facing researchers in reliability engineering when analysing complex systems were described in [38]. In [39], problems in the reliability planning of electric distribution networks were treated as multiobjective issues, consisting of the minimisation of three objective functions. The literature on reliability offers many different methods of analysis, for example discrete particle swarm optimisation [40,41], fuzzy-based analytical and fuzzy-based Monte Carlo simulation techniques [42], or a nonparametric estimator [43]. The history of reliability engineering is presented in an engaging way in [44]. One way to effectively formulate and solve tasks related to the analysis of power grid operation is to use non-parametric statistical methods, in particular Kernel Density Estimation (KDE).

Energy Losses in Distribution Networks and Transformers
The flow of electric current through the distribution network is inherently associated with losses of power and energy: active losses in the resistances of its elements and reactive losses in their reactances, both of which are harmful. The energy needed to cover the losses must be generated in power plants, which requires additional equipment and correspondingly more fuel. This energy also has to be carried by all the system elements involved in the distribution of energy, which requires increasing their nominal transmission capacity. By converting to heat, in accordance with Joule's law, losses heat the current-carrying parts of the system components, which must be dimensioned accordingly.
Balance grid losses (also referred to as the balance difference or balance losses) are the difference between the energy fed into the grid and the energy received from that grid. Balance losses range from a few to several per cent of the volume of transmitted energy, and their size reflects the technical and organisational level of the distribution company. The percentage distribution of energy losses in the distribution network in relation to the technical losses in the low-voltage (LV), medium-voltage (MV) and 110 kV networks, respectively, and to the balance losses in the distribution network for one of the Distribution System Operators (DSO) in Poland is given in Table 1. Balance losses are the sum of technical losses (current + voltage) and commercial losses, the latter arising mainly in the LV grid. For domestic distribution networks, energy losses in the 110 kV grid account for approx. 20%, in the MV grid approx. 50%, and in the LV grid approx. 30% of all balance losses [2]. Technical losses account for about 80% of energy losses and commercial losses for 20%. The largest shares in technical losses belong to the MV grid (30%), MV/LV transformers (17%) and LV lines (7%). Losses in 110 kV grid lines (12%) and in 110/MV transformers (7%) also have a large share. High-voltage networks were not analysed, as they are 100% reserved (fully backed up) and work with the system network; thus, distributors are not always fully responsible for the optimal operation of these networks.
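The definition of balance losses given above can be illustrated numerically. The following is a minimal sketch and not the operators' reporting procedure: the function name balance_losses, the energy volumes, and the choice of the energy fed into the grid as the reference for the percentage are assumptions made for illustration; the 80%/20% split only reuses the approximate shares quoted above.

```python
def balance_losses(energy_fed_in_mwh, energy_received_mwh):
    """Return balance losses in MWh and as a percentage of the energy fed in."""
    if energy_fed_in_mwh <= 0:
        raise ValueError("energy fed into the grid must be positive")
    losses_mwh = energy_fed_in_mwh - energy_received_mwh
    return losses_mwh, 100.0 * losses_mwh / energy_fed_in_mwh


# Hypothetical annual volumes for one distribution area (not data from the study).
fed_in_mwh = 1000.0
received_mwh = 935.0

losses_mwh, losses_pct = balance_losses(fed_in_mwh, received_mwh)

# Approximate split quoted in the text: about 80% technical, 20% commercial.
technical_mwh = 0.80 * losses_mwh
commercial_mwh = 0.20 * losses_mwh

print(f"balance losses: {losses_mwh:.1f} MWh ({losses_pct:.2f}% of energy fed in)")
print(f"technical: {technical_mwh:.1f} MWh, commercial: {commercial_mwh:.1f} MWh")
```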
Losses in MV/LV transformers for a given distribution area will depend primarily on the power of the installed transformers. Transformers should be selected so that losses in them (load and no-load ones) are as low as possible. If minimum energy losses in the transformers are taken as the objective function, the optimal (peak) transformer load factor is 0.7–0.8 [2]. For most distribution areas, the (peak) transformer load factors are 0.4–0.5. Due to the lack of metering of the network, the exact value of the energy fed into the low-voltage network is not known; hence, the energy losses in these networks cannot be precisely determined. The load energy losses depend on many factors: the energy density, the length of the network lines, their cross-sections, cut-outs in the networks, or the aforementioned transformer loads. About 55% of energy losses in the distribution network occur in the lines of the LV and MV networks and in MV/LV transformers; therefore, analysis methods should be developed to improve the operation of these areas [2].
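The dependence of transformer losses on the peak load factor can be sketched with the classical textbook loss model, which is an assumption here and not necessarily the model used in [2] or in the calculations of this study. Annual energy losses are taken as no-load losses over the hours in service plus rated load losses scaled by the square of the peak load factor over an equivalent loss time. Relative losses (losses per unit of delivered energy) are then minimal at a peak load factor of sqrt(P0·T / (Pk·τ)). All names and parameter values below are hypothetical.

```python
import math


def annual_loss_kwh(beta, p0_kw, pk_kw, t_h, tau_h):
    """No-load losses over t_h hours plus load losses (~ beta^2) over tau_h loss hours."""
    return p0_kw * t_h + pk_kw * beta ** 2 * tau_h


def relative_loss(beta, p0_kw, pk_kw, t_h, tau_h, s_kva, cos_phi, tu_h):
    """Annual losses divided by the annual energy delivered at peak load factor beta."""
    delivered_kwh = beta * s_kva * cos_phi * tu_h
    return annual_loss_kwh(beta, p0_kw, pk_kw, t_h, tau_h) / delivered_kwh


def optimal_peak_load_factor(p0_kw, pk_kw, t_h, tau_h):
    """Peak load factor minimising relative losses in the assumed model."""
    return math.sqrt(p0_kw * t_h / (pk_kw * tau_h))


# Hypothetical transformer and load data (illustrative only).
loss_data = dict(p0_kw=0.61, pk_kw=4.6, t_h=8760.0, tau_h=2000.0)
load_data = dict(s_kva=400.0, cos_phi=0.95, tu_h=3500.0)

beta_opt = optimal_peak_load_factor(**loss_data)
rel_opt = relative_loss(beta_opt, **loss_data, **load_data)
rel_low = relative_loss(0.45, **loss_data, **load_data)

print(f"optimal peak load factor: {beta_opt:.2f}")
print(f"relative losses at optimum: {100 * rel_opt:.2f}%, at 0.45: {100 * rel_low:.2f}%")
```

With these hypothetical values the optimum falls at about 0.76, and a transformer loaded to only 0.45 at peak shows higher relative losses, which is consistent in direction with the remark above about lightly loaded transformers.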

Reliability of Distribution Networks
An important aspect of the operation of the distribution network in terms of its effective operation is its reliability. The importance of the continuity and quality of electricity supply is constantly growing. The continuity of electricity supply is seen by most people as a necessary part of our daily lives. An unpredictable, temporary lack of energy supplies often implies problems at the level of everyday human functioning. In general, we only see the importance of the continuity of electricity supply in the absence of electricity. Much greater problems arise in the event of a power failure in production companies, e.g., glassworks or mines. There, the economic losses caused in the event of a power failure are incomparably greater and are associated with a threat to the lives of employees. The model for assessing the performance of energy enterprises in force in Poland requires a continuous analysis of key reliability indicators used in quality regulation [45]. The calculated qualitative indicators enable comparative analysis of domestic distribution network operators. It should be emphasised that the reliability of distribution networks is subject to changes. New trends and directions for the development of electromobility, the construction of energy storage, microgrids and energy clusters are visible. The methods of network operation are also changing. All this undoubtedly affects the values of the reliability indicators of distribution networks [46]. The above reasons make it necessary to conduct research in this field [7,47].

Statistical Methods of Analysing Operation of Distribution Networks
There are two basic statistical approaches to estimating the density function of the analysed random variables: parametric and non-parametric [48]. In the parametric approach, the data are fitted to an assumed parametric model. The main advantages of the parametric approach are that inference is straightforward and that, when the model is correctly chosen, there is no problem of bias. The main disadvantage of parametric models is that the actual distribution of the random variable under study must be fully known. For a wrongly chosen distribution model, the parametric approach leads to inconsistent estimators and thus to incorrect inference. In parametric approaches, data analysis most often comes down to checking the fit to a specific model distribution.
The non-parametric approach is definitely more flexible. Nothing more is needed than a basic assumption regarding the smoothness of the random variable distribution function we are looking for. The approach, without imposing the initial assumptions of the model that are difficult to verify, allows the data to speak for themselves. The non-parametric method is especially useful for exploratory data analysis in situations where the density function cannot be unequivocally determined. The main disadvantage of non-parametric methods is the necessity to determine the smoothing parameter [48].
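The contrast between the two approaches can be shown on a small, deliberately bimodal sample. This is only an illustrative sketch: the sample values are invented, the parametric model is assumed to be a normal distribution, and the smoothing parameter uses the simple normal-scale rule of thumb (1.06·σ·m^(−1/5)) rather than any selector used in the study.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde_pdf(x, sample, h):
    """Univariate kernel density estimate with a normal kernel and bandwidth h."""
    return sum(normal_pdf(x, xi, h) for xi in sample) / len(sample)


# Invented bimodal sample: two groups of observations around 1.0 and 4.0.
sample = [0.8, 0.9, 1.0, 1.1, 1.2, 3.8, 3.9, 4.0, 4.1, 4.2]

mu = statistics.mean(sample)
sigma = statistics.stdev(sample)
h = 1.06 * sigma * len(sample) ** (-0.2)  # normal-scale rule of thumb

for x in (1.0, 2.5, 4.0):
    print(f"x={x}: normal fit {normal_pdf(x, mu, sigma):.3f}, KDE {kde_pdf(x, sample, h):.3f}")
```

The fitted normal model places its maximum between the two groups, where there are no observations, while the kernel estimate keeps two separate modes. This is the sense in which a wrongly chosen parametric model can mislead.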

Non-Parametric Method of Analysing Data on Operation of Distribution Network
Kernel estimators have become one of the basic methods of nonparametric estimation. They were first introduced to the research literature for univariate data in the 1950s and 1960s independently by M. Rosenblatt and E. Parzen, and their basic concept was derived from the problem of estimating the density function of a probability distribution. It was soon concluded that analogous estimators for multivariate data would be an important addition to multivariate statistics. As a result of subsequent studies, multivariate kernel density estimation has reached a level of maturity comparable to its one-dimensional counterpart. A typical use of kernel density estimation (KDE) is to determine the density function of the probability distribution of a random variable on the basis of an obtained sample [48,49].
The definition of KDE: Let there be given an $n$-dimensional random variable $X$, the distribution of which has density $f$. Its kernel estimator $\hat{f} : \mathbb{R}^n \to [0, \infty)$ is determined on the basis of the values of an $m$-element random sample $x_1, x_2, \ldots, x_m$ obtained from the variable $X$, and in its basic form is defined by the equation:

$$\hat{f}(x) = \frac{1}{m} \sum_{i=1}^{m} K_H(x - x_i), \qquad K_H(x) = |H|^{-1/2} K\left(H^{-1/2} x\right),$$

where the function $K : \mathbb{R}^n \to [0, \infty)$, measurable, symmetric about zero and having a weak global maximum at this point, satisfies the condition $\int_{\mathbb{R}^n} K(x)\,dx = 1$ and is called the kernel; $H$ is known as the bandwidth. The bandwidth $H$ is a matrix of smoothing parameters and its choice is crucial for the performance of kernel estimators [50]. From a statistical point of view, the shape of the kernel does not matter that much. The choice can be arbitrary; the normal kernel is most often chosen. When selecting the kernel function $K$, the properties of the obtained estimator, the simplicity of calculations and the properties of the kernel function should be taken into account.
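The estimator defined above can be written out directly for the normal kernel. The sketch below is a simplified illustration, not the implementation in the ks library: it assumes a diagonal bandwidth matrix H (given by its diagonal entries, i.e., squared smoothing parameters), and the sample points and bandwidth values are hypothetical.

```python
import math


def normal_kernel(u):
    """Standard n-dimensional normal kernel K(u)."""
    n = len(u)
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-0.5 * sum(c * c for c in u))


def kde(x, sample, h_diag):
    """Kernel estimate at x: (1/m) * sum_i |H|^(-1/2) * K(H^(-1/2) (x - x_i)).

    h_diag holds the diagonal of H (an assumed diagonal bandwidth matrix).
    """
    roots = [math.sqrt(h) for h in h_diag]   # diagonal of H^(1/2)
    det_root = math.prod(roots)              # |H|^(1/2)
    total = 0.0
    for xi in sample:
        u = [(a - b) / r for a, b, r in zip(x, xi, roots)]
        total += normal_kernel(u) / det_root
    return total / len(sample)


# Hypothetical two-dimensional sample, e.g. (MWh/consumer/year, load losses in %).
sample = [(2.3, 0.7), (3.1, 1.0), (3.0, 1.1), (4.8, 2.2)]
h_diag = (0.25, 0.09)  # assumed diagonal of H

print(f"estimated density at (3.0, 1.0): {kde((3.0, 1.0), sample, h_diag):.4f}")
```

A quick sanity check on such a sketch is that the estimate is non-negative everywhere and integrates to one, as required of a density.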
In the multivariate case, $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ with $n > 1$, two natural specialisations of the above concept are used: the radial kernel [48]

$$K(x) = C\,\mathcal{K}\left(\sqrt{x^{T} x}\right)$$

and the product kernel

$$K(x) = \mathcal{K}(x_1)\,\mathcal{K}(x_2) \cdots \mathcal{K}(x_n),$$

where $C$ is a positive constant, determined so that $\int_{\mathbb{R}^n} K(x)\,dx = 1$ is satisfied, while $\mathcal{K}$ is a one-dimensional kernel applied to each coordinate $x_i$. The radial kernel is more effective than the product one, but from the standpoint of applications the difference is negligible. These issues are discussed in detail in [48].
The practical application of kernel estimators requires a computer with appropriate statistical software, e.g., the R environment. The ks library supports both univariate and multivariate data analysis, including kernel density estimation and kernel discriminant analysis. For KDE there are several bandwidth selectors: plug-in (PI), least squares (or unbiased) cross validation (LSCV or UCV), biased cross validation (BCV), smoothed cross validation (SCV) and normal scale (NS) [50]. Non-parametric data analysis methods have been used, for example, in [51–56]. Although the methods for determining the smoothing parameter are well researched and described, they have to be applied with care. During the work on the article, calculations were performed for all the methods of determining H available in the ks library. After comparing the results obtained for the studied data with each method, the PI method was selected, as it took the variability of the analysed data into account most fully.
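The role of the constant C in the radial kernel can be checked numerically. The sketch below uses the one-dimensional Epanechnikov kernel purely for illustration (the study itself uses the normal kernel, for which the radial and product forms coincide up to the constant); for n = 2 the normalising constant works out to C = 8/(3π). The function names and the integration step are assumptions.

```python
import math


def epanechnikov(t):
    """One-dimensional Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def product_kernel(x):
    """Product kernel: the 1D kernel applied to each coordinate and multiplied."""
    return math.prod(epanechnikov(c) for c in x)


# Constant making the 2D radial kernel integrate to one:
# the integral of 0.75 * (1 - r^2) over the unit disc is 3*pi/8.
C_2D = 8.0 / (3.0 * math.pi)


def radial_kernel_2d(x):
    """Radial kernel: the 1D kernel evaluated at the Euclidean norm of x, times C."""
    return C_2D * epanechnikov(math.sqrt(sum(c * c for c in x)))


def integrate_2d(kernel, step=0.01):
    """Midpoint-rule integral of a 2D kernel over the square [-1, 1] x [-1, 1]."""
    cells = int(round(2.0 / step))
    total = 0.0
    for i in range(cells):
        a = -1.0 + (i + 0.5) * step
        for j in range(cells):
            b = -1.0 + (j + 0.5) * step
            total += kernel((a, b))
    return total * step * step


print(f"product kernel integral: {integrate_2d(product_kernel):.4f}")
print(f"radial kernel integral:  {integrate_2d(radial_kernel_2d):.4f}")
```

Both integrals come out close to one, but the two kernels have different supports: the product kernel is positive on the whole square, the radial kernel only inside the unit disc.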

Network Loss Analysis
In order to analyse the operation of electricity distribution networks, data from 41 national distribution areas (accounting for over 40% of the territory of Poland) were collected and processed. The data concerned the energy introduced into and sold from the LV and MV grids, the balance losses occurring in these grids, and technical parameters such as the area, the length of the lines making up the LV and MV grids, and the number and power of MV/LV transformers. Metering in the distribution areas provides the balance losses in the 110 kV network and, jointly, in the LV and MV networks. The energy losses in individual network elements therefore have to be calculated. The STRATY (LOSSES) program [3] was used for this purpose. Among others, load losses in the LV and MV grid lines and losses in the MV/LV transformers were calculated. Using kernel estimators, studies were carried out to determine which distribution areas have high energy losses, and then to assess what causes them.
Figure 1 presents the analysis of energy losses for low-voltage networks using two-dimensional estimation of the probability density function of the analysed variables with the normal kernel. The variables in Figure 1a relate to the average energy consumption by the recipient supplied from the low-voltage network and the percentage of load losses occurring in the lines of this network. Figure 1b shows the average length of the low-voltage line in relation to the percentage of load losses in the lines. All drawings in the article were obtained using the ks library in the R environment [57]. The graphs show a relatively high heterogeneity of the low-voltage grid, much greater in the case of the average line length analysis than in the case of the average energy consumption by the consumer.
The load energy losses in the LV lines range from 0.63% to 2.37%, with a median of 1.05%; the average energy consumption of a single consumer supplied from the low-voltage grid ranges from 2.28 to 5.05 MWh/consumer/year, with a median of 3.07 MWh/consumer/year; the average length of the low-voltage line ranges from 305 m to 651 m, with a median of 547 m. Some distribution areas have energy losses that differ significantly (higher or lower) from those of other areas. The study showed that the lower energy losses resulted from much shorter network circuits, characteristic of cable networks. The higher load energy losses were the result of long low-voltage grid circuits. Long line circuits, even at low loads, can be a problem for distribution companies, and investment efforts should be stepped up in these areas.
Figure 2 presents the two-dimensional PDF of load energy losses in the medium-voltage network. Figure 2a shows the energy density and the percentage load losses occurring in the lines of the medium-voltage network; Figure 2b shows the length of the medium-voltage lines and the percentage load losses occurring in these lines. The diagrams show the high heterogeneity of the medium-voltage network. The load energy losses in the MV lines range from 0.25% to 4.12%, with a median of 1.90%; the energy density in the medium-voltage network ranges from approx. 80 to approx. 7000 MWh/km², with a median of 232 MWh/km²; the length of the medium-voltage lines ranges from 786 km to 4697 km, with a median of 2732 km. Significantly lower load energy losses occur where the energy density is high, which is characteristic of cable networks. High load energy losses are associated with long runs of medium-voltage lines. Further analysis should be carried out in these areas in order to determine the cause of the high energy losses more accurately.
If the scope of the collected data allows, the analysis should be carried out separately for cable lines and overhead lines.
The graphs show a large heterogeneity of losses in the MV/LV transformers. This may have several causes: (1) load losses depend on the transformer load factor; the lower the load, the lower the load losses, but the technical losses in transformers (voltage and load) are higher; (2) transformers of different classes operate in the networks, and the transformer class determines the size of the losses. Further analysis of the outliers should be carried out to ascertain the cause of the excessive energy losses.
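One possible way to turn the estimated density into a list of areas that stand out from the rest is to flag the areas lying where the density is low. The sketch below is an assumption and not the procedure used in the study, which relies on density plots produced with the ks library. It uses a product normal kernel with a diagonal normal-scale bandwidth, an arbitrary threshold of half the median density, and invented data for eight hypothetical areas labelled A to H.

```python
import math
import statistics


def normal_scale_bandwidths(points):
    """Per-coordinate normal-scale smoothing parameters (diagonal bandwidth)."""
    m, n = len(points), len(points[0])
    factor = (4.0 / ((n + 2) * m)) ** (1.0 / (n + 4))
    return [factor * statistics.stdev([p[j] for p in points]) for j in range(n)]


def density(x, points, h):
    """Kernel density estimate at x with a product normal kernel."""
    norm = len(points) * math.prod(h) * (2.0 * math.pi) ** (len(h) / 2.0)
    total = 0.0
    for p in points:
        total += math.exp(-0.5 * sum(((a - b) / s) ** 2 for a, b, s in zip(x, p, h)))
    return total / norm


# Invented data: (average consumption in MWh/consumer/year, load losses in %).
regions = {
    "A": (2.90, 0.95), "B": (3.00, 1.05), "C": (3.10, 1.00), "D": (3.20, 1.10),
    "E": (2.80, 0.90), "F": (3.05, 1.15), "G": (2.95, 1.02), "H": (4.90, 2.30),
}

points = list(regions.values())
h = normal_scale_bandwidths(points)
densities = {name: density(p, points, h) for name, p in regions.items()}

# Assumed rule: an area is flagged when its density is below half the median density.
threshold = 0.5 * statistics.median(densities.values())
flagged = sorted(name for name, d in densities.items() if d < threshold)

print("flagged areas:", flagged)
```

Areas flagged in this way are only candidates for the further, more detailed analysis recommended above; the threshold is a tuning choice and would need to be justified for real data.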

Reliability Analysis
In the second part of the study, the operation of the same 41 distribution areas for which the network loss analysis was carried out was analysed in terms of operational reliability. The SAIDI (System Average Interruption Duration Index) and SAIFI (System Average Interruption Frequency Index) indicators are commonly used in the global reliability analysis. SAIDI is the average system duration of long and very long outages in electricity supply, expressed in minutes per customer per year; it is the sum, over all outages during the year, of the products of the outage duration and the number of customers exposed to its effects, divided by the total number of customers served. Similarly, SAIFI is the average system frequency of long and very long outages in electricity supply; it is the number of customers exposed to the effects of all these outages during the year, divided by the total number of customers served.
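The two indices can be computed directly from a list of outages. The sketch below is illustrative only: the function name saidi_saifi and the outage records are hypothetical, and SAIFI is computed in its usual customer-weighted form, i.e., the number of customers affected by each outage is summed and divided by the number of customers served.

```python
def saidi_saifi(outages, customers_served):
    """Return (SAIDI in minutes per customer, SAIFI in outages per customer).

    outages is a list of (duration_in_minutes, customers_affected) pairs.
    """
    if customers_served <= 0:
        raise ValueError("number of customers served must be positive")
    customer_minutes = sum(duration * affected for duration, affected in outages)
    customers_interrupted = sum(affected for _, affected in outages)
    return customer_minutes / customers_served, customers_interrupted / customers_served


# Hypothetical outages in one area during one year.
outages = [(120.0, 500), (45.0, 2000), (300.0, 150)]
customers_served = 10_000

saidi, saifi = saidi_saifi(outages, customers_served)
print(f"SAIDI = {saidi:.1f} min/customer/year, SAIFI = {saifi:.3f} outages/customer/year")
```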
The SAIDI and SAIFI indicators are determined separately for planned (scheduled) and unplanned outages. In Poland, catastrophic outages, i.e., outages lasting longer than 24 h, should be analysed separately in the calculations. SAIDI and SAIFI do not include short outages lasting less than 3 min. Figure 4a presents the two-dimensional PDF of the SAIFI and SAIDI indicators for scheduled outages (SAIFIp, SAIDIp), while Figure 4b presents the SAIFI and SAIDI indicators for unplanned outages (SAIFIn, SAIDIn). It should be noted that in one of the analysed areas there were unusually unfavourable weather conditions, which were reflected in the SAIFIn and SAIDIn values. Figure 5a presents the distribution of the average system indices for unplanned long and very long outages including catastrophic outages, for their duration (SAIDInk) and frequency (SAIFInk), in the analysed 41 regions. Similarly, Figure 5b presents the distributions for all planned and unplanned outages together, including catastrophic ones (SAIFI_sum, SAIDI_sum).
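The grouping of outages described above can be sketched as follows. This is a hypothetical illustration, not the operators' reporting code: the record layout, the function names and the outage data are invented, outages shorter than 3 min are dropped, unplanned outages longer than 24 h are treated as catastrophic, and planned outages are assumed never to be classified as catastrophic.

```python
SHORT_LIMIT_MIN = 3.0                # outages shorter than this are excluded
CATASTROPHIC_LIMIT_MIN = 24 * 60.0   # unplanned outages longer than this are catastrophic


def indices(outages, customers_served):
    """Return (SAIDI, SAIFI) for a list of outage records."""
    saidi = sum(o["minutes"] * o["customers"] for o in outages) / customers_served
    saifi = sum(o["customers"] for o in outages) / customers_served
    return saidi, saifi


def split_outages(outages):
    """Group outage records into planned, unplanned and catastrophic unplanned."""
    groups = {"planned": [], "unplanned": [], "catastrophic": []}
    for o in outages:
        if o["minutes"] < SHORT_LIMIT_MIN:
            continue  # short outages are not counted in SAIDI and SAIFI
        if o["planned"]:
            groups["planned"].append(o)
        elif o["minutes"] > CATASTROPHIC_LIMIT_MIN:
            groups["catastrophic"].append(o)
        else:
            groups["unplanned"].append(o)
    return groups


# Hypothetical outage records for one area and one year.
outages = [
    {"minutes": 1.0, "customers": 900, "planned": False},     # short, excluded
    {"minutes": 120.0, "customers": 500, "planned": True},
    {"minutes": 45.0, "customers": 2000, "planned": False},
    {"minutes": 1800.0, "customers": 100, "planned": False},  # 30 h, catastrophic
]
customers_served = 10_000

groups = split_outages(outages)
saidi_p, saifi_p = indices(groups["planned"], customers_served)
saidi_n, saifi_n = indices(groups["unplanned"], customers_served)
saidi_nk, saifi_nk = indices(groups["unplanned"] + groups["catastrophic"], customers_served)

print(f"SAIDIp={saidi_p:.1f}, SAIFIp={saifi_p:.3f}")
print(f"SAIDIn={saidi_n:.1f}, SAIFIn={saifi_n:.3f}")
print(f"SAIDInk={saidi_nk:.1f}, SAIFInk={saifi_nk:.3f}")
```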

Conclusions
The analyses carried out in this study allow for the following conclusions:
• The non-parametric approach is much more flexible than the parametric approach. The estimation of the density function of the analysed random variables using KDE provides a productive tool for assessing the operation of the distribution system.
• The presented analysis of the data on the operation of distribution networks allows areas to be found for which energy losses or reliability levels are outliers in relation to other regions, and enables distribution companies to optimally invest their funds in the distribution network. This method was used in the energy audit for one of the Distribution System Operators.
• The analysis of the operation of national distribution networks shows the need for large investments in selected parts of the network. On the basis of the conducted study, it is possible to precisely indicate which areas should be modernised in the first place.
• Indicating outliers makes it possible to narrow down the area for which, given more detailed data such as grid diagrams or the energy fed into a given line or transformer, an in-depth analysis can be conducted to determine which lines or transformers require modernisation.
• The main reasons for the differences in the levels of network reliability in individual areas include the varying share of cable lines in the network, the renovation policy implemented, and the strategy of managing failure and post-emergency works.
• It should be noted that distribution companies are currently facing new challenges related to adapting the distribution network to integration with dynamically installed distributed energy sources.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.