Benchmarking Daily Line Loss Rates of Low Voltage Transformer Regions in Power Grid Based on Robust Neural Network

Line loss is inherent in the transmission and distribution stages of power delivery and directly affects the profits of power-supply corporations. It is therefore an important indicator, and a benchmark value is needed to evaluate daily line loss rates in low voltage transformer regions. However, the number of regions is usually very large, and the dataset of line loss rates contains massive outliers. It is critical to develop a regression model with both strong robustness and high efficiency when trained on big data samples. To this end, a novel method based on a robust neural network (RNN) is proposed. The RNN is a multi-path network model with denoising auto-encoders (DAEs), which takes advantage of dropout, L2 regularization and the Huber loss function. It produces several different outputs, which are utilized to compute benchmark values and reasonable intervals. Based on the comparison results, the proposed RNN possesses both superb robustness and accuracy, outperforming the conventional regression models tested. According to the benchmark analysis, about 13% of the samples in the collected dataset are outliers, and about 45% of the regions hold outliers within a month. Hence, the quality of line loss rate data should still be further improved.


Introduction
Line loss rate is a vital indicator in the power grid, as it reflects the operation and management levels in both economic and technical aspects [1]. It is inevitable and directly impacts the profits of power-supply corporations [2]. Commonly, line loss can be classified as technical and non-technical. Technical line loss is caused by the electro-thermal effect of conductors, which is unavoidable over the transmission process. The level of technical line loss depends on the structure and operation state of the power grid, the conductor type, as well as the balance condition of three-phase loads [3]. Non-technical line loss usually occurs due to electricity theft, which may lead to abnormal values in metered line loss rates. Hence, power grid operators are urgently concerned with whether the daily line loss rate values fall within a reasonable range, i.e., with the pass percentages of daily line loss rates. It is sometimes difficult to distinguish normal line loss rate values from outliers in a large number of collected samples. Besides the intervals, a benchmark value of the daily line loss rate is also essential for transformer regions, as it directly indicates what daily line loss rate value a region should approximately achieve, enabling operators to better know the operating condition of the region and to further improve the level of line loss management. In this case, an accurate calculation method is still needed to obtain the benchmark values and reasonable intervals of line loss rates, as well as to recognize outliers among the samples of line loss rates.
In the field of data mining and analysis, there are usually four approaches to calculate benchmarks and detect outliers, i.e., the empirical method, the statistical method, the unsupervised method and the supervised method [4][5][6][7]. Firstly, the empirical method utilizes practical experience to set an interval with fixed bounds, where a value out of the interval can be treated as an outlier. In the benchmarking of daily line loss rates, an empirical interval is customarily set to be −1%~5%. It is noted that although the line loss rate is usually non-negative, a value no less than −1% is acceptable due to unavoidable acquisition errors. This method is simple and easy to realize, whereas a fixed interval is sometimes inaccurate and cannot reflect the influences of relevant factors. Secondly, the statistical method aims to study the distribution of data samples, where outliers can be eliminated using probabilistic density functions [8] or box-plots [9,10]. Compared with empirical methods, the interval bounds in statistical methods are able to adapt to different testing samples, while the influencing factors of line loss rates can still hardly be involved in this kind of method. Thirdly, the unsupervised method, i.e., the clustering method, is also an efficient way to detect outliers [11]. In clustering, a sample of the line loss rate can be treated as a data point, where the values of the line loss rate and its influencing factors are the different dimensional attributes of the point. The outliers can be identified based on the distances between the data points and the clustering centers [12][13][14]. Unsupervised methods usually outperform empirical and statistical methods, as multi-dimensional factors can be input and analyzed [15]. Nevertheless, unsupervised methods still face several problems. On one hand, it is sometimes difficult to design a proper distance function, as the input factors are in dissimilar dimensions and units. On the other hand, some clustering methods involve all data points
to update clustering centers, e.g., k-means and fuzzy C-means (FCM), where outliers may easily affect the centers and the final clustering results. Moreover, reasonable intervals cannot be calculated by unsupervised methods. Finally, supervised methods utilize machine learning models to solve classification [16,17] and regression problems [18,19], which can be designed for outlier detection and benchmark calculation tasks, respectively. Classification models learn from labeled samples to distinguish normal and abnormal data. However, line loss rate samples are usually unlabeled, as it is impossible to recognize whether a collected value of line loss rates is normal or not.
According to the relevant references, data-driven methods have been widely applied to estimate line loss values and loss rates, where clustering and regression methods are the two major ones [20]. In [21], FCM is adopted to select a reference value for each type of feeder, in order to calculate the limit line loss rates of feeders for distribution networks. Similarly, another clustering method, namely the Gaussian mixture model (GMM), is utilized to calculate line loss rates for low-voltage transformer regions in [22]. The above two methods both calculate a fixed benchmark value for each cluster, so they are not designed for a single feeder or region. Considering this fact, regression methods are proposed to calculate a benchmark with certain inputs, which are the influencing factors of line losses from only one feeder or region. Reviewing the state-of-the-art methods, the decision tree and its derived boosting models are the most frequently used ones in line loss computation. In [23,24], the gradient boosting decision tree (GBDT) is used to predict and estimate line losses for distribution networks and transmission lines, respectively, with power flow and weather information taken as input factors. In [25], the extreme gradient boosting (XGBT) model is proposed to estimate line losses for distribution feeders, based on the characteristics of the feeders.
Regression models can obtain benchmark values for line loss rate samples based on different influencing factors, where a value greatly deviating from the benchmark can be treated as an outlier. Nevertheless, a regression model may exhibit poor robustness and reliability when directly trained on samples with massive outliers, making it critical to develop a highly robust method. In addition to the boosting models mentioned before, k-nearest neighbors (KNN) is one of the most commonly used machine learning models with great robustness [26][27][28]. It analyzes the similarity between the predicted sample and the original training samples, in order to calculate an averaged value based on the nearest training samples. This calculating method can relieve certain impacts from outliers, but it increases the computational burden at the application stage and can hardly deal with high-dimensional inputs. Besides, the support vector machine (SVM) is also frequently utilized for robust regression [29]. As the regression results of SVM are strongly correlated with the support vectors, the influence of outliers is decreased. However, SVM shows low training efficiency on a large number of samples. Due to the great number of regions in practical application, a method with both high efficiency and robustness, called the robust neural network (RNN), is proposed in this study. A neural network utilizes error back-propagation (BP) and mini-batch gradient descent algorithms to update its parameters, which is suitable for training on big data samples. Besides, the RNN modifies the structure of conventional neural networks, further increasing its robustness against outliers. The main contributions of this study can be summarized as follows.
(1) As the number of studies that focus on benchmarking daily line loss rates is limited, a supervised regression method is proposed in this study to obtain benchmark values of daily line loss rates in different transformer regions. In the proposed supervised method, various influencing factors of line loss rates are considered, so that a high computation accuracy can be ensured. (2) A novel RNN model is proposed in this study. It possesses a multi-path architecture with denoising auto-encoders (DAEs). Moreover, L2 regularization, dropout layers and the Huber loss function are also applied in the RNN. According to the testing datasets in the case study, the robustness and reliability of the proposed regression model are greatly improved compared with conventional machine learning models. (3) Based on the multiple outputs of the RNN, a method is proposed to calculate benchmark values and reasonable intervals for line loss rate samples. It can precisely evaluate the quality of sampled datasets and eliminate outliers of line loss rates, increasing the stability of data monitoring.
The rest of the paper is organized as follows. The utilized dataset and the proposed method based on RNN are introduced in Section 2. The comparison results and discussions are provided in Section 3. The conclusion is finally drawn in Section 4.

Theoretical Computation Equations of Line Losses
Theoretical equations of line losses are proposed to compute technical line losses, based on the equivalent resistance method. The method supposes that there is an equivalent resistance at the head of the line, where the energy loss of three-phase three-wire and three-phase four-wire systems can be formulated as [30]:

∆A_b = N·K²·I_av²·R_eq·T × 10⁻³

where ∆A_b refers to the theoretical line loss with balanced three-phase load. N is the structure coefficient, which is equal to 3 in a three-phase three-wire system and 3.5 in a three-phase four-wire system. K, I_av, R_eq and T denote the shape coefficient of load curves, the average current at the head of the line (A), the equivalent resistance of conductors (Ω) and the operating time (h), respectively. Furthermore, R_eq can be computed with the following equation:

Appl. Sci. 2019, 9, 5565

R_eq = Σ_i (N_i·A_i²·R_i) / (N·(Σ_j A_j)²)

where N_i, A_i and R_i are the structure coefficient, the metered electricity power and the resistance of the ith line segment, respectively. A_j denotes the electricity power collected from the jth power meter.
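As a rough numerical sketch of the two equations above (the function names and the ×10⁻³ unit-conversion factor to kWh are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def equivalent_resistance(N, N_seg, A_seg, R_seg, A_meters):
    """R_eq = sum_i(N_i * A_i^2 * R_i) / (N * (sum_j A_j)^2)."""
    return np.sum(N_seg * A_seg**2 * R_seg) / (N * np.sum(A_meters) ** 2)

def balanced_line_loss(N, K, I_av, R_eq, T):
    """dA_b = N * K^2 * I_av^2 * R_eq * T * 1e-3 (1e-3 assumed to convert Wh to kWh)."""
    return N * K**2 * I_av**2 * R_eq * T * 1e-3
```

For instance, a three-phase three-wire system (N = 3) with K = 1.05, I_av = 10 A, R_eq = 0.5 Ω and T = 24 h would yield roughly 3.97 kWh under this sketch.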
For a system with unbalanced three-phase load, the theoretical line loss should be corrected as:

∆A_ub = K_ub·∆A_b

where K_ub represents the correction coefficient, defined as:

K_ub = 1 + k·δ_I²

where k = 2 with one phase under heavy load and two phases under light load; if there are two phases under heavy load, k = 8. δ_I denotes the unbalance level of the three-phase load, which can be calculated as follows:

δ_I = (I_max − I_av) / I_av

where I_max is the current of the phase with the maximum load. Thus, the theoretical line loss defined above is an unavoidable energy loss, the so-called technical line loss. However, the non-technical line loss caused by electricity theft also concerns power grid operators. As such non-technical losses can lead to abnormal values in the metered daily line loss rates, it is necessary to calculate reasonable intervals for outlier discrimination, which is one of the purposes of this study.
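Continuing the sketch, the unbalance correction might be written as follows (hypothetical function names; k = 2 or 8 as described above):

```python
def unbalance_level(I_max, I_av):
    # delta_I = (I_max - I_av) / I_av
    return (I_max - I_av) / I_av

def corrected_line_loss(dA_b, delta_I, k):
    # dA_ub = K_ub * dA_b, with K_ub = 1 + k * delta_I^2
    return (1.0 + k * delta_I**2) * dA_b
```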

Datasets
In practical application, the pass percentages of daily line loss rates in transformer regions are generally examined once a month in the State Grid Corporation of China. In this case, the line loss rate dataset from July 2017, collected at daily intervals, is utilized in this study in order to examine the pass percentage of line loss rates in that month. The pass percentage quota is especially important in July, as it usually meets the peak load period of summer. The dataset is obtained from a total of 19,884 regions located in Wuxi, Jiangsu Province, China. As a result, there are altogether 616,404 samples in this study, which satisfies the demand of big data analysis. Based on the dataset, about 80% of the samples (15,907 regions) are chosen as training samples and the others (3977 regions) as testing samples.

Data Quality Analysis
The research object of this study is the daily line loss rate, some of whose curves are presented as examples in Figure 1. Besides, the 25th percentile (q1), median (q2), 75th percentile (q3), maximum (max), minimum (min), mean, standard deviation (std), lower bound (l_a) and upper bound (u_a) values of the overall line loss rate dataset are calculated and provided in Table 1. The distribution box-plots of the line loss rates are provided in Figure 2. It is noted that the lower bound (l_a) and upper bound (u_a) are calculated based on the 25th percentile (q1) and the 75th percentile (q3) [31], where a value out of the bounds can be treated as an outlier:

l_a = q1 − 1.5·(q3 − q1),  u_a = q3 + 1.5·(q3 − q1)

According to the curves and the quality analysis, the data characteristics of daily line loss rates can be summarized as follows:

1.
The line loss rate data show little daily regularity and high fluctuation. From Figure 1, the curves of line loss rates in different regions change greatly over the days, and historical line loss rates can hardly be used to estimate future values. Thus, it is vital in this study to select influencing factors of the line loss rate.

2.
The deviations of outliers in the dataset are sometimes extremely far from normal values, indicating the low dependability of the acquisition and communication equipment. According to Table 1 and Figure 2, the lower and upper bounds of the original dataset in the box-plot are −1.57% and 5.22%, respectively, which are quite close to the project standard (−1% and 5%). However, the maximum and minimum of the collected line loss rates are 100% and −1.69 × 10⁶%, respectively, greatly different from the bounds. In this case, benchmarking line loss rates is still necessary in practical applications.

3.
The quality of the dataset is too poor to be used directly. As the component analysis of the dataset presented in Figure 3 shows, there are a large number of outliers and missing values, constituting 8.67% and 6.72% of the overall dataset, respectively. In this study, the spline interpolation method is utilized to fill the missing values. From Table 1 and Figure 2, the dataset after interpolation holds a distribution similar to that of the original dataset. On the contrary, although the outliers could be eliminated directly based on l_a and u_a, the distribution would change and it would be difficult to calculate an accurate reasonable interval.
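A minimal sketch of the box-plot bounds and the gap filling described above, using numpy only; linear interpolation stands in here for the spline interpolation actually used in the study:

```python
import numpy as np

def iqr_bounds(rates):
    # l_a = q1 - 1.5 * (q3 - q1), u_a = q3 + 1.5 * (q3 - q1)
    q1, q3 = np.percentile(rates, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def fill_missing(rates):
    # interpolate NaN entries from their neighbors along the daily index
    rates = np.asarray(rates, dtype=float)
    idx = np.arange(rates.size)
    gap = np.isnan(rates)
    filled = rates.copy()
    filled[gap] = np.interp(idx[gap], idx[~gap], rates[~gap])
    return filled
```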

Influencing Factors of Line Loss Rate
Taking into account both the possible influencing factors and the information recorded, a total of twelve factors are selected as inputs of the regression models, as shown in Table 2. Among those, the third and fourth factors are one-bit codes, while the others are numerical values.

Calculation of Benchmark Values and Reasonable Intervals
According to the data quality analysis, the original datasets contain a large quantity of outliers that are far from the rational range, making it difficult to obtain an accurate result. Therefore, the task of this study is to utilize a robust learning strategy and achieve a stable regression result away from outliers, as shown in Figure 4.

Commonly, one robust learning solution is to set thresholds manually and delete outliers from the dataset according to those thresholds, so that the rest of the dataset can be used to train a machine learning model. However, how to decide precise thresholds remains a problem. Besides, the computed bounds of the reasonable interval from the learning model may be close to the manual thresholds, which breaks the distribution of the original dataset and makes it meaningless to train a probabilistic learning model. In this case, the calculation method based on RNN is proposed, as shown in Figure 5, which consists of the following steps:

1.
Build an RNN. In order to fully exploit its robustness, DAEs, a multi-path architecture, L2 regularization, dropout layers and the Huber loss function are applied. It is noted that the RNN possesses ten output nodes, where each node is connected to one layer with a dissimilar dropout rate (from 0.05 to 0.50).

2.
Calculate the average of the ten different outputs, which is the final benchmark value of the line loss rate:

ŷ_i = (1/10)·Σ_{n=1}^{10} y_i^n

where ŷ_i is the ith benchmark value, and y_i^n is the nth output for the ith line loss rate.

3.
Operate error analysis to acquire a reasonable interval. Not only is the absolute error between the benchmark values and the actual line loss rates computed, but the variance of the different outputs is calculated as well. According to the interval result, data points that do not fall between the bounds of the interval are considered outliers. The operation can be described by the following equations:

e_1 = (1/n_s)·Σ_i |ŷ_i − y*_i|,
e_{2,i} = sqrt((1/10)·Σ_{n=1}^{10} (y_i^n − ŷ_i)²),
l_i = ŷ_i − (e_1 + e_{2,i}),  u_i = ŷ_i + (e_1 + e_{2,i})

where e_1 and e_2 are the results of the error analysis; e_1 is a constant and e_2 changes with the index i. n_s is the number of training samples; y*_i is the ith actual line loss rate; l_i and u_i are the lower and upper bounds of the reasonable interval, respectively. Furthermore, as outliers exist in the actual line loss rate values, which may influence the result of e_1, a two-tailed test is utilized to eliminate the possibly abnormal y*_i values that are smaller than the 0.7th percentile or bigger than the 99.3th percentile, as shown in Figure 6.
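The steps above can be sketched as follows; the exact forms of e1 (taken here as the mean absolute error) and e2 (taken as the per-sample standard deviation of the ten outputs) are assumptions consistent with the description, not the paper's exact equations:

```python
import numpy as np

def benchmark_and_interval(outputs, actual):
    """outputs: (n_s, 10) array of the ten RNN outputs; actual: (n_s,) metered rates."""
    bench = outputs.mean(axis=1)             # step 2: benchmark = mean of the ten outputs
    e1 = np.mean(np.abs(bench - actual))     # step 3: global absolute-error term (assumed form)
    e2 = outputs.std(axis=1)                 # step 3: per-sample output spread (assumed form)
    lower, upper = bench - (e1 + e2), bench + (e1 + e2)
    is_outlier = (actual < lower) | (actual > upper)
    return bench, lower, upper, is_outlier
```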

Robust Neural Network
As mentioned before, an RNN is used in this study for robust learning, whose architecture is provided in Figure 7. It is made up of three main paths combined by concatenation, with a DAE on each main path. The concatenated output nodes are placed in the same layer and represent high-order features extracted from the original inputs. For a further improvement of robustness, L2 regularization is utilized in this layer to limit the output values of those nodes. Then, ten dropout layers with different dropout rates are stacked after the high-order feature layer, from which ten outputs can be obtained. The ten outputs are analyzed to calculate the benchmark value and the reasonable interval, as introduced in Section 2.3.

Denoising Auto-Encoder
The architecture of the DAE is provided in Figure 8. It is a robust variant of the auto-encoder, which possesses a noise layer before the encoder [32], such as a normal (Gaussian) noise layer:

x_{i,n} = x_i + N(0, σ²)

where x_i and x_{i,n} are the ith input and the ith output of the noise layer, respectively. N(0, σ²) is a normal distribution with a mean value of 0 and a variance of σ². In this study, σ is set to 0.05 when the inputs are normalized into [0, 1]. Besides, the encoder and decoder layers in the DAE are both made up of conventional fully-connected (FC) layers, whose equation can be expressed as:

y_i^l = f(Σ_j w_{ij}·x_j^{l−1} + b_j)

where y_i^l and x_j^{l−1} are the ith output and the jth input of the lth layer, respectively; w_{ij} and b_j are the weight and bias of the FC layer that connect the jth input and the ith output; f(·) is the activation function. For the encoder layer, the number of outputs is smaller than that of inputs, i.e., i < j; and the number of outputs of the decoder layer is equal to that of the original inputs. Hence, after the computation of the DAE, the dimension and size of the inputs remain unchanged, whereas the robustness of the features increases as they can resist a certain degree of noise interference.
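A forward pass through one DAE can be sketched as below; the ReLU activation and the layer sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dae_forward(x, W_enc, b_enc, W_dec, b_dec, sigma=0.05, train=True):
    # noise layer: x_n = x + N(0, sigma^2), active only while training
    x_n = x + rng.normal(0.0, sigma, x.shape) if train else x
    h = np.maximum(0.0, x_n @ W_enc + b_enc)   # encoder: fewer nodes than inputs
    return np.maximum(0.0, h @ W_dec + b_dec)  # decoder: restores the input dimension
```

With the twelve influencing factors as inputs, the encoder might map 12 → 6 and the decoder 6 → 12, so the output keeps the input dimension as described.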

Multiple Paths Combined by Addition and Concatenation
There are altogether three main paths in the RNN, which have similar layers and whose outputs are combined by a concatenation operation:

y_c = C(y_{1,mp}, y_{2,mp}, y_{3,mp}),  y_{k,mp} = f_k(x, {w_k, b_k})

where C(·) is the concatenation operation, which combines the output nodes from different layers into an entire layer; y_c and y_{k,mp} are the output vector of the concatenation and the output vector of the kth main path, respectively; f_k(x, {w_k, b_k}) is the computation of the kth main path; x denotes the input vector of the RNN; w_k and b_k are the weight matrices and bias vectors of the kth path, respectively; n_w and n_b are the numbers of weights and biases, respectively.
Furthermore, a main path is formed by two sub-paths, i.e., a DAE sub-path and an FC layer sub-path. The outputs of the two sub-paths are added to form the output of a main path, as follows:

y_{k,mp} = g_k(x) + (w_k^{sp}·x + b_k^{sp})

where g_k(·) represents the computation of the DAE on the kth main path; w_k^{sp} and b_k^{sp} are the weight matrix and bias vector of the FC layer sub-path on the kth main path, respectively.
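The addition and concatenation can be sketched as follows; the DAE computation g_k is passed in as a function, and the shapes are illustrative:

```python
import numpy as np

def main_path(x, g_k, W_sp, b_sp):
    # y_k,mp = g_k(x) + (x @ W_sp + b_sp): DAE sub-path plus FC sub-path
    return g_k(x) + (x @ W_sp + b_sp)

def combine_paths(x, paths):
    # y_c = C(y_1,mp, y_2,mp, y_3,mp): concatenate the main-path outputs into one layer
    return np.concatenate([main_path(x, *p) for p in paths], axis=1)
```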

Dropout
Dropout is a special kind of layer, which is efficient at preventing over-fitting [33]. The procedure of dropout can be summarized in two steps, i.e., the training step and the application step. For a conventional FC layer as expressed in Equation (12), there are j input nodes. In the training step, each input node is abandoned with a probability p (0 < p < 1), where the abandoned nodes cannot be connected to the outputs [34], as shown in Figure 9. After training, all the input nodes operate in the application step, whereas the weight values are multiplied by the retention probability (1 − p), which can be described as:

w'_{ij} = (1 − p)·w_{ij}

where p is the probability, called the dropout rate. It is a hyper-parameter and is set from 0.05 to 0.50, with a step of 0.05, in order to obtain the ten different outputs in this study.
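The two dropout steps can be sketched as below; for a drop probability p, the application-step rescaling uses the retention probability 1 − p:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p):
    # training step: each input node is abandoned with probability p
    keep = rng.random(x.shape) >= p
    return x * keep

def dropout_apply(W, p):
    # application step: all nodes active, weights rescaled by the retention probability
    return (1.0 - p) * W
```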



Huber Loss Function
The training process of a neural network is to set a loss function and utilize the back-propagation (BP) gradient descent algorithm to update the parameters layer by layer. One of the most commonly used functions is the mean squared error (MSE):

MSE = (1/n_s)·Σ_i (y_i − y*_i)²

where n_s is the number of training samples; y_i and y*_i are the ith predicted and the ith actual outputs, respectively. Besides, the mean absolute error (MAE) is another conventional function, which can be defined as follows:

MAE = (1/n_s)·Σ_i |y_i − y*_i|

where MAE and MSE can also be called the L1 loss and L2 loss, respectively, as a linear term is used in the MAE and a quadratic term in the MSE. Comparing MSE and MAE, MSE holds a smoother derivative function, which benefits the calculation of the gradient descent algorithm, whereas a small difference in MAE may cause a huge change in parameter updating. On the contrary, MAE demonstrates a better capability than MSE in fighting against outliers [35]. In this case, a Huber loss function that combines the merits of both MSE and MAE [36] is applied in this study, as shown in Figure 10:

L_Huber = (1/2)·(y_i − y*_i)²,          if |y_i − y*_i| ≤ δ
L_Huber = δ·|y_i − y*_i| − (1/2)·δ²,    otherwise

where δ is a hyper-parameter that needs to be set manually. It is set to 10% in this study.
where MSE and MAE can be also called L1 loss and L2 loss, as linear term and quadratic term are used in the MSE and MAE, respectively.Comparing MSE and MAE, MSE holds a smoother derivative function, which can benefit the calculation of gradient descent algorithm, whereas a small difference in MAE may cause a huge change in parameter updating.On the contrary, MAE demonstrates a better capability than MSE when fighting against outliers [35].In this case, a Huber loss function is applied in this study that combines the merits of both MSE and MAE [36], as shown in 0: , 1 Huber , where δ is a hyper-parameter that needs to be set manually.It is set to be 10% in this study.
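A minimal NumPy sketch of the Huber loss described above; δ defaults to the 10% setting used in this study:

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=0.10):
    """Quadratic (MSE-like) for errors within delta, linear (MAE-like)
    beyond it, so single outliers cannot dominate the gradient."""
    err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))
```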

L2 Regularization
L2 regularization in this study is aimed at setting a penalty term for the nodes that hold large activation outputs, in order to prevent over-fitting and increase the robustness of the neural network.The regularization works during the training phase, where a two-norm penalty term is added to the training loss function [37], which can be expressed as follows:

L = L_Huber + λ Σ ‖W‖²₂

where L represents the final loss function for model training, and λ is the hyper-parameter of the penalty term, which is set to 0.001 in this study.
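The regularized training objective can be sketched as follows; the helper names are illustrative, and λ defaults to the study's 0.001:

```python
import numpy as np

def l2_penalty(weight_matrices, lam=0.001):
    """Two-norm penalty: lam times the sum of squared weights over all layers."""
    return lam * float(sum(np.sum(w ** 2) for w in weight_matrices))

def training_loss(data_loss, weight_matrices, lam=0.001):
    """Final loss L = data term (e.g., the Huber loss) + L2 penalty."""
    return data_loss + l2_penalty(weight_matrices, lam)
```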


Results and Discussion
The architecture and hyper-parameters of the proposed RNN are presented in Table 3.Moreover, considering that there is a large number of training samples, k-nearest neighbors (KNN), decision tree regression (DTR) and a single-hidden-layer artificial neural network (ANN), which hold relatively high training efficiency on big datasets, are established for comparison.The deep RNN model is trained on a personal computer with an NVIDIA GTX 1080 GPU, using Python 3.5 and TensorFlow 1.4.It should be noted that all the hyper-parameters and training configurations of the RNN, along with the hyper-parameters mentioned in Section 2.4 (i.e., σ, δ and λ), are chosen via grid search with three-fold cross-validation.The calculation results and discussions are provided in detail as follows.
Based on the proposed RNN, the pass percentage results of the line loss rates can be analyzed, as shown in Figure 12.For the data point analysis of the line loss rates, the number of outliers is larger than that in Figure 3, as the proposed method is able to precisely identify those outliers that differ greatly from the benchmark values.Furthermore, although the percentages of missing values and outliers among all data points are not particularly large, at 6.72% and 13.06%, respectively, the regions that possess no missing or abnormal values within a month occupy only 19.84% of the whole dataset, indicating a low reliability of the current acquisition equipment.
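The grid search with three-fold cross-validation mentioned above can be sketched as follows. The `train_and_eval` callback and the candidate values are placeholders, not the paper's actual training routine or grids:

```python
import itertools
import numpy as np

def three_fold_cv_loss(train_and_eval, X, y, params):
    """Mean validation loss of one hyper-parameter combination over 3 folds."""
    folds = np.array_split(np.arange(len(X)), 3)
    losses = []
    for k in range(3):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(3) if j != k])
        losses.append(train_and_eval(X[trn], y[trn], X[val], y[val], params))
    return float(np.mean(losses))

def grid_search(train_and_eval, X, y, grid):
    """Exhaustive search over the Cartesian product of candidate values;
    the combination with the lowest cross-validated loss wins."""
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        loss = three_fold_cv_loss(train_and_eval, X, y, params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```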

Comparisons and Discussion
In order to evaluate the performance of the proposed method, including robustness and accuracy, comparison studies are conducted in this section.The hyper-parameters of the established KNN, DTR and ANN are provided in Table 5.The comparison results are presented and discussed in detail as follows.

Robustness Analysis
In order to evaluate the robustness of the proposed method, the distributions of the calculated benchmark values from the different testing models are analyzed, as shown in Figure 13.The detailed values of the distribution indicators are presented in Table 6.From the results, the testing ANN model shows the worst performance and is totally unable to calculate valid benchmark values.The maximum and minimum values from the ANN are 4.49 × 10^6% and −8.26 × 10^7%, respectively, which can hardly be applied as benchmarks.KNN and DTR obtain similar results according to the distributions.They both utilize massive training samples that are close to the situation of an unknown testing region to decide new benchmark values.Thus, they achieve better robustness than the ANN in the study and are practicable at most of the testing regions.However, the minimum benchmark of the two models is −8.13 × 10^4%, which is still not a reasonable value.Furthermore, the proposed RNN achieves the best result among all four testing models, where the calculated benchmark values lie within a reasonable range.The standard deviation of the benchmark values calculated by the RNN is only 0.80%, indicating a stable and robust result obtained via the proposed method.
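The distribution indicators compared above (maximum, minimum and standard deviation of each model's benchmark values) can be computed with a short helper; the input values in the test are illustrative, not the paper's data:

```python
import numpy as np

def distribution_indicators(benchmarks):
    """Maximum, minimum and standard deviation of one model's benchmark
    values (in %), i.e., the indicators reported in Table 6."""
    b = np.asarray(benchmarks, dtype=float)
    return {"max": float(b.max()), "min": float(b.min()), "std": float(b.std())}
```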

Accuracy Analysis
Besides robustness, accuracy is another important criterion in this study.Accordingly, three loss indicators are utilized to compare the four testing models, i.e., MAE, MSE and Huber loss, whose equations are discussed in Section 2.4.4.The outliers in the testing samples are eliminated before the loss calculation using the two-tailed test introduced above in Figure 6.The comparison result is shown in Table 7.According to the result, ANN performs the worst, as its three loss indicators are much larger than those of the other models.Hence, the testing ANN is inapplicable when directly trained on samples with extreme outliers.Besides, although KNN and DTR show similar robustness, their accuracy indicators are quite different.KNN obtains the best MAE indicator, whereas the MSE value of KNN is larger than that of the proposed RNN, due to the small number of outliers calculated by KNN.Comparing these indicators comprehensively, the proposed RNN shows the best performance, as it achieves the best MSE and Huber loss indicators, with a small MAE value as well.
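The three accuracy indicators can be computed for each testing model with a small helper like the following sketch; model names and predictions are illustrative:

```python
import numpy as np

def mae(y_pred, y_true):
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

def mse(y_pred, y_true):
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def huber(y_pred, y_true, delta=0.10):
    err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return float(np.mean(np.where(err <= delta,
                                  0.5 * err ** 2,
                                  delta * (err - 0.5 * delta))))

def compare_models(predictions, y_true):
    """Map each model name to its (MAE, MSE, Huber) indicators,
    computed on the test set after outliers have been removed."""
    return {name: (mae(p, y_true), mse(p, y_true), huber(p, y_true))
            for name, p in predictions.items()}
```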

Conclusions
Daily line loss rate is a critical indicator for power-supply corporations, as it greatly affects their profits.In order to better manage the level of line losses and provide guidance for the construction and operation of low-voltage transformer regions, an efficient method is needed to compute benchmark values for daily line loss rates.Based on the benchmarks, reasonable intervals of the daily line loss rates can be further obtained, which help discover abnormal line loss rate values, as well as help operators check and confirm irregular operating conditions.However, few studies have researched calculating benchmark values of daily line loss rates and eliminating outliers from collected line loss rate data.As a result, a regression calculation method based on RNN is proposed in this study.It consists of a DAE, three main paths, dropout layers, the Huber loss function, L2 regularization and ten outputs.The benchmarks are calculated according to the mean values of the ten outputs.After error analysis, reasonable intervals can be obtained to detect outliers in the original line loss rate samples.
From the case study and comparison results, the conventional ANN model fails to calculate benchmarks, as it cannot deal with outliers.KNN, DTR and the proposed RNN are proved applicable in the case study, where the proposed RNN outperforms the other two models, showing both high accuracy and robustness among all the testing models.Furthermore, according to the final result obtained from the proposed RNN, there are about 13% outliers in the overall data points.
Only about 20% of the regions hold no missing or abnormal values of line loss rates within a month, indicating a low dependability of the acquisition equipment.Therefore, a reliable monitoring and management system for line loss data is still necessary in today's power grid.

Figure 1 .
Figure 1.Examples of daily line loss rates in July 2018 from different regions in Wuxi, Jiangsu Province, China.

Figure 2.
Figure 2. The distribution box-plots of original dataset and the dataset after interpolation operation.

Figure 4 .
Figure 4. Conventional learning methods may be easily affected by outliers.(a) Common condition; (b) affected by outliers.

Figure 5.
Figure 5.The flowchart of the proposed method in this study for calculating benchmark values and reasonable intervals based on robust neural network (RNN).

Figure 6 .
Figure 6.A two-tailed test to eliminate possibly abnormal line loss rate values.

Figure 7 .
Figure 7.The architecture of the proposed robust neural network (RNN).


where W_k and b_k are the weight matrices and bias vectors of the kth path, and W_FC,k and b_FC,k are the weight matrix and bias vector of the FC layer sub-path on the kth main path, respectively.

Figure 9 .
Figure 9.The principle of dropout during the training step.

Figure 10 .
Figure 10.The principle of Huber loss function.

Figure 11 .
Figure 11.The results of benchmark values and reasonable intervals in six testing regions.

Figure 12 .
Figure 12.Pass percentage analysis of the line loss rates based on robust neural network (RNN).(a) Analysis of data points; (b) analysis of transformer regions.



Figure 13 .
Figure 13.The distributions of the calculated benchmark values based on different testing models.

Table 1 .
The data quality analysis based on the overall line loss rate dataset.

Table 2 .
Influencing factors of line loss rate utilized in this study.


(Data recovered from Figure 12: missing values: 6.72%; total number of data points: 616,404; total number of regions: 19,884; pass percentage analysis of transformer regions, normal: 3,946 (19.84%).)

Table 5 .
The hyper-parameters of the established k-nearest neighbors (KNN), decision tree regression (DTR) and artificial neural network (ANN).

Table 6 .
The robustness analysis results of different testing models.

Table 7 .
The accuracy analysis results of different testing models.