Application of Machine Learning for Fault Classification and Location in a Radial Distribution Grid

Abstract: Fault location with the highest possible accuracy plays a significant role in expediting the restoration process after a fault occurs in a power distribution grid. This paper provides fault detection, classification, and location methods for a radial distribution grid using machine learning tools and advanced signal processing. The three-phase current signals, one cycle before and one cycle after the inception of the fault, are measured at the sending end of the grid. A discrete wavelet transform (DWT) is employed to decompose the three-phase current signals, and standard statistical techniques are then applied to the DWT coefficients to extract useful features. Among many features, the mean, standard deviation (SD), energy, skewness, kurtosis, and entropy are evaluated and fed into artificial neural networks (ANNs), namely a multilayer perceptron (MLP) and an extreme learning machine (ELM), to identify the fault type and its location. During the training process, all types of faults with variations in the loading and fault resistance are considered. The performance of the proposed fault locating methods is evaluated in terms of the mean absolute percentage error (MAPE), root mean squared error (RMSE), Willmott's index of agreement (WIA), coefficient of determination (R²), and Nash-Sutcliffe model efficiency coefficient (NSEC). The training and testing times are also considered. The proposed method, which combines the discrete wavelet transform with machine learning, is a very accurate and reliable way to classify and locate faults in both balanced and unbalanced radial systems. Fault detection accuracy of 100% is achieved for all types of faults. Except for slight confusion between three-line-to-ground (3LG) and three-line (3L) faults, 100% classification accuracy is also achieved. The performance measures show that both MLP and ELM are very accurate and comparable in locating faults. The method can be further applied to meshed networks with multiple distributed generators. Renewable generation in the form of distributed generation units can also be studied.


Introduction
An electric power system includes three parts: generation, transmission, and distribution. The electric power produced at the generation stations is transmitted over long distances by high-voltage transmission lines. The voltage level is stepped down at distribution substations. Low-voltage in the loading and fault impedance are considered, and both balanced and unbalanced loading are studied. All types of faults are considered. MLP and ELM are compared. The proposed methodology can be complemented by considering meshed networks and smart grids with high penetration of distributed generation.
The remainder of this paper is organized in four sections. Section 2 presents the theoretical background of the machine learning tools and signal processing techniques. Section 3 explains the proposed method. Case studies, results, and discussion are presented in Section 4. Finally, the concluding remarks are summarized in Section 5.

Machine Learning Tools and Signal Processing
As highlighted in the previous section, knowledge-based fault classification and location techniques are more appropriate for distribution networks. For a knowledge-based technique to function effectively, the machine should be trained with appropriate information in advance. This training process is called machine learning. The following section briefly describes the types of machine learning tools. It is followed by a detailed discussion of two feed-forward neural networks: the multilayer perceptron and the extreme learning machine. Signal processing techniques that are applied to extract important features from measured signals are also discussed.

Machine Learning Tools
Machine learning algorithms build a model that maps a set of features from an input X to an output Y, where the (X, Y) pairs form the training dataset. The model is built from a limited amount of sample data collected from a given distribution, and it performs prediction without expert intervention [16]. The three common machine learning paradigms are supervised, unsupervised, and reinforcement learning [17]. The multilayer perceptron network and the extreme learning machine, both of which are based on an artificial neural network (ANN), are briefly discussed below.

Multilayer Perceptron (MLP) Network
The MLP is the most frequently used type of feed-forward ANN. It consists of three groups of layers: the input layer, the hidden layer, and the output layer. The weights and biases are initialized with pseudo-random values and are changed through supervised learning methods. Gradient descent-based back propagation methods are used to tune the parameters. The input layer receives a dataset and passes it to the hidden layer. The hidden layer processes the data with the help of an activation function, and the result is finally passed to the output layer [18].
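The layer-by-layer flow described above can be illustrated with a minimal forward pass. This is only a sketch: the layer sizes, the weight range, the tanh hidden activation, and the linear output layer are illustrative assumptions, and the back propagation step that tunes the weights is not shown.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Pseudo-random initial weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def forward(x, hidden, output):
    """Input layer -> tanh hidden layer -> linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    # Hidden layer: weighted sum of the inputs passed through the activation function.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    # Output layer: weighted sum of the hidden activations.
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]

rng = random.Random(0)
hidden = init_layer(n_in=4, n_out=3, rng=rng)   # sizes are illustrative only
output = init_layer(n_in=3, n_out=2, rng=rng)
y = forward([0.1, -0.2, 0.3, 0.4], hidden, output)
```

Training would repeatedly compare `y` with the target and adjust the weights and biases, which is the part handled by the back propagation algorithm.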

Extreme Learning Machine (ELM)
An ELM is a feed-forward neural network with a single hidden layer, first proposed by Huang et al. [19]. Unlike the MLP, which determines the weights through a gradient descent training procedure, the ELM computes the network parameters analytically [20,21]. The solution minimizes both the training error and the norm of the output weights, which leads to good generalization performance and a unique solution [22]. ELMs are able to generalize well and can be trained faster than networks that are trained with gradient descent back propagation [22].
Mathematically, for N arbitrary distinct samples, the output of a network with L hidden nodes is described by Equation (1):

$$\sum_{i=1}^{L} \beta_i \, G(a_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N \tag{1}$$

where $X = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\ t_i \in \mathbb{R}^m,\ i = 1, \ldots, N\}$ is a training set for a standard single hidden layer feed-forward neural network (SLFN), $L$ is the number of hidden nodes, $G(\cdot)$ is the activation function, $a_i$ is the input weight vector connecting the input nodes to the ith hidden node (or the center of the ith hidden node), $\beta_i$ is the weight vector connecting the ith hidden node and the output node, and $b_i$ is the threshold or impact factor of the ith hidden node. The equivalent matrix form is $F\beta = T$, where $F$, $\beta$, and $T$ are defined in Equations (2)-(4):

$$F = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{bmatrix}_{N \times L} \tag{2}$$

$$\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}_{L \times m} \tag{3}$$

$$T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m} \tag{4}$$

When the number of neurons in the hidden layer equals the number of distinct training samples, i.e., N = L, the hidden layer output matrix F becomes a square invertible matrix and, consequently, the SLFN can approximate the training samples accurately. In general, ELMs aim to minimize both the training error and the norm of the output weights, according to the objective function given in Equation (5):

$$\text{Minimize: } \lVert F\beta - T \rVert^{2} \ \text{ and } \ \lVert \beta \rVert \tag{5}$$

Lastly, the original form of the ELM uses the minimal norm least squares solution for the output weights, given by Equation (6):

$$\beta = F^{\dagger} T \tag{6}$$

where $F^{\dagger}$ is the pseudo-inverse of the hidden layer output matrix F.
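The analytical training step can be sketched in a few lines. The example below is a hedged illustration, not the implementation used in this paper: it assumes a tanh activation, random input weights in [-1, 1], and a toy one-dimensional regression target. To stay dependency-free it replaces the pseudo-inverse of Equation (6) with the ridge-stabilized normal equations, (FᵀF + λI)β = FᵀT, which approach the pseudo-inverse solution as λ tends to zero when F has full column rank. The names `elm_train`, `elm_predict`, and `ridge` are hypothetical.

```python
import math
import random

def hidden_matrix(xs, a, b):
    """F[j][i] = tanh(a_i . x_j + b_i), the hidden layer output matrix."""
    return [[math.tanh(sum(w * v for w, v in zip(a_i, x)) + b_i)
             for a_i, b_i in zip(a, b)] for x in xs]

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for r in range(n - 1, -1, -1):          # back substitution
        for k in range(m):
            s = M[r][n + k] - sum(M[r][c] * X[c][k] for c in range(r + 1, n))
            X[r][k] = s / M[r][r]
    return X

def elm_train(xs, ts, n_hidden, ridge=1e-8, seed=0):
    """Random hidden layer, then output weights from regularized least squares."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    a = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    F = hidden_matrix(xs, a, b)
    # Assumption: small ridge term added for numerical stability.
    FtF = [[sum(row[p] * row[q] for row in F) + (ridge if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    FtT = [[sum(F[j][p] * ts[j][k] for j in range(len(F)))
            for k in range(len(ts[0]))] for p in range(n_hidden)]
    return a, b, solve(FtF, FtT)

def elm_predict(xs, model):
    a, b, beta = model
    return [[sum(f * beta[i][k] for i, f in enumerate(row))
             for k in range(len(beta[0]))] for row in hidden_matrix(xs, a, b)]

# Toy regression target (not fault data): t = 2x + 1 on [0, 1].
xs = [[i / 49.0] for i in range(50)]
ts = [[2.0 * x[0] + 1.0] for x in xs]
model = elm_train(xs, ts, n_hidden=10)
preds = elm_predict(xs, model)
rmse = math.sqrt(sum((p[0] - t[0]) ** 2 for p, t in zip(preds, ts)) / len(ts))
```

Only the output weights are fitted; the input weights and thresholds stay at their random values, which is why training reduces to a single linear solve.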

Signal Processing Techniques
Before any machine learning technique can be applied to measured data, such as current, the signals should first be transformed into useful features.

Wavelet Transform (WT)
Traveling waveforms are non-periodic by nature and consist of localized high-frequency oscillations. These oscillations are superimposed on the power frequency and its harmonics. The WT is a suitable tool for analyzing such a signal with time-varying frequency content. There are two concepts in the WT: scaling and shifting. Scaling is the process of stretching or shrinking the wavelet in time, while shifting is delaying or advancing the onset of the wavelet along the length of the signal. Unlike the short-time Fourier transform, which uses a fixed window size in the time-frequency plane, the WT uses variable window sizes. This means that longer time intervals can be used when more precise low-frequency information is required from the signal [1]. This makes the WT more accurate than the short-time Fourier transform in localizing the signal both in time and in frequency. The WT is more popular in power system applications.
Based on how the wavelets are scaled and shifted, the WT is divided into two types: the continuous WT (CWT) and the discrete WT (DWT) [23]. The CWT of a time-domain signal x(t) is defined by Equation (7):

$$\mathrm{CWT}(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{7}$$

To implement the wavelet transform digitally, the DWT is used, which is computed using Equation (8):

$$\mathrm{DWT}(m, n) = \frac{1}{\sqrt{a_0^{m}}} \sum_{k} x(k)\, \psi^{*}\!\left(\frac{k - n b_0 a_0^{m}}{a_0^{m}}\right) \tag{8}$$

where $\psi(t)$ is the mother wavelet, $\psi^{*}$ is its complex conjugate, $a_0^{m}$ and $n b_0 a_0^{m}$ are the scaling and shifting factors (a and b in Equation (7)), and n and m are integers. The coefficients in a standard DWT are determined by sampling the corresponding CWT on a dyadic grid. The DWT is simpler to implement and requires less computation than the CWT, and as a result, many researchers use the DWT to analyze the behavior of the voltage and current transients produced by fault events in distribution grids [25].

To extract sub-band information from a transient signal, multi-resolution wavelet analysis can be applied. DWT-based multi-resolution analysis offers an effective approach to inspect the features of a signal at different frequency bands and is widely used for locating faults in a distribution network. Daubechies wavelets are widely used in analyzing traveling waves [24]. These wavelets are found to match the processed signal more closely. Being more localized and compact in time makes them more suitable for analyzing both short and fast transients. Complete reconstruction of the signal is also possible.
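A multi-level decomposition of this kind can be sketched with a filter-bank implementation. The example below is illustrative only: it uses the standard 8-tap Daubechies db4 low-pass filter, assumes periodic signal extension (a toolbox may use a different extension mode and therefore return coefficient series of different lengths), and runs on a synthetic two-cycle waveform rather than on simulated fault data. The function names are hypothetical.

```python
import math

# Daubechies db4 low-pass (scaling) filter, 8 taps.
DB4_LO = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
          -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
          0.032883011666982945, -0.010597401784997278]
# Matching high-pass (wavelet) filter from the quadrature mirror relation.
DB4_HI = [(-1) ** k * DB4_LO[len(DB4_LO) - 1 - k] for k in range(len(DB4_LO))]

def dwt_step(x, lo=DB4_LO, hi=DB4_HI):
    """One decomposition level: filter, then keep every second sample.

    Periodic extension is assumed, so len(x) must be even.
    """
    n = len(x)
    approx = [sum(lo[k] * x[(i + k) % n] for k in range(len(lo)))
              for i in range(0, n, 2)]
    detail = [sum(hi[k] * x[(i + k) % n] for k in range(len(hi)))
              for i in range(0, n, 2)]
    return approx, detail

def decompose(x, levels=7):
    """Return the final approximation and the list of details (level 1 first)."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return approx, details

def energy(values):
    return sum(v * v for v in values)

# Synthetic stand-in for a two-cycle current window: the amplitude steps up
# after the first cycle. The sampling rate (128 samples per cycle) is assumed.
samples_per_cycle = 128
signal = [(1.0 if k < samples_per_cycle else 3.0)
          * math.sin(2 * math.pi * k / samples_per_cycle)
          for k in range(2 * samples_per_cycle)]
approx, details = decompose(signal, levels=7)
```

Because the db4 filters are orthonormal, the total energy of the approximation and detail coefficients equals the energy of the input window, which is a convenient sanity check on any implementation.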

Proposed Method
In this paper, knowledge-based fault location and classification is proposed using advanced signal processing and machine learning tools. The three-phase measured current signal is fed into the tool as an input and the type of fault and its location are determined as output. Wavelet decomposition and statistical measures are used to extract the key features from the measured signal. MLP and ELM are then used to make a decision based on the features extracted. All of the distribution network model, signal processing, and ML tools are implemented in Matlab. The procedures are discussed in the following sections.

Fault Detection, Classification, and Location
The flow chart shown in Figure 1 describes the procedure followed to detect, classify, and locate faults in a distribution network. The procedure can be summarized in the following seven steps.
i. Three-phase current measurement: Two cycles (one pre-fault and one post-fault) of the current signal are taken for each phase, measured at the sending end of the line under consideration.
ii. Wavelet decomposition: With the help of the DWT, the current signals are decomposed into seven details and an approximate coefficient using a seven-level decomposition with a Daubechies (db4) mother wavelet. This step gives seven details for each phase current (a total of 21 details) and one approximate coefficient for each phase current (a total of 3 approximate coefficients). This means that for each fault there are a total of 24 coefficients, where each coefficient represents a series of numbers.
iii. Statistical feature extraction: Statistical measures are applied to extract six statistical features from each of the DWT coefficients. These are skewness, mean, energy, entropy, standard deviation, and kurtosis. Taking all the detail and approximate coefficients, this step results in a total of 144 statistical features (24 coefficients × 6 features), which are fed into the machine learning tool as input.
iv. Data generation: The above three steps are repeated for different fault locations, different loading, and different fault impedance values. Sufficient data should be generated for successful machine learning.
v. Machine learning: The generated data is then used to train the ANN for fault detection, classification, and location. MLP is used for detection and classification, while both ELM and MLP are applied separately for fault location.
vi. Fault detection and classification: Once the network is sufficiently trained, it is able to detect whether or not there is a fault and to classify the fault type.
vii. Fault location: Once a fault is detected and classified, its location is then determined using ELM or MLP. The location is expressed in terms of distance, which is the output of regression-based machine learning, while the type of fault is given as the output of classification-based machine learning.
Appl. Sci. 2020, 10, 4965
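The statistical feature extraction step can be sketched as follows. The exact definitions used in this study are not spelled out, so the choices here are assumptions: population (biased) moments for the standard deviation, skewness, and kurtosis, energy as the sum of squares, and Shannon entropy of the normalized energy distribution. The coefficient series in the usage example are synthetic placeholders, and the function names are hypothetical.

```python
import math

def stat_features(c):
    """Six features of one coefficient series:
    [mean, standard deviation, energy, skewness, kurtosis, entropy]."""
    n = len(c)
    mean = sum(c) / n
    m2 = sum((v - mean) ** 2 for v in c) / n
    m3 = sum((v - mean) ** 3 for v in c) / n
    m4 = sum((v - mean) ** 4 for v in c) / n
    sd = math.sqrt(m2)
    energy = sum(v * v for v in c)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    # Assumed entropy definition: Shannon entropy of the normalized energies.
    if energy > 0:
        p = [v * v / energy for v in c if v != 0.0]
        entropy = -sum(pi * math.log(pi) for pi in p)
    else:
        entropy = 0.0
    return [mean, sd, energy, skewness, kurtosis, entropy]

def feature_vector(phase_bands):
    """Concatenate the six features of every series of every phase."""
    return [f for bands in phase_bands
            for series in bands
            for f in stat_features(series)]

# Placeholder data: 3 phases, each with 8 series (7 details + 1 approximation).
phase_bands = [[[math.sin(0.1 * (p + b + k)) for k in range(16)]
                for b in range(8)]
               for p in range(3)]
features = feature_vector(phase_bands)   # 3 * 8 * 6 = 144 values
```

The length of `features` reproduces the count quoted in step iii: 3 phases, 8 coefficient series per phase, and 6 features per series.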

Faults are applied at 0.5 km intervals. A ±5% load variation and a fault resistance in the range of 0-15 Ω are considered. The distribution network was modelled using Simulink, and the code necessary to extract and process the data was programmed using Matlab.

Fault Detection and Classification with ANN
For fault detection and classification, 885 samples of faulty conditions and 526 samples of non-faulty conditions were used. To build the best possible network, different numbers of neurons were tested. With regard to minimum mean square error (MSE) and general precision, the best performance was obtained from an ANN with 11 hidden neurons for the balanced load scenario and 12 hidden neurons for the unbalanced scenario (as shown in Figures 2 and 3). The Levenberg-Marquardt (trainlm) algorithm was used to train the network in both cases. In the supervised learning process, 70%, 15%, and 15% of the samples were used for training, validation, and testing purposes, respectively.
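A 70/15/15 partition of this kind can be sketched as a random split of sample indices. The total of 1000 samples, the seed, and the function name `split_indices` are illustrative assumptions; the toolbox used in the study performs its own division.

```python
import random

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample indices and cut them into train/validation/test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    # Whatever remains after the first two cuts goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)   # 1000 is a hypothetical sample count
```

The validation subset is typically used to stop training early and to compare candidate hidden layer sizes, while the test subset is held back for the final accuracy figures.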

Fault Location with ELM and MLP
The data samples used in the fault detection are combined with additional data collected for faults at different distances (the target output) for each type of fault. The data was divided into two groups: training data and testing data. Training and testing were done using either MLP or ELM, and the results were then compared.
For MLP, a network with a single hidden layer was selected. Levenberg-Marquardt was used as the training algorithm, while the hyperbolic tangent sigmoid function was used as the activation function of the hidden layer (as shown in Figure 4). For ELM, on the other hand, a zero-type algorithm with a regularization coefficient and a radial basis function (RBF) kernel was selected, with the kernel parameters chosen for optimal performance. The performance of the machine learning tools in detecting, classifying, and locating the faults is discussed for each scenario. To inspect the effectiveness of locating the faults in the distribution grid, the following performance measures were evaluated on the testing dataset: root mean squared error (RMSE), mean absolute percentage error (MAPE), coefficient of determination (R²), Nash-Sutcliffe efficiency coefficient (NSEC), and Willmott's index of agreement (WIA).
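For reference, the five performance measures can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption about what was applied here; in particular, R² is taken as the squared Pearson correlation between actual and estimated distances, which is what distinguishes it from the Nash-Sutcliffe coefficient. The distances in the usage example are made-up numbers, not results from this study.

```python
import math

def location_metrics(obs, pred):
    """RMSE, MAPE (%), R^2, NSEC, and WIA for actual vs. estimated distances."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sse / n),
        # MAPE is undefined if any actual value is zero.
        "MAPE": 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pred)),
        "R2": s_op ** 2 / (s_oo * s_pp),
        "NSEC": 1.0 - sse / s_oo,
        "WIA": 1.0 - sse / sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                               for o, p in zip(obs, pred)),
    }

# Hypothetical fault distances in km (actual vs. estimated).
actual = [5.0, 10.0, 15.0, 20.0]
estimated = [5.1, 9.9, 15.2, 19.8]
metrics = location_metrics(actual, estimated)
```

RMSE and MAPE approach zero for a good locator, while R², NSEC, and WIA approach one.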

Case Studies, Results, and Discussion
In this paper, a simple three-phase radial distribution network was used to demonstrate the proposed fault classification and location method. The distribution network has 120 kV balanced generation, one 120 kV/25 kV step-down transformer, a 30 km distribution line, one 25 kV/575 V step-down transformer, and a lumped load.
The distribution network was modelled in Simulink as a four-bus radial system, as shown in Figure 4. The distribution line was modelled using its pi-equivalent. The fault was modelled by dividing the line into two parts, with the first part representing the line on the sending side of the fault and the second part representing the section of the line after the fault. Two scenarios were considered to test the robustness of the proposed methodology. The first scenario was a balanced load condition, while the second scenario was an unbalanced load condition. The details of the parameters for each scenario are summarized in the following sections.

Balanced Load (Scenario I)
In this scenario, the distribution network was assumed to be fully balanced. A balanced inductive load was used. The details of the distribution network parameters for this scenario are summarized in Table 1.

Figure 5 shows the measured three-phase current at the sending end for a line-to-ground fault. The fault is initiated at time t = 0.0167 s (equivalent to one cycle). Before the fault, the current signals look alike for all phases, as seen in Figure 5. The phase currents after the fault were generally higher than the pre-fault currents. The fault currents undergo transients and settle down after a cycle or so if the fault is not cleared.

As the fault clearance process usually takes more than one cycle (0.0167 s for a 60 Hz system) and the fault currents settle after about one cycle, taking one cycle before the fault and one cycle after the fault gives sufficient information about the nature of the fault. The current signals are then decomposed to extract seven details and an approximate coefficient. Examples of these coefficients for the current in phase A under a single-line-to-ground (SLG) fault are shown in Figure 6.

The detection and classification results are shown in the form of a confusion matrix in Table 2. The detection is represented by the no-fault condition. The diagonal elements of the confusion matrix show the successful classifications, while the off-diagonal elements represent those that are unsuccessful. It can be seen that the detection (no fault) and all types of faults are classified with 100% accuracy.

Table 3 presents the selected performance measures used to compare MLP and ELM when locating faults of different types. The values of R², NSEC, and WIA in both cases were very close to 1, with MLP showing slightly better values, whereas RMSE and MAPE were very low in both cases, indicating very good performance. An additional comparison can be made by considering the training time and testing time required for each tool. The MLP approach takes 8.78-14.32 s to train the network, while the ELM training only takes a fraction of a second. The testing times in both cases were generally comparable. For some of the faults, the ELM approach showed a slightly faster testing time, and for the others, it showed a slightly slower testing time. It can be concluded that both ELM and MLP give comparably high performance in locating all types of faults, and that ELM shows a slightly higher performance than MLP.

Unbalanced Load (Scenario II)
In this scenario, the load connected at the end of the distribution network is assumed to be an unbalanced inductive load. Other parameters are kept the same as in Scenario I. Table 4 summarizes the details of all the parameters. In addition to the 10 types of faults, the 3LG fault is also considered in this case to show how similar it is to the 3L fault. As in the balanced case, statistical features were extracted for 885 samples for each of the 11 fault types and for 526 non-fault samples. The detection and classification results are shown in the form of a confusion matrix in Table 5. The overall classification accuracy in this case was 91.4%. It can be seen that the detection (no fault) and all types of faults except ABC and ABCG are classified with 100% accuracy. The confusion between ABC and ABCG is expected because the two faults are symmetrical and the fault currents are not affected by the fault impedance (i.e., the neutral current is zero).
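How a confusion matrix and the overall accuracy are obtained from predicted labels can be sketched as follows. The class list and the labels are a toy example chosen to mimic ABC/ABCG confusion; they are not the data behind Table 5. Rows are taken as true classes and columns as predicted classes, which is an assumed convention.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e., correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy labels: "NF" = no fault, "AG" = phase A to ground (hypothetical data).
classes = ["NF", "AG", "ABC", "ABCG"]
true_labels = ["NF", "NF", "AG", "ABC", "ABC", "ABCG", "ABCG", "ABCG"]
predicted = ["NF", "NF", "AG", "ABC", "ABCG", "ABCG", "ABC", "ABCG"]
cm = confusion_matrix(true_labels, predicted, classes)
accuracy = overall_accuracy(cm)
```

In this toy case the only off-diagonal counts sit in the ABC/ABCG cells, which is the pattern described above for the unbalanced scenario.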

Locating the Fault (Scenario II)
In Table 6, the values of the root mean squared error and mean absolute percentage error are relatively small, whereas the values of R², NSEC, and WIA are close to unity, indicating the high performance of both MLP and ELM. The MLP method needs 10.0988-11.7969 s to train the network, while ELM trains the network in a fraction of a second. As in the balanced load scenario, some performance measures and testing times show that MLP is slightly better than ELM for some faults, while ELM is slightly better for the others, and both of them perform equally in a few cases. Considering all the parameters, ELM performs slightly better than MLP.

Conclusions
In this paper, fault detection, classification, and location techniques were applied to a simple radial distribution network. The knowledge-based technique was implemented using machine learning tools. Wavelet decomposition was used together with statistical measures to extract useful information from the measured current signal. Those features were used to train the neural networks. The proposed technique was applied to the network under both balanced and unbalanced load conditions. The performance of the proposed fault locating method was evaluated in terms of the mean absolute percentage error (MAPE), root mean squared error (RMSE), coefficient of determination (R²), Willmott's index of agreement (WIA), and Nash-Sutcliffe model efficiency coefficient (NSEC). The training and testing times were also considered.
The results show that the discrete wavelet transform combined with machine learning is a very accurate and reliable method to classify and locate faults in both balanced and unbalanced radial systems. In the two scenarios, detection was found to be 100% accurate. Classification was also found to be 100% accurate, except for 3LG and 3L faults. The fault location was done using two options: ELM and MLP. Both methods performed very well. ELM was found to be faster to train. In the other performance indexes, including testing time, slight differences were observed, but they were not sufficient to say that one is better than the other. It can be said that both ELM and MLP are very efficient for the classification and location of faults, and that ELM is faster to train.
Although the proposed technique was tested only for a radial system, the methodology is general enough to be applied to other types of networks. The study can be complemented by considering distribution networks with mesh topology and multiple generation units. Renewable generation in the form of distributed generation units can also be studied. The method can be further improved by considering measurement noise and uncertainties in the transmission line and transformer parameters. Considering unsymmetrical transmission lines could be an interesting way to further study 3L and 3LG faults.
Another area of research concerns the statistical features. In this study, 144 features were considered for each fault; this number could be reduced by separating the most important statistical features from the less important ones.