Distribution Grids Fault Location employing ST based Optimized Machine Learning Approach

Precise fault location information plays a vital role in expediting the restoration process after a power distribution grid is subjected to any kind of fault. This paper proposes a Stockwell transform (ST) based optimized machine learning approach to locate faults and identify faulty sections in distribution grids. The approach employs the ST to extract useful features from the recorded three-phase current signals and feeds them as inputs to different machine learning tools (MLT), including multilayer perceptron neural networks (MLP-NN), support vector machines (SVM), and extreme learning machines (ELM). It employs the constriction-factor particle swarm optimization (CF-PSO) technique to optimize the parameters of the SVM and ELM for better generalization performance. To confirm the effectiveness of the developed fault location scheme, the results obtained on the test datasets are compared in terms of selected statistical performance indices, including the root mean squared error (RMSE), mean absolute percentage error (MAPE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), coefficient of determination (R²), Willmott's index of agreement (WIA), and Nash–Sutcliffe model efficiency coefficient (NSEC). The values of these indices indicate the superiority of the optimized machine learning tools over the non-optimized tools in locating faults. In addition, the overall accuracy confirms the efficacy of the faulty section identification scheme. Furthermore, the presented results validate the robustness of the developed approach against measurement noise and the uncertainties associated with pre-fault loading condition, fault resistance, and inception angle.


Introduction
Detailed information regarding the faulty area and the fault location plays a vital role in expediting the restoration process in electric utilities after any kind of fault. Consequently, there is growing research interest in locating faults and identifying faulty sections in distribution grids efficiently, to reduce customer minute losses and revenue losses for the utilities. The available fault location techniques for distribution grids can be categorized into three major groups, namely the impedance-based, the traveling wave-based, and the knowledge-based techniques [1][2][3].

The impedance-based technique evaluates the fault location using voltage and current measurements available at the substation end, as well as technical information of the distribution grid, including grid topology, load, and line data. This approach analyzes the fault location either in the phase domain [4][5][6] or using symmetrical components [7][8][9]. However, the impedance-based approach frequently leads to multiple estimations and generally relies on a tedious iterative process. Additionally, the dynamic behavior of loads, the presence of laterals, inherent properties of distribution grids, measurement noise, and uncertainties in fault resistance and inception angle heavily degrade the performance of these techniques [1][2][3].

The development of advanced measurement infrastructure and communication systems, together with the willingness of electric system decision makers, paved the way for a faster and more accurate fault location technique, called the traveling wave technique. The technique is based on the characteristic frequencies of the traveling waves associated with faults in the electrical grids [10], and can be divided into A, B, and C types [11]. The A-type and B-type techniques are based on online measurements and use single-ended or double-ended recordings of the traveling waves originated by faults, whereas the C-type technique injects traveling waves manually to locate faults in electricity grids [12]. These techniques are very useful for long transmission/distribution lines; however, distribution lines are short by nature, so the techniques cannot effectively locate faults in them.

The third category, the knowledge-based approach, offers promising prospects for dealing with distribution grid faults considering their intrinsic complexities. In Reference [13], an artificial neural network (ANN) was employed to locate distribution grid faults based on voltage, current, and active power measurements collected from the substation end. The authors of References [14][15][16][17] combined the wavelet transform (WT) and different machine learning tools to locate faults in distribution grids. However, WT based decompositions do not preserve the phase information of the original signal and are sometimes sensitive to measurement noise [18][19][20].
Moreover, most of the referenced fault location techniques did not consider the presence of measurement noise and the uncertainties associated with pre-fault loading condition, fault resistance, and inception angle when locating faults and identifying faulty sections in power distribution grids. Additionally, an advanced signal processing technique (SPT), namely the Stockwell transform, which combines the advantages of the wavelet transform and the short time Fourier transform, was recently employed to detect and classify faults in distribution grids in Reference [21], but it has not been employed to locate faults or identify faulty sections. Therefore, this research proposes a hybrid fault location and faulty section identification approach for power distribution grids that combines the Stockwell transform with different machine learning tools. The proposed approach starts with the extraction of useful features from the ST decomposed faulty current signals collected from different branches of the grid. Then it feeds the extracted characteristic features as inputs to the different machine learning tools, including the MLP-NN, SVM, and ELM, to obtain decisions on the fault location and the faulty section. Additionally, the proposed technique employs the CF-PSO to optimize the key parameters of the SVM and ELM to achieve better accuracy in locating distribution grid faults. The presented results demonstrate the robustness of the proposed approach against measurement noise and variations in pre-fault loading condition, fault resistance, and inception angle.

Background Theories
The combination of SPT and MLT has received significant attention in diagnosing power system transients in recent years [22][23][24]; researchers use the SPT to collect characteristic features related to the electrical transients and feed them as inputs to the MLT. The following parts of this section briefly describe the SPT and MLT employed in this research.

Stockwell Transform
Among the many SPT, the Fourier transform (FT) is one of the most widely used techniques for transforming time domain signals into the frequency domain to analyze harmonics and design filters. However, the FT loses the temporal information and provides erroneous results for non-stationary signals. In response to this deficiency, Dennis Gabor (1946) employed a small sampling window at regular intervals to map a signal into a two-dimensional function of time and frequency; this adaptation is known as the short time Fourier transform (STFT) [25]. Unfortunately, the STFT cannot provide good time and frequency resolution simultaneously: a better frequency resolution leads to a worse time resolution, and vice versa. The wavelet transform (WT) resolves the resolution problem by employing larger windows at lower frequencies and smaller windows at higher frequencies [26]. However, the WT is sensitive to measurement noise and does not preserve the phase information of non-stationary signals. Consequently, Stockwell et al. [18][19][20] developed the ST, which combines the benefits of the STFT and WT and effectively preserves the referenced frequency and phase information. The ST of a signal x(t) can be defined as:

$$S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, g(\tau - t, f)\, e^{-j 2 \pi f t}\, dt, \qquad g(\tau - t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^{2} f^{2}}{2}} \quad (1)$$

where g(τ − t, f) is the Gaussian window function, f represents the frequency, and t and τ are the time variables.
The ST transforms time domain signals into a joint time-frequency representation. The coefficients of this transform provide complete resolution for each designated frequency and cover the whole temporal axis by sweeping the possible values of τ, where τ indicates the center of the window function. Different values of f adjust the widths of the Gaussian windows, with a view to realizing multiresolution over different frequencies along the temporal axis. The ST of a sampled signal provides an M × N matrix, commonly known as the S-matrix, whose elements are complex numbers. The rows of the matrix pertain to frequency characteristics and the columns pertain to time characteristics. The following equation gives the energy matrix of the same signal, with the magnitude and the square taken element-wise:

$$E = |S|^{2} \quad (2)$$

The proposed fault location approach generates three S-matrices and three E-matrices from a three-phase current signal through the ST. It then creates new vectors, namely S_cmax, S_rmax, and E_max, from the S-matrices and the E-matrices. The S_cmax vectors contain the maximum absolute values of the columns of the S-matrices, the S_rmax vectors contain the maximum absolute values of the rows of the S-matrices, and the E_max vectors contain the maximum absolute values of the columns of the energy matrices. The approach also produces another set of vectors, called S_c-phase-max, that contain the phase angles of the S-matrix elements associated with the elements of S_cmax. Eventually, the proposed approach applies standard statistical techniques to the produced vectors and extracts characteristic features that carry the fault signature. Among many features, the amplitude, the gradient of the amplitude, mean value, standard deviation, entropy, skewness, kurtosis, time of occurrence, and energy of different harmonics are widely used for the analysis of power quality transients [21,[27][28][29].
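The construction of the S-matrix and the derived vectors can be sketched in Python as follows. This is a minimal illustration that assumes NumPy and the common FFT-based discrete form of the ST, and it is not the implementation used in this research. The function names `stockwell_transform` and `st_feature_vectors` are hypothetical, and the energy matrix is assumed to be the element-wise squared magnitude of the S-matrix.

```python
import numpy as np


def stockwell_transform(x):
    """Discrete Stockwell transform of a real 1-D signal (frequency-domain form).

    Returns an (N//2 + 1) x N complex S-matrix: rows are frequency bins,
    columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.size
    spectrum = np.fft.fft(x)
    # Integer frequency offsets in FFT order: 0, 1, ..., -2, -1.
    m = np.fft.fftfreq(n_samples, d=1.0 / n_samples)
    n_rows = n_samples // 2 + 1
    s_matrix = np.zeros((n_rows, n_samples), dtype=complex)
    s_matrix[0, :] = x.mean()  # zero-frequency row is the signal mean
    for n in range(1, n_rows):
        # Fourier transform of the Gaussian window whose width scales as 1/n.
        window = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        s_matrix[n, :] = np.fft.ifft(np.roll(spectrum, -n) * window)
    return s_matrix


def st_feature_vectors(x):
    """Build the S_cmax, S_rmax, E_max and S_c-phase-max vectors for one phase."""
    s_matrix = stockwell_transform(x)
    magnitude = np.abs(s_matrix)
    energy = magnitude**2                   # assumed E-matrix: element-wise |S|^2
    columns = np.arange(s_matrix.shape[1])
    peak_rows = magnitude.argmax(axis=0)    # row index of each column maximum
    return {
        "S_cmax": magnitude.max(axis=0),
        "S_rmax": magnitude.max(axis=1),
        "E_max": energy.max(axis=0),
        "S_c_phase_max": np.angle(s_matrix[peak_rows, columns]),
    }


# Usage: a 64-sample cosine with 8 cycles should peak in frequency row 8.
t = np.arange(64)
vectors = st_feature_vectors(np.cos(2 * np.pi * 8 * t / 64))
peak_row = int(np.argmax(vectors["S_rmax"]))
```

Statistical features such as the mean, standard deviation, skewness, or kurtosis could then be computed from each returned vector, once per phase.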

Multilayer Perceptron Neural Network
Artificial neural networks have become very popular in many engineering applications due to their parallel computing abilities and adaptiveness to external disturbances. The MLP-NN is one of the most widely used ANN and consists of input, hidden, and output layers. The hidden layer uses squashing functions to process the inputs before sending them to the output layer. To minimize the training error, the supervised learning algorithm optimizes the randomly initialized connecting weights and biases [30][31][32]. In this paper, the total number of hidden neurons of the MLP-NN is chosen through a systematic trial and error approach.

Support Vector Machine
The SVM was first introduced by Boser et al. [33] for efficient data analysis and was initially restricted to classification problems. Over time, the SVM was extended to regression problems, where it maps the available data into a higher dimensional feature space and constructs an optimal hyperplane there [34]. It forms the separation surface employing various kernel functions, including sigmoidal, polynomial, and radial basis functions. This paper employed the MATLAB/SIMULINK based SVM-KM toolbox [35] to locate distribution grid faults. It considered the mean absolute percentage error (MAPE) of the test dataset as the objective function while employing the CF-PSO to optimize the SVM parameters, including the regularization coefficients (λ and C), the tolerance of the termination criterion (ε), and the kernel option (K_o).

Extreme Learning Machine
Huang et al. [36] introduced the extreme learning machine for single hidden layer feedforward neural networks, which is a thousand-fold faster than the traditional feedforward neural networks in attaining good generalization performance. The ELM picks the input weights randomly and evaluates the output weights analytically [37]. This paper employs an ELM toolbox [38] developed in the MATLAB/SIMULINK platform to locate distribution grid faults. As with the SVM, this research considers the MAPE of the test dataset as the objective function while optimizing the ELM parameters, including the regularization coefficient (C_R) and the kernel option (K_p), with the aid of the CF-PSO.
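The idea of random input weights combined with analytically computed output weights can be illustrated with a short Python sketch. It assumes NumPy, a tanh hidden layer, and a ridge-regularized least-squares solution. The function names, the number of hidden neurons, and the value of the regularization coefficient are hypothetical choices for illustration, and the toolbox used in this research may differ (for example, by using a kernel formulation).

```python
import numpy as np


def train_elm(features, targets, n_hidden=50, reg_coeff=1e6, seed=0):
    """Train a basic single hidden layer ELM for regression."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Input weights and biases are drawn once at random and never trained.
    input_weights = rng.normal(size=(features.shape[1], n_hidden))
    biases = rng.normal(size=n_hidden)
    hidden = np.tanh(features @ input_weights + biases)
    # Output weights: closed-form ridge solution (H^T H + I/C)^-1 H^T y.
    gram = hidden.T @ hidden + np.eye(n_hidden) / reg_coeff
    output_weights = np.linalg.solve(gram, hidden.T @ targets)
    return input_weights, biases, output_weights


def predict_elm(model, features):
    """Evaluate a trained ELM on new feature rows."""
    input_weights, biases, output_weights = model
    hidden = np.tanh(np.asarray(features, dtype=float) @ input_weights + biases)
    return hidden @ output_weights


# Usage on synthetic data (not the fault data of this paper).
x_train = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y_train = np.sin(3.0 * x_train[:, 0])
model = train_elm(x_train, y_train)
train_rmse = float(np.sqrt(np.mean((predict_elm(model, x_train) - y_train) ** 2)))
```

Because training reduces to one linear solve, no iterative weight updates are needed, which is consistent with the short training times reported for the ELM.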

Constriction-Factor Particle Swarm Optimization
The population-based stochastic optimization approach widely known as particle swarm optimization (PSO) mimics the social behavior of swarms [39]. In the original formulation, the inertia weight decreases monotonically in each generation when updating the velocities, and hence the positions, of the particles. This monotonic decrease sometimes degrades the quality of the solutions. Consequently, Clerc [40] introduced the idea of a constriction factor to update the velocity while ensuring system convergence and an effective diversified search. The following steps briefly discuss the CF-PSO technique:
Step 1: Initialization The CF-PSO employs the following equations to initialize the positions and velocities of the particles randomly from the search space, for a pre-specified number of particles:

$$x_{i,j} = U(x_{\min}, x_{\max}) \quad (3)$$

$$v_{i,j} = U(v_{\min}, v_{\max}) \quad (4)$$

where U represents the uniform distribution; i and j index the particles and the dimensions of the problem, respectively; x_min and x_max refer to the lower and upper boundaries of the solution space; and v_min and v_max represent the lower and upper boundaries of the velocities of the particles.
Step 2: Fitness assessment and updating the best solution This step assesses the fitness of the particles based on their randomly initialized positions. It stores the fitness values of all particles as the individual best solutions for the first generation. From the second generation onward, this step compares the new fitness of each particle with its fitness in the previous generation and updates the individual best solutions accordingly. In addition, the step picks the global best solution from the individual best solutions in each generation. Finally, the CF-PSO chooses the optimal solution from the global best solutions.
Step 3: Checking the stopping criteria The CF-PSO starts checking the stopping criteria only after a certain number of generations, to avoid premature convergence. It stops the optimization algorithm if the objective function does not change for a pre-specified number of generations, or if it reaches the targeted number of generations.
Step 4: Updating inertia weights, velocities, and positions This step updates the inertia weight and the velocity of each particle in each generation with the aid of the following equations. It then updates the positions of the particles based on the revised velocities.
$$w^{t} = w_{\max} - \frac{w_{\max} - w_{\min}}{t_{\max}} \times t \quad (5)$$

$$v_{j,k}^{t} = c_f \left[ w^{t}\, v_{j,k}^{(t-1)} + c_1 r_1 \left( x_{j,k}^{*(t-1)} - x_{j,k}^{(t-1)} \right) + c_2 r_2 \left( x_{j,k}^{**(t-1)} - x_{j,k}^{(t-1)} \right) \right] \quad (6)$$

$$x_{j,k}^{t} = x_{j,k}^{(t-1)} + v_{j,k}^{t} \quad (7)$$

where $w^{t}$ is the inertia weight of the t-th generation; $v_{j,k}^{t}$ is the velocity of the j-th particle in the t-th generation; $v_{j,k}^{(t-1)}$ and $x_{j,k}^{(t-1)}$ are the velocity and the position of the j-th particle in the (t − 1)-th generation; $x_{j,k}^{**(t-1)}$ and $x_{j,k}^{*(t-1)}$ are the global and individual best positions of the (t − 1)-th generation; $w_{\max}$ and $w_{\min}$ are the maximum and minimum values of the inertia weight; $c_f$ is the constriction factor; $r_1$ and $r_2$ are uniformly distributed random numbers in [0, 1]; and $c_1$ and $c_2$ are known as the cognitive and social parameters, respectively.

Figure 1 shows the complete flowchart of the presented CF-PSO algorithm. This paper sets $v_{\max}$ and $v_{\min}$ to 4 and −4, respectively, and starts checking the stopping criteria after 30 generations; these settings were chosen through a systematic trial and error approach and based on the experience described in References [41][42][43]. Likewise, this paper sets the other parameters as $w_{\max} = 1.2$, $w_{\min} = 0.1$, $4.05 \le \varphi \le 4.15$, and $2.00 \le c_1 \le 2.05$.

Figure 1. Flowchart of the CF-PSO algorithm: generation of the initial positions and velocities of the particles using Eqs. (3) and (4); evaluation of the fitness of the particles and storing/updating of the global and individual best solutions; updating of the weights, velocities, and positions of the particles using Eqs. (5), (6), and (7).
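The four steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the implementation used in this research: it assumes NumPy, the standard constriction-factor velocity update with a linearly decreasing inertia weight, and Clerc's expression for the constriction factor, which this section does not spell out. The function name `cf_pso`, the clipping of positions to the search bounds, and the choice c_1 = c_2 = 2.05 are hypothetical. For brevity, the stall-based part of the stopping criteria is omitted and only the targeted number of generations is used.

```python
import numpy as np


def cf_pso(objective, x_min, x_max, n_particles=20, t_max=100,
           v_min=-4.0, v_max=4.0, w_max=1.2, w_min=0.1,
           c1=2.05, c2=2.05, seed=0):
    """Minimize `objective` with a constriction-factor PSO sketch."""
    rng = np.random.default_rng(seed)
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    dim = x_min.size
    phi = c1 + c2
    # Clerc's constriction factor (assumed form; requires phi > 4).
    c_f = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4.0 * phi))

    # Step 1: random initial positions and velocities.
    x = rng.uniform(x_min, x_max, size=(n_particles, dim))
    v = rng.uniform(v_min, v_max, size=(n_particles, dim))

    # Step 2: initial individual and global best solutions.
    best_x = x.copy()
    best_f = np.array([objective(p) for p in x])
    g = int(np.argmin(best_f))
    global_x, global_f = best_x[g].copy(), float(best_f[g])

    for t in range(1, t_max + 1):
        # Step 4: update inertia weight, velocities, and positions.
        w = w_max - (w_max - w_min) / t_max * t
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = c_f * (w * v + c1 * r1 * (best_x - x) + c2 * r2 * (global_x - x))
        v = np.clip(v, v_min, v_max)
        x = np.clip(x + v, x_min, x_max)  # bound handling is an assumption

        # Step 2 again: refresh individual and global best solutions.
        f = np.array([objective(p) for p in x])
        improved = f < best_f
        best_x[improved] = x[improved]
        best_f[improved] = f[improved]
        g = int(np.argmin(best_f))
        if best_f[g] < global_f:
            global_x, global_f = best_x[g].copy(), float(best_f[g])
    return global_x, global_f


# Usage on a toy objective (sum of squares), not the MAPE objective of this paper.
best_position, best_value = cf_pso(lambda p: float(np.sum(p**2)),
                                   x_min=[-5.0, -5.0], x_max=[5.0, 5.0])
```

In the fault location setting, `objective` would instead train an SVM or ELM with the candidate parameters and return the resulting MAPE.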

Proposed Fault Location Technique
The proposed fault location technique comprises three steps, namely data generation, ST based feature extraction and selection, and training and testing of the machine learning tools to locate faults/identify faulty sections in the distribution grids. The following parts of this section describe these steps:

Data Generation
The proposed fault location/faulty section identification scheme applies hundreds of faults to the selected test distribution feeders by varying the pre-fault loading condition, fault resistance, fault class, and inception angle. For each faulty case, the proposed technique records three-phase current signals from different locations/branches of the distribution feeders at a sampling frequency of 20 kHz. The technique also keeps track of the fault locations and faulty sections, which are later used to train the machine learning tools as regression and classification problems, respectively. The ST extracted features are employed as the inputs, and the fault locations and faulty sections are considered as the outputs.

ST Based Feature Extraction and Selection
To select the useful features, the proposed fault location technique employs the ST to decompose the current signals, covering one pre-fault cycle and one post-fault cycle, recorded for a phase-A-to-ground (AG) fault applied on the four-node test distribution feeder. The developed approach then calculates statistical features from the produced S_cmax, S_rmax, E_max, and S_c-phase-max vectors discussed in Section 2.1. In this way, the proposed approach collects 28 features for the phase-A current and repeats the same step for 500 similar faulty cases. Eventually, the technique ends up with a 500 × 28 matrix for the phase-A current. It then selects twelve useful features and removes twelve redundant and insignificant features based on their correlation factor. In addition, the technique removes four more features, as they possess constant/zero values for all 500 faulty cases. Consequently, the developed technique selects a total of thirty-six (= 12 × 3) features for each three-phase branch current. Table 1 summarizes the feature selection and removal process.
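One plausible way to carry out such a selection is sketched below in Python. The exact correlation rule is not stated in this section, so the sketch assumes NumPy, drops columns that are constant over all cases, and then greedily drops any column whose absolute Pearson correlation with an already kept column exceeds a threshold. The function name `select_features` and the threshold of 0.95 are hypothetical.

```python
import numpy as np


def select_features(feature_matrix, corr_threshold=0.95):
    """Return the indices of the columns kept after the two removal passes."""
    data = np.asarray(feature_matrix, dtype=float)
    # Pass 1: drop features that are constant (including all-zero) over all cases.
    varying = [j for j in range(data.shape[1])
               if data[:, j].max() != data[:, j].min()]
    # Pass 2: drop features that are highly correlated with a kept feature.
    selected = []
    for j in varying:
        redundant = any(
            abs(np.corrcoef(data[:, j], data[:, k])[0, 1]) > corr_threshold
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected


# Usage on synthetic data: column 1 duplicates column 0, column 2 is constant.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
demo = np.column_stack([a, 2.0 * a + 1.0, np.zeros(500), b])
kept = select_features(demo)
```

Applied to a 500 × 28 feature matrix, a rule of this kind returns the indices of the retained features, and the same indices can then be reused for the other two phases.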

Training and Testing of the Machine Learning Tools
As mentioned earlier, the proposed technique records three-phase faulty current signals from different locations/branches of the selected test distribution feeders and employs the ST to extract thirty-six features for each three-phase branch current signal, where the faulty cases are generated by varying the pre-fault loading condition, fault resistance, inception angle, class, and location. The technique then supplies the selected ST extracted features to the machine learning tools as inputs, and the fault locations/faulty sections as outputs. The proposed ST based MLT approach treats fault location as a regression problem and faulty section identification as a classification problem. It is worth mentioning that the proposed approach employs 70% of the generated faulty cases for training and uses the rest of the cases for testing.
The proposed fault location technique (regression problem) evaluates different statistical performance indices on the test datasets, including the mean absolute percentage error (MAPE), root mean squared error (RMSE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), the coefficient of determination (R²), Willmott's index of agreement (WIA), and the Nash–Sutcliffe model efficiency coefficient (NSEC), to investigate the efficacy of the employed machine learning tools. According to Lewis [44], a model is considered highly accurate for forecasting if it gives a MAPE of less than 10%. Additionally, lower values of the RMSE and RSR indicate the strength of the model [45], whereas the ideal value of the PBIAS is zero, which means that the developed model predicts the desired outputs without bias. Negative and positive values of the PBIAS indicate overestimation and underestimation of the predicted outputs, respectively. Moreover, the values of R², WIA, and NSEC lie between 0 and 1 for a useful model (the NSEC can become negative when the predictions are worse than the mean of the observations). The value 1 refers to a perfect match between the predicted and actual outputs, whereas the value 0 demonstrates that the output cannot be predicted at all from the available inputs [45,46].

Like the statistical performance indices, the scatter plot provides a comprehensive summary of a set of bivariate data (two variables) and is often employed to determine the potential associations between the variables. The resulting pattern demonstrates the type and strength of the relationship between the two variables. The more data points lie in the neighborhood of the identity line (i.e., y = x), the more the data sets agree with each other. If the model output and the actual data are the same, then all data points fall on the identity line.

Based on the evaluated statistical performance indices of the test datasets, the proposed technique selects the fault location (regression problem) models. Conversely, the faulty section identification technique (classification problem) selects the efficient model based on the overall accuracy. After selection of the efficient models, the proposed ST based MLT approach diagnoses the faults based on the recorded three-phase current signals, as illustrated graphically in Figure 2.
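The indices discussed above can be computed as in the following Python sketch. The formulas are not spelled out in this section, so the sketch assumes NumPy and the commonly used definitions, with the PBIAS sign convention chosen so that positive values indicate underestimation, as stated in the text. The function name `performance_indices` is hypothetical, and the MAPE requires nonzero observed values.

```python
import numpy as np


def performance_indices(observed, predicted):
    """Compute regression performance indices for one test dataset."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = o - p
    sse = np.sum(err**2)                   # sum of squared errors
    spread = np.sum((o - o.mean()) ** 2)   # variability of the observations
    wia_den = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    return {
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAPE": 100.0 * np.mean(np.abs(err / o)),
        "PBIAS": 100.0 * np.sum(err) / np.sum(o),
        "RSR": np.sqrt(sse) / np.sqrt(spread),
        "R2": np.corrcoef(o, p)[0, 1] ** 2,
        "WIA": 1.0 - sse / wia_den,
        "NSEC": 1.0 - sse / spread,
    }


# Usage: predictions that are uniformly 10% too small.
actual = np.array([1.0, 2.0, 3.0, 4.0])
indices = performance_indices(actual, 0.9 * actual)
```

In this toy case the MAPE and PBIAS both evaluate to 10%, and the positive PBIAS reflects the systematic underestimation.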

Cross Validation
Cross-validation is a popular strategy for assessing how the results of statistical analysis techniques generalize to an independent data set [47]. To select an accurate model, researchers have developed several cross-validation techniques, including k-fold cross-validation, jackknife cross-validation, and the independent data test [48][49][50][51][52][53]. This research work employed the k-fold cross-validation technique to select the appropriate machine learning predictor. The technique partitions the available data into k roughly equal segments, or folds; it employs k − 1 segments for training the predictor model and the remaining segment for validation of the model.
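The partitioning can be sketched in Python as follows, assuming NumPy. The function name `k_fold_indices`, the shuffling step, and the values of `n_samples` and `k` are hypothetical, since the value of k is not stated in this section. Each fold serves once as the validation segment while the remaining k − 1 folds form the training segment.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut into k roughly equal folds.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        validation = folds[i]
        training = np.concatenate([folds[j] for j in range(k) if j != i])
        yield training, validation


# Usage: five folds over 1200 cases.
splits = list(k_fold_indices(n_samples=1200, k=5))
```

A predictor would be trained and validated once per split, and the validation scores averaged to compare candidate models.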

Results and Discussions
The proposed fault location/faulty section identification techniques based on the ST and MLT were tested on two different test distribution feeders. The following parts of this section briefly introduce the test distribution feeders and discuss the simulation results.

Example 1: Four-Node Test Distribution Feeder
The four-node test distribution feeder consists of two distribution transformers, a distribution line, a lumped load, and a source, as presented in Figure 3, whereas Table 2 presents the technical specifications of the test feeder. This research modeled the test distribution feeder for Example 1 in the MATLAB/SIMULINK environment, considering its unbalanced loading condition.

The proposed fault location technique generated 1200 cases of AG faults by varying the pre-fault loading condition, fault resistance, inception angle, and location. It recorded a three-phase current signal from the data measurement bus for each faulty case and employed the ST to extract the features from the recorded current signals. After selection of the previously mentioned thirty-six features for each faulty case, the technique fed them as inputs to the different machine learning tools, whose outputs were the corresponding fault locations. The SVM and ELM control parameters were chosen randomly, as summarized in Table 3 (for the ELM: C_R = 10^10, K_p = 1.5 × 10^8, and a Gaussian RBF kernel), whereas the number of hidden neurons of the MLP-NN was chosen through a systematic trial and error approach. The proposed technique employed 70% of the available data for training and the rest for testing.

Figure 4 presents the targeted and the ST based MLT predicted fault locations for forty randomly selected observations from the test dataset of AG faults, for comparison. It visually appears that the MLP-NN technique adequately predicted the desired outputs in most of the cases. Conversely, the fault location accuracies of the other two techniques (the SVM and ELM) were poor compared to the MLP-NN approach. Figure 5 presents the scatter plots of the test dataset, which show that the data points are closer to the identity line for the MLP-NN approach, as expected for an efficient model. On the other hand, the scatter plots of the SVM and ELM models illustrate their inefficiency in locating faults in the four-node test distribution feeder. The unsatisfactory performance of these two models calls for the optimization of their key parameters employing well-known stochastic approaches.

As for the AG faulty cases, the proposed approach collected features for the other types of faults, including phase-B-to-ground (BG), phase-C-to-ground (CG), phase-A-to-phase-B-to-ground (ABG), phase-B-to-phase-C-to-ground (BCG), phase-C-to-phase-A-to-ground (CAG), and phase-A-to-phase-B-to-phase-C-to-ground (ABCG). Instead of presenting similar figures and discussions, this research summarizes the operational times along with the selected statistical performance indices for the test datasets of the ST based MLT techniques, for all seven types of faults, in Table 4. The table shows that the SVM technique required almost three times as long to train as the MLP-NN technique, whereas the training process of the ELM technique was a hundred times faster than that of the MLP-NN approach. The test times of all the approaches were almost equal and were less than a few cycles of a 60 Hz power system network. Additionally, the RMSE, MAPE, and RSR values were quite low, whereas the R², WIA, and NSEC values were almost unity for the MLP-NN approach, indicating the strength and effectiveness of the MLP-NN approach in locating all types of faults. Furthermore, the PBIAS values were positive for a few cases and negative for the others, which demonstrates underestimations and overestimations of the fault distances, respectively. However, these values were very small and close to zero; hence, the small underestimations and overestimations could be neglected for the ST based MLP-NN approach. On the contrary, the statistical performance indices of the other two approaches (the SVM and ELM) were not satisfactory for any of the seven types of faults. Consequently, they required further investigation through the optimization of their key parameters, employing well-known stochastic approaches to achieve better accuracy.

Fault Location with Optimized MLT
The ST based MLP-NN technique showed satisfactory performance in locating faults in the four-node test feeder, but the performance of the other two MLT was unsatisfactory. Consequently, this section optimizes their key parameters employing the constriction-factor particle swarm optimization, whose detailed procedure was presented in Section 2.5. The objective function of the CF-PSO was to minimize the MAPE between the actual and the predicted fault locations for the test datasets. After going through 100 generations with a population size of 20 individuals, the CF-PSO ended up with the optimized SVM control parameters. Similarly, the CF-PSO optimized the control parameters of the ELM in 100 generations with a population size of 80 individuals. Table 5 summarizes the obtained optimal control parameters of both machine learning approaches. It is worth mentioning that this paper optimized neither the number of hidden neurons nor the connecting weights and biases of the MLP-NN, due to its better performance with the systematically selected number of neurons.

With the optimized parameters, it visually appeared that all the MLT techniques adequately predicted the desired outputs in almost every case. Furthermore, Figure 7 presents the scatter plots of the developed MLT for the test dataset, which show that the data points are close to the identity line for all the employed techniques. Hence, the presented results confirmed the superiority of the optimized MLT over the MLT with randomly picked parameters. Table 6 summarizes the operational times, along with the selected statistical performance indices of the test datasets of the ST based optimized MLT approaches, for all seven types of faults. The training and testing times of the selected machine learning tools were similar to those in Table 4. As can be observed from the table, the RMSE, MAPE, and RSR values were quite low, whereas the R², WIA, and NSEC values were almost unity, indicating the strength and effectiveness of the proposed approach in locating faults. The PBIAS values were positive for a few cases and negative for the others, which demonstrated underestimations and overestimations of the fault distances, respectively. However, these values were close to zero; hence, the small underestimations and overestimations could be neglected.

Fault Location with Optimized MLT in the Presence of Measurement Noise
The results of the previous section confirmed the superiority of the optimized MLT approaches over their non-optimized counterparts. This section provides the results of the test datasets in the presence of measurement noise. Figure 8 shows the targeted and the ST based MLT predicted fault locations for forty randomly selected observations from the test dataset of AG faults, in the presence of measurement noise (30 dB SNR). Visually, all ST based MLT approaches estimated the fault distances with similar accuracy. Moreover, the scatter plots of Figure 9 for the test dataset show that the data points lie close to the identity line, as expected for any efficient model. Additionally, Figure 10 compares the predicted fault distances with the actual fault distances, and Figure 11 presents the scatter plot of the proposed MLT models, for the test dataset of AG faults in the presence of 20 dB SNR. Figures 8 to 11 illustrate the efficacy of the proposed ST based MLT models in predicting the fault distances, even in the presence of measurement noise. Furthermore, Tables 7 and 8 present the selected statistical performance indices, along with the required operational times of the proposed ST based MLT approaches, in the presence of 30 dB and 20 dB SNR, respectively. It can be observed from both tables that the training and testing times for the selected machine learning tools are similar to those of Table 4.
Additionally, the RMSE, MAPE, RSR, and PBIAS were quite low, whereas the R², WIA, and NSEC were almost unity for the optimized MLT approaches, indicating the strength and the effectiveness of the approaches for all types of faults, even in the presence of measurement noise. The performance of the developed scheme degraded slightly in the presence of measurement noise, but it could still estimate fault locations with acceptable accuracy.
To validate the developed ST based fault location scheme for the four-node test distribution feeder, the faults listed in Table 9 were applied on the distribution line, and their locations were predicted employing the trained and tested machine learning tools. It is worth mentioning that the proposed approach generated the faulty cases by varying the pre-fault loading condition, presence of measurement noise, fault resistance, and inception angle. Then it employed the ST to extract the useful features and fed them to the trained machine learning tools as inputs, to get a decision on the applied faults. As can be seen from Table 9, the developed signal processing based MLT approach located all applied faults with satisfactory accuracy (error < 1%), which validated the effectiveness of the employed approach.

The proposed ST based machine learning approach was also employed to identify the faulty sections of the IEEE 13-node test distribution feeder. The selected test distribution feeder operates at 4.16 kV and displays most of the characteristic features of power distribution grids, as shown in Figure 12 [54]. Consequently, it is usually used to test common features of distribution system analysis software. The highly loaded test feeder consists of a single voltage regulator at the substation, an in-line transformer, shunt capacitor banks, overhead distribution lines and underground cables of various configurations, and several unbalanced spot and distributed loads. Moreover, the test feeder contains single-phase, double-phase, and three-phase laterals. This research modeled the test feeder in the RSCAD environment and implemented the developed model in a Real Time Digital Simulator (RTDS) machine.

The proposed ST based faulty section identification approach generated 900 faulty cases by varying the pre-fault loading condition, fault resistance, inception angle, type, and location, employing the 'batch-mode' operation option of the RSCAD software. After generating the different fault scenarios, the proposed approach recorded three-phase current signals from eight different branches/locations of the test feeder, as indicated in Figure 12, with a sampling frequency of 20 kHz. Then it processed the recorded signals employing the ST in the MATLAB/SIMULINK environment and extracted useful features, as discussed in Section 3.2. It is worth mentioning that the proposed approach collected 288 features (8 branches × 36 features per branch) for each fault scenario. Furthermore, this research divided the test distribution feeder into nine sections, as indicated by the rectangular boxes in Figure 12. Consequently, this research formulated the faulty section identification problem as a classification problem.

Faulty Section Identification in Noise-Free Environment
After collecting the useful features, the proposed approach trained and tested the MLP-NN with different numbers of hidden neurons, where the best neural network, in terms of minimum mean squared error and overall accuracy, consisted of eleven hidden neurons. In addition, the proposed approach systematically chose the tan-sigmoid as the squashing function and the resilient backpropagation technique as the training algorithm. Like the fault location technique of the four-node test distribution feeder, the faulty section identification approach employed 70% of the available data for training purposes and the rest of the data for testing purposes. Table 10 summarizes the technical features of the MLP neural networks employed for faulty section identification. Table 11 presents the 9 × 9 confusion matrix obtained from the MLP neural networks, as they identified nine faulty sections employing the proposed approach in a noise-free environment. The diagonal and off-diagonal elements of the confusion matrix represent the successful and unsuccessful identifications of faulty sections, respectively. As can be observed from the table, the proposed technique identified the faulty sections with an accuracy of almost 100%, which demonstrated the successful implementation of the adopted ST based machine learning approach.
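The relation between the confusion matrix and the overall accuracy can be sketched as follows. The function names are hypothetical, section labels are assumed to be integers from 0 to 8, and the row/column orientation (rows as true sections, columns as predicted sections) is an assumption that may differ from the layout of Table 11.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count matrix with rows as true sections, columns as predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Percentage of cases on the diagonal (correctly identified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total
```

For the nine-section problem described above, `confusion_matrix(y_true, y_pred, 9)` would yield a 9 × 9 matrix, and any non-zero off-diagonal entry marks a fault assigned to the wrong section.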

Faulty Section Identification in the Presence of Measurement Noise
The proposed technique added different levels of additive white Gaussian noise to the recorded three-phase branch current signals, with a view to testing the efficacy of the proposed faulty section identification approach in the presence of measurement noise. Then it extracted the same features from the noisy signals and employed them as inputs to the MLP-NN. Table 12 summarizes the faulty section identification results of the ST based MLP-NN approach in the presence of measurement noise. As can be seen, the ST based approach achieved overall accuracies of 99.876%, 99.741%, and 99.235% for the 40 dB, 30 dB, and 20 dB SNR, respectively. Consequently, the obtained results validated the efficacy of the proposed ST based faulty section identification technique for both noise-free and noisy conditions. In addition, the results confirmed that the developed approach is independent of the pre-fault loading condition, fault type, fault resistance, and inception angle.
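A minimal sketch of the noise-injection step is given below, assuming that the SNR is defined relative to the average power of the recorded signal; the paper does not state how the noise was generated, so the function names and this power convention are illustrative assumptions only.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at `snr_db`."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR (dB) = 10 * log10(signal power / noise power).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` with respect to `clean`."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)
```

Applying `add_awgn` with 40, 30, and 20 to each recorded phase current, before feature extraction, would correspond to the three noise levels reported in Table 12.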

Validating the Developed Faulty Section Identification Technique
To validate the developed ST based faulty section identification technique for the thirteen-node test distribution feeder, the faults listed in Table 13 were applied at different locations of the feeder, and the faulty sections were predicted employing the trained and tested MLP-NN. It is worth mentioning that the proposed approach generated the faulty cases by varying the pre-fault loading condition, presence of measurement noise, fault resistance, and inception angle. Then it employed the ST to extract the useful features and fed them to the trained MLP-NN as inputs, to get a decision on the applied faults. As can be seen from Table 13, the developed signal processing based MLP-NN approach identified the faulty sections of all applied faults accurately, which validated the effectiveness of the employed approach.

Conclusions and Future Scope
This paper presented advanced signal processing-based machine learning tools to locate faults and identify faulty sections in distribution grids. The proposed approach decomposed three-phase current signals recorded from different branches/locations of the distribution grids, employing the ST to extract the useful characteristic features of faulty cases. Then it employed the extracted features as inputs to different machine learning tools and trained them to locate the faults and to identify the faulty sections in the distribution grids. Additionally, this research employed the CF-PSO technique to optimize the parameters of the support vector and extreme learning machines, whilst it optimized the MLP-NN through a systematic approach to achieve better generalization performance. The presented results validated the efficacy of the developed fault diagnosis approach. The optimized machine learning tools outperformed the non-optimized tools in locating distribution grid faults, in terms of the selected statistical performance indices, i.e., the RMSE, MAPE, PBIAS, RSR, R², WIA, and NSEC. Moreover, the ST based ELM approach required less training time than the SVM and MLP-NN techniques. Once trained, all the MLT diagnosed the applied faults in a fraction of a second, indicating that the developed fault diagnosis scheme is suitable for real-time implementation. Furthermore, the developed faulty section identification approach identified faulty sections with almost 100% accuracy. Finally, the presented results confirmed the independence of the proposed ST based MLT approach of the fault inception angle and resistance, pre-fault loading conditions, and presence of measurement noise. As a future extension, the signal processing based MLT approach can be applied to diagnose faults in distribution grids that incorporate renewable energy resources, considering their associated uncertainties. The proposed approach can also be extended to diagnose simultaneous faults at the distribution level.

Figure 1. The flowchart of the constriction factor particle swarm optimization.

Figure 2. Fault location/faulty section identification flowchart employing the ST based MLT approach.

Figure 3. Four-node test distribution feeder used for fault location purpose.

Figure 4. Comparison of the Stockwell transform (ST) based non-optimized MLT predicted and actual fault distances, for the test dataset of applied phase-A-to-ground (AG) fault.

Figure 5. Scatter plot of the targeted and the ST based non-optimized MLT predicted outputs, for the test dataset of AG fault.


Figure 6. Comparison of the ST based constriction factor-based particle swarm optimization (CF-PSO) optimized MLT predicted and actual fault distances for the test dataset of AG fault.

Figure 7. Scatter plot of the targeted and the ST based CF-PSO optimized MLT predicted outputs for the test dataset of AG fault.

Figure 8. Comparison of the ST based optimized MLT predicted and actual fault distances, for the test dataset, in the presence of 30 dB SNR.

Figure 9. Scatter plot of the targeted and the ST based optimized MLT predicted outputs, for the test dataset, in the presence of 30 dB SNR.

Figure 10. Comparison of the ST based optimized MLT predicted and actual fault distances, for the test dataset, in the presence of 20 dB SNR.

Figure 11. Scatter plot of the targeted and the ST based optimized MLT predicted outputs, for the test dataset, in the presence of 20 dB SNR.


Table 1. ST extracted feature selection and removal process summary.
F8, F12 & F16 | 1.00 | Remove features F12 & F16
F17, F19 & F20 | Most of the cases they are zero | Remove all of them
F27 | It is constant for all cases | Remove F27
The selected features: F1, F2, F3, F4, F5, F10, F11, F14, F18, F21, F22, and F23

Table 2. Four-node test distribution feeder specifications.

Table 4. Operation time and statistical performance indices for test datasets with randomly selected MLT parameters.

Table 6. Operation time and statistical performance indices for test datasets with CF-PSO optimized MLT parameters.

Table 7. Operation time and the statistical performance indices, for the test datasets with CF-PSO optimized MLT, in the presence of 30 dB SNR.

Table 8. Operation time and the statistical performance indices, for the test datasets with CF-PSO optimized MLT, in the presence of 20 dB SNR.

Table 9. Fault location results obtained employing the ST based machine learning tools, for the four-node test distribution feeder.

Table 10. Multilayer perceptron neural networks (MLP-NN) parameters for the faulty section identification problem.

Table 11. Faulty section identification results for noise-free measurement.

Table 12. Faulty section identification results under noisy measurement.

Table 13. Faulty section identification results obtained employing the ST based machine learning tools, for the IEEE thirteen-node test distribution feeder.