Microgrid Fault Detection and Classification: Machine Learning Based Approach, Comparison, and Reviews

Abstract: Accurate fault detection and classification for the microgrid (MG) has become a concern among researchers in the field of fault diagnosis, since it bears directly on the transient response of the system. The MG frequently experiences shunt faults during the distribution of power from the generation end to user premises, which reduces system reliability, damages loads, and increases the cost of restoring the faulted line. Therefore, a noise-immune and precise fault diagnosis model is required for fast recovery of the unhealthy phases. This paper reviews MG fault diagnosis techniques and their limitations and proposes a novel discrete wavelet transform (DWT) based probabilistic generative model for precise fault diagnosis of MGs. The proposed model consists of multiple restricted Boltzmann machine (RBM) layers, which enable the model to reconstruct a probability distribution over its inputs. Each RBM layer is pre-trained with an unsupervised learning approach, after which an artificial neural network (ANN) fine-tunes the model to minimize the error between the true and predicted classes. The effectiveness of the proposed model is studied by varying the input signal type and the sampling frequency, and noise is added to the sample data to test its robustness. The results show that the proposed fault detection and classification model can diagnose MG faults precisely. A comparative study of the proposed model against kernel extreme learning machine (KELM), multi-kernel extreme learning machine (MKELM), and support vector machine (SVM) approaches confirms its robust and superior performance.


Introduction
The microgrid (MG) helps meet the exponential growth of load demand because of its reliable, secure, sustainable, and green energy supply [1,2]. This small-scale power supply network consists of several distributed energy resources (DERs), energy storage devices, communication facilities, and well-regulated loads [3][4][5]. An MG can operate in both islanded (autonomous) and grid-tied modes. In grid-tied operation, a portion of the load is supplied by the main AC grid, whereas in islanded operation, the main AC grid is disconnected and the MG runs autonomously. […] network deeper and enables the model to extract the features adaptively [44,45]. The DBN can handle non-linear data, which enables it to classify faults in the MG domain more precisely.
The proposed network takes the three-phase faulty voltage and current waveforms as input to perform fault diagnosis of the MG system. A DWT is used to extract features from the raw signal samples. To increase the noise immunity of the DBN, an extension of this classifier with a dropout strategy is also proposed in this research; the dropout strategy significantly improves the effectiveness of the proposed DBN under the considered noise levels.
The main findings of this article are as follows:
• We design a novel network for distribution-line fault detection and classification (FDC) in MGs, based on a deep learning network combined with the wavelet transform (WT), which effectively extracts the relevant short-circuit fault attributes from the faulted line signals.
• We develop a hierarchical generative model with multiple RBM layers whose unsupervised pre-training restrains overfitting of the training dataset.
• Both operating modes (islanded and grid-connected/tied) and both MG topologies (radial and loop) are studied to measure the effectiveness of the proposed model.
• The dropout strategy is integrated into the proposed network to make the developed DBN model robust in noisy environments.
The paper is organized as follows. Section 2 describes the design of the proposed generative model for MG fault classification, together with the required background material. Section 3 presents the performance analysis of the proposed DBN model, and Section 4 concludes the paper.

Types of Network Faults
Faults in MG power lines are broadly categorized into shunt faults and series faults. When one or two conductors of a power network break open, the series impedance of the line becomes unbalanced; this is known as a series fault. This type of fault is not directly related to the distribution of power from one place to another. Shunt faults, on the other hand, frequently occur in three-phase networks during power distribution and are classified as phase-to-phase (PP), phase-to-ground (PG), two-phase-to-ground (2PG), and three-phase-to-ground (3PG) faults.

Single PG Fault
A single PG fault occurs when any phase of a three-phase power line comes into contact with the neutral conductor or falls to the ground. This short-circuit fault is typically caused by heavy wind or by trees falling on a line. The three types of single PG faults are shown in Figure 1a-c, where a, b, and c denote the three phases of a distribution line.
Figure 1. Representation of different fault types: (a) a-g, (b) b-g, (c) c-g, (d) ab-g, (e) bc-g, (f) ca-g, (g) a-b, (h) b-c, (i) a-c, and (j) abc-g fault.

Two PG Fault
A two PG fault occurs when two phases of a power line fall to the ground. This fault causes significant asymmetry and a higher fault current than the PP fault. If it is not cleared in time, it may evolve into a 3PG fault, whose severity is much higher than that of the other fault types. The ab-g, bc-g, and ac-g faults are shown in Figure 1d-f, where $R_f$ is the fault resistance.

PP Fault
A short circuit between any two phases of a three-phase system produces this type of fault. An important characteristic of this unsymmetrical fault is that the fault impedance varies over a broad range, which makes its upper and lower limits difficult to predict. The three types of PP faults are presented in Figure 1g-i.

Three PG Fault
The 3PG fault, shown in Figure 1j, occurs rarely. This symmetrical fault may be caused by the fall of an electric pole or by equipment failure. During this fault, the three-phase voltage drops to zero and a large fault current flows. Although it occurs least frequently, this fault is widely studied in power system protection because it produces the maximum short-circuit current.
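The fault taxonomy above can be captured in a small lookup table. The sketch below is illustrative only: the dictionary `FAULT_CLASSES`, the helpers `fault_category` and `one_hot`, and the class ordering are assumptions introduced here, not part of the described model. They map each of the 11 class labels (ten shunt faults plus the non-fault state) to its faulted phases, its ground involvement, and a one-hot target vector of the kind a classifier is trained against.

```python
# Hypothetical label table: label -> (faulted phases, ground involved).
FAULT_CLASSES = {
    "a-g": ("a", True),
    "b-g": ("b", True),
    "c-g": ("c", True),
    "ab-g": ("ab", True),
    "bc-g": ("bc", True),
    "ac-g": ("ac", True),
    "a-b": ("ab", False),
    "b-c": ("bc", False),
    "a-c": ("ac", False),
    "abc-g": ("abc", True),
    "non-fault": ("", False),
}


def fault_category(label):
    """Return PG, 2PG, PP, 3PG, or none for a class label."""
    phases, ground = FAULT_CLASSES[label]
    if not phases:
        return "none"
    if len(phases) == 1:
        return "PG"
    if len(phases) == 2:
        return "2PG" if ground else "PP"
    return "3PG"


def one_hot(label):
    """One-hot target vector in the (insertion) order of FAULT_CLASSES."""
    names = list(FAULT_CLASSES)
    vec = [0] * len(names)
    vec[names.index(label)] = 1
    return vec


print({label: fault_category(label) for label in FAULT_CLASSES})
```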

System Modelling
A microgrid system satisfying the International Electrotechnical Commission (IEC) standard [46], illustrated in Figure 2, was considered in this study to test the classification performance of the proposed network. The studied MG network was modelled in MATLAB/Simulink, which offers extensive facilities for generating the required data. The system parameters are reported in Table 1. The system contained four distributed generation (DG) units: Generator-1 (G-1) was a wind farm based on doubly fed induction generators (DFIGs), G-2 was a wind turbine with an asynchronous machine, and G-3 and G-4 were inverter-based generators. The system frequency was set to 50 Hz, and the base power was 48 MVA. A circuit breaker (CB) at the grid side allowed the system to operate in either mode, and switching CB Loop-1 and CB Loop-2 allowed it to be operated in the looped or radial topology. The distribution lines were divided into five sections of 20 km each. The ten types of shunt faults listed in Table 1 were simulated in the distribution lines, and the three-phase voltages and currents were sampled at the sending end of the lines at a sampling rate of 20 kHz. To produce the fault data, the signals were sampled for each fault and for the non-fault condition under different network configurations, operating modes, fault distances, fault resistances, and fault inception angles.

Table 1. Parameters of the studied MG system.
System parameter | Types or values
Fault types | a-g, b-g, c-g, ab-g, bc-g, ac-g, a-b, b-c, a-c, abc-g, non-fault

Variation of Signal Energy with Feature Generation
In this paper, a framework for MG distribution-line FDC is proposed based on a probabilistic generative network. For fault detection, the healthy (sound) condition was treated as an additional class, so that the short-circuit fault conditions and the sound state together give a total of 11 classes. The classifier output was expected to be the non-fault class under healthy conditions, and a fault event was detected when the output switched to one of the fault classes. Training the proposed model required extracting fault features from the raw signals. Each fault signal has a distinct signal energy, which depends on system parameters such as the fault distance and resistance. The raw signal of each phase was therefore analysed separately with the DWT, and the signal energy of each was then calculated to prepare the required dataset.

Effect of Fault Distance on Signal Energy
A fault may occur at any point of a distribution network. Training the proposed DBN over a range of fault distances enables it to inspect signals from anywhere in the network. For sample data generation, the fault location was varied from 1 to 19 km in increments of 0.5 km, and the current and voltage waveforms were recorded. The resulting variations in the raw signals produce different signal-energy levels, which serve as distinct features. As a demonstration, the variation in signal energy during the a-g fault for different fault locations on Lines 1-3 is presented in Figure 3: the signal energy of the faulted-phase current and voltage waveforms varies with the fault distance. For this demonstration, the fault resistance was fixed at 10 Ω, and the other parameters were kept as stated before.
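The fault-distance sweep can be expressed as a parameter grid that would drive the simulation runs. This is a minimal sketch under stated assumptions: it only enumerates scenarios (it does not simulate the network), the names `distance_sweep`, `MODES`, `TOPOLOGIES`, and `SHUNT_FAULTS` are hypothetical, and it holds $R_f$ at 10 Ω as in the demonstration above.

```python
import itertools

MODES = ("grid-connected", "islanded")          # hypothetical labels for the operating modes
TOPOLOGIES = ("radial", "loop")                 # the two MG topologies
SHUNT_FAULTS = ("a-g", "b-g", "c-g", "ab-g", "bc-g", "ac-g", "a-b", "b-c", "a-c", "abc-g")

# Fault locations from 1 km to 19 km in 0.5 km steps, both ends included.
START_KM, STOP_KM, STEP_KM = 1.0, 19.0, 0.5
N_LOCATIONS = int(round((STOP_KM - START_KM) / STEP_KM)) + 1
FAULT_LOCATIONS_KM = [START_KM + k * STEP_KM for k in range(N_LOCATIONS)]


def distance_sweep(r_f_ohm=10.0):
    """Yield one scenario per (mode, topology, fault type, location) combination."""
    for mode, topology, fault, d_km in itertools.product(
        MODES, TOPOLOGIES, SHUNT_FAULTS, FAULT_LOCATIONS_KM
    ):
        yield {
            "mode": mode,
            "topology": topology,
            "fault": fault,
            "distance_km": d_km,
            "r_f_ohm": r_f_ohm,   # held fixed for the distance study
        }


scenarios = list(distance_sweep())
print(len(FAULT_LOCATIONS_KM), len(scenarios))   # prints: 37 1480
```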

Effect of Fault Resistance on Signal Energy
The fault resistance also has a strong impact on the raw signals. For faults involving the ground, neglecting the fault resistance may lead to inaccurate feature measurement. The proposed FDC system was therefore trained and studied over different fault resistances for effective FDC. The signal energy of the three-phase voltages and currents for Lines 1-3 at different fault resistances is shown in Figure 4. The figure shows that the energy of the faulted phase changes with the fault resistance, which enables the system to characterize faults of different resistances. For this demonstration, the fault distance was set to 10 km from the sampling end, and the other parameters were kept constant.

Fault Feature Generation with Wavelet Transform
In this study, the DWT was used to extract features from the portion of a signal in which a rapid change occurs. The WT decomposes a signal into a set of time-series components, each covering a distinct frequency band, which gives a detailed time-frequency description of the waveform. The WT is therefore used here to represent the fault attributes of the part of the signal affected by a fault event. In the DWT, a signal passes through a series of high-pass filters (HPFs) and low-pass filters (LPFs). The LPFs analyse the low-frequency content of the signal and the HPFs analyse its high-frequency content, splitting the signal into approximation (App) and detail (Det) coefficients. The App coefficients capture the low-frequency (large-scale) components of the fault signal, whereas the Det coefficients capture its high-frequency (small-scale) components. The decomposition is repeated on each successive App, so that the fault signal is divided into sections of progressively lower resolution. This process is referred to as the DWT decomposition tree, as presented in Figure 5.
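The filter-bank decomposition tree can be illustrated with the simplest Daubechies wavelet. The sketch assumes the order-1 Daubechies (Haar, db1) wavelet, because its LPF/HPF pair reduces to scaled sums and differences of neighbouring samples; the Daubechies order actually used in this work is not stated here, and `haar_step` and `wavedec_haar` are hypothetical helper names. Each stage filters and downsamples by 2, and the split is repeated on the App branch.

```python
import numpy as np


def haar_step(x):
    """One DWT stage with the Haar (db1) filter pair, followed by downsampling by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                           # pad odd lengths by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)     # LPF branch -> App coefficients
    detail = (even - odd) / np.sqrt(2.0)     # HPF branch -> Det coefficients
    return approx, detail


def wavedec_haar(x, levels):
    """Decomposition tree: split the signal, then keep splitting the App branch."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details                   # details[0] is Det level 1


# Usage: a 50 Hz waveform sampled at 20 kHz with a hypothetical abrupt change.
t = np.arange(256) / 20e3
x = np.sin(2 * np.pi * 50 * t)
x[101:] *= 3.0
app4, dets = wavedec_haar(x, levels=4)
print([len(d) for d in dets], len(app4))     # prints: [128, 64, 32, 16] 16
```

Because the Haar filter pair is orthonormal, the total energy of the App and Det coefficients equals the energy of the input when no padding is needed, which is a convenient sanity check for any implementation.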
In this research, half a cycle of post-fault current and voltage data was passed through a multi-resolution wavelet block to extract the App and Det components of the fault current and voltage. A Daubechies (DB) wavelet was chosen as the mother wavelet, as it has already been shown to work effectively [47]. When a short-circuit fault occurs in the distribution line, the wavelet coefficients of the voltage and current waveforms vary with the system parameters and carry valuable fault-signature information. For illustration, the phase-a current and voltage waveforms during an a-g fault were analysed with the DWT, and the detail coefficients up to level 4 are presented in Figure 6a and Figure 6b, respectively. At the first level, a sharp spike appears at the fault inception point, reflecting the highest frequency present in the faulty voltage and current waveforms. Such spikes appear whenever the signal changes suddenly, so detecting them alone is not a reliable way to identify faults on the power line. At the fourth level, the Det coefficients reveal a side-band carrying several smaller spikes, and changing system parameters such as the fault resistance, fault distance, and fault type alters the high spikes along these side-bands. Decomposing beyond this level produces further side-bands, which makes the relationship among the fault type, inception angle, fault resistance, and fault location more complex. Therefore, the meaningful attributes are selected from the Det level-4 coefficients. Using these coefficients, the energy of a signal [48] can be calculated as
$$E_i = \sum_{j=1}^{n} \lvert d_{ij} \rvert^{2},$$
where $d_{ij}$ is the $j$th detail coefficient of the signal at scale $i$, $j = 1, 2, \dots, n$ indexes the points of each wavelet coefficient, and $i = 1, 2, \dots, I$ denotes the scale. From [3], it can be concluded that a change in the wavelet coefficients due to transient events such as faults in turn changes the signal energy. The signal energy is calculated for each of the three-phase currents and voltages to generate the input features of the proposed FDC model.
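The DWT of a sampled signal $x(k)$ is defined as
$$\mathrm{DWT}(m, l) = \frac{1}{\sqrt{a_0^{m}}} \sum_{k} x(k)\, \psi\!\left(\frac{k - l\, b_0\, a_0^{m}}{a_0^{m}}\right), \quad (1)$$
where $\psi$ is the mother wavelet, $l$ and $m$ are integer variables, and $l\, b_0\, a_0^{m}$ and $a_0^{m}$ are the time-shift and scale parameters, respectively. The parameters $a_0$ and $b_0$ in (1) were set to 2 and 1, respectively.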
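The energy features can be sketched in the same spirit. This is an illustration under assumptions: it uses the Haar (db1) filter pair in place of the unspecified Daubechies order, takes a half-cycle window (200 samples at 20 kHz and 50 Hz) for each of six channels (three phase voltages and three phase currents), and computes $E_i = \sum_j |d_{ij}|^2$ on the level-4 Det coefficients. The helper names and the channel ordering are hypothetical, and odd-length intermediate approximations are padded by repeating the last sample, a simple boundary rule chosen here for brevity.

```python
import numpy as np

FS_HZ = 20_000                         # sampling rate of the studied system
F0_HZ = 50                             # system frequency
HALF_CYCLE = FS_HZ // (2 * F0_HZ)      # 200 samples of post-fault data
CHANNELS = ("Va", "Vb", "Vc", "Ia", "Ib", "Ic")   # hypothetical channel order


def haar_detail(x, level):
    """Det coefficients at `level` using the Haar (db1) filter pair."""
    if level < 1:
        raise ValueError("level must be >= 1")
    approx = np.asarray(x, dtype=float)
    for _ in range(level):
        if len(approx) % 2:                       # boundary rule: repeat the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)      # HPF branch
        approx = (even + odd) / np.sqrt(2.0)      # LPF branch, decomposed again next pass
    return detail


def signal_energy(detail):
    """E_i = sum_j |d_ij|^2 over the coefficients at one scale."""
    return float(np.sum(np.abs(detail) ** 2))


def energy_features(window, level=4):
    """window: array (6, HALF_CYCLE) of post-fault samples -> 6 energy features."""
    window = np.asarray(window, dtype=float)
    if window.shape != (len(CHANNELS), HALF_CYCLE):
        raise ValueError(f"expected shape {(len(CHANNELS), HALF_CYCLE)}, got {window.shape}")
    return np.array([signal_energy(haar_detail(ch, level)) for ch in window])


# Usage with a random placeholder window (real data would come from the simulations).
rng = np.random.default_rng(0)
features = energy_features(rng.standard_normal((6, HALF_CYCLE)))
print(dict(zip(CHANNELS, features.round(3))))
```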

Proposed Hierarchical Generative Fault Classification Model
The framework of the deep belief network (DBN) for MG distribution-line FDC is introduced in this section. The proposed network consists of a stack of restricted Boltzmann machines (RBMs). Stacking RBMs gives the network a deep architecture, and the RBMs help the network cope with overfitting of the training data.

Restricted Boltzmann Machine
The RBM is a generative probabilistic model in the form of a two-layer neural network. The first layer, or data input (visible) layer, consists of visible units $u^v$, while the second (hidden) layer consists of hidden units $u^h$. Each visible unit is connected to every hidden unit through the weight matrix $W$, whereas there are no connections between units within the same layer, as shown in Figure 7. The visible and hidden units have bias vectors $\mathbf{b}^v$ and $\mathbf{b}^h$, respectively. The energy function of the visible and hidden units is
$$E(\mathbf{u}^v, \mathbf{u}^h) = -\sum_{i=1}^{N_v} b_i^{v} u_i^{v} - \sum_{j=1}^{N_h} b_j^{h} u_j^{h} - \sum_{i=1}^{N_v} \sum_{j=1}^{N_h} u_i^{v} W_{ij} u_j^{h},$$
where $u_i^v, b_i^v$ and $u_j^h, b_j^h$ are the binary states and biases of the $i$th visible unit and the $j$th hidden unit, and $N_v$ and $N_h$ are the numbers of visible and hidden units. The joint distribution of the units is determined by the energy function as
$$P(\mathbf{u}^v, \mathbf{u}^h) = \frac{1}{Z} e^{-E(\mathbf{u}^v, \mathbf{u}^h)}, \qquad Z = \sum_{\mathbf{u}^v, \mathbf{u}^h} e^{-E(\mathbf{u}^v, \mathbf{u}^h)},$$
where the partition function $Z$ ensures that the distribution is normalized. For binary visible and hidden units, the probability that hidden unit $u_j^h$ is activated given the visible vector $\mathbf{u}^v$, and the probability that visible unit $u_i^v$ is activated given the hidden vector $\mathbf{u}^h$, are
$$p(u_j^h = 1 \mid \mathbf{u}^v) = \delta\Big(b_j^{h} + \sum_{i=1}^{N_v} W_{ij} u_i^{v}\Big), \quad (3)$$
$$p(u_i^v = 1 \mid \mathbf{u}^h) = \delta\Big(b_i^{v} + \sum_{j=1}^{N_h} W_{ij} u_j^{h}\Big), \quad (4)$$
where $\delta(\cdot)$ is the activation function, the logistic sigmoid $\delta(x) = 1/(1 + e^{-x})$ for binary units. In (3), the input data are transformed from the high-dimensional space into a low-dimensional space as characteristic vectors (CVs); this process is called positive-phase learning. In the negative phase, described by (4), the input data are reconstructed from the CVs. The parameters $W$, $\mathbf{b}^v$, and $\mathbf{b}^h$ are trained jointly to reduce the reconstruction error. The categorical cross-entropy loss (CL), also known as the negative log-likelihood, measures the discrepancy between the true and predicted labels:
$$\mathrm{CL} = -\sum_{i=1}^{N} \sum_{c=1}^{C} t_{i,c} \log p_{i,c},$$
where $t_{i,c} = 1$ if sample $i$ belongs to class $c$ and 0 otherwise, $p_{i,c}$ is the predicted probability that sample $i$ belongs to class $c$, and $N$ and $C$ are the numbers of samples and classes. The proposed DBN contains five stacked RBMs, in which the input layer and the first hidden layer form the first RBM. The output of the first RBM is fed to the second RBM, combined with the second and third hidden layers.
In this way, data from the input layer flow through the RBM stack and finally reach the final layer. Note that the hidden layer of the first RBM serves as the visible layer of the second RBM, and likewise for the higher RBMs. A DBN structure with four stacked RBMs is displayed in Figure 8.
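The RBM relations can be checked numerically on a toy model. The sketch below (hypothetical class `RBM`, arbitrary small sizes and random parameters) implements the energy function and the sigmoid conditionals, and verifies by brute-force enumeration of the hidden states that $p(u_j^h = 1 \mid \mathbf{u}^v)$ obtained from the Boltzmann distribution $e^{-E}/Z$ matches the closed-form sigmoid expression. The partition function cancels in the conditional, so it never needs to be computed.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Binary RBM with weights W (N_v x N_h) and biases b_v, b_h."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.5 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = 0.1 * rng.standard_normal(n_visible)
        self.b_h = 0.1 * rng.standard_normal(n_hidden)

    def energy(self, v, h):
        """E(v, h) = -b_v.v - b_h.h - v^T W h."""
        return -(self.b_v @ v) - (self.b_h @ h) - (v @ self.W @ h)

    def p_h_given_v(self, v):
        """Eq. (3): p(h_j = 1 | v) = sigmoid(b_h_j + sum_i W_ij v_i)."""
        return sigmoid(self.b_h + v @ self.W)

    def p_v_given_h(self, h):
        """Eq. (4): p(v_i = 1 | h) = sigmoid(b_v_i + sum_j W_ij h_j)."""
        return sigmoid(self.b_v + self.W @ h)


def brute_force_p_h_given_v(rbm, v):
    """P(h_j = 1 | v) by enumerating all hidden states under exp(-E); Z cancels."""
    n_hidden = rbm.W.shape[1]
    states = [np.array(s, dtype=float) for s in itertools.product((0, 1), repeat=n_hidden)]
    weights = np.array([np.exp(-rbm.energy(v, h)) for h in states])
    weights /= weights.sum()
    return sum(w * h for w, h in zip(weights, states))


rng = np.random.default_rng(0)
rbm = RBM(n_visible=4, n_hidden=3, rng=rng)
v = np.array([1.0, 0.0, 1.0, 1.0])
print(rbm.p_h_given_v(v), brute_force_p_h_given_v(rbm, v))
```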

Unsupervised Learning of the Proposed Network
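The training of the proposed DBN-based fault classifier consisted of unsupervised pre-training of each RBM layer followed by fine-tuning of the DBN with the ANN. Initially, each RBM layer was trained by applying the Adam optimization algorithm to the negative log-likelihood of the training dataset. The gradient of the negative log-likelihood of the visible layer with respect to the parameter $W$ is
$$-\frac{\partial \log P(\mathbf{u}^v)}{\partial W_{ij}} = \langle u_i^v u_j^h \rangle_{\text{model}} - \langle u_i^v u_j^h \rangle_{\text{data}},$$
where $\langle u_i^v u_j^h \rangle_{\text{data}}$ denotes the expectation under the data distribution $p$ and $\langle u_i^v u_j^h \rangle_{\text{model}}$ the expectation under the model distribution $q$. Because the model expectation is intractable, the contrastive divergence (CD-$k$) approximation, which truncates the Gibbs chain after $k$ steps, is customarily used to train the RBM. Given an input $\mathbf{x}_r$ from the dataset $\{\mathbf{x}_r\}_{r=1}^{M}$, a single Gibbs sampling step is
$$\mathbf{h}^{(0)} \sim p(\mathbf{u}^h \mid \mathbf{u}^v = \mathbf{x}_r), \qquad \tilde{\mathbf{v}} \sim p(\mathbf{u}^v \mid \mathbf{u}^h = \mathbf{h}^{(0)}),$$
and the update rule for the parameter $W$ is
$$W \leftarrow W + \eta \left( \mathbf{x}_r\, p(\mathbf{u}^h \mid \mathbf{x}_r)^{\top} - \tilde{\mathbf{v}}\, p(\mathbf{u}^h \mid \tilde{\mathbf{v}})^{\top} \right),$$
where $\eta$ is the learning rate.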
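A compact CD-1 training loop and the greedy layer-wise stacking can be sketched as follows. Assumptions: binary units, full-batch plain gradient steps instead of the Adam optimizer mentioned above, arbitrary toy data and layer sizes, and hypothetical function names. After each RBM is trained, its hidden-unit probabilities become the visible input of the next RBM, which is the stacking described for the DBN.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm_cd1(data, n_hidden, epochs=500, lr=0.1, seed=0):
    """CD-1 with full-batch gradient steps (plain updates, not Adam)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b_h)                      # positive phase, eq. (3)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
        pv1 = sigmoid(h0 @ W.T + b_v)                      # reconstruction, eq. (4)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + b_h)                        # negative phase
        # CD-1 estimate of <v h>_data - <v h>_model, averaged over the batch
        W += lr * (data.T @ ph0 - v1.T @ ph1) / n_samples
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h


def reconstruction_error(data, W, b_v, b_h):
    """Mean squared error of a deterministic up-down pass."""
    ph = sigmoid(data @ W + b_h)
    pv = sigmoid(ph @ W.T + b_v)
    return float(np.mean((data - pv) ** 2))


def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: hidden probabilities of RBM k feed RBM k+1."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, b_v, b_h = train_rbm_cd1(x, n_hidden, **kwargs)
        params.append((W, b_v, b_h))
        x = sigmoid(x @ W + b_h)
    return params, x


# Usage on toy binary data with two repeated patterns.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
W, b_v, b_h = train_rbm_cd1(data, n_hidden=4)
print(reconstruction_error(data, W, b_v, b_h))
params, top = pretrain_stack(data, [4, 3], epochs=200)
```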

Supervised Training of the Proposed Network
The model parameters of the DBN are fine-tuned with the ANN algorithm to minimize the error between the predicted output and the true class labels. As in the unsupervised stage, the supervised training proceeds layer by layer. Starting from the weights of the RBMs, each neuron in an ANN layer performs the operation
$$z_i^{[j]} = \mathbf{w}_i^{[j]\top} \mathbf{a}^{[j-1]} + b_i^{[j]},$$
where $j$ indexes the layers of the ANN architecture, $i$ indexes the neurons within a layer, and $\mathbf{a}^{[j-1]}$ is the output of the previous layer. Because the same operation is performed by every neuron of a layer, the transposed weight vectors $\mathbf{w}_i^{[j]\top}$ are stacked to form the matrix $\mathbf{W}^{[j]}$, and the biases of the neurons form the column vector $\mathbf{b}^{[j]}$, giving $\mathbf{z}^{[j]} = \mathbf{W}^{[j]} \mathbf{a}^{[j-1]} + \mathbf{b}^{[j]}$. The weights connecting the neurons are initialized with He normal initialization [49], which draws them randomly from a zero-mean normal distribution with variance $2/n_{j-1}$, where $n_{j-1}$ is the number of neurons in the preceding layer; the model then updates the weights until the objective function meets the convergence criteria.
The ReLU function, $g(z) = \max(0, z)$, is used as the activation function of the hidden layers: the weighted-sum output $\mathbf{z}^{[j]}$ of each layer is passed through the non-linear activation $g$ to form the updated output $\mathbf{a}^{[j]} = g(\mathbf{z}^{[j]})$. Collecting the vectors $\mathbf{a}$ and $\mathbf{z}$ into the matrices $\mathbf{A}$ and $\mathbf{Z}$, this can be written as $\mathbf{Z}^{[j]} = \mathbf{W}^{[j]} \mathbf{A}^{[j-1]} + \mathbf{b}^{[j]}$ and $\mathbf{A}^{[j]} = g(\mathbf{Z}^{[j]})$. To monitor the progress of learning, the model uses the categorical cross-entropy (16) as the loss function. As an example, the loss curve for grid-tied radial operation is shown in Figure 9, where the learning rate was selected by trial and error to obtain the fastest training while minimizing the loss. The losses for all four modes of operation are calculated in the same manner.
$$\mathrm{CL} = -\sum_{i} \sum_{c=1}^{C} t_{i,c} \log p_{i,c}, \quad (16)$$
Here, $t_{i,c} = 1$ if and only if instance $i$ belongs to class $c$, and $p_{i,c}$ is the output probability that instance $i$ belongs to class $c$. The arbitrary real-valued outputs $\mathbf{y} \in \mathbb{R}^{C}$ of a single instance are converted into normalized probability estimates $\mathbf{p} \in \mathbb{R}^{C}$ with the softmax function,
$$p_c = \frac{e^{y_c}}{\sum_{k=1}^{C} e^{y_k}}, \qquad c \in \{1, \dots, C\},$$
where $y_1, \dots, y_C$ are the output values and $p_1, \dots, p_C$ the class probabilities of that instance. A program flowchart of the DBN-based fault detection and classification scheme showing the entire process is given in Figure 10.
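The supervised forward pass and loss can be sketched in NumPy. Assumptions: the column-per-instance layout ($\mathbf{Z} = \mathbf{W}\mathbf{A} + \mathbf{b}$ with $\mathbf{b}$ broadcast across columns), the layer sizes (six energy features in, two ReLU hidden layers, 11 softmax outputs), and the function names are illustrative choices, not the exact configuration used in this work. The loss is averaged over instances, which only rescales (16), and the softmax subtracts the column maximum for numerical stability, which leaves the probabilities unchanged.

```python
import numpy as np


def he_normal(fan_in, fan_out, rng):
    """He normal initialization: zero-mean normal with variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))


def relu(z):
    return np.maximum(0.0, z)


def softmax(Y):
    """Column-wise softmax; Y has shape (C, N), one column per instance."""
    Y = Y - Y.max(axis=0, keepdims=True)   # stability shift, cancels in the ratio
    E = np.exp(Y)
    return E / E.sum(axis=0, keepdims=True)


def categorical_cross_entropy(T, P, eps=1e-12):
    """Mean over instances of -sum_c t_ic * log p_ic; T is one-hot with shape (C, N)."""
    return float(-np.sum(T * np.log(P + eps)) / T.shape[1])


def forward(X, layers):
    """X: (n_features, N). ReLU on hidden layers, softmax on the output layer."""
    A = X
    for W, b in layers[:-1]:
        A = relu(W @ A + b)                # Z = W A + b, A = g(Z)
    W, b = layers[-1]
    return softmax(W @ A + b)


rng = np.random.default_rng(0)
sizes = [6, 32, 16, 11]                    # hypothetical layer sizes
layers = [(he_normal(n_in, n_out, rng), np.zeros((n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
X = rng.random((6, 5))                     # five hypothetical feature vectors
P = forward(X, layers)
T = np.eye(11)[:, [0, 3, 10, 4, 7]]        # hypothetical one-hot targets
print(P.shape, categorical_cross_entropy(T, P))
```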

Results and Discussion
This section illustrates the performance of the proposed fault detection and classification technique under different parameter variations. The fault current and voltage signals exhibited different magnitudes in islanded and grid-connected modes, which made it difficult to design a unified fault classification scheme. Therefore, the performance of the proposed model was analysed separately for each system topology (radial or loop) and operating mode (grid-connected or islanded). The accuracy was evaluated in three respects: (i) input signal type, i.e., how the system performed with only the current waveform, only the voltage waveform, or both combined; (ii) sampling resolution, i.e., accuracy at different data acquisition rates; and (iii) noise, i.e., performance when noise is present in the sampled signal. Additionally, the accuracy of existing and proposed FDC techniques was compared to show the superior short-circuit fault classification performance of the proposed classifier.

Performance Assessment of the DBN Based FDC Scheme
In machine learning, the validity of a learning model is measured on a set of test samples that differs from the training data [50]. A total of 1716 samples was generated from the current and voltage waveforms for each dataset; the samples were first mixed and shuffled, and 30% of them were then randomly selected to test the effectiveness of the proposed model. The performance of the proposed DBN for Lines 1-3 was simulated for different system configurations and operating modes of the MG and is illustrated in Figure 11. The confusion matrices (CMs) show that most of the fault classes were classified correctly for all system configurations. The first accuracy criterion derived from the confusion matrix is the average classification accuracy (AA) [51], given in (18).
$$\mathrm{AA} = \frac{N_{CC}}{N_{TD}} \times 100\%, \quad (18)$$
Here, $N_{TD}$ is the total number of input data for the developed model, and $N_{CC}$ is the number of correctly classified data. The proposed network showed similar performance for the remaining distribution lines. The average accuracy for all distribution lines is given in Table 2. The highest accuracy of the proposed classifier, 99.70%, was recorded for grid-connected radial operation, and for the other system configurations the classifier exceeded 99.5%, in line with expectations. However, the average accuracy does not describe the model performance in detail. To investigate how the classifier behaved for individual fault classes, the classification performance was further assessed with the F1-score, a function of precision and recall (sensitivity) that equals 1 for a perfect classifier and 0 for the worst. With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives of a class, the precision, also known as the positive predictive value, is
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \quad (19)$$
For a good classifier, the precision should be close to one; from (19), the precision decreases as FP increases. Another metric, recall, also known as the true positive rate or sensitivity, is
$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
Like precision, recall should be close to one for a good classifier, and it decreases as FN increases. The F1-score takes both into account as their harmonic mean,
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
The high F1-scores of the proposed classifier for both voltage and current signals, listed in Tables 3 and 4, show that the classifier had few problems with false positives and false negatives.
Furthermore, the classification accuracy (user accuracy) of each fault class shows that the classifier can classify the faults with high accuracy.
Table 3. Precision, recall, F1-score, and individual class accuracy of the proposed classifier for grid-connected radial mode and grid-connected loop mode operation.
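The evaluation protocol and the metrics above can be sketched as follows. The shuffled 70/30 hold-out split uses the 1716 samples stated above; rounding the test size up with `math.ceil` is an assumption. The confusion-matrix convention (rows are true classes, columns are predicted classes) and the function names are illustrative choices, and classes with no predictions or no samples are given a metric of 0 here by convention.

```python
import math

import numpy as np


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices, then hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = math.ceil(test_fraction * n_samples)
    return idx[n_test:], idx[:n_test]              # (train indices, test indices)


def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                              # row: true class, column: predicted class
    return cm


def average_accuracy(cm):
    """AA = N_CC / N_TD x 100 %."""
    return 100.0 * np.trace(cm) / cm.sum()


def per_class_metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                       # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp                       # belonging to the class, but missed
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return precision, recall, f1


train_idx, test_idx = holdout_split(1716)
print(len(train_idx), len(test_idx))              # prints: 1201 515

cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(average_accuracy(cm), per_class_metrics(cm))
```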

Effect of Sampling Resolution and Signal Type
In the proposed FDC method, the three-phase current and voltage waveforms were collected at a 20 kHz sampling rate. In practice, the sampling frequency (SF) can be much lower than 20 kHz because of the limitations of the data acquisition equipment. In some field scenarios, the FDC system must classify faults using only the current or only the voltage waveforms, because both signals are not available at the same time. Thus, the fault classification performance of the proposed classifier was examined for different input signal types and sampling rates. The SFs used in this research were 2, 5, 10, 15, and 20 kHz, and the input signal types were the voltage waveform (Scheme-1), the current waveform (Scheme-2), and the combined current and voltage waveforms (Scheme-3). For each SF and signal type, the classification process was performed five times, and the mean accuracy was taken as the final result, as shown in Figure 12. Accuracy increased with SF, as expected, because a higher SF captures more detailed information about each short-circuit fault class. Moreover, Scheme-3 offered the highest classification performance at all considered SFs. At lower sampling rates, the three-phase current waveform gave better classification performance than the three-phase voltage waveform, whereas at higher SFs the FDC system performed better with the voltage signals. In the SF range between 5 kHz and 10 kHz, Scheme-2 and Scheme-3 showed almost the same classification accuracy. This behaviour was also expected, as the voltage waveform carries less low-frequency fault information than the current waveform for a given fault class. On the other hand, the voltage waveform contains sparse fault transients, which are suitable for identifying the type of short-circuit fault at higher sampling rates.
The above analysis shows that the expected accuracy cannot be achieved with only the current or only the voltage waveform. When both waveforms are considered together, higher fault classification performance is achieved within the considered frequency range, because the fault information in both the three-phase currents and voltages is used. Whereas the classification accuracy using only the current or voltage waveform was not satisfactory, their fusion offered more than 99% accuracy over most of the considered frequency range, which validates the effectiveness of the proposed FDC model. Similar classification results were observed for the remaining distribution lines of the studied MG system.
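The effect of the sampling frequency on the analysis window and on the wavelet bands can be made concrete with a little arithmetic. The sketch assumes the half-cycle window and the dyadic decomposition ($a_0 = 2$) described earlier and uses the idealized pass band $[f_s/2^{m+1}, f_s/2^{m}]$ of Det level $m$; real Daubechies filters have non-ideal, overlapping transitions, so these bands are only nominal. The function names are hypothetical.

```python
F0_HZ = 50                                             # system frequency
SAMPLING_FREQS_HZ = [2_000, 5_000, 10_000, 15_000, 20_000]


def half_cycle_samples(fs_hz, f0_hz=F0_HZ):
    """Number of samples in half a power-frequency cycle."""
    return fs_hz // (2 * f0_hz)


def nominal_detail_band(fs_hz, level):
    """Idealized frequency band (Hz) of dyadic Det level `level`."""
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level


for fs in SAMPLING_FREQS_HZ:
    low, high = nominal_detail_band(fs, 4)
    print(f"{fs / 1000:>4.0f} kHz: {half_cycle_samples(fs):>3d} samples per half cycle, "
          f"Det level 4 ~ {low:.1f}-{high:.1f} Hz")
```

With a fixed decomposition depth, the nominal Det level-4 band moves down in frequency as the SF decreases (roughly 625-1250 Hz at 20 kHz versus 62.5-125 Hz at 2 kHz), and the half-cycle window shrinks from 200 to 20 samples, so the features at different SFs describe different parts of the spectrum.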

Effect of Noise Present in the Measured Signal Data on the Classification Accuracy
In practice, current and voltage waveforms are continuously subject to statistical noise and uncertainties, which can substantially degrade the overall performance of MG fault diagnosis. Thus, a dropout strategy was added to the hidden layers of the RBM to make the performance robust [52]. The fundamental idea of dropout is to randomly set hidden nodes to zero with a certain probability to prevent overfitting of the model. The dropped nodes do not take part in the current training iteration, but their weights are retained, and they are involved again in the next iteration. Thus, in each iteration, the dropout strategy removes a random subset of hidden nodes from the network. This effectively restrains interdependence among the features and enhances noise-immune classification performance. A comparison of the network structure with and without the dropout strategy is shown in Figure 13.
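The dropout step can be sketched as a random mask applied to the hidden activations. This assumes the common inverted-dropout convention, in which the kept activations are scaled by $1/(1-p)$ during training so that no rescaling is needed at test time; the function name and the drop probability are illustrative. A new mask is drawn on every call, matching the description that different random nodes are removed in each iteration while the weights themselves are left untouched.

```python
import numpy as np


def dropout(h, p_drop, rng, training=True):
    """Inverted dropout: zero each hidden activation with probability p_drop."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return h                                   # no dropout at inference time
    keep = rng.random(h.shape) >= p_drop           # True for nodes kept this iteration
    return h * keep / (1.0 - p_drop)               # rescale to preserve the expected value


# Usage: a batch of hypothetical hidden-layer activations, fresh mask each call.
rng = np.random.default_rng(0)
hidden = rng.random((4, 8))
print(dropout(hidden, p_drop=0.2, rng=rng))
```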
To examine the performance of the proposed model on noisy data, the system was run with a new sample dataset containing both signals (current and voltage). White Gaussian noise at different signal-to-noise ratios (SNRs), as per [53], was added to the 30% test portion of the main dataset, so that the proposed classifier was trained on the original data and tested on the contaminated data. The performance of the proposed classifier on the contaminated data is shown in Figure 14. The fault classification performance of the proposed FDC model with the dropout strategy was higher than that of the model without dropout, and the accuracy without dropout decreased faster as the SNR decreased for each mode of MG operation. The classification accuracy maintained under noise indicates the robust performance of the proposed classifier.
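The noise-contamination step can be sketched as adding white Gaussian noise scaled to a target SNR, with $\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}(P_{\text{signal}}/P_{\text{noise}})$. Assumptions: the signal power is measured per channel as the mean square of the samples, the SNR values in the usage line are hypothetical, and the function name `add_awgn` is introduced here; the exact procedure of [53] may differ.

```python
import numpy as np


def add_awgn(x, snr_db, rng):
    """Return x plus white Gaussian noise at the requested SNR (in dB)."""
    x = np.asarray(x, dtype=float)
    p_signal = np.mean(x ** 2)                       # mean-square signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # from SNR_dB = 10 log10(Ps / Pn)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)


# Usage: contaminate a 50 Hz test waveform at several hypothetical SNR levels.
rng = np.random.default_rng(0)
t = np.arange(2000) / 20e3                           # 0.1 s at 20 kHz
clean = np.sin(2 * np.pi * 50 * t)
noisy = {snr: add_awgn(clean, snr, rng) for snr in (40, 30, 20, 10)}
```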


Comparative Study
A comparison between the proposed model and several existing fault diagnosis models is discussed in this section. In [30], a discrete orthonormal S-transform based multi-kernel extreme learning machine (MKELM) was proposed and compared with KELM and SVM. These approaches use only one cycle of post-fault current to determine the fault type, which cannot ensure accurate results at a lower SF, as discussed in Section 3.2. However, a direct comparison between the proposed approach and the existing approaches is not fully consistent, because this research analysed the faults in the considered MG separately for different topologies (looped or radial), operating modes (islanded or grid-tied), and distribution lines. The result analysis in this study was therefore much more challenging owing to the diversity of the system parameters. Although the results are not directly comparable, the classification performance is compared with that of the methods in [30] in Table 5, where the proposed approach achieved higher accuracy than the other approaches. In addition, the classification accuracy at low SNR values discussed in Section 3.3 demonstrates the noise immunity of the proposed method. The DBN-based FDC scheme also offers advantages over conventional and modern FDC techniques, as summarized in Table 6.

[…] within the studied frequency band. The classification performance of the proposed model was further examined for robustness against unwanted noise, and the proposed model continued to detect faults when noise was added. A comparative analysis between the proposed model and several existing models in the literature was conducted, considering aspects such as the type of input signal used and the signal-to-noise ratio.
The comparative analysis showed that the proposed model provides more stable and trustworthy classification performance than the state of the art. Future work may use real-time system data collected by measuring equipment and deploy the model in a real-world power grid.