Intelligent Fault Detection System for Microgrids

The dynamic features of microgrid operation, such as on-grid/off-grid operation modes, the intermittency of distributed generators, and a topology that changes as the microgrid reconfigures itself, cause conventional protection schemes to operate incorrectly. To solve this issue, adaptive protection schemes that rely on robust communication systems have been proposed for microgrid protection; however, the cost of this solution is high. This paper presents an intelligent fault detection (FD) system for microgrids based on local measurements and machine learning (ML) techniques. By integrating ML models, the proposed FD system adds an intelligence layer to the intelligent electronic devices (IEDs) installed on the microgrid. This allows each IED to determine autonomously whether a fault has occurred on the microgrid, eliminating the need for a robust communication infrastructure between IEDs for microgrid protection. The proposed system follows a four-stage methodology that allows it to be implemented in any microgrid, and each stage provides recommendations for the proper use of ML techniques in the protection problem. The proposed FD system was validated on the modified IEEE 13-node test feeder, taking into account typical features of microgrids such as load imbalance, reconfiguration, and off-grid/on-grid operation modes. The results demonstrated the flexibility and simplicity of the FD system in identifying the ML model with the best accuracy among several candidates. The ease of implementation, the simple formulation of parameters, and the promising test results indicate potential for real-life applications.


Introduction
Distribution systems have undergone several changes in recent years. Among the most significant is the integration of distributed energy resources (DER), motivated by advances in power electronics and increased environmental awareness [1]. High integration of DER makes distribution network operation more complex and requires advanced control functionalities [2,3]. The high penetration of DER and control resources in distribution networks has given rise to a new concept known as the microgrid [4,5]. A microgrid is defined as a group of interconnected loads and DER with clearly defined electrical boundaries that acts as a single controllable entity with respect to the grid and can operate in either grid-connected or islanded mode [6]. The use of these systems brings operational, environmental, and economic benefits, such

The following considerations were made for the development of this research:
• Only low impedance faults were detected. High impedance faults were beyond the scope of this work.
• The device coordination process was not addressed.
• It was assumed that the microgrids had robust control functionalities to guarantee their stability after the fault is cleared.
The remainder of the paper is structured as follows: Section 2 presents the proposed fault detection system. Section 3 describes the case study. Section 4 contains the validation results and discussion. Finally, the conclusions of this work are presented in Section 5.

The Proposed Fault Detection System
The proposed fault detection system is based on ML techniques. It considers intelligent electronic devices (IEDs) installed at each end of a line section, with no communication link between them, as illustrated in Figure 1. Each IED records the three-phase current and voltage signals at the node where it is located. These signals are used as inputs to the ML model of each IED, which must discriminate between normal (non-faulted) and abnormal (faulted) operating conditions. Thus, FD can be formulated as a supervised classification task, with features (attributes) derived from the measured signals as inputs. The proposed FD system is divided into four stages, as illustrated in Figure 2. Each stage is explained in the following subsections.

Stage I: Database from Simulations
The performance of ML algorithms depends strongly on the quality of the training database. Because microgrids are designed to minimize the occurrence of faults, very few faults are recorded during actual operation. Therefore, building a training database exclusively from measurements of an actual microgrid is not practical. Instead, actual measurements should be complemented with simulations covering the normal and faulted conditions of the microgrid. It is therefore critical to determine the main factors involved in microgrid operation under both normal and fault conditions. Database construction is divided into two steps, described in Sections 2.1.1 and 2.1.2.

Step 1: Determining Factors and Levels for Normal and Faulted Microgrid Operating Conditions
To set up the simulations for data gathering properly, it is necessary to determine the main factors that affect microgrid operation and the ranges over which they vary. Table 2 lists operating factors and their levels as proposed in the literature [13,22,23,33]; other factors may also exist. The non-fault operation factors were chosen to cover as much of the range of normal operating scenarios of the microgrid as possible. For fault scenarios, these factors affect the fault magnitude (for low impedance faults) in a directly proportional manner [9].

Table 2. Factors and levels commonly used.

Group | Factor | Levels | Reference
No-fault operation | Load change | 30-150% | [21,22,25,34-36]
No-fault operation | Generation change | 50-150% | [28]
No-fault operation | Topology change | Reconfiguration, section cut-off, off-grid | [29]
No-fault operation | Generation cut-off | At least one DG at a time | [30]
No-fault operation | Capacitor switching | At least one at a time | [37]
No-fault operation | Microgrid operation mode | On-grid/off-grid | [21,22,25,34-36]
Fault | | |

Therefore, the normal operation scenarios and fault conditions for data gathering through simulation were defined by the factors and levels listed in Table 2. These scenarios are simulated automatically with electromagnetic transient (EMT) simulation software, as described in step 2.
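A full-factorial plan is one way to turn Table 2 into simulation runs. The sketch below is only an illustration: Table 2 gives ranges, so the discrete levels, the DG and capacitor identifiers, and the function name `scenario_grid` are hypothetical choices, not the levels used in this work.

```python
from itertools import product

# Hypothetical discretization of the Table 2 no-fault factors.
FACTORS = {
    "load_pct": [30, 90, 150],            # load change, 30-150%
    "generation_pct": [50, 100, 150],     # generation change, 50-150%
    "topology": ["base", "reconfiguration", "section_cut_off"],
    "dg_out": [None, "PV", "wind", "SG"], # hypothetical: one DG disconnected at a time
    "capacitor_out": [None, "C1"],        # hypothetical capacitor bank identifier
    "mode": ["on-grid", "off-grid"],
}


def scenario_grid(factors):
    """Yield one dict per combination of factor levels (full factorial)."""
    names = list(factors)
    for levels in product(*(factors[name] for name in names)):
        yield dict(zip(names, levels))


scenarios = list(scenario_grid(FACTORS))
print(len(scenarios))  # 3 * 3 * 3 * 4 * 2 * 2 = 432
```

Each dictionary would then parameterize one EMT run; a full factorial grows multiplicatively with the number of levels, which is why the levels per factor are kept small here.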

Step 2: Database Generation from Simulations
When only synthetic databases are used, aspects such as the accuracy of the system model, the effect of the instrumentation on the quality of the electrical signals, and the balance and randomization of the database should be considered to guarantee satisfactory performance of the ML models when they are implemented on the real network. The simulation data were generated in three stages: baseline operation condition, generation of non-faulted and faulted events, and database labeling, as shown in Figure 3. For this step, the factors and levels of variation defined in step 1, together with a model of the microgrid built in EMT simulation software, were used to set the simulation parameters.
Energies 2020, 13, x FOR PEER REVIEW 5 of 21

• Stage 1: Baseline operation condition
In this stage, the initial operating condition of the microgrid is set. Different baseline conditions were used to add variety to the training data. These conditions are obtained by varying the load: low-medium load (30-70%), medium-nominal load (70-100%), and nominal-high load (100-150%). Because the load varies, the power injected by each DER must be estimated; for this purpose, an optimal power flow that minimizes losses is solved to determine the active and reactive power contributions of each DER. The location of each IED is also defined in this stage, and the three-phase voltage and current measurements are stored at each node of the microgrid. Finally, the simulation time and sampling rate are defined.

• Stage 2: Generation of non-faulted and faulted events
The EMT simulation is divided into three intervals. The first interval corresponds to the baseline operation condition, which is the initial condition of the microgrid defined in stage 1. The second interval is generated by a change in the normal operating condition of the microgrid, corresponding to an increase or reduction in demand.
This change is created by a random load variation event in the EMT simulation of no more than 50% of the baseline operation condition. The third interval corresponds to the fault condition. Faults are applied at 0% and 50% of the length of each line section of the microgrid, and the fault resistance values are set by the levels in Table 2. For this research, a simulation time of 150 ms was considered, with the random load variation event and the fault event occurring at 50 ms and 100 ms, respectively. These intervals were selected to be long enough to avoid transient effects but short enough to allow a fast response of the protection devices. Figure 4 illustrates the current signal recorded by an IED under these conditions, with the sampling frequency set to 10 kHz.
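The event timing above can be expressed in samples. This minimal sketch assumes a 60 Hz system (the cycle frequency used for the attributes in Stage II) and the 10 kHz sampling rate; each 50 ms interval then spans 500 samples, or three 60 Hz cycles. The constant and function names are illustrative.

```python
FS_HZ = 10_000        # sampling frequency (10 kHz)
F0_HZ = 60            # assumed system frequency
SIM_MS = 150          # total simulation time
LOAD_EVENT_MS = 50    # random load variation event
FAULT_EVENT_MS = 100  # fault inception


def ms_to_sample(ms, fs_hz=FS_HZ):
    """Convert a time in milliseconds to a sample index using integer arithmetic."""
    return ms * fs_hz // 1000


def event_intervals():
    """Return the (start, stop) sample range of each of the three intervals."""
    n_samples = ms_to_sample(SIM_MS)
    t_load = ms_to_sample(LOAD_EVENT_MS)
    t_fault = ms_to_sample(FAULT_EVENT_MS)
    return {
        "baseline": (0, t_load),
        "load_change": (t_load, t_fault),
        "fault": (t_fault, n_samples),
    }


samples_per_cycle = FS_HZ / F0_HZ
print(event_intervals())
print(round(samples_per_cycle, 1))  # 166.7
```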

• Stage 3: Database labeling
During the previous stage, the V and I signals are obtained at the installation point of each IED for every simulated operating scenario. In this stage, the resulting V and I signals are labeled as follows:
Non-faulted: this class includes all normal operating scenarios of the microgrid. This scenario is labeled as class 1 in this work.
Fault condition without relay activation: this class includes all fault events that can be detected by the IED but do not occur in its protection zone. This scenario is labeled as class 2 in this work.
Fault condition with relay activation: this class includes all fault events that occur in the protection zone of the IED.
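The labeling rule can be sketched per IED as follows. The numeric code 3 for faults inside the protection zone, the section identifiers, and the representation of a zone as a set of line sections are assumptions for illustration; the sketch also assumes that every simulated fault is detectable by the IED.

```python
from dataclasses import dataclass
from typing import Optional

NO_FAULT = 1           # class 1: non-faulted
FAULT_OUT_OF_ZONE = 2  # class 2: fault seen by the IED outside its protection zone
FAULT_IN_ZONE = 3      # assumed code for faults inside the protection zone


@dataclass
class Scenario:
    fault_section: Optional[str]  # None for a non-faulted scenario


def label(scenario, ied_zone):
    """Label one simulated scenario from the point of view of one IED.

    ied_zone is a hypothetical set of line-section identifiers protected by the IED.
    """
    if scenario.fault_section is None:
        return NO_FAULT
    if scenario.fault_section in ied_zone:
        return FAULT_IN_ZONE
    return FAULT_OUT_OF_ZONE


zone = {"632-671"}  # hypothetical protection zone
print(label(Scenario("671-680"), zone))
```

Because labels depend on the zone, the same simulated scenario can carry different labels at different IEDs, which is why one model is trained per IED.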

Stage II: Input Data Adjustment
The definition and selection of attributes is a critical process in the application of ML techniques. Features should be selected to maximize the amount of information they capture from the database [13]. Additionally, using fewer attributes reduces the dimensionality of the problem space, which can improve the performance of ML techniques on a given dataset [39]. Several attributes for FD approaches have been proposed in [13,21-23,26,40]. In this work, the 49 attributes listed in Table 3 were used. These attributes were computed for each signal cycle; at the 10 kHz sampling rate, a 60 Hz cycle is described by approximately 167 samples.
Table 3. Definition and estimation of attributes.

Attribute | Estimation | Number of Features | References
Root mean square (RMS) of the voltage and current signals | Mean of the fundamental frequency contours | | [25,30,35,36]
 | Standard deviation of the fundamental frequency contours | |
 | Entropy of the fundamental frequency contours | |
 | Obliquity (skewness) of the fundamental frequency contours | |

Two or more of these attributes may be highly correlated. Therefore, a selection method must be used to determine the most representative attributes.
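Because only part of Table 3 survives, the exact definitions of the 49 attributes are not available here. The sketch below shows one plausible reading under stated assumptions: the fundamental frequency contour is taken to be the sequence of per-cycle RMS values of a signal, entropy is the Shannon entropy of a 10-bin histogram of that contour, and obliquity is the population skewness. Cycle boundaries are rounded to whole samples because 10 kHz / 60 Hz is not an integer.

```python
import math
import statistics


def per_cycle_rms(signal, fs_hz=10_000, f0_hz=60):
    """RMS of each complete cycle; cycle boundaries are rounded to whole samples."""
    spc = fs_hz / f0_hz  # about 166.7 samples per cycle
    n_cycles = int(len(signal) // spc)
    rms = []
    for k in range(n_cycles):
        start, stop = round(k * spc), round((k + 1) * spc)
        window = signal[start:stop]
        rms.append(math.sqrt(sum(x * x for x in window) / len(window)))
    return rms


def shannon_entropy(values, bins=10):
    """Entropy in bits of a histogram of the values (assumed definition)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def skewness(values):
    """Population skewness (third standardized moment)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)


def contour_features(signal):
    """Four statistics of the per-cycle RMS contour of one signal."""
    contour = per_cycle_rms(signal)
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "entropy": shannon_entropy(contour),
        "skewness": skewness(contour),
    }
```

Applied to each phase voltage and current at an IED, four statistics per signal give a fixed-length feature vector per scenario; this covers only the rows that survive in Table 3.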

Stage III: Parametrization and Training of ML Techniques
This stage is composed of three steps: the selection of the ML technique, selection of representative attributes, and parametrization and training. The processing steps are explained in Sections 2.3.1-2.3.3.

Step 4: Selection of the ML Technique
Three ML algorithms are considered, and the one with the best overall performance is chosen after each algorithm has been tuned by heuristic adjustment of its hyper-parameters. In this approach, the value of each hyper-parameter is varied within a range until the performance of the corresponding technique reaches its peak. This process was carried out using the 49 attributes from Stage II. The ML techniques considered in this work were random forest (RF), support vector machine with a radial basis function kernel (SVM), and K-nearest neighbors (K-NN), which are commonly used in the literature to solve fault detection problems. Table 4 shows the hyper-parameters of each ML technique and the intervals considered for tuning [42].
Table 4. Machine learning (ML) techniques and hyper-parameters.

ML Technique | Hyper-Parameter | Interval
Random forest classifier (RF) | Threshold selection | Gini, entropy

Step 5: Selection of Representative Attributes
In this step, the number of features that capture the largest amount of database information is selected, reducing the dimensionality of the problem space and the computational requirements. For this research, two feature selection techniques were used [43]:

• Principal component analysis (PCA)
PCA is a complexity reduction technique used to analyze the intrinsic variability of a database [44]. It applies a transformation that produces new attributes, called components, from the original features, maximizing variance and minimizing the linear correlation between them. To limit the loss of information in the transformation, PCA uses the covariance matrix to transform a database X of size n × m into a database Y of size n × l such that l ≤ m. This decomposition is given by Equation (1):

$$X = \sum_{a=1}^{l} t_a\, p_a^{T} + E \qquad (1)$$
where X is the data set to be analyzed by principal components, l is the number of components, $p_a$ is an orthonormal vector that contains the relationship between the features, $t_a$ is the projection of X onto $p_a$, and E is the residual (error) of the model. Given Equation (1), PCA therefore rests on the eigendecomposition of the covariance matrix $\mathrm{cov}(X) = X^{T}X/(n-1)$, with X mean-centered.
New features are retained if each contributes more than 1% of the variance of the database and, together, they account for at least 98% of the total variance.

• Singular value decomposition (SVD)
Algebraically, any matrix X can be decomposed into a linear combination of rank-1 matrices, as described by Equation (2) [45]:

$$X = u\,\sigma\,v^{T} = \sum_{i} \sigma_i\, u_i\, v_i^{T} \qquad (2)$$
where u and v are orthonormal matrices with columns $u_i$ and $v_i$, and σ is a diagonal matrix of non-negative values $\sigma_i$, called singular values. The singular values can be obtained from the eigenvalues $\lambda_i$ of the square matrix $A = XX^{T}$, since $\sigma_i = \sqrt{\lambda_i}$. As with PCA, SVD retains the relevant information for each dimension of a database, which makes it possible to determine the number of attributes that describe the original database efficiently. As a disadvantage, both principal component analysis (PCA) and SVD produce new, abstract feature spaces generated by combining the features of the original space.
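The 1%/98% retention rule stated for PCA can be sketched from the eigenvalues of the covariance matrix. Because σ_i = √λ_i, squared singular values of a mean-centered X give the same variance shares, so the same function applies to SVD output. The function names and the stopping logic (components taken in decreasing order of variance) are an interpretation of the rule, not code from this work.

```python
import math


def explained_shares(eigenvalues):
    """Fraction of the total variance carried by each eigenvalue, largest first."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    return [v / total for v in vals]


def select_components(eigenvalues, min_share=0.01, target=0.98):
    """Number of components kept under the 1% / 98% rule.

    Components are taken in decreasing order of variance; a component is kept only
    if it explains more than min_share of the variance, and selection stops once the
    kept components explain at least target of the total.
    """
    kept, cumulative = 0, 0.0
    for share in explained_shares(eigenvalues):
        if share <= min_share or cumulative >= target:
            break
        kept += 1
        cumulative += share
    return kept


def singular_values_from_eigenvalues(eigs_of_xxt):
    """sigma_i = sqrt(lambda_i), with lambda_i the eigenvalues of X X^T."""
    return [math.sqrt(lam) for lam in sorted(eigs_of_xxt, reverse=True)]


print(select_components([50, 30, 15, 4, 0.6, 0.4]))  # 4
```

In the example, four components explain 99% of the variance; the fifth explains only 0.6%, so it would be dropped even if the 98% target had not yet been reached.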

Step 6: Parametrization and Training of ML Techniques
The parametrization and training processes are performed simultaneously in this step. Once the ML technique has been selected and its hyper-parameters estimated (step 4), the parametrization process determines the combination of features that maximizes the performance of the ML technique, computed using cross-validation. Cross-validation estimates ML performance by dividing the training database into N subsets (folds) and training N models, each tested on one held-out fold after being trained on the remaining N − 1 folds [46]. The overall performance is computed as the average accuracy of the N resulting models. With 49 candidate attributes, the number of feature combinations is $2^{49}$ (about $5.6 \times 10^{14}$), so finding the combination that maximizes the performance of the ML technique by exhaustive search is impractical. For this reason, a Chu-Beasley genetic algorithm (CBGA) was implemented. Figure 5 presents the stages of the CBGA, which are explained in detail below [47].
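A minimal sketch of N-fold cross-validation with the standard library is shown below. The callback `train_and_score`, the default of five folds, and the seed are hypothetical, since the number of folds used in this work is not stated here. The last line shows why exhaustive search over 49 binary feature choices is impractical.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Split sample indices into n_folds disjoint folds after a seeded shuffle."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_val_accuracy(train_and_score, n_samples, n_folds=5, seed=0):
    """Average accuracy of n_folds models, each tested on one held-out fold.

    train_and_score(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, n_folds, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# With 49 binary include/exclude decisions, the search space is 2**49 subsets.
print(2 ** 49)  # 562949953421312
```

In the CBGA, a function like `cross_val_accuracy` would serve as the fitness of a chromosome, with the callback restricted to the features that the chromosome switches on.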
Energies 2020, 13, 1223 9 of 21

The CBGA is a computer simulation in which a population of abstract representations (genotypes) of candidate solutions (individuals, or chromosomes) to an optimization problem is modified through an evolutionary mechanism that produces a trend towards better solutions [48]. For the fault detection problem, an initial population of 20 individuals is used, where each individual consists of 49 genes, one for each electrical attribute. Each gene takes the value zero (0) or one (1): zero indicates that the electrical feature obtained in step 3 is not considered, and one indicates that it is included in the combination of attributes defined by the individual.
Additionally, an infeasibility function is used to limit the number of features in each chromosome; the maximum number of features is determined by the techniques presented in step 5. Each chromosome is evaluated through a fitness function, computed as the performance of the ML model trained on the features defined by the chromosome.
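The chromosome encoding can be sketched as a list of 49 binary genes. Here the infeasibility of a chromosome is assumed to be the number of selected features above the limit from step 5 (zero when the chromosome is feasible); the text does not give the exact formula. The fitness evaluation, which requires training an ML model, is omitted.

```python
import random

N_GENES = 49   # one gene per electrical attribute
POP_SIZE = 20  # initial population size used in this work


def infeasibility(chromosome, max_features):
    """Selected features in excess of the allowed maximum (0 means feasible)."""
    return max(0, sum(chromosome) - max_features)


def initial_population(rng, pop_size=POP_SIZE, n_genes=N_GENES):
    """Random 0/1 chromosomes; a 1 means the attribute is used by the ML model."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


def selected_features(chromosome):
    """Indices of the attributes switched on in a chromosome."""
    return [i for i, gene in enumerate(chromosome) if gene == 1]


population = initial_population(random.Random(42))
print(len(population), len(population[0]))  # 20 49
```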

• Stage 2: Making of the next generation
In this process, a new generation of individuals is obtained by crossing the individuals of the current population, using three evolution mechanisms, as shown in Figure 5. First, parents are selected by tournament: two groups of five individuals are drawn at random from the current population, and the individual with the highest performance in each group is selected. Second, the two parents are recombined by crossover: two random numbers between 0 and the length of the individual are drawn, and the part of the chromosome between these two positions is exchanged (two-point crossover). This produces two children, only one of which is randomly chosen for the next generation [49]. The third evolution mechanism is mutation: a random number between 0 and 1 is drawn, and if it is greater than 0.85, a randomly chosen gene of the chromosome is changed, so that a gene equal to 1 becomes 0.
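The three evolution mechanisms can be sketched with the standard `random` module. The text only states that a gene equal to 1 becomes 0; the sketch assumes a symmetric bit flip (0 also becomes 1). Function names are illustrative, and `fitness` is assumed to be a list of precomputed scores aligned with `population`.

```python
import random


def tournament(population, fitness, rng, group_size=5):
    """Draw group_size distinct individuals at random and return the fittest."""
    group = rng.sample(range(len(population)), group_size)
    best = max(group, key=lambda i: fitness[i])
    return population[best]


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points; keep one child at random."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return rng.choice([child_a, child_b])


def mutate(chromosome, rng, threshold=0.85):
    """If a uniform draw exceeds threshold, flip one random gene (assumed bit flip)."""
    child = list(chromosome)
    if rng.random() > threshold:
        k = rng.randrange(len(child))
        child[k] = 1 - child[k]
    return child


def next_offspring(population, fitness, rng):
    """One offspring: two tournaments, two-point crossover, then mutation."""
    parent_a = tournament(population, fitness, rng)
    parent_b = tournament(population, fitness, rng)
    return mutate(two_point_crossover(parent_a, parent_b, rng), rng)
```

Each call to `next_offspring` yields a single candidate, which then goes through the decision criteria of Stage 3 before it can enter the population.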

• Stage 3: Decision criterion
Given the encoding used in this approach, a new offspring can replace the individual with the worst objective function if, and only if, the offspring has a better objective function and meets the diversity criterion. This is achieved through three decision criteria. The first is the feasibility criterion, which checks whether the number of features defined by the current chromosome is less than the number of representative features selected in step 5. The second is the aspiration criterion: if the performance of the current offspring is lower than the fitness of any chromosome in the current population, a number of mutations are applied to improve its fitness. The third is the selection criterion, which determines whether the new offspring replaces an individual in the current population. Replacement is carried out if, and only if, there is an infeasible chromosome in the current population, or if the offspring performs better than the worst individual in the population and is less infeasible than the individual being replaced. If the offspring performs better but is more infeasible, it replaces the most infeasible individual.
Stages 2 and 3 are repeated until a defined number of iterations is reached or the rate of change in accuracy between parents is less than 1%. As a result of step 6, the ML model with the best training performance is obtained for each IED; in the worst case, the selected feature combination contains 16 features, which reduces the complexity of the ML model.

Stage IV: Validation of ML Techniques
Step 7: Performance of the ML Techniques
To validate the ML models obtained in Stage III (Section 2.3), their performance was evaluated on a test database containing 15% of the data, which was not used in the training process.
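The 15% hold-out can be sketched as a seeded shuffle followed by a split. The function name and seed are hypothetical; `random.Random.shuffle` is the standard-library shuffle, driven by uniformly distributed pseudorandom numbers.

```python
import random


def train_test_split(records, test_share=0.15, seed=0):
    """Shuffle with a seeded pseudorandom generator and hold out test_share of the records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_share * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


train, test = train_test_split(range(100))
print(len(train), len(test))  # 85 15
```

Fixing the seed makes the split reproducible, so every candidate model in the CBGA is compared on the same held-out cases.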

Case Study
The proposed method was validated on the modified IEEE 13-node test feeder [50]. This feeder operates at 24.9 kV and is short and relatively highly loaded, with overhead and underground lines, shunt capacitors, an in-line transformer, and unbalanced loading. The system was modeled in the DIgSILENT PowerFactory simulation software and modified by adding a 1.5 MW photovoltaic (PV) system, a 2 MW conventional synchronous generator, and a 1 MW wind generator. Figure 6 presents the IEEE 13-node test feeder.

Validation and Discussion
The validation of the proposed FD system was carried out by implementing it on the case study and performing a sensitivity analysis. Sections 4.1 and 4.2 present the results and discussion for the application of each stage of the FD system and for the sensitivity analysis.

Stage I: Database from Simulations
The database from simulations was obtained by simulating the possible operating scenarios of the microgrid. These scenarios were derived from its operation modes (on-grid/off-grid) and three reference load conditions, defined as low-load, half-load, and nominal-load conditions, as presented in Table 5. Tables A1-A3 of Appendix A show the load values used to generate the database. Several factors were used to generate the normal and faulted operating conditions; Table 6 presents the factors used in this stage. For each group (normal and faulted conditions), 23,606 scenarios were obtained, because ML theory recommends that each class have the same number of scenarios so as not to bias the performance of the technique [51]. For each scenario, voltage and current signals were obtained at the IED installation points, whose locations are illustrated in Figure 6. The labeling process described in Section 2.1.2 was also carried out. As a result of this stage, a database from simulations composed of the voltage and current signals at the IED installation points was obtained.

Stage II: Input Data Adjustment
In this stage, the 49 attributes defined in Section 2.2 were estimated for each scenario obtained in Stage I. The database was then randomized using a Python function that returns uniformly distributed pseudorandom numbers [52]. The number of cases selected for the validation process was 7080.

Stage III: Parametrization and Training of ML Techniques
The parameterization process was carried out in the three steps described in Section 2.3. The following sections present their application to the case study.

Selection of the ML technique
Energies 2020, 13, 1223

Three classic ML techniques were used in the proposed methodology for the case study: random forest (RF), support vector machine (SVM), and K-nearest neighbors (K-NN). The ML technique was selected by heuristically adjusting the hyper-parameters of each technique to improve its performance, as presented in Section 2.3.1. The performance of the techniques was measured by the accuracy, as defined by Equation (3):

$$\mathrm{Accuracy} = \frac{TF + TFW + TNF}{TF + TFW + TNF + FF + FFW + FNF} \tag{3}$$
where TF is the number of operating conditions correctly identified as fault; TFW is the number of operating conditions correctly identified as fault with activation; TNF is the number of operating conditions correctly identified as no-fault; FF is the number of operating conditions wrongly predicted as fault; FFW is the number of operating conditions wrongly predicted as fault with activation; and FNF is the number of operating conditions wrongly predicted as no-fault. For this process, all 49 attributes of each database scenario were considered. The following subsections present the performance of each ML technique as a function of its hyper-parameters. These processes were executed for each IED using the validation dataset.
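As a small worked check of Equation (3), the sketch below computes the accuracy as the share of correctly classified operating conditions over all six counts. The function name and the example counts are illustrative, and the sketch assumes Equation (3) has the correct-over-total form written above.

```python
def accuracy(tf: int, tfw: int, tnf: int, ff: int, ffw: int, fnf: int) -> float:
    """Share of correctly classified operating conditions (assumed form of Equation (3))."""
    correct = tf + tfw + tnf  # true fault, true fault with activation, true no-fault
    total = correct + ff + ffw + fnf  # plus all wrong predictions
    if total == 0:
        raise ValueError("no operating conditions were evaluated")
    return correct / total


# Example: 95 correct out of 100 evaluated conditions gives an accuracy of 0.95.
print(accuracy(40, 5, 50, 2, 1, 2))
```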

Support vector machine (SVM)
This SVM used a radial basis function (RBF) kernel, which has two hyper-parameters, C and γ. To assess the effect of each hyper-parameter individually, one of them was set to 1 while the other was varied over the interval given in Table 4. Figure 7 shows the accuracy of the SVM technique when the hyper-parameters were varied independently. From Figure 7a,b, it can be observed that the performance of the ML technique improved for all IEDs as the hyper-parameters increased. However, there was a zone in which further increases of the hyper-parameter did not improve performance. The hyper-parameters should therefore be set near this zone to avoid overfitting.
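The one-factor-at-a-time sweep described above can be sketched as follows. It assumes a hypothetical `evaluate(C, gamma)` callable that trains an RBF-kernel SVM on the training data and returns its validation accuracy (for instance, by wrapping scikit-learn's `SVC(kernel="rbf", C=C, gamma=gamma)`); the helper names are illustrative. The `rbf_kernel` function is included only to show the role of γ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - z||^2); a larger gamma makes the kernel more local."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def one_at_a_time_sweep(
    evaluate: Callable[[float, float], float],
    grid: Sequence[float],
    fixed: float = 1.0,
) -> Dict[str, List[Tuple[float, float]]]:
    """Vary C with gamma held at `fixed`, then gamma with C held at `fixed`.

    `evaluate(C, gamma)` is a hypothetical callable returning validation accuracy.
    """
    return {
        "C": [(c, evaluate(c, fixed)) for c in grid],
        "gamma": [(g, evaluate(fixed, g)) for g in grid],
    }
```

Each returned list is one accuracy curve of the kind plotted in Figure 7a,b; in the study this would be repeated for every IED.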

Random forest (RF)
For the random forest, the Gini criterion was selected to minimize the probability of misclassification, so the hyper-parameter to be adjusted was the number of trees. Figure 8 shows the accuracy of the RF technique as the number of trees was modified. Similar to the previous case, increasing the number of trees produced a slight increase in accuracy. However, as the number of trees increased further, this improvement became negligible. This occurred for more than eight trees, beyond which the RF accuracy for the relays exceeded 90%. The best performance was achieved for relays 1 and 10. The good performance of relay 1 was probably related to the fact that it acts for all faults in grid-connected mode, whereas in off-grid mode it should not detect any fault. Relay 10, on the other hand, only has to discriminate the faults that occur on its own line segment.
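To make the Gini criterion concrete, the sketch below computes the Gini impurity of a node, which equals the probability of misclassifying a randomly drawn sample that is labeled at random according to the node's class distribution, and the weighted impurity of a candidate split, which each tree minimizes when choosing splits. The function names and labels are illustrative; in practice a library implementation would be used, for example scikit-learn's `RandomForestClassifier(criterion="gini", n_estimators=...)`, where `n_estimators` is the number of trees.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the labels reaching a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted Gini impurity of a split; lower means purer child nodes."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
```

A node with equal numbers of fault and no-fault samples has impurity 0.5, while a split that separates the two classes perfectly has weighted impurity 0.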

K-nearest neighbors (K-NN)
The performance of the K-NN technique was also obtained as a function of its hyper-parameter, which in this case is the number of neighbors K. Figure 9 shows the accuracy of this technique as K was varied. For the K-NN technique, the performance decreased for all IEDs as K increased. Therefore, the hyper-parameter should be set to a small number of neighbors, although not so small that the model overfits the training data; for this technique, K was set to 3. Note that accuracies greater than 87% were achieved for all relays regardless of the value of K. The accuracies of relays 1 and 10 were similar to those shown in Figure 8, which supports the assumption that the location of these relays influences their performance.
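A minimal K-NN classifier with the K = 3 setting used here could look like the sketch below. It assumes each training sample is a (feature vector, label) pair and uses Euclidean distance with a simple majority vote; these are common defaults, not details confirmed by this section, and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def knn_predict(
    train: List[Tuple[Sequence[float], Hashable]],
    x: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Classify x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With an odd K and two classes (fault/no-fault), the majority vote cannot tie, which is one practical reason to prefer values such as 3.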

Hyper-parameter setting for each ML technique
From the results in Figures 7-9, the hyper-parameter values were selected by applying an exponential smoothing technique to each curve and taking its inflection point. These values were rounded to the nearest integer, and the most frequently repeated value was taken as the hyper-parameter setting. The hyper-parameter setting for each ML technique is presented in Table 7.

Comparison and selection of the ML technique
Table 7 shows that the best-performing technique for the cases evaluated was RF. Additionally, the ML models obtained with this technique are easy to implement. For these reasons, RF was selected in this work.
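The hyper-parameter selection rule described above (smooth each curve, take its inflection point, round, keep the most frequent value) can be sketched as follows. The smoothing factor `alpha`, the use of a sign change in the discrete second difference as the inflection-point test, and all function names are assumptions made for illustration; the section does not specify these details.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def exp_smooth(values: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def inflection_point(params: Sequence[float], accuracy: Sequence[float], alpha: float = 0.5) -> float:
    """First parameter value at which the smoothed curve turns from convex to concave.

    Uses the sign of the discrete second difference; returns the last value if no change.
    """
    s = exp_smooth(accuracy, alpha)
    # d2[m] is the second difference centred on point m + 1 of the curve.
    d2 = [s[i + 1] - 2 * s[i] + s[i - 1] for i in range(1, len(s) - 1)]
    for m in range(1, len(d2)):
        if d2[m - 1] > 0 and d2[m] <= 0:
            return params[m + 1]
    return params[-1]


def select_setting(curves: Sequence[Tuple[Sequence[float], Sequence[float]]], alpha: float = 0.5) -> int:
    """Round each IED curve's inflection point and return the most frequent value."""
    points = [round(inflection_point(p, a, alpha)) for p, a in curves]
    return Counter(points).most_common(1)[0][0]
```

Each `(params, accuracy)` pair stands for one IED curve from Figures 7-9; taking the mode across IEDs yields a single setting per technique, as reported in Table 7.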

Selection of representative attributes
The representative attributes were selected by means of the PCA clustering and SVD clustering techniques, as presented in Section 2.3.2. Table 8 presents the number of representative attributes determined by each technique for each relay of the system, together with the percentage of the database information represented by that number of attributes. Table 8 shows that a combination of 16 features can represent more than 98% of the information in the database. Therefore, 16 was selected as the maximum number of representative features. This is a significant reduction in the number of attributes (from 49 to 16), which reduces the computational effort and the risk of data scarcity [53].
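As an illustration of how a count such as 16 can be read from a PCA/SVD decomposition, the sketch below finds the smallest number of components whose cumulative explained variance reaches a threshold such as 98%. It assumes the singular values of the centered attribute matrix have already been computed (for example, with `numpy.linalg.svd`) and that "information" is measured as explained variance; both are assumptions about how Table 8 was built, and the function name is illustrative.

```python
from typing import Sequence


def components_for_threshold(singular_values: Sequence[float], threshold: float = 0.98) -> int:
    """Smallest number of components whose cumulative variance share reaches threshold."""
    energy = sorted((s * s for s in singular_values), reverse=True)  # variance ~ s^2
    total = sum(energy)
    cumulative = 0.0
    for k, e in enumerate(energy, start=1):
        cumulative += e
        if cumulative / total >= threshold:
            return k
    return len(energy)
```

For example, singular values of 10, 1, and 0.1 give variance shares of about 99%, 1%, and 0.01%, so a single component already exceeds the 98% threshold.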

Parametrization and training of ML techniques
Once the maximum number of representative attributes was determined, a Chu-Beasley genetic algorithm was used to determine the combination of attributes that maximizes the performance of the ML technique. Section 2.3.3 presents the formulation of the algorithm. The results obtained for each relay, with its feature combination and accuracy, are shown in Table 9.
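A simplified sketch of a Chu-Beasley-style genetic algorithm for picking 16 of the 49 attributes is shown below. It assumes a user-supplied `fitness(subset)` callable (hypothetical) that trains the selected ML model on those attributes and returns its accuracy, and it uses fixed-size subsets so that every individual is feasible; the original algorithm's infeasibility-repair step is therefore omitted. It keeps the method's characteristic steady-state replacement: one offspring per generation replaces the worst individual only if it is better and not already in the population. Population size, number of generations, and the crossover and mutation operators are illustrative choices, not the settings of Section 2.3.3.

```python
import random
from typing import Callable, FrozenSet, List

Subset = FrozenSet[int]


def chu_beasley_select(
    fitness: Callable[[Subset], float],
    n_attributes: int = 49,
    k: int = 16,
    pop_size: int = 20,
    generations: int = 500,
    seed: int = 0,
) -> Subset:
    """Simplified Chu-Beasley-style steady-state GA choosing k < n_attributes features."""
    rng = random.Random(seed)
    universe = range(n_attributes)

    # Initial population of distinct random subsets (diversity is enforced throughout).
    population: List[Subset] = []
    while len(population) < pop_size:
        candidate = frozenset(rng.sample(universe, k))
        if candidate not in population:
            population.append(candidate)
    scores = [fitness(ind) for ind in population]

    def tournament() -> Subset:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(range(pop_size), 2)
        return population[a] if scores[a] >= scores[b] else population[b]

    for _ in range(generations):
        p1, p2 = tournament(), tournament()
        # Crossover: keep attributes shared by both parents, fill up from the rest of their union.
        shared = p1 & p2
        pool = sorted((p1 | p2) - shared)
        child = set(shared) | set(rng.sample(pool, k - len(shared)))
        # Mutation: swap one selected attribute for one that is not selected.
        removed = rng.choice(sorted(child))
        added = rng.choice([a for a in universe if a not in child])
        offspring = frozenset((child - {removed}) | {added})
        # Replacement: the single offspring replaces the worst individual only if it
        # is not already in the population and has a higher fitness.
        if offspring not in population:
            worst = min(range(pop_size), key=scores.__getitem__)
            offspring_score = fitness(offspring)
            if offspring_score > scores[worst]:
                population[worst] = offspring
                scores[worst] = offspring_score

    best = max(range(pop_size), key=scores.__getitem__)
    return population[best]
```

Because only the worst individual can be replaced, the best subset found so far is never lost, and the duplicate check keeps the population diverse.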
These results show that the models achieved high accuracy in the training process. However, it was also necessary to determine the accuracy for events that were not used in the parameterization and training process; this is presented in Section 4.1.4.

To validate the performance of the models trained in Stage III, the 15% of the database generated in Stage I that was not used in the training process was considered. This validation covered all the factors presented in Table 6. Additionally, to guarantee the statistical validity of the experiment, the proposed FD system was executed 30 times for the tests evaluated. Table 9 shows the accuracy of the ML models for each validated relay.
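The repeated validation described above can be summarized with a short sketch. It assumes a hypothetical `run_once(seed)` callable that re-runs the validation with the given seed and returns the resulting accuracy; the summary statistics chosen (mean, sample standard deviation, minimum) are illustrative, not necessarily those reported in Table 9.

```python
import statistics
from typing import Callable, Dict


def repeated_validation(run_once: Callable[[int], float], repetitions: int = 30) -> Dict[str, float]:
    """Run a seeded validation `repetitions` times and summarize the accuracies."""
    accuracies = [run_once(seed) for seed in range(repetitions)]
    return {
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),  # sample standard deviation
        "min": min(accuracies),  # worst case across the repetitions
    }
```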
The results showed a satisfactory performance of the proposed FD system, with an accuracy greater than 95% for all the cases evaluated. Although similar performances were reported in [22,23,25-30], this work formulated a strategy to select the ML technique, the representative features, and their combination in order to optimize the performance of the proposed FD technique. Additionally, all the stages were presented in enough detail to allow their understanding and replication, which is not usually the case for state-of-the-art FD techniques.
However, these tests do not identify the factors that most affect the performance of the FD system. Consequently, a sensitivity analysis was performed, as presented in the following section.

Sensitivity Analysis
In order to identify the factors that directly affect the performance of the proposed FD system, a sensitivity analysis based on an experimental design was performed. The design comprised five factors, which are presented in Table 10. For each level combination, 600 repetitions were executed to guarantee the statistical validity of the experiment. Given the factors, levels, and number of repetitions, the total number of experiments was 28,800, obtained by

$$TNE = r \prod_{i=1}^{n} n_i$$

where r is the number of repetitions, n is the number of factors, and n_i is the number of levels of factor i; hence, the design contained 28,800/600 = 48 level combinations. Each experiment was represented by the accuracy obtained when each trained model was tested with the validation signals that described the experiment. The sensitivity analysis tested the homogeneity among all the populations described by the level combinations through the following hypothesis test:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_1: \mu_i \neq \mu_j \ \text{for at least one } i \neq j$$

where μ_i is the mean accuracy of level combination i and k is the number of level combinations; if the null hypothesis is rejected, at least one level combination i has a different mean [54]. An analysis of variance (ANOVA) was selected to test the null hypothesis. The ANOVA residuals were used to verify compliance with the normality, independence, and homoscedasticity assumptions, as shown in Figure 10. In addition, the Jarque-Bera, Durbin-Watson, and Levene statistical tests were executed as a further check of the ANOVA assumptions.
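The hypothesis test above compares mean accuracies across level combinations. A one-way ANOVA F statistic for such a comparison can be computed as sketched below; this is a simplified single-factor illustration (the study's ANOVA in Table 11 evaluates individual factors), and the p-value would then be read from the F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf`.

```python
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return the F statistic and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom leads to rejecting H_0.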
The p-values of the Jarque-Bera, Durbin-Watson, and Levene tests were 0.0945, 0.3395, and 0.2642, respectively. Since all the p-values were higher than 0.05, the assumptions of normality, independence, and homoscedasticity could not be rejected, and the ANOVA results can be considered reliable.
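For reference, the Jarque-Bera and Durbin-Watson statistics used to check the residual assumptions can be computed as sketched below. The Jarque-Bera p-value uses the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(-x/2). The Durbin-Watson function returns only the statistic (values near 2 suggest no autocorrelation), since its p-value requires tables or a dedicated implementation, and the Levene test is omitted; the function names are illustrative and the residuals are assumed not to be constant.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(residuals: Sequence[float]) -> Tuple[float, float]:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) and its chi-square(2) p-value exp(-JB/2)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5  # sample skewness S
    kurt = m4 / m2 ** 2  # sample kurtosis K (3 for a normal distribution)
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


def durbin_watson(residuals: Sequence[float]) -> float:
    """DW = sum_t (e_t - e_(t-1))^2 / sum_t e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```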
The results of the ANOVA are presented in Table 11. Factors with p-values greater than 0.05 were considered not statistically significant for the model studied. This was the case for factors A and C, the fault type and the load behavior, respectively. Therefore, there is no statistical evidence that these factors influence the performance of the proposed FD system. This is consistent with the behavior observed earlier, in which each IED used a different feature combination and responded differently to the hyper-parameters of the techniques. On the other hand, factors such as the grid connection mode and the fault position were expected to be statistically significant, because they directly affect the protection configuration.

Conclusions
This paper presented an intelligent fault detection system for microgrids. The results showed a satisfactory performance, with an accuracy greater than 95.7% for the cases evaluated, even though only the voltage and current measurements recorded locally by each IED were used, which eliminates the need for communication systems in the protection process.
Additionally, the intelligent FD system presented a methodology composed of four stages that allows its implementation on any microgrid. Among these stages, the generation of the simulation database of microgrid operation and the parameterization stage stand out.
The database generation process provided recommendations for producing a high-quality database, which is key to the successful use of ML models. The parameterization process showed how to determine the number of representative attributes that capture the largest amount of information in the database, which helps reduce the computational effort and the risk of data scarcity. In the same parameterization process, a Chu-Beasley genetic algorithm was used to determine the combination of attributes that maximizes the performance of the ML techniques. Finally, the technique achieved accuracies greater than 95% in both the training and validation processes, and the sensitivity analysis showed that factors such as the fault type and load condition did not affect the performance of the proposed fault detection system, whereas other factors, such as the IED location, were significant for the model. This implies that the training process must be executed for every available device, because the performance of the methodology may change from one device to another.
Finally, we can summarize the main practical and economic benefits of employing the proposed FD system as being

Funding: This work was partially supported by the project "Desarrollo de una plataforma hardware/software para el modelado de la integración de pequeñas plantas de generación distribuida renovable en redes eléctricas de baja tensión de la Región Caribe para evaluación de su impacto sobre la estabilidad del sistema y el flujo energético de la red" of the Administrative Department of Science, Technology and Innovation of Colombia (COLCIENCIAS), code 57598, and contract number 037-2018.

Conflicts of Interest:
The authors declare no conflicts of interest.

Table A3. Random variation load.