Optimal Coordination of Over-Current Relays in Microgrids Using Principal Component Analysis and K-Means

Microgrids (MGs) are decentralized systems that integrate distributed energy resources and may operate in grid-connected or islanded modes. Furthermore, MGs may feature several topologies or operative scenarios. These characteristics pose major challenges in determining a proper protection coordination scheme. A new optimal coordination approach for directional over-current relays (OCRs) in MGs is proposed, in which the operational modes are clustered by means of a K-means algorithm hybridized with the principal component analysis (PCA) technique. The number of clusters is limited by the number of setting groups of commercially available relays. The results obtained on an IEC benchmark microgrid demonstrate the applicability and effectiveness of the proposed approach: the combination of both techniques resulted in successful coordination for all clusters of operative scenarios. These results open the possibility of exploring new approaches to the optimal protection coordination problem in modern MGs.


Introduction
A microgrid (MG) can be defined as a local power network that integrates distributed energy resources (DERs) to provide grid backup or off-grid power to meet local electricity needs [1,2]. Recently, MGs have begun to be used as a novel alternative to mitigate some of the difficulties that arise in traditional electrical grids [3]. MGs can be considered small-scale, controllable electrical distribution networks that are able to operate either connected to or disconnected from the main electrical grid [4]. They allow the incorporation of distributed generation (DG) and are a promising solution to meet growing energy demand.
Protection coordination in MGs with DG has become a major challenge in modern electrical systems. Distribution networks are departing from their traditional behavior because DG units give rise to bidirectional power flows and variable short-circuit levels. In addition, according to [5,6], microgrid protection systems must deal with two important aspects that make their operation difficult: first, MGs may operate either connected to or disconnected from the main AC power supply; second, the intermittency of loads and generators produces dynamic behavior.
Over-current relays (OCRs) have proven to be an effective alternative for the protection of MGs [7,8]. However, their proper incorporation and coordination remain an open research topic [7][8][9]. MGs have turned radial distribution networks into flexible, non-radial systems; as a result, a protection scheme that simultaneously ensures speed, selectivity, and reliability is still one of the most difficult challenges in MG design [10]. Recently, many research efforts have focused on applying optimization techniques to find protection coordination schemes that reduce operating times while ensuring reliability and safety [11,12].
Several authors agree that directional OCRs are a suitable alternative for protecting MGs [7]. The authors in [13] presented a methodology for determining the effect of including DG units on the coordination of directional OCRs. The coordination of directional OCRs is also considered in [14] for distribution networks with high DG penetration. In [15], the authors presented an OCR coordination procedure that considers N-1 security criteria. A methodology based on a multi-objective swarm optimization technique for OCR coordination is developed in [16]. The authors in [17] developed an OCR coordination methodology that adapts online to system operating conditions. An adaptive protection system for OCR coordination is also presented in [18], in which a neural network is combined with a support vector machine to enhance state recognition in MGs; all network protection settings are modified dynamically to improve network reliability. A novel alternative to traditional OCR protection coordination is presented in [11], where the authors propose a new constraint on the plug setting multiplier (PSM) so that the optimization model accounts for real characteristics of commercially available relays.
The authors in [19] proposed an adaptive protection scheme for OCRs based on the setting groups available in the relays, with the number of groups limited by the capabilities of commercially available relays. To determine the setting groups, they developed a technique based on linear integer programming and particle swarm optimization (PSO). A similar approach was implemented in [20][21][22], where unsupervised machine learning techniques were used to group operative scenarios intelligently before performing the OCR coordination. In [20], a K-means scheme was proposed to group different operative scenarios, taking into account the available setting groups of the relays. In [21], the authors developed a similar work but used a self-organizing map (SOM) clustering technique instead of K-means. In [22], the authors discuss the implementation of the K-medoids technique to cluster the operative scenarios of an MG. In [23], the authors propose an approach that uses K-means, SOM, and hierarchical clustering to improve the operating times of the optimal OCR coordination in microgrids; nonetheless, the coordination process requires an extra procedure (heuristic adjustment) to achieve feasible solutions, which makes it more time-consuming.
This paper proposes a novel protection coordination model for directional OCRs in MGs that not only considers DG but also handles numerous operational modes. The proposal is applicable to adaptive OCRs capable of storing multiple setting groups, and it takes into account the OCR operating currents and their corresponding limitations. The approach incorporates the PSM constraint introduced in [11]; in addition, as in [19][20][21][22], the number of setting groups is limited according to the real capabilities of commercially available OCRs. To complement the methodologies in [20,23], which use K-means to group operative scenarios intelligently, this paper uses principal component analysis (PCA) to reduce the dimensionality of the dataset and improve the performance of K-means clustering. In this sense, the paper complements and improves the methodology proposed in [20] by jointly using PCA and K-means to enhance the protection coordination process. Additionally, unlike [23], it does not require an additional heuristic adjustment to reach feasible solutions. The proposed approach is compared with the methodologies of [20,23] on a benchmark MG. The results show that combining K-means and PCA yields a better coordination scheme than K-means alone. Furthermore, despite its potential, PCA has not, to the best of the authors' knowledge, been applied to the optimal protection coordination problem, nor has the joint use of K-means and PCA been reported for the optimal coordination of directional OCRs in MGs.

Mathematical Formulation
The mathematical formulation for the coordination of directional OCRs in MGs is presented in Equations (1) to (9). The proposed model minimizes the total operating time while guaranteeing coordination between the main and backup relays:
$$\min \sum_{i=1}^{m} \sum_{f=1}^{n} t_i^f \tag{1}$$
subject to:
$$t_j^f - t_i^f \geq CTI \tag{2}$$
$$t_i^{\min} \leq t_i^f \leq t_i^{\max} \tag{3}$$
$$TMS_i^{\min} \leq TMS_i \leq TMS_i^{\max} \tag{4}$$
$$PSM_i^f = \frac{I_i^f}{ipickup_i} \tag{5}$$
$$ipickup_i^{\min} \leq ipickup_i \leq ipickup_i^{\max} \tag{6}$$
$$PSM_i^{\min} \leq PSM_i^f \leq PSM_i^{\max} \tag{7}$$
$$t_i^f = TMS_i \, \frac{A}{\left(PSM_i^f\right)^{B} - 1} \tag{8}$$
$$\left(PSM_i^f\right)^{B} - 1 \neq 0 \tag{9}$$
Equation (1) is the objective function, where $t_i^f$ is the operating time of relay i under fault f, m is the total number of relays, and n is the number of faults. Depending on the fault location, each relay within the MG experiences a different fault current. Ideally, the relay closest to the fault (main relay) must operate first so that the fault is isolated. If this relay does not operate, the backup relay should operate after a coordination time margin. Equation (2) represents the coordination margin between the main and backup relays for each fault, where CTI is the coordination time interval, and $t_i^f$ and $t_j^f$ are the operating times of the main relay i and the backup relay j, respectively, when fault f takes place. Equation (3) gives the minimum and maximum limits of $t_i^f$.
Equation (4) sets the minimum ($TMS_i^{\min}$) and maximum ($TMS_i^{\max}$) limits of $TMS_i$, the time multiplier setting (dial) of the relay operating curve. Equation (5) defines the plug setting multiplier $PSM_i^f$ seen by relay i under fault f, i.e., the ratio of the fault current $I_i^f$ to the pickup current $ipickup_i$ of relay i. Equation (6) sets the minimum ($ipickup_i^{\min}$) and maximum ($ipickup_i^{\max}$) limits of $ipickup_i$, and Equation (7) sets the minimum ($PSM_i^{\min}$) and maximum ($PSM_i^{\max}$) limits of $PSM_i^f$. Equation (8) is the relay operating curve, from which the operating time of relay i is calculated; A and B are the constants that define the characteristic of the curve. Finally, Equation (9) prevents the denominator of Equation (8) from being zero.
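To make Equations (2), (5), (8), and (9) concrete, the following Python sketch evaluates the operating times of a main and a backup relay for a single fault and checks the CTI constraint. It assumes the IEC normal inverse constants A = 0.14 and B = 0.02 and uses hypothetical TMS, pickup, and fault-current values; the function names are illustrative and are not taken from the original work.

```python
A_IEC_NI = 0.14  # IEC normal inverse constant A (assumed curve type)
B_IEC_NI = 0.02  # IEC normal inverse constant B


def plug_setting_multiplier(i_fault: float, i_pickup: float) -> float:
    """Eq. (5): fault current seen by the relay divided by its pickup current."""
    return i_fault / i_pickup


def operating_time(tms: float, i_fault: float, i_pickup: float,
                   a: float = A_IEC_NI, b: float = B_IEC_NI) -> float:
    """Eq. (8): inverse-time operating time of one relay for one fault."""
    psm = plug_setting_multiplier(i_fault, i_pickup)
    # Eq. (9) only forbids PSM = 1; this sketch also rejects PSM < 1,
    # where the curve would return a negative time.
    if psm <= 1.0:
        raise ValueError(f"PSM must exceed 1, got {psm:.3f}")
    return tms * a / (psm ** b - 1.0)


def is_coordinated(t_main: float, t_backup: float, cti: float = 0.3) -> bool:
    """Eq. (2): the backup relay must wait at least CTI after the main relay."""
    return t_backup - t_main >= cti


# Hypothetical settings and currents (A) for one main/backup pair.
t_main = operating_time(tms=0.10, i_fault=1000.0, i_pickup=100.0)    # PSM = 10
t_backup = operating_time(tms=0.15, i_fault=1000.0, i_pickup=200.0)  # PSM = 5
print(round(t_main, 3), round(t_backup, 3), is_coordinated(t_main, t_backup))
```

With these made-up values the backup relay trips about 0.34 s after the main relay, so Equation (2) holds for CTI = 0.3 s.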

K-Means Clustering via PCA
PCA is a statistical technique used for unsupervised dimension reduction, while K-means is an unsupervised clustering technique. PCA and K-means have been applied to a variety of engineering problems, as reported in [24,25]. In [26], the authors used PCA in conjunction with K-means and showed that combining both techniques improves the clustering results. The results of [26] indicate that unsupervised dimension reduction is closely related to unsupervised learning.

Principal Component Analysis
PCA is a multivariate technique that analyzes data where observations are described by several interrelated quantitative dependent variables. PCA aims at extracting the important information from the data and representing such data as a set of new orthogonal variables that are called principal components [27]. According to [27], the objectives of the PCA technique are: (1) to extract the most important information from the data, (2) to compress the size of the dataset keeping only the important information, (3) to simplify the description of the dataset, and (4) to analyze the structure of the observations and variables.
In PCA, the new variables are obtained as linear combinations of the original ones. The components are constructed as follows: (1) the first principal component has the largest possible variance; (2) the second component is orthogonal to the first and has the second largest possible variance; and (3) the third component is orthogonal to the first two and has the third largest possible variance, and so on. In practice, only the first few principal components are retained and the rest are ignored. The values of these new variables for the observations are called factor scores, which can be interpreted geometrically as the projections of the observations onto the principal components [27].
The principal components are obtained from the singular value decomposition (SVD) of the data matrix X, as presented in (10):
$$X = L\,D\,R \tag{10}$$
Here, X is an m × n matrix, where m relates to the observations and n to the variables; in this application, m is the number of operative scenarios and n is the number of faults analyzed multiplied by the number of relays in the network (these m and n are not the same quantities as in Equation (1)). L is the m × k matrix of left singular vectors, R is the k × n matrix of right singular vectors, and D is the k × k diagonal matrix of singular values, where k is the rank of X.
The m × k matrix of principal components P is obtained by multiplying the matrix of left singular vectors L by the diagonal matrix of singular values D:
$$P = L\,D$$
The matrix R provides the coefficients of the linear combinations used to obtain the principal components. Since the rows of R are orthonormal ($R R^{\top} = I$), P can equivalently be computed as $P = X R^{\top}$, that is, by projecting each observation onto the right singular vectors.
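The SVD relations above can be illustrated with a dependency-free Python sketch. Instead of calling an SVD routine, it eigen-decomposes the Gram matrix X X^T by power iteration with deflation: the eigenvectors are the left singular vectors (columns of L) and the eigenvalues are the squared singular values, so each score column is a column of L scaled by its singular value (P = LD). Centering each column is an assumption (standard PCA practice that the text does not state), and the function name `pca_scores` is hypothetical.

```python
import math


def pca_scores(X, n_components, iters=1000):
    """Return (P, R, explained) for the leading principal components of X.

    X: list of m rows (observations, e.g. operative scenarios) with n values.
    P: m x n_components scores (columns of L times singular values, P = L D).
    R: right singular vectors (rows of R), each of length n.
    explained: fraction of the total variance captured by each component.
    """
    m, n = len(X), len(X[0])
    # Assumption: centre each column, as is standard practice for PCA.
    means = [sum(row[j] for row in X) / m for j in range(n)]
    Xc = [[row[j] - means[j] for j in range(n)] for row in X]
    # Gram matrix G = Xc Xc^T (m x m): eigenvectors = left singular vectors,
    # eigenvalues = squared singular values.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(m)] for i in range(m)]
    total = sum(G[i][i] for i in range(m))  # sum of all eigenvalues
    P = [[0.0] * n_components for _ in range(m)]
    R, explained = [], []
    for c in range(n_components):
        v = [float(i + 1) for i in range(m)]  # deterministic starting vector
        for _ in range(iters):  # power iteration on the (deflated) Gram matrix
            w = [sum(G[i][k] * v[k] for k in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue = squared singular value.
        lam = sum(v[i] * G[i][k] * v[k] for i in range(m) for k in range(m))
        if lam <= 1e-12:
            break  # no variance left to explain
        d = math.sqrt(lam)
        for i in range(m):
            P[i][c] = v[i] * d
        R.append([sum(Xc[i][j] * v[i] for i in range(m)) / d for j in range(n)])
        explained.append(lam / total)
        # Deflate so that the next pass finds the next component.
        for i in range(m):
            for k in range(m):
                G[i][k] -= lam * v[i] * v[k]
    return P, R, explained


X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
P, R, ev = pca_scores(X, 2)
print(ev)
```

Working on the m × m Gram matrix is convenient when, as here, there are far fewer scenarios than variables; a production implementation would normally call a library SVD instead.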

K-Means Clustering Algorithm
K-means is an algorithm that forms groups of observations from a set of data [28]. Its main goal is to divide the observations into k groups, or clusters, so that each observation belongs to the cluster with the nearest centroid; that is, the distance $d_{ij}$ from each observation to its closest centroid is minimized [20]. The method proceeds as follows: (1) select k points, one per desired cluster, as the initial centroids; (2) link each observation to its nearest centroid according to the distance $d_{ij}$, forming the initial groups; (3) calculate a new centroid for each group; (4) re-assign each observation to the group with the nearest centroid based on $d_{ij}$; and (5) repeat Steps 3 and 4 until convergence is obtained. Figure 1 depicts the flowchart of the K-means clustering algorithm. Equation (13) gives the Euclidean distance used to compute $d_{ij}$, where $x_i$ is the observation, $c_j$ is the centroid of group $g_j$, and m is the number of objects in group $g_j$.
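The five steps above can be sketched in Python as Lloyd's algorithm. This is an illustrative sketch, not the implementation used in [20] or in this work; the initialization rule (the first k observations unless explicit initial centroids are passed) and the function name `kmeans` are assumptions.

```python
import math


def kmeans(points, k, init=None, max_iter=100):
    """Lloyd's K-means on a list of equal-length tuples; returns (labels, centroids)."""
    # Step 1: initial centroids (first k points unless given; practical
    # implementations usually use random or k-means++ seeding instead).
    centroids = [list(p) for p in (init if init is not None else points[:k])]
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign each observation to the nearest centroid (Euclidean d_ij).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # Step 5: stop when assignments no longer change
            break
        labels = new_labels
        # Step 3: each centroid becomes the mean of the m objects in its group g_j.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a group becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels, centroids)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

Because the result depends on the initial centroids, K-means is usually run several times with different seeds and the solution with the smallest total distance is kept.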

Methodology
The optimal coordination of OCRs is carried out following the flowchart depicted in Figure 2. The process consists of six steps: the first three are performed in DIgSILENT PowerFactory, Steps 4 and 5 apply PCA and K-means, and the last step executes a genetic algorithm (GA). The steps are described below.
Step 1: DIgSILENT PowerFactory is used to model the test network from its input data.
Step 2: Operative scenarios (OSs) for the test network are defined and configured.
Step 3: Short-circuit currents are obtained with DIgSILENT PowerFactory for different fault locations and for every OS.
Step 4: The simulation results obtained in Step 3 are used as input data for the PCA technique. Using the procedure described in Section 3.1, the principal components P of the simulation data are obtained. PCA changes the representation of the data variables to improve the performance of the K-means clustering algorithm.
Step 5: The principal components P are the inputs to the K-means algorithm, which clusters the operative scenarios to facilitate the protection coordination of the microgrid.
Step 6: The simulation results obtained in Step 3 and the clusters obtained in Step 5 are the inputs to the GA, which finds the protection coordination settings. The protection coordination model given by Equations (1) to (9) is solved with the proposed GA for each cluster; the resulting set of relay settings is then applied to every OS in that cluster.
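The data flow between Steps 3, 4, and 6 can be sketched as follows, assuming (hypothetically) that the short-circuit results are stored as `currents[os][fault][relay]`. Each operative scenario becomes one row of X with (number of faults × number of relays) entries, and the cluster labels returned by K-means are turned into the list of OSs that share one setting group. The function names and data are illustrative only.

```python
def build_feature_matrix(currents):
    """Flatten currents[os][fault][relay] into one row per operative scenario.

    Each row has (number of faults x number of relays) entries, matching the
    m x n data matrix X used by PCA (m = scenarios, n = faults x relays).
    """
    return [[value for fault in os_data for value in fault] for os_data in currents]


def scenarios_per_cluster(labels):
    """Map each cluster label to the 1-based operative scenarios assigned to it."""
    groups = {}
    for os_number, label in enumerate(labels, start=1):
        groups.setdefault(label, []).append(os_number)
    return groups


# Hypothetical data: 2 scenarios, 2 faults, 2 relays (currents in A).
currents = [
    [[1200.0, 800.0], [950.0, 1100.0]],  # OS1
    [[400.0, 350.0], [300.0, 420.0]],    # OS2
]
X = build_feature_matrix(currents)
print(len(X), len(X[0]))                     # 2 4
print(scenarios_per_cluster([0, 2, 0, 1]))   # {0: [1, 3], 2: [2], 1: [4]}
```

In Step 6, each entry of the resulting dictionary defines the set of scenarios whose coordination constraints must be satisfied simultaneously by one setting group.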
Several methodologies can be used to solve the model given by Equations (1) to (9). In this case, a GA was selected because this type of technique has proven effective in obtaining OCR coordination schemes [29][30][31]. The GA starts with a random population of candidate solutions, which undergo selection, crossover, and mutation until a given number of generations has elapsed. A detailed description of the implemented GA can be found in [32]. Nonetheless, other techniques could also be used to obtain the OCR coordination scheme, as reported in [33][34][35].
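The GA loop described above can be illustrated on a toy instance with one main/backup relay pair, a single fault, and fixed pickup currents, so that only the two TMS values are optimized. This is a generic real-coded GA sketch (tournament selection, arithmetic crossover, Gaussian mutation, elitism), not the GA of [32]; the PSM values, TMS limits, penalty weight, and GA parameters are hypothetical, while CTI = 0.3 s matches the case study and the IEC normal inverse constants are assumed.

```python
import random

A, B = 0.14, 0.02                 # IEC normal inverse curve constants
CTI = 0.3                         # coordination time interval (s)
TMS_MIN, TMS_MAX = 0.05, 1.0      # hypothetical dial limits, Eq. (4)
PSM_MAIN, PSM_BACKUP = 10.0, 5.0  # hypothetical PSMs seen for one fault


def times(tms):
    """Operating times (Eq. 8) of the main and backup relay for the fault."""
    t_main = tms[0] * A / (PSM_MAIN ** B - 1.0)
    t_backup = tms[1] * A / (PSM_BACKUP ** B - 1.0)
    return t_main, t_backup


def fitness(tms, penalty=1e4):
    """Total operating time (Eq. 1) plus a penalty for violating Eq. (2)."""
    t_main, t_backup = times(tms)
    violation = max(0.0, CTI - (t_backup - t_main))
    return t_main + t_backup + penalty * violation


def genetic_algorithm(pop_size=30, generations=100, p_mut=0.2, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(TMS_MIN, TMS_MAX) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=fitness)]  # elitism: keep the best individual
        while len(children) < pop_size:
            # Selection: two tournaments of three individuals each.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Crossover: random convex combination of the two parents.
            a = rng.random()
            child = [a * x + (1.0 - a) * y for x, y in zip(p1, p2)]
            # Mutation: Gaussian perturbation, clamped to the TMS limits.
            child = [min(TMS_MAX, max(TMS_MIN, g + rng.gauss(0.0, 0.05)))
                     if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return min(pop, key=fitness)


best = genetic_algorithm()
t_main, t_backup = times(best)
print(f"TMS = {best}, t_main = {t_main:.3f} s, t_backup = {t_backup:.3f} s")
```

For this toy problem the optimum places the main relay at the lower TMS limit and sets the backup dial just large enough to satisfy the CTI, for a total time of about 0.60 s. The penalty term is one simple way to handle Equation (2); the real model would sum over all relays, faults, and scenarios of a cluster.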

Results
The proposed approach was implemented in an IEC benchmark MG that integrates different DG technologies and presents several operational modes. This microgrid is depicted in Figure 3, and its parameters can be consulted in [11]. The parameters of the relays are presented in Table 1. Different generation and topological conditions were considered through the 16 operative scenarios (OSs) described in Table 2. The network operator must determine how many scenarios are sufficient to model the behavior of the network adequately. In this case, the microgrid topology was modified by considering the different states (open/closed) of CB-1 and CB-2. The MG was also considered to operate both connected to the main AC power supply and in islanded mode. Complementary information concerning the simulations and data is available from the authors upon request.

The MG relays are numbered from 1 to 15 and labeled with the prefix R (R1 to R15); Figure 3 shows their positions. The IEEE 242 standard [36] suggests that the coordination time interval (CTI) between the main and backup relays must be at least 0.2 s. In this paper, a CTI of 0.3 s is used to provide a suitable coordination margin. Additionally, IEC normally inverse characteristic curves are used as the operating characteristics of the OCRs, and the OCRs are assumed to allow up to four setting groups to be configured.
Three variants were considered when solving the optimal coordination problem: K-means alone, PCA alone, and both techniques combined. In all cases, the problem was solved considering relays that allow four setting groups to be configured. In addition, as a reference case, the OCR coordination was also solved with a single setting group for all scenarios.

Results Considering a Single Set of Parameters for All Scenarios
Initially, the coordination problem of the directional OCRs was solved using a single setting group intended to be suitable for all OSs (no clustering). The results are summarized in Table 3. The constraint given by Equation (2) is violated in 13 cases, showing that this solution is clearly inadequate. In general terms, proper coordination cannot be guaranteed when a single set of parameters is used for all operative scenarios of the MG.
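Counting violations, as reported in Table 3, amounts to checking Equation (2) for every main/backup pair in every operative scenario with one shared setting group. The Python sketch below does this for hypothetical relays, settings, and fault currents (IEC normal inverse curves assumed); it only illustrates how a single setting group can fail when the currents seen by the relays change between scenarios.

```python
A, B, CTI = 0.14, 0.02, 0.3  # IEC normal inverse constants and CTI (s)


def op_time(tms, i_fault, i_pickup):
    """Eq. (8) with PSM = i_fault / i_pickup (Eq. 5)."""
    return tms * A / ((i_fault / i_pickup) ** B - 1.0)


def count_violations(settings, scenarios):
    """Count main/backup pairs that break Eq. (2) across all operative scenarios.

    settings:  {relay: (tms, i_pickup)}, a single setting group shared by all OSs
    scenarios: one list per OS of (main, backup, i_fault_main, i_fault_backup)
    """
    violations = 0
    for pairs in scenarios:
        for main, backup, i_main, i_backup in pairs:
            t_main = op_time(settings[main][0], i_main, settings[main][1])
            t_backup = op_time(settings[backup][0], i_backup, settings[backup][1])
            if t_backup - t_main < CTI:
                violations += 1
    return violations


settings = {"R1": (0.10, 100.0), "R2": (0.15, 200.0)}
scenarios = [
    [("R1", "R2", 1000.0, 1000.0)],  # hypothetical OS 1: margin of about 0.34 s
    [("R1", "R2", 300.0, 2000.0)],   # hypothetical OS 2: backup sees a larger current
]
print(count_violations(settings, scenarios))  # 1
```

Clustering the scenarios lets each group of similar OSs receive its own setting group, which is what reduces the violation count in the following results.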

Method                  Operation Time (s)    Violations
Without clustering      724.8                 13

Results Using the K-Means Clustering Algorithm
In this section, we implement the methodologies proposed in [20,23] which consist of using clustering analysis for the optimal coordination of directional OCRs through the K-means algorithm. As regards [23], the authors also implement an additional step consisting of a heuristic adjustment which is not considered in this paper for comparative purposes. Furthermore, hierarchical clustering and SOM were implemented in [23] with similar performance. Table 4 presents the distribution of the clusters, the value of the objective function, and the number of cases in which coordination is not achieved. In this case, OT stands for operation time, and Viol indicates the number of coordination violations. According to these results, there are two cases in which it is not possible to enforce the constraint given by Equation (2) in C3. Table 5 presents the coordination adjustments using K-means clustering. Although coordination is not achieved, the results are better than those presented in Table 3 in terms of the number of violations and operational time of the coordination.

Results Using Principal Component Analysis
This section applies the methodology of Section 4 to the OCR coordination problem using only the graphical results of the PCA technique. The main objective of PCA is to extract the most important information from the data and represent it as a set of new orthogonal variables called principal components. Because the number of principal components retained is generally smaller than the number of original variables, the dataset can be analyzed using fewer variables. The simulations of the microgrid yielded a total of 75 variables; applying PCA to them produced 15 principal components, which capture the most important information in the data.
An analysis of the total variance was used to determine the most significant principal components, that is, those that gather the most information from the original dataset. Figure 4 shows the percentage of total variance explained by the first seven principal components; the remaining components are not significant enough to describe the behavior of the data. Principal component one explains 76.14% of the total information in the original data, while principal components two, three, and four explain 12.4%, 4.1%, and 3.5%, respectively. Hence, only the first four principal components need to be considered for a detailed analysis, since together they describe 96.12% of the information in the original dataset. Principal components 1 and 2, which describe 88.54% of the information, are used for the graphical analysis. The biplot diagram allows the data observations, which represent the different operative scenarios of the microgrid, to be clustered. The operative scenarios of Table 2 are plotted in the biplot diagram in Figure 6, from which the clustering is performed. Based on the location of the operative scenarios and the four setting groups available in the OCRs, four clusters were defined: cluster 1 (C1), cluster 2 (C2), cluster 3 (C3), and cluster 4 (C4). Table 6 presents the composition of each cluster, together with the operation time and the number of violations obtained with the PCA technique. According to these results, coordination was not achieved in only two cases, a significant improvement over the results of Table 3, in which no clustering is considered.
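Choosing how many components to retain from the explained-variance percentages can be expressed as a small Python helper. The cumulative threshold is an assumed, illustrative criterion (the components here were selected by inspecting Figure 4), and the function name is hypothetical; the input values are the four percentages reported above.

```python
def components_for_threshold(explained_pct, threshold_pct):
    """Smallest number of leading components whose cumulative share reaches the threshold.

    Returns None if the listed components never reach the threshold.
    """
    cumulative = 0.0
    for count, pct in enumerate(explained_pct, start=1):
        cumulative += pct
        if cumulative >= threshold_pct:
            return count
    return None


reported = [76.14, 12.4, 4.1, 3.5]  # PC1 to PC4 (%), as reported in the text
print(components_for_threshold(reported, 85.0))  # 2
print(components_for_threshold(reported, 95.0))  # 4
```

A threshold of 85% would retain the two components used for the biplot, while 95% would retain the four components considered for the detailed analysis.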
Comparing these results with those of K-means in Table 4, it can be concluded that PCA alone is as effective as the K-means technique proposed in [20,23] in terms of coordination violations; nonetheless, K-means results in a lower operational time, which is a clear advantage. It is worth mentioning that PCA is not a clustering algorithm as such; however, using the graphical results for the first two components, the OSs can be grouped according to their closeness in the biplot, as shown in Figure 6. Table 7 presents the coordination settings obtained for each relay with the PCA technique.
Table 8 compares the results obtained with the proposed approach with those of the methodology presented in [20,23]. When K-means is used in conjunction with PCA, no violations are obtained. This represents an important improvement over the previous results, in which each technique was applied separately. Although using K-means alone results in a lower operational time, it yields an infeasible coordination. Since ensuring that the coordination scheme has no violations is a top priority for engineers, it can be concluded that the proposed approach performs better. Table 9 presents the coordination settings of all relays obtained with the proposed approach.

Conclusions
Modern distribution networks include MGs that exhibit several operative scenarios due to topology changes, which makes their protection coordination a difficult task. This paper presented a novel approach for the optimal coordination of directional OCRs in MGs that integrate DG and feature several operative modes. The operational modes are clustered by means of the K-means and PCA techniques, and the coordination problem is then solved for each cluster with a GA. Several tests were carried out on an IEC benchmark microgrid. When K-means and PCA were tested separately, coordination was not achieved in all scenarios; in contrast, the combination of both techniques resulted in successful coordination for all clusters of operative scenarios. These results demonstrate the effectiveness and applicability of hybridizing the proposed techniques and open the possibility of exploring new approaches to the optimal protection coordination problem in modern MGs.