1. Introduction
Maintaining transient stability of the electric power system remains one of the fundamental, and increasingly important, preconditions for the reliable operation of high-voltage electric grids [1]. This crucial ability still rests heavily on the timely and correct operation of the relay protection of individual power generators [2]. The emphasis is on correct operation, which presumes proper setting and coordination of the different generator relay protection functions and the absence of false activations. The most significant here are the underimpedance (21G) and the out-of-step (78) relay protections, where correct operation above all precludes spurious generator trips during (stable) power swings [3]. Accurate detection of power swings, and correct distinction between stable and unstable ones, by these relay protection functions is seen as a critical precondition for valid generator tripping. With reduced system inertia, this task becomes even more important [4]. Consequently, the intention of this paper is to enhance these generator relay protections by supporting their traditional decision-making logic (where it exists) with a machine learning based classifier, in order to strengthen them against erroneous trips due to (stable) generator swings. If the power swing detection logic is absent from the relay protection function, then the classifier can supply it. The classifier will specifically support relay decisions for blocking the underimpedance protection during stable swings. It will also expedite trip decisions of the out-of-step protection during unstable swings. It needs to be stated here that the implementation and configuration of these generator protections are in accordance with the IEC standards and follow European practice, which differs considerably from that found in the United States (US); see, e.g., the Siemens 7UM62 [5] and ABB REG 670 [6] manuals for more information. This also means that both of these protections feature polygonal trip characteristics (and not the MHO-based circles used extensively in the US). The implementation considers numerical protection relays from Siemens AG.
Generator underimpedance relay protection (also known as impedance or distance protection, ANSI 21G) is intended to protect the generator and its step-up transformer, along with everything in between (i.e., high-current bus duct connections and equipment), from damage due to short-circuit events. In its standard two-zone configuration, it protects (part of) the generator’s stator winding and both windings of the step-up transformer from phase-to-phase short-circuit faults [5]. It supplements the differential protection of the generator and its step-up transformer (ANSI 87T) and serves as its main backup protection. Unlike differential protection, it can further serve as a backup protection for the incident transmission lines. However, since it has open-ended protection zones, it needs to be coordinated with the relay protection of the transmission lines. The extent of the protection coverage (i.e., reach) for different zones (and different fault types), along with their backup protection scope, can vary considerably between implementations. Furthermore, this protection is considered superior to the various generator overcurrent protection schemes (e.g., overcurrent with undervoltage seal-in) and is preferred for large machines. However, it needs to be carefully guarded against spurious generator trips during stable swings initiated by faults within the power system [7]. This is where the proposed classifier comes into play, by strengthening the reliability (and credibility) of the decisions made by the traditional swing detection logic of the underimpedance protection.
Generator out-of-step protection (also known as pole-slip protection, ANSI 78) is intended not only to protect the generator from damage caused by pole-slipping events, in which case it needs to disconnect the generator from the rest of the power system in a timely manner, but also to prevent instability from spreading to other portions of the system. Namely, a sudden and unexpected loss of generating capacity during a disturbance can precipitate a major power system outage. In its standard two-zone configuration, out-of-step protection fully covers the generator and its step-up transformer and extends into the power system [5,6]. The extent of the out-of-step protection reach into the power system, and the subsequent coverage of the incident transmission lines, depends primarily on the zone settings. This protection, first and foremost, needs to make consistently reliable and trustworthy decisions regarding unrecoverable (i.e., unstable) generator swings [5]. As such, this function can benefit from the support of the proposed classifier as well.
Machine learning (ML) has been applied for supporting and extending (and, in some cases, even completely replacing) various traditional protection functions [8,9]. A general review of power system protection with the aid of ML techniques is presented in [10]. More specifically, multidimensional relay protection, based on a support vector machine, was proposed in [11]. Overcurrent relays were replaced by XGBoost classifiers in [12]. Li et al. [13] recommended an ML-based identification of the impedance trajectory for the generator out-of-step protection. Another approach to adaptive generator out-of-step protection, this one incorporating phasor measurement units (PMU), was suggested in [14]. Furthermore, detecting the loss-of-excitation condition of synchronous generators has been tackled by applying ML methods as well, e.g., [15,16,17]. Detection of islanding by means of ML was proposed by Meera et al. [18]. Also, ML has been applied in connection with the distance protection of high-voltage transmission lines (TL), e.g., [19,20], including the use of artificial neural networks [21]. However, very few papers deal with ML support of the generator underimpedance relay protection.
This paper introduces a support vector machine (SVM) classifier for detecting generator swings (based on the generator transient stability assessment). It is trained and tested on the IEEE New England 10-generator benchmark power system. It uses PMU-type signals taken from the generator terminals and is interfaced with the generator relay protection logic. It reinforces relay decisions regarding blocking of the underimpedance protection during stable generator swings. Likewise, it enhances the out-of-step relay protection by allowing faster generator tripping for unrecoverable swings. Numerical relays that do not possess power swing detection logic (such as the REG 670 from ABB) can be retrofitted with this classifier. We believe that the proposed approach presents a novel contribution to the state of the art of generator relay protection.
The paper is organized in the following manner. Section 2 first briefly introduces these two impedance-based generator protection functions, then describes the classifier building process and its interfacing with the generator protection. Application of the classifier in the IEEE New England 10-generator power system is provided in Section 3, which is followed by the conclusion in Section 4.
3. IEEE New England 10-Generator Power System Example
The well-known IEEE New England 10-generator power system is taken as a basis for classifier training and testing. This power system features ten synchronous machines, in addition to transmission lines, three-phase transformers and loads [26]. One of the generators serves as a surrogate for the external power system. Each machine includes an excitation system control, automatic voltage regulator, power system stabilizer and turbine governor control. Loads are represented as simple R-L-C branches. Transmission lines are modeled as three-phase π-section blocks. A complete electro-mechanical transient simulation of the power system is carried out for different load levels and three different short-circuit types scattered throughout the network; see [27] for additional information. A total of 9360 time-domain simulations were performed, which created the dataset.
As an example of the simulation output, two time-domain signals, of the generator voltage and of the rotor angle deviation, are presented in Figure 6, which demonstrates the strong influence of the incident TL distance relay trip time on the transient stability of the generator. The distinction between stable and unstable generator swings is clearly visible from the rotor angle deviation. In addition, Figure 7 graphically presents a trajectory of an unstable generator swing, following a single-phase short circuit on the incident TL. A trajectory in the P–Q plane can be readily transformed into the R–X plane while retaining its circular shape.
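The mapping between the two planes can be sketched numerically. The following Python snippet is only an illustration and not part of the simulation model: it assumes per-unit quantities and a constant terminal voltage magnitude (during an actual swing the voltage varies), and the function names `pq_to_rx` and `image_of_pq_circle`, as well as the circle parameters, are hypothetical. It uses Z = |V|²/S*, with S = P + jQ, which amounts to a geometric inversion scaled by |V|²; such a mapping sends any circle that does not pass through the origin onto another circle.

```python
import math


def pq_to_rx(p, q, v=1.0):
    """Map an operating point (P, Q) to the apparent impedance (R, X).

    From S = P + jQ = V * conj(I) and Z = V / I it follows that
    Z = |V|^2 / conj(S). All quantities are in per unit.
    """
    s_conj = complex(p, -q)
    if s_conj == 0:
        raise ValueError("impedance is undefined for P = Q = 0")
    z = v ** 2 / s_conj
    return z.real, z.imag


def image_of_pq_circle(pc, qc, radius, v=1.0):
    """Centre and radius of the R-X image of a P-Q circle (constant |V|).

    The mapping is a geometric inversion scaled by |V|^2, so a circle that
    does not pass through the origin maps onto another circle.
    """
    k = pc ** 2 + qc ** 2 - radius ** 2
    if k == 0:
        raise ValueError("circle passes through the origin; image is a line")
    return (v ** 2 * pc / k, v ** 2 * qc / k), v ** 2 * radius / abs(k)


# Hypothetical P-Q circle (per unit), sampled at 12 points.
points = [
    (1.0 + 0.5 * math.cos(2 * math.pi * i / 12),
     0.3 + 0.5 * math.sin(2 * math.pi * i / 12))
    for i in range(12)
]
(rc, xc), rho = image_of_pq_circle(1.0, 0.3, 0.5)

# Distance of each mapped point from the predicted circle in the R-X plane.
deviations = [
    abs(math.hypot(r - rc, x - xc) - rho)
    for r, x in (pq_to_rx(p, q) for p, q in points)
]
```

With these hypothetical values, all entries of `deviations` are at rounding-error level, i.e., the twelve mapped points lie on a single circle in the R–X plane.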
Features extracted from the dataset of systematic time-domain simulations (see Section 2.3) can be visually depicted using low-dimensional embedding. We show the results of this process here by using kernel principal component analysis (kPCA) to project the original feature space into a three-dimensional embedding [25]. Consequently, Figure 8 graphically depicts different short-circuit events (from all parts of the network) in 3D (left side) and 2D (right side) coordinate systems of principal components. Furthermore, SVM classifier predictions (i.e., stable and unstable generator swings) arising from these events are also graphically depicted using the same kPCA embedding and displayed in Figure 9. By comparing Figure 8 and Figure 9, it can be clearly seen that the main source of unstable generator swings are the three-phase (SC 3) short-circuit events. At the same time, single-phase (SC 1) short-circuit events are far less prone to cause the generator’s loss of stability. This is expected, since three-phase faults are the most severe disturbances from the transient stability standpoint.
The classifier’s performance is gauged by means of the scores obtained from the test set. For example, the Matthews correlation coefficient of the classifier, using the 3–fold cross validation on the test set, yields:
. In addition, the classifier’s performance can be further examined by contrasting its precision and recall metrics [25]. Precision is defined as the ratio of the number of true positives to the number of predicted positives. Recall is defined as the ratio of the number of true positives to the actual number of positive cases. These two metrics pull in opposite directions, so the classifier has to balance them. Table 1 presents the classifier’s individual precision and recall measures obtained from the test set. Although these values can vary between different runs, due to the randomness involved in data shuffling and model training (i.e., random search for optimal hyperparameters), it can be seen from the presented results that the proposed SVM classifier obtained high scores across several important metrics. We were also able to consistently reproduce this level of performance between runs.
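The definitions above can be written out as a short Python sketch. It is only an illustration: the label vectors are hypothetical (they are not results from this study), the positive class is assumed here to be the unstable swing, and the helper names are our own. The Matthews correlation coefficient is computed from its standard confusion-matrix definition.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """True positives over all predicted positives."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """True positives over all actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels (assumption: 1 = unstable swing, 0 = stable swing).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
```

In this toy case one unstable swing is missed (a type II error) and one stable swing is flagged as unstable (a type I error), so precision and recall coincide.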
Additionally, in order to examine the influence of the dataset size on the classifier’s performance, we used only 1000 (stratified) random samples from the original dataset (again 80% for training and 20% for testing), which constitutes only approximately 10% of the original data. Training the classifier with only 800 samples yields a Matthews correlation coefficient of
(from the 3–fold cross validation on the test set of 200 samples). It also yields an area under the receiver operating characteristic (ROC) curve of
.
Table 2 presents more complete results obtained by using this small dataset. It can be seen that, even with only approximately 10% of the original dataset, the classifier was still able to achieve very good performance with relatively high scores. This is a reassuring finding, which means that a dataset can be purposefully built each time, as part of the relay settings calculations, using simulation results from the (extended) machine stability studies.
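The stratified subsampling and the 80/20 split can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the class proportions (20% unstable cases) are hypothetical, the helper `stratified_sample` is our own, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_sample(indices, labels, n_samples, rng):
    """Draw about n_samples indices while preserving class proportions."""
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # Rounding may shift the total by one sample for awkward ratios.
        k = round(n_samples * len(members) / len(indices))
        chosen.extend(rng.sample(members, k))
    rng.shuffle(chosen)
    return chosen


# Hypothetical labels: 9360 cases, of which 20% are unstable (class 1).
labels = [1] * 1872 + [0] * 7488
rng = random.Random(42)

subset = stratified_sample(range(len(labels)), labels, 1000, rng)
test_idx = stratified_sample(subset, labels, 200, rng)  # 20% for testing
test_set = set(test_idx)
train_idx = [idx for idx in subset if idx not in test_set]  # remaining 80%
```

Both the 1000-sample subset and the 200-sample test set keep the class ratio of the full dataset, which matters when unstable swings are the minority class.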
Fine-tuning of the classifier’s decision probability threshold, as a final step in the training process, can be carried out using the precision–recall curves [25]. For that purpose, Figure 10 presents precision and recall curves, obtained from predictions on the test set of 200 samples, as a function of the decision probability threshold. It can be seen that as precision increases, recall inevitably decreases, and vice versa. It is desirable for the SVM threshold supporting the underimpedance protection to have higher recall (fewer type II errors), while that of the out-of-step protection should have higher precision (fewer type I errors). These different thresholds, for the same SVM classifier, can be independently determined from the precision–recall curves (where the threshold is related to the selectable parameters “SVM Thr. Z<” and “SVM Thr. O/S” from Figure 4 and Figure 5, respectively). The selected probability threshold level, at the same time, defines the classifier’s confidence score regarding the class predictions. For example, setting the probability threshold level at 0.8, in relation to the out-of-step protection, means that the classifier will report unstable power swing cases with a confidence of at least 80%. Unstable power swings predicted with a confidence lower than 80% would not influence out-of-step relay decisions. This ensures that only very confident predictions can interact with the traditional relay protection logic.
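Selecting two thresholds from one set of classifier probabilities can be sketched as below. The snippet is illustrative only: the labels and probabilities are hypothetical, the positive class is assumed to be the unstable swing, and the selection rule (highest threshold that still meets a recall target, lowest threshold that already meets a precision target) is one plausible reading of the procedure rather than the exact method used here. The names `thr_z` and `thr_os` merely echo the relay parameters and are our own.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are called positive."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and truth == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif truth == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_thresholds(y_true, scores, grid, min_recall, min_precision):
    """Return (thr_z, thr_os) from one set of classifier probabilities.

    thr_z:  highest grid value whose recall is still >= min_recall.
    thr_os: lowest grid value whose precision is already >= min_precision.
    Either entry is None when no grid value meets its target.
    """
    recall_ok = [t for t in grid
                 if precision_recall_at(y_true, scores, t)[1] >= min_recall]
    precision_ok = [t for t in grid
                    if precision_recall_at(y_true, scores, t)[0] >= min_precision]
    thr_z = max(recall_ok) if recall_ok else None
    thr_os = min(precision_ok) if precision_ok else None
    return thr_z, thr_os


# Hypothetical test-set labels (1 = unstable) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90, 0.95]
grid = [i / 10 for i in range(1, 10)]
thr_z, thr_os = pick_thresholds(y_true, scores, grid, 0.95, 0.95)
```

For this toy data the recall-oriented threshold comes out lower than the precision-oriented one, which mirrors the trade-off visible in the precision and recall curves.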
The influence of the decision threshold value on the type I and type II errors is further presented graphically, by means of the confusion matrices in Figure 11, again for this test set of 200 samples. If the threshold is selected at the intersection of the precision and recall curves, the numbers of type I and type II errors are exactly the same, since equal precision and recall imply equal numbers of false positives and false negatives. The confusion matrix, at the same time, enables deriving several other useful metrics, such as: Youden’s J–statistic, Jaccard’s index, F–measures (
,
), Fowlkes–Mallows index, and others. These can be employed as a means of further examining the classifier’s performance. For example, the Jaccard’s score on the test set of 200 samples (using 3–fold cross validation) yields:
.
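The metrics listed above follow directly from the four confusion-matrix counts, as the following Python sketch shows. The counts are hypothetical and chosen only for illustration (they are not results from this study), the function name `derived_metrics` is our own, and the default `beta=2.0` for the F-measure is an arbitrary assumption.

```python
import math


def _ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def derived_metrics(tp, fp, fn, tn, beta=2.0):
    """Several scores computed from the four confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    b2 = beta ** 2
    return {
        "youden_j": recall + specificity - 1.0,
        "jaccard": _ratio(tp, tp + fp + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "f_beta": _ratio((1 + b2) * precision * recall, b2 * precision + recall),
        "fowlkes_mallows": math.sqrt(precision * recall),
    }


# Hypothetical counts for a 200-sample test set with FP == FN.
metrics = derived_metrics(tp=40, fp=10, fn=10, tn=140)
```

Because the numbers of false positives and false negatives are equal in this example, precision and recall coincide, and so do the F-measures and the Fowlkes–Mallows index.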
In addition, a confusion matrix forms a basis for the so-called “net benefit” analysis, which weighs the relative harms (i.e., costs) of false-positive and false-negative predictions across different threshold probabilities; see [28] for more information. The net benefit of the presented SVM model, for any probability threshold level $p_t$, can be calculated from the following relation:
$$\mathrm{NB}(p_t) = \frac{TP}{N} - \frac{FP}{N} \cdot \frac{p_t}{1 - p_t}$$
with $TP$ and $FP$ being the numbers of true-positive and false-positive predictions at that threshold and $N$ being the total number of samples. The net benefit can take values from minus infinity up to the incidence of the positive class. A model provides utility only when its net benefit is greater than zero. If the net benefit is calculated for different (ascending) threshold values, the resulting curve (which relates the net benefit values to the associated thresholds) is known as a decision curve. It can be used to assess the model’s usefulness, to select the appropriate probability threshold level and to compare different models [28]. Figure 12 presents a decision curve for the SVM classifier at hand, obtained from the small dataset of 1000 samples. It can be seen that the net benefit of the model is positive.
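A decision curve can be computed with a few lines of Python. The sketch below assumes the standard decision-curve definition of net benefit, NB = TP/N − (FP/N)·p_t/(1 − p_t), where p_t is the threshold probability; the symbol names, the helper functions and the data are our own hypothetical choices, and the positive class is assumed to be the unstable swing. The "treat all" reference (every sample labelled positive) is included because decision curves are usually read against it.

```python
def net_benefit(tp, fp, n, p_t):
    """Net benefit at threshold probability p_t (0 <= p_t < 1)."""
    if not 0.0 <= p_t < 1.0:
        raise ValueError("p_t must lie in [0, 1)")
    return tp / n - (fp / n) * p_t / (1.0 - p_t)


def decision_curve(y_true, scores, thresholds):
    """Net benefit of the model at each threshold, as (p_t, NB) pairs."""
    n = len(y_true)
    curve = []
    for p_t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= p_t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= p_t and y == 0)
        curve.append((p_t, net_benefit(tp, fp, n, p_t)))
    return curve


def treat_all(y_true, p_t):
    """Reference strategy: label every sample as positive."""
    positives = sum(y_true)
    return net_benefit(positives, len(y_true) - positives, len(y_true), p_t)


# Hypothetical labels (1 = unstable) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90, 0.95]
curve = decision_curve(y_true, scores, [0.2, 0.5, 0.8])
```

A model is worth using at a given threshold only where its curve lies above both zero and the "treat all" reference; in this toy example that holds at all three thresholds.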
Finally, it should be stated that the SVM classifier was trained on synthetic data generated by numerical simulations. Considering the importance of generator protection for the power system, further testing of the classifier is recommended, preferably using actual (measured) generator data. This, however, could be difficult to achieve, since the participation (and permission) of the machine owner would be required.
4. Conclusions
The massive integration of power electronic converters, along with the continuous displacement of synchronous generators, is fundamentally changing the dynamic characteristics of power systems, which imposes new challenges on the stable and resilient operation of power grids. Since the generator relay protections directly face these challenges, strengthening their decision-making logic may be seen as a prudent step toward securing system stability. Hence, this paper introduced a support vector machine based (binary) classifier for supporting the synchronous generator underimpedance (21G) and out-of-step (78) relay protection functions, which are based on the IEC standards and European practice. Both protection functions are impedance-based and feature polygonal characteristics (i.e., they do not employ MHO-type circles, which are standard practice in the US). As a side note, Siemens’ implementation of the underexcitation protection (i.e., loss-of-field protection, ANSI 40) does not feature an MHO-type offset circle either (which is again very different from the US practice); instead, it is based on a special three-line characteristic, presented in the admittance (G, B) plane and applied directly on top of the generator’s capability curve.
The proposed classifier was trained on a dataset of PMU-type signals obtained from time-domain numerical simulations of the IEEE New England 10-generator test power system, following a standard practice used in many research papers. It is proposed as a support for the internal relay logic, for blocking the underimpedance protection during stable power swings. It is also intended as a support for faster generator tripping, by the out-of-step protection, during unstable generator swings. In both cases, it is meant to reinforce the second (i.e., overreaching) zone of the underimpedance and out-of-step protection functions, since these zones are “looking” into the network. If the underimpedance protection features a third zone, the classifier could be used with it as well. Furthermore, if the underimpedance relay protection function does not provide internal power swing detection logic (as is the case with, e.g., the REG 670 from ABB), the classifier could be used as an independent source of an external swing detection (binary) signal for blocking the protection during stable swings.
It is important to train the classifier using transient simulation data obtained directly from the machine stability studies, which are often performed as part of the protection relay settings calculation. This dataset does not have to be very large, as shown previously, since the SVM classifier is easy to train (i.e., it has only two hyperparameters). This will also ensure that the classifier is familiar with the particular swing trajectories of the machine at hand. As part of this training process, the classifier can also be fine-tuned by selecting decision probability threshold levels that are appropriate for minimizing prediction errors.
Finally, considering the importance of generator protections for power system stability, particularly in the evolving circumstances arising from the large-scale integration of renewable energy sources, further testing of the classifier is recommended. Hence, future work envisions, among other things, increasing the class imbalance with new network contingencies and introducing different types of (artificial) noise and measurement errors into the dataset, in order to test the classifier’s robustness and performance under these adverse conditions. Also, testing the classifier with a changing generation mix and reduced system inertia is seen as another important future research direction.