An Improved Power System Transient Stability Prediction Model Based on mRMR Feature Selection and WTA Ensemble Learning

Abstract: Fast online transient stability assessment (TSA) is essential for maintaining the stable operation of power systems. However, existing transient stability assessment methods suffer from unsatisfactory prediction accuracy, difficult applicability, or a heavy computational burden. In light of this, an improved high-accuracy power system transient stability prediction model is proposed, based on min-redundancy and max-relevance (mRMR) feature selection and winner-take-all (WTA) ensemble learning. Firstly, the contributions to transient stability of four different series of raw sampled data from the three time stages, namely pre-fault, during-fault and post-fault, are compared. The new feature of generator electromagnetic power is introduced and compared with three conventional types of input features through a support vector machine (SVM) classifier. Furthermore, the two most contributive types of input features are obtained by the mRMR feature selection method. Finally, the prediction results of the electromagnetic power of generators and the voltage amplitude of buses are combined using the WTA ensemble learning method, yielding an improved transient stability prediction model with higher accuracy for unstable samples and no decrease in overall prediction accuracy. The real-time data collected by wide area monitoring systems (WAMS) can be fed into this model for fast online transient stability prediction; the results can also provide a basis for future emergency control decision-making in power systems.


Introduction
Social development and economic growth demand increasingly secure and reliable power supplies. With the continuous and rapid growth of electricity demand, power systems are developing towards large scale [1,2], high voltage, regional grid interconnection, hybrid AC/DC operation, long-distance large-capacity transmission and high renewable energy penetration [3][4][5]. Power system topologies and operating characteristics are becoming increasingly complex and changeable, which brings severe challenges to the secure and stable operation of power systems. The non-linearity and rapid development of the electromechanical transient process make it difficult to predict the system transient stability quickly and accurately after a fault has been cleared [6]. In recent years, many large-scale power system blackouts have occurred worldwide, which makes fast online transient stability assessment and emergency control more urgent [7][8][9][10].

Materials and Methods
Power system stability can be classified into rotor angle stability, voltage stability, and frequency stability. Rotor angle stability can be further divided into small-disturbance stability and transient stability. The definitions and classifications can also be found in Reference [30]. Transient stability assessment, one of the most important issues in guaranteeing the secure and stable operation of power systems, belongs to the short-term rotor angle stability category [34]. Therefore, only the rotor angle stability in the power system electromechanical transient process is studied here; voltage stability, frequency stability and medium- and long-term stability are not considered. The improved power system transient stability prediction model is realized through the following four steps: data preparation, multi-input feature analysis, mRMR feature selection and WTA ensemble learning modeling.

Three Time Stages Related to Transient Stability
The electromechanical transient stability of the power system is related to three time stages: pre-fault (steady state), during-fault and post-fault, as shown in Figure 1. The pre-fault stage reflects the initial operating state of the system, the during-fault stage reflects the severity of the fault disturbance, and the post-fault stage reflects the dynamic performance of the system after the fault is cleared.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 17 classifiers with different feature sets are combined by the winner take all (WTA) method, to establish a high-precision and conservative transient stability prediction model.

Since the power system is a complex non-linear system, its transient stability is related not only to the post-fault information, but also to the steady-state operating point of the power system and the severity of the disturbance during the fault. To analyze the contributions of the variables in these three time stages to the system transient stability, the three time stages are grouped into four series of sampled data, as shown in Figure 2. These four series are compared on the IEEE 39-bus system shown in Figure 3, using the dual-axis generator model, the IEEE DC Exciter Type 1 model, and the constant-impedance load model. The simulation software is the MATLAB toolbox PST 3.0 [35].
The transient instability criterion is set as follows: if the maximum rotor angle difference between any pair of generators exceeds 180° at the end of the transient simulation (e.g., 5 s), the case is recognized as unstable [36]. Figure 4 shows the bus voltage amplitude curves of a typical transiently stable sample (in blue) and an unstable sample (in red), with the four series of sampled data. Here, the fault occurs at 0.10 s and is cleared at 0.20 s in all cases. For the stable sample, the load is 0.9 times the base load level, the faulted line is 1-39, and the fault location is 10% from the bus-39 side. For the unstable sample, the load is 1.1 times the base load level, the faulted line is 21-22, and the fault location is 40% from the bus-21 side.
It can be seen from Figure 4 that the first three series cannot fully reflect the dynamic behavior of the power system, due to the loss of information. To better characterize power system transient performance, it is necessary to use sampled data covering all three time stages, that is, pre-fault, during-fault and post-fault, as shown in Series 4 of Figure 2. It should also be noted that in actual power systems, the typical sampling interval for the fundamental phasor of the trajectory variables is 10 ms for the PMUs in the wide area measurement system (WAMS).
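The instability criterion stated above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the rotor angles are assumed to be unwrapped values in degrees at the end of the simulation, and the 1/0 labels follow the stable/unstable convention used later for the WTA model.

```python
def is_unstable(final_angles_deg, threshold_deg=180.0):
    """Return True if any pair of generators differs by more than the threshold.

    final_angles_deg: rotor angles (degrees, unwrapped) of all generators at
    the end of the transient simulation. The largest pairwise difference is
    simply max - min, so no explicit loop over pairs is needed.
    """
    return max(final_angles_deg) - min(final_angles_deg) > threshold_deg


def stability_label(final_angles_deg):
    """Map the criterion to a class label: 1 = stable, 0 = unstable."""
    return 0 if is_unstable(final_angles_deg) else 1
```

For example, `stability_label([10.0, 50.0, 200.0])` gives 0, because the largest pairwise difference (190°) exceeds 180°.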

SVM Prediction Model
It can be seen that the input features in Figure 4 are not linearly separable, because some data points of the stable and unstable voltage curves intersect with each other. SVM is able to map linearly inseparable data in a low-dimensional space to a high-dimensional space in which they become linearly separable, through kernel functions. Figure 5 shows the mapping process visually. The basic principle of SVM is shown in Equation (1).
where ω is the weight vector of the hyperplane; b is the threshold value; ζ is the relaxation (slack) variable; C is the penalty factor for the relaxation variable; N is the number of training samples; ϕ(·) is the mapping function from the low-dimensional space to the high-dimensional space, and the kernel function is chosen as the radial basis function (RBF); X_i (i = 1, ..., N) are the support vectors; y_i is the output of the i-th sample; f(·) is the fitted SVM model, where α_i is the coefficient of the i-th sample.
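Equation (1) itself is not reproduced in this text. For orientation, the standard soft-margin SVM formulation that matches the symbols defined above is sketched below. It is the textbook form, not necessarily the exact expression used by the authors; the kernel symbol K, the RBF width γ and the ±1 label encoding are assumptions (the stability labels elsewhere in this paper are 1 and 0).

```latex
\begin{aligned}
& \min_{\omega,\,b,\,\zeta}\ \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{N}\zeta_i \\
& \text{s.t.}\quad y_i\left(\omega^{\mathsf{T}}\phi(X_i) + b\right) \ge 1 - \zeta_i,\qquad \zeta_i \ge 0,\qquad i = 1,\dots,N \\
& f(X) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i\, y_i\, K(X_i, X) + b\right),\qquad
  K(X_i, X) = \exp\left(-\gamma\,\lVert X_i - X\rVert^{2}\right)
\end{aligned}
```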

Analysis of Multiple Input Features
In the electromechanical transient stability analysis, the transient stability criterion is determined by the maximum rotor angle difference between each pair of generators. The rotor angle dynamic behavior can be influenced by many factors, and the detailed feature extraction for transient stability assessment is analyzed below.

Rotor Motion Equation
According to the rotor motion equation, Equation (2), the rotor angle of a generator is directly affected by its rotor speed, electromagnetic power and mechanical power.
where δ is the rotor angle; ω is the rotor speed; P_m is the mechanical power; P_e is the electromagnetic power; T_j is the inertia time constant; and D is the damping coefficient of the generator. The electromagnetic power can be calculated by Equation (3) during the time-domain simulation.
where V_d and V_q are the d-axis and q-axis voltages, respectively; I_d and I_q are the d-axis and q-axis currents, respectively.
Due to the rapid development of the transient process, it is assumed that the mechanical power P_m does not change during this short process. The damping coefficient D is also neglected in this study. When a fault occurs, the electromagnetic power P_e of the generators changes rapidly with the voltage and current variables; the power imbalance between P_m and P_e then causes the rotor speed ω to change. Changes in ω in turn affect the rotor angle δ; that is, the transient stability of the power system can be partially traced back to P_e. Therefore, besides the existing input features [27], namely the rotor angle δ of generators, the rotor speed ω of generators and the voltage amplitude V of buses, the electromagnetic power can also be considered an important factor affecting transient stability.
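Equations (2) and (3) are not reproduced in this text. A commonly used per-unit form that is consistent with the symbols defined above is sketched below. The synchronous angular frequency ω_0 and the per-unit speed convention are assumptions, and the authors' exact expressions may differ.

```latex
\begin{aligned}
\frac{d\delta}{dt} &= \omega_0\,(\omega - 1) \\
T_j\,\frac{d\omega}{dt} &= P_m - P_e - D\,(\omega - 1) \\
P_e &= V_d I_d + V_q I_q
\end{aligned}
```

With P_m held constant and D neglected, as assumed in this study, the speed deviation is driven entirely by the imbalance P_m − P_e, which is why P_e is a natural candidate input feature.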

Separability of Electromagnetic Power and 3 Traditional Features
Based on the analysis above, four types of input features [δ, ω, V, P_e] can be used to train the SVM transient stability prediction model. Figure 6 shows the simulation curves of the four types of features, extracted from typical stable and unstable samples of the IEEE 39-bus system.
It can be seen from Figure 6d that the P_e curves of the transiently stable and unstable samples are significantly different, similar to the δ, ω and V features in Figure 6a-c. Therefore, it is feasible to use P_e as a novel input feature.

Feature Selection Based on mRMR Technique
In large-scale power systems, too many input features of the SVM model will cause a heavy computational burden, which limits the online applicability of the machine learning model. After briefly comparing the 4 types of input features, further feature selection based on mRMR will be conducted in this section.
As shown in Figure 7 (for the IEEE 39-bus system), the transient stability prediction time increases with the feature size. Therefore, when combining multiple features in the SVM classifier, it is necessary to reduce the number of input features.
In the field of information theory, mutual information is commonly used to measure the degree of correlation between discrete random variables, as shown in Equation (4).
where p(x, y) is the joint probability density function of X and Y; x_1 and x_2 are the value spaces of X and Y, and p(x) and p(y) are their marginal probability density functions, respectively. Unless otherwise specified, the base of the logarithm is 2.
Mutual information calculation requires discrete variables, so the continuous-valued trajectory variables must first be discretized. To give the SVM learning model better fitting performance, the data are also normalized to the range [0,1].
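As a minimal sketch of the preprocessing and of the standard discrete mutual information, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], the following Python code uses only the standard library. The function names, the min-max normalization and the equal-width binning with 10 bins are assumptions made for illustration; the source does not state which discretization scheme was used.

```python
import math
from collections import Counter


def normalize(values):
    """Min-max normalize a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins=10):
    """Map values in [0, 1] to equal-width bin indices 0 .. n_bins - 1."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in bits) of two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # empirical joint counts
    count_x = Counter(xs)          # empirical marginal counts of X
    count_y = Counter(ys)          # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi
```

Two identical binary sequences with equal class frequencies share 1 bit of information, while two independent ones share 0 bits.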
As mentioned in Section 2.1.1, trajectory variables are used as input features; each group of input features contains 10-dimensional discrete data. Although each dimension of the data contains certain information, if the mutual information analysis is performed directly on each dimension of the data using Equation (4), the original physical meaning of trajectory variables will be destroyed. Therefore, each series of trajectory variables, namely, the 10-dimensional trajectory data, are regarded as a single group of input features for mutual information calculation, due to the temporal correlation among them. Taking the voltage amplitude of buses as an example, for the bus i and bus j, the sampled 10-dimensional voltage amplitude data are regarded as two vectors, as shown in Equation (5).
Thus, a joint probability distribution of the two groups of input features is proposed for mutual information calculation of the time-correlated trajectory data, as shown in Equation (6).
Based on the definition of mutual information, the feature selection method mRMR [37] is utilized to obtain an optimal feature subset, which has minimum redundancy among its features and maximum relevance to the stability result. Consider a dataset D = {x_1, ..., x_N, y} with N groups of input features and the stability label vector y. Assuming that S is a subset of D, the redundancy of the subset can be calculated by Equation (7).
where |S| is the number of feature groups contained in the subset S. The correlation between the subset S and the target vector y is calculated by Equation (8).
Then, Equation (9) is taken as the optimization objective to find the optimal subset of input features, with less redundancy V_S and stronger relevance W_S.
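Equations (7) to (9) are not reproduced in this text. The usual mRMR definitions that match the symbols V_S and W_S are sketched below. The difference form of the objective is an assumption; a quotient form W_S / V_S is also common, and the source does not show which one the authors used.

```latex
\begin{aligned}
V_S &= \frac{1}{|S|^{2}} \sum_{x_i \in S}\ \sum_{x_j \in S} I(x_i; x_j) \\
W_S &= \frac{1}{|S|} \sum_{x_i \in S} I(x_i; y) \\
& \max_{S}\ \left( W_S - V_S \right)
\end{aligned}
```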
The computational burden of searching for the optimal feature subset directly is huge. Thus, the mRMR technique uses an incremental search algorithm to sort all feature groups and then select the optimal feature subset. The detailed process is as follows.

1. Define the set of selected feature groups as S.
2. Calculate the relevance between each group of input features x_i and the target y, and then select the group of input features x_(1) that is most relevant to the target according to Equation (10). The selected group x_(1) is added to the set S as the first input feature group.
3. Select the next group of input features x_(j) according to Equation (11), using the previously recorded features.
4. Add the feature group x_(j) selected in Step 3 to the set S, and then repeat Step 3 until all input features are sorted.
The final sorting result indicates that if a subset of N_1 (N_1 ≤ N) feature groups is selected as the input of the learning machine, the first N_1 feature groups in the set S form the optimal subset, which shows a stronger correlation with the target and less interior redundancy.
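The incremental search in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Equations (10) and (11) are not reproduced in this text, so the selection score is assumed to be relevance minus mean redundancy with the already selected groups, and each feature group is assumed to be represented by one hashable symbol per sample (for example, a tuple of the discretized trajectory points). The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) of two equally long discrete sequences."""
    n = len(xs)
    joint, count_x, count_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr_rank(features, target):
    """Sort feature groups by incremental mRMR search.

    features: dict mapping a group name to its per-sample symbols.
    target:   per-sample stability labels.
    Returns the group names in the order they were selected.
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    # Step 2: start with the group that is most relevant to the target.
    ranked = [max(remaining, key=lambda name: relevance[name])]
    remaining.remove(ranked[0])

    # Steps 3-4: repeatedly add the group with the best trade-off between
    # relevance to the target and redundancy with the groups already chosen.
    while remaining:
        def score(name):
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in ranked) / len(ranked)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

In a toy case where group "b" duplicates group "a" and group "c" carries complementary information, the search ranks "c" ahead of "b", because "b" is fully redundant once "a" has been selected.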
Since the bus V has a total of 39 groups of features, whereas the generator P_e has only 10 groups, the bus V appears to make a greater contribution to the transient stability prediction in the mRMR feature selection. On the other hand, the transient stability prediction results in Section 3.2 show that the classification accuracy of the generator P_e is higher than that of the bus V. Both types of features give better prediction results than the electromechanical variables δ and ω. Section 2.4 therefore combines these two superior types of features, V and P_e, to jointly form an improved transient stability prediction model.

High Accuracy Prediction Model Based on WTA Ensemble Learning
As shown in Section 3.2, although the prediction model based on the new feature of generator P_e reaches an accuracy of about 98.77%, it is necessary to establish a more accurate transient stability prediction model, especially one that accurately identifies unstable situations, in order to avoid loss of synchronism, cascading failures, or even large-scale blackouts.

Combined Features of Voltage Amplitude and Electromagnetic Power
In order to meet the requirement for fast and accurate online prediction, the two superior types of features with higher accuracy are selected according to the mRMR ranking result in Section 2.3, namely the generator electromagnetic power P_e and the bus voltage magnitude V.

WTA Ensemble Learning Model
As analyzed above, the overall prediction accuracies of P_e and V are relatively high, but the prediction accuracies for unstable samples are still low. In order to improve the conservativeness of the prediction model, the SVM learning machines based on P_e and V are taken as two sub-classifiers. The outputs of the two classifiers are then combined by the winner-take-all (WTA) ensemble learning method. When the prediction result of either sub-learning machine is unstable, the WTA module determines that the transient process is unstable; otherwise it is accepted as stable. The input feature of sub-learning machine 1 (M1) is V, and the input feature of sub-learning machine 2 (M2) is P_e. Stable samples have a label of 1 and unstable samples have a label of 0. The principle of the WTA ensemble learning model is expressed as Equation (12).
The WTA ensemble learning process can also be shown in Figure 8.
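The WTA decision rule described above can be written as a short Python sketch. The function names are hypothetical, and the rule is taken from the verbal description (Equation (12) is not reproduced in this text): with labels 1 = stable and 0 = unstable, the combined output is stable only when both sub-classifiers predict stable.

```python
def wta_predict(pred_m1, pred_m2):
    """Combine two sub-classifier outputs (1 = stable, 0 = unstable).

    The 'unstable' vote wins whenever either sub-classifier casts it,
    which makes the combined model conservative.
    """
    return 1 if (pred_m1 == 1 and pred_m2 == 1) else 0


def wta_predict_batch(preds_m1, preds_m2):
    """Apply the WTA rule sample by sample to two prediction lists."""
    return [wta_predict(a, b) for a, b in zip(preds_m1, preds_m2)]
```

For example, `wta_predict_batch([1, 1, 0, 0], [1, 0, 1, 0])` gives `[1, 0, 0, 0]`.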

As previously mentioned, the V and P_e curves reflect different dynamic characteristics of power systems. In other words, the transient stability prediction results of M1 and M2 can be regarded as relatively independent. Assuming that the error rates of M1 and M2 are ε_1 and ε_2, respectively, the error rate of the WTA model for unstable samples is given by Equation (13).
For stable samples, the prediction accuracy P_st of the WTA model is the product of the accuracies P_st^M1 and P_st^M2 of the two sub-classifiers. For unstable samples, the prediction accuracy P_um of the WTA model is 1 minus the product of the error rates 1 − P_um^M1 and 1 − P_um^M2. Therefore, the overall prediction accuracy P_total of the WTA model is 1 minus the ratio of the number of incorrectly predicted samples to the total number of samples (N_st + N_um), as shown in Equation (14).
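The accuracy relations described above can be checked numerically with a small sketch. It assumes, as the text does, that the two sub-classifiers err independently; the function name and the numbers in the usage note are illustrative and are not results from the paper.

```python
def wta_accuracy(p_st_m1, p_st_m2, p_um_m1, p_um_m2, n_st, n_um):
    """Expected WTA accuracies under the independence assumption.

    p_st_*: accuracies of M1 and M2 on stable samples.
    p_um_*: accuracies of M1 and M2 on unstable samples.
    n_st, n_um: numbers of stable and unstable samples.
    Returns (P_st, P_um, P_total).
    """
    # A stable sample is classified correctly only if both sub-classifiers agree.
    p_st = p_st_m1 * p_st_m2
    # An unstable sample is missed only if both sub-classifiers miss it.
    p_um = 1.0 - (1.0 - p_um_m1) * (1.0 - p_um_m2)
    # Overall accuracy: 1 minus the expected share of wrong predictions.
    wrong = n_st * (1.0 - p_st) + n_um * (1.0 - p_um)
    p_total = 1.0 - wrong / (n_st + n_um)
    return p_st, p_um, p_total
```

With illustrative inputs such as `wta_accuracy(0.99, 0.98, 0.90, 0.80, 800, 200)`, the accuracy on unstable samples rises above that of either sub-classifier, while the accuracy on stable samples drops slightly below both. This is the conservative trade-off the WTA rule is designed to make.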

Results
In order to verify the methods in Section 2, the sample series analysis, input feature extraction, mRMR feature selection, and transient stability prediction results based on WTA ensemble learning are described respectively as follows.

Sample Generation and Data Series Analysis with Traditional Three Features
The simulated data samples are generated as follows. The load level is set to 0.9, 1.0 or 1.1 times the base load (the generator outputs are adjusted proportionally). Three-phase short-circuit faults are applied on the selected 33 transmission lines (excluding transformers and islands), at positions from 10% to 80% of the line length, with an interval of 10%. The circuit breakers trip the faulted line 0.05 s, 0.10 s, 0.15 s or 0.20 s after the fault occurrence. A total of 3168 samples are obtained, of which two-thirds are randomly selected as training samples, and the rest are test samples.
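The sample count can be reproduced by enumerating the scenario grid described above: 3 load levels × 33 lines × 8 fault positions × 4 clearing times = 3168. The sketch below is an illustration only. The line indices are placeholders rather than the actual IEEE 39-bus line list, and the random seed and variable names are assumptions.

```python
import random
from itertools import product

load_levels = [0.9, 1.0, 1.1]                    # multiples of the base load
line_ids = list(range(1, 34))                    # placeholder indices for the 33 lines
fault_positions = [p / 100 for p in range(10, 90, 10)]  # 10% .. 80% of the line
clearing_times = [0.05, 0.10, 0.15, 0.20]        # seconds after fault occurrence

# Every combination defines one time-domain simulation case.
scenarios = list(product(load_levels, line_ids, fault_positions, clearing_times))

# Random two-thirds / one-third split into training and test sets.
rng = random.Random(0)                           # fixed seed, chosen arbitrarily
shuffled = scenarios[:]
rng.shuffle(shuffled)
n_train = len(shuffled) * 2 // 3
train, test = shuffled[:n_train], shuffled[n_train:]
```

This yields 2112 training cases and 1056 test cases.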
The sampling time step is 0.01 s. The sampled data of all three time stages include two consecutive points before the fault, two consecutive points immediately after the fault occurrence, and six consecutive points after the fault is cleared. The system variables originally selected as input features are the rotor angles of the 10 generators, the rotor speeds of the 10 generators and the voltage amplitudes of the 39 buses, all in per unit (p.u.). To verify the effectiveness of the corresponding four sampling series in Section 2.1.1, the basic prediction results of these three types of input features are compared in Figure 2.
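As an illustration of the 2 + 2 + 6 sampling pattern, the following sketch picks the ten sample indices from a trajectory and stacks them into one feature vector. How the fault and clearing instants align to the sampling grid is an assumption here, and the names `window_indices` and `build_feature_vector` are hypothetical.

```python
from typing import List, Sequence


def window_indices(fault_idx: int, clear_idx: int) -> List[int]:
    """Return 10 sample indices: 2 pre-fault, 2 during-fault, 6 post-fault.

    fault_idx is taken as the first sample after fault inception and
    clear_idx as the first sample after fault clearing (assumed alignment).
    """
    if fault_idx < 2 or clear_idx < fault_idx + 2:
        raise ValueError("not enough samples before the fault or during it")
    pre = [fault_idx - 2, fault_idx - 1]
    during = [fault_idx, fault_idx + 1]
    post = list(range(clear_idx, clear_idx + 6))
    return pre + during + post


def build_feature_vector(trajectories: Sequence[Sequence[float]],
                         fault_idx: int, clear_idx: int) -> List[float]:
    """Concatenate the 10 selected points of every trajectory variable."""
    idx = window_indices(fault_idx, clear_idx)
    return [traj[i] for traj in trajectories for i in idx]


# Toy data: 59 variables (10 + 10 + 39), 30 samples each at a 0.01 s step.
trajectories = [[float(t) for t in range(30)] for _ in range(59)]
vector = build_feature_vector(trajectories, fault_idx=10, clear_idx=15)
```

With 59 trajectory variables this gives a 590-dimensional input; adding the 10 electromagnetic power trajectories introduced later gives the 690 dimensions used in the mRMR analysis.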
In this paper, the SVM model used is libsvm2.0 [38], and the optimal penalty coefficient C and RBF kernel parameter of each SVM classifier are obtained by grid search [36] with 5-fold cross-validation on the training samples. The SVM classifiers are then retrained on the entire training set with the optimal parameters. Finally, the prediction accuracy of the SVM classifiers is evaluated on the test samples. The results for the three types of input features from all four sampling series are shown in Figure 9. It can be seen from Figure 9 that Series 4 has the highest prediction accuracy among the four series, exceeding 98.00% for all three types of input features, while the accuracies are lower than 97.35%, 97.35% and 97.60% for Series 1, 2 and 3, respectively. This means that a feature set containing all of the pre-fault, during-fault and post-fault stages can better characterize the transient behavior of the system. Therefore, in the following subsections, the input features include the 10 data points (sampled as in Series 4) of each trajectory variable from all three time stages.
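The parameter selection loop can be outlined independently of the SVM library. In the sketch below, `score_fn` stands for "train an RBF-SVM with (C, gamma) on the training fold and return the accuracy on the validation fold"; it is replaced by a toy function so that the example runs on its own. The grid ranges and all function names are assumptions, not the settings used in the paper.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int = 5) -> List[Split]:
    """Build k (train, validation) index splits with interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, val))
    return splits


def grid_search(score_fn: Callable[[float, float, List[int], List[int]], float],
                c_grid: Sequence[float], gamma_grid: Sequence[float],
                n_samples: int, k: int = 5) -> Tuple[float, float, float]:
    """Return (best mean CV accuracy, best C, best gamma) over the grid."""
    splits = k_fold_indices(n_samples, k)
    best = None
    for c, gamma in product(c_grid, gamma_grid):
        mean_acc = sum(score_fn(c, gamma, tr, va) for tr, va in splits) / k
        if best is None or mean_acc > best[0]:
            best = (mean_acc, c, gamma)
    return best


def toy_score(c: float, gamma: float, train_idx: List[int], val_idx: List[int]) -> float:
    """Placeholder for SVM training and validation; peaks at C=8, gamma=0.5."""
    return 1.0 - abs(c - 8.0) / 100.0 - abs(gamma - 0.5)


c_grid = [2.0 ** p for p in range(-1, 6)]       # 0.5 ... 32 (assumed range)
gamma_grid = [2.0 ** p for p in range(-4, 2)]   # 0.0625 ... 2 (assumed range)
best_acc, best_c, best_gamma = grid_search(toy_score, c_grid, gamma_grid, n_samples=2112)
```

After the search, the classifier would be retrained once on all training samples with the selected pair, as described in the text.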

Prediction Results of Four Input Features, Including the Proposed Electromagnetic Power Feature
As described in Section 3.1, the optimal penalty coefficients and RBF kernel parameters of the SVM classifiers can be acquired from the training samples with the features δ, ω, V and P_e, respectively. These four independent SVM classifiers can then be used to estimate the prediction accuracy of the different types of input features on the test samples. The results are shown in Table 1.
It can be seen in Table 1 that, among the four types of input features, the transient stability prediction accuracy of the feature P_e is the highest, reaching 98.77%, followed by that of the feature V. The features δ and ω have relatively lower accuracies because they may not change much within the 0.06 s period after the fault is cleared, which covers only six post-fault sampling points.

The prediction results of SVM are further compared with those of Back Propagation Neural Networks (BP-NN) and Random Forest (RF) in Table 2; the BP-NN algorithm is from [39], and RF uses the built-in function of MATLAB. It can be seen in Table 2 that the accuracy of SVM is higher than that of BP-NN and RF. Therefore, SVM sub-classifiers are mainly used in this study. Tables 3 and 4 further give the confusion matrices for the bus voltage amplitude and the electromagnetic power of the generators, which show the prediction results of the SVM classifier in more detail. In the tables, the recall rates denote the proportions of transiently stable samples and of unstable samples that were correctly predicted. It can be seen in Tables 3 and 4 that the recall rates of stable samples are relatively high, always above 99%, but the recall rates of the unstable samples are still not high enough. Therefore, the rest of this paper focuses on improving the prediction performance for unstable samples, that is, on improving the conservativeness of the model.
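For reference, the per-class recall rate used in the confusion matrices can be computed as in the sketch below. The labels (0 for stable, 1 for unstable) and the sample values are invented for illustration and are not taken from Tables 3 and 4.

```python
from typing import Dict, Sequence


def recall_by_class(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Fraction of samples of each true class that were predicted correctly."""
    recalls = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / total
    return recalls


# Toy example: 4 stable (0) and 4 unstable (1) samples, one unstable case missed.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]
recalls = recall_by_class(y_true, y_pred)
```

A missed unstable sample lowers only the recall of the unstable class, which is why overall accuracy can stay high while the unstable recall remains unsatisfactory.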

Optimal Input Features Selection by mRMR
The variables δ, ω and P_e of the 10 generators and V of the 39 buses constitute a 690-dimensional feature set (69 trajectory variables with 10 sampled points each). The 69 groups of trajectory variables are arranged in the following order: δ numbered 1-10, ω numbered 11-20, V numbered 21-59, and P_e numbered 60-69. The basic result sorted by mRMR is shown in Figure 10. It can be seen from Figure 10 that 7 of the top 10 groups of features most relevant to transient stability are bus voltage variables V (in blue), which indicates that V is a good feature type for assessing the transient stability of power systems.
However, bus V has 39 groups of input features, far more than the 10 groups each of the generator variables δ, ω and P_e. Therefore, in order to analyze the priorities of the three types of generator-related features with respect to transient stability, the bus V data are temporarily removed from the ranking results. The rank positions of the generator-related features δ, ω and P_e are then summed for each type, and the sums are shown in Figure 11.
As can be seen from Figure 11, the order sum of the P_e group (in yellow) is far less than those of the variables δ and ω. In other words, the new feature P_e can be regarded as more relevant to system transient stability than δ and ω, which is consistent with the accuracy comparison of transient stability prediction in Section 3.2.
Figure 11. Sum of the order in three types of features.
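The order-sum comparison can be sketched as follows. The group numbering follows the text (δ 1-10, ω 11-20, V 21-59, P_e 60-69); whether ranks are renumbered after the V groups are removed is not stated, so renumbering is an assumption here, and the ranking used in the example is invented purely to exercise the code.

```python
from typing import Dict, List


def feature_type(group_id: int) -> str:
    """Map a trajectory-variable group number (1-69) to its feature type."""
    if 1 <= group_id <= 10:
        return "delta"
    if 11 <= group_id <= 20:
        return "omega"
    if 21 <= group_id <= 59:
        return "V"
    if 60 <= group_id <= 69:
        return "Pe"
    raise ValueError(f"group id out of range: {group_id}")


def rank_sums_without_v(ranking: List[int]) -> Dict[str, int]:
    """Sum rank positions per generator feature type after dropping V groups.

    ranking lists group ids from most to least relevant. A smaller sum means
    the type is ranked as more relevant on the whole.
    """
    generator_only = [g for g in ranking if feature_type(g) != "V"]
    sums = {"delta": 0, "omega": 0, "Pe": 0}
    for position, group_id in enumerate(generator_only, start=1):
        sums[feature_type(group_id)] += position
    return sums


# Invented ranking: all P_e groups first, then V, then delta, then omega.
toy_ranking = list(range(60, 70)) + list(range(21, 60)) + list(range(1, 21))
sums = rank_sums_without_v(toy_ranking)
```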

Simple Combined Features of Voltage Amplitude and Electromagnetic Power
The changes in electromagnetic power P_e will cause changes in the mechanical performance of generators, while the trajectories of bus voltage V can reflect the dynamic voltage recovery of the power system. An intuitive combination method is to integrate these two types of features into a single SVM learning machine, using a total of 10 + 39 = 49 trajectory variables, each variable containing 10-dimensional data from the pre-fault to post-fault stages. The prediction results are shown in Table 5 and Figure 12.
From Table 5, the prediction accuracy is only improved slightly, from 98.77% to 98.98%, by integrating both features P_e and V into one SVM prediction model. A more detailed comparison in Figure 12 shows that the third set of input features [V, P_e] has the highest accuracy. However, the prediction accuracy for unstable samples (in blue) is still not satisfactory; that is, the conservativeness of the prediction model is still not good enough. Therefore, it is necessary to find better solutions to reduce the error rate for unstable samples, which are more harmful to the operation of power systems.

Improved WTA Ensemble Learning Results for Conservative Prediction
For the two selected input features V and P_e, the prediction accuracies of the two sub-learning machines and the WTA ensemble learning model are shown in Table 6 and Figure 13. It can be seen from Table 6 and Figure 13 that the WTA model greatly improves the prediction accuracy for unstable samples (in blue), from 90.38% and 92.31% for the two sub-learning machines to 99.26%, while the overall prediction accuracy also increases slightly. Owing to the special treatment that improves the model conservativeness for unstable situations, the accuracy of the proposed WTA ensemble model is higher than that of recent work, such as the DSEC ensemble model in Reference [6], whose accuracy is below 97.03%; the datasets were generated from the same IEEE 39-bus system. Therefore, the proposed WTA ensemble learning model can provide a strong basis for the online application of TSA based on machine learning technology.
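As a quick arithmetic check, the reported 99.26% for unstable samples is consistent with the independence estimate 1 − (1 − 0.9038)(1 − 0.9231). The snippet below only reproduces that arithmetic from the two sub-classifier accuracies quoted in the text; the function name is hypothetical, and the agreement does not by itself show how the value in Table 6 was obtained.

```python
def wta_unstable_accuracy(acc_m1: float, acc_m2: float) -> float:
    """Unstable-sample accuracy of the WTA rule if the two misses are independent."""
    return 1.0 - (1.0 - acc_m1) * (1.0 - acc_m2)


# Sub-classifier accuracies on unstable samples, as quoted in the text.
estimate = wta_unstable_accuracy(0.9038, 0.9231)
print(f"{estimate:.4f}")  # prints 0.9926
```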

Figure 13. Comparison of accuracy in different models.

Discussion
The WTA model is able to improve the prediction accuracy for unstable samples to 99.26%. For the proposed machine learning model to be used in online TSA, the prediction accuracy for unstable situations should be as high as possible, because any missed instability may lead to loss of synchronization, cascading failures, or even large-scale power outages. However, making fast and accurate online transient stability predictions is not enough in the transient operation of a power system; appropriate online emergency control measures will be of interest in future research.
Another concern in practically implementing the proposed machine learning model into large-scale power systems is that the feature reduction should be further studied, because too many input features mean a large amount of measurement investment and a huge computational burden in real-time.
In addition, the main focus of this study is to perform machine learning-based transient stability predictions that only deal with the rotor angle stability of power systems. If appropriate data are available, the transient stability prediction modeling method based on machine learning can also be extended to small-disturbance stability, voltage stability, and frequency stability issues.

Conclusions
Recent artificial intelligence and machine learning technologies enable online information on electrical and electromechanical conditions to be used to diagnose and predict the operating status of power systems. A high-accuracy conservative transient stability prediction model is proposed in this paper. Compared with the existing models, our model contains four improvements. (1) The sampled data containing multiple time stages are used as input features for the SVM classifier. It is found that sampled data containing all three time stages, namely pre-fault, during-fault and post-fault, can better characterize the transient stability of power systems; (2) the new feature of generator electromagnetic power is found to be highly correlated with system stability. The SVM classification results show that the prediction accuracy of the electromagnetic power feature is higher than those of the conventional generator rotor angle, generator rotor speed and bus voltage amplitude features; (3) electromagnetic power and voltage amplitude are determined as the two superior features, based on mRMR feature selection, so as to reduce the computational burden; (4) a high-precision WTA ensemble learning model based on the two selected features is established for power system transient stability prediction, which improves the accuracy for unstable situations from 90.38% and 92.31% to 99.26%. The WTA ensemble learning significantly improves the conservativeness of the prediction model, and the overall prediction accuracy also increases slightly. All the research results are verified with simulated samples on the IEEE 39-bus system.

Conflicts of Interest:
The authors declare no conflict of interest.