Decision Tree for Online Voltage Stability Margin Assessment Using C4.5 and Relief-F Algorithms

Abstract: In practical power system operation, knowing the voltage stability limits of the system is important. This paper proposes using a decision tree (DT) to extract guidelines from offline study results for assessing system voltage stability status online. Firstly, a sample set for the DT is determined offline by active power injection and bus voltage magnitude (P-V) curve analysis. Secondly, participation factor (PF) analysis and the Relief-F algorithm are used successively for attribute selection, which takes both the physical significance and the classification capabilities into consideration. Finally, the C4.5 algorithm is used to build the DT because it is more suitable for handling continuous variables. A practical power system is used to verify the feasibility of the proposed online voltage stability margin (VSM) assessment framework. Study results indicate that the operating guidelines extracted from the DT can help power system operators assess real-time VSM effectively.


Introduction
Voltage stability refers to the capability of maintaining voltage within safe or acceptable limits, even in the case of credible contingencies [1][2][3][4]. Current industry practice is to perform offline voltage stability studies and extract operating guidelines to monitor and control the system state. Due to the interconnections of power systems, the scale of the power grid expands rapidly and operating conditions become more complicated. Furthermore, due to the rapid development of wind power and photovoltaic power generation in the power grid, greater uncertainty has been introduced. To investigate system stability limits, operational planners have to perform more offline studies of various system topologies and conditions. Data collected from these offline studies has reached an unprecedented level. Currently, individual operational planners analyze offline data manually or semi-automatically. Each individual is only capable of exploring valuable information from small amounts of data and is unable to manually process large amounts of data. Determination of stability limits is dependent on an individual operational planner's experience and knowledge, which is inevitably limited and fragmentary. Although online voltage stability analysis has been implemented by some utilities, it remains a challenge to extract useful information from the mass of data obtained from voltage stability analysis results within an operating time frame.
Data mining techniques provide opportunities to solve this problem, as they can process large amounts of data automatically and efficiently. Researchers have applied data mining technologies, such as artificial neural networks (ANNs), support vector machines (SVMs), and naive Bayes (NB) algorithms, and various improved versions, to online voltage stability assessment. However, these attempts use black box models, which hide the inner data mining process and cannot extract underlying valuable information [30].

Theoretical Framework
This section is composed of three parts. First, the sample set is established for the DT. According to the P-V curve analysis, the VSM is determined and the DT's sample set then can be generated. Second, participation factor (PF) analysis and the Relief-F algorithm are sequentially applied in attribute selection. Finally, the DT is built using the C4.5 algorithm for online VSM assessment.

Establish Sample Set for DT
For each network topology, the maximum power transfer point is obtained by gradually increasing system loads starting from the original operating point. Once it is reached, the VSM can be determined and the corresponding training samples can then be generated.
The variation of voltage amplitude V with active power P is plotted as the P-V curve. The system VSM is given as follows:
$$M_i = \frac{P_{\max} - P_i}{P_{\max}} \times 100\% \qquad (1)$$
where $M_i$ is the VSM of the ith operating point, $P_i$ is the active power of the ith operating point, and $P_{\max}$ represents the active power of the maximum power transfer point. In practice, system operators usually avoid operating the power system in the region below the maximum power transfer point. According to monitoring needs, operators usually divide the system into three states: normal state, alert state, and emergency state.
When $M_i$ is within the interval [5%, 8%), the power system is considered to be in a normal state. The operating points within this interval are treated as normal samples and assigned class label N. In a similar vein, the alert samples and emergency samples are selected and assigned class label A and class label E, respectively. If the number of samples in a certain class is much smaller than that of the others, this class of samples may be mistaken for noise. In the classifying process, the DT may then neglect these samples, which means the DT cannot learn the true structure of the data. This is called "underfitting" [37]. In system operation, operators tend to focus more on the emergency state. However, the number of emergency samples is clearly smaller than that of normal or alert samples. Therefore, the same number of samples should be selected from each class when constructing the sample set to avoid underfitting.
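The margin calculation and the equal-count sampling described above can be sketched in a few lines of Python. This is only an illustration: it assumes the margin is measured relative to $P_{\max}$, and the function names, the `per_class` parameter, and the toy data are hypothetical rather than taken from the study.

```python
import random
from collections import defaultdict


def vsm_percent(p_i, p_max):
    """Margin of an operating point as a percentage of P_max (assumed definition)."""
    return (p_max - p_i) / p_max * 100.0


def balanced_sample(samples, per_class, seed=0):
    """Draw the same number of samples from every class label.

    `samples` is a list of (features, label) pairs. `per_class` is a
    hypothetical parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    balanced = []
    for label in sorted(by_class):
        group = by_class[label]
        if len(group) < per_class:
            raise ValueError(f"class {label!r} has only {len(group)} samples")
        balanced.extend(rng.sample(group, per_class))
    return balanced


# Toy data: 6 normal, 4 alert and 2 emergency operating points.
toy = ([([i], "N") for i in range(6)]
       + [([i], "A") for i in range(4)]
       + [([i], "E") for i in range(2)])
subset = balanced_sample(toy, per_class=2)
```

Sampling without replacement caps `per_class` at the size of the smallest class, which is why the scarce emergency class determines how large a balanced set can be.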

Determine Attributes for the DT
First, modal analysis is performed to identify the pivotal voltage stability modes. For each pivotal mode, PF analysis is applied to select attributes preliminarily. Second, the Relief-F algorithm is used to select attributes further, and the final attribute set for the DT can then be determined.
The relationship between the reactive power increment ∆Q and the voltage amplitude increment ∆V is expressed using the following equation [1,2]:
$$\Delta V = J_R^{-1}\,\Delta Q \qquad (2)$$
where $J_R$ represents the reduced Jacobian matrix, and ∆V and ∆Q are the incremental changes in voltage amplitude and reactive power, respectively. The ith modal voltage change is calculated as follows:
$$\Delta V_{mi} = \frac{1}{\lambda_i}\,\Delta Q_{mi} \qquad (3)$$
where $\lambda_i$ is the ith eigenvalue of $J_R$, and $\Delta V_{mi}$ and $\Delta Q_{mi}$ are the ith modal voltage change and modal reactive power change, respectively. It is clear that as $\lambda_i$ approaches 0, a slight change in $\Delta Q_{mi}$ causes a significant change in $\Delta V_{mi}$. The system is considered to be on the brink of voltage instability if one of the eigenvalues equals 0. The eigenvalues are obtained using eigenvalue analysis based on P-V curves. The smallest eigenvalue, which can be used for measuring the distance to voltage instability, is chosen to implement PF analysis.
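To make the role of the smallest eigenvalue concrete, the following sketch works through a two-bus toy case in plain Python. It assumes a symmetric 2×2 reduced Jacobian, which is a simplification (a real $J_R$ is larger and generally not symmetric). The matrix entries and function names are hypothetical.

```python
import math


def symmetric_2x2_modes(a, b, c):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, c]], smallest eigenvalue first."""
    if b == 0:
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))])
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    modes = []
    for lam in (mean - radius, mean + radius):
        # (A - lam*I) v = 0 is satisfied by v proportional to (b, lam - a).
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        modes.append((lam, (vx / norm, vy / norm)))
    return modes


def modal_voltage_change(lam, dq_modal):
    """Modal voltage change for a given modal reactive power change."""
    return dq_modal / lam


# Hypothetical reduced Jacobian of a two-bus toy system.
a, b, c = 2.0, -1.0, 0.6
modes = symmetric_2x2_modes(a, b, c)
lam_min, v_min = modes[0]

# For a symmetric matrix the left and right eigenvectors coincide, so the
# bus participation factors of the critical mode are the squared components.
bus_pf = [component ** 2 for component in v_min]
```

In this toy case the same modal reactive power change produces a much larger modal voltage change in the mode with the smaller eigenvalue, and the second bus carries most of the participation in that mode.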
The PF at bus k of mode i is shown as:
$$P_{ki} = \xi_{ki}\,\eta_{ki} \qquad (4)$$
where $\xi_{ki}$ and $\eta_{ki}$ are the kth elements of the ith column right eigenvector and the ith row left eigenvector of $J_R$, respectively. According to (4), the PFs of each bus are calculated and the buses are listed in descending order based on the bus PF analysis results. The differences between adjacent PFs can then be calculated, and the largest difference is used to set the threshold. A bus whose PF is above the threshold is selected as a key bus. The branch PF of branch j in mode i is given as:
$$P_{ji} = \frac{\Delta Q_{loss.j}}{\Delta Q_{loss.\max}} \qquad (5)$$
where $\Delta Q_{loss.j}$ and $\Delta Q_{loss.\max}$ are the change in reactive power loss of branch j and the maximum such change over all branches, respectively. The generation PF is given as:
$$P_{gk,i} = \frac{\Delta Q_{gk}}{\Delta Q_{\max}} \qquad (6)$$
where $\Delta Q_{gk}$ and $\Delta Q_{\max}$ are the change in reactive power output of generator k and the maximum such change over all generators, respectively. The corresponding key branches and generators can be determined in a similar way to key buses. The voltage amplitude and voltage phase angle of buses, reactive power of branches, and reactive power of generators comprise the DT's initial attributes.
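The ranking-and-threshold step can be sketched as below. This reflects one reading of the text, in which the cut is placed at the largest drop between adjacent ranked PFs. The function name and the PF values are hypothetical.

```python
def select_by_largest_gap(scores):
    """Return the names ranked above the largest drop between adjacent scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return [name for name, _ in ranked]
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)  # index of the largest gap
    return [name for name, _ in ranked[: cut + 1]]


# Hypothetical participation factors for five buses.
toy_pf = {"bus_a": 0.31, "bus_b": 0.27, "bus_c": 0.25, "bus_d": 0.06, "bus_e": 0.04}
key_buses = select_by_largest_gap(toy_pf)
```

The same helper applies unchanged to branch PFs, generator PFs, or any other ranked score list that needs a data-driven cut-off.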
PF analysis can effectively identify which buses, branches, or generators are important to voltage stability. However, these initial attributes may not have good classification capability. If an attribute is closely correlated with another attribute, these two attributes have the same classification capability. One of these two attributes can be regarded as redundant. In addition, some attributes are irrelevant attributes that may not be useful for classification. If many redundant attributes and/or irrelevant attributes are chosen, the DT is larger than necessary [37,44]. Initial attributes determined by PF analysis should be further selected from a classification capability perspective.
Relief algorithms, among the most successful preprocessing algorithms, can select attributes effectively without any assumption of attribute independence. However, the original Relief algorithm is restricted to two-class problems [45][46][47][48][49]. The Relief-F algorithm, which extends the Relief algorithm and is suitable for handling multi-class problems, is applied in this paper to further select attributes. The core of the algorithm is to compute weight values iteratively for attributes based on the distance between different samples. At the beginning, sample $R_i$ is selected randomly. From the same class as sample $R_i$, the m nearest neighbors, which are represented by $H_j$, are selected. In addition, the m nearest neighbors from each of the different classes, which are represented by $M_j$, are also selected. The weight of attribute A is updated as:
$$\omega_A := \omega_A - \sum_{j=1}^{m}\frac{\mathrm{diff}(A, R_i, H_j)}{n\,m} + \sum_{C \neq \mathrm{class}(R_i)}\left[\frac{P(C)}{1 - P(\mathrm{class}(R_i))}\sum_{j=1}^{m}\frac{\mathrm{diff}(A, R_i, M_j(C))}{n\,m}\right] \qquad (7)$$
where $\omega_A$ is the weight value of attribute A, n represents the number of iterations, and P(C) represents the probability of samples belonging to class C. diff represents the difference between $R_i$ and $H_j$ or $M_j$ under attribute A:
$$\mathrm{diff}(A, I_1, I_2) = \frac{\left|v(A, I_1) - v(A, I_2)\right|}{v(A)_{\max} - v(A)_{\min}}$$
where $v(A, I_1)$ and $v(A, I_2)$ are the values of attribute A for samples $I_1$ and $I_2$, respectively. $v(A)_{\max}$ and $v(A)_{\min}$ are the maximum and minimum values, respectively, of attribute A over all of the samples.
If the difference between $R_i$ and its nearest neighbors from other classes $M_j$ under attribute A is larger than the difference between $R_i$ and its nearest neighbors from the same class $H_j$, then the attribute is suitable for dividing the sample set and the weight value should be raised.
According to (7), the weight value of each attribute can be obtained and then listed in descending order. The threshold value is determined based on the most obvious change between adjacent weight values. Given that the attributes with larger weight values have a more positive effect on classification, the attributes whose weight values are greater than the threshold value are selected.
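The weighting procedure can be illustrated with a compact pure-Python sketch. It follows the commonly published Relief-F update for numeric attributes and may differ in detail from the implementation used in the study. The function name, the neighbor search by summed normalized differences, and the toy data are assumptions.

```python
import random


def relieff_weights(X, y, n_iter, m, seed=0):
    """Relief-F weights for numeric attributes; X is a list of rows, y a list of labels."""
    rng = random.Random(seed)
    n_attr = len(X[0])
    span = []
    for a in range(n_attr):
        column = [row[a] for row in X]
        span.append((max(column) - min(column)) or 1.0)  # guard constant attributes
    prior = {c: y.count(c) / len(y) for c in set(y)}

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def nearest(i, cls):
        # m nearest samples of class `cls` to sample i (sum of normalized differences).
        pool = [j for j in range(len(X)) if j != i and y[j] == cls]
        pool.sort(key=lambda j: sum(diff(a, i, j) for a in range(n_attr)))
        return pool[:m]

    weights = [0.0] * n_attr
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = nearest(i, y[i])
        misses = {c: nearest(i, c) for c in prior if c != y[i]}
        for a in range(n_attr):
            weights[a] -= sum(diff(a, i, j) for j in hits) / (n_iter * m)
            for c, neighbours in misses.items():
                scale = prior[c] / (1.0 - prior[y[i]])
                weights[a] += scale * sum(diff(a, i, j) for j in neighbours) / (n_iter * m)
    return weights


# Toy data: attribute 0 separates the three classes, attribute 1 does not.
X = [[0.0, 0.1], [0.1, 0.5], [0.2, 0.9],
     [1.0, 0.1], [1.1, 0.5], [1.2, 0.9],
     [2.0, 0.1], [2.1, 0.5], [2.2, 0.9]]
y = ["N", "N", "N", "A", "A", "A", "E", "E", "E"]
weights = relieff_weights(X, y, n_iter=20, m=2)
```

On this toy set the class-separating attribute receives a positive weight and the uninformative one a negative weight, which is the ordering the threshold step relies on.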

Apply C4.5 Algorithm to Build DT
The tree-like structure of the DT reveals the classification rules visually. Once the DT is constructed, a series of operating guidelines are obtained for operators to monitor system voltage stability status.
CART, Iterative Dichotomiser 3 (ID3), and C4.5 are commonly used algorithms to construct DTs. The C4.5 algorithm is better suited to handling continuous attributes than the CART algorithm and can avoid the overfitting problem of the ID3 algorithm [50]. The C4.5 algorithm comprises four main steps. First, the initial information entropy of the sample set is calculated. Second, the split entropy of the sample set under a selected attribute is calculated. Then the information gain can be easily obtained by subtracting the split entropy from the initial information entropy. Finally, the information gain ratio corresponding to the selected attribute is calculated based on the information gain.

Calculate the Initial Information Entropy
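The initial information entropy of the sample set is calculated as:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
where S is the sample set, $p_i$ represents the probability that a sample belongs to class i, and m is the number of classes, which matches the number of system states.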
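As a quick check of this definition, the sketch below computes the entropy of a list of class labels. The function name is illustrative. For a sample set with equal numbers of N, A, and E samples, the initial entropy is $\log_2 3$.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


balanced_entropy = entropy(["N", "A", "E"] * 5)  # equal class counts
pure_entropy = entropy(["E"] * 5)                # a single class
```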

Calculate the Split Entropy
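Assume that S is divided into two subsets by a randomly selected attribute A. The split entropy is shown as follows:
$$\mathrm{Entropy}_A(S) = \frac{|S_L|}{|S|}\,\mathrm{Entropy}(S_L) + \frac{|S_R|}{|S|}\,\mathrm{Entropy}(S_R)$$
where $S_L$ and $S_R$ are the subsets of S, and |S|, $|S_L|$, and $|S_R|$ represent the number of samples in each set.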

Obtain the Information Gain
The information gain is calculated to quantify the ability of attributes to reduce the overall entropy:
$$\mathrm{Gain}(A) = \mathrm{Entropy}(S) - \mathrm{Entropy}_A(S)$$
However, if the information gain is used as the attribute selection criterion of the DT, it may result in overfitting.

Calculate the Information Gain Ratio
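Split information is introduced to solve the overfitting problem, and is calculated using the following equation:
$$\mathrm{SplitInfo}_A(S) = -\frac{|S_L|}{|S|}\log_2\frac{|S_L|}{|S|} - \frac{|S_R|}{|S|}\log_2\frac{|S_R|}{|S|}$$
The information gain ratio is then the information gain divided by the split information:
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(S)}$$
A greater information gain ratio means a better attribute. The most suitable attribute of each node can be identified by calculating the information gain ratio for each attribute recursively. The branches of the DT are the system voltage stability guidelines, which can be extracted for operators to identify the system state.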
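The four steps can be combined into a short sketch for one continuous attribute and a binary split at a threshold. It is a simplified illustration rather than a full C4.5 implementation: candidate thresholds are taken as midpoints between adjacent distinct values and ranked directly by gain ratio. The function names and the toy voltage values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold versus value > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a one-sided split carries no information
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    split_entropy = w_left * entropy(left) + w_right * entropy(right)
    gain = entropy(labels) - split_entropy
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Midpoint threshold with the highest gain ratio (needs two or more distinct values)."""
    points = sorted(set(values))
    candidates = [(lo + hi) / 2.0 for lo, hi in zip(points, points[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical bus voltage amplitudes (pu) with alert (A) and normal (N) labels.
voltages = [0.93, 0.94, 0.95, 0.97, 0.98, 0.99]
states = ["A", "A", "A", "N", "N", "N"]
threshold = best_threshold(voltages, states)
```

Here the selected threshold falls between 0.95 and 0.97 pu, where the split separates the two classes perfectly and the gain ratio reaches 1.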
Energies 2020, 13, 3824
The C4.5 algorithm can only access the generated sample set. The error computed in this process is the training error. However, an excellent DT must not only fit the generated sample set well, but also classify unknown samples accurately. The expected error on previously unknown samples, known as the generalization error, needs to be calculated. K-fold cross-validation is an effective method to compute the generalization error by dividing the sample set into K equal-sized subsets. During each training run, K−1 subsets are selected to train the model and the remaining subset is used to test the model. This process is repeated K times until each subset has been used once to test the performance of the DT [31,37]. The generalization error is defined as follows:
$$E_g = \frac{1}{K}\sum_{i=1}^{K} E_i$$
where $E_g$ represents the generalization error, K is the number of training runs, and $E_i$ is the error of the ith training run. According to numerous tests with various data sets by different data mining techniques, 10-fold cross-validation has been shown to be the best approach to estimate the generalization error [44]. In this study, 10-fold cross-validation was performed for evaluating performance. Figure 1 displays the overall framework.
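The cross-validation procedure can be sketched as follows. To keep the example self-contained, a majority-class placeholder stands in for the DT. The function names, the `fit`/`predict` interface, and the toy data are assumptions. The sketch expects K to be no larger than the number of samples, and in practice the samples would be shuffled before folding.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_error(X, y, k, fit, predict):
    """Average of the K per-fold error rates."""
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


# Placeholder classifier: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[i] for i in range(20)]
y = ["N", "A"] * 10
e_g = cross_validated_error(X, y, k=10, fit=fit_majority, predict=predict_majority)
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the classifier, so the same routine could wrap any tree-building function.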

Case Study
The proposed online VSM assessment method was verified using actual data obtained from a provincial power grid in China. Figure 2 displays the single-line diagram.

Establish the Sample Set for DT
To obtain different operating conditions, contingency analysis was performed on each generator, on a subset of the transmission lines, and on transformers. As described in Section 2.1, P-V curve analysis was conducted for each operating condition. The maximum power transfer points were found by increasing system loads in 10 MW steps from the initial operating points. According to (1), the operating points in the three system states were obtained. The points in the emergency state are regarded as emergency samples because in practice operators need to prevent the system from operating in this region. The operating points in the normal and alert states are used as normal and alert samples, respectively. To emphasize the emergency state and acquire sufficient samples for the DT, 15 samples were selected from each VSM interval in this study. The sample set included 21,960 samples in total.

Initial Selection of Attributes
As described in Section 2.2, the critical voltage stability modes are determined by modal analysis. The corresponding smallest eigenvalue of the practical system was 0.053111. In order to determine the main factors for the critical mode, PF analysis was performed. Figure 3 shows the bus PF analysis results, which are listed in descending order. The 64th factor is chosen as the threshold value based on the order of magnitude of the differences between adjacent buses. Hence, the buses whose PFs rank in the top 64 are regarded as key buses. The bus PF analysis results are shown in Table 2. A total of 21 key branches and 12 key generators were determined using similar methods. The branch and generator PF analysis results are given in Tables 3 and 4, respectively. There are 161 initial attributes in total.

Determination of Attributes Set
As described in Section 2.2, 161 initial attributes were further filtered using the Relief-F algorithm. According to (7), the weight values of these initial attributes were computed. Figure 4 shows the results in descending order.
As shown in Figure 4, the first 94 attributes were selected to construct the attribute set of the DT.

Apply C4.5 Algorithm to Build DT
The DT can be constructed based on the generated sample set and the pre-selected 94 attributes using the C4.5 algorithm. The partial DT is shown in Figure 5. In Figure 5, the first and second numbers in the leaf nodes indicate the total number of samples and the number of misclassified samples, respectively. Based on all paths of the constructed DT, system operators can extract a series of VSM assessment rules. For instance, if the reactive power of generator 881 is less than or equal to 53.4 Mvar and the voltage amplitude of Bus 613 is less than or equal to 0.941 pu, the system enters the alert state. According to this rule, the system states corresponding to these 71 samples are assessed as the alert state and all are classified correctly. System operators can use these guidelines based on phasor measurement unit (PMU) data to assess the system voltage stability status.
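A rule extracted from one path of the tree maps directly onto a conditional check on measured quantities. The sketch below encodes only the example rule quoted above. The function and argument names are hypothetical, and the remaining branches of the tree are not reproduced because they are not given in the text.

```python
def assess_example_rule(q_gen_881_mvar, v_bus_613_pu):
    """Apply the single example rule from the partial DT.

    Returns "A" (alert) when the rule fires, or None when the operating
    point falls on a path of the tree that is not described in the text.
    """
    if q_gen_881_mvar <= 53.4 and v_bus_613_pu <= 0.941:
        return "A"
    return None


state = assess_example_rule(q_gen_881_mvar=50.0, v_bus_613_pu=0.93)
```

Because each rule only compares measurements with fixed thresholds, evaluating it on incoming measurement data requires no power flow calculation.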
All 21,960 samples were classified to evaluate the DT's performance; the evaluation was executed on an Intel Core i7 2.5-GHz CPU with 4 GB of RAM. The classification accuracy was 97.3953%, which corresponds to a generalization error of 2.6047%. The performance of the DT is summarized in the confusion matrix in Table 5.
$f_{ij}$ represents the number of samples of actual class i that were classified as class j. For example, $f_{AN}$, which equals 138, is the number of class A samples that were misclassified as class N. The total number of correct classifications made by the DT was $f_{NN} + f_{AA} + f_{EE}$ = 21,388 and the total number of incorrect classifications was $f_{AN} + f_{EN} + f_{NA} + f_{EA} + f_{NE} + f_{AE}$ = 572. In practical system operation, system operators pay more attention to the classification results of samples from class E (emergency state). In this confusion matrix, among the 7320 actual samples of class E, the DT misclassified 1 as class N and 130 as class A.
Table 6 shows case study results of the DT constructed by the C4.5 algorithm with different attribute sets. The modeling time of the DT constructed with 161 attributes was 13.56 s, while the modeling time of the DT constructed with 94 selected attributes was 8.45 s. This implies that the application of the Relief-F algorithm can significantly shorten the modeling time, and therefore improve the classification efficiency of the DT. The classification accuracy of the DT constructed with 161 attributes was 96.28%, while the classification accuracy of the DT constructed with 94 selected attributes was 97.40%. This implies that using all attributes does not guarantee a better classification accuracy. The Relief-F algorithm is an effective method to improve the performance of the DT.
Table 7 compares the performance of different DT algorithms. The modeling time of the C4.5 algorithm was 8.45 s, while the modeling time of the CART algorithm was 30.34 s, so the C4.5 algorithm built the DT about 3.6 times faster than the CART algorithm. The classification accuracy of the C4.5 algorithm was 4 percentage points higher than that of the CART algorithm. Hence, the C4.5 algorithm is more suitable for handling continuous attributes in VSM assessment.
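The confusion-matrix bookkeeping can be checked with a few lines of Python. Only the totals and the class E counts quoted in the text are used. The full matrix of Table 5 is not reproduced here, and the function names are illustrative.

```python
def accuracy_from_counts(correct, incorrect):
    """Fraction of samples that were classified correctly."""
    return correct / (correct + incorrect)


def class_recall(row, cls):
    """Fraction of one actual class that was predicted as that class.

    `row` maps each predicted label to a count for a single actual class.
    """
    return row[cls] / sum(row.values())


# Totals quoted in the text.
overall_accuracy = accuracy_from_counts(correct=21388, incorrect=572)

# Class E: 7320 actual samples, 1 predicted as N and 130 predicted as A.
emergency_row = {"N": 1, "A": 130, "E": 7320 - 1 - 130}
emergency_recall = class_recall(emergency_row, "E")
```

The per-class figure complements the overall accuracy because it isolates the emergency state that operators care about most: of the 7320 emergency samples, 7189 were recognized as such.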
Table 8 shows the classification accuracy and modeling time of the proposed method (NEW), and the ANN, SVM, and NB methods. All methods used the same sample set and the same attribute set. It is clear that the proposed method performs better than the other methods. System operators can use the operating guidelines extracted from the proposed method to monitor and assess the system state based on PMU data. Once the power system falls into the alert or emergency states, operators have enough time to enact corresponding control strategies and actions to ensure the stability of the power system.

Conclusions
This paper proposes constructing a DT using the C4.5 algorithm to extract operating guidelines for power system operators, and has two innovations. First, PF analysis was performed to initially select attributes based on the attributes' physical meaning. The Relief-F algorithm was then used to further select key attributes based on the attributes' classification ability. Second, the C4.5 algorithm was applied for constructing the DT because most of the attributes in VSM assessment are continuous variables.
Operational planners can use the DT to extract operating guidelines from a large amount of offline study data. The DT can be used to help find new knowledge and insights from offline voltage stability analysis. System operators can use the DT to quickly assess the system voltage margin based on measurement data. Operational planners can use the DT to develop effective control strategies and actions. A case study using provincial power system data indicates that the proposed VSM assessment method not only guarantees classification accuracy to meet practical system operation needs, but also handles a large amount of data within online operation requirements.
Author Contributions: X.M. and P.Z. conceptualized the paper and designed the methods; X.M. worked on the simulation results and was involved in the paper writing. D.Z. was involved in the paper modification. All authors have read and agreed to the published version of the manuscript.