1. Introduction
With the development of ultra-high voltage (UHV) transmission technology and flexible alternating-current (AC) transmission technology, modern power systems have entered the era of large units, UHV, very large scale, long distance, and hybrid alternating-current and direct-current (AC-DC) transmission. Regional power grids are becoming ever more tightly interconnected, and the system is growing in scale and complexity. As the grid operates in a variety of modes and its dynamic characteristics become more complex, the occurrence of low-frequency oscillations can have a serious impact on the grid. It is therefore important to evaluate the stability of low-frequency oscillations online.
Low-frequency oscillation is closely related to small-signal stability and is usually studied within small-signal stability analysis. Small-signal stability analysis of the power system includes frequency-domain analysis, eigenvalue analysis, and time-domain analysis [1,2,3,4], but these methods do not consider the actual uncertainties, so it is difficult for them to fully reflect the stability level of low-frequency oscillations in actual systems. The probabilistic analysis method was therefore introduced, and statistical probability indices of small-signal stability were established by considering random variables such as state and force variation, load fluctuation, and line parameter variation under various working conditions [5,6]. In [7], a small-signal stability frequency estimation method is proposed by applying the Monte Carlo method to random variables such as load level and form, generator state, and network topology parameters. However, the probability model of the random variables is relatively simple, so the evaluation results cannot accurately reflect the actual situation of the grid. This problem is addressed in [8,9]. Even so, complex systems require a large amount of computation and long simulation times, so more effective methods for evaluating low-frequency oscillation stability need further study. Based on the eigenvalue analysis method and the risk assessment method, and considering the probabilistic safety and instability of the system, [10] proposed a method to quickly evaluate the real-time risk of a small-scale power grid, but did not consider the uncertainty of the grid. Reference [11] studies the probability distribution of the damping of the system oscillation modes based on deterministic small-signal safety analysis when uncertainties are considered. However, how to evaluate low-frequency oscillation stability was not studied. Considering the seriousness of system instability, [12] proposed a risk-based probabilistic small-signal safety analysis method that quantifies risk through a matrix and a continuous function. This method takes the uncertainties of the power system into account. A nomogram method based on the analysis of oscillation damping factors is used for small-signal security assessment of power systems to increase accuracy [13], but these methods are still offline evaluations and their estimates are inaccurate. Reference [14] shows that small-signal stability assessment with phasor measurements can be applied online, but the timeliness of the judgment is poor. A new data-driven methodology for the detection of low-frequency oscillations is then proposed in [15], and [16] presents a risk-based probabilistic small-disturbance security analysis (PSSA) methodology for power systems with uncertainties. These methods are fast, but their accuracy needs to be improved.
During the daily operation of the power system, there are persistent small disturbances of a random nature, such as load variation and transformer tap changes, which introduce random disturbances into the system. The measured response is externally characterized as random, noise-like response data [17]. Such data are not only abundant and easy to obtain, but also contain a large number of electromechanical oscillation characteristics related to actual working conditions, and therefore implicitly carry the uncertainties of the actual grid during operation. Low-frequency oscillation stability evaluation methods based on random response data have received extensive attention. References [18,19] use frequency-domain decomposition and the total least squares-rotation invariant technique to extract oscillation information from random response data. The stochastic subspace identification (SSI) method has become a common method for low-frequency oscillation identification [20]. Because its model order is simple, it adapts well to systems with large data volumes and complex dynamic processes. A forgetting factor is introduced into the original recursive stochastic subspace identification (RSSI) algorithm in [21], which improves the calculation speed of the system model parameters. However, the selection of the forgetting factor is very important, and it is difficult to find a suitable forgetting factor in practice. In [22], a new Bayesian method for the measurement-based analysis of electromechanical modes is proposed, which can identify the modes accurately. However, the power system in actual operation is often affected by various small disturbances, and the above methods have a low recognition speed and cannot meet the requirements of real-time evaluation.
With the rapid development of artificial intelligence (AI), the use of data-driven methods to study grid security issues has become a new approach. AI technology has been applied to the analysis of low-frequency oscillation stability for the first time: a neural network-based eigenvalue prediction method for the power system critical stability model is proposed in [23]. Although it has high accuracy, it is an offline evaluation. Therefore, the use of artificial intelligence technology to solve the problem of low-frequency oscillation stability is a new direction. XGboost (extreme gradient boosting) is a large-scale parallel learning algorithm that learns, at each node, how to handle the missing values it encounters. Moreover, it has the advantages of low input data requirements, automatic variable selection, and low computational complexity [24], and it has been applied in the field of wind turbine fault detection [25]. References [25,26,27] show that the XGboost classifier not only has a faster prediction speed than other classifiers such as the support vector machine (SVM) and the deep belief network (DBN), but also has higher prediction accuracy.
The main contribution of this paper is a machine learning method that evaluates the low-frequency oscillation stability of the power system in a timely and accurate manner, using random response data that contain the uncertainties of the power grid. Firstly, the original input feature set of the evaluation system is established by analyzing the effects of the generator electromechanical model, the excitation system, and the power system stabilizer (PSS) on low-frequency oscillation, which ensures the efficiency of the evaluation. Secondly, the data mining method and the improved XGboost machine learning method are applied to analyze the random response data, and supervised training is then conducted to obtain a model that describes the relationship between the feature set and low-frequency oscillation stability. Finally, the model is applied to the online evaluation of low-frequency oscillation stability.
The rest of this paper is organized as follows. Section 2 analyzes the essence of low-frequency oscillations and establishes the original input features. Section 3 introduces the principle of XGboost and improves the XGboost algorithm, which classifies the random response data after wavelet threshold de-noising and z-score normalization. Section 4 proposes an online evaluation model for low-frequency oscillations and establishes a model performance evaluation index. Simulations and analysis are presented in Section 5. Finally, conclusions are drawn in Section 6.
2. The Construction of the Original Input Feature
The main factors affecting the stability of low-frequency oscillation in a power system include the initial operating state, how tightly the components of the transmission system are connected, and the characteristics of the various control devices. The stability is independent of the specific magnitude and form of the disturbance. Therefore, low-frequency oscillation stability can be judged by calculating the damping ratio of the system oscillation modes. In this paper, different damping ratios are chosen as the stability threshold. According to the damping ratio threshold, low-frequency oscillation stability is divided into three categories: (1) negative damping; (2) weak damping; and (3) strong damping.
By analyzing the essence of low frequency oscillation stability, a set of original input features for online evaluation of low frequency oscillation stability is constructed.
The third-order generator model is adopted, and the differential equations of the generator are linearized as:
In Formula (1), $\Delta E_{fd}$ is the change in the output excitation voltage of the excitation system; $\Delta E'_q$ is the change in the q-axis transient potential; $\Delta P_e$, $\Delta\omega$, and $\Delta\delta$ are the changes in electromagnetic power, rotor angular velocity, and rotor angle, respectively; and $T_J$ and $D$ are the inertia time constant and the self-damping coefficient.
Therefore, analysis of Formula (2) shows that the changes in electromagnetic power, rotor angular velocity, and rotor angle describe the behavior of the generator when the power system is subjected to a small disturbance. The following features can be selected: the change in electromagnetic power per unit time, that is, the electromagnetic accelerating power (maximum, minimum, and average); the change in angle per unit time, that is, the angular velocity (the difference between the maximum and the minimum of the angular velocity); and the change in angular velocity per unit time, that is, the angular acceleration (the difference between the maximum and the minimum of the angular acceleration).
The transfer function of the excitation system is set as:
Linearizing the excitation system gives:
In Formula (2), is the change value of generator terminal voltage.
Therefore, analysis of Formula (3) shows that the change in excitation voltage describes the behavior of the excitation system when the power system is subjected to a small disturbance, and the following features can be selected: the change in excitation voltage per unit time (the maximum, minimum, and mean of the change).
Because of the large electromagnetic inertia of the excitation system, the negative damping introduced by the regulator under certain conditions (high load level and weak interconnection) degrades the dynamic stability of the power system and causes low-frequency oscillation. The principle of the PSS is as follows: when the system undergoes low-frequency oscillation after a small disturbance, the PSS extracts the speed deviation signal of the generator and compensates the inertia time delay of the excitation control system, so that the stabilizer provides the appropriate phase compensation and the speed deviation of the generator is eliminated by the integral loop.
Generator rotor kinetic energy:
Generator rotor acceleration:
Therefore, the rotor kinetic energy and the rotor acceleration describe the rotational speed deviation of the generator when the power system is disturbed. The following features can be selected: from Formula (4), the change in rotor speed per unit time, that is, the rotor acceleration (its maximum, minimum, and average values); and from Formula (5), the rotor motion of the generator, for which the difference between the maximum and minimum kinetic energy of the generator rotor and the average kinetic energy of the generator rotor are taken.
The construction of the original input features is a critical task for online evaluation of low-frequency oscillation stability, because it fundamentally determines the accuracy of the evaluation. Through the analysis of the stability features of low-frequency oscillation, a complete set of original input features is obtained that fully reflects the stability and dynamic behavior of low-frequency oscillation at a given time. At the same time, to reflect the dynamic process of low-frequency oscillation stability at different times, the original feature sets at the disturbance occurrence time, the disturbance end time, and different times during the dynamic process are selected, and the original input features of 4 and 5 typical moments are constructed. The 15-dimensional original input feature description at each moment is shown in Table 1.
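The statistical features described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the feature extractor used in the paper: the function names, the dictionary keys, the finite-difference approximation, and the per-unit kinetic energy expression 0.5 * t_j * omega^2 are all assumptions introduced here, and the sketch covers only part of the 15 features listed in Table 1.

```python
from statistics import mean


def rate(series, dt):
    """Finite-difference change per unit time of a sampled signal."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]


def generator_features(p_e, delta, dt, t_j=1.0):
    """Assumed subset of window features for one generator.

    p_e   : electromagnetic power samples
    delta : rotor angle samples
    dt    : sampling interval in seconds
    t_j   : inertia time constant (hypothetical value, for illustration)
    """
    dp = rate(p_e, dt)        # change of electromagnetic power per unit time
    omega = rate(delta, dt)   # change of angle per unit time (angular velocity)
    alpha = rate(omega, dt)   # change of angular velocity (angular acceleration)
    kinetic = [0.5 * t_j * w * w for w in omega]  # assumed kinetic energy form
    return {
        "dp_max": max(dp),
        "dp_min": min(dp),
        "dp_mean": mean(dp),
        "omega_range": max(omega) - min(omega),
        "alpha_range": max(alpha) - min(alpha),
        "ek_range": max(kinetic) - min(kinetic),
        "ek_mean": mean(kinetic),
    }
```

In practice the window passed to `generator_features` would be the samples around one of the typical moments (disturbance occurrence, disturbance end, and later cycles), and the resulting dictionaries would be concatenated into one feature vector per sample.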
3. The Principle of the Improved XGboost Algorithm
3.1. The Principle of Wavelet Threshold De-Noising for Random Response Data
The random response data is the long-term dynamic response data in the daily operation of the power system, and the disturbance form and the specific occurrence position of the disturbance source can be ignored. The use of random response data for low frequency oscillation stability determination has the following two advantages:
The low-frequency oscillation stability of the system can be determined by a machine learning method that relies only on daily operation measurement data. This avoids the complicated construction of a high-dimensional model and the identification errors caused by differences between the model and the actual system.
Identifying the electromechanical oscillation characteristics from random response data does not require a disturbance experiment to be prepared in advance, and it can be carried out while the system is under normal operating conditions, which overcomes the timeliness and credibility limitations of such evaluation methods. The random response data provide real-time information on changes in the dynamic stability of the power system, which makes them suitable for online applications.
The random response data of the power system collected by the wide area measurement system (WAMS) can be expressed as:
In the Formula (6), is a signal containing noise; is observed signal; is Gauss white noise.
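The key problem of the wavelet threshold de-noising algorithm [28] is the selection of the threshold and the threshold function. The threshold is selected as follows:
$$\lambda = \sigma\sqrt{2\ln N} \tag{7}$$
$$\sigma = \frac{\mathrm{median}\left(\left|w_{j,k}\right|\right)}{0.6745} \tag{8}$$
In Formula (7), $\sigma$ is the noise intensity, that is, the standard deviation of the noise signal, and $N$ is the length of the signal. In Formula (8), $\mathrm{median}(|w_{j,k}|)$ is the median of the wavelet coefficients on scale $j$.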
The wavelet threshold method is applied to de-noise the random response data collected by WAMS.
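A minimal sketch of wavelet threshold de-noising is given below to make the procedure concrete. It is written under several assumptions that are not taken from the paper: a single-level Haar transform stands in for whatever wavelet basis and decomposition depth are actually used, the threshold is the widely used universal threshold sigma * sqrt(2 ln N) with a median-based noise estimate, and soft thresholding is used as the threshold function. All function names are hypothetical.

```python
import math
from statistics import median


def haar_forward(x):
    """One-level orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(w, thr):
    """Shrink a coefficient toward zero by thr (soft thresholding)."""
    return math.copysign(max(abs(w) - thr, 0.0), w)


def denoise(x):
    """De-noise a signal by thresholding its Haar detail coefficients."""
    approx, detail = haar_forward(x)
    # Noise level estimated from the median absolute detail coefficient.
    sigma = median(abs(d) for d in detail) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(x)))
    detail = [soft_threshold(d, thr) for d in detail]
    return haar_inverse(approx, detail)
```

A production implementation would normally use a multi-level decomposition from a wavelet library and tune the wavelet basis to the oscillation frequencies of interest; the sketch only shows where the threshold and the threshold function enter.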
3.2. The Principle of XGboost
XGboost is the abbreviation of extreme gradient boosting, and it is a large-scale parallel algorithm.
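The XGboost model can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{9}$$
In Formula (9), $i = 1, 2, \ldots, n$, where $n$ is the number of samples; $F$ is the set of all regression trees, and $f_k$ is a function in $F$. When establishing a model, the best parameters should be selected [24] to minimize the objective function. The general objective function contains two terms: the error term $L$ (the error function) and the regularization term $\Omega$ (which measures the complexity of the model). After a second-order expansion of the error function at iteration $t$, the objective function $Obj^{(t)}$ is expressed as:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) \tag{10}$$
In Formula (10), $g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$ and $h_i = \partial^{2}_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$.
From Formula (10), the objective function depends only on the first-order and second-order derivatives of the error function at each data point.
The model complexity in the objective function is then defined. To refine $f_t$, the regression tree can be divided into the structural part of the tree, $q$, and the leaf weight part, $w$. That is:
$$f_t(x) = w_{q(x)}, \quad w \in R^{T}, \quad q: R^{d} \to \{1, 2, \ldots, T\} \tag{11}$$
The number of leaf nodes $T$ is penalized by an L1 regular term with coefficient $\gamma$, and the leaf weights are penalized by an L2 regular term with coefficient $\lambda$. These two terms control tree growth and avoid overfitting to a certain extent. That is:
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \tag{12}$$
With Formula (12), finding the optimal leaf weights $w_j$ and the corresponding optimal value of the objective function becomes the problem of minimizing a quadratic function of each $w_j$. Solving it gives:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad Obj = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T \tag{13}$$
In Formula (13), $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the samples assigned to leaf $j$, and $Obj$ is the scoring function used to evaluate the model: the smaller the value of $Obj$, the better the model. XGboost uses the greedy method to find the best tree structure, that is, it adds a new split to an existing leaf each time and calculates the maximum gain obtained. The gain is calculated as follows:
$$Gain = \frac{1}{2}\left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] - \gamma \tag{14}$$
In Formula (14), the first term represents the gain generated by the left subtree after the split, the second term represents the gain generated by the right subtree after the split, and the third term represents the gain without the split. $\gamma$ represents the complexity cost of the new leaf introduced by the split.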
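The greedy split search can be illustrated with a short Python sketch. It follows the standard XGBoost formulation, in which each leaf is scored by G^2 / (H + lambda), where G and H are the sums of the first-order and second-order derivatives of the loss over the samples in the leaf; it is not the improved algorithm of this paper. The function names and the single-feature exhaustive scan are assumptions made for illustration.

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf weight for summed gradients and Hessians."""
    return -g_sum / (h_sum + lam)


def leaf_score(g_sum, h_sum, lam):
    """Structure score of one leaf: G^2 / (H + lambda)."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting one leaf into a left and a right child."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    """Greedy scan over candidate thresholds of a single feature.

    Returns (gain, threshold); threshold is None if no split has positive gain.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best = (0.0, None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[nxt]:
            continue  # cannot place a threshold between equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left,
                          h_total - h_left, lam, gamma)
        if gain > best[0]:
            best = (gain, (values[i] + values[nxt]) / 2.0)
    return best
```

Raising `gamma` makes every split more expensive and therefore prunes weak splits, while raising `lam` shrinks the leaf weights; these are the two regularization knobs discussed in this section.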
In this paper, the essence of the XGboost method is to parallelize the boosted tree on a single-CPU computer in order to improve the prediction accuracy of the boosted tree.
3.3. Normalization Based on XGboost Features
If the data are not normalized, the loss function in XGboost can only be chosen as linear, which degrades the performance of the model. Therefore, the z-score normalization method [29] is adopted to normalize the original features.
The original feature set is $Y \in R^{n \times m}$, where $n$ is the number of samples and $m$ is the number of observed variables. The standardization of the original feature set $Y$ by the z-score method is as follows:
$$\tilde{y}_i = \frac{y_i - \mathrm{mean}(Y)}{\mathrm{std}(Y)} \tag{15}$$
In Formula (15), $y_i$ is the $i$-th sample, $\mathrm{mean}(Y)$ is the mean vector of all values of the original feature set $Y$, $\mathrm{std}(Y)$ is the standard deviation vector of all values of the original feature set $Y$, and $\tilde{y}_i$ is the sample data after normalization.
The z-score standardization uses the mean and variance of the entire data set, but the mean and variance of data from different operation modes and different small disturbances vary greatly. To adapt to low-frequency oscillation data, the standardization is instead performed with the mean and variance of local data.
The main idea of the local nearest neighbor standardization method is to standardize the mean and variance of the local neighbor samples consisting of
k nearest neighbors of sample
. The formula is as follows:
In Formula (16),
k is the selected number of nearest neighbors, and
k must satisfy
,
is the data set of
k nearest neighbors determined by the Euclidean distance of sample
in the original feature set
Y. And
is the
k nearest neighbor sample
,
is the Euclidean distance between two samples
and
, then the relationship between
k nearest neighbor samples in
and the common of data set
. The formulas are as follows:
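The idea of local nearest neighbor standardization can be sketched as follows. This is an illustrative sketch under assumptions rather than the exact procedure of the paper: each sample is standardized with the per-feature mean and population standard deviation of its k nearest neighbors by Euclidean distance, the sample itself is excluded from its own neighborhood, and a feature with zero local spread is mapped to 0. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_standardize(samples, k):
    """Standardize each sample using statistics of its k nearest neighbors."""
    out = []
    for i, s in enumerate(samples):
        # Assumption: a sample is not counted among its own neighbors.
        others = [t for j, t in enumerate(samples) if j != i]
        neighbors = sorted(others, key=lambda t: euclidean(s, t))[:k]
        row = []
        for col in range(len(s)):
            vals = [t[col] for t in neighbors]
            mu, sd = mean(vals), pstdev(vals)
            row.append((s[col] - mu) / sd if sd > 0 else 0.0)
        out.append(row)
    return out
```

Because the statistics come from nearby samples only, two operating modes with very different global means are each scaled relative to their own cluster, which is the motivation given above for replacing the global z-score. The brute-force neighbor search is quadratic in the number of samples; a spatial index would be needed for data sets of the size used in Section 5.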
5. Simulation Analysis
5.1. Example System
The eight-machine 36-node system shown in Figure 3 is selected as the test grid. The data set is simulated with MATLAB and its power system analysis toolbox (PSAT), which performs the transient stability and small-signal stability calculations. To obtain the random response data of the power system, the operation states of the power system include , and so on, for a total of 15 load levels (with the generators changing according to the load level). In each of the 15 operation modes, the power flow calculation is carried out, the PSS parameters are changed, and the low-frequency oscillation instability caused by small disturbances is simulated. (The data is attached below the article.) Small disturbances are set as follows:
Load fluctuation is applied to nine loads to simulate a small disturbance. The simulation setting is as follows: the load fluctuation starts at 0.9 s and ends at 1.1 s.
Part of the generating units on the eight generators are tripped to simulate a small disturbance. The simulation setting is as follows: the tripped unit and its proportion are specified, the occurrence time is 1 s, and the tripping time is 1.1 s.
The transformer taps are changed separately to simulate a small disturbance. The occurrence time is 0.9 s.
At the end of the simulation, each sample is labeled according to the eigenvalue analysis. If the damping ratios of all electromechanical oscillation modes are greater than the threshold, the system is judged to be stable (labeled 1). If the damping ratio of any electromechanical oscillation mode is less than the threshold (0.03, 0.04, or 0.05) but greater than 0, the system is judged to be harmed by long-lasting oscillation (labeled 0). If the damping ratio of any mode is less than 0, the system is judged to undergo low-frequency oscillation and to be unstable (labeled −1). In this paper, 36,000 samples (13,500 load fluctuation samples, 12,000 generator tripping samples, and 10,500 transformer tap change samples) are obtained, of which 25,200 samples are used as the training set and 10,800 samples are used as the test set. The training set is input into the model for training, and the test set is used to verify the validity of the model. (The negative sample ratio is 43%.)
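The labeling rule above can be sketched in Python. The sketch assumes that the electromechanical modes are supplied as complex eigenvalues sigma + j*omega of the linearized system and uses the conventional damping ratio -sigma / sqrt(sigma^2 + omega^2). The function names are hypothetical, and treating a damping ratio exactly equal to the threshold as stable is an assumption, since the text does not specify the boundary case.

```python
import math


def damping_ratio(eig):
    """Damping ratio of an oscillatory mode given its complex eigenvalue."""
    return -eig.real / math.hypot(eig.real, eig.imag)


def stability_label(eigenvalues, threshold=0.03):
    """Return 1 (stable), 0 (weakly damped) or -1 (negatively damped)."""
    # Only oscillatory modes (non-zero imaginary part) have a damping ratio.
    ratios = [damping_ratio(e) for e in eigenvalues if e.imag != 0]
    if not ratios:
        raise ValueError("no oscillatory mode supplied")
    worst = min(ratios)  # the least damped mode decides the label
    if worst < 0:
        return -1
    if worst < threshold:
        return 0
    return 1
```

For example, a mode at -0.1 + 6j has a damping ratio of roughly 0.017 and is labeled 0 with the 0.03 threshold, while any mode with a positive real part forces the label −1 regardless of the other modes.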
5.2. Optimal Original Input Feature Selection
The improved XGboost is used in evaluation experiments to determine the best original input features of the model. Suppose
is the time of small disturbance occurrence;
is the small disturbance clearing time;
is the
i cycle time after the small disturbance clearing time
. The features of
,
and
moments coincide with the physical quantities represented by characteristic 2~16 in
Table 1.
With 0.03 selected as the damping ratio threshold, the selection results for the different original features show that:
All the original input features are fast to compute, with a calculation time within 0.012 ms, which meets the requirements of online application and real-time evaluation.
The original input features can accurately assess the stability of low-frequency oscillations. With an appropriate choice of original input features, the accuracy reaches 99.73%.
Choosing a long time scale increases the accuracy of the judgment, but it cannot meet the timeliness requirement because data must be collected over a long time. Timeliness and accuracy therefore need to be weighed together.
By comparison, the best accuracy at an interval of three cycles is 99.42%, and it meets the speed requirement better than the intervals of four cycles and two cycles. Although its accuracy is lower than that at an interval of 30 cycles, the acquisition time is only 1/10 as long.
Adding original input features from a long time scale interferes with the training and does not improve the accuracy.
The comparison between
Table 2 and
Table 3 shows that the original feature input of the original feature set
can be selected in this paper to have highly effective accuracy and rapidity.
Theoretical analysis shows that the features at an interval of three cycles can characterize the change in the damping ratio of the system at that time, so they give a high accuracy. The feature set at an interval of 30 cycles also gives a high accuracy, because by then the state of the low-frequency oscillation has settled and the differences between the sample data are larger, so the stability characteristics are more obvious. However, it cannot deliver the rapidity of evaluation.
To demonstrate the validity of the evaluation method, three special operating states are selected, with the 0.03 threshold as an example. The active power oscillation curves of the generators after small disturbances are shown in Figure 4, Figure 5 and Figure 6. The results of the machine learning evaluation are as follows: the evaluation result for Figure 4 is 1; the evaluation result for Figure 5 is 0; and the evaluation result for Figure 6 is −1.
As shown in Figure 4, Figure 5 and Figure 6, the low-frequency oscillation stability indicated by the active power oscillation curves of the generators is consistent with the online evaluation results, which further proves the effectiveness of the method.
To verify the relationship between the results and the eigenvalues, the topological structure is analyzed. The results of the eigenvalue analysis are shown in Table 4, Table 5 and Table 6, and the results of low-frequency oscillation stability are shown in Figure 4, Figure 5 and Figure 6. The online evaluation results are consistent with the results of the eigenvalue analysis, which proves that the online evaluation results of low-frequency oscillation stability are correct.
5.3. Evaluation Performance of Models with Different Samples and Different Damping Ratios
The original feature set with different samples and the original feature set with different damping ratio thresholds are evaluated by improved XGboost. The results are shown in
Table 7. The performance of model evaluation is analyzed from the following two perspectives:
The performance of the original feature set samples is compared for damping ratio thresholds of 0.03, 0.04, and 0.05.
The model performance is analyzed for the original feature set samples that consider load fluctuation, generator switching, and transformer tap changes separately, and for the original feature set samples that combine these three small-disturbance cases.
The results for the different samples and different damping ratio thresholds show that:
The more perturbation types the random response data contain, the higher the accuracy of the evaluation.
When the damping ratio threshold is 0.03, the evaluation model has the highest evaluation accuracy.
5.4. Online Evaluation Results of Low Frequency Oscillation Stability in Different Models
The traditional SSI algorithm uses singular value decomposition, while the Prony algorithm fits the sampled data. Their calculation time is on the order of seconds, their noise resistance is poor, and real-time performance cannot be guaranteed.
To demonstrate the accuracy and real-time performance of machine learning algorithms in low-frequency oscillation stability evaluation, the SVM, the random forest, XGboost, and the improved XGboost algorithm proposed in this paper are compared in a low-frequency oscillation stability evaluation test.
The results in Table 8 show that:
Compared with SVM, XGboost, and random forest, the improved XGboost has better accuracy and rapidity.
The improved XGboost is the most effective at reducing the misjudgment rate of unstable samples. It is better at preventing unstable samples from being recognized as stable samples, which would otherwise keep the alarm from being raised in time.
Therefore, the improved XGboost algorithm offers high accuracy and real-time performance in the evaluation of low-frequency oscillation stability.
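The two quantities compared above, overall accuracy and the rate at which unstable samples are mistaken for stable ones, can be sketched as follows. The exact performance indices of the paper are not reproduced here; the function names and the definition of the miss rate (unstable samples predicted as stable, divided by all unstable samples) are assumptions made for illustration, using the labels 1, 0, and −1 from Section 5.1.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unstable_miss_rate(y_true, y_pred, unstable=-1, stable=1):
    """Fraction of truly unstable samples that were predicted as stable."""
    preds = [p for t, p in zip(y_true, y_pred) if t == unstable]
    if not preds:
        return 0.0  # no unstable samples, so nothing can be missed
    return sum(p == stable for p in preds) / len(preds)
```

Reporting the miss rate next to the accuracy matters because the two error directions are not equally costly: a stable sample flagged as unstable triggers a needless alarm, whereas an unstable sample flagged as stable suppresses a necessary one.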
5.5. Model Evaluation Performance Considering Wide Area Measurement System Noise
Gaussian white noise of 50 dB, 30 dB, and 10 dB is added to the original data to simulate the measurement error of the wide area measurement system, and the accuracy of the different models in the online evaluation of low-frequency oscillation stability is compared. The results are shown in Table 9.
The results with and without noise under the different models indicate that, under the same signal-to-noise ratio, the improved XGboost has the highest evaluation accuracy and is slightly better than XGboost and random forest in noise immunity. This shows that the improved XGboost can play the role of noise filtering, and both have strong generalization ability.
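Adding white noise at a prescribed signal-to-noise ratio can be sketched as follows. The sketch assumes that the dB values are signal-to-noise ratios defined on average power, that is, 10 * log10(P_signal / P_noise); the function names and the use of Python's standard random generator are assumptions made for illustration and do not reproduce the MATLAB procedure of the paper.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return the signal with zero-mean Gaussian noise at the given SNR (dB)."""
    rng = rng or random.Random(0)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    p_signal = sum(c * c for c in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)
```

At 10 dB the noise power is one tenth of the signal power, so the 10 dB case is by far the harshest of the three levels tested.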
5.6. Actual System Simulation Analysis
The southern Hebei power grid, shown in Figure 7, is selected as the actual test system. The data set is simulated with MATLAB. To obtain the random response data of the power system, multiple actual operation modes are selected. In each operation mode, the power flow calculation is carried out, the PSS parameters are changed, and the low-frequency oscillation instability caused by small disturbances is simulated. Small disturbances are set as follows: (1) load fluctuation is applied to simulate a small disturbance; (2) part of the generating units are tripped to simulate a small disturbance; and (3) the transformer taps are changed separately to simulate a small disturbance.
In this paper, 24,000 samples (12,000 load fluctuation samples, 6000 cutting machine samples, 6000 changing transformer taps samples) are obtained, of which 16,800 samples are used as training sets and 7200 samples are used as test sets. The training set is input into the model for training, and the test set is used to verify the validity of the model. (Negative sample ratio is 21%)
To sum up, the results in Table 10 and Table 11 show that, in a real environment, the online evaluation method of low-frequency oscillation stability based on the improved XGboost algorithm also offers high evaluation accuracy, efficient calculation, strong noise immunity, and a low error rate in the evaluation of unstable samples.