Distributed Feature Selection for Power System Dynamic Security Region Based on Grid-Partition and Fuzzy-Rough Sets

: In order to satisfy the requirements of modern online security assessment of power systems with continuously increasing complexity in terms of structure and scale, it is desirable to develop a power system dynamic security region (DSR) analysis. However, data-driven methods suffer from expensive model training costs and overfitting when determining DSR boundaries with high-dimensional grid features. Given this problem, a distributed feature selection method based on grid partition and fuzzy-rough sets is proposed in this paper. The method first employs the Louvain algorithm to partition the power grid and divide the original feature set so that high-dimensional features can be allocated to multiple computational units for distributed screening. At this point, the connections between features of different computational units are minimized to a relatively low level, thereby avoiding large errors in the distributed results. Then, an incremental search algorithm based on the fuzzy-rough set theory (FRST) is used for feature selection at each computational unit, which can effectively take into account the intrinsic connections between features. Finally, the results of all computational units are integrated in the coordination unit to complete the overall feature selection. The experimental results based on the IEEE-39 bus system show that the proposed method can help simplify the power system DSR analysis with high-dimensional features by screening the critical features. And compared with other commonly used filter methods, it has higher screening accuracy and lower time costs.


Introduction
The past decade has witnessed an increasing deployment of renewable energy resources and grid interconnection, which results in an extremely complex dynamic behavior of power systems [1].The traditional security and stability analysis method (i.e., the point-wise method) involves numerous simulations under pre-selected faults to judge whether a power system is secure/insecure or stable/unstable, which is difficult to meet the requirements of modern online security assessment of power systems [2,3].For the sake of quickly and intuitively obtaining a holistic evaluation of the current operating state of power systems with information such as security margins and optimized control directions, Wu et al. introduced the concept of security region (SR) [4].
The concept of SR addresses problems from the perspective of regions and describes a region in which a power system can operate securely as a whole.Based on the difference between static stability analysis and transient stability analysis, an SR can be classified into a steady-state security region (SSSR) or a dynamic security region (DSR) accordingly [5].A conventional DSR is defined as a set of operating points in the pre-fault power injection space that can guarantee the transient stability of power systems.Previous research has proved its topological characteristics of internal connectivity, no void, compact boundaries, and no knot, which can be approximated by hyperplanes [6].These characteristics have laid the foundation for the practical application of conventional DSRs, resulting in two kinds of boundary determination methods represented by direct and data-driven methods.
Direct methods search for a critical injected power of power systems for a given fault based on the transient energy function (TEF) [7][8][9], extended equal area criterion (EEAC) [10,11], etc., and then employ a sensitivity analysis to solve the analytical expression of DSR boundaries.Although these methods are fast and clear in physical interpretation, they cannot be applied to electronic power systems because of the limitations of TEF and EEAC (e.g., non-scalability and conservatism) [12].In contrast, data-driven methods can effectively compensate for the deficiency in this aspect, which obtains an approximate hyperplane of the DSR boundaries on a large number of samples by fitting [13,14] or classification [15,16] techniques.Although the process of generating samples brings extra time costs, it can be computed offline first and then matched online in practice.Therefore, these methods have become mainstream methods for determining DSR boundaries in engineering [17].
For actual grid dispatching, it is not enough to only consider the power injection of each node in a power system, but also the power transmissions of certain branches or critical sections [18].This makes the data-driven methods suffer from expensive model training costs and overfitting when dealing with massive grid operation features.Therefore, it is necessary to screen for critical features in advance [19].
The essence of feature selection problems is to find a minimal subset of features for the system under consideration that maintains or improves the decision-making ability of data.Currently, the corresponding solution methods in power systems can be mainly classified into the filter, wrapper, and embedded methods according to whether they are related to the data-driven model of the subject.The wrapper methods construct an objective function based on the decision-making accuracy of the data-driven model in the test set, and iteratively search for an optimal subset of features with an intelligent optimization algorithm [20][21][22][23].The embedded methods embed feature selection into the process of training the data-driven model [19,24,25].Although both of these methods are effective in screening for critical features in data, they have the disadvantages of slow feature selection efficiency and easy overfitting of the model, respectively.Therefore, they are unsuitable for a power system DSR analysis with high-dimensional features [26].In contrast, the filter methods are much simpler and more efficient.They only rely on the information attached to features to evaluate the screened feature subset and are independent of the data-driven model [27][28][29][30][31][32].However, most of the existing filter methods, such as Relief [33], Fisher Score [34], and Mutual Information [35], ignore the intrinsic connection between features, which leads to redundant feature selection results.In addition, there are also studies such as [36] that combine the filter and wrapper methods to screen features in two steps, but this might lead to further slowdowns in selection.
To effectively and efficiently screen for critical features in a power system DSR analysis, a better filter feature selection method needs to be developed that can address the above shortcomings.The fuzzy-rough set theory (FRST) provides us with an attractive idea [37].It uses discernable sample pairs of features to describe the implicit information in data, which helps to take into account the intrinsic connection between features during feature selection.Furthermore, inspired by [38], it is desirable to allocate high-dimensional features to multiple computational units (assuming that computer resources are divided into computational units and coordination units) for distributed screening.This can further improve the overall efficiency of the method.Considering that grid features within the same electrical partition usually have a much stronger connection to each other, while grid features between different electrical partitions are relatively less connected, adopting a grid partition-based method to divide features will be more in line with engineering practice than the Pearson correlation coefficient-based method in [38].
In summary, a distributed feature selection method based on grid partition and fuzzyrough sets is proposed, which consists of two parts: partitioning the power grid and dividing the original feature set by the Louvain algorithm, and realizing feature selection based on the FRST.The main contributions in this paper are summarized as follows: (1) Considering the need for actual grid dispatching, extend the definition of DSR in power systems from the power injection space to the Cartesian product of the power injection space and power transmission space (referred to as the power space).Meanwhile, considering the changes in topological characteristics brought about by the extension, introduce a support vector machine (SVM) with a Gaussian kernel function to approximate the boundaries of the extended DSR.(2) Propose a distributed feature selection method based on grid partition and fuzzyrough sets that can be applied to a power system DSR analysis with high-dimensional features.On the one hand, electrical partition by the Louvain algorithm can allocate the original feature set to multiple computational units for screening, while the connection between different feature subsets is minimized to a relatively low level, thereby avoiding large errors in the distributed results.On the other hand, feature selection based on the FRST is not only efficient but can also effectively take into account the intrinsic connections between features and, thus, improve the screening accuracy.
The remainder of this paper is organized as follows: Section 2 extends the definition of DSR for power systems and introduces the SVM method for approximating the extended DSR boundaries.Section 3 proposes a distributed feature selection method for screening critical DSR features.A case analysis was carried out and the experimental results are presented to substantiate the proposed method in Section 4. Finally, Section 5 draws the conclusion.

Extension of Dynamic Security Region
In actual dispatching work, grid dispatchers are often concerned about the power transmissions of certain branches or critical sections, in addition to the power injection of each node in the power system.A conventional DSR is defined in the power injection space, which cannot help dispatchers identify potential problems on the branches quickly.Hence, to perform online real-time security monitoring, assessment, and control more scientifically and efficiently, this paper first extends the definition of DSR to the power space, which can be formulated as follows: where Ω d denotes the DSR of the power system; F is a given fault; i and j are the pre-fault and post-fault system topology, respectively; x τ (y) represents the system state at the instant of fault clearing τ; R(y) represents the region of attraction (ROA) of the equilibrium point x s (y) determined by the power vector y; and Y is the set of all y that satisfies the constraints of the power flow equations, the upper and lower limits of node-injected power, and the upper and lower limits of branch-transmitted power.
where g(y) = 0 is the power flow equation, and y min and y max are the limit vectors of the power vector y.
As can be seen, the extended DSR determines a set of all points in the pre-fault power space that can ensure the transient stability of the power system after a given fault has occurred.In an engineering application, grid dispatchers need to determine the corresponding extended DSR for each fault in the pre-selected fault set.It has been shown that for a given fault type, fault duration, and system topology before and after the fault, this set is uniquely determined, and it has the topological characteristics of internal connectivity, no void, compact boundaries, and no knot.
The mapping relationship between DSR and ROA is shown in Figure 1.
Electronics 2024, 13, x FOR PEER REVIEW 4 of 17 set is uniquely determined, and it has the topological characteristics of internal connectivity, no void, compact boundaries, and no knot.The mapping relationship between DSR and ROA is shown in Figure 1.

Approximate Boundaries of Extended Dynamic Security Region
The essence of a power system DSR analysis is to determine its boundaries.For a conventional DSR defined in the power injection space, the boundaries can be easily approximated by one or a few hyperplanes [6].However, after the extension, they might be curved because of the internal electrical connection between the node-injected power and branch-transmitted power.Given this property, in this paper, we introduce an SVM with a Gaussian kernel function to approximate the boundaries of the extended DSR.
SVM is an effective tool for two-group classification problems [39], and its basic principle is shown in Figure 2. It uses a convex optimization technique to find the optimal discriminant surface based on the linear discriminant function as follows: where  and  are the normal vector and displacement term of the hyperplane, respectively;  represents the sample data; and   is a mapping function that maps the nonlinearly classifiable input  to a very high-dimension feature space so that they can become linearly classifiable.The mapping relationship between DSR and ROA.y and y ′ are the points inside and outside the DSR, respectively; x s and x ′ s are the equilibrium points of the corresponding ROA after a given fault has occurred; and x τ and x ′ τ are the system states at the instant of fault clearing.

Approximate Boundaries of Extended Dynamic Security Region
The essence of a power system DSR analysis is to determine its boundaries.For a conventional DSR defined in the power injection space, the boundaries can be easily approximated by one or a few hyperplanes [6].However, after the extension, they might be curved because of the internal electrical connection between the node-injected power and branch-transmitted power.Given this property, in this paper, we introduce an SVM with a Gaussian kernel function to approximate the boundaries of the extended DSR.
SVM is an effective tool for two-group classification problems [39], and its basic principle is shown in Figure 2. It uses a convex optimization technique to find the optimal discriminant surface based on the linear discriminant function as follows: where w and b are the normal vector and displacement term of the hyperplane, respectively; x represents the sample data; and ϕ(x) is a mapping function that maps the non-linearly classifiable input x to a very high-dimension feature space so that they can become linearly classifiable.
set is uniquely determined, and it has the topological characteristics of internal connectivity, no void, compact boundaries, and no knot.
The mapping relationship between DSR and ROA is shown in Figure 1.

Approximate Boundaries of Extended Dynamic Security Region
The essence of a power system DSR analysis is to determine its boundaries.For a conventional DSR defined in the power injection space, the boundaries can be easily approximated by one or a few hyperplanes [6].However, after the extension, they might be curved because of the internal electrical connection between the node-injected power and branch-transmitted power.Given this property, in this paper, we introduce an SVM with a Gaussian kernel function to approximate the boundaries of the extended DSR.
SVM is an effective tool for two-group classification problems [39], and its basic principle is shown in Figure 2. It uses a convex optimization technique to find the optimal discriminant surface based on the linear discriminant function as follows: where  and  are the normal vector and displacement term of the hyperplane, respectively;  represents the sample data; and   is a mapping function that maps the nonlinearly classifiable input  to a very high-dimension feature space so that they can become linearly classifiable.Under an ideal condition (where the post-mapping sample data are completely linearly classifiable), the optimal values of w and b can be obtained by solving the following optimization problem: where y i is the actual class of sample i, and n is the number of sample data.However, there are often some outliers, resulting in the post-mapping sample data not being completely linearly classifiable because of the existence of system noise and measurement errors in practice.At this time, for the sake of classification accuracy, the optimization objective and constraints in Equation ( 4) can be modified by introducing relaxation variables: where ε is a relaxation variable, and C is a penalty parameter that implies the importance of the outliers.It should be noted that increasing C can improve the classification performance of the discriminant surface on the training samples, but it also brings the risk of overfitting and additional training time costs.When we solve the above optimization problem by the Lagrange function and Karush-Kuhn-Tucker (KKT) Conditions [40], it will involve the operation ϕ(x i ) T ϕ x j , which is very difficult to solve since the dimension of ϕ(•) is too high and even infinite.An effective approach is to construct a kernel function: where Equation ( 7) is the Gaussian kernel function employed in this paper, and σ > 0 is the width, which is trained as a hyperparameter together with the penalty parameter C.
Once we obtain the discriminant surface, it is possible to classify the new data according to the following decision rule: if f (x) > 0, it is taken as a positive class; and if f (x) < 0, it is taken as a negative class.

Distributed Feature Selection for Dynamic Security Region
To avoid expensive model training costs caused by a large number of grid features when approximating the boundaries of the extended DSR by SVM, as well as the decrease in decision-making ability caused by wrong feature selection results, this paper proposes a distributed feature selection method based on grid partition and fuzzy-rough sets.The framework is shown in Figure 3.
First, for a given fault and system topology, grid dispatchers perform the Latin hypercubic sampling simulation on a power simulation software, such as BPA (version 5.0) or DSP (version 2.3.11), to construct sample data suitable for the DSR analysis.Each sample should contain information such as active and reactive power injections of nodes, active and reactive power transmissions of branches, and whether the system can return to a stable state after the fault.First, for a given fault and system topology, grid dispatchers perform the Latin hypercubic sampling simulation on a power simulation software, such as BPA (version 5.0) or DSP (version 2.3.11), to construct sample data suitable for the DSR analysis.Each sample should contain information such as active and reactive power injections of nodes, active and reactive power transmissions of branches, and whether the system can return to a stable state after the fault.
At the same time, the power system is abstracted into an undirected weighted network, where the weights of the edges between the nodes are measured by the inverse of the electrical distance.The smaller the electrical distance, the stronger the electrical connection between the nodes, and the larger the corresponding weights of the edges.The Louvain algorithm is employed to partition the above network in a bottom-up manner, and the corresponding grid-partition result is selected based on the available computational resources (i.e., the number of computational units).
The electrical distance can be computed as follows: where  denotes the electrical distance between node  and node , and  ,  , and  are the elements of the corresponding position in the node impedance matrix.
Then, the constructed sample data are divided based on the grid-partition result.The connection between features belonging to different partitions is minimized to a relatively low level so that they can be distributively screened based on the FRST without causing a large impact on the screening accuracy.It should be noted that allocating the above screening process to different computational units can effectively reduce the overall time costs of the proposed method.
Concerning the allocation of features to different computational units, we can first assign a partition to each computational unit.Then, the node-injected power and branchtransmitted power corresponding to the nodes and branches contained in the partition are the features that the computational unit responsible for screening.
Finally, the feature selection results of all computational units are integrated in the coordination unit to complete the overall feature selection of the power system.

Louvain Algorithm
Considering the excessive computation of centralized feature selection when dealing with high-dimensional grid features, it is desirable to allocate the original feature set to multiple computational units for distributed screening.When dividing the feature set, we must ensure a maximum connection of features within the same partition and a minimum connection of features between different partitions.This helps reduce the difference between distributed and centralized feature selection results.Therefore, this paper uses the Louvain algorithm to partition the grid electrically.At the same time, the power system is abstracted into an undirected weighted network, where the weights of the edges between the nodes are measured by the inverse of the electrical distance.The smaller the electrical distance, the stronger the electrical connection between the nodes, and the larger the corresponding weights of the edges.The Louvain algorithm is employed to partition the above network in a bottom-up manner, and the corresponding grid-partition result is selected based on the available computational resources (i.e., the number of computational units).
The electrical distance can be computed as follows: where D ij denotes the electrical distance between node i and node j, and Z ii , Z jj , and Z ij are the elements of the corresponding position in the node impedance matrix.
Then, the constructed sample data are divided based on the grid-partition result.The connection between features belonging to different partitions is minimized to a relatively low level so that they can be distributively screened based on the FRST without causing a large impact on the screening accuracy.It should be noted that allocating the above screening process to different computational units can effectively reduce the overall time costs of the proposed method.
Concerning the allocation of features to different computational units, we can first assign a partition to each computational unit.Then, the node-injected power and branchtransmitted power corresponding to the nodes and branches contained in the partition are the features that the computational unit responsible for screening.
Finally, the feature selection results of all computational units are integrated in the coordination unit to complete the overall feature selection of the power system.

Louvain Algorithm
Considering the excessive computation of centralized feature selection when dealing with high-dimensional grid features, it is desirable to allocate the original feature set to multiple computational units for distributed screening.When dividing the feature set, we must ensure a maximum connection of features within the same partition and a minimum connection of features between different partitions.This helps reduce the difference between distributed and centralized feature selection results.Therefore, this paper uses the Louvain algorithm to partition the grid electrically.
The Louvain algorithm is a heuristic partition method based on modularity optimization proposed by Blondel et al. [41], which can quickly and accurately find modular partitions in a large-scale network.The higher the modularity, the tighter the connections between nodes within a partition and the sparser the connections between nodes in different partitions.The specific partition process is shown in Figure 4.
The Louvain algorithm is a heuristic partition method based on modularity optimization proposed by Blondel et al. [41], which can quickly and accurately find modular partitions in a large-scale network.The higher the modularity, the tighter the connections between nodes within a partition and the sparser the connections between nodes in different partitions.The specific partition process is shown in Figure 4.The algorithm can be divided into two phases in each iteration: modularity optimization and network aggregation.The modularity optimization phase first assigns a different partition to each node of the network and then tries to separate each node from its current partition and reassign it to the partition to which a neighboring node belongs.If the gain of modularity is positive, the corresponding node assignment result is retained; otherwise, the node stays in the original partition.This first phase stops when a local maximum of modularity is attained, i.e., when no node reassignment can improve the modularity.
Based on the partition result in the first phase, the second phase aggregates the nodes within the same partitions to obtain a new network.The weights of the edges between new nodes are given by the sum of the weights of the edges between partitions, and the weights of the self-loops of new nodes are given by the sum of the weights of the edges inside the corresponding partitions.Once the network aggregation phase is completed, it is then possible to reapply the first phase of the algorithm to the resulting undirected weighted network and to iterate until the modularity of the network is unchanged.
The modularity and the gain of modularity can be computed as follows: where  denotes the modularity of the network under the current partition result;  is a set that includes all the current partitions; Δ denotes the gain of modularity resulting from reassigning node  to partition ; Σ and Σ are the sum of the weights of the edges inside partition  and the sum of the weights of the edges connected to the nodes in partition , respectively;  is the sum of the weights of all edges in the network; and  , and  represent the sum of the weights of the edges from node  to other nodes in partition  and the sum of the weights of the edges attached to node , respectively.The algorithm can be divided into two phases in each iteration: modularity optimization and network aggregation.The modularity optimization phase first assigns a different partition to each node of the network and then tries to separate each node from its current partition and reassign it to the partition to which a neighboring node belongs.If the gain of modularity is positive, the corresponding node assignment result is retained; otherwise, the node stays in the original partition.This first phase stops when a local maximum of modularity is attained, i.e., when no node reassignment can improve the modularity.
Based on the partition result in the first phase, the second phase aggregates the nodes within the same partitions to obtain a new network.The weights of the edges between new nodes are given by the sum of the weights of the edges between partitions, and the weights of the self-loops of new nodes are given by the sum of the weights of the edges inside the corresponding partitions.Once the network aggregation phase is completed, it is then possible to reapply the first phase of the algorithm to the resulting undirected weighted network and to iterate until the modularity of the network is unchanged.
The modularity and the gain of modularity can be computed as follows: where Q denotes the modularity of the network under the current partition result; P is a set that includes all the current partitions; ∆Q denotes the gain of modularity resulting from reassigning node i to partition p; Σ p in and Σ p tot are the sum of the weights of the edges inside partition p and the sum of the weights of the edges connected to the nodes in partition p, respectively; m is the sum of the weights of all edges in the network; and k p i,in and k i represent the sum of the weights of the edges from node i to other nodes in partition p and the sum of the weights of the edges attached to node i, respectively.

Fuzzy-Rough Set Theory
The traditional rough set theory (RST) is an effective tool for dealing with uncertain and incomplete data.It is capable of removing redundant and irrelevant features from the data while keeping the decision-making ability of the data intact [42].However, since this theory is developed based on the indistinguishable relation, which is an equivalent relation, it is necessary to discretize data when applying them to power systems with real-valued features [43][44][45].This may cause serious information loss and affect the result of feature selection.Given this problem, Dubois et al. [46] extended the RST to the FRST with the incorporation of the fuzzy theory.
In the FRST, the structure of data can be written as a decision table (U, A ∪ D), where U is a nonempty set that includes all the samples; A is a set of real-valued features that describe the samples; and D is the decision attribute.
Instead of the indistinguishable relation in the RST, the FRST uses a fuzzy similarity relation (FSR) to assess the similarity between samples: where R B (x, y) and R a (x, y) are the FSR between sample x and sample y on feature subset B ⊆ A as well as feature a, respectively, and a(x) is the feature a value of sample x.Suppose there are r different decision classes in the data, then U can be divided into r different sets U/D = {D 1 , D 2 , • • • D r } accordingly.The fuzzy upper and lower approximation of sample x on feature subset B and decision class D i are defined as follows: where D i (y) = {1, y ∈ D i ; 0, y / ∈ D i }.Then, the fuzzy positive region of sample x on feature subset B is defined as follows: The fuzzy positive region is an important measure in the FRST regarding whether there is a change in the decision-making ability of the data before and after feature selection.When the feature subset B satisfies the following conditions, it can be considered as a feature selection result of A: It should be noted that the feature subset that satisfies the above conditions is not unique.The number of features may fluctuate widely from case to case.For the sake of minimizing the number of screened features, this paper employs an incremental search method based on discernable sample pairs.The specific flowchart is shown in Figure 5.It should be noted that although the sequence of selecting different features from the set A − B has an impact on the results, i.e., there is no guarantee that the subset of features obtained by this method is optimal, it can guarantee a relatively optimal result.Discernable sample pairs are the key to take into account the intrinsic connections between features and measure whether the added features can improve the decisionmaking ability of the selected feature set, which can be computed as follows: where ψ(B) and ψ(a) denote the discernable sample pairs of feature subset B and feature a, respectively, and D(x) and D(y) are the decision classes of sample x and sample y, respectively.Discernable sample pairs are the key to take into account the intrinsic connections between features and measure whether the added features can improve the decision-making ability of the selected feature set, which can be computed as follows: = ,  | min  ,  ,  ,  = 0,   ≠ ,  = max   ,   (20) where   and   denote the discernable sample pairs of feature subset  and feature , respectively, and   and   are the decision classes of sample  and sample , respectively.Suppose   is used to represent all possible feature selection results of the data ,  ∪  , then the definition and the computational formula of the core feature set Suppose Red D (A) is used to represent all possible feature selection results of the data (U, A ∪ D), then the definition and the computational formula of the core feature set Core D (A) are, respectively, as follows: The convergence condition of the method is as follows:

Examples
To demonstrate the effectiveness and efficiency of the proposed distributed feature selection method, we selected the IEEE 39-bus system [47] for application in this paper.This system consists of 10 generators, 39 nodes, and 46 branches.In the DSR analysis, 54 power injections of nodes and 184 power transmissions of branches need to be considered.The schematic of the IEEE 39-bus system is shown in Figure 6.
The convergence condition of the method is as follows: ∀ ∈ ,   −  ≠

Examples
To demonstrate the effectiveness and efficiency of the proposed distributed feature selection method, we selected the IEEE 39-bus system [47] for application in this paper.This system consists of 10 generators, 39 nodes, and 46 branches.In the DSR analysis, 54 power injections of nodes and 184 power transmissions of branches need to be considered.The schematic of the IEEE 39-bus system is shown in Figure 6.For a better distribution of the constructed sample data throughout the feature space, we first performed stratified random sampling in the range of 75% to 125% of the original load and power generation level by the Latin hypercubic sampling method.The power flow calculation and transient stability calculation of each sample were carried out using the simulation software DSP of the China Southern Power Grid Electric Power Research Institute (CSG EPRI).In particular, the generator on node 31 was taken as the swing machine in the power flow calculation.A three-phase short circuit with a duration of five waveforms was set to occur at the end of branches 14-15 in the transient stability calculation.Finally, we obtained 435 stabilization sample data and 565 destabilization sample data.
At the same time, we computed the weight matrix of the nodes according to Equation (8) and partitioned the system electrically by the Louvain algorithm, as mentioned in Section 3.1.After partitioning twice, the algorithm converged.The modularity change curve and grid-partition result for each partition are shown in Figure 7.For a better distribution of the constructed sample data throughout the feature space, we first performed stratified random sampling in the range of 75% to 125% of the original load and power generation level by the Latin hypercubic sampling method.The power flow calculation and transient stability calculation of each sample were carried out using the simulation software DSP of the China Southern Power Grid Electric Power Research Institute (CSG EPRI).In particular, the generator on node 31 was taken as the swing machine in the power flow calculation.A three-phase short circuit with a duration of five waveforms was set to occur at the end of branches 14-15 in the transient stability calculation.Finally, we obtained 435 stabilization sample data and 565 destabilization sample data.
At the same time, we computed the weight matrix of the nodes according to Equation (8) and partitioned the system electrically by the Louvain algorithm, as mentioned in Section 3.1.After partitioning twice, the algorithm converged.The modularity change curve and grid-partition result for each partition are shown in Figure 7.
As can be seen, the IEEE-39 bus system can be effectively decomposed into multiple modular partitions with the Louvain algorithm.When the grid-partition result is like Figure 7b, the overall modularity of the system is maximized at 0.1538.
Then, we divided the constructed sample data based on the above grid-partition results and the available computational resources (this example assumes that there are three computational units for convenience of analysis) so that the high-dimensional features were allocated to the respective partitions to which they belonged for distributed screening.The number of features in each partition is shown in Table 1.As can be seen, the IEEE-39 bus system can be effectively decomposed into multiple modular partitions with the Louvain algorithm.When the grid-partition result is like Figure 7b, the overall modularity of the system is maximized at 0.1538.
Then, we divided the constructed sample data based on the above grid-partition results and the available computational resources (this example assumes that there are three computational units for convenience of analysis) so that the high-dimensional features were allocated to the respective partitions to which they belonged for distributed screening.The number of features in each partition is shown in Table 1.In the next step, centralized feature selection and distributed feature selection were carried out based on the FRST, as mentioned in Section 3.2.Meanwhile, to show the impact of the DSR analysis before and after the feature selection, the SVM method mentioned in Section 2.2 was used to determine the DSR boundaries of the IEEE-39 bus system.When training the SVM model, we randomly selected 700 samples as the training samples and the remaining 300 samples as the test samples; we set the logarithm of the penalty parameter C in Equation ( 5) and the width σ in Equation (7) to change in the range [−10, 10] in a step of 0.1, and we employed 5-fold cross-validation to select the value that made the model most accurate.The experimental results are described below.
Figure 8 shows the projection results of the DSR constructed on the original feature set for the 2D power injection space and the 2D branch transmission space.From the results presented in Table 2, it can be seen that the decision-making accuracy (i.e., DSR analysis accuracy) based on the screened feature set exceeds the decision-making accuracy based on the original feature set by 1.34%, both for centralized and distributed feature selections.In addition, compared to the centralized method, the screened feature set obtained by the distributed method has one more power transmission of the branches, but the time consumed is 3.0636 s less, and the performance is improved by 84.16%.It should be noted that decision-making accuracy is an important measure of whether a screened feature set is good or bad.It is related to whether the grid dispatchers can accurately identify the security and stability of the power system after a given fault has occurred.Therefore, decision-making accuracy based on screened feature sets should be the primary criterion to follow when comparing the performance between different feature selection methods.Then, we compared the time cost to show their scalability (ability to be applied to large-scale systems).Furthermore, the centralized method theoretically obtains more accurate results than the distributed method, but this should be expressed as fewer screened-out features, as shown in this paper.Therefore, it is reasonable that the decision-making accuracies of the screened feature sets obtained by the centralized and distributed methods are the same, as shown in Table 2.In the next step, centralized feature selection and distributed feature selection were carried out based on the FRST, as mentioned in Section 3.2.Meanwhile, to show the impact of the DSR analysis before and after the feature selection, the SVM method mentioned in Section 2.2 was used to determine the DSR boundaries of the IEEE-39 bus system.When training the SVM model, we randomly selected 700 samples as the training samples and the remaining 300 samples as the test samples; we set the logarithm of the penalty parameter  in Equation ( 5) and the width  in Equation ( 7) to change in the range −10,10 in a step of 0.1, and we employed 5-fold cross-validation to select the value that made the model most accurate.The experimental results are described below.
Figure 8 shows the projection results of the DSR constructed on the original feature set for the 2D power injection space and the 2D branch transmission space.From the results presented in Table 2, it can be seen that the decision-making accuracy (i.e., DSR analysis accuracy) based on the screened feature set exceeds the decision-making accuracy based on the original feature set by 1.34%, both for centralized and distributed feature selections.In addition, compared to the centralized method, the screened feature set obtained by the distributed method has one more power transmission of the branches, but the time consumed is 3.0636 s less, and the performance is improved by 84.16%.It should be noted that decision-making accuracy is an important measure of whether a screened feature set is good or bad.It is related to whether the grid dispatchers can accurately identify the security and stability of the power system after a given fault has occurred.Therefore, decision-making accuracy based on screened feature sets should be the primary criterion to follow when comparing the performance between different feature selection methods.Then, we compared the time cost to show their scalability (ability to be applied to large-scale systems).Furthermore, the centralized method theoretically obtains more accurate results than the distributed method, but this should be expressed as fewer screened-out features, as shown in this paper.Therefore, it is reasonable that the decisionmaking accuracies of the screened feature sets obtained by the centralized and distributed methods are the same, as shown in Table 2.  Decision-making accuracy can be computed as follows: where  denotes the number of test samples, and  denotes the number of samples with correct decisions in the test samples.Decision-making accuracy can be computed as follows: where n test denotes the number of test samples, and n correct denotes the number of samples with correct decisions in the test samples.This study also selected several other typical filter feature selection methods for a comparative analysis, including the Relief, Fisher Score, and Mutual Information methods.It should be noted that these filter methods only provide an evaluation index of each feature, so we needed to select a certain number (58 just like during the centralized feature selection) of top-ranked features as the results.The specific results are shown in Table 3.As can be seen, although the Relief and Fisher Score methods have lower time costs, one can observe the superiority of the proposed method in terms of decision-making accuracy based on their screened feature sets, which is higher than other methods by 1%, 3%, and 5.64%.The Mutual Information method gives the worst results, not only in terms of the highest time costs but also the lowest decision-making accuracy of 87%.In particular, its extremely high time costs stem from computing the mutual information between features and the decision attribute made by the nearest neighbor technique.
In summary, the proposed distributed feature selection method based on grid partition and fuzzy-rough sets has the following advantages: (1) It can help simplify the power system DSR analysis with high-dimensional features while screening for critical features, and it demonstrates good practicality.(2) Compared to other commonly used filter methods, it gains the highest screening accuracy with acceptable time costs.Furthermore, it should be noted that the more computational resources available, the more groups the highdimensional features can be divided into based on the grid-partition results of the Louvain algorithm, and the more efficient the feature selection becomes.

Conclusions
This paper proposes a distributed feature selection method based on grid partition and fuzzy-rough sets for power system DSR analyses.The method consists of two main parts: partitioning the power grid and dividing the original feature set by the Louvain algorithm, and realizing feature selection based on the FRST.On the one hand, the Louvain algorithm can quickly and accurately find modular partitions in a large-scale grid so that the connection of the features between different partitions is minimized to a relatively low level.At this point, the high-dimensional features can be allocated to multiple computational units for distributed screening without causing a large impact on screening accuracy.On the other hand, compared to other filter methods, the feature selection method based on the FRST can effectively consider the intrinsic connections between features and avoid the redundancy of results.The experiment results show the effectiveness and efficiency of the proposed method.Our future research will focus on exploring a better method to approximate the DSR boundaries and implementing it in real-world engineering applications with the proposed feature selection method.

k i
The sum of the weights of the edges attached to node i R B (x, y) The FSR between sample x and sample y on feature subset B R a (x, y) The FSR between sample x and sample y on feature a BD i (x) The fuzzy upper approximation of sample x on feature subset B and decision class D i BD i (x) The fuzzy lower approximation of sample x on feature subset B and decision class D i D i (y) Determine whether y belongs to decision class D i Pos B (x) The fuzzy positive region of sample x on feature subset B

ψ(B)
The discernable sample pairs of feature subset B

ψ(a)
The discernable sample pairs of feature a D(x) The decision classes of sample x Red D (A) All possible feature selection results of the data (U, A ∪ D) Core D (A) The core feature set Acc(%) The decision-making accuracy n test The number of test samples n correct The number of samples with correct decisions in the test samples

Figure 1 .
Figure 1.The mapping relationship between DSR and ROA. and ′ are the points inside and outside the DSR, respectively;  and ′ are the equilibrium points of the corresponding ROA after a given fault has occurred; and  and ′ are the system states at the instant of fault clearing.

Figure 2 .
Figure 2. The basic principle of SVM.

Figure 1 .
Figure 1.The mapping relationship between DSR and ROA.y and y ′ are the points inside and outside the DSR, respectively; x s and x ′ s are the equilibrium points of the corresponding ROA after a given fault has occurred; and x τ and x ′ τ are the system states at the instant of fault clearing.

Figure 1 .
Figure 1.The mapping relationship between DSR and ROA. and ′ are the points inside and outside the DSR, respectively;  and ′ are the equilibrium points of the corresponding ROA after a given fault has occurred; and  and ′ are the system states at the instant of fault clearing.

Figure 2 .
Figure 2. The basic principle of SVM.Figure 2. The basic principle of SVM.

Figure 2 .
Figure 2. The basic principle of SVM.Figure 2. The basic principle of SVM.

Figure 3 .
Figure 3.The framework of the proposed method.

Figure 3 .
Figure 3.The framework of the proposed method.

Figure 4 .
Figure 4.The basic principle of the Louvain algorithm.The same-colored circles in the figure represent the same partition.

Figure 4 .
Figure 4.The basic principle of the Louvain algorithm.The same-colored circles in the figure represent the same partition.

Figure 5 .
Figure 5. Incremental search method based on discernable sample pairs.

Figure 5 .
Figure 5. Incremental search method based on discernable sample pairs.

Figure 6 .
Figure 6.The schematic of the IEEE 39-bus system.The numbers in the figure represent the node numbers.

Figure 6 .
Figure 6.The schematic of the IEEE 39-bus system.The numbers in the figure represent the node numbers.
(a) The first partition.(b) The second partition.

Figure 7 .
Figure 7.The modularity change curve and grid-partition result for each partition, where (a) is the first partition, and (b) is the second partition.The different colors in the figure represent different partitions.

Figure 7 .
Figure 7.The modularity change curve and grid-partition result for each partition, where (a) is the first partition, and (b) is the second partition.The different colors in the figure represent different partitions.
(a) DSR projected into 2D power injection space.(b) DSR projected into 2D power transmission space.

Figure 8 .
Figure 8.The DSR in a 2D projected space, where (a) is the result of the projection to the active injected power space of load node 15 and generator node 30, and (b) is the result of the projection to the active transmitted power space of sections 2−3 and 26−27.

Figure 8 .
Figure 8.The DSR in a 2D projected space, where (a) is the result of the projection to the active injected power space of load node 15 and generator node 30, and (b) is the result of the projection to the active transmitted power space of sections 2−3 and 26−27.

Table 1 .
The number of features in each partition.

Table 1 .
The number of features in each partition.

Table 2 .
Comparison of DSR analysis accuracy before and after feature selection.

Table 3 .
Comparison of the proposed method and other commonly used filter methods.