A Modern Data-Mining Approach Based on Genetically Optimized Fuzzy Systems for Interpretable and Accurate Smart-Grid Stability Prediction

: The main objective and contribution of this paper was/is the application of our knowledge-based data-mining approach (a fuzzy rule-based classiﬁcation system) characterized by a genetically optimized interpretability-accuracy trade-off (by means of multi-objective evolutionary optimization algorithms) for transparent and accurate prediction of decentral smart grid control (DSGC) stability. In particular, we aim at uncovering the hierarchy of inﬂuence of particular input attributes upon the DSGC stability. Moreover, we also analyze the effect of possible "overlapping" of some input attributes over the other ones from the DSGC-stability perspective. The recently published and available at the UCI Database Repository Electrical Grid Stability Simulated Data Set and its input-aggregate-based concise version were used in our experiments. A comparison with 39 alternative approaches was also performed, demonstrating the advantages of our approach in terms of: (i) interpretable and accurate fuzzy rule-based DSGC-stability prediction and (ii) uncovering the hierarchy of DSGC-system’s attribute signiﬁcance. and M.B.G., J.P. and F.R. authors


Introduction
The stability of electrical grids depends on the balance between electricity generation and electricity demand. In conventional power systems, such a balance is achieved through demand-driven electricity production. Nowadays, however, due to a gradual shift from fossil-based power generation to renewable energy sources, the grid topologies are becoming more decentralized and the flow of power is becoming more bidirectional [1]. That means that consumers may function as both producers and consumers at the same time; they are often referred to as prosumers [2]. The volatile and fluctuating nature of renewable energy sources poses a significant challenge as far as design strategies and control of electric power grids are concerned. In order to balance the supply and demand in such fluctuating power grids, various smart grid approaches have been proposed. A key idea they implement is to regulate the consumers' demand [3], usually referred to as the demand response strategy [4,5].
The changes in the consumption of electricity (in comparison with normal patterns of consumption) by customers in reaction to the changes in the price of electricity are referred to as demand response. There are two general approaches to defining the electricity price and to communicating it to consumers. A conventional approach-using costly information and

Related Work
Various data-mining and machine-learning models and algorithms have been applied to security, stability prediction/monitoring, management, and control of electricity grids and microgrids. An approach using an artificial neural network (ANN for short) for generating security boundaries and their visualization to transmission system operators within a power system in California is presented in [15]. The resulting security boundaries are visualized in the form of the so-called nomograms (the total number of simulations performed is equal to 1792). Extreme learning machines (ELMs for short)-a special class of ANNs-are used in [31] to improve the online learning speed and parameter tuning of the real-time transient-stability assessment model for earlier detection of the risk of blackouts. ANNs are also used to construct new monitoring methods for smart power grids. Such a method and a virtual test evaluating its performance are presented in [32]. A multilayer ANN with four hidden layers trained using a deep reinforcement learning algorithm is applied to a smart grid optimization task in [33]. In turn, a contextual anomaly detection ANN-based approach for cyber-physical security in smart grids is presented in [34]. The simulation experiments show that the contextual anomaly detection performs over 55% better than the point anomaly detection. Support vector machines (SVMs for short) and ANNs supported by some feature selection methods are applied in [17] to the analysis of the transient stability of a large-scale Brazilian power system (the data are generated in 1242 simulation runs). SVMs and random forests (RFs for short) are used in [35] to detect smart grid devices compromised by cyber attacks. The proposed framework in different evaluation scenarios yields high accuracy (91% on average) which confirms its effectiveness at overcoming the compromised smart grid devices problem. Core vector machines (CVMs for short)-faster and big-data-oriented extensions of SVMs-are used in [36] for an online transient stability assessment of a power system by mapping that problem as a two-class classification task. An online approach makes it attractive to be used in real time applications. Genetic-algorithm-based SVMs (GA-SVMs for short) are used (and compared with conventional SVMs and ANNs) in [37] for an online voltage-stability monitoring and prediction. Their effectiveness is demonstrated by applying them to the New England 39-bus system and to the real Indian Northern Region Power Grid system.
Decision trees (DTs for short) are used (and compared with ANNs and SVMs) in [18] for prediction of the transient instability of a large-scale Iranian national grid following the test on a small 9-bus system. DTs are also used in [14] to classify the DSGC stability conditions based on the response of heterogeneous consumers to provide some insight into the relationship between the input space parameters and the grid stability.
As far as other selected data-mining techniques are concerned, the so-called active learning solution is applied in [38] to the voltage-stability prediction problem. The active learning approach interacts with the online prediction and offline training process to enhance the well-known data-mining methods (DTs, ANNs, SVMs, RFs, and radial basis function networks). In turn, modeling a non-linear security boundary by (i) using features formed as monomials of the original input up to a certain level and (ii) using kernel ridge regression to solve the problem of a large number of features is proposed in [16]. The potential of the proposed method is demonstrated by simulating the aforementioned New England 39-bus system and a larger power system with 470 buses. Next, the study [39] presents a cooperative multi-agent approach for solving the complex problems of energy management in a stand-alone solar microgrid using fuzzy logic systems and a reinforcement learning method referred to as fuzzy Q-learning. Moreover, the study [19] applies an optimized data-matching algorithm referred to as transparent open box learning network to the DSGC stability prediction. This study demonstrates the importance of compound feature selection in the considered stability prediction problem. The overwhelming majority of the above-listed studies presents the accuracy-oriented approaches. The transparency is not their priority and they usually do not provide an insight into the considered systems.

Electrical Grid Stability Simulated Data Set and Its Input-Aggregate-Based Version for Stability Prediction of a Four-Node Star System Implementing the DSGC Concept
As already mentioned in the Introduction of this paper, the Electrical Grid Stability Simulated Data Set available since November 2018 at the UCI Database Repository (https://archive.ics.uci.edu/ml) is the first set used in our experiments. This data set is the outcome of a simulation experiment using a two-part mathematical model of a four-node star grid implementing the DSGC concept. The first part of the model describes the physical dynamics of electric power generation and its connection with consumption loads. The second part is an economic structure which binds the electricity price to the grid frequency (see [11,13,14,19] for details). In the simulation experiment, three key input variables of the model were selected (each allowed to vary independently) for each of the four grid participants. These key input variables include: (i) P j , j = 1, 2, 3, 4 (referred to as p1, . . . , p4 in the simulated data set) describing the mechanical power produced (for j = 1) or consumed (for j = 2, 3, 4); (ii) τ j , j = 1, 2, 3, 4 (referred to as tau1, . . . , tau4 in the simulated data set) describing reaction time of each grid participant to an electricity price change; and (iii) γ j , j = 1, 2, 3, 4 (referred to as g1, . . . , g4 in the simulated data set) which is a coefficient proportional to price elasticity for each grid participant. Figure 1 illustrates the structure of the DSGC system considered in the simulation experiment and the feasible solution space values (boundary conditions) for all the previously listed key input variables. Except for P 1 (P 1 = −(P 2 + P 3 + P 4 ) as shown in Figure 1), they all are sampled as uniform distributions throughout their respective feasible spaces to initialize and launch the simulations (10000 simulation runs were performed). Several other input variables of the two-part mathematical model of the DSGC system are kept unchanged during the simulation process. They include: (i) the averaging time T required to measure the price signal (T = 2s), (ii) the coupling strength K proportional to line capacity (K = 8s −2 ), and (iii) the damping constant α (α = 0.1s −1 ). The model's output variable (referred to as stab in the simulated data set), i.e., the stability metric (ranging from −0.0808 to +0.1094), is quantified such that a negative value of that metric indicates that the grid is stable, whereas a positive value indicates that the grid is unstable. In the simulated data set the stab-variable is accompanied by a stabf label of the grid stability-a categorical attribute taking values from a two-element set: "stable" for stab < 0 and "unstable" for stab > 0. The details regarding the simulation experiment are presented in [14].  Figure 1. An illustration of the four-node star DSGC system structure in a simulation experiment.
Therefore, the characteristics of the Electrical Grid Stability Simulated Data Set collecting the simulation experiment results and used in our experiments is the following. It contains 10, 000 records (instances). Each record is characterized by 12 input attributes-tau1, tau2, tau3, tau4, p1, p2, p3, p4, g1, g2, g3, and g4; and two output attributes-stab and stabf, out of which only the stabf labels are used in our experiments (see Table 1 for details). A more concise representation of the above-characterized simulated data set is proposed in [14]. It is based on aggregates-such as minimum, maximum, and average (mean) values-of input attributes across all grid participants. For instance, for the reaction time τ j , j = 1, . . . , 4, the input aggregates are the following: τ min = min j=1,...,4 τ j , τ avg = 1 4 ∑ 4 j=1 τ j , and τ max = max j=1,...,4 τ j . In the modified simulated data set, they are referred to as tau_min, tau_avg, and tau_max; analogously-p_min, p_avg, and p_max for P j and g_min, g_avg, and g_max for γ j , j = 1, . . . , 4 (see Table 2 for details). The modified data set (referred to as the Concise Simulated Data Set) is the second set used in our experiments.

Methodology: Main Components of the Proposed FRBCs Designed from Data Using MOEOAs
In this section we briefly present the main components of our approach to designing FRBCs from data by means of MOEOAs. They are used to perform FRBCs' learning and structure-and-parameter optimization, which also results in the optimization of FRBCs' interpretability-accuracy trade-off (see [20][21][22]28] for details and discussion). The following components are characterized: learning data set, representation of input attributes and class labels, FRBC knowledge base, objectives of the FRBCs' evolutionary optimization, original dedicated genetic operators introduced, and MOEOAs used in our experiments. An FRBC with n input attributes x 1 , x 2 , . . . , x n and an output-a fuzzy set over the set Y = {y 1 , y 2 , . . . , y c } of c class labels-is considered. Our approach can process both numerical and categorical attributes. However, in the DSGC-stability prediction problem, only numerical attributes occur. Hence, only numerical attributes will be considered from now on.
Learning data set L: The construction of the proposed FRBC is based on the data set L which contains K input-output samples: where x nk ) ∈ X = X 1 × X 2 × · · · × X n (× stands for Cartesian product of ordinary sets) represents the collection of input attributes and y (lrn) k represents the corresponding class Representation of input attributes and class labels: is a family of all fuzzy sets defined in the universe X i , i = 1, 2, . . . , n. A i1 represents an S-type fuzzy set (corresponding to linguistic term "Small"), A ia i represents an L-type set (corresponding to linguistic term "Large"), and A i2 , A i3 , . . . , A i,a i −1 represent M-type sets (corresponding to linguistic terms "Medium 1,' "Medium 2,' ... , "Medium a i − 2"). For simplicity, A ik i s denote the corresponding linguistic terms also. Figure 2 shows trapezoidal membership functions for S, M, and L-type fuzzy sets used in our experiments. In turn, each class label y j , j ∈ {1, 2, . . . , c} is represented by a fuzzy singleton B j = B  FRBC's knowledge base contains R genetically optimized fuzzy rules discovered in the learning data set (1). The form of the r-th rule, r = 1, 2, . . . , R, is the following (the overall number of rules R changes as the learning progresses): .
(2) Formula [expression] (condition) in (2) represents conditional inclusion of the [expression]-part into a given rule assuming that the (condition)-part is fulfilled. In turn, sw (r) i denotes a switch-parameter which controls the presence/absence of the i-th input attribute in the r-th rule, i = 1, 2, . . . , n. sw i = 0, the i-th attribute is removed from (not active in) that rule, whereas for sw An FRBC's knowledge base contains two separate modules; i.e., a rule-base-structure module RB and a data-base module DB. We propose a simple, direct, and thus computationally efficient RB representation as follows: In turn, DB contains tunable and non-tunable parts. The tunable part contains parameters of membership functions of fuzzy sets representing particular numerical input attributes. These parameters-subject to tuning during the FRBCs' learning and MOEOA-based optimization-include (see Figure 2): e S and ρ S for S-type fuzzy sets; σ M , d M , e M , and ρ M for M-type fuzzy sets; and σ L and d L for L-type fuzzy sets. The non-tunable part of DB contains the set of class labels Y = {y 1 , y 2 , . . . , y c }.
Objectives of the FRBC's evolutionary optimization: Two separate optimization objectives are considered; i.e., the accuracy and the interpretability of the system. The FRBC's accuracy measure (subject to maximization) is defined as follows [20][21][22]40]: where Q (lrn) , which is the desired fuzzy-singleton response for . As far as the FRBC's interpretability is concerned, we use the notion of interpretability in a broader sense, which includes two essential aspects; i.e., the FRBC's complexity and semantic aspects of the FRBC's operation. We evaluate the FRBC's complexity-related interpretability using the following measure (subject to maximization): where and The FRBC's complexity measure Q CPLX (7) takes its values from interval [0, 1], where 0 represents the minimal complexity and 1 the maximal one. Q CPLX is an average of three sub-measures that evaluate: (i) an average complexity of particular rules Q RI NP (8) (n (r) (8) denotes the number of active input attributes in the r-th rule); (ii) the complexity of the whole system in terms of its active inputs Q I NP (8) (n I NP in (8) denotes the number of active inputs in the whole system); and (iii) the whole-system complexity in terms of its active fuzzy sets Q FS (8) (n FS in (8) denotes the numbers of active fuzzy sets (linguistic terms) in the whole system).
The FRBC's semantics-related interpretability is addressed by us by imposing-as optimization constraints-the so-called strong fuzzy partitions (SFPs) [41] upon domains of particular numerical attributes. SFPs are special class of fuzzy partitions; namely, for any domain value, the sum of the values of all membership functions constituting SFP is equal to 1. It can be shown [41] that SFPs satisfy the desired demands regarding the semantics-related interpretability. Straightforward implementation of SFP requirements for the case of trapezoidal membership functions can be formulated in the following way (see Figure 3 for illustration of the SFP-requirements' implementation for the three-set SFP of x i -domain): and, obviously, Original, dedicated genetic operators introduced: A population of FRBC's knowledge bases (each represented by its RB and DB) is processed during the genetic learning process. We developed original crossover and mutation operators for the transformation of the RB population. The crossover operator processes two individuals (i.e., two RBs) containing R 1 and R 2 fuzzy rules, respectively, by performing one sub-operation randomly selected from the set of five sub-operations referred to as Cro-RB1, Cro-RB2, . . . , Cro-RB5. They are defined as follows: Cro-RB1 (labeled as "exchange of many fuzzy rules") operates in two stages. First, for the r-th rule in both RBs, r = 1, 2, . . . , min(R 1 , R 2 ), a random-switch condition (equivalent to the random selection of 1 from the set {0, 1}) is checked. Then, the r-th rules from both RBs are exchanged, provided that the random-switch condition is fulfilled. Second, each of the remaining rules of the larger RB is moved to the smaller RB, provided that its random-switch condition is fulfilled.
Cro-RB2 (labeled as "exchange of a single fuzzy rule") is similar to Cro-RB1 but the Cro-RB1 activities are carried out unconditionally and only once for randomly selected single rule from the larger RB.
Cro-RB3 (labeled as "exchange of many fuzzy sets in many fuzzy rules") is analogous to the first stage of Cro-RB1. This time, the random-switch condition is run for each input attribute and for the output class label from the corresponding rules coming from both RBs. Fuzzy sets describing a given input attribute or class label in both RBs are exchanged provided that their random-switch conditions are fulfilled.
Cro-RB4 (labeled as "exchange of many fuzzy sets in a single fuzzy rule") performs the activities of Cro-RB3 unconditionally and only once for a randomly selected single rule from the first RB and its counterpart from the second RB.
Cro-RB5 (labeled as "exchange of a single fuzzy set") is a special case of Cro-RB4. The activities of Cro-RB4 are performed unconditionally and only once for a randomly selected input attribute or output class label.
Mut-RB2 (labeled as "fuzzy rule deletion") removes a randomly selected fuzzy rule from the RB. Mut-RB3 (labeled as "change a single fuzzy set") randomly selects one fuzzy rule from the RB and its one i-th input attribute or output class label j. Next, it randomly selects a new value of its switch sw i or class label j.
Mut-RB4 (labeled as "change of an input in a fuzzy rule") randomly selects: (i) one fuzzy rule from the RB, (ii) its one active input attribute (i.e., with sw i 1 > 0), and (iii) its one non-active input attribute (i.e., with sw i 2 = 0). Then, the first attribute is set to be non-active (i.e., sw i 1 = 0) and the second attribute-to be active (i.e., sw i 2 > 0) in that rule.
The crossover and mutation operators for the RB-population transformation are followed by three RB-repairing operators. The first operator removes "empty" fuzzy rules, i.e., the rules with all non-active input attributes (with sw i = 0 for i = 1, 2, . . . , n), whereas the second one removes rule duplicates. In turn, the third operator adds-for each class label that is currently not represented in RB-one fuzzy rule with that class label (in order to preserve the principle "at least one fuzzy rule per class in RB").
DB population transformation is performed by means of separate genetic operators. The crossover operator (processing two individuals; i.e., two DBs), randomly selects one fuzzy set from each DB. New values of d and e-parameters, which characterize membership functions of both fuzzy sets (see Figure 2) are calculated as linear combinations of their old values from both sets; they also must fulfill condition (10). New values of σ and ρ-parameters are calculated from (9) using new values of d and e-parameters. The mutation operator (processing a single individual, i.e., a single DB) randomly selects one fuzzy set from the DB and one of its two parameters d and e (say, d is selected). Its new value where rand(·) returns a random number from the assumed interval and [x i,min , x i,max ] is a range of the domain of the selected set. New values of σ and ρ-parameters are calculated from (9).
MOEOAs used in our experiments: The performances of different MOEOAs are usually evaluated and compared in terms of three aspects [42]. First, the accuracy of generated non-dominated solutions is considered. The accuracy represents the closeness of those solutions to either the Pareto-optimal solutions, or if they are not available, to reference solutions. Second, the spread of the solution set is investigated; i.e., how well these solutions arrive at the extrema of the Pareto-optimal or reference solution sets. The spread is usually represented by the distance between the extreme solutions in the set. Third, the distribution of the solutions in the set, i.e., how evenly distributed they are along the approximation of Pareto-optimal front in the objective space, is considered. A set of solutions which are more accurate and are characterized by a higher spread and a better-balanced distribution outperforms the alternative solution sets.
For the purpose of comparison, two MOEOAs are used in experiments reported in this work; i.e., the well-known SPEA2 method and our generalization of SPEA2, referred to as SPEA3. In comparison with SPEA2, the proposed SPEA3 approach generates sets of non-dominated solutions characterized by higher spread and a more even distribution in the objective space. The essence of the proposed SPEA2's generalization consists of replacing its environmental selection procedure with our original algorithm, which improves both the spread and distribution balance of generated solutions. The environmental selection consists of selecting a representation of the best solutions from all solutions obtained so far and keeping them in an external archive of a fixed size. In SPEA2 all non-dominated solutions from the archive and the current population are copied to the next-generation archive (to fill the archive, the best dominated solutions are copied to it). In the case of overfilling the archive, a truncation procedure is run to reduce the archive size to the predefined level. Thus, only the truncation procedure (if activated) contributes to improving the distribution balance and diversity of the final set of solutions. On the contrary, the proposed environmental selection algorithm implemented in SPEA3 is fully-oriented towards achieving these goals. The archive is iteratively increased by gradual adding of carefully selected non-dominated solutions from the current population (in each subsequent generation, a single solution characterized by possible longest and similar distances to its near neighbors is added). Then, a process of relocation of particular archived solutions aiming at increasing average distances between solutions and their nearest neighbors is carried out. Said relocation is performed by gradual replacement of archived solutions by new solutions selected from the population in such a way as (i) to maximize the distances between extreme solutions (which results in improving the spread of solutions) and (ii) to minimize the distance differences between neighboring solutions (which results in improving the distribution of solutions). Concluding, in our approach, the complementary operations of increasing and reducing the archive aim at obtaining the best available distribution balance and spread of solutions belonging to Pareto-front approximation (see [28,29] for detailed presentation and discussion).

Experiments (Application of Our Approach to DSGC-Stability Prediction) and Discussion
In this section we present the results of two experiments (for both data sets characterized in Section 3 of this paper) regarding the application of our approach to interpretable and accurate DSGC-stability prediction.

Application to Electrical Grid Stability Simulated Data Set
We start with revealing some details of the operation of our method for constructing FRBCs from the considered simulated data set. Figure 4a presents two 10-element-collections of non-dominated solutions (optimized FRBCs) generated in a single run of our FRBCs' design technique employing, independently, our SPEA3 and SPEA2 algorithms. Both experiments of genetic learning have been performed for a learning-test data split with 1:9 ratio. It means that only 10% of the whole original data set (preserving the class proportions) is used as the learning data to construct the system, whereas the remaining 90% of the original data are used for the system's testing. It is worth emphasizing that such a data split poses a significant challenge to any system's design technique. Each collection of solutions of Figure 4a represents the best available approximations of Pareto-optimal solutions generated either by our SPEA3 or SPEA2. Particular solutions from a given collection (front) are characterized by different levels of optimized interpretability-accuracy trade-off. Thus, the user can select a single solution (a specific FRBC) with a desired degree of compromise between the interpretability and accuracy.    shows that our SPEA3-based approach generates the collection of solutions characterized by: (i) a much-better-balanced distribution in the objective space (i.e., the solutions are distributed along the front in a much more even way) and (ii) higher accuracy (i.e., the closeness to Pareto-optimal front). Therefore, our SPEA3-based approach outperforms its SPEA2-based counterpart (see Section 4 of this paper for MOEOAs' performance evaluation criteria). The interpretability and accuracy-related numerical details of all SPEA3-based solutions (FRBCs) from Figure 4a are collected in Figure 4b, in which n I NP/R is the number of input attributes per rule, whereas ACC (lrn) is the percentage of correct decisions in the learning set and ACC (tst) -in the test set (the remaining parameters were defined in Section 4 of the paper). Table 3. Fuzzy rule bases for SPEA3-based solutions (FRBCs) 1-3 from Figure 4.  Tables 3 and 4 (membership functions of fuzzy sets representing input attributes used are also included). Table 4 presents the fuzzy rule base for the SPEA3-based solution (FRBC) 7 from Figure 4 which is characterized by the highest test-data accuracy. In turn, Table 3 reveals an interesting regularity, i.e., the fuzzy rule base of the solution 2 contains three rules from the solution 1. In turn, the fuzzy rule base of the solution 3 contains three rules from the solution 2, etc. Therefore, if higher system accuracy is required, then our approach adds some additional fuzzy rules (or extends the earlier-discovered rules) to provide a more detailed, and thus, more accurate description of the considered prediction problem. Said regularity confirms the internal integrity of our approach. This regularity is also illustrated in Table 5 (see, first, Part A of Table 5) in which each black square denotes the presence of a given input attribute in the fuzzy rule base of a given solution (FRBC). ACC is the test-data accuracy of the system exclusively based on the most significant input attribute. In Part A of Table 5 the most significant attribute is tau1, occurring in the solution 1 and giving ACC (tst) 1 = 68%. Moreover, ∆ACC (tst) j , j = 2, 3, . . . is the accuracy increase following the inclusion of the 2nd, 3rd, . . . most significant attributes into the system. In Part A of Table 5, the inclusion of tau4 (2nd most significant attribute) yields 1.5% increase in the test-data-accuracy and is related to the solution 2. In turn, the inclusion of tau3 (3rd most significant attribute) gives a further 1.1% test-data-accuracy-increase and is related to the solution 3, etc.

No. Fuzzy Classification Rules
In such a way, we can uncover the hierarchy of significance of particular attributes. However, the obtained attribute-significance hierarchy is correct provided that no "overlapping" of some input attributes over the other ones occurs in the simulated data set. In order to verify that, we remove from the original data set the so-far most significant attribute, i.e., tau1, and we repeat, in an analogous way, the learning experiment. Its results are presented in Part B of Table 5, giving tau4 attribute the most significance place. Tau4 occupied second position in the experiment of Part A. Therefore, we conclude that tau4 is not "overlapped" by tau1. In the next step, we remove tau4 from the present data set and repeat the learning experiment-see Part C of Table 5-obtaining tau3 as the most significant attribute at this stage. It occupied second position in experiment of Part B. Therefore, tau3 is not "overlapped" by tau4. We repeat analogous experiments several times; i.e., we remove the most significant-at a given stage-input attribute and repeat the learning process on the reduced data set-see Parts D-H of Table 5. In such a way, we arrive to the real final hierarchy of input attribute significance in the DSGC-stability prediction problem. It is shown in the left part of Table 6 that ACC (tst) 1 has the same meaning as in Table 5; i.e., it is the test-data-accuracy of the system exclusively based on a single attribute listed to the left of ACC (tst) 1 in Table 6. The results of an alternative approach addressing the hierarchy of importance of particular input attributes in the considered simulated data set and proposed in [43] are presented in the right part of Table 6. The following remarks are formulated in [43]: "The time response of the consumers and producer to the price fluctuation play a more important role in the grid stability, as compared to the price elasticity index... Unstable grid conditions generally prevail for higher values of τ and γ. . . The power consumption pattern does not show any difference under stable and unstable grid conditions thus conforming with the analysis of feature selection obtained." (a quote from [43]). The results of our experiments are consistent with all the above comments.  I  I  I  I  I  I  I  I  I  I  tau4  +1.5%  I  I  I  I  I  I  I  I  I   Attribute  significance   tau3  +1.1%  I  I  I  I  I  I  I  I  tau2  +3.9%  I  I  I  I  I  I  I  g3  +2.6%  I  I  I  I  I  I  g4  I  I  I  I  I  I  g2  +4.8%  I  I  I  I  I  g1 +2.6%  I  I  I  I  I  I  I  I  I  I   Attribute  significance   tau3  +0.7%  I  I  I  I  I  I  I  I  I  tau2  +2.0%  I  I  I  I  I  I  I  I  g3  +2.2%  I  I  I  I  I  I  I  g4 +3. 4%  I  I  I  I  I  I  g2 +3.9% +0.2% I p1 I Table 5. Cont.
Attribute Name Table 6. Final hierarchy of attribute significance-a comparison of our approach and an alternative method of [43] (Electrical Grid Stability Simulated Data Set). -p1, p2, p3, and p4 0.001 1) The attributes p1, p2, p3, and p4 do not occur in the fuzzy rule base of our most accurate solution 7 (see Table 4) and have not been tested.

Our Approach
The cross-validation experiment with 1:9 learning-test data split ratio is the next and important part of our work. Each single learning experiment begins with generation of a Pareto-front approximation. Next, a single solution with, first, the highest test-data accuracy, and second, the highest interpretability, is chosen from that front approximation. In turn, the results from all partial experiments are averaged. The experiment is then repeated 10 times for different initializations of our approach. The averaged results are shown in the last row of Table 7, which also collects the results of as many as 39 alternative approaches applied to the considered data set; i.e., the Electrical Grid Stability Simulated Data Set. There are four groups of them. The first one, considered in [44], includes four methods: logistic regression (LR), random forest (RF), gradient boosted trees (GBT), and multilayer perceptron classifier (MPC). Each of them was supported, independently, by three feature selection algorithms: multivariate adaptive regression splines (MARS) algorithm, binary kangaroo mob optimization feature selection (BKMOFS) algorithm, and binary particle swarm optimization feature selection (BPSOFS) algorithm. The second group, considered in [45], includes seven methods: k-nearest neighbors (kNN) algorithm, support vector machine (SVM), radial basis function (RBF) network, decision trees, RF, naive Bayes approach, and quadratic discriminant analysis (QDA). The third group, considered in [46], includes eight methods: fine and bagged-tree algorithms, fine and weighted-kNN, LR, and linear, quadratic, and cubic-SVMs. The last group, considered in [47], includes six methods: the stochastic damped regularized LBFGS algorithm (Sd-REG-LBFGS; LBFGS stands for the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm), stochastic damped regularized LBFGS without regularization (SdLBFGS), robust stochastic approximation (RSA), stochastic approximation averaging (SAA), stochastic gradient difference (SGD), and a method of gradient-based stochastic optimization (Adam [48]). Each of them was used, independently, with logistic regression and Bayesian logistic regression algorithms.
The overwhelming majority of the alternative approaches of Table 7 are the black-box methods focused only on the system's accuracy. Moreover, for methods applied in [44][45][46] only the learning-data-accuracy is available. Therefore, it is not possible to assess their generalizing capabilities (usually measured by test-data-accuracy), which belong to essential performance-evaluation criteria of any system designed from data. Some of the methods of [44] operate on more than 100 features, whereas the remaining approaches typically use 12 attributes; i.e., all the input attributes from the simulated data set. The experiments of [47] are performed for the 4:1 learning-test data split ratio, which significantly favors them in regard to our experiments using only 1:9 learning-test data split ratio. Despite that, our approach provides not only comparable test-data accuracy but also a detailed insight-in the form of fuzzy linguistic rules-into the mechanisms governing the DSGC stability. n/a 87.8% n/a n/a 10 n/a n/a n/a 88% n/a n/a 10 n/a n/a n/a 91.9% n/a -10 - with BKMOFS-based feature selection algorithm -n/a 66.3% n/a -102 ---n/a 83.7% n/a n/a 102 n/a n/a n/a 88.8% n/a n/a 102 n/a n/a n/a 91.5% n/a -102 with BPSOFS-based feature selection algorithm -n/a 64.7% n/a -111 ---n/a 84.3% n/a n/a 111 n/a n/a n/a 87% n/a n/a 111 n/a n/a n/a 93.8% n/a -111 -- [45] (Karim (2019)) kNN n/a n/a 80.4% n/a -12 --SVM n/a n/a 82.2% n/a -12 --RBF n/a n/a 62.7% n/a -12 --Decision Tree n/a n/a 84% n/a n/a 12 n/a n/a RF n/a n/a 88.5% n/a n/a 12 n/a n/a Naive Bayes n/a n/a 83.6% n/a -12 --QDA n/a n/a 82.7% n/a -12 -- [46] (Balali et al. (2020)) Fine Tree n/a n/a 84.2% n/a n/a 12 n/a n/a Bagged Tree n/a n/a 91.5% n/a n/a 12 n/a n/a Fine KNN n/a n/a 81.1% n/a -12 --Weighted KNN n/a n/a 86.6% n/a -12 --LR n/a n/a 81.5% n/a -12 --Linear SVM n/a n/a 81.5% n/a -12 --Quadratic SVM n/a n/a 94.2% n/a -12 --Cubic SVM n/a n/a 96.9% n/a -12 -- As already said in Section 3 of this paper, a concise, input-aggregate-based representation of the Electrical Grid Stability Simulated Data Set was proposed in [14]. According to [14]: "Since the system has symmetries, we hypothesize that a more concise representation of simulation results is feasible based on input aggregates, i.e., features. To create features, we take the minimum, maximum and mean values across all N participants of each input; e.g., minτ j for j = 1, . . . , N." (a quote from [14]).
Following that, we constructed such a set (referred to as the Concise Simulated Data Set) as shown in Section 3. It will be used in experiments reported in this section. Figure 5a presents a 10-element-collection of non-dominated solutions (optimized FRBCs) generated in a single run of our SPEA3 algorithm. Similarly to Section 5.1, the genetic learning experiment has been performed for the learning-test data split with a 1:9 ratio. Figure 5b presents the interpretability and accuracy-related numerical details for solutions of Figure 5a. No.

Objective function complements
Interpretability measures Accuracy measures   Table 8-presenting fuzzy rule bases of exemplary solutions from Figure 5-shows the same regularity as for experiments of Section 5.1. Namely, if the higher accuracy of the system is required, our approach adds additional rules or extends the existing ones to provide a more detailed and accurate model for decision support. Table 9 presents fuzzy rule base of the solution 6 which achieves the highest test-data accuracy (see Figure 5). Transformation of fuzzy classification rules from Table 9 into decision-tree form is shown in Figure 6. It reveals the mechanisms governing the DSGC-stability from the perspective of essential input aggregates such as tau_min, tau_avg, tau_max, g_avg, and g_max.  Figure 6. Transformation of fuzzy classification rules from Table 9 into decision-tree form.

Conclusions
In this paper we address the problem of transparent and accurate prediction of decentral smart grid control stability using our knowledge-based data-mining approach implemented as the fuzzy rule-based classifier. Our approach employs multi-objective evolutionary optimization algorithms to optimize the interpretability-accuracy trade-off of the classification system. The transparency and interpretability (i.e., the ability to provide the user with understandable and compact explanations of generated predictions) and the accuracy (i.e., the ability to generate correct and precise predictions) are important aspects of the operation of decision support systems for smart grid stability prediction. Compact, linguistic, fuzzy classification rules generated by our approach-due to their high readability and easy-to-grasp interpretation-belong to the most effective knowledge-representation structures in the considered domain. Our approach, in a single run, generates a set of non-dominated solutions (a collection of fuzzy classification systems) characterized by different levels of optimized interpretability-accuracy trade-off; the user can select a single solution according to his/her needs. Recently published and available at the UCI Database Repository (https://archive.ics.uci.edu/ml) Electrical Grid Stability Simulated Data Set and its input-aggregate-based concise version referred to as Concise Simulated Data Set are used in our experiments.
The contribution of this paper is twofold. First, by means of broad cross-validation-based experiments, we show that our approach significantly outperforms alternative methods (altogether 39 alternative methods are considered) in terms of the transparency and interpretability of generated predictions while remaining competitive or superior in terms of the accuracy of those predictions. It is worth emphasizing that the overwhelming majority of the existing studies on smart grid stability prediction concentrate on the accuracy-oriented approaches not providing an insight into the prediction mechanisms.
Second, our approach-besides being interpretable and accurate in the considered domain-is also an effective method for uncovering the hierarchy of significance of particular input attributes contributing to the smart grid stability prediction process. In order to uncover the real attribute-significance hierarchy we also analyze the possible "overlapping" of some input attributes over the other ones from the DSGC-stability perspective.
Our further work will concentrate on improving the optimization of the systems' interpretability-accuracy trade-off. It is essential from the point of view of generating highly interpretable and highly accurate modern systems (cf. explainable artificial intelligence [49,50] or interpretable machine learning [51,52] systems) for decision support in various areas of applications including smart grid modeling and control. Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.