Voltage Stability Margin Index Estimation Using a Hybrid Kernel Extreme Learning Machine Approach

Abstract: This paper presents a novel approach for Voltage Stability Margin (VSM) estimation that combines a Kernel Extreme Learning Machine (KELM) with the Mean-Variance Mapping Optimization (MVMO) algorithm. Since the performance of a KELM depends on a proper parameter selection, the MVMO is used to carry out this task. In the proposed MVMO-KELM model, the inputs are the magnitudes of the voltage phasors and the output is the VSM index. A Monte Carlo simulation was implemented to build a database for the training and validation of the model. The database considers different operative scenarios for three types of customers (residential, commercial and industrial) as well as N-1 contingencies. The proposed MVMO-KELM model was validated on the IEEE 39-bus power system, comparing its performance with a Support Vector Machine (SVM) and an Artificial Neural Network (ANN) approach. Results evidenced a better performance of the proposed MVMO-KELM model when compared to these techniques. Furthermore, the higher robustness of the MVMO-KELM was also evidenced when considering noise in the input data.


Introduction
Currently, the competitive tendency of deregulated electricity markets, along with limitations in network expansion planning due to several factors, including environmental constraints and investment delays, has caused electric power systems to often operate very close to their operative and stability limits. One of these limits relates to voltage stability, which is violated when the power system no longer has the capacity to maintain stable voltages in all or some of its buses after a disturbance has taken place. Furthermore, voltage instability has been recognized as one of the main problems in power systems around the world [1][2][3].
The ever-growing integration of intermittent renewable energies and the development of smart grids have increased the complexity of power system operation and planning. In this context, conventional methods of voltage stability assessment based on offline studies are not up to the challenges faced by current power systems [4]; therefore, system operators must adopt new methods to monitor and evaluate voltage stability in near real time. This necessity has motivated the development of robust and sophisticated tools for the supervision and evaluation of power systems. Monitoring of these systems is essential to guarantee a safe operation, and it must be carried out in real time to reach a high accuracy [5][6][7].
For near real time stability assessment, the estimation time must be short and the calculation effort minimal. Stability performance indicators have traditionally been used to establish the distance between the current operating point and a point at the voltage instability boundary. The main characteristic of these indices is that they can be estimated when the operating conditions change. In addition, these indices must be predictable and quickly calculable [8,9]. The problem with most stability indices is that their values vary in a highly nonlinear and discrete way, due to the nonlinear characteristics of the system and its operating limits [8][9][10][11]. Other indices require highly intensive calculations to determine their value under changing system conditions. Another important aspect of reaching near real time assessment is improving the measurement of key power system variables; this has recently been achieved with Phasor Measurement Units (PMUs), which allow observing fast dynamic behaviors due to their high sampling rate.
In the last decades, the use of Artificial Intelligence (AI) tools has arisen as a solution for the evaluation of voltage stability near real time. The stability evaluation using AI drastically reduces the calculation time, eliminating the need of computing the nonlinear equations of the system models. AI techniques capture the relationships among the different states of the system and system stability information, extracting knowledge from these data and determining the corresponding stability status [4].
Voltage stability evaluation using AI tools requires building a learning database in offline mode with data from many operational conditions. This database must contain the electrical variables that allow determining the actual condition of the power system in near real time. Once the AI is trained, new information is fed to it, with the output being a voltage stability indicator [12,13]. The following AI techniques for voltage stability assessment are reported in the scientific literature: Artificial Neural Network (ANN), Support Vector Machine (SVM) and Extreme Learning Machine (ELM).
ANNs have been widely used to estimate voltage stability indicators in both long- and short-term voltage stability assessment. They derive their computing power from a parallel-distributed structure that allows them to learn and generalize. One of the main features that an ANN provides for power system applications is its ability to deal with complex nonlinear mappings through a set of input/output data. In the case of voltage stability studies, the ANN identifies the existing relationships between an input set of measurable parameters and the output, which is the stability indicator [14][15][16]. System parameters such as nodal voltage magnitudes and angles are used as the ANN input variables; these input variables yield lower errors in the voltage stability indicator estimation [17]. Several ANN approaches based on backpropagation [18,19], Kohonen networks [20], Radial Basis Functions (RBF) [21] and ANNs with reduced input features have been reported in the specialized literature [22,23]. A large number of operating conditions has to be generated for training and testing when ANNs are used. Thus, the power system is simulated under distinct operating conditions, sometimes including contingencies. These simulated conditions are used to obtain other system parameters such as branch currents, line flows, real and reactive power, and different voltage stability indicators such as line voltage stability indices, bus voltage stability indices and the Voltage Stability Margin (VSM) or Loading Margin (LM) [24][25][26]. Although ANNs have been extensively used by researchers, they exhibit some shortcomings, mainly related to the long time required by their training process and the fact that their learning is highly dependent on the amount of training data.
SVM is a machine learning technique based on statistical learning theory and structural risk minimization. In recent years, support vector regression (SVR), the regression version of SVM, has been applied to voltage stability monitoring [27,28]. The proposed SVR method allows estimating the loading margin of the system in normal operation conditions, considering different demand growth scenarios. In [27], the contingencies that affect the LM estimation of the system are not considered; neither are input vectors such as the nodal voltage phasors delivered by the PMUs. On the other hand, in [28] the Power Transfer Stability Index (PTSI), which requires impedance information and the Thevenin voltage, is considered as the output stability index. In this case, dynamic voltage collapse prediction is determined based on the PTSI. The data collected from a dynamic simulation are used as input to the SVR, which acts as a predictor of the dynamic voltage collapse indices of the power system. In [29], an SVR is proposed for online voltage stability evaluation with a reduced set of input variables. The vector dimension is reduced using Principal Component Analysis (PCA) combined with a feature selection technique called Mutual Information (MI). The VSM is used to complete the voltage stability evaluation. In this case, only the most severe contingencies, which impact the VSM of the system most negatively, are considered in the simulation scenarios. Nodal voltage phasors are not considered, and the selected input variables are the active and reactive powers of the load nodes.
A difficulty that arises when working with SVR is the proper selection of its internal parameters, which is needed to ensure high accuracy and good performance of the AI. An inadequate selection of the SVR parameters leads to overfitting or underfitting. In [30], a metaheuristic known as Artificial Immune Least Squares is used to obtain the optimal parameters of the SVR that executes the voltage stability index estimation. The index named Voltage Stability Condition Indicator (VSCI) is used as a tool to evaluate stability at load nodes, and the SVR is used to estimate the VSCI. For its estimation, both active and reactive powers are needed; however, measurements of the voltage phasors are not considered. Other metaheuristic optimization techniques used for the selection of the SVR parameters include Genetic Algorithms (GA) [31], Particle Swarm Optimization (PSO) [32] and the Dragonfly Optimization algorithm (DFO) [33].
In [31], a GA is used to improve the accuracy and performance of the SVR that carries out the stability index estimation. Complex bus voltages given by the PMUs are used as inputs, and the output corresponds to the VSM. In the generated scenarios, the occurrence of contingencies and their impact on voltage stability are not taken into account. In [32], a PSO determines the setting of the SVM parameters. The proposed alternative for stability evaluation uses the voltage angles and the reactive power load as the input vector for the SVR, while the VSM index is used as the output. Similarly, in [33] a voltage stability evaluation technique based on swarm intelligence is proposed, with a DFO used for setting the SVR parameters to complete the online voltage stability evaluation. In this case, the PMU voltage magnitudes are used as the input vector for the hybrid DFO-SVR, and the output corresponds to the minimum values of the voltage stability index.
Extreme Learning Machine (ELM) is a newer machine learning tool that is being applied to online stability evaluation. ELM has shown better performance and lower training times than ANN and SVM in regression problems [34]. In [35][36][37], ELM is used as a tool to evaluate long-term online voltage stability, considering different electric parameters as input vectors. These include voltage magnitudes and angles, power flows, as well as active and reactive power injections. In [38], an ensemble of ELMs is proposed to estimate the voltage stability of a power system. The proposed method improves the performance of the VSM estimation, but increases the training time of the ELM set.
In this paper, a long-term online voltage stability evaluation is proposed through a VSM index using a novel AI technique called Kernel Extreme Learning Machine (KELM). KELM combines ELM with a kernel method, improving performance while keeping a short training time.
To guarantee a good performance of the KELM, the Mean-Variance Mapping Optimization (MVMO) algorithm is proposed for the selection of the internal KELM parameters. A comparison of the performance of the hybrid MVMO-KELM model with an SVM and an ANN is presented, and a case study with the IEEE 39-bus test system is performed. Results evidenced the robustness and effectiveness of the proposed MVMO-KELM model.

Voltage Stability Assessment
Voltage stability is concerned with the ability of the power system to maintain acceptable voltages both under normal conditions and under disturbances [7]. Voltage stability assessment deals with finding the voltage levels of the system under different loading conditions to ascertain the stability limit. Voltage stability is analyzed by employing two practices, namely time-domain (dynamic) simulation and steady state analysis. Depending on the stability phenomenon or phenomena under investigation, one or both of these techniques may be applied.
In long-term voltage stability, the system dynamics are usually slow, so many aspects of the problem can be effectively analyzed using static methods, which allow the examination of a wide range of system conditions, provide insight into the nature of the problem, and identify the key contributing factors. For static voltage stability analysis, the loading of the system is slowly increased (in a certain direction) up to the voltage instability frontier.
Several indexes have been proposed to evaluate long-term voltage stability in power systems [8][9][10][11]. The general idea of a voltage stability index is to define a scalar magnitude that determines the distance from the current operation point to the voltage instability limit. It is important that voltage stability indexes are sensitive to loading changes and present a predictable behavior with respect to load increases; the latter allows extrapolating the additional power that might be supplied by the power system before reaching the voltage instability point. It is also relevant that the stability indexes can be monitored when the system parameters change, and that their calculation is fast enough so that online system supervision is feasible.
Among the most used and accepted indexes for long-term voltage stability assessment is the category of deviation indexes. These indexes require an increment of the system demand; from its current state, the load is increased following a previously defined pattern until the instability point is reached. In general, these indexes provide a distance to the instability point in MW/MVAr. An example is the LM, which is the most basic and widely accepted index to estimate the distance to the instability frontier. The LM is defined as the distance in terms of power (loading parameter, λ) between a current operation point (base case, λ0) and the voltage instability point (see Figure 1). This indicator takes into account the nonlinear behavior that occurs between the base case and the instability boundary.
The advantages of the LM as a voltage stability index are: (a) only the static model of the power system is required; it can be used with dynamic systems, but it does not depend on dynamic details; (b) it is an accurate index that takes into account the nonlinear characteristics of the system and its limits, such as the reactive power limits that become binding as the demand increases; (c) limits do not directly produce sudden changes in the LM; and (d) the LM considers demand increase patterns, that is, load increase directions in the parametric demand space [8].
Among the disadvantages of the LM as a voltage stability index, the following should be considered: (a) evaluations at operating conditions away from the current operation point are required, which is why the computational demands are higher than for indexes based only on information from the current operation point; this computational cost is the most serious disadvantage of the LM; (b) a demand increase direction is required; nevertheless, this information is usually easily acquired.
Specific methods have been developed to increase the speed and reliability of the calculation of voltage stability limits, such as the Continuation Power Flow (CPF) method [39] and the point of collapse method [8].

Artificial Intelligence (AI) Systems
The advantages that make AI a promising alternative for the estimation of a voltage stability margin index include [4]: (a) Speed: using a trained AI, it is possible to obtain a value of the system stability margin at the current operation point. The AI determines the margin in a fraction of a second after receiving the input data, which allows an appropriate response to prevent instability. (b) Knowledge extraction: the AI can extract system stability information; this provides an understanding of the system operation. (c) Lower data requirements: conventional evaluation methods require accurate information and a complete description of the system; in contrast, the AI evaluates stability using only the available and significant parameters. (d) Generalization capacity: the AI simultaneously handles a wide spectrum of scenarios or system conditions in the stability evaluation, including conditions that were not foreseen in advance.
In the following section, a brief description of the AI techniques that have been used in the evaluation of voltage stability is presented.

Artificial Neural Network (ANN)
An ANN is an AI technique composed of a great number of highly interconnected processing elements (artificial neurons) working in parallel to solve a specific problem. The artificial neuron is an information processing unit that finds nonlinear relations among sets of data. They are called ANNs because they learn by experience: the network is trained with experimental data so that it acquires the knowledge related to the problem being studied. An ANN is composed of many simple elements that work in parallel, and the network design is mostly determined by the connections among its elements [40]. A neural system imitates the brain in two aspects: (a) knowledge is acquired by the system through a learning process, and (b) the connections among the neurons, known as synaptic weights, are used to store knowledge.
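As a minimal illustration of the artificial neuron described above (all weight, threshold and input values here are purely illustrative, and the sigmoid activation is one common choice, not one prescribed by the paper):

```python
import numpy as np

def neuron(x, w, b):
    """A single artificial neuron: weighted sum of the inputs plus a
    threshold, passed through a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# The synaptic weights w and threshold b store the learned "knowledge";
# these values are illustrative, not from the paper.
x = np.array([0.98, 1.01, 0.95])   # e.g. three bus voltage magnitudes, p.u.
w = np.array([0.5, -0.3, 0.8])
b = 0.1
print(f"neuron output: {neuron(x, w, b):.4f}")
```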

SVM
SVM is a type of machine learning technique used to solve classification, regression, and pattern recognition problems [41]. SVM is a promising method for solving linear and nonlinear classification problems. An SVM is an algorithm that uses a nonlinear mapping to transform the original training data into a higher-dimensional space. In this new space, the SVM searches for the hyperplane of optimal linear separation (decision boundary) splitting one class from the other. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors and margins (defined by the support vectors); this is achieved by solving an optimization problem [42]. SVM belongs to a set of algorithms named kernel-based methods, and structural risk minimization is used as the optimization principle [43]; SVM also has statistical robustness and the ability to overcome overfitting problems [44].
SVR is an extension of the SVM with a good generalization capacity, created to solve time series prediction and function approximation (regression) problems. The function approximation consists in determining the relation between inputs and outputs using data pairs (x_i, z_i), i = 1, …, l, where x_i is the ith input vector of dimension n, z_i is the ith output scalar, and l is the number of training data [45,46]. The SVR maps the input space to a multidimensional feature space to determine an optimal hyperplane defined as:
f(x) = w^T φ(x) + b, (1)
where w is an n-dimensional weight vector, φ(x) is the mapping function from the space x to the feature space, and b is the threshold term [46]. This work uses the support vector machine for regression called ε-SVR with a radial basis function (RBF) as kernel. For a set of training points (x_i, z_i), where x_i ∈ R^n is a feature vector and z_i ∈ R^1 is a target output, with given parameters C > 0 and ε > 0, the standard form of the ε-SVR is given by (2)-(5) [47]:
min_{w,b,ξ,ξ*} (1/2) w^T w + C Σ_{i=1}^{l} (ξ_i + ξ_i*) (2)
subject to:
z_i − w^T φ(x_i) − b ≤ ε + ξ_i, (3)
w^T φ(x_i) + b − z_i ≤ ε + ξ_i*, (4)
ξ_i ≥ 0, ξ_i* ≥ 0, i = 1, …, l, (5)
where C is the margin parameter that determines the trade-off between the margin magnitude and the estimation error on the training data, and ξ_i and ξ_i* are slack variables associated with x_i [45]. The function K(x_i, x_j) = φ(x_i)^T φ(x_j) is called the kernel function; the RBF kernel used for the SVR is defined as:
K(x_i, x_j) = exp(−γ ||x_i − x_j||^2), (6)
where γ is the kernel parameter, which must be determined. On the other hand, a small positive parameter ε is used to reduce residual errors in the regression, which is why an interval linear (ε-insensitive) loss function is used rather than a squared error function [45]. In (3) and (4), the parameter ε > 0 corresponds to the radius of a zone named the ε-insensitivity zone or tube; the ideal estimation is achieved when the training data lie within this zone. Equations (3)-(5) allow data to lie outside the tube when the non-negative slack variables ξ_i and ξ_i* are used.
A more detailed explanation of the ε-SVR can be consulted in [45]. The parameters C and ε in (2)-(5) and γ in (6) must be defined before training the SVM; the remaining quantities are determined by the optimization process defined in (2)-(5).
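The ε-SVR of (2)-(6) can be sketched with scikit-learn's `SVR` (an implementation choice of this sketch, not one specified by the paper); its `C`, `epsilon` and `gamma` arguments correspond to C, ε and γ above, and the toy data are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: z = sin(x) with small noise (illustrative only)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
z = np.sin(X).ravel() + 0.05 * rng.standard_normal(200)

# epsilon-SVR with an RBF kernel; C, epsilon and gamma play the roles of
# C in (2), epsilon in (3)-(4) and gamma in (6)
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=1.0)
model.fit(X, z)

pred = model.predict(X)
rmse = np.sqrt(np.mean((pred - z) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

In practice, as the paper discusses, C, ε and γ must be chosen carefully (e.g. by a metaheuristic) to avoid over- or underfitting.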

KELM
One of the inconveniences of ANNs relates to their low training speed, which has been the main obstacle for applications in which speed is of paramount importance. This is because most ANN training algorithms are based on gradient descent methods, and all network parameters are tuned iteratively during training.
A new learning algorithm called Extreme Learning Machine (ELM) was proposed by Huang et al. in 2006, which is used both in classification and regression problems. The main advantage of ELMs is the absence of iterative adjustment in the hidden layer of the network; therefore, its training time is lower than those of traditional ANNs [48].
For ELMs, it has been proved that the input weights and the hidden layer thresholds of a Single Layer Feedforward Network (SLFN) can be randomly assigned. If the activation functions in the hidden layer are infinitely differentiable, the SLFN can simply be considered as a linear system, and the output weights (connecting the hidden layer to the output layer) of the SLFN can be analytically determined through a generalized inverse operation on the hidden layer output matrix [34].
The ELM training algorithm follows these steps: (1) randomly assign the hidden node parameters: input weights w_i and thresholds b_i; (2) calculate the hidden layer output matrix H; and (3) obtain the output weight vector β. Given a set of N input samples (x_j, t_j), standard SLFNs with n hidden nodes, whose weights are randomly initialized, and activation function g(x) are modeled as:
Σ_{i=1}^{n} β_i g(w_i · x_j + b_i) = t_j, j = 1, …, N, (7)
where w_i is the weight vector connecting the ith hidden node and the input nodes, β_i is the weight vector connecting the ith hidden node and the output nodes, b_i is the threshold of the ith hidden node, and t_j and x_j are the jth target and input vectors, respectively. Equation (7) can be expressed in matrix form as indicated in (8):
Hβ = T, (8)
where H is the output matrix (N × n) of the hidden layer with respect to the inputs x_j, β is the output weight matrix (n × m), T is the target matrix (N × m), and m is the number of distinct arbitrary targets. ELM training consists of the minimum norm least-squares solution of a general linear system:
β = H†T, (9)
where H† is the Moore-Penrose generalized inverse of H [48]. According to ridge regression theory, a positive value 1/λ is added to the diagonal of HH^T [34,49]. The corresponding output function of the ELM is given by (10), where h(x) is the hidden layer feature mapping function:
f(x) = h(x)β = h(x)H^T (I/λ + HH^T)^{-1} T. (10)
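The three ELM training steps can be sketched in NumPy as follows; the sigmoid activation, toy data, hidden layer size and regularization value are illustrative assumptions, and the output weights are computed with the hidden-space form of the regularized solution, which is algebraically equivalent to β = H^T (I/λ + HH^T)^{-1} T:

```python
import numpy as np

def elm_train(X, T, n_hidden, lam=1e-3, seed=0):
    """Train a single-hidden-layer ELM: random input weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign input weights w_i and thresholds b_i
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    # Step 2: hidden layer output matrix H (N x n), sigmoid activation g
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step 3: output weights beta via the regularized least-squares solution,
    # solved in hidden space: (H^T H + lam*I) beta = H^T T
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: learn z = sin(x)
X = np.linspace(0, 2 * np.pi, 300).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T, n_hidden=50)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(f"ELM training RMSE: {rmse:.4f}")
```

Note that no iterative adjustment of the hidden layer takes place, which is what makes ELM training so fast compared to gradient-based ANN training.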
In the KELM algorithm, the hidden layer feature mapping function h(x) does not need to be known explicitly; instead, a kernel function K(u, v) can be defined to compute the value of the output function. The kernel function directly adopts the form of an inner product. Additionally, it is not necessary to set the number of hidden layer nodes when solving the output function, so the initial weights and thresholds of the hidden layer do not need to be set [50,51]. A kernel function is introduced in KELM to obtain better regression accuracy, as:
Ω_ELM = HH^T, with Ω_ELM(i, j) = h(x_i) · h(x_j) = K(x_i, x_j), (11)
f(x) = [K(x, x_1), …, K(x, x_N)] (I/λ + Ω_ELM)^{-1} T, (12)
where Ω_ELM is the kernel function matrix and K(u, v) is the kernel function; the Gaussian function is usually chosen as kernel (see Equation (6)). Finally, N is the number of training samples.
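A minimal sketch of (11)-(12) with a Gaussian kernel (the γ and λ values and toy data are illustrative assumptions, not the paper's tuned settings):

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2), evaluated for all row pairs of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

class KELM:
    """Minimal kernel ELM regressor: f(x) = [K(x, x_1), ..., K(x, x_N)] (I/lam + Omega)^{-1} T."""
    def __init__(self, gamma=1.0, lam=100.0):
        self.gamma, self.lam = gamma, lam

    def fit(self, X, T):
        self.X = X
        omega = gaussian_kernel(X, X, self.gamma)  # Omega_ELM = H H^T, Eq. (11)
        n = X.shape[0]
        self.alpha = np.linalg.solve(np.eye(n) / self.lam + omega, T)  # (I/lam + Omega)^{-1} T
        return self

    def predict(self, Xnew):
        return gaussian_kernel(Xnew, self.X, self.gamma) @ self.alpha  # Eq. (12)

# Toy usage: learn z = sin(x)
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
T = np.sin(X)
model = KELM(gamma=1.0, lam=1e4).fit(X, T)
rmse = np.sqrt(np.mean((model.predict(X) - T) ** 2))
print(f"KELM training RMSE: {rmse:.5f}")
```

Only γ and λ remain to be tuned, which is precisely the task the paper assigns to the MVMO algorithm.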

Proposed Methodology
A long-term online voltage stability evaluation is proposed in this paper through the VSM index using KELM. The steps of the methodology are detailed in this section.

Building of AI Learning Database Using Monte Carlo Method
A Monte Carlo (MC) simulation is proposed to build different pre- and post-contingency operative conditions close to real-life operation. The operative conditions consist of different scenarios or hourly demand curves (with a 24-h horizon) that are transferred to the nodal demands of the PQ buses. The scenarios are built from the demand of the base case, taken from the original information of the system. Then, the demand is changed in every node using hourly demand curves according to each type of user, as indicated in Figure 2 (residential, industrial or commercial) [43]. Every PQ node is randomly assigned a type of user, guaranteeing the same number of PQ nodes for each type of user. The uncertainty in the demand is considered using a probability density function (PDF) of a normal distribution. Every outage (N-1 contingency) is considered as an independent and randomly generated event. The selection of contingencies depends on the probability of forced outage of the system components. The probability of a contingency can be estimated using the Poisson distribution with a constant occurrence rate [52]. Using the Poisson distribution, the probability of occurrence of an N-1 contingency in a given period of time is given by the following equation:
P = 1 − e^{−λ_0 t}, (13)
where λ_0 is the average occurrence rate of a contingency related to an element of the system, and t is the duration considered; in this case, one hour.
Equation (13) is applied to all components considered in the eligible set of N-1 contingencies. In this case, contingencies caused by the outage of transmission lines, transformers and generation units are considered. For probabilistic contingency selection using the MC method, it is assumed that each component of the system has two operating states: normal and failure.
In the MC simulation, a uniformly distributed random number R_i is generated for each component or group of components, and the Markovian model is handled as follows [53]:
I_i = 1 (failure state) if 0 ≤ R_i ≤ P_i; I_i = 0 (normal state) if R_i > P_i, (14)
where I_i is the operating state of the ith component of the system, R_i is the uniformly distributed random number generated for the ith component, and P_i is its outage probability from (13). For the elements in a failure state, the contingency analysis is performed, which corresponds to executing a power flow to determine the post-contingency conditions of the system. The proposed MC process flowchart to build the database for the AI training stage is presented in Figure 3.
The MC method starts with the generation of a random sample from the PDFs of the input variables considered in the process, for example, the demand in each node, and the random selection of the contingency in an element. Then, an Optimal Power Flow (OPF) is executed for each sample of the input variables in order to define a stable pre-contingency scenario. A static N-1 contingency analysis and a Continuation Power Flow (CPF) are run to determine the VSM using the UWPFLOW program [54]. Finally, the pre- and post-contingency power flow outcomes and the VSM corresponding to each evaluation carried out in the MC process are obtained.
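The Poisson-based contingency probability and the two-state component sampling described above can be sketched as follows; the per-hour outage rates are assumed, illustrative values:

```python
import numpy as np

def contingency_probability(rate_per_hour, t_hours=1.0):
    """Probability of at least one outage of a component in t hours, from a
    Poisson process with constant occurrence rate: P = 1 - exp(-lambda_0 * t)."""
    return 1.0 - np.exp(-rate_per_hour * t_hours)

def sample_component_states(rates_per_hour, t_hours=1.0, seed=0):
    """Two-state Markovian sampling: component i is flagged as failed when a
    uniform random number R_i falls at or below its outage probability P_i,
    and remains in the normal state otherwise."""
    rng = np.random.default_rng(seed)
    p_fail = contingency_probability(np.asarray(rates_per_hour), t_hours)
    R = rng.uniform(size=len(rates_per_hour))
    return np.where(R <= p_fail, 1, 0)   # 1 = failure state, 0 = normal state

# Illustrative hourly outage rates for five components (assumed values)
rates = [0.001, 0.002, 0.0005, 0.003, 0.001]
states = sample_component_states(rates)
print("sampled states:", states)
print("outage probabilities:", np.round(contingency_probability(np.array(rates)), 5))
```

In the full MC loop, each sampled failure state would trigger a post-contingency power flow, as described above.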
With the CPF tool, the VSM is calculated as a global index; in this case, it corresponds to the maximum transfer point of the PV curve [55]. The distance from the system operating point to the voltage instability frontier depends on the load increase trajectory and the initial operating point of the system. In addition, the occurrence of N-1 contingencies affects the stability frontier; that is, the distance from the operating point to the instability frontier (the VSM) is reduced. In the MC process, different load increase directions are considered. This is achieved by varying the operating points and retaining a constant power factor for each operating point when the demand is increased. Figure 4 shows different operating points as a function of the real and reactive power demand, and how load increase directions and contingencies affect the VSM.
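As an illustration of the margin concept, a minimal sketch of one common VSM normalization: the distance from the operating loading factor to the nose (maximum transfer) point of the PV curve, normalized by the nose-point loading. The exact normalization used in the paper is not stated in this section, so this convention is an assumption.

```python
def vsm_index(lambda_values, lambda_op):
    # lambda_values: loading factors traced by the CPF along the PV curve
    # lambda_op: loading factor at the current operating point
    # VSM = (nose-point loading - operating loading) / nose-point loading
    lambda_max = max(lambda_values)
    return (lambda_max - lambda_op) / lambda_max
```

A contingency that lowers the reachable nose point shrinks this margin, which is exactly the effect depicted in Figure 4.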

AI Training
The database for AI training and validation is formed by two matrices: X N×n , which contains the features, and Y N×1 , which contains the target values. The feature matrix is formed by N observations that correspond to the voltage phasor magnitudes of the n buses of the test power system; the target values correspond to the VSM of the N different system conditions represented by those voltage phasor magnitudes, that is, N snapshots of the power system dynamics.
The AI training stage is carried out offline; the common strategy is to separate a set of training data and another set of testing data (unknown data). These unknown data are used to evaluate the estimation accuracy (validation process). The objective of the training stage is to build an AI model (SVM and KELM) based on the information embedded in the training set [42].
Before training the kernel-type learning machine techniques, the optimal parameters of the AI must be found in order to improve the fit to the training data and the accuracy of the AI response to unknown data [46].
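For reference, the KELM regressor admits a closed-form training step: with an RBF kernel K, the output weights are β = (K + I/C)^(−1) T, and a new sample is predicted as K(x_new, X_train) β. The sketch below uses this standard formulation with illustrative function names and toy data; it is not the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C, gamma):
    # Closed-form output weights: beta = (K + I/C)^(-1) T
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, T)

def kelm_predict(X_new, X_train, beta, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ beta
```

The only free parameters are the regularization constant C and the kernel width γ, which is why the parameter search for KELM (discussed below) is two-dimensional rather than three-dimensional as for the SVR.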

Generally, SVR with an RBF-type kernel has three parameters to optimize: C, γ, and ε. For the selection of these SVR parameters, two methods have been combined: (i) k-fold cross-validation with k repetitions [42,46], which helps prevent overfitting, and (ii) the grid search method, which yields the AI parameters.
In k-fold cross-validation, the data set is randomly divided into k mutually exclusive subsets of the same size. Training and testing are performed k times; in each run, one subset is kept for testing and the remaining k − 1 subsets are used as training data. According to [42,46], it is recommended to apply the grid search algorithm with cross-validation, where several sets of (C, γ, ε) are tested and the one with the lowest prediction error in the cross-validation is chosen. The grid search algorithm has a high computational burden that grows with the number of parameter combinations considered; furthermore, this procedure does not guarantee obtaining the optimal AI parameters.
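The k-fold cross-validation combined with grid search can be sketched as follows. This is a simplified illustration: `evaluate` stands in for training an AI model with a given parameter set and scoring its validation MSE.

```python
import itertools
import random

def kfold_splits(n_samples, k, seed=0):
    # k mutually exclusive folds of (nearly) equal size
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(evaluate, grid, n_samples, k=10):
    # evaluate(params, train_idx, test_idx) -> validation MSE for one fold
    folds = kfold_splits(n_samples, k)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = 0.0
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            score += evaluate(params, train_idx, test_idx) / k
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The outer loop makes the cost explicit: the number of model trainings is k times the number of grid points, which is the computational burden the text refers to.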


Determination of AI Optimal Parameters
As a solution to the optimal parameter setting, different metaheuristic optimization techniques have been proposed, such as Particle Swarm Optimization [56], Genetic Algorithms [31], and Mean-Variance Mapping Optimization (MVMO), in which an objective function (OF) corresponding to the Mean Squared Error (MSE) is minimized [43]. MVMO has been successfully applied to the solution of different power system optimization problems. Furthermore, numerical comparisons between MVMO and evolutionary algorithms have shown that MVMO exhibits better performance, especially in terms of convergence speed [57][58][59]. In this case, the MVMO tool is used and the OF is defined as:

min_x OF(x) = (1/k) Σ_{i=1..k} (1/n) |t_i − y_i|²

subject to: x_min ≤ x ≤ x_max

where t is the desired output vector, y is the AI prediction vector, |·| corresponds to the Euclidean distance, and n corresponds to the dimension of the y vector. The sub-index i refers to the ith iteration of the k-fold cross-validation process, and the vector x comprises the AI parameters that solve the problem.
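The objective function described above, the average of the per-fold prediction errors, can be sketched as:

```python
def mse(t, y):
    # mean squared error between targets t and predictions y
    return sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t)

def objective(cv_results):
    # OF: average of the k per-fold prediction errors PE_i,
    # where cv_results is a list of (targets, predictions) pairs, one per fold
    return sum(mse(t, y) for t, y in cv_results) / len(cv_results)
```

The MVMO evaluates this scalar for each candidate parameter vector x and keeps the candidate with the lowest value within the bounds x_min ≤ x ≤ x_max.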
The flowchart of the proposed methodology for the identification of the AI parameters using MVMO is presented in Figure 5. First, the AI training information is organized; then, each AI parameter is initialized; the MVMO proposes the parameters to optimize, executes the optimization process, and delivers the optimum parameters found, with which the MSE is calculated. For each k-fold cross-validation run, the MVMO minimizes the prediction error (PE_i), i being the ith validation run. At the end of the process, the MVMO delivers the optimum AI parameters that minimize the total error over all the runs executed in the cross-validation. Before training both SVR and KELM, the optimum parameters that provide a better accuracy in the AI response must be determined [34,46]. As mentioned before, k-fold cross-validation is combined with the grid search method to prevent overfitting problems. This procedure is computationally intensive, given that the number of parameter combinations is huge; additionally, it is a local search process whose search range is hard to establish. As a solution to the above-mentioned problems, the alternative presented in [43] was adopted, in which a heuristic optimization algorithm known as MVMO is used to determine the best parameters of the kernel-based AI. The optimum parameters to be identified for the SVR are ε, C and γ, according to Equation (2), while the optimum parameters for the KELM are C and γ.

Database for AI Training
In order to evaluate the performance of the different AI techniques under study, the database is built for the IEEE 39-bus test system (also known as the New England test system) [60]. The test system (depicted in Figure 6) comprises 10 generators with active and reactive power limits, 29 load (PQ) buses, 12 transformers, and 34 transmission lines. The data of the transmission lines, transformers, nodal demand, and active and reactive power limits were taken from [61].

The MC process was implemented with the following parameters. For each nodal demand of the PQ buses, the means are taken as the active and reactive power values of the original data (provided in [61]), and deviations of 8% are considered for the nodal demand at each PQ bus. Furthermore, the N-1 contingencies are generated assuming an average outage occurrence rate equal to 0.04, for both transmission lines and generation units.
The static CPF tool is run for every contingency of the system, either for the whole system or with contingencies according to the respective case. The VSM is determined for both pre- and post-contingency conditions in the test system. An input matrix X N×n is formed by 20,000 operative conditions (normal and under contingency), where N corresponds to the number of simulated operative conditions (snapshots) and n corresponds to the voltage phasors of the buses taken for each simulated operative condition; in this case, 39 values, corresponding to the number of system buses. Additionally, an output matrix Y N×1 is built to store the VSM calculated for each simulated condition.
In this case, 70% of the data was used for training and 30% was used in the AI validation. An important aspect prior to the training of intelligent systems, especially the kernel-based AI, was the need to identify the optimum kernel parameters that yield better results for the data handled by each AI [46]. The parameters identification method of the AI is presented in the next section.
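The 70/30 partition can be sketched as a generic shuffled split; the paper's exact splitting procedure is not specified, so the shuffling and seed here are assumptions.

```python
import random

def train_test_split(X, Y, train_frac=0.7, seed=0):
    # Shuffled split of the MC database into training and validation sets
    # (fractions per the text: 70% training, 30% validation).
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(X))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [Y[i] for i in train],
            [X[i] for i in test], [Y[i] for i in test])
```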

AI Optimal Parameters Selection
The proposed methodology for the identification of AI parameters using MVMO is presented in Section 4.3. In this case, 10 repetitions are used in the k-fold cross-validation to ensure a robust regression. Based on [43], the search space of the AI parameters is defined on an exponential scale as follows: C ∈ [2^−5, 2^15], γ ∈ [2^−15, 2^5], ε ∈ [2^−5, 2^0] [42,46].
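When a metaheuristic such as MVMO searches over normalized variables, an exponential search space like the one above is commonly handled by optimizing the exponent rather than the raw parameter value. The decoding below is a sketch of that common practice, an assumption about the implementation rather than a detail taken from the paper.

```python
def decode_param(u, exp_min, exp_max):
    # Map a normalized optimization variable u in [0, 1] to the
    # exponential range [2^exp_min, 2^exp_max].
    return 2.0 ** (exp_min + u * (exp_max - exp_min))
```

For example, for C with bounds [2^−5, 2^15], u = 0 decodes to 2^−5 and u = 1 to 2^15, so the optimizer moves uniformly in the exponent rather than across twenty orders of magnitude of raw values.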
A Library for Support Vector Machines (LIBSVM) was used for the design, training, and testing of the SVR model [46]. The computational code adapted in this work to execute the KELM can be consulted in [62]. The neural network toolbox from Matlab [63] was used to execute the ANN (backpropagation type). The number of neurons in the hidden layer was chosen empirically; it was varied to guarantee a good performance and reduce the training time.
The parameter identification is applied to the voltage phasor and VSM data generated with the 39-bus test system. Figure 7 illustrates the convergence of the MVMO, which was implemented to identify the optimum parameters for both SVR and KELM training. Note that the MVMO convergence when finding the KELM optimum parameters is faster than for the SVR, requiring only a few objective function evaluations. Table 1 presents the optimal parameters for each AI considered. The function approximations (AI regressors) were obtained using 15,000 samples generated by the MC process for the test system.

Performance Comparison of Machine Learning Techniques in VSM Estimation
Several tests were considered to evaluate the VSM estimation performance of each AI technique under study. In the general test cases, for each simulated operating condition, the voltage phasors of all buses and the corresponding VSM are generated. The AI system is trained with the optimal parameters obtained in the previous section. For the AI performance tests, the following conditions are considered [64]:
- Case 1: Voltage phasor measurements without noise.
- Case 2: Voltage phasor measurements with noise in the phasor magnitude. The noise is added randomly following a normal distribution with zero mean and a deviation equal to 0.01 p.u.
- Case 3: Voltage phasor measurements with noise in the phasor magnitude, with zero mean and a deviation equal to 0.04 p.u.
Table 2 shows the performance results of the AIs considering several conditions of the test system, including measurements with noise in the phasor magnitude. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are defined as the performance indices for the estimates made by each AI regressor. In this case, 5000 randomly selected samples from the database were used to calculate the MSE and RMSE, considering all the test samples. The training time corresponds to the training with 15,000 samples (different from those of the testing stage), while the testing time corresponds to the execution of the remaining 5000 samples.
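The noise injection of Cases 2 and 3 and the RMSE index can be sketched as:

```python
import math
import random

def add_magnitude_noise(voltages, sigma, rng):
    # Zero-mean Gaussian noise added to each phasor magnitude;
    # sigma = 0.01 p.u. for Case 2 and 0.04 p.u. for Case 3.
    return [v + rng.gauss(0.0, sigma) for v in voltages]

def rmse(targets, predictions):
    # Root Mean Squared Error over the test samples
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets))
```

The noisy magnitudes are fed to the trained regressor, and the RMSE between the estimated and the CPF-computed VSM values gives the per-case performance index reported in Table 2.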
For the ANN, an optimal number of twenty neurons was identified for the hidden layer, and the optimal parameters of SVR and KELM are shown in Table 1.
In Table 2, it is observed that KELM yields more accurate predictions in normal conditions (Case 1) than under the high noise levels of Case 2 and Case 3. It can also be noted in Table 2 that the second most accurate approach is the ANN, followed by the SVR. Table 3 presents the training and testing time for each AI execution process. In this regard, it can be observed that KELM presents a significantly lower training time than the other AIs (see the second column of Table 3). The average execution time of KELM for 5000 test samples was 1.258 s, which is adequate for the long-term voltage stability evaluation in near real time. The CPF average execution time for a single run was 0.3216 s; note that the KELM execution time per evaluation is much shorter than that of the CPF. The VSM values predicted by each AI model in the testing phase are plotted in Figures 8-10; the VSM is scaled between 0 and 1, and a small test set (about 55 samples) was selected for the figures. Figure 8 displays the most accurate results; note that the MVMO-KELM model provides results that are much closer to the actual VSM values for Case 1. The second most accurate results, obtained with the ANN model, can be seen in Figure 9. Lastly, Figure 10 presents the results obtained with the MVMO-SVR model.


Conclusions
This paper presented a hybrid AI model for the online estimation of the VSM. The proposed hybrid model combines the strengths of KELM and MVMO; in this case, the MVMO is used to optimize the parameter settings of the KELM for the online estimation of voltage stability. The model was trained considering several operative conditions, including different generation-demand scenarios with three types of consumers (commercial, residential and industrial), as well as N-1 contingencies.
The MVMO-KELM model was successfully implemented to estimate the VSM in a benchmark power system, demonstrating that the MVMO optimization technique guarantees an appropriate selection of the KELM parameters, which improves the prediction accuracy. The performance of the MVMO-KELM model was compared with that of the MVMO-SVR and an ANN. Different test cases were taken into account, corresponding to the inclusion of several noise levels in the magnitudes of the nodal voltages (inputs of the model). It was confirmed that the proposed MVMO-KELM model has a more robust performance against noise in the input data than that of the MVMO-SVR and the ANN.
The proposed MVMO-KELM model can also be incorporated within a long-term voltage stability monitoring method. The MVMO-KELM model, combined with other monitoring techniques, might constitute a powerful tool for the evaluation of voltage stability, able to withstand several conditions in real-scale power systems.