1. Introduction
The security-constrained optimal power flow (SCOPF) problem is an important tool in day-to-day power grid operations. Independent system operators (ISOs) solve the SCOPF problem frequently (every few hours [1]) to obtain operational policies that are considered N-1 secure (i.e., resilient against contingencies in which any single transmission line or generator may fail without loss of service to power consumers). The most common practice is to solve a linearized form of the OPF problem without contingencies and then apply heuristics over a subset of contingencies to obtain safe (but potentially sub-optimal) SCOPF solutions. Further improvements in computational cost are therefore necessary for wider adoption of the more accurate non-linear formulation. More recent approaches have formulated and solved the SCOPF problem using stochastic programming formulations that capture thousands of contingencies using parallel interior point solvers on high-performance computing hardware [2]. Some parallel solutions of the SCOPF problem are scalable and can be computed in a timely manner (about 10 min for thousands of contingencies) when the computing resources are available. Other methods for managing the computational tractability of SCOPF problems include problem decomposition [3], learning active constraint sets [4], and identifying umbrella constraints [5].
Other work has examined parallelization of the ACOPF [6], which may be extended to SCOPF in the future. Excellent survey articles [7,8] cover the formulations, linearizations, and relaxations common in the OPF problem and, by extension, the SCOPF problem. Recent work has also shown that many convex relaxations of these problems can produce infeasible solutions [9] and that determining AC system feasibility is an NP-hard problem [10]. As the problem is of key interest to critical infrastructure systems, research competitions have been dedicated to solving the SCOPF efficiently, including the ARPA-E Grid Optimization competition [11,12,13]. This contest focused on solving large instances of the problem, and successful teams used convex relaxations, contingency-screening heuristics, problem decompositions, and various solvers (IPOPT, KNITRO, Gurobi). While this work focuses on the economic dispatch problem, there have also been studies on using surrogate models for the unit commitment problem [14].
Surrogate modeling techniques are widely used across engineering disciplines owing to their ability to produce tractable computational models that capture complex phenomena [15,16]. A key aspect of such data-driven techniques is their ability to incorporate diverse, large-scale datasets (e.g., operational and/or simulation data) into flexible, accurate, and fast prediction models. In the context of data science, surrogate modeling techniques can harness new hardware and algorithmic advances to utilize massive amounts of data with access to scalable training architectures [17].
Neural networks (NNs) have received particular interest for surrogate modeling [18] due to their excellent approximation properties: they can achieve an arbitrary degree of accuracy given enough activation neurons [19]. Deep neural networks have specifically garnered attention for their ability to represent complex non-linear relationships between inputs and outputs and to learn significant features while utilizing training data efficiently [20]. In the machine learning (ML) literature, however, optimization commonly refers to the training of the neural network model [21], as opposed to embedding the surrogate model within a mathematical optimization problem in the form of an algebraic function [22]. The latter embedded NN differs from its more commonplace predictive applications (e.g., image classification) in that its algebraic representation permits design and control problems that are often concerned with enforcing physical constraints. By representing only a subset of the optimization formulation with an ML model, physics-based knowledge, which may be implicitly defined on the inputs and outputs, can be incorporated; this allows for hybrid models that maintain first-principles relationships between the model variables.
This manuscript explores hybrid modeling for the well-known, yet challenging, SCOPF problem and demonstrates both the promise and the research challenges of incorporating deep NNs into large-scale non-linear programming (NLP) formulations. We describe a novel equation-based sampling scheme for the SCOPF problem that generates a diverse dataset containing a balanced set of safe and unsafe operating points. A deep NN is then trained to learn the classification boundary between these spaces. We introduce a new optimization formulation that uses the surrogate model to enforce secure operation in a traditional AC optimal power flow (ACOPF) formulation. As opposed to fully black-box methods, this approach keeps the constraint agnostic to objective function parameters and provides mathematical guarantees of feasibility in the nominal state variables. We compare the scalability and accuracy of our proposed formulation against the equivalent extensive SCOPF problem and discuss future enhancements.
The rest of the paper is organized as follows: Section 2 summarizes previous relevant work; Section 3 describes the mathematical formulation of SCOPF and how certain constraints can be handled with surrogates; Section 4 examines accurate sampling in the high-dimensional security space; Section 5 shows how trained NN models can be embedded into optimization models; Section 6 presents the numerical results; and Section 7 summarizes the key findings. The key advances and research contributions fall into three methodological areas, shown in Figure 1: the novel sampling approach is shown to be highly effective for characterizing non-convex boundary functions; algebraic reformulations of NN functions are detailed and discussed with respect to their use in large NLPs; and the hybrid solution method allows for fast and accurate solution of the SCOPF. These contributions are further situated with respect to the related literature below.
2. Related Work
In this work, the SCOPF problem is solved using a hybrid surrogate-based approach, where a deep neural network (DNN) surrogate acts as the feasibility classifier within the overall SCOPF formulation. This approach tackles the issue of tractability by using computational resources a priori (offline) to map the secure space of the problem and train the surrogate. It also allows for a more compact representation of the feasibility space as a single NN constraint. Previous work [23] trained non-linear regression models and single-layer NN models to find the security boundary of a given load profile for the worst-case contingency. More recent work [24] trained NN classifiers with ReLU activation functions for various criteria of power grid security. These classifiers were then added to the ACOPF problem as sets of linear constraints with binary variables, generating a mixed-integer non-linear program (MINLP); the power equations were then linearized to convert this to a mixed-integer linear program (MILP). Our work builds on this by using deep NN models to map the secure and insecure space across all contingencies of interest, variability in load demand, and higher-dimensional representations of the security boundary. We focus on ML representations that maintain non-linearity but avoid discrete variables, which keeps the problem an NLP that can be solved using powerful interior point solvers.
Other related work has instead replaced the entire optimal power flow problem with a surrogate model trained on optimal solutions [25]. In [26], the authors study verification of neural networks that predict optimal power flow solutions, showing that well-trained networks have good bounds on output error. These approaches have the advantage of learning the full problem, which makes forward prediction faster, but they can make the model inflexible to changing objective function parameters and require post-processing to guarantee even pre-contingency state feasibility. In [27], the authors use a distance metric from the boundary to quantify the security of historical data points even when most are not near the boundary, which also requires knowledge of the exact feasibility space. We instead use the model to map out boundary points where the constraints become active, using an optimization-based approach described in detail in Section 4. One advantage of our approach is that the classification model can learn the feasibility boundary in the higher-dimensional space of the nominal variables instead of learning a single distance metric. We build on our previous work [28] by (a) introducing a model-based sampling algorithm that accurately finds data points without an iterative "guess and check" procedure, (b) exploring various neural network formulations and their Python implementations, and (c) applying our approach to much larger grid problems with more contingencies.
5. Neural Network Representation
Implementing the security classifier in Equation (4e) requires formulating the neural network in terms of optimization variables and constraints. The choice of NN formulation affects the fundamental form of the surrogate model and the resulting optimization problem. A major thread of research has developed optimization formulations for machine learning surrogates, including mixed-integer linear representations for ReLU networks [36,37,38,39], full-space and reduced-space representations for smooth activation functions [22], and complementarity representations for ReLU functions [40]. For our application, we assume a simple feed-forward NN described by Formulations (6a)–(6d) that contains $K$ layers, where layer $k$ contains $N_k$ nodes (depicted in Figure 5). It is also important to note that the OPF problem to which we apply this approach is an NLP, so including integer variables for the NN representation would result in an MINLP that is more difficult to solve than the extensive form. We therefore focus on three constraint formulations that implement the NN as a smooth, non-linear function compatible with large NLP solvers, and we compare them when used in an optimization context.
We denote the input vector by $x$ and the output vector by $y$. The input to each layer is a linear combination of the output of the previous layer, i.e., $\hat{z}_k = W_k z_{k-1} + b_k$, where $W_k$ is an $N_k \times N_{k-1}$ weight matrix and $b_k$ is an $N_k$-dimensional bias vector between layers $k-1$ and $k$. Each hidden layer incorporates an activation function $\sigma(\cdot)$, which usually applies a non-linear transformation to each element of the vector input. We denote the vector $x$ as $z_0$ to represent the input layer to the neural network (the input layer is usually not counted as a layer). The pre-activation values $\hat{z}_k$ at each layer $k$ are given by (6b) and the post-activation values $z_k$ by (6c). Finally, the output layer produces the vector $y$ as a linear combination of the final hidden layer, as given by (6d).
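For concreteness, the following is a plausible reconstruction of Formulations (6a)–(6d) from the definitions above; the exact symbols in the published equations may differ:

```latex
\begin{align}
  z_0 &= x, \tag{6a}\\
  \hat{z}_k &= W_k\, z_{k-1} + b_k, \quad k = 1,\dots,K, \tag{6b}\\
  z_k &= \sigma(\hat{z}_k), \quad k = 1,\dots,K, \tag{6c}\\
  y &= W_{K+1}\, z_K + b_{K+1}. \tag{6d}
\end{align}
```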
It is helpful to express Formulations (6a)–(6d) element-wise to demonstrate the different neural network representations for the security-constrained optimization problem. Formulations (7a)–(7d) are analogous to (6a)–(6d) but additionally unfold the inner layer nodes, as sketched below.
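A plausible element-wise reconstruction, consistent with the vector form above, is:

```latex
\begin{align}
  z_{0,j} &= x_j, \quad j = 1,\dots,N_0, \tag{7a}\\
  \hat{z}_{k,i} &= \sum_{j=1}^{N_{k-1}} w_{k,ij}\, z_{k-1,j} + b_{k,i}, \quad \forall k,\, i, \tag{7b}\\
  z_{k,i} &= \sigma(\hat{z}_{k,i}), \quad \forall k,\, i, \tag{7c}\\
  y_i &= \sum_{j=1}^{N_K} w_{K+1,ij}\, z_{K,j} + b_{K+1,i}. \tag{7d}
\end{align}
```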
The choice of activation function in Equation (7c) generally falls to the problem of training the neural network, although the ReLU function is commonly selected for its favorable properties [41]. In the optimization problem, (7a)–(7d) can be implemented using the different aforementioned algebraic representations. As this manuscript utilizes the non-linear AC power flow equations, we examine the following three smooth NN representations.
In the full-space representation, the variables and constraints described by (7a)–(7d) are formulated explicitly in the problem. The activation constraints can use any smooth function (e.g., tanh, sigmoid, softplus). Intermediate variables (e.g., $\hat{z}_{k,i}$ and $z_{k,i}$) are exposed to IPOPT and related through sequential constraints.
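To make the full-space representation concrete, the following is a minimal Pyomo sketch of (7a)–(7d) for a small tanh network; the layer sizes, random weights, and the 0.5 output threshold are illustrative placeholders, not the trained surrogate used in this work.

```python
# Minimal full-space embedding of a tanh feed-forward NN in Pyomo.
# W[k], b[k] are placeholder arrays standing in for trained parameters.
import numpy as np
import pyomo.environ as pyo

rng = np.random.default_rng(0)
sizes = [4, 20, 20, 1]  # input, two hidden layers, output (placeholders)
W = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(3)]
b = [rng.normal(size=sizes[k + 1]) for k in range(3)]

m = pyo.ConcreteModel()
m.x = pyo.Var(range(sizes[0]), bounds=(-1, 1))  # NN inputs; in the hybrid
                                                # model these link to ACOPF variables
m.zhat = pyo.Var(range(2), range(20))           # pre-activation values, (7b)
m.z = pyo.Var(range(2), range(20))              # post-activation values, (7c)
m.y = pyo.Var()                                 # NN output (security score)

def preact_rule(m, k, i):
    prev = ([m.x[j] for j in range(sizes[0])] if k == 0
            else [m.z[k - 1, j] for j in range(20)])
    return m.zhat[k, i] == sum(W[k][i, j] * prev[j]
                               for j in range(len(prev))) + b[k][i]
m.preact = pyo.Constraint(range(2), range(20), rule=preact_rule)

def act_rule(m, k, i):  # smooth activation constraint, (7c)
    return m.z[k, i] == pyo.tanh(m.zhat[k, i])
m.act = pyo.Constraint(range(2), range(20), rule=act_rule)

m.out = pyo.Constraint(expr=m.y == sum(W[2][0, j] * m.z[1, j]
                                       for j in range(20)) + b[2][0])
m.secure = pyo.Constraint(expr=m.y >= 0.5)  # enforce "secure" classification
```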
The reduced-space representation is similar to the full-space one, but the NN variables and constraints are captured as a single expression connecting the input and output variables. Here, intermediate variables (e.g., $\hat{z}_{k,i}$ and $z_{k,i}$) are not explicitly formulated in IPOPT, and the number of problem variables is reduced. Previous research has shown that reduced-space representations may have advantages when embedded in optimization formulations [22]; we therefore implement this formulation to assess its advantages over the full-space formulation.
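A corresponding reduced-space sketch, under the same placeholder assumptions as the previous block (it reuses `sizes`, `W`, `b`, and `pyo` from there), collapses the hidden layers into one nested expression:

```python
# Reduced-space variant: the hidden layers are folded into a single nested
# Pyomo expression, so zhat/z never appear as solver variables.
def nn_expr(x_vars):
    layer = [x_vars[j] for j in range(sizes[0])]
    for k in range(2):  # hidden layers
        layer = [pyo.tanh(sum(W[k][i, j] * layer[j]
                              for j in range(len(layer))) + b[k][i])
                 for i in range(20)]
    return sum(W[2][0, j] * layer[j] for j in range(20)) + b[2][0]

m2 = pyo.ConcreteModel()
m2.x = pyo.Var(range(sizes[0]), bounds=(-1, 1))
m2.secure = pyo.Constraint(expr=nn_expr(m2.x) >= 0.5)  # single input-output constraint
```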
While we can formulate ReLU within the full- and reduced-space representations, the resulting constraints are not smooth (ReLU is given by $z = \max(0, \hat{z})$). To handle this, ReLU can be formulated using complementarity conditions, where we substitute (7c) with (8) for each node in the NN to generate a mathematical program with complementarity constraints (MPCC) [42].
The complementarity constraints in (8) permit smooth transformations, which have been studied extensively with respect to their regularity properties [43]. This manuscript uses a simple component-wise formulation, initially presented in [44], given by (9). This representation introduces a non-linear (complementarity) constraint for each node in the neural network and uses the regularization parameter $\epsilon$ to satisfy NLP constraint qualifications. This formulation is implemented within pyomo.mpec [45] and is used in the neural network package OMLT [46].
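A component-wise regularization in the style of [44], consistent with the description above, would read:

```latex
z_{k,i} \ge 0, \qquad
z_{k,i} - \hat{z}_{k,i} \ge 0, \qquad
z_{k,i}\left(z_{k,i} - \hat{z}_{k,i}\right) \le \epsilon,
\qquad \forall k,\, i, \tag{9}
```

where $\epsilon > 0$ is a small regularization parameter.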
Other variations of (9) can be found in the literature and include different regularization techniques, NCP functions, and objective penalties [47]. Overall, this section provides a general mathematical framework for embedding NN models from various ML libraries into a non-linear program. Open-source code for the OMLT tool, which implements all of the discussed formulations in Python, is available online (github.com/cog-imperial/OMLT, accessed on 11 July 2023).
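As a sketch of how these formulations are accessed in practice, the following shows one way to load a trained Keras network into a Pyomo model with OMLT; the model file name is a hypothetical placeholder, and the API names follow OMLT's documentation at the time of writing.

```python
# Sketch: embedding a trained Keras classifier with OMLT.
import pyomo.environ as pyo
import tensorflow as tf
from omlt import OmltBlock
from omlt.io.keras import load_keras_sequential
from omlt.neuralnet import FullSpaceSmoothNNFormulation

keras_model = tf.keras.models.load_model("security_classifier.h5")  # hypothetical file
n_inputs = keras_model.input_shape[1]
bounds = {i: (-1.0, 1.0) for i in range(n_inputs)}  # placeholder scaled input bounds

net = load_keras_sequential(keras_model, scaled_input_bounds=bounds)

m = pyo.ConcreteModel()
m.classifier = OmltBlock()
m.classifier.build_formulation(FullSpaceSmoothNNFormulation(net))
# ReducedSpaceSmoothNNFormulation(net) and ReluComplementarityFormulation(net),
# also in omlt.neuralnet, give the other two representations discussed above.
# m.classifier.inputs / m.classifier.outputs are then linked to the ACOPF
# variables, e.g., requiring the "secure" output probability to exceed 0.5.
```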
6. Results
We apply this overall framework to three case studies: the IEEE-118 system with 6 contingencies, the IEEE-118 system with 33 contingencies, and the IEEE-300 system with 10 contingencies. IEEE-118 has 99 loads, 54 generators, and 186 branches; IEEE-300 has 201 loads, 69 generators, and 411 branches. Increasing the number of buses affects the size of the nominal state model for both the baseline and hybrid approaches, while increasing the number of contingencies affects the size and complexity of the security region we want to represent with the NN surrogate.
6.1. Sampling and NN Training
Below, we discuss the training specifications and results with respect to the feasibility classifiers and their accuracy. An NN with two hidden layers of 20 neurons each is trained using the Adam optimizer in TensorFlow. The output layer is a softmax function, which converts the outputs to probabilities between 0 and 1; this is common in NN architectures for binary classification. To evaluate the classifiers, we plot the receiver operating characteristic (ROC) curve, which shows the expected trade-off between the conservativeness and accuracy of classification models. An ideal ROC curve would show a right angle in the upper-left corner, corresponding to a high true positive rate and a low false positive rate. While a high true positive rate provides the most confidence in grid security, the accuracy trade-off should be tailored to the specific grid application.
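A minimal TensorFlow sketch of such a classifier is shown below; the hidden activation, learning rate, and callback settings are assumptions, as the text reports only the architecture, optimizer, and epoch count.

```python
# Sketch of the classifier described above: two hidden layers of 20 neurons
# and a two-class softmax head, trained with Adam. The tanh activation and
# learning rate are assumptions, not reported values.
import tensorflow as tf

def build_classifier(n_inputs: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(2, activation="softmax"),  # [P(insecure), P(secure)]
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training for 5000 epochs with best-weight restoration on validation
# accuracy, mirroring the procedure described in the text:
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5000,
#           callbacks=[tf.keras.callbacks.EarlyStopping(
#               monitor="val_accuracy", patience=5000,
#               restore_best_weights=True)])
```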
The test points are drawn from the same distribution as described in Section 4, so they lie close to the boundary and are, thus, more difficult to classify. We show the effect of doubling the training dataset on accuracy for IEEE Case 118 with six contingencies. Figure 6 shows the model performance on testing data for differing amounts of training data: 5500 and 2750 training points, respectively. For a classification threshold of 0.5, the model accuracy is 89% with the larger training set compared to 81% with the smaller set. One advantage of our approach is that sampling and fitting can be performed once, offline, so this does not increase the computational cost of the optimization formulation that must be solved routinely; there are, however, diminishing returns with further samples. Overall, sampling for Case 118 takes, on average, 0.4 s per sample, and for Case 300, 1.2 s per sample. The networks are trained for 5000 epochs, with restoration of the best weights according to validation accuracy.
We observe in Figure 7 (blue line, more constrained Case 118) that, as the number of contingencies increases, the classification boundary becomes harder to predict accurately with the same number of points. In this case, we must increase the training set to 11,000 points to achieve a test accuracy of 85%, again using a two-layer NN with 20 nodes in each layer. Figure 7 shows the classifier performance for this highly constrained case study. The overall shape of the ROC curve is highly symmetric, owing to the balanced dataset provided by the sampling approach, and shows that secure and insecure points are equally difficult to classify.
In Figure 7 (red line, larger bus system Case 300), we show the results of applying our methodology to the IEEE Case 300. As the dimensionality of the problem increases, the training set is increased to 25,000 points and the hidden layers are widened to 50 nodes per layer. The accuracy on the test points is 81%. In this case, the training accuracy is also less ideal than in the previous cases, reaching only 96%. Larger NN models, as universal approximators, can increase this training accuracy; however, there is an important trade-off between model complexity and the resulting hybrid optimization performance, which is discussed below.
6.2. Optimization Results and Comparison of Hybrid versus Extensive Formulations
After training the security surrogate, we embed it as a constraint in the original ACOPF formulation, which is a non-linear and non-convex problem. The final formulation is a hybrid model (i.e., composed of both physics-based and data-based constraints) described by (4a)–(4f). First, we compare our method and the traditional extensive SCOPF for the 118-bus case study with six included contingencies.
In Table 3, we show a complete comparison of Formulations (1a)–(1g) and (4a)–(4f), including the various NN formulations described in Section 5. Here, the reported error refers to the mean absolute error between the solution vector found with the hybrid approach and the corresponding solution vector of the full SCOPF formulation presented in Section 3.1. These values are normalized to represent the average percent error across each variable and unique solution. All hybrid formulations result in a small difference from the extensive solution when comparing optimal generator set points for various load profile inputs. This comparison allows us to quantify how close the hybrid optimization method gets to the baseline with respect to optimality. The baseline SCOPF solution is found by solving the full problem with all relevant contingencies using IPOPT. As the NN architectures have similar accuracy, it is not surprising that they perform similarly with respect to the fidelity of the solution. Furthermore, all the hybrid formulations achieve a significant reduction in CPU time (8–16×) compared to the extensive formulation, which is our primary motivation. We see some differences among these formulations in variable count and time, but we would expect the reduced-space formulation to further distinguish itself as the size of the network increases.
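For illustration, one plausible implementation of this normalized error metric (our reading of the description, not the authors' exact code) is:

```python
# Mean absolute error between hybrid and extensive solution vectors,
# normalized to a percent error and averaged over all variables and
# load-profile instances.
import numpy as np

def normalized_mae(hybrid_sols: np.ndarray, full_sols: np.ndarray) -> float:
    """hybrid_sols, full_sols: (n_instances, n_variables) arrays of setpoints."""
    rel_err = np.abs(hybrid_sols - full_sols) / np.maximum(np.abs(full_sols), 1e-8)
    return 100.0 * rel_err.mean()
```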
Next, in Table 4, we show the scalability of the method for highly constrained problems, where we consider 33 separate line contingencies. For this case, the variable count increases significantly for the extensive version, while the hybrid models stay much closer in size to the smaller case study above. The results show that the extensive formulation scales poorly as more contingency constraints are added, increasing the problem variable space by 6× and the average CPU time by a similar factor. The hybrid formulation maintains the same problem size as in the smaller contingency case, since the embedded networks are of equal size. Although the CPU time increases, because this is a harder optimization problem with a tighter feasible space, it scales much better than the traditional formulation and allows for secure solutions in a fraction of the time (20× speedup).
Finally, we present the results for the IEEE 300 case study in Table 5. A similar pattern is seen with respect to CPU time, with about an order-of-magnitude reduction. A higher error is seen between the full SCOPF solution and the hybrid approximation, but it remains below 10%. As with most data-driven approaches, increasing the problem dimensionality requires further sampling and consideration of the model size and complexity. Some prediction error is expected, but the method proves highly informative to the ACOPF solver when constraining the solution space to the secure region of the NN prediction. Importantly, this formulation also maintains the nominal-case variables of the ACOPF problem, allowing for guarantees on the feasibility of the pre-contingency state. This allows the non-linear, non-convex ACOPF formulation to be solved without relaxations.
With respect to the NN representation, we see that the full-space, reduced-space, and ReLU-with-complementarity formulations all have similar accuracy in their solutions. This is expected, as these networks were trained on the same data with equivalent hyperparameters. In terms of variable count, the reduced-space formulation is the most compact, but the full-space and ReLU formulations are both much smaller than the extensive formulation. There is a small but significant improvement in CPU time from the reduced-space formulation across each case. These differences may become more pronounced as the size of the NN increases.
7. Discussion and Conclusions
Overall, we have provided a general approach for representing non-linear feasibility regions with NN models and integrating these models back into challenging optimization problems to reduce online computational costs. We present promising results with respect to the accuracy of NN-based classifiers for security, especially when samples of high value to the classifier are carefully collected using optimization-based sampling. The model-based sampling approach proves much more effective at characterizing non-convex boundary functions, such as feasibility, than existing space-filling designs of experiments. These positive results motivate future studies of how the sampling approach might perform in other modeling contexts that involve feasibility functions. Mapping the feasibility space is useful in many engineering problems, including studies of system operability and flexibility that arise in several industrial processes. We have shown how NNs can be included as algebraic constraints in the baseline AC optimal power flow problem, resulting in tractable hybrid models. We have observed that the hybrid model can be solved very quickly (5–20× speedup) while still giving results similar to the extensive formulation. The flexibility and adaptability of the NN formulations discussed, along with open-source software, enable many future studies that integrate data-driven functions with first-principles models.
As seen in the results, the ML classification error increases with the dimensionality and the number of contingencies. This can be tackled by increased sampling and by adaptive sampling techniques that focus on regions balancing exploration and exploitation of the feasibility boundary. The main advantage of this approach is that all of the sampling effort can be performed once, offline. Moreover, as the size of the problem increases, the size of the ML model is also expected to increase. This may lead to increased CPU costs for optimizing the hybrid formulation, especially if dense NN architectures are used. This risk can be mitigated by taking advantage of modern sparsification techniques, as well as reduced-space formulations and customized NLP algorithms, to optimize NLP formulations with embedded ML models [22]. Finally, as decomposition and parallelization techniques advance, optimization times for extensive formulations will decrease with the use of high-performance computing. Nevertheless, an approach for mapping complex feasible regions using ML models may have many applications beyond the one described here, such as fast computation and visualization of security feasibility, or sensitivity analyses to identify critical contingencies.
There are several future directions that could further improve and extend this work. The ML model errors can be more accurately quantified using verification approaches. This is an important area of research that can provide bounds on the worst-case predictions of NN models, which would help guarantee the accuracy of trained models and promote their adoption by system operators. Such bounds are especially informative outside the training data regime, where NN extrapolation errors may be much higher. These studies may also assuage the ethical and implementation challenges surrounding the black-box nature of NNs and the lack of formal guarantees of optimality. Network sparsification techniques could lead to equally accurate networks with reduced model complexity; this may be an important way to further speed up hybrid solutions, as sparser NN functions decrease the variable size of the problem. In this work, we have used and compared different forms of feed-forward NNs, but, for larger or more complex feasibility functions, other ML techniques can be considered. For example, an interesting future direction is to examine more complicated neural network architectures and training routines that can embed system physics into the surrogate. While this has been shown to improve extrapolation in regression models, no current work has examined physics-informed neural networks for black-box feasibility. Another future direction is to combine an ML screening model that targets important contingencies a priori with the sampling and hybrid solution method; this may allow many low-risk contingencies to be eliminated. Many of these directions must be considered when scaling the approach up to even larger case studies.