Fast Heuristic AC Power Flow Analysis with Data-Driven Enhanced Linearized Model

Although the full AC power flow model accurately represents the physical power system, its use is limited in practice by the computational complexity that stems from its nonlinear and nonconvex characteristics. For instance, the AC power flow model is not incorporated in the unit commitment model for practical power systems. Instead, the linearized DC power flow model is widely used in today’s power system operational and planning tools. However, the DC power flow model provides no information when reactive power and voltage magnitude are of concern. Therefore, a linearized AC (LAC) power flow model is needed to address this issue. This paper first introduces a traditional LAC model and then proposes an enhanced data-driven linearized AC (DLAC) model using regression analysis. Numerical simulations conducted on the Tennessee Valley Authority (TVA) system demonstrate the performance and effectiveness of the proposed DLAC model.


Introduction
The power system is one of the most complicated physical networks in the world. Almost all electricity demands are served through power systems. It is of utmost importance to study the fundamentals of power systems including power flow analysis. Power flow model is involved in a great number of problems and is incorporated in various application models. For instance, it is incorporated into the formulations of state estimation [1,2], security-constrained economic dispatch [3][4][5], security-constrained unit commitment [6,7], transmission maintenance scheduling [8], and transmission expansion planning [9][10][11][12]. Hence, power flow analysis is remarkably important for power system planning as well as power system operations. Currently, the two most popular power flow models are the full AC model and the linearized DC model.
The full AC power flow model, following Kirchhoff's circuit laws and Ohm's law, can accurately represent the physical power system. This model captures all electric variables of interest, including active power, reactive power, phase angle, and voltage magnitude. However, it is not uncommon to observe divergence of AC power flow problems even with commercial software such as PSS/E and DSATools. In addition, incorporating the AC power flow model into an optimization model makes the resulting problem very difficult to solve.
Though a number of algorithms have been proposed in the literature to improve the convergence performance of the AC power flow model [13,14], convergence remains an unresolved issue. The computational complexity due to non-linearity and non-convexity prevents the use of the AC power flow model in a variety of optimization problems. For instance, though the classical formulation for AC model based economic dispatch was first created by Carpentier in 1962 [15], no robust and reliable algorithm has been developed since then to solve the problem in a timely manner, owing to its non-linearity, non-convexity, and large scale. As a result, the industry today still uses a linearized DC power flow model that ignores reactive power and voltage magnitude.
To relieve computational burden, the simplified traditional DC power flow model is adopted when only active power and phase angle are of concern. As the DC model is simple, efficient, and reliable, it is widely used in the power industry and many power system applications [16][17][18][19][20]. For instance, instead of the AC model, the DC model is employed in the day-ahead energy markets and real-time energy markets.
The DC model is a good approximation of the AC model in terms of the active power solution for high voltage transmission networks, whose X/R ratios are typically very high. However, DC power flow may fail to perform properly in some scenarios. For instance, the work in [21] shows that DC model based cascading failure simulators fail to capture the power system behaviors in several circumstances. Though the average error for active power is limited to 5%, significant errors are still observed on several individual lines [22]. Three DC power flow models are investigated in [23], which concludes that the α-matching model has the most accurate results. However, the α-matching method is a hot-start method, and thus its use is restricted to near-real-time applications in which the initial system status is known. Furthermore, the DC model cannot be used in cases where voltage magnitude and reactive power are of interest.
Though the full AC power flow model is accurate, its computational complexity and unstable characteristics restrict its utilization. DC power flow model can reduce the computational burden significantly, but it may suffer from an inaccuracy issue and report no information regarding reactive power and voltage magnitude. Therefore, for situations when the reactive power and voltage information are needed while the solution time is limited, a fast linearized AC (LAC) power flow model that can capture reactive power and voltage magnitude is desired.
Three linear-programming AC power flow models, derived with polyhedral relaxation and Taylor series, are proposed in [24]. Numerical simulation demonstrates the effectiveness of the proposed models. Another linear approximation of the AC model is presented in [25]. The error of this approximation is less than 6% for voltage magnitude. It is worth noting that the approximation error of the proposed model in this paper is only about 1%. A linear relaxation of AC power flow model using polynomial optimization is proposed in [26]. The linearized model is applied to transmission planning problem, and the case studies demonstrate its capability of obtaining approximate solutions in a reasonable time. However, the solution is not checked and compared with the accurate full AC model. An iterative linear power flow method proposed in [27] seems to be accurate and fast. However, this iterative method may fail to converge. All the above work [24][25][26][27] is demonstrated on small-scale standard test cases only, and further efforts are needed to investigate the model accuracy on large-scale practical power systems.
To solve the scalability and accuracy issues related to AC power flow linearization, a data-driven linearized AC (DLAC) power flow model is proposed in this paper. This model captures all system state variables including active power, phase angle, reactive power and voltage magnitude. First, a regular linearization of full AC power flow model is conducted by ignoring higher order terms. Then, coefficients are assigned to the terms that remain in the model. The regression analysis technique is performed to determine those coefficients, which can reflect the system's typical or recent status. The philosophy behind this idea is that the system condition does not change significantly in a short timeframe such as a day. For instance, generator voltage setting points typically do not change or change within very narrow ranges, which indicates that voltage magnitude for other neighboring buses would also not change significantly. Another noticeable fact is that the voltage magnitudes for high voltage transmission networks are typically higher than one per unit.
The proposed enhanced DLAC power flow model can (i) substantially reduce the error associated with active power flows as compared with the traditional DC power flow model, and (ii) obtain a much more accurate voltage profile and significantly improve reactive power flow solutions as compared with the traditional linearized AC model. In addition, the proposed DLAC model can be easily incorporated in various power system applications such as day-ahead unit commitment or real-time economic dispatch, which can substantially improve their performance. The effectiveness of the proposed DLAC model is demonstrated with a practical U.S. power system.
The rest of this paper is organized as follows. Section 2 presents an overview of the power flow model. Section 3 introduces the regression analysis technique. Section 4 discusses the regular LAC model and the proposed enhanced DLAC model. Case studies are presented in Section 5. Finally, Section 6 concludes the paper.

AC Power Flow Model
Power flow studies are the basis of power system analysis [28]. The per-unit system and single-line diagram are usually used for simplification. In power flow studies, the following assumptions are typically made:

• The system is three-phase balanced and thus only the positive-sequence network is of concern.
• The Pi-equivalent circuit model can accurately represent the transmission network.
• Individual generation and load are known except for the generation at the slack bus.

Given these assumptions, the following state variables can be obtained by solving an AC power flow problem through computer programs:

• voltage magnitude and phase angle at each bus,
• active power and reactive power generation at each bus,
• active power flow and reactive power flow in both directions on each branch,
• loss on each branch.

Figure 1 shows the single-line diagram of a two-terminal circuit. A power system network consists of a number of such two-terminal circuits. Note that $P$ denotes active power while $Q$ stands for reactive power. Normally, the power flowing out of one end bus does not equal the power flowing into the other end bus because of (i) the reactive power produced by transmission lines and (ii) the losses on the branch connecting the two end buses, which means that $P_{ij} \neq -P_{ji}$ and $Q_{ij} \neq -Q_{ji}$. The power flow equations for a branch are given below.
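$$P_k = V_i^2 g_k - V_i V_j \left(g_k \cos\theta_k + b_k \sin\theta_k\right) \quad (1)$$
$$Q_k = -V_i^2 \left(b_k + b_{k0}\right) - V_i V_j \left(g_k \sin\theta_k - b_k \cos\theta_k\right) \quad (2)$$
where $P_k$ and $Q_k$ denote the active power and reactive power on branch k flowing from bus i to bus j, respectively; $\theta_k$ denotes the phase angle difference across this branch; and $V_i$ and $V_j$ are the voltage magnitudes of bus i and bus j, respectively. $y_k$ and $b_{k0}$ denote the series admittance and parallel susceptance of the Pi-equivalent circuit, respectively. $g_k$ and $b_k$ are the real part and imaginary part of $y_k$, respectively, and they can be calculated with the following equation,
$$y_k = g_k + j b_k = \frac{1}{z_k} = \frac{1}{r_k + j x_k} \quad (3)$$
where $z_k$, $r_k$, and $x_k$ denote the line impedance, resistance, and reactance of the Pi-equivalent circuit, respectively.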
Other important equations for power flow studies are the nodal power balance constraints shown in Equations (4) and (5).
$$P_{G,i} - P_{D,i} = \sum_{h \in \delta^{+}(i)} P_{ih} + \sum_{h \in \delta^{-}(i)} P_{ih} \quad (4)$$
$$Q_{G,i} - Q_{D,i} = \sum_{h \in \delta^{+}(i)} Q_{ih} + \sum_{h \in \delta^{-}(i)} Q_{ih} \quad (5)$$
where $\delta^{+}(i)$ denotes the set of buses that are directly connected to bus i when bus i is designated as the sending end, and $\delta^{-}(i)$ denotes the set of buses that are directly connected to bus i when bus i is designated as the receiving end. $P_{G,i}$ and $Q_{G,i}$ represent the total active power and total reactive power produced by the generators at bus i, respectively; $P_{D,i}$ and $Q_{D,i}$ are the total active power load and total reactive power load at bus i, respectively. $P_{ih}$ and $Q_{ih}$ denote the branch active power and reactive power flowing from bus i to bus h, respectively.
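As a minimal sketch of why the flows at the two ends of a branch are not simply negatives of each other, the following Python snippet evaluates the complex power leaving each end of a single Pi-equivalent branch. The function name `branch_flows`, the convention that `b_sh` is the shunt susceptance attached at each end, and the line parameters are illustrative assumptions rather than data from the paper.

```python
import cmath

def branch_flows(v_i, v_j, r, x, b_sh=0.0):
    """Complex power (per unit) leaving each end of a Pi-equivalent branch.

    v_i, v_j : complex bus voltages
    r, x     : series resistance and reactance
    b_sh     : shunt susceptance at each end (assumed convention)
    """
    y = 1.0 / complex(r, x)                   # series admittance g + jb
    i_ij = y * (v_i - v_j) + 1j * b_sh * v_i  # current leaving bus i
    i_ji = y * (v_j - v_i) + 1j * b_sh * v_j  # current leaving bus j
    return v_i * i_ij.conjugate(), v_j * i_ji.conjugate()

# Hypothetical operating point: 1.04 p.u. at 0.05 rad and 1.02 p.u. at 0 rad
v_i = cmath.rect(1.04, 0.05)
v_j = cmath.rect(1.02, 0.0)
s_ij, s_ji = branch_flows(v_i, v_j, r=0.01, x=0.10, b_sh=0.02)

# The sum of the two end flows is the branch loss: the real part is the
# series resistive loss, and the imaginary part nets the series reactive
# loss against the reactive power produced by the line charging.
loss = s_ij + s_ji
print(f"S_ij = {s_ij:.4f}, S_ji = {s_ji:.4f}, loss = {loss:.4f}")
```

The real part of `s_ij` agrees with the trigonometric active power expression for a branch, which offers a quick cross-check between the complex and polar forms.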

Regression Analysis
Regression analysis is a widely used statistical technique for estimating the relationships among variables and determining the model for those variables [29,30]. It typically involves a dependent variable, also referred to as a response variable, and one or multiple independent variables, often called regressors or predictors. The most popular method is the least squares approximation. A general multiple linear regression model with k regressors is defined as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \quad (6)$$
where $x_j$ is the j-th regressor, $\beta_j$ denotes the coefficient, and $\varepsilon$ denotes the error. Provided a sample set of n observations, all parameters can be determined as $\hat{\beta}_j$ through regression analysis. The estimated value and residual for the i-th observation in the sample space are defined in Equations (7) and (8), respectively.
$$\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \quad (7)$$
$$e_i = y_i - \hat{y}_i \quad (8)$$
where $y_i$ is the observed value of the i-th observation. The residual $e_i$ denotes the difference between the observed value and the estimated or fitted value, while the error $\varepsilon$ represents the discrepancy between the observed value and the true value.
After creating a regression model, it is important to (i) conduct model adequacy checking to ensure the model fits the data and (ii) perform model validation to demonstrate the effectiveness of the regression model.
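The least squares fit described above can be sketched in a few lines of Python. This is only an illustration on synthetic, noise-free data; the helper name `fit_ols` is hypothetical, and the case studies in this paper use R rather than NumPy.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X : (n, k) array of regressors, y : (n,) array of responses.
    Returns (beta, fitted, residuals) with beta[0] as the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A beta - y||^2
    fitted = A @ beta
    return beta, fitted, y - fitted

# Synthetic sample generated from y = 1 + 2*x1 - 3*x2 (no noise)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
beta, fitted, resid = fit_ols(X, y)
print("coefficients:", np.round(beta, 6))
print("residuals:", np.round(resid, 6))
```

Because the sample is noise-free, the residuals vanish; with measured data they would be the quantities examined in the adequacy checks.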

Model Adequacy Checking
One statistical metric to evaluate the overall model adequacy is the coefficient of determination, which is a percentage denoted as $R^2$. $R^2$ quantifies how good the regression model is and how much variation can be explained with the regression model. In other words, $R^2$ is the proportion of variation in the response variable that is explained and predicted by the regressors. An $R^2$ of 100% indicates that the regression model explains all the variability around the mean.
$R^2$ is defined in the equation below, where $SS_{Res}$ denotes the residual sum of squares and $SS_T$ denotes the total sum of squares.
$$R^2 = 1 - \frac{SS_{Res}}{SS_T} \quad (9)$$
Residual analysis can effectively discover several types of model inadequacies and measure how well the regression model fits the data [30]. Scaled residuals such as standardized residuals may be a better analysis technique to find outliers and analyze the regression model. The standardized residuals, which have zero mean and approximately unit variance, are defined in the equation below, where $MS_{Res}$ is the residual mean square that estimates the average variance of residuals.
$$d_i = \frac{e_i}{\sqrt{MS_{Res}}} \quad (10)$$
Another commonly used scaled residual is the R-student residual, defined in Equation (11). R-student residuals are often used since their variance is constant.
$$t_i = \frac{e_i}{\sqrt{S_{(i)}^2 \left(1 - h_{ii}\right)}} \quad (11)$$
where $h_{ii}$ is the i-th diagonal element of the hat matrix and $S_{(i)}^2$ denotes the estimate of variance with the i-th observation removed from the dataset. A plot of the residuals against the fitted values can help detect several types of model inadequacy. If the residuals in the plot are contained in a horizontal band, there is no indication of model deficiency. If the residuals plotted against the fitted values form a pattern such as a funnel, a double bow, or a nonlinear curve, defects may exist in the regression model and further investigation is required.
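The adequacy statistics above can be computed directly from a least squares fit. The sketch below assumes the textbook definitions (coefficient of determination, standardized residuals, and R-student residuals with the leave-one-out variance obtained in closed form); the function name `adequacy_stats` and the small noisy sample are illustrative only.

```python
import numpy as np

def adequacy_stats(X, y):
    """Return (R^2, standardized residuals, R-student residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    n, p = A.shape                             # p = k + 1 parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta                           # residuals

    ss_res = float(e @ e)                      # residual sum of squares
    ss_t = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    r2 = 1.0 - ss_res / ss_t

    ms_res = ss_res / (n - p)                  # residual mean square
    d = e / np.sqrt(ms_res)                    # standardized residuals

    # Diagonal of the hat matrix H = A (A^T A)^-1 A^T = A pinv(A)
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    # Variance estimate with observation i removed (closed form, no refit)
    s2_i = ((n - p) * ms_res - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_i * (1.0 - h))          # R-student residuals
    return r2, d, t

# Small illustrative sample, roughly y = 1 + 2x with noise
X = np.arange(8.0).reshape(-1, 1)
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.3])
r2, d, t = adequacy_stats(X, y)
print(f"R^2 = {r2:.4f}")
```

Plotting `d` or `t` against the fitted values would give the residual plot discussed above; an observation with a large absolute R-student value is a candidate outlier.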

Model Validation
In regression analysis, the model that best fits the sample dataset may not accurately describe the relationship between variables. One key concern is the danger of extrapolation. Though regression models are often used for extrapolation, this concern is limited for power flow analysis since the power system status does not change significantly, especially in a short time frame. The case studies section of this paper demonstrates that the proposed regression model for linearized power flow equations has very similar performance in different system scenarios.
Multicollinearity may occur when regressors are highly linearly dependent, and it has serious negative effects on the regression model. The regression coefficients may be poorly estimated when multicollinearity exists, which may result in an inaccurate regression model. One popular technique to detect multicollinearity is the variance inflation factor (VIF). The VIF measures how much the variance of an estimated regression coefficient is inflated. Each regressor in the regression model corresponds to one VIF value, which is useful for determining whether that regressor is involved in multicollinearity. A high VIF indicates that the associated regressor may have a poorly estimated coefficient. VIFs below 3 suggest that multicollinearity does not exist [31]. A VIF of one means there is no correlation between the associated regressor and the other regressors.
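To make the VIF concrete, the sketch below computes it from its usual definition: each regressor is regressed on the remaining regressors, and the VIF is 1/(1 − R²) of that auxiliary fit. The function name `vif` and both sample matrices are made up for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of the (n, k) matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # auxiliary regression
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        e = target - A @ beta
        r2 = 1.0 - (e @ e) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Uncorrelated regressors: every VIF equals one
X_uncorrelated = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
# Nearly collinear regressors: VIFs far above the threshold of 3
X_collinear = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]

print("uncorrelated:", vif(X_uncorrelated))
print("collinear:   ", vif(X_collinear))
```

A model with a single regressor has no auxiliary regression to run, which is why the VIF is not defined in that case.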

Data-Driven Linearized Power Flow Model
The regression analysis technique is often used for determining the causal effect relationship between the variables. With a power engineering insight, this work uses regression analysis technique to improve the simplified model derived from linearization of the complex full AC power flow model. This section first presents the traditional DC model and regular linearized AC model. Then, it describes the proposed data-driven DC (DDC) model and the proposed data-driven linearized AC model that can capture recent or historical system status with regression analysis.

DC Power Flow Model
The most commonly used linearized power flow model is the DC power flow model, represented by Equation (12).
$$P_k = \frac{\theta_k}{x_k} \quad (12)$$
This DC power flow model can be used for applications in which voltage magnitude and reactive power are not of concern. Due to its linearity and low computational burden, the DC power flow model is widely used in academic studies as well as in industry practice.

Linearized AC Power Flow Model
Though the DC power flow model has been widely used for decades, it does not apply to scenarios in which voltage and reactive power are of concern. Therefore, a linearized AC power flow model can be very useful. The nodal voltage magnitude can be written as follows [32],
$$V_i = 1 + \Delta V_i \quad (13)$$
where $\Delta V_i$ is supposed to be very small since the nodal voltage is very close to one per unit in normal situations. Substituting Equation (13) into Equations (1) and (2) and ignoring the second-order terms, including $\Delta V_i^2$ and $\Delta V_i \Delta V_j$, we obtain Equations (14) and (15). Then, substituting $\Delta V_i = V_i - 1$, derived from Equation (13), back into Equations (14) and (15), we obtain the linearized AC power flow equations, Equations (16) and (17).
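One way to carry out the substitution described above is sketched below in Python. It assumes, in addition to small voltage deviations, the small-angle approximations cos θ ≈ 1 and sin θ ≈ θ and the neglect of products of a voltage deviation and an angle; under those assumptions the branch flows reduce to the linear expressions coded in `lac_branch_flow`. The paper's Equations (14)-(17) may be arranged differently, and the function names, the per-end shunt convention `b_sh`, and the line parameters are illustrative.

```python
import math

def ac_branch_flow(v_i, v_j, theta, g, b, b_sh):
    """Full AC sending-end flow (P, Q) in per unit for one Pi-equivalent branch."""
    p = v_i**2 * g - v_i * v_j * (g * math.cos(theta) + b * math.sin(theta))
    q = -v_i**2 * (b + b_sh) - v_i * v_j * (g * math.sin(theta) - b * math.cos(theta))
    return p, q

def lac_branch_flow(v_i, v_j, theta, g, b, b_sh):
    """First-order expansion around V = 1 p.u. and theta = 0 (assumed form)."""
    p = g * (v_i - v_j) - b * theta
    q = -b * (v_i - v_j) - g * theta - b_sh * (2.0 * v_i - 1.0)
    return p, q

# Hypothetical line: r = 0.01, x = 0.10, shunt susceptance 0.02 per end (p.u.)
r, x, b_sh = 0.01, 0.10, 0.02
g = r / (r * r + x * x)
b = -x / (r * r + x * x)

p_ac, q_ac = ac_branch_flow(1.04, 1.02, 0.05, g, b, b_sh)
p_lac, q_lac = lac_branch_flow(1.04, 1.02, 0.05, g, b, b_sh)
print(f"P: AC = {p_ac:.4f}, LAC = {p_lac:.4f}")
print(f"Q: AC = {q_ac:.4f}, LAC = {q_lac:.4f}")
```

With zero resistance the active power expression collapses to θ/x, i.e., the DC power flow model, which is a useful sanity check on the linearization. The gap between the two printed results is the approximation error that fitted coefficients are meant to reduce.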

Data-driven Linearized DC and AC Power Flow Models
In the simplification that leads to the DC power flow model represented by Equation (12), the voltage magnitude is assumed to be one per unit throughout the entire system, which is one of the main reasons why the DC power flow model is not very accurate. A more accurate model with similar computational complexity would therefore be very useful. To this end, this paper presents a data-driven DC power flow model that uses regression analysis to capture the system-specific condition and reflect it with a different coefficient in the model. As shown in Equation (18), instead of fixing the proportional coefficient at one, we can adjust and optimize it based on the recent and historical status of the system.
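The idea of tuning the proportional coefficient can be illustrated with a one-regressor least squares fit. The sketch below assumes the data-driven DC model has the form P_k = c · θ_k / x_k with no intercept (an assumption made here for illustration) and uses made-up branch data rather than TVA data; `fit_ddc_coefficient` is a hypothetical helper name.

```python
def fit_ddc_coefficient(theta, x, p):
    """Least-squares coefficient c in P_k = c * theta_k / x_k (no intercept).

    theta : phase angle differences across the branches (rad)
    x     : branch reactances (p.u.)
    p     : branch active power flows from a full AC solution (p.u.)
    """
    u = [t / xk for t, xk in zip(theta, x)]  # single regressor theta_k / x_k
    return sum(ui * pi for ui, pi in zip(u, p)) / sum(ui * ui for ui in u)

# Synthetic training sample in which the true coefficient is 1.1
theta = [0.02, -0.01, 0.05, 0.03]
x = [0.10, 0.20, 0.25, 0.05]
p = [1.1 * t / xk for t, xk in zip(theta, x)]

c = fit_ddc_coefficient(theta, x, p)
print(f"fitted coefficient: {c:.4f}")
```

In practice the training flows would come from a full AC power flow solution of a recent system snapshot, so the fitted coefficient absorbs the typical voltage level of the system.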
Similarly, the LAC model can be improved by incorporating the recent and/or historical status of the system. The data-driven LAC model, with a coefficient associated with each term, is given in Equations (19) and (20). Equation (19) represents regression model P while Equation (20) denotes regression model Q. The constant term in Equation (20) is the intercept of regression model Q. As compared to the regular LAC model described in Section 4.2, the proposed DLAC model has better performance and smaller model error.
The proposed DDC and DLAC models have several advantages over other regression models. Unlike [33,34], the number of regressors in the proposed regression models, Equations (18)-(20), does not increase with the size of the system, which enables the proposed DDC and DLAC models to be computationally efficient. As compared with the regular LAC model, Equations (16)-(17), the proposed DLAC model, Equations (19)-(20), has very similar computational complexity since both have the same number of linear algebraic equations. The addition of coefficients does not create substantial computational complexity. In addition, the proposed regression models are derived from the physical model and the regressors are not correlated. For instance, for the proposed regression model Q, each data entry includes the voltage magnitude of only one end bus of a line rather than of both end buses, which avoids potential correlation across regressors. To summarize, the proposed DDC and DLAC models have neither scalability nor collinearity issues.

Metrics
The approximation error of the simplified power flow heuristic model is defined in Equation

Case Studies
The proposed data-driven DC model and data-driven linearized AC model are tested against the practical Tennessee Valley Authority (TVA) system and compared with the traditional full AC power flow model and simplified DC power flow model. TVA is a U.S. federally owned corporation that covers a population of about 10 million people [35].
This system has 1779 buses and 2301 branches. There are 72 consecutive hourly cases that cover three days of scenarios [36]; they are used as the test cases in this work and are referred to as hour 1 case, hour 2 case, ..., and hour 72 case in this paper. An AC power flow simulation is conducted on hour 1 case to obtain the system status that is used as the training data for regression analysis. The data used for determining the coefficient of the data-driven DC model are line reactance, line active power flow, and phase angle difference across the line. The data used for determining the coefficients of the data-driven linearized AC model include line reactance and susceptance, line active power flow and reactive power flow, phase angle difference across the line, and voltage magnitudes at the two end buses. The other cases are used to validate the proposed models. In this work, R is used as the statistical tool to perform regression analysis [37].

Data-Driven DC Model
With the power flow results obtained from the full AC power flow simulation on hour 1 case, regression analysis determines the coefficient in Equation (18) to be 1.12, which is above one. This is consistent with the fact that the average voltage magnitude for the same hour 1 case is 1.04 per unit, which is also above one.
The coefficient of determination for this regression model is 0.9964. This indicates that the regression model, $P_k = 1.12\,\theta_k / x_k$, can explain 99.64% of the variation of the response variable $P_k$. The VIF does not apply to a regression model with a single regressor. Thus, the DDC model will not have any multicollinearity or overfitting issues as it has only one regressor. Note that $P_k$ in Equation (18) is mathematically equivalent to $P_k$ in Equation (12); in other words, the proposed DDC model will achieve the same $P_k$ while it can improve the solution for $\theta_k$. The power flow solutions obtained from the DC model and the DDC model for the TVA system condition represented by hour 1 case are presented in Table 1. The branch active power flows calculated from both models are the same, while the improvement in $\theta_k$ is significant. The improvement for branches with flows of over 50 MW is more than 50% on average. As shown in Table 2, very similar results are observed for hour 2 case, which represents a scenario different from the one used to train the DDC model. This further demonstrates that the proposed DDC model can substantially improve $\theta_k$ as compared to the traditional DC model.
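The observation that a uniform coefficient leaves the branch active power unchanged while rescaling the phase angles can be checked on a toy network. The sketch below solves a hypothetical 3-bus system (bus 0 as slack, made-up reactances and injections) with the coefficient set to 1 (DC) and to 1.12 (the fitted DDC value reported above), assuming the DDC branch equation is P_k = c · θ_k / x_k.

```python
import numpy as np

# Hypothetical 3-bus network: (from bus, to bus, reactance in p.u.)
lines = [(0, 1, 0.10), (0, 2, 0.20), (1, 2, 0.25)]
# Net injections in p.u.; bus 0 is the slack bus and balances the system
injections = np.array([0.0, -0.5, -0.3])

def solve_linear_pf(c):
    """Solve P_k = c * (theta_i - theta_j) / x_k with bus 0 as the reference."""
    n = len(injections)
    B = np.zeros((n, n))
    for i, j, x in lines:
        B[i, i] += c / x
        B[j, j] += c / x
        B[i, j] -= c / x
        B[j, i] -= c / x
    theta = np.zeros(n)
    theta[1:] = np.linalg.solve(B[1:, 1:], injections[1:])  # drop slack row/col
    flows = np.array([c * (theta[i] - theta[j]) / x for i, j, x in lines])
    return theta, flows

theta_dc, flows_dc = solve_linear_pf(1.0)
theta_ddc, flows_ddc = solve_linear_pf(1.12)
print("DC  angles:", np.round(theta_dc, 5), "flows:", np.round(flows_dc, 5))
print("DDC angles:", np.round(theta_ddc, 5), "flows:", np.round(flows_ddc, 5))
```

The flows coincide because the coefficient scales every branch susceptance by the same factor, so it cancels between the nodal balance and the branch equation; only the angles shrink by 1/c.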

Data-Driven Linearized AC Model
In the DLAC model, two separate regression models are built for branch active power and reactive power, respectively. The resulting coefficients are presented in Table 3. All five coefficients deviate from the value of one that is used for the regular LAC model. The 95% confidence interval for each regression coefficient is very narrow, which indicates that the coefficients are stable and accurate for the given dataset or system conditions. As presented in Table 4, the coefficients of determination are 99.74% and 98.71% for the two regression models, respectively. This shows that both regression models can explain almost all of the variation of the branch flows around their means. Table 4 also shows that the VIFs for the regressors of both regression models are all very small, well below 3, which means that multicollinearity has little effect on the regression models and there is no overfitting issue. Table 5 shows that the residual mean of regression model Q is about five times higher than that of regression model P. This indicates that the proposed DLAC model is more accurate for branch active power than for branch reactive power. Table 6 shows the ANOVA results for regression model P, which are used for testing the significance of regression. As the last column indicates, the probability that the two regressors are insignificant to the response is negligible. In other words, both the terms $(V_i - V_j)$ and $\theta_k$ are necessary in regression model P. This implies that DC power flow models with only one term have obvious room for accuracy improvement. This is consistent with the comparison between Tables 1 and 7. The active power solutions obtained from the LAC and DLAC models are very similar, but they are 20% more accurate than the solutions obtained with the DC and DDC models. Figure 3 shows the scatter plot for hour 1 case, which indicates that the fitted branch active power flows of the proposed DLAC model are closely in line with the solutions obtained from the full AC model.
Tables 7-9 show the statistical results of branch active power, reactive power, and bus voltage, respectively, obtained with the DLAC and LAC models on hour 1 case. It is observed that the proposed DLAC model can (i) improve branch reactive power accuracy by 34.5% on branches with flows exceeding 10 MVA and (ii) improve the voltage solution by 35.0% against the regular LAC model. As shown in Tables 10-12, very similar observations are made when both models are applied to hour 2 case. Thus, it is concluded that the proposed DLAC model can significantly improve reactive power accuracy against the regular LAC model, and that the active power derived from the LAC/DLAC models is more accurate than that from the DC/DDC models.
It is important to analyze how much improvement the proposed DLAC model can achieve over the conventional LAC model in terms of branch complex power flow, which is used to monitor line loading against the line thermal capacity limit. To avoid the effects of very small branch flows, the following analysis focuses on branches that have flows of at least 10 MVA. With this filter, the statistics for hour 1 case and hour 2 case are presented in Table 13.
Two power flow simulations based on the regular LAC model and the proposed DLAC model are conducted on hour 1 case. The branch complex flow errors in percent, relative to the full AC model, are calculated for both LAC and DLAC and presented in Figure 4, where branches with less than 10 MVA flow and branches with flow errors of less than 5% for the LAC model are removed in order to clearly show the performance difference between the regular LAC model and the proposed DLAC model. There are 683 branches in Figure 4, and they are ordered by the flow errors of the proposed DLAC model. It is clearly observed from Figure 4 that the complex power flow errors of DLAC are well below those of LAC for most branches.
To verify the proposed DLAC model on a different system operating condition, similar simulations are performed on hour 2 case, which represents the system scenario of the following hour, and the results are shown in Figure 5. Very similar observations can be made from Figure 5, which validates the proposed DLAC model. It is concluded that the proposed DLAC model, through regression analysis, can substantially improve on the regular LAC model.
The average voltage improvement with the proposed DLAC model against the regular LAC model over all 72 cases has a mean of 48.7% and a standard deviation of 14.2%; the average branch reactive power improvement with DLAC against LAC is 39.8% with a standard deviation of 4.7%. This demonstrates that the proposed DLAC power flow model significantly improves on the regular LAC model. The average error of the voltage solutions obtained from DLAC is only 0.67% across the 72 system scenarios. The standard deviation of the voltage error is 0.33%, which indicates that the proposed DLAC model is stable and robust over different system conditions. For branch reactive power, the average error is 19.4% with a standard deviation of 6%. Although the reactive power solution is not as accurate as the voltage solution, the proposed DLAC model can provide reactive power information within an acceptable range, which shows its advantage over the traditional DC power flow model.
It is worth noting that determining regressors' coefficients with linear regression is fast and the regression model can be easily retrained with the latest system operating data. However, frequent model retraining may not be necessary unless the system voltage profile and operating points have significantly deviated from the previous training data. This feature can enable the proposed DLAC model to be integrated in several power system applications, such as economic dispatch, unit commitment, and grid expansion planning.

Conclusions
Due to the nonlinearity and nonconvexity of the AC power flow model, the simplified DC model is widely used in both academia and industry. However, the DC power flow model does not include any information about reactive power and the voltage profile. To maintain linearity while capturing reactive power and voltage, this work first introduced a regular LAC model and then proposed a data-driven enhanced LAC model. The proposed DLAC model improves on the regular LAC model by tuning the coefficients with regression analysis. Both the regular LAC model and the enhanced DLAC model can substantially reduce the error associated with calculated branch active power flows against the traditional DC power flow model. Moreover, as compared with the regular LAC model, the proposed DLAC model can obtain a more accurate voltage profile and improve reactive power flow solutions. The DLAC model is more accurate because it takes recent or historical system conditions into account. It is worth noting that the proposed DLAC model can be easily incorporated in various power system applications such as day-ahead unit commitment or real-time economic dispatch, which can substantially improve their performance.
Case studies on the large-scale practical TVA system demonstrate the performance and effectiveness of the proposed DLAC model. Numerical simulations show that the proposed DLAC model can substantially improve the reactive power and voltage solutions against the regular LAC model. The simulation results also show that there is no overfitting or multicollinearity issue for the proposed DLAC model, which indicates that the proposed DLAC model is robust.