A Distribution System State Estimator Based on an Extended Kalman Filter Enhanced with a Prior Evaluation of Power Injections at Unmonitored Buses

In the context of smart grids, Distribution Systems State Estimation (DSSE) is notoriously problematic because of the scarcity of available measurement points and the lack of real-time information on loads. The scarcity of measurement data influences on the effectiveness and applicability of dynamic estimators like the Kalman filters. However, if an Extended Kalman Filter (EKF) resulting from the linearization of the power flow equations is complemented by an ancillary prior least-squares estimation of the weekly active and reactive power injection variations at all buses, significant performance improvements can be achieved. Extensive simulation results obtained assuming to deploy an increasing number of next-generation smart meters and Phasor Measurement Units (PMUs) show that not only the proposed approach is generally more accurate and precise than the classic Weighted Least Squares (WLS) estimator (chosen as a benchmark algorithm), but it is also less sensitive to both the number and the metrological features of the PMUs. Thus, low-uncertainty state estimates can be obtained even though fewer and cheaper measurement devices are used.


Introduction
Smart grid evolution requires high-performance instruments and advanced measurement techniques for a variety of purposes, for example, to support optimal control strategies, to improve efficiency, to detect faults and, last but not least, to monitor power quality [1,2]. One of the key functions for smart grid stability monitoring and control as well as for contingency analysis and dispatching is state estimation (SE) considering either equivalent one-line diagrams or full three-phase systems [3]. The purpose of SE techniques is to estimate the quantities that describe the behavior of a network for electricity transmission or distribution (e.g., bus voltage or line current phasors) relying on a system model and on a limited amount of data, partly available a priori (e.g., network topology, impedance line parameters, nominal load conditions) and partly derived from measurements that are directly or indirectly related to the chosen state variables [4]. In general, one of the main obstacles for SE is the limited network observability, which, however, can be partially achieved at the expense of estimation accuracy and precision by using pseudo-measurements (e.g., a priori information about the minimum, nominal and maximum active and reactive power injections at non-observable buses) or virtual measurements in the case of zero-injection 1.
The algorithm conceived to estimate the inputs to the prediction step of the filter. Such a sequence of data is obtained by disaggregating the active and reactive power injection variations measured at the substation; 2.
A deeper analysis of the relative impact of an increasing number of PMUs and SMs on state estimation performance when such measurements are used in the update step of the EKF.
The analysis is performed by using the classic IEEE 33-bus radial system as a case study under realistic time-varying load conditions. The rest of this paper is structured as follows. First, in Section 2, the main contribution of this paper in the context of related work is highlighted. Section 3 describes the system and measurement models. Section 4 first deals with the optimization algorithm used to compute the weekly sequence of power injection variations used as inputs to the EKF and then presents the EKF implementation details. In Section 5, the features of a test distribution system (namely the IEEE 33-bus test feeder) and the load profiles used to synthesize a realistic one-year-long time series of state variables are described. In Section 6, after an overview of the settings needed to run and to compare the proposed DSSE algorithm with the WLS approach, the results of multiple simulations using different kinds of measurements are reported. Section 7 presents an overview of the applicability, advantages and limitations of the proposed approach. Finally, Section 8 concludes the paper and outlines future work.

Related Work
The SE techniques for both transmission and distribution systems can be roughly classified as static or dynamic [27]. The most common kind of static estimators in power systems relies on the so-called WLS approach. Such estimators can be implemented in a variety of ways depending on the chosen state variables and on the wanted trade-offs between computational burden and numerical robustness [4]. WLS state estimation relies on the iterative linearization of a measurement model and it converges to the actual state by applying the Gauss-Newton method. A crucial issue in WLS estimation is the way in which weights are assigned to both real and pseudo-measurements [28]. Indeed, the weights have to be inversely proportional to the variances of measurement uncertainties. However, even a well-done weight assignment may cause ill-conditioning problems when such uncertainties differ by several orders of magnitude. Moreover, the algorithm sensitivity to different types of measurement uncertainties may change considerably [29]. Last but not least, if the algorithm is not initialized properly, the Gauss-Newton iterative approach can wrongly lead to local minima instead of the global one. Such convergence problems can be effectively addressed by using alternative algorithms for the WLS problem solution, such as Particle Swarm Optimization (PSO) [30], or Quasi-Imitative Evolution System Procedures [31].
The dynamic state estimation techniques rely on a state transition function modeling the time evolution of state variables [32]. As a result, recursive Bayesian estimators can be built upon this model. It is important to highlight that in the context of power systems, the dynamic and forecasting-aided concepts are often regarded as synonyms because, due to the lack of univocally defined system models, the coefficients of the state transition function typically result from some forecasting algorithm applied directly to the chosen state variables (e.g., bus voltage or line current amplitudes and phases). The forecasting model can be so simple as the two-parameters Holt-Winters method (as originally proposed in [33]) or a multivariate autoregressive process driven by artificial neural networks [34]. In any case, a considerable amount of historical information on all state variables is needed to build the state transition matrix. Unfortunately, this can hardly be done at the distribution level due to the observability limitations explained in Section 1.
On the other hand, if the dynamic system underlying state estimation results from the linearization and discretization of power injections equations [35], the forecasting algorithms can be applied to the power injection profiles rather than to the state variables to be estimated. In particular, two alternative approaches can be adopted depending on whether active and reactive power injections are regarded either as auxiliary state variables or inputs to the system [36]. In the former case, recently adopted in [37] as well (where it is further enhanced for robust estimation in the presence of outliers), the state transition matrix still requires forecasting (in particular load forecasting) at every bus, which can be still hardly feasible at the distribution level. Instead, in the latter case, which is also adopted in this paper, the state transition matrix is just the identity matrix, as explained in detail in Section 4. Therefore, system observability depends only on the system output function, namely on the type and the position of the measurements available at a given time like in the WLS case. However, if a Bayesian approach is adopted, the fusion between system input data (e.g., resulting from a forecasting algorithm, from a previous estimation algorithm or known a priori) and observed measurement data can be exploited to achieve better estimation results. Unlike the solution originally proposed in [38], which also includes the second-order terms in the state vector (thus reducing the linearization error at the expense of the computational burden), the proposed EKF is a generalization of the filter introduced in [36]. However, it is characterized much more in depth, with a special focus on current and future distribution level features. In fact, the forecasting model in [36] is unclear, the measurement settings and the related uncertainty contributions are quite simplistic (e.g., no PMUs or instrument transducers are considered) and the results are just a few. A more exhaustive performance comparison between WLS and EKF estimators including the effect of PMUs is presented in [25] and [26]. However, in those works, as briefly outlined in the Introduction, the load fluctuations regarded as system inputs are naively assumed to be white and uncorrelated between buses, which is not realistic. Moreover, in this paper the joint effect of PMUs and SMs (assuming that such measurement data progressively replace traditional pseudo-measurements as expected in smart grids [14]) is analyzed.

Models Description
As known, the state of a distribution system can be expressed in terms of bus voltage phasors or line current phasors. Given that such state variables are complex-valued, they can be expressed either in polar or in rectangular coordinates. Therefore, a given estimation algorithm can be formulated in four alternative ways, which differ in terms of computational burden, model approximation errors and robustness to numerical noise [39]. In this paper, without loss of generality, the state of a distribution system with N buses is represented by the bus voltage phasors in polar form, i.e., x = [θ 1 , . . . , θ N , V 1 , . . . , V N ] T , where θ i and V i are the angle and the Root Mean Square (RMS) voltage, respectively, at the i-th bus. Observe that, if θ 1 is not measured by a PMU, it can be excluded from the state vector because the phase at the slack bus can be conventionally set to 0. Alternative state variables, although possible, do not change the general formulation of the estimation problem substantially.

System Model
Let u = [P 2 , . . . , P N , Q 2 , . . . , Q N ] T be the vector of the active and reactive power injections at all buses other than the slack bus. In general, the power injection values result simply from the difference between the power of generator and load at a given bus. The system model adopted in this paper can be expressed in implicit form as [4] f(x, u) = whereθ can be conventionally set to 0 (and θ 1 can be removed from x) if no PMU is used to monitor the slack bus,V is the nominal operating RMS voltage (namely a guess of the slack bus RMS voltage value even if the actual value shall be estimated), while the i-th entries of f P (x, u) = 0 and f Q (x, u) = 0 describe the relationships between state vector x and the active and reactive power injection, respectively, at bus i = 2, . . . , N, i.e., In (2), Re(Y ij ) and Im(Y ij ) are the real and imaginary parts of the element (i, j) of the grid admittance matrix Y, while θ ij = θ i − θ j . Observe that P 1 and Q 1 are purposely not included in (1), because these values are likely to be linearly dependent on the power injections at the other buses. As a consequence, the Jacobian matrix of (2) with respect to x could be singular (i.e., not invertible) if also the equations related to the slack bus (i.e., for i = 1) are considered, thus preventing EKF implementation, as explained below. By replacing instead the power injection equations at the slack bus with θ 1 −θ = 0 and V 1 −V = 0, the system still comprises 2N nonlinear equations in 2N unknowns (or 2N − 1 if θ 1 = 0 is not monitored by a PMU), but the Jacobian matrix of (1) with respect to x is not only square, but also likely to be invertible.
Observe that even though in active distribution systems both input and state variables change with time, in (1) the time variable is omitted for the sake of brevity. However, if t k−1 and t k denote two consecutive discrete-time instants, from the Taylor's series of (1) truncated to the first order at time t k−1 it follows that [25,26] ∂f(x, u) ∂x where ∆x k = x k − x k−1 , ∆u k = u k − u k−1 , x k and u k are the values of the state and the input at time t k , respectively, and e k is the vector including: 1.
The intrinsic voltage phase and RMS amplitude fluctuations at the slack bus in normal operating conditions (first two entries); 2.
The errors resulting from the linearization of the power injection equations (remaining 2N − 2 entries).
If the two Jacobian matrices in (3) are represented with symbols J x k−1 and J u k−1 , respectively, and assuming that J x k−1 has full rank, the linearized discrete-time system can be more compactly rearranged as where G k−1 = −J −1 is the input matrix at time t k−1 (with I 2N and I 2N−2 being the identity matrices of size 2N and 2N − 2, respectively), while w k = −J −1 x k−1 e k can be regarded as the process noise vector. Since the linearization errors are weakly correlated and the corresponding average values over time are negligible, the elements of w k can be reasonably assumed to be normally distributed because of the central limit theorem under the so-called Lindeberg's condition. Indeed, each element of w k results from the linear combination of zero-mean and weakly correlated random terms.
In Section 4.2, the relation (4) is used to define the prediction step of the EKF. The elements of ∆u k are instead estimated through the optimization algorithm described in Section 4.1.

Measurement Model
Assuming that the measurement data available at time t k are gathered into vector z k , the relationship between such measured quantities and the state of the system can be expressed as [4] where h(·) is a vectorial and generally nonlinear function of the state variables and ν k is the vector including the corresponding zero-mean measurement uncertainty contributions. In principle, any combination of current, voltage or power measurements can be used in (5), provided that system observability is preserved. Since the state transition matrix of (4) is just the identity matrix, the linearized system is observable (and therefore the system observability matrix has a full rank) if and only if the rank of the Jacobian of (5) is 2N − 1 (if θ 1 is conventionally set equal to 0) or 2N (if θ 1 is measured by a PMU installed at the slack bus) at any time, i.e., ∀k [29]. This condition can be achieved by using any combination of true, pseudo-or virtual measurements (e.g., in the case of zero-injection buses). In principle, the solution of the DSSE problem is straightforward if an adequate number of PMUs is placed properly [9]. However, as briefly explained in Section 1, a large-scale PMU deployment at the distribution level could be infeasible or too expensive. Hence, without loss of generality in terms of the EKF applicability, a hybrid approach will be adopted and analyzed in Section 6. In particular, the slack bus voltage (which is continuously monitored) and the active and reactive power injection measurements at each bus in the following are used to ensure basic state observability. In addition, an increasing, yet limited, number of PMUs is used to refine state estimation accuracy. Since PMUs typically have separate input channels for voltage and current measurements, each deployed PMU is supposed to collect both voltage and current phasor measurements at a given time. Based on the assumptions above, the output function in (5) can be made explicit as [40] h( P(x k ) and Q(x k ) are the N−long vectors including the active and reactive power injections as functions of the state. In practice, the i-th elements of such functions are given by (2).

State Estimation Algorithm
In this section, the details of the proposed DSSE algorithm are described. In particular, first, the technique used to estimate the power injection variations at the input of the EKF is explained. Then, the structure of the EKF is presented. A block diagram of the whole estimator is shown in Figure 1.

Prior Estimation of Power Injection Variations
In (4), the inputs to the linearized system model are the differences of the active and reactive power injections between sampling times t k−1 and t k at all buses that are different from the slack bus. Usually, at the distribution level, just a few power injections can be observed directly due to the lack of a measurement infrastructure capable of collecting data from all buses at the same time. Often, the only power injections that can be actually measured are those at the slack bus or at buses located at the beginning of crucial feeder lines. As a consequence, the linear system of equations relating the unknown power injections of the unmonitored buses (as well as their variations in one sampling period) to the aggregate quantities measured at the few available buses is underdetermined. This means that the power injections at the unmonitored buses cannot be estimated univocally. Nevertheless, a single solution can be found if the equations of the underdetermined system are regarded as the constraints of an optimization problem minimizing a suitable objective function. In the case of compressive sensing problems, for instance, the system solution of interest is the one with the minimum number of non-zero elements. Instead, in the case at hand, the power injection variations under dynamic load conditions are generally not null, unless zero-injection buses are considered.  Thus, the function to minimize in this case is the mean square value of the overall uncertainty contributions associated with the active and reactive power measurements collected at the slack bus over subsequent time intervals of a given duration (e.g., one week), as explained below in greater detail.
If ∆u s k is the 2 × 1 vector including the active and reactive power injections differences at the slack bus between t k−1 and t k , and if ρ P and ρ Q are the relative active and reactive power losses over the whole grid in nominal conditions (e.g., computed a priori from power flow analysis), the aggregate active and reactive power injection variations at all the other buses can be estimated as where the negative sign on the right side of the equation is due to the fact that the power injected into the slack bus must meet the power demand at all buses regardless of the distribution losses. It is important to highlight that in practice the aggregate active and reactive power variations estimated with (7) are affected by the uncertainty contributions comprising the active and reactive power measurement errors, the unknown power losses variations between t k−1 and t k , and possible not modeled phenomena. As a result, the actual aggregate active and reactive power injection variations differ from the values returned by (7) by an uncertainty term δ k . Let T be the column vector of the corresponding uncertainty contributions; T be the 2(N − 1)K-long column vector obtained by stacking elements ∆u n (i−1) and ∆u n (i+N−2) for i = 2, . . . , N and n = k − K + 1, . . . , k (namely the active and reactive power injection variations at all the buses different from the slack bus in the observation interval Assuming, without loss of generality, that no zero-injection buses are present in the distribution system (possible zero-injection buses can indeed be excluded from the vector of unknown variables a priori), the following quadratic programming (QP) problem can be formalized, i.e., where: • Linear constraint (9) with is the underdetermined linear system that has to be solved to find the inputs to the EKF. In each equation, the sum at time t n ∈ [t k−K+1 , t k ] of the active or reactive power injection variations, respectively, at all buses different from the slack bus is set equal to sum of the aggregate value returned by (7) and the unknown uncertainty contribution δ n . The uncertainty contributions should be estimated as well. However, since they can be reasonably assumed to be small and their impact on (9) should be kept as small as possible, the minimization of their mean square value is indeed the objective function of the optimization problem (8). • Constraint (10) relies on the assumption that the mean value of variables δ k = [δ 1 k , δ 2 k ] T is zero and provides a lower bound to the variances of the uncertainty contributions affecting the aggregate active and reactive power variations, respectively. Therefore, it provides a lower bound to the objective function as well. Indeed, the variance values estimated over the observation interval [t k−K+1 , t k ] cannot be smaller than the square of the combined standard uncertainties (denoted with σ 2 δ P and σ 2 δ Q ) of the power injection variations measured at the slack bus.
• Finally, the equality constraint (11) forces the sample variances of the unknown active and reactive bus power injections variations to be equal to given values denoted as σ 2 ∆P i and σ 2 ∆Q i , respectively, for i = 2, . . . , N. Such values can be found a priori, for example, from past pseudo-measurements.
Again, (11) relies on the realistic assumption that the mean values of the power injection variations over time are null.
It is worth emphasizing that even if the number of variables is potentially large, the QP problem is standard and the constraint equations are quite simple. Therefore, the solution can be computed in real-time with standard tools, provided that K is not too large. This is very important because the solution of the optimization problem based on (8)-(11) must be updated at every time step (possibly restarting from the solution found at the previous time step) to provide timely inputs to the EKF. In practice, a weekly observation interval proved to offer a reasonable trade-off between computational burden and performance, as it will be shown in Section 6. Moreover, if the power injections at multiple buses are measured (e.g., not only at the slack bus, but also at the initial bus of a feeder line), the original QP problem can be partitioned hierarchically into smaller subproblems of the same kind, thus reducing computational complexity and improving estimation accuracy.

EKF Implementation
Let ∆u k denote the estimate of ∆u k extracted from the solution vector of the QP problem (8)-(11) at the present time instant t k . Since this vector is used as input to the EKF, the state of the linearized system (4) in the prediction step results fromx where the symbol· denotes the estimated quantities. The covariance matrix of the predicted state is instead computed as follows: where D k is the covariance matrix of ∆u k and W k is the covariance matrix of the process noise w k . It is worth noticing that in the initial observation interval (i.e., for k = 0, . . . K − 1), the QP problem based on (8)-(11) cannot be solved, since not enough data are collected. In this transient phase, the input vector ∆u k can be set to 0 and D k can be initialized as a diagonal matrix in which the non-zero elements are set twice as large as the expected variance of power injection fluctuations. In such conditions, the EKF still works (as already shown, for instance, in [26]), although results are certainly suboptimal. The Kalman gain of the filter is given by where H k = ∂h(x) ∂x x=x k|k−1 and R k is the covariance matrix of the measurement uncertainty contributions in ν k . Hence, in the update step, the estimated state of the system results from while its covariance matrix is given by Observe that (17) is the so-called Joseph form of the covariance matrix of the estimated state [41]. This formulation is indeed robust to possible numerical issues (especially in the case of large systems), as it enforces C k|k to be positive definite. In fact, if the covariance matrix were computed with the simpler expression C k|k = (I 2N−1 − K k H k )C k|k−1 , slight matrix asymmetries due to finite numerical precision might affect the result, thus possibly leading to sudden EKF divergence.

Case Study Description
To characterize the proposed state estimator, a modified version of the well-known IEEE 33-bus radial distribution system is used as a case study [42]. This choice stems from the fact that this grid example exhibits a typical radial structure (shown in Figure 2), the number of buses is reasonable (i.e., small enough to run one-year-long simulations within a reasonable time) and it does not include zero-injection buses. As a result, the optimization algorithm solving the QP problem (8)- (11) is tested under worst-case conditions since there are no inherently null power injection variations that can be excluded from the vector of unknown variables ∆ũ k . For the sake of completeness, the nominal active and reactive loads at all buses, as well the resistance and reactance of all lines, are reported in Table 1. Without loss of generality, the grid will be assumed to be fully passive (i.e., no distributed generators are present in the system). Therefore, the active and reactive power injections at all buses different from the slack bus will be equal to the respective active and reactive load power but with opposite sign, i.e., In order to obtain a realistic time series of the true state variables (i.e., bus voltage RMS amplitudes and angles) to be compared with the results of the DSSE algorithms (see Section 6), the nominal load values of the IEEE 33-bus system shown in Table 1 are replaced with one-year-long synthetic load profiles resulting from the aggregation of the residential and office electricity consumption generated with the software tool developed by Brodén et al. [43]. The time resolution of such profiles is 15 min. In this way, by running a power flow analysis through MATPOWER at each time step [44], the sequence of the actual values of state vector x k under time-varying quasi-static load conditions can be computed.   The individual synthetic profiles rely on a variety of models (e.g., based on Markov chains) that keep into account building size, heating and cooling needs, home and office appliance usage and level of occupancy at different times of the day [45,46]. As a result, they are much more realistic than those used, for instance, in [26], as their temporal autocorrelation depends on daily, weekly and even seasonal factors. As a consequence, even though the generated load profiles are random, they include common pseudo-periodic patterns that tend to remain constant if simulations over multiple years are performed. Quite importantly, the Brodén's application can synthesize an arbitrary large number of profiles of a different kind, but it does not allow setting wanted average and maximum load values. Therefore, to keep the results of the power flow analysis within acceptable limits, the load profiles of distinct buildings were selected, aggregated and assigned to different buses by solving the binary linear programming optimization problem described in Appendix A. In particular, given a set of approximately 2000 distinct building synthetic profiles, the optimization algorithm returns a minimum subset of them (usually a few tens for each bus), in such a way that the aggregate average power over one year is approximately equal to P i L nom and the corresponding overall sample variance is about γ 2 P i 2 L nom . The values of parameter γ are chosen by trial-and-error to ensure that the maximum load variations at all buses lie approximately within ±20%, ±40% or ±60% of the corresponding nominal values. It is worth emphasizing that even if distinct profiles are assigned to different buses, the correlation coefficients between pairs of aggregated loads can be significant (i.e., between 0.3 and 0.9). This is due to some features (e.g., daily and seasonal patterns) common to different synthetic profiles, even if they were generated independently by the Brodén's application. An example of yearly active load power profile (i.e., at bus no. 2 with variations within ±60% of the nominal value) is shown in Figure 3 (top picture) along with the corresponding histogram (bottom picture). Even if at a glance the overall profile looks quite irregular, a repetitive daily pattern is clearly visible in the inset. The histogram exhibits a strongly multimodal behavior that can be well described by a Gaussian Mixture Model (GMM) in accordance with other load models [47]. In the case at hand, the GMM probability density function resulting from histogram data fitting (dashed line in Figure 3

State Estimation Results
This section includes the results of the comparison between the proposed enhanced EKF and the WLS technique. In the following, first the DSSE-specific simulation settings are described and justified. They are partitioned into different groups, i.e., those needed to run the QP optimization problem for EKF input evaluation and those needed to simulate power injection measurements and PMU-based measurements. Whenever possible, EKF and WLS DSSE settings are the same to ensure a fair comparison between both estimators. Afterward, a subsection specifically focused on estimator performances is reported.

QP Settings for EKF Input Evaluation
The solution of the QP problem based on (8)-(11) relies on a variety of parameters that have to be set properly. First of all, as introduced at the end of Section 4.1, one-week-long observation intervals shifting by 15 min at a time for a full year are considered for simulations. Hence, K = 672. This time interval provides a good trade-off between the total number of variables of the QP problem and the accuracy in estimating the variance of power injection variations based on the record of collected data. The average relative active and reactive power loss coefficients obtained from a preliminary power flow analysis of the IEEE 33-bus grid in nominal conditions are ρ P = 5.2% and ρ Q = 5.5%. The values of σ δ P and σ δ Q considering both the measurement settings at the slack bus (described more in detail in Section 6.3) and the power losses fluctuations in time-varying load conditions (as they result from preliminary Monte Carlo simulations) range between about 7% and 9% of the standard deviations of the actual active and reactive power injection variations at the slack bus. Finally, the values of σ ∆P i and σ ∆Q i for i = 2, . . . , N depend on the actual load variability. Since the maximum weekly load changes in different tests depend on the season and lie within ±20%, ±40% and ±60% of the nominal load values, the corresponding standard deviations also change with time and do not exceed 7%, 13% and 20% of the nominal load values, respectively. Thus, given that the correlation coefficients between pairs of consecutive load values generated by the Brodén's application range between 0.3 and 0.4, the values of σ ∆P i and σ ∆Q i are in the order of about 10%, 21% and 32% of the bus nominal load values, respectively, for every load variability conditions considered. Also, the covariance matrix D k of the EKF input data sequence to be used in (13) results simply from D k = 1 K ∑ k n=k−K+1 ∆u n · ∆u T n .

Process Noise (EKF Only)
The elements of the covariance matrix W k needed to compute (14) result from W k = J −1 x k−1 is defined in Section 3.1 and E k is the presumably stationary matrix of linearization errors vector e k . In practice, the eigenvalues of E k (expressed in p.u.) are generally at most in the order of 1 ×10 −9 , i.e., a few orders of magnitude lower than those of D k . The only exception is the variance associated with V 1 , as it includes the physiological voltage fluctuations at the slack bus. In fact, this is set to 2.8 ×10 −4 , which corresponds to normally-distributed voltage fluctuations within ±5% of the nominal value. Consider that, without loss of generality, θ 1 is assumed to be 0 (as customary in state estimation problems) and therefore it is excluded from the state variables vector as well as from the computation of the process noise covariance matrix.

Power Injection Measurements
The power injections at the slack bus are assumed to be measured by a power quality analyzer with overall standard uncertainties equal to 0.4% of P 1 k and 1.4% of Q 1 k , respectively, as reported in [48]. All the other active and reactive power injections are estimated in four alternative scenarios characterized by an increasing "smartness" of the grid monitoring infrastructure.

1.
The first scenario refers to the classic case in which just historical a priori information on power injections (i.e., pseudo-measurements only) is available. In this case, the unknown delays and the time misalignment between the collected data strongly affect measurement uncertainty, which consequently is as large as load inherent variability (scenario 1).

2.
The second scenario relies on the assumption that power injection measurements rely not only on pseudo-measurements, but also on the data collected by the SMs every 15 minutes. Such data are assumed to be aligned in time with a delay of a few seconds before being aggregated (scenario 2). In this case, we assume that the weight of SMs measurements on the total relative variance of power injection measurement data is about 33% at all buses (scenario 2).

3.
The third scenario is similar to the second one, but a deeper penetration of SMs is envisioned. Therefore, the weight of SMs measurements on the total relative variance of power injection data reaches about 67% (scenario 3).

4.
Finally, the fourth scenario relies on the assumption that all power measurements are collected from customers' SMs and are properly aligned in time before being aggregated. Of course, in this case, the traditional pseudo-measurements are no longer used (scenario 4).
The relative standard measurement uncertainties of pseudo-measurements are set equal to 1/3 of the expected maximum load variability, as customary in state estimation problems [49]. The SM measurement uncertainty instead relies on the quite deep analysis reported in [50], which is indeed specifically focused on the uncertainty of aggregated SM data for state estimation. According to various international Standards (most notably the Standard ANSI CI12.20), the SM accuracy must lie within ±0.2%, ±0.5%, ±1.0%, and ±2% of the reported value, depending on the instrument class considered. However, the research in [50] reveals that other contributions may increase the uncertainty of aggregated SM data. Such contributions include: power losses along the lines (in the order of about 1.5%), possible missing or bad data replaced by sporadic pseudo-measurements (up to 4% of total traffic), phase identification errors (if any) and meter clock deviations (up to a few seconds). Last but not least, the accuracy of instrument transformers (especially the current transformers) is generally comparable with the accuracy of the SM itself (i.e., ranging from the fraction of percent to a few percent [51]). Given that all the aforementioned uncertainty sources are not correlated, assuming to use Class 1 SMs and randomizing the other terms uniformly within the range reported in [50,51], the combined relative standard uncertainties associated with the SM data aggregated at different buses span from 0.7% to 1.6% [52].
Of course, the relative variances used to compute the diagonal elements of matrix R k used in (15) depend on the actual fractions of SM-based and pseudo-measurements employed, namely on the specific scenario considered, as well as on the actual power injection values at a given time.

PMU-Based Measurements
The RMS amplitude and angle of some bus voltage and line current phasors are measured assuming to deploy an increasing amount of PMUs. To this purpose, a variant of the greedy PMU placement algorithm proposed in [40] is used (further details are reported in Appendix B). A key advantage of this approach is that, although suboptimal, it requires just a polynomial computation time. Moreover, system observability is not a problem as it is provided by power measurements. In particular, assuming that each PMU has just with two sets of channels (i.e., one for voltages and another for currents), the PMUs are placed one at a time to measure one bus voltage phasor and one line current phasor, while minimizing the worst-case state estimation uncertainty. As a consequence, the number of rows of matrices H k and R k in (15) as well as the number of elements of z k and h(x k|k−1 ) in (16) depend on the amount of deployed PMUs.
Based on the IEC/IEEE Standard 60255-118-1:2018 [53], the PMU Total Vector Error (TVE) (an accuracy parameter that keeps into account the joint effect of relative amplitude and angle phasor uncertainty contributions) in steady state conditions cannot exceed 1%. However, PMUs for active distribution systems are expected to be much more accurate [54]. Therefore, PMUs with a maximum TVE of 0.1% or 1% are considered in the present study. Moreover, the uncertainty contributions affecting the TVE value are supposed to be evenly split between relative RMS amplitude and phase errors. This means that the standard measurement uncertainties used in simulations are set to 0.033% or 0.33% of the RMS amplitude and to 0.033 crad or 0.33 crad (for angle measurements, respectively, depending on whether the maximum TVE is 0.1% or 1%).

Performance Analysis
In this section, not only the performance of the proposed enhanced EKF is analyzed, but a side-to-side comparison with a classic WLS estimator chosen as a benchmark is also shown. In both algorithms, the states are initialized tox 0|0 = [0, . . . , 0, 1, . . . , 1] T . In the former one, the state covariance matrix is set to C 0|0 = diag{0.01, . . . , 0.01, 0.01V 2 nom , . . . , 0.01V 2 nom }, as these values are much larger than the expected eigenvalues of the steady-state covariance matrix. The WLS algorithm instead does not need covariance matrix initialization. Also, it is implemented through orthogonal QR factorization to improve numerical robustness [4].
A qualitative overview of the behavior of the enhanced EKF is visible in Figure 4, which shows some voltage magnitude profiles (top picture) and the corresponding voltage angle profiles (bottom picture) estimated by the EKF in the first week of January, namely at the beginning of the simulation. In this example, worst-case conditions are considered, i.e., load variability within ±60% of the nominal values with no PMU or SM data used in the update step of the filter. Therefore, the EKF relies only on power and voltage measurements at the slack bus and traditional power injection pseudo-measurements at all the other buses. The diamond markers represent the actual values of the state variables as they result from the preliminary power flow analysis. Even though the results shown in Figure 4 are limited, they are interesting because they show that:

•
The initial EKF transient is quite short (just a few samples). This is most probably due to the good initialization of the EKF; • The estimated state tracks the pseudo-periodic voltage pattern resulting from the time-varying load conditions shown in Figure 3 very well; • State estimation accuracy is clearly high, since the differences between estimated and actual values at a given time are very small.
The estimation performance of the enhanced EKF and WLS estimator is analyzed and compared more in depth by computing the 99-th percentiles of the RMS voltage relative estimation errors and the phase estimation errors over all buses. Each estimation error results from the difference between the state variables returned by either estimation algorithm and the corresponding values obtained with the power flow analysis, when the same set of load profiles is used. The Monte Carlo simulations are performed in two weeks of all the months of one year, for a total of 16,128 samples. Also, the simulations above were repeated in different conditions of maximum load variability (i.e., ±20%, ±40%, ±60%), in the four SM penetration scenarios described in Section 6.3, by increasing the number of deployed PMUs and, last but not least, considering two different PMU accuracy levels, i.e., for maximum TVE values equal to 1% or TVE = 0.1%, respectively, as explained in Section 6.4.    Figure 5a,b shows the 99-th percentiles of the relative RMS voltage estimation errors and the phase estimation errors as a function of the number of deployed PMUs, when just traditional pseudo-measurements (scenario 1) are used to ensure system observability. The maximum TVE of all PMUs is set to 1%. The solid and dashed lines are related to the proposed enhanced EKF and the benchmark WLS estimator, respectively. The markers refer to different maximum load variations, i.e., ±20% (dot markers), ±40% (square markers) and ±60% (star markers). Quite interestingly, all curves tend to converge asymptotically to different lower bounds. Such results are consistent with those reported in [40]. In fact, if grid observability is guaranteed by conventional measurements, instrumenting about 1/3 of the grid buses with PMUs is enough to abate state estimation uncertainty to minimal levels in steady-state or quasi-steady-state conditions. The 99-th percentile values highlight that the estimation uncertainty achievable with the enhanced EKF is significantly smaller than the WLS one, especially as far as RMS amplitude is concerned. In particular, the average reduction (computed over the number of deployed PMUs) of the 99-th percentiles of the relative RMS voltage estimation errors with respect to those obtained with the WLS algorithm ranges from about 86% in the case of load variations within ±20%, down to about 64% in the case of load variations within ±60%. The relative improvement in estimating the angle state variables is clearly visible but it is quite smaller, although the curves exhibit a similar trend. The bar diagrams in Figure 6a,b provide complementary information, as they depict the 99-th percentiles of the relative RMS voltage estimation errors and the phase estimation errors, respectively, for an increasing number of PMUs of different accuracy (i.e., with TVE = 0.1% or 1%). Again, the 99-th percentiles are computed from the errors of the enhanced EKF estimator and the benchmark WLS algorithm over all buses and over a whole year of simulations when just active/reactive power injection pseudo-measurements are used (scenario 1). Note that the WLS estimation uncertainties of both amplitude and phase state variables are generally larger than those obtained with the enhanced EKF in the same conditions, i.e., using the same number of PMUs with the same accuracy. Moreover, the 99-th percentiles of the RMS amplitude errors in the WLS case are more sensitive to PMU accuracy. Indeed, the performance gap between the WLS and the enhanced EKF estimator tends to become quite small when the TVE of the deployed PMUs decreases from 1% to 0.1%. Ultimately, the enhanced EKF estimator can perform as well as the WLS technique (or even slightly better) by using either the same number of PMUs, but with lower accuracy (i.e., potentially cheaper instruments) or less PMUs with the same accuracy. In order to investigate in a more compact way the joint impact of PMU and SM data on state estimation uncertainty, in the following the 99-th percentiles of the relative state estimation vector errors are also computed. The relative estimation vector error of the i-th bus voltage phasor at time t k is defined as Figure 7a,b shows the 99-th percentiles of the relative vector estimation errors in scenarios 1, 2, 3 and 4 (namely for increasing fractions of accurate time-aligned SM data replacing traditional pseudo-measurements) when from one to nine PMUs with a maximum TVE = 0.1% or 1%, respectively, are deployed in the grid. The bar diagrams on the left side result from the WLS algorithm, whereas those on the right side refer to the proposed enhanced EKF. Again, the load fluctuations lie within ±60% of the respective nominal values. The positive impact of real-time SM measurements on the 99-th percentiles of the state estimation errors is strong and well visible in all cases, but it is more evident if the number of deployed PMUs is small. In fact, a growing fraction of well-aggregated SM measurements tends to desensitize the impact of PMUs on state estimation uncertainty. In other words, for a given fraction of aggregated SM data, deploying more than three PMUs in the EKF case or more than five PMUs in the WLS case, respectively, brings about just a minor reduction in state estimation uncertainty in stationary or quasi-stationary conditions. Moreover, if SM-only aggregated data are used to measure power injections, the reduction of state estimation uncertainty obtained with the EKF estimator by adding more than one PMU is negligible even when the TVE is 1%. With the WLS estimator instead, a similar result can be achieved only if the TVE of the PMUs is one order of magnitude smaller.

Discussion
Like any other DSSE algorithm, the proposed enhanced EKF is also supposed to support control and protection functions in smart distribution systems. The presented performance analysis refers mainly to time-varying, but quasi-stationary operating conditions. For instance, the estimated state variables could be used as inputs to control algorithms for optimal power dispatching within demand response programs. For this purpose, it is quite clear that a widespread deployment of high-accuracy SMs possibly (but not necessarily) complemented by a few PMUs installed in key points of the grid usually either provides more accurate results or it ensures comparable results but at a lower cost than the WLS technique.
Of course, this analysis does not keep into account the very different reporting rates of SMs and PMUs. While the former transmit the collected data every several minutes, the latter can perform measurements at a much higher rate (i.e., about every power line cycle). Thus, whenever sudden changes of the system operating conditions (e.g., due to a change of topology or a fault) have to be detected promptly, PMUs play a more crucial role for DSSEs and cannot be replaced just by SMs, regardless of how accurate they are. In such situations, the impact of an increasing number of PMUs on DSSE uncertainty is likely to be higher than in quasi-steady state conditions. However, a trustworthy quantitative performance analysis during transients cannot be reported in this paper because of the inherent limits of the adopted software tools (particularly MATPOWER).
Nonetheless, from the methodological point of view, the use of measurement data collected at different rates is not a problem for the proposed estimator. Indeed, PMU and SM data could be used to update the state as soon as they are available by using an iterated EKF implementation [41].
It is worth emphasizing that the proposed methodology can be safely applied also in the presence of widespread distributed energy resources basically with no change. Indeed, the bus power injection variations resulting from the solution of the QP problem (8)- (11) include the joint effect of load changes and generated power fluctuations. Therefore, the estimator structure does not need any change in the case of active distribution systems. On the contrary, since many distributed generators (e.g., photovoltaics) are equipped with their own meters, additional measurement data could be used in the update step of the EKF.
The main current limitation of the proposed estimator is its processing time. This is not due to the EKF per se, but rather to the time needed to solve the QP problem returning the EKF input data in the prediction step. In fact, the time to return a single EKF estimate is just a fraction of the WLS technique one (e.g., between 20 and 30 ms vs. up to 70 ms on a PC equipped with a 2.80-GHz Intel Xeon E3 microprocessor, 32 GB of RAM, 64-bit Windows 10 Pro and Matlab R2019b). On the contrary, the time needed to solve the QP problem (8)-(11) over a week-long observation interval is in the order of a few seconds due to the large number of variables involved. While this is not critical in quasi-steady state conditions (the time between subsequent DSSE estimates is usually quite longer than a few seconds), the state estimation performance under transient conditions might be affected by the computation time. In addition, in the case of large systems, possible scalability problems may arise. However, as explained in Section 4.1, estimator scalability can be improved by partitioning a monolithic QP problem into smaller subproblems, i.e., one for each feeder line in which the initial bus is properly instrumented and monitored.

Conclusions
This paper presents an Extended Kalman Filter (EKF) for Distribution System State Estimation (DSSE) enhanced with a prior evaluation of the bus power injection variations, as they are used as inputs to the EKF prediction step. The performances of the proposed algorithm are analyzed and compared with those of a classic Weighted Least Squares (WLS) estimator by changing maximum load variability, fraction of deployed smart meters (SMs), number of Phasor Measurement Units (PMUs) and PMU accuracy. Multiple simulation results show that, in the considered case study, the 99-th percentiles of the RMS voltage estimation errors achieved with the proposed enhanced EKF are at least 60% lower than the corresponding values obtained with the benchmark WLS technique. The estimation uncertainty reduction is instead smaller, although still significant, for voltage angle state variables. Quite interestingly, the sensitivity to PMU accuracy of the RMS voltage state variables estimated by the enhanced EKF is lower than in the WLS case. Moreover, in quasi-stationary operating conditions, the impact of PMUs on the estimation uncertainty of the proposed estimator becomes minor if a large fraction of synchronous aggregated SM data replaces the traditional power pseudo-measurements. Possible future developments of this work are: the implementation of an iterated EKF in which measurements at different rates can be used to update the estimated state as soon as they are available (which would also improve numerical robustness) and the inclusion of power injection forecasting models to further decrease estimation uncertainty.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Load Aggregation Algorithm
Let L be a set consisting of L time-varying load profiles with L much greater than the number of buses, i.e., L >> N. Let us denote withp l and s 2 P l the average and the sample variance (computed over time) of the l-th load profile for l = 1, . . . , L. Assuming thatp l << P i L nom and s P l << P i L nom for i = 1, . . . , N (with P i L nom being the nominal load at i-th bus), the purpose of the adopted aggregation methodology is to find N disjoint subsets L i ⊂ L for i = 1, . . . , N such that:

1.
The sum of all the average load profiles included in the i-th subset is approximately equal to the nominal power injection at the corresponding bus, i.e., ∑ l ∈ L np l ≈ P i L nom .

2.
The relative sample variance of the power at the i-th bus is approximately equal to a given fraction γ 2 of P i 2 L nom , i.e., ∑ l∈L i s 2 P l ≈ γ 2 P i 2 L nom .
Observe that conditions (1) and (2) are just approximate since L consists of a discrete number of elements and the values ofp l and s 2 P l of the profiles are finite.
If y i is a binary column vector in which the l-th element y i l is set to 1 if the l-th load profile belongs to L i or 0; otherwise, the load profiles can be aggregated by solving an integer linear programming optimization problem. In particular, for every bus i = 1, . . . , N, the minimum number of load profiles to be aggregated can be obtained by finding the binary configuration y i * = arg min y i 1 T y i s. t.
where the cost vector 1 is just an L−long, all-ones column array, p = [p 1 , . . . ,p L ] T , s 2 = [s 2 P 1 , . . . , s 2 P L ] T , and, finally, T 1 and T 2 are positive tolerance thresholds needed to relax conditions 1) and 2) in order to make the optimization problem feasible. Note that the zero-equality constraints in (A1) are incrementally added to the problem to prevent that the same load profiles are aggregated multiple times, i.e., at different buses. As known, the integer linear programming problems are NP-complete. However, feasible solutions of (A1) can be found within a reasonable time (i.e., a few hours) by applying standard techniques (such as the cutting plane or the branch and bound methods) provided that parameters T 1 and T 2 are set large enough (e.g., 0.5% and 2% in the case at hand).

Appendix B. PMU Placement Algorithm
As briefly explained in Section 6.4, the PMUs are placed incrementally with a greedy approach in order to minimize the maximum estimation uncertainty. Initially, all buses different from the slack bus (the voltage of which is steadily monitored) are eligible for PMU placement. Thus, one PMU at a time is used to measure both the voltage phasor of each bus and the current phasor of one of the lines connected to the same bus. Let M and |M| be the set and the number, respectively, of all possible pairs of voltage and current phasors that can be monitored by a single PMU. Recalling that full system observability is ensured by the other available measurements, the steps of the greedy placement algorithm can be summarized below.

•
For each PMU configuration m ∈ M, the state of the system under given load conditions (i.e., with known maximum variability) is estimated by the EKF or the WLS algorithm N t times, with N t large enough to ensure a reasonably accurate estimate of the state covariance matrix, i.e., in the order of a few hundred.

•
If the sequences of estimation errors are realizations of ergodic processes (as it should be when either the EKF or the WLS estimator is used), the state covariance matrixC m associated with the m-th PMU configuration can be estimated directly from the steady-state estimation error sequences. In the EKF case,C m tends also to coincide with the solution of the linearized matrix Riccati equation.

•
Once all covariance matricesC m for m ∈ M are computed, the PMU configuration is selected as follows, i.e., m * = arg min m∈M max Eig (C m ), where Eig(·) operators returns the eigenvalues of the argument.

•
Afterward, configuration m * is removed from set M, i.e., M ← M\{m * }, and the algorithm starts over.
Of course, the algorithm terminates as soon as N − 1 PMUs are placed. Consider that, the maximum eigenvalue ofC m in (A2) can be regarded as the radius of the hypersphere, centered in the estimated state, circumscribing the (2N − 1)−dimensional ellipsoid representing the state estimation uncertainty region. Therefore, function (A2) provides a conservative over-estimate of the state estimation standard uncertainty. Of course, since the elements ofC m depend on the specific DSSE algorithm adopted, the sequences of PMU to be placed associated with the EKF and the WLS algorithm could be quite different.