Smart Grid State Estimation with PMUs Time Synchronization Errors

We consider the problem of PMU-based state estimation combining information coming from ubiquitous power demand time series and only a limited number of PMUs. Conversely to recent literature in which synchrophasor devices are often assumed perfectly synchronized with the Coordinated Universal Time (UTC), we explicitly consider the presence of time-synchronization errors in the measurements due to different non-ideal causes such as imperfect satellite localization and internal clock inaccuracy. We propose a recursive Kalman-based algorithm which allows for the explicit offline computation of the expected performance and for the real-time compensation of possible frequency mismatches among different PMUs. Based on the IEEE C37.118.1 standard on PMUs, we test the proposed solution and compare it with alternative approaches on both synthetic data from the IEEE 123 node standard distribution feeder and real-field data from a small medium voltage distribution feeder located inside the EPFL campus in Lausanne.


Introduction
Traditionally addressed at the transmission level of the power system where it is formulated as a nonlinear weighted least squares problem [1], by far, State Estimation (SE) is one fundamental task to properly operate the system [2]. However, the smart-grid paradigm, with increasing penetration of renewables and the transition from passive to active non-linear loads, e.g., electric vehicles and storage, while progressively filling the traditional separation between transmission and distribution networks, is calling for system-level solutions to address issues crossing the entire grid such as demand side management and demand-response [3] just as an example. This need, in turn, calls for flexible solutions which can be easily adapted and used across all the grid infrastructure, thus putting SE back in the spotlight both in transmission and distribution networks. In this transitioning process, a central and potentially revolutioning role is played by Phasor Measurement Units (PMUs). Ideally synchronized with the Coordinated Universal Time (UTC) [4], PMUs are able to provide the phasor of an electrical waveform, i.e. voltage or current, thus enabling the possibility of directly measuring rather than estimating the state variables [5]. While initially conceived for transmission systems, because of their accuracy and fast reporting rate, PMUs are acquiring significant interest also at the distribution level where can be used to support different applications -protection and stability assessment [6], power quality evaluation [7], management of fast time-varying loads [8] -among which SE [9,10,11] is one of the most relevant. To avoid large phase errors and achieve high measurement accuracy, PMUs require almost prefect synchronization with the reference time (UTC). Hence they are usually equipped with expensive GPS units which, in turn, compromise their large-scale use. Consequently, given the limited number of PMUs that can be deployed in the system, counterbalanced by their high performance, most solutions to the SE problem propose to combine perfectly synchronized phasorial measurements, coming from a small number of PMUs, with measurements gathered from conventional smart meters [12]. Conversely, different source of uncertainty are present in PMUs with time synchronization representing only one of them [13]. Indeed, GPS provides 1pps (pulse-per-second) synchronization signals, with a theoretical accuracy of 1µs which affects synchronization offsets, while internal PMUs clocks present frequency deviations which can produce large time skews. In view of the this, the SE problem in the presence of synchronization error has been studied in the literature. In [14] a first investigation on the effect of sync error is provided together with a static distributed estimator suitable for distribution-only grids. In [15], thanks to a small angles approximation, the authors formulate a measurement model which is bilinear w.r.t. grid state and sync error parameters. Then, two parallel Kalman filters are used to approximately and simultaneously solve for state and sync error estimation. The very same model is exploited also in [16]. Finally, in [17] only PMUs offset error due to GPS is considered and a precision time protocol to support the synchronization is proposed.
In this paper, in the spirit of [12] and building upon our preliminary work [18], we address the problem of SE based on PMU measurements and load pseudomeasurements where we explicitly consider the presence of time synchronization error in the PMU measurements. In particular, we are interested in the simultaneous estimation of the grid state as well as of the synchronization error parameters (i.e., offset and skew). We show how such error easily leads to poor estimation performance if the measurement model does not properly account for it, even with high PMUs penetration.
The contribution of the paper is twofold: i) a Kalmanbased algorithm for the simultaneous estimation of the state and of the synchronization error parameters in realtime. Based on the linear model proposed in [19] we are able to approximate the power flow manifold around any feasible working point and to explicitly compute off-line accurate expected performance without resorting to time-consuming Monte-Carlo simulations. Also, as a byproduct of the model choice, our methodology seamlessly applies to transmission as well as to distribution grids. ii) Resorting mainly to the IEEE C37.118.1 [20] standard on PMUs, we quantify the impact of synchronization errors in state estimation and show that, if not compensated for, they can shadow the possible benefits of using PMUs altogether. To complement our contributions, we compare the proposed solution against [12] on synthetic data from the IEEE 123 node distribution feeder as well as against the recently proposed estimator [9,10] on real-field data collected from the smartgrid located inside the EPFL campus [21].

Network Model
Here we present the linearized power network model used. Building upon the recent [19] to which we refer the interested reader for all the mathematical details, validation and assessment of the model, we model an AC power network under synchronous sinusoidal steady-state condition as a graph G(V, E). The nodes set V = {1, . . . , n} denotes the electric buses, while the edges set E denote the set of electric branches between connected buses. In synchronous steady-state regime, for each bus h ∈ V, we define: • u h = v h e jθ h ∈ C complex voltage at the bus terminal where v h , θ h ∈ R are the modulus and phase of the complex phasor, respectively; • i h ∈ C complex current injected at the bus; • s h = p h + jq h complex (apparent) power absorbed by the bus where p h , q h ∈ R are the active (real) and the reactive (imaginary) power, respectively.
Also, we define the nodal admittance matrix Y ∈ C n×n element-wise as where y hk is the admittance of the electric line (h, k) connecting bus h with bus k, while y sh h is the shunt admittance (admittance to ground) at bus h. By conveniently collecting all the nodal quantities into vectors u = [u 1 , . . . , u n ] T , i = [i 1 , . . . , i n ] T , s = [s 1 , . . . , s n ] T , Kirchhoff's law and the nodal power balance read as where (·) denotes the complex conjugate operator and diag(·) denotes the diagonal matrix with ii-th diagonal element equal to the i-th element of its vector argument. Finally, by combining the two above equations, one gets which represent n complex equations, i.e., the power flow equations, that must be satisfied by any feasible power flow.
At this point, we recall the main result of [19] which will let us linearize the nonlinear power flow equations (1) around any feasible point in the power flow manifold. We start by defining the power network state vector as where v, θ, p, q ∈ R n are obtained stacking together the corresponding nodal quantities. Then, by expressing the complex Eq. (1) in rectangular coordinates, it is possible to rewrite them in implicit form as F (ξ) = 0, F : R 4n → R 2n , and, in turn, implicitly define (Lemma 1 of [19]) the power flow manifold Proposition 1 (Proposition 1 of [19]) Let ξ * ∈ M, i.e., Then, the linear manifold tangent to M in x * is given by where u * := v * e jθ * , I is the identity matrix of suitable size and Proposition 1 conveniently states how to reconstruct the best linear approximant, i.e., the plane tangent to M at ξ * , of the power manifold M at the feasible point ξ * . Interestingly, assuming A u * invertible 1 , it is possible to express the voltage deviations δv := v − v * and δθ := θ − θ * (in polar coordinates) as linear functions of the power deviations δp := p − p * and δq := q − q * (in rectangular coordinates) Interestingly, the implicit formulation of Eq.
(2) defines all the voltages and power injections that are compatible with the physics without assuming any a priori model for the network's buses such as the typical PQ, PV or slack buses. Hence, as stated in [19], one strength of the presented linear formulation is that it holds for any admissible working point ξ * ∈ M and that it can generalize many previously presented linear approximation such as the Linear Coupled power flow model [22], the DC power flow model [23] and the rectangular DC power flow model [24]. We refer to [19] for all the details.

Measurement models
In the spirit of [12], we assume to have at disposal two types of information: i) ubiquitous historical data series of active and reactive power demands, used to provide the estimator with a rough prior knowledge of the state; and ii) sparse real-time high-accuracy phasorial measurements, used to refine the estimate.

Power Demand Time-Series
The first source of information are historical data series of active and reactive power demands collected at each bus from low-cost largely-available and low accurate smart meters. Namely, at node h ∈ V and time t ∈ Z + we have

PMU Measurements
The second source of information are phasor measurements coming from high-cost and highly accurate Phasor Measurement Units which, because of their cost, are usually deployed only at a limited number of electric buses. We recall that, to provide high accuracy phasorial values, PMUs are equipped with a GPS module exploited for synchronization purposes. Because of this, it is a far common assumption to consider PMUs as perfectly synchronized, thus neglecting the impact of the synchronization uncertainty on the resulting measurements. However, even in the presence of GPS modules, PMUs can still suffer of lack of synchronicity due to, e.g., temporary occlusion of satellites [26]. In addition to this, within successive synchronization instants with the GPS module, usually providing 1pps (pulse-per-second) synchronization signal, PMUs exploit an internal oscillator as reference clock which, in turn, might cause additional synchronization uncertainty. Hence, depending on the type of GPS module and oscillator, different synchronization accuracy, directly proportional to the cost of these devices, can be achieved. Ultimately, the measurements at bus h at time where we set 2 σ pmu,v = 0.1%, σ pmu,θ = 10 −3 [rad], and assume uncorrelated measurement noise within the same node and across different nodes, i.e., Indeed, w v is mainly due to sampling jitter and synchronization error while w v to the instrumentation amplitude noise. Differently from standard Gaussian additive models, the additional term d h (t) in (6) represents the error with respect to the true universal time. Since this component mainly affects the angle measurements [13], we restrict our analysis to synchronization errors (also referred to as de-sync) affecting voltage 3 angles only. In particular, we consider a clock error model which, within successive synchronization instants kT, (k +1)T, . . ., being T the GPS synchronization period, has the form where M is the number of PMU measurements collected within two successive synchronization instants and β h , α h ∈ R are an offset term due to GPS error and the clock skew due to the fact that the internal clock of the PMU in general does not oscillate at the reference frequency, respectively.
referring to the k-th re-sync instant and to the t-th measurements within [kT, (k + 1)T ), see Figure 1. However, with a slight abuse of notation, in the following when talking about the evolution of a discrete quantity x, instead of writing x(τ (k, t)) we use the simpler x(k, t).

Estimation
By taking advantage of the linear model (4), the measurements and their statistical information, our final goal is to provide an accurate and real-time state estimate as output of a Kalman-based algorithm which, conversely to standard procedures, explicitly considers the effect of synchronization uncertainty in the measurements units. We cast our estimation procedure as a Bayesian inference process. More specifically, power demand predictions, characterized by a low accuracy, are used to provide a prior for our Bayesian model while highly accurate PMU measurements are used to improve the state estimate.

State-Space Model
Our estimator consists of a Kalman filter [28], hence a suitable state-space model is needed. Conventionally in the power system literature, since the network state are the voltages in rectangular coordinates, those are also chosen as filter state. Conversely, as state vector for the filter, we pick the power demand deviations δp, δq ∈ R n together with the synchronization error parameters 4 α, β ∈ R m . Our choice is not restrictive and naturally arises from (4) and (6). Now, since the incremental linear model (4) is defined w.r.t. a predefined operating point, assume p * and q * are nominal demands to which correspond u * = (v * , θ * ). Then, state and output models at τ (k, t), k ∈ Z + , t ∈ {0, . . . , M −1}, consists of where W can be used to embed information among the process noise covariance estimated, e.g., from data. In the following, given the small re-synchronization period, we usually consider W = 0 and outline an interesting closed-form analysis. Since the above model is incremental with respect to the nominal value ξ * and since the de-sync parameters can assume both positive and negative values, the state is reasonably initialized as a zero mean Gaussian random variable.

Remark 3 (Exact linear output model)
It is worth observing that, due to the linear relation (4) between buses power p, q expressed in rectangular coordinates and voltage v, θ expressed in polar coordinates, the desynchronization enters linearly in the output model (9) without any further approximation. This is opposed to standard approaches in the literature where, to deal with linear models, the network state is expressed in rectangular coordinates, i.e., real and imaginary parts of the voltages. In this case, to resort to linear output models, the synchronization error must either be assumed or approximated as purely imaginary [12,14], under the additional  t)) end for end for assumption of small voltage angles differences. 5 . Also, even in the case when no sync error is considered, i.e., d(t) = 0, observe that phasorial measurements are practically collected in polar coordinates. Hence, by expressing the output model with the same representation, we do not need any further manipulation of the data, i.e., projection from polar to rectangular coordinates, which, in turn, requires re-computation of the measurements correlation, inevitably introducing additional errors.

Synchronization-aware State Estimator
Thanks to Eqs. (8)-(9), we have at disposal a complete linear model to built a Kalman filter [28] to simultaneously estimate network state and de-sync parameters. Algorithm 1 describes what we refer to as Synchronization-aware State Estimator, hereafter denoted with SASE. Observe that to run Algorithm 1 values for p * , q * , v * and θ * are required from which y and H are derived. By leveraging the information coming from the available power demand time series, p * and q * are computed as one-day a-head predictions. Then, by means of a single full AC power flow computation it is possible to compute the corresponding values for v * and θ * . Note that, since Σ(t) does not depend on the measurements, its evolution can be computed offline and stored for t ∈ {0, . . . , M − 1} thus alleviating the computational burden. Finally, thanks to Eq. (4), the estimated voltages are equal to 5 The assumption of small voltage angle differences usually holds for power distribution grids where the voltage values are clumped together in the proximity of the voltage value at the point of common coupling (PCC). However, the same does not hold in power transmission grids. [29] where, by partitioning Σ as Σ 0 , the covariance is given by As a side note, observe that both the model (8)-(9) and Algorithm 1 are outlined for t ∈ {0, . . . , M − 1} for a given k ∈ Z + . As suggested by Figure 1, since at τ (k, 0) the PMUs re-synchronize with the GPS, the filter is reinitialized to reset α and β and allow the computation of a new estimate. Similarly, newly available p * , q * and new data can be used to recompute the model and W , respectively.
5 Two-nodes case with no process noise (W = 0) Consider a network consisting of one load connected to one generator (the PCC, v pcc = 1, θ pcc = 0) through a purely inductive line with susceptance b = −1[p.u.], in the absence of shunt admittance. For the sake of the analysis, we assume the load is absorbing only active power p while q = 0. In this case the flat profile is a particular solution which can be chosen as linearization point. Thus, by leveraging the linear model (3) one has p = θ and v = 1 being v and θ the voltage magnitude and the phase at the load, respectively. Notice that since we assumed q = 0, the voltage magnitude is fixed and equal to 1 thus only p = θ is of interest. Now, assume to collect, within two successive sync instants, M phase measurements of the form (6) which, as in (9), can be expressed as . Furthermore for the sake of the analysis, let us assume absence of process noise, i.e., w x = 0, W = 0 which, in the case of reasonably stable power demands within T [s], represents an acceptable first order approximation. Then, the posterior variance matrix in information form reads as and, thanks to (10), after some tedious but straightforward algebraic manipulations, it is possible to compute Σ = Σ(σ pmu,θ , σ θ , σ β , σ α , M, T ) in closed form (reported in [30] for space reasons). Interestingly, it can be seen that, in the limit of the product M T , it holds that, and, in particular, as shown in [30], that Hence, while for growing M or T , σ 33 → 0 meaning that the uncertainty on the skew parameter goes to zero and, consequently, the parameter is perfectly estimated, residual uncertainty remains on both θ and β for which σ 11 , σ 22 → 0. As can be seen from the output matrix C, this is due to the fact that θ and β are linearly dependent. Nonetheless, similarly to σ 22 , even σ 11 and σ 12 are functions of the product σ θ σ β . Thus σ 11 , σ 12 , σ 22 → 0 for σ θ σ β → 0, meaning that if σ β = 0 then θ is perfectly estimated and viceversa. As highlighted later in the simulation section, this suggests that the different performance between our proposed SASE and what will be referred to as Ground Truth (GT) is majorly due to this linear dependence.

Simulations
In this section we test the proposed SASE algorithm on two different data sets: i) synthetic data generated from the standard IEEE 123 nodes test bed [31]; ii) field-data collected from the smartgrid located inside the EPFL campus, Switzerland [21]. We use the Matlab Matpower package [32] for power flow computations. Finally, if not differently specified, Table 1 summarizes the parameters' values. Some observation are in order. First, regarding the PMU reporting rate, since it depends on the network frequency, here we consider a set of values for M . Second, given the relatively small sync period T = 1[s], desync parameters β, α are assumed constant within the interval [kT, (k + 1)T ). Third, for σ α we assume PMUs equipped with quartz-crystal oscillator characterized by an accuracy ≈ 10 ÷ 30 ppm [33]; also for σ β we assume a 50Hz frequency signal with GPS ≈ 0.5 ÷ 1µs accurate [4]. Finally, given the small re-synchronization period of 1[s], we assume no process noise, W = 0.

Synthetic data set -IEEE 123 nodes grid
We compare the SASE algorithm against: i) an online iterative version of the Bayesian Linear State Estimaion algorithm presented in [12] (denoted as BLSE) assuming no synchronization error in the measurements; ii) a Ground Truth (denoted as GT) strategy assuming perfect knowledge and compensation of the desynchronization error. The estimation performance is [20] v * , θ * , p * , q * power flow nominal solution Table 1: Parameters used in the simulations measured in terms of Average of Root Mean Square Error which, given any two n-dimensional vectors a(t), b(t), possibly functions of time, we numerically approximate over N = 500 Monte Carlo runs as Observe that, under the assumptions of linear model and correct measurement noise statistic, the matrix Σ(t) can be used to compute the empirical ARMSE as Also, Eq. (13) can be leveraged to perform model qualification checking, indeed, ARMSE(t) ≈ ARMSE(t) means the non-linear measurements statistic is effectively captured by the linear filter built on the approximation. 6 Figure 2 shows the performance as function of the number of PMUs deployed 7 in the network when all the M = 30 PMU measurements have been processed, i.e., right before a new synchronization instant occurs. From Figure 2a, it is interesting to note that the proposed SASE approach behaves almost indistinguishable from the ground truth GT while the unmodeled synchronization error clearly deteriorates the BLSE, whose performance achieves, at best, ≈ 30% improvement. Conversely, with only one PMU the proposed SASE performance improves of ≈ 60%. Regarding the desynchronization, Figure 2b shows that the estimation performance does not improve for increasing number of PMUs deployed. This can be expected since the PMUs have been assumed uncorrelated. Finally, note that in both Figure 2a   prescribed linear approximation. This suggests that expensive Monte-Carlo simulations are not needed and many optimization problems such as optimal PMU placement or parameter sensitivity analysis can be performed very effectively also for large scale networks using the linearized model. It is worth stressing that this result holds for values of the parameters as in Table 1.
To emphasize the analysis of Section 5, Figure 3 shows the performance for a fixed number of PMU as function of the number of collected PMU measurements M . Figure 3a confirms the good behavior of the SASE compared with the GT. Conversely, as time passes, the BLSE does not improve its performance since it has no clue about the presence of the delay. Figure 3b supports this claim. Indeed, since the estimated skew improves for increasing M , the SASE is able to compensate for it. Finally, Figure 3b shows that the offset does not improve. As stressed in Section 5, this is an intrinsic modeling problem due to the fact that offset and power demand happen to be linearly dependent. In addition, Figure 3b reports the values σ 22 , σ 33 in (11) as a function of M computed for the two-nodes network using the parameters value of Table 1. Observe how the theoretical values corresponding to the two-nodes case turn out to be extremely close to those obtained from the real network. This fact is interesting mainly for two reasons: i) it supports the claim that, in the limit for M (or T ), the proposed estimator perfectly reconstructs the skews while residual uncertainty remains in the offsets; ii) from the closed form  expressions for σ 22 and σ 33 , it is possible to retrieve, at least approximately, the value of the parameters needed to obtain a desired level of estimation accuracy.

Real-world data set -EPFL smartgrid
Here we test the SASE on data from the 20 kV 3-phase 6 nodes smartgrid installed in Lausanne within the framework of the NanoTera S 3 -Grid project and located inside the EPFL campus [9,21]. For a representation and an in-depth description of the network we refer to [10]. We recall that the network is characterized by a line topology with nodes 1 to 5 monitored with PMUs (measuring current and voltage at 50Hz) and node 6 (the last along the line) is zero-injection and not monitored. Also, to estimate the measurements characteristics and noise variance values, we resort to the description reported in Section 4.2.1 of [10] where the variances are computed from the data sheets of the PMU devices.
We compare the SASE against the first Kalman-based estimator proposed in [9,10] and carefully described in Sec 2.4 of [10]. A preliminary comparison between SASE and the algorithm in [9] highlights the following interesting facts: i) The algorithm in [9] processes both currents and volt-ages in rectangular coordinates. This, even if the measurement covariances are correctly projected from polar to rectangular coordinates, inevitably introduces an approximation. SASE on the other hand considers only voltage measurements expressed in polar coordinates as naturally returned by the PMUs without introducing any additional source of error.
ii) The filter's state vector of [9] algorithm are real and imaginary parts of the voltages. Since voltage values are functions of all power demands, it is extremely hard to design meaningful priors, for both the mean and the covariance matrix, to properly initialize the filter. While this does not represent a critical issue in the presence of ubiquitous PMUs, in scenarios where the number of PMUs is small compared to the state dimension, this can translate in observability issues. Conversely, the SASE can be properly initialized leveraging, .e.g., one day-ahead forecast of power demands. iii) Since the actual network frequency deviates from the nominal one, the phase angles are observed to rotate. To account for this rotation, at every iteration, the state model of the Kalman based algorithm in [9] is manually rotated of the quantity θ k = 2π f k −f0 f0 , i.e., the phase angle difference due to the discrepancy between the nominal frequency f 0 and the actual frequency f k measured by the PMU itself. Hence, the algorithm resorts to additional information from the PMUs, namely, the measured network frequency or the frequency error (FE). Conversely, SASE automatically accounts for any linear trend existing in the phase angle measurements. iv) Finally, in [9] no information regarding the posterior covariance characterizing the estimates are presented. Conversely, we show how the SASE nicely provides accurate estimates characterized by meaningful confidence intervals.
Before analyzing the simulation results we come back on the issue (see Section 4.1) regarding the choice of the state model during synchronization instants τ (k, 0), k ∈ Z. Assume the state is x = [δp T , δq T , α T , β T ] T with evolution To address the drift in the measurements, the state matrix is The last row-block of F acts as an integrator for β and is used to set their mean values at τ (k, 0) to We now turn to the comparison between the two algorithms which are tested on a small subset of data consisting of a time window of 6 seconds collected on November 17 th , 2014, starting at 10:03:20 AM. More specifically, we are interested in comparing the SASE and the algorithm in [9] in terms of estimation and prediction. Hence, we assume to have at disposal measurements from nodes 1, 2 and 3 to perform the estimation while we use node 4 and 5 for validation, i.e., we do not use their measurements during estimation. The zero-injection node 6 is considered as virtual measurement for the algorithm in [9] and eliminated for the SASE. Figures 4-5 show the evolution of estimates and predictions at node 3 and 5, respectively, in the time interval [2,6]s. The first two seconds of simulation have been cut out in order to let the estimator in [9] to properly compute the covariance matrix Q and converge to its steady state. From Figure 4 it is possible to see that the estimator in [9] nicely follows the measurements. The SASE, in harmony with its static state space model (since we assumed W = 0), captures the average demand. However, the confidence interval returned by the estimator is in perfect accordance with the measurement values. Conversely, Figure 5 highlights the first difference between the two algorithms. Indeed, due to lack of (prior) information, in the prediction task, the algorithm in [9] does not provide any useful value (not even reported due to the high difference in the scaling factor). This analysis is supported by Table 2 reporting the values of TVE of estimates and prediction w.r.t. the corresponding measurements.
In conclusion, even if the SASE algorithm was originally motivated to compensate de-synchronization with the GPS, it can be effectively used also to compensate linear frequency deviations at different nodes.

Conclusions & Future directions
In this paper we proposed a Kalman filtering procedure to address the problem of state estimation in power systems combining ubiquitous power time series and highaccurate sparse PMU measurements. Conversely to the standard assumption on the perfect synchronization of PMU devices, we considered the presence of PMUs desynchronization and analyzed the problem of simultaneous estimation of network state and synchronization error parameters. Interestingly, by testing the proposed al-Alg.
2.1 · 10 −3 1.4 · 10 −2 6 · 10 5 8.5 · 10 −3   Figure 4: Evolution of the estimates at node 3 (used for estimation) using three PMUs for estimation and two for validation.  gorithm on both synthetic and real-field data, it is shown how the presence of synchronization errors can easily mislead the estimator if the measurement model does not properly account for it.