Compressed Machine Learning Models for the Uncertainty Quantiﬁcation of Power Distribution Networks

.


Introduction
Nowadays, the reliability assessment of a power distribution network (PDN) must incorporate the effects of the unavoidable fluctuation of load consumption on the node voltages. Typical examples are represented by the pervasive spread of distributed generators (DGs), for which the injected power depends on (1) weather conditions; (2) the impact of hubs for charging electrical vehicles, located in different points of the network; and (3) the unpredictable user behavior, in a scenario in which both users and buildings play an active role in the continuous monitoring of the fluid electricity fees, and modify their power consumption accordingly [1][2][3][4]. Therefore, the inherent statistical nature of the problem makes a deterministic interpretation unsuitable, and demands for stochastic methodologies for the uncertainty quantification (UQ) [5][6][7]. regression are employed in conjunction with PCA compression to build a surrogate model of the nodal voltages of the power distribution network with a large number of uncertain parameters consisting of power loads and renewable sources. The performance of the proposed modeling scheme in terms of efficiency, accuracy, and convergence, is thoroughly investigated by means of two scenarios with 450 and 900 uncertain parameters.
The remainder of this paper is organized as follows. Section 2 outlines the goal of this work. Section 3 presents the simulation scheme that is used to carry out the power-flow analysis. In Section 4.1, the mathematical background of LS-SVM regression and sparse PCE, as well as the PCA compression, are briefly introduced. The performance of the proposed methodology is investigated in Section 5 by considering the UQ of a complex benchmark network with an increasing number of training samples and of uncertain parameters. Finally, Section 6 concludes the paper.

Goal Statement
Without loss of generality, we consider the IEEE-8500 distribution feeder benchmark network [40], depicted in Figure 1, as test case for the proposed analysis. It is a three-phase radial distribution network consisting of medium and low voltage levels, with unbalanced loads, transformers, capacitors, and regulators. The network is modified by the addition of 200/400 photovoltaic (PV) distributed generators randomly connected at the load nodes. The uncertain variables are the values of real power for a set of N load network loads and N pv PV generators, thus leading to a problem with d = (N load + N pv ) uncertain parameters, which we denote with vector x = [x 1 , . . . , x d ] T ∈ R d . The goal of our analysis is to quantify the impact of the uncertain parameters on the magnitude of the network nodal voltages, hereafter referred to as the output vector y = [y 1 , . . . , y M ] T ∈ R M . The output vector is an implicit function of x, which we denote as where M : R d → R M generically indicates the full-computational model that is used to calculate y for a given configuration of x. It is important to point out that we are targeting applications with d ∼ 10 2 , 10 3 and M ∼ 10 3 , 10 4 , for which both standard and advanced surrogate modeling techniques usually fail.

Power-Flow Analysis
This section describes the simulation technique that is used in the context of this paper as the full-computational model (1) for the power flow analysis of a power distribution network. The approach is illustrated in Figure 2. It relies on a nonlinear circuital interpretation of the distribution network in the phasor domain, see, for example, Figure 3a, in which the loads are replaced by nonlinear blocks characterized by their real and reactive power [41]. The circuit in Figure 3a can be interpreted as the interconnection of a linear and a nonlinear part, as shown in Figure 3b. The linear part is composed by the two-terminal elements Y n , which represent the lumped (e.g., RL) equivalents of the transmission lines, and by the ideal feeding voltage source. The loads are considered as nonlinear elements in which the I-V characteristics are defined according to their complex absorbed power. As an examples, the I-V characteristic for the n-th load readŝ whereÎ n andV n are the phasors of the node voltage and current, respectively, while P n and Q n represent the real and reactive power absorbed by the node, respectively. The resulting nonlinear circuital equivalent can be also reinterpreted as shown in Figure 3c, where the linear and nonlinear blocks have been decoupled by means of controlled sources. This schematic is readily described in matrix form by means of the modified nodal analysis (MNA) formulation [42]. Unfortunately, a closed-form solution of the resulting nonlinear equation is not available, and a numerical solution must be considered. To this aim, the problem is solved by means of the waveform relaxation technique [43], leading to the simplified circuital interpretation shown in Figure 3d. The above circuital interpretation turns out to be equivalent to a fixed-point iteration scheme: where, with reference to Figure 3d, •M is the MNA matrix, which is constructed by circuit inspection and incorporates the characteristics of the circuit elements; •Ŵ is a complex vector collecting the nodal voltagesV The iterations are terminated when a given error threshold ε is reached, such that As discussed in [41], the linear portion of the system needs to be solved only once at the beginning of the iteration (i.e., the inverse matrixM −1 must be computed once), and the computation of the Jacobian matrix of the system is not required, thus simplifying the implementation. The circuital interpretation of Figure 3d is readily implemented in any SPICE-type solver to obtain the AC solution at a single frequency point. The outlined simulation framework is implemented in MATLAB (R2019b, Mathworks, USA). It has been thoroughly validated and proven to be a viable solution for large networks [41]. At this point, it should be noted that any deterministic tool for load-flow analysis could be alternatively adopted as full-computational model.

Surrogate Models
We introduce here the surrogate models that will be used in conjunction with the power-flow analysis method of Section 3 for the UQ of the distribution network introduced in Section 2. For the sake of simplicity, we base the discussion on a system (1) with a scalar output (i.e., M = 1), and we assume that a set of L training pairs {( . . , L. At this stage, it is understood that for a multi-output system (M > 1), the procedure is repeated for each output component. A compression strategy is introduced later, in Section 4.3, to handle multi-output systems more effectively.

LS-SVM Regression
This section introduces the underlying idea and the essential mathematical background of the LS-SVM regression in its primal and dual space formulations [32], along with its application to the construction of a surrogate model for UQ. It is worth mentioning that the code for the construction of LS-SVM surrogate models is available within the MATLAB toolbox LS-SVMLab, (version 1.8) [44].

Primal Space Formulation
The primal space formulation of the LS-SVM suggests approximating the actual responses of (1) with the surrogate model where T is a vector with the respective coefficients, and ·, · is the inner product in R N . The regression coefficients w and the scalar parameter b are computed as the solution of the following optimization problem, where is the model error on the training samples and γ is a parameter that provides a trade-off between the accuracy of the model and its flatness, thus reducing the overfitting [30,32]. From (7), it is noted that the primal space formulation is equivalent to ridge regression [26]. Furthermore, as in the classical least-squares regression [19], the number of coefficients (i.e., the size of vector w) coincides with the number of basis functions in (6). The above feature makes this implementation suffer from the curse of dimensionality and demands for an alternative formulation, which is briefly outlined in the next section.

Dual Space Formulation
The LS-SVM in the dual space formulation is a nonparametric regression [32]. The introduction of the kernel function K(·, ·) : allows recasting (6) as where the coefficients α i become, together with the bias term b, the new unknowns. These are estimated by inverting the matrix equation . . , L, and γ is the same parameter as defined in (7). Several kernels can be used, the most commons of which are Gaussian radial basis function (RBF) kernel: It is important to remark that in this nonparametric dual space formulation, and thanks to the so-called "kernel trick", the number of unknowns α i in (9) is completely independent of the number of parameters and basis functions considered in the LS-SVM regression model, and it always equals the number of training samples L that are used for the regression. For this reason, the LS-SVM regression in the dual space is an attractive candidate for the UQ of high-dimensional problems [37].

Sparse PCE
This section briefly introduces the fundamentals of (sparse) PCE approximation. Moreover, in this case, a MATLAB toolbox, i.e., UQLab, is available for building PCE surrogate models [45].
Similarly to (6), a generic PCE model reads where two equivalent and interchangeable indexing notations are used: k is a scalar index pointing to the elements of K, which are typically assumed to be sorted according to the graded lexicographic ordering.
With the above definitions, there is a one-to-one correspondence between k and κ, and the number of basis functions K corresponds to the cardinality of K, i.e., K = |K|. The latter notation is more convenient to define the basis functions ϕ κ (x), which are d-variate polynomials constructed as The functions φ k j (x j ) are univariate polynomials satisfying the orthogonality condition where ρ(x j ) is the PDF of the uncertain variable x j . For quantities exhibiting a finite second-order moment (variance), the model (11) with K ≡ N d (or, equivalently, K → ∞) is exact. However, the model is truncated for obvious practical reasons. The truncation is defined by bounding a given u-norm of the multi-indices with a given order p, leading to the set Common truncation schemes are (in order of popularity) Hyperbolic truncation (0 < u < 1), which leads to an increasingly sparser expansion as u is decreased; • Tensor-product truncation (u = ∞), which is usually avoided because of the exorbitant number of K = (p + 1) d terms.
Given one of the above predefined truncations, a sparse PCE is intended as a model (11) in which some (typically many) of the coefficients c k are further identified to be negligible, with a pattern that is not a priori known, or, equivalently, in which a reduced setK ⊂ K of basis function is adaptively identified when calculating the coefficients.
The PCE model (11) is particularly suitable for UQ, as its accuracy is defined in statistical terms. Indeed, the quadratic error obeys [20] lim where is the joint PDF of the uncertain parameters x. As the local error is weighted by the distribution of the uncertain parameters, a larger error is tolerated for unlikely parameter values without compromising the overall statistical accuracy. Moreover, average and variance of the output y are calculated directly from the PCE coefficients c κ as where 0 = (0, . . . , 0) is the null element of N d , corresponding to a zero-degree (i.e., constant) polynomial. Several approaches are available for the calculation of the PCE coefficients. For full-blown PCEs, one of the simplest and most popular methods is by least-square regression [19], i.e., by minimizing the norm of the residual on the training samples, i.e., The solution to the above minimization problem is found as the classical and well-known least-square solution c * = arg min where c = (c 1 , . . . , c K ) T , Ψ ∈ R L×K is a matrix with elements Ψ ik = ϕ k (x i ), and Ψ + = (Ψ T Ψ) −1 Ψ T is its Moore-Penrose pseudo-inverse. As the regression problem needs to be overdetermined, i.e., L > K (typically, at least L = 2K [19]), the number of training samples grows dramatically for high-dimensional problems. For sparse PCEs instead, the non-zero coefficients or, equivalently, the subsetK of significant basis functions, are identified as a part of the optimization process using, e.g., least-angle regression [22]. As the number of unknowns that has to be calculated is greatly reduced, a much smaller training set can be used for the regression.

PCA Compression
The surrogate modeling techniques of Sections 4.1 and 4.2 were introduced for a single output system (M = 1). For multi-output systems (M > 1), a naive approach is to repeat the pertinent optimization process for each component of the output vector y to build the corresponding surrogate model. Clearly, the computational cost scales roughly linearly with the number of variables M, thus becoming impractical for systems with thousands of outputs. An exception is the full-blown PCE for which, if the same set of basis functions is used for all outputs, the optimization process (19) can be easily vectorized and solved at once by stacking the training data column-wise. However, as already mentioned, full-blown PCEs are impractical for high-dimensional problems (large number of inputs).
To overcome this limitation, we introduce a compression strategy based on PCA [39,46]. Let us consider again a set of L training pairs {( We organize the training responses y i column-wise into a matrix Y ∈ R M×L , which therefore has elements Y mi = [M(x i )] m . We further define the zero-mean datasetỸ with elementsỸ mi = Y mi − µ m , where is the column-wise mean, and we calculate its "economy-size" singular value decomposition (SVD) with U ∈ R M×L and Σ, V ∈ R L×L . Matrix Σ is diagonal and collects the singular values {σ i } L i=1 ofỸ in descending order. By defining threshold andn such that the SVD of (21) can be approximated by retaining only then most significant singular values (the "principal components"). This leads to a matrixÛ ∈ R M×n that retains only the firstn columns of U and that is used to define a compressed version of model M in (1) as where M PCA : R d → Rn and µ = (µ 1 , . . . , µ M ) T . Now, the output z of model (23) has onlyn components compared to the M output components of (1). As the singular values decay rather fast, it is found thatn M, typically by two to four orders of magnitude. As such, any surrogate modeling technique suitable for high-dimensional problems (in terms of number of input parameters d), like the LS-SVM regression or the sparse PCE, can be applied with a limited computational effort to the components of z. Once the surrogate model of M PCA is available, let us generically denote it withM, the value of the original outputs y is recovered from (23) as The flowchart of Figure 4 summarizes the main steps of the proposed methodology. For the model generation (left panel), a (limited) number of training output responses (in this case, phasor nodal voltages) are obtained from the full-computational model of the PDN for some random configurations of the input uncertain parameters (power of loads and DGs). These training responses are compressed using matrixÛ, which is obtained by truncating the SVD of the dataset matrix according to the magnitude of its singular values. Finally, a surrogate model is trained for this reduced dataset. For the model evaluation (right panel), the process is reversed. A (usually large) number of samples of the uncertain input parameters is drawn. The surrogate model is used to inexpensively compute the corresponding compressed data. Finally, this compressed data is mapped back to the original data by applying the inverse transformation.

Application Examples
In this section, the LS-SVM and sparse PCE regressions are applied in conjunction with PCA compression to the test case of Section 2, which has M = 3798 network nodes. A compact surrogate model is obtained, capable of predicting the behavior of the nodal voltages of this complex PDN in which a high number of loads/DGs are considered as uncertain parameters. All the simulations have been performed using MATLAB on a workstation with Intel Core i7 CPU running at 3.6 GHz and 32 GB of RAM.
The statistical properties of the network voltage profile are investigated by considering two cases: • Case 1: d = 450 uncertain parameters, i.e., N load = 250 loads and N pv = 200 PV generators. • Case 2: d = 900 uncertain parameters, i.e., N load = 500 loads and N pv = 400 PV generators.
For each case, the network nodes with uncertain load or PV generator have been randomly chosen. For the loads, the real power follows a Gaussian distribution with a relative standard deviation of 80% from its nominal value, and samples with negative value are discarded. The corresponding reactive power is calculated by assuming a constant power factor. Therefore, the variability of the active and reactive power is not independent. Nevertheless, the generalization to independent variations of the active and reactive power components is straightforward. For the PV generators, the solar radiation r is typically described with a beta distribution [47]. Specifically, a beta distribution with parameters α = 0.90 and β = 0.85 is considered here. In turn, the active power P pv of the PV generator is expressed as a function of the solar radiation r as discussed in [48]. The rated power of each PV is 100 kVA. The PV generators are assumed to be operating at a unitary power factor [48], and therefore their reactive power is considered to be zero in this study, with a capacity penetration level of 17.5%.
The variability of the node voltages is shown in Figure 5 for the test case with 450 uncertain parameters (Case 1). The plot highlights the large variation of the nodal voltages due to the uncertainty, with a maximum variation on phase A of about 20%. On the other hand, the variability turns out to be very small for some nodes. In particular, for 283 out of the 3798 nodes, connected to one of the phases A, B, or C, the relative variation turns out to be less than 1%. Next, a subset of L = 450 responses is considered as "training samples". Figure 6 shows the normalized singular values of the corresponding zero-mean dataset matrixỸ. The dashed horizontal lines represents the thresholds ε = 10 −i , for i = 1, 2, 3, 4, 5. The singular values cross the lowest threshold of ε = 10 −5 at indexn = 38. This indicates that the compressed training responses retain only 38 out of the 3798 original components, with a compression rate of 100×. The impact of the PCA compression is illustrated in Figure 7, which shows the scatter plots of the actual training responses paired with the responses reconstructed from a PCA truncation with an increasing number of components. Ideally, the points should line on the diagonal. A very high accuracy in reproducing the training samples is observed starting from a threshold of ε = 10 −4 andn = 23 coefficients only. Therefore, this threshold is used in the following analyses. For Case 2, this leads ton = 39.  Compressed training datasets with different sample size L are then used to train both LS-SVM and second-order (p = 2) sparse PCE surrogate models. The performance of the proposed modeling methodology is investigated by comparison with reference results generated with a MC simulation featuring 10,000 samples. Figures 8 and 9 provide a deterministic assessment for Case 1 and Case 2, respectively, by showing the correlation between the reference outputs from the MC simulation and the corresponding predictions obtained with the LS-SVM and PCE surrogate models for an increasing number of training samples. The best model is the one in which the cloud of points lies along the diagonal (black dashed line). A good correlation is observed for both the LS-SVM and PCE models with L = 900 for Case 1 and L = 1800 for Case 2. In this case, the two surrogate models provide comparable accuracy. When using smaller training datasets instead, the LS-SVM model exhibits superior accuracy, while a larger error is observed for the PCE model. This could be explained by the fact that the PCE is not primarily intended for parametric modeling, since it allows a large error for unlikely samples, as discussed in Section 4.2. Indeed, scatter plots assess the accuracy of the model in reproducing the system behavior for a wide range of parameter configurations, disregarding their actual probability of occurrence. Figures 10 and 11 provide instead a probabilistic analysis. The PDFs of the ensemble of all nodal voltages (i.e., all the entries of the output vector y) calculated from 10,000 MC samples (gray bars) are compared to the predictions obtained with the proposed PCA-compressed LS-SVM (solid red curve) and sparse PCE (dashed black curve) surrogate models, both of which provide excellent accuracy. A notable effect on the distribution shape, due to the additional set of uncertain parameters included in Case 2, is observed. The results highlight the capability of both surrogate models of accurately providing the statistical information about the expected distribution of nodal voltages for such a heterogeneous situation described by the presence of a large variability in the load and DG power. This statistical analysis is extremely helpful since it provides a single global picture on the possible critical behavior of the network, at least in terms of the values of the nodal voltage magnitudes in the lower and in the upper part of the distribution. A similar assessment can be done by considering other sensitive parameters such as the phase of the nodal voltages or the branch power, using the same modeling methodology.    In order to provide a more detailed validation, Table 1 collects quantitative information on the performance. Specifically, the root mean squared error (RMSE) between the reference MC responses and the surrogate model predictions, as well as the CPU time required by (i) the generation of the training samples (indicated next to the number of training samples L), (ii) the model generation (t model ), and (iii) the model evaluation for 10,000 validation samples (t cost ), are reported for both test cases. The above figures confirm that the LS-SVM provides better accuracy for smaller training set sizes, whereas both methods yield similar RMSE for the largest training dataset. Moreover, it is possible to conclude that the efficiency of the LS-SVM model is much better than the one of the sparse PCE, with a dramatic benefit especially when the number of training samples L increases. For the largest training dataset with L = 1800 output responses and d = 900 input parameters, the amount of time required by the generation of the LS-SVM model is 23.8 min, i.e., over one order of magnitude faster than for the sparse PCE (5.9 h). It should be noted, however, that overall training cost is in this case largely dominated by the time required by the simulation of the training responses (3.4 h). The evaluation time is instead almost negligible for both techniques. At this point, it is also important to remark that the direct application of the surrogate models to the original output datasets with 3798 nodal voltages would be definitely unfeasible.

Conclusions
This paper addressed the challenging problem of constructing compact, accurate, and fast-to-evaluate surrogate models of a complex PDN, able of predicting the effect of a large set of uncertain parameters on its nodal voltages. Specifically, the benchmark three-phase IEEE 8500-node test feeder with 450 and 900 uncertain parameters, consisting in either loads or renewable distributed solar generators, has been considered as an application test case with the aim of highlighting the strength and feasibility of the advocated methodology.
The surrogate models are built from a limited number of training samples via a well-established two-step scheme. First of all, PCA is applied to remove the redundant information on the output samples, thereby leading to a compression of the training set. For the application test cases considered in this work, a PCA with a relative truncation threshold of ε = 10 −4 on the singular values allowed reducing the number of output variables (i.e., the nodal voltages) from 3798 to 23 for case 1 and to 39 for case 2, respectively, thus achieving a compression rate of~100×. Then, the compressed training datasets were used to build compact surrogate models based on either LS-SVM regression or sparse PCE.
For each of the two test cases, the performance of the surrogate models was investigated in terms of accuracy, efficiency, and convergence, by comparing the predictions of the models obtained with different training set sizes against reference results from MC simulations. Concerning the accuracy, the LS-SVM regression outperforms the sparse PCE for all the considered test cases and training set sizes, although they converge to similar RMSEs for large training sets. The LS-SVM regression is also more efficient in the model construction. However, the overall training cost was dominated by the simulation of the training responses.
The results provided in this paper demonstrate that the proposed modeling methodology provides an effective alternative to MC simulations, with an overall speed-up between 2× and 4.9× (including the cost required for the generation of the training samples). At this point, it is important to remark that the surrogate models inherently provide also a close-form parametric model, as opposed to the "blind" MC method. In summary, the modeling strategy presented in this work can be considered as a viable and robust solution for the generation of an accurate surrogate model for both the UQ and the parametric analysis of a PDN with a large number of nodes and of uncertain parameters.

Conflicts of Interest:
The authors declare no conflicts of interest.