Generalized Polynomial Chaos Expansion for Fast and Accurate Uncertainty Quantiﬁcation in Geomechanical Modelling

: Geomechanical modelling of the processes associated to the exploitation of subsurface resources, such as land subsidence or triggered/induced seismicity, is a common practice of major interest. The prediction reliability depends on different sources of uncertainty, such as the parameterization of the constitutive model characterizing the deep rock behaviour. In this study, we focus on a Sobol’-based sensitivity analysis and uncertainty reduction via assimilation of land deformations. A synthetic test case application on a deep hydrocarbon reservoir is considered, where land settlements are predicted with the aid of a 3-D Finite Element (FE) model. Data assimilation is performed via the Ensemble Smoother (ES) technique and its variation in the form of Multiple Data Assimilation (ES-MDA). However, the ES convergence is guaranteed with a large number of Monte Carlo (MC) simulations, that may be computationally infeasible in large scale and complex systems. For this reason, a surrogate model based on the generalized Polynomial Chaos Expansion (gPCE) is proposed as an approximation of the forward problem. This approach allows to efﬁciently compute the Sobol’ indices for the sensitivity analysis and greatly reduce the computational cost of the original ES and MDA formulations, also enhancing the accuracy of the overall prediction process.


Introduction
Geomechanical modelling is a scientific and engineering activity of paramount importance to evaluate the safety and predict possible environmental impacts related to the exploitation of subsurface resources. The reliability of the predictions depends on different sources of uncertainty, which are somewhat intrinsically introduced in any modelling process. Uncertainty typically affects the knowledge of the constitutive rock behaviour, the geometry of the depleted formations, the diffusion of the pressure perturbation, just to mention a few important occurrences. In this study, we focus on the reduction of uncertainty affecting the constitutive model parameters and the land subsidence prediction via assimilation of ground surface displacements.
A synthetic test case application dealing with the depletion of a deep hydrocarbon reservoir is considered. Land settlements are predicted with the aid of a 3-D Finite Element (FE) model using a one-way coupled approach [1][2][3]. The focus is on the calibration of the rock constitutive parameters that mainly control the compaction of the rock formation caused by the hydrocarbon production. Assuming the use of some well-established constitutive models, such as Mohr-Coulomb, Modified Cam-Clay, hypo-elastic, hypo-plastic or visco-elasto-plastic laws [4][5][6][7], different approaches can be used to estimate the governing parameters. If the required parameters have a measurable physical meaning, a major investigation branch concerns either laboratory tests, properly developed for characterization of geological formations [8], or in-situ observations employed to study reservoir deformation with specific measurement apparatus, for example, well-logs equipped with radioactive markers [9]. Another important approach, which is especially related to those parameters that are not directly associated to measurable quantities, deals with inverse models to infer the uncertain model parameters using either deterministic or stochastic techniques. The latter approaches are increasingly employed in geomechanics. For example, in References [10,11] the Ensemble Smoother (ES), that is, a Monte Carlo (MC)-based method, is used to estimate the parameters of a transversely isotropic and isotropic constitutive law by the joint assimilation of horizontal and vertical displacements. Later developments of the ES algorithm in the form of Multiple Data Assimilation (ES-MDA) are used in Reference [12] to constrain the reservoir compaction coefficient and the subsurface basement elastic modulus by assimilation of ascending and descending line-of-sight displacements revealed by Permanent Scatterer Interferometry (PSI). Recently, Reference [13] compares the results obtained from applying the ES and the ES-MDA to estimate the parameters of the modified Cam-Clay and the Vermeer-Neher (VN) visco-elasto-plastic models [3].
The growing interest on the ES as a data assimilation technique is mainly due to its straightforward implementation and the capability of using simultaneously the measurements collected at different times and locations in a 4-D assimilation. However, its convergence is guaranteed only with a large number of MC simulations that may be computationally unfeasible in demanding large-scale problems. Moreover, its effectiveness in constraining the model for a strongly non-linear dependence of the outcome on the uncertain parameters is theoretically questionable. This limitation can be overcome by employing the ES in a multiple assimilation fashion, with the introduction of the ES-MDA technique [14], but this implies a further significant increase of the computational cost for the overall procedure, with the ES algorithm repeatedly applied to different ensembles. It follows the need for fast model predictions that can be achieved by approximating the forward model operators with surrogate solutions, obtained at a highly reduced computational cost.
In geomechanical applications, surrogate models have been recently employed for uncertainty quantification (UQ) and sensitivity analysis (SA). A functional data analysis technique is used to perform a vertical displacements UQ on a synthetic field-scale test by setting the oedometric compressibility as uncertain parameter [15]. Polynomial chaos proxies of the linear poroelastic problem with hydromechanical coupling allows a variance-based SA on Lamé's constants, the Biot-Willis coefficient and the hydraulic mobility [16]. Another non-intrusive approach is used in sequentially coupled reservoir-geomechanical simulations to estimate the caprock safety factor in a probabilistic setting [17]. Global SA and parameter identification are carried out by means of Gaussian processes in a carbon capture and storage test case to model fault poromechanics and induced seismicity in a stochastic framework [18]. A similar approach is used in Reference [19] to investigate the onset of fault activation in deep hydrocarbon reservoirs with surrogates based on polynomial chaos expansion. Global SA are also employed to investigate geomechanical fractured reservoirs [20] and hydraulically fractured wells with reduced order models [21].
Here we present an approach where we further extend the use of the gPCE as surrogate for regional geomechanical models with the aim of land subsidence prediction due to the exploitation of subsurface resources. Our contribution is threefold: (i) to validate the gPCE surrogate model for a highly non-linear consitutive law such as the VN visco-elasto-plastic model, (ii) to perform a global SA and rank the influence of the model parameters and their combinations, and (iii) based on sensitivity results, to adapt accordingly the probabilistic framework for the ultimate goal of solving a parameter identification problem via ensemble-based Bayesian updating schemes. The gPCE surrogate is used in combination with the ES (gPCE-ES) and ES-MDA (gPCE-MDA) to solve the inverse problem based on measurements of vertical displacements , by sampling the ensemble members from the gPCE expansion. This is possible at a very low computational cost, because only direct polynomial evaluations are involved instead of running a full ensemble of forward models, provided that the gPCE approximation is sufficiently representative of the actual geomechanical outcome.
The paper is organized as follows. Section 2 includes the description of the governing equations of the forward simulation model, their approximation via gPCE expansion, the derivation of the Sobol' indices and the Bayesian-based updating schemes (ES and ES-MDA). Section 3 presents the numerical test case focusing on the variability range of the significant uncertain parameters. The results are provided in Section 4, where uncertainty propagation via gPCE approximation is first presented, with further assimilation of surface displacements to update the model outcome and provide the best estimate of the model parameters. The result of the gPCE-ES updating is compared with the solution from the classical ES formulation and with the gPCE-MDA approach, showing the computational efficiency and algorithmic effectiveness of the proposed surrogate model for uncertainty quantification and numerical prediction purposes. Finally, some conclusive remarks close the presentation.

Geomechanical Governing Equations and Model Parameterization
The geomechanical model of a producing reservoir computes the stress and displacement fields generated by spatial and temporal pore pressure changes due to fluid withdrawal. The analysis is carried out by solving numerically the governing partial differential equations of 3-D poro-mechanics [41,42] with the aid of the FE method. A one-way coupled approach is used, as it is usually accepted in petroleum engineering. This implies that the pore pressure change in space and time is first computed by a dynamic flow model, and then used as an external source of strength in the equilibrium equations.
Let Ω ⊂ R 3 denote a 3-D domain in the x-y-z reference system with boundary Γ. Neumann and Dirichlet boundary conditions are imposed on boundaries Γ N and Γ D , respectively, with Γ = Γ N ∪ Γ D and Γ N ∩ Γ D = ∅. The equilibrium equations governing the consolidation of a porous medium at some instant t ∈ I =]0, T[, with T the simulation period read: where σ(x, t) is the effective stress tensor, α(x) the Biot coefficient, p(x, t) the pore pressure change, b(x, t) the external body forces, and n(x) the outer normal to Γ N . The system of Equations (1) is solved in terms of displacements u(x, t), subject to the prescribed displacementsû(x, t) along Γ D and the force per unit surface f(x, t) along Γ N . To close the system, a non-linear constitutive relationship is introduced to link the effective stress to the strain tensor (x, t): where is related to u according to the small strain hypothesis: In this study, the VN [43] visco-elasto-plastic constitutive law is used. Equation (2) is written in differential form as:σ where C is the standard rank-4 elasticity tensor,γ is the plastic multiplier rate and p c is the plastic potential. The elasticity tensor C depends on Poisson's ratio ν and a stress-dependent Young's modulus E: with p the volumetric stress (first stress invariant), λ * the modified compression index, and ρ κ the ratio between the modified compression and swelling index. The plastic multiplier rate is defined as: where ρ µ is the ratio between the modified compression and creep index, τ * is a reference time value, p v is the volumetric plastic strain, and p c is set equal to a lumped scalar representation of the stress state in the plane of the first and second stress invariants (p, q): In this last equation, c is the material cohesion, ϕ the friction angle, and M the slope of the Critical State Line in the (p, q) plane: with K 0 the ratio between the vertical and horizontal stress in normal consolidation state, also known as confinement factor. Finally, in Equation (6) R is the geotechnical initial overconsolidation ratio defined as p c,r,0 /p c,0 , where p c,0 is a representation of the stress state at initial conditions and p c,r,0 is a parameter related to the plastic strain developed by the material before loading. For details on the numerical implementation of Equations (4)- (8) in the forward model, the reader is referred to Reference [3]. The VN visco-elasto-plastic model requires a significant number of independent material parameters, which may vary according to the selected implementation. In particular, for the formulation presented herein we need: ν, λ * , ρ κ , ρ µ , τ * , R, c, ϕ, and K 0 . The present analysis focuses on the parameters that mostly characterize the VN constitutive law with respect to more traditional ones, specifically, (1) λ * , (2) ρ κ , (3) ρ µ , and (4) R. Parameter λ * represents the slope of the normal consolidation profile of volumetric strain vs axial stress in a logarithmic plot, while ρ κ allows to obtain the slope of the same profile in unloading/reloading conditions. The parameter ρ µ depends on the slope of the volumetric strain vs. time profile, in natural logarithmic scale. Finally, the ratio R provides the over-consolidation degree of the porous rock and describes the size of the implicit yield surface defined by the VN model in the plane of the first two stress invariants (p, q). This set of parameters is also selected because of the availability of literature estimates and the independence on the employed unity of measure. All the remaining parameters have been fixed to standard values generally accepted for a wide range of materials (ν = 0.3, τ * = 1 day, c = 1 MPa, ϕ = 30 o , and K 0 = ν/(1 − ν) = 0.43).

gPCE Surrogate Model
Running the forward geomechanical model multiple times for large and complex systems can be a very demanding task, both in terms of CPU and memory requirements. A gPCE approach [22,44] is therefore proposed to approximate the outcome of the deterministic simulator and then compute the propagation of the input uncertainty through the forward model. The main idea of gPCE surrogate models is based on using orthogonal polynomial approximations of the random input to project the stochastic model output. In the following, we provide the basic mathematical framework as derived in Reference [44].
Let us consider the random vector U = (U 1 , . . . , U ) ∈ R written as a function of the random vector Z of mutually independent random variables Z = (Z 1 , . . . , Z n ) and distribution function F Z (z 1 , . . . , z n ) = P (Z 1 ≤ z 1 , . . . , Z n ≤ z n ), where we are considering a stochastic process in the probability space (Ω,F ,P) with space of events Ω, σ-algebra F and probability measure P on F , see for example Reference [27]. As usual, the independence assumption implies Since any random variable may be represented as a series of polynomials in uncorrelated and independent Gaussian variables [22] and, in its generalized extension, in non-Gaussian measures, gPCE basis functions of a univariate random variable Z i are defined as the polynomials {φ k (Z i )} N k=0 of Nth-degree satisfying the orthogonality conditions the normalization factors, δ s,r the Kronecker delta function and Σ i is the support of Z i . In the multivariate case, the gPCE basis functions Φ α (Z) of degree up to N are products of the univariate orthogonal polynomials: where α = (α 1 , ..., α n ) ∈ N n 0 is a multi-index with |α| = α 1 + · · · + α n . The multivariate basis functions are orthogonal polynomials in L 2 dF z , that is, the space of all mean-square integrable functions of Z with respect to the inner product based on the measure dF Z : where Σ is defined by Σ = Σ 1 × Σ 2 · · · × Σ n . As a consequence, the class of orthogonal polynomials is selected according to the measure F Z i . The polynomials orthogonal for the standard normal distribution are the Hermite polynomials, which form an ideal basis for the output stochastic domain [27].
In the gPCE context, we aim at finding an approximationŨ N (Z) of the random function U(Z) ∈ R in the N-th degree polynomial space generated by the basis functions Φ α (Z): where c α ∈ R are the coefficients of the expansion. For U(Z) ∈ L 2 dF z , the coefficients c α can be computed by definingŨ N as the orthogonal projection of U onto the polynomial space Z = span{Φ α }. By prescribing the orthogonality condition U −Ũ N ⊥ span{Φ α }: the coefficients c α simply read that is, they can be numerically computed as an integral of the product of Φ α and U. The expansion terms of Equation (14) guarantees the optimal approximation of U in the sense of the norm defined in L 2 dF Z . The coefficients c α of the approximating gPCE are numerically computed by a non-intrusive approach, where the forward model providing U(Z) is used in a black-box fashion. We use a pseudo-spectral projection, with the integral term approximated by a high-dimensional quadrature rule: with z j and w(z j ) the n q integration nodes and weights, respectively. Since Φ α is at most of degree N, the integrand function has at most degree 2N. In the univariate case, this requires the use of a (n q,1 = N + 1)-point Gaussian quadrature rule, while in the multivariate case with n random variables the number of points grows up to n q = (N + 1) n . Using this approximation, the surrogate model needs the evaluation of U through the the numerical solver S of the forward model at the n q integration points z j . In our application, we denote with U the vector of state variables, that is, displacements and stresses on the points and times of interest, provided by the solution of problem (1) via a FE numerical solver S. The random variables Z are the material parameters defined by the visco-elasto-plastic VN constitutive law, that is, λ * , ρ κ , ρ µ , and R, with n at most equal to 4.

Sobol' Indices in the gPCE Framework
The objective of a preliminary SA is to quantify the influence of the uncertain input parameters and their combinations on the model output, so as to retain only the most significant contributions. This is important to (i) focus the design of laboratory/field experiments on the collection of measurements of the most important parameters, and (ii) reduce the number of calibration parameters for an easier solution of the inverse problem, especially in real-world applications. In this work, we propose the use of the variance-based SA according to the framework developed in Reference [45]. A variance decomposition of the model output into partial variances of the model input (and combination of more than one parameter), allows to estimate the so-called Sobol' indices, which provide a measure of the relative importance of each uncertain parameter to the model outcome. In the sequel, we provide a brief description of the ideas lying behind Sobol' indices. For completeness, the reader is referred to the work of Reference [45] or to the application developed in Reference [19].
For the sake of simplicity, let us consider the model outcome U, that is, the quantity of interest, as a one-dimensional variable function f of the n-variate random vector Z = (Z 1 , Z 2 , . . . , Z n ) of mutually independent components defined over an n-dimensional hypercube Σ n . Assuming f to be a square-integrable function, the functional decomposition of U = f (Z) reads where f 0 is a constant representing the mean value of f , and f i , f i,j , . . . , f 1,2,...,n are the uncorrelated random effects associated to the factors in their indices, for example, f i = f i (Z i ) are the main effects due to the factors Z i and f i,j = f i,j (Z i , Z j ) are the effects related to interactions between the factors Z i and Z j with j > i. It has been proved in Reference [45] that a unique expansion of Equation (16) exists for any function f (Z) integrable in Σ n under the hypothesis of zero mean of all expansion terms with respect to each variable. The expansion (16) for the quantity of interest U is used to derive the associated variance V(U). In fact, squaring Equation (16) and using the orthogonality condition holding for the expansion terms [45], it can be demonstrated that V(U) reads: where V i , V j , . . . , V 1,2,...,n are the partial variances of f i , f j , . . . , f 1,2,...,n , respectively. Generally, the partial variance V i 1 ,...,i s in (17) reads: which can be written in terms of conditional expectations as: . . .
Equations (19) provide practical relationships for the computation of partial variances for any indices combination. The Sobol' first and higher-order indices are defined as: where S i measures the relative importance of a single factor Z i on the total model variance and the higher-order indices S 1,...,s represent a measure of the combined model sensitivity to the group of factors Z 1 , . . . , Z s . The total effects S T,i are also computed to quantify the contribution of the i-th factor to the total output variation as [46]: where V(E(U|Z ∼i ))/V(U) include first and higher-order interactions of all factors except Z i . Sobol' indices can be easily obtained by MC sampling. However, the computational cost of the evaluations can be prohibitive for large scale problems and the use of gPCE approximation is straightforward for the purpose, being the Sobol' indices readily evaluated from the coefficients of the gPCE [33,47,48]. The idea is to replace the functional decomposition of Equation (16) with the Nth-degree polynomial expansion of Equation (12) by defining [48]: where the multi-index β = (β 1 , ..., β s ) ∈ N s 0 satisfies |β| = β 1 + · · · + β s and Φ β (Z i 1 , . . . , Z i s ) are the s-variate Nth-degree gPC basis functions.

gPCE-ES and gPCE-MDA
In this section, we provide the mathematical framework for the Bayesian inverse modelling approach based on gPCE. We aim at estimating the state variable and model parameter distributions where few information is available on such inputs. This is pursued by establishing a Bayesian inverse modelling approach with the model outcome constrained by spatio-temporal observations of measurable quantities.
Let us assume that d ∈ R m is the vector of noisy empirical measurements of the true observable vector d t ∈ R m : where d t = H(u t ) with H a measurement operator mapping the true state variables u t ∈ R into the true observable vector. The vector ∈ R m is the additive observational error with probability distribution function ρ( ) = ∏ m i=1 ρ(e i ), that is, assuming e i with i = 1, ..., m mutually independent and identically distributed variables. Of course, u t is not available because of the uncertainties of the mathematical modelling process. Through inverse modelling we aim at providing the best estimate of the "true" state/parameters variables conditioned by (i) available mathematical models, (ii) prior parameters knowledge, and (iii) measurement data. Here, we employ the ES technique and its iterative version in form of ES-MDA [14,49,50]. The linear unbiased smoother estimate can be written in a matrix form as: where X f = (U f , P p ) T ∈ R ( +n)×n MC is the augmented matrix of the forecast ensemble states u f and prior parameters p p , with U f obtained as the collection of the outcome of n MC MC simulations and P p the corresponding prior parameters for every simulations assuming no model errors. The augmented matrix X u = (U u , P u ) T ∈ R ( +n)×n MC is the updated solution by correction of X f with assimilated measurements. Matrix D ∈ R m×n MC contains the perturbed measurements matrix with ∼ N(0, R) and R ∈ R m×m the error covariance matrix, while H is a linearization of the measurement operator H. The innovation (D − HX f ) is weighted over the so-called Kalman gain K calculated as: where C f is the covariance matrix of the forecast augmented state X f . In the form of gPCE-ES, the MC simulation of the forward operator are approximated by straightforward polynomial evaluations to cast the matrix X f by using the surrogate solution provided by Equation (12). Thus, a large number of MC samples can be adopted for the forecast ensemble and reduce sampling errors [27]. It is sometimes difficult to achieve a proper constraining of the model through the available data with the single-update of ES, especially when the forward model outcome have a strongly non-linear dependence on the uncertain parameters [14,51]. For this reason, Reference [50] introduced the MDA technique, that was later combined with ES [14], in order to improve the results of a single assimilation for history-matching problems. The ES-MDA is widely recognized as a tool to refine the conditioning of the model especially when non-linearities are significant, even if there is no rigorous proof about its convergence [52]. The idea behind ES-MDA is to repeat the assimilation of the same measurement data for n MDA times. In order to avoid an over confidence in the available data, the covariance matrix R is inflated by a coefficient α MDA ≥ 1. The choice of the coefficient α MDA for every assimilation step is arbitrary as long as the sum of their inverses is unitary: The condition (27) ensures the equivalence between ES and ES-MDA for Gaussian-linear problems [14].

Numerical Results
The sequence of steps resulting from the mathematical framework defined in the previous section is summarized in Algorithm 1. After the initialization, the gPCE surrogate model is built taking into account only the most significant parameters, whose selection is based on the computation of the total Sobol' index S T . We retain in the final model only the parameters with S T larger than a user-specified threshold ε S , which can be conveniently adjusted so as to provide a good balance between model representativeness and cost. Whenever a parameter is excluded from the uncertainty analysis, the gPCE model has to be recomputed. The most expensive task, that is, the n q full model S runs for a variable number of sweeps, is carried out at this stage, which must be therefore carefully designed. Once the surrogate model is available, the uncertainty quantification step can be completed by means of the classical ES or ES-MDA algorithms, where a very large number of MC realizations can be now computed at a low computational cost. for j = 1, ..., n q do 13: Set the nodes z j ∈ Σ and the weights w(z j ) 14: 16: end for 17: Set s as the total number of Sobol' indices 18: for i = 1, . . . , s do 19: Compute S i = V i /V 20: end for 21: for k = 1, ..., n do 22: Compute S T,k

Model Set-Up and Random Parameters
The proposed approach is tested in a realistic geomechanical analysis concerning the development of a production program from a synthetic hydrocarbon reservoir. For the sake of realism, the model is built by introducing some of the typical features of the off-shore reservoirs buried in the Northern Adriatic basin, Italy [9,10,53].
The model domain covers an area of 50 km × 50 km and extends down to 5 km depth. It is discretized into the 3-D FE mesh shown in Figure 1a consisting of 71,734 nodes and 410,030 tetrahedrons. The rock in the reservoir and the hydraulically connected aquifer has a strongly non-linear visco-elasto-plastic behaviour described by the VN model, while the overburden and the underburden, that is the portions of the domain that are not filled by color in Figure 1a, are linear elastic. Homogeneous essential conditions apply on the outer and the bottom boundaries, with the top of the domain modeled as a traction-free surface. We assume the hydrocarbon production program from the reservoir to last for 3 years, then start again after a 2-year stop and be completed after 7 years, with the monitoring continuing for 3 additional years after the production stop. The average pressure variation occurring within the reservoir in time is shown in Figure 1b. The pressure distribution in space, also within the connected aquifer, is predicted by means of a dynamic reservoir flow model. The reader can refer to Reference [13] for more details. Model uncertainty is connected to the four parameters of the material constitutive law mentioned in Section 2.1. Based on physical considerations and a possible prior knowledge, if available, a range of variability is associated to such parameters. In particular, we set: The range for λ * has been chosen recalling the physical relationship between λ * and the vertical compressibility c m , which has been derived in References [9,53] for the Northern Adriatic basin. For the other parameters typical values reported in the literature, for example in Reference [3], have been taken into account. Hence, we initialize the gPCE algorithm by defining a vector of n = 4 random parameters p = (λ * , ρ κ , ρ µ , R). Note that since we employ uniform distributions for the random variables, the Legendre polynomials are used as gPCE basis functions. All the other modelling assumptions, including initial and boundary conditions, geometry and pressure distribution, are considered as deterministic.
As quantities of interest, collected in the vector U given as output from the full forward geomechanical model S, we consider the time series of the vertical displacement at a single-point of the top surface showed in Figure 2a, corresponding to land subsidence in time. Measurements of this quantity are supposed to be collected every three months for ten years, for example, by a GPS station. The true observations are collected in the vector d t . They are synthetically obtained by running the forward model with what is assumed to be the true material parameter set (continuous lines in Figure 2b). Then, the observation vector d is obtained by adding an error sampled from a zero-mean Gaussian noise with standard deviation σ = 0.01 m (dots and squares in Figure 2b).

gPCE Surrogate Model
Since we use uniform distributions for the input parameters, the gPCE basis functions are multivariate Legendre polynomials, which are orthogonal with respect to the uniform measure. The first objective of this analysis is to test the quality of gPCE as a surrogate of the exact outcome of the forward model. We provide results for different degrees of the polynomial truncation, N =1,2,3. A full tensor approach is used for N = 1 and N = 2, leading to a total number n q of forward model runs equal to 16 and 81, respectively. Since with N = 3 we would have needed n q = 256 full model runs, a sparse grid approach based on Smolyak's coarse tensorization [54] is used, thus reducing n q to 137. It should be noticed that the use of such sparse grid numerical integration might lead to unacceptable errors in the expansion coefficients with higher order polynomials [55]. This issue is not encountered in the present application, however alternative accurate approaches can be also employed if needed, such as those advanced in References [55,56].
A quantitative evaluation of the fitting quality of the gPCE surrogate is provided by employing the leave-one-out (LOO) cross-validation [57,58], where an estimate of the mean-square error of the residuals between the full model and surrogate solution is given by the SSE LOO defined as: where u z,i is the full model result at a certain time t and parameter combination i and u ∼i z,i is the surrogate solution at the same t and i with the gPCE built without the point denoted by i. In practice, the n q residuals can be obtained without building n q different gPCE but using the original gPCE constructed with the whole set of collocation points [57]. The coefficient Q 2 , similar to the coefficient of determination R 2 , can be defined as: where SST is the sum of the squared deviations of u z,i from their mean values µ u at the instant t.
Values of Q 2 close to 1.0 indicates a good match between the model outcome and the gPCE surrogate approximation. Table 1 reports the mean and variance of vertical displacements at the times t =1, 5, and 10 years, along with the Q 2 value. Q 2 progressively increases to the upper limit of 1 for increasing N, proving the convergence of both the polynomial expansion and the high-dimensional quadrature formula. This holds true for t > 5 years with a gPCE-surrogate solution less accurate at the onset of the simulation. Table 1. Mean µ u , variance σ 2 u and coefficient Q 2 with the generalized Polynomial Chaos Expansion (gPCE) approximation up to degree N = 1, N = 2 and N = 3 (sparse grid).

Sensitivity Analysis
Before proceeding with the assimilation, it is important to evaluate the actual influence of the uncertain parameters on the model output. We compute the Sobol' indices using the gPCE-surrogate solution of the model output. The results are presented in Figure 3. First-order indices clearly show the higher impact on the total variance with a negligible influence of the factors interactions. In particular, parameters λ * and R appear to have the major impact on the solution at several times, except for the first two years where ρ κ and ρ µ have a non-negligible effect. However, the significance of the quantity of interest, that is, vertical displacements, is quite limited at the beginning of the production period, because both the pressure variation and the geomechanical answer of the system are still very small and possibly influenced by the measurement errors. Thus, we limited our sensitivity analysis to the time interval between 3 and 10 years. In this temporal window, based on the total Sobol' indices S T,i (Figure 3c,d) and setting ε S = 0.1, we can remove from the parameter space the ratios ρ κ and ρ µ . However, it is important to point out that further analysis may be needed in order to investigate the model sensitivity based on different metrics, for example, on the expected value, skewness and kurtosis [59,60]. This may be helpful to better understand the role of each parameter, thus ensuring its exclusion from the parameter space is negligible.
The new parameter space is defined by p = (λ * , R) with n = 2. This allows to reduce significantly the computational effort and, for the further analysis, retain the gPCE order to N = 3 with a total number of high-dimensional quadrature nodes n q = 16. The Sobol' indices for this new parameter space have the same behaviour as the one shown in Figure 3.

Bayesian Update
In this section, we compare the results of state/parameter update by the assimilation of surface displacements using the classical ES approach and its variation in the form of gPCE-ES. As already mentioned before, the synthetic "true" observations d t are collected from a geomechanical model run with the "true" parameter set and perturbed by a zero-mean Gaussian noise (Figure 2b). Different "true" parameter sets are investigated to test the robustness of the proposed approach. The gPCE-ES technology is then applied within the iterative ES-MDA framework using either constant and decreasing factors α MDA to inflate the covariance matrix of measurement error.

ES and gPCE-ES
We assimilate a time series of vertical displacements collected at a point on the top surface located approximately at the center of the reservoir (Figure 2a). The synthetic observations are given by the geomechanical model run with what we assume to be the "true" parameter set (λ * = 0.011, R = 1.25). The purpose is to constrain the forecast response of both state variables, for example, surface displacements and material parameters. Figure 4 shows the prior and updated ensembles of the uncertain parameters, λ * (left sub-panels) and R (right sub-panels), obtained by progressively increasing the number of MC realizations. The assimilation is carried out by both the standard ES with the full forward model (red dots) and the proposed gPCE-ES approach (blue dots). As expected, the mean of the updated ensembles tends to approach the "true" value for both parameters, though with possible oscillations far from convergence, for example at n MC = 50. We provide the results for n MC = (16,25,41,50,75,100) in order to analyze the cases when the number of realizations is either smaller than, equal to, or larger than, the number of observations m = 41. Since the total cost of an ES application can be regarded as roughly equal to the number of forward model runs, we can measure the computational cost, η, just by counting the number of MC simulations. Therefore, the computational cost η g associated to gPCE-ES is constant and equal to 16, that is, the number n q of forward model runs required to compute the surrogate gPCE model. By distinction, with the standard ES the computational cost, η f , increases with n MC . The uncertainty reduction is significantly improved if n MC ≥ m. In case n MC < m we retain 99% of the non-zero singular values computed to invert the matrix in Equation (26). Generally speaking, the parameter R better approaches the "true" value as compared to λ * .  Figures 5 and 6 show the update of the state variable u z , that is, the vertical displacement in time at the monitoring point, by increasing n MC with ES and gPCE-ES, respectively. The observations are generally well-captured by both approaches, with no significant differences between the methods. Hence, the proposed surrogate model appears to capture very well the overall behaviour of the forward model. To measure the quality of the approximation provided by gPCE-ES, we define the metrics AE (average error) and AES (average ensemble spread) defined as: where ψ generally stands for state variables or parameters, and µ ψ its mean. The metrics are computed both for prior and updated ensembles of states and parameters, providing a quantitative measure of the assimilation quality. For the state u z , time-averaged values are used to compute the metrics. AE and AES reductions (  Based on these outcomes, we consider now only the gPCE-ES framework. Figure 7a shows the gPCE-ES convergence with n MC . We recall that the computational cost is roughly constant, η g = 16. The performance index AE u decreases with n MC for both parameters and stabilizes after ∼100 n MC . If a different "true" parameter set is used, for example, (λ * = 0.005, R = 1.5), we obtain the behaviour shown in Figure 7b. In this case, the mean converges to the "true" value at a much faster rate for λ * , while R remains poorly constrained. The "true" parameter sets are chosen such that they provide very different "true" reference simulations, as shown in Figure 2b. (f) n MC = 100 Figure 6. The same as Figure 5 with the forecast ensembles, in grey, obtained through the gPC expansion.

gPCE-MDA
The ES-MDA approach is based on a multiple assimilation process to provide a better model constraint especially on strongly non-linear models. Despite the potential improvement with respect to the standard ES method, the MDA approach is rarely used in practice mainly because of the large computational cost associated to the generation of several ensembles. The introduction of an inexpensive surrogate model in place of the full forward model can make also the MDA technique more affordable in real-world problems.
We consider two test cases differing in the selection of the inflation factors α MDA : (i) constant α MDA = n MDA , and (ii) decreasing α MDA = (18.66, 14,8,4,2). The number of successive assimilations is fixed a-priori and equal to n MDA = 5. Each assimilation step is performed by using ensembles with n MC = 3000 realizations. This ensemble size allows to achieve the convergence also with the ES technique, as shown in Figure 7. Figure 8 shows the mean and the standard deviation of the updated parameter ensembles in the five successive assimilation steps considering the "true" parameter set p t = (0.005, 1.5). In this case, the differences with respect to a single ES assimilation (blue bar in Figure 8) are evident, that is, with multiple assimilation steps the solution appears to better approach the truth and with a reduced variability spread. This behaviour is clearly noticeable for the parameter R (Figure 8b). Similar to ES, the MDA effectiveness seems to vary when the set of observations changes. For example, with the choice of p t = (λ * = 0.011, R = 1.25) the improvement of multiple assimilation steps with respect to a single ES update is not significant.

Discussion and Conclusions
In this work, we develop a gPCE-based framework to quantify and reduce the prediction uncertainty in geomechanical problems. The idea is to exploit the low computational cost of a gPCE-based surrogate model to-(i) perform a global sensitivity analysis and capture the material parameters mostly affecting the expected prediction, and (ii) update the variation ranges of both the state variables of interest and the governing parameters via data assimilation. We use an ensemble-based approach, such as the ES and its variation in the MDA form. Due to the high computational cost of the forward model runs required by the ensemble approaches, the proposed gPCE-ES and gPCE-MDA algorithms, if able to effectively approximate the full model behaviour, can allow for the use of large ensembles and successive assimilations at an affordable cost. The proposed approach is tested in a large-scale geomechanical model dealing with the prediction of land subsidence due to a hydrocarbon production program from a deep reservoir. To describe the constitutive behaviour of the deep rocks, we use a strongly non-linear visco-elasto-plastic model based on the VN law [3,43]. We show that high-accuracy surrogate models can be achieved via gPCE. Through the computation of the Sobol' indices, in the specific application considered herein we demonstrate that the material parameters mainly impacting on the outcome of interest, that is, land subsidence in time, are those controlling the rock deformability in loading conditions and the size of the yield surface, λ * and R. The initial variability ranges of such parameters are progressively reduced through data assimilation considering: (i) standard ES, (ii) gPCE-ES, and (iii) gPCE-MDA. The gPCE-ES provides highly accurate results comparable to those obtained with the native ES approach but at a significantly lower computational cost. It is worth recalling that the high efficiency of gPCE is mostly due to the low dimension of the parameter space. In fact, the gPCE approach suffers from the so-called "curse of dimensionality" due to the exponential increase of the cost with the number of independent stochastic variables.
The gPCE-ES is used to test the ES convergence rate at an increasing size of the ensemble. By fast polynomial evaluation we can analyse ensembles up to 20,000 realizations, which are not affordable without a surrogate model. In general, the parameter mean approaches the "true" solution with increasing MC realizations. Figure 9 shows that ensemble sizes greater than ∼2000 realizations do not provide any significant improvement on the update of the state variable, that is, the mean-ensemble value of the surface displacements coincides with the observation vector. This also implies a limitation to the best match of the parameter. We analyse the gPCE-ES convergence rate by using two parameter sets to define the "true" reference observations, with values located in the middle and on the boundary of the expected variability ranges. Different results are obtained, with a poor constraint for R when the "true" value is closer to the upper bound of the variability range. Then, we test again these sets with the gPCE-MDA approach by repeatedly assimilating the same observations a number of times with an inflated measurement error covariance matrix. We use both constant and decreasing values for the inflating factors, without any significant difference between the two approaches. The quality of the results for constraining R is significantly improved. The gPCE-MDA is obtained at the cost of computing the gPCE coefficients only, and by replacing the forward model runs required at any MDA step with fast polynomial evaluations. We demonstrate that MDA may outperform the standard ES and this option can become particularly attractive if combined with the gPCE approximation, which reduces considerably the required computational effort.
Our results show different parameter-constrain capabilities of the proposed approach depending on the choice of the reference "true" point in the parameter space. Figure 10 illustrates this behaviour in the space defined by parameters λ * and R. We randomly sampled 500 "true" reference parameter sets and perform the data assimilation by means of the gPCE-ES with n MC = 3000 realizations and constant noise standard error. Figures 10a,b show the error e = |µ u − p t | between the mean updated value and the true value for parameters λ * and R, respectively. Figure 10c shows the maximum vertical displacements assimilated at final simulation time. Clearly the u z,max increases by increasing λ * and diminishing R. Generally, λ * (Figure 10a) presents higher errors with 1.35 < R < 1.45 and the parameter seems to better approach the true reference value with increasing values of u z,max . The constrain on parameter R is more complex to analyse. A poor match is obtained at the four corners of the parameter space and minimum errors are obtained for u z,max −0.6 m and u z,max −0.3 m. Based on these results, further analyses would be needed to investigate the role of the parameter interactions on the model outcome linked to the influence of the varying time-behaviour parameter importance, with model non-linearity and measurements noise providing for additional crucial factors affecting the inverse problem solution. (c) Figure 10. Error e = |µ u − p t | between the mean updated value and the true value for parameters (a) λ * and (b) R at 500 randomly-sampled 'true' reference simulations. Sub-panel (c) shows the maximum subsidence assimilated at each assimilation.
Funding: This research received no external funding.

Acknowledgments:
The Authors wish to thank the University of Padova Strategic Research Infrastructure Grant 2017 "CAPRI: Calcolo ad Alte Prestazioni per la Ricerca e l'Innovazione" for the availability of the computational resources used in this work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: