
Adaptivity in Bayesian Inverse Finite Element Problems: Learning and Simultaneous Control of Discretisation and Sampling Errors

1 School of Engineering, Cardiff University, The Parade, Cardiff CF24 3AA, UK
2 MINES ParisTech, PSL University, Centre des Matériaux, BP 87, 91003 Evry, France
3 Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
* Author to whom correspondence should be addressed.
Materials 2019, 12(4), 642; https://doi.org/10.3390/ma12040642
Submission received: 23 December 2018 / Revised: 2 February 2019 / Accepted: 5 February 2019 / Published: 20 February 2019
(This article belongs to the Special Issue Randomness and Uncertainty)

Abstract

The local size of computational grids used in partial differential equation (PDE)-based probabilistic inverse problems can have a tremendous impact on the numerical results. As a consequence, numerical model identification procedures used in structural or material engineering may yield erroneous, mesh-dependent results. In this work, we attempt to connect the field of adaptive methods for deterministic and forward probabilistic finite-element (FE) simulations and the field of FE-based Bayesian inference. In particular, our target setting is that of exact inference, whereby complex posterior distributions are to be sampled using advanced Markov Chain Monte Carlo (MCMC) algorithms. Our proposal is for the mesh refinement to be performed in a goal-oriented manner. We assume that we are interested in a finite subset of quantities of interest (QoI), such as a combination of latent uncertain parameters and/or quantities to be drawn from the posterior predictive distribution. Next, we evaluate the quality of an approximate inversion with respect to these quantities. This is done by running two chains in parallel: (i) the approximate chain and (ii) an enhanced chain whereby the approximate likelihood function is corrected using an efficient deterministic estimate of the error introduced by the spatial discretisation of the PDE of interest. One particularly interesting feature of the proposed approach is that no user-defined tolerance is required for the quality of the QoIs, as opposed to the deterministic error estimation setting. This is because our trust in the model, and therefore a good measure of our requirement in terms of accuracy, is fully encoded in the prior. We merely need to ensure that the finite element approximation does not impact the posterior distributions of the QoIs by a prohibitively large amount. We will also propose a technique to control the error introduced by the MCMC sampler, and demonstrate the validity of the combined mesh and algorithmic quality control strategy.

1. Introduction

The Bayesian statistical framework has been used extensively in the problem of system identification [1] or model updating based on experimental test data [2,3]. The motivation for using inverse problems to learn or calibrate model or latent parameters, including model error terms, lies in the fact that the underlying parametrised computational models are uncertain, or erroneous [4]. The experimental data, when assimilated into the model using the Bayesian inference framework, is expected to provide joint estimates of the model parameters conditional on the data and to compensate for uncertainty/bias in the model predictions. However, operating only in the field of model parameters is problematic for a number of reasons. Firstly, the underlying computational models for any practical application are expensive. As a result, working with a very high-resolution numerical model at every stage is prohibitively expensive. Secondly, when using surrogate models in the parameter space in conjunction with a simulator, adaptive enrichment of the response surface does not take adaptive mesh refinement within its purview. This is not optimal, since ensuring that the model error in the energy norm is bounded over the parameter space requires a uniformly high resolution and increases the associated computational overhead. Lastly, advanced adaptive mesh refinement techniques are rarely used in conjunction with parametric learning using Bayesian inference. There is significant room for improvement in this regard, since a simultaneous control of both statistical and discretisation errors would lead to substantially improved predictive numerical models, both in terms of accuracy and computational efficiency.
At the root of our methodology is the estimation of errors due to the finite element approximation of the partial differential equation (PDE) of interest. Classically, the spatial resolution of finite element models can be adaptively refined (also known as local h-refinement) based on a posteriori error estimation techniques [5,6,7], combined with re-meshing strategies. Within this field, methods focussing on errors estimated in terms of specific quantities of interest (QoI), rather than the classical energy norm, constitute the goal-oriented adaptivity scheme [8,9,10,11,12], which is of particular interest in the present study. Numerical studies have shown that these methods give better convergence in the local features of the solution compared to traditional approaches.
Integrating model reduction techniques with finite element model updating has received some attention in recent years [13], where the motivation is to use the Bayesian model updating framework with an adaptive scheme for enriching the surrogate response surface. Multistage Bayesian inverse problems are quite important in this respect [14,15] and have important applications for the system identification of vibrating systems. The prediction error is an important parameter to be calibrated in such cases, as the authors point out. However, improving model predictions solely by obtaining the posterior probabilistic parameter estimates, or by adaptively enriching the response surface in the parameter space, without considering a simultaneous enhancement of the resolution of the numerical simulator, would be unsatisfactory from the standpoints of both computational accuracy and efficiency.
The forward problem of uncertainty propagation has been investigated extensively for the solution of stochastically parametrised partial differential equations. Approaches range from efficient stochastic Galerkin methods using polynomial chaos basis functions [16,17,18,19], stochastic collocation techniques [20,21] and Monte-Carlo sampling-based methods (and their various improvements) [22,23,24] to other deterministic sampling methods [25,26,27]. The main challenge is to obtain a good approximation of the lower-order statistical moments of the state vector or of specific quantities of interest. On the other hand, the resolution of stochastic inverse problems has become a very active topic in engineering and mathematical research (see e.g., [28]). Scalable Bayesian inversion algorithms for large-scale problems have been investigated [29], as has the application of Bayesian inversion to probabilistic robust optimisation under uncertainty [30]. The use of adaptive sparse-grid surrogates unified with Bayesian inversion for posterior density estimates of model or design parameters has also been studied [3,31,32]. This research focuses mostly on the definition of surrogates of the numerical model and on adaptive methods to control the statistical sampling error in the definition of the response surface. Lately, there has been some research [33] that defines adaptivity as the local enrichment of the surrogate model and uses error estimation to bound this error, with no mention of spatial discretisation. However, coupling engineering uncertainty quantification (UQ) with an adaptive scheme for goal-oriented finite element model refinement remains a sparsely studied domain and presents significant challenges. This is true both from the problem formulation perspective, owing to the choice of appropriate candidate estimates based on which adaptive model enrichment can be performed, and in terms of incorporating such estimates into the general formulation of Bayesian inversion.
The main focus of the paper is to develop a robust methodology for the simultaneous control of errors from multiple sources—the goal-oriented finite element error and the uncertainty-driven statistical error—in a Bayesian framework for the identification of system parameters conditional on data (experimental or otherwise). The novel algorithmic approach proposed in this paper consists of running two Markov Chain Monte Carlo algorithms simultaneously to sample the posterior densities of the quantities of interest (component-wise MCMC). The first chain utilises the current finite element model, whilst the second chain runs with a corrected likelihood function that takes into account the discretisation error. The latter quantity may be constructed by making use of a posteriori finite element error estimates available in the literature. At any time during the sampling process, two empirical densities are available and may be compared to evaluate the effect of the discretisation error onto the posterior densities. The second building block of our algorithmic methodology allows us to determine whether enough samples have been drawn by the component-wise MCMC. Using multiple parallel chains combined with bootstrap-based estimates of sampling errors, we automatically stop the MCMC algorithms when either (i) the required level of accuracy is achieved or (ii) we have generated enough statistical confidence in the fact that the discretisation error is too large for our purpose, and therefore that mesh refinement is necessary. While any available estimate of the discretisation error may be used within the general strategy outlined above (a goal-oriented residual or recovery-based error estimate, for instance [8,9,11,34]), we choose instead to construct this estimate via a dedicated machine learning approach. This interesting feature, inspired by previous work in data-driven error modelling [35,36,37,38], will be briefly outlined in the paper.
The paper is organised as follows. Section 2 introduces the Bayesian inverse problem based on a finite element model of a parametrised vibrating structure; this section also includes a discussion of the discretisation error. Some numerical examples are presented in Section 3, which aims to demonstrate the convergence of the joint posterior distributions of the model parameters through successive stages of mesh refinement. Section 4 discusses the total error resulting from the MCMC algorithm used for sampling from posterior distributions, as a combination of statistical and discretisation errors. Section 5 gives the methodology for the robust, simultaneous control of all error sources within the adaptive inverse problem solver, using a component-wise MCMC algorithm in conjunction with a bootstrap-aggregated regression model for the model parameters. Numerical examples are presented and discussed in Section 6 to demonstrate the capabilities of the proposed methodology.

2. Finite Element Bayesian Inverse Problems

Although the methodology proposed in this paper is general, we will apply it to the verification of popular inverse finite element procedures used to monitor the integrity of structures during service. We will assume that the structure can be modelled as an elastic body, and that potential structural damage can be modelled by the evolution of a field of elastic constants characterised by a finite number of uncertain parameters. Assuming that the resonance frequencies of the structure have been measured physically, we attempt to identify this set of mathematical parameters, through the solution of an inverse problem. Significant deviations from the parameter values corresponding to an undamaged structural state may reveal structural failure. This is a typical condition monitoring method used for non-destructive structural testing.
In order to clarify the proposed study, we will assume that we have measured the $n_d$ first dynamic eigenfrequencies $\mathbf{d} = [\ln(\omega_1) \ \dots \ \ln(\omega_{n_d})]^T$ of the structure represented in Figure 1. The purpose of the model inversion is to identify the values of $n_\mu$ parameters $\boldsymbol{\mu} = [\mu_1 \ \dots \ \mu_{n_\mu}]^T$ of the structural dynamics model, here the logs of the elastic moduli of the subdomains represented in Figure 1. We will also aim to predict the remaining $n_p$ frequencies $\mathbf{p} = [\ln(\omega_{n_d+1}) \ \dots \ \ln(\omega_{n_d+n_p})]^T$. The relationships $\boldsymbol{\mu} \mapsto \mathbf{d}(\boldsymbol{\mu})$ and $\boldsymbol{\mu} \mapsto \mathbf{p}(\boldsymbol{\mu})$, i.e., the computational model, are defined implicitly through the evaluation of a standard finite element model of the steady-state, undamped structural vibrations.

2.1. Bayesian Inverse Problem

We assume that the quantities $\mathbf{d}$ that are measured experimentally are described by the mathematical model, up to an error, which is modelled in a probabilistic manner. Following standard Bayesian procedures, this error is modelled as a white Gaussian noise,

$$\mathbf{d} = \mathbf{d}^m + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \Sigma_n) \quad (1)$$

with $\Sigma_n$ a positive definite covariance matrix. $\mathcal{A}_d : \boldsymbol{\mu} \mapsto \mathbf{d}^m := \mathcal{A}_d(\boldsymbol{\mu})$ is the mathematical prediction that corresponds to the experimental observation. Model $\mathcal{A}_d$ is a deterministic function of the $n_\mu$ uncertain model parameters that we organise in an $n_\mu$-dimensional vector $\boldsymbol{\mu}$. Bayesian inversion requires associating a prior probability density with the model parameters $\boldsymbol{\mu}$. This prior probability density, denoted by $\pi_\mu(\boldsymbol{\mu})$ in the following, encodes the knowledge that we possess about $\boldsymbol{\mu}$ before making any physical observations.
Seen from a different point of view, the probabilistic inversion setting amounts to the definition of a joint probability distribution for $\boldsymbol{\mu}$ and $\mathbf{d}$:

$$\pi_{\mu,d}(\boldsymbol{\mu}, \mathbf{d}) = \pi_{d|\mu}(\boldsymbol{\mu}, \mathbf{d})\, \pi_\mu(\boldsymbol{\mu}) \quad (2)$$
The formal expression of the likelihood function $L(\boldsymbol{\mu}; \mathbf{d}) := \pi_{d|\mu}(\boldsymbol{\mu}, \mathbf{d})$ is a direct consequence of assumption (1), namely

$$L(\boldsymbol{\mu}; \mathbf{d}) = \frac{1}{\sqrt{(2\pi)^{n_d}\, |\Sigma_n|}}\; e^{-\frac{1}{2}\left(\mathbf{d} - \mathcal{A}_d(\boldsymbol{\mu})\right)^T \Sigma_n^{-1} \left(\mathbf{d} - \mathcal{A}_d(\boldsymbol{\mu})\right)} \quad (3)$$
It is now possible to formally proceed to the inversion itself by conditioning the joint distribution on the actually observed quantities. Applying Bayes’ formula, the posterior probability density of the model parameters is

$$\pi_{\mu|d}(\boldsymbol{\mu}; \mathbf{d}) = \frac{L(\boldsymbol{\mu}; \mathbf{d})\, \pi_\mu(\boldsymbol{\mu})}{\pi_d(\mathbf{d})} \quad (4)$$
where $\pi_d(\mathbf{d})$ is a normalising constant, whose computation requires a usually intractable integration. Bayes’ formula provides us with updated knowledge about the uncertain part of our mathematical model. It is now possible to predict unobserved quantities, i.e.,

$$\mathbf{p} = \mathcal{A}_p(\boldsymbol{\mu}), \qquad \boldsymbol{\mu} \sim \pi_{\mu|d}(\boldsymbol{\mu}; \mathbf{d}) \quad (5)$$

by formally propagating the posterior uncertainty through model $\mathcal{A}_p : \boldsymbol{\mu} \mapsto \mathbf{p}^m := \mathcal{A}_p(\boldsymbol{\mu})$. The posterior predictive probability density of $\mathbf{p}$ will be denoted by the symbol $\pi_{p|d}(\mathbf{p}; \mathbf{d})$.
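As an illustration, the following minimal Python sketch evaluates the log of the likelihood (3) under the isotropic noise assumption $\Sigma_n = \sigma_n^2 I$ used later in Section 3.1; the function `model_d`, standing in for the finite element map $\mathcal{A}_d$, is a hypothetical placeholder.

```python
import numpy as np

def log_likelihood(mu, d_obs, model_d, sigma_n):
    """Log of the Gaussian likelihood L(mu; d) of Equation (3).

    mu      : (n_mu,) latent parameter vector
    d_obs   : (n_d,) observed log-eigenfrequencies
    model_d : callable, forward map A_d (hypothetical placeholder
              for the finite element solver)
    sigma_n : noise standard deviation, i.e., Sigma_n = sigma_n**2 * I
    """
    r = d_obs - model_d(mu)                      # misfit d - A_d(mu)
    n_d = d_obs.size
    return (-0.5 * (r @ r) / sigma_n**2          # quadratic form r^T Sigma_n^-1 r
            - 0.5 * n_d * np.log(2 * np.pi * sigma_n**2))  # normalisation
```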

2.2. Finite Element Modelling of Inverse Structural Vibration Problems

2.2.1. Direct Finite Element Procedure for Frequency-Domain Vibrations

The numerical model
$$\boldsymbol{\mu} \mapsto \mathcal{A}(\boldsymbol{\mu}) := \begin{pmatrix} \mathcal{A}_d(\boldsymbol{\mu}) \\ \mathcal{A}_p(\boldsymbol{\mu}) \end{pmatrix} \quad (6)$$

is implicitly defined through the solution of a continuum mechanics problem. The components of $\mathbf{d}^m$ and $\mathbf{p}$ are the logarithms of the eigenvalues corresponding to the following parametrised eigenvalue problem: find $u \in H^1(\Omega)$ and $\omega \in \mathbb{R}^+$ such that $\forall v \in H^1(\Omega)$

$$\int_\Omega \nabla^s u : D(x; \boldsymbol{\mu}) : \nabla^s v \, \mathrm{d}\Omega \;-\; \omega^2 \int_\Omega \rho\, u \cdot v \, \mathrm{d}\Omega = 0 \quad (7)$$
In the previous variational statement, $\Omega$ is the domain occupied by the structure of interest, $H^1(\Omega)$ is the space of functions defined over $\Omega$, with values in $\mathbb{R}^2$, that are zero on part $\partial_u\Omega$ of the boundary of the domain, and whose derivatives up to order one are square integrable. $\nabla^s$ denotes the symmetric part of the gradient operator. $D$ is the fourth-order Hooke tensor and $\rho$ is the mass density of the solid material. The problem possesses an infinite number of solutions $(\omega, u)$ called free vibration modes. We order the free vibration frequencies in increasing order, $\omega_1 < \omega_2 < \dots < \omega_{n_d+n_p}$. The free vibration modes are functions of the uncertain material parameters $\boldsymbol{\mu}$ through the definition of the Hooke tensor. Specifically,

$$D(x; \boldsymbol{\mu}) : (\cdot) = \lambda(x; \boldsymbol{\mu})\, \mathrm{Tr}(\cdot)\, I + 2\, G(x; \boldsymbol{\mu})\, (\cdot) \quad (8)$$
with the parametrised Lamé constants
$$\lambda(x; \boldsymbol{\mu}) = \frac{0.3 \times E(x; \boldsymbol{\mu})}{(1 + 0.3)(1 - 2 \times 0.3)} \qquad \text{and} \qquad G(x; \boldsymbol{\mu}) = \frac{E(x; \boldsymbol{\mu})}{2\,(1 + 0.3)} \quad (9)$$
Solving the continuum mechanics model is equivalent to evaluating the mapping
$$\boldsymbol{\mu} \mapsto \begin{pmatrix} \mathcal{A}_d(\boldsymbol{\mu}) \\ \mathcal{A}_p(\boldsymbol{\mu}) \end{pmatrix} = \begin{pmatrix} \left[\ln \omega_1(\boldsymbol{\mu}) \ \dots \ \ln \omega_{n_d}(\boldsymbol{\mu})\right]^T \\ \left[\ln \omega_{n_d+1}(\boldsymbol{\mu}) \ \dots \ \ln \omega_{n_d+n_p}(\boldsymbol{\mu})\right]^T \end{pmatrix} \quad (10)$$
There is, in general, no analytical solution to the continuous vibration problem, and a standard way to obtain approximate solutions is to substitute a finite element space $U^h(\Omega) \subset H^1(\Omega)$ for the infinite-dimensional search space $H^1(\Omega)$ [39]. Too coarse a finite element discretisation may result in poorly predictive results, while too fine a mesh will lead to numerically intractable computations, or, in any case, to a waste of computing resources.
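After finite element discretisation, problem (7) reduces to a generalised matrix eigenvalue problem, and the mapping (10) can be evaluated as in the following sketch; the assembly of the (constrained) stiffness and mass matrices from the mesh and from $\boldsymbol{\mu}$ is assumed to be provided elsewhere.

```python
import numpy as np
from scipy.linalg import eigh

def log_eigenfrequencies(K, M, n_freq):
    """Evaluate the map of Equation (10) from assembled FE matrices.

    K, M   : stiffness and mass matrices of the discretised problem (7),
             assumed symmetric, with boundary conditions already applied
             so that M is positive definite
    n_freq : number of free-vibration frequencies to return
    """
    # Generalised symmetric eigenproblem K u = omega^2 M u
    lam = eigh(K, M, eigvals_only=True)
    lam = np.sort(lam)[:n_freq]          # smallest eigenvalues omega^2
    return 0.5 * np.log(lam)             # ln(omega) = 0.5 * ln(omega^2)
```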

2.2.2. Finite Element Approximation of the Bayesian Inverse Problem

Whilst using the continuum mechanics model exactly would deliver the posterior density $\pi_{\mu|d}(\boldsymbol{\mu}; \mathbf{d})$ as the solution of the Bayesian inverse problem, we now have an approximate posterior density

$$\bar{\pi}_{\mu|d}(\boldsymbol{\mu}; \mathbf{d}) = \frac{\bar{L}(\boldsymbol{\mu}; \mathbf{d})\, \pi_\mu(\boldsymbol{\mu})}{\bar{\pi}_d(\mathbf{d})} \quad (11)$$

where the finite element likelihood $\bar{L}$ is obtained by substituting the finite element mapping $\bar{\mathcal{A}}_d$ for $\mathcal{A}_d$ in Equation (3). Similarly, the approximate posterior density of $\mathbf{p}$ is denoted by $\bar{\pi}_{p|d}(\mathbf{p}; \mathbf{d})$ and obtained by evaluating the finite element mapping $\bar{\mathcal{A}}_p$ instead of $\mathcal{A}_p$ when propagating the posterior uncertainties forward.

2.2.3. Discretisation Error

The finite element error is the mismatch between $\bar{\pi}_{\mu|d}(\boldsymbol{\mu}; \mathbf{d})$ and $\pi_{\mu|d}(\boldsymbol{\mu}; \mathbf{d})$ on the one hand, and $\bar{\pi}_{p|d}(\mathbf{p}; \mathbf{d})$ and $\pi_{p|d}(\mathbf{p}; \mathbf{d})$ on the other hand. Various measures can be used to quantify this mismatch, amongst which the Hellinger distance, defined by

$$D_{\mathrm{h}}\left(\pi_{\mu|d}(\boldsymbol{\mu}), \bar{\pi}_{\mu|d}(\boldsymbol{\mu})\right) = \left\| \sqrt{\pi_{\mu|d}(\boldsymbol{\mu})} - \sqrt{\bar{\pi}_{\mu|d}(\boldsymbol{\mu})} \right\|_2 \quad (12)$$
the Kullback-Leibler divergence, the total variation distance and the Kolmogorov-Smirnov (KS) distance, defined by
$$D_{\mathrm{ks}}\left(\pi_{\mu|d}(\boldsymbol{\mu}), \bar{\pi}_{\mu|d}(\boldsymbol{\mu})\right) = \sup_{\boldsymbol{\mu}} \left| \Pi_{\mu|d}(\boldsymbol{\mu}) - \bar{\Pi}_{\mu|d}(\boldsymbol{\mu}) \right| \quad (13)$$

where the capital symbol $\Pi_\cdot$ denotes the cumulative distribution function corresponding to $\pi_\cdot$, and $\bar{\Pi}_\cdot$ is the finite element approximation of $\Pi_\cdot$. This contribution will make use of the latter measure, in a one-dimensional setting (i.e., it will be applied to control the accuracy of the posterior density of one of the elements of $\boldsymbol{\mu}$ or one of the elements of $\mathbf{p}$). The attractiveness of the Kolmogorov–Smirnov distance is its straightforward application in the context of Monte-Carlo procedures, where only empirical densities are available, and its closeness to confidence intervals (CIs), which makes its values relatively easy to interpret within the context of a posteriori error estimation. Notice that $D_{\mathrm{ks}}$ is lower bounded by 0 (identical density functions) and upper bounded by 1 (non-overlapping supports of the probability density functions).
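For two one-dimensional sets of samples, the distance (13) between the corresponding empirical distributions can be computed directly; the following is a minimal Python sketch.

```python
import numpy as np

def ks_distance(x, y):
    """Kolmogorov-Smirnov distance (13) between two one-dimensional
    empirical samples x and y (e.g., MCMC draws of one component of mu
    obtained with two different meshes)."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])        # evaluate both ECDFs on the pooled grid
    cdf_x = np.searchsorted(x, grid, side='right') / x.size
    cdf_y = np.searchsorted(y, grid, side='right') / y.size
    return np.abs(cdf_x - cdf_y).max()   # sup of the ECDF difference
```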

3. Numerical Examples—Part I: Effect of Discretisation Errors Onto Posterior Densities

This section introduces the numerical examples that will be investigated in this paper, and aims to provide a first qualitative understanding of the effect of mesh refinement onto the quality of posterior probability densities.

3.1. Forward Stochastic Model

The stochastic field of Young’s modulus that will be used to exemplify the error control approach proposed in this paper is defined via a decomposition of domain $\Omega$ into $n_\mu + 1 = 7$ non-overlapping subdomains $\{\Omega_i\}_{i=1}^{n_\mu+1}$ such that $\bigcup_{i=1}^{n_\mu+1} \Omega_i = \Omega$ and $\Omega_i \cap \Omega_j = \emptyset$ for $i \neq j$. The domain decomposition is represented in Figure 1. Then, the proposed model is such that

$$E(x; \boldsymbol{\mu}) = e^{\mu_i} \ \text{ if } x \in \Omega_i \text{ and } i \neq 1, \qquad E(x; \boldsymbol{\mu}) = 1 \ \text{ otherwise.} \quad (14)$$

Hence, the scalar parameters contained in vector $\boldsymbol{\mu} \in \mathbb{R}^{n_\mu}$ are the logarithms of the Young’s moduli corresponding to each of the subdomains.
The prior density is a multivariate Gaussian, given by

$$\pi_\mu(\boldsymbol{\mu}) = \frac{1}{\sqrt{(2\pi)^{n_\mu}\, |\Sigma_0|}}\; e^{-\frac{1}{2}\left(\boldsymbol{\mu} - \boldsymbol{\mu}_0\right)^T \Sigma_0^{-1} \left(\boldsymbol{\mu} - \boldsymbol{\mu}_0\right)} \quad (15)$$

The prior mean $\boldsymbol{\mu}_0$ is the null vector, and the prior covariance $\Sigma_0 = \sigma_0^2\, I$, where $I \in \mathbb{R}^{n_\mu \times n_\mu}$ is the identity matrix, is diagonal and isotropic. The prior density is represented in dimensions $(\mu_1, \mu_3)$ in Figure 1.
Finally, we model the error $\boldsymbol{\epsilon}$ as a zero-mean multivariate Gaussian (consistently with what was described in the previous section), with independent components and isotropic variance, i.e., $\Sigma_n = \sigma_n^2\, I$.

3.2. Computational Meshes

The evaluation of the likelihood function appearing in solution (11) of the Bayesian inverse problem requires solving the continuum mechanics problem using the finite element method. In this example, we use a sequence of meshes $\{\mathcal{M}_i\}_{i=1}^{n_m}$ associated with a monotonically increasing number of degrees of freedom. These meshes are represented in Figure 2. Although the sequence of meshes is not strictly hierarchical, the typical (uniform) element size is divided by 2 when moving from mesh $\mathcal{M}_i$ to mesh $\mathcal{M}_{i+1}$. The intermediate meshes $\{\mathcal{M}_{i+\frac{1}{2}}\}_{i=1}^{\tilde{n}_m}$ represented in Figure 2 will be used later on.

3.3. Inverse Problems and First Results

Two tests will now be investigated:
  • Test 1 (weakly informative data): only the first eigenvalue is measured, i.e., $\mathbf{d}$ is scalar. This can be interpreted as a task of model updating, where new data is used to update existing knowledge.
  • Test 2 (strongly informative data): the first three eigenvalues of the structure are measured. This can be interpreted as an inverse problem, where rich information is used to identify all the unknowns of the model, and the probabilistic setting acts as a regulariser.
The two corresponding marginal posterior densities $\hat{\pi}_{\mu_1,\mu_3|d}$ obtained when using mesh $\mathcal{M}_3$ are represented in Figure 3. Samples from these densities are obtained by a Monte-Carlo sampler, as presented in the next section. A Kernel Density Estimate (KDE) is used as a smoother for illustration purposes only. Notice that for the predictive posterior densities, the histograms correspond to the marginal densities of each individual eigenvalue, which explains their overlap.
It is interesting to notice that the posterior densities observed in Test 2 are much sharper than those of Test 1, owing to the quality of the data. The symmetry in the results is a consequence of structural and probabilistic symmetries (see Figure 1 and the definition of the prior probability density).
Synthetic data for this problem is generated by computing the average of the spectra delivered by meshes $\mathcal{M}_2$ and $\mathcal{M}_{2+\frac{1}{2}}$, for a reference parameter vector $\boldsymbol{\mu}_d \neq \boldsymbol{\mu}_0$ but situated in the vicinity of the prior mode. The model averaging, together with the selection of relatively coarse finite element meshes in sequence $\{\mathcal{M}_i\}_{i=1}^{n_m}$, is meant to circumvent the “inverse crime” problem.

3.4. Convergence with Mesh Refinement

The posterior densities corresponding to Test 2 and to various levels of mesh refinement are displayed in Figure 4. The differences between the modes of the predicted eigenvalues that are obtained with coarse and fine meshes are very large. The fact that this discrepancy increases with the mode number is to be expected, as the spatial wavelength of the deformations in the continuum decreases with increasing eigenfrequency. As a consequence, a good mesh for the first range of frequencies might be unable to capture the faster spatial variations associated with higher vibration modes.
The difference between the posterior densities corresponding to meshes $\mathcal{M}_3$ and $\mathcal{M}_4$ is qualitatively small. The solution of the Bayesian inverse problem converges with mesh refinement. Notice that the posterior distribution of the model parameters goes from mono-modal to multi-modal, which may prove a stumbling block when selecting an appropriate Monte-Carlo sampler.

4. Monte-Carlo Sampler and Combined Effect of Statistical and Discretisation Errors

4.1. Tempered Metropolis-Hastings Markov Chain Monte Carlo Algorithm

The approximate posterior density described by Equation (11) is an arbitrarily complex function of $\boldsymbol{\mu}$, as it depends on the nonlinear computational model $(\mathcal{A}_d(\boldsymbol{\mu})^T \ \mathcal{A}_p(\boldsymbol{\mu})^T)^T$. In particular, standard random number generators cannot be used to draw samples from this distribution. Importance Monte-Carlo procedures, whereby the prior distribution is used as proposal density, will also fail. This is because the posterior density may be arbitrarily different from the prior density, resulting in unacceptably large variances of importance sampling estimates. Designing better proposal densities a priori is impossible, and a successful importance sampling approach would require generating the proposal density using advanced methods such as sequential Monte-Carlo samplers.
One of the simplest and most generic generators of samples from posterior densities is the Metropolis-Hastings (MH) Markov Chain Monte Carlo (MCMC) algorithm (see Figure 5). The algorithm works as follows. Starting from sample $\boldsymbol{\mu}_n$, the next sample $\boldsymbol{\mu}_{n+1}$ is drawn from the transition distribution

$$T_{\mu_{n+1}|\mu_n}(\boldsymbol{\mu}_{n+1}; \boldsymbol{\mu}_n) \propto \min\left(1, \frac{\bar{\pi}_{\mu|d}(\boldsymbol{\mu}_{n+1})}{\bar{\pi}_{\mu|d}(\boldsymbol{\mu}_n)}\right) \mathcal{N}(\boldsymbol{\mu}_{n+1}; \boldsymbol{\mu}_n, \tilde{\Sigma}) \quad (16)$$
This is done by first drawing a random move from $\boldsymbol{\mu}_n$ using the multivariate Gaussian (any proposal can be used, but the expressions exposed in this section are only valid for symmetric proposals), and then accepting or rejecting the move in order to account for the first term of the transition. Typically, a move ending up in a state of higher posterior density is always accepted, whilst a move ending up in a state of lower density may be accepted or not, depending on the roll of a die and the ratio of posterior densities between the current and proposed states.
This particular transition is designed such that the following ergodic property holds:
$$\forall\, \boldsymbol{\mu}_{n+1}, \qquad \int T_{\mu_{n+1}|\mu_n}(\boldsymbol{\mu}_{n+1}; \boldsymbol{\mu}_n)\, \bar{\pi}_{\mu|d}(\boldsymbol{\mu}_n)\, \mathrm{d}\boldsymbol{\mu}_n = \bar{\pi}_{\mu|d}(\boldsymbol{\mu}_{n+1}) \quad (17)$$
As a result, under some assumptions, the Markov chain is guaranteed to have $\bar{\pi}_{\mu|d}$ as its stationary distribution, which means that $\boldsymbol{\mu}_n \sim \bar{\pi}_{\mu|d}$ as $n \to \infty$. Each sample, taken individually, is distributed according to the posterior density, provided the chain is run long enough. Due to the Markov process, the samples are not independent, which makes frequentist error estimation and convergence diagnosis more difficult than in the context of traditional Monte-Carlo algorithms, where tools such as the standard error and the bootstrap apply without particular difficulty. Practically, a burn-in phase is first observed, whereby the chain “seeks” the regions of high probability density (see Figure 5). Once found, samples become progressively distributed according to the target posterior density.
Notice that $\tilde{\Sigma}$ is difficult to choose a priori. Adaptive-proposal MCMC algorithms have been proposed in [40,41]. Alternative solutions for choosing good proposals are methods based on particle mechanics (e.g., Langevin diffusion, Hamiltonian dynamics) (see e.g., [42]). In this particular contribution, the proposal densities have been calibrated “by hand”. We use a tempered version of the MCMC algorithm, whereby we simultaneously sample multiple tempered replicates of the posterior density, with proposed state exchanges between replicates that are subsequently Metropolis corrected [43,44] (see also [45] for an interesting application in the context of structural damage assessment). This relatively classical MCMC algorithm allows us to sample from densities that are multi-modal or become multi-modal with mesh refinement (at least in the low-dimensional parametric setting that is investigated in this paper, as multi-modality in high dimensions remains an open problem).
The call to the finite element solver is hidden in the acceptance/rejection test, which requires the evaluation of the finite element likelihood at the proposed state. Therefore, as for a standard MC algorithm, one iteration means one evaluation of the computational model.
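The following minimal Python sketch illustrates the random-walk MH iteration (16); the tempering and replicate exchanges used in this work are omitted for brevity, and `log_post` is a hypothetical wrapper around one finite element evaluation of the unnormalised log posterior.

```python
import numpy as np

def metropolis_hastings(log_post, mu0, prop_cov, n_iter, rng):
    """Random-walk Metropolis-Hastings sketch of transition (16).

    log_post : callable returning the unnormalised log posterior,
               log(L(mu; d) * pi_mu(mu)); each call hides one FE solve
    mu0      : initial state, e.g., drawn from the prior
    prop_cov : proposal covariance Sigma_tilde (hand-calibrated here)
    rng      : numpy Generator, e.g., np.random.default_rng(0)
    """
    chain = [mu0]
    lp = log_post(mu0)
    L = np.linalg.cholesky(prop_cov)
    for _ in range(n_iter):
        prop = chain[-1] + L @ rng.standard_normal(mu0.size)  # Gaussian move
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            chain.append(prop); lp = lp_prop
        else:
            chain.append(chain[-1].copy())                    # rejected move
    return np.asarray(chain)
```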

4.2. Empirical Posterior Densities

Once $n$ samples $\tilde{M} = \{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \dots, \boldsymbol{\mu}_n\}$ have been computed by the MH algorithm ($n$ larger than 10), we discard the first 25% (burn-in) of these samples, resulting in $\tilde{n}$ samples that we hope are located in the ergodic part of the Markov chain. The resulting empirical probability density is given by

$$\bar{\pi}_{\mu|d}(\boldsymbol{\mu}) \approx \bar{\pi}^{(n)}_{\mu|d}(\boldsymbol{\mu}) := \frac{1}{\tilde{n}} \sum_{k=n-\tilde{n}+1}^{n} \delta(\boldsymbol{\mu} - \boldsymbol{\mu}_k) \quad (18)$$

where $\delta$ is the Dirac delta function. Accordingly, the empirical predictive posterior density is

$$\bar{\pi}_{p|d}(\mathbf{p}) \approx \bar{\pi}^{(n)}_{p|d}(\mathbf{p}) := \frac{1}{\tilde{n}} \sum_{k=n-\tilde{n}+1}^{n} \delta(\mathbf{p} - \mathcal{A}_p(\boldsymbol{\mu}_k)) \quad (19)$$

The cumulative empirical posterior density may be expressed as

$$\bar{\Pi}^{(n)}_{\mu|d}(\boldsymbol{\mu}) := \frac{1}{\tilde{n}} \sum_{k=n-\tilde{n}+1}^{n} \mathbb{I}(\boldsymbol{\mu}_k \leq \boldsymbol{\mu}) \quad (20)$$

where $\mathbb{I}$ is the indicator function, and the inequality is to be understood in a component-by-component manner. A similar definition holds for the cumulative posterior density of $\mathbf{p}$.

4.3. Total Error Measure

It is now clear that both the finite element discretisation and the Monte-Carlo approximate sampling affect the quality of the resulting posterior densities. The total error, for the marginal distribution of a single element $\mu_i$ of $\boldsymbol{\mu}$, reads as

$$D_{\mathrm{ks}}\left(\pi_{\mu_i|d}(\mu_i), \bar{\pi}^{(n)}_{\mu_i|d}(\mu_i)\right) = \sup_{\mu_i} \left| \Pi_{\mu_i|d}(\mu_i) - \bar{\Pi}^{(n)}_{\mu_i|d}(\mu_i) \right| \quad (21)$$

which can be formally decomposed as follows

$$D_{\mathrm{ks}}\left(\pi_{\mu_i|d}(\mu_i), \bar{\pi}^{(n)}_{\mu_i|d}(\mu_i)\right) = \sup_{\mu_i} \left| \left(\Pi_{\mu_i|d}(\mu_i) - \bar{\Pi}_{\mu_i|d}(\mu_i)\right) + \left(\bar{\Pi}_{\mu_i|d}(\mu_i) - \bar{\Pi}^{(n)}_{\mu_i|d}(\mu_i)\right) \right| \leq D_{\mathrm{ks}}\left(\pi_{\mu_i|d}(\mu_i), \bar{\pi}_{\mu_i|d}(\mu_i)\right) + D_{\mathrm{ks}}\left(\bar{\pi}_{\mu_i|d}(\mu_i), \bar{\pi}^{(n)}_{\mu_i|d}(\mu_i)\right) \quad (22)$$

In the last expression, the first term is the pure finite element error, which would occur if we could run the Markov process for an infinite number of iterations, while the second term is a pure statistical error.
We also define the error of an element $p_i$ of $\mathbf{p}$ as follows:

$$D_{\mathrm{ks}}\left(\pi_{p_i|d}(p_i), \bar{\pi}^{(n)}_{p_i|d}(p_i)\right) = \sup_{p_i} \left| \Pi_{p_i|d}(p_i) - \bar{\Pi}^{(n)}_{p_i|d}(p_i) \right| \quad (23)$$

5. Robust, Automatised and Comprehensive Error Control

5.1. Simulation of the Discretisation Error

The finite element method introduces an error in the computational mapping $\mathcal{A}(\boldsymbol{\mu}) = (\mathcal{A}_d(\boldsymbol{\mu})^T \ \mathcal{A}_p(\boldsymbol{\mu})^T)^T$. Mapping $\mathcal{A}$ contains all the scalar quantities that need to be evaluated through calls to the finite element solver, namely the numerical predictions of the physical measurements $\mathbf{d}$, and the posterior predictions $\mathbf{p}$. We define the error in the simulated data as

$$\Delta \mathcal{A}_d(\boldsymbol{\mu}) = \mathcal{A}_d(\boldsymbol{\mu}) - \bar{\mathcal{A}}_d(\boldsymbol{\mu}) \quad (24)$$

and the error in the posterior predictions as

$$\Delta \mathcal{A}_p(\boldsymbol{\mu}) = \mathcal{A}_p(\boldsymbol{\mu}) - \bar{\mathcal{A}}_p(\boldsymbol{\mu}) \quad (25)$$

For now, we assume that both of these quantities can be estimated, at affordable numerical cost and in a reliable manner. Therefore, for any value of parameter $\boldsymbol{\mu}$, a corrected computational model is available, which reads as

$$\begin{pmatrix} \mathcal{A}_d(\boldsymbol{\mu}) \\ \mathcal{A}_p(\boldsymbol{\mu}) \end{pmatrix} \approx \begin{pmatrix} \hat{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \hat{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} := \begin{pmatrix} \bar{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \bar{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} + \begin{pmatrix} \Delta \hat{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \Delta \hat{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} \quad (26)$$

where the symbol $\hat{\cdot}$ denotes computable estimates.

5.2. Component-Wise MCMC

It is now possible to sample the corrected posterior distribution

$$\hat{\pi}_{\mu|d}(\boldsymbol{\mu}; \mathbf{d}) = \frac{\hat{L}(\boldsymbol{\mu}; \mathbf{d})\, \pi_\mu(\boldsymbol{\mu})}{\hat{\pi}_d(\mathbf{d})} \quad (27)$$

using MCMC. It should be clear that the corrected posterior distribution is simply obtained by substituting the corrected computational model (26) into the expression of the likelihood function, Equation (3). Notice that the normalising constant is affected by modifications of the computational model. This is of no practical consequence, as MCMC samplers work with unnormalised densities, and the KS distance uses empirical cumulative distributions directly, without the need for smoothing or marginalisation.
We formally define a component-wise MCMC where the uncorrected and corrected computational models are sampled at the same time. This will yield an estimate of the effect of the discretisation error on the posterior densities at any stage of the Markov process, which, in turn, will allow us to develop an early-stopping methodology. The component-wise MCMC iteration proceeds as follows, given a current sample $(\bar{\boldsymbol{\mu}}_n, \hat{\boldsymbol{\mu}}_n)$ of the uncorrected/corrected finite element posterior densities:
  • Draw $(\bar{\boldsymbol{\mu}}_{n+1}, \hat{\boldsymbol{\mu}}_{n+1})$ such that

$$\bar{\boldsymbol{\mu}}_{n+1} \sim \mathcal{N}(\,\cdot\,; \bar{\boldsymbol{\mu}}_n, \tilde{\Sigma}) \quad \text{and} \quad \hat{\boldsymbol{\mu}}_{n+1} \sim \mathcal{N}(\,\cdot\,; \hat{\boldsymbol{\mu}}_n, \tilde{\Sigma}) \quad (28)$$

  • Draw $(u, v)$ such that

$$u \sim \mathcal{U}([0\ 1]) \quad \text{and} \quad v \sim \mathcal{U}([0\ 1]) \quad (29)$$

  • Accept $\bar{\boldsymbol{\mu}}_{n+1}$ if and only if

$$u \leq \min\left(1, \frac{\bar{L}(\bar{\boldsymbol{\mu}}_{n+1}; \mathbf{d})\, \pi_\mu(\bar{\boldsymbol{\mu}}_{n+1})}{\bar{L}(\bar{\boldsymbol{\mu}}_n; \mathbf{d})\, \pi_\mu(\bar{\boldsymbol{\mu}}_n)}\right) \quad (30)$$

    set $\bar{\boldsymbol{\mu}}_{n+1} = \bar{\boldsymbol{\mu}}_n$ otherwise.
  • Accept $\hat{\boldsymbol{\mu}}_{n+1}$ if and only if

$$v \leq \min\left(1, \frac{\hat{L}(\hat{\boldsymbol{\mu}}_{n+1}; \mathbf{d})\, \pi_\mu(\hat{\boldsymbol{\mu}}_{n+1})}{\hat{L}(\hat{\boldsymbol{\mu}}_n; \mathbf{d})\, \pi_\mu(\hat{\boldsymbol{\mu}}_n)}\right) \quad (31)$$

    set $\hat{\boldsymbol{\mu}}_{n+1} = \hat{\boldsymbol{\mu}}_n$ otherwise.
The MCMC algorithm is initialised by state $(\bar{\boldsymbol{\mu}}_0, \hat{\boldsymbol{\mu}}_0)$, where both $\bar{\boldsymbol{\mu}}_0$ and $\hat{\boldsymbol{\mu}}_0$ are drawn from distribution $\pi_0$.
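A minimal Python sketch of one iteration of this component-wise scheme is given below; `log_post_bar` and `log_post_hat` are hypothetical wrappers around the uncorrected and corrected unnormalised log posteriors.

```python
import numpy as np

def componentwise_step(mu_bar, mu_hat, log_post_bar, log_post_hat,
                       prop_chol, rng):
    """One iteration of the component-wise MCMC of Section 5.2: the
    uncorrected (bar) and corrected (hat) chains advance together, with
    the same proposal covariance but independent accept/reject tests
    (Equations (28)-(31))."""
    new = []
    for mu, log_post in ((mu_bar, log_post_bar), (mu_hat, log_post_hat)):
        prop = mu + prop_chol @ rng.standard_normal(mu.size)   # Eq. (28)
        u = rng.uniform()                                      # Eq. (29)
        # accept if u <= min(1, posterior ratio), cf. Eqs. (30)/(31)
        if np.log(u) < log_post(prop) - log_post(mu):
            new.append(prop)
        else:
            new.append(mu.copy())
    return new[0], new[1]
```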
In the field of a posteriori finite element error estimation, the error estimate is usually a post-processing operation of the coarse finite element solution. This adapts seamlessly to non-Markovian Monte-Carlo samplers, by post-processing each of the independent samples. Here, unfortunately, the finite element model has to be called twice at every iteration of the Markov process. Whether this can be avoided or not, for instance by making use of elements of sequential Monte-Carlo samplers, is unclear to us at this stage.

5.3. Machine Learning-Based Simulation of the Discretisation Error

At this stage, any numerical error estimator can be used, provided that it is goal-oriented. A method of choice could be a residual-based [5,6] or smoothing-based a posteriori error estimator [12,46] in conjunction with the adjoint methodology [8,9,11]. In addition, nothing prevents us from using a meta-modelling approach, such as projection-based reduced order modelling [47,48,49,50,51,52] or polynomial chaos expansions [16,17] to approximate the variations of the computed quantities of interest with parameter variation. Error estimates also exist for such two-level approximations.
In this contribution, we develop and use a feature-based method that finds its roots in data-science methodologies and is much more “black-box” than the previously mentioned strategies. The proposed technique is inspired by the work of [35,36,37].
The exact continuum mechanics model is well approximated by a very refined numerical strategy (e.g., no meta-modelling and a very fine mesh). However, this very refined numerical model cannot be used at every iteration of the MCMC, as its evaluation is very costly. We propose to train a model that maps parameter $\boldsymbol{\mu}$ to the output of the generally intractable very fine model through the combination of (i) a dedicated feature extractor and (ii) a weakly parametric regression model. This combination is defined as
$$\begin{pmatrix} \mathcal{A}_d(\boldsymbol{\mu}) \\ \mathcal{A}_p(\boldsymbol{\mu}) \end{pmatrix} - \begin{pmatrix} \bar{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \bar{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} \underset{\text{wished}}{=} \begin{pmatrix} \Delta \hat{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \Delta \hat{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} \underset{\text{constructed}}{=} R(F(\boldsymbol{\mu}); \boldsymbol{\theta}) \quad (32)$$

where $\mathcal{A}_d$ and $\mathcal{A}_p$ denote quantities that are delivered by the overkill (but computable) numerical model, and $R$ is a weakly parametrised regression model, here a neural network regression (a Gaussian process could be used as well, but a random forest would probably have been the most efficient choice, given the way we bootstrap the regression model to generate estimates of generalisation errors), parametrised by a set of parameters $\boldsymbol{\theta} \in \mathbb{R}^{n_\theta}$ (we do not explicitly distinguish parameters and hyper-parameters in our notations). $F$ is a mapping from input $\boldsymbol{\mu}$ to a feature space. Its careful design is critical to the success of the machine learning procedure. We choose to construct the following features:

$$F(\boldsymbol{\mu}) = \begin{pmatrix} \tilde{\mathcal{A}}_d(\boldsymbol{\mu}) - \bar{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \bar{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \tilde{\mathcal{A}}_p(\boldsymbol{\mu}) - \bar{\mathcal{A}}_p(\boldsymbol{\mu}) \\ \bar{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} \quad (33)$$

where $\tilde{\mathcal{A}}_d(\boldsymbol{\mu})$ and $\tilde{\mathcal{A}}_p(\boldsymbol{\mu})$ are slightly corrected computational models, here generated by refining the mesh by a moderate factor. In our examples, the typical mesh size is divided by 1.5 (see Figure 2 and Figure 6, where we use mesh $\mathcal{M}_{i+\frac{1}{2}}$ to correct mesh $\mathcal{M}_i$, whilst the overkill solution is computed using $\mathcal{M}_{i+2}$). In this fashion, the generation of features remains of the order of the computation of the finite element solution itself.
Now, we train $n_d + n_p$ multivariate neural network regression models

$$\begin{pmatrix} \Delta \hat{\mathcal{A}}_d(\boldsymbol{\mu}) \\ \Delta \hat{\mathcal{A}}_p(\boldsymbol{\mu}) \end{pmatrix} = \begin{pmatrix} R_{d,1}\!\left( e_1^{n_d\,T} (\tilde{\mathcal{A}}_d(\boldsymbol{\mu}) - \bar{\mathcal{A}}_d(\boldsymbol{\mu})),\ e_1^{n_d\,T} \tilde{\mathcal{A}}_d(\boldsymbol{\mu}),\ \boldsymbol{\theta}_{d,1} \right) \\ \vdots \\ R_{d,n_d}\!\left( e_{n_d}^{n_d\,T} (\tilde{\mathcal{A}}_d(\boldsymbol{\mu}) - \bar{\mathcal{A}}_d(\boldsymbol{\mu})),\ e_{n_d}^{n_d\,T} \tilde{\mathcal{A}}_d(\boldsymbol{\mu}),\ \boldsymbol{\theta}_{d,n_d} \right) \\ R_{p,1}\!\left( e_1^{n_p\,T} (\tilde{\mathcal{A}}_p(\boldsymbol{\mu}) - \bar{\mathcal{A}}_p(\boldsymbol{\mu})),\ e_1^{n_p\,T} \tilde{\mathcal{A}}_p(\boldsymbol{\mu}),\ \boldsymbol{\theta}_{p,1} \right) \\ \vdots \\ R_{p,n_p}\!\left( e_{n_p}^{n_p\,T} (\tilde{\mathcal{A}}_p(\boldsymbol{\mu}) - \bar{\mathcal{A}}_p(\boldsymbol{\mu})),\ e_{n_p}^{n_p\,T} \tilde{\mathcal{A}}_p(\boldsymbol{\mu}),\ \boldsymbol{\theta}_{p,n_p} \right) \end{pmatrix} \quad (34)$$

where $e_j^m$ denotes the $j$th canonical vector of $\mathbb{R}^m$.
Each of the regressions $R_{l,i}$ is a single-hidden-layer, bootstrap-aggregated neural network model with $n_n$ neurons and $n_{\mathrm{nbs}}$ bootstrap replicates:

$$R_{l,i}\!\left( \begin{pmatrix} x \\ y \end{pmatrix}, \boldsymbol{\theta}_{l,i} \right) = \frac{1}{n_{\mathrm{nbs}}} \sum_{k=1}^{n_{\mathrm{nbs}}} \left( \sum_{j=1}^{n_n} a_j^{(k)} \tanh\!\left( a_{x,j}^{(k)}\, x + a_{y,j}^{(k)}\, y + o_j^{(k)} \right) + o^{(k)} \right) \quad (35)$$
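A minimal sketch of such a bootstrap-aggregated network is given below; it is built on scikit-learn's MLPRegressor rather than the authors' own implementation, and omits the outlier elimination described in the Training paragraph below.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_bagged_nn(X, y, n_neurons, n_nbs, rng):
    """Bootstrap-aggregated single-hidden-layer tanh network, in the
    spirit of Equation (35). X holds the two features per sample
    (coarse/intermediate difference and value), y the overkill error."""
    models = []
    for _ in range(n_nbs):
        idx = rng.integers(0, len(X), len(X))      # bootstrap resample
        net = MLPRegressor(hidden_layer_sizes=(n_neurons,),
                           activation='tanh', solver='lbfgs',
                           max_iter=2000)
        models.append(net.fit(X[idx], y[idx]))
    return models

def predict_bagged(models, X):
    # aggregation: plain average over the bootstrap replicates
    return np.mean([m.predict(X) for m in models], axis=0)
```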

Training

We sample artificial “data” by running the overkill computational model $n_{\mathrm{ml}}$ times, after sampling the training-set parameters $\{\boldsymbol{\mu}_1^{\mathrm{ml}}, \boldsymbol{\mu}_2^{\mathrm{ml}}, \dots, \boldsymbol{\mu}_{n_{\mathrm{ml}}}^{\mathrm{ml}}\}$ from prior $\pi_\mu$. The number of neurons and the cardinality of the training set are chosen automatically by making use of an automatised early-stopping methodology that aims to maximise the predictive coefficient of determination. We will not detail this procedure here. Fitting of the nonlinear regression coefficients is performed by employing the standard least-squares method, solved by a gradient descent algorithm with randomised initialisation. Outliers of the set of bootstrap replicates are identified and eliminated to decrease the variance of the bootstrap-aggregated regression model.
An example of a fitted regression model is represented in Figure 6, where the output is the discretisation error in the first free-vibration circular frequency (i.e., regression model $R_{d,1}$).

5.4. Bootstrap Confidence Intervals for the MCMC Sampler

At any iteration $n$ of the MCMC sampler, a Monte-Carlo estimate of the discretisation error for the posterior density of the $i$th component of $\boldsymbol{\mu}$ is given by

$$D_{\mathrm{ks}}^{(n)} := D_{\mathrm{ks}}\left(\hat{\pi}^{(n)}_{\mu_i|d}(\mu_i), \bar{\pi}^{(n)}_{\mu_i|d}(\mu_i)\right) = \sup_{\mu_i} \left| \hat{\Pi}^{(n)}_{\mu_i|d}(\mu_i) - \bar{\Pi}^{(n)}_{\mu_i|d}(\mu_i) \right| \quad (36)$$

where

$$\bar{\Pi}^{(n)}_{\mu_i|d}(\mu_i) := \frac{1}{\tilde{n}} \sum_{k=n-\tilde{n}+1}^{n} \mathbb{I}(\bar{\mu}_{k,i} \leq \mu_i) \quad (37)$$

and

$$\hat{\Pi}^{(n)}_{\mu_i|d}(\mu_i) := \frac{1}{\tilde{n}} \sum_{k=n-\tilde{n}+1}^{n} \mathbb{I}(\hat{\mu}_{k,i} \leq \mu_i) \quad (38)$$
Crucially, $D_{\mathrm{ks}}^{(n)}$ is a random variable whose statistics, and in particular its bias and variance, strongly depend on the length of the Markov chain. Unfortunately, evaluating the convergence of any statistic provided by MCMC is difficult, due to the statistical dependency between successively drawn samples.
Following standard diagnostic approaches for MCMC samplers (e.g., the Gelman-Rubin convergence test [53,54]), we will run $n_c = 10$ independent (tempered) MCMC chains in parallel and pool all the resulting samples, after discarding the first 25% of every individual chain as burn-in (see Figure 7 as a visual aid). The pooled samples at iteration $n$ of the multiple-chain MCMC (MC$^3$) algorithm are
$$\bar{S}^{(n)} = \prod_{i=1}^{n_c} \bar{S}_i^{(n)}, \qquad \hat{S}^{(n)} = \prod_{i=1}^{n_c} \hat{S}_i^{(n)} \quad (39)$$

where $\prod$ denotes the Cartesian product. In the previous expression, the sample set from chain $i$ (at ambient temperature) is

$$\bar{S}_i^{(n)} = \{\bar{\boldsymbol{\mu}}_{i,n-\tilde{n}+1}, \dots, \bar{\boldsymbol{\mu}}_{i,n}\}, \qquad \hat{S}_i^{(n)} = \{\hat{\boldsymbol{\mu}}_{i,n-\tilde{n}+1}, \dots, \hat{\boldsymbol{\mu}}_{i,n}\} \quad (40)$$

$D_{\mathrm{ks}}^{(n)}$ is now the pooled KS distance estimate provided by the MC$^3$ algorithm (we will keep the same notation for the sake of simplicity). Formally, we simply replace Equation (37) by

$$\bar{\Pi}^{(n)}_{\mu_i|d}(\mu_i) := \frac{1}{n_c} \frac{1}{\tilde{n}} \sum_{l=1}^{n_c} \sum_{k=n-\tilde{n}+1}^{n} \mathbb{I}(\bar{\mu}_{l,k,i} \leq \mu_i) \quad (41)$$
and perform a similar operation to define the pooled corrected empirical distribution.
The independence of the $n_c$ MCMC chains allows us to compute confidence intervals for $D_{\mathrm{ks}}^{(n)}$ by making use of the non-parametric bootstrap. This is done by resampling $\bar{S}^{(n)}$ and $\hat{S}^{(n)}$ with replacement, generating bootstrap replicates of the pooled sample sets $\{\bar{S}_k^{(n)}\}_{k=1}^{n_{\mathrm{bs}}}$ and $\{\hat{S}_k^{(n)}\}_{k=1}^{n_{\mathrm{bs}}}$,

$$\bar{S}_k^{(n)} = \prod_{i \in B_k} \bar{S}_i^{(n)}, \qquad \hat{S}_k^{(n)} = \prod_{i \in B_k} \hat{S}_i^{(n)} \quad (42)$$

where $B_k \in \llbracket 1, n_c \rrbracket^{n_c}$ is such that each element of this set is drawn uniformly over $\llbracket 1, n_c \rrbracket$, and $k$ varies between 1 and a large number $n_{\mathrm{bs}}$, typically set to 1000. For each replicate, the statistic $D_{\mathrm{ks}}^{(n)}$, denoted by $D_{\mathrm{ks},k}^{(n)}$, can be computed in a straightforward manner by using the bootstrap replicates of the pooled empirical distributions
$$\bar{\Pi}^{(n)}_{\mu_i|d,k}(\mu_i) := \frac{1}{n_c} \frac{1}{\tilde{n}} \sum_{\bar{\boldsymbol{\mu}} \in \bar{S}_k^{(n)}} \mathbb{I}(\bar{\mu}_i \leq \mu_i) \quad (43)$$

$$\hat{\Pi}^{(n)}_{\mu_i|d,k}(\mu_i) := \frac{1}{n_c} \frac{1}{\tilde{n}} \sum_{\hat{\boldsymbol{\mu}} \in \hat{S}_k^{(n)}} \mathbb{I}(\hat{\mu}_i \leq \mu_i) \quad (44)$$

which reads as

$$D_{\mathrm{ks},k}^{(n)} := D_{\mathrm{ks}}\left(\hat{\pi}^{(n)}_{\mu_i|d,k}(\mu_i), \bar{\pi}^{(n)}_{\mu_i|d,k}(\mu_i)\right) = \sup_{\mu_i} \left| \hat{\Pi}^{(n)}_{\mu_i|d,k}(\mu_i) - \bar{\Pi}^{(n)}_{\mu_i|d,k}(\mu_i) \right| \quad (45)$$
Finally, the bootstrap confidence intervals are obtained by computing the $X$th and $(100 - X)$th bootstrap percentiles, such that the $X$th percentile reads as

$$q_X^{(n)} = Q_X\!\left( \{D_{\mathrm{ks},k}^{(n)}\}_{k=1}^{n_{\mathrm{bs}}} \right) - \mathrm{median}\!\left( \{D_{\mathrm{ks},k}^{(n)}\}_{k=1}^{n_{\mathrm{bs}}} \right) + D_{\mathrm{ks}}^{(n)} \quad (46)$$

where $Q_X$ is an operator that extracts the $X$th percentile of the set passed as argument.
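A minimal sketch of this centred percentile bootstrap, reusing the `ks_distance` helper defined earlier, might look as follows.

```python
import numpy as np

def dks_bootstrap_ci(chains_bar, chains_hat, X=5, n_bs=1000, rng=None):
    """Centred bootstrap percentiles (46) for the pooled KS estimate (36).

    chains_bar, chains_hat : lists of n_c post-burn-in sample arrays for
    one scalar QoI, from the uncorrected and corrected chains."""
    rng = rng if rng is not None else np.random.default_rng()
    n_c = len(chains_bar)
    d_ks = ks_distance(np.concatenate(chains_bar),
                       np.concatenate(chains_hat))       # pooled estimate
    reps = np.empty(n_bs)
    for k in range(n_bs):
        B = rng.integers(0, n_c, n_c)                    # resample whole chains, Eq. (42)
        reps[k] = ks_distance(np.concatenate([chains_bar[i] for i in B]),
                              np.concatenate([chains_hat[i] for i in B]))
    med = np.median(reps)
    q_lo = np.percentile(reps, X) - med + d_ks           # Eq. (46)
    q_hi = np.percentile(reps, 100 - X) - med + d_ks
    return d_ks, q_lo, q_hi
```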
It is important to understand that the derived bootstrap confidence interval stands for a chain of finite length $n$, and not for the asymptotic limit. In fact, the estimate $D_{\mathrm{ks}}^{(n)}$ of $D_{\mathrm{ks}}$ is strongly biased (upward for small asymptotic errors $D_{\mathrm{ks}}$, and typically downward for large asymptotic errors), which is due to two factors:
  • the existence of the burn-in phase. For small $n$, each individual chain will be strongly affected by the initialisation of the chains. For small asymptotic errors, this can be expected to have a strong upward bias effect, provided that the initialisation is dispersed, which is often the case in practice (e.g., initialisation from the prior, sequential MC approaches with decreasing levels of noise). This can be visualised in Figure 7: two incomplete chains running on the same probability density may be exploring completely different regions of space, yielding values of $D_{\mathrm{ks}}$ that are large, even in the case where the corrected and uncorrected densities are close to one another.
  • the discrete evaluation of the KS distance itself, which generates an additional (upward) bias.
The variance of $D_{\mathrm{ks}}^{(n)}$ decreases with the number of chains of the MC$^3$ algorithm (which is not a free parameter, as the overall CPU cost increases linearly with $n_c$). Of course, the bias and variance are both expected to decrease with the length $n$ of the run.

5.5. Simultaneous Control of All Sources of Errors

We now make use of the CI derived for $D_{\mathrm{ks}}^{(n)}$ to construct an adaptive inverse problem solver that jointly controls the quality of the mesh and that of the statistical evaluation of the posterior densities. The algorithm is as follows (a code sketch follows the list). Given a current mesh $\mathcal{M}_i$ and a number of iterations $n$ of the Monte-Carlo solver, do:
  • If $n = 0$, perform $m_0$ iterations of the MC$^3$ algorithm and set $n \leftarrow n + m_0$.
  • Evaluate mesh convergence criterion $C_m$. If this criterion is satisfied, exit the adaptation procedure.
  • If $C_m$ is not satisfied, evaluate statistical convergence criterion $C_s$:
    - If $C_s$ is satisfied, set $i \leftarrow i + 1$, reinitialise the MC$^3$ algorithm and set $n = 0$;
    - otherwise, perform $m(n)$ iterations of the MC$^3$ algorithm and set $n \leftarrow n + m(n)$ ($m$ should be an exponentially increasing function).
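A compact Python sketch of this driver is given below; `run_mc3`, `criterion_cm` and `criterion_cs` are hypothetical helpers wrapping the component-wise MC$^3$ sampler and the tests (47) and (48) defined in the next two subsections.

```python
def adaptive_inversion(meshes, run_mc3, criterion_cm, criterion_cs,
                       m0=50, growth=1.5):
    """Sketch of the adaptive driver of Section 5.5 under the stated
    assumptions; returns the final mesh and chain length."""
    i, n, m = 0, 0, m0
    while True:
        run_mc3(meshes[i], m)            # extend all chains by m iterations
        n += m
        if criterion_cm(n):              # C_m: combined error below gamma
            return meshes[i], n
        if criterion_cs(n):              # C_s: errors statistically separated
            i += 1                       # refine the mesh, restart sampling
            n, m = 0, m0
        else:
            m = int(m * growth)          # keep sampling, exponentially longer runs
```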

5.5.1. Criterion $C_m$

Convergence of the finite element-based Bayesian inverse problem is achieved when $D_{\mathrm{ks}} \leq \gamma$, where $\gamma$ is a numerical tolerance that will typically be chosen between 0.01 and 0.2. As we only have Monte-Carlo estimates of $D_{\mathrm{ks}}$, we require instead that

$$C_m : \quad q_{100-X}^{(n)} \leq \gamma \quad (47)$$

Criterion $C_m$ is an indicator of the combined effect of the discretisation and statistical errors onto the posterior densities. The discretisation error is evaluated through the choice of measure $D_{\mathrm{ks}}$ as reliability indicator, whilst the statistical error is taken into account by making use of the upper limit of the bootstrap confidence interval for $D_{\mathrm{ks}}^{(n)}$. The risk of falsely detecting mesh convergence due to a high statistical error is small, due to the fact that $D_{\mathrm{ks}}^{(n)}$ is an upwardly biased estimate of $D_{\mathrm{ks}}$ for small $D_{\mathrm{ks}}$. Confidence in the result can be increased through a loose Gelman-Rubin convergence test, in order to eliminate the risk of stopping the MC$^3$ algorithm in its non-ergodic phase. However, this has proved to be unnecessary in our numerical tests, because $C_m$ is a rather strict criterion.

5.5.2. Criterion $C_s$

The second criterion will help us determine whether mesh refinement is actually needed, or whether the statistical error is too large for us to take a robust decision regarding mesh refinement. The ideal criterion is $D_{\mathrm{ks}} \geq \eta$, where $\eta < \gamma$. The obvious strategy that would require $q_X^{(n)} \geq \eta$ does not work. This is due to the previously explained upward bias of estimator $D_{\mathrm{ks}}^{(n)}$, which would eventually lead to the spurious satisfaction of $C_s$, and consequently to systematic mesh refinement operations for low $n$ counts, even when the mesh becomes fine enough for $D_{\mathrm{ks}}$ to be well below target $\gamma$. In order to derive an appropriate criterion $C_s$, we remark that whilst the ideal criterion $D_{\mathrm{ks}} \geq \eta$ cannot be statistically evaluated for general values of $\eta$, criterion $D_{\mathrm{ks}} > 0$ can be. More precisely, we propose to estimate the statistical error by evaluating whether, at the current $n$, the corrected and uncorrected posterior distributions can be statistically distinguished.
We postulate the following null hypothesis, Hypothesis 0 (H0): the corrected and uncorrected posterior densities are identical.
The rejection of the null hypothesis will indicate that the two densities are significantly different. Now, the criterion for the need for mesh refinement becomes the following:

$$C_s : \quad \Pr\!\left( D_{\mathrm{ks}}^{(n)} \geq q_Z^{(n)} \,\middle|\, H_0 \right) \leq \xi \quad (48)$$
The probability density of $D_{\mathrm{ks}}^{(n)}$ under null hypothesis H0 can be approximated by adapting the previously described bootstrap procedure. We will estimate the density of

$$D_{\mathrm{ks},0}^{(n)} := D_{\mathrm{ks}}\left(\bar{\pi}^{(n),1}_{\mu_i|d}(\mu_i), \bar{\pi}^{(n),2}_{\mu_i|d}(\mu_i)\right) = \sup_{\mu_i} \left| \bar{\Pi}^{(n),1}_{\mu_i|d}(\mu_i) - \bar{\Pi}^{(n),2}_{\mu_i|d}(\mu_i) \right| \quad (49)$$

where $\bar{\pi}^{(n),1}_{\mu_i|d}$ and $\bar{\pi}^{(n),2}_{\mu_i|d}$ are obtained, respectively, by pooling the results of two independent runs of $n_c$ Markov chains of length $n$, corresponding to the uncorrected finite element model only (one could equally choose the corrected one). Notice, to clarify the idea, that $D_{\mathrm{ks},0}^{(n)}$ tends to 0 as $n$ tends to infinity: this is a measure of the statistical error only. The desired density can be estimated by resampling $\bar{S}^{(n)}$ twice, computing the KS distance between the two pooled sets of samples, and repeating the operation $n_{\mathrm{bs}}$ times, which generates a sequence of real numbers $\{D_{\mathrm{ks},0,k}^{(n)}\}$. The replicated empirical cumulative distributions are

$$\bar{\Pi}^{(n),j}_{\mu_i|d,k}(\mu_i) := \frac{1}{n_c} \frac{1}{\tilde{n}} \sum_{\bar{\boldsymbol{\mu}} \in \bar{S}_k^{(n),j}} \mathbb{I}(\bar{\mu}_i \leq \mu_i) \quad (50)$$

with the sampling sets

$$\bar{S}_k^{(n),j} = \prod_{i \in B_k^j} \bar{S}_i^{(n)} \quad (51)$$

where $j = 1$ or $j = 2$, and $B_k^1$ and $B_k^2$ are elements of $\llbracket 1, n_c \rrbracket^{n_c}$ constructed such that each of their elements is drawn uniformly over $\llbracket 1, n_c \rrbracket$. We can now extract the $(100 \times (1-\xi))$th percentile $q_{Y,0}^{(n)}$ corresponding to p-value $\xi$, and evaluate whether $q_{Y,0}^{(n)} \leq q_Y^{(n)}$. If so, test $C_s$ is true. If not, the statistical sampling error is too large to allow us to decide whether mesh refinement should be performed or not. In this case, and assuming that mesh convergence criterion $C_m$ is not satisfied, we need to continue sampling with the MCMC.
Notice that the use of a larger percentile $Z$ yields a less conservative indicator of the need for mesh refinement, which can be compensated by decreasing the p-value $\xi$. The criterion that is arguably the easiest to interpret is $\Pr(D_{\mathrm{ks}}^{(n)} \geq q_{50}^{(n)} \,|\, H_0) \leq \xi$, where $q_{50}^{(n)}$ is directly the KS distance $D_{\mathrm{ks}}^{(n)}$ (i.e., computed without bootstrapping).
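The null distribution of the statistic can be approximated with a few lines of Python, again reusing the `ks_distance` helper; this is an illustrative sketch rather than the authors' implementation.

```python
import numpy as np

def dks_null_percentile(chains_bar, xi=0.05, n_bs=1000, rng=None):
    """Bootstrap estimate of the (100*(1-xi))th percentile of the null
    statistic D_ks,0 of Equation (49): both pooled sets are resampled
    from the *uncorrected* chains only, so any KS distance between them
    is purely statistical."""
    rng = rng if rng is not None else np.random.default_rng()
    n_c = len(chains_bar)
    reps = np.empty(n_bs)
    for k in range(n_bs):
        B1 = rng.integers(0, n_c, n_c)               # Eq. (51), j = 1
        B2 = rng.integers(0, n_c, n_c)               # Eq. (51), j = 2
        reps[k] = ks_distance(np.concatenate([chains_bar[i] for i in B1]),
                              np.concatenate([chains_bar[i] for i in B2]))
    return np.percentile(reps, 100 * (1 - xi))
```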

6. Numerical Examples—Part II. Automatised Error Control and Discussion

We now come back to the numerical examples introduced in Section 3 and produce three series of results, illustrating three different aspects of the proposed approach.

6.1. MCMC Iterations Only When Needed

The results reported in Figure 8 correspond to Test 1. For now, we aim to control the quality of the posterior density of the first parameter $\mu_1$. Each of the graphs shows the evolution of various statistics of the corresponding KS distance as a function of the number of iterations $n$ of the tempered MC$^3$ algorithm. The mesh used is displayed on the left-hand side of each of the graphs. The solid black line represents the evolution of $D_{\mathrm{ks}}^{(n)}$ itself. The 5th and 95th bootstrap percentiles of this quantity are also reported, as solid blue and solid red lines, respectively. We set convergence criterion $C_m$ such that $\gamma = 0.2$ and $Y = 95$ for the pure discretisation error. Therefore, convergence with respect to the finite element discretisation is obtained when, for sufficiently long chains, the red line (upper limit of the bootstrap CI for $D_{\mathrm{ks}}^{(n)}$) is below $\gamma$.
In grey colours, we report the evolution of $D_{\mathrm{ks},0}^{(n)}$ for the coarse scale (light grey), consistently with the derivations of the previous section, but also, for reference, a similar statistic constructed with the corrected finite element model (dark grey). Both curves have similar evolutions and, of course, tend to 0 as $n$ tends to infinity. We set $Z = 5$ and $\eta = 0.05$. This means that the statistical error is considered small enough once the blue line (lower limit of the bootstrap CI for $D_{\mathrm{ks}}^{(n)}$) is above the 95th percentile of $D_{\mathrm{ks},0}^{(n)}$. Here, “small enough” is to be understood as small enough for a decision to be taken regarding the need to refine the mesh one step further.
The results are as follows. For the coarsest mesh, we detect a separation of the discretisation and statistical errors at iteration 60 (vertical solid red line); $C_s$ is satisfied. As $C_m$ is not satisfied, we can stop the MCMC sampler and refine the mesh. A similar behaviour is seen for mesh $\mathcal{M}_2$. For the last mesh, convergence is achieved after 530 iterations, $C_s$ never reaching satisfaction. The discretisation and sampling errors remain entangled, but their sum is below the desired target.
We can see here that the proposed algorithm allows us to stop the MCMC sampler early, moving directly to the level of mesh refinement that will yield the desired quality of the posterior densities.

6.2. Goal-Oriented Error Control

The second set of results still corresponds to Test 1. However, we now monitor the convergence of the posterior predictive density of the fourth eigenvalue. This is reported in the top two graphs of Figure 9. For $\mathcal{M}_1$, early stopping is performed at iteration 40 of the MCMC sampler. For $\mathcal{M}_2$, global convergence is achieved after 120 iterations, and this error keeps decreasing. The MCMC is allowed to continue generating samples, meaning that the discretisation error is actually a lot smaller than the requested tolerance $\gamma$. It is interesting to see that the convergence is much faster than for the first parameter, whose convergence was studied in the previous subsection. This shows that, for the same inverse problem, the algorithm may spend more or fewer resources depending on the engineering quantity of interest.

6.3. Uncertainty-Driven Error Control

Finally, the last set of results concerns Test 2. Here, remember that the first three eigenvalues are used as measurements. We control the convergence of the fourth eigenvalue, which was also controlled in the context of Test 1 in the previous subsection. We see here that for mesh $\mathcal{M}_2$, the posterior density of the fourth eigenvalue is still far from convergence, whilst it was evaluated in a very precise way in Test 1. This shows that the proposed error control algorithm automatically adapts to the level of posterior uncertainty. Qualitatively, a wide posterior density is associated with a high uncertainty concerning the value of the QoIs. Consequently, the mesh does not need to be very refined to capture the posterior density correctly. Conversely, for Test 2, the posterior density is sharper, and similar levels of discretisation error have a much stronger impact on the KS distance.

7. Concluding Remarks and Discussion

We have presented a methodology to control the various sources of errors arising in finite element-based Bayesian inverse problems. We have focussed on a simple numerical approximation chain consisting of (i) a finite element discretisation of the continuum mechanics problem and (ii) an MCMC solver to draw samples from posterior density distributions. So far, we have not considered further error sources, such as those engendered by meta-modelling. We have shown that it is possible to drive the mesh refinement process in a goal-oriented manner, by quantifying the impact of the associated error onto posterior density distributions. In order to do so, we run two independent MCMC chains simultaneously, one of them using a corrected likelihood function that takes the discretisation error into account. Any a posteriori error estimate available for the PDE under consideration may be used to obtain this correction. Of course, the accuracy (effectivity) of the chosen error estimate will impact the accuracy of the methodology developed in this paper. The study and control of this effect will require further research to be carried out.
We have shown that by using multiple replicates of the component-wise MCMC, it is also possible to derive bootstrap-based confidence intervals for the error in posterior densities engendered by the spatial discretisation of the underlying PDE. Importantly, this approach allows us to stop the MCMC iteration as soon as we can be sufficiently confident in the fact that the posterior densities obtained with the current mesh are not accurate enough for our purpose. The approach is goal-oriented: the adaptivity is performed in the sense of the posterior distribution of either some of the latent parameters, or in the sense of the predictive posterior density of engineering QoIs. Finally, we have shown that the approach is uncertainty driven: it will only refine the mesh and/or request additional samples to be drawn by the MCMC algorithm if the effect on the QoIs can be felt when measured using a statistical distance between their posterior densities. In particular, model updating problems with wider priors tend to require less computational effort than parameter estimation problems with very rich observed data and/or narrow prior densities. The proposed methodology establishes a bridge between the field of reliability estimation for the sampling-based algorithms used to solve probabilistic inverse problems, and the field of deterministic, goal-oriented finite element error estimation.
Although the results presented in this paper are encouraging, the proposed approach has shortcomings that need to be addressed in future research work. The error estimation procedure is relatively wasteful, for two reasons. Firstly, multiple MCMC algorithms need to run independently for bootstrapping to be possible. These chains all exhibit their own burn-in phase, which increases the number of discarded samples. Secondly, the finite element error estimation procedure is no longer merely a post-processing operation; one cannot post-process the finite element results corresponding to the uncorrected chain in order to compute the corrected likelihood. This is because both chains are independent and require running the coarse finite element simulations at different points of the parameter domain. Finally, let us acknowledge that Bayesian finite element inverse problems should not be solved by an MCMC solver without constructing a surrogate model first: the number of finite element computations involved, even if the proposal distribution is well designed, is in the tens of thousands. We are currently investigating the use of Polynomial Chaos surrogates, which adds another layer of numerical approximation that needs to be controlled in a robust and efficient manner. In this context, an elegant approach to separating the sources of errors (i.e., the finite element discretisation error and the Polynomial Chaos error) may be found in [55], which may constitute a solid starting point for the next step of our investigations.

Author Contributions

Methodology, P.K., A.K. and S.C.; Formal Analysis, P.K., A.K. and S.C.; Writing—Original Draft Preparation, P.K.; Writing—Review & Editing, A.K. and S.C.

Funding

We acknowledge the support provided by the Welsh Government and Higher Education Funding Council for Wales through the Sêr Cymru National Research Network in Advanced Engineering and Materials.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nagel, J.B.; Sudret, B. A Unified Framework for Multilevel Uncertainty Quantification in Bayesian Inverse Problems. Probab. Eng. Mech. 2016, 43, 68–84.
2. Beck, J.L.; Au, S.K. Bayesian Updating of Structural Models and Reliability Using Markov Chain Monte Carlo Simulation. J. Eng. Mech. 2002, 128, 380–391.
3. Chen, P.; Schwab, C. Sparse-Grid, Reduced-Basis Bayesian Inversion: Nonaffine-Parametric Nonlinear Equations. J. Comput. Phys. 2016, 316, 470–503.
4. Cotter, S.L.; Dashti, M.; Stuart, A.M. Approximation of Bayesian Inverse Problems for PDEs. SIAM J. Numer. Anal. 2010, 48, 322–345.
5. Ainsworth, M.; Oden, J. A Posteriori Error Estimation in Finite Element Analysis; Wiley: Chichester, UK, 2000.
6. Ladevèze, P.; Pelle, J.P. Mastering Calculations in Linear and Non Linear Mechanics; Springer: New York, NY, USA, 2004.
7. Díez, P.; Parés, N.; Huerta, A. Error Estimation and Quality Control. In Encyclopedia of Aerospace Engineering; Wiley: Chichester, UK, 2010.
8. Oden, J.T.; Prudhomme, S. Goal-Oriented Error Estimation and Adaptivity for the Finite Element Method. Comput. Methods Appl. Mech. Eng. 1999, 41, 735–756.
9. Cirak, F.; Ramm, E. A Posteriori Error Estimation and Adaptivity for Elastoplasticity Using the Reciprocal Theorem. Int. J. Numer. Methods Eng. 2000, 47, 379–393.
10. Strouboulis, T.; Babuška, I.; Datta, D.K.; Copps, K.; Gangaraj, S.K. A Posteriori Estimation and Adaptive Control of the Error in the Quantity of Interest. Part I: A Posteriori Estimation of the Error in the von Mises Stress and the Stress Intensity Factor. Comput. Methods Appl. Mech. Eng. 2000, 181, 261–294.
11. Becker, R.; Rannacher, R. An Optimal Control Approach to a Posteriori Error Estimation in Finite Element Methods. Acta Numer. 2001, 10, 1–102.
12. González-Estrada, O.; Nadal, E.; Ródenas, J.; Kerfriden, P.; Bordas, S.A.; Fuenmayor, F. Mesh Adaptivity Driven by Goal-Oriented Locally Equilibrated Superconvergent Patch Recovery. Comput. Mech. 2013, 53, 957–976.
13. Jensen, H.; Esse, C.; Araya, V.; Papadimitriou, C. Implementation of an Adaptive Meta-Model for Bayesian Finite Element Model Updating in Time Domain. Reliab. Eng. Syst. Saf. 2017, 160, 174–190.
14. Au, S.K.; Zhang, F.L. Fundamental Two-Stage Formulation for Bayesian System Identification, Part I: General Theory. Mech. Syst. Signal Process. 2016, 66–67, 31–42.
15. Zhang, F.L.; Au, S.K. Fundamental Two-Stage Formulation for Bayesian System Identification, Part II: Application to Ambient Vibration Data. Mech. Syst. Signal Process. 2016, 66–67, 43–61.
16. Babuška, I.; Tempone, R.; Zouraris, G.E. Galerkin Finite Element Approximations of Stochastic Elliptic Partial Differential Equations. SIAM J. Numer. Anal. 2004, 42, 800–825.
17. Ghanem, R.G.; Spanos, P.D. Stochastic Finite Elements: A Spectral Approach; Dover Publications: New York, NY, USA, 2003.
18. Nouy, A. Recent Developments in Spectral Stochastic Methods for the Numerical Solution of Stochastic Partial Differential Equations. Arch. Comput. Methods Eng. 2009, 16, 251–285.
19. Kundu, A.; DiazDelaO, F.; Adhikari, S.; Friswell, M. A Hybrid Spectral and Metamodeling Approach for the Stochastic Finite Element Analysis of Structural Dynamic Systems. Comput. Methods Appl. Mech. Eng. 2014, 270, 201–219.
20. Ganapathysubramanian, B.; Zabaras, N. Sparse Grid Collocation Schemes for Stochastic Natural Convection Problems. J. Comput. Phys. 2007, 225, 652–685.
21. Foo, J.; Karniadakis, G.E. Multi-Element Probabilistic Collocation Method in High Dimensions. J. Comput. Phys. 2010, 229, 1536–1557.
22. Pradlwarter, H.J.; Schuëller, G.I. On Advanced Monte Carlo Simulation Procedures in Stochastic Structural Dynamics. Int. J. Non-Linear Mech. 1997, 32, 735–744.
23. Yamazaki, F.; Shinozuka, M. Digital Generation of Non-Gaussian Stochastic Fields. J. Eng. Mech. 1988, 114, 1183–1197.
24. Au, S.K.; Beck, J.L. A New Adaptive Importance Sampling Scheme for Reliability Calculations. Struct. Saf. 1999, 21, 135–158.
25. Rosenblueth, E. Point Estimates for Probability Moments. Proc. Natl. Acad. Sci. USA 1975, 72, 3812–3814.
26. Christian, J.T.; Baecher, G.B. The Point-Estimate Method with Large Numbers of Variables. Int. J. Numer. Anal. Methods Geomech. 2002, 26, 1515–1529.
27. Julier, S.J. The Scaled Unscented Transformation. In Proceedings of the 2002 American Control Conference, Anchorage, AK, USA, 8–10 May 2002; Volume 6, pp. 4555–4559.
28. Yuen, K.V.; Kuok, S.C. Bayesian Methods for Updating Dynamic Models. Appl. Mech. Rev. 2011, 64, 010802.
29. Cui, T.; Marzouk, Y.; Willcox, K. Scalable Posterior Approximations for Large-Scale Bayesian Inverse Problems via Likelihood-Informed Parameter and State Reduction. J. Comput. Phys. 2016, 315, 363–387.
30. Kundu, A.; Matthies, H.G.; Friswell, M.I. Probabilistic Optimization of Engineering System with Prescribed Target Design in a Reduced Parameter Space. Comput. Methods Appl. Mech. Eng. 2018, 337, 281–304.
31. Schillings, C.; Schwab, C. Sparse, Adaptive Smolyak Quadratures for Bayesian Inverse Problems. Inverse Probl. 2013, 29, 065011.
32. Chen, P.; Schwab, C. Adaptive Sparse Grid Model Order Reduction for Fast Bayesian Estimation and Inversion. In Sparse Grids and Applications—Stuttgart 2014; Garcke, J., Pflüger, D., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 109, pp. 1–27.
33. Mattis, S.A.; Wohlmuth, B. Goal-Oriented Adaptive Surrogate Construction for Stochastic Inversion. Comput. Methods Appl. Mech. Eng. 2018, 339, 36–60.
34. Parés, N.; Díez, P.; Huerta, A. Subdomain-Based Flux-Free a Posteriori Error Estimators. Comput. Methods Appl. Mech. Eng. 2006, 195, 297–323.
35. Drohmann, M.; Carlberg, K. The ROMES Method for Statistical Modeling of Reduced-Order-Model Error. SIAM/ASA J. Uncertain. Quantif. 2015, 3, 116–145.
36. Paul-Dubois-Taine, A.; Amsallem, D. An Adaptive and Efficient Greedy Procedure for the Optimal Training of Parametric Reduced-Order Models. Int. J. Numer. Methods Eng. 2014, 102, 1262–1292.
37. Goury, O.; Amsallem, D.; Bordas, S.P.A.; Liu, W.K.; Kerfriden, P. Automatised Selection of Load Paths to Construct Reduced-Order Models in Computational Damage Micromechanics: From Dissipation-Driven Random Selection to Bayesian Optimization. Comput. Mech. 2016, 58, 213–234.
38. Trehan, S.; Carlberg, K.T.; Durlofsky, L.J. Error Modeling for Surrogates of Dynamical Systems Using Machine Learning. Int. J. Numer. Methods Eng. 2017, 112, 1801–1827.
39. Ciarlet, P.G. The Finite Element Method for Elliptic Problems; SIAM: Philadelphia, PA, USA, 1978.
40. Haario, H.; Laine, M.; Mira, A.; Saksman, E. DRAM: Efficient Adaptive MCMC. Stat. Comput. 2006, 16, 339–354.
41. Andrieu, C.; Thoms, J. A Tutorial on Adaptive MCMC. Stat. Comput. 2008, 18, 343–373.
42. Cheung, S.H.; Beck, J.L. Bayesian Model Updating Using Hybrid Monte Carlo Simulation with Application to Structural Dynamic Models with Many Uncertain Parameters. J. Eng. Mech. 2009, 135, 243–255.
43. Geyer, C.J. Markov Chain Monte Carlo Maximum Likelihood; Interface Foundation of North America: Fairfax Station, VA, USA, 1991.
44. Neal, R.M. Sampling from Multimodal Distributions Using Tempered Transitions. Stat. Comput. 1996, 6, 353–366.
45. Lam, H.F.; Yang, J.H.; Au, S.K. Markov Chain Monte Carlo-Based Bayesian Method for Structural Model Updating and Damage Detection. Struct. Control Health Monit. 2018, 25, e2140.
46. Zienkiewicz, O.C.; Zhu, J.Z. A Simple Error Estimator and Adaptive Procedure for Practical Engineering Analysis. Int. J. Numer. Methods Eng. 1987, 24, 337–357.
47. Prud’homme, C.; Rovas, D.V.; Veroy, K.; Machiels, L.; Maday, Y.; Patera, A.T.; Turinici, G. Reliable Real-Time Solution of Parametrized Partial Differential Equations: Reduced-Basis Output Bound Methods. J. Fluids Eng. 2002, 124, 70–80.
48. Ryckelynck, D.; Benziane, D.M. Multi-Level a Priori Hyper-Reduction of Mechanical Models Involving Internal Variables. Comput. Methods Appl. Mech. Eng. 2010, 199, 1134–1142.
49. Carlberg, K.; Farhat, C.; Cortial, J.; Amsallem, D. The GNAT Method for Nonlinear Model Reduction: Effective Implementation and Application to Computational Fluid Dynamics and Turbulent Flows. J. Comput. Phys. 2013, 242, 623–647.
50. Kerfriden, P.; Ródenas, J.J.; Bordas, S.P.A. Certification of Projection-Based Reduced Order Modelling in Computational Homogenisation by the Constitutive Relation Error. Int. J. Numer. Methods Eng. 2014, 97, 395–422.
51. Cui, T.; Marzouk, Y.M.; Willcox, K.E. Data-Driven Model Reduction for the Bayesian Solution of Inverse Problems. Int. J. Numer. Methods Eng. 2015, 102, 966–990.
52. Hoang, K.; Kerfriden, P.; Bordas, S. A Fast, Certified and “Tuning Free” Two-Field Reduced Basis Method for the Metamodelling of Affinely-Parametrised Elasticity Problems. Comput. Methods Appl. Mech. Eng. 2016, 298, 121–158.
53. Gelman, A.; Rubin, D.B. Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci. 1992, 7, 457–472.
54. Brooks, S.P.; Gelman, A. General Methods for Monitoring Convergence of Iterative Simulations. J. Comput. Graph. Stat. 1998, 7, 434–455.
55. Chamoin, L.; Florentin, E.; Pavot, S.; Visseq, V. Robust Goal-Oriented Error Estimation Based on the Constitutive Relation Error for Stochastic Problems. Comput. Struct. 2012, 106, 189–195.
Figure 1. (a) Structure undergoing structural vibrations. The grey areas have a known, deterministic Young's modulus. This property is piecewise constant in the coloured areas, and Gaussian distributed. (b) The corresponding marginal probability density function for two of the 6 parameters is represented in the top-left corner. (c) A standard Monte-Carlo finite element process is employed to compute the distribution of the first 10 free vibration frequencies and to represent the marginal distribution of each of these quantities.
Figure 2. Hierarchy of computational meshes used in the study. We aim to select the coarsest mesh that delivers the targeted distributions up to a user-defined numerical accuracy. The second line of meshes is only used for error estimation purposes.
Figure 3. (a) Joint posterior probability distribution for two of the 6 unknown elastic moduli and (c) marginal posterior predictive distributions of the first 10 free vibration frequencies. The data set consists of a noisy measurement of the first frequency only, resulting in wide posterior densities. (b) Posterior distribution of the two unknown elastic moduli when the first 3 vibration frequencies are used as the dataset, which results in much narrower posterior densities. (d) Corresponding posterior predictive densities.
Figure 4. Evolution of the joint posterior density of two of the unknown elasticity parameters as the computational mesh is progressively refined from M1 to M4.
Figure 5. Three independent MCMC chains sampling a multivariate Gaussian distribution.
Figure 6. Machine-learning-based error estimation procedure. The overkill error (the difference between the current mesh and a much finer mesh, as shown in (a)) is postulated to be an unknown function of an inexpensive error estimate: the distance between the results obtained with the current mesh and those obtained with a slightly refined mesh, as shown in (b). The statistical learning is done offline via Monte-Carlo sampling of the true error (using the prior as sampling density) and the adaptive fitting of a neural network regression. The regression is bootstrapped to provide enhanced stability and to derive confidence interval estimates for the overall error estimation procedure, as shown in (c). Both the size of the dataset and the hyperparameters of the network are found automatically via a greedy process that is not detailed in this paper.
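The regression-based error model described in the caption of Figure 6 could be sketched along the following lines, assuming scikit-learn's MLPRegressor as the regression model; the arrays `cheap_est` and `overkill_err` are synthetic placeholders for quantities that would, in practice, be sampled from the prior using the mesh hierarchy of Figure 2.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic offline training data, standing in for prior samples:
#   cheap_est[i]    : distance between current-mesh and slightly-refined-mesh QoIs
#   overkill_err[i] : distance between current-mesh and overkill-mesh QoIs
cheap_est = rng.uniform(0.0, 0.1, 200)
overkill_err = 1.8 * cheap_est + 0.05 * cheap_est * rng.standard_normal(200)

def bootstrap_error_models(x, y, n_models=20):
    # Fit an ensemble of small networks on resampled data; the spread of the
    # ensemble predictions yields a confidence interval for the true error
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
        models.append(net.fit(x[idx].reshape(-1, 1), y[idx]))
    return models

models = bootstrap_error_models(cheap_est, overkill_err)
preds = np.array([m.predict([[0.05]])[0] for m in models])
print(preds.mean(), preds.std())  # point estimate and spread of the predicted error
```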
Figure 7. Multiple MCMC routines were run in parallel, both for the coarse likelihood function (coarse mesh), as shown in (a), and for the enhanced likelihood function obtained after correction by the goal-oriented finite element error estimate, as shown in (b). The results are pooled and then bootstrapped to derive confidence intervals for the summary statistics of the posterior densities.
Figure 8. Convergence of the KS distance as a function of the number of Monte-Carlo samples in each chain (black line, with the confidence interval represented by the area between the blue and red lines). The mesh used is represented schematically on the left-hand side of each of the graphs in (a–c). The data consist of a single noisy measurement of the first eigenfrequency. The quantity of interest is the first elasticity parameter, in log scale. The grey lines indicate the bias of the KS distance estimate, which decreases with the sample size. The bias should be small for the estimate to be robust, but not too small, so as to avoid running unnecessarily long chains using meshes that are too coarse for our needs. Sufficiently accurate results are obtained with the third mesh. Importantly, very few MCMC iterations are needed to determine that the first two meshes are too coarse for our needs.
Figure 9. Convergence of the KS distance as a function of the number of Monte-Carlo samples in each chain. Panels (a,b) correspond to the inverse problem where only the first eigenfrequency is observed; panels (c,d) correspond to the case where the first three eigenfrequencies are observed. The quantity of interest is the fourth eigenvalue, which is unobserved in both cases and must be inferred. The corresponding posterior density is illustrated above the mesh pictogram. Interestingly, the data-poor problem (a,b) can be solved appropriately with a coarse mesh, while the data-rich case (c,d) requires a much finer grid to be solved accurately.
