1. Introduction
In the modeling of engineering problems, the governing equations usually consist of balance equations, such as the balance of mass, momentum and energy. These are known with absolute certainty. However, to obtain a tractable problem, parameterized constitutive equations are needed. These constitutive equations describe relations such as the ones between stress and strain in solid mechanics problems, stress and strain rate in fluid flow problems, and heat flow and temperature gradients in thermal problems. These constitutive equations are typically known with less certainty.
The selection and parameter estimation of constitutive laws is typically performed using well-defined lab experiments. In viscous flow problems, for example, we typically determine the constitutive model through rheological measurements, where we study the deformation of a soft solid or liquid in a well-defined flow. However, as engineering systems become more complex, the selection and parameter estimation of constitutive models becomes increasingly difficult. Bayesian inference is a powerful framework for addressing this issue. In Bayesian inference, model parameters are described probabilistically and are updated as new data become available [1,2], allowing for a systematic integration of prior knowledge and experimental data. Currently, Bayesian inference is still in its infancy in constitutive modeling, although it has recently seen an increase in interest. Some examples include determining the heat conductivity in a thermal problem [3], rheological parameters in non-Newtonian fluid mechanics [4,5], and constitutive modeling in biomaterials [6], elasto-viscoplastic materials [7], and elastic materials [8].
To infer the probability distribution of the parameters of a constitutive model, also referred to as the posterior, Markov chain Monte Carlo (MCMC) sampling methods are commonly used. Many different variants of MCMC are available, and researchers typically choose a sampling method based on pragmatic arguments such as software availability and experience. However, this might not always lead to an optimal choice in terms of sampler robustness, accuracy and efficiency.
A recent comparison of MCMC samplers based on a synthetically manufactured problem was presented by Allison and Dunkley [9], who considered the number of likelihood evaluations as the primary indicator for the performance of the samplers. In this contribution, we study the performance of various Bayesian samplers for a physical problem with real experimental data, viz., a squeeze flow. This problem considers the viscous flow of a fluid compressed in between two parallel plates. We herein consider a Newtonian fluid, for which the squeeze flow problem has an analytical solution [10]. We use the experimental data gathered with the tailored setup developed in Ref. [4]. For our comparison, we investigate three MCMC samplers: Metropolis–Hastings (MH), Affine Invariant Stretch Move (AISM) and No-U-Turn Sampler (NUTS). We study the convergence of these samplers through the Kullback–Leibler (KL) divergence to monitor the statistical distance between the sampled and the ‘true’ posterior [11].
This paper is structured as follows. In Section 2, we discuss the squeeze flow model, the experiments, and our prior information about the model parameters. The sampling methods to be compared are introduced in Section 3. The KL divergence is then discussed in Section 4, after which the performance of the samplers in the context of the squeeze flow problem is studied in Section 5. Conclusions are finally drawn in Section 6.
3. Markov Chain Monte Carlo (MCMC) Samplers
Markov chain Monte Carlo (MCMC) samplers can be used to generate samples of the posterior distribution through stochastic simulation. The (statistics of the) model parameters can then be inferred from a simulated Markov chain [1,12], i.e., a stochastic process in which, given the present state, past and future states are independent [12]. If a Markov chain, independently of its initial distribution, reaches a stage that can be represented by a specific distribution, $\pi$, and retains this distribution for all subsequent stages, we say that $\pi$ is the limit distribution of the chain [12]. In the Bayesian approach, the Markov chain is obtained in such a way that its limit distribution coincides with the posterior. More details about Markov chains for Bayesian inference can be found in the literature, e.g., Ref. [12].
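To make the notion of a limit distribution concrete, the following minimal sketch (an illustration, not taken from this work) simulates a two-state Markov chain and shows that its empirical state distribution approaches the limit distribution, regardless of the initial state.

```python
import numpy as np

# Two-state Markov chain: row i holds the transition probabilities from state i.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    state = rng.choice(2, p=P[state])   # draw the next state
    counts[state] += 1

print("empirical distribution:", counts / counts.sum())
# The limit distribution solves pi = pi P; here pi = (5/6, 1/6).
```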
We consider herein three different MCMC samplers: Metropolis–Hastings (MH), Affine Invariant Stretch Move (AISM) and No-U-Turn Sampler (NUTS). We aim to understand their advantages and disadvantages for the inference of the constitutive behavior of the squeeze flow. The considered samplers, as illustrated in Figure 3, are discussed in the remainder of this section.
3.1. Metropolis–Hastings (MH)
The first MCMC method we use is the Metropolis–Hastings (MH) algorithm [13,14]. We consider this algorithm because it is easy to implement and widely used for the estimation of parameters [1,2]. The MH algorithm starts with the definition of an initial guess for the vector of parameters. The parameter space is then explored by performing a random walk, in which the new candidate for the vector of parameters is sampled from a proposal distribution. This proposal is typically chosen to be a multivariate normal distribution, with its mean centered at the current state. The new candidate is then either accepted or rejected according to a probabilistic criterion. This procedure is repeated to generate the Markov chain.
The performance of the MH sampler is strongly influenced by the choice of the proposal covariance matrix, making careful selection a necessity. To objectively and optimally determine the proposal covariance, in the remainder of this work we employ the adaptive MCMC algorithm proposed by Haario et al. [15].
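For illustration, a minimal random-walk MH sketch is given below (a generic Python implementation, not the code used in this work); the adaptive covariance update of Haario et al. [15] is omitted for brevity, and the target is assumed to be available as a log-posterior function.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, cov, n_samples, rng):
    """Random-walk Metropolis-Hastings with a multivariate normal proposal (sketch)."""
    theta = np.asarray(theta0, dtype=float)
    log_p = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = rng.multivariate_normal(theta, cov)   # proposal centered at current state
        log_p_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        chain[i] = theta
    return chain

# Usage with a standard normal target (illustration only):
rng = np.random.default_rng(0)
log_post = lambda t: -0.5 * np.sum(t**2)
chain = metropolis_hastings(log_post, np.zeros(2), 0.5 * np.eye(2), 10_000, rng)
```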
3.2. Affine Invariant Stretch Move (AISM)
The Affine Invariant Stretch Move (AISM) method is based on the algorithm developed by Goodman and Weare [16]. In this method, one uses multiple chains (also known as walkers) to explore the parameter space, where the number of walkers should be at least twice the number of parameters. The algorithm starts by defining an initial guess for each of the walkers, here sampled from the prior distribution. To explore the parameter space, the proposal for each walker $k$, $\boldsymbol{\theta}_k^{\star}$, is based on the current state, $\boldsymbol{\theta}_j$, of another randomly chosen walker $j$ through linear extrapolation: $\boldsymbol{\theta}_k^{\star} = \boldsymbol{\theta}_j + Z \left( \boldsymbol{\theta}_k - \boldsymbol{\theta}_j \right)$. The extrapolation factor, $Z$, is sampled from a distribution with density
$$ g(z) \propto \begin{cases} \dfrac{1}{\sqrt{z}} & \text{if } z \in \left[ \tfrac{1}{a}, a \right], \\[4pt] 0 & \text{otherwise}, \end{cases} $$
where the parameter $a$ has a default value of $a = 2$, which is used in this work.
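The stretch move can be sketched as follows (illustrative Python, not the implementation used in this work); the inverse-CDF expression for sampling $Z$ and the $z^{d-1}$ factor in the acceptance probability follow Goodman and Weare [16], while production implementations such as emcee update the walkers in two half-ensembles rather than one at a time.

```python
import numpy as np

def stretch_move(walkers, log_post, a, rng):
    """One sweep of the affine invariant stretch move (simplified sketch)."""
    n_walkers, n_dim = walkers.shape
    log_p = np.array([log_post(w) for w in walkers])
    for k in range(n_walkers):
        j = rng.choice([i for i in range(n_walkers) if i != k])   # another random walker
        z = (1.0 + (a - 1.0) * rng.uniform())**2 / a              # Z ~ g(z) on [1/a, a]
        proposal = walkers[j] + z * (walkers[k] - walkers[j])     # linear extrapolation
        log_p_prop = log_post(proposal)
        # Accept with probability min(1, z^(d-1) * p(proposal) / p(current)).
        if np.log(rng.uniform()) < (n_dim - 1) * np.log(z) + log_p_prop - log_p[k]:
            walkers[k], log_p[k] = proposal, log_p_prop
    return walkers
```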
The number of walkers, to be selected by the user, can affect the performance of the sampler. To ensure the objectivity of our comparison study, the sensitivity of the results to the number of walkers has been studied. This sensitivity study conveys that the results are rather insensitive to the number of walkers.
3.3. No-U-Turn Sampler (NUTS)
The No-U-Turn Sampler (NUTS) belongs to the class of Hamiltonian Monte Carlo (HMC) algorithms [
17], which explore the parameter space using information from the gradient of the posterior. An advantageous feature of the NUTS sampler is that it automatically adapts its step size while exploring the parameter space [
18], avoiding the need for the user to specify sampler parameters. The new candidate for the vector of parameters is selected according to the “No-U-Turn” criterion, which is based on the observation that the trajectory of the candidate often reverses direction when it encounters regions of high posterior density [
18]. The sampler terminates the trajectory when it detects that the new candidate is starting to turn back on itself, indicating that further exploration is unlikely to yield better samples [
18]. By continuously evaluating the gradient and the direction of the trajectory, NUTS can provide a more efficient exploration of the parameter space when compared to random-walk-based methods [
18] such as the MH algorithm.
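Since NUTS builds on HMC, the following sketch of a single leapfrog-integrated HMC step (illustrative Python, not the full NUTS algorithm) indicates how the gradient of the log-posterior steers the exploration; NUTS additionally grows the trajectory recursively and terminates it via the no-U-turn criterion.

```python
import numpy as np

def hmc_step(theta, log_post, grad_log_post, eps, n_leapfrog, rng):
    """One Hamiltonian Monte Carlo step with leapfrog integration (sketch)."""
    p0 = rng.standard_normal(theta.shape)        # auxiliary momentum variable
    q, p = theta.copy(), p0.copy()
    p += 0.5 * eps * grad_log_post(q)            # half step for momentum
    for _ in range(n_leapfrog):
        q += eps * p                             # full step for position
        p += eps * grad_log_post(q)              # full step for momentum
    p -= 0.5 * eps * grad_log_post(q)            # correct the last momentum update to a half step
    # Metropolis correction based on the change in the Hamiltonian.
    h_old = -log_post(theta) + 0.5 * p0 @ p0
    h_new = -log_post(q) + 0.5 * p @ p
    return q if np.log(rng.uniform()) < h_old - h_new else theta
```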
Of the three samplers considered in this work, NUTS is the only one that requires gradient information of the model. In this work, we obtain the gradient with the forward-mode automatic differentiation method [19]. This method allows for automatic differentiation of the considered analytical model. Although this increases the computational effort involved in the evaluation of the model, no additional implementation work is required.
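As an illustration of forward-mode automatic differentiation (a generic dual-number sketch, not the specific implementation used in this work), each arithmetic operation propagates a value together with its derivative:

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Dual number for forward-mode automatic differentiation (illustrative sketch)."""
    val: float   # function value
    der: float   # derivative with respect to the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)   # product rule

    __rmul__ = __mul__

# Differentiate f(x) = 3*x*x + 2*x at x = 1.5 by seeding der = 1:
x = Dual(1.5, 1.0)
f = 3 * x * x + 2 * x
print(f.val, f.der)   # 9.75 and f'(1.5) = 6*1.5 + 2 = 11.0
```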
4. Kullback–Leibler Divergence-Based Convergence Analysis
In this paper, we study the convergence behavior of the sampling algorithms discussed above. There are numerous possibilities for a convergence criterion for MCMC sampling methods, for example, the Gelman–Rubin statistic, the Geweke diagnostic, or the cross-correlation diagnostic [20,21,22]. We herein study convergence based on the Kullback–Leibler (KL) divergence, which provides a rigorous definition for the statistical distance between two probability distributions.
Given two probability distributions for a continuous random vector $\boldsymbol{x}$, $P$ and $Q$, the KL divergence from $P$ to $Q$ is defined by
$$ D_{\mathrm{KL}}(P \,\|\, Q) = \int p(\boldsymbol{x}) \ln\!\left( \frac{p(\boldsymbol{x})}{q(\boldsymbol{x})} \right) \mathrm{d}\boldsymbol{x}, \qquad (6) $$
where $p(\boldsymbol{x})$ and $q(\boldsymbol{x})$ are the probability density functions for $P$ and $Q$, respectively. In Figure 4 (left), we illustrate the concept of the KL divergence for a scalar-valued continuous variable $x$ by comparing two different probability density functions ($p_1$ and $p_2$) to a reference distribution ($q$). In the illustrated case, the KL divergence $D_{\mathrm{KL}}(P_1 \,\|\, Q)$ is smaller than $D_{\mathrm{KL}}(P_2 \,\|\, Q)$. The KL divergence is non-negative and only zero when a distribution is identical to the reference distribution.
In practice, it is convenient to approximate the KL divergence (6) through binning. Denoting the set of bin centers by $\{ \boldsymbol{x}_i \}$, the KL divergence can be computed as
$$ D_{\mathrm{KL}}(P \,\|\, Q) \approx \sum_{i} P_i \ln\!\left( \frac{P_i}{Q_i} \right), \qquad (7) $$
where $P_i$ and $Q_i$ are the probabilities assigned to bin $i$ for the distributions $P$ and $Q$, respectively. Provided that the bins (both locations and sizes) are selected appropriately, the discrete KL divergence can be expected to resemble the divergence of the underlying continuous distributions; see Figure 4 (right).
In the case of sampling methods, it is natural to consider a binned discrete representation of the sampled distribution. Therefore, the discrete version of the KL divergence (7) will be considered in the remainder of this work. When we consider continuous distributions, we determine the bin probability as the product of the probability density at the bin center and the “volume” of the bin, and subsequently normalize the discrete distribution.
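A minimal one-dimensional sketch of this binned KL computation is given below (illustrative Python with assumed Gaussian distributions, not the squeeze flow posterior); samples falling outside the bin range are simply not counted, and the reference bin probabilities are obtained as density times bin width, followed by normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.2, scale=1.0, size=100_000)      # "sampled" distribution P

edges = np.linspace(-5.0, 5.0, 28)                          # 27 bins over the domain
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

P = np.histogram(samples, bins=edges)[0].astype(float)
P /= P.sum()                                                 # samples outside the range are dropped

q_pdf = np.exp(-0.5 * centers**2) / np.sqrt(2.0 * np.pi)     # reference density (standard normal)
Q = q_pdf * widths                                           # density at bin center times bin "volume"
Q /= Q.sum()                                                 # normalize the discrete reference

mask = P > 0                                                 # empty bins contribute zero
D_KL = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
print(D_KL)   # close to the analytical value 0.5 * 0.2**2 = 0.02 for these two normals
```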
5. Comparison of the Samplers
With the squeeze flow problem and KL divergence measure introduced, the samplers can now be compared. In this section, we will first specify essential details regarding the comparison, after which we will present and discuss the comparison results.
5.1. Specification of the Comparison
To compare the sampling algorithms, we have to define a ‘true’ posterior used as the reference probability distribution. For this, we evaluate the posterior on a rectilinear grid in the parameter space, with the bounds for each parameter taken as a symmetric interval around the posterior mean, scaled by the posterior standard deviation. The mean and standard deviation used for these bounds have been determined through overkill sampling using 409,600 samples (excluding the burn-in period), the results of which are listed in Table 2.
To attain insight into the quality of the rectilinear reference grid, we have created three different grid structures, enclosed by the same parameter bounds. Each of the four parameter ranges is divided into 27, 9 or 3 cells, resulting in $27^4 = 531{,}441$, $9^4 = 6561$ and $3^4 = 81$ bins, respectively. In each bin, the posterior is evaluated by multiplying the likelihood and the prior and subsequently normalizing the result. To compare the sampled posterior to the reference grid, we divide the number of samples within a bin by the total number of samples collected in all bins. In the exceptional case that samples fall outside the boundaries of the reference domain, these are not taken into account in the total number of samples.
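This comparison procedure can be summarized schematically as follows (illustrative Python; the log_posterior interface, the parameter bounds and the cell count are placeholders): the reference distribution Q is obtained by evaluating the posterior at the bin centers of the rectilinear grid and normalizing, while P is the normalized histogram of the MCMC samples.

```python
import numpy as np

def binned_distributions(samples, log_posterior, bounds, n_cells):
    """Normalized bin probabilities of the reference grid (Q) and the samples (P).

    Schematic sketch: 'log_posterior' is assumed to accept an (N, n_params)
    array of parameter vectors (unnormalized log posterior).
    """
    edges = [np.linspace(lo, hi, n_cells + 1) for lo, hi in bounds]
    centers = [0.5 * (e[:-1] + e[1:]) for e in edges]

    grid = np.meshgrid(*centers, indexing="ij")
    logp = log_posterior(np.stack([g.ravel() for g in grid], axis=1))
    Q = np.exp(logp - logp.max())
    Q /= Q.sum()                                    # normalized reference posterior per bin

    counts, _ = np.histogramdd(samples, bins=edges)
    P = counts.ravel() / counts.sum()               # samples outside the grid are excluded
    return P, Q
```

The resulting P and Q can then be inserted directly into the binned KL divergence (7).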
5.2. Results of the Comparison
To provide a frame of reference for the KL divergence comparison, in Figure 5, the divergence of the prior distribution from the reference posterior distribution based on $27^4$ bins is shown (dot–dashed line). It is observed that the divergence of the posterior computed on the one-time coarsened grid ($9^4$ bins) and on the two-times coarsened grid ($3^4$ bins) is substantially smaller than that for the prior (dashed lines), indicating that the rectilinear grid approximation on the finest reference grid ($27^4$ bins) provides a high-quality approximation of the posterior.
From Figure 5, it is observed that all three samplers decrease the error monotonically, demonstrating their robustness. Asymptotically, the KL divergence is expected to approach a finite (but small) value corresponding to the divergence from the exact posterior to the reference grid with $27^4$ bins. We observe that the MH sampler has the largest distance from the reference for a fixed value of the sample size. Although the AISM sampler does not require additional information from the model, for the considered problem it explores the parameter space more efficiently than the MH sampler, despite the optimized selection of the MH proposal covariance. The NUTS sampler is observed to significantly outperform the AISM sampler. This conveys that the gradient information used by NUTS effectively improves the exploration of the parameter space.
Figure 5 also shows that, for the relatively small number of parameters considered in this problem, the evaluation of the posterior on equidistant grids is very efficient. Specifically, the results obtained on the intermediate grid (6561 evaluations for $9^4$ bins) outperform all samplers at the maximum considered sample size ($4.096 \times 10^5$ samples). We stress that this observation is very specific to the case of a small number of parameters [23], and that for problems with more parameters, the evaluation of the posterior on rectilinear grids quickly becomes intractable.
6. Conclusions
A comparison of three commonly used MCMC samplers is presented for the Bayesian uncertainty quantification of a Newtonian squeeze flow. The Bayesian inference of the model parameters is based on experimental data gathered through a tailored setup. The Kullback–Leibler (KL) divergence is used to study the convergence of the different samplers.
It is observed that the KL divergence decreases monotonically for all samplers, and that all samplers converge toward the exact posterior. The No-U-Turn Sampler (NUTS) is observed to yield the best approximation of the posterior for a given sample size, which we attribute to the gradient information used by this sampler. The Affine Invariant Stretch Move (AISM) sampler, which does not require gradient information, is found to yield a substantial improvement in exploring the parameter space compared to the Metropolis–Hastings algorithm.
In this study, we have limited ourselves to the KL divergence as a measure of convergence. The comparison of this convergence measure to other criteria will be considered in future work. The comparison can also be improved further by taking the computational effort into consideration; this would, for example, account for the increased computational effort involved in evaluating gradients for the NUTS sampler. Furthermore, extending the physical problem under consideration to more complex cases (with more model parameters) will provide insights regarding how such extensions affect the sampler comparison.