Uncertainty Quantification at the Molecular – Continuum Model Interface †

Non-equilibrium molecular dynamics simulations are widely employed to study transport fluid properties. Observables measured at the atomistic level can serve as inputs for continuum calculations, allowing for improved analysis of phenomena involving multiple scales. In hybrid modelling, uncertainties present in the information transferred across scales can have a significant impact on the final predictions. This work shows the influence of force-field variability on molecular measurements of the shear viscosity of water. In addition, the uncertainty propagation is demonstrated by quantifying the sensitivity of continuum velocity distribution to the particle-based calculations. The uncertainty is modelled with polynomial chaos expansion using a non-intrusive spectral projection strategy. The analysis confirms that low-order polynomial basis are sufficient to calculate the dispersion of observables.


Introduction
Recent years have seen a large number of publications devoted to atomistic-continuum hybrid modelling (for a review, see Teschner et al. [1]).Emerging strategies which combine disparate length and time-scales allow us to tackle numerical limitations and enable better understating of complex physical and biological processes.High-fidelity simulations that aim to accurately describe the fluid flow behaviour are beginning to take advantage of particle-based methods for analysing system properties or detailed physics at interfaces while modelling the bulk of the domain with an efficient Navier-Stokes solver.A major challenge in hybrid modelling is the quality of the information exchanged between domains, as well as the efficiency of the inter-scale communication.Molecular dynamics (MD) is a powerful tool for capturing phenomena at the atomic level [2][3][4][5], but the method relies on the accuracy of the potential (or force-field) with which the inter-atomic forces of a real system can be reproduced [6].A widely used model of molecular interactions is the Lennard-Jones (LJ) potential, defined for particles located at r i and r j as [7]: where r ij ≡ r ij , r ij = r i − r j and r c is a cut-off radius beyond which the interactions are neglected.The parameters of the LJ model, the depth of the potential well ( ), and the distance of zero inter-potential energy (σ) are estimated through quantum chemistry calculations or fitted to match the experimental data.Large discrepancies in developed force-fields are the main source of uncertainty, resulting in often different predictions; as stated by Ouyang and Bettens [8], the existence of over 40 water MD models indicates the challenge of reproducing the properties of real liquids.Sullivan [9] describes two common categories of uncertainty in physical and numerical experiments: aleatoric and epistemic.The first type refers to uncertainty due to the physical variability of a system (irreducible), while the latter arises from a lack of knowledge (reducible).The epistemic uncertainty can further be subdivided into model form uncertainty, where the correctness of the strategy is questioned, and parametric uncertainty caused by lack of confidence that the chosen values are the best to represent certain properties.A common objective of uncertainty quantification (UQ) is to analyse a stochastic model's response to variable inputs (forward propagation) or to determine the value of input parameters which correspond to a given observation (inverse problem).Traditionally, the sensitivity analysis of system parametrisation has been conducted using Monte Carlo sampling.Although very powerful, the convergence rate of the method is slow, O(M −1/2 ), where M is a number of realisations/trials, resulting in high computational cost [10].An alternative approach is based on spectral representations of observables allowing for higher efficiency of the analysis.Parametric uncertainty can be modelled with the polynomial chaos (PC) method [11], which in its generalised form (gPC) [10] describes an expansion of a random variable stochastic process over a probability space (Ω, F , µ) with respect to an appropriate orthogonal polynomial basis of L 2 (Ω, µ; R).Here Ω defines a sample space which contains all possible outcomes, while F is a σ-algebra (a set of events), and µ is the probability measure which defines event's likelihood.Given a functional representation of quantities of interest (QoI), statistical post-processing is then straightforward.In addition, the higher the regularity of the approximated function, the faster the convergence of the truncated expansion.
It is worth noting that there are other surrogate-based methods.The approach used depends not only on computational resources or the problem specifications, but also on particular expertise or background.While polynomial chaos is a popular choice in the applied mathematics and engineering community, different techniques have gained recognition in other fields; for example, Gaussian process emulator [12] or Bayes linear surrogate [13] are principally used in statistics, while a neural network approach [14] is becoming popular among physicists.In our work, we focused on incorporating a polynomial chaos solution in order to validate the results with published studies in the field of fluid mechanics.However, we plan in future to explore different surrogate models for uncertainty quantification analysis.We are particularly interested in a Gaussian process emulator which has recently been reported to outperform the polynomial-based approach [15,16].
Since the work of Xiu and Karniadakis [17], the use of polynomial chaos to represent the uncertainty of the input in Navier-Stokes flow simulations has been growing [18].However, the application of the probabilistic methodology to particle-based simulations-or atomistic-continuum modelling-is still recent [6].Rizzi et al. [19] exploited the polynomial chaos expansions and Bayesian inference [20] to develop a framework for sensitivity analysis of MD measures of diffusivity, density, and enthalpy to parameters of the TIP4P water model [21].Further research on the reliability of MD predictions involved the analysis of parametric uncertainty on the flux of ions through silica nanopores [22,23].Jacobson et al. [24] applied generalised polynomial chaos to assess the accuracy of a family of coarse-grained water models.A recent study by Salloum et al. [25] focused on a two-way coupling of uncertainty across scales with the use of surrogate models for computational efficiency.The studies showed the benefit of using a polynomial chaos expansion to validate the results and explore the parametric dependence of final predictions, enabling the development of improved fluid modelling.Mentioned works stressed that the main source of uncertainty in molecular dynamics was due to the atomistic force fields, with the intrinsic noise making only a minor contribution.However, filtering particle-based data can lead to more efficient information extraction and intra-scale communication; a review of various signal processing methods for molecular simulations can be found in Zimo ń et al. [26,27].
In this work, a non-intrusive strategy of polynomial chaos is used to determine the influence of uncertainty in Lennard-Jones potential on non-equilibrium MD-based viscosity calculation of water (TIP4P/2005 model [28]).Although the UQ analysis of this kinetic property is important for parametrising the forcefield, it has not yet been considered in the literature.In addition, the probability distribution of the viscosity is propagated through the simulation of classical Stokes flow in order to analyse the sensitivity of final velocity prediction.The overall study enables us to verify the previously reported results on polynomial chaos expansion for fluid simulations and contribute to the development of high-fidelity multi-scale modelling.This article is organised as follows.In Section 2, the theory of polynomial chaos is briefly outlined.In Section 3, the simulation set-up and the uncertainty quantification propagation in modelling components are described, followed by results and discussion.Concluding remarks are provided at the end.

Mathematical Background
Given the uncertain input-in this study the LJ parameter (ξ), where ξ has a certain probability measure µ on Ω ⊆ R, a real-valued random variable Y-the MD prediction of QoI can be represented as a PC expansion as follows: where the basis functions ψ k are orthogonal with respect to measure µ and {y k } ∞ k=0 is a set of PC coefficients.The study can be extended to the R n -valued random variable, Y = {Y 1 , . . ., Y n }, where with y k = {y 1,k , . . ., y n,k } ∈ R n for each k ∈ N 0 ; dependence of QoI on multiple sources of uncertainty ξ = {ξ 1 , . . ., ξ α }, can also be considered.In practice, the series expansion is truncated after P + 1 = (α + p)!/α!/p! terms, where α defines the stochastic dimensionality and p is the order of the corresponding multivariate polynomials, Ψ k (ξ).Such representation (including the truncated series) allows for straightforward stochastic post-processing; the measurement of the expected value of Y is simply its zeroth PC coefficient, E [Y] Ψ 0 Y = ∑ P k=0 y k Ψ 0 Ψ k = y 0 , while the variance is defined as Wiener [11] introduced PC expansions, where Hermite polynomials were used to model stochastic processes with Gaussian random variables.The extension to generalised PC developed by Xiu and Karniadakis [29] allows for non-Gaussian random processes.In the gPC approach, different orthogonal polynomials in the Askey scheme are incorporated for different distribution types.In this work, the abbreviation PC refers to a generalised formalism of Wiener's approach.
Considering its practical implementation, there are two main strategies for determining the expansion coefficients, y k , in Equation ( 2).The PC expression of random variables can be directly introduced to the model equations, and a Galerkin projection is then applied to obtain a stochastic solver [9,10]; hence, the approach is referred to as the intrusive spectral projection (ISP).Xiu [10] explains that although the Galerkin formalism offers the most accurate solutions, it is not suitable for highly complex models.It is also not recommended when the UQ analysis is performed on legacy codes, which are often not amenable to changes.The opposite situation is for non-intrusive methods which rely on individual realisations of Y to determine the response of the system to random inputs.In this approach, the deterministic solution does not need to be modified and can be evaluated at any desired point of the probability space (Ω, F , µ).If the normalisation constant, γ k = Ψ 2 k , is known, in non-intrusive spectral projection (NISP), the PC coefficients can be obtained by approximating the integral with quadrature rules.The realisations of Y are used to determine a surrogate model Ỹ for which where ỹk = 1 , while ξ (j) , w (j) are the prescribed nodes and their corresponding weights.If the normalisation constant, γ k , can be described by polynomials of degree less than or equal to (2Q − 1)/2, where Q is the number of quadrature nodes, then the PC coefficients can be exactly determined.Sullivan [9] states that if the dimensionality of vector ξ is small and Y(ξ) is relatively smooth, the Gaussian quadrature approximation (nodes at the roots of µ-orthogonal polynomials) is an optimal choice.In this work, the NISP is performed with the open-source tool Chaospy [30].

Viscosity Calculation with Non-Equilibrium Molecular Dynamics (NEMD)
Computational simulations play an important role in the interpretation of experimental results.In many studies, the transport coefficients are assumed to be constant, depending only on the temperature and density of the fluid, without considering local behaviour such as the dependence of shear viscosity on the velocity gradient [7].Achieving high accuracy and precision measurements of such properties is not a simple task.However, valuable knowledge of a material's rheological characteristics can be obtained with molecular dynamics.Particle-based modelling offers an ideal environment to analyse the viscous nature of a fluid without some of the constraints of experimental studies [31][32][33].
There are two main strategies for calculating transport coefficients: by performing non-equilibrium molecular dynamics or using the equilibrium Green-Kubo (GK) formalism [34].The main benefit of the first approach is its fast convergence, making it a popular choice for estimating the shear viscosity.In this work, a sensitivity analysis to the LJ potentials of NEMD calculations is undertaken.The results are obtained by simulating a Couette flow where a constant shear stress is induced in water confined between two parallel silicon walls by the constant motion of the upper boundary.All molecular simulations are performed using the mdFoam solver [35,36] developed in OpenFOAM [37].A popular TIP4P/2005 water model was chosen for the analysis, as it gives the best prediction of the viscosity for the complete range of temperatures.As previously mentioned, the value of the potential well is assumed to be uncertain, (ξ), where ξ ∼ U ( min , max ); the support of the distribution is based on the force-fields of other water models from the TIP4P family, min ≈ 0.84 × OO from TIP4P [38] and max ≈ 1.14 × OO from TIP4P/Ice [39], where OO = 1.287 × 10 −21 J and σ OO = 3.1589 Å are the values of the TIP4P/2005 model introduced by Abascal and Vega [28].The solid-liquid interaction is modelled following the common Lorentz-Berthelot mixing rule [40], based on the LJ potential of silicon from Ritos et al. [41].
The simulation details are described in Figure 1.The computational domain is divided into a measurement (bulk) zone where the continuum behaviour of the fluid is observed, and interfaces with non-homogeneities caused by the wall presence (note the water layering in Figure 1).The system is modelled as periodic in the xand z-directions for efficiency.A linear velocity response to the movement of the upper boundary develops in the measurement zone; the calculated shear stress, τ, and velocity gradient, δv/δy, are used to obtain the viscosity coefficient, η τ = η δv δy .
Averaged values of observables are measured in 300 horizontal bins of thickness 0.032 nm every 4000∆t to avoid redundancy, where the time-step is set to ∆t = 2 fs.The simulations are performed in a (canonical) NVT ensemble (a statistical mechanical ensemble with a fixed number of particles (N), volume (V), and system's temperature (T)); the temperature of the system is maintained at T = 298 K via the Berendsen thermostat [42].In order to avoid affecting the velocity calculations, the thermostat is applied only in the z-direction of the flow; the equipartition theorem ensures that each degree of freedom is close to the target temperature.In the equilibration stage, water density is obtained with a FADE controller [43].After the desired system has developed, the density controller is switched off and the top wall is set to move at v = 47.5 m/s until a steady-state is reached (high velocities reduce thermal fluctuations); the calculation of velocity takes place in the measurement-zone consisting of 100 bins.The applied shear stress is measured using the Irving-Kirkwood equation [44].

Non-Intrusive Uncertainty Quantification in Multi-Scale Modelling
As mentioned previously, the potential well parameter has been assigned a uniform probability distribution, (ξ) ∼ U ( min , max ).The sample points in probability space for which our MD model is evaluated are specified through p + 1-order Gaussian quadrature selected for given distribution.Having determined nodes, 1 , . . ., p+1 , and corresponding weights, we perform the MD simulations and use viscosity results to describe the stochastic response to the uncertain parameter.Polynomials of order p are computed using a three-term recurrence relation [30].We then use the basis, quadrature nodes, weights, and model samples to create the surrogate model, Ỹη .
The surrogate model, Ỹη , is used to construct a response probability density function (PDF).The cost of the procedure is very low in comparison with evaluating the PDF with the original model, Y η , which requires a full MD simulation.Ten thousand realisations are selected from the distributions and plugged into the surrogate model.An empirical histogram-or alternatively a kernel density estimate-can then be used to fit one of the standard distributions, which are subsequently applied in the macroscopic simulations.Two alternatives could be pursued at this stage.Firstly, the kernel density estimate can be directly used in constructing an orthogonal family for the spectral approximation at the macroscopic level.This approach removes the necessity of a fitting procedure.Another alternative is to use evaluated viscosities as nodes for the macroscopic response quadratures, but this procedure hides the multi-scale interface.
Step-by-step analysis gives more insight into the response propagation.
An estimated viscosity PDF is then passed onto the continuum calculation as a random input for NISP, η(ξ 2 ).In the continuum model, we use a Navier-Stokes solver (OpenFOAM) to obtain the steady-state laminar solution for a 2D-channel with fixed dimensions and flow rate.For the sake of demonstration, we focus on the spatial development of the center-line velocity profile.Uniform fixed-velocity inlet and no-slip on walls are given as boundary conditions.The center-line velocity profile will reach the fully-developed value more quickly for higher viscosities.For each streamwise point, x, a surrogate model, Ỹu (x), is obtained.
The procedure for propagating the uncertainty in this multi-scale framework is summarised in Figure 2.

Results and Discussion
In the previously mentioned work on UQ in molecular dynamics and multi-scale simulation, it was claimed that low-order polynomials are sufficient to model parametric uncertainty.Rizzi et al. [19] used Q = 7 quadrature nodes to compute the PC coefficients for density, enthalpy, and self-diffusivity of water with polynomials up to p = 6.Salloum et al. [25] showed that an expansion order p = 3 accurately quantifies the uncertainty in molecular dynamics calculation of water velocity.In our study, we consider three PC models of order p = {2, 3, . . ., 11} with Q = {3, 6, 12} nodes, correspondingly.In Figure 3a,b we present, respectively, mean and standard deviation for spectral projections of increasing order.These pictures verify that a low-order basis is sufficient for determining the stochastic response of the system.Clearly, the first-and second-order statistics measured with Q = 6 and Q = 12 are comparable and do not vary significantly for p ≤ Q − 1.However, polynomials of order p = 2 do not provide mean value close to the expected η = 8.525 × 10 −4 Pa•s (the value differs by approximately 4% from the experimental result provided by Harris and Woolf [45] and is within the error bounds of the TIP4P/2005 water model, 8% [46]) obtained with the TIP4P/2005 water model and the standard deviation differs largely from the other estimations.
Subsequently, a third-order polynomial is used to obtain the histogram and kernel density estimates (KDE) describing the viscosity, η. Figure 4 presents the PDF that was obtained for Q = 12 and p = 3 with a log-normal fit.Qualitatively similar histograms were obtained for other settings.The histogram exhibits high skewness: the bulk of the mass is situated just below the mean, and large values are vanishing relatively slowly (long tail).This shows that the viscosity estimate is sensitive to the LJ potential parameters.More specifically, a majority of the from the selected interval underestimates the viscosity, and a large fraction leads to a significant overestimate.
The observation in Figure 4 shows the misrepresentation of the system response in finite order statistics.Even though the mean agrees well with the established value (see Figure 3), the detailed picture provided by the PDF gives a less positive message.The values around the mean are rarely realised if the parameters are taken from the uniform distribution on the selected interval.
UQ analysis of mean  Finally, the identified log-normal distribution is used as an input to a computational fluid dynamics (CFD) analysis.Due to the relatively high skewness of the distribution, the right-most points were only weakly influencing the obtained results.As a result, a quadrature accuracy study was not pursued here, as the fifth order was already reaching machine precision for weights corresponding to larger nodes.
The width of the channel and the velocity at the inlet are fixed to unity, and therefore the center-line velocity varies between 1 and 1.5, with the latter being the fully-developed value.Viscosity affects the distance at which a fully developed value is reached: for high viscosity, the value is reached faster and vice versa.For the lowest viscosity, the channel length of 200 height units was not sufficient to obtain a fully developed velocity profile.
In Figure 5a,b, PDFs for center-line velocity are shown at different spatial positions.The construction follows the same procedure as the PDF for the viscosity, but this time one million samples are used in order to create a smooth transition region.In Figure 5a, each spatial position is normalised with respect to its maximum value to illustrate where the bulk of the mass is situated.
The PDFs derived from PC representations can be used to construct meaningful prediction intervals.As we have seen with the viscosity example, plotting mean and standard deviation as a measure of dispersion could be misleading if the distributions are asymmetrical or multi-modal.Having a PDF allows us to evaluate the relative likelihood of different values in the measured output.

Conclusions
In this study, we investigated uncertainty propagation at the interface between molecular and continuum fluid models.Following previous studies, we assume that the potential well coefficient of Lennard-Jones interaction is characterised by a random variable with a uniform distribution.We then characterise the output of the molecular dynamics in terms of standard probability distributions for viscosity and feed that as input to a standard Navier-Stokes solver.
Investigation of increasing quadrature nodes and polynomial order shows that sufficient resolution is reached with low-order bases.The considered polynomial chaos models with p ≥ 3 result in converged first and second order statistics.With the assumed distribution bounds, standard deviation of molecular observable is approximately 40% of the mean value.These results show how sensitive molecular calculations are to the force field coefficients.Moreover, the PDF of viscosity is asymmetric and highly skewed, suggesting that analysis based solely on an expected value and variance is inadequate.
The study emphasizes the benefits of constructing a surrogate model with an associated stochastic response.It allows an analytical approach to establish confidence in our numerical predictions.A non-intrusive spectral strategy enables this without high computational cost and the need to modify the simulation process.Its significance is particularly visible in hybrid modelling, where additional insight into parametric sensitivity may improve the understanding of multi-scale phenomena.

Figure 2 .
Figure 2. Workflow for uncertainty quantification in multi-scale simulation.QoI: quantity of interest.

Figure 5 .
Figure 5. (a) PDFs as a function of streamwise position.PDF for each x is normalised with respect to its maximum value; (b) PDFs for three selected positions.