# Making Steppingstones out of Stumbling Blocks: A Bayesian Model Evidence Estimator with Application to Groundwater Transport Model Selection


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Bayesian Model Averaging

#### 2.2. Path Sampling

#### 2.3. Importance Sampling

## 3. Gaussian Model Example

… $10^{-8}$ for the 50-dimensional problem to $8.88 \times 10^{-16}$ for the 100-dimensional problem, owing to the large region of parameter space with low likelihood. Increasing the contribution of low-likelihood regions is equivalent to decreasing the informativeness of the prior (either by increasing the number of dimensions or by widening the parameter range of each dimension). HM tends to ignore low-likelihood regions of the parameter space, which means that it imposes a smaller penalty than it should on complex models with superfluous parameters. This can affect model ranking, as shown in the groundwater transport model selection problem in the next section. In contrast, the performance of TI, SS, and MOSS for the 50- and 100-dimensional problems is relatively stable, with relative errors of less than 1%. However, this comparison is based on a fine discretization of the sampling path with $K=50$, which is not computationally efficient.
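The contrast between AM and HM can be illustrated with a minimal Python sketch. It assumes a particular form for the Gaussian model, namely a standard normal prior on each dimension and an unnormalized likelihood $\exp(-\theta_d^2/2)$ per dimension; this form is an assumption, chosen because it reproduces the analytical values in Table 1 ($2^{-D/2}$). The function names are hypothetical, and the posterior is sampled exactly rather than by MCMC.

```python
import numpy as np

def log_likelihood(theta):
    # Assumed toy likelihood: L(theta) = exp(-0.5 * sum_d theta_d^2)
    return -0.5 * np.sum(theta**2, axis=-1)

def analytical_bme(d):
    # With an assumed N(0, 1) prior, each dimension contributes 1/sqrt(2)
    return 2.0 ** (-d / 2.0)

def log_mean_exp(x):
    # Numerically stable log of the sample mean of exp(x)
    m = np.max(x)
    return m + np.log(np.mean(np.exp(x - m)))

def am_estimate(d, n, rng):
    # Arithmetic mean: average the likelihood over prior samples
    theta = rng.standard_normal((n, d))
    return np.exp(log_mean_exp(log_likelihood(theta)))

def hm_estimate(d, n, rng):
    # Harmonic mean: average 1/likelihood over posterior samples.
    # For this toy the posterior is exactly N(0, 1/2) in each dimension.
    theta = rng.standard_normal((n, d)) * np.sqrt(0.5)
    return np.exp(-log_mean_exp(-log_likelihood(theta)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (1, 50, 100):
        print(d, analytical_bme(d),
              am_estimate(d, 50_000, rng), hm_estimate(d, 50_000, rng))
```

Because the HM estimator only sees posterior samples, it rarely visits the low-likelihood regions that dominate the prior volume, which is the behavior described above.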

## 4. Groundwater Transport Models with Different Complexity

#### 4.1. Problem Statement

…L$^{3}$L$^{-3}$ or L$^{0}$], ${\rho}_{s}$ [ML$^{-3}$] is the soil bulk density; and ${k}_{d}$ is a sorption partition coefficient [L$^{3}$M$^{-1}$]. For consistency with the original analysis, we fix the retardation factor $R$ at 1.05. $C=c/{c}_{0}$ is the normalized solute concentration [-], where $c$ and ${c}_{0}$ are the effluent and influent tracer concentrations [ML$^{-3}$], respectively. The dimensionless pulse duration $T=qt/\theta L=vt/L$ is the solute pulse volume [-], where $q=2.71$ cm d$^{-1}$ is the Darcy flux [LT$^{-1}$], $v$ is the mean pore water velocity [LT$^{-1}$], $t$ is time [T], and $L=71.6$ cm is the column length [L]. $X=x/L$ is the relative distance, where $x$ is the distance from the inlet boundary. $P=vL/D=L/\lambda $ is the Peclet number [-], where $D$ is the dispersion coefficient [L$^{2}$T$^{-1}$] and $\lambda $ is the dispersivity [L]. The initial condition is $c(x,0)=0$. The inlet and outlet boundary conditions are described in [97]. At the inlet boundary, a pulse input of concentration ${c}_{0}$ is applied from $t=0$ to $t={t}_{0}$, with pulse duration ${t}_{0}=39.39$ days. Note that the dimensionless pulse duration ${T}_{0}$ is 2.84 pore volumes.
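As a quick consistency check on the dimensionless quantities, the short Python sketch below back-calculates the water content implied by the reported values, using only $T=qt/\theta L$ and $v=q/\theta$. The resulting $\theta$ and $v$ are derived here for illustration and are not values reported in the text; the variable names are hypothetical.

```python
# Values stated in the text
q = 2.71      # Darcy flux [cm/d]
L = 71.6      # column length [cm]
t0 = 39.39    # pulse duration [d]
T0 = 2.84     # dimensionless pulse duration [pore volumes]

# Implied (not reported) volumetric water content from T0 = q * t0 / (theta * L)
theta_implied = q * t0 / (T0 * L)

# Implied mean pore water velocity v = q / theta [cm/d]
pore_water_velocity = q / theta_implied

def peclet_number(column_length, dispersivity):
    # P = v L / D = L / lambda, so only the dispersivity is needed
    return column_length / dispersivity

print(f"implied theta = {theta_implied:.3f}")
print(f"implied v     = {pore_water_velocity:.2f} cm/d")
```

The dispersivity itself is an estimated model parameter, so `peclet_number` is left as a helper rather than evaluated with an invented value.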

…$^{-1}$]. Note that for $\beta =1$ and $\omega =0$, the physical non-equilibrium MIM model in Equation (31) reduces to the equilibrium ADE model in Equation (30). The analytical solutions for both ADE and MIM models are given in [97], and model simulations are implemented using CXTFIT/Excel [86,98].

…$^{-1}$ and $q=111$ cm d$^{-1}$, respectively. The high Darcy flux $q=111$ cm d$^{-1}$ is a special case since equilibrium is not reached at high velocity, and thus the performance of the ADE and MIM models will be very similar. At low velocity, the performance of the ADE models will deteriorate as they do not account for non-equilibrium transport. Thus, we selected the experiment with $q=2.71$ cm d$^{-1}$, which represents the general case. Also, note that the fixed input model parameters are experimentally measured.

#### 4.2. Reference Solution

…$^{-3}$. Figure 2 shows that the likelihood surface of the posterior is very peaked relative to the prior, such that the average log-likelihood ${y}_{k}$ (Equation (13)) varies from −462.2 at the prior to 60.9 at the posterior for the ADE2 model. Unlike the previous problem discussed in Section 3, there is no analytical solution of the BME, as all the power posterior distributions are non-Gaussian, as shown in Figure 2. To facilitate the comparison of different estimators, we estimated the reference solutions of the four candidate models by drawing 5 million samples from the prior for each model and calculating the BME using the AM estimator (Equation (17)).
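A reference solution of this kind can be sketched in Python as follows. The sketch assumes that the AM estimator is the plain average of the likelihood over prior samples, and it works in log space because BME values on the order of $10^{23}$ would otherwise overflow or lose precision. The function name and the standard-error formula (the usual Monte Carlo standard error of a sample mean) are assumptions, not taken from Equation (17).

```python
import numpy as np

def bme_arithmetic_mean(loglik):
    """Return (log BME, relative standard error) from prior-sample log-likelihoods."""
    loglik = np.asarray(loglik, dtype=float)
    shift = loglik.max()                  # log-sum-exp shift for stability
    w = np.exp(loglik - shift)            # scaled likelihoods in (0, 1]
    mean_w = w.mean()
    log_bme = shift + np.log(mean_w)
    # Monte Carlo standard error of the mean, relative to the mean
    rel_se = w.std(ddof=1) / (np.sqrt(w.size) * mean_w)
    return log_bme, rel_se

if __name__ == "__main__":
    # Toy stand-in for a transport model: N(0, 1) prior, likelihood exp(-theta^2 / 2)
    rng = np.random.default_rng(0)
    toy_loglik = -0.5 * rng.standard_normal(1_000_000) ** 2
    log_bme, rel_se = bme_arithmetic_mean(toy_loglik)
    print(np.exp(log_bme), rel_se)
```

In an application, `loglik` would hold the log-likelihood of each prior draw after running the transport model, which is where nearly all of the computational cost lies.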

#### 4.3. Sampling Error

…$_{(ZS)}$ [60], and discard the first 5000 samples as burn-in. This burn-in period is the default for all the following examples. Figure 3a shows that the relative error of the SS-based BME fluctuates between under- and overestimation, and that the BME estimate drops at a certain ${\beta}_{k}$ value. Zooming in on the log scale (Figure 3b) shows that this drop occurs around ${\beta}_{30}=0.018$. Although adding the next value, ${\beta}_{31}=0.02$, changes the relative error from about −37% to −23%, the error propagates, so that the relative error at the last step ${\beta}_{100}=1$ (i.e., the SS estimate) is about −33.23%. Similarly, errors in the mean potential ${y}_{k}$ of TI (Equation (13)) resulting from MCMC sampling errors accumulate through the integration (Equation (12)). Yet because the integration is not a series of multiplications as in SS (Equation (23)), TI underestimates the BME by a smaller margin, with a relative error of −20.62%.
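The difference between the two accumulation mechanisms can be sketched in Python on a toy problem. The sketch rests on several assumptions: the Gaussian toy model (standard normal prior, likelihood $\exp(-\theta_d^2/2)$ per dimension), whose power posteriors can be sampled exactly, stands in for the MCMC sampler; the schedule ${\beta}_{k}={(k/K)}^{1/\alpha }$ is assumed; TI is evaluated with the trapezoidal rule; and SS is written as the standard product of $K$ importance-sampling ratios. All function names are hypothetical.

```python
import numpy as np

def beta_schedule(K, alpha=0.3):
    # Assumed power-law schedule, concentrated near the prior (beta = 0)
    return (np.arange(K + 1) / K) ** (1.0 / alpha)

def log_likelihood(theta):
    return -0.5 * np.sum(theta**2, axis=1)

def sample_power_posterior(beta, n, d, rng):
    # Toy only: prior N(0, 1) times L^beta is exactly N(0, 1 / (1 + beta))
    return rng.standard_normal((n, d)) / np.sqrt(1.0 + beta)

def log_mean_exp(x):
    m = x.max()
    return m + np.log(np.mean(np.exp(x - m)))

def ti_and_ss(d, K, n, rng, alpha=0.3):
    betas = beta_schedule(K, alpha)
    y = np.empty(K + 1)      # mean log-likelihood at each beta_k (TI potential)
    log_z_ss = 0.0           # SS: sum of log ratios = log of their product
    for k, beta in enumerate(betas):
        ll = log_likelihood(sample_power_posterior(beta, n, d, rng))
        y[k] = ll.mean()
        if k < K:
            # Ratio r_{k+1}: samples at beta_k, weights L^(beta_{k+1} - beta_k)
            log_z_ss += log_mean_exp((betas[k + 1] - beta) * ll)
    # TI: trapezoidal integration of y over beta in [0, 1]
    log_z_ti = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(betas))
    return log_z_ti, log_z_ss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_ti, log_ss = ti_and_ss(d=10, K=50, n=5000, rng=rng)
    print(log_ti, log_ss, -5.0 * np.log(2.0))  # last value: analytical log BME
```

In this form, an error in any single SS ratio multiplies every later partial product, whereas an error in one ${y}_{k}$ only perturbs the two adjacent trapezoids of the TI integral.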

#### 4.4. Penalizing Model Complexity

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Chitsazan, N.; Pham, H.V.; Tsai, F.T.C. Bayesian Chance-Constrained Hydraulic Barrier Design under Geological Structure Uncertainty. Groundwater **2015**, 53, 908–919.
2. Wöhling, T.; Schöniger, A.; Gayler, S.; Nowak, W. Bayesian model averaging to explore the worth of data for soil–plant model selection and prediction. Water Resour. Res. **2015**, 51, 2825–2846.
3. Safi, A.; Vilhelmsen, T.N.; Alameddine, I.; Abou Najm, M.; El-Fadel, M. Data-Worth Assessment for a Three-Dimensional Optimal Design in Nonlinear Groundwater Systems. Groundwater **2019**, 57, 612–631.
4. Moazamnia, M.; Hassanzadeh, Y.; Nadiri, A.; Khatibi, R.; Sadeghfam, S. Formulating a strategy to combine artificial intelligence models using Bayesian model averaging to study a distressed aquifer with sparse data availability. J. Hydrol. **2019**, 571, 765–781.
5. Xu, T.; Valocchi, A.J.; Ye, M.; Liang, F. Quantifying model structural error: Efficient Bayesian calibration of a regional groundwater flow model using surrogates and a data-driven error model. Water Resour. Res. **2017**, 53, 4084–4105.
6. Xu, T.; Valocchi, A.J.; Ye, M.; Liang, F.; Lin, Y.F. Bayesian calibration of groundwater models with input data uncertainty. Water Resour. Res. **2017**, 53, 3224–3245.
7. Neuman, S.P. Maximum likelihood Bayesian averaging of uncertain model predictions. Stoch. Environ. Res. Risk Assess. **2003**, 17, 291–305.
8. Nowak, W.; Rubin, Y.; de Barros, F.P.J. A hypothesis-driven approach to optimize field campaigns. Water Resour. Res. **2012**, 48, W06509.
9. Pham, H.V.; Tsai, F.T.C. Bayesian experimental design for identification of model propositions and conceptual model uncertainty reduction. Adv. Water Resour. **2015**, 83, 148–159.
10. Pham, H.V.; Tsai, F.T.-C. Optimal observation network design for conceptual model discrimination and uncertainty reduction: Observation network design for model discrimination. Water Resour. Res. **2016**, 52, 1245–1264.
11. Kwon, H.H.; Brown, C.; Lall, U. Climate informed flood frequency analysis and prediction in Montana using hierarchical Bayesian modeling. Geophys. Res. Lett. **2008**, 35, L05404.
12. Tsai, F.T.C.; Elshall, A.S. Hierarchical Bayesian model averaging for hydrostratigraphic modeling: Uncertainty segregation and comparative evaluation. Water Resour. Res. **2013**, 49, 5520–5536.
13. Elshall, A.S.; Tsai, F.T.C. Constructive epistemic modeling of groundwater flow with geological structure and boundary condition uncertainty under the Bayesian paradigm. J. Hydrol. **2014**, 517, 105–119.
14. Zhang, X.; Niu, G.-Y.; Elshall, A.S.; Ye, M.; Barron-Gafford, G.A.; Pavao-Zuckerman, M. Assessing five evolving microbial enzyme models against field measurements from a semiarid savannah-What are the mechanisms of soil respiration pulses? Geophys. Res. Lett. **2014**, 41, 6428–6434.
15. Enemark, T.; Peeters, L.J.; Mallants, D.; Batelaan, O.; Valentine, A.P.; Sambridge, M. Hydrogeological Bayesian Hypothesis Testing through Trans-Dimensional Sampling of a Stochastic Water Balance Model. Water **2019**, 11, 1463.
16. Zhang, Y.G.; Su, F.; Hao, Z.; Xu, C.; Yu, Z.; Wang, L.; Tong, K. Impact of projected climate change on the hydrology in the headwaters of the Yellow River basin. Hydrol. Processes **2015**, 29, 4379–4397.
17. Mani, A.; Tsai, F.T.-C.; Kao, S.-C.; Naz, B.S.; Ashfaq, M.; Rastogi, D. Conjunctive management of surface and groundwater resources under projected future climate change scenarios. J. Hydrol. **2016**, 540, 397–411.
18. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
19. Chamberlin, T.C. The method of multiple working hypotheses. Science **1890**, 15, 92–96.
20. Friel, N.; Wyse, J. Estimating the evidence-A review. Stat. Neerl. **2012**, 66, 288–308.
21. Schöniger, A.; Wöhling, T.; Nowak, W. Model selection on solid ground: Rigorous comparison of nine ways to evaluate Bayesian model evidence. Water Resour. Res. **2014**, 50, 9484–9513.
22. Lartillot, N.; Philippe, H. Computing Bayes Factors Using Thermodynamic Integration. Syst. Biol. **2006**, 55, 195–207.
23. Xie, W.G.; Lewis, P.O.; Fan, Y.; Kuo, L.; Chen, M.H. Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection. Syst. Biol. **2011**, 60, 150–160.
24. Liu, P.G.; Elshall, A.S.; Ye, M.; Beerli, P.; Zeng, X.K.; Lu, D.; Tao, Y.Z. Evaluating marginal likelihood with thermodynamic integration method and comparison with several other numerical methods. Water Resour. Res. **2016**, 52, 734–758.
25. Rojas, R.; Feyen, L.; Dassargues, A. Conceptual model uncertainty in groundwater modeling: Combining generalized likelihood uncertainty estimation and Bayesian model averaging. Water Resour. Res. **2008**, 44, W12418.
26. Lu, D.; Ye, M.; Neuman, S.P. Dependence of Bayesian Model Selection Criteria and Fisher Information Matrix on Sample Size. Math. Geol. **2011**, 43, 971–993.
27. Xue, L.; Zhang, D.X. A multimodel data assimilation framework via the ensemble Kalman filter. Water Resour. Res. **2014**, 50, 4197–4219.
28. Gelman, A.; Meng, X.L. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Stat. Sci. **1998**, 13, 163–185.
29. Guthke, A. Defensible Model Complexity: A Call for Data-Based and Goal-Oriented Model Choice. Ground Water **2017**, 55, 646–650.
30. Höge, M.; Guthke, A.; Nowak, W. The hydrologist’s guide to Bayesian model selection, averaging and combination. J. Hydrol. **2019**, 572, 96–107.
31. Ye, M.; Neuman, S.P.; Meyer, P.D. Maximum likelihood Bayesian averaging of spatial variability models in unsaturated fractured tuff. Water Resour. Res. **2004**, 40, W05113.
32. Ye, M.; Meyer, P.D.; Neuman, S.P. On model selection criteria in multimodel analysis. Water Resour. Res. **2008**, 44, W03428.
33. Ye, M.; Lu, D.; Neuman, S.P.; Meyer, P.D. Comment on “Inverse groundwater modeling for hydraulic conductivity estimation using Bayesian model averaging and variance window” by Frank T.-C. Tsai and Xiaobao Li. Water Resour. Res. **2010**, 46, W09434.
34. Ye, M.; Pohlmann, K.F.; Chapman, J.B.; Pohll, G.M.; Reeves, D.M. A Model-Averaging Method for Assessing Groundwater Conceptual Model Uncertainty. Ground Water **2010**, 48, 716–728.
35. Marshall, L.; Nott, D.; Sharma, A. Hydrological model selection: A Bayesian alternative. Water Resour. Res. **2005**, 41, W10422.
36. Poeter, E.; Anderson, D. Multimodel Ranking and Inference in Ground Water Modeling. Ground Water **2005**, 43, 597–605.
37. Tsai, F.T.C.; Li, X.B. Inverse groundwater modeling for hydraulic conductivity estimation using Bayesian model averaging and variance window. Water Resour. Res. **2008**, 44, W09434.
38. Singh, A.; Mishra, S.; Ruskauff, G. Model Averaging Techniques for Quantifying Conceptual Model Uncertainty. Ground Water **2010**, 48, 701–715.
39. Foglia, L.; Mehl, S.W.; Hill, M.C.; Burlando, P. Evaluating model structure adequacy: The case of the Maggia Valley groundwater system, southern Switzerland. Water Resour. Res. **2013**, 49, 260–282.
40. Lu, D.; Ye, M.; Curtis, G.P. Maximum likelihood Bayesian model averaging and its predictive analysis for groundwater reactive transport models. J. Hydrol. **2015**, 529, 1859–1873.
41. Kikuchi, C.P.; Ferre, T.P.A.; Vrugt, J.A. On the optimal design of experiments for conceptual and predictive discrimination of hydrologic system models. Water Resour. Res. **2015**, 51, 4454–4481.
42. Kass, R.E.; Raftery, A.E. Bayes factors. J. Am. Stat. Assoc. **1995**, 90, 773–795.
43. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. **1978**, 6, 461–464.
44. Kashyap, R.L. Optimal Choice of AR and MA Parts in Autoregressive Moving Average Models. IEEE Trans. Pattern Anal. Mach. Intell. **1982**, 4, 99–104.
45. Schoups, G.; Vrugt, J.A. Bayesian Selection of Hydrological Models Using Sequential Monte Carlo Sampling, American Geophysical Union, Fall Meeting 2011, Abstract #H23D-1310. 2011. Available online: http://faculty.sites.uci.edu/jasper/files/2012/10/poster_AGU2011.pdf (accessed on 28 July 2019).
46. Schoups, G.; van de Giesen, N.C.; Savenije, H.H.G. Model complexity control for hydrologic prediction. Water Resour. Res. **2008**, 44, W00B03.
47. Schöniger, A.; Illman, W.A.; Wöhling, T.; Nowak, W. Finding the right balance between groundwater model complexity and experimental effort via Bayesian model selection. J. Hydrol. **2015**, 531, 96–110.
48. Volpi, E.; Schoups, G.; Firmani, G.; Vrugt, J.A. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling. Water Resour. Res. **2017**, 53, 6133–6158.
49. Höge, M.; Wöhling, T.; Nowak, W. A Primer for Model Selection: The Decisive Role of Model Complexity. Water Resour. Res. **2018**, 54, 1688–1715.
50. Elsheikh, A.H.; Wheeler, M.F.; Hoteit, I. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration. Water Resour. Res. **2013**, 49, 8383–8399.
51. Cao, T.; Zeng, X.; Wu, J.; Wang, D.; Sun, Y.; Zhu, X.; Lin, J.; Long, Y. Integrating MT-DREAMzs and nested sampling algorithms to estimate marginal likelihood and comparison with several other methods. J. Hydrol. **2018**, 563, 750–765.
52. Zeng, X.; Ye, M.; Wu, J.; Wang, D.; Zhu, X. Improved Nested Sampling and Surrogate-Enabled Comparison With Other Marginal Likelihood Estimators. Water Resour. Res. **2018**, 54, 797–826.
53. Chib, S.; Jeliazkov, I. Marginal Likelihood From the Metropolis-Hastings Output. J. Am. Stat. Assoc. **2001**, 96, 270–281.
54. Neal, R.M. Annealed importance sampling. Stat. Comput. **2001**, 11, 125–139.
55. Huelsenbeck, J.P.; Larget, B.; Alfaro, M.E. Bayesian Phylogenetic Model Selection Using Reversible Jump Markov Chain Monte Carlo. Mol. Biol. Evol. **2004**, 21, 1123–1133.
56. Del Moral, P.; Doucet, A.; Jasra, A. Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B-Stat. Methodol. **2006**, 68, 411–436.
57. Bouchard-Cote, A.; Sankararaman, S.; Jordan, M.I. Phylogenetic Inference via Sequential Monte Carlo. Syst. Biol. **2012**, 61, 579–593.
58. Ter Braak, C.J.F.; Vrugt, J.A. Differential Evolution Markov Chain with snooker updater and fewer chains. Stat. Comput. **2008**, 18, 435–446.
59. Vrugt, J.A.; ter Braak, C.J.F.; Diks, C.G.H.; Robinson, B.A.; Hyman, J.M.; Higdon, D. Accelerating Markov Chain Monte Carlo Simulation by Differential Evolution with Self-Adaptive Randomized Subspace Sampling. Int. J. Nonlinear Sci. Numer. Simul. **2009**, 10, 273–290.
60. Laloy, E.; Vrugt, J.A. High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing. Water Resour. Res. **2012**, 48, W01526.
61. Vrugt, J.A. Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation. Environ. Model. Softw. **2016**, 75, 273–316.
62. Smith, R.C. Uncertainty Quantification: Theory, Implementation, and Applications; Computational science and engineering series; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2014; p. XVIII + 382.
63. McCulloch, R.; Rossi, P.E. A bayesian approach to testing the arbitrage pricing theory. J. Econ. **1991**, 49, 141–168.
64. Chen, M.-H.; Shao, Q.-M.; Ibrahim, J.G. Monte Carlo Methods in Bayesian Computation; Statistics Springer: New York, NY, USA, 2000.
65. Troldborg, M.; Nowak, W.; Tuxen, N.; Bjerg, P.L.; Helmig, R.; Binning, P.J. Uncertainty evaluation of mass discharge estimates from a contaminated site using a fully Bayesian framework. Water Resour. Res. **2010**, 46, W12552.
66. Newton, M.A.; Raftery, A.E. Approximate Bayesian Inference by the Weighted Likelihood Bootstrap. J. R. Stat. Soc. Ser. B-Stat. Methodol. **1994**, 56, 3–48.
67. Seidou, O.; Ramsay, A.; Nistor, I. Climate change impacts on extreme floods I: Combining imperfect deterministic simulations and non-stationary frequency analysis. Nat. Hazards **2012**, 61, 647–659.
68. Calderhead, B.; Girolami, M. Estimating Bayes factors via thermodynamic integration and population MCMC. Comput. Stat. Data Anal. **2009**, 53, 4028–4045.
69. Skilling, J. Nested sampling for general Bayesian computation. Bayesian Anal. **2006**, 1, 833–859.
70. Friel, N.; Pettitt, A.N. Marginal likelihood estimation via power posteriors. J. R. Stat. Soc. Ser. B-Stat. Methodol. **2008**, 70, 589–607.
71. Chopin, N.; Robert, C.P. Properties of Nested Sampling. Biometrika **2010**, 97, 741–755.
72. Samani, S.; Ye, M.; Zhang, F.; Pei, Y.-Z.; Tang, G.-P.; Elshall, A.; Moghaddam, A.A. Impacts of prior parameter distributions on Bayesian evaluation of groundwater model complexity. Water Sci. Eng. **2018**, 11, 89–100.
73. Elshall, A.S.; Ye, M.; Pei, Y.; Zhang, F.; Niu, G.-Y.; Barron-Gafford, G.A. Relative model score: A scoring rule for evaluating ensemble simulations with application to microbial soil respiration modeling. Stoch. Environ. Res. Risk Assess. **2018**, 32, 2809–2819.
74. Elshall, A.S.; Ye, M.; Niu, G.-Y.; Barron-Gafford, G.A. Bayesian inference and predictive performance of soil respiration models in the presence of model discrepancy. Geosci. Model Dev. **2019**, 12, 2009–2032.
75. Schöniger, A.; Wöhling, T.; Nowak, W. A statistical concept to assess the uncertainty in Bayesian model weights and its impact on model ranking. Water Resour. Res. **2015**, 51, 7524–7546.
76. Enemark, T.; Peeters, L.J.M.; Mallants, D.; Batelaan, O. Hydrogeological conceptual model building and testing: A review. J. Hydrol. **2019**, 569, 310–329.
77. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, T.C. Bayesian model averaging: A tutorial. Stat. Sci. **1999**, 14, 382–417.
78. Ye, M.; Neuman, S.P.; Meyer, P.D.; Pohlmann, K. Sensitivity analysis and assessment of prior model probabilities in MLBMA with application to unsaturated fractured tuff. Water Resour. Res. **2005**, 41, W12429.
79. Meyer, P.; Ye, M.; Rockhold, S.; Neuman, S.; Cantrell, K. Combined Estimation of Hydrogeologic Conceptual Model, Parameter and Scenario Uncertainty with Application to Uranium Transport at the Hanford Site 300 Area; Rep. NUREG/CR-6940 PNNL-16396; US Nuclear Regulatory Commission: Washington, DC, USA, 2007.
80. Neal, R.M. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. J. Comput. Graph. Stat. **2000**, 9, 249–265.
81. Lefebvre, G.; Steele, R.; Vandal, A.C. A path sampling identity for computing the Kullback-Leibler and J divergences. Comput. Stat. Data Anal. **2010**, 54, 1719–1731.
82. Friel, N.; Hurn, M.; Wyse, J. Improving power posterior estimation of statistical evidence. Stat. Comput. **2014**, 24, 709–723.
83. Baele, G.; Lemey, P.; Vansteelandt, S. Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution. BMC Bioinform. **2013**, 14, 85.
84. Anamosa, P.R.; Nkedi-Kizza, P.; Blue, W.G.; Sartain, J.B. Water movement through an aggregated, gravelly oxisol from Cameroon. Geoderma **1990**, 46, 263–281.
85. Massmann, C.; Birk, S.; Liedl, R.; Geyer, T. Identification of hydrogeological models: Application to tracer test analysis in a karst aquifer. In Calibration and Reliability in Groundwater Modelling: From Uncertainty to Decision Making, Proceedings of ModelCARE’2005, The Hague, The Netherlands, 6–9 June 2005. IAHS Publ. **2006**, 304, 59–64.
86. Tang, G.P.; Mayes, M.A.; Parker, J.C.; Yin, X.L.; Watson, D.B.; Jardine, P.M. Improving parameter estimation for column experiments by multi-model evaluation and comparison. J. Hydrol. **2009**, 376, 567–578.
87. Lu, D.; Ye, M.; Meyer, P.D.; Curtis, G.P.; Shi, X.; Niu, X.-F.; Yabusaki, S.B. Effects of error covariance structure on estimation of model averaging weights and predictive performance. Water Resour. Res. **2013**, 49, 6029–6047.
88. Zhang, G.N.; Lu, D.; Ye, M.; Gunzburger, M.; Webster, C. An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling. Water Resour. Res. **2013**, 49, 6871–6892.
89. Knappett, P.S.K.; Du, J.; Liu, P.; Horvath, V.; Mailloux, B.J.; Feighery, J.; Van Geen, A.; Culligan, P.J. Importance of Reversible Attachment in Predicting E. Coli Transport in Saturated Aquifers From Column Experiments. Adv. Water Resour. **2014**, 63, 120–130.
90. Goldberg, E.; Scheringer, M.; Bucheli, T.D.; Hungerbuhler, K. Critical Assessment of Models for Transport of Engineered Nanoparticles in Saturated Porous Media. Environ. Sci. Technol. **2014**, 48, 12732–12741.
91. Ruhl, A.S.; Jekel, M. Degassing, gas retention and release in Fe(0) permeable reactive barriers. J. Contam. Hydrol. **2014**, 159, 11–19.
92. Shi, X.Q.; Ye, M.; Curtis, G.P.; Miller, G.L.; Meyer, P.D.; Kohler, M.; Yabusaki, S.; Wu, J. Assessment of parametric uncertainty for groundwater reactive transport modeling. Water Resour. Res. **2014**, 50, 4416–4439.
93. Feder, F.; Bochu, V.; Findeling, A.; Doelsch, E. Repeated pig manure applications modify nitrate and chloride competition and fluxes in a Nitisol. Sci. Total. Environ. **2015**, 511, 238–248.
94. Mehta, V.S.; Maillot, F.; Wang, Z.M.; Catalano, J.G.; Giammar, D.E. Transport of U(VI) through sediments amended with phosphate to induce in situ uranium immobilization. Water Res. **2015**, 69, 307–317.
95. Kret, E.; Kiecak, A.; Malina, G.; Nijenhuis, I.; Postawa, A. Identification of TCE and PCE sorption and biodegradation parameters in a sandy aquifer for fate and transport modelling: Batch and column studies. Environ. Sci. Pollut. Res. **2015**, 22, 9877–9888.
96. Ritschel, T.; Totsche, K.U. Closed-flow column experiments-Insights into solute transport provided by a damped oscillating breakthrough behavior. Water Resour. Res. **2016**, 52, 2206–2221.
97. Toride, N.; Leij, F.J.; van Genuchten, M.T. The CXTFIT Code for Estimating Transport Parameters from Laboratory or Field Tracer Experiments; US Salinity Laboratory: Riverside, CA, USA, 1995.
98. Tang, G.P.; Mayes, M.A.; Parker, J.C.; Jardine, P.M. CXTFIT/Excel-A modular adaptable code for parameter estimation, sensitivity analysis and uncertainty analysis for laboratory or field tracer experiments. Comput. Geosci. **2010**, 36, 1200–1209.
99. Van Genuchten, M.T. Non-Equilibrium Transport Parameters from Miscible Displacement Experiments; Res. Rep. No. 119; U.S. Salinity Lab., USDA, ARS: Riverside, CA, USA, 1981.

**Figure 1.** Bayesian model evidence (BME) estimation for the Gaussian model with D = 100 dimensions and K = 5 using thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS). The relative error (%) of the mean BME is based on 10 independent estimations.

**Figure 2.** Likelihood surface at different ${\beta}_{k}$ values of model ADE2. The number of power posterior coefficients ${\beta}_{k}$ is $K+1$ with $K=100$, and their values are determined by Equation (14) using the shape parameter $\alpha =0.3$. We also report the potential expectation ${y}_{k}$ (Equation (13)) for each distribution.
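The ${\beta}_{k}$ values quoted in Section 4.3 (${\beta}_{30}=0.018$ and ${\beta}_{31}=0.02$ for $K=100$ and $\alpha =0.3$) are consistent with a power-law schedule ${\beta}_{k}={(k/K)}^{1/\alpha }$. Equation (14) is not reproduced here, so the form used in the following Python check is an assumption inferred from those two values.

```python
import numpy as np

def beta_schedule(K, alpha):
    # Assumed schedule: beta_k = (k / K) ** (1 / alpha), for k = 0, ..., K
    k = np.arange(K + 1)
    return (k / K) ** (1.0 / alpha)

betas = beta_schedule(K=100, alpha=0.3)
print(len(betas))              # K + 1 coefficients, from beta_0 = 0 to beta_K = 1
print(round(betas[30], 3))     # compare with beta_30 = 0.018 in Section 4.3
print(round(betas[31], 3))     # compare with beta_31 = 0.02 in Section 4.3
```

With $\alpha <1$ the coefficients cluster near the prior, where the power posterior changes most rapidly with $\beta$.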

**Figure 3.** The relative error of the sequence of BME estimates for ${\beta}_{k}$ values expressed in (**a**) standard linear scale and (**b**) log scale, based on one-steppingstone sampling with $\mathsf{\beta}=\{{\beta}_{0}=0,{\beta}_{k},{\beta}_{K}=1\}$ and on steppingstone sampling with $\mathsf{\beta}=\{{\beta}_{0}=0,{\beta}_{1},\dots ,{\beta}_{k},{\beta}_{K}=1\}$, given K = 100 and $\alpha =0.3$. Note that the mean of the values presented by the blue line is the MOSS estimate, and the last estimated value of the orange line with $\mathsf{\beta}=\{{\beta}_{0},{\beta}_{1},\dots ,{\beta}_{K}\}$ is the SS estimate.
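The construction described in this caption can be sketched in Python as follows. Each one-steppingstone estimate uses the three-point path $\{0,{\beta}_{k},1\}$, i.e., the product of two importance-sampling ratios, and the MOSS estimate is taken as the mean of these estimates over the interior ${\beta}_{k}$ values. The sketch makes several assumptions: the Gaussian toy model with exactly sampled power posteriors replaces the transport models and the MCMC sampler, the schedule ${\beta}_{k}={(k/K)}^{1/\alpha }$ is assumed, fresh prior samples are drawn for every $k$, and all function names are hypothetical.

```python
import numpy as np

def log_likelihood(theta):
    # Assumed toy likelihood: L(theta) = exp(-0.5 * sum_d theta_d^2)
    return -0.5 * np.sum(theta**2, axis=1)

def sample_power_posterior(beta, n, d, rng):
    # Toy only: prior N(0, 1) times L^beta is exactly N(0, 1 / (1 + beta))
    return rng.standard_normal((n, d)) / np.sqrt(1.0 + beta)

def log_mean_exp(x):
    m = x.max()
    return m + np.log(np.mean(np.exp(x - m)))

def one_steppingstone_log_bme(beta_k, d, n, rng):
    # Ratio 1: from beta = 0 to beta_k, using prior samples
    ll_prior = log_likelihood(sample_power_posterior(0.0, n, d, rng))
    log_r1 = log_mean_exp(beta_k * ll_prior)
    # Ratio 2: from beta_k to beta = 1, using samples at beta_k
    ll_k = log_likelihood(sample_power_posterior(beta_k, n, d, rng))
    log_r2 = log_mean_exp((1.0 - beta_k) * ll_k)
    return log_r1 + log_r2

def moss_estimate(d, K, n, rng, alpha=0.3):
    # Interior coefficients beta_1, ..., beta_{K-1} of the assumed schedule
    betas = (np.arange(1, K) / K) ** (1.0 / alpha)
    bme = [np.exp(one_steppingstone_log_bme(b, d, n, rng)) for b in betas]
    return float(np.mean(bme))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(moss_estimate(d=10, K=10, n=20_000, rng=rng), 2.0 ** -5)
```

Since every one-steppingstone estimate involves only two ratios, an error at one ${\beta}_{k}$ affects a single term of the average instead of propagating through a long product.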

**Table 1.** True analytical solution (AS) of the BME for the Gaussian model with different dimensions (D), and the corresponding numerical estimations with K = 50 and α = 0.3 using arithmetic mean (AM), harmonic mean (HM), thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS). The mean BME and its relative error (δ) are based on 10 independent estimations.

| Estimator | Mean (D = 1) | δ (D = 1) | Mean (D = 50) | δ (D = 50) | Mean (D = 100) | δ (D = 100) |
|---|---|---|---|---|---|---|
| AS | 0.707107 | - | 2.98E−08 | - | 8.88E−16 | - |
| AM | 0.706716 | −0.06% | 2.89E−08 | −3.18% | 6.15E−16 | −30.80% |
| HM | 0.70718 | 0.07% | 3.6E−08 | 20.72% | 2.57E−15 | 189.34% |
| TI | 0.707131 | −0.01% | 2.98E−08 | −0.17% | 8.85E−16 | −0.32% |
| SS | 0.707143 | 0.01% | 2.98E−08 | 0.02% | 8.89E−16 | 0.04% |
| MOSS | 0.707147 | 0.01% | 2.99E−08 | 0.24% | 8.8E−16 | −0.97% |

**Table 2.** BME estimation for the Gaussian model with D = 100 dimensions and α = 0.3 using the arithmetic mean (AM), harmonic mean (HM), thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS). The mean BME and its relative error (δ) are based on 10 independent estimations. The true analytical solution is 8.88E−16.

| Estimator | Mean (K = 5) | δ (K = 5) | Mean (K = 10) | δ (K = 10) | Mean (K = 50) | δ (K = 50) |
|---|---|---|---|---|---|---|
| AM | 5.13E−16 | −42.27% | 5.21E−16 | −41.31% | 6.15E−16 | −30.80% |
| HM | 4.44E−15 | 399.94% | 4.19E−15 | 371.42% | 2.57E−15 | 189.34% |
| TI | 6.3E−16 | −29.06% | 8.15E−16 | −8.21% | 8.85E−16 | −0.32% |
| SS | 8.95E−16 | 0.72% | 8.89E−16 | 0.08% | 8.89E−16 | 0.04% |
| MOSS | 8.95E−16 | 0.78% | 8.86E−16 | −0.23% | 8.8E−16 | −0.97% |

**Table 3.** Metrics and model ranking of the candidate models based on the best realization (i.e., RMSE) and based on the average model fit of all the parameter values that the model can take (i.e., BME and model weight). The results are based on the implementation of AM.

| Metric | ADE1 | ADE2 | MIM1 | MIM2 |
|---|---|---|---|---|
| RMSE of the best realization | 0.0227 | 0.0121 | 0.0115 | 0.0115 |
| RMSE normalized by mean observation | 5.4% | 2.88% | 2.738% | 2.737% |
| Model rank by the best realization | 4 | 3 | 2 | 1 |
| BME | 6.23E+23 | 7.68E+23 | 1.73E+24 | 2.73E+23 |
| Model weight | 0.1836 | 0.2266 | 0.5092 | 0.0806 |
| Model rank by model weight | 3 | 2 | 1 | 4 |
| BME estimation error | ±1.81E+21 | ±7.34E+21 | ±8.44E+21 | ±2.56E+21 |
| BME estimation error [%] | ±0.29% | ±0.96% | ±0.49% | ±0.94% |
| BME 95% credible interval error [%] | ±0.48% | ±1.57% | ±0.80% | ±1.54% |
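The model weights in Table 3 can be reproduced from the BME values with a few lines of Python. The sketch assumes equal prior model probabilities, in which case each weight is the BME of a model divided by the sum of the BMEs over all candidate models; the function name is hypothetical. Because the tabulated BME values are rounded to three significant figures, the recomputed weights agree with the table only to about three decimal places.

```python
import numpy as np

def model_weights(bme, prior=None):
    """Posterior model weights from BME values (equal priors if none given)."""
    names = list(bme)
    log_bme = np.log([bme[m] for m in names])
    if prior is None:
        log_prior = np.full(len(names), -np.log(len(names)))
    else:
        log_prior = np.log([prior[m] for m in names])
    log_w = log_bme + log_prior
    log_w -= log_w.max()          # guard against overflow for large BME values
    w = np.exp(log_w)
    w /= w.sum()
    return dict(zip(names, w))

# BME values as reported in Table 3
bme_table3 = {"ADE1": 6.23e23, "ADE2": 7.68e23, "MIM1": 1.73e24, "MIM2": 2.73e23}

weights = model_weights(bme_table3)
ranking = sorted(weights, key=weights.get, reverse=True)
for name in ranking:
    print(f"{name}: {weights[name]:.4f}")
```

Working with logarithms of the BME keeps the calculation stable when the evidence values differ by many orders of magnitude across models.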

**Table 4.** BME (BME Relative Error [%]) for ADE2 model using thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS), for varying sample size $n$ per $K$ interval such that $n\times K=\mathrm{2,000,000}$. The reference BME solution is $7.68\times {10}^{23}\pm 7.34\times {10}^{21}$, and the 95% credible interval of the reference solution error is ±1.57%.

| K | TI | SS | MOSS |
|---|---|---|---|
| 5 | 1.38E+22 (−98.20%) | 6.40E+23 (−16.67%) | 7.48E+23 (−2.60%) |
| 10 | 2.71E+23 (−64.71%) | 6.02E+23 (−21.61%) | 7.44E+23 (−3.13%) |
| 20 | 4.52E+23 (−41.15%) | 5.58E+23 (−27.34%) | 7.51E+23 (−2.21%) |
| 100 | 6.10E+23 (−20.57%) | 5.13E+23 (−33.20%) | 7.77E+23 (1.17%) |
| 200 | 6.01E+23 (−21.76%) | 4.93E+23 (−35.87%) | 7.617E+23 (−3.27%) |

**Table 5.** BME (BME Relative Error [%]) for ADE2 model using thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS), for $n=\mathrm{20,000}$ samples per $K$ interval. The reference BME solution is $7.68\times {10}^{23}\pm 7.34\times {10}^{21}$, and the 95% credible interval of the reference solution error is ±1.57%.

| K | TI | SS | MOSS |
|---|---|---|---|
| 5 | 1.24E+22 (−98.39%) | 5.76E+23 (−25.00%) | 7.38E+23 (−3.91%) |
| 10 | 2.65E+23 (−65.49%) | 5.99E+23 (−22.01%) | 7.41E+23 (−3.52%) |
| 20 | 4.50E+23 (−41.41%) | 5.00E+23 (−34.90%) | 7.50E+23 (−2.34%) |
| 100 | 6.10E+23 (−20.57%) | 5.13E+23 (−33.20%) | 7.77E+23 (1.17%) |

**Table 6.** BME (BME Relative Error [%]) for ADE2 model using thermodynamic integration (TI), steppingstone sampling (SS), and multiple one-steppingstone sampling (MOSS), for $K=5$ with a varying number of samples $n$ per $K$ interval. The reference BME solution is $7.68\times {10}^{23}\pm 7.34\times {10}^{21}$, and the 95% credible interval of the reference solution error is ±1.57%.

| n | TI | SS | MOSS |
|---|---|---|---|
| 1000 | 1.23E+22 (−98.40%) | 5.72E+23 (−25.52%) | 7.44E+23 (−3.13%) |
| 100,000 | 1.29E+22 (−98.32%) | 6.27E+23 (−18.36%) | 7.57E+23 (−1.43%) |
| 200,000 | 1.29E+22 (−98.32%) | 6.33E+23 (−17.58%) | 7.47E+23 (−2.73%) |
| 300,000 | 1.33E+22 (−98.27%) | 6.35E+23 (−17.32%) | 7.46E+23 (−2.86%) |
| 400,000 | 1.38E+22 (−98.20%) | 6.40E+23 (−16.67%) | 7.48E+23 (−2.60%) |

**Table 7.** BME (BME Relative Error [%]) and model weight (model ranking) for the candidate models using harmonic mean (HM), thermodynamic integration (TI), steppingstone sampling (SS) and multiple one-steppingstone sampling (MOSS), for $K=5$ and $n=\mathrm{10,000}$ samples per $K$ interval.

| Estimator | ADE1 BME (error) | ADE1 weight (ranking) | ADE2 BME (error) | ADE2 weight (ranking) | MIM1 BME (error) | MIM1 weight (ranking) | MIM2 BME (error) | MIM2 weight (ranking) |
|---|---|---|---|---|---|---|---|---|
| Reference | 6.23E+23 (±0.29%) | 0.1836 (3) | 7.68E+23 (±0.96%) | 0.2263 (2) | 1.73E+24 (±0.49%) | 0.5097 (1) | 2.73E+23 (±0.94%) | 0.0804 (4) |
| HM | 6.20E+24 (894.43%) | 0.0812 (4) | 4.01E+25 (5121.43%) | 0.53 (1) | 2.19E+25 (1168.72%) | 0.2878 (2) | 8.01E+24 (2834.03%) | 0.11 (3) |
| TI | 1.61E+22 (−97.41%) | 0.11 (2) | 1.23E+22 (−98.40%) | 0.0824 (3) | 1.12E+23 (−93.53%) | 0.7507 (1) | 8.78E+21 (−96.78%) | 0.0589 (4) |
| SS | 5.57E+23 (−10.53%) | 0.2096 (3) | 5.72E+23 (−25.52%) | 0.2151 (2) | 1.38E+24 (−20.33%) | 0.5183 (1) | 1.51E+23 (−44.52%) | 0.0570 (4) |
| MOSS | 6.02E+23 (−3.37%) | 0.1794 (3) | 7.44E+23 (−3.13%) | 0.2217 (2) | 1.72E+24 (−0.36%) | 0.5138 (1) | 2.85E+23 (4.45%) | 0.0850 (4) |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Elshall, A.S.; Ye, M. Making Steppingstones out of Stumbling Blocks: A Bayesian Model Evidence Estimator with Application to Groundwater Transport Model Selection. *Water* **2019**, *11*, 1579.
https://doi.org/10.3390/w11081579
