Model averaging makes it possible to use multiple models for one modelling task, like predicting a certain quantity of interest. Several Bayesian approaches exist that all yield a weighted average of predictive distributions. However, often, they are not properly applied which can lead to false conclusions. In this study, we focus on Bayesian Model Selection (BMS) and Averaging (BMA), Pseudo-BMS/BMA and Bayesian Stacking. We want to foster their proper use by, first, clarifying their theoretical background and, second, contrasting their behaviours in an applied groundwater modelling task. We show that only Bayesian Stacking has the goal of model averaging for improved predictions by model combination. The other approaches pursue the quest of finding a single best model as the ultimate goal, and use model averaging only as a preliminary stage to prevent rash model choice. Improved predictions are thereby not guaranteed. In accordance with so-called
-settings that clarify the alleged relations between models and truth, we elicit which method is most promising.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited