Some Novel Bayesian Model Combination Schemes : An Application to Commodities Prices

Forecasting commodities prices on vividly changing markets is a hard problem to tackle. However, being able to determine important price predictors in a time-varying setting is crucial for sustainability initiatives. For example, the 2000s commodities boom gave rise to questioning whether commodities markets become over-financialized. In case of agricultural commodities, it was questioned if the speculative pressures increase food prices. Recently, some newly proposed Bayesian model combination scheme has been proposed, i.e., Dynamic Model Averaging (DMA). This method has already been applied with success in certain markets. It joins together uncertainty about the model and explanatory variables and a time-varying parameters approach. It can also capture structural breaks and respond to market disturbances. Secondly, it can deal with numerous explanatory variables in a data-rich environment. Similarly, like Bayesian Model Averaging (BMA), Dynamic Model Averaging (DMA), Dynamic Model Selection (DMS) and Median Probability Model (MED) start from Time-Varying Parameters’ (TVP) regressions. All of these methods were applied to 69 spot commodities prices. The period between Dec 1983 and Oct 2017 was analysed. In approximately 80% of cases, according to the Diebold–Mariano test, DMA produced statistically significant more accurate forecast than benchmark forecasts (like the naive method or ARIMA). Moreover, amongst all the considered model types, DMA was in 22% of cases the most accurate one (significantly). MED was most often minimising the forecast errors (28%). However, in the text, it is clarified that this was due to some specific initial parameters setting. The second “best” model type was MED, meaning that, in the case of model selection, relying on the highest posterior probability is not always preferable.


Introduction
During the 2000s, a commodities price boom was observed.Naturally, this led to the question of what was the purpose of such price spikes.Some of the hypotheses were based on the growing financialization of commodities markets.In other words, it was observed since the 1990s that the links between commodities prices and equity markets become more and more tight.For the previous periods, more research emphasis was put on the fundamental factors like supply and demand, etc.
The problem of sustainability is clearly seen in agricultural commodities.Especially in recent years, this has been noticed not only by the final goods' consumers, but also by the producers.Nevertheless, such initiatives spread over whole commodities' markets.Sustainable products might generate more stable revenues to producers, stakeholders can reduce their risk exposure, consumers become more aware of natural resources.Within this context, it seems interesting to construct a tool which would be able to model commodities prices, and which would be able to deal with variable uncertainty, as well as having its initial feature of time-varying coefficients.These adaptive abilities might make it a superior method in a highly volatile market background.
Forecasting commodity prices is a very difficult task.The reason for this is that the structure and the behaviour of commodities markets can be quite complex.For example, certain nonlinear effects can emerge, which, furthermore, can lead to chaotic behaviours.However, even without incorporating the developed conceptual machinery, many previous studies indicated that, in different time periods, different factors are the leading determinants of the prices of various commodities [1][2][3].Therefore, in some periods, certain models work quite well, but, in the other periods, some other models work much better.
In other words, when a longer time horizon is analysed, the uncertainty about the true model arises (i.e., the set of predictors).However, except for the model uncertainty, there are strong arguments to also allow for time-varying parameters in the considered models (i.e., to allow regression coefficients, if the regression models are considered, to vary in time).This corresponds to possible changes in the strength (or even in the direction) of the impact of a predictor on the considered commodity price.
Naturally, this problem can be tackled with the Bayesian methods, and indeed, Bayesian Model Averaging (BMA) and its slight modification, i.e., Bayesian Model Selection (BMS), have been recently used in various branches of economics with a success.In order to extend this method, recently, a new framework was proposed [4].Dynamic Model Averaging (DMA) allows for both time-varying: the state space of the model and its parameters (i.e., regression coefficients).Moreover, the final DMA forecast is computed as a weighted forecast from the predictions of all included models.However, the weights used are updated recursively with respect to the quality of forecast (i.e., predictive density) produced by the models in preceding periods.
A few years after presenting the theoretical foundations of this method, a rapid interest in it for the purpose of economics and finance can be observed.Of course, most of the published results provide arguments in favour of DMA, which can beat some alternative forecasts.Therefore, the first aim of this paper is to report a unified and consistent application of DMA to possibly large number of commodities.In addition, to provide some hints about the "calibration" of the initial parameters for this method.In particular, 69 time-series provided by The World Bank [5] were taken.The second aim of this paper is to compare DMA with the other, similar Bayesian methods.In other words, to verify whether DMA is superior to the previously proposed averaging method, and whether the model averaging itself improves the forecast accuracy in the case of commodity prices.Following Gargano and Timmermann [6], this study is based on macroeconomic and financial predictors.The second motivation behind using macroeconomic and financial predictors is to obtain a common set of predictors and use quite numerous time-series of commodity prices.
Indeed, this paper tries to fill some literature gaps.Contrary to previous DMA studies focusing on one approach to a relatively small number of independent variables (for example, [7][8][9], here, applying a similar pattern to possibly many commodities is attempted to see if some pattern will be recovered.However, this is not a somehow veiled data dredging.It is obvious that a more tailored data set of predictors can result in higher forecast accuracy.On the other hand, if certain research argues that some macroeconomic and financial variables (therefore common ones for different commodities) can serve as satisfactory commodities prices' predictors [6], then it is worth checking if DMA can produce more accurate forecasts in comparison to the existing methods.In other words, the results of this paper might provide some arguments that, indeed, DMA forecast combination scheme is highly beneficial.However, they might also bring some shadow on DMA and provide rather the following argument: the already reported success of DMA (see, for example [10][11][12]) is mostly due to the suitable choice of predictors (and analysed period), not due to the method itself.
In the next section, a brief overview of the results already obtained with DMA is presented.In addition, the short review motivating the selection of possible commodity price predictors is given.It is also discussed why model averaging can lead to better forecast performance.Next, data are described in details.Furthermore, a short overview of DMA and other methods used in this research is sketched.Next, the results are presented and discussed.Moreover, some preliminary results, motivating the selection of the initial parameters for the main DMA models are discussed.This is believed to be of some help for future researches based on DMA methods.For the reader's convenience, the list of abbreviations is given at the end of the paper.

Literature Review
This section is divided into two parts.First, which fields DMA has already been applied in is summarized.Next, a few recent research papers are presented with a stress put on the novelty of the methodology.Finally, which variables can be initially considered as potentially useful predictors in the construction of the econometric model of the selected commodities is presented.

Dynamic Model Averaging
DMA has already been applied to model various time-series and, in the case of commodities: copper, crude oil and gold.Aye et al. [13] used DMA to forecast gold price returns.Most of the predictors were chosen on the basis of a recursive principal component analysis.Both DMA and DMS were found to be superior to BMA and the naive forecast.In addition, Baur et al. [10] found similar conclusions.Risse and Ohl [14] compared the DMA improved by the Dynamic Occam's Window of Onorante and Raftery [15] with the original DMA and some other machine learning techniques.They found that such a modification produces the best forecasts of gold prices.
Buncic and Moretto [7] found that DMA and DMS produces more accurate forecasts than the historical mean and the naive forecast in case of copper price.They used as predictors both macroeconomic and financial variables as well as fundamental factors specific for copper.For example, supply and demand, returns from specific stocks (firms heavy depending on commodity market), etc. Drachal [8] found that, in case of crude oil prices, it is quite hard to produce a statistically significant more accurate forecast than the naive one and autoregressive models with DMA.However, these benchmark forecasts were also not found to be superior to DMA and DMS.On the other hand, Naser [11] used DMA with a large set of predictors, and concluded that DMA outperforms alternative models as well as forecasts based on future prices.
Baxa et al. [16] successfully applied DMA to forecast inflation in G7 countries.Similarly, inflation was modeled with DMA by Del Negro et al. [17].However, Di Filippo [18] concluded that, in the case of inflation, DMS produced better forecasts than DMA.In other words, model selection was more appropriate than model averaging.Ferreira and Palma [19] concluded that, in the case of modeling, the Brazilian inflation DMA behaves very well.These studies were usually done in the context of the generalized Phillips curve [20].Indeed, such studies were usually made in a general macroeconomic context.For example, the economic growth was also modeled with DMA [21,22].
DMA was also used in real property markets.Bork and Moller [23] used it (and also DMS) to forecast house prices in the US.Risse and Kern [9] found DMA a superior method for forecasting house prices in the Euro area countries.Wei and Cao [12] applied DMA to model house prices in major Chinese cities.They also implemented a slight modification of the original DMA method.They concluded that this method is more accurate than BMA, information-theoretic averaging, and that, indeed, the time-varying weights' scheme significantly improves the forecast quality.
De Bruyn et al. [24] applied DMA to forecast South African rand to the US dollar exchange rate and to the UK pound.The comparison was made with BMA and the random walk model.DMA was found to be superior to the benchmark models.Gupta et al. [25] examined with DMA the growth of China's foreign exchange reserves.DMA and DMS were found superior to the random walk, recursive OLS-AR(1), recursive OLS and BMA.
DMA was also found useful in modeling carbon prices [26].Liu et al. [27] used this method to forecast the realized range-based volatility.They concluded that time-varying weights indeed generate more accurate forecasts.Similar conclusions were derived by Wang et al. [28].Finally, Naser and Alaali [29] applied a DMA framework to model the US stock prices' returns.

Other Models
Indeed, the fact that forecast combination can lead to a more accurate forecast than the selection of one model is both an empirical and theoretically understood fact.The seminal paper in this context was published quite long ago [30].In case of commodities, the recent findings support averaging over various models [31] as well as the necessity to consider a possibly wide set of predictors [32,33].However, there are various methods of forecast combinations [34,35] and, yet, within the context of data-rich models, it is natural to focus on the Bayesian methodology [36,37].This is, for example, due to the fact that, if the number of predictors exceeds the number of observations, then the conventional methods, like OLS are no longer meaningful.On the other hand, the Bayesian formulas are still reasonable in such a case.For example, recently, Lee and Huh [38] proposed some improvements in choosing informative priors for Bayesian models of crude oil price.Yin et al. [39] found that replacing the predictive densities with modified hit ratios in DMA weights' construction can result in forecast accuracy improvement for oil prices.
Of course, the Bayesian approach is not the only one, which has been applied to commodity prices.However, except for purely classical and fundamental methods like multilinear regression models, etc., various improvements were applied.This was done in order to capture the specific features of the commodity market.For example, Kriechbaumer et al. [40] joined a well-known ARIMA modeling with a wavelet decomposition approach to forecast metals' prices.The motivation behind using the wavelet analysis was to encompass in the model the cyclical behaviour of metals prices.The obtained framework allowed for increasing the accuracy of one-month ahead forecasts in comparison with the basic ARIMA models.Wang and Sun [41] studied the relationship between oil prices and economic activity and political tension with a structural equation model.
In addition, VAR/VECM methods were applied in the case of bidirectional relationships and the presence of many variables influencing each other.For example, Cross and Nguyen [42] focused on links between China's economic growth and global oil market fluctuations.They modified the basic VAR methodology to include time-varying parameters' VAR.Gangopadhyay et al. [43] used VECM methodology to model gold prices in the Indian market, investment decision and inflation.
Gil-Alana et al. [44] used the fractional integration modeling framework to identify structural breaks in metals prices.Kim and Jung [45] improved the predictive ability of standard forecasting models of crude oil prices by a functional partial least squares approach.This was done to deal with multicollinearity of predictors.The framework was compared with principal component regression and least absolute shrinkage and selection operator models.
In the case of a nonconventional approach, neural networks and various types of machine learning techniques are also popular.The review of such methods can be found, for example, in the paper by Hamdi and Aloui [46].Liu et al. [47] proposed a machine learning algorithm with a decision tree.This method was able to significantly improve copper price forecasting.Zhao et al. [48] focused on the problems with uncertainty about the predictors of oil price.The study was based on almost 200 time-series.They proposed a certain deep learning ensemble approach.They joined an advanced deep neural network model to capture the nonlinearities and an ensemble method to generate multiple training data sets.The implemented algorithm was able to outperform the alternative models.

Commodities Prices Predictors
In the case of particular commodities prices' predictors this research follows Gargano and Timmermann [6].They selected various macroeconomic and financial predictors, which can be common for different commodities.In other words, they were using predictors like interest rates, term spreads, industrial production growth, etc., whereas no specific predictors suitable for just one particular commodity (like, for example, production quotas) was used.Their selection and construction of predictors was much influenced by the studies on stock prices and equity premium forecasting [49].Of course, this is some kind of a limitation.Indeed, the fundamental factors, for example, inventory quotas, are known to have a significant impact on commodities prices [50].Nevertheless, the narrowing to macroeconomic and financial predictors is treated as another modeling approach, rather than a sign of ignoring the importance of other predictors.Moreover, such an approach is quite often adapted [51].
It has already been tried to find such common predictors for various commodities in general.For example, Kagraoka [52] within a generalized dynamic factor model concluded that the U.S. inflation rate, the world industrial production, the world stock index and the price of crude oil impact commodities prices.Such methodology was also used by Lübbers and Posch [53] in the analysis of futures prices.
However, even based on a different methodological approach Arango et al. [54], as well as Byrne et al. [55] provided arguments that interest rates significantly impact numerous commodities prices.Alam and Gilbert [56] within a VAR framework concluded that the impact of monetary policy, global economic conditions and U.S. dollar exchange rates are significant for agricultural commodities.However, this study was based also on commodity-specific predictors like inventories quotas, etc.The motivation of this study was similar to the one presented herein: to analyse a big data set of commodities and find common predictors.In case of analysing the impact of exchange rates on commodities prices, a more detailed research was done by, for example, Chen et al. [57].
Much research was also devoted to links between commodities spot and futures prices [58].However, Hong and Yogo [59] concluded that open interest can serve as an even better proxy than future prices itself.Chen [60] provided arguments that commodity-sensitive stock prices contain an important information for forecasting commodities prices.Indeed, such predictors were used in the already mentioned DMA study of copper prices by Buncic and Moretto [7].
Finally, it should be stressed that, except for various already existing methodologies and continual improvements and development of new ones, forecasting commodities prices remains a very difficult task [61].
Summarizing this considerations, it can be seen that there is a recent rapid increasing interest in the methods combining time-varying parameters and uncertainty about the predictors (or dealing with data-rich models).Economically it is motivated by the increasing complexity of the markets, and time-varying relationships between predictors and commodity prices: both in a sense of the strength and the direction, and even in a sense which predictors are important in particular moments of time.From the mentioned literature of these novel methods, a certain group can be classified, i.e., BMA, BMS, DMA and DMS.All of them emerge from time-varying parameters (TVP) regressions with the Kalman filter.The differences between these methods lie in whether from numerous models the final forecast emerge through the model averaging or model selection-in addition, in case of estimation of the model's performance, how the weights are updated.It seems that currently there is no paper presenting a unified and consistent approach comparing all these methods for a relatively large set of independent variables (i.e., various spot commodities prices).The current paper is a try to fill this gap.
In particular, it was checked whether DMA indeed outperforms these other Bayesian methods.Secondly, it was checked whether these methods and time-varying regression with all predictors produce significantly more accurate forecasts than ARIMA models or the naive forecast.Once again, it is stressed that the aim was not to focus on some similar set of commodities, but to consider a possibly wide set of commodities and apply a unified methodology to them and see if this methodology leads to some improvement in forecasting.For certain conventional methods, such an approach was reported by Gargano and Timmermann [6].

Data
The monthly data between January 1984 and October 2017 were taken.The frequency was chosen as a kind of compromise.Indeed, many macroeconomic data are available even only in quarterly frequency.Therefore, more frequent than monthly data would result in problems with macroeconomic data.On the other hand, too high frequency would need a separate analysis of particular problems connected with very short periods.For example, it can be expected that, in monthly frequency, short-term speculative behaviour is flattened.The period before 1984 was not included due to two facts.First, many commodities prices before that time were not available or their volatilities were so small that the analysis would not be interesting.Secondly, as it is explained further, some variables are missing for certain periods before 1984.Resolving this problem-how to deal with missing observations-would need another discussion and may not seem worth the current research.
Commodities prices were obtained from The World Bank [5].After trimming the time-series to the above-mentioned period, the remaining time-series which contained missing observations were excluded.As a result, 69 time-series remained.The full list of the analysed commodities is given in Table 1.
Table 1.Commodities abbreviations.Details can be found in the original source: The World Bank [5].

IBEVERAGES
Beverages index includes cocoa, coffee and tea.The predictors were mostly based on the already discussed papers with a great emphasis on the work of Gargano and Timmermann [6].Dividend to price ratio (dpr) was taken as the difference between the logarithm of dividends and the logarithm of prices.These data were obtained from Schiller [62].In particular, prices were taken as S&P Comp.Dividends were transformed to 12-month moving sums.Treasury bill rate (str) was taken as the 3-month treasury bill: secondary market rate (TB3MS) from FRED [63].Long-term rate of bonds (ltr), because of the lack of exact recent data and problems with methodology, was taken as long-term government bond yields: 10-year, main (including benchmark) for the United States (IRLTLT01USM156N).Term spread (ts) was taken as the difference between the long-term rate of bonds and treasury bill rate.Default return spread (drs) was taken as the difference between long-term corporate bonds yield and treasury bill rate.Long-term corporate bond yield was taken as the index based on bonds with maturities 20-years and above (AAA).Inflation (cpi) was taken as the logarithmic difference of the CPI (Consumer Price Index CPIAUCSL).Industrial production growth (ip) was taken as the logarithmic difference of industrial production (INDPRO).Money stock (m) was taken as the logarithmic difference of M1 money stock (M1SL).In order to measure the economic growth, the classically used GDP was not appropriate because it is available in quarterly-not monthly-frequency.However, it can be found in literature that Kilian [64] global economic activity index can serve as a good proxy (kei).All of these data, as well as the unemployment rate (une, UNRATE), were obtained from FRED [63].Following Chen et al. [57], the logarithmic differences of Australian dollar (aud) and Indian rupee to the US dollar (inr) exchange rates were taken (CCUSMA02INM618N and CCUSSP01AUM650N).These currencies were selected as exchange rate representatives because the respective countries are the largest exporters of industrial and agricultural commodities [6].

IFOOD
Open interest (op) was obtained from U.S. Commodity Futures Trading Commission [65] and directly from the website of Yogo [66].During an 11-month period from January 1982 to November 1982, U.S. Commodity Futures Trading Commission did not collect data, due to budgetary reasons.Because the monthly growth rate of open interest is noisy, it was smoothed by taking a 12-month geometric mean in the time-series [59].For the period before 2009, data by Yogo [66] were taken.Since 2009, data from U.S. Commodity Futures Trading Commission [65] were taken.As these data are provided in higher than monthly frequency, mean values from every month were computed.
Behaviour of commodities prices through some one, general measure (tr) was proxied by logarithmic differences of Thomson Reuters Equal Weight Commodity Index [67].By equal weighting, this index, featuring 17 commodities, provides quite even distribution into the major commodity sectors, rather than overweighting in certain sectors such as energy or agriculture.
Prices of the commodities were transformed to logarithmic differences.The abbreviations of commodities are explained in Table 1.
It should be mentioned that these predictors were already used in a DMA-type scheme applied to modeling selected energy commodities prices [68].However, in that study, these predictors were only a part of a much bigger collection of various other predictors.In addition, in that study, the collection of predictors was individually tailored for each commodity.For example, predictors indicating supply and demand forces were included, whereas, herein, they were taken as an optimal collection of predictors common for all analysed commodities.Finally, here the original DMA scheme is used [4], whereas the mentioned study (due to the high number of predictors inserted) used the Dynamic Occam's Window of Onorante and Raftery [15].Moreover, the mentioned study covered a different period (mostly due to the fact that it was also using Internet search queries available since 2004).
The descriptive statistics are reported in Table 2. ADF, Phillips-Perron and KPSS stationarity tests are reported in Table 3.No important discrepancies were found.Only in the case of dpr and une, stationarity cannot be assumed.kei is stationary by the construction [64].The reported opposition is due to analysing just a cut part of this time-series.However, it should be stressed that, in case of DMA, stationarity is not required, as this model is time-adapting.As it was already described, variables were taken in their differences mostly due to the convention, and to provide comparable outcomes with the previously mentioned literature.On the other hand, it is quite desirable due to computational issues in DMA that variables are of similar size.Therefore, kei was divided by 100.

Methodology
All the methods used herein are well explained in the original papers.Therefore, to keep both the clarity of the explanation and the reader's convenience, only a short sketch is presented below.The computational details can be recovered from the cited sources, while the emphasis is put on the general idea of the methods.All computations were done in R Core Team [69] with the help of some additional packages [70,71].

Dynamic Model Averaging (DMA)
DMA was proposed by Raftery et al. [4].Suppose that there are m potential predictors for the independent variable y t .In this research, y t denotes the logarithmic differences of the selected commodity price and m = 15 because the possible predictors are: lagged one period back dpr, str, ltr, ts, drs, cpi, ip, m, kei, une, aud, inr, op, tr and the first lag of y t .The lag of y t was taken because of the serious arguments in the literature and significant improvement observed in pre-simulations.On the other hand, more lags were decided not to be included because one of the alternative models is ARIMA.Thus, it was tried to check if there is some trade-off between the inclusion of other predictors and autoregressive lags.
Out of these m = 15 predictors, K = 2 15 = 32768 linear regression models can be constructed.Therefore, let x (k) t be the vector of predictors in k-th model, with k = 1, . . ., K.Then, the state space model of DMA is given by where θ t is the vector of regression coefficients.It is assumed that errors follow the normal distribution, i.e., t ).All of these K models are estimated as Time-Varying Parameters regressions (TVP) with the help of the Kalman filter [4].Due to the computational issues, a certain forgetting procedure is used in updating W (k) t .This needs a forgetting factor λ ∈ (0, 1] to be specified [4].From the interpretative point of view, this factor is responsible for the allowed variability of regression coefficients, and it corresponds to the effective rolling window size of 1  1−λ .Lower values correspond to higher variability of regression coefficients. Variance V (k) t can be updated by the recursive moment estimation [4].However, in the case of the suspicion of ARCH effects in residuals, some authors suggest using the Exponentially Weighted Moving Average (EWMA) method [72].Here, both approaches were used and compared.
Further in this paper, by TVP, it will be denoted the model for all possible m = 15 predictors estimated in this way.Equivalently, it can be seen as the above DMA model, but reduced to K = 1 and keeping only this k-th model which contains all m = 15 predictors.Now, having estimated all these K time-varying parameters models, the DMA introduces a set of two weights, which are recursively updated.Therefore, this method to forecast at time t uses only the information available up to time t − 1.
The weights are given by and , where α ∈ (0, 1] is a (second) forgetting factor, and f k (y t |Y t−1 ) is the predictive density of k-th model at y t .In order to start the above recursive computations, some initial values have to be set.Assuming that initially all K models are equally "good" (i.e., using the noninformative prior), it is set π 0|0,k = 1 K and θ (k) 0 = 0 for all k = 1, . . ., K. V (k) 0 should be initialized with respect to the magnitude of the variables used.It should be big enough to allow for the changes in parameters.On the other hand, too big V (k) 0 and small λ can result in catching noise rather than signal.From standard deviations reported in Table 2, it seems reasonable to set V (k) 0 = 1 for all k = 1, . . ., K. Unifying these values for all models used in averaging also has some computational advantage: Finally, the DMA forecast is given by Notice that, if α = 1, equal weights are produced.Secondly, if α = 1 = λ is taken, then DMA reproduces in a (computationally efficient) way the Bayesian Model Averaging, further in the text denoted simply as BMA [4].Usually, it is suggested to set α = 0.99 = λ in DMA [4,72].However, in some cases, more accurate forecasts can be generated, if different pairs of these parameters are taken [8].Therefore, in this research, all possible pairs α, λ = {1, 0.99, 0.98, 0.95, 0.90} were considered.More frequent search around α = 1 = λ, and less-around α = 0.90 = λ-was decided, because the previous simulations suggest that forecast accuracy can change much when switching between higher values of forgetting parameters, whereas, for lower values, the change is rather small [73].
At this point, it can be mentioned that DMA is an extension of BMA in the following sense.First, it allows for time-varying parameters.Secondly, it allows the weights in model averaging to vary in time.Equivalently, DMA can be seen as dynamic (i.e., with time-varying weights) averaging over TVP models.

Dynamic Model Selection (DMS)
Of course, it is easy to replace the above averaging scheme with model selection [72].Simply, Equation ( 2) can be modified to where h t denotes that model (out of K ones), which corresponds to the highest π t|t−1,k .Such a scheme is called the Dynamic Model Selection (DMS).Similarly, as with DMA and BMA, if in the above scheme α = 1 = λ is taken, then the resulting model is called the Bayesian Model Selection (BMS).

Median Probability Model (MED)
However, Barbieri and Berger [74] noticed that the selection of the model with the highest posterior probability π t|t−1,k is not always desirable.Indeed, it is under certain very general conditions, but it is still optimal only in the case of only two models competing.Secondly, this is so in the case of linear models having orthogonal design matrices.Therefore, they proposed Median Probability Model (MED).
First, relative variable importance needs to be computed for every predictor.This is defined as the sum of posterior probabilities π t|t−1,k for those models out of all K, which contain a particular predictor.Next, the model which contains as the predictors exactly those whose relative variable importances are equal to or greater than 0.5 is selected.Of course, relative variable importance can vary in time, therefore the selected model also varies in time.
As previously, if in the above scheme α = 1 = λ is taken, then the resulting model is called the Bayesian Median Probability Model (BMED).It should be noticed that models pre-named "Bayesian" are the special cases of their more general encompassing combination schemes, just that their forgetting parameters are α = 1 = λ.This name-based emphasis is done to stress the particular forgetting factor values, corresponding to equal-weighting in Equation ( 1) and no variability of regression coefficients assumed.

Evaluation of Models
To summarize, for each of the commodities, DMA, DMS and MED models were estimated.These models were estimated with all possible pairs α, λ = {1, 0.99, 0.98, 0.95, 0.90}, i.e., in 25 different versions.Moreover, as described before, each of these models were estimated with AR(1) term and without it, and with the mentioned EWMA procedure or with the recursive moment estimation.For each of the commodities, TVP models with λ = {1, 0.99, 0.98, 0.95, 0.90} were also estimated.They were also estimated with and without AR(1) terms, and with EWMA and recursive moment estimation.This resulted in 320 models being estimated for each of the commodity.
As for the conventional alternative forecasting model, two choices were done: Auto ARIMA model by Hyndman and Khandakar [71] implemented in a recursive way, and the naive method, i.e., the observation from time t is taken as the forecast for time t + 1 (NAIVE).The model of Hyndman and Khandakar [71] is choosing AR and MA order dynamically in each step by starting from the one term and checking if more terms improve the selected criterion (in this case: Akaike Information Criterion).To guarantee stopping, no more than five AR terms and 5 MA terms are considered.
The evaluation of the above described Bayesian schemes was done in the following way.First of all, the forecast accuracy was measured by RMSE (Root Mean Squared Error).The estimated models were divided into two groups.The first was consisting of DMA, DMS and MED models.In other words, they are the models based on some kind of a model combination scheme.The second group was consisting of TVP, ARIMA and NAIVE models, i.e., models not based on any model combination scheme.
In order to compare the forecast accuracy generated by two competing models, the Diebold-Mariano test (DM) was used [75].This test was chosen because of its relative little assumptions and popularity.Its null hypothesis is that the two forecasts have the same accuracy.
This test was used to compare forecasts from the most accurate models representing the two above described groups.
Secondly, for each commodity, the model minimising RMSE was chosen out of the various versions of the DMA model.It was compared with ARIMA and NAIVE models, i.e., with the very common benchmark models.
The aim was to statistically check if DMA can produce more accurate forecasts than the conventional methods and if the considered Bayesian model combination schemes in general have some forecast accuracy advantage over the conventional methods.
It was also analysed if a certain pattern in forecast errors depending on the forgetting factors can be identified.
Finally, it should be remembered that the used methods were modeling the logarithmic differences of commodities prices.However, RMSE and DM tests were applied to errors from the forecast of commodities prices (i.e., the direct outcomes from the considered models were transformed).

Results and Discussion
This section is devoted to, at first, presenting the preliminary results and observations derived from the summary of the estimation of the considered models.Indeed, as the selected Bayesian techniques rely on recursive computations and a large quantity of models, the computational issues result in certain obstacles.For example, the estimation of one DMA model for one particular commodity took on an average computer no more than few minutes.It should be noticed that adding one extra predictor doubles the time needed for computations; therefore, the computational issues can easily switch from negligible to insuperable ones.As mentioned in the previous section, for all 69 commodities, a total of 22,218 models were estimated.The estimation was speeded up by the use of both the cloud computing and the parallel computations.Nevertheless, the whole estimation took over a day, despite the use of Amazon Web Services (AWS).Therefore, due to the number of the estimated models, a certain degree of subjectivity in the summary outcomes presentation is a must (i.e., how to present them, and on which put the emphasis).
The second part (i.e., Section 5.2) is devoted to the main results.In other words, RMSEs (Root Mean Squared Errors) are reported.Additionally, the forecasts' accuracies from different models are statistically tested.It is also described whether some type of a model combination scheme seem to play the dominant role in producing most accurate forecasts.
The last part (i.e., Section 5.3) is devoted to a short discussion of the analysis, which seemed interesting after looking at the main results.As just mentioned, a large number of models was estimated, therefore it is not possible to present quite extensive analysis.In the first two parts, the outcomes are presented and discussed within the context of the aims of this research, as they were stated in the beginning.However, after the estimation, some other comparisons become interesting.For example, forecast accuracy of DMA with α = 0.99 and λ = 1 was compared with forecast accuracy of BMA (i.e., DMA with α = 1 = λ).The importance of these two models become evident after the knowledge of the outcomes reported in the first and the second part.
Another example of subjectivity is that, due to the large amount of models if, for example, a DMA model combination scheme was analysed and the model minimising RMSE was sought, then all variations (with all considered forgetting factors, both with EWMA and recursive moment estimation method used, with and without AR(1) term) of the models were put into one group.Some readers might be willing to, for example, analyse all of these models in separate groups.However, as several variations were considered, the number of such choices is quite big, and reporting all of that would obscure the outcomes.
The considered Bayesian models require some initial number of observations to "learn".In other words, for the first observations, some chaotic outcomes are usually generated as the initial parameters tune up to the upcoming new observations.For example, as explained in the previous section, all models are initially set with zero regression coefficients.Therefore, the first 100 observations, i.e., approximately 1/4 of all observations were treated as in-sample.All reported outcomes (RMSE, p-values of the Diebold-Mariano test, etc.) are based on the out-of-sample consisting of the next 3/4 of observations.It should be stressed that the considered models are recursive ones, therefore their predictors coefficients are updated with each extra observation.Therefore, these models still evolve during the out-of-sample period.The in-sample period was chosen to cut off observations during which models try to catch the signal.This method is well known and is sometimes called evaluation on a rolling forecasting origin, walk forward validation, forward-chaining validation, or time-series cross-validation [76].
All estimated models were divided into the following groups: DMA, DMS, MED, TVP, ARIMA and NAIVE.Models in different versions due to the choice of forgetting factors, method of variance updating: EWMA or recursive moment estimation, with and without AR(1) term, if not stated otherwise, were not analysed separately.In particular, BMA was treated as a special case of DMA, BMS-a special case of DMS, etc.

Descriptive Analysis of the First Results
First of all, it was observed that, in 87% of cases, DMA produced smaller RMSE than the NAIVE model.Assuming a 5% significance level, according to the Diebold-Mariano test, these results were statistically significant in 83% of cases.(The null hypothesis was that both methods have the same forecast accuracy.The alternative is that DMA is more accurate than the NAIVE method.)If the benchmark model was taken as the ARIMA, then DMA produced smaller errors in 80% of cases, and they were statistically significant in 78% of cases.Such results can serve as an argument that, in general, despite its quite high complexity, DMA is an interesting alternative for simple forecasting models.
However, if the considerations are restricted only to DMA, DMS and MED models, then α < 1 allowed to minimise RMSE for 42% of DMA models, 54% of DMS models and 51% of MED models.If the considerations are restricted only to DMA, DMS, MED and TVP models, then λ < 1 allowed to minimise RMSE for 9% of DMA models, 12% of DMS models, 14% of MED models, and 6% of TVP models.These outcomes suggest that λ = 1 is very strongly preferred for minimising RMSE, whereas setting the value of α is not so obvious.
Indeed, narrowing the considerations only to DMA, DMS and MED models, the combinations of forgetting factors such that α, λ < 1 minimised RMSE only in 1% of DMA models, 3% of DMS models, and 6% of MED models.Therefore, even though the Bayesian model combination schemes seem to be beneficial as reducing the forecast error, this is mostly due to the time-varying weights construction.As λ = 1 is the most often preferred value, the regression coefficients' variability is not so beneficial for forecast accuracy.
The DMA model with both forgetting factors being strictly less than 1 was the model for POTASH.Interestingly, this model generated a significantly more accurate forecast (at 5% significance level) than the NAIVE and ARIMA models.It was also the one which (for this commodity) minimised RMSE out of all the considered types of models.However, this model did not generate a statistically significant more accurate forecast than TVP model.
The preferences of forgetting factors can be visualised on a bubble plot in Figure 1.Clearly, it can be seen that, in the case of DMA, DMS and MED models, λ = 1 is preferred if minimising is the aim.In many cases, α = 1 = λ is also preferred, meaning that BMA, BMS or BMED is preferred.However, in many cases, α < 1 remains the preferred option.More discussion about this result is postponed for the end of this section.The next observation is that the model minimising RMSE was the one with the AR(1) term added in 75% cases if the DMA type was considered.For DMS models, it was 62%, for MED-68%, and for TVP-71%.This means that, in the majority of cases, it is desirable to add an autoregressive term; however, in a reasonable number of cases, it is not.
EWMA (Exponentially Weighted Moving Average) variance updating method was present only in 20% of DMA models which had the smallest RMSE.In the case of DMS models, it was 35%, for MED models-38%, and for TVP models-35%.This suggests that, if minimising errors is the aim, then considering EWMA method is not desirable, and one can stick to the recursive moment estimation (which was proposed in the original paper introducing the DMA method [4]).
Anyway, the idea behind the EWMA method is to erase the possible ARCH effect in residuals.Performing LM-ARCH test with a 5% significance level indicated that the selected models do not have ARCH effects in residuals in 35% of cases of DMA type models, 32% of DMS type models, 33% of MED type models and 28% of TVP type models.Fortunately, as explained in the previous section, the considered Bayesian model combination schemes are recursively updating the error term in equations, so this is not an important obstacle in the case of forecasting.It should be stressed that models were chosen due to minimisation of RMSE, and the verification of ARCH effects was done as an additional check.In other words, for the purpose of this research, the model with ARCH effects, but with smaller RMSE was preferred, even if some model did not possess ARCH effects, but would have larger RMSE.
Of course, it is interesting to estimate how often the lack of ARCH effects could be assumed at a 5% significance level, and, simultaneously, the chosen model was the one with the EWMA estimation method.For DMA type, it was in 10% of cases, for DMS-12%, for MED-14%, and for TVP-10%.Therefore, in line with the previous observation, it can be stated that EWMA allowed for erasing ARCH effects efficiently only in 1/3-1/2 cases in which this method was applied.
To sum up the above results, the following is indicated: first, that DMA is an interesting alternative method (if compared with ARIMA or NAIVE models); secondly, that λ = 1 is usually preferred, but α < 1, meaning that most gains in forecast accuracy is due to the certain weighting procedure-not time-varying regression coefficients; third, that the EWMA method is not necessarily "better" than the recursive moment estimation; fourth, that adding the autoregressive term usually (but not always) leads to smaller forecast errors.

Main Results
Table 4 presents the main outcomes for various model combination schemes.In other words, Table 4 reports the normalized RMSE (nRMSE) from all estimated model.As each type of model was estimated in various versions, in Table 4, the outcomes are present for the one which minimised RMSE for the given type of a model (i.e., out of DMA type models, out of DMS type models, etc.).The normalization of RMSE is done simply by dividing by the mean value of the forecasted time-series.It can be seen (Table 2) that the orders of magnitudes of forecasted commodity prices are quite different.It is easy to switch between RMSE and normalized RMSE having the information from Table 2 and Table 4.However, normalized values seem easier for interpreting the forecasting benefit of each of the models.
Generally, the normalized RMSE is around 0.07.This means that quite accurate forecasts can be produced.In a few cases, very small errors were produced (0.01) or very high errors were produced (0.19).However, for the given commodity, the errors from different models do not differ much.
More information can be derived from Table 5, which presents the results of the Diebold-Mariano tests.The first columns of this table are devoted to comparison of the accuracies of forecasts generated by the DMA method and the NAIVE and the ARIMA models.As a reminder, the null hypothesis of the test is that both forecasts have the same accuracy.For each of the commodities, whether DMA produced smaller RMSE than the NAIVE, and whether it produced smaller RMSE than the ARIMA, was checked.If so, then the alternative hypothesis for the test was taken that the DMA forecast is more accurate than the one from the NAIVE (or from the ARIMA).Otherwise, the NAIVE forecast (or the ARIMA forecast) is more accurate.Clearly, it can be seen that, in the majority of cases, there is a strong evidence to treat DMA as a significantly more accurate forecasting method.
The notation X < Y in Table 5 represents the alternative hypothesis for the Diebold-Mariano test, and it should be read: "The forecast generated by model X is more accurate than the forecast generated by model Y".Similarly, the notation X > Y should be read: "The forecast generated by model Y is more accurate than the forecast generated by model X".
Finally, the last two columns in Table 5 are devoted to choosing the "best" model out of all considered ones, for the given commodity price.The "best" model is understood to be the one that generated the smallest RMSE.Next, as already mentioned in the Methodology section, the estimated models were divided into two groups.The first contained DMA, DMS and MED models and the second the TVP, ARIMA and NAIVE.In each group, the representatives with smallest (in the given group) RMSE were chosen.Then, the Diebold-Mariano test was performed to check that, indeed, the forecast generated by one of these representative models is significantly more accurate than the forecasts generated by the second representative model (from another group).The summary of these considerations is presented in Table 6.combination schemes, but they cannot generate significantly more accurate forecasts than the ARIMA or the NAIVE models.On the other hand, if the Bayesian model combination scheme generated such a significantly more accurate forecast, then it was the model with α < 1 or λ < 1.Again, looking at Figure 1, it can be expected that this was rather due to α < 1 than to λ < 1.
Finally, the NAIVE model generated the smallest RMSE in only 6% of cases.In none of cases did it do so significantly.The ARIMA model generated the smallest RMSE in 16% of cases.In only 4% of cases did it do so, and, simultaneously, this result was significant.
It can be concluded that the considered Bayesian model combination schemes can result in significantly more accurate forecasts.Secondly, both model averaging (like in DMA), and model selection (like in DMS and MED) are efficient methods.It should be stressed that, due to Equation ( 1) and the way the weights are updated recursively, the forgetting factor α plays an important role also in model selection schemes.Its role is connected with the way that the past information is treated.Roughly speaking, the information (how the model behaved) from i periods back is given α i weight [7,10].
Those figures present normalized RMSE from DMA models with an AR(1) term and with recursive moment estimation used for variance updating.In Figure 2, various α = λ are considered.It can be seen that, under such a restriction, higher values of forgetting factors are preferred.A similar conclusion is valid if α = 1 is fixed and λ is changing, as it can be seen in Figure 3.However, in Figure 4, it can be seen that, if λ = 1 is fixed, in most cases, changing the value of α does not lead to important changes of normalized RMSE.Anyway, in a few cases, the smaller values of α are preferred.

Further Remarks
By looking at rows in Table 4, it could be seen that the average gain in lowering normalized RMSE by applying some of the Bayesian model combination scheme over TVP, ARIMA or NAIVE models is 0.002, which is very small.On the other hand, from Table 5, it was seen that Bayesian model combination schemes very often produce significantly more accurate forecasts, so this gain is important.
It is also interesting to come back to Figure 1 and discussion about the preferred forgetting factors.Usually, if considerations are narrowed just to DMA, DMS and MED models, then the combination of α = 1 = λ minimises errors.However, when considerations are narrowed to those of DMA, DMS and MED models, which generate significantly more accurate forecasts than TVP, ARIMA and NAIVE, then λ = 1 is preferred, but α < 1.Therefore, it is interesting to analyse whether the difference between these two types of models is statistically significant.For this, DMA models with AR(1) term and based on recursive moment estimation for variance updating were compared-in other words, those models with α = 1 = λ (BMA) with α = 0.99 and λ = 1 (DMA).At the 5% significance level, according to the Diebold-Mariano test, DMA generated a more accurate forecast than BMA in 14% of cases, at a 10% significance level-in 23% of cases.
Actually, in Figure 1, some interesting patterns can also be seen for fixed α = 1.From the interpretative point of view, the combination α = λ is also quite natural.Therefore, Figures 2-4 present normalized RMSE for all analysed commodities prices with different combinations of forgetting factors α and λ.

Conclusions
It was found that, in general, the Bayesian model combination schemes like Dynamic Model Averaging (DMA), Dynamic Model Selection (DMS) and Median Probability Model (MED) in most of the cases produced significantly more accurate forecasts than the ARIMA, NAIVE or TVP (Time-Varying Parameters regression) models.Interestingly, the MED model in many cases generated smaller RMSE (Root Mean Squared Error) than DMS.
It was found that, indeed, the schemes based on recursive estimation like in DMA, allowing for time-varying regression coefficients and time-varying models' weights, resulted in smaller RMSE and more accurate forecasts.However, the more thorough analysis suggested that the forecasting benefit from this approach is rather due to time-varying weights, not time-varying regression coefficients.From the economic point of view, this might be interpreted that, in different periods, it is rather the model itself (the collection of predictors) which really matters as opposed to the strength of the relationship between predictors and the given commodity price, which should be modeled more carefully.
Indeed, all considered modeling schemes used in this research were based on the recursive estimations.In other words, forecast at time t was based only on the information available up to the time t − 1.Additionally, all information available up to time t − 1 was used, so, as the new information become available, it was used to update the model's parameters.
On the other hand, in a numerical sense, the decline in RMSE due to the use of the applied schemes was not very high.Anyway, according to the Diebold-Mariano (DM) test, the applied schemes generated significantly more accurate forecasts.
This research was not based on some index as a proxy of commodities prices.Instead, 69 individual time-series were modeled.Generally, it was confirmed that adding the autoregressive first lag as a predictor improves the forecast accuracy.The advantage of Exponentially Weighted Moving Average (EWMA) in variance updating over the originally proposed recursive moment estimation was not confirmed.
Summarizing the outcomes, the considered Bayesian model combination schemes were found useful, i.e., generating more accurate forecasts in most of the cases.

Figure 1 .
Figure 1.Combination of forgetting factors for DMA, DMS and MED models, which minimise RMSE.Circles are proportional to the number of models.

Figure 2 .
Figure 2. Normalized RMSE for all commodities vs. different values of forgetting factors.

Figure 3 .
Figure 3. Normalized RMSE for all commodities vs. different values of forgetting factors.

Figure 4 .
Figure 4. Normalized RMSE for all commodities vs. different values of forgetting factors.

Table 1 .
Food index includes fats and oils, grains and other food items.IFATS_OILSFats and oils index includes coconut oil, groundnut oil, palm oil, soybeans, soybean oil and soybean meal.IGRAINSGrains index includes barley, maize, rice and wheat.Cont.IENERGYEnergy index, a Laspeyres Index with fixed weights based on 2002-2004 average developing countries export values, for coal, crude oil and natural gas.