Forecasting Commodity Prices: Looking for a Benchmark

Abstract: The random walk, no-change forecast is a customary benchmark in the literature on forecasting commodity prices. We challenge this custom by examining whether alternative models are better suited for this purpose. Based on a literature review and the results of two out-of-sample forecasting experiments, we draw two conclusions. First, in forecasting nominal commodity prices at shorter horizons, the random walk benchmark should be supplemented by futures-based forecasts. Second, in forecasting real commodity prices, the random walk benchmark should be supplemented, if not substituted, by forecasts from local projection models. In both cases, the alternative benchmarks deliver forecasts of comparable and, in many cases, superior accuracy.


Introduction
Commodities play an important role in the global economy: the oil and natural gas industry alone accounts for a 5% share, not to mention the significant contribution of other energy, metal, mineral and agricultural commodities. Commodity prices are therefore key drivers of inflation, economic activity and current account balances. At an individual level, they influence the costs of living of households around the globe, both directly (e.g., through petrol prices at gas stations) and indirectly through their impact on other consumer goods prices (through their production costs). At a firm level, commodity prices significantly influence profits. For instance, in the airline industry, with profits depending heavily on jet fuel prices, oil price forecasts are used to set tariffs and assess aircraft purchasing decisions. In turn, car manufacturers use oil price forecasts to design new products with a focus on fuel economy, whereas the financial industry uses commodities to hedge against inflation [1,2]. For the above reasons, understanding commodity price dynamics and the ability to formulate reliable forecasts are important to many economic agents. For companies, they help in assessing strategic policies or investment decisions with a long-term impact. For countries where commodities are important sources of export revenues, price forecasts help predict budget balances. Finally, for central banks, they improve the assessment of the future path of inflation, GDP or external imbalances, hence allowing for better conduct of monetary policy.
Whether it is possible to forecast commodity prices reliably is the subject of a long-standing debate in the economic literature. Commodity prices are more volatile than stock prices or exchange rates, hence producing accurate forecasts is considered a real challenge. So far, various methods have been compared to the simple random walk (no-change) forecast in out-of-sample forecasting competitions. The results of numerous studies show that, for selected time spans and commodities, some methods deliver significantly better forecasts than the random walk model. However, these studies are usually silent on whether the random walk is indeed a tough benchmark to beat in a forecasting competition.
Instead of searching for the best forecasting method, we ask whether the random walk is a proper benchmark in commodity price forecasting. We address this issue by comparing it with two reasonable alternatives. For nominal commodity prices and short-term horizons, we compare naive forecasts to those based on futures prices. For real commodity prices and longer horizons, we use local projection models that assume reversion towards the recursive sample mean. Our analysis is conducted for four groups of commodities: energy, precious metals, industrial metals and agricultural products. We also review the literature supporting futures prices and mean-reverting models as additional benchmarks to the random walk model.
The key message of our analysis is twofold. In nominal commodity price forecasting, investigated methods should be compared to both the random walk and futures-based forecasts, whereas in real commodity price forecasting, the random walk and local projection models should be used as benchmarks.
The rest of the paper is structured as follows. Sections 2 and 3 discuss nominal and real commodity price forecasting competitions. Section 4 concludes.

Futures Prices as a Benchmark for Nominal Prices
In this section, we argue why futures prices could and should be used as an additional benchmark in nominal commodity price forecasting. We start by reviewing the literature that explains the joint dynamics of spot and futures prices, then we focus on studies using futures prices in forecasting competitions. We finish the section by presenting a comparison of futures-based forecasts to those from the random walk model and by discussing additional topics, such as nonlinear dynamics of commodity markets.
The attempts to explain the joint dynamics of commodity spot and futures prices go back to the theory of storage developed by Working [3], which states that the spread between the spot and futures prices, i.e., "the basis", is determined by three factors: the interest rate, the cost of storage and the convenience yield, defined as an implied return on holding inventories of a commodity. Depending on the value of the convenience yield, the market might be in one of two regimes: contango, when futures prices are higher than the spot price, and backwardation, when futures prices are lower than the spot price. The theory of storage claims that, when the supplies of a commodity are high and working inventories are low, the market is in contango, volatilities of both spot and futures prices are low, and futures premiums reflect the full cost of storage. On the other hand, when supplies are low and companies try to replenish their inventories, the market is in backwardation and volatilities of spot and nearby futures are high compared to those of distant futures. Later on, the concept of convenience yield, i.e., the return implied by the inventory being held, was added to the picture, together with the demand and supply curves for storage.
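As a hedged illustration, the relationship described above is often written in a cost-of-carry form. The notation below ($r_{t,h}$ for the interest rate over the holding period, $w_{t,h}$ for the storage cost and $c_{t,h}$ for the convenience yield) is assumed here for exposition and is not taken from the studies cited in this section:

```latex
F_{t,h} - S_t = S_t \, r_{t,h} + w_{t,h} - c_{t,h}
```

In this formulation the convenience yield enters with a negative sign. The market is in contango ($F_{t,h} > S_t$) when interest and storage costs exceed the convenience yield, and in backwardation ($F_{t,h} < S_t$) when a high convenience yield outweighs them.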
A number of studies analyze quantitatively and verify empirically the implications of the above theory. In one of the earliest studies, Fama and French [4] used selected agricultural commodities to show that the seasonality in convenience yield causes variations in the basis. Geman and Ohana [5], who used the theory of storage to explain the shape of the futures curve for oil and natural gas, found a significant correlation between price volatility and the level of inventory. Similarly, Symeonidis et al. [6], analyzing 21 commodities, showed that when inventories are low (high) the chance of backwardation (contango) increases, and that price volatility is a decreasing function of inventory, more pronouncedly in backwardated markets. More recently, Fernandez [7] argued on the grounds of the theory of storage that spot and futures prices should be more highly correlated in contango than in backwardation and confirmed this contemporaneously for six base metals. She also found Granger causality from futures to spot returns in both market regimes. Additionally, Fernandez [8] provided evidence of a link between futures markets and the business cycles of industrial production and of the production of industrial metals. There are also studies showing that the joint dynamics of spot and futures prices are regime-dependent. For example, Huang et al. [9], Gulley and Tilton [10] and Fernandez [11] demonstrated that spot and futures prices are more correlated in contango than in backwardation. In turn, Almansour [12] demonstrated that the dynamics of the basis for energy commodities are well described by a regime-switching process, which allows contango and backwardation regimes to be identified.
The above-mentioned relationship between spot and futures prices is often explored in the context of commodity price forecasting. Perhaps the most popular study on this topic, by Fama and French [4], showed that only for selected commodities are forecasts derived from direct regressions (with future spot price changes explained by the basis) more accurate than those from the random walk benchmark. It can be noted that the direct forecast regressions implicitly assume that spot and futures prices are cointegrated, where deviations from the long-term relationship are eliminated by spot price adjustments. This hypothesis is partially confirmed by Shrestha [13] within the vector error correction framework for crude oil and natural gas markets. On the contrary, Chang and Lee [14] used a multivariate autoregressive framework to provide evidence of bilateral causality between spot and futures prices. The error correction framework was also applied in a natural gas market forecasting contest by Modjtahedi and Movassagh [15,16], who found that futures are biased predictors of spot prices, but are also characterized by significant directional forecasting power. For crude oil prices, Coppola [17] found that a vector error correction model for spot and futures prices outperforms the random walk in a forecasting competition. In a more extensive study, Reichsfeld and Roache [18] demonstrated significant predictive content of futures for energy commodities. In particular, they showed that forecasts derived from futures curves are difficult to beat using more elaborate econometric models, e.g., based on a direct forecast regression or an error correction specification. Less favorable results on the predictive content of crude oil futures prices are presented in comprehensive studies by Alquist and Kilian [19] and Alquist et al. [20], which show that futures-based forecasts can outperform the random walk out-of-sample only at very short horizons.
These negative results on the limited predictive content of futures for energy commodities were later confirmed by Chinn and Coibion [21], Fernandez [22] and Pak [23]. On the contrary, Ellwanger and Snudden [24] show that futures prices have become better predictors of crude oil spot prices since the mid-2000s. The above results are explained by Manescu and Van Robays [25], who pointed to time variability in the relative usefulness of different methods.
Therefore, a key question remains whether futures prices are a worse or better benchmark than the random walk for a broad class of commodities and forecast horizons. To explore this topic, we gathered end-of-month series of forward curves for eight important commodities: WTI and Brent crude oils, natural gas, copper, gold, silver, wheat and maize (sourced from Refinitiv Eikon Datastream). The choice of these commodities was based on two criteria. First, our aim was to cover commodities most frequently analyzed in the literature and with reasonably high liquidity. Second, we wanted to cover a broad spectrum of commodity markets, i.e., energy, agriculture, industrial and precious metals. Next, for each month $t$ from the period December 1999-February 2021 and horizon $h$ ranging from one to twelve months ahead, we calculated forecasts for spot prices $S_{i,t+h|t}$ based on the quotation of futures contracts $F_{it,h}$ using the formula:
$$S_{i,t+h|t} = F_{it,h}.$$
Since futures for 7 out of 8 analyzed commodities are physically settled (Brent oil is an exception), for the sake of consistency, we took the 1-month futures price, i.e., the contract closest to expiration, as the spot price to be forecasted, with 2-13 month futures being the predictors.
The values of these forecasts are presented in Figure 1. Given that the forecast evaluation sample covers data from January 2000-March 2021, for forecasts at horizon $h$ we calculated 256 − h forecast errors:
$$e_{i,t+h|t} = \log S_{i,t+h} - \log S_{i,t+h|t}.$$
It can be noted that we decided to work with logarithms, so that errors are calculated as percentage deviations of realizations from forecasts. In the final step, we computed the Root Mean Squared Forecast Errors (RMSFE) for the futures-based forecasts and divided them by the values of RMSFE for the random walk benchmark, so that values below unity indicate that the futures-based forecast outperforms the RW benchmark. We also tested whether the ratio is significantly below 1 using the Coroneo and Iacone [26] version of the Diebold and Mariano test. The results in Table 1 show that futures-based forecasts are on average more accurate, in many instances significantly more, than those from the random walk benchmark. The biggest improvement in forecast accuracy is observed for gold and natural gas prices, whereas the random walk benchmark turns out to be difficult to beat for copper as well as wheat and maize prices. (We also calculated sequential RMSFE ratios to verify whether the performance of futures-based forecasts is sample-dependent, as indicated by Reeve and Vigfusson [27]. The unreported results, which are available upon request, showed that the results from Table 1 are hardly affected by the choice of the date on which the sample ends.) We also verified the directional accuracy of futures-based forecasts by computing the fractions of correctly predicted signs of change, the so-called success ratios. They are presented in Table 2, accompanied by Pesaran and Timmermann [28] tests. The results are broadly consistent with those from Table 1 and confirm the relatively good performance of futures-based forecasts. It can be noticed that all studies discussed so far, as well as the above results, explore the predictive content of futures prices within a linear setup.
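The evaluation steps described above (log forecast errors, the RMSFE ratio against the no-change forecast, and the success ratio) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the toy prices are invented, and the significance tests of [26] and [28] are deliberately left out.

```python
import math


def rmsfe(errors):
    """Root mean squared forecast error of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def evaluate_futures_benchmark(spot, futures, h):
    """Compare futures-based and random walk forecasts at horizon h.

    spot[t]    : spot price observed in month t
    futures[t] : price in month t of the contract maturing h months later
    Returns (RMSFE ratio futures/RW, success ratio of the futures forecast).
    """
    n = len(spot) - h  # forecast origins for which the realization is known
    fut_err, rw_err, hits = [], [], 0
    for t in range(n):
        realized = math.log(spot[t + h])
        fut_forecast = math.log(futures[t])  # futures-based forecast
        rw_forecast = math.log(spot[t])      # random walk (no-change) forecast
        fut_err.append(realized - fut_forecast)
        rw_err.append(realized - rw_forecast)
        # A hit means the futures curve predicted the sign of the price change.
        if (fut_forecast - rw_forecast) * (realized - rw_forecast) > 0:
            hits += 1
    return rmsfe(fut_err) / rmsfe(rw_err), hits / n


# Toy numbers invented purely for illustration (not data from the study).
spot = [100.0, 102.0, 104.0, 103.0, 105.0, 108.0]
futures_1m = [101.0, 103.0, 104.5, 104.0, 107.0]
ratio, success = evaluate_futures_benchmark(spot, futures_1m, h=1)
print(f"RMSFE ratio: {ratio:.3f}, success ratio: {success:.2f}")
```

A ratio below one means the futures-based forecast has the lower RMSFE. Whether the difference is statistically significant requires a separate test, which this sketch does not attempt.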
There are, however, numerous reasons to argue that the relationship between spot and futures prices might be non-linear. For instance, Tilton et al. [29] propose a theoretical framework in which an increase in futures prices should influence spot prices in strong contango, but not in backwardation. This claim is empirically supported by Gulley and Tilton [10] for copper prices and Fernandez [11] for a broader group of commodities. In the same vein, Almansour [12] demonstrates that for energy commodities, the dynamics of the basis are well described by a regime-dependent process, which allows contango and backwardation regimes to be identified. Additionally, Huang et al. [9] show, within the threshold vector autoregression framework, that the joint dynamics of spot and futures prices of crude oil are better described (and their in-sample predictability improved) by the regime-dependent model, with the basis as a threshold variable, than by its linear counterpart. Similarly, Lin and Liang [30] and Mamatzakis and Remoundos [31] show that the threshold vector error correction model, in which the error correction term is a threshold variable, describes well the dynamics of the crude oil market. This framework is developed further by Beckmann et al. [32], who consider a broader group of energy commodities and allow for a smooth transition between the two regimes. This study also shows that, in-sample, the predictive content of futures prices on different energy markets is regime-dependent. In the out-of-sample context, Rubaszek et al. [33] investigate the predictive power of futures for energy commodities within the threshold framework, which allows for different dynamics in contango and in backwardation. The authors show that the futures curves deliver relatively accurate forecasts for energy commodity prices. Moreover, they provide evidence that the relationship between spot and futures prices is regime-dependent, but that accounting for this property does not improve the quality of out-of-sample forecasts.
To summarize, the literature review as well as the results from Table 1 imply that predicting nominal commodity prices with futures prices does not deliver consistently better forecasts than those from the random walk model. However, the average out-of-sample accuracy gain from using futures prices instead of the naive forecast, albeit low, still exceeds ten percent for selected commodities and horizons. This suggests that the use of the random walk benchmark in nominal commodity price forecasting is justified to some extent, but that futures prices should be used as a second, supplementary benchmark.

Local Projection as a Benchmark for Real Commodity Forecasts
In this section, we discuss whether models accounting for mean-reversion of real commodity prices could and should be used as a benchmark in real commodity price forecasting. We start by describing the literature that documents the mean-reversion of real commodity prices. Then, we focus on studies exploring this feature in out-of-sample forecasting competitions. We complete this section by presenting a comparison of forecasts based on local projections to those from the random walk model and by discussing additional topics, such as structural models for commodity markets.
Theoretical models often imply persistence and stationarity of real commodity prices (Deaton and Laroque [34]). The reason behind stationarity is that, due to non-zero price elasticities of demand and supply over the long-term horizon, any deviation from the equilibrium price would lead to an imbalance between supply and demand for a physical commodity. As a result, the impact of shocks on real commodity prices, e.g., those related to financial speculation (Fattouh et al. [35], Dvir and Rogoff [36]), should be short-lived. The implications of these theoretical models are only partially supported by empirical studies. For the crude oil market, Dvir and Rogoff [37] show that mean-reversion can be observed over very long horizons and might depend on the oil market structure. Similarly, Kruse and Wegener [38] identify periods with explosive and mean-reverting oil price movements and argue for a non-linear adjustment of prices to equilibrium. As regards industrial and precious metals prices, Adewuyi et al. [39] show that they tend to be stationary, especially if unit root tests accounting for structural breaks are used. For agricultural products, Wang and Tomek [40] claim that the impact of shocks on prices is transitory. On the contrary, the conclusions of a more recent study by Ghoshray [41] are more in favor of agricultural price shocks being permanent in nature.
The stationarity implied by theoretical models and unit root tests would suggest that autoregressive models, addressing the gradual reversion of real commodity prices to equilibrium, could be exploited in forecasting, which is in fact shown by a few studies. Baumeister and Kilian [42,43] are probably the most well-known examples of this kind of analysis. The authors assess the quality of forecasts for real oil prices over the years 1991-2010 derived with a set of models, among them the vector autoregressive (VAR) model proposed by Kilian and Murphy [44]. They show that this setup performs well in comparison to the no-change forecast, especially at shorter horizons. This result was recently reassessed by Funk [45], who focused on real oil prices and a prolonged forecast evaluation period, covering the years 1991-2017. The performance of the Kilian-Murphy VAR was fine-tuned by Snudden [46], who applied a growth rate filter to the analyzed series and showed that this transformation boosts the accuracy of forecasts for crude oil over the period 1995-2015. On the contrary, less favorable results on the usefulness of VAR models in oil price forecasting are provided by Manescu and Van Robays [25]. They indicate that the VAR model of Kilian and Murphy [44] is not powerful enough to consistently deliver better forecasts than those from the random walk model, with its outstanding performance presented in Baumeister and Kilian [42] driven predominantly by an upward trend in oil prices observed in the 2002-2007 period. Degiannakis and Filis [47] also show that the performance of the Kilian and Murphy [44] VAR model for the years 2011-2015 is relatively poor at shorter horizons. In the discussion on crude oil forecasting, it is noteworthy that Rubaszek [48] shows that the oil market VAR model does not deliver competitive forecasts over the period 2008-2018.
As regards forecasting industrial metal prices, the results are even more mixed. For instance, Dooley and Lenihan [49] discuss how linear autoregressive models can be applied to forecast the prices of zinc and lead. In addition, Rubaszek et al. [50] show that the dynamics of real prices for the main industrial metals (aluminium, copper, nickel and zinc) can be forecasted with mean-reverting models, which deliver significantly better forecasts than the naive random walk. Moreover, the authors show that allowing for non-linearity by introducing a threshold structure does not lead to further improvement in the quality of forecasts. Finally, Kwas et al. [51] show that factor-augmented mean-reverting direct forecast models (also known as local projection models) can deliver competitive forecasts for the prices of the main cereals (barley, maize, rice and wheat) relative to the random walk, especially for longer horizons.
A key question arises whether mean-reverting models are a better or worse benchmark than the random walk for real prices of a broad class of commodities and various forecast horizons. To answer this question, we use a model that is based on the local projection (LP) regressions:
$$\Delta_h q_{it} = \alpha_{ih} + \beta_{ih} q_{i,t-h} + \epsilon_{it,h}, \qquad (1)$$
where $\Delta_h q_{it} = q_{it} - q_{i,t-h}$ is the (logarithmic) percentage change in the real price of commodity $i$ over the $h$-period horizon. We decided to use the LP framework due to its advantages enumerated by Jorda [52]. First, the model can be estimated by simple regression techniques. Second, LP regressions are more robust to misspecification than a single model used for all horizons. Third, the framework accommodates nonlinear and flexible specifications. Moreover, the LP framework turned out to be competitive with other autoregressive models, e.g., in the exchange rate forecasting literature (Ca' Zorzi and Rubaszek [53]).
To compare the LP-based forecasts to the RW benchmark, we gathered monthly averages of real prices for the eight commodities analyzed in the previous section for the period January 1980-March 2021 (data were sourced from the World Bank "Pink Sheet"; for WTI prices, the sample starts in January 1982). Next, for each month $t$ from the period December 1999-February 2021 and horizon $h$ ranging from one to sixty months ahead, we estimated the parameters of model (1) and calculated forecasts for log real prices $q_{i,t+h|t}$ using the formula:
$$q_{i,t+h|t} = q_{it} + \alpha_{ih|t} + \beta_{ih|t} q_{it},$$
where $\alpha_{ih|t}$ and $\beta_{ih|t}$ are parameter estimates on a sample ending in month $t$. The values of these forecasts are presented in Figure 2. Given that the forecast evaluation sample covers data from January 2000-March 2021, for forecasts at horizon $h$, we calculated 256 − h forecast errors:
$$e_{i,t+h|t} = q_{i,t+h} - q_{i,t+h|t}.$$
It should be noted that since real commodity prices are expressed as logarithms, forecast errors are interpreted as percentage deviations of realizations from forecasts. In the final step, we computed the RMSFE for the LP forecasts and rescaled it by the value of RMSFE for the RW benchmark, so that values below unity indicate that the LP-based forecast outperforms the RW benchmark. The results in Table 3 show that LP-based forecasts are on average more accurate, in many instances significantly, than those from the random walk benchmark. The biggest improvement in forecast accuracy is observed for silver, copper and wheat, whereas the random walk benchmark prevails for oil as well as gold prices.
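The recursive procedure described above can be sketched as follows. This is a minimal illustration, assuming the LP regression of the $h$-period log change on the lagged log level with an intercept, estimated by ordinary least squares on an expanding sample. The function names and the toy series are hypothetical and are not taken from the study.

```python
import math


def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def lp_forecast(q, t, h):
    """Forecast q[t + h] using only observations up to index t.

    Regresses q[s] - q[s - h] on q[s - h] for s = h..t, then applies
    the fitted intercept and slope to the latest observation q[t].
    """
    x = [q[s - h] for s in range(h, t + 1)]
    y = [q[s] - q[s - h] for s in range(h, t + 1)]
    alpha, beta = ols_line(x, y)
    return q[t] + alpha + beta * q[t]


def rmsfe_ratio(q, first_origin, h):
    """RMSFE of recursive LP forecasts relative to the no-change forecast."""
    lp_sq, rw_sq = 0.0, 0.0
    for t in range(first_origin, len(q) - h):
        lp_sq += (q[t + h] - lp_forecast(q, t, h)) ** 2
        rw_sq += (q[t + h] - q[t]) ** 2
    return math.sqrt(lp_sq / rw_sq)


# Toy log-price path that decays smoothly towards 2.0 (illustration only).
q = [2.0 + 0.9 ** t for t in range(40)]
print(rmsfe_ratio(q, first_origin=20, h=3))
```

On this noiseless mean-reverting path the LP forecast is essentially exact, so the ratio is close to zero. With real data the ratio depends on how strongly prices revert and on the estimation error in the two coefficients.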
We also tested the LP forecasts by computing the fractions of correctly predicted signs of change. The results in Table 4 show that a substantial reduction in RMSFE is consistent with significant directional accuracy. This reinforces the findings from Table 3 and confirms the strong predictive properties of LP models.
Table 4 notes: significance refers to the test of Pesaran and Timmermann [28], with asterisks ***, ** and * denoting the 1%, 5% and 10% significance levels. See also comments under Table 3.

An alternative to the LP method would be to use a simple autoregressive (AR) model:
$$q_{it} - \mu_i = \rho_i (q_{i,t-1} - \mu_i) + \epsilon_{it}, \qquad (2)$$
which assumes gradual reversion of real commodity prices (at a constant pace $\rho_i$) to the equilibrium level $\mu_i$. In this model, the forecast at horizon $h$ is equal to:
$$q_{i,t+h|t} = \mu_i + \rho_i^h (q_{it} - \mu_i). \qquad (3)$$
Equations (2) and (3) clearly demonstrate that the AR model is a restricted version of the LP method, in which $\beta_{ih} = \rho_i^h - 1$ and $\alpha_{ih} = (1 - \rho_i^h)\mu_i$. Here, we present the results for this model, as it is often used in studies on the forecasting of crude oil prices, e.g., by Benmoussa et al. [54] or Rubaszek [48].
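The restriction can be checked in one step. The sketch below assumes the AR model takes the standard first-order form with mean $\mu_i$, persistence $\rho_i$ and a zero-mean error $\epsilon_{it}$ (the error notation and the expectation operator $\mathrm{E}_t$ are assumptions made here for exposition):

```latex
\begin{aligned}
q_{i,t+1} - \mu_i &= \rho_i \,(q_{it} - \mu_i) + \epsilon_{i,t+1} \\
\mathrm{E}_t\, q_{i,t+h} &= \mu_i + \rho_i^{h} \,(q_{it} - \mu_i) \\
\mathrm{E}_t\, q_{i,t+h} - q_{it} &= \underbrace{(1 - \rho_i^{h})\,\mu_i}_{\alpha_{ih}} + \underbrace{(\rho_i^{h} - 1)}_{\beta_{ih}}\, q_{it}
\end{aligned}
```

As a purely illustrative calculation (the values are not estimates from this study), $\rho_i = 0.95$ and $h = 12$ give $\rho_i^{h} \approx 0.54$, hence $\beta_{ih} \approx -0.46$: about 46% of the current gap between the price and its equilibrium level is expected to close within a year. The LP method estimates $\alpha_{ih}$ and $\beta_{ih}$ freely at each horizon instead of tying them to a single $\rho_i$.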
In Table 5, we present the RMSFEs for forecasts from the AR model. They show that the AR model delivers better forecasts than the RW benchmark for all commodities except gold. A comparison of the AR and LP models (Tables 3 and 5) shows that both models are broadly comparable. The AR model seems to perform better for five out of eight analyzed commodities (crude oils, natural gas, copper and wheat), whereas the LP method is more suitable for the remaining three commodities (gold, silver and maize). Altogether, Tables 3 and 5 show comparable performances of these two approaches, thus supporting either of the two as a good candidate for a benchmark.
Let us point out that the above discussion focuses on the mean reversion of real commodity prices to an equilibrium level, which is the recursive sample mean. Arguably, that long-term equilibrium might be driven by some macroeconomic or financial determinants. A number of studies have thus explored which variables might be good predictors of commodity prices. For example, some of them focus on the relationship between commodity prices and exchange rates and present strong evidence on the predictive power of commodity currencies. The well-known study by Chen et al. [55] shows that the currencies of commodity exporters (commodity currencies) are helpful in predicting the future movements of many commodities. This result is confirmed later by Gargano and Timmermann [56] for the oil market, Ciner [57] for precious metals and the South African rand, and Pincheira-Brown and Hardy [58] for industrial metals and the Chilean peso. The main explanation of these findings is that selected exchange rates discount future movements in commodity prices. Other studies show that a common factor in commodity prices (CF) can be used to forecast individual commodities. West and Wong [59] provide evidence that CF is a good predictor for energy commodities and a poor predictor for industrial metals, whereas Kwas et al. [51] exploit the predictive content of CF in forecasting agricultural product prices.
Finally, it can be noticed that the mean-reversion of real commodity prices would imply that structural theoretical models should deliver competitive forecasts. In fact, there are two studies that use a Dynamic Stochastic General Equilibrium (DSGE) model to forecast real oil prices. First, Manescu and Van Robays [25] apply a calibrated model of Nakov and Nuno [60] and find that its out-of-sample performance is relatively good. Second, Rubaszek [48] evaluates the recursively estimated DSGE model of Nakov and Pescatori [61] and shows that it delivers competitive forecasts for real oil prices. The author explains that this good performance is due to the fact that the DSGE model assumes that real oil prices are mean-reverting over long horizons and predicts conservative dynamics towards equilibrium, in the sense that it does not attempt to explain a large fraction of oil price volatility out of sample.
The above literature review as well as the results from Table 3 imply that predicting real commodity prices with LP models tends to deliver better forecasts than those from the random walk model, especially for longer horizons. The gain in out-of-sample forecast accuracy from using the LP model instead of the naive forecast may be as large as 20%. This suggests that the use of the RW benchmark in real commodity price forecasting is only mildly justified, and that the LP method should be used as a supplementary or even as an alternative benchmark.

Conclusions
The random walk no-change forecast is the most popular benchmark against which other methods used for forecasting commodity prices are evaluated. In this paper, we investigate whether the random walk is indeed a proper benchmark or whether some alternative models could be used instead. Based on a thorough literature review, we identify two candidates. First, for nominal commodity prices, some studies imply that forecasts based on futures prices could give a competitive advantage over the random walk. Second, based on the literature evidence on mean-reversion of real commodity prices, we conjecture that this feature could be exploited out-of-sample within the local projection framework.
To verify the quality of these alternative benchmarks, we have conducted an out-of-sample forecasting contest using data for a broad spectrum of commodities (WTI and Brent crude oils, natural gas, copper, gold, silver, wheat and maize), which are most often analyzed in the forecasting literature, and over a long time span covering the period 2000-2021. Our results demonstrate that the two alternative benchmarks deliver forecasts of comparable and in many instances superior quality. Our conclusions are twofold. First, futures-based forecasts should supplement the random walk as a benchmark in forecasting nominal commodity prices. Second, we recommend using a local projection forecast as an additional or even as an alternative benchmark in forecasting real commodity prices.