# Forecasting Principles from Experience with Forecasting Competitions


## Abstract


## 1. Introduction

- (I) dampen trends/growth rates;
- (II) average across forecasts from ‘non-poisonous’ methods;
- (III) include forecasts from robust devices in that average;
- (IV) select variables in forecasting models at a loose significance;
- (V) update estimates as data arrive, especially after forecast failure;
- (VI) ‘shrink’ estimates of autoregressive parameters in small samples;
- (VII) adapt choice of predictors to data frequency;
- (VIII) address ‘special features’ like seasonality.

## 2. M4 Competition

#### 2.1. Overview

#### 2.2. M4 Benchmark Forecasting Methods

| Method | Description | Parameter settings |
|---|---|---|
| SES | Simple exponential smoothing | $\delta = b_0 = 0$ |
| HES | Holt’s exponential smoothing | |
| Theta2 | Theta(2) method | $\delta = 0$, $b_0 = \widehat{\tau}/2$ as defined in (1) |
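As a minimal sketch of the simplest benchmark in the table, simple exponential smoothing (SES) updates a single level and issues a flat forecast. This is an illustrative bare-bones version: the smoothing weight `alpha` is fixed here, whereas the M4 benchmark implementations optimize it.

```python
def ses_forecast(y, alpha=0.3, horizon=6):
    """Simple exponential smoothing: the level is an exponentially
    weighted average of past observations; the forecast function is flat."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon  # SES forecasts are constant at the final level
```

Theta(2) can then be viewed, following the "unmasking" result cited above, as SES plus a drift of half the fitted trend slope, which is why it appears in the table with $b_0 = \widehat{\tau}/2$.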

#### 2.3. Seasonality in the M4 Benchmark Methods

#### 2.4. M4 Forecast Evaluation

## 3. M4 Data

#### 3.1. Properties of Interest

#### 3.2. Sample Size

#### 3.3. Logarithms

#### 3.4. Persistence

#### 3.5. Seasonality

## 4. Revisiting the M4 Benchmark Methods

#### 4.1. Expected Performance of Naive Forecasts

#### 4.2. A Simplified Theta Method: THIMA and THIMA.log

1. Starting from ${y}_{t},\phantom{\rule{4pt}{0ex}}t=1,\dots ,T$, the first differences $\mathsf{\Delta}{y}_{t},\phantom{\rule{4pt}{0ex}}t=2,\dots ,T$ have mean $\tilde{\tau}$. Construct ${x}_{t}=\mathsf{\Delta}{y}_{t}-{\scriptstyle \frac{1}{2}}\tilde{\tau}$.
2. Estimate the following MA(1) model by nonlinear least squares (NLS) with $\widehat{\theta}\in [-0.95,0.95]$: $${x}_{t}={\epsilon}_{t}+\theta {\epsilon}_{t-1}.$$
3. The forecasts are: $${\widehat{y}}_{T+H}={y}_{T}+{\scriptstyle \frac{1}{2}}\tilde{\tau}H+\widehat{\theta}{\widehat{\epsilon}}_{T}.$$

- if (7) using ${c}_{l}=1.3$ suggests logarithms, take logs;
- forecast using THIMA;
- exponentiate the forecasts if logs were used, then add seasonality if the variable was deseasonalized.
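The THIMA steps above can be sketched as follows. This is a minimal illustration: a grid search over $\theta$ stands in for the NLS estimation, and the log and seasonality wrappers of THIMA.log are omitted.

```python
import numpy as np

def thima_forecast(y, horizon):
    """Sketch of THIMA: halve the mean growth rate, fit an MA(1) to the
    adjusted differences, and reintegrate from the last observation."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    tau = dy.mean()            # mean growth rate, tau-tilde
    x = dy - 0.5 * tau         # dampen the trend by half

    def residuals(theta):
        # recursive MA(1) residuals with eps_0 = 0
        eps = np.zeros(len(x))
        for t in range(len(x)):
            prev = eps[t - 1] if t > 0 else 0.0
            eps[t] = x[t] - theta * prev
        return eps

    # grid search over [-0.95, 0.95] as a stand-in for NLS
    grid = np.linspace(-0.95, 0.95, 191)
    theta = min(grid, key=lambda th: np.sum(residuals(th) ** 2))
    eps_T = residuals(theta)[-1]
    # y-hat_{T+H} = y_T + (tau/2) * H + theta-hat * eps-hat_T
    return [y[-1] + 0.5 * tau * h + theta * eps_T for h in range(1, horizon + 1)]
```

Note that consecutive forecasts differ by exactly half the mean growth, which is the trend-dampening at the heart of the method.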

## 5. Heterogeneity and Independence

#### 5.1. Unexpected Heterogeneity

#### 5.2. An M4-Like Data Generation Process

#### 5.3. The Role of Sample Dates

## 6. The Cardt Method

#### 6.1. The Original Card Method

- Let ${y}_{t},t=1,\dots ,T$ denote the initial series. If $\min({y}_{1},\dots ,{y}_{T})>1$, set ${x}_{t}=\log\left({y}_{t}\right)$; otherwise ${x}_{t}={y}_{t}$. This entails that logs were always used for the M3 and M4 data.
- If $\mathrm{var}\left[\mathsf{\Delta}{x}_{t}\right]\le 1.2\phantom{\rule{4pt}{0ex}}\mathrm{var}\left[{x}_{t}\right]$, forecast from a dynamic model; otherwise forecast the levels directly using a static model. The static model occurs in M4 at a rate of only $1.5\%$ (yearly), $4\%$ (quarterly), and $6\%$ (monthly), and almost never at the other data frequencies.
- The presence of seasonality is tested at $10\%$ using the ANOVA test (8) on $\mathsf{\Delta}{x}_{t}$ or ${x}_{t}$ (depending on the previous step).
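The first two decisions above can be sketched as follows. This is a minimal illustration under the stated rules; the seasonality ANOVA test (8) is omitted, as its exact form is defined in the paper.

```python
import math
import statistics

def card_preliminaries(y):
    """Sketch of Card's first two preliminary decisions.
    Returns (use_logs, use_dynamic_model)."""
    # take logs only when the series is safely above 1 everywhere
    use_logs = min(y) > 1
    x = [math.log(v) for v in y] if use_logs else list(y)
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    # dynamic model if the variance of the differences is small
    # relative to the variance of the (possibly logged) levels
    use_dynamic = statistics.variance(dx) <= 1.2 * statistics.variance(x)
    return use_logs, use_dynamic
```

For a log-linear series the differences of the logs are nearly constant, so the dynamic branch is chosen; a series oscillating wildly around a level would fall through to the static branch.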

#### 6.2. Robust Adjustments to Card

#### 6.2.1. Robust 1-Step Forecasts of AR(1) Model

#### 6.2.2. One-Step Ahead Robust Adjustment for Rho

#### 6.2.3. One and Two-Step Robust Adjustment after Calibration

#### 6.3. Cardt: More Averaging by Adding THIMA

#### 6.4. Forecast Intervals

## 7. Evaluation

#### 7.1. Averaging and Calibration

#### 7.2. Comparison of Cardt and Card

#### 7.3. Interval Forecasts

#### 7.4. Overall Performance

## 8. Cardt and COVID-19

## 9. Conclusions

- (I) dampen trends/growth rates. This certainly holds for our methods and for Theta-like methods: both Delta and Rho explicitly squash the growth rates, Theta(2) halves the trend, and the THIMA method that we introduced halves the mean of the differences, which has the same effect.
- (II) average across forecasts from ‘non-poisonous’ methods. This principle, which goes back to [16], is strongly supported by our results, as well as by the successful methods in M4. There may be some scope for clever weighting schemes in the combination, as used in some M4 submissions that did well, and a judicious few methods may be better than very many. A small amount of averaging also helped with forecast intervals, although the intervals from annual data in levels turned out to be ‘poisonous’.
- (III) include forecasts from robust devices in that average. We showed that short-horizon forecasts from Rho could be improved by overdifferencing when using levels. The differenced method already has some robustness because it reintegrates from the last observation, although this adjustment can be somewhat too large. The IMA model of the THIMA method effectively estimates an intercept correction, so it has this robustness property (as does Theta(2), which estimates it by exponential smoothing).
- (IV) select variables in forecasting models at a loose significance. Some experimentation showed that the seasonality decisions work best at $10\%$, in line with this principle. Subsequent pruning of seasonal dummies in the calibration model does not seem to do much, probably because we already conditioned on the presence of seasonality. However, for forecast uncertainty, a stricter selection helps to avoid underestimating the residual variance; Ref. [17] find support for this in a theoretical analysis.
- (V) update estimates as data arrive, especially after forecast failure. This aspect was only covered here by restricting estimation samples to, say, 40 years for annual data, given the many large shifts that occurred in earlier data. Recursive and moving-window forecasts are quite widely used in practical forecasting.
- (VI) ‘shrink’ estimates of autoregressive parameters in small samples. As the forecast error variance can only be estimated from out-of-sample extrapolation, it is essential to avoid explosive behaviour, so we constrain all $\widehat{\rho}\le 1$.
- (VII) adapt the choice of predictors to data frequency. For example, method 118 by [50] had the best performance for yearly and monthly forecasting, but Card was best at forecasting the hourly data.
- (VIII) address ‘special features’ like seasonality. Appropriate handling of seasonality was important, as described in Section 2.3, and even turned out to be an important feature of forecasting COVID-19 cases and deaths, as in [49].

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Accuracy Measures under Naive Forecasts

#### Appendix A.1. Stationary Case

**Table A1.** Simulated and approximated mean and bias of MASE and sMAPE for one-step ahead naive forecasts. DGP N[$\mu,1$], $T=15$, $M=100{,}000$ replications.

| | $\mu=0.5$ | $\mu=1$ | $\mu=2$ | $\mu=5$ | $\mu=10$ |
|---|---|---|---|---|---|
| **Simulated** | | | | | |
| E[MASE] | 1.136 | 1.136 | 1.136 | 1.136 | 1.136 |
| E[sMAPE] | 133.9 | 109.1 | 63.2 | 23.0 | 11.3 |
| Bias[MASE] | −0.006 | −0.006 | −0.006 | −0.006 | −0.006 |
| Bias[sMAPE] | 1.100 | 0.923 | 0.573 | 0.198 | 0.093 |
| Bias[MAPE] | −0.279 | −0.932 | −1.640 | −0.222 | −0.108 |
| Bias[MAAPE] | 0.064 | 0.038 | −0.109 | −0.159 | −0.100 |
| **Approximated** | | | | | |
| E[MASE] | 1 | 1 | 1 | 1 | 1 |
| E[sMAPE] | 225.7 | 112.8 | 56.4 | 22.6 | 11.3 |
| Bias[MASE] | 0 | 0 | 0 | 0 | 0 |
| Bias[sMAPE] | 2 | 1 | 0.5 | 0.2 | 0.1 |

#### Appendix A.2. Nonstationary Case

#### Appendix A.3. Nonstationary Case in Levels

**Table A2.** Simulated means and standard deviations of MASE, sMAPE, MAPE, and MAAPE for one-step ahead naive forecasts. $T=15$, $M=100{,}000$ replications.

| | ${y}_{t}\sim$ N[$\mu,\sigma^2$]: Mean | Sdev | $\mathsf{\Delta}{y}_{t}\sim$ N[$\mu,\sigma^2$]: Mean | Sdev | $\mathsf{\Delta}\log {y}_{t}\sim$ N[$\mu,\sigma^2$]: Mean | Sdev |
|---|---|---|---|---|---|---|
| **$\mu=0$, $\sigma=1$** | | | | | | |
| MASE | 1.14 | 0.92 | 1.04 | 0.84 | 2.9 | 9.8 |
| sMAPE | 144.1 | 68.5 | 51.0 | 58.9 | 69.9 | 45.4 |
| MAPE | 6629.4 | $18\times {10}^{5}$ | 809.6 | $1.6\times {10}^{5}$ | 112.4 | 195.1 |
| MAAPE | 90.8 | 40.1 | 40.3 | 38.8 | 58.7 | 37.9 |
| **$\mu=0.025$, $\sigma=0.1$** | | | | | | |
| MASE | 1.14 | 0.92 | 1.05 | 0.85 | 1.4 | 1.2 |
| sMAPE | 141.3 | 69.3 | 37.7 | 50.0 | 8.3 | 6.3 |
| MAPE | 654.0 | 10,871 | 365.7 | 15,478 | 8.5 | 6.8 |
| MAAPE | 89.5 | 40.6 | 31.7 | 34.8 | 8.4 | 6.7 |
| **$\mu=0.1$, $\sigma=1$** | | | | | | |
| MASE | 1.14 | 0.92 | 1.04 | 0.84 | 4.3 | 13.2 |
| sMAPE | 143.8 | 68.6 | 48.5 | 57.6 | 70.1 | 45.5 |
| MAPE | 797.4 | 23,462 | 188.7 | 6304.6 | 124.6 | 219.1 |
| MAAPE | 90.6 | 40.2 | 38.6 | 38.2 | 60.8 | 39.6 |
| **$\mu=1$, $\sigma=1$** | | | | | | |
| MASE | 1.14 | 0.92 | 1.04 | 0.75 | 41.8 | 71.6 |
| sMAPE | 109.1 | 71.0 | 8.60 | 7.02 | 94.0 | 51.6 |
| MAPE | 648.1 | 27,743 | 9.31 | 21.5 | 358.8 | 579.4 |
| MAAPE | 75.3 | 43.5 | 9.07 | 7.52 | 91.9 | 46.8 |
| **$\mu=10$, $\sigma=1$** | | | | | | |
| MASE | 1.14 | 0.92 | 1.00 | 0.104 | $5.1\times {10}^{5}$ | $6.6\times {10}^{5}$ |
| sMAPE | 11.3 | 8.63 | 6.90 | 0.69 | 200.0 | 0.039 |
| MAPE | 11.5 | 9.05 | 7.15 | 0.74 | $36\times {10}^{5}$ | $47\times {10}^{5}$ |
| MAAPE | 11.3 | 8.71 | 7.14 | 0.74 | 157.1 | 0.010 |

## Appendix B. Forecast Intervals

- remove the broken intercept and trend (if present, so setting ${I}_{6}=0$);
- remove deterministic variables that are insignificant at $2\%$; the intercept is kept;
- remove ${z}_{t-R-1}$ if present;
- add the absolute residuals from (A5) as a regressor;
- estimate the reformulated calibration model;
- if $\widehat{\rho}>0.999$ then impose the unit root, and re-estimate;
- if $\widehat{\rho}<0$ then set $\rho =0$, and re-estimate.

## Appendix C. Comparison with R Code

| Theta2 | Ox: Time (s) | Ox: sMAPE | R: Data (s) | R: Forecast (s) | R: Total (s) | R: sMAPE |
|---|---|---|---|---|---|---|
| Yearly | 3.67 | 0.876 | 4.61 | 375 | 380 | 0.880 |
| Quarterly | 4.83 | 0.949 | 4.31 | 627 | 631 | 0.950 |
| Monthly | 10.49 | 1.017 | 4.22 | 1499 | 1503 | 1.016 |
| Weekly | 0.45 | 0.838 | 3.95 | 33 | 37 | 0.886 |
| Daily | 6.96 | 1.007 | 4.05 | 1019 | 1023 | 1.008 |
| Hourly | 0.27 | 0.991 | 3.89 | 28 | 32 | 0.991 |

## References

1. Makridakis, S.; Hibon, M. Accuracy of forecasting: An empirical investigation. J. R. Stat. Soc. A **1979**, 142, 97–145.
2. Makridakis, S.; Andersen, A.; Carbone, R.; Fildes, R.; Hibon, M.; Lewandowski, R.; Newton, J.; Parzen, E.; Winkler, R. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. J. Forecast. **1982**, 1, 111–153.
3. Makridakis, S.; Chatfield, C.; Hibon, M.; Lawrence, M.; Mills, T.; Ord, K.; Simmons, L.F. The M2-competition: A real-time judgmentally based forecasting study. Int. J. Forecast. **1993**, 9, 5–22.
4. Makridakis, S.; Hibon, M. The M3-competition: Results, conclusions and implications. Int. J. Forecast. **2000**, 16, 451–476.
5. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M4 competition: 100,000 time series and 61 forecasting methods. Int. J. Forecast. **2020**, 36, 54–74.
6. Hyndman, R.J. A brief history of forecasting competitions. Int. J. Forecast. **2020**, 36, 7–14.
7. Fildes, R.; Ord, K. Forecasting competitions—Their role in improving forecasting practice and research. In A Companion to Economic Forecasting; Clements, M.P., Hendry, D.F., Eds.; Blackwells: Oxford, UK, 2002; pp. 322–353.
8. Clements, M.P.; Hendry, D.F. Explaining the results of the M3 forecasting competition. Int. J. Forecast. **2001**, 17, 550–554.
9. Fildes, R.A.; Makridakis, S. The impact of empirical accuracy studies on time series analysis and forecasting. Int. Stat. Rev. **1995**, 63, 289–308.
10. Önkal-Atay, D.; Wilkie-Thomson, M.E.; Pollock, A.C. Judgemental forecasting. In A Companion to Economic Forecasting; Clements, M.P., Hendry, D.F., Eds.; Blackwells: Oxford, UK, 2002; pp. 133–151.
11. Britton, E.; Fisher, P.; Whitley, J. Inflation Report projections: Understanding the fan chart. Bank Engl. Q. Bull. **1998**, 38, 30–37.
12. Doornik, J.A.; Castle, J.L.; Hendry, D.F. Card forecasts for M4. Int. J. Forecast. **2020**, 36, 129–134.
13. Gardner, E.S.; McKenzie, E. Forecasting trends in time series. Manag. Sci. **1985**, 31, 1237–1246.
14. Clements, M.P.; Hendry, D.F. Forecasting Economic Time Series; Cambridge University Press: Cambridge, UK, 1998.
15. Hendry, D.F.; Mizon, G.E. Unpredictability in economic analysis, econometric modeling and forecasting. J. Econom. **2014**, 182, 186–195.
16. Bates, J.M.; Granger, C.W.J. The combination of forecasts. Oper. Res. Q. **1969**, 20, 451–468.
17. Castle, J.L.; Doornik, J.A.; Hendry, D.F. Selecting a model for forecasting. Econometrics **2021**, in press.
18. Hendry, D.F.; Mizon, G.E. Open-model forecast-error taxonomies. In Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis; Chen, X., Swanson, N.R., Eds.; Springer: New York, NY, USA, 2012; pp. 219–240.
19. Castle, J.L.; Fawcett, N.W.P.; Hendry, D.F. Forecasting with equilibrium-correction models during structural breaks. J. Econom. **2010**, 158, 25–36.
20. Clements, M.P.; Hendry, D.F. On the limitations of comparing mean squared forecast errors (with discussion). J. Forecast. **1993**, 12, 617–637.
21. Findley, D.F.; Monsell, B.C.; Bell, W.R.; Otto, W.R.; Chen, B.-C. New capabilities and methods of the X-12-ARIMA seasonal-adjustment program (with discussion). J. Bus. Econ. Stat. **1998**, 16, 127–177.
22. Ord, K. Commentaries on the M3-Competition. An introduction, some comments and a scorecard. Int. J. Forecast. **2001**, 17, 537–584.
23. Koning, A.J.; Franses, P.H.; Hibon, M.; Stekler, H.O. The M3-Competition: Statistical tests of the results. Int. J. Forecast. **2005**, 21, 397–409.
24. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE **2018**, 13, 176–179.
25. Hyndman, R.J.; Koehler, A.B.; Ord, J.K.; Snyder, R.D. Forecasting with Exponential Smoothing; Springer: New York, NY, USA, 2008.
26. Hyndman, R.J.; Koehler, A.B.; Snyder, R.D.; Grose, S. A state space framework for automatic forecasting using exponential smoothing. Int. J. Forecast. **2002**, 18, 439–454.
27. Winters, P.R. Forecasting sales by exponentially weighted moving averages. Manag. Sci. **1960**, 6, 324–342.
28. Holt, C.C. Forecasting seasonals and trends by exponentially weighted moving averages. In ONR Research Memorandum 52; Carnegie Institute of Technology: Pittsburgh, PA, USA, 1957.
29. Assimakopoulos, V.; Nikolopoulos, K. The theta model: A decomposition approach to forecasting. Int. J. Forecast. **2000**, 16, 521–530.
30. Hyndman, R.J.; Billah, B. Unmasking the theta method. Int. J. Forecast. **2003**, 19, 287–290.
31. Hyndman, R.J.; O’Hara-Wild, M.; Bergmeir, C.; Razbash, S.; Wang, E. R Package ‘Forecast’, Version 8.2; Technical Report, 2017. Available online: https://cran.r-project.org/web/packages/forecast/index.html (accessed on 25 January 2021).
32. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. **2006**, 22, 679–688.
33. Makridakis, S. Accuracy measures: Theoretical and practical concerns. Int. J. Forecast. **1993**, 9, 527–529.
34. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. **2007**, 102, 359–378.
35. Kang, Y.; Hyndman, R.J.; Smith-Miles, K. Visualising forecasting algorithm performance using time series instance spaces. Int. J. Forecast. **2017**, 33, 345–358.
36. Spiliotis, E.; Kouloumos, A.; Assimakopoulos, V.; Makridakis, S. Are forecasting competitions data representative of the reality? Int. J. Forecast. **2020**, 36, 37–53.
37. Bergmeir, C.; Hyndman, R.J.; Benítez, J.M. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. Int. J. Forecast. **2016**, 32, 303–312.
38. Box, G.E.P.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. B **1964**, 26, 211–243; discussion 244–252.
39. Legaki, N.Z.; Koutsouri, K. Submission 260 to the M4 Competition; GitHub; National Technical University of Athens: Athens, Greece, 2018.
40. Ermini, L.; Hendry, D.F. Log income versus linear income: An application of the encompassing principle. Oxf. Bull. Econ. Stat. **2008**, 70, 807–827.
41. Ladiray, D.; Quenneville, B. Seasonal Adjustment with the X-11 Method; Springer: Berlin, Germany, 2001.
42. Goodwin, P.B.; Lawton, R. On the asymmetry of the symmetric MAPE. Int. J. Forecast. **1999**, 15, 405–408.
43. Koehler, A.B. The asymmetry of the sAPE measure and other comments on the M3-competition. Int. J. Forecast. **2001**, 17, 570–574.
44. Hendry, D.F.; Clements, M.P. On a theory of intercept corrections in macro-economic forecasting. In Money, Inflation and Employment: Essays in Honour of James Ball; Holly, S., Ed.; Edward Elgar: Aldershot, Hants, UK, 1994; pp. 160–182.
45. Engler, E.; Nielsen, B. The empirical process of autoregressive residuals. Econom. J. **2009**, 12, 367–381.
46. Doornik, J.A. Object-Oriented Matrix Programming Using Ox, 8th ed.; Timberlake Consultants Press: London, UK, 2018.
47. Castle, J.L.; Doornik, J.A.; Hendry, D.F. The value of robust statistical forecasts in the Covid-19 pandemic. Natl. Inst. Econ. Rev. **2021**.
48. Doornik, J.A.; Castle, J.L.; Hendry, D.F. Short-term forecasting of the coronavirus pandemic. Int. J. Forecast. **2020**.
49. Doornik, J.A.; Castle, J.L.; Hendry, D.F. Statistical short-term forecasting of the COVID-19 pandemic. J. Clin. Immunol. Immunother. **2020**, 6.
50. Smyl, S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. Int. J. Forecast. **2020**, 36, 75–85.
51. Kim, S.; Kim, H. A new metric of absolute percentage error for intermittent demand forecasts. Int. J. Forecast. **2016**, 32, 669–679.

**Figure 1.**Distribution of sample size after truncation. Truncated at 40 years, except daily at 1500 observations.

**Figure 4.**Tests of seasonality for quarterly, monthly, weekly and daily (with $S=5$) M4. First row for ${y}_{t}$, second row for $\mathsf{\Delta}log{y}_{t}$, third row seasonal ANOVA test for $\mathsf{\Delta}log{y}_{t}$.

**Figure 5.**Average one-step ahead forecast accuracy for yearly M3 and M4, withholding from 12 to 1 observations at the end from the full datasets. Normalized by the naive results.

**Figure 6.**QQ plots of annual and quarterly residuals against Normal and closely matching Student-t distribution, t(4) for yearly and t(8) for quarterly data.

**Figure 8.**Average one-step ahead forecast accuracy for DGP and M4, withholding from 12 to 1 observations at the end from the full datasets.

**Figure 9.** H-step sMAPE relative to that of Naive2 for all non-seasonal data (simulated, M4, M3, daily M4), retaining from $2H$ to $H$ observations for evaluation. Forecast methods are Delta, Rho, $(\mathit{Delta}+\mathit{Rho})/2$, and Card.

**Figure 10.** H-step performance relative to that of Naive2 for all seasonal data (M4Q, M4M, M4W, M4H for quarterly, monthly, weekly, hourly), retaining from $2H$ to $H$ observations for evaluation. Forecast methods are Delta, Rho, $(\mathit{Delta}+\mathit{Rho})/2$, and Card.

**Figure 11.**H-step forecast accuracy relative to that of Naive2. Forecast methods are Card, Cardt, THIMA.log.

**Figure 12.** Average rejection of $95\%$ and $90\%$ H-step forecast intervals for all frequencies of M4, retaining from $2H$ to $H$ observations for evaluation.

| Dimension | # Series | % | m | ${T}_{\min}$ | ${T}_{\max}$ | H |
|---|---|---|---|---|---|---|
| Yearly | 23,000 | $23.0\%$ | 1 | 13 | 835 | 6 |
| Quarterly | 24,000 | $24.0\%$ | 4 | 16 | 866 | 8 |
| Monthly | 48,000 | $48.0\%$ | 12 | 42 | 2794 | 18 |
| Weekly | 359 | $0.4\%$ | 1 | 80 | 2597 | 13 |
| Daily | 4227 | $4.2\%$ | 1 | 93 | 9919 | 14 |
| Hourly | 414 | $0.4\%$ | 24 | 700 | 960 | 48 |

**Table 2.** Approximate expectations of sMAPE and MASE under different data generation processes, $H=1$. $\Phi$ is the standard normal cdf, $\varphi$ the density, and ${m}_{3}=\exp(\mu +{\sigma}^{2}/2)$.

DGP | sMAPE | MASE |
---|---|---|

${y}_{t}\sim \mathrm{IN}[\mu ,{\sigma}^{2}]$ | $113\frac{\sigma}{\left|\mu \right|}$ | 1 |

$\mathsf{\Delta}{y}_{t}\sim \mathrm{IN}[\mu ,{\sigma}^{2}]$ | $\frac{200}{2T+1}\left[2\frac{\sigma}{\mu}\varphi \left(\frac{-\mu}{\sigma}\right)+1-2\Phi \left(\frac{-\mu}{\sigma}\right)\right]$ | 1 |

$\mathsf{\Delta}log{y}_{t}\sim \mathrm{IN}[\mu ,{\sigma}^{2}]$ | $200\frac{{m}_{3}-1}{{m}_{3}+1}$ | $\frac{{m}_{3}^{T}}{\frac{1}{T}{\sum}_{t=1}^{T}{m}_{3}^{t-1}}$ |

**Table 3.**Average MASE and sMAPE of 1-step Naive forecasts, forecasting the last observation of the training sample.

| Yearly (H = 1) | M3: sMAPE | M3: MASE | M4: sMAPE | M4: MASE |
|---|---|---|---|---|
| Naive2 | 9.585 | 1.416 | 8.390 | 1.688 |

**Table 4.** M3 performance of MASE and sMAPE for Theta(2) and revised benchmark methods. Lowest in **bold**.

| M3 | Yearly (H = 6): sMAPE | MASE | Quarterly (H = 8): sMAPE | MASE | Monthly (H = 18): sMAPE | MASE |
|---|---|---|---|---|---|---|
| **Full sample, holdback H** | | | | | | |
| Naive2 | 17.88 | 3.17 | 10.03 | 1.25 | 16.77 | 1.04 |
| Theta(2) | 16.72 | 2.77 | 9.24 | 1.12 | 13.91 | 0.87 |
| Theta.log | 16.00 | 2.68 | 9.15 | 1.11 | 13.57 | 0.85 |
| THIMA.log | 16.10 | 2.68 | 9.19 | 1.11 | 13.75 | 0.86 |
| **With last observation removed, holdback H** | | | | | | |
| Naive2 | 18.57 | 3.31 | 9.54 | 1.22 | 16.11 | 1.01 |
| Theta(2) | 17.07 | 2.87 | 9.26 | 1.13 | 13.61 | 0.84 |
| Theta.log | 15.91 | 2.64 | 9.26 | 1.13 | 13.22 | 0.82 |
| THIMA.log | 15.61 | 2.57 | 9.07 | 1.10 | 13.22 | 0.82 |

| | $1\cdots {T}_{i}-H$ | ${T}_{i}-H+1\cdots {T}_{i}$ | ${T}_{i}+1\cdots {T}_{i}+H$ |
|---|---|---|---|
| development | training | test forecasts | unavailable |
| competition | competitor forecasts from this | | M4 team tests |
| first 1-step forecast | estimation | T | unused |
| second 1-step forecast | estimation | T | unused |
| last 1-step forecast | estimation | T | |

| End Year | 1991 | 2001 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|---|---|---|---|---|
| Part of sample | 15.2% | 1.6% | 3.9% | 4.3% | 1.1% | 1.1% | 5.2% | 21.7% | 26.2% | 14.0% |

**Table 7.** Summary performance in the M4 competition. Absolute coverage difference (ACD) is for a $95\%$ forecast interval, except for submitted Card, which used $90\%$. OWA is the overall weighted average of sMAPE and MASE, with weights determined by the relative number of series for each frequency. * denotes ACD evaluated at $90\%$.

| M4 | sMAPE: Y | Q | M | W | D | H | MASE: Y | Q | M | W | D | H | sMAPE | MASE | OWA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new Cardt | 13.51 | 9.91 | 12.67 | 6.75 | 3.01 | 8.92 | 3.10 | 1.15 | 0.93 | 2.33 | 3.21 | 0.81 | 11.757 | 1.582 | 0.849 |
| submitted Card | 13.91 | 10.00 | 12.78 | 6.73 | 3.05 | 8.91 | 3.26 | 1.16 | 0.93 | 2.30 | 3.28 | 0.80 | 11.924 | 1.627 | 0.865 |
| new THIMA.log | 13.51 | 10.02 | 13.21 | 7.90 | 3.03 | 18.41 | 3.05 | 1.17 | 0.97 | 2.54 | 3.24 | 2.50 | 12.090 | 1.601 | 0.864 |

| M4 | MSIS: Y | Q | M | W | D | H | ACD: Y | Q | M | W | D | H | MSIS | ACD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new Cardt | 25.72 | 8.92 | 8.23 | 16.01 | 27.01 | 5.84 | 0.002 | 0.000 | 0.003 | 0.007 | 0.005 | 0.013 | 13.23 | 0.002 |
| submitted Card * | 30.20 | 9.85 | 9.49 | 16.47 | 29.13 | 6.14 | 0.013 | 0.021 | 0.004 | 0.003 | 0.009 | 0.048 | 15.18 | 0.007 |



## Share and Cite

**MDPI and ACS Style**

Castle, J.L.; Doornik, J.A.; Hendry, D.F. Forecasting Principles from Experience with Forecasting Competitions. *Forecasting* **2021**, *3*, 138-165.
https://doi.org/10.3390/forecast3010010
