# On the Disagreement of Forecasting Model Selection Criteria

## Abstract

## 1. Introduction

## 2. Model Selection Criteria

#### 2.1. Criteria Based on In-Sample Accuracy Measurements

#### 2.2. Information Criteria

`auto.arima` and `ets` functions of the forecast package for R, which allow the automatic selection of ARIMA and exponential smoothing models, respectively [27].
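As an illustration of how information criteria can disagree, the following minimal sketch computes AIC, AICc, and BIC from a maximized log-likelihood using their standard textbook definitions. The function names, the candidate log-likelihoods, and the parameter counts are hypothetical values chosen for illustration; they are not taken from the study or from the forecast package.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for a fitted model.

    log_likelihood: maximized log-likelihood of the model
    k: number of estimated parameters
    n: number of in-sample observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    # AICc adds a small-sample correction to AIC.
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * log_likelihood + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def select_model(candidates, n, criterion="AICc"):
    """Pick the candidate with the lowest value of the chosen criterion.

    candidates maps a model name to a (log_likelihood, k) pair.
    """
    scores = {
        name: information_criteria(ll, k, n)[criterion]
        for name, (ll, k) in candidates.items()
    }
    return min(scores, key=scores.get)


# Hypothetical fits on a short series of n = 24 observations.
candidates = {"ANN": (-120.0, 3), "AAdA": (-112.0, 9)}
for name in ("AIC", "AICc", "BIC"):
    print(name, "selects", select_model(candidates, n=24, criterion=name))
```

With these made-up numbers, AIC prefers the larger model while AICc and BIC prefer the simpler one, because their penalties grow faster with the number of parameters on short series.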

#### 2.3. Criteria Based on Cross-Validation

## 3. Forecasting Models

The `ets` function of the forecast package for R, which was used to implement exponential smoothing in our study, limits the candidate models to 15 for seasonal data and 6 for non-seasonal data.

`ets` uses the likelihood to estimate the parameters of the models and AICc to select the most appropriate model form. Depending on the form of the error component, the likelihood is defined as follows:

## 4. Empirical Evaluation

#### 4.1. Experimental Setup

#### 4.2. Results and Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Forecasting Accuracy According to sMAPE

**Table A1.** Forecasting accuracy (sMAPE) of the examined criteria used for model selection. The average ranks of the selected models are also displayed. The results are presented per data frequency and for the complete data set. The bold numbers highlight the most accurate criterion per case.

Criterion | sMAPE: Yearly | sMAPE: Quarterly | sMAPE: Monthly | sMAPE: Total | Average Rank: Yearly | Average Rank: Quarterly | Average Rank: Monthly | Average Rank: Total |
---|---|---|---|---|---|---|---|---|
MSE | 15.065 | 10.212 | 13.176 | 12.821 | 3.190 | 6.868 | 6.692 | 5.969 |
MAE | 15.022 | **10.050** | **13.013** | **12.684** | **3.165** | **6.723** | **6.553** | **5.853** |
MSEh | 15.183 | 10.306 | 13.084 | 12.823 | 3.230 | 6.964 | 6.671 | 5.992 |
MAEh | 15.258 | 10.256 | 13.025 | 12.796 | 3.206 | 6.957 | 6.665 | 5.981 |
L | 15.307 | 10.276 | 13.173 | 12.889 | 3.168 | 6.901 | 6.728 | 5.991 |
AIC | 15.039 | 10.211 | 13.194 | 12.824 | 3.235 | 6.896 | 6.734 | 6.008 |
AICc | 14.784 | 10.200 | 13.272 | 12.805 | 3.245 | 6.919 | 6.789 | 6.045 |
BIC | 14.802 | 10.143 | 13.359 | 12.840 | 3.256 | 7.059 | 6.963 | 6.174 |
MSEv | **14.463** | 10.401 | 13.331 | 12.818 | 3.168 | 7.190 | 6.893 | 6.152 |
MAEv | 14.543 | 10.399 | 13.309 | 12.824 | 3.177 | 7.188 | 6.886 | 6.150 |

**Figure 1.** Percentage of time series for which the model selected by one criterion is the same as the model selected by another criterion. The percentages are computed pairwise over the complete data set (91,444 series).
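The pairwise percentages described in this caption can be computed along the following lines. This is a minimal sketch, assuming each criterion's choices are stored as a list of model names aligned by series; the function name and the toy selections are hypothetical and do not come from the study.

```python
from itertools import combinations


def pairwise_agreement(selections):
    """Percentage of series on which each pair of criteria picks the same model.

    selections maps a criterion name to a list of selected model names,
    one entry per time series, in the same series order for every criterion.
    """
    lengths = {len(chosen) for chosen in selections.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all criteria need the same, non-zero number of series")
    n_series = lengths.pop()

    agreement = {}
    for first, second in combinations(selections, 2):
        same = sum(a == b for a, b in zip(selections[first], selections[second]))
        agreement[(first, second)] = 100.0 * same / n_series
    return agreement


# Toy example with four series (made-up selections).
selections = {
    "AIC": ["ANN", "AAN", "MAM", "ANN"],
    "BIC": ["ANN", "ANN", "MAM", "ANN"],
    "MSEv": ["AAN", "AAN", "MAM", "MNN"],
}
for pair, share in pairwise_agreement(selections).items():
    print(pair, f"{share:.1f}%")
```

The result is symmetric in the two criteria, so only one triangle of the matrix needs to be stored.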

Trend | Additive Error, Seasonality N | Additive Error, Seasonality A | Additive Error, Seasonality M | Multiplicative Error, Seasonality N | Multiplicative Error, Seasonality A | Multiplicative Error, Seasonality M |
---|---|---|---|---|---|---|
N | ANN | ANA | ANM | MNN | MNA | MNM |
A | AAN | AAA | AAM | MAN | MAA | MAM |
Ad | AAdN | AAdA | AAdM | MAdN | MAdA | MAdM |
M | AMN | AMA | AMM | MMN | MMA | MMM |
Md | AMdN | AMdA | AMdM | MMdN | MMdA | MMdM |

**Table 2.** Categorization of ETS models based on their complexity, i.e., number of estimated components.

Complexity | Models |
---|---|
Low | ANN, MNN |
Moderate | AAN, MAN, ANA, MNA, MNM |
Significant | AAdN, MAdN, AAA, MAA, MAM |
High | AAdA, MAdA, MAdM |
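The categorization in Table 2 can be encoded as a simple lookup, which is enough to tabulate how often models of each complexity level are selected. The sketch below copies the model lists from Table 2; the names `COMPLEXITY`, `MODEL_TO_COMPLEXITY`, and `complexity_shares`, as well as the sample input, are hypothetical.

```python
from collections import Counter

# Model lists copied from Table 2.
COMPLEXITY = {
    "Low": ["ANN", "MNN"],
    "Moderate": ["AAN", "MAN", "ANA", "MNA", "MNM"],
    "Significant": ["AAdN", "MAdN", "AAA", "MAA", "MAM"],
    "High": ["AAdA", "MAdA", "MAdM"],
}

# Reverse lookup: ETS model name -> complexity level.
MODEL_TO_COMPLEXITY = {
    model: level for level, models in COMPLEXITY.items() for model in models
}


def complexity_shares(selected_models):
    """Percentage of series whose selected model falls in each complexity level."""
    if not selected_models:
        raise ValueError("selected_models must not be empty")
    counts = Counter(MODEL_TO_COMPLEXITY[m] for m in selected_models)
    total = len(selected_models)
    return {level: 100.0 * counts[level] / total for level in COMPLEXITY}


# Made-up selections for four series.
print(complexity_shares(["ANN", "AAN", "MAM", "AAdA"]))
```

The four lists together contain 15 models, which matches the number of candidates that `ets` considers for seasonal data.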

**Table 3.** Forecasting accuracy (MASE) of the examined criteria used for model selection. The average ranks of the selected models are also displayed. The results are presented per data frequency and for the complete data set. The bold numbers highlight the most accurate criterion per case.

Criterion | MASE: Yearly | MASE: Quarterly | MASE: Monthly | MASE: Total | Average Rank: Yearly | Average Rank: Quarterly | Average Rank: Monthly | Average Rank: Total |
---|---|---|---|---|---|---|---|---|
MSE | 3.471 | 1.151 | 0.923 | 1.542 | 3.200 | 6.865 | 6.675 | 5.962 |
MAE | 3.441 | **1.141** | **0.921** | 1.531 | **3.177** | **6.721** | **6.543** | **5.850** |
MSEh | 3.485 | 1.162 | 0.925 | 1.548 | 3.240 | 6.962 | 6.663 | 5.989 |
MAEh | 3.459 | 1.163 | 0.924 | 1.543 | 3.219 | 6.959 | 6.656 | 5.980 |
L | 3.432 | 1.159 | 0.934 | 1.541 | 3.180 | 6.906 | 6.728 | 5.995 |
AIC | 3.436 | 1.158 | 0.936 | 1.543 | 3.246 | 6.900 | 6.739 | 6.014 |
AICc | 3.407 | 1.158 | 0.939 | 1.538 | 3.256 | 6.924 | 6.795 | 6.051 |
BIC | 3.426 | 1.162 | 0.948 | 1.548 | 3.265 | 7.057 | 6.971 | 6.180 |
MSEv | **3.349** | 1.175 | 0.940 | **1.530** | 3.178 | 7.194 | 6.887 | 6.152 |
MAEv | 3.367 | 1.175 | 0.940 | 1.534 | 3.187 | 7.190 | 6.881 | 6.150 |
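For readers who want to reproduce accuracy figures of this kind, the sketch below computes MASE and sMAPE for a single series. It assumes the definitions commonly used in the M competitions: sMAPE expressed as a percentage, and MASE scaled by the in-sample mean absolute error of the (seasonal) naive method with period `m`. Whether the study applies exactly these conventions is an assumption, and the function names and sample numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumption: a term with a zero denominator contributes zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(a - f) / denom
    return 100.0 * total / len(actual)


def mase(insample, actual, forecast, m=1):
    """Mean absolute scaled error.

    The scale is the in-sample mean absolute error of the seasonal naive
    method with period m (m=1 gives the plain naive method).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if len(insample) <= m:
        raise ValueError("insample must contain more than m observations")
    diffs = [abs(insample[t] - insample[t - m]) for t in range(m, len(insample))]
    scale = sum(diffs) / len(diffs)
    if scale == 0:
        raise ValueError("scale is zero (constant in-sample series)")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Made-up series: four in-sample points and a two-step forecast.
history = [10.0, 12.0, 11.0, 13.0]
future = [14.0, 15.0]
prediction = [13.0, 13.0]
print(round(mase(history, future, prediction), 3))   # 0.9
print(round(smape(future, prediction), 3))           # 10.847
```

Per-frequency and total figures are then averages of these per-series values.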

**Table 4.** Percentage of time series where the examined criteria selected a model of low, moderate, significant, and high complexity in terms of estimated components. The last column displays the percentage of time series where the most accurate model truly falls in the respective categories. The figures are computed based on the complete data set (91,444 series).

Complexity | MSE | MAE | MSEh | MAEh | L | AIC | AICc | BIC | MSEv | MAEv | Actual |
---|---|---|---|---|---|---|---|---|---|---|---|
Low | 1.81 | 8.15 | 7.04 | 8.92 | 1.96 | 23.77 | 27.99 | 41.78 | 12.31 | 12.44 | 12.12 |
Moderate | 22.59 | 25.85 | 36.13 | 35.93 | 24.82 | 39.48 | 39.18 | 36.98 | 40.06 | 39.94 | 37.14 |
Significant | 33.11 | 32.92 | 37.20 | 36.38 | 32.93 | 23.27 | 21.16 | 15.51 | 36.17 | 35.92 | 38.68 |
High | 42.49 | 33.08 | 19.63 | 18.77 | 40.30 | 13.48 | 11.67 | 5.74 | 11.45 | 11.70 | 12.07 |

**Table 5.** Percentage of time series where the examined criteria used for model selection successfully identified the most accurate alternative. The results are presented per data frequency and for the complete data set. The bold numbers highlight the most successful criterion per case.

Criterion | Yearly | Quarterly | Monthly | Total |
---|---|---|---|---|
MSE | 19.95 | 7.62 | 7.77 | 10.40 |
MAE | 20.70 | 9.26 | 8.94 | 11.61 |
MSEh | 22.11 | **11.68** | **11.76** | **14.01** |
MAEh | **22.33** | 11.47 | 11.70 | 13.97 |
L | 19.86 | 8.17 | 8.09 | 10.69 |
AIC | 20.46 | 9.25 | 8.46 | 11.30 |
AICc | 20.44 | 9.28 | 8.36 | 11.25 |
BIC | 20.39 | 9.02 | 7.86 | 10.91 |
MSEv | 20.86 | 11.27 | 11.37 | 13.43 |
MAEv | 20.76 | 11.14 | 11.38 | 13.38 |
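Success percentages and average ranks of this kind can be derived from the out-of-sample errors of all candidate models. The following is a minimal sketch, assuming one dictionary of errors per series and breaking ties by dictionary order; the function names, the tie-breaking rule, and the sample errors are assumptions made for illustration, not the procedure of the study.

```python
def hit_rate(selected, errors):
    """Percentage of series where the selected model is the most accurate one.

    selected: list of selected model names, one per series
    errors: list of dicts mapping model name -> out-of-sample error
    """
    if len(selected) != len(errors) or not selected:
        raise ValueError("selected and errors must be non-empty and equally long")
    hits = sum(
        choice == min(err, key=err.get) for choice, err in zip(selected, errors)
    )
    return 100.0 * hits / len(selected)


def average_rank(selected, errors):
    """Average rank (1 = most accurate) of the selected model across series."""
    if len(selected) != len(errors) or not selected:
        raise ValueError("selected and errors must be non-empty and equally long")
    ranks = []
    for choice, err in zip(selected, errors):
        # Assumption: ties keep dictionary order instead of sharing a rank.
        ordered = sorted(err, key=err.get)
        ranks.append(ordered.index(choice) + 1)
    return sum(ranks) / len(ranks)


# Made-up out-of-sample errors for three series and three candidate models.
errors = [
    {"ANN": 1.0, "AAN": 0.8, "AAdN": 1.2},
    {"ANN": 0.5, "AAN": 0.7, "AAdN": 0.6},
    {"ANN": 2.0, "AAN": 2.5, "AAdN": 1.5},
]
selected = ["AAN", "AAN", "ANN"]
print(round(hit_rate(selected, errors), 2))
print(average_rank(selected, errors))
```

A criterion can have a low hit rate and still achieve a good average rank if its choices are usually close to the best model, so the two summaries are worth reading together.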

