# Forecast Combination under Heavy-Tailed Errors


## Abstract


## 1. Introduction

## 2. The t-AFTER Methodology

#### 2.1. Problem Setting

#### 2.2. The Existing AFTER Methods: The ${L}_{2}$- and ${L}_{1}$-AFTER Methods

#### 2.3. The t-AFTER Methods

- We choose a pool of K candidate degrees of freedom, whose elements are intended to be close to the degrees of freedom of the Student's t-distribution that describes the random errors well. For each element of the pool, we treat it as the true degrees of freedom and estimate the corresponding scale parameter. This yields K estimated pairs of degrees of freedom and scale parameter.
- For each of the K pairs, we assess the probability that it is the true one based on relative historical performance.

- Estimate (e.g., by MLE) ${s}_{i}$ for each ${\nu}_{k}\in \Omega $ and each candidate forecaster; the estimate of ${s}_{i}$ from the j-th forecaster given ${\nu}_{k}$ is denoted ${\widehat{s}}_{i,j,k}$.
- Calculate ${W}_{i}^{{A}_{t}}$ and ${\widehat{y}}_{i}^{{A}_{t}}$:$$W_i^{A_t}=\frac{\mathbf{l}_{i-1}^{A_t}}{\|\mathbf{l}_{i-1}^{A_t}\|_1},\qquad \widehat{y}_i^{A_t}=\langle \widehat{Y}_i,\,W_i^{A_t}\rangle,$$$$l_{i-1,j}^{A_t}=\sum_{k=1}^{K} l_{i-1,j,k}^{A_t}\quad\text{with}\quad l_{i-1,j,k}^{A_t}=w_{j,k}\prod_{i'=i_0}^{i-1}\frac{1}{\widehat{s}_{i',j,k}}\,f_t\!\left(\left.\frac{y_{i'}-\widehat{y}_{i',j}}{\widehat{s}_{i',j,k}}\,\right|\,\nu_k\right).$$
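The update above can be sketched in a few lines of NumPy, working in the log domain to avoid underflow of the likelihood products. This is an illustrative sketch, not the authors' code; the names (`t_after_weights`, `preds`, `s_hat`) are our own.

```python
import numpy as np
from scipy.stats import t as student_t

def t_after_weights(y, preds, s_hat, nus, w0=None):
    """t-AFTER weights W_i based on the first n observations.

    y      : (n,) realized values y_{i'}
    preds  : (n, J) candidate forecasts yhat_{i',j}
    s_hat  : (n, J, K) scale estimates shat_{i',j,k}
    nus    : (K,) candidate degrees of freedom nu_k
    Returns the (J,) normalized weight vector.
    """
    n, J = preds.shape
    K = len(nus)
    if w0 is None:
        w0 = np.full((J, K), 1.0 / (J * K))  # equal initial weights w_{j,k}
    # log l_{i-1,j,k} = log w_{j,k} + sum_{i'} [ log f_t(e/s | nu_k) - log s ]
    log_l = np.log(w0)
    for k, nu in enumerate(nus):
        z = (y[:, None] - preds) / s_hat[:, :, k]  # standardized forecast errors
        log_l[:, k] += np.sum(student_t.logpdf(z, df=nu)
                              - np.log(s_hat[:, :, k]), axis=0)
    # l_{i-1,j} = sum_k l_{i-1,j,k}; then normalize by the L1 norm
    l = np.exp(log_l - log_l.max()).sum(axis=1)
    return l / l.sum()
```

The combined forecast for the next period is then the inner product of these weights with the candidate forecasts for that period.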

#### 2.4. Risk Bounds of the t-AFTER

#### 2.4.1. Conditions

#### 2.4.2. Risk Bounds for the t-AFTER with a Known ν

**Theorem 1.** If the random errors are from a scaled Student's t-distribution with degrees of freedom ν and Condition 2 holds, then:

- When only Condition 2 is satisfied, Theorem 1 shows that the cumulative distance between the true densities and their t-AFTER estimators is bounded above by the cumulative (standardized) forecast errors of the best candidate forecaster plus a penalty with two parts: the squared relative estimation errors of the scale parameters and the logarithm of the initial weights. This risk bound is obtained without assuming the existence of the variances of the random errors, and ${\widehat{s}}_{i,j}/{s}_{i}$ is only required to be bounded below.
- When ν is assumed to be strictly larger than two and Conditions 1 and 2${}^{\prime}$ are both satisfied, Theorem 1 shows that the cumulative forecast errors converge at the same rate as those of the best candidate forecaster, up to a penalty that depends on the initial weights and on the efficiency of the scale parameter estimation. The risk bounds hold even if the distribution of the random errors has tails as heavy as those of ${t}_{3}$.
- If there is no prior information for choosing the ${w}_{j}$'s in (6), equal initial weights can be applied, i.e., ${w}_{j}=1/J$ for all j. In this case, the number of candidate forecasters clearly enters the penalty. When the candidate pool is large, some preliminary analysis should be done to eliminate the significantly less competitive candidates before applying the t-AFTER.

## 3. The g-AFTER Methodology

#### 3.1. The g-AFTER Method

#### 3.2. Conditions

#### 3.3. Risk Bounds for the g-AFTER

**Theorem 2.** If Conditions 3 and 4 hold, then for ${\widehat{y}}_{i}^{{A}_{g}}$ from a g-AFTER procedure, we have:

- Theorem 2 provides a risk bound for more general situations than Theorem 1: as long as the true random errors come from one of the three popular families, similar risk bounds hold.
- When there is strong evidence that the errors are heavily tailed, Ω can be kept very small, containing only small degrees of freedom, and the ${c}_{2}{w}_{j,k}^{{A}_{t}}$ terms in G can be made relatively large (relative to ${w}_{j}^{{A}_{2}}$ and ${c}_{1}{w}_{j}^{{A}_{1}}$). The more information about the tails of the error distribution is available, the more efficiently the initial weights can be allocated.
- In particular, when the true random errors have tails significantly heavier than normal and double-exponential, they can be assumed to come from a scaled Student's t-distribution with unknown ν, and a (general) t-AFTER procedure is more reasonable. In this case, ${l}_{i-1,j}^{{A}_{g}}={l}_{i-1,j}^{{A}_{t}}$. Let ${q}_{i}=\frac{1}{{s}_{i}}{f}_{t}\!\left(\frac{{\widehat{y}}_{i,j}-{y}_{i}}{{s}_{i}}\right)$, ${\widehat{q}}_{i}^{{A}_{t}}={\sum}_{j,k}{\widehat{w}}_{i,j,k}^{{A}_{t}}\frac{1}{{\widehat{s}}_{i,j,k}}{f}_{t}\!\left(\frac{{\widehat{y}}_{i,j}-{y}_{i}}{{\widehat{s}}_{i,j,k}}\,\middle|\,{\nu}_{k}\right)$, and ${\widehat{w}}_{i,j,k}^{{A}_{t}}\ge 0$ for all j and k. Without assuming Condition 1, it follows for any $n\ge 1$ that:$$\frac{1}{n}\sum_{i=i_0+1}^{i_0+n} E\,D(q_i\,\|\,\widehat{q}_i^{A_t})\le \inf_{1\le j\le J}\left(\frac{\log(1/w_{i,j}^{A_t})}{n}+\frac{B_1}{n}\sum_{i=i_0+1}^{i_0+n}E\frac{(m_i-\widehat{y}_{i,j})^2}{\sigma_i^2}+R^*\right),$$$$R^*=\inf_{1\le k\le K}\left(\frac{B_2}{n}\sum_{i=i_0+1}^{i_0+n}E\frac{(\widehat{s}_{i,j,k}-s_i)^2}{s_i^2}+B_3\left|\frac{\nu-\nu_k}{\nu}\right|\right).$$If Condition 1 is also satisfied, then:$$\frac{1}{n}\sum_{i=i_0+1}^{i_0+n}E\frac{(m_i-\widehat{y}_i^{A_t})^2}{\sigma_i^2}\le C\inf_{1\le j\le J}\left(\frac{\log(1/w_{i,j}^{A_t})}{n}+\frac{B_1}{n}\sum_{i=i_0+1}^{i_0+n}E\frac{(m_i-\widehat{y}_{i,j})^2}{\sigma_i^2}+R^*\right).$$
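Under equal initial weights, the aggregation of the three families in a g-AFTER can be sketched as follows. This is a simplified illustration, not the authors' implementation: it reuses one scale estimate per forecaster across all three families, and the names (`g_after_weights`, `s2`, `st`) are our own.

```python
import numpy as np
from scipy.stats import norm, laplace, t as student_t

def g_after_weights(y, preds, s_hat, nus, c1=1.0, c2=2.0):
    """g-AFTER weights with equal initial weights:
    l_j = (1/J)[exp(S2_j) + c1*exp(S1_j)] + (c2/(K*J)) * sum_k exp(St_{j,k}),
    where S2, S1, St_k are cumulative log-likelihood scores under normal,
    double-exponential and Student-t(nu_k) errors, respectively.

    y : (n,) realized values; preds : (n, J) forecasts; s_hat : (n, J) scales.
    """
    n, J = preds.shape
    K = len(nus)
    z = (y[:, None] - preds) / s_hat            # standardized forecast errors
    log_s = np.log(s_hat).sum(axis=0)           # sum_i log shat_{i,j}
    s2 = norm.logpdf(z).sum(axis=0) - log_s     # L2-AFTER score
    s1 = laplace.logpdf(z).sum(axis=0) - log_s  # L1-AFTER score
    st = np.stack([student_t.logpdf(z, df=nu).sum(axis=0) - log_s
                   for nu in nus])              # (K, J) t-AFTER scores
    m = max(s2.max(), s1.max(), st.max())       # stabilize the exponentials
    l = (np.exp(s2 - m) + c1 * np.exp(s1 - m)) / J \
        + (c2 / (K * J)) * np.exp(st - m).sum(axis=0)
    return l / l.sum()
```

With the simulation settings of Section 4 ($c_1=1$, $c_2=2$, $K=2$), the t-family components each start with weight $1/(2J)$, matching the equal-weight choice described there.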

## 4. Simulations

- Use $\Omega =\{1,3\}$ as the set of candidate degrees of freedom for the scaled Student's t-distributions in the t-AFTER method. The t-AFTER is intended mainly for cases where the error terms exhibit very strong heavy-tailed behavior. As the degrees of freedom of the Student's t-distribution grow, the t-AFTER becomes similar to the ${L}_{1}$- or ${L}_{2}$-AFTER, so a choice of Ω with relatively small degrees of freedom in the g-AFTER should provide sufficient adaptation capability. In fact, other options for Ω, such as $\Omega =\{1,3,5,8,15\}$, were considered, with similar results.
- Since the g-AFTER is usually preferred when users have no consistent and strong evidence for identifying the error distribution among the three candidate families, we give equal initial weights to the candidate distributions. Therefore, ${c}_{1}=1$, ${c}_{2}=2$, ${w}_{j}^{{A}_{1}}={w}_{j}^{{A}_{2}}=1/J$ and ${w}_{j,k}^{{A}_{t}}=\frac{1}{2J}$ are used in the g-AFTER. Note that if, for example, there is clear and consistent evidence that the error distribution is more likely to come from the normal family, then putting relatively large initial weights on the ${L}_{2}$-AFTER procedure within a g-AFTER can be more appropriate than equal weights.
- Each ${\widehat{s}}_{i,j,k}$ is the sample median of the absolute forecast errors of forecaster j before time point i, divided by the theoretical median of the absolute value of a random variable with distribution ${t}_{{\nu}_{k}}$.
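This median-based scale estimator is straightforward to implement; by symmetry, the theoretical median of $|Y|$ for $Y\sim t_{\nu}$ equals the 0.75 quantile of $t_{\nu}$. A minimal sketch (`scale_hat` is an illustrative name):

```python
import numpy as np
from scipy.stats import t as student_t

def scale_hat(errors, nu):
    """Median-based scale estimate: the sample median of the absolute
    forecast errors divided by the theoretical median of |Y|, Y ~ t_nu.
    By symmetry of t_nu, that theoretical median is the 0.75 quantile."""
    return np.median(np.abs(errors)) / student_t.ppf(0.75, df=nu)
```

For errors drawn from $s\cdot t_{\nu}$, this estimator is consistent for $s$ and, unlike a sample standard deviation, remains well defined even when ν ≤ 2 and the variance does not exist.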

#### 4.1. Linear Regression Models

#### 4.1.1. Simulation Settings

#### 4.1.2. Results

|  | $t_3$, $\sigma^2=1$ | $t_3$, $\sigma^2=9$ | $DE$, $\sigma^2=1$ | $DE$, $\sigma^2=9$ | $t_{10}$, $\sigma^2=1$ | $t_{10}$, $\sigma^2=9$ | Normal, $\sigma^2=1$ | Normal, $\sigma^2=9$ |
|---|---|---|---|---|---|---|---|---|
| **$p_0=3$** |  |  |  |  |  |  |  |  |
| $A2$ | 1.302 (0.009) | 1.043 (0.003) | 1.116 (0.004) | 1.028 (0.001) | 0.983 (0.003) | 0.958 (0.001) | 0.926 (0.002) | 0.931 (0.001) |
| $At$ | 0.943 (0.002) | 0.980 (0.001) | 0.983 (0.001) | 0.995 (0.001) | 0.941 (0.003) | 0.955 (0.001) | 0.932 (0.001) | 0.942 (0.001) |
| $Ag$ | 0.944 (0.002) | 0.967 (0.001) | 0.974 (0.001) | 0.977 (0.001) | 0.940 (0.001) | 0.950 (0.001) | 0.926 (0.001) | 0.938 (0.001) |
| **$p_0=5$** |  |  |  |  |  |  |  |  |
| $A2$ | 1.257 (0.008) | 1.066 (0.004) | 1.088 (0.003) | 1.026 (0.001) | 0.980 (0.002) | 0.955 (0.001) | 0.937 (0.002) | 0.927 (0.001) |
| $At$ | 0.950 (0.002) | 0.967 (0.001) | 0.976 (0.001) | 0.982 (0.001) | 0.951 (0.001) | 0.950 (0.001) | 0.943 (0.001) | 0.938 (0.001) |
| $Ag$ | 0.951 (0.001) | 0.958 (0.001) | 0.971 (0.001) | 0.970 (0.001) | 0.949 (0.001) | 0.944 (0.001) | 0.939 (0.001) | 0.933 (0.001) |
| **$p_0=10$** |  |  |  |  |  |  |  |  |
| $A2$ | 1.166 (0.006) | 1.056 (0.003) | 1.035 (0.002) | 0.998 (0.001) | 0.968 (0.002) | 0.949 (0.001) | 0.946 (0.001) | 0.929 (0.001) |
| $At$ | 0.950 (0.002) | 0.957 (0.001) | 0.964 (0.001) | 0.965 (0.001) | 0.949 (0.001) | 0.946 (0.001) | 0.948 (0.001) | 0.939 (0.001) |
| $Ag$ | 0.945 (0.001) | 0.949 (0.001) | 0.961 (0.001) | 0.955 (0.001) | 0.944 (0.001) | 0.939 (0.001) | 0.942 (0.001) | 0.933 (0.001) |

Here, A2, At and Ag denote the $L_2$-, t- and g-adaptive forecasting through exponential re-weighting (AFTER) methods, respectively. The parameter $p_0$ is the number of explanatory variables in the data-generating model. The true parameter (β) values are randomly generated from a uniform distribution, and the candidate forecasts are obtained from linear regressions with 1, 2 and up to the maximum number of explanatory variables. For each set of true parameters, 200 replicated datasets are generated to simulate the mean average squared estimation error (ASEE) for each combination method. The ratio of the mean ASEE of each method over that of the $L_1$-AFTER is used to measure the relative performance of the competitors. The process is replicated 200 times, each time with independently generated true β values. The means and their standard errors over the 200 sets of ratios are summarized in this table (the numbers in parentheses are the standard errors).
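A much-simplified, runnable version of this replication protocol is sketched below. It uses a single set of β values, $t_3$ errors, and two stand-in combiners (the simple average and the per-replication best single candidate) in place of the AFTER methods, which are not re-implemented here; `asee_ratio` and its default arguments are our own illustrative choices.

```python
import numpy as np

def asee_ratio(n_train=30, n_test=50, p0=3, sigma2=1.0, n_rep=200, seed=0):
    """Sketch of the simulation protocol: draw beta ~ Uniform(-1, 1),
    generate t_3 errors, build nested-regression candidate forecasts with
    1..p0 predictors, and return the mean ASEE of the simple average
    relative to that of the best single candidate."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(-1, 1, size=p0)
    err_sa, err_best = [], []
    for _ in range(n_rep):
        X = rng.standard_normal((n_train + n_test, p0))
        m = X @ beta                                   # true conditional mean
        y = m + np.sqrt(sigma2) * rng.standard_t(3, size=n_train + n_test)
        preds = []
        for p in range(1, p0 + 1):                     # nested candidate models
            bhat = np.linalg.lstsq(X[:n_train, :p], y[:n_train], rcond=None)[0]
            preds.append(X[n_train:, :p] @ bhat)
        preds = np.stack(preds)                        # (J, n_test)
        se = ((preds - m[n_train:]) ** 2).mean(axis=1)  # ASEE per candidate
        err_sa.append(((preds.mean(axis=0) - m[n_train:]) ** 2).mean())
        err_best.append(se.min())
    return np.mean(err_sa) / np.mean(err_best)
```

Replacing the simple average with an AFTER-style combiner (and averaging the ratios over independently drawn β sets) recovers the full protocol described above.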

#### 4.2. AR Models

#### 4.2.1. Simulation Settings

#### 4.2.2. Other Combination Methods

#### 4.2.3. Results

|  | Normal, $\sigma^2=1$ | Normal, $\sigma^2=4$ | Normal, $\sigma^2=9$ | $t_{10}$, $\sigma^2=1$ | $t_{10}$, $\sigma^2=4$ | $t_{10}$, $\sigma^2=9$ | $DE$, $\sigma^2=1$ | $DE$, $\sigma^2=4$ | $DE$, $\sigma^2=9$ |
|---|---|---|---|---|---|---|---|---|---|
| $A2$ | 0.941 (0.004) | 0.940 (0.004) | 0.940 (0.004) | 0.972 (0.004) | 0.972 (0.003) | 0.971 (0.003) | 1.030 (0.004) | 1.032 (0.003) | 1.033 (0.004) |
| $At$ | 0.954 (0.003) | 0.953 (0.003) | 0.954 (0.003) | 0.961 (0.002) | 0.962 (0.003) | 0.962 (0.003) | 0.997 (0.001) | 1.001 (0.001) | 0.995 (0.001) |
| $Ag$ | 0.948 (0.003) | 0.947 (0.004) | 0.948 (0.004) | 0.957 (0.003) | 0.959 (0.003) | 0.958 (0.003) | 0.978 (0.002) | 0.983 (0.001) | 0.976 (0.002) |
| $SA$ | 2.892 (0.268) | 2.484 (0.166) | 2.408 (0.189) | 2.372 (0.167) | 2.297 (0.174) | 2.070 (0.127) | 2.278 (0.148) | 2.176 (0.151) | 2.483 (0.148) |
| $MD$ | 1.681 (0.137) | 2.025 (0.191) | 1.824 (0.187) | 1.884 (0.243) | 1.874 (0.197) | 1.421 (0.076) | 1.740 (0.137) | 1.602 (0.144) | 1.943 (0.168) |
| $TM$ | 1.805 (0.121) | 1.946 (0.144) | 1.754 (0.134) | 1.838 (0.156) | 1.705 (0.138) | 1.469 (0.066) | 1.723 (0.109) | 1.571 (0.093) | 1.885 (0.120) |
| $BG$ | 1.441 (0.047) | 1.462 (0.051) | 1.389 (0.047) | 1.425 (0.042) | 1.364 (0.040) | 1.321 (0.032) | 1.431 (0.046) | 1.357 (0.035) | 1.500 (0.045) |
| $B{G}_{0.95}$ | 1.432 (0.047) | 1.453 (0.050) | 1.381 (0.047) | 1.417 (0.042) | 1.358 (0.040) | 1.315 (0.032) | 1.427 (0.045) | 1.353 (0.035) | 1.495 (0.045) |
| $B{G}_{0.9}$ | 1.429 (0.047) | 1.449 (0.049) | 1.378 (0.047) | 1.414 (0.042) | 1.355 (0.039) | 1.313 (0.032) | 1.425 (0.045) | 1.352 (0.035) | 1.492 (0.045) |
| $B{G}_{0.8}$ | 1.433 (0.047) | 1.452 (0.050) | 1.382 (0.047) | 1.417 (0.042) | 1.357 (0.040) | 1.315 (0.032) | 1.427 (0.045) | 1.353 (0.035) | 1.491 (0.044) |
| $B{G}_{0.7}$ | 1.447 (0.048) | 1.464 (0.051) | 1.394 (0.049) | 1.428 (0.043) | 1.366 (0.040) | 1.322 (0.033) | 1.432 (0.046) | 1.357 (0.036) | 1.495 (0.045) |
| $LR$ | 7.956 (0.346) | 8.355 (0.339) | 8.491 (0.342) | 8.856 (0.387) | 10.210 (1.032) | 9.138 (0.363) | 11.110 (0.504) | 11.240 (0.509) | 10.040 (0.513) |
| $CLR$ | 1.036 (0.011) | 1.024 (0.013) | 1.036 (0.012) | 1.032 (0.011) | 1.036 (0.010) | 1.042 (0.011) | 1.072 (0.011) | 1.070 (0.011) | 1.045 (0.013) |

**Table 3.** Simulation results on the $AR$ models with $p=5$ (heavy tailed) under squared estimation error.

|  | $t_3$, $\sigma^2=1$ | $t_3$, $\sigma^2=4$ | $t_3$, $\sigma^2=9$ | Log-Normal, $\sigma=0.25$ | Log-Normal, $\sigma=0.5$ | Log-Normal, $\sigma=1$ |
|---|---|---|---|---|---|---|
| $A2$ | 1.058 (0.009) | 1.056 (0.008) | 1.053 (0.008) | 0.964 (0.003) | 1.024 (0.004) | 1.051 (0.010) |
| $At$ | 0.955 (0.006) | 0.947 (0.006) | 0.961 (0.006) | 0.951 (0.003) | 0.940 (0.004) | 0.921 (0.008) |
| $Ag$ | 0.950 (0.006) | 0.943 (0.006) | 0.957 (0.006) | 0.950 (0.003) | 0.946 (0.004) | 0.926 (0.008) |
| $SA$ | 2.047 (0.107) | 1.889 (0.098) | 1.931 (0.139) | 2.253 (0.173) | 2.143 (0.115) | 1.730 (0.087) |
| $MD$ | 1.692 (0.135) | 1.396 (0.066) | 1.657 (0.182) | 1.517 (0.097) | 1.441 (0.085) | 1.370 (0.078) |
| $TM$ | 1.625 (0.091) | 1.438 (0.060) | 1.508 (0.112) | 1.559 (0.086) | 1.555 (0.080) | 1.404 (0.057) |
| $BG$ | 1.369 (0.034) | 1.307 (0.025) | 1.286 (0.033) | 1.329 (0.039) | 1.374 (0.038) | 1.278 (0.025) |
| $B{G}_{0.95}$ | 1.365 (0.033) | 1.303 (0.025) | 1.282 (0.033) | 1.322 (0.038) | 1.370 (0.038) | 1.275 (0.025) |
| $B{G}_{0.9}$ | 1.360 (0.033) | 1.299 (0.025) | 1.277 (0.032) | 1.319 (0.037) | 1.367 (0.037) | 1.271 (0.024) |
| $B{G}_{0.8}$ | 1.352 (0.032) | 1.290 (0.024) | 1.269 (0.030) | 1.320 (0.038) | 1.366 (0.037) | 1.259 (0.023) |
| $B{G}_{0.7}$ | 1.345 (0.032) | 1.284 (0.023) | 1.263 (0.030) | 1.327 (0.039) | 1.368 (0.037) | 1.248 (0.023) |
| $LR$ | 95.280 (60.670) | 38.290 (7.566) | 46.220 (9.192) | 9.316 (0.375) | 13.180 (0.891) | 174.000 (56.286) |
| $CLR$ | 1.014 (0.010) | 1.007 (0.010) | 1.016 (0.010) | 1.046 (0.011) | 1.032 (0.011) | 0.974 (0.010) |

**Table 4.** Simulation results on the $AR$ models with $p=5$ (heavy tailed) under absolute estimation error.

|  | $t_3$, $\sigma^2=1$ | $t_3$, $\sigma^2=4$ | $t_3$, $\sigma^2=9$ | Log-Normal, $\sigma=0.25$ | Log-Normal, $\sigma=0.5$ | Log-Normal, $\sigma=1$ |
|---|---|---|---|---|---|---|
| $A2$ | 1.018 (0.003) | 1.019 (0.002) | 1.019 (0.008) | 0.981 (0.002) | 0.997 (0.002) | 1.017 (0.003) |
| $At$ | 0.990 (0.002) | 0.988 (0.002) | 0.993 (0.002) | 0.982 (0.002) | 0.976 (0.002) | 0.975 (0.003) |
| $Ag$ | 0.988 (0.002) | 0.986 (0.002) | 0.991 (0.002) | 0.979 (0.002) | 0.978 (0.002) | 0.977 (0.002) |
| $SA$ | 1.469 (0.064) | 1.666 (0.076) | 1.724 (0.080) | 1.435 (0.069) | 1.543 (0.064) | 1.483 (0.064) |
| $MD$ | 1.209 (0.043) | 1.314 (0.068) | 1.412 (0.094) | 1.129 (0.035) | 1.279 (0.060) | 1.196 (0.037) |
| $TM$ | 1.226 (0.040) | 1.367 (0.056) | 1.312 (0.085) | 1.183 (0.033) | 1.331 (0.050) | 1.272 (0.040) |
| $BG$ | 1.187 (0.023) | 1.272 (0.029) | 1.489 (0.035) | 1.159 (0.021) | 1.245 (0.027) | 1.210 (0.023) |
| $B{G}_{0.95}$ | 1.184 (0.022) | 1.269 (0.029) | 1.401 (0.034) | 1.157 (0.021) | 1.242 (0.027) | 1.206 (0.023) |
| $B{G}_{0.9}$ | 1.181 (0.022) | 1.266 (0.029) | 1.378 (0.033) | 1.156 (0.021) | 1.240 (0.027) | 1.201 (0.022) |
| $B{G}_{0.8}$ | 1.176 (0.021) | 1.260 (0.028) | 1.450 (0.033) | 1.156 (0.021) | 1.237 (0.027) | 1.192 (0.021) |
| $B{G}_{0.7}$ | 1.173 (0.021) | 1.256 (0.028) | 1.352 (0.032) | 1.159 (0.021) | 1.236 (0.026) | 1.185 (0.020) |
| $LR$ | 2.891 (0.084) | 2.862 (0.097) | 3.647 (1.393) | 2.690 (0.074) | 2.610 (0.077) | 3.296 (0.121) |
| $CLR$ | 1.029 (0.006) | 1.025 (0.006) | 1.022 (0.006) | 1.019 (0.008) | 1.004 (0.008) | 1.015 (0.006) |

## 5. Real Data Example

#### 5.1. Data and Settings

|  | Mean | Se | Median | Min | ${Q}_{1}$ | ${Q}_{3}$ | Max |
|---|---|---|---|---|---|---|---|
| $A1$ | 0.708 | 0.016 | 0.649 | 0.001 | 0.307 | 0.994 | 11.50 |
|  | 0.758 | 0.009 | 0.773 | 0.038 | 0.507 | 0.990 | 2.901 |
| $A2$ | 0.697 | 0.017 | 0.639 | 0.001 | 0.309 | 0.979 | 13.32 |
|  | 0.766 | 0.010 | 0.766 | 0.030 | 0.517 | 0.992 | 4.138 |
| $At$ | 0.708 | 0.015 | 0.646 | 0.001 | 0.312 | 1.003 | 8.632 |
|  | 0.760 | 0.009 | 0.769 | 0.034 | 0.509 | 0.993 | 3.717 |
| $Ag$ | 0.696 | 0.014 | 0.645 | 0.001 | 0.308 | 0.987 | 7.710 |
|  | 0.757 | 0.009 | 0.770 | 0.033 | 0.508 | 0.990 | 3.298 |
| $MD$ | 1.050 | 0.010 | 1.022 | 0.002 | 0.910 | 1.143 | 5.341 |
|  | 1.015 | 0.005 | 1.015 | 0.065 | 0.944 | 1.078 | 2.821 |
| $TM$ | 0.990 | 0.004 | 1.000 | 0.002 | 0.974 | 1.023 | 2.437 |
|  | 0.992 | 0.002 | 0.999 | 0.062 | 0.984 | 1.013 | 1.747 |
| $BG$ | 0.784 | 0.010 | 0.838 | 0.001 | 0.596 | 0.973 | 5.227 |
|  | 0.849 | 0.006 | 0.902 | 0.039 | 0.758 | 0.983 | 3.051 |
| $B{G}_{0.95}$ | 0.775 | 0.010 | 0.832 | 0.001 | 0.582 | 0.969 | 7.715 |
|  | 0.842 | 0.006 | 0.896 | 0.037 | 0.749 | 0.981 | 2.841 |
| $B{G}_{0.9}$ | 0.768 | 0.012 | 0.825 | 0.001 | 0.564 | 0.966 | 11.45 |
|  | 0.835 | 0.006 | 0.893 | 0.036 | 0.739 | 0.978 | 2.643 |
| $B{G}_{0.8}$ | 0.758 | 0.019 | 0.806 | 0.001 | 0.529 | 0.960 | 24.08 |
|  | 0.822 | 0.006 | 0.883 | 0.040 | 0.709 | 0.974 | 2.712 |
| $B{G}_{0.7}$ | 0.757 | 0.031 | 0.793 | 0.001 | 0.503 | 0.956 | 43.19 |
|  | 0.810 | 0.007 | 0.870 | 0.036 | 0.684 | 0.971 | 3.517 |

|  | Mean | Se | Median | Min | ${Q}_{1}$ | ${Q}_{3}$ | Max |
|---|---|---|---|---|---|---|---|
| $SA$ | 7.738 | 1.695 | 2.259 | 0.131 | 1.311 | 5.244 | 82.734 |
|  | 2.044 | 0.166 | 1.422 | 0.327 | 1.056 | 2.147 | 25.784 |
| $MD$ | 8.088 | 2.005 | 1.912 | 0.222 | 1.162 | 4.974 | 120.428 |
|  | 1.998 | 0.153 | 1.406 | 0.477 | 1.030 | 2.055 | 21.229 |
| $TM$ | 7.607 | 1.664 | 2.299 | 0.129 | 1.267 | 5.175 | 78.481 |
|  | 2.014 | 0.165 | 1.416 | 0.316 | 1.035 | 2.150 | 26.039 |
| $BG$ | 2.073 | 0.245 | 1.266 | 0.245 | 0.961 | 2.160 | 40.137 |
|  | 1.349 | 0.053 | 1.157 | 0.468 | 0.971 | 1.565 | 7.845 |
| $B{G}_{0.95}$ | 2.017 | 0.217 | 1.431 | 0.241 | 0.965 | 2.472 | 12.551 |
|  | 1.322 | 0.048 | 1.154 | 0.465 | 0.965 | 1.525 | 6.703 |
| $B{G}_{0.9}$ | 1.846 | 0.182 | 1.337 | 0.208 | 0.958 | 2.444 | 10.383 |
|  | 1.295 | 0.043 | 1.114 | 0.461 | 0.954 | 1.497 | 5.655 |
| $B{G}_{0.8}$ | 1.656 | 0.150 | 1.340 | 0.179 | 0.851 | 2.074 | 8.577 |
|  | 1.246 | 0.036 | 1.100 | 0.454 | 0.940 | 1.448 | 3.985 |
| $B{G}_{0.7}$ | 1.536 | 0.141 | 1.256 | 0.158 | 0.813 | 1.673 | 7.746 |
|  | 1.202 | 0.032 | 1.089 | 0.431 | 0.928 | 1.371 | 3.461 |

#### 5.2. Results

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix

#### A.1

- Fact 1: $1-{(1-t)}^{a}\le \frac{at}{1-t}$ for $a\ge 0,0\le t<1$. Let $f(t,a)=1-{(1-t)}^{a}-at/(1-t)$, then $f(t,a)\le 0$, since $\partial f/\partial t=a{(1-t)}^{-2}({(1-t)}^{a+1}-1)\le 0$ and $f(0,a)=0$.
- Fact 2: $log\left(x\right)\le x-1$ for $x>0$.
- Fact 3: For any $c>0$, $B(a,b)/B(a,b+c)$ decreases as b increases. The proof is pure arithmetic, and the key point is using the fact that $B(x,y)=\frac{x+y}{xy}{\prod}_{n=1}^{\infty}{\left(1+{\displaystyle \frac{xy}{n(x+y+n)}}\right)}^{-1}$.
- Fact 4: $E{(1+\frac{{Y}^{2}}{\nu})}^{-1}=\nu /(\nu +1)$, where $Y\sim {t}_{\nu}$ conditional on ν. Let $Z=Y\sqrt{(\nu +2)/\nu}$, then it is easy to show that $E{(1+\frac{{Y}^{2}}{\nu})}^{-1}=B(1/2,(\nu +2)/2)/B(1/2,\nu /2)=\nu /(\nu +1)$.
- Fact 5: $({s}^{2}-1)/2-log\left(s\right)\le \frac{{s}_{0}+2}{2{s}_{0}}{(1-s)}^{2}$ if $s\ge {s}_{0}>0$. Use Fact 2 to show that $-log\left(s\right)=log(1+(1-s)/s)\le (1-s)/s$.
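Facts 1 and 4 are easy to spot-check numerically. The following verification script is illustrative only and not part of the proofs:

```python
import numpy as np
from scipy import integrate
from scipy.stats import t as student_t

def fact4_lhs(nu):
    """Numerically compute E[(1 + Y^2/nu)^{-1}] for Y ~ t_nu; Fact 4
    states this equals nu/(nu + 1)."""
    val, _ = integrate.quad(
        lambda y: student_t.pdf(y, df=nu) / (1 + y**2 / nu),
        -np.inf, np.inf)
    return val

# Fact 4: E[(1 + Y^2/nu)^{-1}] = nu/(nu + 1), even for nu = 1 (Cauchy)
for nu in (1, 3, 10):
    assert abs(fact4_lhs(nu) - nu / (nu + 1)) < 1e-6

# Fact 1: 1 - (1-t)^a <= a*t/(1-t) on a grid of (t, a) values
ts = np.linspace(0.0, 0.99, 50)
for a in (0.5, 1.0, 3.0):
    assert np.all(1 - (1 - ts)**a <= a * ts / (1 - ts) + 1e-12)
```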

#### A.2

**Lemma 1.** Let ${h}_{\nu}\left(x\right)$ be the density function of ${t}_{\nu}$, and let $\underline{\nu}>0$ and $\lambda >0$ be constants. Then, for any $0<{s}_{0}\le s$, $\underline{\nu}\le min(\nu ,{\nu}^{\prime})-2\le \overline{\nu}$ and $|\nu -{\nu}^{\prime}|\le \lambda $, we have:

**Proof.** After a proper reorganization, we have:

- Let ${\nu}^{*}=min(\nu ,{\nu}^{\prime})$ and using Facts 1, 2 and 3, then:$$\begin{array}{cc}& \phantom{\rule{1.em}{0ex}}log\frac{B(\frac{1}{2},\frac{{\nu}^{\prime}}{2})}{B(\frac{1}{2},\frac{\nu}{2})}\le \frac{|B(\frac{1}{2},\frac{\nu}{2})-B(\frac{1}{2},\frac{{\nu}^{\prime}}{2})|}{B(\frac{1}{2},\frac{\nu}{2})}=\frac{\int {t}^{-1/2}{(1-t)}^{{\nu}^{*}/2-1}(1-{(1-t)}^{|\nu -{\nu}^{\prime}|/2})dt}{B(\frac{1}{2},\frac{\nu}{2})}\hfill \\ & \le \frac{\frac{|\nu -{\nu}^{\prime}|}{2}\int {t}^{1/2}{(1-t)}^{{\nu}^{*}/2-2}dt}{B(\frac{1}{2},\frac{\nu}{2})}=\frac{|\nu -{\nu}^{\prime}|}{2}\frac{B(\frac{3}{2},\frac{{\nu}^{*}-2}{2})}{B(\frac{1}{2},\frac{\nu}{2})}=\frac{|\nu -{\nu}^{\prime}|}{2}\frac{B(\frac{3}{2},\frac{{\nu}^{*}-2}{2})}{B(\frac{1}{2},\frac{{\nu}^{*}-2}{2})}\frac{B(\frac{1}{2},\frac{{\nu}^{*}-2}{2})}{B(\frac{1}{2},\frac{\nu}{2})}\hfill \\ & =\frac{|\nu -{\nu}^{\prime}|}{2}\frac{1}{{\nu}^{*}-1}\frac{B(\frac{1}{2},\frac{\underline{\nu}}{2})}{B(\frac{1}{2},\frac{\underline{\nu}+2}{2})}=\frac{|\nu -{\nu}^{\prime}|}{\nu}\frac{\nu}{{\nu}^{*}-1}\frac{B(\frac{1}{2},\frac{\underline{\nu}}{2})}{B(\frac{1}{2},\frac{\underline{\nu}+2}{2})}\le \frac{|\nu -{\nu}^{\prime}|}{\nu}\frac{\underline{\nu}+\lambda}{\underline{\nu}+1}\frac{B(\frac{1}{2},\frac{\underline{\nu}}{2})}{B(\frac{1}{2},\frac{\underline{\nu}+2}{2})}\hfill \\ & \le \frac{|\nu -{\nu}^{\prime}|}{\nu}\frac{\underline{\nu}+\lambda}{\underline{\nu}+1}\hfill \end{array}$$
- Using Fact 2 in Subsection A.1, it follows: $\frac{1}{2}log\frac{{\nu}^{\prime}}{\nu}\le \frac{1}{2}\frac{{\nu}^{\prime}-\nu}{\nu}\le \frac{1}{2}\frac{|{\nu}^{\prime}-\nu |}{\nu}.$
- It is easy to show that:$$\begin{array}{cc}& \phantom{\rule{1.em}{0ex}}E\left\{log\left(s\right)+\frac{1+{\nu}^{\prime}}{2}log(1+\frac{{(X-t)}^{2}}{{s}^{2}{\nu}^{\prime}})-\frac{1+\nu}{2}log(1+\frac{{X}^{2}}{\nu})\right\}\hfill \\ & =E\left\{log\left(s\right)-(1+{\nu}^{\prime})log\left(s\right)+\frac{1+{\nu}^{\prime}}{2}log\left(\frac{{s}^{2}+\frac{{(X-t)}^{2}}{{\nu}^{\prime}}}{1+\frac{{X}^{2}}{\nu}}\right)+\frac{{\nu}^{\prime}-\nu}{2}log(1+{X}^{2}/\nu )\right\}\hfill \\ & \le -{\nu}^{\prime}log\left(s\right)+E\left\{\frac{1+{\nu}^{\prime}}{2}\frac{{s}^{2}-1+{(X-t)}^{2}/{\nu}^{\prime}-{X}^{2}/\nu}{1+{X}^{2}/\nu}+{X}^{2}|{\nu}^{\prime}-\nu |/\nu \right\}\hfill \\ & \le (2+\overline{\nu})\frac{2+{s}_{0}}{2{s}_{0}}{(1-s)}^{2}+\frac{\underline{\nu}+3}{\underline{\nu}+2}{t}^{2}+{C}_{3}^{*}\frac{|{\nu}^{\prime}-\nu |}{\nu},\hfill \end{array}$$

**Lemma 2.** Let $h\left(x\right)$ be the density function of a double-exponential distribution with $\mu =0$ and $d=1$; then for ${s}_{0}>0$ and $s\ge {s}_{0}$, it follows:

**Proof.** Since $h\left(y\right)=\frac{1}{2}exp(-|y|)$ and $exp(-x)\le 1-x+\frac{{x}^{2}}{2}$ for $x\ge 0$, then:

**Lemma 3.** Let $h\left(y\right)$ be the density function of a standard normal distribution; then for ${s}_{0}>0$ and $s\ge {s}_{0}$, it follows:

**Proof.** Using Fact 2,

#### A.3

#### A.4

## References


© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license ( http://creativecommons.org/licenses/by/4.0/).

Cheng, G.; Wang, S.; Yang, Y. Forecast Combination under Heavy-Tailed Errors. *Econometrics* **2015**, *3*, 797-824. https://doi.org/10.3390/econometrics3040797