# Financial Time Series Forecasting Using Empirical Mode Decomposition and Support Vector Regression

## Abstract

## 1. Introduction

## 2. Results: Forecasting the S&P 500 Index up to 25 min Ahead

#### 2.1. Data

#### 2.2. Intraday Forecasting: Example of a Single Time Series for 7 August 2014

- $m=1$ lagged values of each IMF and the residue.
- $m=5$ lagged values of each IMF and the residue.
- $m=p+d$, where $p$ denotes the number of autoregressive terms and $d$ is the order of differencing of the autoregressive integrated moving average (ARIMA$(p,d,q)$) model fitted to each of the IMFs and to the residue. The ARIMA$(p,d,q)$ models were fitted with the auto.arima function of the forecast package in R (Hyndman and Khandakar 2008) (see Appendix A).
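
As a rough illustration of how an input vector of $m$ lagged values can be formed from a single component (an IMF or the residue), the following Python sketch builds input and target pairs for a given horizon $h$. The function name `make_lagged_pairs` and the toy series are hypothetical and are not taken from the original study.

```python
from typing import List, Tuple


def make_lagged_pairs(
    series: List[float], m: int, h: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (input, target) pairs from one component (an IMF or the residue).

    Each input holds the m most recent values (z_{t-m+1}, ..., z_t) and the
    matching target is z_{t+h}.
    """
    if m < 1 or h < 1:
        raise ValueError("m and h must be positive integers")
    inputs, targets = [], []
    # t is the index of the most recent value inside the input window.
    for t in range(m - 1, len(series) - h):
        inputs.append(series[t - m + 1 : t + 1])
        targets.append(series[t + h])
    return inputs, targets


# Toy component standing in for an IMF (hypothetical values).
component = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_lagged_pairs(component, m=2, h=1)
print(X[0], y[0])
```

With $m=p+d$, the value of `m` would simply differ from one component to the next, for example `m=4` for a component whose fitted model has $p=2$ and $d=2$.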

#### 2.2.1. Univariate EMD–SVR Results

#### 2.2.2. Multivariate EMD–SVR Results

#### 2.3. Intraday Forecasting: Analysis of the Complete Dataset

- Naive model, which keeps constant the last observed value in the time series.
- ARIMA$(p,d,q)$ model.
- SVR model on the original (nondecomposed) time series, with the same setting as for the EMD–SVR models (i.e., Gaussian kernel; three-dimensional grid to look for the optimal parameters: $\log_{10}(C)\in(-4,4)$, $\log_{10}(\gamma)\in(-4,4)$, and $\log_{10}(\epsilon)\in(-4,0)$; and a sixfold moving validation used in each iteration of the grid search for parameter tuning).
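
The parameter search described in the last item can be sketched as follows. This is only an illustration under stated assumptions: the text gives the ranges of the three exponents but not the grid spacing, so integer exponents with both endpoints included are assumed here, and the "moving validation" is interpreted as rolling-origin splits in which each validation block follows its training data in time. The helper names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def log10_grid(low: int, high: int) -> List[float]:
    """Values 10**k for every integer exponent k in [low, high] (assumed spacing)."""
    return [10.0 ** k for k in range(low, high + 1)]


def parameter_grid() -> Iterator[Dict[str, float]]:
    """Yield every (C, gamma, epsilon) combination of the three-dimensional grid."""
    grids = (log10_grid(-4, 4), log10_grid(-4, 4), log10_grid(-4, 0))
    for c, gamma, eps in product(*grids):
        yield {"C": c, "gamma": gamma, "epsilon": eps}


def moving_validation_splits(
    n_samples: int, n_folds: int = 6
) -> List[Tuple[range, range]]:
    """Rolling-origin folds: train on all samples before a block, validate on it."""
    block = n_samples // (n_folds + 1)
    if block < 1:
        raise ValueError("not enough samples for the requested number of folds")
    splits = []
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        valid = range(k * block, (k + 1) * block)
        splits.append((train, valid))
    return splits


grid = list(parameter_grid())
splits = moving_validation_splits(n_samples=70, n_folds=6)
print(len(grid), len(splits))
```

Each parameter combination would then be scored by its average validation error over the six folds, and the combination with the smallest error retained.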

#### 2.4. Statistical Significance

## 3. Discussion

- Across all forecasting models, the MAE increased with the forecast horizon following the intuition that the distant future is harder to predict.
- The direct strategy achieved more accurate forecasts than the recursive strategy in almost all the tested models.
- The smallest errors were observed for the input vector of length $m=p+d$. Similar results were obtained for models with input vector $m=5$, as $m=p+d$ was often around 5, whereas the case $m=1$ produced poorer results.
- For short time-horizons ($h\le 5$), the best results were obtained by the direct univariate EMD–SVR model that included all IMFs and the residue. For large time-horizons ($h\ge 30$), the best results were obtained by the direct univariate EMD–SVR model that included the residue only. The intermediate case of $h=20$ favored the inclusion of the last two IMFs.
- The direct EMD–SVR multivariate strategy performed better than the naive and ARIMA$(p,d,q)$ benchmarks across all horizons and performed better than the direct and recursive SVR benchmarks for $h\ge 5$.

- The direct EMD–SVR strategy provides consistently better results than the recursive strategy.
- For short time-horizons ($h\le 5$), better results are obtained with models including all the IMFs and the residue. For longer time-horizons ($h\ge 20$), models with the residue only or with only a few slowly oscillating IMFs become significantly better than the naive model.
- The direct EMD–SVR multivariate strategy significantly outperforms the naive model for all time-horizons greater than $h=1$ and outperforms the other models for $h\ge 5$.

## 4. Materials and Methods

#### 4.1. EMD

#### 4.2. Forecasting Financial Time Series

#### 4.3. Recursive and Direct Strategies for h-Steps-Ahead Forecast

- The recursive strategy constructs a single prediction model that optimizes the one-step-ahead prediction: $\widehat{z}_{t+1}=f(z_{t},\dots,z_{t-m+1})$. The same model is then used for the next forecasted value: $\widehat{z}_{t+2}=f(\widehat{z}_{t+1},z_{t},\dots,z_{t-m+2})$, with the forecasted value $\widehat{z}_{t+1}$ used instead of the true value, which is unknown. The procedure continues recursively (for $h>m$, all inputs are forecasted values):$$\widehat{z}_{t+h}=f(\widehat{z}_{t+h-1},\dots,\widehat{z}_{t+1},z_{t},\dots,z_{t+h-m})$$
- The direct strategy uses a different model for each forecast horizon. The various forecasting models are estimated independently. In this case, the $h$-step-ahead forecast is expressed as follows:$$\widehat{z}_{t+h}=f_{h}(z_{t},\dots,z_{t-m+1})$$
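
The difference between the two strategies can be made concrete with a small Python sketch. It is not the EMD–SVR implementation: a one-lag least-squares line (a deliberately simple stand-in for the SVR, with $m=1$) plays the role of $f$ and $f_h$, and all function names and data are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def fit_linear_one_lag(series: Sequence[float], h: int) -> Model:
    """Least-squares fit of z_{t+h} = a * z_t + b (toy stand-in for SVR, m = 1)."""
    xs = list(series[: len(series) - h])
    ys = list(series[h:])
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0
    b = mean_y - a * mean_x
    return lambda window: a * window[-1] + b


def recursive_forecast(f: Model, history: Sequence[float], m: int, h: int) -> float:
    """Apply the one-step model h times, feeding each forecast back as an input."""
    window: List[float] = list(history[-m:])
    for _ in range(h):
        z_hat = f(window)
        window = window[1:] + [z_hat]  # drop the oldest value, append the forecast
    return window[-1]


def direct_forecast(f_h: Model, history: Sequence[float], m: int) -> float:
    """Apply the horizon-specific model f_h once to the last m observed values."""
    return f_h(list(history[-m:]))


# Toy linear trend (hypothetical data): z_t = 2 t + 1 for t = 0, ..., 19.
z = [2.0 * t + 1.0 for t in range(20)]
h = 3
f_one_step = fit_linear_one_lag(z, h=1)    # single model reused recursively
f_three_step = fit_linear_one_lag(z, h=h)  # separate model for horizon h
rec = recursive_forecast(f_one_step, z, m=1, h=h)
direct = direct_forecast(f_three_step, z, m=1)
print(rec, direct)
```

On this noiseless trend both strategies give the same value. On real data the recursive strategy accumulates the error of each intermediate forecast, whereas the direct strategy requires fitting one model per horizon.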

#### 4.4. Support Vector Regression

#### 4.5. EMD–SVR Forecasting

#### 4.5.1. Univariate EMD–SVR

#### 4.5.2. Multivariate EMD–SVR

#### 4.6. Model Selection and Parameter Estimation

#### 4.7. Measure of Performance

#### 4.8. Statistical Significance Test

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Autoregressive Integrated Moving Average

## Appendix B. Results for m = 1 and m = 5

**Table A1.** Mean absolute error (MAE): mean and standard deviation (std) for the considered forecasting models with input vector $m=1$ lagged values. The smallest MAE of each forecast horizon is set in boldface.

| Group | Model | $h=1$ Mean | $h=1$ Std | $h=2$ Mean | $h=2$ Std | $h=3$ Mean | $h=3$ Std | $h=5$ Mean | $h=5$ Std | $h=10$ Mean | $h=10$ Std | $h=20$ Mean | $h=20$ Std | $h=30$ Mean | $h=30$ Std | $h=50$ Mean | $h=50$ Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | Naive | 0.147 | (0.186) | 0.256 | (0.322) | 0.328 | (0.418) | 0.423 | (0.514) | 0.651 | (0.801) | 0.991 | (1.053) | 1.143 | (1.294) | 1.559 | (1.803) |
| | ARIMA | 0.145 | (0.185) | 0.247 | (0.303) | 0.313 | (0.374) | 0.421 | (0.477) | 0.663 | (0.783) | 1.021 | (1.035) | 1.154 | (1.304) | 1.644 | (1.789) |
| | Direct SVR | 0.204 | (0.200) | 0.313 | (0.395) | 0.378 | (0.473) | 0.529 | (0.610) | 0.832 | (1.040) | 1.270 | (1.380) | 1.540 | (1.587) | 2.185 | (2.336) |
| | Recursive SVR | 0.204 | (0.200) | 0.283 | (0.365) | 0.324 | (0.412) | 0.439 | (0.556) | 0.628 | (0.831) | 0.986 | (1.140) | 1.077 | (1.221) | 1.599 | (1.868) |
| Direct EMD–SVR | Multivariate | 0.162 | (0.180) | 0.264 | (0.317) | 0.330 | (0.374) | 0.417 | (0.484) | 0.650 | (0.718) | 0.921 | (1.010) | 1.229 | (1.292) | 1.642 | (1.947) |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.196 | (0.205) | 0.308 | (0.356) | 0.378 | (0.425) | 0.491 | (0.504) | 0.748 | (0.805) | 1.180 | (1.259) | 1.498 | (1.361) | 1.903 | (2.103) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.196 | (0.198) | 0.304 | (0.354) | 0.376 | (0.432) | 0.489 | (0.502) | 0.747 | (0.803) | 1.181 | (1.258) | 1.498 | (1.362) | 1.903 | (2.102) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.243 | (0.244) | 0.310 | (0.345) | 0.384 | (0.433) | 0.490 | (0.514) | 0.749 | (0.806) | 1.180 | (1.259) | 1.505 | (1.362) | 1.904 | (2.098) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.380 | (0.430) | 0.436 | (0.452) | 0.478 | (0.483) | 0.543 | (0.532) | 0.799 | (0.867) | 1.198 | (1.265) | 1.488 | (1.349) | 1.919 | (2.093) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.616 | (0.677) | 0.657 | (0.712) | 0.691 | (0.709) | 0.730 | (0.764) | 0.869 | (1.021) | 1.218 | (1.317) | 1.506 | (1.382) | 1.915 | (2.085) |
| | $R$ | 0.890 | (0.971) | 0.924 | (0.989) | 0.934 | (0.946) | 0.964 | (0.966) | 1.033 | (1.076) | 1.297 | (1.358) | 1.546 | (1.505) | 1.924 | (2.036) |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.196 | (0.205) | 0.278 | (0.331) | 0.333 | (0.432) | 0.416 | (0.474) | 0.633 | (0.774) | 0.939 | (1.036) | 1.143 | (1.307) | 1.545 | (1.775) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.196 | (0.198) | 0.279 | (0.331) | 0.333 | (0.432) | 0.415 | (0.474) | 0.631 | (0.774) | 0.940 | (1.036) | 1.143 | (1.307) | 1.545 | (1.775) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.243 | (0.244) | 0.301 | (0.332) | 0.354 | (0.402) | 0.437 | (0.469) | 0.653 | (0.770) | 0.950 | (1.035) | 1.158 | (1.312) | 1.557 | (1.789) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.380 | (0.430) | 0.431 | (0.444) | 0.458 | (0.471) | 0.514 | (0.508) | 0.710 | (0.775) | 1.010 | (1.074) | 1.209 | (1.309) | 1.621 | (1.800) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.616 | (0.677) | 0.647 | (0.708) | 0.675 | (0.695) | 0.685 | (0.741) | 0.816 | (0.982) | 1.076 | (1.237) | 1.295 | (1.282) | 1.644 | (1.851) |
| | $R$ | 0.890 | (0.971) | 0.920 | (0.987) | 0.923 | (0.940) | 0.937 | (0.951) | 0.989 | (1.049) | 1.209 | (1.237) | 1.414 | (1.389) | 1.721 | (1.848) |

**Table A2.** Mean absolute error (MAE) and standard deviation (std) for the considered forecasting models: naive, autoregressive integrated moving average (ARIMA$(p,d,q)$), and univariate and multivariate empirical mode decomposition–support vector regression (EMD–SVR) with input vector $m=5$ lagged values. The smallest MAE of each forecast horizon is set in boldface.

| Group | Model | $h=1$ Mean | $h=1$ Std | $h=2$ Mean | $h=2$ Std | $h=3$ Mean | $h=3$ Std | $h=5$ Mean | $h=5$ Std | $h=10$ Mean | $h=10$ Std | $h=20$ Mean | $h=20$ Std | $h=30$ Mean | $h=30$ Std | $h=50$ Mean | $h=50$ Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | Naive | 0.147 | (0.186) | 0.256 | (0.322) | 0.328 | (0.418) | 0.423 | (0.514) | 0.651 | (0.801) | 0.991 | (1.053) | 1.143 | (1.294) | 1.559 | (1.803) |
| | ARIMA | 0.145 | (0.185) | 0.247 | (0.303) | 0.313 | (0.374) | 0.421 | (0.477) | 0.663 | (0.783) | 1.021 | (1.035) | 1.154 | (1.304) | 1.644 | (1.789) |
| | Direct SVR | 0.145 | (0.146) | 0.242 | (0.289) | 0.294 | (0.362) | 0.408 | (0.450) | 0.615 | (0.723) | 0.943 | (0.981) | 1.101 | (1.201) | 1.641 | (1.733) |
| | Recursive SVR | 0.145 | (0.146) | 0.257 | (0.357) | 0.393 | (0.566) | 0.577 | (0.841) | 1.080 | (1.922) | 1.228 | (1.939) | 1.338 | (2.054) | 1.627 | (2.366) |
| Direct EMD–SVR | Multivariate | 0.144 | (0.180) | 0.222 | (0.289) | 0.284 | (0.353) | 0.379 | (0.448) | 0.585 | (0.711) | 0.866 | (0.928) | 1.017 | (1.153) | 1.379 | (1.585) |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.120 | (0.146) | 0.181 | (0.226) | 0.234 | (0.278) | 0.371 | (0.426) | 0.557 | (0.667) | 0.874 | (0.896) | 1.024 | (1.046) | 1.430 | (1.601) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.124 | (0.138) | 0.182 | (0.228) | 0.234 | (0.286) | 0.371 | (0.421) | 0.557 | (0.658) | 0.879 | (0.898) | 1.023 | (1.049) | 1.433 | (1.602) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.173 | (0.184) | 0.217 | (0.240) | 0.247 | (0.283) | 0.373 | (0.387) | 0.556 | (0.637) | 0.886 | (0.910) | 1.029 | (1.053) | 1.441 | (1.611) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.277 | (0.324) | 0.316 | (0.331) | 0.334 | (0.349) | 0.389 | (0.390) | 0.544 | (0.604) | 0.795 | (0.848) | 0.916 | (0.930) | 1.298 | (1.425) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.449 | (0.495) | 0.471 | (0.519) | 0.491 | (0.511) | 0.503 | (0.536) | 0.594 | (0.714) | 0.788 | (0.903) | 0.927 | (0.916) | 1.258 | (1.393) |
| | $R$ | 0.655 | (0.709) | 0.675 | (0.725) | 0.679 | (0.694) | 0.688 | (0.707) | 0.729 | (0.782) | 0.788 | (0.811) | 0.908 | (0.876) | 1.167 | (1.234) |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.120 | (0.146) | 0.241 | (0.321) | 0.383 | (0.463) | 0.634 | (0.813) | 0.829 | (0.915) | 1.077 | (1.138) | 1.298 | (1.391) | 1.807 | (1.949) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.124 | (0.138) | 0.239 | (0.302) | 0.381 | (0.460) | 0.612 | (0.820) | 0.826 | (0.921) | 1.085 | (1.153) | 1.299 | (1.389) | 1.806 | (1.950) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.173 | (0.184) | 0.245 | (0.299) | 0.370 | (0.442) | 0.535 | (0.703) | 0.823 | (0.909) | 1.082 | (1.143) | 1.341 | (1.410) | 1.807 | (1.939) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.277 | (0.324) | 0.315 | (0.312) | 0.383 | (0.380) | 0.544 | (0.560) | 0.724 | (0.791) | 1.076 | (1.123) | 1.327 | (1.390) | 1.777 | (1.906) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.449 | (0.495) | 0.464 | (0.501) | 0.482 | (0.497) | 0.522 | (0.552) | 0.673 | (0.773) | 0.929 | (1.070) | 1.211 | (1.329) | 1.736 | (1.896) |
| | $R$ | 0.655 | (0.709) | 0.666 | (0.709) | 0.657 | (0.669) | 0.651 | (0.674) | 0.736 | (0.756) | 0.962 | (1.019) | 1.134 | (1.240) | 1.523 | (1.887) |

**Figure A1.** Mean absolute error (MAE) as a function of the forecast horizon for the considered forecasting models: naive, autoregressive integrated moving average (ARIMA$(p,d,q)$), and univariate and multivariate empirical mode decomposition–support vector regression (EMD–SVR) with input vector $m=1$ lagged values.

**Figure A2.** Mean absolute error (MAE) as a function of the forecast horizon for the considered forecasting models: naive, autoregressive integrated moving average (ARIMA$(p,d,q)$), and univariate and multivariate empirical mode decomposition–support vector regression (EMD–SVR) with input vector $m=5$ lagged values.

**Table A3.** Z-statistic of the Wilcoxon signed-rank test for the null hypothesis that the naive model is as accurate as the studied models: autoregressive integrated moving average (ARIMA$(p,d,q)$), and univariate and multivariate empirical mode decomposition–support vector regression (EMD–SVR) with input vector $m=1$. Top: direct strategy; bottom: recursive strategy. * Statistically significant at the 5% significance level. ** Statistically significant at the 1% significance level.

| Group | Model | $h=1$ | $h=2$ | $h=3$ | $h=5$ | $h=10$ | $h=20$ | $h=30$ | $h=50$ |
|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | ARIMA | −0.22 | 0.16 | 0.89 | 0.30 | −1.45 | −1.59 | −0.49 | −1.47 |
| | Direct SVR | −4.72 ** | −3.05 ** | −2.54 * | −3.24 ** | −3.47 ** | −2.75 ** | −3.35 ** | −3.39 ** |
| | Recursive SVR | −4.72 ** | −1.83 | −0.44 | −0.58 | 0.03 | 1.01 | 1.37 | 0.27 |
| Direct EMD–SVR | Multivariate | −1.83 | −0.80 | −0.39 | 0.94 | 0.22 | 1.37 | −0.18 | 1.41 |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | −4.07 ** | −3.07 ** | −3.11 * | −2.96 ** | −2.72 ** | −2.11 * | −3.70 ** | −2.32 * |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | −4.21 ** | −3.12 ** | −2.84 ** | −2.88 ** | −2.73 ** | −2.12 * | −3.71 ** | −2.32 * |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −5.62 ** | −3.18 ** | −2.99 ** | −2.60 ** | −2.69 ** | −2.12 * | −3.74 ** | −2.37 * |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −7.25 ** | −5.73 ** | −4.65 ** | −3.48 ** | −3.37 ** | −2.33 * | −3.51 ** | −2.45 * |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −8.35 ** | −7.51 ** | −6.36 ** | −4.99 ** | −3.84 ** | −2.46 * | −3.20 ** | −2.47 * |
| | $R$ | −9.06 ** | −8.14 ** | −7.84 ** | −7.00 ** | −5.51 ** | −3.10 ** | −3.66 ** | −2.66 ** |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | −4.07 ** | −1.96 | −0.41 | −0.21 | 0.63 | 1.95 | 0.90 | 1.55 |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | −4.21 ** | −2.08 * | −0.62 | −0.13 | 0.76 | 1.93 | 0.92 | 1.51 |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −5.62 ** | −2.82 ** | −1.76 | −1.11 | −0.09 | 1.44 | 0.43 | 1.34 |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −7.25 ** | −5.55 ** | −4.48 ** | −3.38 ** | −2.17 * | −1.09 | −1.05 | −0.10 |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −8.35 ** | −7.33 ** | −6.28 ** | −4.46 ** | −2.95 ** | −1.18 | −1.90 | 0.11 |
| | $R$ | −9.06 ** | −8.13 ** | −7.77 ** | −6.78 ** | −5.25 ** | −2.26 * | −2.84 ** | −0.86 |

**Table A4.** Z-statistic of the Wilcoxon signed-rank test for the null hypothesis that the naive model is as accurate as the studied models: autoregressive integrated moving average (ARIMA$(p,d,q)$), and univariate and multivariate empirical mode decomposition–support vector regression (EMD–SVR) with input vector $m=5$. Top: direct strategy; bottom: recursive strategy. * Statistically significant at the 5% significance level. ** Statistically significant at the 1% significance level.

| Group | Model | $h=1$ | $h=2$ | $h=3$ | $h=5$ | $h=10$ | $h=20$ | $h=30$ | $h=50$ |
|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | ARIMA | −0.22 | 0.16 | 0.89 | 0.30 | −1.45 | −1.59 | −0.49 | −1.47 |
| | Direct SVR | −0.45 | 1.08 | 1.54 | 0.72 | 0.57 | 1.24 | 0.67 | 0.14 |
| | Recursive SVR | −0.45 | 1.53 | 1.15 | 0.87 | 0.36 | 1.79 | 1.59 | 2.29 * |
| Direct EMD–SVR | Multivariate | 0.20 | 5.58 ** | 4.70 ** | 4.57 ** | 5.67 ** | 8.43 ** | 7.70 ** | 7.66 ** |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 3.33 ** | 6.71 ** | 6.41 ** | 3.17 ** | 2.79 ** | 2.79 ** | 1.61 | 2.39 ** |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 2.49 * | 5.94 ** | 6.24 ** | 3.21 ** | 2.64 ** | 2.72 ** | 1.70 | 2.37 * |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −1.94 | 1.88 | 3.93 ** | 1.75 | 1.94 | 2.54 * | 1.66 | 2.31 * |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −5.66 ** | −2.31 * | −0.47 | 0.59 | 2.57 * | 4.02 ** | 3.99 ** | 3.55 ** |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −7.45 ** | −5.54 ** | −3.82 ** | −1.76 | 0.61 | 3.56 ** | 3.42 ** | 3.98 ** |
| | $R$ | −8.59 ** | −7.12 ** | −6.28 ** | −4.12 ** | −2.12 * | 3.37 ** | 3.19 ** | 4.10 ** |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 3.33 ** | 1.43 | −0.51 | −2.26 * | −1.31 | −0.15 | −0.95 | −1.24 |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 2.49 * | 1.45 | −0.49 | −1.98 * | −1.23 | −0.08 | −1.02 | −1.19 |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −1.94 | 1.03 | 0.11 | −1.32 | −1.38 | −0.22 | −1.33 | −1.31 |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −5.66 ** | −2.30 * | −1.72 | −2.38 * | −0.73 | 0.07 | −1.18 | −1.28 |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −7.45 ** | −5.59 ** | −3.86 ** | −2.01 * | −0.03 | 2.14 * | 0.34 | −1.00 |
| | $R$ | −8.59 ** | −7.13 ** | −6.06 ** | −3.86 ** | −2.11 * | 1.55 | 1.04 | 1.22 |

## References

- Alexander, Carol. 2001. Market Models: A Guide to Financial Data Analysis. New York: John Wiley & Sons.
- Aymanns, Christoph, Fabio Caccioli, J. Doyne Farmer, and Vincent W.C. Tan. 2016. Taming the Basel leverage cycle. Journal of Financial Stability 27: 263–77.
- Bennett, Kristin P., Jing Hu, Xiaoyun Ji, G. Kunapuli, and Jong-Shi Pang. 2006. Model selection via bilevel optimization. Paper presented at the IJCNN ’06 International Joint Conference on Neural Networks, Vancouver, BC, Canada, July 6–21; pp. 1922–29.
- Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. 1994. Time Series Analysis: Forecasting and Control, 3rd ed. Michigan: Prentice Hall.
- Brockwell, Peter J., and Richard A. Davis. 2002. Introduction to Time Series and Forecasting, 2nd ed. New York: Springer.
- Brooks, Chris. 2014. Introductory Econometrics for Finance. Cambridge: Cambridge University Press.
- Caccioli, Fabio, Imre Kondor, Matteo Marsili, and Susanne Still. 2016. Liquidity risk and instabilities in portfolio optimization. International Journal of Theoretical and Applied Finance 19: 1650035.
- Chen, Chun-Fu, Ming-Cheng Lai, and Ching-Chiang Yeh. 2012. Forecasting tourism demand based on empirical mode decomposition and neural network. Knowledge-Based Systems 26: 281–87.
- Cheng, Ching-Hsue, and Liang-Ying Wei. 2014. A novel time-series model based on empirical mode decomposition for forecasting TAIEX. Economic Modelling 36: 136–41.
- Clements, Michael P., Philip Hans Franses, and Norman R. Swanson. 2004. Forecasting economic and financial time-series with non-linear models. International Journal of Forecasting 20: 169–83.
- Cristianini, Nello, and John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press, ISBN 0-521-78019-5.
- Di Matteo, Tiziana. 2007. Multi-scaling in finance. Quantitative Finance 7: 21–36.
- Flandrin, Patrick, and Paulo Gonçalves. 2004. Empirical mode decompositions as data-driven wavelet-like expansions. International Journal of Wavelets, Multiresolution and Information Processing 2: 477–96.
- Huang, Norden E., Zheng Shen, Steven R. Long, Manli C. Wu, Hsing H. Shih, Quanan Zheng, Nai-Chyuan Yen, Chi Chao Tung, and Henry H. Liu. 1998. The empirical mode decomposition and the Hilbert spectrum for non-linear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454: 903–95.
- Hyndman, Rob J., and Yeasmin Khandakar. 2008. Automatic time series forecasting: The forecast package for R. Journal of Statistical Software 26: 1–22.
- Kazem, Ahmad, Ebrahim Sharifi, Farookh Khadeer Hussain, Morteza Saberi, and Omar Khadeer Hussain. 2013. Support vector regression with chaos-based firefly algorithm for stock market price forecasting. Applied Soft Computing 13: 947–58.
- Kim, Kyoung-Jae. 2003. Financial time series forecasting using support vector machines. Neurocomputing 55: 307–19.
- Lin, Chiun-Sin, Sheng-Hsiung Chiu, and Tzu-Yu Lin. 2012. Empirical mode decomposition based least squares support vector regression for foreign exchange rate forecasting. Economic Modelling 29: 2583–90.
- Liu, Hui, Chao Chen, Hong-Qi Tian, and Yan-Fei Li. 2012. A hybrid model for wind speed prediction using empirical mode decomposition and artificial neural networks. Renewable Energy 48: 545–56.
- Lu, Chi-Jie, Tian-Shyug Lee, and Chih-Chou Chiu. 2009. Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems 47: 115–25.
- Lu, Chi-Jie, and Yuehjen E. Shao. 2012. Forecasting computer products sales by integrating ensemble empirical mode decomposition and extreme learning machine. Mathematical Problems in Engineering 2012: 831201.
- Montgomery, Douglas C., Cheryl L. Jennings, and Murat Kulahci. 2008. Introduction to Time Series Analysis and Forecasting. Wiley Series in Probability and Statistics; New York: Wiley.
- Nava, Noemi, Tiziana Di Matteo, and Tomaso Aste. 2016a. Time-dependent scaling patterns in high frequency financial data. The European Physical Journal Special Topics 225: 1997–2016.
- Nava, Noemi, Tiziana Di Matteo, and Tomaso Aste. 2016b. Anomalous volatility scaling in high frequency financial data. Physica A: Statistical Mechanics and its Applications 447: 434–45.
- Nava, Noemi, Tiziana Di Matteo, and Tomaso Aste. 2017. Dynamic correlations at different time-scales with empirical mode decomposition. arXiv, arXiv:1708.06586.
- Rilling, Gabriel, Patrick Flandrin, and Paulo Gonçalves. 2003. On empirical mode decomposition and its algorithms. Paper presented at the IEEE EURASIP Workshop on Nonlinear Signal and Image Processing NSIP03, Grado, Italy, June; Rocquencourt: Inria, pp. 8–11.
- Schölkopf, Bernhard, Ralf Herbrich, and Alex J. Smola. 2001. A generalized representer theorem. In Computational Learning Theory. Berlin and Heidelberg: Springer, pp. 416–26.
- Smola, Alex J., and Bernhard Schölkopf. 2004. A tutorial on support vector regression. Statistics and Computing 14: 199–222.
- Suykens, Johan A.K., Jos De Brabanter, Lukas Lukas, and Joos Vandewalle. 2002. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing 48: 85–105.
- Tay, F. E., and L. Cao. 2001. Application of support vector machines in financial time series forecasting. Omega 29: 309–17.
- Varga-Haszonits, Istvan, Fabio Caccioli, and Imre Kondor. 2016. Replica approach to mean-variance portfolio optimization. Journal of Statistical Mechanics: Theory and Experiment 2016: 123404.
- Wang, Jujie, Wenyu Zhang, Yaning Li, Jianzhou Wang, and Zhangli Dang. 2014. Forecasting wind speed using empirical mode decomposition and Elman neural network. Applied Soft Computing 23: 452–59.
- Wilcoxon, Frank. 1945. Individual comparisons by ranking methods. Biometrics Bulletin 1: 80–83.
- Willmott, Cort J., and Kenji Matsuura. 2005. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research 30: 79–82.
- Yu, Lean, Shouyang Wang, and Kin Keung Lai. 2008. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Economics 30: 2623–35.
- Zeng, Qingcheng, and Chenrui Qu. 2014. An approach for Baltic Dry Index analysis based on empirical mode decomposition. Maritime Policy & Management 41: 224–40.

**Figure 1.** Values of the S&P 500 index for the trading day of 7 August 2014. The first 5 min of the trading day are not included.

**Figure 2.** True (black lines) and forecasted (red and blue lines) intrinsic mode functions (IMFs) and residue extracted from the S&P 500 index for 7 August 2014 shown in Figure 1. The forecasted values were obtained using the univariate empirical mode decomposition–support vector regression (EMD–SVR) model, using both the recursive (blue line) and the direct (red line) strategies. We note that the black lines end when forecasting begins. EMD was performed on the training set only.

**Figure 3.** True and forecasted values for the S&P 500 index for 7 August 2014 shown in Figure 1. Forecasted values were obtained using partial reconstructions of the univariate empirical mode decomposition–support vector regression (EMD–SVR) model (Equations (5)–(8)), with both the recursive (blue line) and the direct (red line) strategies. We note that the EMD was performed on the training set only (i.e., using only data before forecasting begins).

**Figure 4.** True (black line) and forecasted (red line) values for the S&P 500 index. Forecasted values were obtained using the multivariate empirical mode decomposition–support vector regression (EMD–SVR) model (Equation (9)). The true S&P 500 index values are the same as in Figure 1 and Figure 3. We note that the EMD was performed on the training set only (i.e., using only data before forecasting begins).

**Figure 5.** Mean absolute error (MAE) as a function of the forecast horizon for all forecasting models: naive, ARIMA$(p,d,q)$, SVR on the original data, and univariate and multivariate EMD–SVR with input vector $m=p+d$ lagged values. A smaller MAE indicates a better forecast. The direct univariate EMD–SVR models with different numbers of components (Equations (5)–(8)) outperform all benchmarks at all steps ahead. The multivariate EMD–SVR model also outperforms all benchmarks for $h\ge 5$.

**Table 1.** Order of the autoregressive integrated moving average (ARIMA$(p,d,q)$) models fitted to each intrinsic mode function (IMF) and to the residue. The number of lagged values $m=p+d$ was used to construct the input vectors for the empirical mode decomposition–support vector regression (EMD–SVR) models.

| Component | $p$ | $q$ | $d$ | $m$ |
|---|---|---|---|---|
| $\mathrm{IMF}_1$ | 2 | 1 | 0 | 2 |
| $\mathrm{IMF}_2$ | 2 | 5 | 0 | 2 |
| $\mathrm{IMF}_3$ | 5 | 1 | 0 | 5 |
| $\mathrm{IMF}_4$ | 5 | 3 | 0 | 5 |
| $\mathrm{IMF}_5$ | 0 | 3 | 1 | 1 |
| Residue | 2 | 2 | 2 | 4 |

**Table 2.** Mean and standard deviation (std) of the mean absolute error (MAE) computed over all 128 days for all the forecasting models: naive, ARIMA$(p,d,q)$, direct and recursive SVR on the original data, and direct and recursive univariate and multivariate EMD–SVR with input vector $m=p+d$ lagged values, the same input vector as the ARIMA$(p,d,q)$ model. A smaller MAE indicates a better forecast. The smallest MAE of each forecast horizon is set in boldface. The values marked with a dagger (†) indicate the smallest MAE of each horizon across all the models with different input vector $m$ (see Table A1 and Table A2).

| Group | Model | $h=1$ Mean | $h=1$ Std | $h=2$ Mean | $h=2$ Std | $h=3$ Mean | $h=3$ Std | $h=5$ Mean | $h=5$ Std | $h=10$ Mean | $h=10$ Std | $h=20$ Mean | $h=20$ Std | $h=30$ Mean | $h=30$ Std | $h=50$ Mean | $h=50$ Std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | Naive | 0.147 | (0.186) | 0.256 | (0.322) | 0.328 | (0.418) | 0.423 | (0.514) | 0.651 | (0.801) | 0.991 | (1.053) | 1.143 | (1.294) | 1.559 | (1.803) |
| | ARIMA | 0.145 | (0.185) | 0.247 | (0.303) | 0.313 | (0.374) | 0.421 | (0.477) | 0.663 | (0.783) | 1.021 | (1.035) | 1.154 | (1.304) | 1.644 | (1.789) |
| | Direct SVR | 0.137 | (0.139) | 0.214 | (0.275) | 0.268 | (0.334) | 0.372 | (0.409) | 0.594 | (0.722) | 0.869 | (0.974) | 1.017 | (1.056) | 1.485 | (1.596) |
| | Recursive SVR | 0.137 | (0.139) | 0.256 | (0.496) | 0.310 | (0.450) | 0.478 | (1.174) | 0.688 | (1.023) | 0.998 | (1.566) | 1.053 | (1.490) | 1.346 | (1.725) |
| Direct EMD–SVR | Multivariate | 0.141 | (0.178) | 0.224 | (0.281) | 0.299 | (0.356) | 0.369 | (0.445) | 0.571 | (0.692) | 0.858 | (0.916) | 1.001 | (1.120) | 1.364 | (1.577) |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.116 † | (0.132) | 0.178 | (0.211) | 0.232 | (0.267) | 0.350 | (0.382) | 0.530 | (0.613) | 0.829 | (0.839) | 0.990 | (1.034) | 1.362 | (1.620) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.118 | (0.124) | 0.175 † | (0.210) | 0.229 † | (0.271) | 0.350 † | (0.379) | 0.530 † | (0.609) | 0.826 | (0.842) | 0.989 | (1.033) | 1.363 | (1.621) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.162 | (0.172) | 0.204 | (0.225) | 0.235 | (0.265) | 0.353 | (0.364) | 0.538 | (0.599) | 0.826 | (0.846) | 0.994 | (1.040) | 1.368 | (1.620) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.261 | (0.303) | 0.298 | (0.308) | 0.315 | (0.327) | 0.369 | (0.361) | 0.516 | (0.567) | 0.736 † | (0.801) | 0.892 | (0.890) | 1.215 | (1.399) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.421 | (0.463) | 0.441 | (0.486) | 0.462 | (0.479) | 0.478 | (0.506) | 0.574 | (0.678) | 0.746 | (0.853) | 0.883 | (0.869) | 1.185 | (1.363) |
| | $R$ | 0.614 | (0.664) | 0.632 | (0.679) | 0.636 | (0.650) | 0.646 | (0.662) | 0.684 | (0.734) | 0.739 | (0.761) | 0.856 † | (0.830) | 1.097 † | (1.166) |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 0.116 | (0.132) | 0.222 | (0.251) | 0.356 | (0.457) | 0.518 | (0.636) | 0.645 | (0.692) | 0.981 | (1.177) | 1.140 | (1.192) | 1.642 | (1.960) |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 0.118 | (0.124) | 0.222 | (0.246) | 0.361 | (0.465) | 0.511 | (0.637) | 0.646 | (0.693) | 0.979 | (1.178) | 1.141 | (1.193) | 1.641 | (1.962) |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | 0.162 | (0.172) | 0.234 | (0.253) | 0.361 | (0.459) | 0.468 | (0.533) | 0.651 | (0.731) | 0.968 | (1.191) | 1.187 | (1.271) | 1.652 | (1.910) |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | 0.261 | (0.303) | 0.306 | (0.304) | 0.346 | (0.333) | 0.464 | (0.438) | 0.619 | (0.680) | 0.933 | (0.992) | 1.146 | (1.183) | 1.594 | (1.822) |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | 0.421 | (0.463) | 0.439 | (0.481) | 0.459 | (0.464) | 0.477 | (0.493) | 0.592 | (0.674) | 0.873 | (0.980) | 1.053 | (1.225) | 1.494 | (1.820) |
| | $R$ | 0.614 | (0.664) | 0.627 | (0.679) | 0.622 | (0.645) | 0.617 | (0.651) | 0.693 | (0.709) | 0.935 | (0.974) | 1.083 | (1.239) | 1.377 | (1.784) |
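
The way a daily MAE and its mean and standard deviation across days could be computed is sketched below for the naive benchmark. This is an illustration under assumptions that the table does not settle: the forecast origins used within each day, the use of the sample standard deviation, and all function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def naive_forecast(history: Sequence[float], h: int) -> float:
    """Naive benchmark: the forecast for any horizon h is the last observed value."""
    return history[-1]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def daily_naive_mae(day: Sequence[float], h: int, start: int) -> float:
    """MAE of the naive h-step forecast over all origins t >= start within one day."""
    actual, predicted = [], []
    for t in range(start, len(day) - h):
        predicted.append(naive_forecast(day[: t + 1], h))
        actual.append(day[t + h])
    return mean_absolute_error(actual, predicted)


# Two toy "days" of index values (hypothetical data).
days = [
    [10.0, 10.5, 10.0, 11.0, 11.5, 11.0],
    [20.0, 20.0, 21.0, 21.0, 22.0, 22.0],
]
maes = [daily_naive_mae(day, h=1, start=2) for day in days]
print(mean(maes), stdev(maes))
```

Replacing `naive_forecast` with any other forecasting function yields the corresponding row of such a table, one column pair per horizon.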

**Table 3.** Z-statistic of the Wilcoxon signed-rank test for the difference between the naive model and the other models: autoregressive integrated moving average (ARIMA$(p,d,q)$), direct and recursive support vector regression (SVR) on the original data, and univariate and multivariate empirical mode decomposition–SVR (EMD–SVR) with input vector $m=p+d$. Positive values indicate better performance than the naive model; negative values indicate worse performance. The larger the value, the more significant the outperformance with respect to the naive model. Statistics were computed over all 128 days in the dataset. Best-performing models for each step ahead $h$ are highlighted in boldface. * Statistically significant at the 5% significance level. ** Statistically significant at the 1% significance level.

| Group | Model | $h=1$ | $h=2$ | $h=3$ | $h=5$ | $h=10$ | $h=20$ | $h=30$ | $h=50$ |
|---|---|---|---|---|---|---|---|---|---|
| Benchmarks | ARIMA | −0.22 | 0.16 | 0.89 | 0.30 | −1.45 | −1.59 | −0.49 | −1.47 |
| | Direct SVR | 0.57 | 2.81 ** | 2.58 ** | 1.94 | 1.37 | 2.41 * | 1.69 | 1.56 |
| | Recursive SVR | 0.57 | 3.12 ** | 2.58 ** | 3.24 ** | 2.66 ** | 3.57 ** | 3.67 ** | 4.18 ** |
| Direct EMD–SVR | Multivariate | 1.27 | 4.27 ** | 2.93 ** | 5.43 ** | 6.07 ** | 8.31 ** | 7.77 ** | 8.52 ** |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 3.40 ** | 6.24 ** | 6.00 ** | 3.61 ** | 3.14 ** | 3.12 ** | 2.51 * | 3.12 ** |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 2.80 ** | 6.05 ** | 6.28 ** | 3.67 * | 3.08 * | 3.16 ** | 2.54 ** | 3.06 ** |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −1.28 | 2.61 ** | 4.51 ** | 2.50 * | 2.54 * | 3.15 ** | 2.51 * | 3.01 ** |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −5.34 ** | −1.67 | 0.20 | 1.30 | 3.41 ** | 4.79 ** | 4.08 ** | 4.20 ** |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −7.22 ** | −5.10 ** | −3.32 ** | −1.28 | 0.93 | 4.16 ** | 3.63 ** | 4.53 ** |
| | $R$ | −8.45 ** | −6.88 ** | −5.82 ** | −3.65 ** | −1.44 | 2.73 ** | 2.38 * | 3.38 ** |
| Recursive EMD–SVR | Multivariate | — | — | — | — | — | — | — | — |
| | $R+\sum_{i=1}^{5}\mathrm{IMF}_i$ | 3.40 ** | 1.83 | 0.75 | −1.47 | 0.49 | 1.22 | 0.18 | −0.40 |
| | $R+\sum_{i=2}^{5}\mathrm{IMF}_i$ | 2.80 ** | 1.69 | 0.57 | −1.33 | 0.45 | 1.26 | 0.19 | −0.38 |
| | $R+\sum_{i=3}^{5}\mathrm{IMF}_i$ | −1.28 | 1.26 | 0.68 | −0.41 | 0.57 | 1.55 | −0.26 | −0.57 |
| | $R+\sum_{i=4}^{5}\mathrm{IMF}_i$ | −5.34 ** | −1.93 | −0.84 | −1.23 | 0.78 | 2.02 * | −0.09 | −0.36 |
| | $R+\sum_{i=5}^{5}\mathrm{IMF}_i$ | −7.22 ** | −5.17 ** | −3.44 ** | −1.30 | 1.08 | 3.07 ** | 1.87 | 0.93 |
| | $R$ | −8.45 ** | −6.89 ** | −5.71 ** | −3.25 ** | −1.41 | 2.10 * | 1.84 | 2.15 * |
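
A Z-statistic of this kind can be obtained from the normal approximation to the Wilcoxon signed-rank test, as in the following Python sketch. It is a generic textbook version, not necessarily the exact procedure behind the table: the pairing of daily errors, the sign convention (positive when the model errors are smaller than the naive errors), the handling of zero differences, and the tie correction are assumptions, and the function names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_z(errors_naive: Sequence[float], errors_model: Sequence[float]) -> float:
    """Signed-rank Z-statistic; positive when the model errors tend to be smaller."""
    diffs = [a - b for a, b in zip(errors_naive, errors_model) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0
    # Tie correction: subtract (t^3 - t) / 48 for each group of t tied |d| values.
    counts: Dict[float, int] = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    variance -= sum(t ** 3 - t for t in counts.values()) / 48.0
    return (w_plus - expected) / math.sqrt(variance)


# Toy daily errors (hypothetical): the model halves the naive error on every day.
naive_errors = [1.0, 2.0, 3.0, 4.0, 5.0]
model_errors = [0.5, 1.0, 1.5, 2.0, 2.5]
z_stat = wilcoxon_z(naive_errors, model_errors)
print(round(z_stat, 3))
```

Under the two-sided normal approximation, $|Z|>1.96$ corresponds to the 5% level and $|Z|>2.576$ to the 1% level.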

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Nava, N.; Di Matteo, T.; Aste, T.
Financial Time Series Forecasting Using Empirical Mode Decomposition and Support Vector Regression. *Risks* **2018**, *6*, 7.
https://doi.org/10.3390/risks6010007
