# Variable Selection in Time Series Forecasting Using Random Forests


## Abstract


## 1. Introduction

#### 1.1. Time Series Forecasting and Random Forests

#### 1.2. A Framework to Assess the Performance of Random Forests in Time Series Forecasting

#### 1.3. Aim of the Study

## 2. Methods and Data

#### 2.1. Methods

#### 2.1.1. Definition of ARMA and ARFIMA Models

A time series in discrete time is a sequence of observations $x_1, x_2, \ldots$ of a certain phenomenon, while the time $t$ is stated as a subscript to each value $x_t$. A time series can be modelled by a stochastic process. The latter is a sequence of random variables $\underline{x}_1, \underline{x}_2, \ldots$. Random variables are underlined according to the notation used in [53].

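Let us consider a stationary stochastic process with mean

$$\mu := \mathrm{E}[\underline{x}_t], \tag{1}$$

and standard deviation

$$\sigma := (\mathrm{Var}[\underline{x}_t])^{0.5} \tag{2}$$

The covariance between $\underline{x}_t$ and $\underline{x}_{t+k}$, $\gamma_k$, of the stochastic process is defined by:

$$\gamma_k := \mathrm{E}[(\underline{x}_t - \mu)(\underline{x}_{t+k} - \mu)] \tag{3}$$

The correlation between $\underline{x}_t$ and $\underline{x}_{t+k}$, $\rho_k$, of the stochastic process is defined by:

$$\rho_k := \gamma_k/\sigma^2 \tag{4}$$

A stochastic process $\{\underline{a}_t\}$ is called a white noise process, if it is a sequence of uncorrelated random variables. Let us consider hereinafter that the white noise is a variable with zero mean, unless mentioned otherwise, and standard deviation $\sigma_a$.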

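Let us define the stochastic process $\{\underline{y}_t\}$ by:

$$\underline{y}_t := \underline{x}_t - \mu \tag{5}$$

The backshift operator $B$ is defined by:

$$B^j \underline{x}_t = \underline{x}_{t-j} \tag{6}$$

The autoregressive operator $\varphi_p(B)$ is defined by:

$$\varphi_p(B) := (1 - \varphi_1 B - \ldots - \varphi_p B^p) \tag{7}$$

The stochastic process $\{\underline{y}_t\}$ is an autoregressive AR($p$) model, if:

$$\varphi_p(B)\,\underline{y}_t = \underline{a}_t, \tag{8}$$

which can be written as:

$$\underline{y}_t = \varphi_1 \underline{y}_{t-1} + \ldots + \varphi_p \underline{y}_{t-p} + \underline{a}_t \tag{9}$$

Let us also consider the moving average operator $\theta_q(B)$, which is defined by:

$$\theta_q(B) := (1 + \theta_1 B + \ldots + \theta_q B^q) \tag{10}$$

The stochastic process $\{\underline{y}_t\}$ is a moving average MA($q$) model, if:

$$\underline{y}_t = \theta_q(B)\,\underline{a}_t, \tag{11}$$

which can be written as:

$$\underline{y}_t = \underline{a}_t + \theta_1 \underline{a}_{t-1} + \ldots + \theta_q \underline{a}_{t-q} \tag{12}$$

The stochastic process $\{\underline{y}_t\}$ is an autoregressive moving average ARMA($p$, $q$) model, if

$$\varphi_p(B)\,\underline{y}_t = \theta_q(B)\,\underline{a}_t, \tag{13}$$

which can be written as:

$$\underline{y}_t = \varphi_1 \underline{y}_{t-1} + \ldots + \varphi_p \underline{y}_{t-p} + \underline{a}_t + \theta_1 \underline{a}_{t-1} + \ldots + \theta_q \underline{a}_{t-q} \tag{14}$$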
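The expanded ARMA(p, q) recursion defined above can be sketched in a few lines of Python. This is only an illustration, not the simulation procedure of the study: the function name `simulate_arma`, the Gaussian white noise, and the burn-in period used to discard the influence of the zero initial values are all assumptions made for the example.

```python
import random


def simulate_arma(phi, theta, n, sigma_a=1.0, burn_in=200, seed=None):
    """Simulate n values of a zero-mean ARMA(p, q) process y_t.

    phi   : [phi_1, ..., phi_p] autoregressive parameters
    theta : [theta_1, ..., theta_q] moving average parameters
    Assumption: the white noise a_t is Gaussian with standard deviation sigma_a.
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    total = n + burn_in
    a = [rng.gauss(0.0, sigma_a) for _ in range(total)]  # white noise a_t
    y = [0.0] * total
    for t in range(total):
        # phi_1 y_{t-1} + ... + phi_p y_{t-p} (terms before the start are skipped)
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # theta_1 a_{t-1} + ... + theta_q a_{t-q}
        ma = sum(theta[j] * a[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = ar + a[t] + ma
    return y[burn_in:]  # drop the burn-in values


# Example: 100 values of an AR(1) process with phi_1 = 0.6
series = simulate_arma([0.6], [], 100, seed=1)
```

Adding a constant μ to every simulated value gives a series for the original process rather than for the centred one.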

The stochastic process $\{\underline{x}_t\}$ is an ARFIMA($p$, $d$, $q$) model, if

$$\varphi_p(B)(1 - B)^d\,\underline{x}_t = \theta_q(B)\,\underline{a}_t \tag{15}$$
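For non-integer d, the operator (1 − B)^d is understood through its binomial series, (1 − B)^d = π_0 + π_1 B + π_2 B² + …, with π_0 = 1 and π_j = π_{j−1} (j − 1 − d)/j. The sketch below computes the first few coefficients; the function name `frac_diff_weights` and the truncation to a finite number of terms are assumptions made for illustration only.

```python
def frac_diff_weights(d, n_weights):
    """Return [pi_0, ..., pi_{n_weights-1}] of the series (1 - B)^d = sum_j pi_j B^j."""
    weights = [1.0]  # pi_0 = 1
    for j in range(1, n_weights):
        # Recursion pi_j = pi_{j-1} * (j - 1 - d) / j from the binomial series
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights


# d = 0.4 is the value used in the ARFIMA simulation experiments
weights_d04 = frac_diff_weights(0.4, 4)
# d = 1 recovers ordinary differencing, 1 - B
weights_d1 = frac_diff_weights(1.0, 4)
```

With d = 0.4 the coefficients decay slowly instead of vanishing after the first lag, which reflects the long-range dependence that distinguishes ARFIMA from ARMA models.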

#### 2.1.2. Simulation of ARMA and ARFIMA Models

#### 2.1.3. Forecasting Using ARMA and ARFIMA Models

The aim is to forecast $x_{n+1}$. Let $x_n$ and $\psi$ represent the last observation and the forecast of $x_{n+1}$, respectively. The methods using ARMA and ARFIMA models can be used as benchmarks in the simulation experiments. In fact, these methods are expected to perform better than the rest when applied to the synthetic time series, since the latter are simulated using ARMA or ARFIMA models (see Section 2.1.8). We examine two cases.

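In the first case, an ARMA model is fitted to the time series, which includes estimating the parameters $\varphi_1, \ldots, \varphi_p, \theta_1, \ldots, \theta_q$ of the models. We use the fitted ARMA model in forecast mode through the built-in R function predict [52].

In the second case, an ARFIMA model is fitted to the time series, which includes estimating the parameters $\varphi_1, \ldots, \varphi_p, \theta_1, \ldots, \theta_q$ of the models. The order selection and parameter estimation procedures are explained, for example, in [57] (Chapter 8.6). We use the fitted ARFIMA model in forecast mode through the forecast function of the forecast R package.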
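To make the idea of using a fitted model "in forecast mode" concrete, the following sketch fits the simplest member of the family, an AR(1) model, and returns the one-step ahead forecast ψ = μ + φ_1 (x_n − μ). It is not the R procedure referred to above: the estimator is the simple moment (lag-1 autocorrelation) estimator, swapped in here for brevity, and the names `fit_ar1` and `forecast_ar1` are hypothetical.

```python
def fit_ar1(x):
    """Estimate the mean and phi_1 of an AR(1) model by the method of moments.

    Assumption: phi_1 is estimated by the lag-1 sample autocorrelation,
    which is a simple stand-in for a full likelihood-based fit.
    """
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:  # constant series: no dependence can be estimated
        return mu, 0.0
    c1 = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return mu, c1 / c0


def forecast_ar1(x):
    """One-step ahead forecast psi of x_{n+1} from the fitted AR(1) model."""
    mu, phi1 = fit_ar1(x)
    return mu + phi1 * (x[-1] - mu)


psi = forecast_ar1([1.0, 2.0, 3.0, 4.0, 5.0])
```

The forecast pulls the last observation towards the sample mean, by an amount controlled by the estimated φ_1.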

#### 2.1.4. Forecasting Using Naïve Methods

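The first naïve method (naïve1) sets the forecast equal to the last observed value:

$$\psi = x_n \tag{16}$$

The second naïve method (naïve2) sets the forecast equal to the mean of the fitted set:

$$\psi = (x_1 + \ldots + x_n)/n \tag{17}$$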
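Both naïve forecasts are one-liners. The sketch below is only illustrative, and the function names `naive1` and `naive2` simply mirror the abbreviations used in Table 1.

```python
def naive1(x):
    """Forecast equal to the last observed value x_n."""
    return x[-1]


def naive2(x):
    """Forecast equal to the mean of the fitted set, (x_1 + ... + x_n) / n."""
    return sum(x) / len(x)


observations = [2.0, 4.0, 9.0]
psi_last = naive1(observations)
psi_mean = naive2(observations)
```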

#### 2.1.5. Forecasting Using the Theta Method

#### 2.1.6. Random Forests

Let $\underline{v}$ be the random variable to be predicted and $\underline{\mathbf{u}}$ the vector of predictors, where $\underline{\mathbf{u}}$ is a random vector with $k$ elements. The aim is to predict $\underline{v}$ by estimating the regression function:

$$m(\mathbf{u}) = \mathrm{E}[\underline{v} \mid \underline{\mathbf{u}} = \mathbf{u}] \tag{18}$$

given a fitting sample

$$S_s = ((\underline{\mathbf{u}}_1, \underline{v}_1), \ldots, (\underline{\mathbf{u}}_s, \underline{v}_s)) \tag{19}$$

of independent random variables distributed as the pair $(\underline{\mathbf{u}}, \underline{v})$. Therefore, the aim is to construct an estimate $m_s$ of the function $m$.

A random forest consists of $M$ randomized regression trees. For the $j$-th tree, the predicted value at the point $\mathbf{u}$ is denoted by $m_s(\mathbf{u}; \underline{\theta}_j, S_s)$, where $\underline{\theta}_1, \ldots, \underline{\theta}_M$ are independent random variables, distributed as $\underline{\theta}$ and independent of $S_s$. The random variable $\underline{\theta}$ is used to resample the fitting set prior to the growing of individual trees and to select the successive directions for splitting. The prediction is then given by the average of the predicted values of all trees.

Before growing each tree, $b_s$ observations are randomly chosen from the elements of $\mathbf{u}$. These observations are used for growing the tree. At each cell of the tree, a split is performed by maximizing the CART-criterion (defined in [20]): mtry variables are selected randomly among the $k$ original ones, the best variable/split point among the mtry is picked, and the node is split into two daughter nodes. The growing of the tree is stopped when each cell contains fewer than nodesize points.

The parameters of the algorithm are $b_s \in \{1, \ldots, s\}$, mtry $\in \{1, \ldots, k\}$, nodesize $\in \{1, \ldots, b_s\}$, and $M \in \{1, 2, \ldots\}$. Most studies agree that increasing the number of trees does not decrease the predictive performance; however, it increases the computational cost. Oshiro et al. [32] suggest a range between 64 and 128 trees in a forest based on experiments. Kuhn and Johnson [34] (p. 200) suggest using at least 1000 trees. Probst and Boulesteix [33] suggest that the highest performance gain is achieved when training 100 trees. In the present study, we use M = 500 trees.
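The algorithm described above can be sketched compactly in plain Python. This is a teaching sketch under stated assumptions, not the implementation used in the study: resampling is done with replacement, minimizing the within-cell sum of squares is used as an equivalent form of the CART split criterion, the defaults for mtry and nodesize are arbitrary choices for the example, and all function names are hypothetical.

```python
import random


def _sse(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _grow(U, v, idx, mtry, nodesize, rng):
    """Recursively grow one regression tree on the observations listed in idx."""
    targets = [v[i] for i in idx]
    leaf = ("leaf", sum(targets) / len(targets))
    if len(idx) < max(nodesize, 2):  # cell has fewer than nodesize points: stop
        return leaf
    best = None
    for f in rng.sample(range(len(U[0])), mtry):  # mtry randomly selected variables
        values = sorted({U[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [i for i in idx if U[i][f] <= cut]
            right = [i for i in idx if U[i][f] > cut]
            if not left or not right:  # guard against rounding of the midpoint
                continue
            cost = _sse([v[i] for i in left]) + _sse([v[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, cut, left, right)
    if best is None:  # all candidate variables are constant in this cell
        return leaf
    _, f, cut, left, right = best
    return ("node", f, cut,
            _grow(U, v, left, mtry, nodesize, rng),
            _grow(U, v, right, mtry, nodesize, rng))


def fit_forest(U, v, n_trees=500, mtry=None, nodesize=5, b_s=None, seed=0):
    """Grow n_trees trees, each on b_s observations resampled from (U, v)."""
    rng = random.Random(seed)
    s, k = len(U), len(U[0])
    mtry = mtry or max(1, k // 3)  # assumed default, not taken from the study
    b_s = b_s or s
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(s) for _ in range(b_s)]  # assumption: with replacement
        forest.append(_grow(U, v, idx, mtry, nodesize, rng))
    return forest


def _predict_tree(tree, u):
    while tree[0] == "node":
        _, f, cut, left, right = tree
        tree = left if u[f] <= cut else right
    return tree[1]


def predict_forest(forest, u):
    """Average of the predicted values of all trees at the point u."""
    return sum(_predict_tree(t, u) for t in forest) / len(forest)


# Toy example: a step function of a single predictor
U = [[float(i)] for i in range(20)]
v = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(U, v, n_trees=25, seed=1)
low, high = predict_forest(forest, [2.0]), predict_forest(forest, [17.0])
```

The exhaustive split search is quadratic in the cell size, which is acceptable for short series but is not how production implementations work.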

#### 2.1.7. Time Series Forecasting Using Random Forests

Random forests are used here to forecast $x_{n+1}$, given $x_1, \ldots, x_n$. If we use $k$ lagged variables then the forecasted $x_{n+1}$ is given by the following equation for $t = n + 1$:

$$x_t = g(x_{t-1}, \ldots, x_{t-k}), \quad t = k + 1, \ldots, n + 1 \tag{20}$$

The function $g$ is obtained by fitting the random forest. The variable to be predicted is $x_t$, for $t = k + 1, \ldots, n + 1$, while the predictor variables are $x_{t-1}, \ldots, x_{t-k}$. When the number of predictor variables $k$ increases, the size of the training set $n - k$ decreases (an example is presented in Figure 1). The training set, which includes $n - k$ samples, is created using the CasesSeries function of the rminer R package [61,62]. Finally, the fitting is performed using the train function of the caret R package and the forecasted value of $x_{n+1}$ is obtained using the predict function of the caret R package.

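The size of the training set depends on the minimum index among the predictor variables used to forecast $x_{n+1}$, e.g., if the minimum index is $n + 1 - k$, then the training set is of size $n - k$. However, using fewer predictor variables decreases the computational cost.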
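The construction of the training set from lagged values can be sketched as follows. The helper names `lagged_training_set` and `forecast_inputs` are hypothetical and are not the API of the rminer package; the example reuses the setting of Figure 1, a series with n = 5.

```python
def lagged_training_set(x, k):
    """Build the n - k training samples for k lagged predictor variables.

    Each input row is [x_{t-1}, ..., x_{t-k}] and the matching target is x_t,
    for t = k + 1, ..., n (1-based indices as in the text).
    """
    inputs, targets = [], []
    for t in range(k, len(x)):  # 0-based position t holds the value x_{t+1}
        inputs.append([x[t - j] for j in range(1, k + 1)])
        targets.append(x[t])
    return inputs, targets


def forecast_inputs(x, k):
    """Predictor values [x_n, ..., x_{n-k+1}] used to forecast x_{n+1}."""
    return [x[len(x) - j] for j in range(1, k + 1)]


x = [1, 2, 3, 4, 5]                           # n = 5
inputs_k1, targets_k1 = lagged_training_set(x, 1)
inputs_k2, targets_k2 = lagged_training_set(x, 2)
next_inputs = forecast_inputs(x, 2)
```

With k = 1 there are four training samples and with k = 2 only three, which is the trade-off between the number of predictor variables and the size of the training set.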

#### 2.1.8. Summary of the Methods

#### 2.1.9. Metrics

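The error $E$, the absolute error $AE$, the squared error $SE$, the percentage error $PE$ and the absolute percentage error $APE$ of the forecast $\psi$ of $x_{n+1}$ are defined by:

$$E := \psi - x_{n+1} \tag{21}$$

$$AE := |\psi - x_{n+1}| \tag{22}$$

$$SE := (\psi - x_{n+1})^2 \tag{23}$$

$$PE := (\psi - x_{n+1})/x_{n+1} \tag{24}$$

$$APE := |(\psi - x_{n+1})/x_{n+1}| \tag{25}$$

Let $E_i$, $AE_i$, $SE_i$, $PE_i$ and $APE_i$ denote the respective metrics of the $i$-th of $N$ forecasts. Then, the mean of the errors (MoE) is defined by:

$$\mathrm{MoE} := (E_1 + \ldots + E_N)/N \tag{26}$$

and the median of the errors by:

$$\mathrm{median}(E_1, \ldots, E_N) \tag{27}$$

The mean and the median of the absolute errors are defined by:

$$(AE_1 + \ldots + AE_N)/N \tag{28}$$

$$\mathrm{median}(AE_1, \ldots, AE_N) \tag{29}$$

The mean and the median of the squared errors are defined by:

$$(SE_1 + \ldots + SE_N)/N \tag{30}$$

$$\mathrm{median}(SE_1, \ldots, SE_N) \tag{31}$$

The mean and the median of the percentage errors are defined by:

$$(PE_1 + \ldots + PE_N)/N \tag{32}$$

$$\mathrm{median}(PE_1, \ldots, PE_N) \tag{33}$$

The mean and the median of the absolute percentage errors are defined by:

$$(APE_1 + \ldots + APE_N)/N \tag{34}$$

$$\mathrm{median}(APE_1, \ldots, APE_N) \tag{35}$$

Let $\psi_i$ be the $i$-th forecast and $x_{n+1,i}$ its corresponding true value. The regression coefficient $a$ (or slope of the regression) is estimated to measure the dependence of $\psi_1, \ldots, \psi_N$ on $x_{n+1,1}, \ldots, x_{n+1,N}$, when this dependence is expressed by the following linear regression model:

$$\psi_i = a\, x_{n+1,i} + b \tag{36}$$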
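The metrics of this section can be computed as in the sketch below. The function names are hypothetical, the percentage errors are left unscaled as in the definitions above, and ordinary least squares is assumed for estimating the regression coefficient a.

```python
from statistics import mean, median


def forecast_errors(psi, actual):
    """Error metrics of a single forecast psi of the true value actual."""
    e = psi - actual
    return {"E": e, "AE": abs(e), "SE": e ** 2,
            "PE": e / actual, "APE": abs(e / actual)}


def summarize(forecasts, actuals):
    """Mean and median of each metric over N forecasts."""
    rows = [forecast_errors(p, x) for p, x in zip(forecasts, actuals)]
    return {name: {"mean": mean(r[name] for r in rows),
                   "median": median(r[name] for r in rows)}
            for name in ("E", "AE", "SE", "PE", "APE")}


def regression_coefficients(forecasts, actuals):
    """Least-squares slope a and intercept b of psi_i = a * x_i + b."""
    x_bar, p_bar = mean(actuals), mean(forecasts)
    sxx = sum((x - x_bar) ** 2 for x in actuals)
    sxp = sum((x - x_bar) * (p - p_bar) for x, p in zip(actuals, forecasts))
    a = sxp / sxx
    return a, p_bar - a * x_bar


actuals = [1.0, 2.0, 3.0]
forecasts = [2.0, 4.0, 6.0]   # every forecast is twice the true value
summary = summarize(forecasts, actuals)
a, b = regression_coefficients(forecasts, actuals)
```

A perfect set of forecasts gives zero for every error metric and a regression coefficient equal to 1, in agreement with Table 4.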

#### 2.2. Data

#### 2.2.1. Simulated Time Series

#### 2.2.2. Temperature Dataset

## 3. Results

#### 3.1. Simulations

Consider the simulation experiment using the ARMA(1, 0) model with $\varphi_1 = 0.6$ (Figure 3). In this typical example, we observe that all methods perform similarly with respect to the absolute and squared errors. The rf30 method is the best amongst the rf methods, while arfima has the best performance and naïve2 has the worst performance. The arfima method is approximately 5% better than the best rf methods. The naïve1 and theta methods perform similarly to the best rf methods. The simplest rf method, i.e., rf1, performs well. In fact, its performance is comparable to that of the best rf methods, i.e., rf20imp, rf25, and rf30. Regarding the use of important variables, introduced with the rf20imp and rf50imp methods, their performance is similar to that of the rf20 and rf50 methods, respectively.

#### 3.2. Temperature Analysis

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A

## References

1. Shmueli, G. To explain or to predict? Stat. Sci. **2010**, 25, 289–310.
2. Bontempi, G.; Taieb, S.B.; Le Borgne, Y.A. Machine learning strategies for time series forecasting. In Business Intelligence (Lecture Notes in Business Information Processing); Aufaure, M.A., Zimányi, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 138, pp. 62–77.
3. De Gooijer, J.G.; Hyndman, R.J. 25 years of time series forecasting. Int. J. Forecast. **2006**, 22, 443–473.
4. Fildes, R.; Nikolopoulos, K.; Crone, S.F.; Syntetos, A.A. Forecasting and operational research: A review. J. Oper. Res. Soc. **2008**, 59, 1150–1172.
5. Weron, R. Electricity price forecasting: A review of the state-of-the-art with a look into the future. Int. J. Forecast. **2014**, 30, 1030–1081.
6. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. **2016**, 32, 914–938.
7. Taieb, S.B.; Bontempi, G.; Atiya, A.F.; Sorjamaa, A. A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst. Appl. **2012**, 39, 7067–7083.
8. Mei-Ying, Y.; Xiao-Dong, W. Chaotic time series prediction using least squares support vector machines. Chin. Phys. **2004**, 13, 454–458.
9. Faraway, J.; Chatfield, C. Time series forecasting with neural networks: A comparative study using the air line data. J. R. Stat. Soc. C Appl. Stat. **1998**, 47, 231–250.
10. Yang, B.S.; Oh, M.S.; Tan, A.C.C. Machine condition prognosis based on regression trees and one-step-ahead prediction. Mech. Syst. Signal Process. **2008**, 22, 1179–1193.
11. Zou, H.; Yang, Y. Combining time series models for forecasting. Int. J. Forecast. **2004**, 20, 69–84.
12. Papacharalampous, G.A.; Tyralis, H.; Koutsoyiannis, D. Forecasting of geophysical processes using stochastic and machine learning algorithms. In Proceedings of the 10th World Congress of EWRA on Water Resources and Environment “Panta Rhei”, Athens, Greece, 5–9 July 2017.
13. Pérez-Rodríguez, J.V.; Torra, S.; Andrada-Félix, J. STAR and ANN models: Forecasting performance on the Spanish “Ibex-35” stock index. J. Empir. Financ. **2005**, 12, 490–509.
14. Khashei, M.; Bijari, M. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. **2011**, 11, 2664–2675.
15. Yan, W. Toward automatic time-series forecasting using neural networks. IEEE Trans. Neural Netw. Learn. Syst. **2012**, 23, 1028–1039.
16. Babu, C.N.; Reddy, B.E. A moving-average filter based hybrid ARIMA–ANN model for forecasting time series data. Appl. Soft Comput. **2014**, 23, 27–38.
17. Lin, L.; Wang, F.; Xie, X.; Zhong, S. Random forests-based extreme learning machine ensemble for multi-regime time series prediction. Expert Syst. Appl. **2017**, 85, 164–176.
18. Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
19. Scornet, E.; Biau, G.; Vert, J.P. Consistency of random forests. Ann. Stat. **2015**, 43, 1716–1741.
20. Biau, G.; Scornet, E. A random forest guided tour. Test **2016**, 25, 197–227.
21. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009.
22. Verikas, A.; Gelzinis, A.; Bacauskiene, M. Mining data with random forests: A survey and results of new tests. Pattern Recognit. **2011**, 44, 330–349.
23. Herrera, M.; Torgo, L.; Izquierdo, J.; Pérez-García, R. Predictive models for forecasting hourly urban water demand. J. Hydrol. **2010**, 387, 141–150.
24. Dudek, G. Short-term load forecasting using random forests. In Proceedings of the 7th IEEE International Conference Intelligent Systems IS’2014 (Advances in Intelligent Systems and Computing), Warsaw, Poland, 24–26 September 2014; Filev, D., Jabłkowski, J., Kacprzyk, J., Krawczak, M., Popchev, I., Rutkowski, L., Sgurev, V., Sotirova, E., Szynkarczyk, P., Zadrozny, S., Eds.; Springer: Cham, Switzerland, 2015; Volume 323, pp. 821–828.
25. Chen, J.; Li, M.; Wang, W. Statistical uncertainty estimation using random forests and its application to drought forecast. Math. Probl. Eng. **2012**, 2012, 915053.
26. Naing, W.Y.N.; Htike, Z.Z. Forecasting of monthly temperature variations using random forests. ARPN J. Eng. Appl. Sci. **2015**, 10, 10109–10112.
27. Nguyen, T.T.; Huu, Q.N.; Li, M.J. Forecasting time series water levels on Mekong river using machine learning models. In Proceedings of the 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), Ho Chi Minh City, Vietnam, 8–10 October 2015.
28. Kumar, M.; Thenmozhi, M. Forecasting stock index movement: A comparison of support vector machines and random forest. In Indian Institute of Capital Markets 9th Capital Markets Conference Paper; Indian Institute of Capital Markets: Vashi, India, 2006.
29. Kumar, M.; Thenmozhi, M. Forecasting stock index returns using ARIMA-SVM, ARIMA-ANN, and ARIMA-random forest hybrid models. Int. J. Bank. Acc. Financ. **2014**, 5, 284–308.
30. Kane, M.J.; Price, N.; Scotch, M.; Rabinowitz, P. Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinform. **2014**, 15.
31. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. **2010**, 31, 2225–2236.
32. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How many trees in a random forest? In Machine Learning and Data Mining in Pattern Recognition (Lecture Notes in Computer Science); Perner, P., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 154–168.
33. Probst, P.; Boulesteix, A.L. To tune or not to tune the number of trees in random forest? arXiv **2017**, arXiv:1705.05654v1.
34. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
35. Díaz-Uriarte, R.; De Andres, S.A. Gene selection and classification of microarray data using random forest. BMC Bioinform. **2006**, 7.
36. Makridakis, S.; Hibon, M. Confidence intervals: An empirical investigation of the series in the M-competition. Int. J. Forecast. **1987**, 3, 489–508.
37. Makridakis, S.; Hibon, M. The M3-Competition: Results, conclusions and implications. Int. J. Forecast. **2000**, 16, 451–476.
38. Pritzsche, U. Benchmarking of classical and machine-learning algorithms (with special emphasis on bagging and boosting approaches) for time series forecasting. Master’s Thesis, Ludwig-Maximilians-Universität München, München, Germany, 2015.
39. Bagnall, A.; Cawley, G.C. On the use of default parameter settings in the empirical evaluation of classification algorithms. arXiv **2017**, arXiv:1703.06777v1.
40. Salles, R.; Assis, L.; Guedes, G.; Bezerra, E.; Porto, F.; Ogasawara, E. A framework for benchmarking machine learning methods using linear models for univariate time series prediction. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2338–2345.
41. Bontempi, G. Machine Learning Strategies for Time Series Prediction. European Business Intelligence Summer School, Hammamet, Lecture. 2013. Available online: https://pdfs.semanticscholar.org/f8ad/a97c142b0a2b1bfe20d8317ef58527ee329a.pdf (accessed on 25 September 2017).
42. McShane, B.B. Machine Learning Methods with Time Series Dependence. Ph.D. Thesis, University of Pennsylvania, Philadelphia, PA, USA, 2010.
43. Bagnall, A.; Bostrom, A.; Large, J.; Lines, J. Simulated data experiments for time series classification part 1: Accuracy comparison with default settings. arXiv **2017**, arXiv:1703.09480v1.
44. Box, G.E.P.; Jenkins, G.M. Some recent advances in forecasting and control. J. R. Stat. Soc. C Appl. Stat. **1968**, 17, 91–109.
45. Wei, W.W.S. Time Series Analysis, Univariate and Multivariate Methods, 2nd ed.; Pearson Addison Wesley: Boston, MA, USA, 2006; ISBN 0-321-322116-9.
46. Thissen, U.; Van Brakel, R.; De Weijer, A.P.; Melssena, W.J.; Buydens, L.M.C. Using support vector machines for time series prediction. Chemom. Intell. Lab. **2003**, 69, 35–49.
47. Zhang, G.P. An investigation of neural networks for linear time-series forecasting. Comput. Oper. Res. **2001**, 28, 1183–1202.
48. Lawrimore, J.H.; Menne, M.J.; Gleason, B.E.; Williams, C.N.; Wuertz, D.B.; Vose, R.S.; Rennie, J. An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. J. Geophys. Res. **2011**, 116.
49. Assimakopoulos, V.; Nikolopoulos, K. The theta model: A decomposition approach to forecasting. Int. J. Forecast. **2000**, 16, 521–530.
50. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. **2008**, 28.
51. Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; The R Core Team; et al. Caret: Classification and Regression Training, R package version 6.0-76; 2017. Available online: https://cran.r-project.org/web/packages/caret/index.html (accessed on 7 September 2017).
52. The R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017.
53. Hemelrijk, J. Underlining random variables. Stat. Neerl. **1966**, 20, 1–7.
54. Fraley, C.; Leisch, F.; Maechler, M.; Reisen, V.; Lemonte, A. Fracdiff: Fractionally Differenced ARIMA aka ARFIMA(p,d,q) Models, R package version 1.4-2. 2012. Available online: https://rdrr.io/cran/fracdiff/ (accessed on 2 December 2012).
55. Hyndman, R.J.; O’Hara-Wild, M.; Bergmeir, C.; Razbash, S.; Wang, E. Forecast: Forecasting Functions for Time Series and Linear Models, R package version 8.1. 2017. Available online: https://rdrr.io/cran/forecast/ (accessed on 25 September 2017).
56. Hyndman, R.J.; Khandakar, Y. Automatic time series forecasting: The forecast package for R. J. Stat. Softw. **2008**, 27.
57. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2013. Available online: http://otexts.org/fpp/ (accessed on 25 September 2017).
58. Hyndman, R.J.; Billah, B. Unmasking the Theta method. Int. J. Forecast. **2003**, 19, 287–290.
59. Hyndman, R.J.; Koehler, A.B.; Ord, J.K.; Snyder, R.D. Forecasting with Exponential Smoothing: The State Space Approach; Springer: Berlin/Heidelberg, Germany, 2008.
60. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News **2002**, 2, 18–22.
61. Cortez, P. Data mining with neural networks and support vector machines using the R/rminer tool. In Advances in Data Mining. Applications and Theoretical Aspects (Lecture Notes in Artificial Intelligence); Perner, P., Ed.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6171, pp. 572–583.
62. Cortez, P. Rminer: Data Mining Classification and Regression Methods, R package version 1.4.2. 2016. Available online: https://rdrr.io/cran/rminer/ (accessed on 2 September 2016).
63. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. **2006**, 22, 679–688.
64. Alexander, D.L.J.; Tropsha, A.; Winkler, D.A. Beware of R²: Simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. **2015**, 55, 1316–1322.
65. Gramatica, P.; Sangion, A. A historical excursus on the statistical validation parameters for QSAR models: A clarification concerning metrics and terminology. J. Chem. Inf. Model. **2016**, 56, 1127–1131.
66. Warnes, G.R.; Bolker, B.; Gorjanc, G.; Grothendieck, G.; Korosec, A.; Lumley, T.; MacQueen, D.; Magnusson, A.; Rogers, J. Gdata: Various R Programming Tools for Data Manipulation, R package version 2.18.0. 2017. Available online: https://cran.r-project.org/web/packages/gdata/index.html (accessed on 6 June 2017).
67. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis, 2nd ed.; Springer International Publishing: Cham, Switzerland, 2016.
68. Wickham, H.; Hester, J.; Francois, R.; Jylänki, J.; Jørgensen, M. Readr: Read Rectangular Text Data, R package version 1.1.1; 2017. Available online: https://cran.r-project.org/web/packages/readr/index.html (accessed on 16 May 2017).
69. Wickham, H. Reshaping data with the reshape package. J. Stat. Softw. **2007**, 21.

**Figure 1.** Sketch explaining how the training sample changes with the number of predictor variables for time series with n = 5 and (**a**) k = 1; (**b**) k = 2.

**Figure 3.** Barplots of the medians of the absolute errors, medians of the squared errors and regression coefficients when forecasting the 101st value of 1000 simulated time series from an ARMA(1, 0) model with $\varphi_1 = 0.6$.

**Figure 4.** Boxplots of the absolute errors when forecasting the 101st value of 1000 simulated time series from each ARMA(p, 0) and ARMA(0, q) model used in the present study.

**Figure 5.** Boxplots of the absolute errors when forecasting the 101st value of 1000 simulated time series from each ARMA(p, q) model used in the present study.

**Figure 6.** Boxplots of the absolute errors when forecasting the 101st value of 1000 simulated time series from each ARFIMA model used in the present study.

**Figure 7.** Boxplots of the errors when forecasting the 101st value of 1000 simulated time series from each ARMA(p, 0) and ARMA(0, q) model used in the present study.

**Figure 8.** Boxplots of the errors when forecasting the 101st value of 1000 simulated time series from each ARMA(p, q) model used in the present study.

**Figure 9.** Boxplots of the errors when forecasting the 101st value of 1000 simulated time series from each ARFIMA model used in the present study.

**Figure 10.** Ranking of methods within each simulation experiment based on the mean (**top**) and median (**bottom**) of the absolute errors. Better methods are presented with lower ranking value and blue colours.

**Figure 11.** Ranking of methods in each simulation experiment based on the mean (**top**) and median (**bottom**) of the squared errors. Better methods are presented with lower ranking value and blue colours.

**Figure 12.** Ranking of methods in each simulation experiment based on the regression coefficients. Better methods are presented with lower ranking value and blue colours.

**Figure 14.** Boxplots of the error, absolute error, squared error, and absolute percentage error values of the temperature forecasts for all the methods.

**Figure 15.** Barplots of the medians of various types of metrics measuring the error of the temperature forecasts for all the methods. The types of errors are depicted in the vertical axes.

**Figure 16.** Barplots of the regression coefficient of the linear model between the forecasted and the test values of the temperature dataset.

**Table 1.** Summary of the methods presented in Section 2.1.3, Section 2.1.4, Section 2.1.5, Section 2.1.6 and Section 2.1.7 and their abbreviation as used in Section 3.

Method | Section | Brief Explanation |
---|---|---|
arfima | 2.1.3 | Uses fitted ARMA or ARFIMA models |
naïve1 | 2.1.4 | Forecast equal to the last observed value, Equation (16) |
naïve2 | 2.1.4 | Forecast equal to the mean of the fitted set, Equation (17) |
theta | 2.1.5 | Uses the theta method |
rf | 2.1.7 | Uses random forests, Equation (20) |

**Table 2.** Methods presented in Section 2.1.3 for forecasting using ARMA and ARFIMA models and their specific applications in Section 3.

Method | Application in Section 3 |
---|---|
1st case in Section 2.1.3 | Simulations from the family of ARMA models |
2nd case in Section 2.1.3 | Simulations from the family of ARFIMA models, with d ≠ 0 |
2nd case in Section 2.1.3 | Temperature data |

**Table 3.** Variants of the rf method and the explanatory variables they use.

Method | Explanatory Variables |
---|---|
rf05, rf10, rf15, rf20, rf25, rf30, rf35, rf40, rf45, rf50 | Uses the last 5, …, 50 variables |
rf20imp, rf50imp | Uses the most important variables from the last 20 and 50 variables, respectively |

**Table 4.** Metrics of forecasting performance, their range and respective values when the forecast is perfect.

Metric | Equation | Range | Metric Value for Perfect Forecast |
---|---|---|---|
error | (21) | (−∞, ∞) | 0 |
absolute error | (22) | [0, ∞) | 0 |
squared error | (23) | [0, ∞) | 0 |
percentage error | (24) | (−∞, ∞) | 0 |
absolute percentage error | (25) | [0, ∞) | 0 |
linear regression coefficient | (36) | (−∞, ∞) | 1 |

**Table 5.** Simulation experiments, their respective models and their defined parameters. See Section 2.1.1 for the definitions of the parameters.

Experiment | Model | Parameters |
---|---|---|
1 | ARMA(1, 0) | $\varphi_1 = 0.6$ |
2 | ARMA(1, 0) | $\varphi_1 = -0.6$ |
3 | ARMA(2, 0) | $\varphi_1 = 0.6$, $\varphi_2 = 0.2$ |
4 | ARMA(2, 0) | $\varphi_1 = -0.6$, $\varphi_2 = 0.2$ |
5 | ARMA(0, 1) | $\theta_1 = 0.6$ |
6 | ARMA(0, 1) | $\theta_1 = -0.6$ |
7 | ARMA(0, 2) | $\theta_1 = 0.6$, $\theta_2 = 0.2$ |
8 | ARMA(0, 2) | $\theta_1 = -0.6$, $\theta_2 = -0.2$ |
9 | ARMA(1, 1) | $\varphi_1 = 0.6$, $\theta_1 = 0.6$ |
10 | ARMA(1, 1) | $\varphi_1 = -0.6$, $\theta_1 = -0.6$ |
11 | ARMA(2, 2) | $\varphi_1 = 0.6$, $\varphi_2 = 0.2$, $\theta_1 = 0.6$, $\theta_2 = 0.2$ |
12 | ARFIMA(0, 0.40, 0) | |
13 | ARFIMA(1, 0.40, 0) | $\varphi_1 = 0.6$ |
14 | ARFIMA(0, 0.40, 1) | $\theta_1 = 0.6$ |
15 | ARFIMA(1, 0.40, 1) | $\varphi_1 = 0.6$, $\theta_1 = 0.6$ |
16 | ARFIMA(2, 0.40, 2) | $\varphi_1 = 0.6$, $\varphi_2 = 0.2$, $\theta_1 = 0.6$, $\theta_2 = 0.2$ |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tyralis, H.; Papacharalampous, G.
Variable Selection in Time Series Forecasting Using Random Forests. *Algorithms* **2017**, *10*, 114.
https://doi.org/10.3390/a10040114
