Forecasting Volatility of Energy Commodities: Comparison of GARCH Models with Support Vector Regression

We compare the forecasting performance of generalized autoregressive conditional heteroscedasticity (GARCH)-type models with support vector regression (SVR) for futures contracts of selected energy commodities: crude oil, natural gas, heating oil, gasoil and gasoline. The GARCH models are commonly used in volatility analysis, while SVR is one of the machine learning methods that have gained attention and interest in recent years. We show that the accuracy of volatility forecasts depends substantially on the applied proxy of volatility. Our study confirms that SVR with properly determined hyperparameters can lead to lower forecasting errors than the GARCH models when the squared daily return is used as the proxy of volatility in the evaluation. Meanwhile, if we apply the Parkinson estimator, which is a more accurate approximation of volatility, the results usually favor the GARCH models. Moreover, it is difficult to choose the best model among the GARCH models for all analyzed commodities; however, forecasts based on the asymmetric GARCH models are often the most accurate. Within the class of SVR models, the results indicate the forecasting superiority of the SVR model with the linear kernel and 15 lags, which has the lowest mean square error (MSE) or mean absolute error (MAE) among the SVR models in 90% of cases.


Introduction
Energy risk has always been one of the major risk factors for most firms involved in key industrial sectors in both developed and developing countries. Risk management of energy commodities is a crucial issue for the majority of industrial firms, as it can seriously affect their competitiveness, viability and future profitability. Global economic developments, emerging technological advances and economic, geopolitical and environmental events have caused a significant increase in the volatility of energy commodity prices in the last 20 years (cf. [1]). For these reasons, the ability to predict the volatility of energy commodities has been gaining more and more importance.
In order to forecast volatility, machine learning techniques can also be applied. One such technique, which has been gaining high popularity in recent years, is support vector regression (SVR). In particular, SVR and SVR-based hybrid models have been applied in many studies to forecast prices of energy commodities, including: crude oil (e.g., [13][14][15][16][17][18]) and natural gas (e.g., [19,20]). However, they have been used only by Zhang and Zhang [8] to forecast volatility of energy commodities, specifically crude oil.
In this paper, we compare the volatility forecasting performance of the GARCH-type models with support vector regression for futures contracts of selected energy commodities, namely, crude oil, gasoil, gasoline, heating oil and natural gas, quoted on the New York Mercantile Exchange and the Intercontinental Exchange. It should be noted that, nowadays, derivative markets are a very important research area since they play an essential role in trading energy commodities, and turnover on such markets has become significantly higher than on spot markets.
This study has two main contributions. Firstly, it is the first comparison of the GARCH-type models and SVR for futures contracts on energy commodities. As mentioned before, to our knowledge, SVR has been used for volatility forecasting of energy commodities only by Zhang and Zhang [8], who applied the least-squares version of an SVR model, least squares support vector machines (LSSVM) [21], to the West Texas Intermediate (WTI) and Brent crude oil spot prices. However, they used this model only as a part of a hybrid forecasting method: to forecast the residual series from the exponential generalized autoregressive conditional heteroscedasticity (EGARCH) model.
Secondly, we show that the accuracy of volatility forecasts depends substantially on the applied proxy of volatility. It has been shown in the literature that volatility forecasts of financial assets based on SVR models can be more precise than those from GARCH-type models (e.g., [22][23][24][25][26][27][28][29][30][31][32][33]). However, in most of these papers, the squared daily returns have been taken as the ex-post measure of volatility. Our study confirms that SVR with properly determined hyperparameters can lead to lower forecasting errors than the GARCH models when the squared daily return is used as the proxy of volatility in the evaluation. Meanwhile, if we apply the Parkinson estimator, which is a more accurate approximation of volatility, the results are different, since they usually favor the GARCH models over SVR.
The rest of the paper is organized in the following way. Section 2 provides a description of applied models and methods. In Section 3 we introduce and describe data, explain the forecasting procedure and present the results of the research for energy commodities. Finally, in the last section, we give our concluding remarks.

GARCH-Type Models
The GARCH models are a standard tool applied in volatility research. The primary model is the standard GARCH model introduced by Bollerslev [34], which describes time-varying variance. Let us assume that $\varepsilon_t$ is the innovation process, which can be presented as:

$\varepsilon_t \mid \psi_{t-1} \sim N(0, h_t)$, (1)

where $h_t$ is the conditional variance, $\psi_{t-1}$ is the set of all information available at time $t-1$, and $N$ is the conditional normal distribution. Then the GARCH($p$,$q$) model (denoted as GARCH-n) is given as:

$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}$, (2)

where $\alpha_0 > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$ for $i = 1, 2, \ldots, q$; $j = 1, 2, \ldots, p$; however, Nelson and Cao [35] gave weaker conditions for the non-negativity of the conditional variance. Instead of the conditional normal distribution in Equation (1), the Student's t-distribution can be applied (the model denoted as GARCH-t) in order to better describe the fat tails and leptokurtosis of the unconditional distributions of many empirical financial time series [36].
The conditional variance function of the standard GARCH model is symmetric in the lagged values of $\varepsilon_t$. Such a function may be inappropriate for modelling the volatility of returns because it cannot represent a phenomenon known as the leverage effect, i.e., the negative correlation between volatility and past returns. The first model describing asymmetric responses of the conditional variance to positive and negative errors is the exponential GARCH (EGARCH) model proposed by Nelson [37]. The EGARCH($p$,$q$) model is specified as:

$\ln h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i g(z_{t-i}) + \sum_{j=1}^{p} \beta_j \ln h_{t-j}$, where $g(z_t) = \theta z_t + \gamma \left( |z_t| - E|z_t| \right)$,

where $\alpha_1 \equiv 1$, $z_t = \varepsilon_t / h_t^{1/2}$, and $E$ denotes the expected value. The logarithmic form of the conditional variance means that it is not necessary to introduce any restrictions on the parameters to ensure the positivity of the conditional variance.
In many empirical studies, the sum of the parameter estimates (except $\alpha_0$) in the standard GARCH model is close to 1, which makes the variance highly persistent. That is why Engle and Bollerslev [46] proposed the integrated GARCH (IGARCH) model in the form of Equation (2), where $\alpha_1 + \cdots + \alpha_q + \beta_1 + \cdots + \beta_p = 1$. A shock to the conditional variance in the IGARCH model is persistent in the sense that it remains significant for forecasts of all horizons. The last considered parameterization is the GARCH-in-mean (GARCH-M) model introduced by Engle et al. [47]. The GARCH-M($p$,$q$) model is described as:

$y_t = \mu + \delta h_t + \varepsilon_t$, with $h_t$ given by Equation (2),

where $\alpha_0 > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$ for $i = 1, 2, \ldots, q$; $j = 1, 2, \ldots, p$. The GARCH-M model is able to describe the fundamental trade-off relationship between return and risk. The parameters of the above GARCH-type models can be estimated by maximum likelihood or quasi-maximum likelihood methods. The log-likelihood function (under conditional normality) is described as:

$\ln L(\theta) = -\frac{1}{2} \sum_{t=1}^{T} \left( \ln 2\pi + \ln h_t + \frac{\varepsilon_t^2}{h_t} \right)$,

where $\theta$ is a vector containing the unknown parameters of the model and $T$ is the number of observations used in estimation.
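The conditional variance recursion of Equation (2) is straightforward to implement. The following Python sketch (illustrative only, with arbitrary parameter values; in the study the parameters are estimated by maximum or quasi-maximum likelihood) computes the GARCH(1,1) conditional variances and a one-day-ahead forecast:

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance recursion of a GARCH(1,1) model:
    h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}.
    The recursion is initialized with the sample variance."""
    h = np.empty(len(returns))
    h[0] = np.var(returns)
    for t in range(1, len(returns)):
        h[t] = omega + alpha * returns[t - 1] ** 2 + beta * h[t - 1]
    return h

def garch11_forecast(returns, h, omega, alpha, beta):
    """One-day-ahead variance forecast h_{T+1} given data up to day T."""
    return omega + alpha * returns[-1] ** 2 + beta * h[-1]
```

In a full estimation routine, this recursion would be evaluated inside the log-likelihood function above and maximized over $(\omega, \alpha, \beta)$.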

SVR Model
Let $y$ be the dependent variable and $\mathbf{x}$ the vector of regressors. Based on a training data set $(\mathbf{x}_t, y_t)_{t=1,\ldots,T}$, we want to estimate a regression function that has at most $\varepsilon$ deviation from the outputs $y_t$ and that, at the same time, is as flat as possible [48]. The idea of SVR is to map the vectors $\mathbf{x}$ onto a high-dimensional feature space using some fixed (nonlinear) transformation and then to estimate the linear model:

$f(\mathbf{x}) = \sum_{j=1}^{m} w_j \phi_j(\mathbf{x}) + b$,

where $m$ is the dimension of the feature space, $\phi_j(\mathbf{x})$ denote the transformations, $w_j$ are the coefficients and $b$ is the bias term [49,50]. To assess the estimated model, Vapnik [51] proposed the $\varepsilon$-insensitive loss function, which does not penalize errors below $\varepsilon$. To estimate the coefficients of the SVR model, the $\varepsilon$-insensitive loss function is used; at the same time, the postulate of model complexity reduction is taken into account by minimizing the expression $\|\mathbf{w}\|^2 = \sum_{j=1}^{m} w_j^2$, where $\mathbf{w} = (w_1, w_2, \ldots, w_m)$. In practice, it is not always possible to approximate all data of the training set with an error below $\varepsilon$ (cf. [52]). In order to allow larger errors, the SVR model incorporates non-negative slack variables $\xi_t$ and $\xi_t^*$ representing the upper and lower constraints, s.t.

$y_t - f(\mathbf{x}_t) \leq \varepsilon + \xi_t$, $\quad f(\mathbf{x}_t) - y_t \leq \varepsilon + \xi_t^*$, $\quad \xi_t, \xi_t^* \geq 0$,

for all $t = 1, 2, \ldots, T$. Finally, the regression function $f(\mathbf{x})$ is obtained as the minimum of the functional:

$\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{t=1}^{T} (\xi_t + \xi_t^*)$,

where $C$ is a pre-specified positive value. The first term of the functional is used to penalize large weights and to maintain the flatness of the regression function, whereas the second term penalizes training errors by means of the $\varepsilon$-insensitive loss function [53]. $C$ is the hyperparameter that trades off these two terms; it controls the penalty imposed on observations that lie outside the $\varepsilon$-margin and, in consequence, helps to prevent overfitting. Both the $C$ and $\varepsilon$ hyperparameters must be determined by the user.
The optimization problem described above can be transformed into the corresponding dual problem using the Karush-Kuhn-Tucker conditions, with the solution:

$f(\mathbf{x}) = \sum_{t=1}^{n_{SV}} (\alpha_t - \alpha_t^*) K(\mathbf{x}_t, \mathbf{x}) + b$,

where $\alpha_t$ and $\alpha_t^*$ are the Lagrange multipliers, $n_{SV}$ is the number of support vectors and $K$ is the kernel function of the form:

$K(\mathbf{x}_t, \mathbf{x}) = \sum_{j=1}^{m} \phi_j(\mathbf{x}_t)\,\phi_j(\mathbf{x})$.

Any function satisfying Mercer's condition [51] can be used as the kernel. In practice, the most commonly used kernel functions include the linear kernel and the radial basis function (RBF) kernel (cf. [54]).
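As an illustration, an $\varepsilon$-insensitive SVR with a linear kernel can be fitted in a few lines with scikit-learn (a sketch on synthetic data; the study itself trains its models with MATLAB's fitrsvm, described later):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic toy data: y is a noisy linear function of two regressors.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=200)

# C trades off flatness against training errors; epsilon sets the
# width of the insensitive tube around the regression function.
model = SVR(kernel="linear", C=1.0, epsilon=0.1)
model.fit(X, y)
pred = model.predict(X[:5])
```

Switching `kernel="rbf"` introduces the additional $\gamma$ hyperparameter of the RBF kernel.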

Ex-Post Volatility Measures
Volatility is not directly observable, even ex-post; therefore, it has to be estimated. A popular proxy of the daily variance is the squared daily return:

$\hat{\sigma}_t^2 = r_t^2$,

where $r_t$ is the daily logarithmic return on day $t$. This proxy can be identified with the classical variance estimator.
Andersen and Bollerslev [55] showed that although the squared daily return is an unbiased estimator of the variance of returns, it is also extremely noisy. A significantly more accurate measure of volatility is the realized variance ($RV$) calculated from intraday prices:

$RV_t = \sum_{i=1}^{n} r_{t,i}^2$,

where $r_{t,i}$ is the intraday return (e.g., the 5-min return) and $n$ is the number of intraday observations during a day. The realized variance is a significantly more efficient estimator of the variance than the daily squared return, but high-frequency data are not commonly available and are significantly more expensive in comparison to daily data.
An alternative way is to calculate range-based variance estimators, based on daily opening, low and high prices. In practice, these values are easily available alongside daily closing prices. The range-based estimators have already been used as proxies of volatility in many studies (e.g., [56][57][58][59]). Furthermore, many volatility models have been proposed based on these estimators (e.g., [60][61][62][63][64]).
The Parkinson [65] estimator, the simplest of this class, is given as:

$\hat{\sigma}_{P,t}^2 = \frac{(\ln H_t - \ln L_t)^2}{4 \ln 2}$,

where $H_t$ and $L_t$ are the high and low prices over day $t$. This estimator assumes a zero-drift process and is more than 4.9 times more efficient than the classical variance estimator based on closing prices [65]. It has been shown that the accuracy of the Parkinson estimator is similar to that of the realized variance calculated from six observations during the day (see [59,66]).
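The Parkinson estimator above can be computed directly from daily high and low prices; a minimal numpy sketch:

```python
import numpy as np

def parkinson(high, low):
    """Parkinson range-based variance estimator:
    sigma^2_t = (ln H_t - ln L_t)^2 / (4 ln 2)."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return np.log(high / low) ** 2 / (4.0 * np.log(2.0))
```

When working with percentage returns, as later in this paper, the estimator is simply scaled by $100^2$.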
Two other popular range-based variance estimators are the Garman-Klass [67] and Rogers-Satchell [68] estimators. The former can be presented as:

$\hat{\sigma}_{GK,t}^2 = \frac{1}{2}(\ln H_t - \ln L_t)^2 - (2\ln 2 - 1)(\ln C_t - \ln O_t)^2$,

where $C_t$ and $O_t$ are the closing and opening prices on day $t$, respectively. The Garman-Klass estimator assumes a zero-drift process and is more than 7.4 times more efficient than the classical variance estimator based on closing prices [67].
The Rogers-Satchell estimator is defined as:

$\hat{\sigma}_{RS,t}^2 = \ln\!\left(\frac{H_t}{C_t}\right)\ln\!\left(\frac{H_t}{O_t}\right) + \ln\!\left(\frac{L_t}{C_t}\right)\ln\!\left(\frac{L_t}{O_t}\right)$.

This estimator is independent of the drift. For a zero drift, it is more than 6.0 times more efficient than the classical variance estimator based on closing prices [68].
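Both range-based estimators are likewise one-liners given daily OHLC prices; an illustrative numpy sketch:

```python
import numpy as np

def garman_klass(o, h, l, c):
    """Garman-Klass estimator:
    0.5*(ln H - ln L)^2 - (2 ln 2 - 1)*(ln C - ln O)^2."""
    o, h, l, c = map(np.asarray, (o, h, l, c))
    return 0.5 * np.log(h / l) ** 2 - (2.0 * np.log(2.0) - 1.0) * np.log(c / o) ** 2

def rogers_satchell(o, h, l, c):
    """Rogers-Satchell estimator (drift independent):
    ln(H/C)*ln(H/O) + ln(L/C)*ln(L/O)."""
    o, h, l, c = map(np.asarray, (o, h, l, c))
    return np.log(h / c) * np.log(h / o) + np.log(l / c) * np.log(l / o)
```

Note that on a day with zero open-to-close return the Garman-Klass estimator reduces to half the squared log range, while the Rogers-Satchell estimator vanishes when the open and close coincide with the low and high.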
The realized variance is a more accurate proxy of volatility than the range-based estimators provided that intraday data are of good quality and the analyzed market is liquid (see [69]). However, the application of very high frequency data suffers from a large computational burden. Moreover, the range-based estimators are more robust than the realized variance to some microstructure effects like the bid-ask spread (e.g., [69,70]).

Data
In our study we investigate futures contracts of the selected energy commodities: WTI crude oil, gasoil, gasoline, heating oil and natural gas. Gasoil is quoted on the Intercontinental Exchange (ICE) and the rest of the considered contracts are listed on the New York Mercantile Exchange (NYMEX).
We analyze the data for the five-year period from 2 January 2015 to 31 December 2019. We apply two proxies of ex-post volatility in the forecast evaluation: the daily squared return (Equation (16)) and the Parkinson estimator (Equation (18)). In order to avoid the noise induced by measuring the overnight volatility, we analyze open-to-close percentage returns $r_t = 100 \ln(C_t / O_t)$ instead of close-to-close returns. Additionally, this choice allows for better comparability with the Parkinson estimator, which, by definition, measures volatility only during the exchange session. Since we analyze percentage returns, we compute the Parkinson estimator multiplied by $100^2$. Daily closing prices and open-to-close returns for all analyzed futures contracts are presented in Figure 1.

The descriptive statistics for daily returns and the two considered proxies of volatility are presented in Table 1. The calculated means of returns are negative, even though the prices of all commodities (except natural gas) increased during the analyzed period. This stems from the fact that we analyze open-to-close returns instead of close-to-close ones. The absolute values of the minimum and maximum returns are relatively high. The highest volatility of returns (see the standard deviations of returns) but also the highest variation of volatility (see the standard deviations of both the squared returns and the Parkinson estimator) can be seen for natural gas, which is a well-known fact for this energy asset (see, e.g., [71]). Crude oil is the second most volatile contract. Meanwhile, the lowest volatility is observed for gasoil and heating oil. The distributions of both proxies of volatility exhibit strong skewness and high kurtosis. Significantly higher variability can be seen for the squared returns than for the Parkinson estimator, which indicates much stronger noise in the former. The autocorrelation of returns is weak and mostly insignificant.
The autocorrelation of the squared returns and the Parkinson estimator is very high and significant, and much stronger for the latter measure of volatility. These results show that the Parkinson estimator can be useful as a volatility measure.

Forecasting Procedure
We compare the forecasting performance of the GARCH models with SVR. We did not detect any significant dependencies in the conditional mean, which is the reason for modelling only the conditional variance. To estimate both the GARCH and SVR models, we use a rolling window and apply the following procedure. For the starting sample (i.e., 2 January 2015 to 30 December 2016) we estimate the models and obtain one-day-ahead forecasts. Subsequently, we add one new observation to the estimation sample while removing the oldest one. Then, based on each estimation sample, we re-estimate the models and make forecasts. We repeat this procedure until we obtain forecasts for the three-year period from 3 January 2017 to 31 December 2019. In our procedure we deliberately use a relatively small estimation sample, because the persistence of the conditional volatility in large samples could be exaggerated by the existence of structural breaks in the GARCH parameters (see [72]).
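The rolling-window scheme described above can be sketched generically; `fit_and_forecast` below is a hypothetical placeholder for any model wrapper (GARCH or SVR) that maps an estimation window of returns to a one-day-ahead variance forecast:

```python
import numpy as np

def rolling_one_step_forecasts(returns, window, fit_and_forecast):
    """Roll a fixed-length estimation window through the sample:
    the model is re-estimated on each window and produces a
    one-day-ahead forecast for the day following the window."""
    forecasts = []
    for start in range(len(returns) - window):
        sample = returns[start:start + window]
        forecasts.append(fit_and_forecast(sample))
    return np.array(forecasts)
```

The forecast for day $t+1$ thus never uses information beyond day $t$, and the window length stays fixed as the sample rolls forward.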
The considered GARCH models are GARCH-n, GARCH-t, EGARCH, GJR, APARCH, IGARCH and GARCH-M. The parameters of the models are estimated using the quasi-maximum likelihood method, except for the GARCH-t model, for which the maximum likelihood method is applied.
We consider autoregressive SVR models, which means that we calculate the variance forecast $\hat{h}_t$ using the lagged squared returns as predictor variables:

$\hat{h}_t = f(r_{t-1}^2, r_{t-2}^2, \ldots, r_{t-p}^2)$, (21)

where $p$ is the lag length. In order to construct the SVR models, the regressors $r_{t-1}^2, r_{t-2}^2, \ldots, r_{t-p}^2$ are first standardized, i.e., the lagged squared returns are centered by subtracting their mean and divided by their standard deviation. After applying the Model (21), we reverse the standardization of $\hat{h}_t$ to calculate the final variance forecast. We apply two kernels in the SVR models, the linear and RBF ones, and four lag lengths: $p = 1$, $p = 5$, $p = 10$ and $p = 15$; however, we present only the results for $p = 1$ and $p = 15$. The lag $p = 1$ leads to the simplest specification of the model, in which only the previous squared return is used as the predictor variable. Obviously, higher lags lead to a more general form of the model, which could potentially generate more accurate forecasts. On the other hand, high lags could introduce spurious information into the model and, additionally, substantially increase computation time. Our calculations show that $p = 15$ leads to slightly more accurate forecasts than $p = 5$ and $p = 10$, and it seems to be optimal when considering both the accuracy of the forecasts and the computation time.
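Constructing the autoregressive design from a return series amounts to building a matrix of lagged squared returns and standardizing it; an illustrative sketch:

```python
import numpy as np

def lagged_squared_returns(returns, p):
    """Build the regression design for the autoregressive SVR:
    target y_t = r_t^2, features x_t = (r_{t-1}^2, ..., r_{t-p}^2)."""
    r2 = np.asarray(returns, dtype=float) ** 2
    X = np.column_stack([r2[p - 1 - k: len(r2) - 1 - k] for k in range(p)])
    y = r2[p:]
    return X, y

# Example with toy returns and p = 2 lags.
r = [1.0, -2.0, 3.0, -4.0, 5.0]
X, y = lagged_squared_returns(r, 2)

# Standardize the regressors (center and scale), as done in the paper.
mu, sd = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sd
```

The same mean and standard deviation computed on the estimation window must be reused when transforming the out-of-sample regressors, and the standardization is reversed on the forecast itself.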
Finally, we consider four specifications of the SVR models: SVR with the linear kernel and $p = 1$ (hereafter SVR-lin-1), SVR with the linear kernel and $p = 15$ (SVR-lin-15), SVR with the RBF kernel and $p = 1$ (SVR-rbf-1), and SVR with the RBF kernel and $p = 15$ (SVR-rbf-15). As stated before, the values of the $C$ and $\varepsilon$ hyperparameters (and additionally $\gamma$ in the case of the RBF kernel) must be determined. For this aim, we apply the grid search technique. This method consists in constructing many models for different values of the hyperparameters and selecting the optimal model on the basis of a validation set (we use the function fitrsvm in MATLAB R2015b to train the SVR models). We perform the grid search over consecutive values of $C = 0.5, 1, 1.5, \ldots, 10$, $\varepsilon = 0.5, 0.6, 0.7, \ldots, 2.5$ and $\gamma$ over a grid of powers of 2. To evaluate the model for each combination of hyperparameters, a 10-fold cross-validation procedure is applied. According to this approach, the investigated sample is randomly partitioned into 10 equal-sized subsamples. Nine of them are used to construct the SVR model, while the remaining one is used to validate it. To this end, the mean square error (MSE) is computed on the observations in the validation subsample. This procedure is repeated 10 times (once for each of the 10 subsamples used as the validation set), and the average of the 10 obtained MSE values is calculated. Finally, the hyperparameters that lead to the smallest MSE are considered optimal. It is worth noting that the optimal hyperparameters are determined for each forecast separately.
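The grid search with 10-fold cross-validation can be sketched with scikit-learn's GridSearchCV (the grids and data below are illustrative and coarser than the ones used in the study, which relies on MATLAB's fitrsvm):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Hypothetical coarse grids for the C and epsilon hyperparameters.
param_grid = {"C": [0.5, 1.0, 5.0, 10.0], "epsilon": [0.5, 1.0, 2.0]}

# Synthetic stand-in for the standardized lagged squared returns.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = rng.normal(size=120) ** 2

# 10-fold CV with MSE as the selection criterion, as in the paper.
search = GridSearchCV(SVR(kernel="linear"), param_grid,
                      scoring="neg_mean_squared_error", cv=10)
search.fit(X, y)
best = search.best_params_
```

For the RBF kernel, a `"gamma"` entry would be added to the grid. Repeating this selection for every rolling-window forecast is the dominant computational cost of the SVR approach.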
It should be emphasized that the computation of forecasts based on the SVR model (Equation (21)) does not ensure non-negative values. In such cases we propose to take the previous squared return as the forecast, i.e., $\hat{h}_t = r_{t-1}^2$. However, situations where our models lead to negative outputs are very rare. This problem occurs only for natural gas, for which SVR-lin-15 produces 9 (out of 759) negative forecasts.
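The fallback rule is a one-line post-processing step; a sketch, assuming the forecast array is aligned so that entry $t$ corresponds to the forecast made from the return in entry $t$ of the return array:

```python
import numpy as np

def repair_negative_forecasts(forecasts, prev_returns):
    """Replace any negative SVR variance forecast with the
    corresponding previous squared return r_{t-1}^2."""
    forecasts = np.asarray(forecasts, dtype=float).copy()
    fallback = np.asarray(prev_returns, dtype=float) ** 2
    mask = forecasts < 0
    forecasts[mask] = fallback[mask]
    return forecasts
```

This keeps all variance forecasts non-negative without altering the valid SVR outputs.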
In the evaluation of forecasts we consider two proxies of volatility: the squared daily return and the Parkinson estimator. In Section 3.3 we present the results for the squared daily return and in Section 3.4 the results for the Parkinson estimator. Despite the shortcomings of the squared daily return, we apply it due to its popularity in previous studies in which GARCH models have been compared with SVR models (see Introduction).
The forecasts are evaluated based on two primary measures, namely, the mean squared error and the mean absolute error.
The MSE is the most frequently used criterion in forecasting studies. It is written as:

$MSE = \frac{1}{N}\sum_{t=1}^{N} \left(\hat{\sigma}_t^2 - \hat{h}_t\right)^2$,

where $\hat{\sigma}_t^2$ is the proxy of the volatility of returns, $\hat{h}_t$ is the variance forecast at time $t$, and $N$ is the number of forecasts. The MSE is robust to the use of a noisy volatility proxy (it yields the same ranking of competing forecasts when an unbiased volatility proxy is used; see, e.g., [59]).
The mean absolute error (MAE) is less sensitive to outliers, which is very important when evaluating extraordinary events. It is given as:

$MAE = \frac{1}{N}\sum_{t=1}^{N} \left|\hat{\sigma}_t^2 - \hat{h}_t\right|$.

In order to assess whether the loss differentials between competing models are statistically significant, two different tests are applied: the test of superior predictive ability (SPA) of Hansen [73] and the model confidence set (MCS) of Hansen et al. [74]. In the first test, it is checked whether the considered benchmark model is outperformed significantly by any of the alternatives. In this regard, the performance of the benchmark model (model 0) relative to model $k$ can be described as:

$d_{k,t} = L(\hat{\sigma}_t^2, \hat{h}_{0,t}) - L(\hat{\sigma}_t^2, \hat{h}_{k,t})$,

where $\hat{h}_{0,t}$ and $\hat{h}_{k,t}$ are the volatility forecasts from the benchmark model and model $k$, respectively, $L(\hat{\sigma}_t^2, \hat{h}_{0,t})$ and $L(\hat{\sigma}_t^2, \hat{h}_{k,t})$ denote the loss functions, and $l$ is the number of competing models (excluding the benchmark model). In this study, we apply two measures, namely MSE and MAE, as the loss functions $L$. The null hypothesis of the SPA test is formulated as $H_0$: $E(d_{k,t}) \leq 0$ for all $k = 1, \ldots, l$, meaning that the benchmark model is not outperformed by any of the models $k = 1, \ldots, l$. The test statistic can be expressed as:

$T^{SPA} = \max_{k} \frac{\sqrt{N}\,\bar{d}_k}{\hat{\omega}_k}$,

where $\bar{d}_k$ is the mean of $d_{k,t}$ and $\hat{\omega}_k^2$ is a consistent estimator of the asymptotic variance of $\sqrt{N}\,\bar{d}_k$. The objective of the MCS procedure is to determine the set of best models, denoted as $\mathcal{M}^*$, from a given collection of models $\mathcal{M}_0$. The set of the best models is defined in terms of the loss differentials

$d_{ij,t} = L(\hat{\sigma}_t^2, \hat{h}_{i,t}) - L(\hat{\sigma}_t^2, \hat{h}_{j,t})$, for $i, j \in \mathcal{M}$.
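The evaluation measures and the per-period loss differentials entering the tests are straightforward to compute; the sketch below covers only these building blocks (the full SPA and MCS procedures additionally require a bootstrap to obtain the distribution of the test statistics):

```python
import numpy as np

def mse(proxy, forecasts):
    """Mean squared error of variance forecasts against a volatility proxy."""
    return np.mean((np.asarray(proxy) - np.asarray(forecasts)) ** 2)

def mae(proxy, forecasts):
    """Mean absolute error of variance forecasts against a volatility proxy."""
    return np.mean(np.abs(np.asarray(proxy) - np.asarray(forecasts)))

def spa_differentials(proxy, h_bench, h_alt):
    """Per-period loss differentials d_t = L(benchmark) - L(alternative)
    under squared-error loss; a positive mean suggests the alternative
    outperforms the benchmark."""
    proxy = np.asarray(proxy)
    return (proxy - np.asarray(h_bench)) ** 2 - (proxy - np.asarray(h_alt)) ** 2
```

These differentials, averaged and studentized, form the SPA statistic above; the same pairwise differentials drive the elimination steps of the MCS procedure.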
The null hypothesis is that of equal predictive ability, $H_0$: $E(d_{ij,t}) = 0$ for all $i, j \in \mathcal{M}$, where $\mathcal{M} \subset \mathcal{M}_0$. The testing procedure begins by initially setting $\mathcal{M} = \mathcal{M}_0$. Then the null hypothesis is tested at a given significance level. If the null is not rejected, then $\mathcal{M}^* = \mathcal{M}$; otherwise, the model that contributes most to the test statistic is removed from $\mathcal{M}$ and the whole procedure is repeated until no more models can be removed. The resulting $\mathcal{M}^*$ is then referred to as the model confidence set (MCS). The best models are selected with a given level of confidence in terms of a user-specified criterion for the loss function; in our case we use the MSE and MAE as such criteria.

Results for the Squared Daily Return Used as a Proxy of Volatility
In this Section we evaluate the forecasts by applying the squared daily return as a proxy of volatility. The results are given in Tables 2 and 3 for the MSE and MAE measures, respectively. Both for the MSE and MAE measures, the highest forecasting errors are for natural gas, while the lowest are for gasoil. This corresponds with the fact that these commodities have the highest and the lowest volatility of daily returns, respectively (see Table 1). The values of MSE are considerably higher than those of MAE because the former measure is more sensitive to outliers.

Figure 2 depicts the one-day-ahead volatility forecasts for one selected commodity, namely crude oil. For greater clarity, we present the forecasts only for the two best models in their classes chosen according to the MSE criterion, i.e., SVR-lin-15 and GARCH-t. We found that the forecasts from the GARCH models are greater than those from SVR in times of high volatility. This relates to the observation that the GARCH models react more quickly and strongly to past large changes in volatility than the SVR models. That is the reason the squared forecasting errors are higher for the SVR models than for the GARCH ones. These general conclusions are also valid for the other analyzed energy commodities. Generally, the GARCH models have lower MSE values than the SVR models, but for MAE we cannot draw such a general conclusion. However, according to the MAE measure, the SVR-lin-15 model is often preferable. In order to assess whether these findings are statistically significant, we apply the SPA and MCS tests (the calculated p-values are given in Tables 2 and 3 for the MSE and MAE measures, respectively).
The results of the SPA test for the MSE measure indicate that only the following models are outperformed significantly (at the 10% significance level) by at least one of the alternatives: IGARCH for crude oil; SVR-lin-1, SVR-rbf-1 and SVR-rbf-15 for gasoil; SVR-rbf-15 for gasoline; IGARCH, GARCH-M and SVR-rbf-15 for heating oil; and SVR-lin-1 and SVR-rbf-1 for natural gas. According to the results of the MCS test, all models for all commodities belong to the model confidence set and there is no evidence to reject the null hypothesis of equal predictive ability. This means that the MCS test with the MSE criterion does not differentiate the examined models.
In contrast to the results for MSE, the SPA test for the MAE measure rejects the null hypothesis in most cases, indicating that most models are outperformed significantly by at least one of the alternatives. The only exceptions are: SVR-lin-15 for crude oil; EGARCH and APARCH for gasoil; SVR-lin-15 for gasoline; GARCH-n, GARCH-t, EGARCH, APARCH and SVR-lin-15 for heating oil; and SVR-lin-1, SVR-lin-15 and SVR-rbf-1 for natural gas. Similar conclusions come from the results of the MCS test, which indicate that the most accurate forecasts of volatility are obtained from SVR-lin-15 for crude oil; EGARCH and APARCH for gasoil; SVR-lin-1 and SVR-rbf-1 for gasoline; EGARCH, APARCH and SVR-lin-15 for heating oil; and SVR-lin-1, SVR-lin-15, SVR-rbf-1 and SVR-rbf-15 for natural gas. It is worth noting that SVR-lin-15 is among the best models for all commodities except gasoil. It is difficult to choose the best model among the GARCH models for all analyzed commodities; however, forecasts based on the asymmetric GARCH models are often the most accurate.

Results for the Parkinson Estimator Used as a Proxy of Volatility
In this Section we evaluate the forecasting performance of the analyzed models by applying the Parkinson estimator as a proxy of volatility instead of the squared daily return. As discussed above, this estimator is significantly more efficient than the classical variance estimator based on closing prices. For this reason we expect that the results of the model evaluation may change significantly. The corresponding results are given in Tables 4 and 5 for the MSE and MAE measures, respectively. Both the MSE and MAE forecasting errors are significantly lower when the Parkinson estimator is used for the forecast evaluation. The values of these measures are sometimes even more than three times lower than those obtained for the squared daily returns (compare with Tables 2 and 3). The highest forecasting errors are for natural gas, while the lowest are for heating oil and gasoil.

Figure 3 depicts the one-day-ahead volatility forecasts for the same commodity as shown in Figure 2, namely crude oil. We present the forecasts only for the two best models in their classes chosen according to the MSE criterion, i.e., SVR-lin-15 and GJR. The main difference between Figures 2 and 3 is that the values of the Parkinson estimator are considerably lower than the squared daily returns in periods of high volatility. The findings derived from Figure 3 are similar to those from Figure 2. One can see that the forecasts from the GARCH models are greater than those from the SVR ones in times of high volatility. Moreover, the GARCH models fit extreme changes in volatility even better than observed in Figure 2. These general conclusions are also valid for the other investigated energy commodities.
The results of the SPA test for the MSE measure show that the following models are not outperformed significantly by any of the alternatives: EGARCH, GJR and APARCH for crude oil; GARCH-t, APARCH, IGARCH and GARCH-M for gasoil; GARCH-n, EGARCH, GJR, APARCH, IGARCH and GARCH-M for gasoline; GARCH-n, GARCH-t, EGARCH, GJR, APARCH, GARCH-M and SVR-lin-15 for heating oil; and all GARCH models together with SVR-rbf-15 for natural gas. The results of the MCS test are similar, and the following models belong to the model confidence set: EGARCH, GJR and APARCH for crude oil; GARCH-n, GARCH-t, APARCH, IGARCH and GARCH-M for gasoil; all GARCH models together with SVR-lin-1 and SVR-lin-15 for gasoline; GARCH-n, GARCH-t, EGARCH, GJR, GARCH-M and SVR-lin-15 for heating oil; and all GARCH models together with SVR-rbf-15 for natural gas. According to the MSE measure, the forecasts of volatility from the SVR models are generally inferior to the forecasts based on the GARCH models.
For the MAE measure, the conclusions are quite similar to those for the MSE criterion. Significantly more precise forecasts of volatility are based on: EGARCH, GJR, APARCH and SVR-lin-15 for crude oil; APARCH for gasoil; all GARCH models together with SVR-lin-15 (according to the SPA test) and all considered models except SVR-rbf-15 (according to the MCS test) for gasoline; EGARCH and APARCH for heating oil; and GARCH-t and SVR-lin-15 (according to the SPA test) and GARCH-n, GARCH-t and SVR-lin-15 (according to the MCS test) for natural gas. Therefore, for three energy commodities the SVR-lin-15 model is not significantly inferior to the GARCH models. It is difficult to choose the best model among the GARCH models for all analyzed commodities; however, similarly to the results in Section 3.3, forecasts based on the asymmetric GARCH models are often the most accurate.

Discussion of the Results
In this study, we apply two proxies of volatility to evaluate forecasts: the squared daily return and the Parkinson estimator. The former is an extremely noisy variance estimator and its usage can lead to an unreliable evaluation of models. That is the reason why the Parkinson estimator, which is a significantly more accurate measure of volatility, is also adopted.
When the squared daily returns are used as an ex-post volatility measure, the results are ambiguous. The MSE values are large (due to the existence of large outliers) and it is not possible to indicate significantly better models among the analyzed ones. Meanwhile, the MAE criterion favors the SVR-lin-15 model for most commodities. On the other hand, when the Parkinson estimator is used as a proxy of volatility, the forecasting errors are significantly lower, indicating more accurate predictions from the considered models, and the obtained results usually favor the GARCH models over the SVR ones. Our findings indicate that the accuracy of volatility forecasts depends substantially on the applied proxy of volatility. This conclusion is important since, in most papers concerning the application of SVR models to volatility forecasting, only the squared daily returns (or a moving average of the daily squared returns) have been analyzed (e.g., [22][23][24][25][26][27][28][29][30][31][32][33]). Our results obtained for the squared daily returns confirm the conclusion formulated in these studies that SVR can lead to lower forecasting errors than the GARCH models. However, we argue that this forecasting superiority of SVR models is not unequivocal, since it depends on the measure used to evaluate the forecasting errors and is valid only for models with properly determined hyperparameters.

Conclusions
Due to global economic developments, emerging technological advances and economic, geopolitical and environmental events, a significant increase in the volatility of energy commodity prices has occurred in the last 20 years. This highly volatile environment has become attractive to financial speculators, magnifying the risk on the energy commodity markets. That is the reason there is a strong need to look for more accurate methods of volatility forecasting for such commodities.
In the paper we compare the forecasting performance of the GARCH-type models with support vector regression for futures contracts of selected energy commodities: crude oil, natural gas, heating oil, gasoil and gasoline. The GARCH models are a standard tool applied in the volatility literature, while SVR is one of the machine learning techniques that have been gaining huge popularity in recent years.
We show that the accuracy of volatility forecasts depends substantially on the applied proxy of volatility. Our study confirms that SVR with properly determined hyperparameters can lead to lower forecasting errors than the GARCH models when the squared daily return is used as the proxy of volatility in the evaluation. Meanwhile, if we apply the Parkinson estimator, which is a more accurate approximation of volatility, the results are different, since they usually favor the GARCH models over SVR.
Moreover, it is difficult to choose the best model among the GARCH models for all analyzed commodities; however, forecasts based on the asymmetric GARCH models are often the most accurate. Meanwhile, in the class of SVR models, the results indicate the forecasting superiority of the SVR model with the linear kernel and 15 lags. Precisely speaking, in 90% of cases (i.e., 18 out of 20), this model has the lowest MSE or MAE among the SVR models, across all analyzed time series and both volatility proxies.
In the future, this study can be extended in several directions. Firstly, other machine learning methods, such as neural networks or hybrid models, can be applied. Secondly, other proxies of volatility, such as the realized variance or the bi-power variation, can be used for the evaluation of forecasts. Thirdly, the analysis can be performed for the COVID-19 crisis, that is, for a period of unprecedented volatility on the energy market. Fourthly, the comparison of the models can be performed for simulated data assuming different generating processes.
Author Contributions: W.O. described, estimated and applied the SVR models. M.F. and P.F. described, estimated and applied the GARCH-type models. All authors wrote, reviewed and commented on the manuscript. All authors have read and agreed to the published version of the manuscript.