Oil Market Efficiency Under a Machine Learning Perspective

Forecasting commodities and especially oil prices has attracted significant research interest, often concluding that oil prices are not easy to forecast and implying an efficient market. In this paper, we revisit the efficient market hypothesis of the oil market attempting to forecast the West Texas Intermediate oil prices under a machine learning framework. In doing so, we compile a dataset of 38 potential explanatory variables often used in the relevant literature and through a selection process we build forecasting models that use past oil prices, refined oil products and exchange rates as independent variables. Our empirical findings suggest that the Support Vector Machines (SVM) model coupled with the non-linear Radial Basis Function kernel outperforms the linear SVM and the traditional logistic regression (LOGIT) models. Moreover, we provide evidence that points to the rejection of even the weak form of efficiency in the oil market.


Introduction
How do oil prices respond to financial and macroeconomic shocks? Is there a link between commodity prices, stock markets and monetary policy? Moreover, should this link exist, what are the driving factors of oil price determination or, in other words, which variables drive the evolution of oil prices? Despite the vast research literature in the field, inference on the relationship between macroeconomic variables, financial variables and oil prices remains an active subject of debate.
In one of the first attempts to describe the relationship between oil prices and macroeconomic variables, Hotelling (1931) suggests the existence of a direct link between oil prices and the implemented monetary policy, claiming that oil prices are determined by interest rates. Despite the critique of the Nobel laureate Robert Solow of what is now known as "Hotelling's rule" (Solow, 1986), the detailed survey of Gaudet (2007) on the impact of Hotelling's work on the literature reveals that the relationship between oil prices and interest rates is still subject to research debate.
Another important milestone in the quest for the driving factors of oil prices is the work of Hamilton (1983). He detected a strong positive correlation between fluctuations of the business cycle and oil prices, suggesting an active link between economic conditions and oil prices. Nevertheless, his study covers a period with a significant positive trend in economic output, leaving the behaviour of oil prices during periods of economic downturn uncharted.
Under a different perspective, Baumeister and Kilian (2014) build a model that forecasts crude oil prices in order to pinpoint the variables that actually foresee oil price shocks.
The scope of their study is to provide an empirical "rule of thumb" for oil price forecasting. More recently, Baumeister and Kilian (2016) review oil price shocks from the 1973-1974 oil crisis to the 2008 global financial crisis. Examining a variety of relations between oil prices and the economy, they conclude that oil prices are hard to forecast and that most oil price surges should be attributed to supply and demand shocks rather than to causal relationships with other variables. In an alternative approach, Knetsch (2007) forecasts oil prices based on models used in measuring risk in financial markets. In a similar vein, Yu et al. (2008) apply signal processing techniques as a preprocessing step to neural network models in forecasting daily WTI and Brent oil prices.
For the period 1 January 1986 to 30 September 2006 they find that their autoregressive forecasting scheme outperforms econometric alternatives in out-of-sample forecasting of the last 968 observations. Their empirical findings also hold for an updated sample from 3 January 2011 to 17 July 2013 (Yu et al., 2014). The authors find that their hybrid forecasting setup that combines signal processing with machine learning reaches a 62.2% directional accuracy in out-of-sample forecasting of WTI oil prices. Shin et al. (2013) combine supervised with unsupervised machine learning in forecasting WTI oil prices for the period January 1992 to June 2008. Their approach exploits the forecasting ability of the supervised learning methods and the merit of unsupervised learning in modelling the structure of the data. Using the last 100 observations for out-of-sample forecasting they find that their autoregressive "semi-supervised" technique outperforms the RW model.
Overall, the review of the literature suggests that machine learning methodologies produce a higher forecasting accuracy in comparison to the typical econometric ones and they typically outperform the RW model, while econometric approaches often fail to do so.
In this paper, we attempt to uncover the possible relationship between oil prices (namely West Texas Intermediate (WTI) prices) and other economic variables, employing a machine learning framework on a monthly basis. Unlike previous machine learning approaches to forecasting oil prices that select variables atheoretically, we compile a pool of 38 potential regressors based on economic theory and the literature reported herein, and select the variables that are most relevant to oil price forecasting.
Based on a Support Vector Machines (SVM) model coupled with the linear kernel and the nonlinear Radial Basis Function (RBF) kernel, we examine the directional forecasting performance of our models in comparison to the typical econometric logistic regression methodology.
The selection of the SVM methodology is motivated by its superior forecasting ability, reported in the relevant literature, in forecasting economic and financial variables (see among others Plakandaras et al., 2015 and Khandani et al., 2010). Thus, the innovation of our paper stems from the application of a state-of-the-art machine learning methodology and the empirical recognition of a causal relationship between variables reported in the literature and oil prices. We also specifically test the relationship between oil prices and interest rates, as a possible empirical validation of Hotelling's rule under a machine learning framework. To the best of our knowledge this is the first attempt to do so. In section 2 of the paper we briefly describe the methodology and the data, section 3 presents our empirical findings, while section 4 concludes the paper.

Support Vector Machines
Support Vector Machines is a supervised machine learning methodology used in data classification. Proposed by Cortes and Vapnik (1995), the basic concept of an SVM is to select a small number of data points from a dataset, called Support Vectors (SV), defining a linear boundary that separates the data points into two classes. The methodology can be generalized for cases including more classes. Nonetheless, as in this study we focus on directional forecasting, the binary version of the model is adequate. In what follows, we briefly describe the mathematical derivation of the SVM theory.
We consider a dataset of vectors xᵢ ∈ ℝⁿ (i = 1, 2, …, N) belonging to two classes yᵢ ∈ {−1, +1}. If the two classes are linearly separable, we define a boundary as:

w·x + b = 0,  (1)

where w is the weight vector and b is the bias (Figure 1). This optimal hyperplane is defined as the decision boundary that classifies the dataset into its respective classes with the maximum accuracy and has the maximum distance from either class. This distance is often called the "margin". In Figure 1, the SVs are represented with a pronounced contour, the margin lines (defining the distance of the hyperplane from each class) are represented by solid lines and the hyperplane is represented by a dotted line.
In order to allow for a predefined level of error tolerance in the training procedure, Cortes and Vapnik (1995) introduced non-negative slack variables, ξᵢ ≥ 0, ∀i, and a parameter, C, describing the desired tolerance to classification errors. The solution to the problem of identifying the optimal hyperplane can be dealt with through the Lagrange relaxation procedure of the following equation:

min over w, b, ξ of  ½‖w‖² + C ∑ᵢ₌₁ᴺ ξᵢ,  subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, ∀i,  (2)

where ξᵢ measures the distance of vector xᵢ from the hyperplane when classified erroneously, and α₁, α₂, …, α_N are the non-negative Lagrange multipliers.
The hyperplane is then defined as:

f(x) = sign(∑_{i∈S} αᵢ yᵢ (xᵢ·x) + b),  (3)

where S = {i : 0 < αᵢ < C} is the set of support vector indices.
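As a minimal illustration of the derivation above, the following sketch fits a linear soft-margin SVM on a synthetic, well-separated two-class dataset using scikit-learn's SVC (our choice of library for illustration; the data and parameter values are not the paper's) and reads off the weight vector w, the bias b and the support vectors.

```python
# Illustrative sketch: linear SVM on synthetic separable data.
# The dataset, C value and cluster locations are hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=0.5, size=(50, 2))   # class +1 cluster
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))   # class -1 cluster
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # weight vector w of the hyperplane w.x + b = 0
b = clf.intercept_[0]   # bias b
n_sv = len(clf.support_vectors_)   # only a few points become SVs

# Decision rule sign(w.x + b), as in equation (3)
pred = np.sign(X @ w + b)
acc = (pred == y).mean()
```

With clearly separated clusters the decision rule classifies the sample perfectly and only the margin points are retained as support vectors.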
When the two-class dataset cannot be separated by a linear separator, the SVM is paired with the kernel projection trick. The concept is quite simple: the dataset is projected through a kernel function into a richer space of higher dimensionality (called a feature space), where the dataset is linearly separable. In Figure 2, we depict a dataset of two classes that are not linearly separable in the initial dimensional space (left graph).
After the projection onto a higher dimensional space (right graph), the linear separation is feasible.

Figure 2:
The data space: The non-separable two-class scenario (left) and the separable case in the feature space after the projection (right).
The solution to the dual problem with the projection of equation (2) now transforms to:

max over α of  ∑ᵢ₌₁ᴺ αᵢ − ½ ∑ᵢ₌₁ᴺ ∑ⱼ₌₁ᴺ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ),  (4)

under the constraints ∑ᵢ₌₁ᴺ αᵢ yᵢ = 0 and 0 ≤ αᵢ ≤ C, ∀i, where K(xᵢ, xⱼ) is the kernel function.

In our models, we examine two kernels: the linear kernel and the radial basis function (RBF) kernel. The linear kernel detects the separating hyperplane in the original dimensional space of the dataset, while the RBF kernel projects the initial dataset onto a higher dimensional space (Figure 3). The mathematical representation of each kernel is:

Linear: K(xᵢ, xⱼ) = xᵢ·xⱼ
RBF: K(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²),

where γ is a kernel parameter.
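The effect of the kernel choice can be sketched on a dataset that is not linearly separable in its original space, such as two concentric circles: the linear kernel performs near chance, while the RBF kernel separates the classes. The dataset, γ and C values below are illustrative, not the paper's.

```python
# Illustrative sketch: linear vs RBF kernel on concentric circles,
# a classic case that is not linearly separable in the input space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

lin = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)

lin_acc = lin.score(X, y)   # near 0.5: no linear separator exists
rbf_acc = rbf.score(X, y)   # near 1.0: separable in the feature space
```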

Figure 3:
An example of an SVM classification using the RBF kernel. The two classes are separated with a linear separator in a higher dimensional space, which, when reprojected back into the original dimensions, becomes a non-linear function. The circled instances are the Support Vectors defining the decision boundary and the instances marked with a diamond outline are misclassified.
In order to avoid over-fitting the dataset (fitting the model to the data and not to the phenomenon), we use cross-validation in the training step. According to the cross-validation methodology, the training sample is split into n parts. After selecting an initial configuration, the model is trained iteratively on n-1 parts, keeping one part for testing purposes each time. The in-sample accuracy of the model with the selected configuration is simply the mean value of the forecasts over all n segments. After changing the configuration of the model's parameters, the iterative training scheme is repeated until the minimum forecasting error is achieved. This training scheme is called "n-fold cross validation". An overview of the n-fold cross validation is depicted in Figure 4.
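The procedure just described can be sketched with scikit-learn's `cross_val_score`, which trains on n-1 folds and tests on the held-out fold, returning one accuracy per fold; the synthetic dataset and model settings are illustrative only.

```python
# Illustrative sketch of n-fold cross-validation for one SVM configuration.
# The dataset and hyperparameters are hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

n = 5  # number of folds
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=n)

# In-sample accuracy of this configuration: mean over the n held-out folds.
in_sample_accuracy = scores.mean()
```

In practice this evaluation is repeated for every candidate parameter configuration, and the configuration with the best mean fold accuracy is retained.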

Figure 4:
Overview of a 3-fold Cross Validation training scheme. It shows that each subset is used as a testing sample while the others are used for training the model, for each combination of parameter values.
The search for the optimal parameter setup during cross validation is performed in a coarse-to-fine grid search evaluation scheme. In this type of grid search, the parameters are initially evaluated in a large-step search procedure. Then, improved results are achieved by using a denser grid focusing only on the parts of the search area where the model achieves the highest accuracy. By narrowing the grid search, the point where no further improvement is achieved can be found. In Figure 5 we provide a graphical representation of a three-iteration coarse-to-fine grid search. Optimum results in terms of forecasting performance are depicted with the color gray. As the area becomes darker, the grid step becomes smaller and the search finer. Coarse-to-fine grid search is a lower-complexity bypass of an exhaustive search at the finest level.
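A two-stage version of this coarse-to-fine scheme can be sketched as follows: a coarse grid with large multiplicative steps is searched first, and a finer grid is then centred on the best coarse cell. The grids, dataset and the use of `GridSearchCV` are illustrative assumptions, not the paper's exact settings.

```python
# Illustrative two-stage coarse-to-fine grid search over C and gamma.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Stage 1: coarse grid with large (decade) steps.
coarse = {"C": 10.0 ** np.arange(-2, 3), "gamma": 10.0 ** np.arange(-3, 2)}
search = GridSearchCV(SVC(kernel="rbf"), coarse, cv=5).fit(X, y)
C0, g0 = search.best_params_["C"], search.best_params_["gamma"]

# Stage 2: denser grid focused around the best coarse cell.
fine = {"C": C0 * np.array([0.25, 0.5, 1.0, 2.0, 4.0]),
        "gamma": g0 * np.array([0.25, 0.5, 1.0, 2.0, 4.0])}
search = GridSearchCV(SVC(kernel="rbf"), fine, cv=5).fit(X, y)

best_acc = search.best_score_   # cross-validated accuracy of the refined optimum
```

Further stages simply repeat stage 2 with smaller multiplicative factors until accuracy stops improving.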

Figure 5:
A three-step coarse-to-fine grid search procedure for two parameters x1 and x2, where the forecasting accuracy of the model rises as one moves from coarse to denser (darker) areas of the grid.

Logistic Regression
When it comes to directional forecasting, the dependent variable takes two states, 0 and 1, expressing negative and positive oil price returns, respectively. The drawback in estimating a binary dependent variable with the OLS regression methodology is that the nature of the dependent variable renders the OLS regression results irrelevant, due to the heteroskedasticity of the estimated errors and violations of the assumptions underlying the asymptotic efficiency of the estimated coefficients. Thus, instead of attacking the problem directly, we estimate the probability

p = P(y = 1 | x) = e^(β′x) / (1 + e^(β′x))

that the dependent variable is 1, given the values of the independent variables. Given the binary nature of our dependent variable, the logarithmic ratio of the probability of being in state 1 to state 0 is

ln(p / (1 − p)) = β′x,

which is called the "logit", where x is the vector of independent regressors and β is the vector of estimated coefficients. If the estimated p is above 0.5 we classify the observation in class 1, while if it is below 0.5 we classify it in class 0.
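The estimation and the 0.5 classification cut-off described above can be sketched as follows; the synthetic data-generating process and coefficient values are purely hypothetical.

```python
# Illustrative sketch: logistic regression for directional classes {0, 1}
# with the 0.5 probability threshold. Data and coefficients are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Hypothetical logit data-generating process for the direction of returns.
true_beta = np.array([1.5, -1.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=200) < p_true).astype(int)

logit = LogisticRegression().fit(X, y)
p_hat = logit.predict_proba(X)[:, 1]    # estimated P(y = 1 | x)
y_hat = (p_hat > 0.5).astype(int)       # classify with the 0.5 cut-off
```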

The Data
Our dataset consists of the 38 candidate explanatory variables described above, sampled at a monthly frequency over the period June 2006 to February 2018. With the exception of interest rates, all variables are transformed into their natural logarithms. We do not test for stationarity and proceed with the levels of all variables, given that the results of the SVM methodology are robust to the existence of unit root processes in the data (for more details see Tay and Cao, 2002). The descriptive statistics for all variables in our sample are reported in Table 1.
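The transformation step (natural logs for every series except interest rates) can be sketched as below; the column names and values are illustrative placeholders, not the paper's actual series.

```python
# Illustrative sketch of the log transformation, leaving interest rates in levels.
# Column names and values are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "wti": [60.0, 62.5, 58.0],               # oil price level
    "usd_index": [95.0, 96.2, 94.8],          # exchange-rate index
    "fed_funds_rate": [1.25, 1.50, 1.50],     # interest rate: kept in levels
})

rate_cols = ["fed_funds_rate"]
transformed = df.copy()
for col in df.columns:
    if col not in rate_cols:
        transformed[col] = np.log(df[col])    # natural logarithm
```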

Empirical Results
Given the scope of this paper, as reported in the introduction, we proceed in three steps.
The first step is to test for the best autoregressive (AR) model in directional forecasting of oil prices (rise and drop) based on the SVM and the logistic regression methodologies. A comparison of the AR model with the naïve RW model, used as a benchmark where the best guess about next period's directional change is the current one, reveals whether we can reject the weak form of efficiency in the oil market.
Proposed by Fama (1965), the Efficient Market Hypothesis (EMH) states that the evolution of prices in an efficient market follows a random walk and thus it is impossible to create a forecasting model that achieves sustainable positive returns in the long run. The EMH is usually presented in three forms: the weak, the semi-strong and the strong form of efficiency. Weak-form efficiency is observed when historic prices of the variable in question cannot forecast future ones. Thus, autoregressive models have no forecasting power and the best forecast of next period's price is today's price. Semi-strong efficiency imposes stricter assumptions, in that all historic prices and all publicly available information are already reflected in current asset prices and thus cannot be used successfully in forecasting. Finally, the strong form of the EMH builds on the semi-strong case by adding all private information, rendering it impossible to forecast the future evolution of an asset's price consistently. Overall, outperforming the RW model is an indication of potential economic gains for a trader that follows an alternative trading strategy.
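The RW directional benchmark used above is simple enough to sketch directly: the forecast of next month's direction is this month's observed direction, and accuracy is the share of months where the two coincide. The return series here is randomly generated for illustration only.

```python
# Illustrative sketch of the naive RW directional benchmark:
# forecast next month's direction with this month's direction.
import numpy as np

rng = np.random.default_rng(2)
returns = rng.normal(size=120)          # hypothetical monthly oil returns
direction = (returns > 0).astype(int)   # 1 = rise, 0 = drop

rw_forecast = direction[:-1]            # carry today's direction forward
realized = direction[1:]                # what actually happened next month
rw_accuracy = (rw_forecast == realized).mean()
```

Any candidate model whose out-of-sample directional accuracy exceeds `rw_accuracy` on the same evaluation window provides evidence against weak-form efficiency.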
As a second step we examine whether we can build structural forecasting models that are more accurate than the AR ones. These models build on the best AR models by augmenting them with various relevant variables as potential regressors. In doing so, we test all possible combinations of variables in order to detect the most accurate forecasting models. Both in the AR and the structural models, we use quarterly dummy variables to account for seasonal fluctuations in oil demand (EIA, 2018).
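The exhaustive search over variable combinations can be sketched with `itertools.combinations`: every subset of the candidate regressors is scored and the best-scoring subset is kept. The candidate names, dataset and the use of cross-validated SVM accuracy as the score are illustrative assumptions.

```python
# Illustrative sketch of the exhaustive search over regressor combinations.
# Candidate variable names and data are hypothetical.
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
candidates = ["fuel_oil", "gasoline", "usd_index", "fed_funds"]
data = {name: rng.normal(size=150) for name in candidates}
y = (rng.normal(size=150) > 0).astype(int)   # directional target

best_subset, best_score = None, -np.inf
for k in range(1, len(candidates) + 1):
    for subset in combinations(candidates, k):
        X = np.column_stack([data[name] for name in subset])
        score = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
        if score > best_score:
            best_subset, best_score = subset, score
```

With a pool of 38 candidates the full search is large, which is why coarse-to-fine tuning of each model's hyperparameters matters for keeping the overall cost tractable.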
Finally, the third step is to focus explicitly on the interest rates and evaluate their ability to forecast oil prices. By doing so, we empirically test Hotelling's rule.
We start our analysis with the AR versions of all models. We split our sample into two parts: the sub-sample from June 2006 to September 2015 (112 months) is used for training the models and the sub-sample from October 2015 to February 2018 (29 months) is used to evaluate the out-of-sample forecasting accuracy.
In terms of the best AR models, our empirical findings suggest that the most accurate SVM model coupled with the linear kernel is the one that includes 11 lags of the WTI price (SVM-linear-11). The most accurate SVM model with the RBF kernel includes 5 lags of the WTI price (SVM-RBF-5) and the most accurate logistic regression AR model has 12 lags (Logit-12).
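The construction of the lagged feature matrix behind such AR specifications can be sketched as follows (here with 5 lags, as in the SVM-RBF-5 specification); the price series is synthetic and the log-return transformation is an illustrative assumption.

```python
# Illustrative sketch: building a 5-lag AR design matrix for
# directional forecasting. The price series is hypothetical.
import numpy as np

rng = np.random.default_rng(4)
price = rng.normal(size=130).cumsum() + 100.0   # hypothetical monthly WTI level
ret = np.diff(np.log(price))                    # monthly log returns

n_lags = 5
# Row t holds ret[t], ret[t+1], ..., ret[t+n_lags-1]; the target is the
# direction of the following month's return, ret[t+n_lags].
X = np.column_stack([ret[i:len(ret) - n_lags + i] for i in range(n_lags)])
y = (ret[n_lags:] > 0).astype(int)
```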
After determining the most accurate AR models, we augment them with additional variables through an exhaustive search of all possible combinations, keeping the same subsamples for training and testing purposes. According to this scheme, the most accurate structural SVM-linear-11 model includes a fuel oil price series as an additional regressor; the detailed results are reported in Table 2. Moreover, we detect that certain variables are useful in forecasting oil prices, as in Zhao et al. (2017), but unlike their approach, we are able to specifically identify these variables (all detailed results are available from the authors upon request). Another finding is that all AR models are less accurate than their respective structural versions.
Interestingly, all AR and structural SVM models outperform the RW model in terms of out-of-sample forecasting accuracy. This finding suggests that the conclusion of Baumeister and Kilian (2016) that no model outperforms a RW may be attributed to the low forecasting ability of the econometric methodology applied in their study. The machine learning approach we employ in this study clearly outperforms the RW model, rejecting their conclusion. Moreover, the fact that the AR SVM models outperform the RW model provides evidence against the EMH for the WTI oil market, even in its weak form.
Our atheoretical approach has not, however, highlighted the existence of a potential causal relationship (lead-lag relationship) between interest rates and WTI oil prices, given that interest rates were not selected through the exhaustive search step as informative variables. In order to test the hypothesis suggested by Hotelling's rule that interest rates determine oil prices, we build SVM and logistic regression models using the 4 interest rate variables in our sample, as well as the spreads of the 10-year and 5-year U.S. interest rates over the effective federal funds rate. The use of the term spread between long and short-term rates is motivated by the need to include expectations about future economic conditions, which are captured by the term spread.
To keep things tractable, we use the same sub-samples for training and testing the forecasting accuracy and we also use the seasonal dummy variables. Unlike our atheoretical approach, we do not include lags of the dependent variable in this empirical analysis. We test alternative lag orders of interest rates, but the models with the highest forecasting accuracy are the ones that forecast next period's (month's) oil prices based on this period's interest rates. In Table 3 we report the out-of-sample forecasting accuracy of the interest rate models. The models based on short-term rates achieve the highest accuracy, suggesting that oil market participants follow short-term interest rates more closely than long-term ones.
Unlike the bulk of the literature that rejects Hotelling's rule (Gaudet, 2007), our empirical findings provide evidence in favor of interest rates driving oil prices. Although the interest rate models do not achieve the highest forecasting accuracy (and thus are not selected during the atheoretical step), they outperform the RW model, suggesting that they are able to forecast the evolution of oil prices. In contrast, older studies fail to do so.
This discrepancy with the existing literature should be attributed to the forecasting ability of the machine learning approach, whereas older studies are based on OLS and logistic regressions. Our SVM methodology, with its higher forecasting accuracy in comparison to typical econometric methodologies, is able to unveil the relationship between interest rates and oil prices. We leave the issue open for further research.

Conclusions
In this paper we revisit the efficient market hypothesis for the oil market under a machine learning framework. In doing so, we build AR and structural forecasting models based on the Support Vector Machines methodology, spanning the period June 2006 to February 2018. Our empirical findings suggest that the AR SVM models outperform the RW model, rejecting even the weak form of efficiency in the oil market.
Through an exhaustive search among a pool of 38 potential regressors, we find that when we couple the AR SVM model with the trade-weighted index of U.S. dollars against a basket of foreign currencies, we manage to increase the out-of-sample forecasting accuracy to 67.8%. Moreover, when we focus explicitly on the relationship between oil prices and interest rates, our machine learning approach is able to unveil the existence of a causal relationship, unlike previous studies that use typical econometric methods.
This finding is an indication in support of Hotelling's rule, but we leave this issue for future research.