Forecasting Crude Oil Risk Using a Multivariate Multiscale Convolutional Neural Network Model

: In light of the increasing level of correlation and dependence between the crude oil markets and the external inﬂuencing factors in the related ﬁnancial markets, we propose a new multivariate empirical decomposition convolutional neural network model to incorporate the external inﬂuence of ﬁnancial markets such as stock market and exchange market in a multiscale setting into the modeling of crude oil market risk movement. We propose a multivariate empirical model decomposition to analyze the ﬁner details of interdependence among risk movement of different markets across different time horizons or scales. We also introduce the convolutional neural network to construct a new nonlinear ensemble algorithm to reduce the estimation bias and improve the forecasting accuracy. We used the major crude oil price data, stock market index, and the euro/United States dollar exchange rate data to evaluate the performance of the multivariate empirical model decomposition convolutional neural network model. The combination of both the multivariate empirical model decomposition and the convolutional neural network model in this paper has produced the risk forecasts with signiﬁcantly improved risk forecasting accuracy.


Introduction
With the deregulation wave spreading across the crude oil markets around the world and the widespread usage of energy derivatives, we have witnessed closer interaction and linkage between the crude oil markets and the global financial markets. Numerous empirical studies have reported the intensifying financialization of the crude oil markets, and the complicated relationship between crude oil markets, the stock market, and the exchange market [1,2]. For example, Ji [3] has discovered that before the 2008 global financial crisis, the crude oil markets were affected by the speculation factors in the short term and the fundamental factors in the long term. However, after 2008, the dependence between crude oil markets and influencing factors was embodied in the spillover effect between the crude oil markets and other financial markets such as the stock market, foreign exchange market, and commodities market. Du and He [4] found that there is a significant spillover effect between the stock markets and crude oil markets. Chen and Lv [5] used the extreme value theory to analyze the extreme correlation between the crude oil markets and Chinese stock markets. They confirmed that the extreme correlation reached its peak during the financial crisis. de Truchis and Keddad [6] used the fractal co-integration and the copula method to identify the volatility dependence between the crude oil price and the Japanese yen, subject to the influence of the financial crisis. Husain et al. [7] found that there is significant correlation and spillover effect between the crude oil price, stock index, and the metal price. However, the integration of these financial markets as the exogenous variables (factors) into the crude oil risk analysis and forecasting has been a difficult research problem in the literature, which attracted significant attention [8][9][10][11]. From a data analysis and forecasting perspective, the dependence between the crude oil markets and financial factors such as stock index and exchange rate are highly complex to capture in the modeling process. The investigations into the forecasting model incorporating the influences of these financial factors such as stock index and exchange rate are critical to the improvement of the risk forecasting accuracy.
There are a series of risk measures such as standard deviation, value at risk (VaR), and conditional value at risk (CVaR) in the literature. Among them, VaR is the most widely used one, which defines the tail loss at the pre-specified confidence level and investment time horizon [12]. It summarizes and reports the downside risk level as the critical information of the asset distribution that investors are concerned about. In this paper, VaR is used to measure the risk level that investors are most concerned about. The estimation and modeling of risk measures employ parametric models such as econometric and time series models, as well as nonparametric models. These models in the typical VaR estimation would rely on the historical data itself to infer the distribution and the estimation of the appropriate quantile. The difference between these two approaches is the additional assumptions imposed on the historical data in the parametric approach [12]. The autoregressive moving average-generalized autoregressive conditional heteroskedasticity (ARMA-GARCH) model is a parametric approach that imposes strict assumptions on the underlying distribution and model parameters. For example, Fiszeder et al. [13] proposed the range-based dynamic conditional correlation (DCC) GARCH model to estimate VaR in commodity and equity markets. Aloui and Mabrouk [14] found the fractionally integrated GARCH (FIAPARCH) model with skewed Student t distribution provided better forecasting accuracy compared to the benchmark model. Bedowska-Sojka [15] showed through comprehensive model evaluation that the GARCH model still represents one of the most robust models to estimate VaR with intraday data, better than the popular models such as the realized volatility model. Nonparametric models such as Monte Carlo and historical simulation models use the historical data changes to infer the future risk exposure and have been used extensively in the industry due to their assumption-free approach [16][17][18][19][20].
In the meantime, the semi-parametric approaches have emerged as the alternatives to combine the advantages of both parametric and nonparametric models in VaR research. New models in this category have demonstrated promising positive performance improvement and have attracted increasing attention. These models include extreme value theory, regime switching models, multiscale analysis, etc. [21]. Among them, multiscale models such as the empirical mode decomposition (EMD) are the latest development and have attracted increasing levels of attention in the risk measurement field. For example, Biage [22] used the wavelet analysis to model the stock volatility at different frequency scales and construct more accurate VaR estimates. Cifter [23] proposed a wavelet-based extreme value theory to estimate more accurate VaR. Zhu et al. [24] used the EMD model to construct the multiscale data structure for the carbon price and identify the extreme events. That way they improved the VaR estimation accuracy with better risk modeling. He et al. [25] identified the transient and extreme risk factors in the multiscale data domain extracted with the variational mode decomposition (VMD) model. Empirical studies conducted in the crude oil markets have demonstrated the evidence that the separate modeling of risk of different characteristics has produced VaR with higher-level reliability and accuracy. Zou et al. [26] have used the convolutional neural network (CNN) to consider risk forecasts at different scales to improve the overall risk estimation accuracy. The performance of the proposed model has been evaluated and verified using the trading data in the crude oil markets. Given the success of new techniques such as EMD in the modeling of univariate risk measures, their recent extension to the multivariate case can be introduced into the multivariate risk analysis and modeling. The widely popular deep learning model can also contribute to accurate forecast risk forecasting. Limited attempts in the literature may include recent research using the high dimensional multiscale analysis models such as bivariate EMD to estimate VaR for a portfolio consisting of multiple assets in metal markets, crude oil markets, etc. For example, He et al. [27] proposed the portfolio value at risk model based on bivariate EMD and the Copula technique and applied it to the precious metal market. Experiment results showed improved risk forecasting accuracy.
The purpose of this study is to apply the multivariate empirical mode decomposition (MEMD) model and the CNN model to investigate the multiscale nonlinear relationship between crude oil markets and financial markets such as foreign exchange markets and stock markets using crude oil price data. We also propose the MEMD-CNN VaR model to forecast the downside risk in the crude oil markets. This model incorporates the interaction and multiscale dependence between the crude oil markets and financial markets into the multiscale risk estimation process. We further constructed the nonlinear ensemble forecasting model based on the CNN. An empirical study has been conducted using the major crude oil price, euro/United States dollar (EUR/USD) exchange rate, and Dow Jones Industrial Average (DJIA) stock index data as the typical test case to evaluate the risk forecasting performance of the proposed model.
The major contribution of this paper is the proposition of the MEMD-CNN VaR model. It quantifies the impacts of external influences on financial sectors such as the stock market and exchange market over different investment time horizons in the MEMD-CNN VaR model. The MEMD-CNN VaR model is unique in that we have introduced the MEMD model to capture and model the correlation and multiscale dependence in the high dimensional multiscale risk structure and risk movement in the crude oil markets. Deep learning models such as the CNN model have also been introduced to construct a nonlinear ensemble algorithm to integrate individual forecasts across different scales and estimate the nonlinearity in the risk measure changes. To the best of our knowledge, very few approaches have been developed to address the modeling of risk measures in the multivariate multiscale setting, taking into account the external influences from the other financial markets. The work in this paper conducts an exploratory investigation into the use of multivariate multiscale models such as MEMD and deep learning models such as CNN to improve the VaR estimation accuracy.
We organize the rest of the paper as follows. In Section 2, we present details of the methodology used in the paper. This includes a brief introduction to the theory and algorithms for MEMD and CNN. Then we explain the theory and algorithm for the MEMD-CNN model in detail. Results from the experiments conducted to empirically evaluate the performance of the proposed model against the benchmark model have been reported in Section 3. Section 4 concludes and makes some summarizing remarks.

Multivariate Empirical Mode Decomposition and Convolutional Neural Network
MD model is an extension of the traditional univariate EMD model in multivariate data analysis [28][29][30][31]. The problem with the traditional EMD model is that the decomposed intrinsic mode functions (IMFs) differ in the maximum number, the fluctuation range, and the center frequency of modes when the EMD model is applied to the analysis of high dimensional data [29]. This results in the well-known mode mixing problem. This causes trouble for the interpretation of the statistical and economic meaning of the decomposed IMFs. The recently proposed MEMD model extended the volatility concept in the EMD model into the rotation concept in the higher dimensional space. In MEMD analysis, the multivariate data is viewed as the sphere in the geometric space. The decomposed IMFs constitute different surfaces for the spheres, all of which rotate around the center of the sphere [28,30,31]. The rotation speed can be used to distinguish the surfaces between spheres. Therefore, during the analysis, the correlation and dependence between different elements of the multivariate matrix data are incorporated into the rotation concept during the analysis. The decomposition process for the IMFs would reflect the correlation between them.
The MEMD algorithm is illustrated as follows. 1. Sample from n − 1 dimensional sphere.
2. Using the directional vector x θ k and the input v(t) T t=1 , calculate projection set vector i that corresponds to the maximal projection point. 4. Use the interpolation method to calculate the multivariate surface If the termination criteria is satisfied, repeat the above steps on x(t) − d(t). If the termination criteria is not satisfied, repeat the above steps on d(t).
In the MEMD model, firstly, the higher dimensional data are projected into the direction vectors in the lower dimensional space. The triple cubic spline method is used to perform interpolation on the direction vectors at extreme points and construct the projection surface in different directions. The accuracy of the constructed surface would increase with more directions from which the projection has been performed. Mainstream projection algorithms include the uniform sampling method for direction vectors, and pseudo-Monte Carlo sampling series [28].
Depending on the number of layers, the neural network can be classified as a shallow neural network or deep neural network [32]. The seminal universal approximation theory originally shows that the shallow neural network with one hidden layer can approximate any bounded continuous function with arbitrary lower errors [33]. Although this theoretical result is often cited to support the belief that the shallow neural network is capable of approximating the nonlinearity in data, in practice the number of parameters to estimate would grow exponentially when the complexity in the nonlinear data grows. This would result in a lower level of efficiency in training, and pose the upper bound on the depth of the neural network in its early days of development. When the structure of the network moves from the shallow one to the deeper one with continuously growing computational power, the neural network structure and training algorithm has been designed to reduce the number of parameters to estimate and improve the training efficiency. The universal approximation theorem has also been extended to the deep neural network case to show that the deep neural network is theoretically equivalent to the shallow neural network in approximating nonlinear continuous functions. In addition, in practice, the deep neural network would have a much higher level of efficiency in function approximation than the shadow neural network [34,35]. For example, if the objective function consists of functions with local characteristics and hierarchical characteristics, the deep neural network can significantly lower the number of parameters to estimate and avoid the curse of dimensionality problem [34]. Another interesting finding in recent research is that the deep learning models would have better generalization when the model significantly overfits the data during the deep learning training process. The generalization and the forecasting accuracy of the models can be improved simply by increasing the depth of the neural network and the number of neurons [36]. Mainstream deep learning algorithms refer to a series of algorithms such as the CNN for image processing, the long short-term memory (LSTM) network for natural language processing, and the deep belief network [37][38][39].
Inspired by the human visual processing system where neurons are responsive to local information, a CNN is a specially designed deep neural network that focuses on the extraction of space invariant data characteristics during the data training and modeling [37,40,41]. The CNN network uses different layers to process information. For the higher dimensional time series data, the neurons in the convolutional layer use the kernel function or the filters to perform the linear convolutional process on the input data. Neurons at the same layer share the filters, i.e., they use the same filters to produce a feature map of the represented data characteristic [41]. Filters may differ across different layers so that the feature maps generated by different layers correspond to different data characteristics. Therefore, the CNN can represent different data characteristics with the combination of different features and different convolutional layers. The typical layers in the CNN model include the in-put layer, the hidden layer, the pooling layer, and convolutional layer [32]. Each layer contains the corresponding neurons. CNN model has used the weight sharing and local responses during the network structure design and training. To process different data with different dimensions, the convolutional layer could be classified into the one-dimensional convolution layer, two-dimensional convolution layer, and higher-dimensional convolution. The one-dimensional convolution layer is especially suitable for the processing of natural language data and time series data. The two-dimensional convolution layer is the most popular one. It targets two-dimensional data such as images. Currently, the theoretical and mathematical theory for the convolutional layer is still in its early stage of development. The network structure and parameter design are usually determined based on the research of the empirical method.
Given input X, the feature map data at layer l is as in Equation (1) [42].
where W is the weight matrix, b is the bias matrix, and f is the activation function. Neurons in the pooling layers perform the pooling operation such as the maximum pooling and minimum pooling on the input data. The pooling function is shown in Equation (2).
where T is the pooling function, such as the max pooling or averaging pooling. Fully connected layer use the transfer function to process the input X as in Equation (3).
Typical activation functions include sigmoid, rectification linear unit (ReLU). ReLU is defined as the maximum value between 0 and X [42].

MEMD-CNN VaR Model
Given the random variable Y = (y 1 , y 2 , . . . , y T ) as the crude oil price, X = (x 1 , x 2 , . . . , x T ) and Z = (z 1 , z 2 , . . . , z T ) as the financial variable data, construct the high dimensional matrix MEMD model is used to decompose W into C up to scale S, as in Equation (4).
Suppose that there is a nonlinear relationship between the conditional mean forecasts U and the returns Y, and the nonlinear relationship f µ is estimated as in Equation (5).
f µ can be estimated using nonlinear estimation models. For example, f µ is estimated in the CNN model as the network structure and parameters in CNN models.
Suppose that there is nonlinear relationship between the conditional standard deviation forecasts G and the historical standard devaition forecasts σ t , the nonlinear function f σ is estimated as in Equation (6).
f σ can be estimated using nonlinear estimation models. For example, f σ is estimated in the CNN model as the network structure and parameters in CNN models.
With the estimated functions f µ and f σ , we can calculate the forecasts µ and σ using Equations (5) and (6). The VaR is calculated using parametric method as in Equation (7).
where a = 1 − cl, Z α is the quantile value for the standard normal distribution, h is the holding period, and VaR a,h,t is the VaR at the confidence level cl.

Data Collection
In this paper, we have used the West Taxes Intermediate (WTI) crude oil price in the U.S., Brent crude oil price in the United Kingdom (UK), the EUR/USD exchange rate, and the DJIA index to construct the empirical dataset to evaluate the risk forecasting accuracy of the proposed MEMD-CNN model against the benchmark models such as the ARMA-GARCH model and VMD-CNN model. Numerous research demonstrated the spillover effect between the crude oil markets and other financial markets such as foreign exchange market, stock market, etc. [4][5][6][7][43][44][45]. The EUR/USD exchange rate and DJIA index are selected and included in this dataset. The EUR/USD exchange rate and the DJIA index represent some of the most frequently quoted exchange rates and stock indices in the world, respectively [46,47]. The data were obtained from Quandl which is a central data warehouse that provides a comprehensive collection of financial and economic datasets. In this paper, we constructed the dataset that covers the daily closing price from 2 January 2003 to 19 February 2019. This resulted in the collection of 4066 daily observations in the WTI market, 4104 daily observations in the Brent market, 4048 daily observations in EUR/USD exchange rate data, and 4051 daily observations in the DJIA index. To facilitate the empirical model evaluations, the dataset is divided into three parts, such as the training set, the validation set, and the test set based on the 70:30 ratio [48]. The first 70% of the dataset is reserved as the training set to estimate the model parameters for the econometric models, the middle 21% of the dataset is reserved as the validation set to estimate the hyperparameter for the deep learning models, and the rest 9% of the dataset is used as the test set to conduct the out-of-sample model evaluation. The risk forecasting performance of the MEMD-CNN VaR models and benchmark models are evaluated using the Kupiec backtesting procedure, i.e., the popular unconditional coverage test [49].

Empirical Results
In this section, we report the results from a series of experiments conducted to analyze the data characteristics and evaluate the risk forecasting performance of the MEMD-CNN model and benchmark models such as the ARMA-GARCH model and VMD-CNN model.
Firstly, we calculated the descriptive statistics and statistical tests for the crude oil markets, foreign exchange markets, and stock markets in this paper. Results are reported in Table 1.
From the results in Table 1, it can be seen that the risk exposure for the EUR/USD and DJIA index is at a high level. There is generally significant volatility and fluctuations in all crude oil markets except for the EUR/USD exchange rate, indicated by the positive standard deviation of the market return. This observation suggests that there is a significant risk exposure, with a high probability of extreme and transient event occurrence. Interestingly, the risk exposure for the DJIA index is significantly larger than the EUR/USD exchange rate. Since the kurtosis value for the DJIA is at 13.7536, which is much larger than the kurtosis value for the EUR/USD at 5.6425, there is also a high correlation between the crude oil price and EUR/USD exchange rate. The correlation coefficients are larger than 0.65. The correlation between the crude oil markets and the DJIA index is at a low level, less than 0.05. In our experiment, we also include the VMD-CNN model as one of the benchmark models. The difference between the VMD model and the MEMD model is that the VMD model is a one-dimensional multiscale analysis technique. It decomposes multiple crude oil prices one by one, compared to the simultaneous decomposition process in the MEMD model. The purpose of the inclusion of this model in the empirical study is to evaluate the necessity of the introduction of the MEMD technique in the MEMD-CNN model.
Since the performance of the CNN is very sensitive to the hyperparameters, in our paper, the optimal network structure for the CNN is calculated using the greedy search method. The parameter to be optimized includes the number of kernels, i.e., the depth of the network. We calculate the risk forecasting accuracy for the CNN with different network parameters using the validation data set. The basic model parameters for the CNN model are as follows: the rolling window is 250; the validation data set is 252 for the WTI market and 252 for the Brent market; the decomposition scale is set to 8; the kernel shape is set to 2; the number of kernels is up to 8. We adopted the network structure of the classic Alex network and made some minor changes to the original network structure, such as the last layer changed from the classifier to the regression layer. The CNN model structure is as follows: input data layer-convolutional layer-ReLU-normalization-pooling-convolutional layer-ReLUnormalization-pooling-convolutional layer-ReLU-pooling-convolutional layer-ReLUpooling-fully link-regression layer.
We use the validation data to calculate the in-sample forecasting accuracy with the MEMD-CNN model. Results for both markets are reported in Table 2. Table 2. In-sample forecasting performance for the MEMD-CNN VaR model using the validation data.

Markets
(2, 1) It is clear that the forecasting accuracy of risk measures changes as the number of features changes. The reliability of the risk measure changes as the number of filters increases accordingly. We did not observe the strict monotonic increasing or decreasing relationship between the risk forecasting accuracy and the number of filters. In this paper, we use the mean squared error (MSE) as the selection criteria. Since MSE serves as the indication of risk forecasting accuracy in the literature, we argue that the smaller MSE would suggest more accurate risk measures. Based on MSE minimization, the optimal network structure for the MEMD-CNN model is determined as follows: for the WTI market, the filter size is 2 × 2, the number of filters is 4, and the pool size is 1 × 1. For the Brent market, the filter size is 2 × 2, the number of filters is 1, and the pool size is 1 × 1.
In-sample forecasting accuracy results using VMD-CNN model for both markets are reported in Table 3. We observe similar patterns in the results using the VMD-CNN model as those using the MEMD-CNN model. The accuracy and reliability of risk measures also change when the number of filters increases. Based on MSE minimization, the optimal network structure for the VMD-CNN model is determined as follows: for the WTI market, the filter size is 2 × 2, the number of filters is 7, and the pool size is 1 × 1. For the Brent market, the filter size is 2 × 2, the number of filters is 1, and the pool size is 1 × 1.
With the optimal network structure determined above, we further evaluate the risk forecasting accuracy using the out-of-sample test data with 360 daily observations in both markets. Performance measures are reported in Table 4. From results in Table 4, it is clear that the risk forecasting reliability and accuracy of the MEMD-CNN models have improved significantly in both WTI and Brent crude oil markets compared to the benchmark models in terms of the Kupiec backtesting procedure and MSE. The benchmark ARMA-GARCH model has failed to pass the statistical test at a 99% confidence level with a p value of 0.0054, less than the cutoff value of 0.05 in the WTI market. In terms of the number of exceedance, the proposed model achieves a much lower value than the benchmark model. This suggests that the VaR estimates from the proposed model are more conservative and provide better risk coverage, which corresponds to the lower number of cases of default and financial distress. In the meantime, MSE for the MEMD-CNN model decreases uniformly compared to the benchmark ARMA-GARCH model, as well as the VMD-CNN model. For example, at a 95% confidence level, MSE decreases from 1.1621 for the ARMA-GARCH model to 1.1049 for the MEMD-CNN model, and MSE decreases marginally from 1.1050 for the VMD-CNN model to 1.1049 for MEMD-CNN model. The forecasting accuracy improvement is more significant when the performance of the MEMD-CNN model is compared to the benchmark ARMA-GARCH model.
It is interesting to observe from experiment results that the choice of different multiscale models would affect the forecasting accuracy of the ensemble models. In general, the VaR from the proposed multivariate MEMD-CNN models is more accurate than VaR from univariate VMD-CNN models. VMD-CNN VaR model fails to pass the statistical tests at 95% and 99% confidence levels in the WTI market. Meanwhile, their forecasts from the multivariate models pass all the confidence levels in both markets. Based on the ensemble theory, the forecasting accuracy improvement suggest the reduction in the variance in the individual ensemble member forecasts and improvement in the heterogeneity of the ensemble members. The decomposed components from multivariate models serving as the ensemble members have a higher degree of heterogeneity. Our experiment results suggest that the introduction of the CNN-based nonlinear ensemble model can effectively reduce the estimation bias from the multiscale decomposition and risk forecasts. It would lead to improvement in forecasting accuracy and stability.

Conclusions
In this paper, we have applied the higher dimensional MEMD model to analyze the multiscale data characteristics of risk changes in the crude oil markets, as well as the interaction between the crude oil markets, the DJIA index, and the EUR/USD exchange rate. We have further constructed a crude oil price VaR forecasting model based on MEMD and CNN models. Risk movements at the individual scale are projected into the future using the ARMA-GARCH model. The risk forecast matrix is used as input to the CNN-based nonlinear ensemble forecasting model to predict the future risk movement. Empirical studies have been conducted using the crude oil price data, the EUR/USD exchange rate data, and stock index data as the typical test case. A comprehensive model performance comparison has been conducted in terms of risk forecasting reliability. Results show that the proposed multivariate multiscale ensemble model forecasts risk more accurately than the benchmark models. The deep learning model is especially helpful in providing robust risk forecasts.
Based on the analysis of the results, the work in this paper has two implications for the risk estimation field, in terms of testing of existence of multiscale data features and model construction. Firstly, the influence of the external risk factors for the crude oil markets can be incorporated with consideration of its fine multiscale risk structure. The potential performance improvement requires the introduction of new multivariate multiscale models, in addition to the existing multiscale models. This is expected to result in more accurate feature extraction. Secondly, there is clearly a nonlinear relationship between the movement of external influencing factors across different scales and the crude oil price movement, as indicated by the success of the deep learning based modeling approach to model it in an assumption-free model-free approach. Although it is preferable to derive the exact analytic form of these nonlinear data characteristics and functions, in practice the nonlinear function is often too complex to be computationally feasible in the mainstream approaches. Future research would be directed toward the continuous searches for the analytic form of the nonlinear function, as well as the emergence of a new design of innovative deep neural network structure for its better approximation.  Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://www.quandl.com (accessed on on 6 June 2022).

Conflicts of Interest:
The authors declare no conflict of interest.