Article

Forecasting Crude Oil Prices with Major S&P 500 Stock Prices: Deep Learning, Gaussian Process, and Vine Copula

1
Statistics Discipline, University of Minnesota at Morris, Morris, MN 56267, USA
2
School of Business Administration, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
3
Department of Management and Information Systems, Dong-A University, Busan 49236, Korea
*
Author to whom correspondence should be addressed.
Axioms 2022, 11(8), 375; https://doi.org/10.3390/axioms11080375
Submission received: 2 June 2022 / Revised: 15 July 2022 / Accepted: 27 July 2022 / Published: 29 July 2022
(This article belongs to the Special Issue Statistical Methods and Applications)

Abstract

This paper introduces methodologies for forecasting oil prices (Brent and WTI) with multivariate time series of major S&P 500 stock prices using Gaussian process modeling, deep learning, and vine copula regression. We also apply Bayesian variable selection and nonlinear principal component analysis (NLPCA) for data dimension reduction, and we repeat the forecasts with the resulting reduced set of important covariates. To apply the proposed methods to real data, we select monthly log returns of 2 oil prices and 74 large-cap, major S&P 500 stock prices across the period of February 2001–October 2019. We conclude that vine copula regression with NLPCA is superior overall to the other proposed methods in terms of the measures of prediction errors.

1. Introduction

Global monetary policies developed in response to the COVID-19 crisis and the 2022 Russia–Ukraine war have resulted in high crude oil prices, causing economic inflation and a bear market rally in 2022.
The relationship between the price of crude oil and the stock market has been a main research topic in economics and finance. In particular, forecasting stock returns and analyzing volatilities using oil prices have been studied in [1,2]. Since oil-sensitive stocks have strong forecasting power on crude oil prices [3], this study examines how oil prices are affected by the most influential stock prices. Several statistical approaches to predicting oil prices have been proposed, such as investor attention (constructed from the Google search volume index [4]), the LASSO machine learning method [5], and copula dependence structures between oil prices, exchange rates, and interest rates [6]. In this study, we employ deep learning, Gaussian process modeling, and vine copula regression methods to predict oil prices with the most influential stock prices.
Deep learning has been utilized to forecast stock prices in [7], and stock-price prediction models using pre-trained neural networks have been compared in [8]. The LSTM and ARIMAX algorithms were employed in [9] to analyze the impact of sentiment analysis in stock market prediction. Gaussian process regression methods and extensions for stock market prediction have been studied in [10], and stock prediction using Gaussian process regression has been studied in [11]. The amount of training data required for deep learning [11] and the choice of hyperparameters can make the method difficult to use. The response of a Gaussian process model needs to be normally distributed if the hyperparameters are fixed. We therefore propose an alternative forecasting model (vine copula regression) to predict oil-price returns using US stock returns. The copula method does not require assumptions such as normality, linearity, and independence of errors. Additionally, a vine copula can describe a flexible multivariate dependence structure. To compare our proposed method with the deep learning and Gaussian process models, we apply the same accuracy measures to all three. This study also examines whether there are firms that are highly influential to oil prices. To do this, we use Bayesian variable selection and nonlinear principal component analysis for forecasting crude oil prices (Brent and WTI).
This paper is organized as follows: Section 2 presents the data description and summary, Section 3 gives an overview of the statistical models for forecasting, Section 4 presents the comparison study of the proposed methods in terms of the measures of errors, and Section 5 presents the discussion.

2. Summary Statistics

The sample contains the monthly log returns of Brent Crude, West Texas Intermediate (WTI), and 74 major S&P 500 stock prices from February 2003 to October 2019. (Variable names for our sample data can be found in Appendix A.) We chose monthly log returns over daily log returns in an attempt to eliminate the noise from small economic factors, such as political news. The 74 stocks were selected based on the size of their market capitalization. Figure 1 plots the log returns for the 2 oil prices and 74 stock prices, along with a functional mean equation line of sample log returns. We observed a co-movement among oil and stock returns in Figure 1, which is already a well-known phenomenon. Our sample was collected from the Yahoo Finance website (https://finance.yahoo.com/) (accessed on 7 November 2020). We converted prices to log returns throughout our analyses.
Table 1 displays the summary statistics for the oil and stock price monthly log returns. We observed that Brent and WTI have similar distributional properties: the log returns of Brent and WTI prices are positively skewed with fat tails, while the average log returns of the Brent and WTI prices are close to zero.
We expected a prominent relationship between crude oil and major S&P 500 stock prices. The monthly log return series begins on February 1 because the first log return is computed as a difference from the base price of 3 January 2003.
Let $S_t$ be a price time series at time $t$. The log return series is $r_t = \log(S_t / S_{t-1})$. Each of the datasets was given a new variable known as “log returns”. We summarized the descriptive statistics for the BRENT and WTI log return data, such as mean, skewness, and kurtosis, as well as 5 summary statistics in Table 1.
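As a minimal illustration of the log return definition above, the following Python sketch converts a price series into log returns. It is not the authors' code (the analysis in this paper was carried out in R); the function name `log_returns` and the example prices are hypothetical.

```python
import math

def log_returns(prices):
    """Return r_t = log(S_t / S_{t-1}) for t = 1, ..., len(prices) - 1."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # Pair each price with its predecessor and take the log of the ratio.
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical month-end prices (illustrative values, not taken from the sample).
example_prices = [50.0, 55.0, 49.5, 49.5]
example_returns = log_returns(example_prices)
```

A series of n prices yields n - 1 log returns, and the log returns add up to the log of the ratio of the last price to the first.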
In Table 1, it can be observed that the standard deviations of the log returns of BRENT and WTI are about the same. The values of skewness in the log returns of BRENT and WTI in this period are positive, indicating return distributions with a longer right tail. In addition, the values of kurtosis in the log returns of BRENT and WTI are greater than 3, meaning that they have heavy tails compared to a normal distribution. Figure 1 shows the time trend of the Brent and WTI monthly log returns over the given period. In the 2008 economic crisis period, as shown in Figure 1, the log returns of Brent and WTI were very high in the first half of 2008 and then suddenly dropped to very low values in the second half of 2008. In December 2018, investors feared the tightening of monetary policy, a slowing economy, and an intensifying trade war between the U.S. and China; the S&P 500 fell more than 9%, and the log returns of Brent and WTI were also low.

3. Statistical Methods

3.1. Gaussian Process (GP) Model

The first forecasting method we used was the Gaussian process (GP) model, a supervised learning method for regression and probabilistic classification problems. GP models are popular in uncertainty quantification (UQ), which is the science of the quantitative characterization and reduction of uncertainties in both computational and real-world applications. Eight software packages for fitting Gaussian processes to various functions have been compared based on the root-mean-square error of predictions over the input space [12].
A Gaussian process (GP) is a random process where any point $x \in \mathbb{R}^d$ is assigned a random variable $f(x)$ and the joint distribution of a finite number of these variables $p(f(x_1), \ldots, f(x_N))$ is Gaussian: $p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}, K)$, where $\mathbf{f} = (f(x_1), \ldots, f(x_N))$, $\boldsymbol{\mu} = (m(x_1), \ldots, m(x_N))$, and $K_{ij} = \kappa(x_i, x_j)$. Here $m$ is the mean function, and it is common to use $m(x) = 0$ as GPs are flexible enough to model the mean; $\kappa$ is a positive definite kernel function or covariance function. We recommend reading [13] to understand the GP model. We constructed the GP model to forecast the next month’s oil price returns. We used the GauPro function from the ‘GauPro’ R package [14]. The GP is a stochastic process where every finite collection of random variables has a multivariate normal distribution (equivalently, every finite linear combination of them is normally distributed) [15]. The GP model provides predictions together with standard errors for those predictions. There are two main parameters in the GP. The theta determines how strong the correlation is between points in each input dimension. The nugget is a smoothing parameter that allows for noise and improves computational stability [12].
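The zero-mean GP predictor described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ‘GauPro’ implementation: it assumes a Gaussian correlation function $\exp(-\theta \lVert x - x' \rVert^2)$ with a single fixed `theta` and a fixed `nugget`, whereas a package would typically estimate these from the data. All function names are hypothetical.

```python
import math

def gauss_corr(x1, x2, theta=1.0):
    """Assumed Gaussian correlation: exp(-theta * squared Euclidean distance)."""
    return math.exp(-theta * sum((a - b) ** 2 for a, b in zip(x1, x2)))

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_new, theta=1.0, nugget=1e-6):
    """Posterior mean and standard error of f(x_new) under a zero-mean GP."""
    n = len(x_train)
    # Covariance matrix K with the nugget added on the diagonal.
    k = [[gauss_corr(x_train[i], x_train[j], theta) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    k_star = [gauss_corr(x_new, xi, theta) for xi in x_train]
    alpha = solve_cholesky(low, y_train)      # K^{-1} y
    v = solve_cholesky(low, k_star)           # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = gauss_corr(x_new, x_new, theta) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

# Hypothetical one-dimensional training data (not from the sample).
x_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 0.5, -0.2]
mean_at_train, se_at_train = gp_predict(x_train, y_train, [1.0])
mean_far, se_far = gp_predict(x_train, y_train, [100.0])
```

Predicting at a training input returns approximately the observed value with a small standard error, while a point far from all training inputs reverts to the zero prior mean with a standard error close to one.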

3.2. Copulas

The second forecasting method we used was the copula method, which does not require assumptions such as independence or normality. Additionally, by using the copula method, we could avoid multicollinearity and heteroscedasticity issues when we performed the regression analysis. For these reasons, the copula method has been popular in economics and finance. The way that corporate bond yield spreads are affected by explanatory variables, such as equity volatility, interest rate volatility, r, slope, rating, liquidity, coupon rate, and maturity, was modeled by [16]. That study also examined the dependence at the mean of the joint distribution by using the Gaussian copula marginal regression method and the dependence structure at the tails by using various copula functions. Recently, the impacts of COVID-19 on the dependence structure of the stock market were considered by Gaussian copula regression modeling in [17], and size anomalies in U.S. bank stock returns were investigated by using panel copula in [18].
A d-dimensional copula $C$ is a d-variate distribution function on the unit hypercube $[0,1]^d$ with uniform marginal distribution functions. Sklar’s theorem [19] provides a link between multivariate distributions and their associated copulas. It states that, for every multivariate random vector $X = (X_1, \ldots, X_d)' \sim F$ with marginal distribution functions $F_1, \ldots, F_d$, there exists a copula $C$ associated with $X$, such that
$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)).$
This decomposition of the multivariate distribution into its margins and its associated copula is unique when $X$ is absolutely continuous. The probability integral transforms of the components are $U_j = F_j(X_j)$, $j = 1, \ldots, d$. The $U_j$ are then uniformly distributed, and their joint distribution function is the copula $C$ associated with $X$. The Gaussian copula, t-copula, and Archimedean copulas are popularly used in finance and economics. The Gaussian copula is constructed from a multivariate normal distribution by using the probability integral transform, and the t-copula is the copula that underlies the multivariate Student’s t-distribution. The most-used Archimedean copulas are the Clayton copula, the Frank copula, and the Gumbel copula. The Clayton copula is used to capture lower (negative) tail dependence, whereas the Gumbel copula is used for upper (positive) tail dependence. The Frank copula is a symmetric Archimedean copula with no tail dependence. Refer to [20,21] for a detailed examination of copulas, including examples of parametric copulas, especially bivariate copulas. All numerical calculations in this study are performed using the programming language R and the package VineCopula [22]. A vine copula is a flexible multivariate dependence structure model that has been used in economics and statistics. Using a vine copula for the Granger causality test in mean has been proposed by [23]. We also recommend reading that paper to understand vine copulas.
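To make the bivariate case concrete, the sketch below writes out the Clayton and Gumbel copula distribution functions, their tail dependence coefficients, and Sklar's construction of a joint distribution from margins. It is a Python illustration, not the VineCopula package; the standard normal margins in `joint_cdf_clayton_normal` are an assumption chosen only to keep the example short, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for u, v in (0, 1] and theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v) for u, v in (0, 1] and theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def clayton_lower_tail(theta):
    """Lower tail dependence coefficient of the Clayton copula."""
    return 2.0 ** (-1.0 / theta)

def gumbel_upper_tail(theta):
    """Upper tail dependence coefficient of the Gumbel copula."""
    return 2.0 - 2.0 ** (1.0 / theta)

def joint_cdf_clayton_normal(x1, x2, theta):
    """Sklar's theorem: F(x1, x2) = C(F1(x1), F2(x2)).

    Standard normal margins are assumed here purely for illustration.
    """
    margin = NormalDist()
    return clayton_cdf(margin.cdf(x1), margin.cdf(x2), theta)
```

Setting one argument of a copula to 1 returns the other argument, which reflects the uniform margins, and the Gumbel copula with theta = 1 reduces to independence, C(u, v) = uv, with zero upper tail dependence.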

3.3. Deep Learning

The third forecasting method we used was the deep learning and neural network model. Deep learning and neural network models for quality control research of count data were developed by [24].
Deep learning is a promising research trajectory for big and complex data analysis. Deep learning and neural network methods were developed based on the idea of imitating the interactions between brain neurons. For detailed explanations of deep learning and neural networks, we recommend reading [24]. To perform deep learning data analysis, we used the R package deepnet [25], which trains neural networks with single or multiple hidden layers using the back propagation (BP) algorithm. A BP network is a multi-layer feedforward network trained according to the error back propagation algorithm, and it is one of the commonly used neural network models. BP adjusts the weights and thresholds of the network to minimize the sum of squared errors. For training, we used 2 hidden layers of 30 neurons each (30, 30), with “sigm” (the logistic function) as the activation function of the hidden units. The function of the output unit is also “sigm”, and other settings are left at the defaults of the R package. We forecast the Brent and WTI variables with the given stock data by using the ‘nn.predict’ command in the R package “deepnet”.
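The network architecture described above (two hidden layers of 30 logistic units and a logistic output unit) can be sketched as a forward pass in Python. This only illustrates the structure; it is not the deepnet package and contains no training step. The random weight initialization, the number of inputs, and all names are assumptions.

```python
import math
import random

def sigm(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate one input vector through every layer with logistic units."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigm(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

rng = random.Random(0)
n_inputs = 5                      # e.g. five selected covariates
sizes = [n_inputs, 30, 30, 1]     # input, two hidden layers, one output unit
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Hypothetical monthly log returns for the five inputs.
output = forward([0.01, -0.02, 0.03, 0.0, -0.01], layers)
```

Because a logistic output unit takes values in (0, 1), targets that can be negative, such as log returns, would typically be rescaled into that range before training; the paper does not state whether such a rescaling was applied.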

3.4. Bayesian Variable Selection

The Bayesian variable selection method is an efficient statistical method for selecting the most influential explanatory variables. We used the objective Bayesian variable selection method proposed by [26], as implemented in the R package BayesVarSel [27]. We used a Gibbs sampling scheme to explore the model space and determine the optimal model for the data set. We set the prior distribution for the regression parameters within each model to ‘gZellner’ and the prior distribution over the model space to ‘Constant’. We ran 10,000 iterations of the Markov Chain Monte Carlo (MCMC) sampler after discarding the first 100 iterations as burn-in.
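The way inclusion probabilities are read off Gibbs output can be sketched as follows. This Python illustration is not BayesVarSel: it assumes that each retained MCMC draw is a 0/1 vector indicating which covariates are in the visited model, and the draws, covariate names, and function names are all hypothetical.

```python
from collections import Counter

def inclusion_probabilities(draws, names):
    """Share of retained draws in which each covariate is included."""
    n = len(draws)
    return {name: sum(d[j] for d in draws) / n for j, name in enumerate(names)}

def highest_posterior_model(draws, names):
    """Most frequently visited model, as an estimate of the highest posterior probability model."""
    model, _count = Counter(draws).most_common(1)[0]
    return [name for name, included in zip(names, model) if included]

def median_probability_model(probs, threshold=0.5):
    """Covariates whose inclusion probability is at least one half."""
    return [name for name, p in probs.items() if p >= threshold]

# Hypothetical post-burn-in draws over three covariates (not results from the paper).
names = ["x1", "x2", "x3"]
draws = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

probs = inclusion_probabilities(draws, names)
hpm = highest_posterior_model(draws, names)
mpm = median_probability_model(probs)
```

In this toy example the highest posterior probability model and the median probability model coincide, but in general they can differ.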

3.5. Nonlinear PCA

Kernel principal component analysis (PCA), a nonlinear PCA method, was developed by [28]. With a kernel as described in [28], the procedure corresponds exactly to standard PCA in a high-dimensional feature space, without requiring expensive computations in that space. To extract five principal components in high-dimensional feature spaces using kernel PCA, we used the ‘kernlab’ R package [29], which provides the most popular kernel functions. We used the Gaussian radial basis kernel function with the hyperparameter sigma = 0.2, which is the inverse kernel width of the radial basis kernel function.
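The first two steps of kernel PCA, building the kernel matrix and centering it in feature space, can be sketched in Python as follows. The sketch assumes the radial basis form $\exp(-\sigma \lVert x - y \rVert^2)$ with sigma = 0.2 and stops before the eigendecomposition of the centered matrix, which yields the components. It is not the ‘kernlab’ implementation, and the input rows and function names are hypothetical.

```python
import math

def rbf(x, y, sigma=0.2):
    """Gaussian radial basis kernel exp(-sigma * squared Euclidean distance)."""
    return math.exp(-sigma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_matrix(rows, sigma=0.2):
    """n x n matrix of kernel values between all pairs of observations."""
    return [[rbf(xi, xj, sigma) for xj in rows] for xi in rows]

def center_kernel(k):
    """Center the kernel matrix in feature space.

    Each entry becomes k_ij - rowmean_i - rowmean_j + grandmean, which uses
    the symmetry of k (row means equal column means).
    """
    n = len(k)
    row_means = [sum(row) / n for row in k]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

# Hypothetical rows of monthly log returns (one row per month).
rows = [[0.01, -0.02], [0.03, 0.00], [-0.05, 0.04]]
k = kernel_matrix(rows)
k_centered = center_kernel(k)
```

The principal components are then obtained from the leading eigenvectors of the centered matrix; every row and column of that matrix sums to zero.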

4. Data Analysis

First, we want to visualize the relationship between the crude oil and stock data. Functional data analysis is a popular big data dimension-reduction method for time-course data. Functional principal component analysis (FPCA) is an effective clustering visualization analysis for time-course data. It provides a much more informative way of examining the sample covariance structure than does PCA, and it is an effective statistical method for explaining the variance of components because of the use of nonlinear eigenfunctions. The PCA only shows the clustering pattern of the whole data at a certain year or certain time, but FPCA is the more suitable method for showing the clustering pattern of the time series oil data over the given period.
Figure 2 plots the log returns of the 2 oil prices and 74 stock prices, along with a functional mean equation line of sample log returns. We observed a co-movement among oil and stock returns, as shown in Figure 2. We then investigated the relationship in predicting oil price returns using stock returns.
We also performed a functional principal component analysis (FPCA) by using the ‘fda’ R package to determine factors (i.e., principal components) that explain the relationship between crude oil and stock prices. Figure 3 shows the proportion of the total variation in individual stock and oil price returns that is explained by each principal component. The first principal component accounts for 24.6%, the second component explains 18.5%, and the third component accounts for 13.4% of the total variance. Together, the first 3 principal components account for 56.5% of the whole variability.
Through visualizations, we illustrated the relationship among the 2 major crude oil prices and the major S&P 500 stock prices. From the 2D FPCA plot in Figure 4, we were able to classify the 2 crude oil and 74 major S&P 500 stock prices into 4 groups.
The 2D FPCA plot captures a limited view of the clusters among major stock price and oil price returns. For a detailed visualization of the relationship between the 2 crude oil and 74 stock prices, we have provided a 3D FPCA plot with the first 3 main harmonics (principal components) in Figure 5. From Figure 5, we can observe that most of the major stock returns are clustered together, implying a co-movement of those return series in our sample period.
The observations presented in Figure 4 and Figure 5 motivated our research to investigate whether there is a more prominent relationship between crude oil prices and major S&P 500 stock prices.
We also selected the most influential stocks in relation to crude oil price returns using the Bayesian variable selection method of the objective Bayesian model proposed by [26]. Our empirical analysis restricted our attention to the stock prices of 74 firms which are considered major stocks in the S&P 500. We performed Bayesian variable selection for the BRENT and WTI oil price returns separately. Interestingly, five covariates were selected for each of the Brent and WTI oil price returns based on inclusion probabilities. The selection took into account both the highest posterior probability model and the median probability model in the Bayesian model selection.
CB, HD, HON, LIN, and PG returns were critical factors in determining Brent oil price returns. Similarly, HD, HON, LIN, PG, and UNP returns were critical factors in determining WTI oil price returns. The inclusion probabilities of all of the covariates for the Brent and WTI oil price returns are displayed in Table 2 and Table 3.
Following ref. [28] (see Section 3.5), we extracted the first five principal components using kernel PCA to examine their power in forecasting oil price returns. The kernel function is used in both training and prediction; this parameter can be set to any function of class kernel, which computes a dot product between two vector arguments. The corresponding component eigenvalues were 0.034684495, 0.009930120, 0.007794364, 0.006680240, and 0.005580485. Eigenvalues are used to find the proportion of the total variance explained by the components.
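As a small worked example, the Python sketch below turns the five reported eigenvalues into shares of variance. Because only the five retained eigenvalues are reported, the shares are relative to the sum of these five components, not to the total variance over all components; the variable names are hypothetical.

```python
from itertools import accumulate

# Eigenvalues of the five retained kernel principal components, as reported above.
eigenvalues = [0.034684495, 0.009930120, 0.007794364, 0.006680240, 0.005580485]

retained_total = sum(eigenvalues)
# Share of each component among the five retained components only.
shares = [ev / retained_total for ev in eigenvalues]
cumulative_shares = list(accumulate(shares))
```

Among the retained components, the first one carries more than half of the retained variance.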
We also performed Gaussian process modeling to forecast oil prices at time t+1 using S&P 500 stock prices at time t. To perform the analysis, we separated the training and forecast data from our sample, where February 2001–March 2017 was the training set for the stock data, and March 2001–April 2017 was the training set for the oil data. Test data for the stocks were collected from April 2017 to September 2019, and test data for the Brent and WTI prices were collected from May 2017 to October 2019. First, we considered all 74 major S&P 500 stock prices in our model and performed the forecasting for BRENT and WTI separately. Second, before fitting the GP model, we used the Bayesian variable selection method to select the five most influential stocks in relation to the Brent and WTI oil price returns. Then, we performed oil price forecasting using the Gaussian process model with the 5 covariates we found, as discussed in Section 3 and as seen in Table 2 and Table 3. Table 4 and Table 5 show the GP for Brent with covariates (CB, HD, HON, LIN, PG) and the GP for WTI with covariates (HD, HON, LIN, PG, UNP).
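The one-month lag between covariates and response, and the chronological split into training and test sets, can be sketched in Python as follows. The function names and the toy series are hypothetical; with monthly data, each test window stated above (April 2017 to September 2019 for the stocks, May 2017 to October 2019 for the oil prices) spans 30 months.

```python
def lagged_pairs(stock_rows, oil_series):
    """Pair the stock returns of month t with the oil return of month t + 1."""
    if len(stock_rows) != len(oil_series):
        raise ValueError("both series must cover the same months")
    features = stock_rows[:-1]   # months 1, ..., T - 1
    targets = oil_series[1:]     # months 2, ..., T
    return features, targets

def chronological_split(features, targets, n_test):
    """Keep the last n_test pairs for testing; no shuffling is applied."""
    if not 0 < n_test < len(features):
        raise ValueError("n_test must be between 1 and len(features) - 1")
    train = (features[:-n_test], targets[:-n_test])
    test = (features[-n_test:], targets[-n_test:])
    return train, test

# Toy series of five months with one stock (illustrative values only).
stock_rows = [[0.01], [0.02], [0.03], [0.04], [0.05]]
oil_series = [0.10, 0.20, 0.30, 0.40, 0.50]

features, targets = lagged_pairs(stock_rows, oil_series)
(train_x, train_y), (test_x, test_y) = chronological_split(features, targets, n_test=1)
```

Splitting by time rather than at random keeps every test month strictly after the training months, which matches the forecasting setting.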
We also compared the forecasting accuracy of the GP, deep learning, and vine copula methods by employing two measures for predictive accuracy.
We denote the actual values and predicted values by $y_t$ and $\hat{y}_t$, respectively, for $t = 1, 2, \ldots, n$, where $n$ is the total number of observations in the test dataset.
Root-mean-square (prediction) error (RMSE):
$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}}$
where we define the quadratic loss function to be $\mathrm{LOSS}_1 = (y_t - \hat{y}_t)^2$.
Mean absolute deviation (MAD):
$\mathrm{MAD} = \frac{\sum_{t=1}^{n} |y_t - \hat{y}_t|}{n}$
where we define the L1 loss function to be $\mathrm{LOSS}_2 = |y_t - \hat{y}_t|$.
Error metrics such as the MAD and RMSE were used to analyze the performance of the methods. The mean absolute error is less sensitive to outliers than the RMSE because every error is weighted in proportion to its magnitude rather than its square. The root-mean-square error reflects both the bias and the variance of the predictions and is expressed in the same units as the response. For each method, we also plotted the actual and predicted price returns for visualization purposes.
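The two accuracy measures defined above can be written directly in Python. The sketch below is a plain restatement of the RMSE and MAD formulas; the function names and example values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error: square root of the mean quadratic loss."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual))

def mad(actual, predicted):
    """Mean absolute deviation: mean of the L1 loss."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - yhat) for y, yhat in zip(actual, predicted)) / len(actual)

# Illustrative values: two exact predictions and one large miss.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
example_rmse = rmse(actual, predicted)   # sqrt(4 / 3)
example_mad = mad(actual, predicted)     # 2 / 3
```

With a single large error, the RMSE exceeds the MAD, which illustrates why the RMSE reacts more strongly to outliers.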
Table 6 shows the RMSE and MAD for forecasting BRENT and WTI log return prices. From Table 6, we can see that the RMSE and MAD decrease when we use the 5 selected covariates as compared to when we include all 74 covariates. We can also observe that forecasting Brent log return prices with major stock data using vine copula regression with NLPCA is superior to other methods, and that forecasting WTI log return prices with major stock data using Gaussian process and vine copula regression with NLPCA is superior to other methods in terms of the RMSE and MAD.
Table 7 shows 95% confidence intervals for the loss functions (LOSS1 and LOSS2) of the BRENT and WTI log return prices. From Table 7, we can observe that both the width and the center of the 95% confidence intervals for the loss functions (LOSS1 and LOSS2) of the BRENT and WTI log return prices are smaller for vine copula regression with NLPCA than for the other methods. This confirms that forecasting BRENT and WTI log return prices with major stock data using vine copula regression with NLPCA is superior to the other methods.
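One common way to obtain such an interval is a normal-approximation confidence interval for the mean of the per-month losses. The paper does not state how the intervals in Table 7 were computed, so the following Python sketch is an assumption and not a reconstruction of the authors' calculation; the function name and example losses are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_loss_ci(losses, level=0.95):
    """Normal-approximation confidence interval for the mean loss.

    Assumes roughly independent losses; with a small test set, a t quantile
    would give a slightly wider interval.
    """
    n = len(losses)
    if n < 2:
        raise ValueError("at least two losses are required")
    center = mean(losses)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(losses) / math.sqrt(n)
    return center - half_width, center + half_width

# Illustrative losses, e.g. values of LOSS2 = |y_t - yhat_t| over five test months.
example_losses = [1.0, 2.0, 3.0, 4.0, 5.0]
lower, upper = mean_loss_ci(example_losses)
```

A method with both a smaller center and a narrower interval has lower and more stable losses over the test period.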

5. Discussion

This study is the first to investigate the predictability of oil prices from S&P 500 stock prices by using vine copula regression. Bayesian variable selection (BVS) suggests that five stocks have the largest impact on each oil price. The selected companies are related to the energy/chemical industry or to the large retail industry. The important, selected companies for the Brent log returns are Chubb Limited (CB) (a Switzerland-based holding insurance company), Home Depot (HD) (a home improvement retailer), Honeywell International Inc. (HON) (a software–industrial company that operates through four segments: Aerospace, Honeywell Building Technologies, Performance Materials and Technologies, and Safety and Productivity Solutions), Linde plc (LIN) (an industrial gas company), and the Procter & Gamble Company (PG) (focused on providing branded consumer packaged goods to consumers across the world). The important, selected companies for the WTI log returns are HD, HON, LIN, PG, and Union Pacific Corporation (UNP), which is a railroad operating company in the United States. We also found that forecasting Brent log return prices with major stock data using vine copula regression with NLPCA is superior to other methods in terms of the RMSE and MAD, and that forecasting WTI log return prices with major stock data using GP and vine copula regression with NLPCA is superior to other methods in terms of the RMSE and MAD. In conclusion, the stock prices of both the energy/chemical industry and the large retail industry are effective in forecasting oil prices. This study contributes to forecasting oil prices by using a vine copula regression with selected stock prices. In future research, we will consider the prices of commodities, such as gold, silver, copper, platinum, natural gas, wheat, corn, and soybeans, in relation to important financial indices, including the Consumer Price Index, inflation, and cryptocurrency prices, using the GP, deep learning, and vine copula methods.

Author Contributions

Conceptualization, J.-M.K. and S. Kim; methodology, J.-M.K.; software, J.-M.K.; validation, J.-M.K. and H.H.H.; formal analysis, J.-M.K.; investigation, J.-M.K. and H.H.H.; resources, J.-M.K.; data curation, S.K.; writing—original draft preparation, J.-M.K., H.H.H. and S.K.; writing—review and editing, J.-M.K., H.H.H. and S.K.; visualization, J.-M.K.; supervision, J.-M.K.; project administration, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the three anonymous, respected referees for their suggestions, which have improved the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Variable names.
Variable | Name
BRENT | Brent Crude
WTI | West Texas Intermediate
AAPL | Apple, Inc.
ABT | Abbott Laboratories
ACN | Accenture Plc
ADBE | Adobe, Inc.
ADP | Automatic Data Processing, Inc.
AMGN | Amgen, Inc.
AMT | American Tower Corp.
AMZN | Amazon.com, Inc.
AXP | American Express Co.
BA | The Boeing Co.
BAC | Bank of America Corp.
BDX | Becton, Dickinson & Co.
BKNG | Booking Holdings, Inc.
BMY | Bristol-Myers Squibb Co.
C | Citigroup, Inc.
CB | Chubb Ltd.
CELG | Celgene Corp.
CMCSA | Comcast Corp.
CME | CME Group, Inc.
COST | Costco Wholesale Corp.
CSCO | Cisco Systems, Inc.
CVS | CVS Health Corp.
CVX | Chevron Corp.
DHR | Danaher Corp.
DIS | The Walt Disney Co.
DUK | Duke Energy Corp.
EL | The Estée Lauder Companies, Inc.
FIS | Fidelity National Information Services, Inc.
FISV | Fiserv, Inc.
GE | General Electric Co.
GILD | Gilead Sciences, Inc.
GS | The Goldman Sachs Group, Inc.
HD | The Home Depot, Inc.
HON | Honeywell International, Inc.
IBM | International Business Machines Corp.
INTC | Intel Corp.
INTU | Intuit, Inc.
JNJ | Johnson & Johnson
JPM | JPMorgan Chase & Co.
KO | The Coca-Cola Co.
LIN | Linde Plc
LLY | Eli Lilly & Co.
LMT | Lockheed Martin Corp.
LOW | Lowe’s Cos., Inc.
MCD | McDonald’s Corp.
MDLZ | Mondelez International, Inc.
MDT | Medtronic Plc
MMM | 3M Co.
MO | Altria Group, Inc.
MRK | Merck & Co., Inc.
MS | Morgan Stanley
NEE | NextEra Energy, Inc.
NFLX | Netflix, Inc.
NKE | NIKE, Inc.
NVDA | NVIDIA Corp.
ORCL | Oracle Corp.
PEP | PepsiCo, Inc.
PFE | Pfizer Inc.
PG | Procter & Gamble Co.
QCOM | QUALCOMM, Inc.
SBUX | Starbucks Corp.
SYK | Stryker Corp.
T | AT&T, Inc.
TMO | Thermo Fisher Scientific, Inc.
TXN | Texas Instruments Incorporated
UNH | UnitedHealth Group, Inc.
UNP | Union Pacific Corp.
UPS | United Parcel Service, Inc.
USB | U.S. Bancorp
UTX | United Technologies Corp.
VZ | Verizon Communications, Inc.
WFC | Wells Fargo & Co.
WMT | Walmart, Inc.
XOM | Exxon Mobil Corp.
Table A2. Sector Information for Companies.
Grp. | Symbol | Security | GICS Sector | GICS Sub Industry |
1 | AAPL | Apple Inc. | Information Technology | Technology Hardware, Storage and Peripherals | Brent
1 | AMT | American Tower Corp. | Real Estate | Specialized REITs |
1 | AMZN | Amazon.com Inc. | Consumer Discretionary | Internet and Direct Marketing Retail |
1 | BKNG | Booking Holdings Inc | Consumer Discretionary | Internet and Direct Marketing Retail | Brent
1 | CELG | Celgene (removed: 21 November 2019) | Health Care | Biotechnology | WTI
1 | DHR | Danaher Corp. | Health Care | Health Care Equipment |
1 | EL | Estee Lauder Cos. | Consumer Staples | Personal Products | WTI
1 | GILD | Gilead Sciences | Health Care | Biotechnology |
1 | HD | Home Depot | Consumer Discretionary | Home Improvement Retail | Brent, WTI
1 | INTC | Intel Corp. | Information Technology | Semiconductors |
1 | LOW | Lowe’s Cos. | Consumer Discretionary | Home Improvement Retail |
1 | MCD | McDonald’s Corp. | Consumer Discretionary | Restaurants |
1 | NFLX | Netflix Inc. | Communication Services | Movies and Entertainment |
1 | NKE | Nike | Consumer Discretionary | Apparel, Accessories, and Luxury Goods |
2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment |
2 | ACN | Accenture plc | Information Technology | IT Consulting and Other Services |
2 | ADP | Automatic Data Processing | Information Technology | Internet Services and Infrastructure |
2 | AMGN | Amgen Inc. | Health Care | Biotechnology |
2 | BDX | Becton Dickinson | Health Care | Health Care Equipment | Brent, WTI
2 | BMY | Bristol-Myers Squibb | Health Care | Health Care Distributors | Brent
2 | CB | Chubb Limited | Financials | Property and Casualty Insurance |
2 | COST | Costco Wholesale Corp. | Consumer Staples | Hypermarkets and Super Centers |
2 | CVS | CVS Health | Health Care | Health Care Services |
2 | CVX | Chevron Corp. | Energy | Integrated Oil and Gas | Brent, WTI
2 | DIS | The Walt Disney Company | Communication Services | Movies and Entertainment |
2 | DUK | Duke Energy | Utilities | Electric Utilities | Brent
2 | FISV | Fiserv Inc | Information Technology | Data Processing and Outsourced Services |
2 | IBM | International Business Machines | Information Technology | IT Consulting and Other Services | WTI
2 | INTU | Intuit Inc. | Information Technology | Application Software |
2 | JNJ | Johnson & Johnson | Health Care | Pharmaceuticals |
2 | KO | Coca-Cola Company | Consumer Staples | Soft Drinks |
2 | LIN | Linde plc | Materials | Industrial Gases | Brent, WTI
2 | MDLZ | Mondelez International | Consumer Staples | Packaged Foods and Meats |
2 | MO | Altria Group Inc | Consumer Staples | Tobacco | Brent, WTI
2 | NEE | NextEra Energy | Utilities | Multi-Utilities |
2 | ORCL | Oracle Corp. | Information Technology | Application Software |
2 | PEP | PepsiCo Inc. | Consumer Staples | Soft Drinks | WTI
2 | PG | Procter & Gamble | Consumer Staples | Personal Products | Brent, WTI
2 | QCOM | QUALCOMM Inc. | Information Technology | Semiconductors | Brent, WTI
2 | UNP | Union Pacific Corp | Industrials | Railroads | WTI
2 | VZ | Verizon Communications | Communication Services | Integrated Telecommunication Services |
2 | WMT | Walmart | Consumer Staples | Hypermarkets and Super Centers | WTI
2 | XOM | Exxon Mobil Corp. | Energy | Integrated Oil and Gas |
3 | CMCSA | Comcast Corp. | Communication Services | Cable and Satellite |
3 | GE | General Electric | Industrials | Industrial Conglomerates |
3 | LLY | Lilly (Eli) & Co. | Health Care | Pharmaceuticals | Brent
3 | LMT | Lockheed Martin Corp. | Industrials | Aerospace and Defense |
3 | MDT | Medtronic plc | Health Care | Health Care Equipment |
3 | MRK | Merck & Co. | Health Care | Pharmaceuticals | Brent, WTI
3 | PFE | Pfizer Inc. | Health Care | Pharmaceuticals | WTI
3 | T | AT&T Inc. | Communication Services | Integrated Telecommunication Services |
3 | UPS | United Parcel Service | Industrials | Air Freight and Logistics |
3 | USB | U.S. Bancorp | Financials | Diversified Banks |
3 | WFC | Wells Fargo | Financials | Diversified Banks |
4 | ADBE | Adobe Systems Inc | Information Technology | Application Software |
4 | AXP | American Express Co | Financials | Consumer Finance |
4 | BA | Boeing Company | Industrials | Aerospace and Defense |
4 | BAC | Bank of America Corp | Financials | Diversified Banks |
4 | C | Citigroup Inc. | Financials | Diversified Banks |
4 | CME | CME Group Inc. | Financials | Financial Exchanges and Data |
4 | CSCO | Cisco Systems | Information Technology | Communications Equipment | Brent, WTI
4 | FIS | Fidelity National Information Services | Information Technology | Data Processing and Outsourced Services | Brent, WTI
4 | GS | Goldman Sachs Group | Financials | Investment Banking and Brokerage | Brent
4 | HON | Honeywell Int’l Inc. | Industrials | Industrial Conglomerates | Brent
4 | JPM | JPMorgan Chase & Co. | Financials | Diversified Banks |
4 | MMM | 3M Company | Industrials | Industrial Conglomerates |
4 | MS | Morgan Stanley | Financials | Investment Banking and Brokerage | WTI
4 | NVDA | Nvidia Corporation | Information Technology | Semiconductors | Brent, WTI
4 | SBUX | Starbucks Corp. | Consumer Discretionary | Restaurants |
4 | SYK | Stryker Corp. | Health Care | Health Care Equipment |
4 | TMO | Thermo Fisher Scientific | Health Care | Life Sciences Tools and Services |
4 | TXN | Texas Instruments | Information Technology | Semiconductors |
4 | UNH | United Health Group Inc. | Health Care | Managed Health Care | WTI
4 | UTX | United Technologies | Industrials | Aerospace and Defense |

References

  1. Reboredo, J.C.; Ugolini, A. Quantile dependence of oil price movements and stock returns. Energy Econ. 2016, 54, 33–49.
  2. Narayan, P.K.; Gupta, R. Has oil price predicted stock returns for over a century? Energy Econ. 2015, 48, 18–23.
  3. Chen, S.-S. Forecasting crude oil price movements with oil-sensitive stocks. Econ. Inq. 2014, 52, 830–844.
  4. Han, L.; Lv, Q.; Yin, L. Can investor attention predict oil prices? Energy Econ. 2017, 66, 547–558.
  5. Zhang, Y.; Ma, F.; Wang, Y. Forecasting crude oil prices with a large set of predictors: Can LASSO select powerful predictors? J. Empir. Financ. 2019, 54, 97–117.
  6. Kim, J.-M.; Jung, H. Dependence Structure between Oil Prices, Exchange Rates, and Interest Rates. Energy J. 2018, 39, 259–280.
  7. Kamalov, F.; Smail, L.; Gurrib, I. Stock price forecast with deep learning. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Online, 8–9 November 2020; pp. 1098–1102.
  8. Anand, C. Comparison of Stock Price Prediction Models using Pre-trained Neural Networks. J. Ubiquitous Comput. Commun. Technol. (UCCT) 2021, 3, 122–134.
  9. Sharma, A.; Tiwari, P.; Gupta, A.; Garg, P. Use of LSTM and ARIMAX Algorithms to Analyze Impact of Sentiment Analysis in Stock Market Prediction. In Intelligent Data Communication Technologies and Internet of Things. Lecture Notes on Data Engineering and Communications Technologies; Hemanth, J., Bestak, R., Chen, J.Z., Eds.; Springer: Singapore, 2021; Volume 57.
  10. Chen, Z. Gaussian Process Regression Methods and Extensions for Stock Market Prediction. Ph.D. Thesis, University of Leicester, Leicester, UK, 2017.
  11. Bisht, A.; Chahar, A.; Kabthiyal, A.; Goel, A. Stock Prediction using Gaussian Process Regression. In Proceedings of the 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–31 March 2022; pp. 693–699.
  12. Erickson, C.B.; Ankenman, B.E.; Sanchez, S.M. Comparison of Gaussian process modeling software. Eur. J. Oper. Res. 2018, 266, 179–192.
  13. Sacks, J.; Welch, W.J.; Mitchell, T.J.; Wynn, H.P. Design and Analysis of Computer Experiments. Stat. Sci. 1989, 4, 409–423.
  14. Erickson, C. GauPro: Gaussian Process Fitting. R Package Version 0.2.2. 2017. Available online: https://CRAN.R-project.org/package=GauPro (accessed on 19 March 2021).
  15. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Available online: www.GaussianProcess.org/gpml (accessed on 19 March 2021).
  16. Kim, J.-M.; Kim, D.H.; Jung, H. Modeling Non-normal Corporate Bond Yield Spreads by Copula. N. Am. J. Econ. Financ. 2020, 53, 101210.
  17. Kim, J.-M.; Jung, H. The impacts of COVID-19 on the dependence structure of the stock market. Appl. Econ. Lett. 2022, 1–6.
  18. Kim, J.-M.; Jung, H.; Yang, B. A revisit to size anomalies in U.S. bank stock returns by panel copula. Appl. Econ. Lett. 2022, 29, 750–754.
  19. Sklar, M. Fonctions de Répartition À N Dimensions et Leurs Marges; Université Paris 8: Paris, France, 1959.
  20. Joe, H. Multivariate Models and Dependence Concepts; Chapman & Hall: London, UK, 1997.
  21. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006.
  22. Nagler, T.; Schepsmeier, U.; Stoeber, J.; Brechmann, E.C.; Graeler, B.; Erhardt, T. VineCopula: Statistical Inference of Vine Copulas. In R Package, VineCopula; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  23. Jang, H.; Kim, J.-M.; Noh, H. Vine Copula Granger Causality in Mean. Econ. Model. 2022, 109, 105798.
  24. Kim, J.-M.; Ha, I.D. Deep Learning-Based Residual Control Chart for Count Data. Qual. Eng. 2022, 34, 370–381.
  25. Rong, X. Deep Learning Toolkit in R. In R Package, Deepnet; R Foundation for Statistical Computing: Vienna, Austria, 2015.
  26. Bayarri, M.J.; Berger, J.O.; Forte, A.; García-Donato, G. Criteria for Bayesian model choice with application to variable selection. Ann. Stat. 2012, 40, 1550–1577.
  27. Garcia-Donato, G.; Forte, A.; Vergara-Hernández, C. Bayes Factors, Model Choice and Variable Selection in Linear Models. In R Package, BayesVarSel; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  28. Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319.
  29. Karatzoglou, A.; Smola, A.; Hornik, K.; National ICT Australia; Maniscalco, M.A.; Teo, C.H. Kernel-Based Machine Learning Lab. In R Package, Kernlab; R Foundation for Statistical Computing: Vienna, Austria, 2019.
Figure 1. Brent and WTI Monthly Log Returns.
Figure 2. A total of 76 Monthly Log Returns (blue color) and the functional mean equation line (red color).
Figure 3. Variance proportions of FPCA. Note: Index: principal components (1st, 2nd, etc.); Varpro: variance proportions.
Figure 4. The 2D FPCA. Note: Harmonic I: 1st component; Harmonic II: 2nd component.
Figure 5. The 3D FPCA. Note: Harmonic I: 1st component; Harmonic II: 2nd component; Harmonic III: 3rd component.
Table 1. Summary statistics.
| | Mean | Median | Minimum | Maximum | Std. Dev. | Skewness | Kurtosis |
| BRENT | −0.003 | −0.015 | −0.192 | 0.326 | 0.089 | 0.968 | 4.476 |
| WTI | −0.002 | −0.014 | −0.215 | 0.336 | 0.088 | 0.897 | 4.728 |
Table 2. Bayesian Variable Selection: Brent price as a dependent variable.
| | Prob. | HPM | MPM |
| CB | 0.54 | | * |
| HD | 0.61 | * | * |
| HON | 0.65 | * | * |
| LIN | 0.73 | * | * |
| PG | 0.56 | * | * |
* means “included”. This table presents the inclusion probabilities of five selected stocks using the Bayesian variable selection method, where the dependent variable is the Brent price. HPM stands for the highest posterior probability model, and MPM stands for the median probability model. Probabilities are estimated based on visited models.
Table 3. Bayesian Variable Selection: WTI price as a dependent variable.
| | Prob. | HPM | MPM |
| HD | 0.58 | | * |
| HON | 0.57 | | * |
| LIN | 0.61 | * | * |
| PG | 0.44 | * | |
| UNP | 0.68 | * | * |
* means “included”. This table presents the inclusion probabilities of five selected stocks using the Bayesian variable selection method, where the dependent variable is the WTI price. HPM stands for the highest posterior probability model, and MPM stands for the median probability model. Probabilities are estimated based on visited models.
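The MPM entries of Tables 2 and 3 can be checked against the Prob. values, assuming the usual definition of the median probability model: the model containing every covariate whose posterior inclusion probability is at least 0.5. The following Python sketch applies that rule to the tabulated probabilities. The function name `median_probability_model` and the 0.5 default are illustrative assumptions and are not part of the BayesVarSel output.

```python
# Posterior inclusion probabilities copied from Tables 2 and 3.
BRENT_PROBS = {"CB": 0.54, "HD": 0.61, "HON": 0.65, "LIN": 0.73, "PG": 0.56}
WTI_PROBS = {"HD": 0.58, "HON": 0.57, "LIN": 0.61, "PG": 0.44, "UNP": 0.68}


def median_probability_model(inclusion_probs, threshold=0.5):
    """Return the covariates with inclusion probability >= threshold, sorted by name.

    Hypothetical helper: it assumes the standard median-probability-model rule.
    """
    return sorted(name for name, p in inclusion_probs.items() if p >= threshold)


if __name__ == "__main__":
    print("Brent MPM:", median_probability_model(BRENT_PROBS))
    print("WTI MPM:  ", median_probability_model(WTI_PROBS))
```

Under this assumption, all five stocks enter the MPM for Brent, while PG (0.44) drops out for WTI. The HPM cannot be recovered in the same way, because it depends on the joint posterior probabilities of whole models rather than on marginal inclusion probabilities.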
Table 4. Reduced Models.
| | Theta |
| CB | 6.1 |
| HD | 0.87 |
| HON | 2.07 |
| LIN | 9.58 |
| PG | 2.51 |
Nugget = 0.244
RMSE = 0.088
N = 170
GP for Brent with Covariates (CB, HD, HON, LIN, PG).
Table 5. Reduced Models.
| | Theta |
| HD | 24.8 |
| HON | 8.08 |
| LIN | 25 |
| PG | 112 |
| UNP | 2.86 |
Nugget = 0.546
RMSE = 0.0797
N = 170
GP for WTI with Covariates (HD, HON, LIN, PG, UNP).
Table 6. RMSE and MAD for forecasting BRENT and WTI log return prices.
| Method | Deep Learning | | | Gaussian Process | | | Vine Copula | |
| RMSE | ALL | BVS | NLPCA | ALL | BVS | NLPCA | BVS | NLPCA |
| Brent | 0.087 | 0.090 | 0.088 | 0.100 | 0.078 | 0.077 | 0.079 | 0.072 |
| WTI | 0.085 | 0.084 | 0.084 | 0.086 | 0.080 | 0.073 | 0.077 | 0.069 |
| MAD | ALL | BVS | NLPCA | ALL | BVS | NLPCA | BVS | NLPCA |
| Brent | 0.075 | 0.078 | 0.076 | 0.080 | 0.060 | 0.058 | 0.060 | 0.060 |
| WTI | 0.072 | 0.071 | 0.072 | 0.073 | 0.064 | 0.059 | 0.061 | 0.057 |
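As a reading aid for Table 6, the two error measures can be sketched as follows. This minimal Python illustration assumes that RMSE is the root mean squared forecast error and that MAD is the mean absolute deviation between observed and predicted log returns. The function names and the toy numbers are hypothetical and are not taken from the paper's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mad(actual, predicted):
    """Mean absolute deviation of forecasts from observations (assumed definition)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    # Made-up monthly log returns and forecasts, for illustration only.
    observed = [0.02, -0.05, 0.10, -0.01]
    forecast = [0.00, -0.02, 0.04, 0.01]
    print("RMSE:", rmse(observed, forecast))
    print("MAD: ", mad(observed, forecast))
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAD for the same forecasts, which matches the ordering of the two panels in Table 6.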
Table 7. The 95% Confidence Interval for the Loss functions (LOSS1 and LOSS2) of BRENT and WTI log return prices.
| LOSS | Deep Learning | | |
| LOSS1 | ALL | BVS | NLPCA |
| Brent | (0.0048, 0.0106) | (0.0048, 0.0107) | (0.0049, 0.0107) |
| WTI | (0.0044, 0.0098) | (0.0042, 0.0096) | (0.0042, 0.0096) |
| LOSS2 | ALL | BVS | NLPCA |
| Brent | (0.0588, 0.0926) | (0.0591, 0.0929) | (0.0591, 0.0930) |
| WTI | (0.0549, 0.0884) | (0.0540, 0.0870) | (0.0540, 0.0871) |
| LOSS | Gaussian Process | | |
| LOSS1 | ALL | BVS | NLPCA |
| Brent | (0.0048, 0.0151) | (0.0023, 0.0099) | (0.0025, 0.0101) |
| WTI | (0.0040, 0.0109) | (0.0022, 0.0095) | (0.0029, 0.0109) |
| LOSS2 | ALL | BVS | NLPCA |
| Brent | (0.0566, 0.1025) | (0.0414, 0.0793) | (0.0483, 0.0826) |
| WTI | (0.0550, 0.0904) | (0.0462, 0.0794) | (0.0488, 0.0856) |
| LOSS | Vine Copula | | |
| LOSS1 | ALL | BVS | NLPCA |
| Brent | NA | (0.0018, 0.0108) | (0.0022, 0.0082) |
| WTI | NA | (0.0019, 0.0101) | (0.0019, 0.0076) |
| LOSS2 | ALL | BVS | NLPCA |
| Brent | NA | (0.0402, 0.0796) | (0.0448, 0.0753) |
| WTI | NA | (0.0434, 0.0793) | (0.0418, 0.0714) |
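The intervals in Table 7 appear consistent with LOSS1 being the squared forecast error and LOSS2 the absolute forecast error: for deep learning with all covariates on Brent, the LOSS1 midpoint is about 0.0077 (square root about 0.088) and the LOSS2 midpoint is about 0.076, close to the RMSE and MAD reported in Table 6. Under that assumption, and further assuming a normal-approximation interval (mean plus or minus 1.96 standard errors), a 95% interval for the mean loss could be computed as in the Python sketch below. The function names, the interval construction, and the toy data are all assumptions made for illustration; the interval method actually used for Table 7 may differ.

```python
import math
import statistics


def mean_loss_ci(losses, z=1.96):
    """Normal-approximation confidence interval for the mean of per-period losses."""
    n = len(losses)
    center = statistics.fmean(losses)
    std_err = statistics.stdev(losses) / math.sqrt(n)  # sample std. dev. / sqrt(n)
    return center - z * std_err, center + z * std_err


def loss_intervals(actual, predicted, z=1.96):
    """Return (LOSS1 interval, LOSS2 interval) under the assumed loss definitions."""
    errors = [a - p for a, p in zip(actual, predicted)]
    loss1 = [e * e for e in errors]   # assumed: squared forecast error
    loss2 = [abs(e) for e in errors]  # assumed: absolute forecast error
    return mean_loss_ci(loss1, z), mean_loss_ci(loss2, z)


if __name__ == "__main__":
    # Made-up observations and forecasts, for illustration only.
    observed = [0.02, -0.05, 0.10, -0.01, 0.03, -0.08]
    forecast = [0.00, -0.02, 0.04, 0.01, 0.02, -0.01]
    ci1, ci2 = loss_intervals(observed, forecast)
    print("LOSS1 95% interval:", ci1)
    print("LOSS2 95% interval:", ci2)
```

With a short forecast window, a t-based or bootstrap interval would be a reasonable alternative to the fixed 1.96 multiplier used here.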
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kim, J.-M.; Han, H.H.; Kim, S. Forecasting Crude Oil Prices with Major S&P 500 Stock Prices: Deep Learning, Gaussian Process, and Vine Copula. Axioms 2022, 11, 375. https://doi.org/10.3390/axioms11080375