Article

Statistical Analysis Dow Jones Stock Index—Cumulative Return Gap and Finite Difference Method

1
Department of Accounting, Finance and Economics, Griffith University, Nathan 4111, Australia
2
Griffith Business School, Griffith University, Nathan 4111, Australia
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2022, 15(2), 89; https://doi.org/10.3390/jrfm15020089
Submission received: 21 November 2021 / Revised: 4 February 2022 / Accepted: 8 February 2022 / Published: 19 February 2022
(This article belongs to the Special Issue Emerging Markets)

Abstract

This study was motivated by the poor performance of current stock return forecasting models and aims to improve their accuracy. The current literature largely assumes that the residual term of the existing models is white noise and therefore carries no valuable information. We exploit the information contained in the model residuals in the context of cumulative returns and construct a new cumulative return gap (CRG) model to overcome the weaknesses of the traditional cumulative abnormal returns (CAR) and buy-and-hold abnormal returns (BHAR) models. To handle the residual terms of the prediction model and improve prediction accuracy, we also introduce the finite difference (FD) method into the autoregressive (AR) model and the autoregressive distributed lag (ARDL) model. The empirical results show that the cumulative return (CR) model is better than the simple return model for stock return prediction. We find that the CRG model can improve prediction accuracy, that the residual term from the autoregressive analysis is very important in stock return prediction, and that the FD model can improve prediction accuracy.

1. Introduction

The success of investment strategies lies in accurately forecasting the future returns of each stock in the marketplace. Analysts examine all available data and trends in an attempt to identify mispriced securities and earn profits in excess of those justified by the riskiness of the assets. Practitioners and analysts in this instance believe that markets are not informationally efficient and that analyzing available data allows them to make superior profits. Participants have used the concepts of abnormal return and cumulative abnormal return (CAR) to identify whether stock prices will rise or decline immediately following certain trading activities or events.
Studies by Barber and Lyon (1997), Ziobrowski et al. (2004), Zamanian et al. (2013), Lamba and Tripathi (2015), Mitesh et al. (2016), Campbell et al. (2021), and Hillegeist and Weng (2021) test the impact of trading activities and events on stock prices based on the buy-and-hold abnormal return (BHAR) or the cumulative abnormal return (CAR) models. More efficient stock prices benefit shareholders by reducing information imbalance and improving liquidity. However, the BHAR model has two main disadvantages. Firstly, the BHAR formula cannot deliver a consistent forecasting result with a zero abnormal return at the end of the time period. Based on the BHAR formula $BHAR_t = \prod_{t=1}^{t} (r_t - E(r_t))$, if the terminal value $r_T - E(r_T) \neq 0$, then $\lim_{t \to T} BHAR_t = r_1 r_2 \cdots r_{t-1} (r_T - E(r_T)) \neq 0$. Secondly, the BHAR model implies a compounding effect, but the expectation $E(r_t)$ is a geometric average return, not a compound average return.
This study aims to overcome the weaknesses of the BHAR model by developing a new model referred to as the cumulative return gap (CRG) model. Moreover, we use the CRG model as an improvement on the CAR and BHAR models to predict stock returns with the autoregressive distributed lag (ARDL) model.¹ The new CRG model delivers a consistent forecasting result with a zero abnormal return at the end of the time period for a cumulative average compound return: $\lim_{t \to T} r_t^{(c,gap)} = \lim_{t \to T} \left\{ \prod_{t=1}^{t} r_t - \left( \prod_{t=1}^{T} r_t \right)^{t/T} \right\}$. The empirical study shows that the cumulative return gap is better than the simple abnormal return model for stock return prediction.
A very important part of index forecasting is analyzing time series and building a proper forecasting model. Gijon et al. (2021) focus on how traffic forecasting in telecommunication networks can be treated as a time series analysis problem. Linear time series models, such as autoregressive integrated moving average models, capture trend and short-range dependencies in traffic demand. Lin et al. (2021) consider interval-valued series data, which they analyze with an auto-interval-regressive model using statistics from the normal distribution. Similarly, Maratkhan et al. (2021) propose a three-step framework for financial time series that takes advantage of the powerful models available for image classification. However, they all overlooked the residual part of the selected autoregression model, and many researchers have preferred to assume that the residual part is zero (e.g., Devi et al. 2013; Ye and Wei 2015; Zaham and Kenett 2013). The residual item generally contains a lot of information, and assuming it to be zero can easily reduce the accuracy of the forecasting results. As a result, the key to improving forecasting accuracy with autoregressive-related models² is to forecast the trend of the residual items. For this reason, we analyze the residual part with the normalized probability cumulative distribution function (CDF) and finite difference (FD) methods. Moreover, we compare the cases residual = 0 and residual ≠ 0 to assess the importance of the residual in stock return forecasting.

2. Literature Review

According to the efficient market hypothesis (EMH), investors and traders in stock markets are not able to make abnormal positive returns by using publicly available information (Hu et al. 2021). However, abnormal phenomena in the financial markets have challenged classical financial theory. The assumption that abnormal positive returns are unattainable is unrealistic, because the most fundamental principle is that people act to maximize their personal utility in their public capacities as well as in their private lives. Ziobrowski et al. (2004) conducted an empirical analysis to test whether U.S. Senators have an informational advantage over other investors in common stock investments by testing for abnormal returns during the period of 1993–1998. They found that stocks purchased by U.S. Senators earn statistically significant positive abnormal returns and outperform the market by 85 basis points per month on a trade-weighted basis, which indicates that U.S. Senators have an informational advantage over other investors. Zamanian et al. (2013) used the cumulative abnormal return (CAR) method to test long-run returns from 1 February 2006 to 29 February 2011 on the initial public offerings (IPO) of 18 public and 15 private companies in the Tehran Stock Exchange (TSE), and found that corporate ownership has no significant impact on the returns of IPOs in the short run or the long run. Lamba and Tripathi (2015) used the concepts of average abnormal return (AAR) and cumulative average abnormal return (CAAR) to detect whether Indian firms are able to create value for shareholders after cross-border mergers and acquisitions. Their results showed that acquisitions do not create value for Indian acquiring companies in the long run, and that abnormal returns and cumulative abnormal returns have significantly deteriorated since the period of 1998–2009; this value destruction could be attributed to the financial crisis.
Bharandev and Rao (2021) examined the stock market and trading volume reaction to the information content of 34 selected companies' stock splitting announcements between 1 January and 31 July 2016. They used the average abnormal return (AAR) and cumulative average abnormal return (CAAR) to test whether an opportunity was available to make abnormal returns, and found that abnormal returns cannot be obtained from the Indian stock market, although stock splitting announcements have a negative impact on stock returns.
Some studies (Ritter 1991; Barber and Lyon 1997; Mohit and Aggarwal 2014; Mitesh et al. 2016) defined two kinds of abnormal return, CAR³ and BHAR⁴.
The difference between CAR and BHAR is that CAR ignores compounding, whereas BHAR includes the effect of compounding. Barber and Lyon (1997) showed that the empirical analysis of CAR may result in more bias than that of BHAR. Even so, buy-and-hold abnormal returns (BHAR) have two disadvantages. When the variables $r_1, r_2, \ldots, r_t$ are the returns over the time periods $t \in [0,1], [1,2], \ldots, [t-1,t]$, the expression $\prod_{t=1}^{t} r_t$ represents the compounding return of the stock during the time period $t \in [0,t]$. When we consider the conditional compounding effect, the conditional expected value is $E\left(\prod_{t=1}^{t} r_t\right) = \prod_{t=1}^{t-1} r_t \, E(r_t)$, so the buy-and-hold abnormal returns will be $BHAR_t | F_{t-1} = E\left(\prod_{t=1}^{t} r_t - \prod_{t=1}^{t} E(r_t)\right) | F_{t-1} = r_1 r_2 \cdots r_{t-1} E(r_t - E(r_t) | F_{t-1})$. However, if $E(r_T - E(r_T)) \neq 0$, then $\lim_{t \to T} BHAR_t | F_{t-1} = \lim_{t \to T} \left\{ r_1 r_2 \cdots r_{t-1} E(r_t - E(r_t) | F_{t-1}) \right\} \neq 0$, which prevents us from obtaining a consistent forecasting result with a zero abnormal return at the end of the time period. Another disadvantage of BHAR is that, theoretically, the expectation $E(r_t)$ is a geometric average return, not a compound average return. This is not consistent with the main assumption of the compounding effect suggested by the BHAR model.
To overcome these weaknesses of the traditional cumulative abnormal returns (CAR) and buy-and-hold abnormal returns (BHAR) models, we define a new cumulative return gap (CRG) model. The principle of our CRG model is similar to the concept of buy-and-hold abnormal returns (BHAR).
Assume the time variable is $t \in [0, T]$, where $T$ is the maximum width of the time window; the variable $p_t$ represents the stock price, with $p_0 = p_1$; the return index $r_t$ is defined as $r_t = p_t / p_{t-1}$, with $r_1 = 1$; the newly defined cumulative return index is $r_t^{(c)} = \prod_{t=1}^{t} r_t$, with $r_1^{(c)} = r_1 = 1$; the average compound return of the cumulative return $r_t^{(c)}$ is defined as $r_t^{(ave)} = (r_t^{(c)})^{1/t}$, with $r_T^{(ave)} = (r_T^{(c)})^{1/T}$; and the average cumulative compound return index of the cumulative return index $r_t^{(c)}$ is defined as $r_t^{(c,ave)} = (r_T^{(ave)})^{t}$. Based on these variables, the cumulative return gap (CRG) is defined as
$$r_t^{(c,gap)} = r_t^{(c)} - r_t^{(c,ave)} = \prod_{t=1}^{t} r_t - (r_T^{(ave)})^{t} = \prod_{t=1}^{t} r_t - \left( (r_T^{(c)})^{1/T} \right)^{t} = \prod_{t=1}^{t} r_t - \left( \prod_{t=1}^{T} r_t \right)^{t/T}$$
In contrast to buy-and-hold abnormal returns (BHAR), the cumulative return gap (CRG) provides a consistent forecasting result with a zero abnormal return at the end of the time period:
$$\lim_{t \to T} r_t^{(c,gap)} = \lim_{t \to T} \left\{ \prod_{t=1}^{t} r_t - \left( \prod_{t=1}^{T} r_t \right)^{t/T} \right\} = 0$$
Furthermore, the average compound return $r_T^{(ave)} = (r_T^{(c)})^{1/T}$ is a constant during the time period $t \in [0, T]$, and it is also a compound return.
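The CRG definition can be illustrated with a minimal Python sketch. The function name `cumulative_return_gap` and the price values are hypothetical, and the sketch assumes a price series $p_0, \ldots, p_T$ with gross returns computed for $t = 1, \ldots, T$ rather than the convention $r_1 = 1$ used above. It only shows that the gap vanishes at $t = T$ by construction.

```python
import numpy as np

def cumulative_return_gap(prices):
    """Return (r_c, r_c_ave, r_c_gap) for t = 1..T from prices p_0..p_T."""
    prices = np.asarray(prices, dtype=float)
    gross = prices[1:] / prices[:-1]      # r_t = p_t / p_{t-1}
    r_c = np.cumprod(gross)               # cumulative return r_t^(c)
    T = gross.size
    t = np.arange(1, T + 1)
    r_c_ave = r_c[-1] ** (t / T)          # r_t^(c,ave) = (r_T^(c))^(t/T)
    return r_c, r_c_ave, r_c - r_c_ave    # last item is r_t^(c,gap)

# Made-up prices, for illustration only.
prices = [100.0, 102.0, 99.0, 104.0, 108.0]
r_c, r_c_ave, gap = cumulative_return_gap(prices)
print(gap)  # the final entry is zero by construction
```

Note that $r_t^{(c,ave)}$ depends on the terminal value $r_T^{(c)}$, so the whole sample window must be fixed before the gap can be computed.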
Traditionally, the cumulative abnormal return (CAR) and BHAR models are used to study the long-term behavior of stock returns during a particular period, such as several days, several months, or several years. However, few studies use the cumulative abnormal return model to forecast stock returns. Our research fills this gap by using the cumulative return gap (CRG) model as an improvement on the cumulative abnormal return model to forecast stock returns.
The aim of prediction is to look for future information on the basis of previous information. Based on historical events, prediction is aimed towards forecasting the events which may happen in the future. Shen et al. (2012) believe that a single stock price can be directly predicted by its autocorrelation, because the performance of a stock market prediction heavily depends on the correlation between the data used. If the trend of a stock price is always an extension of yesterday, or if a time series of the stock market price has a high autocorrelation, the accuracy of prediction should be fairly high. The results of Shen et al. (2012) prove that autocorrelation is a very useful tool for predicting a single stock price; however, their analysis does not mention the disturbance of the regression model’s residual noise, which may influence the accuracy of the prediction values.
A very important part of forecasting is analyzing the time series and building a proper forecasting model, especially when the initial stochastic time series of the return is nonstationary in nature and can be analyzed based on the selection of any method (Rabbani et al. 2021). When autoregressive-related models are used to analyze time series, such as in the ES, AR, MA, ARMA, ARIMA and SARMA models, many researchers prefer to assume that the residual item is zero with the absolute lowest error.
Usually, $ARIMA(p, d, q)$, also known as the Box–Jenkins method, is used to remove the trend of the series by differencing, so that a non-stationary series is transformed into a stationary one (Dimri et al. 2020). Here, the parameter $p$ represents the order of the autoregressive process, as in an $AR(p)$ model; the parameter $q$ represents the order of the moving average process, as in an $MA(q)$ model; and the parameter $d$ represents the order of differencing of the time series. Samrad et al. (2021) suggest that the ARIMA modelling approach, according to various measures, is the most effective model for predicting stock price trends while keeping the residuals at zero. Zaham and Kenett (2013) also use ARIMA models such as $ARIMA(1, 1, 1)$ and $ARIMA(2, 1, 2)$ to forecast stock prices with the residuals set to zero. Ye and Wei (2015) argue that, since the ARIMA model is a typical linear time series model, it cannot easily represent the nonlinear dynamic system of stock markets; if the ARIMA model is used to predict complex time series such as stock prices, the forecasting result will not be ideal. Skare et al. (2021) preferred to use the autoregressive model (AR) and the vector autoregressive model (VAR) for forecasting. The autoregressive model is a good choice when the dependent variable is univariate; when there is more than one dependent variable, the vector autoregressive model has the advantage.
The residual item generally includes a lot of private information and some public information such as economic shocks, and assuming the residual to be zero, as Dimri et al. (2020) have done, can easily reduce forecasting accuracy. Because the autoregressive-related models such as ES, AR, MA, ARMA, and ARIMA are linear, most of the nonlinear information ends up in the residual items. If the residual items are simply assumed to be zero, most of the nonlinear information is removed and forecasting accuracy suffers. Even though the moving average (MA) model considers the influence of lagged residual items, it is still a linear model. If the residual items are largely ignored, the autoregressive-related models cannot significantly improve forecasting accuracy. The key to improving forecasting accuracy with autoregressive-related models is to forecast the trend of the residual items. For this reason, we try to improve forecasting accuracy by forecasting the residual items, using the probability method and the finite difference (FD) method.
Thus, for this study, we chose the autoregressive distributed lag (ARDL) model as the regression model to predict the underlying stock returns. The ARDL model was first defined by Pesaran and Shin (1999). The purpose of the ARDL model is to represent the long-term relationships between variables in econometric analysis.
The general $ARDL(p, q)$ model can be defined as
$$y_t = \omega_0 + \omega_1 t + \sum_{i=1}^{p} \alpha_i y_{t-i} + \beta_0 x_t + \sum_{j=1}^{q} \beta_j x_{t-j} + u_t$$
The ARDL model represents the long-term relationship between the variables $y_t$ and $x_t$, where $x_t$ is a $k$-dimensional $(k > 1)$ variable that is either first-difference stationary ($I(1)$ for short, meaning it has a unit root of order 1) or stationary in levels ($I(0)$ for short, meaning the level variable is stationary). If the variable $x_t$ is $I(1)$, then even though it has a unit root of order 1, the vector autoregressive process in $\Delta x_t$ is stable.
Wang et al. (2021) have shown that the ARDL model is well suited to time series econometric variables; additionally, the ARDL model has the advantage of yielding consistent estimates of the long-run coefficients and cointegrating relationships between variables that are asymptotically normal, irrespective of whether the underlying regressors are $I(1)$ or $I(0)$.
Li et al. (2020) preferred to use the autoregressive distributed lag (ARDL) model proposed by Shin et al. (2014) for prediction, because the ARDL model has three important stages that include changes in the policy rate: first, it can be applied regardless of the level of stationarity or the order of the unit root of the underlying variables; second, ARDL is suitable for both large and small samples; and third, an appropriate choice of the ARDL order is sufficient to simultaneously correct for residual serial correlation and the problem of endogenous variables. In this paper, the prediction model for the cumulative return index $r_t^{(c)}$ is defined by the following ARDL-CRG model:
$$r_t^{(c)} = k_0 + k_1 \, r_t^{(c,ave)} + \beta \ln t + \sum_{i=1}^{p} \alpha_i \, r_{t-i}^{(c,gap)} + a_t$$
For comparison, the AR model is also used to build a prediction model:
$$r_t = \alpha_0 + \alpha_1 r_{t-1} + \cdots + \alpha_p r_{t-p} + a_t$$
For both the ARDL and AR models, because the residual item $a_t$ is very important for building prediction models, we borrow the finite difference method to handle it. To do so, we work with the probability variable $q_t$ rather than with $a_t$ directly. The relationship between $a_t$ and $q_t$ is
$$a_t = -\ln\left(\frac{1}{q_t} - 1\right) \quad \text{or} \quad q_t = \frac{1}{1 + e^{-a_t}}$$
Few studies use the finite difference method to handle the residual items $a_t$. We apply different orders of the finite difference to the probability variable $q_t$.

3. Methodology and Data

3.1. Data

The daily closing price index of the Dow Jones Industry Index is used as the time series sample (Ranco et al. 2015; Stekelenburg et al. 2015). The sample period runs from 1 April 2010 to 8 July 2016.⁵ The sample contains 1531 trading days (observations). The daily closing price index is collected for the calendar dates on which the US stock markets were open; the few dates that fell on special holidays when the US stock markets were closed were dropped. All of the calculations in this paper are conducted using EViews 8.0 statistical software. The variable $r_t$ is defined as the daily closing return index of the Dow Jones Industry Index. Table 1 shows the main variables used in this paper. Variables are explained in detail when they are introduced.

3.2. Cumulative Return

Based on the definition of a one-period simple return, the time-varying variable $r_t$ represents the simple return for holding an asset over the time interval $[t-1, t]$ (Tsay 2005).
When the time interval is defined as $t \in [0, t]$, the cumulative return index $r_t^{(c)}$ of an underlying stock can be rewritten as⁶
$$r_t^{(c)} = \begin{cases} r_{t-1}^{(c)} \, r_t, & t = 1, 2, \ldots, T \\ 1, & t = 0 \end{cases} \quad \text{where} \quad r_t = \begin{cases} \dfrac{p_t}{p_{t-1}}, & t = 1, 2, \ldots, T \\ 1, & t = 0 \end{cases}$$
Then, to represent the gross return over a long time interval $[0, t]$, the relationship between the cumulative return $r_t^{(c)}$ and the simple return $r_t$ can be written as
$$r_t^{(c)} = r_t \, r_{t-1} \cdots r_2 \, r_1$$
The time variable $T$ is the terminal point of the time period. Whereas the simple return $r_t$ is based on the time interval $t \in [t-1, t]$, the cumulative return $r_t^{(c)}$ is based on the time interval $t \in [0, t]$.

3.3. Cumulative Average Compound Return and the Cumulative Return Gap

The principle of this study is to use the predicted value of the cumulative return $r_t^{(c)}$ to obtain the forecast of the simple return $r_t$ by Formula (2). Thus, we need a deeper understanding of the components of $r_t^{(c)}$. For this paper, we define two new factors, $r_t^{(c,ave)}$ and $r_t^{(c,gap)}$, which represent the cumulative average compound return (CACR) and the cumulative return gap (CRG), respectively.
If $r_T^{(c)}$ at $t = T$ is the final value of the cumulative return $r_t^{(c)}$, then $r_T^{(ave)}$ represents the average change of $r_t^{(c)}$ at a constant compound average rate:
$$r_t^{(ave)} = (r_t^{(c)})^{1/t}, \quad t = 1, 2, \ldots, T$$
As a result, the cumulative average compound return (CACR) is defined as
$$r_t^{(c,ave)} = (r_T^{(ave)})^{t} = (r_T^{(c)})^{t/T}, \quad t = 1, 2, \ldots, T$$
where the curve of the cumulative return index $r_t^{(c)}$ moves around the curve of the cumulative average compound return index $r_t^{(c,ave)}$.
Then, the gap between the cumulative return $r_t^{(c)}$ and the cumulative average compound return $r_t^{(c,ave)}$ can be represented as $r_t^{(c,gap)}$:
$$r_t^{(c,gap)} = r_t^{(c)} - r_t^{(c,ave)}, \quad t = 1, 2, \ldots, T$$
The variable $r_t^{(c,gap)}$ represents the cumulative return gap, which can be seen as a cumulative risk premium of a risky asset. The curve of the cumulative return gap index $r_t^{(c,gap)}$ moves around the horizontal line. After forming the difference $r_t^{(c)} - r_t^{(c,ave)}$, the characteristics of the cumulative risk premium $r_t^{(c,gap)}$ over a long-term period $t \in [0, t]$ are similar to the characteristics of the risk premium $r_t - r_f$ in the CAPM model over a short-term period $t \in [t-1, t]$.⁷

3.4. ARDL-CRG Model

The first prediction model in this paper converts the residual term of the ARDL regression model from a quantile to a probability. Once we have the factors $r_t^{(c,ave)}$ and $r_t^{(c,gap)}$, we can run an ARDL regression model to represent the cumulative return $r_t^{(c)}$. Because the cumulative return gap (CRG) is introduced into the ARDL model, this model can be defined as an ARDL-CRG model:
$$r_t^{(c)} = k_0 + k_1 \, r_t^{(c,ave)} + \beta \ln t + \sum_{i=1}^{p} \alpha_i \, r_{t-i}^{(c,gap)} + a_t \quad \text{where} \quad E(a_t | F_{t-1}) = 0$$
Here, the residual variable $a_t$ can be seen as a quantile of a probability variable $q_t$. The cumulative distribution function (CDF) (Figure 1) can be defined as
$$F(x) = \frac{1}{1 + e^{-x}}, \quad x \in (-\infty, +\infty), \quad \lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1, \quad F(x) \in (0, 1)$$
When $a_t \in (-\infty, +\infty)$, let the variable $q_t$ represent the cumulative probability of the residual variable $a_t$; then the probability function is $q_t = F(a_t)$ with $q_t \in (0, 1)$, and
$$a_t = -\ln\left(\frac{1}{q_t} - 1\right)$$
Thus, the ARDL-CRG cumulative return prediction model can be rewritten in a new form as follows:
$$r_t^{(c)} = k_0 + k_1 \, r_t^{(c,ave)} + \beta \ln t + \sum_{i=1}^{p} \alpha_i \, r_{t-i}^{(c,gap)} - \ln\left(\frac{1}{q_t} - 1\right)$$
The ARDL-CRG model therefore has two forms: one uses the residual item $a_t$ directly, and the other uses the probability item $q_t$ indirectly. Both are ARDL-CRG models.
Because the function $F(x)$ takes values in $(0, 1)$ for $x \in (-\infty, +\infty)$, it is a cumulative probability function, and the residual item is easily converted to a probability item.
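As a rough illustration of the two forms of the ARDL-CRG model, the following Python sketch fits the regression by ordinary least squares and maps the residuals to probabilities with the logistic CDF. It is a sketch under stated assumptions, not the paper's EViews procedure: the function names, the lag order `p=2`, and the synthetic return series are all hypothetical.

```python
import numpy as np

def fit_ardl_crg(r_c, p=2):
    """OLS of r_t^(c) on [1, r_t^(c,ave), ln t, gap_{t-1}, ..., gap_{t-p}]."""
    r_c = np.asarray(r_c, dtype=float)
    T = r_c.size
    t = np.arange(1, T + 1)
    r_c_ave = r_c[-1] ** (t / T)             # cumulative average compound return
    gap = r_c - r_c_ave                      # cumulative return gap
    rows = np.arange(p, T)                   # positions with p lags available
    lags = [gap[rows - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(rows.size), r_c_ave[rows], np.log(t[rows])] + lags)
    y = r_c[rows]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef                # coefficients and residuals a_t

def residual_to_prob(a):
    """q = F(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(a, dtype=float)))

def prob_to_residual(q):
    """a = -ln(1/q - 1), valid for 0 < q < 1."""
    q = np.asarray(q, dtype=float)
    return -np.log(1.0 / q - 1.0)

# Synthetic gross returns, for illustration only.
rng = np.random.default_rng(0)
gross = 1.0004 + 0.01 * rng.standard_normal(500)
r_c = np.cumprod(gross)
coef, a = fit_ardl_crg(r_c, p=2)
q = residual_to_prob(a)                      # probability form of the residual
a_back = prob_to_residual(q)                 # recovers a_t up to rounding
```

Because the two maps are exact inverses on $(0, 1)$, the probability form carries the same information as the residual form.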

3.5. ARDL-CRG-GARCH Model

The second prediction model in this paper uses the GARCH⁸ model to represent the residual term of the ARDL regression. The conditional volatility in the GARCH(1,1) model is defined as
$$\sigma_t^2 = \omega + \alpha \, a_{t-1}^2 + \beta \sigma_{t-1}^2, \quad a_t = \sigma_t \varepsilon_t, \quad \mathrm{Var}(a_t) = \sigma_a^2$$
Theoretically, the random variable $\varepsilon_t \sim N(0, 1)$ follows a standard normal distribution. However, because estimation error is unavoidable, we instead assume $\varepsilon_t \sim N(\mu_0, \sigma_0^2)$ for accurate estimation and define a standardized random variable $e_t$ as
$$e_t = \frac{\varepsilon_t - \mu_0}{\sigma_0}, \quad \varepsilon_t = \mu_0 + \sigma_0 e_t$$
Thus, the residual variable $a_t$ can be defined as
$$a_t = \sigma_t \varepsilon_t = \sigma_t (\mu_0 + \sigma_0 e_t), \quad e_t \sim N(0, 1)$$
Again, we can convert the standardized residual item $e_t$ to a probability variable $q_t = F(e_t)$, and the inverse relation between them is $e_t = F^{-1}(q_t) = -\ln\left(\frac{1}{q_t} - 1\right)$. If the dynamic volatility variable $\sigma_t$ is introduced into the ARDL-CRG model, we obtain an ARDL-CRG-GARCH model, which has two forms, as follows:
$$r_t^{(c)} = k_0 + k_1 \, r_t^{(c,ave)} + \beta \ln t + \sum_{i=1}^{p} \alpha_i \, r_{t-i}^{(c,gap)} + \sigma_t (\mu_0 + \sigma_0 e_t)$$
$$r_t^{(c)} = k_0 + k_1 \, r_t^{(c,ave)} + \beta \ln t + \sum_{i=1}^{p} \alpha_i \, r_{t-i}^{(c,gap)} + \sigma_t \left( \mu_0 - \sigma_0 \ln\left(\frac{1}{q_t} - 1\right) \right)$$
The ARDL-CRG-GARCH model uses a standardized residual variable $e_t$ to represent the residual of the model, and then converts this standardized residual variable to a probability variable $q_t$.
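The variance recursion and the standardization step can be sketched in Python as follows. This is only an illustration: the parameter values, the residuals, and the choice to start the recursion at the unconditional variance $\omega / (1 - \alpha - \beta)$ are assumptions, and in practice the parameters would be estimated by maximum likelihood.

```python
import numpy as np

def garch11_variance(a, omega, alpha, beta):
    """sigma_t^2 = omega + alpha * a_{t-1}^2 + beta * sigma_{t-1}^2."""
    a = np.asarray(a, dtype=float)
    sigma2 = np.empty(a.size)
    # Assumed starting value: unconditional variance (requires alpha + beta < 1).
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, a.size):
        sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def standardize(a, sigma2, mu0=0.0, sigma0=1.0):
    """Invert a_t = sigma_t * (mu0 + sigma0 * e_t) to recover e_t."""
    return (np.asarray(a, dtype=float) / np.sqrt(sigma2) - mu0) / sigma0

a = np.array([0.010, -0.020, 0.005, 0.030])                     # hypothetical residuals
sigma2 = garch11_variance(a, omega=1e-6, alpha=0.1, beta=0.8)   # illustrative parameters
e = standardize(a, sigma2)
q = 1.0 / (1.0 + np.exp(-e))                                    # q_t = F(e_t)
```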

4. Empirical Results

4.1. Return Index

Figure 2 shows the curve of the return index $r_t$ of the US Dow Jones Industry Index between 1 April 2010 and 8 July 2016. The sample size is 1531, and the time interval is $t \in [0, T]$, $T = 1531$. There are three volatility clusters, in 2010, 2011, and 2015, which can be captured by a GARCH model.
The return index $r_t$ shows the movement of a stock price during a short period $t \in [t-1, t]$. With the return index defined as $r_t = \frac{p_t}{p_{t-1}}$, the price trend is down when $p_t < p_{t-1}$ and up when $p_t > p_{t-1}$. The purpose of forecasting is to predict the movement of a stock price at the next time $t$ when the information set $F_{t-1} = \{ r_1, r_2, \ldots, r_{t-1} \}$ is already known.

4.2. Autocorrelation Test for the Return Index

Table 2 lists the autocorrelations, the Ljung and Box (1978) statistics, and the related probabilities for the return index $r_t$. It shows a significant autocorrelation between $r_t$ and $r_{t-1}$ $(t = 1, 2, \ldots, t-1)$ at the 5% and 1% significance levels.
These autocorrelations are better expressed in an $AR(p)$ model as $r_t = \alpha_0 + \alpha_1 r_{t-1} + \cdots + \alpha_p r_{t-p} + a_t$. Defining $\mu_t = \alpha_0 + \alpha_1 r_{t-1} + \cdots + \alpha_p r_{t-p}$, the $AR(p)$ model becomes $r_t = \mu_t + a_t$. If the information set $F_{t-1} = \{ r_1, r_2, \ldots, r_{t-1} \}$ is already known, then $E(r_t | F_{t-1}) = \mu_t$, $E(a_t | F_{t-1}) = 0$, and $\mathrm{Var}(r_t | F_{t-1}) = \mathrm{Var}(a_t | F_{t-1}) = \sigma_t^2$. Here, the expectations and variances are conditional expectations and conditional variances.
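For readers who want to reproduce statistics of this kind, the sample autocorrelation and the Ljung-Box statistic $Q(m) = n(n+2) \sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$ can be computed with a short Python sketch. The function names and the toy series are hypothetical; p-values, which compare $Q(m)$ with a chi-square distribution, are left out to keep the example dependency-free.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:-k]) / denom for k in range(1, max_lag + 1)])

def ljung_box_q(x, max_lag):
    """Ljung-Box Q(m) for m = 1, ..., max_lag."""
    n = len(x)
    rho = acf(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.cumsum(rho ** 2 / (n - lags))

# Toy alternating series, for illustration only.
x = np.array([1.0, -1.0] * 5)
rho = acf(x, 3)
Q = ljung_box_q(x, 3)
```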
To build a stable autoregressive model, it is necessary to test whether the time series of the return index has any unit roots. Table 3 lists the t-statistic values and probabilities of the ADF unit root test under the three criteria of AIC, SIC and HQC. No unit root is found in the level series, the first-order difference series, or the second-order difference series. Because the return index $r_t$ is an autocorrelated time series and does not have any unit roots, we build an autoregressive model to carry out the forecasting tasks. Figure 3 shows the residual item $a_t$ from the AR model $r_{a,t} = \mu_{a,t} + a_t$.

4.3. AR(p) Prediction Model for Return Index

Since the return index $r_t$ is autocorrelated and has no unit roots, we fit an autoregressive model. After assessing many candidate autoregressive models, the following $AR(5)$ model was selected:
$$r_{a,t} = \mu_{a,t} + a_t$$
$$\mu_{a,t} = 1.190795 - 0.045747 \, r_{t-1} + 0.020058 \, r_{t-2} - 0.086443 \, r_{t-3} + 0.005487 \, r_{t-4} - 0.083679 \, r_{t-5}$$
$$R^2 = 0.0181, \quad S.E. = 0.0095, \quad AIC = -6.4626, \quad SIC = -6.4416$$
This $AR(5)$ model has a very small coefficient of determination, $R^2 = 0.0181$. With $r_{a,t} = \mu_{a,t} + a_t$, the residual item $a_t$ therefore contains most of the information about the return index $r_t$: the correlation between the return index and the residual item of the $AR(5)$ model is as high as $\mathrm{Corr}(r_t, a_t) \approx 0.9908$. For this reason, it is very important to estimate the values of the residual items.
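The link between a small $R^2$ and a high correlation between the series and its residual can be checked numerically. For an in-sample least-squares fit with an intercept, $\mathrm{Corr}(r_t, a_t) = \sqrt{1 - R^2}$, so $R^2 = 0.0181$ implies a correlation of about 0.991, in line with the value reported above. The Python sketch below verifies the identity on a synthetic series; the function name `fit_ar` and the data are hypothetical, and the fit is plain OLS rather than the EViews routine used in the paper.

```python
import numpy as np

def fit_ar(r, p):
    """OLS fit of AR(p) with intercept; returns (coef, fitted, resid, y)."""
    r = np.asarray(r, dtype=float)
    y = r[p:]
    # Column i holds r_{t-i} for each target r_t.
    X = np.column_stack([np.ones(y.size)] + [r[p - i:-i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mu = X @ coef
    return coef, mu, y - mu, y

# Synthetic return index around 1, for illustration only.
rng = np.random.default_rng(1)
r = 1.0 + 0.01 * rng.standard_normal(1531)
coef, mu, resid, y = fit_ar(r, p=5)

r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
corr = np.corrcoef(y, resid)[0, 1]
print(r2, corr, np.sqrt(1.0 - r2))   # the last two numbers agree
```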

4.4. Direct Prediction of the Return Index Based on the Finite Difference Method and the AR-FD Model

For improving the prediction accuracy of the AR model, we will introduce the finite difference (FD) method to the AR model and build a new AR-FD model.
Because the residual item $a_t$ has a strong impact on the prediction value of the return index $r_t$, it is important to predict the trend of the residual item $a_t$. We define
$$q_t^{(a)} = \frac{1}{1 + e^{-a_t}}, \quad \text{or} \quad a_t = -\ln\left(\frac{1}{q_t^{(a)}} - 1\right)$$
Then, the variable $q_t^{(a)}$ can be seen as a probability of $a_t$. Define the first-order difference as $dq_t^{(a)} = q_t^{(a)} - q_{t-1}^{(a)}$, the second-order difference as $d^2 q_t^{(a)} = dq_t^{(a)} - dq_{t-1}^{(a)}$, the third-order difference as $d^3 q_t^{(a)} = d^2 q_t^{(a)} - d^2 q_{t-1}^{(a)}$, and the $n$th-order difference as $d^n q_t^{(a)} = d^{n-1} q_t^{(a)} - d^{n-1} q_{t-1}^{(a)}$. Even if the level variable $q_t^{(a)}$ is not an autocorrelated time series, the $n$th-order difference $d^n q_t^{(a)}$ may be, in which case the higher-order difference $d^n q_t^{(a)}$ can be expressed by a regression model as $d^n q_t^{(a)} = \omega + \alpha_0 q_{t-1}^{(a)} + \alpha_1 dq_{t-1}^{(a)} + \cdots + \alpha_{n-1} d^{n-1} q_{t-1}^{(a)} + \beta_1 d^n q_{t-1}^{(a)} + \cdots + \beta_p d^n q_{t-p}^{(a)} + c_t$.
In summation form, the regression model for the $n$th-order difference $d^n q_t^{(a)}$ is
$$d^n q_t^{(a)} = \omega + \sum_{i=0}^{n-1} \alpha_i \, d^i q_{t-1}^{(a)} + \sum_{j=1}^{p} \beta_j \, d^n q_{t-j}^{(a)} + c_t$$
Here, the variable $c_t$ is the residual item of the regression model. Then, according to the definition of the difference method, the probability $q_t^{(a)}$ can be predicted by
$$q_t^{(a)} = q_{t-1}^{(a)} + dq_{t-1}^{(a)} + d^2 q_{t-1}^{(a)} + \cdots + d^{n-1} q_{t-1}^{(a)} + d^n q_t^{(a)}$$
It is important to determine a proper order number, which depends on both the degree of autocorrelation and the probability degree of the residual.
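The reconstruction formula is an exact identity of backward differences, which the following Python sketch checks numerically for $n = 3$. The helper names and the random input series are hypothetical and serve only as an illustration.

```python
import numpy as np

def backward_differences(q, n):
    """Return D with D[k][t] = d^k q_t for k = 0..n (NaN where undefined)."""
    q = np.asarray(q, dtype=float)
    D = [q]
    for _ in range(n):
        prev = D[-1]
        cur = np.full(q.size, np.nan)
        cur[1:] = prev[1:] - prev[:-1]    # d^k q_t = d^{k-1} q_t - d^{k-1} q_{t-1}
        D.append(cur)
    return D

def reconstruct(D, n, t):
    """q_t = q_{t-1} + d q_{t-1} + ... + d^{n-1} q_{t-1} + d^n q_t, for t >= n."""
    return sum(D[k][t - 1] for k in range(n)) + D[n][t]

# Random probabilities near 0.5, for illustration only.
rng = np.random.default_rng(2)
q = 0.5 + 0.0025 * rng.standard_normal(10)
n = 3
D = backward_differences(q, n)
rebuilt = np.array([reconstruct(D, n, t) for t in range(n, q.size)])
```

In a forecasting setting the last term $d^n q_t^{(a)}$ is not observed at time $t-1$, so it has to be replaced by its fitted value from the regression model.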
Table 3 lists the autocorrelation (AC) values, the Ljung and Box (1978) statistics, and the probabilities of the differenced time series. As the difference order of the probability time series $q_t^{(a)}$ increases, the degree of autocorrelation of the resulting time series increases. The autocorrelation of the level time series $q_t^{(a)}$ is $AC(1) = 0.001$, which is so low that the level time series $q_t^{(a)}$ cannot be called an autocorrelated time series. The autocorrelation of the first-order difference time series $dq_t^{(a)}$ is $AC(1) = -0.500$, which is much larger in magnitude than that of the level time series $q_t^{(a)}$. The autocorrelations of the second-order, third-order, and fourth-order difference time series $d^2 q_t^{(a)}$, $d^3 q_t^{(a)}$, and $d^4 q_t^{(a)}$ are $AC(1) = -0.667$, $AC(1) = -0.750$, and $AC(1) = -0.800$, respectively. The second-order, third-order, and fourth-order difference time series thus have a higher degree of autocorrelation.
The probability prediction models from the second-order difference are
$$d^2q_t^{(a)} = 0.500528 - 1.001057\, q_{t-1}^{(a)} - 0.999412\, dq_{t-1}^{(a)}$$
$$R^2 = 0.8332,\quad S.E. = 0.0023,\quad AIC = -9.2381,\quad SIC = -9.2276$$
$$q_t^{(a)} = q_{t-1}^{(a)} + dq_{t-1}^{(a)} + d^2q_t^{(a)}$$
The probability prediction models from the third-order difference are
$$d^3q_t^{(a)} = 0.502031 - 1.004064\, q_{t-1}^{(a)} - 0.993302\, dq_{t-1}^{(a)} - 1.002993\, d^2q_{t-1}^{(a)}$$
$$R^2 = 0.9499,\quad S.E. = 0.0023,\quad AIC = -9.2362,\quad SIC = -9.2222$$
$$q_t^{(a)} = q_{t-1}^{(a)} + dq_{t-1}^{(a)} + d^2q_{t-1}^{(a)} + d^3q_t^{(a)}$$
The probability prediction models from the fourth-order difference are
$$d^4q_t^{(a)} = 0.502080 - 1.004163\, q_{t-1}^{(a)} - 0.993244\, dq_{t-1}^{(a)} - 1.003043\, d^2q_{t-1}^{(a)} - 1.000014\, d^3q_{t-1}^{(a)}$$
$$R^2 = 0.9857,\quad S.E. = 0.0023,\quad AIC = -9.2343,\quad SIC = -9.2168$$
$$q_t^{(a)} = q_{t-1}^{(a)} + dq_{t-1}^{(a)} + d^2q_{t-1}^{(a)} + d^3q_{t-1}^{(a)} + d^4q_t^{(a)}$$
After obtaining the predicted value of $q_t^{(a)}$, the predicted value of the return index $r_t$ is estimated by
$$r_{a,t} = \mu_{a,t} - \ln\left(\frac{1}{q_t^{(a)}} - 1\right)$$
Applying this equation gives the predicted value of the return index $r_t$.
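The mapping between the residual and its probability can be sketched as follows, assuming the logistic definition $q = 1/(1 + e^{-a})$ that this article uses for the residual probabilities (the same form appears in Section 5.4). The function names are illustrative.

```python
import math

def residual_to_probability(a):
    """Logistic transform: q = 1 / (1 + exp(-a)), so q = 0.5 when a = 0."""
    return 1.0 / (1.0 + math.exp(-a))

def probability_to_residual(q):
    """Inverse transform: a = -ln(1/q - 1), defined for 0 < q < 1."""
    return -math.log(1.0 / q - 1.0)

def predicted_return(mu, q_pred):
    """r_{a,t} = mu_{a,t} - ln(1/q_t - 1): conditional mean plus the implied residual."""
    return mu + probability_to_residual(q_pred)

# Round trip on a hypothetical residual value.
a = 0.0123
a_back = probability_to_residual(residual_to_probability(a))
```

A predicted probability of exactly 0.5 returns the conditional mean unchanged, which is the case $a_t = 0$.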
Let $\mu_{a,t}$ be the conditional mean from the autoregressive model $r_{a,t} = \mu_{a,t} + a_t$, which is the prediction when $a_t = 0$ or $q_t^{(a)} = 0.5$. When $a_t \neq 0$, let $r_{a,t}^{(2)}$ denote the prediction of the return index $r_t$ obtained from the second-order difference $d^2q_t^{(a)}$, $r_{a,t}^{(3)}$ the prediction obtained from the third-order difference $d^3q_t^{(a)}$, and $r_{a,t}^{(4)}$ the prediction obtained from the fourth-order difference $d^4q_t^{(a)}$.
Figure 4 shows the return index $r_t$ and its prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ from the second-, third-, and fourth-order differences.
Figure 5 shows the prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ under the second-, third-, and fourth-order differences, together with the conditional mean $\mu_{a,t}$ of $r_t$.
The correlations between the return index $r_t$ and its conditional mean $\mu_{a,t}$ and its prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ are 0.136470, 0.136472, 0.136519, and 0.13652, respectively.
To improve the correlations between the return index $r_t$ and its prediction values, we increase the lag order of the finite-difference variables to $p = 12$.
$$\begin{aligned} d^2q_t^{(a)} ={}& 0.5073 - 1.0147\, q_{t-1}^{(a)} - 0.9845\, dq_{t-1}^{(a)} - 0.0053\, d^2q_{t-1}^{(a)} + 0.0023\, d^2q_{t-2}^{(a)} + 0.0091\, d^2q_{t-3}^{(a)} \\ &+ 0.0147\, d^2q_{t-4}^{(a)} + 0.0161\, d^2q_{t-5}^{(a)} + 0.0131\, d^2q_{t-6}^{(a)} + 0.0024\, d^2q_{t-7}^{(a)} + 0.0316\, d^2q_{t-8}^{(a)} \\ &+ 0.0664\, d^2q_{t-9}^{(a)} + 0.0914\, d^2q_{t-10}^{(a)} - 0.0720\, d^2q_{t-11}^{(a)} + 0.0454\, d^2q_{t-12}^{(a)} \end{aligned}$$
$$\begin{aligned} d^3q_t^{(a)} ={}& 0.5184 - 1.0368\, q_{t-1}^{(a)} - 0.6148\, dq_{t-1}^{(a)} - 3.7711\, d^2q_{t-1}^{(a)} + 2.4176\, d^3q_{t-1}^{(a)} + 2.0949\, d^3q_{t-2}^{(a)} \\ &+ 1.8025\, d^3q_{t-3}^{(a)} + 1.5403\, d^3q_{t-4}^{(a)} + 1.3043\, d^3q_{t-5}^{(a)} + 1.0923\, d^3q_{t-6}^{(a)} + 0.8974\, d^3q_{t-7}^{(a)} \\ &+ 0.6973\, d^3q_{t-8}^{(a)} + 0.4900\, d^3q_{t-9}^{(a)} + 0.2860\, d^3q_{t-10}^{(a)} + 0.1297\, d^3q_{t-11}^{(a)} + 0.0279\, d^3q_{t-12}^{(a)} \end{aligned}$$
$$\begin{aligned} d^4q_t^{(a)} ={}& 0.5376 - 1.0752\, q_{t-1}^{(a)} - 0.0179\, dq_{t-1}^{(a)} - 8.0777\, d^2q_{t-1}^{(a)} + 30.61\, d^3q_{t-1}^{(a)} + 25.44\, d^4q_{t-1}^{(a)} \\ &+ 20.13\, d^4q_{t-2}^{(a)} - 15.59\, d^4q_{t-3}^{(a)} - 11.77\, d^4q_{t-4}^{(a)} - 8.60\, d^4q_{t-5}^{(a)} - 6.01\, d^4q_{t-6}^{(a)} - 3.95\, d^4q_{t-7}^{(a)} \\ &- 2.38\, d^4q_{t-8}^{(a)} - 1.27\, d^4q_{t-9}^{(a)} - 0.57\, d^4q_{t-10}^{(a)} - 0.19\, d^4q_{t-11}^{(a)} - 0.04\, d^4q_{t-12}^{(a)} \end{aligned}$$
With the higher lag order and $a_t \neq 0$, let $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ again denote the predictions of the return index $r_t$ obtained from the second-, third-, and fourth-order differences $d^2q_t^{(a)}$, $d^3q_t^{(a)}$, and $d^4q_t^{(a)}$. The correlations between the return index $r_t$ and its conditional mean $\mu_{a,t}$ and its prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ are now 0.136470, 0.156046, 0.158559, and 0.163743, respectively. Increasing the lag order of the finite-difference variables therefore raises the correlations between the return index and its prediction values, from about 0.1365 to between 0.156 and 0.164.

4.5. Return Index Prediction Based on the Second-Order Difference and the AR-FD Model

From the above empirical analysis, we find that raising the order of the finite difference does not keep raising the correlation between the return index and its prediction value. We therefore focus on the second-order finite difference regression model and test whether a higher lag order of the probability variable $q_t^{(a)}$ leads to a higher correlation between the real return index $r_t$ and its prediction value.
The second-order finite difference $d^2q_t^{(a)}$ can be expressed as
$$d^2q_t^{(a)} = \omega + \alpha_0 q_{t-1}^{(a)} + \alpha_1 dq_{t-1}^{(a)} + \sum_{j=1}^{p} \beta_j\, d^2q_{t-j}^{(a)} + c_t$$
When the lag order is set to $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$, we obtain ten different prediction models of $d^2q_t^{(a)}$. Using the equations $q_t^{(a)} = q_{t-1}^{(a)} + dq_{t-1}^{(a)} + d^2q_t^{(a)}$ and $r_{a,t} = \mu_{a,t} - \ln(1/q_t^{(a)} - 1)$, we obtain the return index prediction values $r_{a,t}|_{p}$ for $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$.
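A minimal sketch of how the second-order regression above could be fitted by ordinary least squares and turned into a one-step prediction of $q_t$. The estimation software used in the study is not stated, so the use of `numpy.linalg.lstsq`, the function name, and the simulated input are all assumptions made for illustration.

```python
import numpy as np

def fit_second_order_fd(q, p):
    """OLS fit of d2q_t on [1, q_{t-1}, dq_{t-1}, d2q_{t-1}, ..., d2q_{t-p}]."""
    q = np.asarray(q, dtype=float)
    dq = np.diff(q, n=1)    # dq[i] is the first difference at time i + 1
    d2q = np.diff(q, n=2)   # d2q[i] is the second difference at time i + 2
    first = p + 2           # earliest t for which d2q_{t-p} exists
    rows, y = [], []
    for t in range(first, len(q)):
        lags = [d2q[t - j - 2] for j in range(1, p + 1)]   # d2q at times t-1 .. t-p
        rows.append([1.0, q[t - 1], dq[t - 2]] + lags)     # dq[t - 2] is dq at time t-1
        y.append(d2q[t - 2])                               # d2q at time t
    X, y = np.array(rows), np.array(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.var(y - fitted) / np.var(y)
    # One-step prediction: q_t = q_{t-1} + dq_{t-1} + predicted d2q_t
    q_hat = X[:, 1] + X[:, 2] + fitted
    return coef, r2, q_hat, q[first:]

# Hypothetical serially uncorrelated series, used only to exercise the code.
rng = np.random.default_rng(1)
q_sim = rng.uniform(0.3, 0.7, size=20000)
coef, r2, q_hat, q_actual = fit_second_order_fd(q_sim, p=3)
```

For a serially uncorrelated input with mean 0.5, the identity $d^2q_t = q_t - q_{t-1} - dq_{t-1}$ implies benchmark values of about $\omega = 0.5$, $\alpha_0 = -1$, $\alpha_1 = -1$, and $R^2 = 5/6$, which gives a baseline against which fitted coefficients can be read.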
Table 4 lists the first three parameters of the second-order difference regression models for the residual of the return index prediction model.
When the lag order $p = 3$, the regression model of the second-order difference $d^2q_t^{(a)}$ includes the intercept $\omega$, the coefficient $\alpha_0$ for the term $q_{t-1}^{(a)}$, the coefficient $\alpha_1$ for the term $dq_{t-1}^{(a)}$, and the coefficients $\beta_1, \beta_2, \beta_3$ for the lagged second-order differences $d^2q_{t-1}^{(a)}, d^2q_{t-2}^{(a)}, d^2q_{t-3}^{(a)}$.
When the lag order $p = 50$, the regression model includes the same intercept $\omega$ and coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{50}$ for the lagged second-order differences $d^2q_{t-1}^{(a)}, d^2q_{t-2}^{(a)}, \ldots, d^2q_{t-50}^{(a)}$.
Similarly, when the lag order $p = 700$, the regression model includes the intercept $\omega$, the coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{700}$ for the lagged second-order differences $d^2q_{t-1}^{(a)}, d^2q_{t-2}^{(a)}, \ldots, d^2q_{t-700}^{(a)}$.
Figure 6 depicts the curves of the return index $r_t$ and its prediction values $r_{a,t}|_{p=200}$ from the second-order difference regression model.
Figure 7 depicts the curves of the return index $r_t$ and its prediction values $r_{a,t}|_{p=700}$ from the second-order difference regression model.
From these regression models for the second-order difference variable $d^2q_t^{(a)}$, there are three results:
First, when the lag order of the probability variable $q_t^{(a)}$ increases, the coefficient of determination of the regression model increases. When the lag order is increased from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the R-squared value of the regression model increases from 0.833302 to 0.839955, 0.844830, 0.851597, 0.856684, 0.872071, 0.888580, 0.899775, 0.924362, and 0.957954.
Second, when the lag order increases, the correlation between the real return index $r_t$ and its prediction values increases. When the lag order is increased from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the correlation between $r_t$ and its prediction value $r_{a,t}|_{p}$ increases from 0.136766 to 0.238128, 0.294749, 0.341969, 0.389903, 0.486086, 0.578318, 0.651674, 0.745966, and 0.867847, respectively.
Third, comparing Figures 6 and 7, the prediction values $r_{a,t}|_{p=700}$ track the real return index $r_t$ more closely than the prediction values $r_{a,t}|_{p=200}$. This means that a higher lag order in the AR-FD prediction model produces a closer approximation between the real return index $r_t$ and its prediction value.

4.6. GARCH Model

For the residual variable $a_t$, the conditional volatility in the GARCH(1,1) model is estimated as
$$\sigma_{a,t}^2 = 3.65\mathrm{E}{-06} + 0.142963\, a_{t-1}^2 + 0.819842\, \sigma_{a,t-1}^2$$
$$LL = 3698.91,\quad AIC = -4.84,\quad SIC = -4.83,\quad HIC = -4.84$$
where the static (unconditional) standard deviation is $\sqrt{\bar{\sigma}^2} = \sqrt{\omega/(1-\alpha-\beta)} = 0.009907$, the coefficient of the ARCH term is $\alpha = 0.142963 > 0$, the coefficient of the GARCH term is $\beta = 0.819842 > 0$, the intercept is $\omega = 0.00000365 > 0$, and the three parameters satisfy $\alpha + \beta = 0.962805 < 1$ and $\omega + \alpha + \beta = 0.96280865 < 1$.
The mean and variance of the random variable $\varepsilon_{a,t} = a_t / \sigma_{a,t}$ are −0.008643 and 1.000669, respectively. When the new standardized random variable is defined by $e_{a,t} = (\varepsilon_{a,t} - \mu_{a,0}) / \sigma_{a,0}$, the mean and variance of $e_{a,t}$ are 3.86E-17 and 1.000328, respectively. The random variable $e_{a,t}$ is therefore closer to the standard normal distribution than the random variable $\varepsilon_{a,t}$.
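The GARCH(1,1) quantities in this subsection can be reproduced with a few lines. The sketch below plugs in the parameter estimates reported above; the recursion start-up value and the function names are assumptions, since the estimation details are not given in the text.

```python
import numpy as np

OMEGA, ALPHA, BETA = 3.65e-06, 0.142963, 0.819842  # estimates reported above

def long_run_variance(omega, alpha, beta):
    """Unconditional variance omega / (1 - alpha - beta); requires alpha + beta < 1."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    return omega / (1.0 - alpha - beta)

def garch11_variance(resid, omega, alpha, beta):
    """sigma2_t = omega + alpha * resid_{t-1}^2 + beta * sigma2_{t-1}."""
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.empty_like(resid)
    sigma2[0] = long_run_variance(omega, alpha, beta)  # start-up value: an assumption
    for t in range(1, len(resid)):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def standardize(x):
    """Subtract the sample mean and divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

long_run_sd = long_run_variance(OMEGA, ALPHA, BETA) ** 0.5
```

With the reported estimates, `long_run_sd` agrees with the value 0.009907 quoted above to within rounding of $\omega$, which indicates that this figure is on the standard-deviation scale.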

4.7. Return Index Prediction Based on the Second-Order Finite Difference AR-GARCH-FD Model

When μ a , 0 = m e a n ( ε a , t ) , σ a , 0 = Var ( ε a , t ) , the residual item can be defined as ε t = μ 0 , t + σ 0 , t e a , t , then the autoregressive prediction model of the return index r t is
$$r_{a,t} = \mu_{a,t} + \sigma_{a,t}\left(\mu_{a,0} + \sigma_{a,0}\, e_{a,t}\right)$$
Generally, when Var ( ε a , t ) 1 , then σ a , 0 = Var ( ε a , t ) . For simplicity, we will use Var ( ε a , t ) to replace σ a , 0 . When the variable q a , t ( e ) represents the probability of the quantile of e a , t , let q a , t ( e ) = 1 / ( 1 + e e a , t ) . Assuming that the probability q a , t ( e ) is the same as the probability of the random variable, the autoregressive prediction model of the return index r t can be defined by
$$r_{a,t}^{(e)} = \mu_{a,t} + \sigma_{a,t}\left[\mu_{a,0} - \sigma_{a,0}\ln\left(\frac{1}{q_{a,t}^{(e)}} - 1\right)\right]$$
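The chain from the AR residual to the standardized variable, its probability, and back to the return can be checked end to end. In the sketch below, all inputs are simulated and hypothetical, and $\sigma_{a,0}$ is taken to be the sample standard deviation of $\varepsilon_{a,t}$, which is an assumption made for the illustration.

```python
import numpy as np

def logistic(x):
    """q = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def inverse_logistic(q):
    """x = -ln(1/q - 1)."""
    return -np.log(1.0 / q - 1.0)

rng = np.random.default_rng(7)
mu = 1.0 + rng.normal(0.0, 0.001, size=200)   # hypothetical conditional means
sigma = np.full(200, 0.01)                    # hypothetical conditional volatilities
a = sigma * rng.normal(0.0, 1.0, size=200)    # hypothetical AR residuals

eps = a / sigma                               # epsilon_{a,t} = a_t / sigma_{a,t}
mu0, sigma0 = eps.mean(), eps.std()           # sigma0 as a standard deviation: assumption
e = (eps - mu0) / sigma0                      # standardized variable e_{a,t}
q = logistic(e)                               # probability q_{a,t}^{(e)}

# r = mu + sigma * (mu0 - sigma0 * ln(1/q - 1)) should recover mu + a
r_rebuilt = mu + sigma * (mu0 + sigma0 * inverse_logistic(q))
```

When the true probabilities are used, the formula returns $\mu_{a,t} + a_t$ exactly; in prediction, `q` is replaced by the value forecast from the finite-difference regression.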
We test whether a higher lag order in the regression model of the probability variable $q_{a,t}^{(e)}$ leads to a higher correlation between the real return index $r_t$ and its prediction value. For this purpose, we focus on the second-order finite difference regression model.
The second-order finite difference $d^2q_{a,t}^{(e)}$ can be expressed by a regression model as
$$d^2q_{a,t}^{(e)} = \omega + \alpha_0 q_{a,t-1}^{(e)} + \alpha_1 dq_{a,t-1}^{(e)} + \sum_{j=1}^{p} \beta_j\, d^2q_{a,t-j}^{(e)} + c_t$$
When the lag order is $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$, we obtain ten different prediction regression models for the second-order finite difference $d^2q_{a,t}^{(e)}$.
Using the second-order finite difference equation $q_{a,t}^{(e)} = q_{a,t-1}^{(e)} + dq_{a,t-1}^{(e)} + d^2q_{a,t}^{(e)}$ and the return index prediction model $r_{a,t}^{(e)} = \mu_{a,t} + \sigma_{a,t}\left(\mathrm{mean}(\varepsilon_{a,t}) + \mathrm{Var}(\varepsilon_{a,t})\left(-\ln(1/q_{a,t}^{(e)} - 1)\right)\right)$, we obtain the return index prediction values $r_{a,t}^{(e)}|_{p}$ for $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$.
Table 5 lists the first three parameters of the second-order finite difference regression models for different lags of the probability from the residual of the return index prediction model.
When the lag order $p = 3$, the regression model of the second-order difference $d^2q_{a,t}^{(e)}$ includes the intercept $\omega$, the coefficient $\alpha_0$ for the term $q_{a,t-1}^{(e)}$, the coefficient $\alpha_1$ for the term $dq_{a,t-1}^{(e)}$, and the coefficients $\beta_1, \beta_2, \beta_3$ for the lagged second-order differences $d^2q_{a,t-1}^{(e)}, d^2q_{a,t-2}^{(e)}, d^2q_{a,t-3}^{(e)}$.
When the lag order $p = 50$, the regression model includes the same intercept $\omega$ and coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{50}$ for the lagged second-order differences $d^2q_{a,t-1}^{(e)}, d^2q_{a,t-2}^{(e)}, \ldots, d^2q_{a,t-50}^{(e)}$.
Similarly, when the lag order $p = 700$, the regression model includes the intercept $\omega$, the coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{700}$ for the lagged second-order differences $d^2q_{a,t-1}^{(e)}, d^2q_{a,t-2}^{(e)}, \ldots, d^2q_{a,t-700}^{(e)}$.
Figure 8 depicts the curves of the return index $r_t$ and its prediction values $r_{a,t}^{(e)}|_{p=200}$ from the second-order difference regression model.
Figure 9 depicts the curves of the return index $r_t$ and its prediction values $r_{a,t}^{(e)}|_{p=700}$ from the second-order difference regression model.
From these regression models for the second-order difference variable $d^2q_{a,t}^{(e)}$, there are three results:
First, when the lag order increases, the coefficient of determination of the regression model increases. When the lag order is increased from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the R-squared value of the regression model increases from 0.833870 to 0.837518, 0.843600, 0.849075, 0.853597, 0.869262, 0.887668, 0.905421, 0.930127, and 0.962585.
Second, when the lag order increases, the correlation between the real return index $r_t$ and its prediction values increases. When the lag order is increased from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the correlation between $r_t$ and its prediction value $r_{a,t}^{(e)}|_{p}$ increases from 0.119018 to 0.209055, 0.268237, 0.315291, 0.367438, 0.472224, 0.552860, 0.640771, 0.701112, and 0.847974, respectively.
Third, comparing Figures 8 and 9, the prediction values $r_{a,t}^{(e)}|_{p=700}$ track the real return index $r_t$ more closely than the prediction values $r_{a,t}^{(e)}|_{p=200}$. This means that a higher lag order in the AR-GARCH-FD prediction model produces a closer approximation between the real return index $r_t$ and its prediction value.

5. Empirical Analysis Based on the Cumulative Return Index

5.1. The Cumulative Return Index

Figure 10 shows the moving curves of the average compound return index $r_t^{(ave)}$ over the period $[0, t]$ and the average compound return index $r_T^{(ave)}$ at $t = T$, between 4 January 2010 and 8 July 2016.
Figure 11 shows the cumulative return index $r_t^{(c)}$ and the cumulative average compound return index $r_t^{(c,ave)}$ between 4 January 2010 and 8 July 2016.
Over the sample, the arithmetic average of the return index is $\bar{r} = 1.000398$, and the average compound return index is $r_T^{(ave)} = (r_T^{(c)})^{1/T} = 1.000352$. The arithmetic average is not equal to the average compound return index. The average compound return index reveals the characteristics of the risk assets' return indices.
The cumulative return index $r_t^{(c)}$ represents the long-term moving trend of the return index $r_t$, and the cumulative average compound return index $r_t^{(c,ave)}$ represents the long-term moving trend of the average compound return index $r_T^{(ave)}$.
Comparing the short-term return index $r_t$ with the long-term cumulative return index $r_t^{(c)}$, the cumulative return index has a clearer moving trend. For this reason, we focus on the analysis of the long-term cumulative return index.
Once the prediction value of the cumulative return index $r_t^{(c)}$ is known, the prediction value of the stock price follows from $P_t = P_0\, r_t^{(c)}$. Because the stock price on the first day (4 January 2010) is $P_0 = P_1 = 10583.96$, a prediction of the cumulative return index $r_t^{(c)}$ gives the predicted price at any time $t \geq 1$ as $P_t = P_0\, r_t^{(c)} = 10583.96\, r_t^{(c)}$.
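The relations among the return index, the cumulative return index, the average compound return, and the price can be sketched as follows. The gross returns in the example are hypothetical; only the starting price is taken from the text.

```python
import numpy as np

P0 = 10583.96  # price on 4 January 2010, as stated in the text

def cumulative_return_index(r):
    """r_t^(c): running product of the gross return index r_t."""
    return np.cumprod(np.asarray(r, dtype=float))

def average_compound_return(r):
    """r_T^(ave) = (r_T^(c))^(1/T): geometric mean of the gross returns."""
    r = np.asarray(r, dtype=float)
    return float(np.prod(r) ** (1.0 / len(r)))

r = np.array([1.01, 0.99, 1.02, 0.98])   # hypothetical gross returns
r_c = cumulative_return_index(r)
prices = P0 * r_c                        # P_t = P_0 * r_t^(c)
r_ave = average_compound_return(r)
r_bar = float(r.mean())                  # arithmetic average, for comparison
```

For any series of positive gross returns that are not all equal, the geometric mean lies below the arithmetic mean, which is the pattern seen in the two sample averages reported above.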
Figure 12 shows the stock price $P_t$ and its prediction value from the formula $P_t = 10583.96\, r_t^{(c)}$. Because the cumulative return index $r_t^{(c)}$ is computed from the real values of the return index $r_t$, the two curves are almost the same.
Comparing the curves of the cumulative return index $r_t^{(c)}$ and the real return index $r_t$ suggests that forecasting the cumulative return index $r_t^{(c)}$ may be much easier than forecasting the real return index $r_t$.

5.2. The Cumulative Return Gap Index

Figure 13 shows the moving curves of the cumulative return gap (CRG) index $r_t^{(c,gap)}$ and its lag-1 term $r_{t-1}^{(c,gap)}$ between 4 January 2010 and 8 July 2016.
The cumulative return gap index $r_t^{(c,gap)}$ represents a long-term cumulative excess return, with an arithmetic mean of 0.0375. The curves of $r_t^{(c,gap)}$ and its lag-1 term $r_{t-1}^{(c,gap)}$ are difficult to tell apart because the time series of the cumulative return gap index has a very high autocorrelation.
The cumulative return gap ( r t ( c , g a p ) = r t ( c ) r T ( c ) ) reveals a cumulative risk premium during the time period of t ( 0 , t ) . When the general risk premium ( r t ( g a p ) = r t r f ) reveals a difference between the return of a risk asset and the return of a risk-free asset, the cumulative return gap reveals the difference between the cumulative compound return and the cumulative average compound return of a risk asset.
Table 6 lists the autocorrelations and the probabilities of the Ljung and Box (1978) statistics for the time series variable $r_t^{(c,gap)}$. The autocorrelation between $r_t^{(c,gap)}$ and $r_{t-1}^{(c,gap)}$ is 0.988, which is much higher than the autocorrelation of −0.052 between $r_t$ and $r_{t-1}$.
Because the correlation between the cumulative return gap index and its lag-1 term is as high as $\rho_1 = \mathrm{Corr}(r_t^{(c,gap)}, r_{t-1}^{(c,gap)}) = 0.988$, the term $r_{t-1}^{(c,gap)}$ can be used in the prediction model in place of $r_t^{(c,gap)}$. The cumulative return index can be written as $r_t^{(c)} = r_t^{(c,ave)} + r_t^{(c,gap)}$; if $r_t^{(c,gap)} \approx r_{t-1}^{(c,gap)}$, then $r_t^{(c)} \approx r_t^{(c,ave)} + r_{t-1}^{(c,gap)}$. For this reason, the expression for $r_t^{(c)}$ includes the two terms $r_t^{(c,ave)}$ and $r_{t-1}^{(c,gap)}$.
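The decomposition $r_t^{(c)} = r_t^{(c,ave)} + r_t^{(c,gap)}$ and the persistence of the gap can be illustrated on simulated data. The sketch assumes that the cumulative average compound return index is $r_t^{(c,ave)} = (r_T^{(ave)})^t$; that construction, the function names, and the simulated returns are assumptions made for the illustration.

```python
import numpy as np

def cumulative_return_gap(r):
    """Split the cumulative return index into an average-growth path and a gap."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    r_c = np.cumprod(r)
    r_ave = r_c[-1] ** (1.0 / T)
    r_c_ave = r_ave ** np.arange(1, T + 1)   # assumed form of r_t^(c,ave)
    return r_c - r_c_ave, r_c, r_c_ave

def lag1_autocorr(x):
    """Sample correlation between x_t and x_{t-1}."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[1:], x[:-1])[0, 1])

# Hypothetical gross returns with a small drift and 1% volatility.
rng = np.random.default_rng(42)
r = 1.0 + rng.normal(0.0004, 0.01, size=1500)
gap, r_c, r_c_ave = cumulative_return_gap(r)
rho_gap, rho_r = lag1_autocorr(gap), lag1_autocorr(r)
```

Even for returns that are serially uncorrelated by construction, the gap series is highly persistent because it accumulates past returns, which is why its lag-1 term is a close stand-in for its current value.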

5.3. ARDL-CRG Prediction Model for the Cumulative Return Index

According to the definition, the cumulative return index $r_t^{(c)}$ is related to four components: the cumulative average compound return index $r_t^{(c,ave)}$, the time function $f(t)$, the cumulative return gap index $r_{t-1}^{(c,gap)}$, and the residual variable $b_t$. Because the cumulative return gap term $r_{t-1}^{(c,gap)}$ is introduced into the ARDL model, the new model is called the ARDL-CRG model, with the following equations
$$r_{b,t}^{(c)} = \mu_{b,t}^{(c)} + b_t$$
$$\mu_{b,t}^{(c)} = -0.002475 + 0.998355\, r_t^{(c,ave)} + 0.985347\, r_{t-1}^{(c,gap)} + 0.000818 \ln t$$
$$R^2 = 0.997444,\quad S.E. = 0.012508,\quad AIC = -5.922330$$
The ARDL-CRG model shows that the dependent variable $r_t^{(c)}$ is represented very well by the independent variables $r_t^{(c,ave)}$, $r_{t-1}^{(c,gap)}$, and $\ln t$. The coefficient of determination is as high as $R^2 = 0.997444$.
The coefficient of $r_t^{(c,ave)}$ is 0.998355 and the coefficient of $r_{t-1}^{(c,gap)}$ is 0.985347; both are very close to one. The coefficient of $\ln t$ is 0.000818, which is positive, so the fitted long-term trend of the stock market rises as time moves forward.
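A minimal sketch of how a regression of this form could be estimated by ordinary least squares. The data are simulated, the construction $r_t^{(c,ave)} = (r_T^{(ave)})^t$ is an assumption, and the function name is illustrative; the sketch is not expected to reproduce the coefficients reported above.

```python
import numpy as np

def fit_ardl_crg(r):
    """Regress r_t^(c) on [1, r_t^(c,ave), r_{t-1}^(c,gap), ln t] by OLS."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    t = np.arange(1, T + 1)
    r_c = np.cumprod(r)
    r_c_ave = (r_c[-1] ** (1.0 / T)) ** t       # assumed form of r_t^(c,ave)
    gap = r_c - r_c_ave
    y = r_c[1:]                                 # first observation lost to the lag
    X = np.column_stack([np.ones(T - 1), r_c_ave[1:], gap[:-1], np.log(t[1:])])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef                           # mu_{b,t}^(c)
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2, fitted

# Hypothetical gross returns, used only to exercise the code.
rng = np.random.default_rng(3)
r = 1.0 + rng.normal(0.0004, 0.01, size=1500)
coef, r2, fitted = fit_ardl_crg(r)
```

The residual `y - fitted` plays the role of $b_t$ in the sections that follow.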
When the residual value $b_t$ is ignored, the predicted value $\mu_{b,t}^{(c)}$ follows directly from the ARDL-CRG model. From the ARDL-CRG model, we can predict the return index with the equations
$$r_{b,t} = \mu_{b,t} + b_t', \quad \text{where} \quad \mu_{b,t} = \frac{\mu_{b,t}^{(c)}}{r_{t-1}^{(c)}}, \quad b_t' = \frac{b_t}{r_{t-1}^{(c)}}$$
Figure 14 shows the return index $r_t$ and its prediction value $\mu_{b,t}$. The prediction value $\mu_{b,t}$ is the conditional mean of $r_t$, that is, $r_{b,t} = \mu_{b,t}$ when $b_t' = 0$. The correlation between $r_t$ and $\mu_{b,t}$ is 0.0984. Although this correlation is low, $\mu_{b,t}$ still captures the relationship between the return index $r_t$ and its conditional mean.
Figure 15 shows the residual $b_t$ from the prediction model $r_{b,t}^{(c)} = \mu_{b,t}^{(c)} + b_t$ and the residual $b_t'$ from the prediction model $r_{b,t} = \mu_{b,t} + b_t'$. The correlation between $b_t$ and $b_t'$ is 0.9803, which is quite high. This means that the prediction model of the cumulative return index is consistent with the prediction model of the real return index, although most of the historical information remains in the residual terms.

5.4. Indirect Prediction of the Return Index Based on the Finite Difference Method ARDL-CRG-FD Model

When the finite difference method is introduced into the ARDL-CRG model, the model will become the ARDL-CRG-FD model.
Because the residual term $b_t$ has a strong impact on the prediction value of the cumulative return index $r_t^{(c)}$, it is important to predict the trend of $b_t$. When defining
$$q_t^{(b)} = \frac{1}{1 + e^{-b_t}}, \quad \text{or} \quad b_t = -\ln\left(\frac{1}{q_t^{(b)}} - 1\right)$$
then the variable $q_t^{(b)}$ can be seen as a probability of $b_t$. Define the first-order difference as $dq_t^{(b)} = q_t^{(b)} - q_{t-1}^{(b)}$, the second-order difference as $d^2q_t^{(b)} = dq_t^{(b)} - dq_{t-1}^{(b)}$, the third-order difference as $d^3q_t^{(b)} = d^2q_t^{(b)} - d^2q_{t-1}^{(b)}$, and the $n$th-order difference as $d^nq_t^{(b)} = d^{n-1}q_t^{(b)} - d^{n-1}q_{t-1}^{(b)}$. Even if the level variable $q_t^{(b)}$ is not an autocorrelated time series, the $n$th-order difference $d^nq_t^{(b)}$ may be one. In that case, the $n$th-order difference $d^nq_t^{(b)}$ can be expressed as
$$d^nq_t^{(b)} = \omega + \alpha_0 q_{t-1}^{(b)} + \alpha_1 dq_{t-1}^{(b)} + \cdots + \alpha_{n-1} d^{n-1}q_{t-1}^{(b)} + \beta_1 d^nq_{t-1}^{(b)} + \cdots + \beta_p d^nq_{t-p}^{(b)} + c_t$$
or, in summation form,
$$d^nq_t^{(b)} = \omega + \sum_{i=0}^{n-1} \alpha_i\, d^iq_{t-1}^{(b)} + \sum_{j=1}^{p} \beta_j\, d^nq_{t-j}^{(b)} + c_t$$
Then, according to the definition of the difference method, the probability $q_t^{(b)}$ can be predicted by
$$q_t^{(b)} = q_{t-1}^{(b)} + dq_{t-1}^{(b)} + d^2q_{t-1}^{(b)} + \cdots + d^{n-1}q_{t-1}^{(b)} + d^nq_t^{(b)}$$
The variable $c_t$ is the residual term of the regression model. It is important to determine a proper order number; here, we consider the second-, third-, and fourth-order differences. For simplicity, we do not model the residual $c_t$ further and assume $c_t = 0$.
After obtaining the prediction value of the probability $q_t^{(b)}$, the prediction value of the cumulative return index $r_t^{(c)}$ follows from
$$r_{b,t}^{(c)} = \mu_{b,t}^{(c)} - \ln\left(\frac{1}{q_t^{(b)}} - 1\right)$$
Applying the equation $r_{b,t} = r_{b,t}^{(c)} / r_{t-1}^{(c)}$ then gives the prediction value of the return index $r_t$.
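The two steps above, from a predicted probability to a predicted cumulative return index and then to a predicted return index, can be sketched as one function. The function name and the array alignment (index $i$ in every input refers to the same date) are assumptions made for the illustration.

```python
import numpy as np

def ardl_crg_fd_returns(mu_c, q_pred, r_c):
    """Turn predicted probabilities into cumulative-index and return-index predictions.

    mu_c   : fitted values mu_{b,t}^(c) of the cumulative return index
    q_pred : predicted probabilities q_t^(b), strictly between 0 and 1
    r_c    : realized cumulative return index r_t^(c)
    """
    mu_c, q_pred, r_c = (np.asarray(x, dtype=float) for x in (mu_c, q_pred, r_c))
    b_pred = -np.log(1.0 / q_pred - 1.0)   # implied residual b_t
    r_c_pred = mu_c + b_pred               # r_{b,t}^(c) = mu_{b,t}^(c) + b_t
    r_pred = r_c_pred[1:] / r_c[:-1]       # r_{b,t} = r_{b,t}^(c) / r_{t-1}^(c)
    return r_c_pred, r_pred
```

The first date has no previous cumulative value, so the return prediction is one element shorter than the inputs.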
The probability prediction models from the second-order difference are
$$d^2q_t^{(b)} = 0.563364 - 1.126737\, q_{t-1}^{(b)} - 0.675543\, dq_{t-1}^{(b)} - 0.225449\, d^2q_{t-1}^{(b)} - 0.103285\, d^2q_{t-2}^{(b)} - 0.051688\, d^2q_{t-3}^{(b)}$$
$$R^2 = 0.8426,\quad S.E. = 0.0031,\quad AIC = -8.6992,\quad SIC = -8.6782$$
$$q_t^{(b)} = q_{t-1}^{(b)} + dq_{t-1}^{(b)} + d^2q_t^{(b)}$$
The probability prediction models from the third-order difference are
$$d^3q_t^{(b)} = 0.563364 - 1.126737\, q_{t-1}^{(b)} - 0.675543\, dq_{t-1}^{(b)} - 1.380421\, d^2q_{t-1}^{(b)} + 0.154973\, d^3q_{t-1}^{(b)} + 0.051688\, d^3q_{t-2}^{(b)}$$
$$R^2 = 0.9536,\quad S.E. = 0.0031,\quad AIC = -8.6992,\quad SIC = -8.6782$$
$$q_t^{(b)} = q_{t-1}^{(b)} + dq_{t-1}^{(b)} + d^2q_{t-1}^{(b)} + d^3q_t^{(b)}$$
The probability prediction models from the fourth-order difference are
$$d^4q_t^{(b)} = 0.563364 - 1.126737\, q_{t-1}^{(b)} - 0.675543\, dq_{t-1}^{(b)} - 1.380421\, d^2q_{t-1}^{(b)} - 0.793340\, d^3q_{t-1}^{(b)} - 0.051688\, d^4q_{t-1}^{(b)}$$
$$R^2 = 0.9869,\quad S.E. = 0.0031,\quad AIC = -8.6992,\quad SIC = -8.6782$$
$$q_t^{(b)} = q_{t-1}^{(b)} + dq_{t-1}^{(b)} + d^2q_{t-1}^{(b)} + d^3q_{t-1}^{(b)} + d^4q_t^{(b)}$$
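The three fitted equations above share the same intercept, the same coefficients on $q_{t-1}^{(b)}$ and $dq_{t-1}^{(b)}$, and the same S.E., AIC, and SIC. A short check using only the difference definitions suggests why: with the second-order coefficients taken as $-0.225449$, $-0.103285$, and $-0.051688$, the third- and fourth-order coefficients follow by substitution, up to rounding in the last digit.

```latex
\begin{aligned}
d^3q_t^{(b)} &= d^2q_t^{(b)} - d^2q_{t-1}^{(b)}, \qquad
d^2q_{t-2}^{(b)} = d^2q_{t-1}^{(b)} - d^3q_{t-1}^{(b)}, \qquad
d^2q_{t-3}^{(b)} = d^2q_{t-1}^{(b)} - d^3q_{t-1}^{(b)} - d^3q_{t-2}^{(b)} \\[4pt]
\text{third order, coefficient on } d^2q_{t-1}^{(b)} &: \; -(1 + 0.225449 + 0.103285 + 0.051688) = -1.380422 \\
\text{third order, coefficient on } d^3q_{t-1}^{(b)} &: \; 0.103285 + 0.051688 = 0.154973 \\
\text{third order, coefficient on } d^3q_{t-2}^{(b)} &: \; 0.051688 \\[4pt]
d^4q_t^{(b)} &= d^3q_t^{(b)} - d^3q_{t-1}^{(b)}, \qquad
d^3q_{t-2}^{(b)} = d^3q_{t-1}^{(b)} - d^4q_{t-1}^{(b)} \\[4pt]
\text{fourth order, coefficient on } d^3q_{t-1}^{(b)} &: \; 0.154973 - 1 + 0.051688 = -0.793339 \\
\text{fourth order, coefficient on } d^4q_{t-1}^{(b)} &: \; -0.051688
\end{aligned}
```

Read this way, the three models describe the same fitted relationship for $q_t^{(b)}$ in different coordinates; the $R^2$ values differ because the dependent variable, and hence its variance, changes with the difference order.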
Let $r_{b,t}^{(2)}$ denote the prediction value of the return index $r_t$ obtained from the second-order difference prediction of the probability $q_t^{(b)}$, $r_{b,t}^{(3)}$ the prediction value obtained from the third-order difference prediction, and $r_{b,t}^{(4)}$ the prediction value obtained from the fourth-order difference prediction.
Figure 16 shows the curves of the return index $r_t$ and its prediction values $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ from the second-, third-, and fourth-order difference probability predictions during 2010–2016.
Figure 17 shows the curves of the conditional mean $\mu_{b,t}$ of the return index $r_t$ and the prediction values $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ from the second-, third-, and fourth-order difference probability predictions during 2010–2016.
The correlations between $r_t$ and $\mu_{b,t}$, $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ are 0.1018, 0.1614, 0.1435, and 0.1614, respectively. Modeling the residual term $b_t$ raises the correlations between the return index $r_t$ and the prediction values $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ well above the correlation between $r_t$ and the conditional mean $\mu_{b,t}$. This means that applying the second-, third-, and fourth-order finite differences to the residual term $b_t$ can improve the correlations between the real return index $r_t$ and its prediction values.

5.5. Return Index Prediction Based on the Second-Order Difference ARDL-CRG-FD Model

Because applying the second-, third-, and fourth-order finite difference methods to the residual term $b_t$ can improve the correlations between the real return index $r_t$ and its prediction values, we test whether a higher lag order in the regression model of the probability $q_t^{(b)}$ leads to a higher correlation between the real return index $r_t$ and its prediction value. For this purpose, we focus on the second-order finite difference regression model.
The second-order difference $d^2q_t^{(b)}$ can be expressed by a regression model as
$$d^2q_t^{(b)} = \omega + \alpha_0 q_{t-1}^{(b)} + \alpha_1 dq_{t-1}^{(b)} + \sum_{j=1}^{p} \beta_j\, d^2q_{t-j}^{(b)} + c_t$$
When the lag order is $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$, we obtain ten different prediction regression models for the second-order difference $d^2q_t^{(b)}$.
Applying the equations $q_t^{(b)} = q_{t-1}^{(b)} + dq_{t-1}^{(b)} + d^2q_t^{(b)}$, $r_{b,t}^{(c)} = \mu_{b,t}^{(c)} - \ln(1/q_t^{(b)} - 1)$, and $r_{b,t} = r_{b,t}^{(c)} / r_{t-1}^{(c)}$, we obtain the return index prediction values $r_{b,t}|_{p}$ for $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$.
Table 7 lists the first three parameters of the second-order difference regression models for the residual of the cumulative return index prediction model.
When the lag order $p = 3$, the regression model of the second-order difference $d^2q_t^{(b)}$ includes the intercept $\omega$, the coefficient $\alpha_0$ for the term $q_{t-1}^{(b)}$, the coefficient $\alpha_1$ for the term $dq_{t-1}^{(b)}$, and the coefficients $\beta_1, \beta_2, \beta_3$ for the lagged second-order differences $d^2q_{t-1}^{(b)}, d^2q_{t-2}^{(b)}, d^2q_{t-3}^{(b)}$.
When the lag order $p = 50$, the regression model includes the same intercept $\omega$ and coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{50}$ for the lagged second-order differences $d^2q_{t-1}^{(b)}, d^2q_{t-2}^{(b)}, \ldots, d^2q_{t-50}^{(b)}$.
Similarly, when the lag order $p = 700$, the regression model includes the intercept $\omega$, the coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{700}$ for the lagged second-order differences $d^2q_{t-1}^{(b)}, d^2q_{t-2}^{(b)}, \ldots, d^2q_{t-700}^{(b)}$.
Figure 18 depicts the curves of the return index $r_t$ and its prediction values $r_{b,t}|_{p=200}$ from the second-order difference regression model.
Figure 19 depicts the curves of the return index $r_t$ and its prediction values $r_{b,t}|_{p=700}$ from the second-order difference regression model.
From these regression models for the second-order difference variable $d^2q_t^{(b)}$, there are three results:
First, when the lag order increases, the coefficient of determination of the regression model increases. When the lag order increases from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the R-squared value of the regression model increases from 0.842668 to 0.847195, 0.852919, 0.860050, 0.863607, 0.877231, 0.891028, 0.906935, 0.930281, and 0.961514, respectively.
Second, when the lag order increases, the correlation between the real return index $r_t$ and its prediction values increases. When the lag order increases from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the correlation between $r_t$ and its prediction value $r_{b,t}|_{p}$ increases from 0.161486 to 0.242633, 0.296988, 0.344799, 0.382242, 0.478909, 0.584397, 0.656670, 0.752572, and 0.873537, respectively.
Third, comparing Figures 18 and 19, the prediction values $r_{b,t}|_{p=700}$ track the real return index $r_t$ more closely than the prediction values $r_{b,t}|_{p=200}$. This means that a prediction model with a higher lag order produces a closer approximation between the real return index $r_t$ and its prediction value.

5.6. ARDL-CRG-GARCH-FD Model and Return Index Prediction Based on the Finite Difference Method

For the residual variable $b_t$, the conditional volatility in the GARCH(1,1) model is estimated as
$$\sigma_{b,t}^2 = 6.59\mathrm{E}{-06} + 0.134894\, b_{t-1}^2 + 0.823964\, \sigma_{b,t-1}^2$$
$$LL = 3276.90,\quad AIC = -4.27,\quad SIC = -4.26,\quad HIC = -4.27$$
where the static (unconditional) standard deviation is $\sqrt{\bar{\sigma}^2} = \sqrt{\omega/(1-\alpha-\beta)} = 0.01265$, the coefficient of the ARCH term is $\alpha = 0.134894 > 0$, the coefficient of the GARCH term is $\beta = 0.823964 > 0$, the intercept is $\omega = 0.00000659 > 0$, and the three parameters satisfy $\alpha + \beta = 0.958858 < 1$ and $\omega + \alpha + \beta = 0.95886459 < 1$.
Because a regression residual is unavoidable, the mean and variance of the random variable $\varepsilon_{b,t} = b_t / \sigma_{b,t}$ are 0.000692 and 1.005242, respectively, rather than exactly 0 and 1. When the new standardized random variable is defined by $e_{b,t} = (\varepsilon_{b,t} - \mu_{b,0}) / \sigma_{b,0}$, the mean and variance of $e_{b,t}$ are 1.74E-18 and 1.000327, respectively. The random variable $e_{b,t}$ is therefore closer to the standard normal distribution than the random variable $\varepsilon_{b,t}$.
Then, the prediction value of the cumulative return index $r_t^{(c)}$ is
$$r_{b,t}^{(c,e)} = \mu_{b,t}^{(c)} + \sigma_{b,t}\left(\mathrm{mean}(\varepsilon_{b,t}) + \mathrm{Var}(\varepsilon_{b,t})\, e_{b,t}\right)$$
When the variable $q_{b,t}^{(e)}$ represents the probability of the quantile of $e_{b,t}$, let $q_{b,t}^{(e)} = \dfrac{1}{1 + e^{-e_{b,t}}}$. Assuming that the probability $q_{b,t}^{(e)}$ is the same as the probability of a random variable with the standard normal distribution, the prediction model of the cumulative return index $r_t^{(c)}$ can, for simplicity, be defined as
$$r_{b,t}^{(c,e)} = \mu_{b,t}^{(c)} + \sigma_{b,t}\left(\mathrm{mean}(\varepsilon_{b,t}) + \mathrm{Var}(\varepsilon_{b,t})\left(-\ln\left(\frac{1}{q_{b,t}^{(e)}} - 1\right)\right)\right)$$
Then, the prediction model of the return index r t can be defined as
r b , t ( e ) = r b , t ( c , e ) r b , t ( c )
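The mapping between the standardized residual and the probability is a logistic transform and its inverse. The sketch below assumes the sign convention $q = 1/(1 + \exp(-e))$, under which $e = -\ln(1/q - 1)$; the function names and the numbers in the usage lines are hypothetical and serve only to show how a predicted probability is turned back into a cumulative return forecast.

```python
import math


def to_probability(e):
    """Logistic map q = 1 / (1 + exp(-e)); sends any real e into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-e))


def from_probability(q):
    """Inverse map e = -ln(1/q - 1), defined only for 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return -math.log(1.0 / q - 1.0)


def cumulative_return_forecast(mu_c, sigma_t, eps_mean, eps_std, q):
    """mu_c + sigma_t * (eps_mean + eps_std * e), with e recovered from q.

    mu_c     : conditional mean of the cumulative return index
    sigma_t  : conditional volatility from the GARCH step
    eps_mean : sample mean of the GARCH-standardized residual
    eps_std  : sample standard deviation of the same residual
    """
    return mu_c + sigma_t * (eps_mean + eps_std * from_probability(q))


# Illustrative (made-up) inputs: a probability of 0.5 corresponds to e = 0,
# so the forecast collapses to the conditional mean when eps_mean is 0.
example = cumulative_return_forecast(1.2, 0.01, 0.0, 1.0, 0.5)
```

Because the logistic map is strictly increasing, forecasting $q_{b,t}^{(e)}$ and inverting it is equivalent to forecasting $e_{b,t}$ on a bounded scale, which is what the finite difference models below operate on.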
The probability prediction models from the second-order difference are
$$d^2 q_{b,t}^{(e)} = 0.534293 - 1.060669\, q_{b,t-1}^{(e)} - 0.820323\, dq_{b,t-1}^{(e)} - 0.152446\, d^2 q_{b,t-1}^{(e)} - 0.102566\, d^2 q_{b,t-2}^{(e)} - 0.052680\, d^2 q_{b,t-3}^{(e)}$$
$$R^2 = 0.9532,\quad S.E. = 0.2014,\quad AIC = -0.3625,\quad SIC = -0.3415$$
$$q_{b,t}^{(e)} = q_{b,t-1}^{(e)} + dq_{b,t-1}^{(e)} + d^2 q_{b,t}^{(e)}$$
The probability prediction models from the third-order difference are
$$d^3 q_{b,t}^{(e)} = 0.534293 - 1.060669\, q_{b,t-1}^{(e)} - 0.820323\, dq_{b,t-1}^{(e)} - 1.307692\, d^2 q_{b,t-1}^{(e)} + 0.155246\, d^3 q_{b,t-1}^{(e)} + 0.052680\, d^3 q_{b,t-2}^{(e)}$$
$$R^2 = 0.9532,\quad S.E. = 0.2014,\quad AIC = -0.3625,\quad SIC = -0.3415$$
$$q_{b,t}^{(e)} = q_{b,t-1}^{(e)} + dq_{b,t-1}^{(e)} + d^2 q_{b,t-1}^{(e)} + d^3 q_{b,t}^{(e)}$$
The probability prediction models from the fourth-order difference are
$$d^4 q_{b,t}^{(e)} = 0.534293 - 1.060669\, q_{b,t-1}^{(e)} - 0.820323\, dq_{b,t-1}^{(e)} - 1.307692\, d^2 q_{b,t-1}^{(e)} - 0.792074\, d^3 q_{b,t-1}^{(e)} - 0.052680\, d^4 q_{b,t-1}^{(e)}$$
$$R^2 = 0.9867,\quad S.E. = 0.2014,\quad AIC = -0.3625,\quad SIC = -0.3415$$
$$q_{b,t}^{(e)} = q_{b,t-1}^{(e)} + dq_{b,t-1}^{(e)} + d^2 q_{b,t-1}^{(e)} + d^3 q_{b,t-1}^{(e)} + d^4 q_{b,t}^{(e)}$$
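The reconstruction identities used here follow from the backward-difference definitions $dq_t = q_t - q_{t-1}$ and $d^2 q_t = dq_t - dq_{t-1}$ (this convention is an assumption of the sketch, chosen because it makes $q_t = q_{t-1} + dq_{t-1} + d^2 q_t$ hold exactly). The sketch below, with hypothetical function names, checks that identity on a toy series and shows how lag coefficients on second differences translate into coefficients on third differences through $d^2 q_{t-2} = d^2 q_{t-1} - d^3 q_{t-1}$ and $d^2 q_{t-3} = d^2 q_{t-2} - d^3 q_{t-2}$.

```python
def differences(series, order):
    """Backward differences: apply x_t - x_{t-1} `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def rebuild_from_second_difference(q_prev, dq_prev, d2q_now):
    """q_t = q_{t-1} + dq_{t-1} + d2q_t."""
    return q_prev + dq_prev + d2q_now


def second_to_third_order(betas):
    """Map signed coefficients on (d2q_{t-1}, d2q_{t-2}, d2q_{t-3}) in the
    d2q_t equation to coefficients on (d2q_{t-1}, d3q_{t-1}, d3q_{t-2})
    in the equivalent d3q_t equation."""
    b1, b2, b3 = betas
    return (b1 + b2 + b3 - 1.0, -(b2 + b3), -b3)


# Toy probabilities (made up) to confirm the reconstruction identity.
q = [0.50, 0.60, 0.55, 0.70, 0.65]
dq = differences(q, 1)    # dq[i]  is dq_t  for t = i + 1
d2q = differences(q, 2)   # d2q[i] is d2q_t for t = i + 2
rebuilt = [rebuild_from_second_difference(q[t - 1], dq[t - 2], d2q[t - 2])
           for t in range(2, len(q))]

# Second-order lag coefficients, taken with negative signs.
third_order = second_to_third_order((-0.152446, -0.102566, -0.052680))
```

Applied to the second-order lag coefficients taken with negative signs, the conversion reproduces the magnitudes 1.307692, 0.155246, and 0.052680 that appear in the third-order model, which suggests the higher-order models are algebraic rearrangements of the same fitted relation.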
Let $r_{b,t}^{(e)}|_{2\mathrm{nd}}$ denote the prediction value of the return index $r_t$ obtained from the second-order difference prediction of the probability $q_{b,t}^{(e)}$; let $r_{b,t}^{(e)}|_{3\mathrm{rd}}$ denote the prediction value obtained from the third-order difference prediction of $q_{b,t}^{(e)}$; and let $r_{b,t}^{(e)}|_{4\mathrm{th}}$ denote the prediction value obtained from the fourth-order difference prediction of $q_{b,t}^{(e)}$.
Figure 20 shows the curves of the return index $r_t$ and its prediction values $r_{b,t}^{(e)}|_{2\mathrm{nd}}$, $r_{b,t}^{(e)}|_{3\mathrm{rd}}$, and $r_{b,t}^{(e)}|_{4\mathrm{th}}$ from the second-, third-, and fourth-order difference probability predictions during 2010–2016.
Figure 21 shows the curves of the conditional mean $\mu_{b,t}$ of the return index $r_t$ and the prediction values $r_{b,t}^{(e)}|_{2\mathrm{nd}}$, $r_{b,t}^{(e)}|_{3\mathrm{rd}}$, and $r_{b,t}^{(e)}|_{4\mathrm{th}}$ from the second-, third-, and fourth-order difference probability predictions during 2010–2016.
The correlations between $r_t$ and $\mu_{b,t}$, $r_{b,t}^{(e)}|_{2\mathrm{nd}}$, $r_{b,t}^{(e)}|_{3\mathrm{rd}}$, and $r_{b,t}^{(e)}|_{4\mathrm{th}}$ are 0.1006, 0.1467, 0.1467, and 0.1467, respectively. Including the predicted residual item $b_t$ therefore raises the correlation between the return index $r_t$ and its prediction values from 0.1006, obtained with the conditional mean $\mu_{b,t}$ alone, to 0.1467.

5.7. Return Index Prediction Based on the Second-Order Difference ARDL-CRG-GARCH-FD Model

Because applying the second-, third-, and fourth-order finite difference methods to the residual item $e_{b,t}$ improves the correlations between the real return index $r_t$ and its prediction values, we test whether a higher lag order in the regression model for the probability $q_{b,t}^{(e)}$ leads to a higher correlation between the real return index $r_t$ and its prediction value. For this purpose, we focus on the second-order finite difference regression model.
The second-order difference $d^2 q_{b,t}^{(e)}$ can be expressed by a regression model as
$$d^2 q_{b,t}^{(e)} = \omega + \alpha_0\, q_{b,t-1}^{(e)} + \alpha_1\, dq_{b,t-1}^{(e)} + \sum_{j=1}^{p} \beta_j\, d^2 q_{b,t-j}^{(e)} + c_t$$
When the lag order is $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$, we obtain ten different prediction regression models for the second-order difference $d^2 q_{b,t}^{(e)}$.
Using the second-order difference equation $q_{b,t}^{(e)} = q_{b,t-1}^{(e)} + dq_{b,t-1}^{(e)} + d^2 q_{b,t}^{(e)}$, the cumulative return index prediction model $r_{b,t}^{(c,e)} = \mu_{b,t}^{(c)} + \sigma_{b,t}\left(\mathrm{mean}(\varepsilon_{b,t}) + \sqrt{\mathrm{Var}(\varepsilon_{b,t})}\left(-\ln\left(1/q_{b,t}^{(e)} - 1\right)\right)\right)$, and the return index prediction model $r_{b,t}^{(e)} = r_{b,t}^{(c,e)} / r_{b,t}^{(c)}$, we obtain the return index prediction values $r_{b,t}^{(e)}|_{p}$ for $p = 3, 50, 100, 150, 200, 300, 400, 500, 600, 700$.
Table 8 lists the first three parameters of the second-order finite difference regression models for the residual of the cumulative return index prediction model.
When the lag order is $p = 3$, the regression model of the second-order difference $d^2 q_{b,t}^{(e)}$ includes the intercept $\omega$, the coefficient $\alpha_0$ for the item $q_{b,t-1}^{(e)}$, the coefficient $\alpha_1$ for the item $dq_{b,t-1}^{(e)}$, and the coefficients $\beta_1, \beta_2, \beta_3$ for the items $d^2 q_{b,t-1}^{(e)}, d^2 q_{b,t-2}^{(e)}, d^2 q_{b,t-3}^{(e)}$.
When the lag order is $p = 50$, the regression model includes the intercept $\omega$, the coefficients $\alpha_0$ and $\alpha_1$ for the same two items, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{50}$ for the items $d^2 q_{b,t-1}^{(e)}, d^2 q_{b,t-2}^{(e)}, \ldots, d^2 q_{b,t-50}^{(e)}$.
Similarly, when the lag order is $p = 700$, the regression model includes the intercept $\omega$, the coefficients $\alpha_0$ and $\alpha_1$, and the coefficients $\beta_1, \beta_2, \ldots, \beta_{700}$ for the items $d^2 q_{b,t-1}^{(e)}, d^2 q_{b,t-2}^{(e)}, \ldots, d^2 q_{b,t-700}^{(e)}$.
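A regression of this shape can be fitted by ordinary least squares once the lagged regressors are stacked into a design matrix. The sketch below is one possible implementation under stated assumptions, not the authors' code: it uses NumPy's `numpy.linalg.lstsq`, assumes backward differences, and the function names (`build_design`, `fit_second_difference_model`, `one_step_forecast`) are hypothetical. The regressor order is intercept, $q_{t-1}$, $dq_{t-1}$, then $d^2 q_{t-1}, \ldots, d^2 q_{t-p}$.

```python
import numpy as np


def build_design(q, p):
    """Rows [1, q_{t-1}, dq_{t-1}, d2q_{t-1}, ..., d2q_{t-p}], target d2q_t."""
    q = np.asarray(q, dtype=float)
    dq = np.full_like(q, np.nan)
    d2q = np.full_like(q, np.nan)
    dq[1:] = q[1:] - q[:-1]          # dq_t  = q_t  - q_{t-1}
    d2q[2:] = dq[2:] - dq[1:-1]      # d2q_t = dq_t - dq_{t-1}
    rows, target = [], []
    for t in range(p + 2, len(q)):   # first t for which d2q_{t-p} exists
        lags = [d2q[t - j] for j in range(1, p + 1)]
        rows.append([1.0, q[t - 1], dq[t - 1], *lags])
        target.append(d2q[t])
    return np.array(rows), np.array(target)


def fit_second_difference_model(q, p):
    """Least-squares estimates (omega, alpha_0, alpha_1, beta_1..beta_p)."""
    X, y = build_design(q, p)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def one_step_forecast(q, coef, p):
    """Forecast the next q via q_T = q_{T-1} + dq_{T-1} + predicted d2q_T."""
    q = np.asarray(q, dtype=float)
    dq_last = q[-1] - q[-2]
    lags = np.diff(q, n=2)[::-1][:p]   # d2q_{T-1}, ..., d2q_{T-p}
    x = np.concatenate(([1.0, q[-1], dq_last], lags))
    return q[-1] + dq_last + float(x @ coef)
```

With $p$ lags the model has $p + 3$ free coefficients, so for $p = 700$ the number of parameters is a substantial fraction of the number of daily observations in a 2010–2016 sample.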
Figure 22 depicts the curves of the return index $r_t$ and its prediction values $r_{b,t}^{(e)}|_{p=200}$ from the second-order finite difference regression model.
Figure 23 depicts the curves of the return index $r_t$ and its prediction values $r_{b,t}^{(e)}|_{p=700}$ from the second-order finite difference regression model.
These regression models for the second-order difference variable $d^2 q_{b,t}^{(e)}$ yield three results:
First, when the lag order increases, the coefficient of determination of the regression model increases. When the lag order increases from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the R-squared value of the regression model increases from 0.842384 to 0.845745, 0.852091, 0.857610, 0.861082, 0.875154, 0.892694, 0.910258, 0.933839, and 0.965393, respectively.
Second, when the lag order increases, the correlation between the real return index $r_t$ and its prediction values increases. When the lag order increases from 3 to 50, 100, 150, 200, 300, 400, 500, 600, and 700, the correlation between $r_t$ and its prediction values $r_{b,t}^{(e)}|_{p}$ increases from 0.146718 to 0.220724, 0.284660, 0.329443, 0.368404, 0.472701, 0.566134, 0.646393, 0.732672, and 0.840273, respectively.
Third, comparing the two figures, the prediction values $r_{b,t}^{(e)}|_{p=700}$ track the real return index $r_t$ more closely than the prediction values $r_{b,t}^{(e)}|_{p=200}$. This means that a higher lag order in the prediction model for the probability $q_{b,t}^{(e)}$ gives a closer approximation of the real return index $r_t$ by its prediction value.

6. Tests of the Prediction Accuracy for the Four Kinds of Models

6.1. Comparison of the Correlations between the Real and Predicted Returns from the Four Different Models

From the probability prediction models, we have already learned that the correlations between the real return index $r_t$ and its prediction values from the higher-order differences are higher than the correlations between $r_t$ and its prediction values from the lower-order differences.
When the finite difference order of the probability variable $q_t$, obtained by transforming the residual variable $a_t$, is fixed at the second order, higher lag orders of the probability variable $q_t$ give higher correlations between the real return index $r_t$ and its prediction values for each of the four prediction models.
A higher correlation is taken here as an indication of higher prediction accuracy. For the four models, we compare the empirical results from the perspective of these correlations.
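The correlation used as the accuracy measure in this comparison is assumed here to be the ordinary Pearson correlation between the realized and predicted series; the paper does not spell out the estimator, so the sketch below is illustrative and the function name is hypothetical.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_p)
```

Correlation is invariant to the scale and level of the predictions, so it measures co-movement only; this is one reason the text complements it with hit ratios and RMSE.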
In the preceding sections, we built an AR(5) model with lag order $p = 5$, whose lag items $r_{t-1}, r_{t-2}, r_{t-3}, r_{t-4}, r_{t-5}$ come from the real return index $r_t$. We also built an ARDL-CRG model in which the cumulative gap enters with lag order 1 as the item $r_{t-1}^{(c,gap)}$. From the residual items of these two models, using the second-order finite difference method, we have built four different models.
Table 9 lists the correlations between the real return index $r_t$ and its prediction values $r_{a,t}$, $r_{b,t}$, $r_{a,t}^{(e)}$, and $r_{b,t}^{(e)}$ from the four kinds of prediction models.
First, consider the AR-FD models. The prediction values $r_{a,t}$ come from the traditional autoregressive (AR) model $r_{a,t} = \mu_{a,t} + a_t$. When the probability is defined as $q_t^{(a)} = 1/(1 + \exp(-a_t))$, the residual item $a_t = -\ln(1/q_t^{(a)} - 1)$ can be predicted by predicting the probability $q_t^{(a)}$. Because the second-order difference $d^2 q_t^{(a)}$ of the probability $q_t^{(a)}$ is an autoregressive time series, $q_t^{(a)}$ can be predicted by predicting $d^2 q_t^{(a)}$. Choosing lag orders for the second-order difference up to $d^2 q_{t-3}^{(a)}$, $d^2 q_{t-50}^{(a)}$, …, and $d^2 q_{t-700}^{(a)}$, we obtain the prediction values $r_{a,t}|_{p=3}$, $r_{a,t}|_{p=50}$, …, and $r_{a,t}|_{p=700}$. When the lag order increases, the correlation between the real return index $r_t$ and the prediction values $r_{a,t}$ increases.
Second, consider the ARDL-CRG-FD models. The prediction values $r_{b,t}$ come from the traditional autoregressive distributed lag (ARDL) model $r_{b,t}^{(c)} = \mu_{b,t}^{(c)} + b_t$. Because $r_{b,t} = r_{b,t}^{(c)} / r_{t-1}^{(c)}$, it is easy to predict the values of the return index once the prediction values of the cumulative return index are known. When the probability is defined as $q_t^{(b)} = 1/(1 + \exp(-b_t))$, the residual item $b_t = -\ln(1/q_t^{(b)} - 1)$ can be predicted by predicting the probability $q_t^{(b)}$. Because the second-order difference $d^2 q_t^{(b)}$ of the probability $q_t^{(b)}$ is an autoregressive time series, $q_t^{(b)}$ can be predicted by predicting $d^2 q_t^{(b)}$. Choosing lag orders for the second-order difference up to $d^2 q_{t-3}^{(b)}$, $d^2 q_{t-50}^{(b)}$, …, and $d^2 q_{t-700}^{(b)}$, we obtain the prediction values $r_{b,t}|_{p=3}$, $r_{b,t}|_{p=50}$, …, and $r_{b,t}|_{p=700}$. When the lag order increases, the correlation between the real return index $r_t$ and the prediction values $r_{b,t}$ increases.
Third, consider the AR-GARCH-FD models. The prediction values $r_{a,t}^{(e)}$ come from the traditional autoregressive (AR) model combined with the generalized autoregressive conditional heteroscedasticity (GARCH) model: $r_{a,t} = \mu_{a,t} + a_t$, $a_t = \sigma_{a,t}\,\varepsilon_{a,t}$, $\varepsilon_{a,t} = a_t / \sigma_{a,t}$, $e_{a,t} = (\varepsilon_{a,t} - \mu_{a,0}) / \sigma_{a,0}$. When the probability is defined as $q_{a,t}^{(e)} = 1/(1 + \exp(-e_{a,t}))$, predicting the probability $q_{a,t}^{(e)}$ allows the residual item $a_t = \sigma_{a,t}\left(\mu_{a,0} + \sigma_{a,0}\left(-\ln(1/q_{a,t}^{(e)} - 1)\right)\right)$ to be predicted. Because the second-order difference $d^2 q_{a,t}^{(e)}$ of the probability $q_{a,t}^{(e)}$ is an autoregressive time series, $q_{a,t}^{(e)}$ can be predicted by predicting $d^2 q_{a,t}^{(e)}$. Choosing lag orders for the second-order difference up to $d^2 q_{a,t-3}^{(e)}$, $d^2 q_{a,t-50}^{(e)}$, …, and $d^2 q_{a,t-700}^{(e)}$, we obtain the prediction values $r_{a,t}^{(e)}|_{p=3}$, $r_{a,t}^{(e)}|_{p=50}$, …, and $r_{a,t}^{(e)}|_{p=700}$. When the lag order increases, the correlation between the real return index $r_t$ and the prediction values $r_{a,t}^{(e)}$ increases.
Fourth, consider the ARDL-CRG-GARCH-FD models. The prediction values $r_{b,t}^{(e)}$ come from the traditional autoregressive distributed lag (ARDL) model combined with the GARCH model: $r_{b,t}^{(c)} = \mu_{b,t}^{(c)} + b_t$, $b_t = \sigma_{b,t}\,\varepsilon_{b,t}$, $\varepsilon_{b,t} = b_t / \sigma_{b,t}$, $e_{b,t} = (\varepsilon_{b,t} - \mu_{b,0}) / \sigma_{b,0}$. When the probability is defined as $q_{b,t}^{(e)} = 1/(1 + \exp(-e_{b,t}))$, predicting the probability $q_{b,t}^{(e)}$ allows the residual item $b_t = \sigma_{b,t}\left(\mathrm{mean}(\varepsilon_{b,t}) + \sqrt{\mathrm{Var}(\varepsilon_{b,t})}\left(-\ln(1/q_{b,t}^{(e)} - 1)\right)\right)$ to be predicted. Because the second-order difference $d^2 q_{b,t}^{(e)}$ of the probability $q_{b,t}^{(e)}$ is an autoregressive time series, $q_{b,t}^{(e)}$ can be predicted by predicting $d^2 q_{b,t}^{(e)}$. Choosing lag orders for the second-order difference up to $d^2 q_{b,t-3}^{(e)}$, $d^2 q_{b,t-50}^{(e)}$, …, and $d^2 q_{b,t-700}^{(e)}$, we obtain the prediction values $r_{b,t}^{(e)}|_{p=3}$, $r_{b,t}^{(e)}|_{p=50}$, …, and $r_{b,t}^{(e)}|_{p=700}$. When the lag order increases, the correlation between the real return index $r_t$ and the prediction values $r_{b,t}^{(e)}$ increases.
Table 10 lists the comparison values of the correlations between the return index and the prediction values from the AR, ARDL, AR-GARCH, and ARDL-GARCH models.
From the comparative results, we can get the following four results:
Firstly, the correlations $\rho(r_{b,t}, r_t)$ are mostly greater than the correlations $\rho(r_{a,t}, r_t)$. That is, the correlations between the return index $r_t$ and the prediction values $r_{b,t}$ from the ARDL-CRG-FD models for the cumulative return index are mostly greater than the correlations between $r_t$ and the prediction values $r_{a,t}$ from the AR-FD models for the return index. This indicates that the CRG model can improve the prediction accuracy.
Secondly, the correlations $\rho(r_{b,t}^{(e)}, r_t)$ are mostly greater than the correlations $\rho(r_{a,t}^{(e)}, r_t)$. That is, the correlations between the return index $r_t$ and the prediction values $r_{b,t}^{(e)}$ from the ARDL-CRG-GARCH-FD models for the cumulative return index are mostly greater than the correlations between $r_t$ and the prediction values $r_{a,t}^{(e)}$ from the AR-GARCH-FD models for the return index. This again indicates that the CRG model can improve the prediction accuracy.
Thirdly, the correlations $\rho(r_{a,t}, r_t)$ are greater than the correlations $\rho(r_{a,t}^{(e)}, r_t)$. That is, the correlations between the return index $r_t$ and the prediction values $r_{a,t}$ from the AR-FD models are greater than the correlations between $r_t$ and the prediction values $r_{a,t}^{(e)}$ from the AR-GARCH-FD models. This indicates that the GARCH model has little impact on the prediction values.
Fourthly, the correlations $\rho(r_{b,t}, r_t)$ are greater than the correlations $\rho(r_{b,t}^{(e)}, r_t)$. That is, the correlations between the return index $r_t$ and the prediction values $r_{b,t}$ from the ARDL-CRG-FD models for the cumulative return index are greater than the correlations between $r_t$ and the prediction values $r_{b,t}^{(e)}$ from the ARDL-CRG-GARCH-FD models for the cumulative return index. This again indicates that the GARCH model has little impact on the prediction values.

6.2. Hit Ratio Tests

Hit ratio analysis includes four cases: both the return index and the prediction value are upward, both the return index and the prediction value are downward, the return index is up but the prediction value is down, and the return index is down but the prediction value is up.
Ideally, the hit ratios should be high in the two cases where the return index and the prediction value move in the same direction (both upward or both downward) and low in the two cases where they move in opposite directions.
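The four cases can be tabulated directly from the two series. The sketch below assumes that "upward" means a gross return index of at least 1 (the return index is a price ratio $p_t / p_{t-1}$) and that each hit ratio is the share of all observations falling into that case; the paper does not state its normalization, so both the function name and this choice are assumptions.

```python
def hit_ratios(actual, predicted, threshold=1.0):
    """Share of observations in each of the four direction cases.

    A value >= threshold counts as 'up', anything below as 'down'.
    Keys are '<actual>_<predicted>', e.g. 'up_down' means the return index
    rose while the prediction pointed down.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    counts = {"up_up": 0, "down_down": 0, "up_down": 0, "down_up": 0}
    for r, x in zip(actual, predicted):
        actual_dir = "up" if r >= threshold else "down"
        predicted_dir = "up" if x >= threshold else "down"
        counts[actual_dir + "_" + predicted_dir] += 1
    n = len(actual)
    return {case: count / n for case, count in counts.items()}


# Made-up gross returns: one observation lands in each of the four cases.
ratios = hit_ratios([1.01, 0.99, 1.02, 0.98], [1.02, 0.97, 0.99, 1.01])
```

Under this normalization the four shares sum to one, so `up_up + down_down` is the overall directional accuracy.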
First, the hit ratios from the AR-FD models were analyzed.
Table 11 lists the hit ratios between the real return index $r_t$ and its prediction values $r_{a,t}$ from the direct AR-FD model for the return index at ten different lag orders.
Under the ideal prediction criteria, it is clear that a higher level of lag order leads to a higher hit ratio than a lower level of lag order when both the return index and the prediction values move upward or downward together.
Second, the hit ratios from the ARDL-CRG-FD models were analyzed.
Table 12 lists the hit ratios between the real return index $r_t$ and its prediction values $r_{b,t}$ from the indirect ARDL-CRG-FD model for the return index at ten different lag orders.
Under the ideal prediction criteria, it is clear that a higher lag order leads to a higher hit ratio than a lower lag order when both the return index and the prediction values move upward or downward together.
Third, we carried out a comparison between the hit ratios from the ARDL-CRG-FD models and from the AR-FD models.
Table 13 lists the comparative results of the hit ratios from the direct AR-FD model and from the indirect ARDL-CRG-FD model.
Under the ideal prediction criteria, the comparison shows that the prediction values from the indirect ARDL-CRG-FD model for the cumulative return index are mostly better than the prediction values from the direct AR-FD model for the return index, especially at higher lag orders (greater than 400). For example, in the two cases where the return index and the prediction value both move upward, expressed as $\{r_t \ge 1\} \cap \{\chi \ge 1\}$, or both move downward, expressed as $\{r_t < 1\} \cap \{\chi < 1\}$, the hit ratios of $r_{b,t}|_{p}$ for $p = 100, 150, 200, 400, 500, 600, 700$ are greater than the hit ratios of $r_{a,t}|_{p}$ at the same lag orders.
Conversely, in the case where the return index moves downward but the prediction value moves upward, expressed as $\{r_t < 1\} \cap \{\chi \ge 1\}$, the hit ratios of $r_{b,t}|_{p}$ for $p = 100, 150, 200, 400, 500, 600, 700$ are less than the hit ratios of $r_{a,t}|_{p}$ at the same lag orders. This means that the ARDL-CRG-FD model improves the hit ratios more than the AR-FD models, especially when the difference orders or lags are higher.
Fourth, the hit ratios from the AR-GARCH-FD models were analyzed.
Table 14 lists the hit ratios between the real return index $r_t$ and its prediction values $r_{a,t}^{(e)}$ from the direct AR-GARCH-FD model for the return index at ten different lag orders.
Under the ideal prediction criteria, it is clear that the higher level of lag order has led to a higher hit ratio than the lower level of lag order.
Fifth, the hit ratios from the ARDL-CRG-GARCH-FD models were analyzed.
Table 15 lists the hit ratios between the real return index $r_t$ and its prediction values $r_{b,t}^{(e)}$ from the indirect ARDL-CRG-GARCH-FD model for the cumulative return index at ten different lag orders.
Under the ideal prediction criteria, it is clear that the higher level of lag order has led to a higher hit ratio than the lower level of lag order.
Sixth, a comparison between the hit ratios from the AR-GARCH-FD and the ARDL-CRG-GARCH-FD models was carried out.
Table 16 lists the comparative results of hit ratios between the results from the direct AR-GARCH-FD model and the results from the indirect ARDL-CRG-GARCH-FD model.
The comparison shows that the hit ratios from the indirect prediction values of the ARDL-CRG-GARCH-FD model for the cumulative return index are similar to the direct prediction values of the AR-GARCH-FD model for the return index. It means that in terms of the hit ratios, the ARDL-CRG-GARCH-FD model is similar to the AR-GARCH-FD model.
Seventh, a comparison between the hit ratios from the AR-FD and the AR-GARCH-FD models was carried out.
Table 17 lists the comparative results of the hit ratios from the direct AR-FD models and from the direct AR-GARCH-FD models.
The comparison shows that the hit ratios from the direct prediction values of the AR-GARCH-FD models for the return index are better than the hit ratios from the direct prediction values of the AR-FD model for the return index. It means that when it comes to the hit ratios, the AR-GARCH-FD models are better than the AR-FD models.
Eighth, a comparison between the hit ratios from the ARDL-CRG-FD and ARDL-CRG-GARCH-FD models was carried out.
Table 18 lists the comparative results of the hit ratios from the indirect ARDL-CRG-FD model and from the indirect ARDL-CRG-GARCH-FD model.
The comparison shows that the hit ratios from the indirect prediction values of the ARDL-CRG-GARCH-FD models for the cumulative return index are better than the hit ratios from the indirect prediction values of the ARDL-CRG-FD models for the cumulative return index. It means that when it comes to the hit ratios, the ARDL-CRG-GARCH-FD models are better than ARDL-CRG-FD models.

6.3. RMSE Tests

We will analyze the average values of the root mean square error (RMSE) for the four kinds of models.
Table 19 lists the RMSE values for the prediction values from the direct AR-FD and AR-GARCH-FD models for the return index and from the indirect ARDL-CRG-FD and ARDL-CRG-GARCH-FD models for the cumulative return index.
The RMSE summarizes the average size of the prediction errors in a single number. The criterion is that a smaller RMSE indicates a better prediction.
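For reference, the conventional root mean square error can be computed as below. This is the textbook definition (square root of the mean squared difference), given as a sketch with a hypothetical function name; the exact normalization behind the values in Table 19 may differ.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared_errors) / n)
```

Unlike correlation, RMSE penalizes errors in level and scale, so two prediction series with the same correlation to $r_t$ can still have different RMSE values.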
Under this criterion, it is clear that higher lag orders of the second-order probability difference $d^2 q_{t-p}$ lead to smaller RMSE values than lower lag orders for all four kinds of models: AR-FD, AR-GARCH-FD, ARDL-CRG-FD, and ARDL-CRG-GARCH-FD.
Table 20 lists the comparison between the RMSE values from the AR-FD, AR-GARCH-FD, ARDL-CRG-FD, and ARDL-CRG-GARCH-FD models.
When we compare the results of the RMSE prediction values between the four kinds of models, there are four results.
First, we compare the AR-FD and AR-GARCH-FD models. The RMSE of the AR-FD model is defined as $\sqrt{\sum_t (r_t - r_{a,t})^2}$, and the RMSE of the AR-GARCH-FD model is defined as $\sqrt{\sum_t (r_t - r_{a,t}^{(e)})^2}$. The comparison shows that the RMSE values of the AR-FD model are less than those of the AR-GARCH-FD model. That is, the RMSE values between the return index $r_t$ and the prediction values $r_{a,t}$ from the AR-FD model for the return index are less than the RMSE values between $r_t$ and the prediction values $r_{a,t}^{(e)}$ from the AR-GARCH-FD model for the return index. This means that the GARCH model has little impact on reducing the RMSE value; in other words, when the finite difference method is used, the GARCH model cannot improve the prediction accuracy by much.
Second, we compare the ARDL-CRG-FD and ARDL-CRG-GARCH-FD models. The RMSE of the ARDL-CRG-FD model is defined as $\sqrt{\sum_t (r_t - r_{b,t})^2}$, and the RMSE of the ARDL-CRG-GARCH-FD model is defined as $\sqrt{\sum_t (r_t - r_{b,t}^{(e)})^2}$. The comparison shows that the RMSE values of the ARDL-CRG-FD model are less than those of the ARDL-CRG-GARCH-FD model. That is, the RMSE values between the return index $r_t$ and the prediction values $r_{b,t}$ from the ARDL-CRG-FD model for the cumulative return index are less than the RMSE values between $r_t$ and the prediction values $r_{b,t}^{(e)}$ from the ARDL-CRG-GARCH-FD model for the cumulative return index. This means that the GARCH model has little impact on reducing the RMSE value; in other words, when the finite difference method is used, the GARCH model cannot improve the prediction accuracy by much.
Third, we compare the AR-FD and ARDL-CRG-FD models. The RMSE of the AR-FD model is defined as $\sqrt{\sum_t (r_t - r_{a,t})^2}$, and the RMSE of the ARDL-CRG-FD model is defined as $\sqrt{\sum_t (r_t - r_{b,t})^2}$. The comparison shows that the RMSE values of the ARDL-CRG-FD model are mostly less than those of the AR-FD model. That is, the RMSE values between the return index $r_t$ and the prediction values $r_{b,t}$ from the ARDL-CRG-FD model for the cumulative return index are mostly less than the RMSE values between $r_t$ and the prediction values $r_{a,t}$ from the AR-FD model for the return index. This means that the ARDL-CRG-FD model reduces the RMSE value more than the AR-FD model; in other words, the ARDL-CRG-FD model can improve the prediction accuracy more than the AR-FD model.
Fourth, we compare the AR-GARCH-FD and ARDL-CRG-GARCH-FD models. The RMSE of the AR-GARCH-FD model is defined as $\sqrt{\sum_t (r_t - r_{a,t}^{(e)})^2}$, and the RMSE of the ARDL-CRG-GARCH-FD model is defined as $\sqrt{\sum_t (r_t - r_{b,t}^{(e)})^2}$. The comparison shows that the RMSE values of the ARDL-CRG-GARCH-FD model are mostly less than those of the AR-GARCH-FD model. That is, for the most part, the RMSE values between the return index $r_t$ and the prediction values $r_{b,t}^{(e)}$ from the ARDL-CRG-GARCH-FD model for the cumulative return index are less than the RMSE values between $r_t$ and the prediction values $r_{a,t}^{(e)}$ from the AR-GARCH-FD model for the return index. This means that the ARDL-CRG-GARCH-FD model reduces the RMSE value more than the AR-GARCH-FD model; in other words, the CRG model can improve the prediction accuracy considerably.

7. Conclusions

The empirical results lead to five conclusions. First, the results for the ARDL-CRG-FD models confirm that raising the difference order of the probability variable can improve the correlations of the FD models; likewise, when the difference order of the probability variable is fixed at the second, third, or fourth order, raising the lag order of the probability variable can improve the correlations of the FD models. Second, when the FD model is fixed at the second-order finite difference regression model and the lags of the probability variable $d^2 q_t^{(a,b)}$ are tested, the ARDL-CRG-FD and ARDL-CRG-GARCH-FD models give three similar results: (i) when the lag order increases, the coefficient of determination of the regression model increases; (ii) when the lag order increases, the correlation between the real return index and its prediction values increases; (iii) a prediction model with a higher lag order gives a closer approximation of the real return index by its prediction value. Third, comparing the correlations between the real and predicted returns from the four kinds of models shows that (i) the CRG model can improve the prediction accuracy, and (ii) the GARCH model has little impact on the prediction values. Fourth, comparing the hit ratios from the four models shows that (i) a higher lag order leads to a higher hit ratio than a lower lag order when the return index and the prediction values move upward or downward together; (ii) the ARDL-CRG-FD model improves the hit ratios more than the AR-FD models; (iii) the hit ratios of the ARDL-CRG-GARCH-FD model are similar to those of the AR-GARCH-FD model; (iv) the hit ratios of the AR-GARCH-FD models are better than those of the AR-FD models; (v) the hit ratios of the ARDL-CRG-GARCH-FD models are better than those of the ARDL-CRG-FD models.
Fifth, comparing the RMSE test results from the four models shows that (i) when the finite difference method is used, the GARCH model cannot improve the prediction accuracy by much; (ii) the ARDL-CRG-FD model improves the prediction accuracy more than the AR-FD model; (iii) the ARDL-CRG-GARCH-FD model reduces the RMSE value more than the AR-GARCH-FD model; (iv) the CRG model can improve the prediction accuracy considerably.

Author Contributions

Conceptualization, K.Y. and R.G.; methodology, K.Y.; software, K.Y.; validation, K.Y.; formal analysis, K.Y.; investigation, K.Y.; resources, K.Y.; data curation, K.Y.; writing—original draft preparation, K.Y.; writing—review and editing, K.Y. and S.H.; visualization, K.Y.; supervision, R.G.; project administration, R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The open data source for the Dow Jones Industry Index is used as the data source for this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1
Based on the assumed variables, the variable $r_t^{(c,gap)}$ represents the cumulative return gap and can be defined as $r_t^{(c,gap)} = r_t^{(c)} - r_t^{(c,ave)} = \prod_{t=1}^{t} r_t - \left(r_T^{(ave)}\right)^{t} = \prod_{t=1}^{t} r_t - \left(\left(r_T^{(c)}\right)^{1/T}\right)^{t} = \prod_{t=1}^{t} r_t - \left(\prod_{t=1}^{T} r_t\right)^{t/T}$.
2
AR, MA, ARMA, ARIMA and ARDL models and so on.
3
CAR is the cumulative abnormal return, where the variable $r_t$ is the return index of the risk asset and $E(r_t)$ is the expected return index of the sample asset. CAR is equal to the difference between the sum of the real returns and the sum of the expected returns, $CAR_t = \sum_{t=1}^{t} r_t - \sum_{t=1}^{t} E(r_t)$.
4
BHAR is the cumulative excess return of a buy-and-hold investment, which is equal to the difference between the real cumulative return of a buy-and-hold investment and the cumulative expected return of a buy-and-hold investment, $BHAR_t = \prod_{t=1}^{t} r_t - \prod_{t=1}^{t} E(r_t)$.
5
This paper focuses on presenting the methodology, and we wanted to minimize the impact of COVID-19; thus, the data were chosen conservatively. The year 2016 was affected by an oil price shock, with oil prices falling below $27 a barrel in January 2016 (Yoshino and Taghizadeh-Hesary 2016). This was followed by COVID-19, and as such we have excluded the period from 2016 onwards from the analysis. Excluding 2016 and then including 2017 and 2018 would complicate the explanation and discussion without much benefit to the outcomes of the study. The model is nevertheless valid: it uses data for the period 2010 to 2016, which provides a sufficient length of period and number of observations for model validity and tractability and for drawing meaningful analysis and conclusions. Including 2017 and 2018 would add complexity to the model without much benefit to the overall objective of the study.
6
If we can predict the value of the long-term cumulative return index $r_t^{(c)}$, it will be easy to obtain the predicted value of the stock price $p_t$ when $p_1 = p_0$ as $p_t = p_{t-1} r_t = \cdots = p_0 r_1 r_2 \cdots r_t = p_0 r_t^{(c)}$. In addition, the logarithm cumulative return $\ln r_t^{(c)}$ can be represented by the logarithms of the return index as $\ln r_t^{(c)} = \ln(r_1 r_2 \cdots r_t) = \ln r_1 + \ln r_2 + \cdots + \ln r_t$, or $\ln r_t^{(c)} = \ln\frac{p_1}{p_0} + \ln\frac{p_2}{p_1} + \cdots + \ln\frac{p_t}{p_{t-1}}$, which is perfectly matched with the logarithm return $\ln r_t$ between the time intervals $t \in [0, t]$.
7
Because the cumulative risk premium $r_t^{(c,gap)}$ represents the cumulative excess return during a long-term period, it is autocorrelated and time-varying. The model $AR(p)$ can be used to model the time-varying variable $r_t^{(c,gap)}$ as $r_t^{(c,gap)} = \alpha_0 + \sum_{i=1}^{p} \alpha_i r_{t-i}^{(c,gap)} + a_t^{(c,gap)}$. Here, the variable $a_t^{(c,gap)}$ represents the residual of the $AR(p)$ model. The $AR(p)$ model can be used as a prediction model for the cumulative excess return $r_t^{(c,gap)}$. If the variable $\mu_t^{(c,gap)}$ represents the mean of the $AR(p)$ model, it can be represented as $\mu_t^{(c,gap)} = E\left(r_t^{(c,gap)} \mid F_{t-1}\right) = \alpha_0 + \sum_{i=1}^{p} \alpha_i r_{t-i}^{(c,gap)}$. Here, the information set $F_{t-1}$ includes any information that relates to the time $t \in [0, t-1]$. The residual $a_t^{(c,gap)}$ is assumed to satisfy $E\left(a_t^{(c,gap)}\right) = 0$.
8
The unconditional volatility for the residual variable $a_t$ is defined as $\bar{\sigma}^2 = \frac{\omega}{1 - (\alpha + \beta)}$. Here, $\alpha$ is the coefficient of the ARCH item and $\beta$ is the coefficient of the GARCH item. The three parameters must satisfy $\omega > 0$, $\alpha \ge 0$, $\beta \ge 0$, and $\alpha + \beta < 1$, along with a further condition relating $\omega + \alpha + \beta$ to 1. The GARCH model can be used to calculate and predict the volatility of the cumulative risk premium $r_t^{(c,gap)}$.
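The definitions in Notes 1 and 6 can be checked numerically. The following is a minimal sketch, assuming that the return index is the gross return $r_t = p_t / p_{t-1}$ and that the cumulative return gap is the difference between the cumulative return index and the cumulative average compound return index. The function name `cumulative_return_gap` and the price path are hypothetical and are not taken from the paper.

```python
import math


def cumulative_return_gap(returns):
    """Compute r_t^(c), r_t^(c,ave) and their gap for gross returns r_1..r_T.

    Assumption (illustrative only): the gap is the difference
    r_t^(c) - r_t^(c,ave), with r_t^(c,ave) = (r_T^(ave))^t.
    """
    T = len(returns)
    r_c = []
    running = 1.0
    for r in returns:
        running *= r  # r_t^(c) = r_1 * r_2 * ... * r_t
        r_c.append(running)
    r_T_ave = r_c[-1] ** (1.0 / T)  # average compound return (r_T^(c))^(1/T)
    r_c_ave = [r_T_ave ** t for t in range(1, T + 1)]
    gap = [c - a for c, a in zip(r_c, r_c_ave)]
    return r_c, r_c_ave, gap


# Hypothetical price path p_0..p_4, for illustration only.
prices = [100.0, 102.0, 99.96, 103.9584, 101.879232]
returns = [prices[i] / prices[i - 1] for i in range(1, len(prices))]
r_c, r_c_ave, gap = cumulative_return_gap(returns)

# Note 6: p_t = p_0 * r_t^(c), and ln r_t^(c) is the sum of the log returns.
rebuilt_prices = [prices[0] * c for c in r_c]
log_sum = sum(math.log(r) for r in returns)
```

Under this definition the gap is zero at $t = T$, because both indices then equal $r_T^{(c)}$.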

References

  1. Barber, Brad M., and John D. Lyon. 1997. Detecting long-run abnormal stock return: The empirical power and specification of test statistics. Journal of Financial Economics 43: 341–72.
  2. Bharandev, Sravani, and Sapar Narayan Rao. 2021. Does The Association Between Abnormal Trading Volumes And Historical Prices Explain Disposition Effect? Asia-Pacific Financial Markets 28: 141–51.
  3. Campbell, John L., Brady J. Twedt, and Benjamin C. Whipple. 2021. Trading Prior to the Disclosure of Material Information: Evidence from Regulation Fair Disclosure Form 8-Ks. Contemporary Accounting Research 38: 412–42.
  4. Devi, B. Uma, D. Sundar, and P. Alli. 2013. An effective time series analysis for stock trend prediction using ARIMA model for Nifty Midcap-50. International Journal of Data Mining & Knowledge Management Process (IJDKP) 3: 65–78.
  5. Dimri, Tripti, Shamshad Ahmad, and Mohammad Sharif. 2020. Time series analysis of climate variables using seasonal ARIMA approach. Journal of Earth System Science 129: 149.
  6. Gijon, Carolina, Matías Toril, Salvador Luna-Ramírez, María Luisa Marí-Altozano, and José María Ruiz-Avilés. 2021. Long-Term Data Traffic Forecasting for Network Dimensioning in LTE with Short Time Series. Electronics 10: 1151.
  7. Hillegeist, Stephen A., and Liwei Weng. 2021. Quasi-Indexer Ownership and Insider Trading: Evidence from Russell Index Reconstitutions. Contemporary Accounting Research 38: 2192–223.
  8. Hu, Jiangshan, Yunyun Sui, and Fang Ma. 2021. The Measurement Method of Investor Sentiment and Its Relationship with Stock Market. Computational Intelligence and Neuroscience 2021: 6672677.
  9. Lamba, Ashu, and Vanita Tripathi. 2015. Long run value creation from cross border mergers and acquisitions: Evidence from Indian acquirer companies. The International Journal Of Business & Management 3: 162–66.
  10. Li, Xiao-Lin, Xin Li, and Deng-Kui Si. 2020. Asymmetric determinants of corporate bond credit spreads in China: Evidence from a nonlinear ARDL model. The North American Journal of Economics and Finance 52.
  11. Lin, Liang-Ching, Hsiang-Lin Chien, and Sangyeol Lee. 2021. Symbolic interval-valued data analysis for time series based on auto-interval-regressive models. Statistical Methods and Applications (SMA) 30: 295–315.
  12. Ljung, Greta Marianne, and George Edward Pelham Box. 1978. On a measure of lack of fit in time series models. Biometrika 65: 297–303.
  13. Maratkhan, Anuarn, Ibrakhim Ilyassov, Madiyar Aitzhanov, M. Fatih Demirci, and A. Murat Ozbayoglu. 2021. Deep learning-based investment strategy: Technical indicator clustering and residual blocks. Soft Computing 25: 5151–61.
  14. Mitesh, Patel, Munjal Dave, and Mayur Shah. 2016. Stock price and liquidity effect of stock split: Evidence from Indian stock market. International Journal of Management Research & Review 6: 1030–39.
  15. Mohit, Gupta, and Navdeep Aggarwal. 2014. The impact of stock name change on shareholder wealth—evidence from Indian capital markets. Journal of Management Research 14: 15–24.
  16. Pesaran, Hashem M., and Yongcheol Shin. 1999. An autoregressive distributed lag modelling approach to cointegration analysis. In Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium. Edited by S. Strom. Cambridge: Cambridge University Press, chp. 11.
  17. Rabbani, Muhammad Babar Ali, Muhammad Ali Musarat, Wesam Salah Alaloul, Muhammad Shoaib Rabbani, Ahsen Maqsoom, Saba Ayub, Hamna Bukhari, and Muhammad Altaf. 2021. A Comparison Between Seasonal Autoregressive Integrated Moving Average (SARIMA) and Exponential Smoothing (ES) Based on Time Series Model for Forecasting Road Accidents. The Arabian Journal for Science and Engineering (AJSE) 46: 11113–38.
  18. Ranco, Gabriele, Darko Aleksovski, Guido Caldarelli, Miha Grčar, and Igor Mozetič. 2015. The Effects of Twitter Sentiment on Stock Price Returns. PLoS ONE 10: e0138441.
  19. Ritter, Jay R. 1991. The Long-Run Performance of Initial Public Offerings. The Journal of Finance 46: 3–27.
  20. Samrad, Jafarian-Namin, Seyyed Mohammad Taghi Fatemi Ghomi, Mohsen Shojaie, and Saeed Shavvalpour. 2021. Annual forecasting of inflation rate in Iran: Autoregressive integrated moving average modeling approach. Engineering Reports 3: e12344.
  21. Shen, Shunrong, Haomiao Jiang, and Tongda Zhang. 2012. Stock Market Forecasting Using Machine Learning Algorithms. Stanford: Department of Electrical Engineering, Stanford University, pp. 1–5. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.278.6139 (accessed on 1 November 2021).
  22. Shin, Donghee, Namchul Kim, Hongsuk Yoon, Jaeyeol Jeong, and Jaegil Lee. 2014. A comparative case study of regulatory approaches in the US and Korea. Paper presented at the 25th European Regional Conference of the International Telecommunications Society (ITS): Disruptive Innovation in the ICT Industries: Challenges for European Policy and Business, Brussels, Belgium, June 22–25; Calgary: International Telecommunications Society (ITS).
  23. Skare, Marinko, Dalia Streimikiene, and Damian Skare. 2021. Measuring carbon emission sensitivity to economic shocks: A panel structural vector autoregression 1870–2016. Environmental Science and Pollution Research (ESPR) 28: 44505–21.
  24. Stekelenburg, Akim, Georgios Georgakopoulos, Virginia Sotiropoulou, Konstantinos Vasileiou, and Ilias Vlachos. 2015. The relation between sustainability performance and stock market returns: An Empirical analysis of the Dow Jones Sustainability Index Europe. International Journal of Economics and Finance 7: 7.
  25. Tsay, Ruey S. 2005. Analysis of Financial Time Series, 2nd ed. Hoboken: John Wiley & Sons Inc., pp. 25–30.
  26. Wang, Yongbin, Chunjie Xu, Jingchao Ren, Yuchun Li, Weidong Wu, and Sanqiao Yao. 2021. Use of meteorological parameters for forecasting scarlet fever morbidity in Tianjin, Northern China. Environmental Science and Pollution Research (ESPR) 28: 7281–94.
  27. Ye, Qinglan, and Lianxin Wei. 2015. The prediction of stock price based on improved wavelet neural network. Open Journal of Applied Sciences 5: 115–20.
  28. Yoshino, Naoyuki, and Farhad Taghizadeh-Hesary. 2016. Introductory Remarks: What’s Behind the Recent Oil Price Drop? In Monetary Policy and the Oil Market. ADB Institute Series on Development Economics. Tokyo: Springer.
  29. Zaham, Muslima, and Ron S. Kenett. 2013. Comparative prices forecast model of conventional and Islamic bank stock listed in London stock exchange. Electronic Journal of Applied Statistical Analysis 4: 33–46.
  30. Zamanian, Gholamreza, Saber Khodaparati, and Mohammad Mirbagherijam. 2013. Long-run and short-run returns of initial public offerings (IPO) of public and private companies in Tehran stock exchange (TSE) market. International Journal of Academic Research in Business and Social Sciences 3: 69–84.
  31. Ziobrowski, Alan J., Ping Cheng, James W. Boyd, and Brigitte J. Ziobrowski. 2004. Abnormal returns from the common stock investments of the U.S. Senate. Journal of Financial and Quantitative Analysis 39: 661–76.
Figure 1. The curve of the normalized formula of the probability distribution function.
Figure 2. The return index of the US Dow Jones Industry Index between 1 April 2010 and 8 July 2016.
Figure 3. The residual item $a_t$ from the autoregressive model AR(5), $r_{a,t} = \mu_{a,t} + a_t$.
Figure 4. The return index $r_t$ and its prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$ from the 2nd-, 3rd-, and 4th-order differences.
Figure 5. Under the 2nd-, 3rd-, and 4th-order differences, the prediction values $r_{a,t}^{(2)}$, $r_{a,t}^{(3)}$, and $r_{a,t}^{(4)}$, and the conditional mean $\mu_{a,t}$ of $r_t$.
Figure 6. The return index $r_t$ and its prediction values $r_{a,t \mid p=200}$ from the second-order difference regression model.
Figure 7. The return index $r_t$ and its prediction values $r_{a,t \mid p=700}$ from the second-order difference regression model.
Figure 8. The return index $r_t$ and its prediction values $r_{a,t \mid p=200}^{(e)}$ from the second-order difference regression model.
Figure 9. The return index $r_t$ and its prediction values $r_{a,t \mid p=700}^{(e)}$ from the second-order difference regression model.
Figure 10. The average compound return index $r_t^{(ave)}$ and the average compound return index $r_T^{(ave)}$.
Figure 11. The cumulative return index $r_t^{(c)}$ and the cumulative average compound return index $r_t^{(c,ave)}$.
Figure 12. The stock price $P_t$ and its equivalent value from the formula $P_t = 10583.96\, r_t^{(c)}$.
Figure 13. The cumulative return gap index $r_t^{(c,gap)}$ and its lag 1 item $r_{t-1}^{(c,gap)}$.
Figure 14. The return index $r_t$ and mean $\mu_{b,t}$.
Figure 15. The residual b t and the residual b t .
Figure 16. Curves of the return index $r_t$ and its prediction values $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ from the 2nd, 3rd, and 4th difference probability prediction values during 2010–2016.
Figure 17. Curves of the conditional mean $\mu_{b,t}$ of the return index $r_t$ and the prediction values $r_{b,t}^{(2)}$, $r_{b,t}^{(3)}$, and $r_{b,t}^{(4)}$ from the 2nd, 3rd, and 4th difference probability prediction values during 2010–2016.
Figure 18. The return index $r_t$ and its prediction values $r_{b,t \mid p=200}$ from the second-order difference regression model.
Figure 19. The return index $r_t$ and its prediction values $r_{b,t \mid p=700}$ from the second-order difference regression model.
Figure 20. The return index $r_t$ and its prediction values $r_{b,t \mid 2\mathrm{nd}}^{(e)}$, $r_{b,t \mid 3\mathrm{rd}}^{(e)}$, and $r_{b,t \mid 4\mathrm{th}}^{(e)}$ from the 2nd-, 3rd-, and 4th-order differences.
Figure 21. Under the 2nd-, 3rd-, and 4th-order differences, the prediction values $r_{b,t \mid 2\mathrm{nd}}^{(e)}$, $r_{b,t \mid 3\mathrm{rd}}^{(e)}$, and $r_{b,t \mid 4\mathrm{th}}^{(e)}$, and $\mu_{b,t}$ of $r_t$.
Figure 22. The return index $r_t$ and its prediction values $r_{b,t \mid p=200}^{(e)}$ from the second-order difference regression model.
Figure 23. The return index $r_t$ and its prediction values $r_{b,t \mid p=700}^{(e)}$ from the second-order difference regression model.
Table 1. Main variables for building the four kinds of models: AR-FD, AR-GARCH-FD, ARDL-CRG-FD and ARDL-CRG-GARCH-FD.

| Variable | Explanation | Model | Variable | Explanation |
|---|---|---|---|---|
| $p_t$ | Price of an asset | AR | $r_{a,t}$ | Return index of an asset |
| $r_t$ | Return index of an asset | AR | $\mu_{a,t}$ | Expected value of $r_{a,t}$ |
| $r_t^{(c)}$ | Cumulative compound return index | AR | $a_t$ | Residual item of AR model |
| $r_T^{(c)}$ | Cumulative compound return index | AR-FD | $q_t^{(a)}$ | Cumulative probability of quantile $a_t$ |
| $r_T^{(ave)}$ | Average cumulative compound return | AR-FD | $r_{a,t}^{(2)}$ | Return from the 2nd-order difference |
| $r_t^{(c,ave)}$ | Cumulative average compound return | AR-FD | $r_{a,t}^{(3)}$ | Return from the 3rd-order difference |
| $r_t^{(c,gap)}$ | Cumulative return gap (CRG) | AR-FD | $r_{a,t}^{(4)}$ | Return from the 4th-order difference |
| $q_t^{(a)}$ | Cumulative probability of quantile $a_t$ | AR-FD | $r_{a,t \mid j=p}$ | Predictions of $r_{a,t}$ when $q_{t-p}^{(a)}$ used |
| $d^n q_t^{(a)}$ | nth-order finite difference of $q_t^{(a)}$ | AR-GARCH-FD | $q_{a,t}^{(e)}$ | Cumulative probability of quantile $e_{a,t}$ |
| $d^2 q_t^{(a)}$ | 2nd-order finite difference of $q_t^{(a)}$ | AR-GARCH-FD | $r_{a,t}^{(e)}$ | Predictions of $r_{a,t}$ when $q_{a,t}^{(e)}$ used |
| $d^3 q_t^{(a)}$ | 3rd-order finite difference of $q_t^{(a)}$ | AR-GARCH-FD | $r_{a,t \mid j=p}^{(e)}$ | Predictions of $r_{a,t}$ when $q_{t-p}^{(a)}$ used |
| $d^4 q_t^{(a)}$ | 4th-order finite difference of $q_t^{(a)}$ | ARDL-CRG | $r_{b,t}^{(c)}$ | Prediction value of $r_t^{(c)}$ |
| $a_t$ | Residual item of a regression model | ARDL-CRG | $\mu_{b,t}^{(c)}$ | Expected value of $r_t^{(c)}$ |
| $q_t$ | Cumulative probability of quantile $a_t$ | ARDL-CRG | $b_t$ | Residual of the ARDL-CRG model |
| $\sigma_t$ | Dynamic volatility based on $a_t$ | ARDL-CRG-FD | $q_t^{(b)}$ | Cumulative probability of quantile $b_t$ |
| $\varepsilon_t$ | Standardized error item from $a_t / \sigma_t$ | ARDL-CRG-FD | $r_{b,t}^{(2)}$ | Return from the 2nd-order difference |
| $e_t$ | Standardized error item from $\varepsilon_t$ | ARDL-CRG-FD | $r_{b,t}^{(3)}$ | Return from the 3rd-order difference |
| $\sigma_{a,t}$ | Dynamic volatility based on $a_t$ | ARDL-CRG-FD | $r_{b,t}^{(4)}$ | Return from the 4th-order difference |
| $\varepsilon_{a,t}$ | Standardized error item from $a_t / \sigma_{a,t}$ | ARDL-CRG-FD | $\mu_{b,t}$ | Expected value of $r_t$ |
| $e_{a,t}$ | Standardized error item from $\varepsilon_{a,t}$ | ARDL-CRG-FD | $r_{b,t \mid j=p}$ | Predictions of $r_{b,t}$ when $q_{t-p}^{(b)}$ used |
| $\sigma_{b,t}$ | Dynamic volatility based on $b_t$ | ARDL-CRG-GARCH-FD | $q_{b,t}^{(e)}$ | Cumulative probability of quantile $e_{b,t}$ |
| $\varepsilon_{b,t}$ | Standardized error item from $b_t / \sigma_{b,t}$ | ARDL-CRG-GARCH-FD | $r_{b,t}^{(e)}$ | Predictions of $r_{b,t}$ when $q_{b,t}^{(e)}$ used |
| $e_{b,t}$ | Standardized error item from $\varepsilon_{b,t}$ | ARDL-CRG-GARCH-FD | $r_{b,t \mid j=p}^{(e)}$ | Predictions of $r_{b,t}$ when $q_{b,t-p}^{(e)}$ used |
Table 2. Autocorrelation (AC) values and Ljung and Box (1978) statistics and probabilities for time series of the return index.

| Variable | AC(1) | Q(1) | P(1) | AC(5) | Q(5) | P(5) | AC(10) | Q(10) | P(10) | AC(15) | Q(15) | P(15) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $r_t$ | −0.052 | 4.1099 | 0.043 | −0.089 | 31.848 | 0.000 | 0.013 | 32.854 | 0.000 | −0.017 | 42.504 | 0.000 |

| Variable | AC(20) | Q(20) | P(20) | AC(25) | Q(25) | P(25) | AC(30) | Q(30) | P(30) | AC(35) | Q(35) | P(35) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $r_t$ | −0.040 | 54.108 | 0.000 | −0.040 | 58.371 | 0.000 | −0.007 | 65.140 | 0.000 | 0.035 | 74.216 | 0.000 |
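The Q statistics in Table 2 are Ljung-Box portmanteau statistics. As a minimal sketch, assuming the standard form $Q(m) = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$, the sample autocorrelation and $Q(m)$ can be computed as follows. The helper names and the toy series are hypothetical, and the p-values (which require a chi-square distribution function) are omitted.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(x, m):
    """Ljung-Box statistic Q(m) = n(n+2) * sum_{k=1..m} rho_k^2 / (n-k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, m + 1)
    )


# Toy series (hypothetical): a strictly alternating sequence of length 100.
x = [1.0, -1.0] * 50
ac1 = autocorrelation(x, 1)
q1 = ljung_box_q(x, 1)
```

A large $Q(m)$ relative to the chi-square reference distribution indicates that the first $m$ autocorrelations are jointly different from zero, which is how the near-zero probabilities in Table 2 are read.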
Table 3. Autocorrelation (AC) values and Ljung and Box (1978) statistics and probabilities for difference time series of residual probability.

| Variable | AC(1) | Q(1) | P(1) | AC(5) | Q(5) | P(5) | AC(10) | Q(10) | P(10) | AC(20) | Q(20) | P(20) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $q_t^{(a)}$ | 0.001 | 0.0005 | 0.982 | −0.002 | 0.0236 | 1.000 | 0.009 | 1.0563 | 1.000 | −0.040 | 18.650 | 0.545 |
| $d q_t^{(a)}$ | −0.500 | 381.33 | 0.000 | −0.003 | 381.36 | 0.000 | −0.016 | 382.37 | 0.000 | −0.022 | 406.78 | 0.000 |
| $d^2 q_t^{(a)}$ | −0.667 | 678.44 | 0.000 | −0.005 | 721.33 | 0.000 | −0.027 | 723.33 | 0.000 | −0.017 | 759.94 | 0.000 |
| $d^3 q_t^{(a)}$ | −0.750 | 858.30 | 0.000 | −0.005 | 1000.9 | 0.000 | −0.033 | 1004.0 | 0.000 | −0.019 | 1053.7 | 0.000 |
| $d^4 q_t^{(a)}$ | −0.800 | 976.19 | 0.000 | −0.005 | 1243.9 | 0.000 | −0.039 | 1248.2 | 0.000 | −0.023 | 1310.4 | 0.000 |
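The lag-1 autocorrelations of the differenced series in Table 3 (−0.500, −0.667, −0.750, −0.800) are close to what differencing an uncorrelated series produces in theory: the $n$th-order difference of white noise is a moving average with binomial weights, whose lag-1 autocorrelation is $-n/(n+1)$. The sketch below, with hypothetical helper names, computes the difference weights and this theoretical value. It illustrates the arithmetic only and is not the authors' estimation code.

```python
from math import comb


def diff_weights(n):
    """Weights of the n-th order backward difference: (-1)^k * C(n, k)."""
    return [(-1) ** k * comb(n, k) for k in range(n + 1)]


def finite_difference(x, n):
    """Apply the n-th order backward difference to the sequence x."""
    w = diff_weights(n)
    return [sum(w[k] * x[t - k] for k in range(n + 1)) for t in range(n, len(x))]


def theoretical_ac1(n):
    """Lag-1 autocorrelation of the n-th difference of white noise."""
    w = diff_weights(n)
    gamma0 = sum(c * c for c in w)                    # variance term
    gamma1 = sum(w[k] * w[k + 1] for k in range(n))   # lag-1 covariance term
    return gamma1 / gamma0


for n in (1, 2, 3, 4):
    print(n, round(theoretical_ac1(n), 3))
```

For example, the second-order difference uses the weights 1, −2, 1, so that $d^2 q_t = q_t - 2 q_{t-1} + q_{t-2}$.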
Table 4. Results of the second-order finite difference regression models of AR-FD when the lags of the probability are different.

Columns $\omega$ to SIC describe the prediction model for the second-order difference $d^2 q_t^{(a)}$.

| No. | $\omega$ | $\alpha_0$ | $\alpha_1$ | $p$ | $R^2$ | S.E. | AIC | SIC | $r_{a,t}$ | Correlation $\rho(r_{a,t}, r_t)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.503001 | −1.006001 | −0.984222 | 3 | 0.833302 | 0.002388 | −9.233150 | −9.212135 | $r_{a,t \mid p=3}$ | 0.136766 |
| 2 | 0.699362 | −1.398724 | 10.57640 | 50 | 0.839955 | 0.002381 | −9.207139 | −9.016722 | $r_{a,t \mid p=50}$ | 0.238128 |
| 3 | 0.800919 | −1.601812 | 25.15103 | 100 | 0.844830 | 0.002335 | −9.212454 | −8.831902 | $r_{a,t \mid p=100}$ | 0.294749 |
| 4 | 0.908183 | −1.816307 | 47.93676 | 150 | 0.851597 | 0.002329 | 9.181926 | −8.600049 | $r_{a,t \mid p=150}$ | 0.341969 |
| 5 | 1.072259 | −2.144498 | 101.5909 | 200 | 0.856684 | 0.002350 | −9.128677 | −8.333173 | $r_{a,t \mid p=200}$ | 0.389903 |
| 6 | 1.006904 | −2.013859 | 38.47244 | 300 | 0.872071 | 0.002402 | −9.014493 | −7.749547 | $r_{a,t \mid p=300}$ | 0.486086 |
| 7 | 0.657598 | −1.315164 | −72.80206 | 400 | 0.888580 | 0.002247 | −9.085781 | −7.284239 | $r_{a,t \mid p=400}$ | 0.578318 |
| 8 | 1.028349 | −2.056674 | 245.4824 | 500 | 0.899775 | 0.002524 | −9.093179 | −6.670786 | $r_{a,t \mid p=500}$ | 0.651674 |
| 9 | 1.495949 | −2.991754 | 809.0302 | 600 | 0.924362 | 0.002325 | −9.042018 | −5.890813 | $r_{a,t \mid p=600}$ | 0.745966 |
| 10 | 1.775340 | −3.550546 | 1191.350 | 700 | 0.957954 | 0.002693 | −9.208468 | −5.186548 | $r_{a,t \mid p=700}$ | 0.867847 |
Table 5. Results of the second-order finite difference regression models of AR-GARCH-FD when the lags of the probability are different.

Columns $\omega$ to SIC describe the prediction model for the second-order difference $d^2 q_{a,t}^{(e)}$.

| No. | $\omega$ | $\alpha_0$ | $\alpha_1$ | $p$ | $R^2$ | S.E. | AIC | SIC | $r_{a,t}^{(e)}$ | Correlation $\rho(r_{a,t}^{(e)}, r_t)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.469779 | −0.932678 | −1.158642 | 3 | 0.833870 | 0.200776 | −0.369317 | −0.348303 | $r_{a,t \mid p=3}^{(e)}$ | 0.119018 |
| 2 | 0.589021 | −1.170411 | 4.081024 | 50 | 0.837518 | 0.202060 | −0.325208 | −0.134791 | $r_{a,t \mid p=50}^{(e)}$ | 0.209055 |
| 3 | 0.628908 | −1.248931 | 11.11614 | 100 | 0.843600 | 0.200978 | −0.301658 | 0.078894 | $r_{a,t \mid p=100}^{(e)}$ | 0.268237 |
| 4 | 0.581813 | −1.153323 | 3.988102 | 150 | 0.849075 | 0.202184 | −0.254624 | 0.327253 | $r_{a,t \mid p=150}^{(e)}$ | 0.315291 |
| 5 | 0.731301 | −1.453786 | 52.43478 | 200 | 0.853597 | 0.203925 | −0.201919 | 0.593585 | $r_{a,t \mid p=200}^{(e)}$ | 0.367438 |
| 6 | 0.760529 | −1.516576 | 54.26585 | 300 | 0.869262 | 0.206495 | −0.106405 | 1.158541 | $r_{a,t \mid p=300}^{(e)}$ | 0.472224 |
| 7 | 0.821250 | −1.628421 | 172.4720 | 400 | 0.887668 | 0.206334 | −0.045567 | 1.755975 | $r_{a,t \mid p=400}^{(e)}$ | 0.552860 |
| 8 | 0.999699 | −1.986074 | 296.8563 | 500 | 0.905421 | 0.210893 | 0.031770 | 2.454163 | $r_{a,t \mid p=500}^{(e)}$ | 0.640771 |
| 9 | 1.493677 | −2.955449 | 783.0487 | 600 | 0.930127 | 0.220599 | 0.062982 | 3.214187 | $r_{a,t \mid p=600}^{(e)}$ | 0.701112 |
| 10 | 1.459492 | −2.890825 | 827.5364 | 700 | 0.962585 | 0.248488 | −0.158915 | 3.863005 | $r_{a,t \mid p=700}^{(e)}$ | 0.847974 |
Table 6. Autocorrelation (AC) values and probabilities of Ljung and Box (1978) statistics for the cumulative return gap index.

| Variable | AC(1) | P(1) | AC(5) | P(5) | AC(10) | P(10) | AC(15) | P(15) | AC(20) | P(20) | AC(25) | P(25) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $r_t^{(c,gap)}$ | 0.988 | 0.000 | 0.942 | 0.000 | 0.897 | 0.000 | 0.850 | 0.000 | 0.808 | 0.000 | 0.769 | 0.000 |
Table 7. Results of the second-order finite difference regression models of ARDL-CRG-FD when the lags of the probability are different.

Columns $\omega$ to SIC describe the prediction model for the second-order difference $d^2 q_t^{(b)}$.

| No. | $\omega$ | $\alpha_0$ | $\alpha_1$ | $p$ | $R^2$ | S.E. | AIC | SIC | $r_{b,t}$ | Correlation $\rho(r_{b,t}, r_t)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.563364 | −1.126737 | −0.675543 | 3 | 0.842668 | 0.003118 | −8.699228 | −8.678259 | $r_{b,t \mid p=3}$ | 0.161486 |
| 2 | 0.538496 | −1.077017 | −0.410510 | 50 | 0.847195 | 0.003144 | −8.651569 | −8.461571 | $r_{b,t \mid p=50}$ | 0.242633 |
| 3 | 0.421591 | −0.843178 | −20.90050 | 100 | 0.852919 | 0.003136 | −8.622071 | −8.242382 | $r_{b,t \mid p=100}$ | 0.296988 |
| 4 | 0.373183 | −0.746334 | −35.27877 | 150 | 0.860050 | 0.003148 | −8.579957 | −7.999447 | $r_{b,t \mid p=150}$ | 0.344799 |
| 5 | 0.398982 | −0.797960 | −15.82828 | 200 | 0.863607 | 0.003214 | −8.503039 | −7.709469 | $r_{b,t \mid p=200}$ | 0.382242 |
| 6 | 0.343252 | −0.686537 | −28.55671 | 300 | 0.877231 | 0.003315 | −8.370555 | −7.108924 | $r_{b,t \mid p=300}$ | 0.478909 |
| 7 | 0.667269 | −1.334446 | 199.7550 | 400 | 0.891028 | 0.003242 | −8.352795 | −6.556372 | $r_{b,t \mid p=400}$ | 0.584397 |
| 8 | 0.563196 | −1.126326 | 140.5567 | 500 | 0.906935 | 0.003267 | −8.302990 | −5.888115 | $r_{b,t \mid p=500}$ | 0.656670 |
| 9 | 2.112864 | −4.225520 | 1324.022 | 600 | 0.930281 | 0.003485 | −8.230197 | −5.089768 | $r_{b,t \mid p=600}$ | 0.752572 |
| 10 | 4.288409 | −8.576220 | 3052.109 | 700 | 0.961514 | 0.004071 | −8.362576 | −4.355974 | $r_{b,t \mid p=700}$ | 0.873537 |
Table 8. Results of the second-order finite difference ARDL-CRG-GARCH-FD models when the lags of the probability are different.

Columns $\omega$ to SIC describe the prediction model for the second-order difference $d^2 q_{b,t}^{(e)}$.

| No. | $\omega$ | $\alpha_0$ | $\alpha_1$ | $p$ | $R^2$ | S.E. | AIC | SIC | $r_{b,t}^{(e)}$ | Correlation $\rho(r_{b,t}^{(e)}, r_t)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.534293 | −1.060669 | −0.820323 | 3 | 0.842384 | 0.201461 | −0.362523 | −0.341564 | $r_{b,t \mid p=3}^{(e)}$ | 0.146718 |
| 2 | 0.442926 | −0.881374 | −6.802927 | 50 | 0.845745 | 0.202992 | −0.316119 | −0.126225 | $r_{b,t \mid p=50}^{(e)}$ | 0.220724 |
| 3 | 0.364174 | −0.723608 | −19.37262 | 100 | 0.852091 | 0.202484 | −0.286966 | 0.092508 | $r_{b,t \mid p=100}^{(e)}$ | 0.284660 |
| 4 | 0.300778 | −0.596242 | −37.47253 | 150 | 0.857610 | 0.204075 | −0.236363 | 0.343807 | $r_{b,t \mid p=150}^{(e)}$ | 0.329443 |
| 5 | 0.330035 | −0.655289 | −19.05648 | 200 | 0.861082 | 0.206740 | −0.174969 | 0.618118 | $r_{b,t \mid p=200}^{(e)}$ | 0.368404 |
| 6 | 0.331949 | −0.659683 | 1.016989 | 300 | 0.875154 | 0.210358 | −0.070013 | 1.190792 | $r_{b,t \mid p=300}^{(e)}$ | 0.472701 |
| 7 | 0.589264 | −1.164536 | 161.4154 | 400 | 0.892694 | 0.210504 | −0.006255 | 1.788893 | $r_{b,t \mid p=400}^{(e)}$ | 0.566134 |
| 8 | 0.504847 | −0.998953 | 135.5047 | 500 | 0.910258 | 0.214921 | 0.069515 | 2.482519 | $r_{b,t \mid p=500}^{(e)}$ | 0.646393 |
| 9 | 2.024880 | −3.998776 | 1122.926 | 600 | 0.933839 | 0.224660 | 0.102500 | 3.240248 | $r_{b,t \mid p=600}^{(e)}$ | 0.732672 |
| 10 | 3.727738 | −7.356259 | 2310.739 | 700 | 0.965393 | 0.250051 | −0.122220 | 3.880572 | $r_{b,t \mid p=700}^{(e)}$ | 0.840273 |
Table 9. Correlations between the real return index and its prediction values from four different kinds of prediction models.

| Lag $p$ | $\rho(r_{a,t \mid p}, r_t)$ | $\rho(r_{b,t \mid p}, r_t)$ | $\rho(r_{a,t \mid p}^{(e)}, r_t)$ | $\rho(r_{b,t \mid p}^{(e)}, r_t)$ |
|---|---|---|---|---|
| 3 | 0.136766 | 0.161486 | 0.119018 | 0.146718 |
| 50 | 0.238128 | 0.242633 | 0.209055 | 0.220724 |
| 100 | 0.294749 | 0.296988 | 0.268237 | 0.284660 |
| 150 | 0.341969 | 0.344799 | 0.315291 | 0.329443 |
| 200 | 0.389903 | 0.382242 | 0.367438 | 0.368404 |
| 300 | 0.486086 | 0.478909 | 0.472224 | 0.472701 |
| 400 | 0.578318 | 0.584397 | 0.552860 | 0.566134 |
| 500 | 0.651674 | 0.656670 | 0.640771 | 0.646393 |
| 600 | 0.745966 | 0.752572 | 0.701112 | 0.732672 |
| 700 | 0.867847 | 0.873537 | 0.847974 | 0.840273 |
Table 10. Comparison of correlations between the return index and the prediction values from AR, ARDL, AR-GARCH, ARDL-GARCH.

Columns (1) to (4) denote the correlations $\rho(r_{a,t \mid p}, r_t)$, $\rho(r_{b,t \mid p}, r_t)$, $\rho(r_{a,t \mid p}^{(e)}, r_t)$, and $\rho(r_{b,t \mid p}^{(e)}, r_t)$, respectively; the table reports their differences.

| Lag $p$ | (2)–(1) | (4)–(3) | (1)–(3) | (2)–(4) |
|---|---|---|---|---|
| 3 | 0.024720 | 0.027700 | 0.017748 | 0.014768 |
| 50 | 0.004505 | 0.011669 | 0.029073 | 0.021909 |
| 100 | 0.002239 | 0.016423 | 0.026512 | 0.012328 |
| 150 | 0.002830 | 0.014152 | 0.026678 | 0.015356 |
| 200 | −0.007661 | 0.000966 | 0.022465 | 0.013838 |
| 300 | −0.007177 | 0.000477 | 0.013862 | 0.006208 |
| 400 | 0.006079 | 0.013274 | 0.025458 | 0.018263 |
| 500 | 0.004996 | 0.005622 | 0.010903 | 0.010277 |
| 600 | 0.006606 | 0.031560 | 0.044854 | 0.019900 |
| 700 | 0.005690 | −0.007701 | 0.019873 | 0.033264 |
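The difference columns of Table 10 follow from the correlations in Table 9 by subtraction. As a quick arithmetic check on two rows (values copied from Table 9; the helper name `table10_differences` is hypothetical):

```python
# Correlations from Table 9 as (rho_a, rho_b, rho_a_e, rho_b_e), keyed by lag p.
table9 = {
    3: (0.136766, 0.161486, 0.119018, 0.146718),
    700: (0.867847, 0.873537, 0.847974, 0.840273),
}


def table10_differences(rho_a, rho_b, rho_a_e, rho_b_e):
    """Return the columns (2)-(1), (4)-(3), (1)-(3), (2)-(4) of Table 10."""
    return (rho_b - rho_a, rho_b_e - rho_a_e, rho_a - rho_a_e, rho_b - rho_b_e)


for p, rhos in table9.items():
    print(p, [f"{d:.6f}" for d in table10_differences(*rhos)])
```

The printed values can be compared with the first and last rows of Table 10. A positive (2)–(1) means the ARDL-CRG-FD prediction correlates more strongly with $r_t$ than the AR-FD prediction at that lag.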
Table 11. Hit ratios between the real return index and its prediction values of the direct AR-FD model for the return index.

| Hit ratio | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total ratio (1) + (3) | Prediction windows |
|---|---|---|---|---|---|---|
| $\mu_{a,t}$ | 540 (35.39%) | 285 (18.68%) | 266 (17.43%) | 435 (28.51%) | 806 (52.82%) | 1526 |
| $r_{a,t \mid p=3}$ | 536 (35.24%) | 286 (18.80%) | 267 (17.55%) | 432 (28.40%) | 803 (52.79%) | 1521 |
| $r_{a,t \mid p=50}$ | 481 (32.63%) | 312 (21.17%) | 324 (21.98%) | 357 (24.22%) | 805 (54.61%) | 1474 |
| $r_{a,t \mid p=100}$ | 461 (32.37%) | 304 (21.35%) | 321 (22.54%) | 338 (23.74%) | 782 (54.92%) | 1424 |
| $r_{a,t \mid p=150}$ | 455 (33.11%) | 289 (21.03%) | 326 (23.73%) | 304 (22.13%) | 781 (56.84%) | 1374 |
| $r_{a,t \mid p=200}$ | 440 (33.23%) | 270 (20.39%) | 331 (25.00%) | 283 (21.37%) | 771 (58.23%) | 1324 |
| $r_{a,t \mid p=300}$ | 415 (33.91%) | 238 (19.44%) | 354 (28.92%) | 217 (17.73%) | 769 (62.83%) | 1224 |
| $r_{a,t \mid p=400}$ | 413 (36.74%) | 190 (16.90%) | 326 (29.00%) | 195 (17.35%) | 739 (65.75%) | 1124 |
| $r_{a,t \mid p=500}$ | 379 (37.01%) | 168 (16.41%) | 319 (31.15%) | 158 (15.43%) | 698 (68.16%) | 1024 |
| $r_{a,t \mid p=600}$ | 365 (39.50%) | 137 (14.83%) | 294 (31.82%) | 128 (13.85%) | 659 (71.32%) | 924 |
| $r_{a,t \mid p=700}$ | 357 (43.33%) | 94 (11.41%) | 292 (35.44%) | 81 (9.83%) | 649 (78.76%) | 824 |

Note: (1) variable $\chi$ represents each of the variables $\mu_{a,t}$, $r_{a,t \mid p=3}$, …, and $r_{a,t \mid p=700}$; (2) because the lag order levels in the different autoregressive models are different, the sample sizes are different.
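The hit ratios in Table 11 count how often the realized return index and a prediction fall on the same side of 1. A minimal sketch of that tally follows, assuming the four conditions are the joint events of $r_t \ge 1$ or $r_t < 1$ with $\chi \ge 1$ or $\chi < 1$. The function name `hit_ratio` and the sample values are hypothetical.

```python
def hit_ratio(actual, predicted, threshold=1.0):
    """Tally the four joint outcomes and the total hit ratio (1) + (3).

    Outcome 1: actual >= threshold and predicted >= threshold (hit)
    Outcome 2: actual >= threshold and predicted <  threshold (miss)
    Outcome 3: actual <  threshold and predicted <  threshold (hit)
    Outcome 4: actual <  threshold and predicted >= threshold (miss)
    """
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for r, chi in zip(actual, predicted):
        if r >= threshold:
            counts[1 if chi >= threshold else 2] += 1
        else:
            counts[3 if chi < threshold else 4] += 1
    n = sum(counts.values())
    return counts, (counts[1] + counts[3]) / n


# Hypothetical return-index values and predictions, for illustration only.
actual = [1.01, 0.99, 1.02, 0.98, 1.00]
predicted = [1.00, 1.01, 0.99, 0.97, 1.03]
counts, ratio = hit_ratio(actual, predicted)

# Cross-check against the first row of Table 11 (counts 540, 285, 266, 435).
table11_mu = (540 + 266) / (540 + 285 + 266 + 435)
```

The four counts in each row add up to the number of prediction windows, so the total ratio is the share of windows in which the direction of the prediction matched the realized direction.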
Table 12. Hit ratios between the return index and its prediction values of the indirect ARDL-CRG-FD model for the cumulative return index.
Table 12. Hit ratios between the return index and its prediction values of the indirect ARDL-CRG-FD model for the cumulative return index.
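| Variable $\chi$ | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) | Prediction windows |
| --- | --- | --- | --- | --- | --- | --- |
| $\mu_{b,t}$ | 510 (33.33%) | 318 (20.78%) | 275 (17.97%) | 427 (27.91%) | 785 (51.31%) | 1530 |
| $r_{b,t} \mid p = 3$ | 491 (32.20%) | 333 (21.84%) | 293 (19.21%) | 408 (26.75%) | 784 (51.41%) | 1525 |
| $r_{b,t} \mid p = 50$ | 471 (31.87%) | 325 (21.99%) | 336 (22.73%) | 346 (23.41%) | 807 (54.60%) | 1478 |
| $r_{b,t} \mid p = 100$ | 466 (32.63%) | 303 (21.22%) | 340 (23.81%) | 319 (22.34%) | 806 (56.44%) | 1428 |
| $r_{b,t} \mid p = 150$ | 455 (33.02%) | 291 (21.12%) | 354 (25.69%) | 278 (20.17%) | 809 (58.71%) | 1378 |
| $r_{b,t} \mid p = 200$ | 432 (32.53%) | 280 (21.08%) | 345 (25.98%) | 271 (20.41%) | 777 (58.51%) | 1328 |
| $r_{b,t} \mid p = 300$ | 411 (33.47%) | 245 (19.95%) | 351 (28.58%) | 221 (18.00%) | 762 (62.05%) | 1228 |
| $r_{b,t} \mid p = 400$ | 405 (35.90%) | 199 (17.64%) | 339 (30.05%) | 185 (16.40%) | 744 (65.96%) | 1128 |
| $r_{b,t} \mid p = 500$ | 380 (36.96%) | 169 (16.44%) | 328 (31.91%) | 151 (14.69%) | 708 (68.87%) | 1028 |
| $r_{b,t} \mid p = 600$ | 369 (39.76%) | 133 (14.33%) | 301 (32.44%) | 125 (13.47%) | 670 (72.20%) | 928 |
| $r_{b,t} \mid p = 700$ | 362 (43.72%) | 91 (10.99%) | 298 (35.99%) | 77 (9.30%) | 660 (79.71%) | 828 |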
Note: (1) variable $\chi$ represents each of the variables $\mu_{b,t}$, $r_{b,t} \mid p = 3$, …, and $r_{b,t} \mid p = 700$; (2) because the lag orders differ across the autoregressive models, the sample sizes differ as well.
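The four conditions and the total hit ratio in the table above can be illustrated with a minimal Python sketch. It assumes that both the realized return index and its prediction are compared against a threshold of 1, that cells (1) and (3) count as hits, and that every window lands in exactly one cell. The function name `hit_ratio_cells` and the sample values are hypothetical and are not taken from the study.

```python
def hit_ratio_cells(actual, predicted, threshold=1.0):
    """Sort (actual, predicted) pairs into the four table cells.

    Assumption: a value >= threshold signals an "up" state for both
    the realized return index and its prediction.
    Returns (counts, hit_ratio), where hit_ratio = ((1) + (3)) / N.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one prediction window is required")

    counts = {"(1)": 0, "(2)": 0, "(3)": 0, "(4)": 0}
    for r, x in zip(actual, predicted):
        if r >= threshold and x >= threshold:
            counts["(1)"] += 1   # both at or above the threshold: hit
        elif r >= threshold:
            counts["(2)"] += 1   # actual up, prediction down: miss
        elif x < threshold:
            counts["(3)"] += 1   # both below the threshold: hit
        else:
            counts["(4)"] += 1   # actual down, prediction up: miss

    hit_ratio = (counts["(1)"] + counts["(3)"]) / len(actual)
    return counts, hit_ratio


# Hypothetical toy data, for illustration only.
actual = [1.01, 1.02, 0.99, 0.98, 1.00]
predicted = [1.00, 0.99, 0.97, 1.03, 1.01]
counts, hit = hit_ratio_cells(actual, predicted)
print(counts, hit)
```

The number of prediction windows is simply the sum of the four cell counts, which is why it changes with the lag order.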
Table 13. Comparison of hit ratios between the results from the AR-FD models and the ARDL-CRG-FD models.
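| Variable | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) |
| --- | --- | --- | --- | --- | --- |
| $\mu_{b,t} - \mu_{a,t}$ | −2.06% | 2.10% | 0.54% | −0.60% | −1.51% |
| $r_{b,t} - r_{a,t} \mid p = 3$ | −3.04% | 3.04% | 1.66% | −1.65% | −1.38% |
| $r_{b,t} - r_{a,t} \mid p = 50$ | −0.76% | 0.82% | 0.75% | −0.81% | −0.01% |
| $r_{b,t} - r_{a,t} \mid p = 100$ | 0.26% | −0.13% | 1.27% | −1.40% | 1.52% |
| $r_{b,t} - r_{a,t} \mid p = 150$ | −0.09% | 0.09% | 1.96% | −1.96% | 1.87% |
| $r_{b,t} - r_{a,t} \mid p = 200$ | −0.70% | 0.69% | 0.98% | −0.96% | 0.28% |
| $r_{b,t} - r_{a,t} \mid p = 300$ | −0.44% | 0.51% | −0.34% | 0.27% | −0.78% |
| $r_{b,t} - r_{a,t} \mid p = 400$ | −0.84% | 0.74% | 1.05% | −0.95% | 0.21% |
| $r_{b,t} - r_{a,t} \mid p = 500$ | −0.05% | 0.03% | 0.76% | −0.74% | 0.71% |
| $r_{b,t} - r_{a,t} \mid p = 600$ | 0.26% | −0.50% | 0.62% | −0.38% | 0.88% |
| $r_{b,t} - r_{a,t} \mid p = 700$ | 0.39% | −0.42% | 0.55% | −0.53% | 0.95% |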
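The comparison rows are percentage-point gaps between two models' cell shares. The following Python sketch recomputes the first row from the published cell counts for $\mu_{a,t}$ (Table 11) and $\mu_{b,t}$ (Table 12). It assumes that each share is rounded to two decimals before subtraction; this matches the first row, but it is an inference from the numbers rather than something the article states. The helper names `pct_shares` and `pct_point_gap` are hypothetical.

```python
def pct_shares(counts):
    """Shares of the four cells in percent, rounded to two decimals."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]


def pct_point_gap(counts_b, counts_a):
    """Percentage-point gap (model b minus model a) per cell and for hits.

    Assumption: gaps are taken between already-rounded percentages.
    counts_* are [cell (1), cell (2), cell (3), cell (4)].
    """
    shares_b = pct_shares(counts_b)
    shares_a = pct_shares(counts_a)
    cell_gaps = [round(b - a, 2) for b, a in zip(shares_b, shares_a)]

    # Total hit ratio uses cells (1) and (3).
    hit_b = round(100.0 * (counts_b[0] + counts_b[2]) / sum(counts_b), 2)
    hit_a = round(100.0 * (counts_a[0] + counts_a[2]) / sum(counts_a), 2)
    return cell_gaps, round(hit_b - hit_a, 2)


mu_a = [540, 285, 266, 435]  # cell counts for mu_{a,t}, Table 11
mu_b = [510, 318, 275, 427]  # cell counts for mu_{b,t}, Table 12
print(pct_point_gap(mu_b, mu_a))
```

Subtracting the unrounded shares instead would give 2.11 rather than 2.10 for cell (2), so small last-digit differences between recomputed and published gaps are to be expected.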
Table 14. Hit ratios between the real return index and its prediction values of the AR-GARCH-FD model for the return index.
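| Variable $\chi$ | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) | Prediction windows |
| --- | --- | --- | --- | --- | --- | --- |
| $\mu_{a,t}$ | 540 (35.39%) | 285 (18.68%) | 266 (17.43%) | 435 (28.51%) | 806 (52.82%) | 1526 |
| $r_{a,t}(e) \mid p = 3$ | 598 (39.32%) | 224 (14.73%) | 193 (12.69%) | 506 (33.27%) | 791 (52.01%) | 1521 |
| $r_{a,t}(e) \mid p = 50$ | 545 (36.97%) | 248 (16.82%) | 291 (19.74%) | 390 (26.46%) | 836 (56.72%) | 1474 |
| $r_{a,t}(e) \mid p = 100$ | 504 (35.39%) | 261 (18.33%) | 309 (21.70%) | 350 (24.58%) | 813 (57.09%) | 1424 |
| $r_{a,t}(e) \mid p = 150$ | 491 (35.74%) | 253 (18.41%) | 324 (23.58%) | 306 (22.27%) | 815 (59.32%) | 1374 |
| $r_{a,t}(e) \mid p = 200$ | 478 (36.10%) | 232 (17.52%) | 338 (25.53%) | 276 (20.85%) | 816 (61.63%) | 1324 |
| $r_{a,t}(e) \mid p = 300$ | 438 (35.78%) | 215 (17.57%) | 349 (28.51%) | 222 (18.14%) | 787 (64.30%) | 1224 |
| $r_{a,t}(e) \mid p = 400$ | 436 (38.79%) | 167 (14.86%) | 351 (31.23%) | 170 (15.12%) | 787 (70.02%) | 1124 |
| $r_{a,t}(e) \mid p = 500$ | 393 (38.38%) | 154 (15.04%) | 330 (32.23%) | 147 (14.36%) | 723 (70.61%) | 1024 |
| $r_{a,t}(e) \mid p = 600$ | 387 (41.88%) | 115 (12.45%) | 316 (34.20%) | 106 (11.47%) | 703 (76.08%) | 924 |
| $r_{a,t}(e) \mid p = 700$ | 387 (46.97%) | 64 (7.77%) | 322 (39.08%) | 51 (6.19%) | 709 (86.04%) | 824 |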
Note: (1) variable $\chi$ represents each of the variables $\mu_{a,t}$, $r_{a,t}(e) \mid p = 3$, …, and $r_{a,t}(e) \mid p = 700$; (2) because the lag orders differ across the autoregressive models, the sample sizes differ as well.
Table 15. Hit ratios between the return index and its prediction values of the ARDL-CRG-GARCH-FD model for the cumulative return index.
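| Variable $\chi$ | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) | Prediction windows |
| --- | --- | --- | --- | --- | --- | --- |
| $\mu_{b,t}$ | 510 (33.33%) | 318 (20.78%) | 275 (17.97%) | 427 (27.91%) | 785 (51.31%) | 1530 |
| $r_{b,t}(e) \mid p = 3$ | 527 (34.53%) | 298 (19.53%) | 252 (16.51%) | 449 (29.42%) | 779 (51.05%) | 1526 |
| $r_{b,t}(e) \mid p = 50$ | 528 (35.70%) | 269 (18.19%) | 291 (19.68%) | 391 (26.44%) | 819 (55.38%) | 1479 |
| $r_{b,t}(e) \mid p = 100$ | 504 (35.27%) | 265 (18.54%) | 318 (22.25%) | 342 (23.93%) | 822 (57.52%) | 1429 |
| $r_{b,t}(e) \mid p = 150$ | 492 (35.68%) | 254 (18.42%) | 329 (23.86%) | 304 (22.04%) | 821 (59.54%) | 1379 |
| $r_{b,t}(e) \mid p = 200$ | 476 (35.82%) | 237 (17.83%) | 340 (25.58%) | 276 (20.77%) | 816 (61.40%) | 1329 |
| $r_{b,t}(e) \mid p = 300$ | 443 (36.05%) | 214 (17.41%) | 344 (27.99%) | 228 (18.55%) | 787 (64.04%) | 1229 |
| $r_{b,t}(e) \mid p = 400$ | 430 (38.09%) | 175 (15.50%) | 345 (30.56%) | 179 (15.85%) | 775 (68.64%) | 1129 |
| $r_{b,t}(e) \mid p = 500$ | 403 (39.16%) | 146 (14.19%) | 333 (32.36%) | 147 (14.29%) | 736 (71.53%) | 1029 |
| $r_{b,t}(e) \mid p = 600$ | 394 (42.41%) | 109 (11.73%) | 320 (34.45%) | 106 (11.41%) | 714 (76.86%) | 929 |
| $r_{b,t}(e) \mid p = 700$ | 387 (46.68%) | 66 (7.96%) | 319 (38.48%) | 57 (6.88%) | 706 (85.16%) | 829 |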
Note: (1) variable $\chi$ represents each of the variables $\mu_{b,t}$, $r_{b,t}(e) \mid p = 3$, …, and $r_{b,t}(e) \mid p = 700$; (2) because the lag orders differ across the autoregressive models, the sample sizes differ as well.
Table 16. Comparison of the hit ratios between the results from the AR-GARCH-FD and the ARDL-CRG-GARCH-FD models.
| Variable | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) |
| --- | --- | --- | --- | --- | --- |
| $\mu_{b,t} - \mu_{a,t}$ | −2.06% | 2.10% | 0.54% | −0.60% | −1.51% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 3$ | −4.79% | 4.80% | 3.82% | −3.85% | −0.96% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 50$ | −1.27% | 1.37% | −0.06% | −0.02% | −1.34% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 100$ | −0.12% | 0.21% | 0.55% | −0.65% | 0.43% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 150$ | −0.06% | 0.01% | 0.28% | −0.23% | 0.22% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 200$ | −0.28% | 0.31% | 0.05% | −0.08% | −0.23% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 300$ | 0.27% | −0.16% | −0.52% | 0.41% | −0.26% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 400$ | −0.70% | 0.64% | −0.67% | 0.73% | −1.38% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 500$ | 0.78% | −0.85% | 0.13% | −0.07% | 0.92% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 600$ | 0.53% | −0.72% | 0.25% | −0.06% | 0.78% |
| $r_{b,t}(e) - r_{a,t}(e) \mid p = 700$ | −0.29% | 0.19% | −0.60% | 0.69% | −0.88% |
Table 17. Comparison of hit ratios between the results from the direct AR models and the direct AR-GARCH models.
| Variable | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) |
| --- | --- | --- | --- | --- | --- |
| $\mu_{a,t} - \mu_{a,t}$ | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| $r_{a,t} - r_{a,t}(e) \mid p = 3$ | −4.08% | 4.07% | 4.86% | −4.87% | 0.78% |
| $r_{a,t} - r_{a,t}(e) \mid p = 50$ | −4.34% | 4.35% | 2.24% | −2.24% | −2.11% |
| $r_{a,t} - r_{a,t}(e) \mid p = 100$ | −3.02% | 3.02% | 0.84% | −0.84% | −2.17% |
| $r_{a,t} - r_{a,t}(e) \mid p = 150$ | −2.63% | 2.62% | 0.15% | −0.14% | −2.48% |
| $r_{a,t} - r_{a,t}(e) \mid p = 200$ | −2.87% | 2.87% | −0.53% | 0.52% | −3.40% |
| $r_{a,t} - r_{a,t}(e) \mid p = 300$ | −1.87% | 1.87% | 0.41% | −0.41% | −1.47% |
| $r_{a,t} - r_{a,t}(e) \mid p = 400$ | −2.05% | 2.04% | −2.23% | 2.23% | −4.27% |
| $r_{a,t} - r_{a,t}(e) \mid p = 500$ | −1.37% | 1.37% | −1.08% | 1.07% | −2.45% |
| $r_{a,t} - r_{a,t}(e) \mid p = 600$ | −2.38% | 2.38% | −2.38% | 2.38% | −4.76% |
| $r_{a,t} - r_{a,t}(e) \mid p = 700$ | −3.64% | 3.64% | −3.64% | 3.64% | −7.28% |
Table 18. Comparison of hit ratios between the results from the ARDL-CRG-FD models and the ARDL-CRG-GARCH-FD models.
| Variable | (1) $\{r_t \ge 1\} \cap \{\chi \ge 1\}$ | (2) $\{r_t \ge 1\} \cap \{\chi < 1\}$ | (3) $\{r_t < 1\} \cap \{\chi < 1\}$ | (4) $\{r_t < 1\} \cap \{\chi \ge 1\}$ | Total hit ratio (1) + (3) |
| --- | --- | --- | --- | --- | --- |
| $\mu_{b,t} - \mu_{b,t}$ | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| $r_{b,t} - r_{b,t}(e) \mid p = 3$ | −2.33% | 2.31% | 2.70% | −2.67% | 0.36% |
| $r_{b,t} - r_{b,t}(e) \mid p = 50$ | −3.83% | 3.80% | 3.05% | −3.03% | −0.78% |
| $r_{b,t} - r_{b,t}(e) \mid p = 100$ | −2.64% | 2.68% | 1.56% | −1.59% | −1.08% |
| $r_{b,t} - r_{b,t}(e) \mid p = 150$ | −2.66% | 2.70% | 1.83% | −1.87% | −0.83% |
| $r_{b,t} - r_{b,t}(e) \mid p = 200$ | −3.29% | 3.25% | 0.40% | −0.36% | −2.89% |
| $r_{b,t} - r_{b,t}(e) \mid p = 300$ | −2.58% | 2.54% | 0.59% | −0.55% | −1.99% |
| $r_{b,t} - r_{b,t}(e) \mid p = 400$ | −2.19% | 2.14% | −0.51% | 0.55% | −2.68% |
| $r_{b,t} - r_{b,t}(e) \mid p = 500$ | −2.20% | 2.25% | −0.45% | 0.40% | −2.66% |
| $r_{b,t} - r_{b,t}(e) \mid p = 600$ | −2.65% | 2.60% | −2.01% | 2.06% | −4.66% |
| $r_{b,t} - r_{b,t}(e) \mid p = 700$ | −2.96% | 3.03% | −2.49% | 2.42% | −5.45% |
Table 19. RMSE of the return index prediction values from the AR-FD, AR-GARCH-FD, ARDL-CRM-FD, ARDL-CRM-GARCH-FD models.
| Variable | RMSE | Variable | RMSE | Prediction | RMSE | Prediction | RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $\mu_{a,t}$ | 0.009521 | $\mu_{b,t}$ | 0.009553 | $\mu_{a,t}$ | 0.009521 | $\mu_{b,t}$ | 0.009553 |
| $r_{a,t} \mid p = 3$ | 0.009531 | $r_{b,t} \mid p = 3$ | 0.009489 | $r_{a,t}(e) \mid p = 3$ | 0.009556 | $r_{b,t}(e) \mid p = 3$ | 0.009513 |
| $r_{a,t} \mid p = 50$ | 0.009352 | $r_{b,t} \mid p = 50$ | 0.009332 | $r_{a,t}(e) \mid p = 50$ | 0.009430 | $r_{b,t}(e) \mid p = 50$ | 0.009394 |
| $r_{a,t} \mid p = 100$ | 0.008994 | $r_{b,t} \mid p = 100$ | 0.009018 | $r_{a,t}(e) \mid p = 100$ | 0.009078 | $r_{b,t}(e) \mid p = 100$ | 0.009067 |
| $r_{a,t} \mid p = 150$ | 0.008783 | $r_{b,t} \mid p = 150$ | 0.008780 | $r_{a,t}(e) \mid p = 150$ | 0.008887 | $r_{b,t}(e) \mid p = 150$ | 0.008856 |
| $r_{a,t} \mid p = 200$ | 0.008649 | $r_{b,t} \mid p = 200$ | 0.008685 | $r_{a,t}(e) \mid p = 200$ | 0.008761 | $r_{b,t}(e) \mid p = 200$ | 0.008760 |
| $r_{a,t} \mid p = 300$ | 0.008334 | $r_{b,t} \mid p = 300$ | 0.008364 | $r_{a,t}(e) \mid p = 300$ | 0.008449 | $r_{b,t}(e) \mid p = 300$ | 0.008440 |
| $r_{a,t} \mid p = 400$ | 0.007197 | $r_{b,t} \mid p = 400$ | 0.007227 | $r_{a,t}(e) \mid p = 400$ | 0.007388 | $r_{b,t}(e) \mid p = 400$ | 0.007383 |
| $r_{a,t} \mid p = 500$ | 0.006279 | $r_{b,t} \mid p = 500$ | 0.006235 | $r_{a,t}(e) \mid p = 500$ | 0.006385 | $r_{b,t}(e) \mid p = 500$ | 0.006332 |
| $r_{a,t} \mid p = 600$ | 0.005482 | $r_{b,t} \mid p = 600$ | 0.005415 | $r_{a,t}(e) \mid p = 600$ | 0.005870 | $r_{b,t}(e) \mid p = 600$ | 0.005612 |
| $r_{a,t} \mid p = 700$ | 0.004127 | $r_{b,t} \mid p = 700$ | 0.004063 | $r_{a,t}(e) \mid p = 700$ | 0.004408 | $r_{b,t}(e) \mid p = 700$ | 0.004544 |
Table 20. Comparison between RMSE values from the AR-FD, AR-GARCH-FD, ARDL-CRM-FD, and ARDL-CRM-GARCH-FD models.
| $r_{a,t}$ (1) | $r_{b,t}$ (2) | $r_{a,t}(e)$ (3) | $r_{b,t}(e)$ (4) | RMSE (2)–(1) | RMSE (1)–(3) | RMSE (2)–(4) | RMSE (4)–(3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $\mu_{a,t}$ | $\mu_{b,t}$ | $\mu_{a,t}$ | $\mu_{b,t}$ | 0.000032 | 0.000000 | 0.000000 | 0.000032 |
| $r_{a,t} \mid p = 3$ | $r_{b,t} \mid p = 3$ | $r_{a,t}(e) \mid p = 3$ | $r_{b,t}(e) \mid p = 3$ | −0.000042 | −0.000025 | −0.000024 | −0.000043 |
| $r_{a,t} \mid p = 50$ | $r_{b,t} \mid p = 50$ | $r_{a,t}(e) \mid p = 50$ | $r_{b,t}(e) \mid p = 50$ | −0.000020 | −0.000078 | −0.000062 | −0.000036 |
| $r_{a,t} \mid p = 100$ | $r_{b,t} \mid p = 100$ | $r_{a,t}(e) \mid p = 100$ | $r_{b,t}(e) \mid p = 100$ | 0.000024 | −0.000084 | −0.000049 | −0.000011 |
| $r_{a,t} \mid p = 150$ | $r_{b,t} \mid p = 150$ | $r_{a,t}(e) \mid p = 150$ | $r_{b,t}(e) \mid p = 150$ | −0.000003 | −0.000104 | −0.000076 | −0.000031 |
| $r_{a,t} \mid p = 200$ | $r_{b,t} \mid p = 200$ | $r_{a,t}(e) \mid p = 200$ | $r_{b,t}(e) \mid p = 200$ | 0.000036 | −0.000112 | −0.000075 | −0.000001 |
| $r_{a,t} \mid p = 300$ | $r_{b,t} \mid p = 300$ | $r_{a,t}(e) \mid p = 300$ | $r_{b,t}(e) \mid p = 300$ | 0.000030 | −0.000115 | −0.000076 | −0.000009 |
| $r_{a,t} \mid p = 400$ | $r_{b,t} \mid p = 400$ | $r_{a,t}(e) \mid p = 400$ | $r_{b,t}(e) \mid p = 400$ | 0.000030 | −0.000191 | −0.000156 | −0.000005 |
| $r_{a,t} \mid p = 500$ | $r_{b,t} \mid p = 500$ | $r_{a,t}(e) \mid p = 500$ | $r_{b,t}(e) \mid p = 500$ | −0.000044 | −0.000106 | −0.000097 | −0.000053 |
| $r_{a,t} \mid p = 600$ | $r_{b,t} \mid p = 600$ | $r_{a,t}(e) \mid p = 600$ | $r_{b,t}(e) \mid p = 600$ | −0.000067 | −0.000388 | −0.000197 | −0.000258 |
| $r_{a,t} \mid p = 700$ | $r_{b,t} \mid p = 700$ | $r_{a,t}(e) \mid p = 700$ | $r_{b,t}(e) \mid p = 700$ | −0.000064 | −0.000281 | −0.000481 | 0.000136 |
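As a rough illustration of how the RMSE gaps in Table 20 relate to the RMSE values in Table 19, the Python sketch below defines the conventional root-mean-square error and recomputes the four gap columns for the $p = 3$ row. It assumes the standard RMSE definition (square root of the mean squared prediction error); the helper names `rmse` and `rmse_gaps` are hypothetical, and the sketch does not reproduce the forecasting models themselves.

```python
import math


def rmse(actual, predicted):
    """Conventional root-mean-square error (assumed definition)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def rmse_gaps(r_a, r_b, r_a_e, r_b_e):
    """Gap columns of Table 20: (2)-(1), (1)-(3), (2)-(4), (4)-(3).

    Arguments are the RMSE values of columns (1) r_a, (2) r_b,
    (3) r_a(e) and (4) r_b(e); results are rounded to six decimals.
    """
    return (
        round(r_b - r_a, 6),
        round(r_a - r_a_e, 6),
        round(r_b - r_b_e, 6),
        round(r_b_e - r_a_e, 6),
    )


# RMSE values for p = 3, taken from Table 19.
print(rmse_gaps(0.009531, 0.009489, 0.009556, 0.009513))
```

A negative gap means that the first model in the difference has the lower RMSE. For example, a negative (2)–(1) entry indicates a lower RMSE for $r_{b,t}$ than for $r_{a,t}$ at that lag order.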
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yan, K.; Gupta, R.; Haddad, S. Statistical Analysis Dow Jones Stock Index—Cumulative Return Gap and Finite Difference Method. J. Risk Financial Manag. 2022, 15, 89. https://doi.org/10.3390/jrfm15020089
