How to Promote the Performance of Parametric Volatility Forecasts in the Stock Market? A Neural Networks Approach

This study uses fourteen stock indices as the sample and utilizes eight parametric volatility forecasting models and eight composed volatility forecasting models to explore whether the neural network approach and the settings of leverage effect and non-normal return distribution can promote the performance of volatility forecasting, and which of the sixteen models possesses the best volatility forecasting performance. The eight parametric volatility forecasting models combine the generalized autoregressive conditional heteroskedasticity (GARCH) or GJR-GARCH volatility specification with the normal, Student’s t, skewed Student’s t, and skewed generalized Student’s t distributions. Empirical results show that the composed volatility forecasting approach performs significantly better than the parametric volatility forecasting approach. Furthermore, the GJR-GARCH volatility specification performs better than the GARCH one, whereas the non-normal distributions do not forecast better than the normal distribution. In addition, the GJR-GARCH model combined with both the normal distribution and a neural network approach has the best volatility forecasting performance among the sixteen models. Thus, a neural network approach significantly promotes the performance of volatility forecasting, and the setting of leverage effect improves the performance of volatility forecasting whereas the setting of non-normal distribution does not.


Introduction
Volatility is a statistical measure of the dispersion of returns for a given asset, and it is often measured by either the standard deviation or the variance of the returns of that asset. A higher volatility means that an asset's price can change dramatically over a short time period in either direction, and thus is expected to be less predictable. On the other hand, a lower volatility means that an asset's price does not fluctuate dramatically and tends to be steadier. Hence, volatility can be used to measure the amount of uncertainty or risk related to the size of changes in an asset's price, and it obeys the rule 'the higher the volatility, the riskier the asset'. Because of this property, volatility is widely used in asset allocation [1][2][3], option pricing [4,5], risk management [6][7][8][9] and hedging strategies [10,11]. Thus, how to accurately predict the volatility of an asset is a very important issue in the actual investment process in the financial field. As to volatility forecasting, most of the literature has used the generalized autoregressive conditional heteroskedasticity (GARCH) family of models, a parametric volatility forecasting approach, to predict the volatility of an asset [12][13][14][15][16][17][18][19].
This type of model is widely used because it can capture most common features of financial assets, such as the linear dependence and strong autoregressive conditional heteroskedasticity (ARCH) effect present in the return series, and the volatility clustering and leverage effect that usually exist in the volatility of financial asset return series [8,20,21,22,23] (volatility clustering means that large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes). The Emerging 7 (E7) comprises seven countries: Brazil, Russia, India, China, Mexico, Indonesia and Turkey. They have the highest economic performance in the class of emerging economies, that is, they are the seven biggest emerging countries in terms of economic growth. The eight parametric volatility forecasting models are composed of the GJR-GARCH-SGT model and its seven degenerate models. The eight composed volatility forecasting models are obtained by combining the eight parametric volatility forecasting models with a neural network approach. The preliminary analysis of data includes the descriptive statistics of data and the estimation results of the GJR-GARCH-SGT model. From the empirical results of both the preliminary analysis of data and the performance comparison of the 16 models, I discovered the following phenomena. First, the stock indices in the E7 have higher return and higher risk than those in the G7. Second, all the stock indices in the G7 and E7 have higher risk for the forecasting period than for the overall period because of COVID-19 spreading throughout the world in 2020. Third, the leverage effect is significant in the stock indices in the G7 and E7, especially for the G7. Fourth, the distribution of returns is left-skewed and has a larger and thicker tail than the normal distribution.
Fifth, the performance of the composed volatility forecasting models is significantly superior to that of the parametric volatility forecasting models, and thus the neural network approach can significantly promote the performance of volatility forecasting. Sixth, the performance of the GJR-based models is significantly superior to that of the GARCH-based models, and thus the setting of the leverage effect can significantly improve the performance of volatility forecasting. Seventh, the performance of the models with a non-normal distribution is not superior to that of the models with the normal distribution, and thus the setting of the non-normal return distribution cannot promote the performance of volatility forecasting. Eighth, among the 16 models in this study, the performance of the GJR-GARCH-N-NN model is the best, followed by the GJR-GARCH-N, GJR-GARCH-T-NN and GJR-GARCH-SGT-NN models, and thus the GJR-GARCH model combined with both the normal distribution and a neural network approach has the best performance of volatility forecasting. Finally, for each of the 14 stock indices, the most suitable models are not necessarily the same, but they all possess the setting of leverage effect and are further combined with a neural network approach, and thus these results agree with those obtained from the analysis of the previous issues. In summary, this study uses the stock indices in the developed and emerging markets as the sample and utilizes eight parametric volatility forecasting models and eight composed volatility forecasting models to explore whether the neural network approach can promote the performance of volatility forecasts, whether the settings of the leverage effect and return distribution can improve the performance of volatility forecasts and which one of the 16 models possesses the best volatility forecasting performance. In addition, for each of the fourteen stock indices, this study also investigates which model is the most suitable for it.
The stock indices in the Group of Seven (G7) and Emerging Seven (E7) are used to represent the stock markets in the developed and emerging markets, respectively.
Figure 1. (a) A backpropagation neural network with an input layer including two input nodes (x1 and x2), a hidden layer of three hidden nodes (h1, h2 and h3), and an output layer containing one output node (y). (b) A backpropagation neural network with an input layer including one input node (x), a hidden layer of five hidden nodes (h1, h2, h3, h4 and h5), and an output layer containing one output node (y).
The remainder of this paper is organized as follows. Section 2 describes the empirical models utilized in this study, namely eight parametric volatility forecasting models and eight composed volatility forecasting models, and the two types of loss function used to evaluate these models. Section 3 states the basic statistical features of the return series for the stock indices in the E7 and G7 during the overall period and two sub-periods: the estimation period and the forecasting period. Section 4 analyzes the results of the empirical models and further explores the issues addressed in this study. Finally, Section 5 concludes the findings in Sections 3 and 4.
Entropy 2021, 23, 1151

Methodology
In order to accurately forecast volatility, the selected empirical model should capture the common features of financial assets. For example, the distribution of returns is skewed to the right or left and has a larger and thicker tail than the normal distribution. In other words, the return series is not normally distributed. Moreover, the return series exhibits linear dependence and a strong ARCH effect. In addition, volatility clustering and the leverage effect usually exist in the volatility of financial asset return series [8,20,23]. Hence, the empirical models include a symmetric type of volatility specification, the GARCH, and an asymmetric one, the GJR-GARCH, combined with the normal (N), Student's t (T), skewed Student's t (ST) and skewed generalized Student's t (SGT) distributions, totaling eight different models (the GJR-GARCH model of Glosten et al. [24] is an asymmetric type of GARCH-based model, and it can capture the financial features of volatility clustering and leverage effect. In turn, the asymmetric distributions, skewed Student's t (ST) and skewed generalized Student's t (SGT), can capture the skewness and fat tails of the return distribution). The eight models above can be divided into two categories. The first category includes the GARCH-N, GARCH-T, GARCH-ST and GARCH-SGT models, named the GARCH-based models. The second category consists of the GJR-GARCH-N, GJR-GARCH-T, GJR-GARCH-ST and GJR-GARCH-SGT models, called the GJR-based models. The eight models above are called parametric volatility forecasting models. Notably, this study combines the above parametric volatility forecasting models with a neural network (NN) approach to promote the volatility forecasting performance in the stock market.
Hence, there are an additional eight models, the GARCH-N-NN, GARCH-T-NN, GARCH-ST-NN, GARCH-SGT-NN, GJR-GARCH-N-NN, GJR-GARCH-T-NN, GJR-GARCH-ST-NN and GJR-GARCH-SGT-NN models, which respectively represent the GARCH-N, GARCH-T, GARCH-ST, GARCH-SGT, GJR-GARCH-N, GJR-GARCH-T, GJR-GARCH-ST and GJR-GARCH-SGT models combined with a neural network approach. These eight models are named composed volatility forecasting models.

Parametric Volatility Forecasting Models
Among the eight parametric volatility forecasting models in this study, the GJR-GARCH-SGT model can degenerate into the other seven models. Hence, in this subsection, I mainly illustrate the mean and variance equations of the GJR-GARCH-SGT model and then describe how the GJR-GARCH-SGT model degenerates into the other seven models. The mean and variance equations of the GJR-GARCH-SGT model are expressed as follows:

$$r_t = \mu_t + e_t = \phi_0 + \phi_1 r_{t-1} + e_t, \quad e_t = z_t \sqrt{h_t}, \quad z_t \mid \Omega_{t-1} \sim \mathrm{IID}\ \mathrm{SGT}(0, 1; \kappa, \lambda, n) \tag{1}$$

$$h_t = \omega + \alpha e_{t-1}^2 + \beta h_{t-1} + \eta I_{t-1}^{-} e_{t-1}^2 \tag{2}$$

where $r_t$ represents the return of the stock indices in the emerging and developed markets, with $r_t = (\ln P_t - \ln P_{t-1}) \times 100$. $P_t$ is the close price of the stock index at time $t$ and $e_t$ is the current error. $\mu_t$ and $h_t$ represent the conditional mean and variance of return, respectively. Moreover, $I_{t-1}^{-}$ is an indicator dummy that takes the value of 1 if $e_{t-1} < 0$ and zero otherwise, and thus parameter $\eta$ is used to capture the leverage effect of volatility. Furthermore, $\omega$, $\alpha$ and $\beta$ are the parameters of the variance equation and they obey the constraints $\omega, \alpha, \beta > 0$ and $\beta + \alpha + 0.5\eta < 1$ (according to Example 2.1 of Ling and McAleer [26], the necessary and sufficient condition for the existence of the second moment for all GJR-based models is $\beta + \alpha + 0.5\eta < 1$). Notably, if $\eta = 0$ in Equation (2), then the GJR-GARCH volatility specification degenerates into the GARCH volatility specification. IID denotes that the standardized errors $z_t$ are independent and identically distributed. Here $z_t$ is drawn from the standardized SGT distribution, which allows a flexible treatment of both skewness and excess kurtosis in the conditional distribution of returns.
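The variance recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the estimation code used in the study: it assumes an AR(1) conditional mean (suggested by the parameters φ0 and φ1), a GJR-GARCH(1,1) recursion of the form h_t = ω + α e²_{t−1} + β h_{t−1} + η I⁻_{t−1} e²_{t−1}, and an initial variance equal to the sample variance. The function names and all numeric parameter values are hypothetical.

```python
import math


def log_returns(prices):
    """Percentage log returns r_t = (ln P_t - ln P_{t-1}) * 100."""
    return [(math.log(p1) - math.log(p0)) * 100.0
            for p0, p1 in zip(prices[:-1], prices[1:])]


def gjr_garch_variance(returns, phi0, phi1, omega, alpha, beta, eta):
    """Filter conditional variances h_t of an AR(1)-GJR-GARCH(1,1) model.

    Returns (h, e): variances and errors aligned with returns[1:].
    Setting eta = 0 gives the plain GARCH(1,1) recursion.
    """
    # Assumption: initialise with the sample mean and sample variance.
    mean_r = sum(returns) / len(returns)
    h_prev = sum((r - mean_r) ** 2 for r in returns) / len(returns)
    e_prev = returns[0] - mean_r
    h, e = [], []
    for t in range(1, len(returns)):
        # Indicator is 1 only after a negative shock (leverage effect).
        indicator = 1.0 if e_prev < 0 else 0.0
        h_t = (omega + alpha * e_prev ** 2 + beta * h_prev
               + eta * indicator * e_prev ** 2)
        e_t = returns[t] - (phi0 + phi1 * returns[t - 1])
        h.append(h_t)
        e.append(e_t)
        h_prev, e_prev = h_t, e_t
    return h, e


# Hypothetical prices and parameter values, for illustration only.
prices = [100.0, 98.0, 99.0, 97.0, 101.0]
r = log_returns(prices)
h_gjr, _ = gjr_garch_variance(r, 0.0, 0.0, 0.02, 0.05, 0.90, 0.08)
h_garch, _ = gjr_garch_variance(r, 0.0, 0.0, 0.02, 0.05, 0.90, 0.0)
```

In this toy series the first shock is negative, so the GJR variance for the following day exceeds the GARCH variance by η times the squared shock, which is the leverage effect the parameter η is meant to capture.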
The probability density function for the standardized SGT distribution, Equation (3), is derived in Lee and Su [7] (the standardized SGT distribution, which has zero mean and unit variance, was checked with Mathematica software, and another analogous standardized SGT distribution was proposed by Bali and Theodossiou [27]). In this density, $\kappa$, $n$ and $\lambda$ are the scaling parameters and $C$ and $\theta$ are the normalizing constants ensuring that $f(\cdot)$ is a proper probability density function. The parameters $\kappa$ and $n$ control the height and tails of the density with the constraints $\kappa > 0$ and $n > 2$, respectively. The skewness parameter $\lambda$ controls the rate of descent of the density around the mode of $z_t$ with $-1 < \lambda < 1$. In the case of positive and negative skewness, the density function skews to the right and left, respectively. $B(\cdot)$ is the beta function whereas 'sign' denotes the sign function. The parameter $n$ has the degrees of freedom interpretation in case $\lambda = 0$ and $\kappa = 2$. The log-likelihood function of the GJR-GARCH-SGT model, Equation (4), is expressed in terms of $\Psi = [\phi_0, \phi_1, \omega, \alpha, \beta, \eta, \kappa, \lambda, n]$, the vector of parameters to be estimated, and $\Omega_{t-1}$, the information set of all observed returns up to time $t-1$. Notably, if $\kappa = 2$ in Equation (3), then the standardized SGT distribution degenerates into the standardized ST distribution. By the same reasoning, the standardized SGT distribution degenerates into the standardized Student's t distribution if $\kappa = 2$ and $\lambda = 0$, and it also degenerates into the standardized normal distribution if $\kappa = 2$, $\lambda = 0$ and $n \to \infty$ (regarding the process of the SGT distribution degenerating into the normal, Student's t and skewed Student's t (ST) distributions, please see Lee and Su [7] for more details).
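The nesting relationships stated above (η = 0 for GARCH; κ = 2, λ = 0 and n → ∞ for the distributions) can be collected in a small lookup table. The sketch below only encodes those restrictions; the dictionary layout and the function name are illustrative assumptions and are not part of the original study.

```python
import math

# Restrictions on the SGT shape parameters that yield each nested
# distribution, as stated in the text (kappa = 2, lambda = 0, n -> infinity).
DISTRIBUTION_RESTRICTIONS = {
    "SGT": {},
    "ST": {"kappa": 2.0},
    "T": {"kappa": 2.0, "lambda": 0.0},
    "N": {"kappa": 2.0, "lambda": 0.0, "n": math.inf},
}


def nested_models():
    """Map each of the eight parametric models to the restrictions that
    reduce the GJR-GARCH-SGT model to it (an empty dict means unrestricted)."""
    models = {}
    for vol_name, vol_restriction in (("GJR-GARCH", {}), ("GARCH", {"eta": 0.0})):
        for dist_name, dist_restriction in DISTRIBUTION_RESTRICTIONS.items():
            models[f"{vol_name}-{dist_name}"] = {**vol_restriction, **dist_restriction}
    return models


models = nested_models()
```

For example, `models["GARCH-T"]` lists the three restrictions η = 0, κ = 2 and λ = 0, while `models["GJR-GARCH-SGT"]` is empty because it is the unrestricted model.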

Composed Volatility Forecasting Models
The eight composed volatility forecasting models are the GARCH-N-NN, GARCH-T-NN, GARCH-ST-NN, GARCH-SGT-NN, GJR-GARCH-N-NN, GJR-GARCH-T-NN, GJR-GARCH-ST-NN and GJR-GARCH-SGT-NN models. They are obtained by combining the eight parametric volatility forecasting models with a neural network approach. A neural network (NN) approach is particularly useful for handling complex, non-linear univariate and multivariate relationships that would be difficult to fit by using other techniques. A neural network model is composed of a multilayer perceptron with an interconnected group of nodes. For example, Figure 1a shows a backpropagation neural network with three layers: an input layer including two input nodes ($x_1$ and $x_2$), a hidden layer of three hidden nodes ($h_1$, $h_2$ and $h_3$) and an output layer containing one output node ($y$). On the other hand, Figure 1b displays another backpropagation neural network with an input layer including one input node ($x$), a hidden layer of five hidden nodes ($h_1$, $h_2$, $h_3$, $h_4$ and $h_5$) and an output layer containing one output node ($y$). Notably, the input nodes and the output nodes are analogous to the explanatory variables and the dependent variables in a regression model, respectively. The theory of neural network (NN) models is illustrated as follows (regarding the theory of neural network (NN) models, please see Lu et al. [28] and chapter 12 in the user's guide of RATS version 6 or Doan [29]). I use a vector to represent the nodes of each layer, and the number of elements in a vector equals the number of nodes in that layer. For example, the vectors $X = (x_1, x_2, \ldots, x_d)$ and $Y = (y_1, y_2, \ldots, y_c)$ represent the nodes in the input layer and output layer, respectively. On the other hand, the vector $H = (h_1, h_2, \ldots, h_m)$ denotes the nodes in the hidden layer.
The three-layer perceptron model starts from a weighted linear combination of the $d$ input values from the $d$ input nodes, $X = (x_1, x_2, \ldots, x_d)$, which is expressed as follows:

$$a_j = \sum_{i=1}^{d} w_{ji}^{(1)} x_i \tag{5}$$

The activation of hidden unit $j$ is obtained by transforming the linear sum with a logistic activation function $g(a_j) = 1/(1 + e^{-a_j})$:

$$h_j = g(a_j) = \frac{1}{1 + e^{-a_j}} \tag{6}$$

Thus, the node of the output layer is defined as:

$$y_k = g\left( \sum_{j=1}^{m} w_{jk}^{(2)} h_j \right) \tag{7}$$

If the output function is taken to be linear, $g(a) = a$, the output model reduces to:

$$y_k = \sum_{j=1}^{m} w_{jk}^{(2)} h_j \tag{8}$$

where $w_{ji}^{(1)}$ and $w_{jk}^{(2)}$ are the weights of the hidden node and output node, respectively. Notably, fitting a neural network model involves a training process. The training process is executed by supplying a set of known input and output values, and then allowing the neural network algorithm to adjust the hidden node and output node weights until the output produced by the network matches the actual output in the training sample to the desired degree of accuracy (in this study, the input values are a volatility forecasting series $\hat{h}_t$ obtained by the variance equation of the parametric volatility forecasting models as shown in Equation (2), whereas the output values are the true values of variance, which are proxied by the squared intraday return series ($r_t^2$). In addition, the neural network algorithm in the training process is the 'NNLEARN' instruction in RATS version 6). Once the training process is completed, the model can be used to generate new output data from other sets of inputs (the new output data may be the fitted values for the in-sample volatility forecasts or the forecast values for the out-of-sample volatility forecasts in this study. Notably, the new output data is obtained by the 'NNTEST' instruction in RATS version 6). Assuming that the fit is good, the relationships represented by the sample input and output data generalize to other samples, and thus the model can produce good predictions.
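A forward pass and a training loop for a network of this shape can be sketched as follows. This is a hedged illustration only: it assumes a network without bias terms, a logistic hidden layer, a linear single output, and plain full-batch gradient descent on the mean squared error. It is not the 'NNLEARN' algorithm of RATS, whose internals are not described here, and the function names, learning rate, epoch count and data are hypothetical.

```python
import math
import random


def logistic(a):
    """Logistic activation g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w1, w2):
    """x: d inputs; w1: m rows of d hidden weights; w2: m output weights.
    Returns the linear output y and the hidden activations."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden)), hidden


def train(inputs, targets, n_hidden=5, lr=0.01, epochs=500, seed=0):
    """Fit the weights by full-batch gradient descent on 0.5 * mean squared error."""
    rng = random.Random(seed)
    d = len(inputs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    n = len(inputs)
    for _ in range(epochs):
        g1 = [[0.0] * d for _ in range(n_hidden)]
        g2 = [0.0] * n_hidden
        for x, y in zip(inputs, targets):
            y_hat, hidden = forward(x, w1, w2)
            err = y_hat - y
            for j in range(n_hidden):
                g2[j] += err * hidden[j] / n
                # Chain rule through the logistic function: g'(a) = h * (1 - h).
                slope = err * w2[j] * hidden[j] * (1.0 - hidden[j]) / n
                for i in range(d):
                    g1[j][i] += slope * x[i]
        for j in range(n_hidden):
            w2[j] -= lr * g2[j]
            for i in range(d):
                w1[j][i] -= lr * g1[j][i]
    return w1, w2


# Hypothetical variance forecasts (inputs) and squared returns (targets).
h_hat = [[1.0], [1.5], [2.0], [0.8], [1.2]]
r_squared = [0.9, 2.0, 2.5, 0.5, 1.0]
w1, w2 = train(h_hat, r_squared)
nn_forecasts = [forward(x, w1, w2)[0] for x in h_hat]
```

With one input node and five hidden nodes this matches the shape of the network in Figure 1b; the last line plays the role of generating new output from trained weights.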
I took the GJR-GARCH-SGT-NN model as an example to explain the process of volatility forecasting with a neural network approach. The volatility forecasts of the GJR-GARCH-SGT-NN model are obtained by assigning a volatility forecasting series $\hat{h}_t$ obtained from the GJR-GARCH-SGT model as the input values and using a backpropagation neural network with one input node, five hidden nodes and one output node, as shown in Figure 1b. Figure 2 shows the schematic diagram of the volatility forecasting process of the GJR-GARCH-SGT-NN model, which is illustrated as follows:
Step 1 Fit a return series ($r_t$) to the GJR-GARCH-SGT model (see Equations (1)-(4)). Then, get a forecasted variance series ($\hat{h}_t$) (see Equation (2)).
Step 2 Input the forecasted variance series ($\hat{h}_t$) obtained from step 1 and the true variance series ($r_t^2$) into the 'NNLEARN' function. Then, obtain the weights of the hidden nodes ($w_{ji}^{(1)}$) and the weights of the output node ($w_{jk}^{(2)}$). The 'NNLEARN' function regards $\hat{h}_t$ and $r_t^2$ as the input value and output value of Figure 1b, respectively, and then substitutes them into Equation (8) to execute a training process of the neural network in order to obtain the weights $w_{ji}^{(1)}$ and $w_{jk}^{(2)}$ (in the training process of neural networks, the weights of the hidden nodes and the output node are initially set to a set of initial values, and they are adjusted until the output produced by the network ($\hat{h}_t^{NN}$) matches the actual output ($r_t^2$) in the training sample to the desired degree of accuracy. The training is then considered complete and the weights $w_{ji}^{(1)}$ and $w_{jk}^{(2)}$ are obtained). Figure 1b is the structure of a backpropagation neural network with one node in the input layer, five nodes in the hidden layer and one node in the output layer.
Step 3 Input the forecasted variance series ($\hat{h}_t$) obtained from step 1, together with the weights of the hidden nodes ($w_{ji}^{(1)}$) and the weights of the output node ($w_{jk}^{(2)}$) obtained from step 2, into the 'NNTEST' function. Then, obtain the forecasted variance series for a neural network approach ($\hat{h}_t^{NN}$). That is, the 'NNTEST' function substitutes the input value $\hat{h}_t$ and the weights of the hidden nodes and output node ($w_{ji}^{(1)}$ and $w_{jk}^{(2)}$) into Equation (8), and then the output value $\hat{h}_t^{NN}$ is obtained.
Step 4 Given the forecasted variance series for a neural network approach ($\hat{h}_t^{NN}$) and the true variance series ($r_t^2$), calculate the values of the loss functions, MAE and RMSE (see Equations (9) and (10)).

The models are compared within four categories. The first category pairs each parametric model with its composed counterpart (for example, the GARCH-N and GARCH-N-NN models). The second category pairs each GARCH-based model with the GJR-based model sharing the same distribution (for example, the GARCH-N and GJR-GARCH-N models). The third category groups the models that differ only in the return distribution (for example, the GARCH-N, GARCH-T, GARCH-ST and GARCH-SGT models). The fourth category of model is composed of all sixteen models in this study. Hence, the performance comparison of volatility forecasting for the first category of model can explore whether the neural network approach can promote the performance of volatility forecasting. In turn, the performance comparisons for the second and third categories of model can investigate whether the settings of the leverage effect and return distribution, respectively, can improve the performance of volatility forecasts. In addition, the performance comparison for the fourth category of model can explore which one of the 16 models possesses the best volatility forecasting performance. The two types of loss function are the mean absolute error (MAE) and root mean squared error (RMSE). The MAE measures the average magnitude of the errors in a set of forecasts, without considering their direction. The MAE for the out-of-sample volatility forecasting can be evaluated by the following equation:

$$\mathrm{MAE} = \frac{1}{T} \sum_{i=1}^{T} |\varepsilon_i| = \frac{1}{T} \sum_{i=1}^{T} \left| \hat{h}_{t+i|t+i-1} - h_{t+i} \right| \tag{9}$$

where $\varepsilon_i$ denotes the forecast error; $\hat{h}_{t+i|t+i-1}$ is the one-step-ahead forecast of the variance of the returns conditional on all information up to time $t+i-1$ and can be estimated by one of the sixteen models in this study; $h_{t+i}$ is the true value of variance, which is proxied by the squared intraday returns; $T$ is the number of 1-day-ahead variance forecasts and is equal to 250 in this study. Moreover, the RMSE for the out-of-sample volatility forecasting can be obtained by the following equation (the MAE and RMSE for in-sample volatility forecasting can be evaluated by the equations $\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} |\hat{h}_t - h_t|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{h}_t - h_t)^2}$, where $T$ is the total number of observations for the overall period and is equal to 3501 in this study):

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} \varepsilon_i^2} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} \left( \hat{h}_{t+i|t+i-1} - h_{t+i} \right)^2} \tag{10}$$

where $\varepsilon_i$, $\hat{h}_{t+i|t+i-1}$, $h_{t+i}$ and $T$ are defined the same as for Equation (9). Hence, the MAE and RMSE measure the differences between the values predicted by a model and the values actually observed.
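The two loss functions are straightforward to compute once the forecasts and the variance proxy are available. The following sketch assumes two equal-length lists, one of variance forecasts and one of squared returns used as the proxy for the true variance; the function names and the sample numbers are hypothetical.

```python
import math


def mae(forecasts, actuals):
    """Mean absolute error: average of |forecast - actual|."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


def rmse(forecasts, actuals):
    """Root mean squared error: square root of the average squared error."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(
        sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(forecasts)
    )


# Hypothetical variance forecasts and squared-return proxies.
variance_forecasts = [1.0, 2.0, 3.0]
squared_returns = [1.0, 1.0, 5.0]
example_mae = mae(variance_forecasts, squared_returns)    # (0 + 1 + 2) / 3
example_rmse = rmse(variance_forecasts, squared_returns)  # sqrt((0 + 1 + 4) / 3)
```

Because the RMSE squares the errors before averaging, it penalizes large misses (such as a volatility spike) more heavily than the MAE, which is one reason to report both.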

Data and Descriptive Statistics
This study uses the stock indices in the developed and emerging markets as the sample to explore the issue of 'how to promote the performance of parametric volatility forecasting models? A neural networks approach'. I used the stock indices in the G7 and E7 to represent the stock markets in the developed and emerging markets, respectively. The stock indices in the G7 include the Dow Jones (DJ), TSX, FTSE, CAC40, DAX, MIB and N225, respectively corresponding to the United States, Canada, the United Kingdom, France, Germany, Italy and Japan. The stock indices in the E7 are the BVSP, RTSI, BSE, SSE, MXX, JKSE and XU100, respectively corresponding to Brazil, Russia, India, China, Mexico, Indonesia and Turkey. Table 1 reports basic descriptive statistics of the daily returns of the stock indices in the G7 and E7 during the overall period and two sub-periods. The overall period starts on 10 January 2001 and ends on 27 July 2020 and is used to perform the in-sample volatility forecasting. The overall period is divided into two sub-periods, the estimation period and the forecasting period, to execute the out-of-sample volatility forecasting. The estimation period is from 10 January 2001 to 8 April 2019 whereas the forecasting period is from 9 April 2019 to 27 July 2020. Notably, all the daily close price data of the 14 stock indices were obtained from the Yahoo finance website. As shown in panel A of Table 1, the mean values of the G7 are between −0.0215 (MIB) and 0.0263 (DJ) whereas those of the E7 range from 0.0254 (SSE) to 0.0717 (JKSE). Similarly, the values of standard deviation for the G7 are between 1.2631 (TSX) and 1.8306 (MIB) whereas those for the E7 range from 1.4361 (MXX) to 2.3155 (RTSI).
These results indicate that during the overall period, the stock indices in the E7 have higher return and higher risk than those in the G7 (the reason is that the maximum value of mean for the E7 (0.0717) is greater than that for the G7 (0.0263) and the minimum value of mean for the E7 (0.0254) is also greater than that for the G7 (−0.0215). Likewise, the maximum value of standard deviation for the E7 (2.3155) is greater than that for the G7 (1.8306) and the minimum value of standard deviation for the E7 (1.4361) is also greater than that for the G7 (1.2631)). The above finding is consistent with that found in Su [9]. As illustrated in panel B and panel C of Table 1, the stock indices in the G7 and E7 for the estimation and forecasting periods show the same phenomena found for the overall period. That is, the stock indices in the E7 have higher return and higher risk than those in the G7. Notably, regarding the forecasting period, the values of standard deviation for the G7 are between 1.7269 (N225) and 2.3058 (MIB) whereas those for the E7 range from 1.6095 (MXX) to 3.1898 (SSE). These results indicate that the stock indices in the G7 and E7 have higher risk for the forecasting period than for the overall period (the reason is, regarding the G7, the maximum value of standard deviation for the forecasting period (2.3058) is greater than that for the overall period (1.8306) and the minimum value of standard deviation for the forecasting period (1.7269) is greater than that for the overall period (1.2631). Similarly, regarding the E7, the maximum value of standard deviation for the forecasting period (3.1898) is greater than that for the overall period (2.3155) and the minimum value of standard deviation for the forecasting period (1.6095) is greater than that for the overall period (1.4361)). This phenomenon is attributed to COVID-19 spreading throughout the world in 2020.
Figure 3 illustrates the trend of price levels and the variation of return for the 14 stock indices during the overall period. From Figure 3, I also discovered that in the last year of the sample, the prices of the stock indices underwent a severe decline and their returns experienced a serious variation, indicating that high risk appears in the forecasting period. This phenomenon is the same as that found from the above analysis. In addition, volatility clustering occurs significantly in the overall period.
Notes to Table 1: 4. Q²(20) statistics are asymptotically chi-squared-distributed with 20 degrees of freedom. 5. Obs. denotes the number of observations. 6. The overall period is from 10 January 2001 to 27 July 2020; the estimation period is from 10 January 2001 to 8 April 2019 whereas the forecasting period is from 9 April 2019 to 27 July 2020. 7. Bold font in column 'Mean' (resp. 'SD') of each panel denotes the largest value when all seven numbers in column 'Mean' (resp. 'SD') corresponding to the G7 or E7 are compared with each other. 8. Underline font in column 'Mean' (resp. 'SD') of each panel denotes the smallest value when all seven numbers in column 'Mean' (resp. 'SD') corresponding to the G7 or E7 are compared with each other.
Finally, the other descriptive statistics show the same features as those for most financial return series. For example, the distribution of returns is left-skewed and has a larger and thicker tail than the normal distribution, indicating that the return series is not normally distributed. The above results are found from the coefficient of skewness, excess kurtosis and the J-B normality test statistics [30]. In addition, the return series exhibit linear dependence and a strong ARCH effect as shown by the Ljung-Box Q²(20) statistics for the squared returns. From the above findings, a GARCH family model is very suitable for capturing the fat tails and time-varying volatility found in these asset return series.
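The Ljung-Box statistic on squared returns, used above as evidence of an ARCH effect, can be computed directly from the sample autocorrelations as Q = n(n + 2) Σ ρ_k² / (n − k) for lags k = 1, ..., 20. The sketch below is a plain-Python illustration on a hypothetical return series with two volatility regimes; the function name and the data are assumptions, and 31.410 is the 5% critical value of the chi-squared distribution with 20 degrees of freedom.

```python
def ljung_box(series, lags=20):
    """Ljung-Box Q statistic: n * (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# 5% critical value of the chi-squared distribution with 20 degrees of freedom.
CHI2_20_CRIT_5PCT = 31.410

# Hypothetical returns: a calm regime followed by a turbulent regime.
returns = [0.1 * (-1) ** t for t in range(100)] + [3.0 * (-1) ** t for t in range(100)]
squared = [r * r for r in returns]
q_squared = ljung_box(squared, lags=20)
arch_effect_detected = q_squared > CHI2_20_CRIT_5PCT
```

In this toy series the squared returns are strongly autocorrelated because large values cluster together, so the statistic far exceeds the critical value, which is the pattern of volatility clustering described in the text.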

Empirical Results
As described in Section 2, the empirical models in this study can be divided into two categories: the parametric volatility forecasting models and the composed volatility forecasting models. The composed volatility forecasting models are the parametric volatility forecasting models combined with a neural network approach. The parametric volatility forecasting models include the GARCH-N, GARCH-T, GARCH-ST, GARCH-SGT, GJR-GARCH-N, GJR-GARCH-T, GJR-GARCH-ST and GJR-GARCH-SGT models. Among these eight parametric volatility forecasting models, the GJR-GARCH-SGT model is the most flexible because it can capture most of the common features of financial assets and can degenerate into the other seven models under some restrictions. Hence, I report some financial features of the 14 stock indices by using the empirical results of the GJR-GARCH-SGT model. Table 2 illustrates the empirical results of the GJR-GARCH-SGT model for the stock indices in the G7 and E7. As shown in Table 2, parameters ω, α and β are significantly positive for most stock indices and all stock indices obey the constraint β + α + 0.5η < 1, as reported by the numbers listed in row 'C' of Table 2. For example, the value of β + α + 0.5η for BVSP is equal to 0.9756, which is less than 1, because the values of parameters β, α and η are equal to 0.9197, 0.0143 and 0.0832, respectively. Moreover, parameter η is significantly positive at the 1% level for all stock indices. The values of parameter η for the stock indices in the G7 are at least 0.1421 (TSX) whereas those for the stock indices in the E7 are at most 0.0873 (MXX) except for BSE. These results indicate that the leverage effect exists significantly in the stock indices in the G7 and E7, especially for the G7, because the values of parameter η for the G7 are greater than those for the E7.
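The arithmetic behind row 'C' can be checked directly. The snippet below recomputes β + α + 0.5η for BVSP from the three parameter values quoted in the text; the function name is a hypothetical helper.

```python
def gjr_second_moment_value(alpha, beta, eta):
    """Value of beta + alpha + 0.5 * eta; the second moment exists if it is below 1."""
    return beta + alpha + 0.5 * eta


# BVSP estimates quoted in the text.
c_bvsp = gjr_second_moment_value(alpha=0.0143, beta=0.9197, eta=0.0832)
satisfies_constraint = c_bvsp < 1.0
```

The factor 0.5 reflects that the indicator is active only for negative shocks, which occur about half of the time when the innovation distribution is roughly symmetric.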
In addition, the shape parameters κ, n and λ are significant for most of the stock indices and obey the constraints κ > 0, n > 2 and −1 < λ < 1. Notably, the values of parameter λ are significantly negative for most of the stock indices. These results indicate that the distribution of returns is left-skewed and has a larger and thicker tail than the normal distribution. Finally, the Ljung-Box Q²(20) statistics for the squared returns are not significant at the 10% level in most cases and they are all far smaller than the same statistics appearing in Table 1. These results indicate that serial correlation does not exist in the standardized residuals and that the GJR-GARCH-SGT model is sufficient to correct the serial correlation of these return series in the conditional variance equation for the stock indices in the G7 and E7.
Notes to Table 2: 4. Q²(20) statistics are asymptotically chi-squared-distributed with 20 degrees of freedom. 5. The numbers in row 'C' denote the value of the expression 'β + α + 0.5η' and are used to check the constraint 'β + α + 0.5η < 1', the necessary and sufficient condition for the existence of the second moment for all GJR-based models. Please see Example 2.1 of Ling and McAleer [26] for more details.

The Performance Assessment of Volatility Forecasts
The analysis of the empirical results in Table 2 shows that the selected empirical model in this study captures the common features of financial assets well. I then executed the in-sample and out-of-sample volatility forecasts of the 16 models for the 14 stock indices in the G7 and E7. Tables 3 and 4 report the results of the in-sample volatility forecasts based on the MAE and RMSE loss functions, respectively, for the overall period. Tables 5 and 6 list the results of the out-of-sample volatility forecasts based on the MAE and RMSE loss functions, respectively, for the forecasting period, using a rolling window approach (for each data series, the eight parametric volatility forecasting models and eight composed volatility forecasting models, totaling sixteen models, were first estimated using a sample of 3250 daily returns, and a volatility forecast for the next period was obtained. Subsequently, the estimation period was rolled forward by adding one new day and omitting the most distant day. By repeating this procedure, the out-of-sample volatility forecasts were calculated for the next 250 days). Figure 4 shows the trend of the actual variance and its two out-of-sample variance forecasts obtained by the GJR-GARCH-SGT and GJR-GARCH-SGT-NN models. From Figure 4, I observed a sharp spike in variance in March 2020 owing to COVID-19 spreading throughout the world.
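The rolling window procedure described above can be sketched as an index generator. This is an illustrative sketch: it assumes zero-based positions in a list of daily returns, with the first window starting at the first observation; the function name is hypothetical, and the model estimation itself is left out.

```python
def rolling_windows(n_obs, window=3250, n_forecasts=250):
    """Yield (start, end, target) index triples for one-step-ahead forecasts.

    The model is estimated on returns[start:end] and used to forecast the
    variance at position `target`; the window then moves forward by one day,
    adding the newest observation and dropping the oldest.
    """
    if n_obs < window + n_forecasts:
        raise ValueError("not enough observations for the requested design")
    for i in range(n_forecasts):
        yield i, i + window, i + window


windows = list(rolling_windows(n_obs=3500))
first_window = windows[0]   # estimate on positions 0..3249, forecast position 3250
last_window = windows[-1]   # estimate on positions 249..3498, forecast position 3499
```

Keeping the window length fixed at 3250 means every forecast is based on the same amount of data, so differences in forecast errors across days are not driven by a growing estimation sample.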
Subsequently, using Tables 3-6, I compared the volatility forecasting performance for four categories of model to explore whether the neural network approach can improve the performance of volatility forecasting, whether the settings of leverage effect and non-normal return distribution can improve it, and which of the 16 models has the best volatility forecasting performance. The results of the comparison are recorded in columns S1, S2, S3 and S4 of each table (for the four categories of model, please see Section 2.2 for more details). The results in columns S1, S2, S3 and S4 of Tables 3-6 are also summarized in Table 7 so that the four main issues of this study can be explored easily. I used the data in Table 3 to illustrate the performance comparison for the four categories of model. For the first category, I took the following two examples from Table 3. First, for the paired models GARCH-N and GARCH-N-NN, the GARCH-N model has the lower MAE for DJ, TSX, FTSE and BVSP, whereas the GARCH-N-NN model has the lower MAE for the other ten stock indices. For example, for the DJ, the MAE of the GARCH-N model (1.94301) is lower than that of the GARCH-N-NN model (1.94312). Accordingly, in Table 3 the results '4' for the GARCH-N model and '10' for the GARCH-N-NN model are recorded in column 'S1', in the rows 'GARCH-N' and 'GARCH-N-NN', respectively. In Table 7, the same results, '4' and '10', are also recorded in column 'MAE' below 'In-sample' of S1, in the rows 'GARCH-N' and 'GARCH-N-NN', respectively.
Second, for the paired models GJR-GARCH-SGT and GJR-GARCH-SGT-NN, the GJR-GARCH-SGT-NN model has the lower MAE for all 14 stock indices, so the GJR-GARCH-SGT model never attains the lower MAE. In Table 3 the results '0' and '14' are then recorded in column 'S1', in the rows 'GJR-GARCH-SGT' and 'GJR-GARCH-SGT-NN', respectively. In Table 7, the same results, '0' and '14', are also recorded in column 'MAE' below 'In-sample' of S1, in the rows 'GJR-GARCH-SGT' and 'GJR-GARCH-SGT-NN', respectively. For the second category, I took the following example from Table 3. For the paired models GARCH-N and GJR-GARCH-N, the GARCH-N model has the lower MAE for MIB, RTSI, BSE and XU100, whereas the GJR-GARCH-N model has the lower MAE for the other ten stock indices. For example, for the MIB, the MAE of the GARCH-N model (3.79126) is lower than that of the GJR-GARCH-N model (3.79163). In Table 3 the results '4' and '10' are then recorded in column 'S2', in the rows 'GARCH-N' and 'GJR-GARCH-N', respectively. In Table 7, the same results, '4' and '10', are also recorded in column 'MAE' below 'In-sample' of S2, in the rows 'GARCH-N' and 'GJR-GARCH-N', respectively. For the third category, I took an example from Table 3 [...]. In Table 3 the results '7', '0', '3' and '4' are then recorded in column 'S3', in the rows 'GARCH-N', 'GARCH-T', 'GARCH-ST' and 'GARCH-SGT', respectively. In Table 7, the same results, '7', '0', '3' and '4', are also recorded in column 'MAE' below 'In-sample' of S3, in the rows 'GARCH-N', 'GARCH-T', 'GARCH-ST' and 'GARCH-SGT', respectively.
For the fourth category, I took an example from Table 3 [...] NN (1.69350). In other words, for the TSX, the GJR-GARCH-N model has the lowest MAE among all sixteen models. In Table 3 the result '2' is then recorded in column 'S4', in the row 'GJR-GARCH-N'. In Table 7, the same result, '2', is also recorded in column 'MAE' below 'In-sample' of S4, in the row 'GJR-GARCH-N'.
Table 7 lists the summary results of the performance comparison for the in-sample and out-of-sample volatility forecasts based on the MAE and RMSE loss functions. That is, the numbers in column 'MAE' below 'In-sample' of S1, S2, S3 and S4 are summarized from those in columns 'S1', 'S2', 'S3' and 'S4' of Table 3, respectively. Likewise, the numbers in column 'RMSE' below 'In-sample' of S1, S2, S3 and S4 are summarized from those in columns 'S1', 'S2', 'S3' and 'S4' of Table 4, respectively. The numbers in column 'MAE' below 'Out-of-sample' of S1, S2, S3 and S4 are summarized from those in columns 'S1', 'S2', 'S3' and 'S4' of Table 5, respectively, and the numbers in column 'RMSE' below 'Out-of-sample' of S1, S2, S3 and S4 are summarized from those in columns 'S1', 'S2', 'S3' and 'S4' of Table 6, respectively.
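The four comparisons described above (S1 to S4) all reduce to the same operation: within a group of competing models, count the number of stock indices for which each model attains the lowest loss. The sketch below illustrates that counting rule. The function name and the tie rule are assumptions, and only the two DJ values are taken from the text; the losses for the placeholder indices "IDX_A" and "IDX_B" are invented for illustration.

```python
def count_wins(losses, group):
    """Count, for each model in `group`, the indices where its loss is lowest.

    `losses` maps model name -> {index name -> loss value}.
    Ties are credited to the model listed first in `group` (assumption).
    """
    wins = {model: 0 for model in group}
    for index in losses[group[0]]:
        best = min(group, key=lambda model: losses[model][index])
        wins[best] += 1
    return wins


# DJ values are the in-sample MAEs quoted in the text; the values for the
# placeholder indices "IDX_A" and "IDX_B" are invented for illustration.
losses = {
    "GARCH-N":    {"DJ": 1.94301, "IDX_A": 2.10, "IDX_B": 3.05},
    "GARCH-N-NN": {"DJ": 1.94312, "IDX_A": 2.05, "IDX_B": 3.01},
}

s1 = count_wins(losses, ["GARCH-N", "GARCH-N-NN"])
print(s1)
```

With a two-model group this corresponds to the S1 and S2 comparisons, with a four-model group to S3, and with all sixteen models to S4.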
To explore the four main issues of this study easily, I summed all four numbers in column 'S1' for each model, and did the same for columns 'S2', 'S3' and 'S4'. For example, for the GARCH-N model, the numbers '4' and '0' are in columns 'MAE' and 'RMSE' below 'In-sample' of S1, respectively. Moreover, the numbers '3' and '5' are in columns 'MAE' and 'RMSE' below 'Out-of-sample' of S1, respectively. Hence, in Table 7, the sum of all four numbers in column 'S1' is equal to 12, and is recorded in the column 'Sum' below 'S1' and the row 'GARCH-N'. For the other 15 models, the sum of all four numbers in column 'S1' is obtained in the same way. For all 16 models, the sums of the four numbers in columns 'S2', 'S3' and 'S4' are also obtained in the same way, and are recorded in the column 'Sum' below 'S2', 'S3' and 'S4', respectively. Subsequently, I used all 16 numbers in column 'Sum' below 'S1', 'S2', 'S3' and 'S4' of Table 7 to compare the volatility forecasting performance for the four categories of model. As shown by the numbers in column 'Sum' below S1 of Table 7, the numbers for all eight composed volatility forecasting models are far greater than those for the eight corresponding parametric volatility forecasting models. For example, the number for the GARCH-N-NN model (44) is far greater than that for the GARCH-N model (12). These results indicate that the performance of the composed volatility forecasting models is significantly superior to that of the parametric volatility forecasting models. In other words, the neural network approach can significantly improve the performance of volatility forecasting.
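The 'Sum' column described above is plain addition over the four cases, as the following short sketch shows using the GARCH-N numbers quoted in the text. The variable names are hypothetical. The final check rests on an assumption: if every index is credited to exactly one model of a pair in each case (no ties), the two sums of a pair should add up to 14 × 4 = 56.

```python
N_INDICES = 14  # stock indices in the G7 and E7
N_CASES = 4     # {in-sample, out-of-sample} x {MAE, RMSE}

# Win counts for GARCH-N in category S1, as quoted in the text.
garch_n_s1 = {
    ("in-sample", "MAE"): 4,
    ("in-sample", "RMSE"): 0,
    ("out-of-sample", "MAE"): 3,
    ("out-of-sample", "RMSE"): 5,
}

garch_n_sum = sum(garch_n_s1.values())
garch_n_nn_sum = 44  # 'Sum' value for GARCH-N-NN quoted in the text

# Consistency check under the no-ties assumption stated in the lead-in.
pair_total = garch_n_sum + garch_n_nn_sum
print(garch_n_sum, pair_total == N_INDICES * N_CASES)
```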
As reported by the numbers in column 'Sum' below S2 of Table 7, for the parametric volatility forecasting approach, the numbers for all four GJR-based models are far greater than those for the four corresponding GARCH-based models, as shown in panel A of this table. For instance, the number for the GJR-GARCH-N model (38) is far greater than that for the GARCH-N model (18). For the composed volatility forecasting approach, the numbers for all four GJR-based models are likewise far greater than those for the four corresponding GARCH-based models, as shown in panel B of this table. These results imply that, irrespective of the parametric or composed forecasting approach, the performance of the GJR-based models is significantly superior to that of the GARCH-based models. That is to say, the setting of the leverage effect can significantly improve the performance of volatility forecasting (as shown in Section 2, the GJR-based model can capture the leverage effect observed in financial assets whereas the GARCH-based model cannot). As illustrated by the numbers in column 'Sum' below S3 of Table 7, the numbers for the models with a non-normal distribution are not greater than those for the models with the normal distribution under the same volatility forecasting approach and volatility specification. For example, the number for the GARCH-N model (35) is far greater than those for the GARCH-T (1), GARCH-ST (5) and GARCH-SGT (15) models. Moreover, the number for the GJR-GARCH-N model (45) is far greater than those for the GJR-GARCH-T (2), GJR-GARCH-ST (3) and GJR-GARCH-SGT (6) models. Furthermore, the number for the GARCH-N-NN model (37) is far greater than those for the GARCH-T-NN (13), GARCH-ST-NN (4) and GARCH-SGT-NN (2) models. In addition, the number for the GJR-GARCH-N-NN model (32) is far greater than those for the GJR-GARCH-T-NN (12), GJR-GARCH-ST-NN (3) and GJR-GARCH-SGT-NN (10) models.
The above results indicate that, irrespective of the volatility forecasting approach or volatility specification, the performance of the models with a non-normal distribution is not superior to that of the models with the normal distribution. In other words, the setting of the non-normal return distribution cannot improve the performance of volatility forecasting. Among the 16 numbers in column 'Sum' below S4 of Table 7, the number for the GJR-GARCH-N-NN model (15) is the greatest. The numbers for the GJR-GARCH-N, GJR-GARCH-T-NN and GJR-GARCH-SGT-NN models are all equal to 9, the second greatest among the 16 numbers. These results indicate that, among the 16 models in this study, the GJR-GARCH-N-NN model performs best, followed by the GJR-GARCH-N, GJR-GARCH-T-NN and GJR-GARCH-SGT-NN models. In other words, the GJR-GARCH model combined with both the normal distribution and a neural network approach has the best volatility forecasting performance among the sixteen models in this study.
In addition, this study investigates which model is the most suitable for each of the fourteen stock indices, that is, which model has the best volatility forecasting performance for each stock index. To explore this issue, I summarized the best model for each stock index based on two types of volatility forecasts (in-sample and out-of-sample) and two types of loss function (MAE and RMSE). Taking the DJ stock index as an example, among the 16 models, the GJR-GARCH-SGT-NN model has the best performance for the in-sample volatility forecast based on MAE (respectively, RMSE), as shown in the column 'DJ' of Table 3 (respectively, Table 4). These results are recorded in column 'DJ' and rows 'MAE' and 'RMSE' of 'In-sample' in Table 8. Likewise, among the 16 models, the GJR-GARCH-N-NN model has the best performance for the out-of-sample volatility forecast based on MAE (respectively, RMSE), as shown in the column 'DJ' of Table 5 (respectively, Table 6). These results are recorded in column 'DJ' and rows 'MAE' and 'RMSE' of 'Out-of-sample' in Table 8. Hence, Table 8 summarizes the most suitable model for each stock index. In other words, the results listed in row 'MAE' (respectively, 'RMSE') of 'In-sample' in Table 8 are summarized from the results of the performance comparison for the fourth category of model in Table 3 (respectively, Table 4). Similarly, the results listed in row 'MAE' (respectively, 'RMSE') of 'Out-of-sample' in Table 8 are summarized from the results of the performance comparison for the fourth category of model in Table 5 (respectively, Table 6).
From Table 8, I found that both GJR-GARCH-N-NN and GJR-GARCH-SGT-NN are the most suitable models for the DJ stock index because each of them appears twice among the four cases in column 'DJ' of Table 8 (the four cases are composed of two types of volatility forecasts (in-sample and out-of-sample) and two types of loss function (MAE and RMSE) used when forecasting the volatility of a specific stock index). These results are recorded in row 'Best model', in column 'DJ' of Table 8. By contrast, GJR-GARCH-N is the most suitable model for the TSX stock index because it appears most often among the four cases. This result is recorded in row 'Best model', in column 'TSX' of Table 8. In the same way, I found the most suitable models for the other stock indices, and I recorded them in row 'Best model' and the column corresponding to the specific stock index in Table 8. From the results listed in the row 'Best model' of Table 8, I obtained the following conclusions. First, GJR-GARCH-N is the most suitable model for the TSX, BVSP and MXX. Second, GARCH-N-NN is the most suitable model only for the RTSI. Third, GJR-GARCH-N-NN is the most suitable model for the DJ, FTSE, MIB and SSE. Fourth, GJR-GARCH-T-NN is the most suitable model for the DAX, JKSE and XU100. Fifth, GJR-GARCH-SGT-NN is the most suitable model for the DJ, N225 and BSE. To sum up, the most suitable models for the 14 stock indices are spread across the GJR-GARCH-N, GARCH-N-NN, GJR-GARCH-N-NN, GJR-GARCH-T-NN and GJR-GARCH-SGT-NN models. These results indicate that the most suitable model is not necessarily the same for each of the 14 stock indices. Most of the models listed above include the leverage effect and are combined with a neural network approach. As to the distribution setting, they are spread across the normal, Student's t and SGT distributions with no clear pattern.
Hence, the above conclusions are the same as those obtained from the analysis of previous issues. That is, a neural network approach and the setting of leverage effect can significantly promote the performance of volatility forecasting but the setting of non-normal distribution cannot.
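The selection rule behind the 'Best model' row of Table 8 can be sketched as a frequency count over the four cases. The function name is hypothetical, and returning every model tied for the highest count is an assumption based on the DJ example, where two models each win two of the four cases. The DJ winners below are those described in the text.

```python
from collections import Counter


def best_models(case_winners):
    """Return the model(s) that win the most of the four cases.

    `case_winners` maps (forecast type, loss function) -> winning model.
    All models tied for the highest count are returned, sorted by name.
    """
    counts = Counter(case_winners.values())
    top = max(counts.values())
    return sorted(model for model, n in counts.items() if n == top)


# Winners for the DJ index as described in the text.
dj = {
    ("in-sample", "MAE"): "GJR-GARCH-SGT-NN",
    ("in-sample", "RMSE"): "GJR-GARCH-SGT-NN",
    ("out-of-sample", "MAE"): "GJR-GARCH-N-NN",
    ("out-of-sample", "RMSE"): "GJR-GARCH-N-NN",
}

print(best_models(dj))
```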

Conclusions
This study uses the stock indices in the developed and emerging markets as the sample and then utilizes eight parametric volatility forecasting models and eight composed volatility forecasting models to explore whether the neural network approach can improve the performance of volatility forecasting, whether the settings of leverage effect and non-normal return distribution can improve it, and which of the 16 models possesses the best volatility forecasting performance. In addition, this study investigates which model is the most suitable for each of the 14 stock indices. The stock indices in the G7 and E7 are used to represent the stock markets in the developed and emerging markets, respectively. The eight parametric volatility forecasting models are composed of the GJR-GARCH or GARCH models with the normal, Student's t, skewed Student's t and generalized skewed Student's t distributions. The eight composed volatility forecasting models are the eight parametric volatility forecasting models combined with a neural network approach. Notably, the neural network model has the same concepts as the network topology.
The empirical findings can be summarized as follows. From the descriptive statistics of data and estimation results of the GJR-GARCH-SGT model, I obtained the following conclusions. First, the stock indices in the E7 have higher return and higher risk than those in the G7. Second, all the stock indices in the G7 and E7 for the forecasting period have higher risk than those for the overall period because of COVID-19 spreading throughout the world in the last year. Third, the leverage effect exists significantly in the stock indices in the G7 and E7, especially for the G7. Fourth, the distribution of returns is left-skewed and has