Self-Weighted LSE and Residual-Based QMLE of ARMA-GARCH Models †

Abstract: This paper studies the self-weighted least squares estimator (SWLSE) of the ARMA model with GARCH noises. It is shown that the SWLSE is consistent and asymptotically normal when the GARCH noise does not have a finite fourth moment. Using the residuals from the estimated ARMA model, it is shown that the residual-based quasi-maximum likelihood estimator (QMLE) for the GARCH model is consistent and asymptotically normal, but if the innovations are asymmetric, it is not as efficient as the QMLE computed when the GARCH process is observed. Using the SWLSE and residual-based QMLE as the initial estimators, the local QMLE for the ARMA-GARCH model is asymptotically normal via a one-step iteration. The importance of the proposed estimators is illustrated by simulated data and five real examples in financial markets.


Introduction
Time series models have been extensively applied in various areas, and many methodologies have been proposed in the literature; for example, Zhang (2003) proposed a hybrid methodology that combines both ARIMA and ANN models to improve forecasting accuracy. Since Engle (1982), ARCH-type models have been widely used in economics and finance. In particular, the GARCH model proposed by Bollerslev (1986) has been a benchmark in risk management. Zhang and Zhang (2020) showed that GARCH-based option-pricing models are able to accurately price the SPX one-month variance swap rate, that is, the CBOE Volatility Index (VIX). Setiawan et al. (2021) used the GARCH(1, 1) model to analyze stock market turmoil during the COVID-19 outbreak in an emerging and a developed economy.
However, recent research showed that the usual statistical inference procedure does not work if the fourth moment of the GARCH process does not exist. To make this clear, let us consider the AR(1)-GARCH(1, 1) model

$$y_t = \phi_1 y_{t-1} + \varepsilon_t, \tag{1}$$

$$\varepsilon_t = \eta_t \sqrt{h_t} \quad \text{and} \quad h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}, \tag{2}$$

where $\alpha_0 > 0$, $\alpha_1 \ge 0$, $\beta_1 \ge 0$, and $\eta_t$ is a sequence of independent and identically distributed (i.i.d.) innovations with zero mean and unit variance. For model (1), the least squares estimator (LSE) of $\phi_1$ is

$$\hat{\phi}_{LSn} = \Big(\sum_{t=1}^{n} y_{t-1}^2\Big)^{-1} \sum_{t=1}^{n} y_{t-1} y_t, \tag{3}$$

where $n$ is the sample size. Weiss (1986) and Pantula (1989) showed that $\hat{\phi}_{LSn}$ is $\sqrt{n}$-consistent and asymptotically normal if $E\varepsilon_t^4 < \infty$. However, $E\varepsilon_t^4 = \infty$ when the tail index $\alpha$ of $\varepsilon_t$ is in $(0, 4]$. In this case, Davis and Mikosch (1998) and Basrak et al. (2002) showed that $\varepsilon_t$ is heavy-tailed and its sample autocorrelation function is neither $\sqrt{n}$-consistent nor asymptotically normal. Lange (2011) showed that $\hat{\phi}_{LSn}$ is $n^{1-2/\alpha}$-consistent and converges to a stable random variable when $\alpha \in (2, 4)$. Furthermore, for the AR model with $\varepsilon_t$ being the G-GARCH(1, 1) noise in He and Teräsvirta (1999), Zhang and Ling (2015) showed that $\sqrt{n/\log n}\,(\hat{\phi}_{LSn} - \phi_1) \to_L$ Normal if $\alpha = 4$ (i.e., $E\varepsilon_t^4 = \infty$) as $n \to \infty$, where $\to_L$ denotes convergence in distribution. From (3)-(6), we find that the LSE not only has a slower rate of convergence but also is not asymptotically normal when $\alpha \in (0, 4)$. Thus, the classical theory and methodology based on the LSE (e.g., the t-test, Wald test, and Ljung-Box test, among others) do not work in this case.

Using a simulation method, we give the region of the parameter vector $(\alpha_1, \beta_1)$ with $E\varepsilon_t^{2\iota} < \infty$ in Figure 1 when $\eta_t \sim N(0, 1)$. It can be seen that the region of $(\alpha_1, \beta_1)$ is very small for $E\varepsilon_t^4 < \infty$ (i.e., $\alpha > 4$). In practice, the estimated value of $(\alpha_1, \beta_1)$ usually does not lie in this region. Thus, it is very important to study statistical inference when $\alpha \in (0, 4]$.
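The moment region described above can be explored numerically. The sketch below is not the authors' simulation code; it assumes the standard moment criterion for GARCH(1, 1), namely that $E\varepsilon_t^{2\iota} < \infty$ when $E(\alpha_1 \eta_t^2 + \beta_1)^{\iota} < 1$, which is not stated in the text. The function names `fourth_moment_finite_normal` and `moment_criterion_mc` are hypothetical.

```python
import random


def fourth_moment_finite_normal(alpha1, beta1):
    """Closed-form check of E(alpha1*eta^2 + beta1)^2 < 1 for eta ~ N(0, 1).

    Assumption: this inequality is used as the criterion for a finite
    fourth moment of a GARCH(1, 1) process; it is not taken from the text.
    """
    # E eta^2 = 1 and E eta^4 = 3 for a standard normal, so the expectation
    # expands to 3*alpha1^2 + 2*alpha1*beta1 + beta1^2.
    return 3.0 * alpha1**2 + 2.0 * alpha1 * beta1 + beta1**2 < 1.0


def moment_criterion_mc(alpha1, beta1, iota, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E(alpha1*eta^2 + beta1)^iota, eta ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, 1.0)
        total += (alpha1 * eta * eta + beta1) ** iota
    return total / n_draws


if __name__ == "__main__":
    # Compare a low-persistence and a high-persistence parameter pair.
    for a1, b1 in [(0.1, 0.8), (0.3, 0.69)]:
        print(a1, b1, fourth_moment_finite_normal(a1, b1),
              round(moment_criterion_mc(a1, b1, 2.0), 3))
```

Under this criterion, the pair (0.1, 0.8) has a finite fourth moment, whereas (0.3, 0.69) has a finite second moment only, which illustrates how quickly the fourth-moment region is left as $\alpha_1 + \beta_1$ approaches one.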
Zhu and Ling (2015) studied the self-weighted least absolute deviation estimator (SLADE) of the ARMA-GARCH model and showed that it is consistent and asymptotically normal when $\alpha \in (0, 4]$. This paper studies the self-weighted LSE (SWLSE) of the ARMA model with GARCH noises. It is shown that the SWLSE is consistent and asymptotically normal when the GARCH noise does not have a finite fourth moment (i.e., $\alpha \in (2, 4]$). Using the residuals from the estimated ARMA model, it is shown that the residual-based quasi-maximum likelihood estimator (QMLE) for the GARCH model is consistent and asymptotically normal, but if the innovations are asymmetric, it is not as efficient as the QMLE computed when the GARCH process is observed. Using the SWLSE and residual-based QMLE as the initial estimators, the local QMLE for the ARMA-GARCH model is asymptotically normal via a one-step iteration. This paper is arranged as follows. Section 2 presents the model and assumptions. Section 3 presents our main results. Section 4 presents simulation results, and Section 5 gives real examples. All proofs are deferred to Appendix A.

Model and Assumptions
Assumption 1 is the stationarity and invertibility condition of ARMA models, under which it follows that where $\sup_{\Theta_\gamma} |a_\psi(i)| = O(\rho^i)$ and $\sup_{\Theta_\gamma} |a_\gamma(i)| = O(\rho^i)$ with $\rho \in (0, 1)$. Assumption 2 ensures that $\{\varepsilon_t\}$ is strictly stationary and ergodic with $E\varepsilon_t^2 < \infty$; see Ling and Li (1997) and Ling and McAleer (2002). It is also the identifiability condition for model (2) and, by Lemma 2.1 in Ling (1999), the condition $\sum_{i=1}^{s} \beta_i < 1$ is equivalent to $I_k$ is the $k \times k$ identity matrix, and $\rho(B)$ is the spectral radius of matrix $B$. Under this condition, we have Given the observations $\{y_n, \cdots, y_1\}$ and initial value $Y_0 \equiv \{y_0, y_{-1}, \cdots\}$, we can write the parametric model as It is easy to see that $\eta_t(\lambda_0) = \eta_t$, $\varepsilon_t(\gamma_0) = \varepsilon_t$, and $h_t(\lambda_0) = h_t$. In practice, we do not observe those $y_i$ in $Y_0$, and hence they have to be replaced by some constants. This does not affect our asymptotic results; see Ling and McAleer (2003a). For simplicity, we do not study this case in detail.

Main Results
The self-weighted estimation approach was proposed by Ling (2005), and it has been used to solve the problem of statistical inference for the heavy-tailed ARMA-GARCH model in Ling (2007) and Zhu and Ling (2011). Using a similar idea, we define the SWLSE as where $w_t = 1 + \sum_{k=1}^{\infty} k^{-1/2-1} |y_{t-k}|$. We can state the following result:

Theorem 1. Suppose that Assumptions 1-2 hold. Then, as $n \to \infty$, $\sqrt{n}(\hat{\gamma}_n - \gamma_0) \to_L N(0, A^{-1} B A^{-1})$.

The preceding result holds for any kind of ARCH-type errors provided only that $Eh_t < \infty$; see the proof in Appendix A. To understand it easily, we refer to model (1)-(2) again. In this case, the information function is E( The score function is $n^{-1/2} \sum_{t=1}^{n} y_{t-1} \varepsilon_t / w_t$ and $E(y_{t-1} \varepsilon_t / w_t)^2 \le O(1) Eh_t < \infty$, which is the condition we need for the GARCH errors. This result holds when $E\varepsilon_t^4 < \infty$, but the SWLSE is not as efficient as the LSE in this case. When $E\varepsilon_t^4 = \infty$ and $E\varepsilon_t^2 < \infty$, the process $y_t$ is heavy-tailed and the SWLSE has a faster rate of convergence than the LSE. The weight function $w_t$ can be replaced by others; see Ling (2007).
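To make the self-weighting idea concrete for the AR(1) case, the following sketch computes a weighted least squares estimate whose first-order condition matches the score $\sum_t y_{t-1} \varepsilon_t / w_t$ given above. It rests on several assumptions that are not confirmed by the text: the objective is taken to be $\sum_t \varepsilon_t^2(\phi) / w_t$, the exponent in $w_t$ is read as $k^{-3/2}$, unobserved past values are set to zero, and the innovations are standard normal. All function names are hypothetical.

```python
import math
import random


def simulate_ar1_garch11(n, phi, alpha0, alpha1, beta1, seed=0, burn=500):
    """Simulate y_t = phi*y_{t-1} + eps_t with GARCH(1,1) noise, eta ~ N(0,1).

    Requires alpha1 + beta1 < 1 so that the start-up variance is defined.
    """
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1 - beta1)  # start at the unconditional variance
    eps, y, out = 0.0, 0.0, []
    for t in range(n + burn):
        h = alpha0 + alpha1 * eps * eps + beta1 * h
        eps = rng.gauss(0.0, 1.0) * math.sqrt(h)
        y = phi * y + eps
        if t >= burn:  # discard the burn-in period
            out.append(y)
    return out


def self_weights(y):
    """w_t = 1 + sum_{k=1}^{t} k^(-3/2) |y_{t-k}|; pre-sample values set to 0."""
    coef = [k ** -1.5 for k in range(1, len(y) + 1)]
    weights = []
    for t in range(len(y)):
        s = sum(coef[k - 1] * abs(y[t - k]) for k in range(1, t + 1))
        weights.append(1.0 + s)
    return weights


def swlse_ar1(y):
    """Minimise sum_t (y_t - phi*y_{t-1})^2 / w_t in closed form (assumed objective)."""
    w = self_weights(y)
    num = sum(y[t - 1] * y[t] / w[t] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 / w[t] for t in range(1, len(y)))
    return num / den


if __name__ == "__main__":
    y = simulate_ar1_garch11(2000, 0.4, 0.1, 0.1, 0.8, seed=1)
    print("self-weighted estimate of phi:", round(swlse_ar1(y), 4))
```

Because $w_t \ge 1 + |y_{t-1}|$, each summand $y_{t-1} \varepsilon_t / w_t$ is bounded in absolute value by $|\varepsilon_t|$, which is the mechanism behind the second-moment condition discussed above. Setting every weight to one recovers the classical LSE.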
Next, we use the residual $\hat{\varepsilon}_t \equiv \varepsilon_t(\hat{\gamma}_n)$ from the ARMA part as an artificial observation of $\varepsilon_t$. The log-quasi-likelihood function based on $\hat{\varepsilon}_t$ can be written as where $\tilde{h}_t(\delta) = h_t(\lambda)|_{\gamma = \hat{\gamma}_n}$. We define the residual-based QMLE of $\delta_0$ as We now give the asymptotic properties of $\hat{\delta}_n$ as follows.
Theorem 2. Suppose that Assumptions 1-2 hold. Then, as $n \to \infty$, where When $\eta_t$ is symmetric and $\mu = 0$, we have $E\eta_t^3 = 0$, $ED_t = ED_t = 0$, and hence $\Omega_\delta = \kappa EH_{\delta t}$.

When the conditional mean is zero (i.e., $y_t = \varepsilon_t$), model (7)-(8) reduces to the GARCH model. In this case, the log-quasi-likelihood function based on $\varepsilon_t$ can be written as Then, the global QMLE of $\delta_0$ is defined as $\tilde{\delta}_n = \arg\max_{\delta \in \Theta_\delta} L_{\delta n}(\delta)$. Berkes et al. (2003) and Hall and Yao (2003) showed that $\tilde{\delta}_n$ is consistent and, as $n \to \infty$, From Theorem 2, we see that the efficiency of the estimator of $\delta_0$ is affected by the estimated parameters in the ARMA part unless $\eta_t$ has a symmetric density and $\mu$ is known to be zero without estimation. This reminds practitioners to be careful when using residuals to estimate the GARCH model.

Given $\{y_n, \cdots, y_1\}$ and the initial value $Y_0$, we can write down the log-quasi-likelihood function of model (7)-(8) as follows: Then, the global QMLE of $\lambda_0$ is defined as the maximizer of $L_n(\lambda)$ in $\Theta$. Ling and McAleer (2003a) proved the consistency of this QMLE. However, the asymptotic normality of this QMLE requires $E\varepsilon_t^4 < \infty$; see also Francq and Zakoïan (2004). Based on $\hat{\lambda}_n \equiv (\hat{\gamma}_n', \hat{\delta}_n')'$, we obtain the local QMLE through a one-step iteration As in Ling (2007), we can show that as $n \to \infty$, When $\eta_t \sim N(0, 1)$, the local QMLE is efficient. So, Theorems 1-2 provide an approach to obtaining an efficient estimator for the full ARMA-GARCH model under the finite second moment condition of $\varepsilon_t$. When $\eta_t$ is not normal, efficient and adaptive estimators can be obtained by using the results in this section and following similar lines as in , , Ling (2003), and Ling and McAleer (2003b).
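As an illustration of the residual-based step, the sketch below evaluates a Gaussian quasi-log-likelihood for a GARCH(1, 1) specification on a given series of residuals. It is an assumption-laden sketch rather than the authors' objective: the per-observation term is taken to be the usual $-\tfrac{1}{2}\log h_t - \varepsilon_t^2 / (2 h_t)$, the recursion is started at the unconditional variance (which requires $\alpha_1 + \beta_1 < 1$), and the function names are hypothetical.

```python
import math
import random


def garch11_quasi_loglik(resid, alpha0, alpha1, beta1):
    """Average Gaussian quasi-log-likelihood of residuals under GARCH(1,1).

    Assumed form: l_t = -0.5*log(h_t) - 0.5*e_t^2/h_t, with
    h_t = alpha0 + alpha1*e_{t-1}^2 + beta1*h_{t-1}.
    """
    h = alpha0 / (1.0 - alpha1 - beta1)  # assumed start-up value for h_0
    prev_sq = h                          # assumed start-up value for e_0^2
    total = 0.0
    for e in resid:
        h = alpha0 + alpha1 * prev_sq + beta1 * h
        total += -0.5 * math.log(h) - 0.5 * e * e / h
        prev_sq = e * e
    return total / len(resid)


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn=500):
    """Simulate eps_t = eta_t*sqrt(h_t) with eta_t ~ N(0,1)."""
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1 - beta1)
    eps, out = 0.0, []
    for t in range(n + burn):
        h = alpha0 + alpha1 * eps * eps + beta1 * h
        eps = rng.gauss(0.0, 1.0) * math.sqrt(h)
        if t >= burn:  # discard the burn-in period
            out.append(eps)
    return out


if __name__ == "__main__":
    eps = simulate_garch11(3000, 0.1, 0.1, 0.8, seed=2)
    print("at the true parameter :", garch11_quasi_loglik(eps, 0.1, 0.1, 0.8))
    print("at a wrong parameter  :", garch11_quasi_loglik(eps, 0.1, 0.1, 0.1))
```

In the residual-based setting, `resid` would hold the residuals from the fitted ARMA part instead of the simulated noise, and the function would be maximized over the GARCH parameters with any numerical optimizer.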

Simulation Study
In this section, we assess the finite sample performance of $\hat{\lambda}_n = (\hat{\gamma}_n', \hat{\delta}_n')'$ and $\bar{\lambda}_n = (\bar{\gamma}_n', \bar{\delta}_n')'$, where $\hat{\gamma}_n$ is the SWLSE, $\hat{\delta}_n$ is the residual-based QMLE, and $\bar{\lambda}_n$ is the local QMLE. We generate 1000 replications of sample size $n = 1000$ and $2000$ from the following model where $\gamma_0 = (\phi_{10}, \psi_{10})' = (0.4, 0.5)'$, $\delta_0 = (\alpha_{00}, \alpha_{10}, \beta_{10})' = (0.1, 0.1, 0.8)'$, and $\eta_t$ is chosen to be the standard normal $N(0, 1)$ distribution, the re-scaled Laplace $L(0, 1)$ distribution, or the re-scaled Student's $t(5)$ distribution with $E\eta_t^2 = 1$.

Table 1 reports the sample bias (Bias), the sample standard deviation (SD), and the average estimated asymptotic standard deviation (AD) of $\hat{\lambda}_n$ and $\bar{\lambda}_n$. From this table, we find that (i) each considered estimator has a small bias, and its value of SD is close to that of AD, demonstrating the validity of its asymptotic normality; (ii) $\bar{\gamma}_n$ could be slightly more efficient than $\hat{\gamma}_n$, whereas $\bar{\delta}_n$ is as efficient as $\hat{\delta}_n$; (iii) all estimators for $\eta_t \sim N(0, 1)$ are more efficient than the corresponding ones for $\eta_t \sim L(0, 1)$ or $t(5)$. All these findings are consistent with our theory in Section 3. We should mention that, according to our simulation experiments, the QMLE of $\delta_0$ is not reliable when the sample size $n$ is less than 800, and hence those results are not reported here.

As a comparison, we compute the classical LSE $\hat{\gamma}_{LSn} = (\hat{\phi}_{LSn}, \hat{\psi}_{LSn})'$ for $\gamma_0$ in model (19)-(20), where $\hat{\gamma}_{LSn}$ is computed in the same way as $\hat{\gamma}_n$ with $w_t \equiv 1$. Table 2 reports the corresponding results for $\hat{\gamma}_{LSn}$. Compared with $\hat{\gamma}_n$ in Table 1, we find that $\hat{\gamma}_{LSn}$ is less efficient than $\hat{\gamma}_n$ in all examined cases. This finding suggests that it is better to fit the ARMA model by the SWLSE rather than the LSE when the data exhibit conditional heteroscedasticity.
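The three innovation distributions in this design all need unit variance. The sketch below shows one way such draws could be generated; it is an illustration rather than the authors' code, and the function name `draw_innovation` and the labels `"normal"`, `"laplace"`, and `"t5"` are hypothetical.

```python
import math
import random


def draw_innovation(rng, dist):
    """Draw one innovation eta_t with zero mean and unit variance."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "laplace":
        # The difference of two Exp(1) draws is Laplace(0, 1) with variance 2,
        # so dividing by sqrt(2) gives unit variance.
        return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)
    if dist == "t5":
        # t(5) = Z / sqrt(chi2_5 / 5), where chi2_5 is Gamma(shape=2.5, scale=2).
        # Its variance is 5/3, hence the final rescaling.
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(2.5, 2.0)
        return z / math.sqrt(chi2 / 5.0) / math.sqrt(5.0 / 3.0)
    raise ValueError(f"unknown distribution: {dist}")


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("normal", "laplace", "t5"):
        draws = [draw_innovation(rng, name) for _ in range(100_000)]
        print(name, round(sample_variance(draws), 3))
```

The re-scaled $t(5)$ innovation has a finite fourth moment but heavier tails than the normal, which is consistent with the larger standard deviations reported for the non-normal cases.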

Real Examples
This section first studies the log returns (×100) of DJIA, NASDAQ, NASDAQ 100, and S&P 500 from 11 March 2015 to 10 March 2021, with a total of 1764 observations (see Figure 2). Denote each log return series by $\{y_t\}_{t=1}^{1764}$. Before fitting an AR(1)-GARCH(1, 1) model to $\{y_t\}_{t=1}^{1764}$, we first estimate $\alpha_y$, the tail index of $|y_t|$, and get the following results:

(DJIA) $\hat{\alpha}_y = 2.3029$ (0.9285), (NASDAQ) $\hat{\alpha}_y = 3.2592$ (0.6830),
(NASDAQ 100) $\hat{\alpha}_y = 3.6956$ (0.6077), (S&P 500) $\hat{\alpha}_y = 2.5329$ (0.8567),

where $\hat{\alpha}_y$ is the estimator of $\alpha_y$ proposed in Hill (2010), and the value in parentheses is the AD of $\hat{\alpha}_y$. From the above results, we can conclude that each $|y_t|$ has a finite second moment but does not have a finite fourth moment. Hence, it is reasonable to fit the four return series by using the procedure in Section 3; that is, we first obtain the SWLSE $\hat{\gamma}_n$ and the residual-based QMLE $\hat{\delta}_n$, and then obtain the local QMLE $\bar{\lambda}_n$. The resulting fitted models are as follows: where all estimated parameters are the local QMLE $\bar{\lambda}_n$, and the values in parentheses are the ADs of $\bar{\lambda}_n$. From these fitted models, we find that all estimated parameters are significantly different from zero at the 5% level. In particular, the significant parameters in the fitted AR models imply that the U.S. stock market is not efficient during the examined period.

Next, this section considers the log returns (×100) of the PHLX Oil Service Index (OSX) from 11 March 2015 to 10 March 2021, with a total of 1510 observations (see Figure 3). As before, we denote this log return series by $\{y_t\}_{t=1}^{1510}$, and obtain the estimate $\hat{\alpha}_y = 2.7960$ with AD = 0.7078. This implies that $|y_t|$ has a finite second moment but does not have a finite fourth moment. Hence, we apply the local QMLE method to get the following fitted model for $y_t$: Unlike the fitted results for the four U.S. stock indexes above, the fitted AR coefficient for the OSX index is not significantly different from zero at the 5% level, indicating that the oil market is efficient during the examined period.
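For readers who want to reproduce a tail-index check of this kind, the sketch below implements the classical Hill-type estimator based on the $k$ largest absolute observations. This is an assumption about the general form of the estimator; the exact procedure, choice of $k$, and standard errors used with Hill (2010) in this section are not reproduced, and the function name `hill_tail_index` is hypothetical.

```python
import math
import random


def hill_tail_index(data, k):
    """Classical Hill estimate of the tail index from the k largest |values|.

    alpha_hat = [ (1/k) * sum_{i=1}^{k} log(X_(i) / X_(k+1)) ]^{-1},
    where X_(1) >= X_(2) >= ... are the ordered absolute observations.
    """
    x = sorted((abs(v) for v in data), reverse=True)
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = x[k]  # the (k+1)-th largest absolute value
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    mean_log_excess = sum(math.log(x[i] / threshold) for i in range(k)) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    # Sanity check on exact Pareto data with tail index 3.
    rng = random.Random(0)
    sample = [rng.paretovariate(3.0) for _ in range(20_000)]
    print("Hill estimate:", round(hill_tail_index(sample, 2000), 3))
```

An estimate between 2 and 4 corresponds to the situation described in this section: a finite second moment without a finite fourth moment. In practice the estimate can be sensitive to the choice of `k`, so it is usually inspected over a range of values.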

Concluding Remarks
This paper studied the SWLSE of the ARMA model with GARCH noises and the residual-based QMLE for the GARCH model. The consistency and asymptotic normality of the SWLSE were established under a mild moment condition. The importance of the proposed estimators was illustrated by simulated data, four major stock indexes, and one major oil index in the U.S. The ARMA-GARCH model is very important in risk management; see He et al. (2019). In practice, one needs to build the ARMA-GARCH model from historical data. The major contribution of our paper is to present a way to build an efficient and reliable model for this purpose. Several potential future research topics are as follows: first, we may extend our procedure to the hybrid methodology that combines both ARIMA and ANN models with GARCH errors as in Zhang (2003); second, we could use our procedure to analyze energy data and build an ARMA-GARCH model for green energy, renewable energy, and bio-energy data as discussed in An and Mikhaylov (2020); third, we may explore linear programming or a genetic algorithm to find the QMLE of the ARMA-GARCH model as presented in An et al. (2021).

Informed Consent Statement: Not applicable.
Data Availability Statement: Publicly available data sets were analyzed in this study and can be found at https://www.wsj.com/, accessed on 5 January 2022.

Acknowledgments:
The authors thank the referees for careful reading and useful comments that helped to improve the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs
The following lemma gives two basic properties for model (7)-(8).
(ii) First, $\hat{\gamma}_n$ is a consistent estimator of $\gamma_0$. Second, exists and is continuous in $\Theta_\gamma$. Third, let By Lemma A2, we can show that $E \sup_{\gamma \in \Theta_\gamma} A_t(\gamma) < \infty$. By the ergodic theorem and Theorem 3.1 in Ling and McAleer (2003a), we can show that $\partial^2 L_{sn}(\gamma)/\partial\gamma\,\partial\gamma'$ converges to $2EA_t(\gamma)$ uniformly in $\Theta_\gamma$ in probability. Since $EA_t(\gamma)$ is continuous in $\gamma$, we can show that $\partial^2 L_{sn}(\gamma_n)/\partial\gamma\,\partial\gamma'$ converges to $2A$ in probability for any sequence $\gamma_n$ such that $\gamma_n \to \gamma_0$ in probability. Fourth, by Lemma A2, it follows that Similar to the proof of Lemma 4.2 in Ling and McAleer (2003a), we can show that $A$ and $B$ are positive definite. By the central limit theorem, we have that $\partial L_{sn}(\gamma_0)/\partial\gamma \to_L N(0, 4B)$. Thus, we have established all the conditions in Theorem 4.1.3 in Amemiya (1985), and hence $\sqrt{n}(\hat{\gamma}_n - \gamma_0) \to_L N(0, A^{-1} B A^{-1})$. This completes the proof.
Lemma A5. If the assumptions of Lemma A3 hold, then it follows that

Proof. Denote $\tilde{V}_t(\delta) = \tilde{h}_t^{-1}(\delta)[\partial \tilde{h}_t(\delta)/\partial\delta]$ and similarly for $V_t(\delta)$. Then Similarly, we can have the formula of $\partial^2 l_t(\delta)/\partial\delta\,\partial\delta'$. By (A5), we have Furthermore, by Lemma A1, we can take $\iota_1$ in $\xi_{\delta t-1}$ small enough such that the leading factors in the last terms are bounded uniformly in $\delta \in \Theta_\delta$. Thus, the last two terms are $o_p(1)$, and hence it follows that where $o_p(1)$ holds uniformly in $\delta \in \Theta_\delta$. Moreover, by Lemma A3(i), we have By Lemma A1 and taking $\iota$ and $\iota_1$ in $\xi_{\delta t-1}$ small enough, we have where $C$ is a constant. By the dominated convergence theorem, we can show that Thus, we can show that (A13) is $o_p(1)$ uniformly in $\delta \in \Theta_\delta$. Furthermore, by (A12), Similarly, we can show that Similar to (A8), we can show that The $o_p(1)$ terms in (A14)-(A16) hold uniformly in $\delta \in \Theta_\delta$. By (A12) and (A14)-(A16), we have that We can show that a similar equation holds for the other terms in (A10). Thus, (i) holds. By Lemmas A2-A3, it is straightforward to show that (ii) holds. This completes the proof.