Model Selection and Post Selection to Improve the Estimation of the ARCH Model

The Autoregressive Conditionally Heteroscedastic (ARCH) model is useful for handling volatility in economic time series that ARIMA models are unable to handle. The ARCH model has been adopted in many applications involving time series data, such as financial market prices, options, commodity prices and the oil industry. In this paper, we propose an improved post-selection estimation strategy. We investigated and developed some asymptotic properties of the suggested strategies and compared them with a benchmark estimator. Furthermore, we conducted a Monte Carlo simulation study to reappraise the relative characteristics of the listed estimators. Our numerical results corroborate the analytical work of the study. We applied the proposed methods to the S&P500 stock market daily closing prices index to illustrate the usefulness of the developed methodologies.


Introduction
Modeling and forecasting financial markets are challenging activities for investors and researchers alike. Generally, financial markets are strongly influenced by a number of factors such as interest rates, political issues, inflation rates and foreign exchange rates. More precisely, the uncertainty of stock markets produces high volatility that makes forecasting very complex. Volatility forecasting is an important financial matter, and a precise and accurate volatility forecast is important to traders, investors and financial analysts. Before the 1980s, researchers relied on ARIMA models; however, many financial time series violate the assumptions of the ARIMA model (cf. Brockwell and Davis 2016; Teräsvirta 2009).
Fortunately, Engle (1982) suggested a stationary non-linear model for economic time series and introduced the Autoregressive Conditionally Heteroscedastic (ARCH) model, wherein the conditional variance of a series $\{y_k\}$ changes according to an autoregressive-type process. Subsequently, Francq and Zakoïan (2012) and Grublytė et al. (2017) discussed the properties of maximum likelihood (MLE) and ordinary least squares (OLS) estimates of ARCH model parameters. They also investigated the consistency and the asymptotic normality of the OLS estimator for the ARCH model.
In this article, we are interested in estimating the parameter vector of the ARCH model when some prior information is available in the form of potential linear restrictions on the parameters. In practice, a large number of variables may be collected and included in the model at an initial stage. However, due to model complexity (in terms of both interpretation and variation), estimation when a subset of the parameters is subject to linear restrictions is an important problem in such scenarios. In order to form such constraints, one requires some prior information about the parameter space under consideration. One possible source of prior information is the distinction between the predictors that are of most interest and those that are not. An alternative source of prior information, specifically uncertain prior information (UPI), might be obtained from previous studies or from expert knowledge that searches for some specified patterns.
This paper is organized as follows: Section 2 discusses the recent findings on modeling time series data using ARCH family models. Section 3 discusses the estimation of the parameters of the ARCH model. Section 4 is dedicated to introducing the concept of restricted, pretest and shrinkage estimation for the ARCH model. We derive the asymptotic properties of the estimators and compare their performances using risk analysis and the mean squared error in Section 5. We conduct an extensive simulation study for our selected model and demonstrate the application of the proposed estimators to real-life problems in Section 6. In Section 7, we give some conclusions.

Literature Review
Various studies have investigated the dilemma of having a financial time series with high volatility. For instance, Peiris and Peiris (2011) examined the volatility of different sectors in the Colombo Stock Exchange (CSE) and applied ARCH/GARCH models to the monthly time series data of 20 sectors in CSE from 2005 to 2010. They investigated the impact of macroeconomic factors on volatility. They found that sixteen out of twenty sectors in CSE had significant volatility and that both ARCH and GARCH terms in the fitted models for individual sectors were significant. Subsequently, Rathnayaka et al. (2013) carried out a study to understand the trends and cyclic patterns in CSE in order to predict future behaviors over the seven years from January 2007. They investigated the causal relationships between market performance and economic growth conditions in Sri Lanka. Their results revealed that both microeconomic and macroeconomic conditions had a direct impact on stock market volatility.
Recently, Wang (2021) utilized GARCH models to analyze Bitcoin's returns and volatility. Using the GARCH(1,1) model, the study found that the returns and volatility of Bitcoin exhibit clustering and that the volatility of Bitcoin is a persistent process whose effect gradually decays with time. To overcome the limitations of the GARCH(1,1) model, TARCH and EGARCH models were used to capture the leverage effect in the returns and volatility of Bitcoin.
In practice, regression models usually do not possess a pre-defined UPI; thus, a model selection criterion such as Akaike's Information Criterion (AIC; Akaike 1974), the Bayesian Information Criterion (BIC; Schwarz 1978) or any other technique can be used to construct a sub-model. It is then up to the practitioner to use a sub-model based on fewer predictors (an under-fitted model) or to lean towards a so-called full or over-fitted model. Alternatively, one may utilize the prior information to test whether some parameters are indeed zero or, more generally, whether the full vector of parameters is subject to linear restrictions. To do this, we will explore the pretesting strategy to improve the post-estimation inference of the ARCH model. Furthermore, we will use the Stein-type shrinkage estimator as an alternative to pretesting; it shrinks the full model estimator in the direction of the restrictions. This leads to more efficient estimators when the shrinkage is adaptive. In the first stage, we select a sub-model by a variable selection method or impose a linear restriction on the parameter space to obtain a sub-model. In the second stage, we combine the sub-model with the full model via a test statistic to improve the estimation efficiency.
Many studies have considered incorporating the UPI in the estimation process to obtain efficient estimators for many statistical models. Recently, Ahmed et al. (2015) proposed efficient estimators for the regression coefficients of the spatial conditional autoregressive model under the availability of uncertain auxiliary information about these coefficients. Al-Momani et al. (2016) proposed shrinkage and penalty estimators for the spatial error model. Thomson et al. (2016) investigated the relative performances of pretest and shrinkage estimators for time series following generalized linear models. In all these cases, shrinkage estimators outperformed classical estimators. Dawod et al. (2018) introduced Bayesian estimation strategies for jointly monitoring the linear profile. Al-Momani et al. (2019) proposed the use of the pretest, shrinkage and positive shrinkage estimators in estimating the large-scale regression parameter vector in the spatial moving average model and showed that the positive shrinkage estimator dominated all other estimators in terms of the relative efficiency of the mean squared error with respect to the classical maximum likelihood estimator. For more details on the subject, the reader is referred to (Ahmed and Raheem 2012; Emmert-Streib and Dehmer 2019; Yüzbaşı and Ahmed 2020; Yüzbaşı 2016, 2017; Ahmed 2014).

Estimating ARCH(q) Parameters
Following Francq and Zakoïan (2010), we introduce the ARCH model and consider the existence of a strictly stationary solution to this model.
where $\varepsilon_t$ is the error term, independently and identically distributed with mean 0 and variance 1, and $\omega > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$ are unknown constants for all $i = 1, \dots, q$ and $j = 1, \dots, p$. If the ARCH(q) model satisfies the conditions $\omega > 0$ and $\sum_{i=1}^{q} \alpha_i < 1$, then the unique strictly stationary solution of the model is a weak white noise.
The Ordinary Least Squares (OLS) method will be used to estimate the parameters of ARCH(q). The OLS method uses the autoregressive representation of the squares of the observed process, and no distributional assumptions are needed for the error term ($\varepsilon_t$).
The autoregressive AR(q) representation can be obtained by applying some mathematical transformations, in which $(u_t, \mathcal{F}_t)$ is a martingale difference sequence when $E(y_t) = \sigma_t^2 < \infty$, and $\mathcal{F}_t$ denotes the σ-field generated by $\{y_s : s \le t\}$.

Estimation of the Parameter
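Assuming X is of full rank, $X^\top X$ is invertible, and the OLS estimator is given by $\hat{\theta} = (X^\top X)^{-1} X^\top Y$, where Y denotes the response vector of the autoregressive representation. In the forthcoming sections, we will refer to this estimator as the unrestricted estimator (UE), or simply $\hat{\theta}^{U}$.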
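The OLS step can be illustrated with a minimal Python sketch. It assumes the textbook ARCH(q) setup with standard normal innovations, in which the squared observations are regressed on an intercept and q lagged squares; the function names `simulate_arch`, `arch_design` and `ols` are hypothetical helpers introduced only for this illustration and are not taken from the paper.

```python
import numpy as np

def simulate_arch(omega, alphas, n, burn=500, seed=0):
    """Simulate n observations of an ARCH(q) process with N(0, 1) innovations."""
    rng = np.random.default_rng(seed)
    q = len(alphas)
    total = n + burn
    eps = np.zeros(total + q)                  # first q entries are zero start-up values
    for t in range(q, total + q):
        past_sq = eps[t - q:t][::-1] ** 2      # squares at lags 1, ..., q
        sigma2 = omega + np.dot(alphas, past_sq)
        eps[t] = np.sqrt(sigma2) * rng.standard_normal()
    return eps[-n:]                            # drop the burn-in period

def arch_design(series, q):
    """Build the response Y (squares) and X (intercept plus q lagged squares)."""
    sq = np.asarray(series, dtype=float) ** 2
    n = len(sq)
    Y = sq[q:]
    lags = [sq[q - i:n - i] for i in range(1, q + 1)]   # column i holds lag i
    X = np.column_stack([np.ones(n - q)] + lags)
    return X, Y

def ols(X, Y):
    """Unrestricted estimate (X'X)^{-1} X'Y, computed with a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

For example, `X, Y = arch_design(simulate_arch(0.5, [0.3, 0.2], n=2000), q=2)` followed by `ols(X, Y)` returns estimates of the intercept and the two ARCH coefficients. Solving the normal equations directly, rather than forming the inverse, is a numerical convenience and does not change the estimator.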

Estimation of $\sigma_0^2$
Assuming that $\varepsilon_t$ follows a normal distribution with mean 0 and variance $\sigma_0^2$, and under the following conditions:

Estimation of the Information Matrices
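Following Francq and Zakoïan (2010), the matrices A and B are assumed to satisfy the following conditions:
1. A and B have the same dimension q × q.
2. A and B are invertible.
Estimates $\hat{A}$ and $\hat{B}$ of A and B are then obtained under these conditions.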

Asymptotic Distribution of OLS Estimator
Weiss (1986) was the pioneer who discussed the properties of maximum likelihood and least squares estimates of the parameters of both the regression and ARCH models, in parallel with the properties of various tests of the model that are available. He did not assume that the errors are normally distributed. Rich et al. (1991) introduced another attractive way to estimate the parameters of the ARCH model without assuming a normality condition. They used the generalized method of moments of Hansen (1982) and showed that, under fairly weak conditions, the estimator is consistent and asymptotically normally distributed. Zakoïan (2004, 2012) proved the consistency and asymptotic normality of OLS. In this subsection, we list two theorems by Francq and Zakoïan (2010) about the consistency and asymptotic normality of the OLS estimator for θ.
Theorem 1 (Francq and Zakoïan 2010). Consistency of OLS estimates: If $\hat{\theta}^{U}$ is a sequence of estimators satisfying the OLS solution for ARCH under the assumptions (1)-(4) in Section 3.2, then $\hat{\theta}^{U}$ is a consistent estimator for θ, that is, $\hat{\theta}^{U} \xrightarrow{p} \theta$ as $n \to \infty$, where $\xrightarrow{p}$ denotes convergence in probability.

Efficient Estimation Strategies
Usually, in the case of $\hat{\theta}^{U}$, the corresponding model is recognized as a full model because all parameters are included, even though some of them may not have a significant effect. In this section, we will consider different estimation methods of θ when some UPIs are available.

Restricted Estimator
UPI(s) can be formulated as a linear hypothesis in which some of the given parameters are zero or there is a restriction on some parameters. The estimator obtained under such UPI is known as the restricted estimator (RE) and is denoted by $\hat{\theta}^{R}$. The derivation of this estimator is as follows. Suppose that the UPI is formulated in the form of the null hypothesis
$$H_0 : R\theta = r, \tag{13}$$
where R is an m × q known matrix of rank m ($m \le q$) (cf. Neter et al. 1996) and r is an m × 1 vector of known constants. Under the restrictions given in Equation (13), the method uses a Lagrange multiplier for each restriction and minimizes the function
$$S(\theta, \lambda) = (Y - X\theta)^\top (Y - X\theta) + \lambda^\top (R\theta - r) \tag{14}$$
with respect to θ and λ, where Y and X denote the response vector and the design matrix of the OLS regression. The resulting restricted estimator is
$$\hat{\theta}^{R} = \hat{\theta}^{U} - (X^\top X)^{-1} R^\top \left[ R (X^\top X)^{-1} R^\top \right]^{-1} (R\hat{\theta}^{U} - r). \tag{15}$$
$\hat{\theta}^{R}$ given by Equation (15) is a biased estimator for θ unless the restriction given in Equation (13) is true.
Theorem 3. The Wald test statistic $L_n$ for testing the hypothesis in Equation (13) is given by Equation (16), where $\hat{\sigma}^2$ is estimated in Equation (8), and it can be shown that $L_n \xrightarrow{L} \chi^2(m)$.
We will use α = 0.05 as a level of significance for testing purposes.
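The restricted estimator and a Wald-type test of a linear hypothesis $R\theta = r$ can be sketched in Python as follows. This is an illustration under assumptions, not the paper's exact statistic: it uses the classical restricted least squares solution and a homoscedastic variance estimate (residual sum of squares over n minus the number of columns of X), and the function names `restricted_ols` and `wald_statistic` are hypothetical.

```python
import numpy as np

def restricted_ols(X, Y, R, r):
    """Unrestricted and restricted least squares under R @ theta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_u = XtX_inv @ X.T @ Y
    gap = R @ theta_u - r                        # distance of the UE from the restriction
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    theta_r = theta_u - XtX_inv @ R.T @ middle @ gap
    return theta_u, theta_r

def wald_statistic(X, Y, R, r):
    """Wald-type statistic; the variance estimate below is an assumption."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_u = XtX_inv @ X.T @ Y
    resid = Y - X @ theta_u
    sigma2_hat = resid @ resid / (n - k)         # homoscedastic variance estimate
    gap = R @ theta_u - r
    return float(gap @ np.linalg.inv(R @ XtX_inv @ R.T) @ gap / sigma2_hat)
```

The statistic would then be compared with the upper 5% point of the chi-square distribution with m degrees of freedom, for example 5.991 when m = 2. When R simply sets a block of coefficients to zero, the restricted estimate of the remaining coefficients coincides with OLS on the reduced model.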

Pretest Estimator
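The pretest estimate of θ, denoted by $\hat{\theta}^{PT}$, is defined by
$$\hat{\theta}^{PT} = \begin{cases} \hat{\theta}^{U}, & L_n > L_{n,\alpha}, \\ \hat{\theta}^{R}, & L_n \le L_{n,\alpha}, \end{cases} \tag{17}$$
where $L_n$ is given in Equation (16) and $L_{n,\alpha}$ is the α-critical value of the distribution of $L_n$. For more details, the reader can refer to Bancroft (1944); Saleh (2006); Stein (1956).
The pretest estimator is a binary choice function which chooses $\hat{\theta}^{U}$ if the null hypothesis is rejected and $\hat{\theta}^{R}$ if the test fails to reject the null hypothesis.
$\hat{\theta}^{PT}$ can be rewritten in a more attractive way as follows:
$$\hat{\theta}^{PT} = \hat{\theta}^{U} - (\hat{\theta}^{U} - \hat{\theta}^{R})\, I(L_n \le L_{n,\alpha}), \tag{18}$$
where I(A) is the indicator function of the set A.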
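The binary choice made by the pretest estimator can be written as a short Python sketch. The function name `pretest_estimator` and the argument `crit` (the critical value of the test) are hypothetical names used only for this illustration.

```python
import numpy as np

def pretest_estimator(theta_u, theta_r, L_n, crit):
    """Return theta_r when L_n <= crit (H0 not rejected), otherwise theta_u."""
    theta_u = np.asarray(theta_u, dtype=float)
    theta_r = np.asarray(theta_r, dtype=float)
    keep_restricted = 1.0 if L_n <= crit else 0.0   # indicator of not rejecting H0
    return theta_u - (theta_u - theta_r) * keep_restricted
```

Writing the rule with an indicator rather than an if/else branch mirrors the indicator form of the estimator and makes the comparison with the shrinkage estimators, which replace the 0/1 weight by a smooth one, more direct.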

Shrinkage Estimator
The shrinkage estimator of Stein (1956), denoted by $\hat{\theta}^{S}$, is defined by
$$\hat{\theta}^{S} = \hat{\theta}^{R} + \left(1 - (m - 2) L_n^{-1}\right) (\hat{\theta}^{U} - \hat{\theta}^{R}). \tag{19}$$
It is clear that $\hat{\theta}^{S}$ is no longer a binary choice, regardless of whether $H_0$ is rejected. The shrinkage estimator is a smoothed function of the two choices. $\hat{\theta}^{S}$ does not represent a convex combination of $\hat{\theta}^{U}$ and $\hat{\theta}^{R}$ and suffers from a phenomenon known as over-shrinkage, which occurs when $L_n$ is smaller than $(m - 2)$; hence, an unexpected sign for some of the estimated parameters may be obtained.

Positive Shrinkage Estimator
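A modified version of the James-Stein estimator, known as the positive part shrinkage estimator, was proposed by Stein (1966) to overcome the over-shrinkage phenomenon. This estimator is denoted by $\hat{\theta}^{S+}$ and defined as
$$\hat{\theta}^{S+} = \hat{\theta}^{R} + \left(1 - (m - 2) L_n^{-1}\right)^{+} (\hat{\theta}^{U} - \hat{\theta}^{R}), \tag{20}$$
where $Z^{+} = \max(0, Z)$.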
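The difference between the shrinkage and positive part shrinkage estimators can be seen in a small Python sketch. It assumes the usual Stein-type weight $1 - (m - 2)/L_n$ applied to the difference between the unrestricted and restricted estimates; the function names are hypothetical.

```python
import numpy as np

def shrinkage_estimator(theta_u, theta_r, L_n, m):
    """Stein-type estimator; the weight turns negative when L_n < m - 2."""
    theta_u = np.asarray(theta_u, dtype=float)
    theta_r = np.asarray(theta_r, dtype=float)
    weight = 1.0 - (m - 2) / L_n
    return theta_r + weight * (theta_u - theta_r)

def positive_shrinkage_estimator(theta_u, theta_r, L_n, m):
    """Positive part version: the weight is truncated at zero."""
    theta_u = np.asarray(theta_u, dtype=float)
    theta_r = np.asarray(theta_r, dtype=float)
    weight = max(0.0, 1.0 - (m - 2) / L_n)
    return theta_r + weight * (theta_u - theta_r)
```

With m = 5 and L_n = 2, the untruncated weight is -0.5, so the shrinkage estimator moves past the restricted estimate and can flip the sign of a coefficient, whereas the positive part version stops at the restricted estimate. For L_n = 6 both weights equal 0.5 and the two estimators agree.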

Asymptotic Results
In this section, we will study the asymptotic behavior of the proposed estimators $\hat{\theta}^{U}$, $\hat{\theta}^{R}$, $\hat{\theta}^{PT}$, $\hat{\theta}^{S}$ and $\hat{\theta}^{S+}$. We will show that the restricted and unrestricted estimators are jointly asymptotically normal. In addition, we will define and derive expressions for the asymptotic distributional quadratic bias and the asymptotic quadratic risk of the estimators, relying on the joint normality of $\hat{\theta}^{U}$ and $\hat{\theta}^{R}$.

Joint Normality of the Unrestricted and Restricted Estimators
The asymptotic distributions of all the estimators under hypothesis (13) are the same. Hence, we will study the asymptotic properties under the class of local alternatives given in (21), where ξ is a q × 1 fixed vector in $\mathbb{R}^q$. If we set ξ = 0, the local alternative reduces to (13), which is a linear hypothesis representing the candidate null subspace.
Some distributional results involving the estimators $\hat{\theta}^{U}$ and $\hat{\theta}^{R}$ are given in the following theorem.
Theorem 4. Under the local alternatives in (21) and the regularity conditions (1)-(4) appearing in Section 3.2, and assuming that $n^{-1} X^\top X \to C$ as $n \to \infty$, where C is a positive definite matrix (p.d.m), the following distributional results hold.
Proof. The proof of the theorem is given in Appendix A.
Theorem 5. Under the assumptions of Theorem 4 and the local alternatives in (21), the stated results hold, where $\Delta^2$ is the non-centrality parameter and $G_m(L_\alpha; \Delta^2)$ is the non-central chi-square distribution function with m degrees of freedom and non-centrality parameter $\Delta^2$.
The proof can be found in Appendix B.

Quadratic Weighted Risks
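For any estimator $\hat{\theta}^{*}$ of θ, define the quadratic loss as
$$\mathcal{L}(\hat{\theta}^{*}; W) = n(\hat{\theta}^{*} - \theta)^\top W (\hat{\theta}^{*} - \theta) = \mathrm{tr}\left( W \, n(\hat{\theta}^{*} - \theta)(\hat{\theta}^{*} - \theta)^\top \right),$$
where W is a positive semidefinite matrix of order q × q, and tr(A) is the trace of the matrix A.
The asymptotic mean squared error matrix is $M(\hat{\theta}^{*}) = \lim_{n \to \infty} E\left[ n(\hat{\theta}^{*} - \theta)(\hat{\theta}^{*} - \theta)^\top \right]$, and the asymptotic quadratic risk (AQR) is defined as $R(\hat{\theta}^{*}; W) = \mathrm{tr}\left( W M(\hat{\theta}^{*}) \right)$. The asymptotic weighted quadratic risk expressions are given in the following theorem.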
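As a numerical illustration, the sketch below evaluates a weighted quadratic loss in two equivalent ways, as a quadratic form and as a trace. It assumes the loss has the form n(θ* − θ)ᵀW(θ* − θ), which is the form commonly used in the shrinkage literature; the function names are hypothetical.

```python
import numpy as np

def quadratic_loss(theta_star, theta, W, n):
    """Weighted quadratic loss written as a quadratic form."""
    d = np.asarray(theta_star, dtype=float) - np.asarray(theta, dtype=float)
    return float(n * (d @ W @ d))

def quadratic_loss_trace(theta_star, theta, W, n):
    """The same loss written as tr(W * n * d d')."""
    d = np.asarray(theta_star, dtype=float) - np.asarray(theta, dtype=float)
    return float(np.trace(W @ (n * np.outer(d, d))))
```

The trace form is the one that carries over to the risk: replacing the outer product by its expectation gives the trace of W times a mean squared error matrix.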

Risk Analysis of the Estimators
In this section, all estimators will be compared based on their asymptotic quadratic risk. We will not carry out all derivations; instead, we will give a summary of our results as follows:
ii. Comparison of $\hat{\theta}^{PT}$ and $\hat{\theta}^{U}$: $\hat{\theta}^{PT}$ performs better than $\hat{\theta}^{U}$ when the corresponding risk inequality holds, and the opposite holds when the inequality is reversed.
iii. Comparison of $\hat{\theta}^{S}$ and $\hat{\theta}^{U}$: $\hat{\theta}^{S}$ performs better than $\hat{\theta}^{U}$ whenever the corresponding risk condition holds. Note that $A_{11}$ involves the matrix W; hence, $\hat{\theta}^{S}$ dominates $\hat{\theta}^{U}$. As $\Delta^2 \to \infty$, the risk difference approaches 0 from below.
iv. Comparison of $\hat{\theta}^{S}$ and $\hat{\theta}^{S+}$: The risk difference is non-negative for all $\Delta^2$, which means that $\hat{\theta}^{S+}$ uniformly dominates the unrestricted estimator.

Numerical Studies
In this section, we will carry out a numerical study to investigate the performance of the proposed estimators. In the first subsection, we aim to examine the relative performance of the restricted, pretest and shrinkage estimators while appointing the unrestricted estimator as a benchmark for comparison. A real dataset from the S&P500 stock market will be used to compare the performance of the estimators to confirm the analytical results obtained in the previous section.

Monte Carlo Simulation Experiments
The Monte Carlo simulation experiments will be conducted to compare the restricted, pretest and shrinkage estimators with respect to the unrestricted estimator. The following algorithm is used for the Monte Carlo simulation:
1. Generate an error term ($\eta_t$) from the standard normal distribution.
3. Generate the X matrix of size n × (q + 1), with initial values drawn from the standard normal distribution, for n = 30, 50, 75, 100 and 150.
4. Compute the simulated relative efficiency (SRE) as $\mathrm{SRE}(\hat{\theta}^{U}, \hat{\theta}^{*}) = \mathrm{MSE}(\hat{\theta}^{U}) / \mathrm{MSE}(\hat{\theta}^{*})$, where $\hat{\theta}^{U}$ is used as the benchmark. A value of $\mathrm{SRE}(\hat{\theta}^{U}, \hat{\theta}^{*})$ greater than one indicates that $\hat{\theta}^{*}$ performs better than $\hat{\theta}^{U}$, and vice versa.
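The relative efficiency computation at the end of the algorithm can be sketched in Python. The sketch assumes that the SRE is the ratio of the simulated mean squared error of the benchmark to that of the competing estimator, with the squared error summed over the components of θ; the function name is hypothetical.

```python
import numpy as np

def simulated_relative_efficiency(est_u, est_star, theta_true):
    """Ratio MSE(benchmark) / MSE(competitor) over simulation replicates.

    est_u and est_star have one row per replicate and one column per parameter.
    """
    theta_true = np.asarray(theta_true, dtype=float)
    est_u = np.asarray(est_u, dtype=float)
    est_star = np.asarray(est_star, dtype=float)
    mse_u = np.mean(np.sum((est_u - theta_true) ** 2, axis=1))
    mse_star = np.mean(np.sum((est_star - theta_true) ** 2, axis=1))
    return float(mse_u / mse_star)
```

A value above one means the competing estimator has the smaller simulated mean squared error, which matches the interpretation of the SRE given in the algorithm.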
Results of these simulations are reported in Figures 1-4. The numerical results confirm our analytical results that the positive shrinkage estimator acts as a safeguard against the high risks associated with the reduced model that we obtained under the set of local alternatives. $\hat{\theta}^{R}$ shows the best performance under the null space, and its relative efficiency degrades towards zero as the value of $\Delta^2$ moves away from the null space.
[Figure caption fragment: (m, q) = (9, 14) and for different sample sizes.]
As the value of $\Delta^2$ increases, the superiority changes from $\hat{\theta}^{R}$ to $\hat{\theta}^{PT}$, $\hat{\theta}^{S}$ and $\hat{\theta}^{S+}$, respectively, and $\hat{\theta}^{S+}$ dominates the other estimators because it acts as a safeguard against the high risks associated with the reduced model.

Application on Standard & Poor 500 (SP500) Stock Market
The "sp500dge" dataset contains daily closing prices of the Standard & Poor 500 (SP500) stock market and was used by Ding et al. (1993). The dataset is also available in the fGarch R package produced by Wuertz and Chalabi (2008). Following the illustrative example of Ding et al. (1993), we considered the most recent returns as our targeted subset, from 3 December 1988 to 30 August 1991. This subset contains 1000 daily returns (the number of official working days per year in the financial market is 252).
To fit the ARCH model, we first conducted a Lagrange-Multiplier (LM) test to check for ARCH effects; more details about this test can be found in Tsay (2005). Then, we fit an ARCH model with an adequate order. The order q = 12 is an adequate selection for our data and represents the full model given by Formula (30). $\hat{\theta}^{U}$ is then obtained by fitting the full model.
$$\sqrt{y_t} = \sigma_t \varepsilon_t, \quad \varepsilon_t \sim N(0, 1), \quad \sigma_t^2 = \omega + \alpha_1 y_{t-1} + \cdots + \alpha_q y_{t-q}. \tag{30}$$
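The LM check for ARCH effects that precedes the model fit can be sketched in Python. The sketch assumes the common form of Engle's test, in which the squared demeaned series is regressed on an intercept and q of its own lags and the statistic is the effective sample size times the R-squared of that regression; the function name `arch_lm_statistic` is hypothetical.

```python
import numpy as np

def arch_lm_statistic(series, q):
    """LM-type statistic for ARCH effects: (n - q) * R^2 of the lag regression."""
    x = np.asarray(series, dtype=float)
    sq = (x - x.mean()) ** 2                    # squared demeaned observations
    n = len(sq)
    Y = sq[q:]
    lags = [sq[q - i:n - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(n - q)] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((Y - Y.mean()) ** 2)
    return float((n - q) * r2)
```

Under the null hypothesis of no ARCH effects, the statistic is usually compared with a chi-square distribution with q degrees of freedom; a large value points to conditional heteroscedasticity and motivates fitting an ARCH model.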
In order to obtain the UPI from the data, we used the AIC and BIC selection criteria to pick the significant order under the forward selection strategy. The order selected under the auxiliary information of AIC and BIC represents the reduced model given by Formula (31).
Consequently, from the reduced model, we compute $\hat{\theta}^{R}$, the restricted estimator.
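The idea of letting AIC and BIC pick a reduced order can be illustrated with the following Python sketch. It is a simplification and not the procedure used in the paper: it compares nested lag orders 1 to `q_max` on a common sample using Gaussian least squares versions of the criteria, rather than a likelihood-based forward selection. The function name `select_order` is hypothetical.

```python
import numpy as np

def select_order(series, q_max):
    """Return the lag orders minimizing least squares AIC and BIC."""
    sq = np.asarray(series, dtype=float) ** 2
    n = len(sq)
    Y = sq[q_max:]                              # common sample for every order
    n_eff = len(Y)
    aic, bic = {}, {}
    for k in range(1, q_max + 1):
        lags = [sq[q_max - i:n - i] for i in range(1, k + 1)]
        X = np.column_stack([np.ones(n_eff)] + lags)
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        rss = float(np.sum((Y - X @ beta) ** 2))
        aic[k] = n_eff * np.log(rss / n_eff) + 2 * (k + 1)
        bic[k] = n_eff * np.log(rss / n_eff) + np.log(n_eff) * (k + 1)
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

Because the BIC penalty per parameter exceeds the AIC penalty once the sample has more than about seven observations, the BIC order is never larger than the AIC order in this nested comparison. Either selected order defines a candidate reduced model, and the coefficients of the omitted lags become the zero restrictions.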
To assess the performance of the estimators, we use the relative efficiency of the mean squared error (RMSE) with respect to the true parameters θ, which will be estimated by $\hat{\theta}^{*}$, where $\hat{\theta}^{*}$ can be any of the estimators. The approach is based on a bootstrapping method similar to that introduced by Freedman (1981).
After fitting the full model on the original data, the procedure is conducted in two steps. The first step is as follows:
1. Select a sample of size n, with replacement, from the residuals of the full model, say $E_1, \dots, E_n$.
2. Compute the observations $Y_1^*, \dots, Y_n^*$ as $Y_i^* = \hat{Y}_i + E_i$, where $\hat{Y}_i$ is the ith fitted observation from the full model applied to the original data, and $E_i$ is the ith residual in (1).
3. Fit the ARCH model on $Y_i^*$ to obtain $\hat{\theta}^{U}_{\mathrm{boot}(1)}$.
4. Repeat steps (1)-(3) a number of times K until stable results are obtained; we found that K = 3000 worked well.
5. Compute the average of the K iterations, which will represent the true parameter θ.
After the true parameter vector has been estimated in the previous step, the second step is conducted as follows:
1. Select a sample of size n, with replacement, from the residuals of the full model, say $E_1, \dots, E_n$.
2. Compute $Y_1^*, \dots, Y_n^*$ as $Y_i^* = \hat{Y}_i + E_i$, where $\hat{Y}_i$ is the ith fitted observation from the full model applied to the original data, and $E_i$ is the ith residual in (1).
3. Fit both the full and reduced models and compute the estimators $\hat{\theta}^{*}$.
4. Compute the predicted values $\hat{Y}_i^*$ using the estimated parameters of all estimators.
5. Compute the Bootstrapping Mean Squared Error (MSEB) of the estimator $\hat{\theta}^{*}$.
6. Repeat steps (1)-(5) a number of times K until stable results are obtained. We found that K = 3000 is an adequate number of iterations.
7. Compute the relative efficiency of the mean squared error (RMSE).
Results of the RMSEs for our data are reported in Table 1.
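The first stage of the bootstrap, which produces the reference value of θ, can be sketched in Python as a residual bootstrap for a linear regression. The sketch keeps the design matrix X fixed and refits by OLS on each bootstrap sample; this is a simplifying assumption, and the function name `residual_bootstrap` and the default K are hypothetical.

```python
import numpy as np

def residual_bootstrap(X, Y, K=200, seed=0):
    """Residual bootstrap: resample residuals, rebuild Y*, refit, average."""
    rng = np.random.default_rng(seed)
    XtX = X.T @ X
    theta_hat = np.linalg.solve(XtX, X.T @ Y)
    fitted = X @ theta_hat
    resid = Y - fitted
    n = len(Y)
    draws = np.empty((K, X.shape[1]))
    for k in range(K):
        e_star = rng.choice(resid, size=n, replace=True)   # resample residuals
        y_star = fitted + e_star                            # Y* = fitted + E*
        draws[k] = np.linalg.solve(XtX, X.T @ y_star)       # refit the full model
    return draws, draws.mean(axis=0)                        # replicates and average
```

The second stage would reuse the same resampling loop but compute every estimator on each bootstrap sample, so that their bootstrap mean squared errors can be compared with that of the unrestricted estimator.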

Conclusions
In this article, we investigated the performance of the pretest and James-Stein (shrinkage) estimators for estimating the parameter vector θ of the ARCH model. These estimators were first compared analytically via their asymptotic quadratic risk and asymptotic mean squared error matrices, and then compared numerically using simulated and real datasets to confirm our analytical results. However, the reduced model in some cases might not be the right choice: analytical and numerical results showed that the pretest and James-Stein estimators represent a safeguard against the high risks associated with the reduced model that we obtain under the set of local alternatives.
Historically, the ARCH model is the simplest version of the ARCH family of models; however, its drawback is that it requires many parameters to adequately describe the volatility of such phenomena, and the positive James-Stein estimator should successfully overcome this dilemma by providing a parsimonious sub-model (reduced model). To obtain a UPI, we used the AIC and BIC selection criteria to select the reduced model.
According to our research findings, it is recommended that the positive James-Stein estimator is used as it outperforms all other estimators regardless of whether the restriction given by the null hypothesis is true. In addition, the proposed estimation strategy can be applied to different ARCH family models.

$= T_n^{(1)}$, where I is the identity matrix.

$T_n^{(2)}$ is a linear combination of $T_n^{(1)}$ that can be represented in matrix form as $T_n^{(2)} = A_2 T_n^{(1)} - B_2$, where $A_2$ and $B_2$ are given matrices. From Theorem 4 part (1), as $n \to \infty$, $T_n^{(2)} \xrightarrow{L} T^{(2)}$, and by Slutsky's Theorem the limit is characterized by $\mu^{(2)}$ and $\Sigma^{(2)}$. Similarly, we can prove Formulas (1)-(6).