4.1. Discussion Part 1
Table 5 compares the criteria for each of the two models evaluated on the same S&P 500 time series. The upper triangular table shows the differences between all criteria for the ARCH(1) model. The lower triangular table focuses on the EGARCH(1,1) model, showing all possible pairs of differences between the same criteria.
Let us first consider the upper triangular table, starting with the entry for the AIC − BIC difference. If the model is evaluated with a single value of the degrees of freedom of the Pearson VII distribution, this value is of no particular interest. In the modeling carried out here, however, a wide range of degrees of freedom was considered, from 1 to $10^6$, so that the model was examined across its full complexity, up to the normal limit expected for large degrees of freedom. The fact that the same AIC − BIC value held for every ARCH(1) model across such dissimilar degrees of freedom is of interest, since it is a constant independent of the important shape parameter of the t-distribution. Specifically, the constant difference was maintained for the following degrees of freedom: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, and ten larger values up to $10^6$. This means that the difference between any pair of criteria was invariant under the considered degrees of freedom.
Now, if the subindex stands for AIC, BIC, HQC, BIC, and CAIC, respectively, then = criterion-criterion in the ARCH(1) model and = criterion-criterion for the EGARCH(1,1) model. For example, for the ARCH(1) model, the upper triangular constants read In the EGARCH(1,1) model, the lower triangular additive constants for the CAIC mean that
The additive constants in this example appear to be new in time series applications and information theory. A general mathematical expression is not straightforwardly obtained from the implied definitions; otherwise, the development of such a theory could have been simplified over the years. However, in the current time series example, a simplification does indeed appear.
4.2. Discussion Part 2
With the appropriate setup, a general formula can be derived from the findings of Section 4.1.
Given that the criteria are applied to the same model and data, the same maximum log-likelihood holds for all of them. The criteria studied differ only in their penalty term, so any criterion can be expressed as minus twice the maximum log-likelihood plus that penalty. Moreover, the penalty is linear in the number of parameters, with a coefficient that depends only on the sample size n. Consequently, the log-likelihood cancels in any difference between two criteria, which takes the general form of the number of parameters multiplied by the difference between the two criterion-specific functions of the sample size n.
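For reference, the structure described here can be written out under the standard textbook definitions of four of the criteria. The symbols $\ell$ (maximum log-likelihood), $k$ (number of parameters), $n$ (sample size), $C_i$ (criterion $i$), and $f_i$ (its per-parameter penalty) are notation assumed for illustration and need not match the paper's own.

```latex
\begin{align*}
C_i &= -2\ell + k\, f_i(n), \\
f_{\mathrm{AIC}}(n) &= 2, \qquad
f_{\mathrm{BIC}}(n) = \ln n, \qquad
f_{\mathrm{HQC}}(n) = 2\ln\ln n, \qquad
f_{\mathrm{CAIC}}(n) = \ln n + 1, \\
C_i - C_j &= k\,\bigl(f_i(n) - f_j(n)\bigr).
\end{align*}
```

Since $\ell$ cancels, the difference does not depend on the fitted likelihood. This is consistent with the invariance across degrees of freedom reported in Section 4.1, provided the number of estimated parameters is the same for every fit.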
We can now evaluate any difference between criteria for particular data. For the upper triangular part of Table 5, the inputs are the number of parameters of the ARCH(1) model and the length of the time series; substituting them into the general form gives the AIC − BIC entry. The remaining differences in the upper triangular table are obtained in the same way.
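The calculation of a triangular table of differences can be sketched in a few lines of Python. This is a minimal illustration under the standard definitions of AIC, BIC, HQC, and CAIC; the function name `criterion_differences` and the example inputs `k_example` and `n_example` are hypothetical and are not the values used for Table 5.

```python
import math
from itertools import combinations

# Per-parameter penalty of each criterion as a function of the sample size n.
# Standard definitions; the set of criteria in the paper may differ.
PENALTY = {
    "AIC": lambda n: 2.0,
    "BIC": lambda n: math.log(n),
    "HQC": lambda n: 2.0 * math.log(math.log(n)),
    "CAIC": lambda n: math.log(n) + 1.0,
}


def criterion_differences(k, n):
    """Return {(a, b): C_a - C_b} for one model with k parameters and n observations.

    The maximum log-likelihood cancels in every difference, so it is not needed.
    """
    return {
        (a, b): k * (PENALTY[a](n) - PENALTY[b](n))
        for a, b in combinations(PENALTY, 2)
    }


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    k_example, n_example = 5, 2500
    for (a, b), diff in criterion_differences(k_example, n_example).items():
        print(f"{a} - {b} = {diff:.6f}")
```

Only `k` and `n` enter the result, which is why the same table is obtained for every value of the degrees of freedom.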
For example, since the CAIC criterion is usually used for non-nested models, we provide the following simple equivalences for the S&P 500 log returns in the dependent case:
For the EGARCH(1,1) model of the lower triangular table, the only change is that it has two more parameters than the ARCH(1) model; the coefficients for the EGARCH(1,1) model are then obtained in the same way.
At this stage, we have exact formulae for any difference between two criteria applied to the same model, so the upper and lower triangular parts of Table 5 can be constructed separately. However, the differences for the complex model seem to reflect an additional property. The lower triangular table can be written in terms of the results for the simplest model, ARCH(1). Numerically, a constant, 1.4, appears in every difference and expresses the difference between criteria in EGARCH(1,1) in terms of the corresponding difference in ARCH(1).
The example of the CAIC criterion in the EGARCH(1,1) model implies the following simple equivalences.
The novelty of the additive constants thus faded when we found that a more general constant could be established for these particular data. The dilation constant 1.4 holds for all relations in the lower triangular equalities. Given its importance, this dilation constant will be referred to as the transition number for the nested ARCH(1) and EGARCH(1,1) models.
This means that, for the data considered, any criterion for the simplest model ARCH(1) can be explained exactly (with an accuracy of six decimal places) by the behavior of the criterion of the complex model EGARCH(1,1). The relationship is simply a shift and dilation of the models.
This relationship admits a straightforward formula.
Consider the number of parameters in the upper and lower models. Each upper triangular difference is the number of parameters of the upper model times a function of the sample size, while the corresponding lower triangular difference is the number of parameters of the lower model times the same function. Thus, the ratio of the lower to the upper difference is the same for all pairs of criteria. The transition number between the simple upper model and the complex lower model therefore follows directly from the numbers of parameters in the two models.
In Table 5, the behavior of the lower triangular table, with the more complex EGARCH(1,1) model, can be predicted using the simplest model in this paper, ARCH(1). In this case, the transition number equals 1.4 for all pairs of criteria.
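Under the linear form of the differences, the transition number follows in one line. In the sketch below, $k_U$ and $k_L$ denote the numbers of parameters of the upper and lower models, $f_i(n)$ is the per-parameter penalty of criterion $i$, and $D^{U}_{ij}$, $D^{L}_{ij}$ are the corresponding differences; all of these symbols are assumed notation.

```latex
\begin{align*}
D^{U}_{ij} &= k_U\,\bigl(f_i(n) - f_j(n)\bigr), \qquad
D^{L}_{ij} = k_L\,\bigl(f_i(n) - f_j(n)\bigr), \\
\frac{D^{L}_{ij}}{D^{U}_{ij}} &= \frac{k_L}{k_U}
\quad \text{for all } i \neq j .
\end{align*}
```

With $k_L = k_U + 2$, a ratio of 1.4 corresponds to $k_U = 5$ and $k_L = 7$. These parameter counts are an inference from the reported transition number and the statement that EGARCH(1,1) has two more parameters, not values quoted from the tables.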
We recall that each EGARCH(1,1) model for the dependent case was calculated for a total of 47 values of the degrees of freedom, from 1 to $10^6$, and the calculations for each were optimized through the various methods available in the optimx R package. Each calculation was repeated until all available methods converged to the same value, with a resolution of six decimal places. On a personal computer, this task takes days, while calculating the simplest model, ARCH(1), is much quicker. The transition relation suggests a method of analysis in time series that only requires the calculation of the simplest model; from it, the criterion differences of any of the more complex models, such as GARCH, TGARCH, and EGARCH, can be predicted.
The method is also interesting from an optimization point of view, as it provides a consistency check on the calculations of commercial software. For the same time series, the models studied here are expected to comply with the transition constant up to a certain number of decimal places. If the results of the more complex model cannot be derived from the simpler model within a tolerance similar to the one used in this example (six decimal places), then the optimization algorithm is neither sufficiently stable nor reliable. The tables of the independent case were created using well-known commercial software. As expected, the transition constants there do not agree to six decimal places; see Table 6.
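Such a check can be sketched as follows. The functions `transition_ratios` and `is_consistent`, the helper `synthetic_criteria`, and all numerical inputs are hypothetical and serve only to illustrate the idea; the default tolerance of `1e-6` mirrors the six-decimal resolution mentioned in the text.

```python
import math
from itertools import combinations


def transition_ratios(simple, complex_):
    """Ratio of each criterion difference in the complex model to that in the simple one.

    Both arguments map criterion names to reported criterion values.
    Assumes no two criteria of the simple model are exactly equal.
    """
    names = sorted(set(simple) & set(complex_))
    return {
        (a, b): (complex_[a] - complex_[b]) / (simple[a] - simple[b])
        for a, b in combinations(names, 2)
    }


def is_consistent(simple, complex_, k_simple, k_complex, tol=1e-6):
    """True if every ratio matches k_complex / k_simple within tol."""
    expected = k_complex / k_simple
    ratios = transition_ratios(simple, complex_)
    return all(abs(r - expected) <= tol for r in ratios.values())


def synthetic_criteria(loglik, k, n):
    """Build criterion values from the standard definitions (illustration only)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        "CAIC": -2.0 * loglik + k * (math.log(n) + 1.0),
    }


if __name__ == "__main__":
    # Made-up log-likelihoods, parameter counts, and sample size.
    simple = synthetic_criteria(-3500.0, 5, 2500)
    complex_ = synthetic_criteria(-3400.0, 7, 2500)
    print(is_consistent(simple, complex_, 5, 7))
```

Reported values that fail this check would point to rounding or convergence problems in the optimizer rather than to a property of the models.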
In summary, a critical area remains to be investigated. If the most basic model, ARCH(1), can predict the distinctions among the criteria of all the models analyzed here, what role do the more complex models play in this scenario?
This indicates that we need to quantify the importance of the difference between two given criteria for a pair of models: one complex and one simple. Table 5 gives the differences between the criteria for the same model, but our interest now lies in comparing the more complex model with the simpler model.
We now ask whether the difference between the two models applied to the same time series is significant enough to choose the more complex model. If so, should all the expensive computational calculations be performed instead of simply computing the smaller model?
In the classical literature, it is common to choose as the best model the one with the lowest value of the criterion. In some cases, the choice rests on minimal discrepancies between criteria. It seems to be widely accepted that more complex models tend to attain a higher likelihood and therefore, as expected, outperform the smaller models; thus, the most complex ones are generally chosen.
The answer is well known in other areas of classical statistics, but it seems to be less well established in time series, where the choice of the best model is left to a simple numerical comparison, regardless of the magnitude of the differences. Raftery addressed the question more than 30 years ago through the so-called weak, positive, strong, and very strong grades of evidence.
Using Bayesian theory, he established approximate limits for the difference in the BIC between two models. That classification is so demanding that, for barely acceptable evidence, referred to as positive, a difference greater than approximately 2 is required; otherwise, it is insignificant, and the best model must be the simplest one. Thus, in this case, the large expense spent on the more complex model was unnecessary. Assume weak evidence; this means that
The above results on additive constants and transition numbers allow general formulas for checking evidence between non-Bayesian criteria. In particular, we study again the simplest criterion, the AIC.
To achieve this, we act on the above inequality using the general additive constants that were previously derived.
Consider the BIC and AIC of Model 1 and the BIC and AIC of Model 2, each model with its own number of parameters. The weak evidence for BIC can then easily be mapped to weak evidence for AIC. Starting from the weak-evidence inequality for the BIC difference, the additive constant between BIC and AIC in each model shifts both bounds of the inequality by the same amount, which gives the weak-evidence interval for the AIC difference between the two models.
The remaining intervals for positive, strong, and very strong differences in AIC can be mapped in the same way; the AIC evidence is summarized in Table 7.
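The mapping can be written out as follows, assuming the standard definitions of AIC and BIC, a difference oriented as Model 1 minus Model 2, and a weak-evidence band of 0 to 2 for BIC. The symbols $k_1$, $k_2$ (numbers of parameters) and $n$ (sample size) are assumed notation.

```latex
\begin{align*}
\mathrm{BIC}_m &= \mathrm{AIC}_m + k_m\,(\ln n - 2), \qquad m = 1, 2, \\
0 &\le \mathrm{BIC}_1 - \mathrm{BIC}_2 \le 2, \\
0 &\le \mathrm{AIC}_1 - \mathrm{AIC}_2 + (k_1 - k_2)(\ln n - 2) \le 2, \\
(k_2 - k_1)(\ln n - 2) &\le \mathrm{AIC}_1 - \mathrm{AIC}_2 \le 2 + (k_2 - k_1)(\ln n - 2).
\end{align*}
```

Both bounds move by the same shift, $(k_2 - k_1)(\ln n - 2)$, so the width of the band is unchanged.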
The lower bound of the weak evidence can be zero, allowing a model to be perfectly similar to itself. However, such a bound is irrelevant, since, in any case, the result indicates weakness in comparing the more complex model with itself and, therefore, recommends the simpler model.
At this point, we can answer the question inherent to the transition from the simple ARCH(1) model to the more complex EGARCH(1,1) and its high computational cost. The shift parameter in this case is given by
Thus, strong evidence is located in the interval .
For comparison purposes, let us consider the ARCH(1) model with the reported degrees of freedom (see Figure 7a). At one million degrees of freedom, each criterion approaches the Gaussian model (see the yellow Gaussian limit for each criterion in Figure 7a). Let us take EGARCH(1,1) as the most complex model, with a great computational cost to achieve the required accuracy in the transition number to the simple ARCH(1). If we calculate the differences in the AIC criterion for the most complex model over all the degrees of freedom studied (from 1 to $10^6$), we obtain Figure 7b with the corresponding Gaussian limit (yellow).
The ARCH(1) model exhibited a monotonically decreasing relation with the degrees of freedom (DF). Meanwhile, the EGARCH(1,1) model showed unimodal behavior, with a single maximum and a tendency toward the Gaussian limit for large DF; see Figure 7.
In the criterion subindex notation of this section, a shift of the BIC evidence can be defined for any criterion j, including BIC itself (for which the shift is zero), so that Table 7 is easily rewritten for any reference criterion. However, in this work, we prefer to set the AIC evidence intervals and the ARCH(1) model as global references, given their simplicity.
A similar analysis of the remaining criteria led to an evidence table analogous to that for AIC, indexed by a different shift constant applied to the BIC intervals. Furthermore, because the mapping is a pure translation, the intervals were neither widened nor narrowed, so the Raftery evidence bands for BIC kept the same widths for the other criteria: Weak, 2 units; Positive, 4 units; Strong, 4 units.
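The translation of the evidence bands to an arbitrary criterion can be sketched in Python. The band edges 0, 2, 6, and 10 are the usual Raftery grades for BIC differences; the function `evidence_bands`, the orientation of the difference (Model 1 minus Model 2), and the example inputs are assumptions made for illustration.

```python
import math

# Raftery's grades of evidence for a BIC difference: (label, lower, upper).
RAFTERY_BIC_BANDS = [
    ("weak", 0.0, 2.0),
    ("positive", 2.0, 6.0),
    ("strong", 6.0, 10.0),
    ("very strong", 10.0, math.inf),
]

# Per-parameter penalty of each criterion (standard definitions).
PENALTY = {
    "AIC": lambda n: 2.0,
    "BIC": lambda n: math.log(n),
    "HQC": lambda n: 2.0 * math.log(math.log(n)),
    "CAIC": lambda n: math.log(n) + 1.0,
}


def evidence_bands(criterion, k1, k2, n):
    """Translate the BIC evidence bands to bands for another criterion.

    The difference is taken as criterion(Model 1) - criterion(Model 2),
    where the models have k1 and k2 parameters and share n observations.
    """
    shift = (k2 - k1) * (math.log(n) - PENALTY[criterion](n))
    return [(label, lo + shift, hi + shift) for label, lo, hi in RAFTERY_BIC_BANDS]


if __name__ == "__main__":
    # Hypothetical parameter counts and sample size.
    for label, lo, hi in evidence_bands("AIC", 5, 7, 2500):
        print(f"{label}: [{lo:.4f}, {hi:.4f})")
```

For `criterion="BIC"` the shift is zero and the original bands are returned, while for every other criterion the bands keep widths of 2, 4, and 4 units.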
We conclude this section with a discussion of the complex problem of comparing non-nested models. This problem is outside the scope of this paper, but we provide a brief discussion of the implemented models. Once we have calculated the simplest criterion and have its evidence across a whole range of degrees of freedom, we can expect a relationship with the CAIC criterion suitable for non-nested models. Specifically, we are interested in mapping the weak evidence of the ARCH(1) model, parameterized by its degrees of freedom, to the CAIC differences of the EGARCH(1,1) model with the same degrees of freedom. In this case, the target model was ARCH(1) with 1 million degrees of freedom, and the remaining 46 models were compared simultaneously with the same reference.
The heuristic method we propose is based on using a Pearson-type model and sweeping the entire domain of degrees of freedom to implement convergence in probability toward the normal. Although the mapping is performed via the likelihood, this convergence is not guaranteed; however, it bounds the relationship between the CAIC differences of EGARCH(1,1) and the differences of ARCH(1). The ARCH(1) model is the independent variable in the example we are considering, because it is much simpler. It is monotonically decreasing in the degrees of freedom, while the EGARCH(1,1) that we have considered exhibits unimodal behavior. Thus, the range of the expected function is limited by the function's values at 1 and $10^6$ degrees of freedom. The monotonicity also ensured that the maximum at 30 degrees of freedom was mapped to the maximum in the estimated model. The result can be seen in Figure 8b. This nonlinear relationship doubled the ideal segment of linear mapping between ARCH(1) and EGARCH(1,1) for the case of the same criterion, resulting in evidence that, in absolute value, is certainly weak.