Building Multivariate Time-Varying Smooth Transition Correlation GARCH Models, with an Application to the Four Largest Australian Banks

Abstract: This paper proposes a methodology for building Multivariate Time-Varying STCC-GARCH models. The novel contributions in this area are the specification tests related to the correlation component, the extension of the general model to allow for additional correlation regimes, and a detailed exposition of the systematic, improved modelling cycle required for such nonlinear models. An R package implements the steps of the modelling cycle. Simulations demonstrate the robustness of the recommended model-building approach. The modelling cycle is illustrated using daily return series for Australia's four largest banks.


Introduction
Recently, Silvennoinen and Teräsvirta (2021) introduced a new multivariate GARCH model called the Multivariate Time-Varying Smooth Transition GARCH model (MTV model). This model explicitly accounts for nonstationarities that are common in daily return series. The authors considered maximum likelihood (ML) estimation of the parameters of the model and, under suitable conditions, proved the consistency and asymptotic normality of the resulting ML estimators.
Before actually estimating an MTV model, however, the model builder has to make a number of data-driven decisions needed for specifying the parametric structure of the model. Further, the estimated structure has to be evaluated by statistical tests to reveal its potential weaknesses. Silvennoinen and Teräsvirta (2021) did not, however, discuss any model building issues, leaving them for further research. The present work is intended to fill this void.
As with many other multivariate GARCH models, the MTV model is based on the decomposition of the conditional covariance matrix introduced by Bollerslev (1990), in which the conditional covariance matrix is decomposed into conditional variances and a conditional correlation matrix. In the MTV model, however, the conditional variances are allowed to be nonstationary, while a nested special case, (weak) stationarity, is a testable hypothesis.
Likewise, the correlations in this model are time-varying such that the time-varying correlation matrix nests a constant correlation matrix. Due to the parametric structure of this nonlinear correlation matrix, the constancy of correlations has to be tested (and rejected) before fitting a model with time-varying correlations.
Since, as will be explained later, these testing situations, both in the variances and the correlation matrix, are nonstandard, specification of the MTV model is an important issue in building MTV models. Furthermore, the estimated MTV model has to be evaluated before using it, from which it follows that techniques for this part of the model building process have to be examined as well.
In order to illustrate the MTV model building, we consider the Australian banking sector. This is an oligopoly dominated by four banks, commonly called the 'Big Four'. In early 2020, they represented approximately 19% of the market value of the ASX200 share index and held about 80% of the home loan market in Australia; see Figure 1. Consequently, the banking sector is a major component of many Australian superannuation and other investment funds. As to the Big Four daily returns, their volatility cannot automatically be assumed to be stationary.
Furthermore, the correlations, even when time-varying, cannot a priori be assumed to fluctuate around a constant level, which is one of the assumptions in many popular multivariate GARCH models. Applying the flexible MTV model to these return series is, therefore, an interesting exercise. An in-depth analysis of the Australian banking sector is beyond the scope of this paper; however, modelling the daily returns of the Big Four over a period of almost 30 years serves as a useful example of how our MTV model building techniques work and are applied in practice.
The modelling process is data driven, requires user input, and consists of several steps. For this reason, we developed an R package to help users build MTV models. The version of R used is 4.1.0. The package, called mtvgarch, includes, among other things, the estimation routines as well as the necessary specification and evaluation tests. The code is maintained in a private GitHub repository and can be obtained upon request.
The plan of the paper is as follows. The MTV model is introduced in Section 2, followed by details of the stages and procedures related to model building in Section 3. Model specification is considered in Section 4, estimation in Section 5, and evaluation in Section 6. Section 7 is devoted to the illustration of the complete modelling cycle on the Big Four volatilities and correlations. Our conclusions can be found in Section 8. There are also appendices containing supporting material, such as the relevant test statistics, simulation results, details of the estimation algorithm, and the estimated equations.

The MTV Model
The MTV model used in this paper belongs to the family of multivariate GARCH models introduced by Bollerslev (1990). In the original model, the conditional correlations were constant; hence the name Constant Conditional Correlation (CCC-)GARCH model. This assumption, which made the resulting model rather parsimonious, was later found to be too restrictive in applications, and time-varying correlations were introduced simultaneously by Engle (2002) (dynamic conditional correlations, DCC) and Tse and Tsui (2002) (varying correlations, VC). In these models, the conditional variance components are typically assumed to be stationary, and the correlations are assumed, at least implicitly, to fluctuate around a constant level.
In order to consider the MTV model as defined in Silvennoinen and Teräsvirta (2021), we introduce some notation. The observable stochastic N × 1 vector ε_t is decomposed in a customary fashion as

$$\varepsilon_t = S_t D_t P_t^{1/2}\zeta_t, \tag{1}$$

where H_t = S_t D_t P_t D_t S_t is an N × N covariance matrix, and ζ_t ∼ iid(0, I_N). We also define z_t = P_t^{1/2} ζ_t, a vector of random variables with Ez_t = 0 and a positive definite, deterministically varying covariance matrix cov(z_t) = P_t. The structure of P_t will be defined later. The deterministic matrix S_t = diag(g_{1t}^{1/2}, ..., g_{Nt}^{1/2}) has positive diagonal elements for all t, and D_t = diag(h_{1t}^{1/2}, ..., h_{Nt}^{1/2}) contains the conditional standard deviations of the elements of S_t^{-1} ε_t = (ε_{1t}/g_{1t}^{1/2}, ..., ε_{Nt}/g_{Nt}^{1/2})'. As in Silvennoinen and Teräsvirta (2021) and earlier univariate papers, beginning with Amado and Teräsvirta (2008), and in the multivariate time-varying GARCH article by Amado and Teräsvirta (2014), the diagonal elements of S_t^2 are defined as

$$g_{it} = \delta_{i0} + \sum_{j=1}^{r_i}\delta_{ij}\, G_{ij}(t/T, \gamma_{ij}, \mathbf{c}_{ij}), \quad i = 1, \ldots, N, \tag{2}$$

where δ_{i0} > 0 is a known constant, δ_{ij} ≠ 0, j = 1, ..., r_i, and the (generalised) logistic function is

$$G_{ij}(t/T, \gamma_{ij}, \mathbf{c}_{ij}) = \Bigl(1 + \exp\Bigl\{-\gamma_{ij}\prod_{k=1}^{K_{ij}}(t/T - c_{ijk})\Bigr\}\Bigr)^{-1}, \tag{3}$$

where γ_{ij} > 0, c_{ij} = (c_{ij1}, ..., c_{ijK_{ij}})', and the order K_{ij} is known. Furthermore, to prevent exchangeability of the components in (2), restrictions are needed on c_{ij}. As an example, if K_{ij} = 1 for j = 1, ..., r_i, one can assume (for instance) that c_{i11} < ... < c_{i r_i 1}.
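As a concrete numerical illustration of (2) and (3), the deterministic variance component can be computed directly. The following sketch uses purely illustrative parameter values (not estimates from the paper) and single-location transitions (K_ij = 1):

```python
import numpy as np

def logistic_transition(x, gamma, c):
    """Generalised logistic function as in (3):
    G = (1 + exp(-gamma * prod_k (x - c_k)))**-1, with gamma > 0."""
    prod = np.ones_like(x)
    for ck in np.atleast_1d(c):
        prod = prod * (x - ck)
    return 1.0 / (1.0 + np.exp(-gamma * prod))

def g_component(T, delta0, deltas, gammas, locations):
    """Deterministic variance component g_t as in (2):
    g_t = delta0 + sum_j delta_j * G_j(t/T)."""
    x = np.arange(1, T + 1) / T
    g = np.full(T, float(delta0))
    for dj, gj, cj in zip(deltas, gammas, locations):
        g += dj * logistic_transition(x, gj, cj)
    return g

# One upward and one downward shift in the baseline volatility level
g = g_component(T=1000, delta0=1.0, deltas=[2.0, -1.5],
                gammas=[50.0, 50.0], locations=[[0.3], [0.7]])
```

Note that the δ's must be chosen so that g_t stays positive for all t, as required of a variance component; here g_t moves from 1 up to 3 and then settles at 1.5.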
As discussed in earlier papers, the purpose of g_it is to normalise or rescale the observations. Left-multiplying (1) by S_t^{-1} yields

$$\phi_t = S_t^{-1}\varepsilon_t = D_t z_t,$$

where each element of φ_t is assumed to have a standard weakly stationary GARCH representation. In our work, the conditional variances have a GARCH or GJR-GARCH(1,1) structure; see Glosten et al. (1993) for the latter:

$$h_{it} = \alpha_{i0} + \alpha_{i1}\phi_{i,t-1}^2 + \kappa_{i1}\phi_{i,t-1}^2 I(\phi_{i,t-1} < 0) + \beta_{i1} h_{i,t-1}, \tag{4}$$

where I(A) is an indicator function: I(A) = 1 when A occurs, zero otherwise. A higher-order structure is possible, although there do not seem to exist applications of the GJR-GARCH model of order greater than one.
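The GJR-GARCH(1,1) recursion in (4) is straightforward to simulate. The sketch below (with illustrative parameter values, not the paper's estimates, and Gaussian innovations) generates φ_t = h_t^{1/2} z_t:

```python
import numpy as np

def simulate_gjr_garch(T, alpha0, alpha1, kappa1, beta1, seed=0):
    """Simulate phi_t = h_t^{1/2} z_t with the GJR-GARCH(1,1) recursion (4):
    h_t = alpha0 + alpha1*phi_{t-1}^2
          + kappa1*phi_{t-1}^2*I(phi_{t-1} < 0) + beta1*h_{t-1}."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    h = np.empty(T)
    phi = np.empty(T)
    # initialise at the unconditional variance (persistence < 1 assumed)
    h[0] = alpha0 / (1.0 - alpha1 - kappa1 / 2.0 - beta1)
    phi[0] = np.sqrt(h[0]) * z[0]
    for t in range(1, T):
        h[t] = (alpha0 + alpha1 * phi[t - 1]**2
                + kappa1 * phi[t - 1]**2 * (phi[t - 1] < 0.0)
                + beta1 * h[t - 1])
        phi[t] = np.sqrt(h[t]) * z[t]
    return phi, h

phi, h = simulate_gjr_garch(T=5000, alpha0=0.05, alpha1=0.03,
                            kappa1=0.10, beta1=0.85)
```

Here the persistence α_1 + κ_1/2 + β_1 = 0.93 < 1, so the simulated process is weakly stationary with unconditional variance α_0/(1 − 0.93).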
The conditional covariance matrix is E{φ_t φ_t' | F_{t-1}} = D_t P_t D_t. In order to describe the correlation structure, we employ the Double Smooth Transition Conditional Correlation (DSTCC) model of Silvennoinen and Teräsvirta (2009). In that model, assuming that the transition variable is t/T throughout, the time-varying correlation matrix P_t is defined as

$$P_t = (1 - G_2(t/T, \gamma_2, \mathbf{c}_2))\{(1 - G_1(t/T, \gamma_1, \mathbf{c}_1))P^{(11)} + G_1(t/T, \gamma_1, \mathbf{c}_1)P^{(21)}\} + G_2(t/T, \gamma_2, \mathbf{c}_2)\{(1 - G_1(t/T, \gamma_1, \mathbf{c}_1))P^{(12)} + G_1(t/T, \gamma_1, \mathbf{c}_1)P^{(22)}\}, \tag{5}$$

where P^{(ij)}, i, j = 1, 2, are four positive definite correlation matrices, not equal to each other, and

$$G_i(t/T, \gamma_i, \mathbf{c}_i) = \Bigl(1 + \exp\Bigl\{-\gamma_i\prod_{k=1}^{K_i}(t/T - c_{ik})\Bigr\}\Bigr)^{-1}, \quad \gamma_i > 0, \tag{6}$$

where c_i = (c_{i1}, ..., c_{iK_i})', c_{i1} < ... < c_{iK_i}, i = 1, 2. This variant of the DSTCC model is called the Time-Varying Correlation (TVC) model to emphasise its deterministic rather than stochastic nature, hence removing the term 'Conditional' from its name. For the Big Four application, we simplify the definition (5) slightly by assuming P^{(12)} = P^{(22)}, and therefore (5) becomes

$$P_t = (1 - G_2(t/T, \gamma_2, \mathbf{c}_2))\{(1 - G_1(t/T, \gamma_1, \mathbf{c}_1))P^{(1)} + G_1(t/T, \gamma_1, \mathbf{c}_1)P^{(2)}\} + G_2(t/T, \gamma_2, \mathbf{c}_2)P^{(3)}, \tag{7}$$

where the re-indexing of the matrices highlights the interpretation that there are two transitions over time: one from P^{(1)} to P^{(2)}, and the other from a convex combination of these two to P^{(3)}. Since P^{(1)}, P^{(2)}, and P^{(3)} are positive definite, P_t is positive definite as a convex combination of the three matrices. This simplified version of the TVC model is especially useful when modelling correlations that shift from one state to the next as a function of time.
To that end, the obvious extension to n such transitions is best expressed as a recursion:

$$P_{(1),t} = (1 - G_1(t/T, \gamma_1, \mathbf{c}_1))P^{(1)} + G_1(t/T, \gamma_1, \mathbf{c}_1)P^{(2)}, \quad P_{(j),t} = (1 - G_j(t/T, \gamma_j, \mathbf{c}_j))P_{(j-1),t} + G_j(t/T, \gamma_j, \mathbf{c}_j)P^{(j+1)}, \quad j = 2, \ldots, n, \tag{8}$$

with P_t = P_{(n),t}. When G_2(t/T, γ_2, c_2) ≡ 1 and N = 2, (5) and (7) collapse into the smooth transition correlation GARCH model of Berben and Jansen (2005) or, if the transition variable in G_1 is stochastic and N ≥ 2, into the smooth transition conditional correlation GARCH model of Silvennoinen and Teräsvirta (2005, 2015). An MTV-Conditional Correlation GARCH model with GARCH equations similar to the ones here but a differently defined stochastic P_t was discussed in Amado and Teräsvirta (2014). It may be noted that Feng (2006) introduced another multivariate Conditional Correlation type GARCH model with deterministically varying correlations. In that model, the variation is described nonparametrically, and the model can be viewed as a generalisation of the univariate model in Feng (2004).
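The recursion over transitions is easy to verify numerically, since each step is a convex combination of positive definite matrices. The sketch below uses illustrative equicorrelation matrices (not estimates) and single-location transitions, and reproduces the two-transition case in (7):

```python
import numpy as np

def logistic(x, gamma, c):
    """Single-location logistic transition, i.e., (6) with K = 1."""
    return 1.0 / (1.0 + np.exp(-gamma * (x - c)))

def tvc_correlation(x, P_list, gammas, locations):
    """Time-varying correlation matrix built by recursion: start from
    P^{(1)} and blend in P^{(2)}, ..., P^{(n+1)} one transition at a
    time; every step is a convex combination, so positive definiteness
    is preserved."""
    P = P_list[0]
    for P_next, g, c in zip(P_list[1:], gammas, locations):
        G = logistic(x, g, c)
        P = (1.0 - G) * P + G * P_next
    return P

def equicorr(N, rho):
    """Illustrative N x N equicorrelation matrix."""
    return (1 - rho) * np.eye(N) + rho * np.ones((N, N))

# Two transitions between three correlation states, as in (7);
# halfway through the sample the first transition is essentially complete
P1, P2, P3 = equicorr(4, 0.5), equicorr(4, 0.8), equicorr(4, 0.65)
P_mid = tvc_correlation(0.5, [P1, P2, P3],
                        gammas=[40.0, 40.0], locations=[0.25, 0.75])
```

At t/T = 0.5 the first transition (centred at 0.25) has effectively completed and the second (centred at 0.75) has not yet begun, so P_t is close to P^{(2)}.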

The Three Stages of Model Building
The MTV model is rather general and nests many models. To take one example, fitting an MTV model when a nested CCC-GARCH model generates the data leads to inconsistent parameter estimates. For this reason, building adequate MTV models requires care, and a systematic approach is necessary. Selecting a candidate from this family of models is a data-driven process, and statistical inference has to be used to obtain an acceptable model, that is, one that passes the available misspecification tests.
In this work, we follow the classical approach to model building advocated by Box and Jenkins (1970) and later applied to nonlinear models of the conditional mean; see, for example, Teräsvirta et al. (2010, Ch. 16). It has also been applied to building single-equation MTV-GARCH models; see Amado and Teräsvirta (2017) and Amado et al. (2017). The idea is to first specify the model (select a member from the family of MTV models) and, once this has been done, to estimate its parameters. At the evaluation stage, the estimated model is subjected to a battery of misspecification tests. These three stages, specification, estimation, and evaluation, will be considered in the next three sections. The emphasis will be on specification and evaluation, as maximum likelihood estimation of the parameters of the MTV model was already considered in Silvennoinen and Teräsvirta (2021).

Specification of the Univariate Variance Equations
Specification of the MTV model begins with specifying the univariate volatility equations. This was first discussed in Amado and Teräsvirta (2017). The idea is to begin with a GARCH(1,1) model (Bollerslev (1986)) or the GJR-GARCH model (Glosten et al. (1993)) and to test the hypothesis that the multiplicative deterministic component is constant. The single-equation MTV-GARCH model has the following form:

$$\varepsilon_{it} = g_{it}^{1/2} h_{it}^{1/2} z_{it}, \tag{9}$$

where z_it ∼ iid(0, 1), the conditional variance h_it is defined as in (4) with φ_it = ε_it/g_it^{1/2}, and the deterministic positive-valued function g_it = g_i(t/T) is defined as in (2) and (3).
Positivity of (2) imposes restrictions on δ_ij, j = 1, ..., r_i: in particular, with the transitions ordered in time, δ_i0 and the partial sums δ_i0 + Σ_{j=1}^{m} δ_ij, m = 1, ..., r_i, must remain positive. Typically, in applications, K_ij = 1, 2 in (3). There are two specification issues: determining r_i and choosing K_ij, j = 1, ..., r_i. It is possible that g_i(t/T) = δ_i0 > 0, that is, g_i(t/T) is a positive constant. In this case, the MTV-GARCH model collapses into a standard GARCH or GJR-GARCH equation. Amado and Teräsvirta (2017) solved the problem of choosing r_i by first estimating the GARCH model and then testing the hypothesis of a constant g_i(t/T) against the alternative r_i = 1 in (2) using a Lagrange multiplier type test. The test can be viewed as a misspecification test of the estimated GARCH model. If the null hypothesis is rejected, an MTV-GARCH model with a single transition is estimated, and the hypothesis r_i = 1 is tested against r_i = 2. Sequential testing continues until the first non-rejection of the null hypothesis.
The number of transitions is determined in this order because of an identification problem: the model with r_i + 1 transitions is not identified if the true number of transitions is r_i. The shape of the logistic function, controlled by the parameter K_ij, can be determined using the sequence of tests familiar from the specification of smooth transition autoregressive (STAR) models; see Teräsvirta (1994) or Teräsvirta et al. (2010, Ch. 16). Details can be found in Amado and Teräsvirta (2017).
More recently, Silvennoinen and Teräsvirta (2016) considered testing the constancy of g_i(t/T) before estimating the GARCH model, that is, assuming h_it ≡ 1 in (9). The details are laid out in Appendix A.1. This implies that the size of the test is distorted because conditional heteroskedasticity is ignored, so it has to be adjusted by simulation. It turns out that, by doing this, the power of the size-adjusted test improves considerably compared to the case where the test is used as a standard misspecification test. Reasons for this improvement are discussed in Silvennoinen and Teräsvirta (2016).
A major difficulty with this approach is that, while in simulations the parameters of the conditional variance component h_it under the null hypothesis are known, in practice this is not the case. The underlying 'null' GARCH process has to be generated artificially. In so doing, special attention has to be paid to the persistence of the (GJR-)GARCH process, measured by α_i1 + κ_i1/2 + β_i1 in (4) when g_it ≡ 1. In fact, the asymmetry parameter has no practical importance for the purpose of calibrating the distribution of the test statistic, and it is therefore sufficient to restrict attention to the standard GARCH process. Other features, such as the implied kurtosis or the relative sizes of α_i1 and β_i1 corresponding to a particular level of persistence, have only a negligible effect on the performance of the test.
A practical problem is that it is not possible to estimate this measure of persistence when the null hypothesis does not hold, that is, when g_it is not constant over time. How this difficulty is handled has an effect on the power of the test. We study two approaches that are discussed in more detail in Appendix B.1. The first one consists of visually identifying a period of time where there appears to be no change in the overall level of baseline volatility. A standard GARCH(1,1) model is estimated over this subperiod. The second approach is to use rolling-window variance targeting. This means that the intercept in the GARCH equation is time-varying, and its value at each point in time is calculated such that it matches the unconditional variance obtained from a window around that point in time.
Simulations discussed in Appendix B.1 experiment with the choice of window size. Both of these methods provide GARCH parameter and persistence estimates that are used for calibrating the null distribution of the test statistic and calculating p-values.
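To make the rolling-window variance-targeting idea concrete, the following sketch (simulated data and illustrative GARCH parameters, not the paper's estimates) sets the time-varying intercept so that the implied unconditional variance matches the local sample variance:

```python
import numpy as np

def rolling_variance(eps, window):
    """Centred rolling-window sample variance of the return series."""
    T = len(eps)
    half = window // 2
    out = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half)
        out[t] = np.var(eps[lo:hi])
    return out

def targeted_intercepts(eps, alpha1, kappa1, beta1, window=400):
    """Time-varying GARCH intercept via variance targeting: choose
    alpha0_t so that alpha0_t / (1 - alpha1 - kappa1/2 - beta1)
    equals the local sample variance around time t."""
    persistence = alpha1 + kappa1 / 2.0 + beta1
    return rolling_variance(eps, window) * (1.0 - persistence)

rng = np.random.default_rng(1)
# returns whose baseline volatility doubles halfway through the sample
eps = rng.standard_normal(2000) * np.where(np.arange(2000) < 1000, 1.0, 2.0)
a0 = targeted_intercepts(eps, alpha1=0.05, kappa1=0.10, beta1=0.80, window=400)
```

With persistence 0.90, the intercept tracks roughly one tenth of the local variance, so it rises about fourfold after the simulated break in the baseline volatility.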

Specification of Time-Varying Correlations
After the MTV-GARCH equations have been specified and estimated assuming the errors are uncorrelated, the next step is to specify the time-varying correlation structure. This is done by sequential testing. First, the constancy of correlations is tested against the model with a single transition, i.e., G_2(t/T, γ_2, c_2) ≡ 1 in (5). The null hypothesis is that the model is an MTV-Constant Correlation GARCH model as in Bollerslev (1990), except that the GARCH equations are MTV-GARCH equations. If this model is rejected, the one-transition model is estimated and tested against (5) or (7). If this specification is also rejected, the alternative with two transitions is estimated. This is repeated until no further evidence of time-variation in the correlations is detected.
As discussed in Silvennoinen and Teräsvirta (2005, 2015), the MTV model with one transition is only identified under the alternative, which invalidates standard asymptotic inference. The identification problem can be circumvented by approximating the transition function (6) by its Taylor expansion around the null hypothesis, H_0: γ_1 = 0. The form of the expansion depends on the order of the exponent in (6).
The test can be constructed along the lines presented in the appendix of Silvennoinen and Teräsvirta (2005); see also Silvennoinen and Teräsvirta (2021). To derive the test statistic, consider the first-order Taylor expansion of (6) around γ_1 = 0, assuming K_1 = 2. It has the following form:

$$G_1(t/T, \gamma_1, \mathbf{c}_1) = \frac{1}{2} + \frac{\gamma_1}{4}(t/T - c_{11})(t/T - c_{12}) + R_2(t/T; \gamma_1), \tag{10}$$

where R_2(t/T; γ_1) is the remainder. Using (10), (5) becomes

$$P_t = P^{(A0)} + (t/T)\,P^{(A1)} + (t/T)^2\,P^{(A2)} + R_t^*,$$

where the matrices P^{(Ak)} are functions of the original parameters and R_t^* collects the remainder terms, so the null hypothesis becomes H_0: P^{(A1)} = P^{(A2)} = 0. Note that a simpler version of the test assumes K_1 = 1 and yields a similar approximation, although without the term (t/T)^2 P^{(A2)}. The null in this case is H_0: P^{(A1)} = 0. This version of the test is more powerful than the former when the time-variation in the correlations is monotonic. However, and especially over longer time horizons, this may not always be the case, and the square term of the expansion is able to capture at least some nonmonotonic changes.
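The first-order Taylor expansion of the logistic transition (6) around γ_1 = 0 follows from a standard calculation; sketching it here for K_1 = 2, the function equals 1/2 at γ_1 = 0, and

```latex
G_1\!\left(\tfrac{t}{T},\gamma_1,\mathbf{c}_1\right)
  = \Bigl(1+\exp\Bigl\{-\gamma_1\bigl(\tfrac{t}{T}-c_{11}\bigr)\bigl(\tfrac{t}{T}-c_{12}\bigr)\Bigr\}\Bigr)^{-1},
\qquad
\left.\frac{\partial G_1}{\partial\gamma_1}\right|_{\gamma_1=0}
  = \frac{\bigl(\tfrac{t}{T}-c_{11}\bigr)\bigl(\tfrac{t}{T}-c_{12}\bigr)}{(1+1)^{2}}
  = \frac{1}{4}\Bigl(\frac{t}{T}-c_{11}\Bigr)\Bigl(\frac{t}{T}-c_{12}\Bigr).
```

Expanding the product gives a constant, a t/T term, and a (t/T)^2 term; substituting into (5) and collecting the coefficient matrices produces the auxiliary polynomial representation of P_t, so γ_1 = 0 translates into a testable null on the polynomial coefficient matrices.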
The details of the ensuing LM-type test statistic for the test of constant correlations are presented in Appendix A.3, and the test for an additional transition in the correlations is laid out in Appendix A.4.

Estimation of the MTV Model
After specifying the deterministic components of the model, both in the GARCH equations and in the correlations, one can estimate the complete model with conditional heteroskedasticity included. The log-likelihood of the MTV-STCC-GARCH model has the form

$$L_T(\theta) = -\frac{1}{2}\sum_{t=1}^{T}\Bigl(N\ln 2\pi + \sum_{i=1}^{N}\ln g_{it} + \sum_{i=1}^{N}\ln h_{it} + \ln|P_t| + z_t' P_t^{-1} z_t\Bigr),$$

where z_t = D_t^{-1} S_t^{-1} ε_t, and the full parameter vector θ = (θ_h', θ_g', θ_P')' is partitioned according to the relevant functions: the conditional variances in (4), θ_h = (θ_h1', ..., θ_hN')' with θ_hi = (α_i0, α_i1, κ_i1, β_i1)', i = 1, ..., N; the deterministic variance components in (2) and (3), θ_g = (θ_g1', ..., θ_gN')', i = 1, ..., N; and the correlations, θ_P = (vecl(P^{(1)})', ..., γ_1, ..., c_1', ...)', where the number of correlation matrices as well as transition functions (6) is determined by the choice of the model, (5), (7), or (8).
We make the following assumptions; see Silvennoinen and Teräsvirta (2021). Assumption AN1, the condition α_i1 + κ_i1/2 + β_i1 < 1, is the necessary and sufficient weak stationarity condition for the ith first-order GJR-GARCH equation. Assumption AN2 is a standard regularity condition required for proving the asymptotic normality of the maximum likelihood estimators of θ_hi, i = 1, ..., N. Assumption AN3 (normality of ζ_t) is a strong condition; however, it is needed here for the proofs to go through; see Silvennoinen and Teräsvirta (2021). These assumptions are sufficient for the maximum likelihood estimators of the GARCH parameters in single-equation GARCH models to be consistent and asymptotically normal.
The parameters are estimated in turn: first, each θ_gi is estimated to obtain starting values for the joint estimation of θ_g and θ_P. This is done assuming h_it(θ_hi) ≡ 1, i = 1, ..., N. Amado and Teräsvirta (2013) showed in the single-equation GJR-GARCH case that, under regularity conditions, the maximum likelihood estimator of θ_gi is consistent and asymptotically normal. Silvennoinen and Teräsvirta (2021) generalised this result to MTV models. This means that joint estimation of θ_g and θ_P by maximum likelihood produces consistent estimates of these parameter vectors.
If the estimators of θ_g and θ_P are consistent and Assumptions AN1, AN2, and AN3 hold, then, by Theorem 3.3 of Song et al. (2005), the maximum likelihood estimator of θ_h is consistent and asymptotically normal. After estimating θ_h, the parameter vectors θ_g and θ_P are re-estimated. The iteration continues until convergence. Song et al. (2005) showed that the final maximum likelihood estimator of θ is consistent and asymptotically normal. A more detailed description of maximisation by parts applied to the present situation can be found in Appendix C; see also Silvennoinen and Teräsvirta (2021).
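The logic of estimating parameter blocks in turn can be illustrated with a deliberately simple toy model, a Gaussian sample with the mean and the variance playing the roles of the two blocks; none of this uses the MTV likelihood itself:

```python
import numpy as np

def maximise_by_parts(y, tol=1e-10, max_iter=100):
    """Toy illustration of estimation in turns: maximise a Gaussian
    log-likelihood over (mu, sigma2) by updating one block at a time,
    holding the other fixed, until the estimates converge. The MTV
    algorithm cycles analogously over theta_g, theta_P, and theta_h."""
    mu, sigma2 = 0.0, 1.0  # starting values
    for _ in range(max_iter):
        mu_new = np.mean(y)                      # argmax over mu given sigma2
        sigma2_new = np.mean((y - mu_new) ** 2)  # argmax over sigma2 given mu
        converged = abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if converged:
            break
    return mu, sigma2

rng = np.random.default_rng(2)
y = 3.0 + 2.0 * rng.standard_normal(10000)
mu_hat, s2_hat = maximise_by_parts(y)
```

In this toy case the two blocks separate after a single pass; in the MTV model, θ_g, θ_P, and θ_h interact through the likelihood, so several iterations are typically needed before the estimates settle.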

Evaluation of the MTV Model
Once the model has been specified and estimated, it has to be evaluated in order to find potential misspecifications. The tests in Section 4.1 were used to guide the choice of the functional form of the deterministic component, and a rejection of the null was seen as evidence that the current model was still lacking in its specification. In that sense, the tests in Section 4.1 serve as both specification and evaluation tests. It is worth reiterating that these specification tests were constructed at a stage when the GARCH part was not yet specified, that is, with h_it ≡ 1 in (4). However, when the deterministic part passes these tests and an MTV-GARCH equation is subsequently estimated, there is room for additional misspecification checks of the final model.
The tests in Amado and Teräsvirta (2017) are available for this purpose. They fall into three categories. In the first one, the deterministic component is additively misspecified. In the context of the current MTV-GARCH model, the relevant case is a test for yet another transition in (2). The second test assesses the GARCH equation for additive misspecification. The concern here is the validity of the maximum lags p and q. The final test is the 'test of no remaining ARCH', which is based on the idea that a sufficiently well-specified model should clear any autocorrelation from the squared standardised residuals. The test suited to each of these situations (or its robustified version, which avoids the assumption of normality) is conveniently performed following the set of steps outlined in Appendix A.2.
It is worth stating that the tests here are applied to P_t^{-1/2} ε_t, one series at a time. Attempting to carry out the tests simultaneously in the complete N-variate system would open up a vast number of permutations of various misspecification options. To manage the task, the recommendation is to focus on the univariate specifications one at a time, while acknowledging some potential for deviating from the asymptotically exact results. Simulations in Appendix B.2 indicate that applying the tests to the pre-filtered data has very little impact on the distributions of the test statistics. While the standard form of the misspecification tests suffers from minor oversizing, this is mostly corrected when using the robust version of the test.
The test for an additional transition in the correlations in Section 4.2 may also be used as an evaluation test. It is based on the completely specified univariate and correlation components, and therefore its role as a misspecification test of a complete model is justified. The number of degrees of freedom in this test quickly becomes large with increasing N. One way of restricting this growth is to assume that, under the alternative, only the eigenvalues of the correlation matrix change over time. The matrix under the alternative would be a correlation matrix only if all correlations were identical (see Engle and Kelly (2012)); however, an LM test can nevertheless be built on this assumption.
Write the correlation matrix as P_t = Q_t Λ_t Q_t', where P_t is defined as in (5), Λ_t is the diagonal matrix of eigenvalues, and Q_t contains the corresponding eigenvectors. Simplify this by assuming Q_t = Q, and approximate Λ_t by Ψ_t = Σ_{k=0}^{K} Ψ_k (t/T)^k. Under the null hypothesis, K = 0. The resulting test statistic is derived, and its small-sample properties are studied, in Kang et al. (2022).
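The building blocks of this restriction are easy to sketch numerically; the example below (with an illustrative correlation matrix, not an estimate, and the actual statistic being the one in Kang et al. (2022)) factors P into eigenvectors and eigenvalues and shows the polynomial approximation of the eigenvalues:

```python
import numpy as np

def eigen_factorisation(P):
    """Factor a correlation matrix as P = Q Lambda Q' (symmetric
    eigendecomposition); the restricted test keeps Q fixed and lets
    only the eigenvalues in Lambda vary with t/T."""
    lam, Q = np.linalg.eigh(P)
    return Q, lam

def poly_eigenvalues(lam0, lam1, x):
    """First-order (K = 1) polynomial approximation of the eigenvalues,
    Psi_t = Psi_0 + Psi_1 * (t/T); under the null, K = 0 and Psi_1 = 0."""
    return lam0 + lam1 * x

P = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
Q, lam = eigen_factorisation(P)
P_rebuilt = Q @ np.diag(lam) @ Q.T   # exact reconstruction of P
```

Since the trace of a correlation matrix equals N, the eigenvalues sum to N; under the alternative only these N eigenvalue paths vary, which keeps the number of tested parameters from growing with the number of correlations.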

Main Features of the Australian Banking Sector 1990-2020
In order to provide some background for our empirical results, we shall now draw attention to a number of interesting features of the Australian banking sector between the years 1990 and 2020. In 1990, the Australian government adopted an intervention policy called 'six pillars'. It covered the four biggest Australian banks, commonly referred to as the 'Big Four', the Commonwealth Bank of Australia (CBA), Westpac Banking Corporation (WBC), National Australia Bank (NAB), and the Australia and New Zealand Banking Group (ANZ), listed in descending order of market share, as well as two insurers (AMP Limited and National Mutual).
This policy stated that further mergers of these institutions would not be accepted. The basic idea was to ensure a competitive banking market. In 1997, the policy became 'four pillars' as the insurers were left outside the arrangement. Since its establishment, it has mostly enjoyed the support of the two main political parties, and the proponents of the policy have argued that it has contributed to the stability and strength of the Australian financial sector. The government also had sympathetic policy settings, which allowed the banks to recapitalise in the 1990s and 2000s.
During this period, there were financial losses at Westpac (a $1.6 billion loss in 1992, coming close to insolvency), at ANZ (poorly executed international expansions), and subsequently at NAB (the purchase of the US mortgage originator and servicer Homeside led to $2.2 billion in losses in 2002). Although the Big Four were not allowed to merge with each other, larger financial concentration due to mergers with other financial institutions was seen as acceptable: in 2008, Westpac and CBA acquired St. George and BankWest, then the fifth and sixth largest Australian banks, respectively. The impact can be seen in the right panel of Figure 1 as the Big Four's share of the ASX200 Financials Index drastically increased.
The recent history of the pillars policy coincides with a few major incidents and changes. These include not only the dot-com boom in the late 1990s and early 2000s and the global financial crisis (GFC) nearly ten years later but also events that have had more localised impacts, such as a number of regulatory changes (Basel guidelines), the most recent mining boom that started around 2005 and was interrupted by the GFC, and technology-driven market disruptions (non-bank lenders and payment providers). Since the GFC, the banks have enjoyed substantial government support, including a deposit guarantee, and, as already noted, have come to dominate the home mortgage market with an 80% market share.
During the first decade of the millennium, a few of the above-mentioned events positioned the Big Four in an increasingly competitive environment. The stagnation of the housing market and the removal of barriers to changing mortgage providers may also have contributed to this trend; however, the most notable event was the announcement in 2003 that Basel II was to be implemented in Australia by the end of 2007. The idea behind the updated accord was to level inequalities amongst the internationally active banks and to set expectations regarding capital adequacy requirements. The Australian Prudential Regulation Authority, in charge of overseeing the uptake of the accord, worked extensively with numerous Authorised Deposit-Taking Institutions, the industry, and other relevant bodies during 2005 to 2007, aiming to ensure that the adoption of Basel II covered all relevant aspects of the implementation process, its goals, and its impacts.
Fear of being subjected to a competitive disadvantage relative to their international counterparts, both in international and domestic operations, coupled with an opportunity for reduced regulatory capital, incentivised the banks to signal early their preference to conform with the accord. As a result, Australia was amongst the first nations to have fully implemented the framework on 1 January 2008.

Modelling the Error Variances
The daily return series for the Big Four used in this paper extend from 2 January 1992 to 31 January 2020. As suggested in the Introduction, these series may not be adequately described by a weakly stationary GARCH or GJR-GARCH model. From the plots in Figure 2, it is seen that the amplitude of the volatility clusters varies for all four banks, in particular during and after the financial crisis beginning in 2008. The crisis was preceded by a rather tranquil period between 2003 and 2008. This variation also shows in the autocorrelation functions of the squared returns in Figure 3. In all four cases, the autocorrelations decay very slowly as a function of the lag length, which suggests nonstationarity.
For this reason, modelling the returns has to be initiated by testing the stationarity hypothesis. As discussed in Section 4.1, the slow-moving 'baseline' volatility is specified first, followed by the inclusion of the GJR-GARCH component. The test statistic from Appendix A.1 is calibrated using the methods described in Appendix B.1. In the first one, based on choosing a period during which the amplitude of the volatility clusters seems constant, the selected period extends from November 2003 to October 2007. In the other, where a rolling window is moved over the observation period, the window size chosen by simulations is 400, as outlined in Appendix B.1.
To enhance the performance of the test, the entire sample of over 7000 observations is broken into subsections. Once one transition is found and estimated, the test is applied both before and after this transition to determine whether there is another transition on either side. The process continues until the null of no transition is no longer rejected. After the deterministic component has been specified and its parameters tentatively estimated, the GJR-GARCH equations are estimated together with the time-varying component to form a complete TV-GJR-GARCH model. The estimated equations are then checked for signs of misspecification using the tests from Amado and Teräsvirta (2017).

Estimation results for the TV-GJR-GARCH equations appear in Table 1, and the deterministic components are reported in Appendix D. For comparison, the GJR-GARCH equations without the deterministic component are also reported in this table. In all four cases, the persistence strongly decreases after rescaling the returns with the TV component. This is also indirectly obvious from the autocorrelations in Figure 4, which are considerably smaller and decay faster than the ones in Figure 3. The main cause of this decrease lies in the coefficient of the lagged conditional variance, whose estimate shrinks in the process. Estimates of the asymmetry parameter κ_i1 slightly increase, and thus asymmetry becomes more pronounced when nonstationarity is properly modelled. Table 1 also contains the kurtosis estimates for the two GJR-GARCH processes, obtained using the definitions in He and Teräsvirta (1999). It is seen that, in all four cases, rescaling lowers the kurtosis to values close to three.

Figure 5 contains the estimated transitions. There are two conspicuous features in these graphs. One is the downward shift around 2004, which, for WBC, is a long and rather smooth decline. This coincides with the local events discussed in the previous section. The other is that, for all four banks, the deterministic component remains higher after 2010 than it was before 2008.
For WBC, the deterministic component slowly but steadily declines after the crisis, whereas, for the other three, it remains constant. This can also be seen from the estimated equations behind the figures, reported in Appendix D. The effects of the deterministic component g_it on the GARCH equations also become obvious by comparing the conditional variances from the GJR-GARCH equations in Figure 6 with the TV-GJR-GARCH ones in Figure 7. Clearly, for all four banks, the nonstationarity around 2008-2010 visible in the former figure is no longer present in the latter.

Modelling the Error Correlations
The stability of correlations over time, as discussed in Section 4.2, is tested using the test statistic (A5) in Appendix A.3. The p-value of the test is very close to zero, and thus the null hypothesis is rejected. A TVC model is then estimated with a single monotonic time transition. The adequacy of this specification is tested with the test for an additional transition.
The resulting p-value is 0.467; therefore, the single time transition is deemed sufficient. The estimation results for the TVC component of the model, presented in Table 2 (see also Figure 8), indicate that the correlations between the standardised residuals are stable, around 0.49-0.60 depending on the bank, from 1992 until mid-to-late 2006. At that point, the correlations begin a steady increase to the range of 0.78-0.83, which they reach by early 2008. The final correlations are not only large but also remarkably similar. In addition to considering the complete Big Four system, we repeated the analysis using bivariate models. The outcome was similar: a single transition was sufficient for all pairs. Furthermore, inspection of the pairwise correlation estimates in Table 3 reveals strong similarity between the four-variate and bivariate estimates. This is illustrated in Figure 9. It should be noted that the shift in the correlation of the ANZ-NAB pair is estimated as a step function.
This happens when the speed of transition increases without bound because the likelihood function effectively becomes 'flat' with respect to that particular parameter. The solution is to fix the speed at a large value and proceed with the estimation of the remaining parameters. The steady increase of the correlations among the Big Four over time, as well as the high correlation state maintained since the GFC, could be linked to several changes in the Australian financial sector. For instance, the time-varying correlation structure may reflect the four pillars policy and the resulting financial concentration, which was further strengthened by the acquisition of the next largest banks by CBA and WBC in 2008. Further contributing factors that may have made the banks look more similar from the investors' point of view include the regulatory requirements brought by the implementation of Basel II, the easing of restrictions that directly affected the home mortgage market that the Big Four now dominated, and the stagnation of the housing credit market. Furthermore, since the GFC, all four banks have enjoyed substantial government support.
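A minimal numerical sketch of why fixing the speed works: as the slope γ of a logistic transition grows, the transition approaches a step function at the location c, so beyond some point the likelihood is essentially insensitive to γ. The function and the parameter values below are illustrative assumptions, not estimates from the paper.

```python
import math

def logistic_transition(t_frac, gamma, c):
    """G(t/T; gamma, c) = 1 / (1 + exp(-gamma * (t/T - c)))."""
    return 1.0 / (1.0 + math.exp(-gamma * (t_frac - c)))

c = 0.55  # hypothetical transition location (fraction of the sample)
# moderate speed: a smooth transition over much of the sample
smooth = [logistic_transition(t / 100, 5.0, c) for t in range(101)]
# speed fixed at a large value: effectively a step function at c
step = [logistic_transition(t / 100, 500.0, c) for t in range(101)]
```

With γ fixed at 500, the transition jumps from near zero to near one within a couple of observations around c, which reproduces the step-function behaviour reported for the ANZ-NAB pair.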
The serious effort made to model the volatilities and correlations separately allows for observations on the timing and magnitude of those features without the cross-contamination that would occur if covariances were examined instead. It is often noted that correlations increase during turbulent times. In the case of the Big Four, this is not exactly what happened. The calm volatility period (from 2003 until late 2007) overlaps with the period of smoothly increasing correlations (the 16 months leading to early 2008). Furthermore, it is notable that the GFC had a tremendous impact on the volatilities, whereas the correlations had by then settled at their high levels and exhibited no further change.

Conclusions
In this paper, a data-driven modelling cycle for building MTV-GARCH models for asset returns was constructed and illustrated with an empirical example. The paper complements Silvennoinen and Teräsvirta (2021), which presented the asymptotic theory for this model but, as already mentioned in the Introduction, did not discuss practical model-building issues. All three phases of the cycle, model specification, estimation, and evaluation, were covered. Specification includes testing the constancy of the GARCH equations against multiplicatively time-varying GARCH. This is a nonstandard testing problem, as the MTV-GARCH model is not identified when the null hypothesis holds.
Furthermore, constructing the null model requires new techniques because the conditional variance is not observed, and two alternatives for solving this problem were presented. Simulations reported in an online appendix show that the proposed testing procedure has reasonable small-sample properties. In specifying the correlation structure, the constancy of correlations has to be tested, and a relevant test for this nonstandard testing situation was developed.
The GARCH equations and the correlation structure, both nonlinear, were estimated jointly, and the technical details of this process were presented. Misspecification tests were derived for evaluating the estimated model. The application to the four main Australian banks, the Big Four, demonstrated the use of the modelling cycle. The GARCH equations were found to be multiplicatively time-varying, and the correlations also changed over time. The estimation results indicated that the amplitude of the volatility clusters declines to a lower level around 2004 and temporarily rises during the GFC.
The (positive) correlations were found to be nonconstant, and they increased to a fairly high level already before the GFC. There is no single reason for this shift, which was also established by estimating pairwise models for the six pairs of banks separately. Generally speaking, it may be argued that, whatever the aforementioned events before 2008 have meant for the ability of the banks to compete with each other, from the investors' viewpoint, the banks have become increasingly similar. Given their size, these four banks may represent a systemic risk to the Australian financial sector or the Australian economy in general.
Finally, the paper is accompanied by an R-package entitled mtvgarch, which is maintained in a private GitHub repository and contains all the econometric tools necessary for building MTV-GARCH models.
The appendices contain additional material for the paper. Appendix A provides details of the TVV-model specification, the MTV-GARCH model evaluation, the test of constant correlations, and the test for an additional transition in the correlations. The simulation studies in Appendix B explore aspects of the specification and evaluation of the GARCH equations as well as the size and sensitivity of the test of constant correlations. Appendix C presents the details of maximisation by parts. The estimated deterministic components of the four banks' transition equations are presented in Appendix D.

) and r = (r, r 2 , r 3 ) . We state the following lemma. Lemma A1. Under the null hypothesis and assuming z t ∼ iid N(0, 1), the information matrix ... where ... Proof. The 'sample' information matrix with T observations equals ... Consider the (1, 2) element of B 11(T) = (1/T) ∑ T t=1 B 11t , which is an average of T values of the logistic cumulative distribution function. Let [Tr] = t be the integer closest to t. Then, (1/T) ... as T → ∞. The other elements of B 11 = lim T→∞ B 11(T) are derived in a similar fashion. In matrix form, ... The blocks B 12 and B 22 are obtained similarly.
Since the maximum likelihood estimators of the parameters of the auxiliary TVV model under H 0 are consistent, we may construct the LM test for the hypothesis H 0 : ψ = (ψ 1 , ψ 2 , ψ 3 ) = 0. Denoting the relevant block of the score by ... where ∂g t /∂ψ = τ t , and ... Then, assuming z t = ε t /g 1/2 t is standard normal under H 0 , the test statistic has the following form: ... where θ 1 = ( δ * 0 , δ 1 , γ 1 , c 1 , 0, 0, 0) ; see, for example, Godfrey (1988, p. 14). In order to make (A3) operational, the blocks of B are replaced by their consistent counterparts.
When constancy of the error variance is tested against a single transition, g t ≡ δ 0 , g 1 (t/T) = 1 (scalar), and ∂g t /∂ψ = τ t as before. Then, the test statistic (A3) becomes ... where ... When the elements of the covariance matrix are replaced by their consistent estimators in (A4), the ratio (δ 0 0 ) 2 / δ 2 0 equals unity. As already mentioned, conditional heteroskedasticity is ignored in setting up the test. For this reason, the test statistic (A3) is likely to be size distorted when applied to financial time series of sufficiently high frequency, that is, when GARCH-type volatility clustering is present. In applications, its size has to be adjusted by calibrating its distribution to reflect the persistence of the GARCH effect present in the data. This is the topic of discussion in Appendix B.1. The GARCH equation derivatives are formed recursively as ... From this example, it should be easy to extend the null model to include more additive deterministic terms and/or a higher-order GARCH equation with or without asymmetric terms.
One extension regarding the deterministic part should be mentioned. It is often convenient to replace the slope parameter γ with e η . In this case, θ g = (δ 1 , η 1 , c 1 , δ 2 , η 2 , c 21 , c 22 ) , and ... Vector r 2t contains the derivatives of the misspecified part. Details for the most commonly encountered situations are given below. The number of variables (columns) in r 2t defines the degrees of freedom of the χ 2 -distribution of the test statistic under the null.
Given the three components, the LM test is performed as follows: 1. ...
The robust version that does not rely on the normality of the error term is formed as follows:
1. Regress r 2t on r 1t and obtain the residuals w t . When r 2t has more than one variable, run the regression for each of them separately and, thereby, obtain a set of residuals w t .
2. Regress 1 on (ζ 2 t − 1)w t and form the sum of squared residuals SSR.
3. Compute the test statistic LM R = T − SSR.
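Assuming the quantities are stacked into arrays (ζ as a T-vector, r 1t and r 2t as T × k matrices), the three-step robust statistic can be sketched with numpy as follows. The function name and array layout are hypothetical, not the paper's or the mtvgarch package's API.

```python
import numpy as np

def robust_lm(zeta, r1, r2):
    """Robust LM statistic LM_R = T - SSR from the three auxiliary
    regressions described in the text.

    zeta : (T,)  standardised residuals from the null model
    r1   : (T, k1) derivatives of the correctly specified (null) part
    r2   : (T, k2) derivatives of the misspecified part
    """
    T = len(zeta)
    # step 1: regress each column of r2 on r1, keep the residuals w_t
    beta, *_ = np.linalg.lstsq(r1, r2, rcond=None)
    w = r2 - r1 @ beta
    # step 2: regress 1 on (zeta_t^2 - 1) * w_t and form SSR
    x = (zeta**2 - 1.0)[:, None] * w
    b, *_ = np.linalg.lstsq(x, np.ones(T), rcond=None)
    ssr = np.sum((np.ones(T) - x @ b) ** 2)
    # step 3: LM_R = T - SSR; chi2 with k2 degrees of freedom under H0
    return T - ssr
```

Because the second regression minimises the sum of squared residuals, SSR never exceeds T, so the statistic is nonnegative by construction.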
The first case seeks evidence of misspecification of the deterministic part of the MTV-GARCH model. The conditional variance is of the form ... where the additive term f t is zero under the null of the model being correctly specified. The case considered here is that of testing r against r + 1 transitions in the deterministic part. The additive term is linearised and reparameterised, after which it becomes ... The derivative component for the alternative is then ... The second case addresses misspecification in the GARCH part: ... where the additive term f t is again zero under the null. A common scenario is one in which f t increases either the ARCH or the GARCH order (but not both). An example of the former is GARCH(1,1) vs. GARCH(2,1), in which case ..., and therefore ... If the model is a GJR one, and the potential increase in the order of the ARCH term extends to the asymmetric terms as well, then ..., and ... An example of the latter is GARCH(1,1) vs. GARCH(1,2), which leads to f t = β 2 h t−2 , and thus r 2t = ĥ −1 t ĥ t−2 . The third case is the test of no remaining ARCH. This is a test against multiplicative misspecification, σ 2 t = h t g t f t , where f t = 1 under the null. If the alternative is that there is ARCH of order m left unaccounted for, then ...

Test of Constant Correlations
The log-likelihood of the auxiliary MTV model for observation t, assuming K = 2, equals ... where ... and g it = δ i0 + δ i1 G i1 (t/T, γ i1 , c i1 ) (only one transition, for notational simplicity), and h it is as in (4). The first sub-block of the score corresponding to the deterministic variance component under H 0 becomes ... where e i = (0 i−1 , 1, 0 N−i ) , i = 1, . . ., N, and 0 0 is an empty set. The sub-block corresponding to the GARCH parameters under H 0 is ... The remaining sub-blocks under H 0 equal ..., where U = ∂vec(P (Aj) )/∂ρ Aj consists of zeroes and ones and is identical for all j.
The N 2 × N(N − 1)/2 matrix U is a column-wise collection of vectorised indicator matrices that identify the locations of the particular correlation parameters within the matrix P At . For example, the first correlation parameter in ρ Aj is located in positions (2,1) and (1,2) of P At . An indicator matrix corresponding to this parameter has ones in those positions and zeros elsewhere. This vectorised indicator matrix is then the first column of matrix U, and so on. Consequently, the 3N(N − 1)/2 × N 2 matrix ∂vec(P At ) /∂ρ A equals ... The information matrix for observation t under H 0 is quite similar to, but simpler than, the corresponding one in Silvennoinen and Teräsvirta (2021). In order to give the matrix a proper expression, we need the commutation matrix K, an N 2 × N 2 matrix whose (i, j) block equals e j e i , that is, [K] ij = e j e i ; see, for example, Lütkepohl (1996, pp. 115-18). Let the superscript 0 indicate that the corresponding entity is evaluated under H 0 (for example, g 0 it equals g it | H 0 , and ∂g 0 it /∂θ gi equals ∂g it /∂θ gi | H 0 ). The matrix is defined in the following lemma.

Lemma A2. The expectations of the nine blocks of the information matrix at (rescaled) time t/T under H 0 are as follows. The (i, j) sub-block of ... equals ... e i P −1 (A0) e j e i P (A0) e j ; when i = j, ... Furthermore, the (i, j) sub-block of Es t (θ 0 g )s t (ρ A ) equals ... Finally, the (i, j) sub-block of the last block is equal to ... where ... Next, we consider the blocks related to the correlations. The matrix ∂vec(P At ) /∂θ ρ A0 equals ... ⊗ U, the matrix ∂vec(P At ) /∂θ ρ A1 equals ..., and the (i, j) sub-block of B 55 is equal to ... Next, define ..., where ẑ t and P (A0t) equal z t and P (A0t) estimated under H 0 , respectively. The test statistic uses the block in the south-west corner of the inverse of B. As the matrix B can have a large dimension, its inverse can be obtained by using block inversion methods, perhaps applying them recursively. The test statistic has an asymptotic χ 2 -distribution with N(N − 1) degrees of freedom when H 0 holds.

Appendix B.1. Tests of GARCH Equations
The test for slowly moving baseline volatility has a statistic whose distribution is sensitive to the high-frequency GARCH volatility. For this reason, one cannot use the asymptotic distribution; rather, the distribution must be generated via simulation. Further, Silvennoinen and Teräsvirta (2016) showed that the size of the test is distorted if the GARCH parameterisation deviates from the true one. For this reason, a few alternative approaches to estimating the GARCH parameters, and especially the persistence, have been investigated. It should be noted that estimating GARCH without taking the nonstationarity into account will yield overestimated persistence, thereby impacting the null distribution of the test statistic and rendering the test outcomes unreliable. These estimates are given in Table A1.
The baseline volatility may be very different in different series. Therefore, one should not skip visual inspection of the returns or rely on general rules of thumb. If there are sufficiently long sections of data in which the general level of volatility remains constant, it is advisable to estimate the GARCH parameters over such a subsample. In the present case, there are a couple of relatively constant volatility sections, for example, one from November 2003 until October 2007.
The parameter estimates for that calm subperiod are in Table A1. Comparison with the estimates from the whole-period GARCH model makes it clear that the neglected nonstationarity has biased the estimates, resulting in high persistence and kurtosis. As the data set has a sufficiently long span of GARCH-type clustering without (visually) significant movement in the general baseline level, relevant estimates are obtained by using that subsample only.
Another approach consists of estimating the GARCH equation over a rolling window such that the intercept is time-varying, targeting the unconditional volatility over each window, while the other parameters are assumed constant over the entire sample period and estimated in the usual way. The choice of the window length should follow the general recommendations regarding the sample size for GARCH estimation. Too long a window will be affected by the slowly changing baseline volatility level, whereas too short a window will yield very imprecise GARCH estimates.
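A rough sketch of this rolling-window variance-targeting idea follows, assuming α and β are held fixed while the intercept tracks the local unconditional variance. The function name and the centred-window choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def rolling_intercepts(returns, alpha, beta, window=400):
    """Time-varying GARCH(1,1) intercept omega_t obtained by variance
    targeting over a centred rolling window: the unconditional variance
    sigma^2 = omega / (1 - alpha - beta) is matched to the local sample
    variance, while alpha and beta stay fixed for the whole sample."""
    T = len(returns)
    half = window // 2
    omega = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half)
        local_var = np.var(returns[lo:hi])
        omega[t] = local_var * (1.0 - alpha - beta)
    return omega
```

For a series with a genuinely constant unconditional variance, the implied intercepts fluctuate around the single full-sample value, which is how the window width can be judged: excessive variation at a given width signals that the window is too short.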
To investigate the properties of this approach, we ran a simulation experiment with a few different baseline volatilities. The window widths varied from 250 to 1000 observations. Figures A1-A3 depict the distributions of the GARCH estimates and the derived persistence and kurtosis measures, as explained in He and Teräsvirta (1999), for a selection of baseline volatilities and window widths. Based on these experiments, we concluded that a window width of 400 observations yields sufficiently robust results for our application.
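For the plain (symmetric) GARCH(1,1) with standard normal errors, the general moment results of He and Teräsvirta (1999) reduce to familiar closed forms for the persistence and the unconditional kurtosis. The helper below is an illustrative sketch of that special case only; it does not cover the GJR asymmetry terms used elsewhere in the paper.

```python
def garch11_persistence_kurtosis(alpha, beta):
    """Persistence and unconditional kurtosis of a GARCH(1,1) process
    with standard normal innovations:

        persistence = alpha + beta
        kurtosis    = 3 (1 - (alpha+beta)^2) / (1 - (alpha+beta)^2 - 2 alpha^2)

    The kurtosis is finite only when 1 - (alpha+beta)^2 - 2 alpha^2 > 0.
    """
    persistence = alpha + beta
    denom = 1.0 - persistence**2 - 2.0 * alpha**2
    kurtosis = 3.0 * (1.0 - persistence**2) / denom if denom > 0 else float("inf")
    return persistence, kurtosis
```

For example, α = 0.05 and β = 0.90 give persistence 0.95 and kurtosis only slightly above three, while shifting weight from β to α at constant persistence drives the kurtosis up, the pattern discussed in Appendix B.2.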
The resulting GARCH estimates are reported in Table A1, and they are quite similar to the ones obtained for the aforementioned calm period. This can be interpreted as support for the rolling window method, particularly in situations where visual inspection of the data does not reveal a sufficiently long period of constant unconditional volatility.
Overall, it is clear that simply using the GARCH estimates from the entire sample to calibrate the null distribution of the test statistic for the specification of the deterministic component of the volatility is not recommended. For comparison, Table A1 also reports the GARCH estimates from a TV-GARCH model in which the TV specification has been completed. The estimated persistence is higher than the ones obtained from the calm period or the rolling-window variance-targeting method; however, as discussed in Silvennoinen and Teräsvirta (2016), underestimation of persistence has a less severe impact on the performance of the TV specification test than overestimation does.
Table A1. Specification stage for the deterministic component in the volatilities of each of the four banks. α and β are the initial estimates used for calibrating the test statistic distribution. The rolling-window method allows the GARCH intercept to adjust to target the unconditional variance in a window of size 400. The 'calm period' selects the continuous period from November 2003 to October 2007, which has very little visible variation in the baseline volatility. For comparison, the GARCH estimates from the entire sample period are reported along with the final estimates from the TV-GARCH model.

Appendix B.2. Evaluation Tests of GARCH Equations
The fact that the evaluation tests discussed in Appendix A.2 are applied to the prefiltered data P t −1/2 ε t is known to potentially alter the distribution of the test statistic. In this section, we present simulation results showing that the size of the tests remains practically unchanged, rendering the tests applicable in the proposed way.
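Prefiltering requires an inverse square root of an estimated correlation matrix. One standard way to compute a symmetric P −1/2 , sketched below under the assumption that numpy is available, is via the eigendecomposition of the symmetric positive definite matrix; the function name is hypothetical.

```python
import numpy as np

def inv_sqrt_psd(P):
    """Symmetric inverse square root of a correlation matrix P via its
    eigendecomposition: P^{-1/2} = V diag(1/sqrt(lambda)) V'."""
    vals, vecs = np.linalg.eigh(P)  # P symmetric positive definite
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# illustrative 2x2 correlation matrix
P = np.array([[1.0, 0.6],
              [0.6, 1.0]])
M = inv_sqrt_psd(P)
```

By construction, M P M recovers the identity, which is exactly the property needed for the prefiltered residuals to be uncorrelated under the null.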

First step
The individual TVGARCH models are estimated, assuming the series are uncorrelated.
It is seen that the empirical size of the test is rather close to the nominal one already when N = 2 and T = 100. The size holds up across the various correlation patterns.
Table A3. Size-study: test of constant correlations. Data are generated as an MTV-CCC with an equicorrelation coefficient of 0.33 (CEC33) and 0.67 (CEC67) and a Toeplitz structure with a correlation coefficient of 0.5 (CTC50) and 0.9 (CTC90). Tests are based on the first-order polynomial approximation. A total of 5000 replications. We next turn to the case D t ≠ I N . Tables A4 and A5 contain results of size simulations in which the sensitivity of the test is examined against combinations of GARCH persistence and kurtosis as well as a selection of strengths of correlation (the equicorrelated and Toeplitz ones described above). The test is generally well sized. An interesting aspect, however, is that there is slight oversizing when the kurtosis decreases (which means shifting the relative weight from α to β in the GARCH equation while keeping the persistence constant). A change in persistence does not seem to affect the size of the test. As the dimension of the system increases, the test does not perform equally well. Increasing the sample size does not seem able to counteract this (the simulations use T = 500, 1000, 2000).

In yet another simulation (results not reported here), we considered the effects of misspecifying the conditional heteroskedasticity on the correlation test. More specifically, when D t ≠ I N but conditional heteroskedasticity is ignored, the test is, as may be expected, heavily oversized. The obvious conclusion is that the constancy of correlations can only be tested after specifying and estimating both D t and S t .

Appendix C. Details of Maximisation by Parts
This appendix outlines the estimation algorithm derived from Silvennoinen and Teräsvirta (2021). The estimation proceeds as follows.
For identification reasons, δ i0 , i = 1, . . ., N, is frozen to δ i0 = δ (1,R 1 ) i0 . This frees the intercepts in θ hi . Any positive constant would do for δ i0 ; however, for numerical reasons, the intercepts are fixed to the values they attain after the first iteration, when θ h has not yet been estimated a single time.
In practice, when estimating the slope parameters in the transition functions, it may be useful to apply the transformation γ ij = exp{η ij }, in which case γ ij need not be restricted, as long as η ij is bounded away from −∞. The motivation for this transformation is that estimating η ij instead of γ ij is numerically convenient in cases where γ ij is large; see Goodwin et al. (2011) or Silvennoinen and Teräsvirta (2016) for discussion.
Another alternative, proposed by Chan and Theoharakis (2011), is to redefine the slope parameter as γ ij = 1/η 2 ij and estimate η ij . The authors show that this also alleviates the convergence problems sometimes encountered when γ ij is large. Ekner and Nejstgaard (2013) aim at the same effect by rescaling γ ij to vary between zero and one.
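The equivalence of these reparameterisations is easy to verify numerically: each simply re-expresses the same slope γ, so the transition value is unchanged. The snippet below is a hypothetical illustration, with γ = 50 chosen arbitrarily.

```python
import math

def transition(t_frac, gamma, c):
    """Logistic transition G(t/T; gamma, c)."""
    return 1.0 / (1.0 + math.exp(-gamma * (t_frac - c)))

gamma, c, t = 50.0, 0.5, 0.52

# direct parameterisation in gamma
g_direct = transition(t, gamma, c)
# gamma = exp(eta): estimate eta = log(gamma) instead
g_exp = transition(t, math.exp(math.log(gamma)), c)
# gamma = 1 / eta^2 (Chan and Theoharakis 2011): eta = 1 / sqrt(gamma)
g_inv = transition(t, 1.0 / (1.0 / math.sqrt(gamma)) ** 2, c)
```

The numerical benefit comes not from the transition values, which are identical, but from the shape of the likelihood in the new parameter: gradients with respect to η remain informative in regions where the likelihood is nearly flat in γ.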

Appendix D. Estimated Transition Equations
This appendix contains the estimated deterministic components in the TV-GARCH equations (standard deviation estimates in parentheses). Note that the intercept is fixed after the first iteration; hence, it does not have a standard deviation estimate. The transition functions are of the form G(t/T, γ, c) = (1 + exp{−γ(t/T − c)}) −1 .
The locations of the transitions are remarkably similar across the equations. The first transition of the WBC equation is very slow. The effect of this transition extends over the whole estimation period and is the reason for the post-crisis decline in the value of g 4t ; see Figure 5. The operator vecl(•) stacks the subdiagonal elements of its argument matrix.

Figure 1.
Figure 1. The market capitalisation of the Big Four as a percentage of the ASX200 (left) and of the ASX200 Financials Index (right).

Figure 2. Figure 3.
Figure 2. Daily returns of the Big Four, from 2 January 1992 to 31 January 2020.

Figure 4.
Figure 4. The first 100 autocorrelations of squared standardised returns ε 2 it /g it .

Figure 6.
Figure 6. Estimated conditional variance h it from the GJR-GARCH model.

Figure 7.
Figure 7. Estimated conditional variance h it from the TV-GJR-GARCH model.

Figure A2.
Figure A2. Simulated distributions of GARCH estimates and implied persistence and kurtosis measures for a selection of window widths. The baseline g t has an asymmetric double transition. The dotted vertical lines indicate the true values of the parameters α, β, persistence, and kurtosis.


Figure A3.
Figure A3. Simulated distributions of GARCH estimates and implied persistence and kurtosis measures for a selection of window widths. The baseline g t has two double transitions. The dotted vertical lines indicate the true values of the parameters α, β, persistence, and kurtosis.


Table 1.
Univariate estimation results for the four banks.GJR is the GJR-GARCH(1,1) equation, and TV-GJR is the TV-GJR-GARCH equation; standard errors in parentheses.

Table 2.
Estimation results for the four banks' time-varying correlations. Ninety per cent of the estimated transition takes place between 18 October 2006 and 28 February 2008. The centre point of the transition corresponds to 28 June 2007, with a ± two standard error range of 11 May-13 August 2007.

Table 3.
Estimation results for the four banks' time-varying bivariate correlations.

Table A4.
Size-study: test of constant correlations. Data are generated as an MTV-GARCH-CEC with persistence of 0.95 and 0.97, kurtosis of 4 and 6, and an equicorrelation coefficient of 0.33 and 0.67. Tests are based on the first-order polynomial approximation. A total of 2500 replications.

Table A5.
Size-study: test of constant correlations. Data are generated as an MTV-GARCH-CTC with persistence of 0.95 and 0.97, kurtosis of 4 and 6, and a correlation matrix with a Toeplitz structure with a correlation coefficient of 0.5 and 0.9. Tests are based on the first-order polynomial approximation. A total of 2500 replications.
, because S t ( θ (1,R 1 ) g ) is fixed and does not affect the maximum, and P t ( θ (1,R 1 ) ρ ) is known. In total, steps 1-4 form the first iteration of the maximisation algorithm. Denote the estimate ... Estimate θ g from S t (θ g ), keeping D t ( θ