Article

Analytical Gradients of Dynamic Conditional Correlation Models

by Massimiliano Caporin 1,*, Riccardo (Jack) Lucchetti 2 and Giulio Palomba 2

1 Department of Statistical Sciences, University of Padova, 35122 Padova PD, Italy
2 DISES, Università Politecnica delle Marche, 60121 Ancona AN, Italy
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2020, 13(3), 49; https://doi.org/10.3390/jrfm13030049
Submission received: 31 January 2020 / Revised: 25 February 2020 / Accepted: 28 February 2020 / Published: 4 March 2020
(This article belongs to the Special Issue Financial Time Series: Methods & Models)

Abstract: We provide the analytical gradient of the full-model likelihood for the Dynamic Conditional Correlation (DCC) specification by Engle (2002), for the generalised version by Cappiello et al. (2006), and for the cDCC model by Aielli (2013). We discuss how the gradient might be further extended by introducing elements related to the conditional variance parameters, and examine the issues arising from the estimation of constrained and/or reparametrised versions of the model. A computational simulation compares analytical and numerical gradients, with a view to parameter estimation; we find that analytical differentiation yields both greater efficiency and improved accuracy.

1. Introduction

One of the most challenging issues related to the class of Multivariate GARCH (MGARCH) models is the so-called curse of dimensionality: as a rule, the number of parameters of an MGARCH model diverges as the number of modelled variables increases. In general terms, denoting by $n$ the number of variables, the number of parameters in an MGARCH model is $O(n^b)$ with $b > 1$: in the VECH model by Engle and Kroner (1995), $b = 4$; in the BEKK model, by the same authors, $b = 2$; and for the Dynamic Conditional Correlation (DCC) model by Engle (2002), $b = 2$. The solution most commonly adopted to overcome the curse of dimensionality is the introduction of constraints on the parameter space, leading to the so-called diagonal or scalar specifications, or to specifications driven by external information; see, among many others, Engle (2002); Bauwens et al. (2006); Caporin and McAleer (2008); and Caporin and Paruolo (2015). Few efforts have been devoted to approaches which make it possible to ease those constraints and keep an MGARCH specification as close as possible to full parametrisation. Our approach goes in that direction, and is related to the work by Lucchetti (2002) on the BEKK model.
A key element for the feasibility of model estimation when the number of parameters is large is the availability of analytical gradients of the likelihood. In this work, we provide the analytical expression of the gradient for three well-known conditional correlation models: the Dynamic Conditional Correlation (DCC) model by Engle (2002), the generalisation proposed by Cappiello et al. (2006), and the more recent Consistent DCC (cDCC) model by Aielli (2013).
Notably, these models are generally estimated by two-step approaches: first, the conditional variance parameters are recovered by means of univariate estimations; then, the conditional correlation likelihood is maximised with respect to the correlation parameters, conditionally on the estimated variance parameters. The estimates obtained in this way clearly suffer from a loss of efficiency. However, when the correlation dynamic is highly restricted, for instance leading to scalar parametrisations, the number of parameters is quadratic in the number of variables, that is, $b = 2$, but model estimation is performed via a collection of univariate GARCH likelihoods (one for each marginal) and a correlation likelihood. The latter might take advantage of correlation targeting, thus estimating $O(n^2)$ parameters via a sample statistic. Consequently, model estimation becomes feasible irrespective of $n$.
While restricted parametrisations are feasible, they are extremely limited in terms of the economic interpretation of the outcomes; in fact, they most likely neglect all possible interdependence among correlations and variances (i.e., they do not include spillover effects). The introduction of the gradient for the most general DCC specification will allow estimating fully parametrised specifications when the cross-sectional dimension ($n$) is not too large, conditional on the computing power available to the researcher. Alternatively, adapting the gradient to intermediate specifications might allow reintroducing even limited spillovers within and between the correlation and/or variance dynamics.
Our contribution could potentially be considered as a tool for measuring the efficiency loss of two-step estimation approaches. Moreover, it could be applied in the implementation of full-likelihood estimation approaches, and it should be most useful for the application of the maximisation-by-parts estimation methodology suggested in Fan et al. (2015). However, the complexity of the following analytical formulae represents a serious challenge for these purposes.
Other studies have already tackled the problem of deriving the analytical representation of the DCC model gradient: Engle and Sheppard (2001) present a partial gradient without taking into account its recursive structure, while Hafner and Herwartz (2008) report the gradient of the second-stage likelihood of the DCC model of Engle (2002), without considering the derivatives with respect to the first-stage parameters. We also mention other works reporting analytical forms for different multivariate GARCH models: Ling and McAleer (2003) focus on their VARMA-GARCH model, which nests the constant conditional correlation model of Bollerslev (1990), and Lucchetti (2002) provides the analytical expression for the gradient of the BEKK model of Engle and Kroner (1995). We complement this strand of the financial econometrics literature with respect to the most common conditional correlation models.
Besides the analytical derivatives, and a discussion of how they can be adapted to the introduction of constraints on the model parameters, we also provide empirical evidence supporting the use of the analytical score. A simulation experiment shows that the use of the analytical gradient has clear advantages compared to the numerical gradient, with a considerable reduction in CPU time, except in the case of extremely restricted specifications, where its impact is detrimental. However, even in these cases analytical differentiation yields vastly improved accuracy.
The paper proceeds as follows. In the next section, we provide preliminary elements for the subsequent derivation of the analytical gradient. Section 3 reports several differentials which are then used in Section 4 to obtain the gradient of the DCC model of Engle (2002). Section 4.1 extends the result to the GDCC model of Cappiello et al. (2006), while Section 4.2 contains the gradient of the cDCC model of Aielli (2013). Section 5 and Section 6 are devoted to a detailed analysis of the issues that arise when actually building a software implementation of the gradient, together with a simulation in Section 7. Section 8 concludes.

2. Preliminaries

In this section, we establish the notation used in the rest of the paper and explicitly define the class of models we will be working with.

2.1. Notation

In the following sections we make use of the following symbols, operators and matrix relations; whenever possible, we adhere to the conventions spelled out in Magnus and Neudecker (1999):
- $\iota_n$ and $0_n$ are $n$-dimensional vectors whose elements are all 1 and 0, respectively;
- $I_n$ is the identity matrix of dimension $n$;
- for a diagonal matrix $A$ we denote by $A^{1/2}$ the square root of the matrix, where the square root operates element-wise;
- $\mathrm{diag}(a)$ is a diagonal matrix with the vector $a$ on its main diagonal;
- $\mathrm{dg}(A)$ creates a vector from the diagonal of the matrix $A$;
- the $\delta_A$ operator is defined as $\delta_A \equiv \mathrm{diag}(\mathrm{vec}(A))$;
- the ":" symbol denotes horizontal concatenation of matrices;
- the operator $\mathrm{vech}$ stacks the lower triangular part of a matrix, while the operator $\mathrm{vechl}$ stacks the lower triangular part of a matrix excluding the diagonal;
- for a square diagonal matrix $A$ of dimension $n$ with the vector $a$ on the main diagonal, we denote by $G_n$ the $n^2 \times n$ matrix satisfying $\mathrm{vec}(A) = G_n\, a$;
- for a square matrix $A$ of dimension $n$ we denote by $K_{nn}$ the $n^2 \times n^2$ commutation matrix satisfying $\mathrm{vec}(A') = K_{nn}\,\mathrm{vec}(A)$;
- we define the "symmetrisation" matrix as $N_n = \frac{1}{2}\left(I_{n^2} + K_{nn}\right)$, which is symmetric and idempotent with rank $\frac{n(n+1)}{2}$;
- for a symmetric matrix $A$, we denote by $D_n$ the $n^2 \times \frac{n(n+1)}{2}$ duplication matrix satisfying $D_n\,\mathrm{vech}(A) = \mathrm{vec}(A)$;
- $L_n$ is the $\frac{n(n+1)}{2} \times n^2$ "elimination matrix", satisfying $L_n\,\mathrm{vec}(A) = \mathrm{vech}(A)$;
- for a square matrix $A$ we denote by $\Delta_n$ the $n^2 \times n^2$ selection matrix satisfying $\Delta_n\,\mathrm{vec}(A) = \mathrm{vec}\left[\mathrm{diag}\left(\mathrm{dg}(A)\right)\right]$; this matrix eliminates the off-diagonal elements in the vectorisation of $A$;
- $dA$ denotes the differential of the matrix $A$;
- the symbol $\odot$ is used for the Hadamard (element-by-element) matrix product;
- a property we will use repeatedly is $\mathrm{vec}(A \odot B) = \delta_A\,\mathrm{vec}(B)$, where $A$ and $B$ are square matrices of the same order; therefore, $\mathrm{vec}(dB \odot A) = \mathrm{vec}(A \odot dB) = \delta_A\,\mathrm{vec}(dB)$;
- if $R$ is a correlation matrix, we denote by $S_n$ the duplication matrix of dimension $n^2 \times \frac{n(n-1)}{2}$ satisfying $\mathrm{vec}(R) = \mathrm{vec}(I_n) + S_n\,\mathrm{vechl}(R)$; then, $\mathrm{vec}(dR) = S_n\,\mathrm{vechl}(dR)$.
For selected relevant matrices, we report their dimensions in parentheses below the matrix itself. For general matrix properties and matrix differentials see Magnus and Neudecker (1999) and Lütkepohl (1996).
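Since all of these operators act by selecting, duplicating or permuting entries, they are straightforward to build explicitly. The following self-contained NumPy sketch (our illustration, not part of the original paper; all function names are ours) constructs the main special matrices and verifies some of the defining identities; vec is taken column-wise throughout.

```python
import numpy as np

vec = lambda M: M.flatten(order="F")        # column-wise vectorisation

def G(n):
    """n^2 x n matrix with vec(diag(a)) = G @ a."""
    out = np.zeros((n * n, n))
    for i in range(n):
        out[i * (n + 1), i] = 1.0           # diagonal entries sit at i*(n+1)
    return out

def K(n):
    """n^2 x n^2 commutation matrix: K @ vec(A) = vec(A')."""
    out = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            out[i + j * n, j + i * n] = 1.0
    return out

def D(n):
    """n^2 x n(n+1)/2 duplication matrix: D @ vech(A) = vec(A), A symmetric."""
    out, k = np.zeros((n * n, n * (n + 1) // 2)), 0
    for j in range(n):
        for i in range(j, n):
            out[i + j * n, k] = out[j + i * n, k] = 1.0
            k += 1
    return out

def L(n):
    """n(n+1)/2 x n^2 elimination matrix: L @ vec(A) = vech(A)."""
    out, k = np.zeros((n * (n + 1) // 2, n * n)), 0
    for j in range(n):
        for i in range(j, n):
            out[k, i + j * n] = 1.0
            k += 1
    return out

def S(n):
    """n^2 x n(n-1)/2 matrix with vec(R) = vec(I) + S @ vechl(R)."""
    out, k = np.zeros((n * n, n * (n - 1) // 2)), 0
    for j in range(n):
        for i in range(j + 1, n):
            out[i + j * n, k] = out[j + i * n, k] = 1.0
            k += 1
    return out

N = lambda n: 0.5 * (np.eye(n * n) + K(n))  # symmetriser N_n
Delta = lambda n: G(n) @ G(n).T             # diagonal selector Delta_n
delta = lambda M: np.diag(vec(M))           # delta_M = diag(vec M)

# quick sanity checks of the defining properties
n, rng = 3, np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = A + A.T                                 # a symmetric matrix
assert np.allclose(K(n) @ vec(A), vec(A.T))
assert np.allclose(D(n) @ (L(n) @ vec(B)), vec(B))
assert np.allclose(vec(A * B), delta(A) @ vec(B))   # vec(A ⊙ B) = δ_A vec(B)
```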

2.2. Dynamic Conditional Correlation Models

In the following we focus on three dynamic conditional correlation models. The first, our reference model, is the DCC model of Engle (2002). Denote an $n$-dimensional vector of asset returns by $x_t$ and the information set at time $t-1$ by $I_{t-1}$. Since our objects of interest here are covariances and correlations, we consider the mean-centred returns $\varepsilon_t = x_t - E\left[x_t \mid I_{t-1}\right]$ and we assume they follow some unspecified conditional density satisfying
$$\varepsilon_t \mid I_{t-1} \sim D\left(0_n, \Sigma_t\right).$$
The conditional covariance matrix can be split into
$$\Sigma_t = V_t^{1/2}\, R_t\, V_t^{1/2},$$
where $V_t = \mathrm{diag}(v_t)$ is the diagonal matrix of conditional variances, with $v_t = \left(\sigma^2_{1,t}, \sigma^2_{2,t}, \ldots, \sigma^2_{n,t}\right)'$, and $R_t$ is a dynamic correlation matrix. The conditional variances are governed by the following dynamic equation
$$v_t = \omega + A\left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right) + B\, v_{t-1},$$
where $\omega$ is an $n$-dimensional parameter vector while $A$ and $B$ are square parameter matrices of dimension $n$. Note that the adopted specification nests as a special case a set of univariate GARCH(1,1) models, one for each asset (under a diagonality assumption for the parameter matrices $A$ and $B$); this structure has already been used in Ling and McAleer (2003). The dynamic correlation matrix is further decomposed into
$$R_t = \tilde{Q}_t^{-1/2}\, Q_t\, \tilde{Q}_t^{-1/2},$$
where $\tilde{Q}_t = \mathrm{diag}\left(\mathrm{dg}\left(Q_t\right)\right)$ and
$$Q_t = \Gamma + \mathbf{A} \odot \left(\eta_{t-1}\eta_{t-1}' - \Gamma\right) + \mathbf{B} \odot \left(Q_{t-1} - \Gamma\right), \qquad (1)$$
in which $\eta_t = V_t^{-1/2}\varepsilon_t$, while $\mathbf{A}$ and $\mathbf{B}$ are symmetric parameter matrices and $\Gamma$ is a correlation matrix (we write the correlation-equation parameter matrices in bold to distinguish them from the variance-equation matrices $A$ and $B$). We consider here the most general DCC specification. Restricted forms, such as the scalar one, can be obtained as special cases by appropriate restrictions on the matrices $\mathbf{A}$ and $\mathbf{B}$. For instance, the scalar model is given by $\mathbf{A} = \alpha \cdot \iota_n\iota_n'$ and $\mathbf{B} = \beta \cdot \iota_n\iota_n'$, where $\alpha$ and $\beta$ are scalar parameters. See Section 6 for a fuller discussion.
The second model is the cDCC model by Aielli (2013), which solves some drawbacks of the DCC model of Engle (2002). The two models are very similar: in fact, the cDCC only replaces Equation (1) with
$$Q_t = \Gamma + \mathbf{A} \odot \left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2} - \Gamma\right) + \mathbf{B} \odot \left(Q_{t-1} - \Gamma\right).$$
The interested reader should refer to Aielli (2013) for additional details on model properties and for the comparison between DCC and cDCC. For the cDCC specification, $\mathbf{A}$ and $\mathbf{B}$ are again two symmetric parameter matrices. Differently from the DCC of Engle (2002), $\Gamma$ is a symmetric matrix with ones on the main diagonal (not necessarily positive definite).
The third model is the Generalised DCC (GDCC) of Cappiello et al. (2006), which adopts a quadratic structure in place of Equation (1):
$$Q_t = \Gamma - \mathbf{A}'\,\Gamma\,\mathbf{A} - \mathbf{B}'\,\Gamma\,\mathbf{B} + \mathbf{A}'\,\eta_{t-1}\eta_{t-1}'\,\mathbf{A} + \mathbf{B}'\, Q_{t-1}\,\mathbf{B},$$
where the parameter matrices are not required to be symmetric, $\Gamma$ is a correlation matrix and the intercept $\Gamma - \mathbf{A}'\Gamma\mathbf{A} - \mathbf{B}'\Gamma\mathbf{B}$ must be positive definite.
We stress that we chose a very simple dynamic for the conditional variances for illustrative purposes only. More flexible model structures can be used, even with the adoption of different GARCH-type dynamics across the assets. In the following, we provide the gradient for our conditional variance specification, and discuss the implementation and computational aspects of the gradient in more detail only with reference to the conditional correlation parameters. This choice depends on the fact that different specifications for the conditional variance dynamics might lead to different implementation and computational strategies for the full-model analytical gradient. We discuss this issue further in Section 5.
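To fix ideas, here is a minimal one-step sketch of the recursions just described, for the DCC and cDCC (the GDCC step is analogous, with quadratic forms replacing the Hadamard products). All parameter values and the state variables are illustrative assumptions, chosen only so that the code runs; this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# illustrative parameters (all values assumed)
omega = 0.05 * np.ones(n)
A_v, B_v = 0.05 * np.eye(n), 0.90 * np.eye(n)            # variance equation A, B
Gamma = np.full((n, n), 0.3) + 0.7 * np.eye(n)           # correlation intercept
A_c, B_c = np.full((n, n), 0.04), np.full((n, n), 0.95)  # correlation A, B

# state carried over from time t-1
eps_prev, v_prev, Q_prev = rng.standard_normal(n), np.ones(n), Gamma.copy()

# v_t = omega + A (eps ⊙ eps) + B v_{t-1}
v_t = omega + A_v @ (eps_prev * eps_prev) + B_v @ v_prev
eta_prev = eps_prev / np.sqrt(v_prev)        # eta_{t-1} = V_{t-1}^{-1/2} eps_{t-1}

# DCC: Q_t = Gamma + A ⊙ (eta eta' - Gamma) + B ⊙ (Q_{t-1} - Gamma)
Q_t = Gamma + A_c * (np.outer(eta_prev, eta_prev) - Gamma) + B_c * (Q_prev - Gamma)

# cDCC replaces eta eta' with Qtilde_{t-1}^{1/2} eta eta' Qtilde_{t-1}^{1/2}
eta_star = np.sqrt(np.diag(Q_prev)) * eta_prev
Q_t_cdcc = Gamma + A_c * (np.outer(eta_star, eta_star) - Gamma) + B_c * (Q_prev - Gamma)

# R_t = Qtilde_t^{-1/2} Q_t Qtilde_t^{-1/2};  Sigma_t = V_t^{1/2} R_t V_t^{1/2}
d = 1.0 / np.sqrt(np.diag(Q_t))
R_t = Q_t * np.outer(d, d)
Sigma_t = R_t * np.outer(np.sqrt(v_t), np.sqrt(v_t))
print(R_t)
```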

3. Differentials

We report here a set of differentials which will then be used to construct the gradient of DCC models. For simplicity, we focus on the Normal likelihood, but our approach can be generalised to other densities; alternatively, the following results can in any case be used within a Quasi-Maximum Likelihood estimation approach. Given a sample of $T$ observations on $n$ time series, the full-model log-likelihood is given by
$$L_T(\theta) = \sum_{t=1}^{T} l_t(\theta),$$
where θ is the parameter vector of the full likelihood, that is
$$\theta = \left[\,\omega' : \left(\mathrm{vec}\,A\right)' : \left(\mathrm{vec}\,B\right)' : \left(\mathrm{vechl}\,\Gamma\right)' : \left(\mathrm{vech}\,\mathbf{A}\right)' : \left(\mathrm{vech}\,\mathbf{B}\right)'\,\right]'.$$
Note that $\theta$ has dimension $d = n + n^2 + n^2 + \frac{n(n-1)}{2} + \frac{n(n+1)}{2} + \frac{n(n+1)}{2}$, which we also write as $d = d_1 + d_2 + d_3 + d_4 + d_5 + d_6$. In order to keep the notation clean, however, we will omit the explicit dependence of the key quantities on $\theta$, and write $l_t$ instead of $l_t(\theta)$, and similarly in analogous cases.
After dropping constant terms, we have
$$l_t \propto -\frac{1}{2}\log\left|\Sigma_t\right| - \frac{1}{2}\,\varepsilon_t'\,\Sigma_t^{-1}\,\varepsilon_t. \qquad (3)$$

3.1. Differential of the Full Likelihood

The differential of the likelihood contribution $l_t$ is given by
$$d l_t = -\varepsilon_t'\,\Sigma_t^{-1}\, d\varepsilon_t - \frac{1}{2}\, d\log\left|\Sigma_t\right| + \frac{1}{2}\,\varepsilon_t'\,\Sigma_t^{-1}\, d\Sigma_t\,\Sigma_t^{-1}\,\varepsilon_t. \qquad (4)$$
We now assume that $d\varepsilon_t = 0_n$, since we focus on mean residuals and do not take possible mean parameters into account.
By the properties of the trace and vec operators we have
$$d\log\left|\Sigma_t\right| = \mathrm{tr}\left(\Sigma_t^{-1}\, d\Sigma_t\right) = \mathrm{vec}\left(\Sigma_t^{-1}\right)'\,\mathrm{vec}\left(d\Sigma_t\right) \qquad (5)$$
and
$$\varepsilon_t'\,\Sigma_t^{-1}\, d\Sigma_t\,\Sigma_t^{-1}\,\varepsilon_t = \left(\varepsilon_t'\Sigma_t^{-1} \otimes \varepsilon_t'\Sigma_t^{-1}\right)\mathrm{vec}\left(d\Sigma_t\right). \qquad (6)$$
Replacing (5) and (6) into (4), we obtain
$$d l_t = \frac{1}{2}\left[\left(\varepsilon_t'\Sigma_t^{-1} \otimes \varepsilon_t'\Sigma_t^{-1}\right) - \mathrm{vec}\left(\Sigma_t^{-1}\right)'\right]\mathrm{vec}\left(d\Sigma_t\right). \qquad (7)$$
We do not report the differential with respect to the non-duplicated elements of $\Sigma_t$, so as to keep a more direct relation with the properties of the vec operator and with the model structure.

3.2. Differential of the Conditional Covariance Matrix

We now report the differential of the conditional covariance matrix, taking into account its multiplicative structure:
$$d\Sigma_t = d\left(V_t^{1/2}\, R_t\, V_t^{1/2}\right) = V_t^{1/2}\, R_t\, dV_t^{1/2} + dV_t^{1/2}\, R_t\, V_t^{1/2} + V_t^{1/2}\, dR_t\, V_t^{1/2}.$$
We then consider the vec representation of the differential
$$\mathrm{vec}\left(d\Sigma_t\right) = \mathrm{vec}\left(V_t^{1/2}\, R_t\, dV_t^{1/2}\right) + \mathrm{vec}\left(dV_t^{1/2}\, R_t\, V_t^{1/2}\right) + \mathrm{vec}\left(V_t^{1/2}\, dR_t\, V_t^{1/2}\right)$$
and, by the properties of the vec operator, we have
$$\mathrm{vec}\left(V_t^{1/2}\, dR_t\, V_t^{1/2}\right) = \left(V_t^{1/2} \otimes V_t^{1/2}\right)\mathrm{vec}\left(dR_t\right),$$
$$\mathrm{vec}\left(V_t^{1/2}\, R_t\, dV_t^{1/2}\right) = \left(I_n \otimes V_t^{1/2} R_t\right)\mathrm{vec}\left(dV_t^{1/2}\right),$$
and
$$\mathrm{vec}\left(dV_t^{1/2}\, R_t\, V_t^{1/2}\right) = K_{nn}\,\mathrm{vec}\left(V_t^{1/2}\, R_t\, dV_t^{1/2}\right).$$
For the diagonal matrix of conditional standard deviations $V_t^{1/2}$ the following holds:
$$\mathrm{vec}\left(dV_t^{1/2}\right) = \frac{1}{2}\left(I_n \otimes V_t^{-1/2}\right)\mathrm{vec}\left(dV_t\right),$$
where $V_t$ is the diagonal matrix of conditional variances.
Then, using the diagonality of $V_t$, we have
$$\mathrm{vec}\left(dV_t\right) = G_n\, dv_t,$$
where $v_t$ is the vector of conditional variances. After collecting terms we have
$$\mathrm{vec}\left(d\Sigma_t\right) = N_n\left[\left(I_n \otimes V_t^{1/2} R_t V_t^{-1/2}\right) G_n\, dv_t + \left(V_t^{1/2} \otimes V_t^{1/2}\right)\mathrm{vec}\left(dR_t\right)\right]. \qquad (9)$$
Putting together Equations (7) and (9), the differential of the full likelihood becomes
$$d l_t = \frac{1}{2}\left[\left(\varepsilon_t'\Sigma_t^{-1} \otimes \varepsilon_t'\Sigma_t^{-1}\right) - \mathrm{vec}\left(\Sigma_t^{-1}\right)'\right] N_n\left[\left(I_n \otimes V_t^{1/2} R_t V_t^{-1/2}\right) G_n\, dv_t + \left(V_t^{1/2} \otimes V_t^{1/2}\right)\mathrm{vec}\left(dR_t\right)\right];$$
note, however, that the Jacobian term in Equation (7) is the vectorisation of a symmetric matrix, and therefore remains unchanged if multiplied by $N_n$. The expression above, therefore, becomes
$$d l_t = \frac{1}{2}\left[\left(\varepsilon_t'\Sigma_t^{-1} \otimes \varepsilon_t'\Sigma_t^{-1}\right) - \mathrm{vec}\left(\Sigma_t^{-1}\right)'\right]\left[\left(I_n \otimes V_t^{1/2} R_t V_t^{-1/2}\right) G_n\, dv_t + \left(V_t^{1/2} \otimes V_t^{1/2}\right)\mathrm{vec}\left(dR_t\right)\right],$$
which can be abbreviated to
$$d l_t = C_{1,t}\, dv_t + C_{2,t}\,\mathrm{vec}\left(dR_t\right).$$
The calculation of the two partial Jacobian terms $C_{1,t}$ and $C_{2,t}$ is greatly simplified by defining
$$\xi_t \equiv V_t^{1/2}\,\Sigma_t^{-1}\,\varepsilon_t.$$
The first partial Jacobian $C_{1,t}$ becomes
$$\underset{(1 \times n)}{C_{1,t}} = \frac{1}{2}\left[\left(\varepsilon_t'\Sigma_t^{-1} \otimes \varepsilon_t'\Sigma_t^{-1}\right) - \mathrm{vec}\left(\Sigma_t^{-1}\right)'\right]\left(I_n \otimes V_t^{1/2} R_t V_t^{-1/2}\right) G_n = \frac{1}{2}\left[\varepsilon_t'\Sigma_t^{-1} \otimes \left(\xi_t'\, R_t\, V_t^{-1/2}\right) - \mathrm{vec}\left(V_t^{-1}\right)'\right] G_n,$$
since
$$\left(I_n \otimes V_t^{1/2} R_t V_t^{-1/2}\right)'\mathrm{vec}\left(\Sigma_t^{-1}\right) = \mathrm{vec}\left(V_t^{-1/2}\, R_t\, V_t^{1/2}\,\Sigma_t^{-1}\right) = \mathrm{vec}\left(V_t^{-1}\right).$$
As for the second one,
$$\underset{(1 \times n^2)}{C_{2,t}} = \frac{1}{2}\left[\xi_t \otimes \xi_t - \mathrm{vec}\left(R_t^{-1}\right)\right]',$$
where the equality
$$\mathrm{vec}\left(\Sigma_t^{-1}\right)'\left(V_t^{1/2} \otimes V_t^{1/2}\right) = \mathrm{vec}\left(V_t^{1/2}\,\Sigma_t^{-1}\,V_t^{1/2}\right)' = \mathrm{vec}\left[\left(V_t^{-1/2}\,\Sigma_t\,V_t^{-1/2}\right)^{-1}\right]' = \mathrm{vec}\left(R_t^{-1}\right)'$$
was used.
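As a concrete correctness check, the sketch below (our code; the evaluation point $(v, R, \varepsilon)$ is randomly generated, and the Gaussian $l_t$ of Equation (3) is used up to constants) evaluates $C_{1,t}$ and $C_{2,t}$ as derived above and compares them with central finite differences of $l_t$ in random directions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
vec = lambda M: M.flatten(order="F")

# a random but valid evaluation point: variances v, correlation R, residual eps
v = 0.5 + rng.random(n)
R = np.corrcoef(rng.standard_normal((n, 5 * n)))
eps = rng.standard_normal(n)

def loglik(v, R):
    """l_t up to constants: -1/2 log|Sigma| - 1/2 eps' Sigma^{-1} eps."""
    Vh = np.diag(np.sqrt(v))
    Sigma = Vh @ R @ Vh
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * eps @ np.linalg.solve(Sigma, eps)

# analytical pieces, with xi = V^{1/2} Sigma^{-1} eps:
#   C1 (1 x n):   1/2 [ (Sigma^{-1} eps) ⊙ (V^{-1/2} R xi) - 1/v ]'
#   C2 (1 x n^2): 1/2 (xi ⊗ xi - vec R^{-1})'
Vh = np.diag(np.sqrt(v))
Sigma = Vh @ R @ Vh
Sie = np.linalg.solve(Sigma, eps)
xi = np.sqrt(v) * Sie
C1 = 0.5 * (Sie * (R @ xi) / np.sqrt(v) - 1.0 / v)
C2 = 0.5 * (np.kron(xi, xi) - vec(np.linalg.inv(R)))

# central differences in random directions (dR must stay symmetric)
h = 1e-6
dv = rng.standard_normal(n)
E = rng.standard_normal((n, n)); E = 0.5 * (E + E.T)
num_v = (loglik(v + h * dv, R) - loglik(v - h * dv, R)) / (2 * h)
num_R = (loglik(v, R + h * E) - loglik(v, R - h * E)) / (2 * h)
print(np.isclose(num_v, C1 @ dv), np.isclose(num_R, C2 @ vec(E)))
```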

3.3. Differential of the Vector of Conditional Variances

Now focus on the $dv_t$ term, keeping in mind that we adopt a simplified parametrisation where all conditional variances follow a GARCH(1,1)-type dynamic, the one adopted in Ling and McAleer (2003). Recalling that $v_t = \omega + A\left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right) + B\, v_{t-1}$, we have the relation
$$dv_t = d\omega + dA\left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right) + 2 A\left(\varepsilon_{t-1} \odot d\varepsilon_{t-1}\right) + dB\, v_{t-1} + B\, dv_{t-1}.$$
Given that $d\varepsilon_{t-1} = 0_n$, this simplifies into
$$dv_t = d\omega + dA\left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right) + dB\, v_{t-1} + B\, dv_{t-1},$$
and then, by standard differentiation rules, we have
$$dv_t = d\omega + \left[I_n \otimes \left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right)'\right]\mathrm{vec}\left(dA'\right) + \left(I_n \otimes v_{t-1}'\right)\mathrm{vec}\left(dB'\right) + B\, dv_{t-1}.$$
We consider the differential with respect to the vec of the parameter matrices (thus removing the transpose), obtaining
$$dv_t = d\omega + \left[I_n \otimes \left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right)'\right] K_{nn}\,\mathrm{vec}\left(dA\right) + \left(I_n \otimes v_{t-1}'\right) K_{nn}\,\mathrm{vec}\left(dB\right) + B\, dv_{t-1} = d\omega + W_{1,t}\,\mathrm{vec}\left(dA\right) + W_{2,t}\,\mathrm{vec}\left(dB\right) + B\, dv_{t-1},$$
where
$$\underset{(n \times n^2)}{W_{1,t}} = \left[I_n \otimes \left(\varepsilon_{t-1} \odot \varepsilon_{t-1}\right)'\right] K_{nn}$$
and
$$\underset{(n \times n^2)}{W_{2,t}} = \left(I_n \otimes v_{t-1}'\right) K_{nn}.$$
Note that the matrices $A$ and $B$ need not be symmetric, but in many cases they may have some kind of structure, so that the differential with respect to more restricted specifications can easily be obtained. For instance, a collection of univariate GARCH(1,1) models corresponds to diagonal parameter matrices, and thus to $\mathrm{vec}(dA) = G_n\, da$, where $a$ is an $n$-dimensional parameter vector (and similarly for the matrix $B$). A sketch of the resulting derivative recursion is given below.
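The following sketch (our own, with illustrative diagonal parameter matrices and simulated residuals) propagates the derivatives of $v_t$ with respect to $(\omega, \mathrm{vec}\,A, \mathrm{vec}\,B)$ alongside the variance filter. Note that $\left[I_n \otimes s'\right] K_{nn} = \left(s' \otimes I_n\right)$, which is what np.kron computes directly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 200
eps = rng.standard_normal((T, n))          # mean residuals (illustrative)
omega = 0.05 * np.ones(n)
A = 0.05 * np.eye(n)                       # diagonal: univariate GARCH(1,1)
B = 0.90 * np.eye(n)

I = np.eye(n)
v_prev = np.ones(n)                        # v_0
dv_dom = np.zeros((n, n))                  # dv_t / d omega'
dv_dA = np.zeros((n, n * n))               # dv_t / d(vec A)'
dv_dB = np.zeros((n, n * n))               # dv_t / d(vec B)'

for t in range(1, T):
    s = eps[t - 1] ** 2                    # eps_{t-1} ⊙ eps_{t-1}
    W1 = np.kron(s, I)                     # (s' ⊗ I_n) = (I_n ⊗ s') K_nn
    W2 = np.kron(v_prev, I)                # (v_{t-1}' ⊗ I_n)
    # derivative recursions use the lagged derivatives and lagged states
    dv_dom = I + B @ dv_dom
    dv_dA = W1 + B @ dv_dA
    dv_dB = W2 + B @ dv_dB
    # then update the level
    v_prev = omega + A @ s + B @ v_prev
```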

3.4. Differential of the Conditional Correlation Matrix

We now focus on $\mathrm{vec}\left(dR_t\right)$. Starting from the decomposition of the correlation matrix, we have
$$dR_t = d\left(\tilde{Q}_t^{-1/2}\, Q_t\, \tilde{Q}_t^{-1/2}\right) = d\tilde{Q}_t^{-1/2}\, Q_t\, \tilde{Q}_t^{-1/2} + \tilde{Q}_t^{-1/2}\, dQ_t\, \tilde{Q}_t^{-1/2} + \tilde{Q}_t^{-1/2}\, Q_t\, d\tilde{Q}_t^{-1/2}.$$
The diagonal matrix $\tilde{Q}_t^{-1/2}$ is a function of the elements of $Q_t$; we thus first report its differential,
$$d\tilde{Q}_t^{-1/2} = d\left[\left(\tilde{Q}_t^{1/2}\right)^{-1}\right] = -\tilde{Q}_t^{-1/2}\, d\tilde{Q}_t^{1/2}\, \tilde{Q}_t^{-1/2},$$
which follows from the properties of differentials. Then
$$d\tilde{Q}_t^{1/2} = \frac{1}{2}\,\tilde{Q}_t^{-1/2}\, d\tilde{Q}_t.$$
By substituting these results into the differential of the conditional correlation matrix we have
$$dR_t = -\frac{1}{2}\,\tilde{Q}_t^{-1}\, d\tilde{Q}_t\, R_t + \tilde{Q}_t^{-1/2}\, dQ_t\, \tilde{Q}_t^{-1/2} - \frac{1}{2}\, R_t\, d\tilde{Q}_t\, \tilde{Q}_t^{-1},$$
where we used $d\tilde{Q}_t^{1/2} = \frac{1}{2}\,\tilde{Q}_t^{-1/2}\, d\tilde{Q}_t = \frac{1}{2}\, d\tilde{Q}_t\, \tilde{Q}_t^{-1/2}$, which comes from the diagonality (hence, commutability) of $\tilde{Q}_t$ and $\tilde{Q}_t^{1/2}$.
Turning to the vec of the correlation matrix, we have
$$\mathrm{vec}\left(dR_t\right) = -\frac{1}{2}\left(R_t \otimes \tilde{Q}_t^{-1}\right)\mathrm{vec}\left(d\tilde{Q}_t\right) - \frac{1}{2}\left(\tilde{Q}_t^{-1} \otimes R_t\right)\mathrm{vec}\left(d\tilde{Q}_t\right) + \left(\tilde{Q}_t^{-1/2} \otimes \tilde{Q}_t^{-1/2}\right)\mathrm{vec}\left(dQ_t\right).$$
Note that $\mathrm{vec}\left(d\tilde{Q}_t\right)$ is the vectorisation of a symmetric matrix, so $\mathrm{vec}\left(d\tilde{Q}_t\right) = K_{nn}\,\mathrm{vec}\left(d\tilde{Q}_t\right)$. As a consequence,
$$\left(\tilde{Q}_t^{-1} \otimes R_t\right)\mathrm{vec}\left(d\tilde{Q}_t\right) = \left(\tilde{Q}_t^{-1} \otimes R_t\right) K_{nn}\,\mathrm{vec}\left(d\tilde{Q}_t\right) = K_{nn}\left(R_t \otimes \tilde{Q}_t^{-1}\right)\mathrm{vec}\left(d\tilde{Q}_t\right)$$
and therefore
$$-\frac{1}{2}\left[\left(R_t \otimes \tilde{Q}_t^{-1}\right) + \left(\tilde{Q}_t^{-1} \otimes R_t\right)\right]\mathrm{vec}\left(d\tilde{Q}_t\right) = -N_n\left(R_t \otimes \tilde{Q}_t^{-1}\right)\mathrm{vec}\left(d\tilde{Q}_t\right).$$
Finally, by using $\mathrm{vec}\left(d\tilde{Q}_t\right) = \Delta_n\,\mathrm{vec}\left(dQ_t\right)$, we obtain
$$\mathrm{vec}\left(dR_t\right) = C_{3,t}\,\mathrm{vec}\left(dQ_t\right),$$
where
$$\underset{(n^2 \times n^2)}{C_{3,t}} = \left(\tilde{Q}_t^{-1/2} \otimes \tilde{Q}_t^{-1/2}\right) - N_n\left(R_t \otimes \tilde{Q}_t^{-1}\right)\Delta_n.$$
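The formula for $C_{3,t}$ is easy to validate numerically. The sketch below (our code, not the authors') builds $C_{3,t}$ from a random positive definite $Q_t$ and checks that it maps a symmetric perturbation of $Q_t$ into the corresponding perturbation of $R_t$.

```python
import numpy as np

vec = lambda M: M.flatten(order="F")

def Knn(n):
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i + j * n, j + i * n] = 1.0
    return K

def Delta(n):                       # keeps only the diagonal entries of vec(A)
    d = np.zeros(n * n)
    d[:: n + 1] = 1.0
    return np.diag(d)

def corr_of(Q):                     # R(Q) = Qtilde^{-1/2} Q Qtilde^{-1/2}
    s = 1.0 / np.sqrt(np.diag(Q))
    return Q * np.outer(s, s)

rng = np.random.default_rng(3)
n = 4
Z = rng.standard_normal((n, 3 * n))
Q = Z @ Z.T / (3 * n)               # a random symmetric positive definite Q_t
R = corr_of(Q)
Qt_i = np.diag(1.0 / np.diag(Q))    # Qtilde^{-1}
Qt_ih = np.sqrt(Qt_i)               # Qtilde^{-1/2}
Nn = 0.5 * (np.eye(n * n) + Knn(n))
C3 = np.kron(Qt_ih, Qt_ih) - Nn @ np.kron(R, Qt_i) @ Delta(n)

h = 1e-6
E = rng.standard_normal((n, n)); E = 0.5 * (E + E.T)   # symmetric direction
num = (vec(corr_of(Q + h * E)) - vec(corr_of(Q - h * E))) / (2 * h)
print(np.allclose(num, C3 @ vec(E), atol=1e-6))
```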

3.5. Differential of the Dynamic Matrix $Q_t$

Note that the dynamic of the DCC model is given by
$$Q_t = \Gamma + \mathbf{A} \odot \left(\eta_{t-1}\eta_{t-1}' - \Gamma\right) + \mathbf{B} \odot \left(Q_{t-1} - \Gamma\right).$$
The differential is thus
$$dQ_t = \left(\iota_n\iota_n' - \mathbf{A} - \mathbf{B}\right) \odot d\Gamma + d\mathbf{A} \odot \left(\eta_{t-1}\eta_{t-1}' - \Gamma\right) + d\mathbf{B} \odot \left(Q_{t-1} - \Gamma\right) + \mathbf{B} \odot dQ_{t-1} + \mathbf{A} \odot d\left(\eta_{t-1}\eta_{t-1}'\right).$$
For the last term we can write
$$d\left(\eta_{t-1}\eta_{t-1}'\right) = d\left(V_{t-1}^{-1/2}\,\varepsilon_{t-1}\varepsilon_{t-1}'\,V_{t-1}^{-1/2}\right) = dV_{t-1}^{-1/2}\,\varepsilon_{t-1}\varepsilon_{t-1}'\,V_{t-1}^{-1/2} + V_{t-1}^{-1/2}\,\varepsilon_{t-1}\varepsilon_{t-1}'\, dV_{t-1}^{-1/2},$$
where two differentials have been dropped, since here we are assuming, for the sake of simplicity,¹ that $d\varepsilon_t = 0_n$ for all values of $t$. Using previous results on the differentials of the conditional standard deviations, we have
$$dV_{t-1}^{-1/2} = -\frac{1}{2}\, V_{t-1}^{-1/2}\, dV_{t-1}\, V_{t-1}^{-1}$$
and, using this last result,
$$d\left(\eta_{t-1}\eta_{t-1}'\right) = -\frac{1}{2}\, V_{t-1}^{-1}\, dV_{t-1}\left(V_{t-1}^{-1/2}\,\varepsilon_{t-1}\varepsilon_{t-1}'\,V_{t-1}^{-1/2}\right) - \frac{1}{2}\left(V_{t-1}^{-1/2}\,\varepsilon_{t-1}\varepsilon_{t-1}'\,V_{t-1}^{-1/2}\right) dV_{t-1}\, V_{t-1}^{-1} = -\frac{1}{2}\, V_{t-1}^{-1}\, dV_{t-1}\,\eta_{t-1}\eta_{t-1}' - \frac{1}{2}\,\eta_{t-1}\eta_{t-1}'\, dV_{t-1}\, V_{t-1}^{-1}.$$
Taking the vec of the previous expression we obtain
$$\mathrm{vec}\left[d\left(\eta_{t-1}\eta_{t-1}'\right)\right] = -\frac{1}{2}\left(\eta_{t-1}\eta_{t-1}' \otimes V_{t-1}^{-1}\right)\mathrm{vec}\left(dV_{t-1}\right) - \frac{1}{2}\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right)\mathrm{vec}\left(dV_{t-1}\right).$$
By a reasoning similar to that in the previous subsection, we note that $V_{t-1}$ is diagonal, hence symmetric, and therefore $K_{nn}\,\mathrm{vec}\left(dV_{t-1}\right) = \mathrm{vec}\left(dV_{t-1}\right)$. As a consequence,
$$\mathrm{vec}\left[d\left(\eta_{t-1}\eta_{t-1}'\right)\right] = -\frac{1}{2}\left(\eta_{t-1}\eta_{t-1}' \otimes V_{t-1}^{-1}\right)\mathrm{vec}\left(dV_{t-1}\right) - \frac{1}{2}\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) K_{nn}\,\mathrm{vec}\left(dV_{t-1}\right) = -N_n\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n\, dv_{t-1}. \qquad (18)$$
For the other elements in $\mathrm{vec}\left(dQ_t\right)$, the results follow from the properties of differentials and matrix operators. For the $\Gamma$ matrix we have
$$\mathrm{vec}\left[\left(\iota_n\iota_n' - \mathbf{A} - \mathbf{B}\right) \odot d\Gamma\right] = \delta_{\iota_n\iota_n' - \mathbf{A} - \mathbf{B}}\,\mathrm{vec}\left(d\Gamma\right) = \delta_{\iota_n\iota_n' - \mathbf{A} - \mathbf{B}}\, S_n\,\mathrm{vechl}\left(d\Gamma\right).$$
For the two parameter matrices driving the dynamics we obtain, instead,
$$\mathrm{vec}\left[d\mathbf{A} \odot \left(\eta_{t-1}\eta_{t-1}' - \Gamma\right)\right] = \mathrm{vec}\left[\left(\eta_{t-1}\eta_{t-1}' - \Gamma\right) \odot d\mathbf{A}\right] = \delta_{\eta_{t-1}\eta_{t-1}' - \Gamma}\,\mathrm{vec}\left(d\mathbf{A}\right) = \delta_{\eta_{t-1}\eta_{t-1}' - \Gamma}\, D_n\,\mathrm{vech}\left(d\mathbf{A}\right),$$
and
$$\mathrm{vec}\left[d\mathbf{B} \odot \left(Q_{t-1} - \Gamma\right)\right] = \mathrm{vec}\left[\left(Q_{t-1} - \Gamma\right) \odot d\mathbf{B}\right] = \delta_{Q_{t-1} - \Gamma}\,\mathrm{vec}\left(d\mathbf{B}\right) = \delta_{Q_{t-1} - \Gamma}\, D_n\,\mathrm{vech}\left(d\mathbf{B}\right).$$
The differential also has a recursive structure, so
$$\mathrm{vec}\left(\mathbf{B} \odot dQ_{t-1}\right) = \delta_{\mathbf{B}}\,\mathrm{vec}\left(dQ_{t-1}\right).$$
Finally, we must take into account the dependence of the elements of $Q_t$ on the conditional variances:
$$\mathrm{vec}\left[\mathbf{A} \odot d\left(\eta_{t-1}\eta_{t-1}'\right)\right] = \delta_{\mathbf{A}}\,\mathrm{vec}\left[d\left(\eta_{t-1}\eta_{t-1}'\right)\right] = -\frac{1}{2}\,\delta_{\mathbf{A}}\left(\eta_{t-1}\eta_{t-1}' \otimes V_{t-1}^{-1}\right) G_n\, dv_{t-1} - \frac{1}{2}\,\delta_{\mathbf{A}}\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n\, dv_{t-1}.$$
By an argument analogous to that in Equation (18), the above can also be written as
$$\mathrm{vec}\left[\mathbf{A} \odot d\left(\eta_{t-1}\eta_{t-1}'\right)\right] = -\delta_{\mathbf{A}}\, N_n\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n\, dv_{t-1}.$$
In order to use a more manageable notation, we introduce the following matrices:
$$\underset{\left(n^2 \times \frac{n(n-1)}{2}\right)}{J_1} = \delta_{\iota_n\iota_n' - \mathbf{A} - \mathbf{B}}\, S_n,$$
$$\underset{\left(n^2 \times \frac{n(n+1)}{2}\right)}{J_{2,t}} = \delta_{\eta_{t-1}\eta_{t-1}' - \Gamma}\, D_n,$$
$$\underset{\left(n^2 \times \frac{n(n+1)}{2}\right)}{J_{3,t}} = \delta_{Q_{t-1} - \Gamma}\, D_n,$$
$$\underset{\left(n^2 \times n\right)}{J_{4,t}} = -\delta_{\mathbf{A}}\, N_n\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n,$$
$$\underset{\left(n^2 \times n^2\right)}{J_5} = \delta_{\mathbf{B}}.$$
After substituting them into the differential we obtain
$$\mathrm{vec}\left(dQ_t\right) = J_1\,\mathrm{vechl}\left(d\Gamma\right) + J_{2,t}\,\mathrm{vech}\left(d\mathbf{A}\right) + J_{3,t}\,\mathrm{vech}\left(d\mathbf{B}\right) + J_{4,t}\, dv_{t-1} + J_5\,\mathrm{vec}\left(dQ_{t-1}\right). \qquad (25)$$

4. Gradient of the DCC Model

By differentiating Equation (3), the score of the DCC model is seen to equal
$$\frac{\partial l_t}{\partial \theta'} = C_{1,t}\,\frac{\partial v_t}{\partial \theta'} + C_{2,t}\, C_{3,t}\,\frac{\partial\,\mathrm{vec}\,Q_t}{\partial \theta'}.$$
Furthermore, the derivatives of $v_t$ with respect to the various parameter blocks are
$$\frac{\partial v_t}{\partial \omega'} = I_n + B\,\frac{\partial v_{t-1}}{\partial \omega'}, \qquad \frac{\partial v_t}{\partial \left(\mathrm{vec}\,A\right)'} = W_{1,t} + B\,\frac{\partial v_{t-1}}{\partial \left(\mathrm{vec}\,A\right)'}, \qquad \frac{\partial v_t}{\partial \left(\mathrm{vec}\,B\right)'} = W_{2,t} + B\,\frac{\partial v_{t-1}}{\partial \left(\mathrm{vec}\,B\right)'},$$
$$\frac{\partial v_t}{\partial \left(\mathrm{vechl}\,\Gamma\right)'} = 0, \qquad \frac{\partial v_t}{\partial \left(\mathrm{vech}\,\mathbf{A}\right)'} = 0, \qquad \frac{\partial v_t}{\partial \left(\mathrm{vech}\,\mathbf{B}\right)'} = 0.$$
In the same fashion, the derivatives of $Q_t$ are
$$\frac{\partial\,\mathrm{vec}\,Q_t}{\partial \omega'} = J_{4,t}\,\frac{\partial v_{t-1}}{\partial \omega'} + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \omega'}, \qquad \frac{\partial\,\mathrm{vec}\,Q_t}{\partial \left(\mathrm{vec}\,A\right)'} = J_{4,t}\,\frac{\partial v_{t-1}}{\partial \left(\mathrm{vec}\,A\right)'} + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \left(\mathrm{vec}\,A\right)'},$$
$$\frac{\partial\,\mathrm{vec}\,Q_t}{\partial \left(\mathrm{vec}\,B\right)'} = J_{4,t}\,\frac{\partial v_{t-1}}{\partial \left(\mathrm{vec}\,B\right)'} + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \left(\mathrm{vec}\,B\right)'}, \qquad \frac{\partial\,\mathrm{vec}\,Q_t}{\partial \left(\mathrm{vechl}\,\Gamma\right)'} = J_1 + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \left(\mathrm{vechl}\,\Gamma\right)'},$$
$$\frac{\partial\,\mathrm{vec}\,Q_t}{\partial \left(\mathrm{vech}\,\mathbf{A}\right)'} = J_{2,t} + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \left(\mathrm{vech}\,\mathbf{A}\right)'}, \qquad \frac{\partial\,\mathrm{vec}\,Q_t}{\partial \left(\mathrm{vech}\,\mathbf{B}\right)'} = J_{3,t} + J_5\,\frac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \left(\mathrm{vech}\,\mathbf{B}\right)'}.$$
We finally represent the gradient in the more compact form
$$\frac{\partial l_t}{\partial \theta'} = \left[C_{1,t} : C_{2,t} C_{3,t}\right]\begin{bmatrix}\dfrac{\partial v_t}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_t}{\partial \theta'}\end{bmatrix},$$
where
$$\underset{\left[(n + n^2) \times \dim(\theta)\right]}{\begin{bmatrix}\dfrac{\partial v_t}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_t}{\partial \theta'}\end{bmatrix}} = \begin{bmatrix} I_n : W_{1,t} : W_{2,t} : 0_{n \times (d_4 + d_5 + d_6)} \\ 0_{n^2 \times (d_1 + d_2 + d_3)} : J_1 : J_{2,t} : J_{3,t}\end{bmatrix} + \begin{bmatrix} B & 0_{n \times n^2} \\ J_{4,t} & J_5 \end{bmatrix}\begin{bmatrix}\dfrac{\partial v_{t-1}}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \theta'}\end{bmatrix}. \qquad (27)$$
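To make the recursive structure concrete, the following sketch (our code, not the authors' gretl implementation) propagates the derivatives of $\mathrm{vec}\,Q_t$ with respect to the correlation parameters only, holding the variance parameters fixed ($dv_t = 0$, so the $J_{4,t}$ term drops) and treating $Q_0$ as a fixed initialiser; the result is validated against a finite difference. All parameter values are illustrative assumptions.

```python
import numpy as np

vec = lambda M: M.flatten(order="F")
delta = lambda M: np.diag(vec(M))          # delta_M = diag(vec M)

def Dn(n):                                 # duplication: Dn @ vech(A) = vec(A)
    cols = []
    for j in range(n):
        for i in range(j, n):
            c = np.zeros((n, n)); c[i, j] = c[j, i] = 1.0
            cols.append(vec(c))
    return np.array(cols).T

def Sn(n):                                 # vec(R) = vec(I) + Sn @ vechl(R)
    cols = []
    for j in range(n):
        for i in range(j + 1, n):
            c = np.zeros((n, n)); c[i, j] = c[j, i] = 1.0
            cols.append(vec(c))
    return np.array(cols).T

def filter_Q(Gamma, A, B, eta, Q0):
    """Q_t = Gamma + A⊙(ee' - Gamma) + B⊙(Q_{t-1} - Gamma) plus the recursion
    for d vec Q_t w.r.t. [vechl Gamma : vech A : vech B] (correlation
    matrices; variance parameters held fixed)."""
    T, n = eta.shape
    J1, J5 = delta(np.ones((n, n)) - A - B) @ Sn(n), delta(B)
    ncols = n * (n - 1) // 2 + 2 * (n * (n + 1) // 2)
    Q, dQ = Q0.copy(), np.zeros((n * n, ncols))
    for t in range(1, T):
        ee = np.outer(eta[t - 1], eta[t - 1])
        J2, J3 = delta(ee - Gamma) @ Dn(n), delta(Q - Gamma) @ Dn(n)
        dQ = np.hstack([J1, J2, J3]) + J5 @ dQ   # J4 term absent (dv = 0)
        Q = Gamma + A * (ee - Gamma) + B * (Q - Gamma)
    return Q, dQ

rng = np.random.default_rng(4)
n, T = 3, 50
eta = rng.standard_normal((T, n))
Gamma = np.full((n, n), 0.3) + 0.7 * np.eye(n)
A = np.full((n, n), 0.04)
B = np.full((n, n), 0.90)
Q0 = Gamma.copy()                          # fixed initialiser

Q, dQ = filter_Q(Gamma, A, B, eta, Q0)

# finite-difference check on the vech(A) coordinate of the (2,1) element
h = 1e-6
Ap, Am = A.copy(), A.copy()
Ap[1, 0] = Ap[0, 1] = A[1, 0] + h
Am[1, 0] = Am[0, 1] = A[1, 0] - h
num = (vec(filter_Q(Gamma, Ap, B, eta, Q0)[0]) -
       vec(filter_Q(Gamma, Am, B, eta, Q0)[0])) / (2 * h)
col = n * (n - 1) // 2 + 1                 # its position in [vechl G : vech A : vech B]
print(np.allclose(num, dQ[:, col], atol=1e-5))
```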

4.1. Gradient of Other DCC Specifications

Using the previous results, we can derive the gradient of several variants of the DCC model of Engle (2002). In fact, alternative models generally differ from that of Engle (2002) only in the equation governing the correlation dynamic. We provide here an example for the GDCC model of Cappiello et al. (2006), while in the next section we focus on the more recent cDCC model of Aielli (2013).
The differential of the GDCC equation is
$$dQ_t = d\Gamma - d\mathbf{A}'\,\Gamma\,\mathbf{A} - \mathbf{A}'\, d\Gamma\,\mathbf{A} - \mathbf{A}'\,\Gamma\, d\mathbf{A} - d\mathbf{B}'\,\Gamma\,\mathbf{B} - \mathbf{B}'\, d\Gamma\,\mathbf{B} - \mathbf{B}'\,\Gamma\, d\mathbf{B} + d\mathbf{A}'\,\eta_{t-1}\eta_{t-1}'\,\mathbf{A} + \mathbf{A}'\, d\left(\eta_{t-1}\eta_{t-1}'\right)\mathbf{A} + \mathbf{A}'\,\eta_{t-1}\eta_{t-1}'\, d\mathbf{A} + d\mathbf{B}'\, Q_{t-1}\,\mathbf{B} + \mathbf{B}'\, dQ_{t-1}\,\mathbf{B} + \mathbf{B}'\, Q_{t-1}\, d\mathbf{B},$$
and, taking the vec, with some manipulation and using the previous results for the differential $d\left(\eta_{t-1}\eta_{t-1}'\right)$, we obtain
$$\mathrm{vec}\left(dQ_t\right) = J_1\,\mathrm{vechl}\left(d\Gamma\right) + J_{2,t}\,\mathrm{vec}\left(d\mathbf{A}\right) + J_{3,t}\,\mathrm{vec}\left(d\mathbf{B}\right) + J_{4,t}\, dv_{t-1} + J_5\,\mathrm{vec}\left(dQ_{t-1}\right), \qquad (29)$$
where
$$J_1 = \left[I_{n^2} - \left(\mathbf{A}' \otimes \mathbf{A}'\right) - \left(\mathbf{B}' \otimes \mathbf{B}'\right)\right] S_n, \qquad (30)$$
$$J_{2,t} = 2 N_n\left[\left(\mathbf{A}'\,\eta_{t-1}\eta_{t-1}' - \mathbf{A}'\,\Gamma\right) \otimes I_n\right] K_{nn}, \qquad (31)$$
$$J_{3,t} = 2 N_n\left[\left(\mathbf{B}'\, Q_{t-1} - \mathbf{B}'\,\Gamma\right) \otimes I_n\right] K_{nn}, \qquad (32)$$
$$J_{4,t} = -\left(\mathbf{A}' \otimes \mathbf{A}'\right) N_n\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n, \qquad (33)$$
$$J_5 = \mathbf{B}' \otimes \mathbf{B}'. \qquad (34)$$
Finally, the gradient has exactly the same form as in (27) with the J matrices defined in (30)–(34).

4.2. Gradient of the cDCC Model

The cDCC model of Aielli (2013) differs from the DCC model of Engle (2002) only in the dynamic equation for the $Q_t$ matrix. Thus, the two gradients are very similar: the differential of $Q_t$ is equal to
$$dQ_t = \left(\iota_n\iota_n' - \mathbf{A} - \mathbf{B}\right) \odot d\Gamma + d\mathbf{A} \odot \left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2} - \Gamma\right) + d\mathbf{B} \odot \left(Q_{t-1} - \Gamma\right) + \mathbf{B} \odot dQ_{t-1} + \mathbf{A} \odot d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right).$$
The difference from the original DCC by Engle (2002) therefore comes from two terms: $d\mathbf{A} \odot \left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2} - \Gamma\right)$ and $\mathbf{A} \odot d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right)$. For the former, we simply have
$$\underset{\left(n^2 \times \frac{n(n+1)}{2}\right)}{\tilde{J}_{2,t}} = \delta_{\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2} - \Gamma}\, D_n.$$
On the contrary, the differential of the dynamic correlation shocks must be entirely reconsidered. First note that
$$d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right) = d\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2} + \tilde{Q}_{t-1}^{1/2}\, d\left(\eta_{t-1}\eta_{t-1}'\right)\tilde{Q}_{t-1}^{1/2} + \tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\, d\tilde{Q}_{t-1}^{1/2}.$$
Taking the vec and using the operator's properties, we obtain
$$\mathrm{vec}\left[d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right)\right] = \left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}' \otimes I_n\right)\mathrm{vec}\left(d\tilde{Q}_{t-1}^{1/2}\right) + \left(\tilde{Q}_{t-1}^{1/2} \otimes \tilde{Q}_{t-1}^{1/2}\right)\mathrm{vec}\left[d\left(\eta_{t-1}\eta_{t-1}'\right)\right] + \left(I_n \otimes \tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\right)\mathrm{vec}\left(d\tilde{Q}_{t-1}^{1/2}\right).$$
Using previous results, we also have
$$\mathrm{vec}\left[d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right)\right] = \left(\tilde{Q}_{t-1}^{1/2} \otimes \tilde{Q}_{t-1}^{1/2}\right)\mathrm{vec}\left[d\left(\eta_{t-1}\eta_{t-1}'\right)\right] + N_n\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}' \otimes \tilde{Q}_{t-1}^{-1/2}\right)\Delta_n\,\mathrm{vec}\left(dQ_{t-1}\right).$$
Using the last equation, we can reconsider the recursive element of the differential:
$$\mathrm{vec}\left[\mathbf{A} \odot d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right)\right] = \delta_{\mathbf{A}}\,\mathrm{vec}\left[d\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}'\,\tilde{Q}_{t-1}^{1/2}\right)\right] = -\frac{1}{2}\,\delta_{\mathbf{A}}\left(\tilde{Q}_{t-1}^{1/2} \otimes \tilde{Q}_{t-1}^{1/2}\right)\left(\eta_{t-1}\eta_{t-1}' \otimes V_{t-1}^{-1}\right) G_n\, dv_{t-1} - \frac{1}{2}\,\delta_{\mathbf{A}}\left(\tilde{Q}_{t-1}^{1/2} \otimes \tilde{Q}_{t-1}^{1/2}\right)\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n\, dv_{t-1} + \delta_{\mathbf{A}}\, N_n\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}' \otimes \tilde{Q}_{t-1}^{-1/2}\right)\Delta_n\,\mathrm{vec}\left(dQ_{t-1}\right).$$
Taking into account these additional elements, together with the algebraic equalities adopted in the previous section, we have
$$\underset{(n^2 \times n)}{\tilde{J}_{4,t}} = -\delta_{\mathbf{A}}\left(\tilde{Q}_{t-1}^{1/2} \otimes \tilde{Q}_{t-1}^{1/2}\right) N_n\left(V_{t-1}^{-1} \otimes \eta_{t-1}\eta_{t-1}'\right) G_n, \qquad \underset{(n^2 \times n^2)}{\tilde{J}_{5,t}} = \delta_{\mathbf{B}} + \delta_{\mathbf{A}}\, N_n\left(\tilde{Q}_{t-1}^{1/2}\,\eta_{t-1}\eta_{t-1}' \otimes \tilde{Q}_{t-1}^{-1/2}\right)\Delta_n,$$
and we stress that the latter matrix is now time-varying.
Finally, the cDCC gradient has the following compact form (identical to that of the DCC model, apart from the content of the matrices with a tilde):
$$\frac{\partial l_t}{\partial \theta'} = \left[C_{1,t} : C_{2,t} C_{3,t}\right]\begin{bmatrix}\dfrac{\partial v_t}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_t}{\partial \theta'}\end{bmatrix}, \qquad \underset{\left[(n + n^2) \times \dim(\theta)\right]}{\begin{bmatrix}\dfrac{\partial v_t}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_t}{\partial \theta'}\end{bmatrix}} = \begin{bmatrix} I_n : W_{1,t} : W_{2,t} : 0_{n \times (d_4 + d_5 + d_6)} \\ 0_{n^2 \times (d_1 + d_2 + d_3)} : J_1 : \tilde{J}_{2,t} : J_{3,t}\end{bmatrix} + \begin{bmatrix} B & 0_{n \times n^2} \\ \tilde{J}_{4,t} & \tilde{J}_{5,t} \end{bmatrix}\begin{bmatrix}\dfrac{\partial v_{t-1}}{\partial \theta'}\\[2mm] \dfrac{\partial\,\mathrm{vec}\,Q_{t-1}}{\partial \theta'}\end{bmatrix}.$$
We note that the differences between the gradient of the DCC of Engle (2002) and that of the cDCC of Aielli (2013) are confined to the matrices $\tilde{J}_{2,t}$, $\tilde{J}_{4,t}$ and $\tilde{J}_{5,t}$, the last of which is dynamic for the cDCC while its counterpart $J_5$ is static for the DCC. As a result, the implementation of the analytical gradient for the cDCC model is computationally (marginally) more complex than that for the DCC model.

5. Two-Step Estimation of DCC-Like Models

The previous sections provide the analytical expressions for the gradient of selected DCC specifications. The availability of analytical gradients has relevant advantages from the model estimation point of view. In fact, they might bring about substantial savings in the CPU time needed for model estimation, particularly so when the model dimension n is large. However, some qualifications have to be made.
When estimating DCC-type models, most practitioners refrain from maximising the full log-likelihood and rely instead on the two-step approach put forward by Engle (2002). In the first step, a battery of $n$ univariate GARCH models is estimated, one for each series; the second step involves the maximisation of the log-likelihood (3) over the parameter matrices $\Gamma$, $\mathbf{A}$ and $\mathbf{B}$, taking $\varepsilon_t$ and $v_t$ as given and using the estimated residuals and conditional variances in their stead.
Given the structure of the gradient, the two-step approach has adverse consequences for efficiency: estimating the conditional variance parameters separately from the conditional correlation parameters amounts to using, in the evaluation of the gradient and the Hessian, procedures that neglect the dependence of the correlation likelihood on the conditional variance parameters. The availability of the DCC model gradient can be of help even if we plan to maintain a two-step estimation approach. In fact, the Maximisation by Parts (MbP) approach of Fan et al. (2015) introduces an iterative scheme for a general case that includes DCC-type models. The MbP framework requires that the log-likelihood function depend on two sets of parameters and decompose as
$$L_T\left(\theta_1, \theta_2\right) = L_{1,T}\left(\theta_1\right) + L_{2,T}\left(\theta_1, \theta_2\right).$$
Imposing the first-order conditions leads to
$$\frac{\partial L_T\left(\theta_1, \theta_2\right)}{\partial \theta_1} = \frac{\partial L_{1,T}\left(\theta_1\right)}{\partial \theta_1} + \frac{\partial L_{2,T}\left(\theta_1, \theta_2\right)}{\partial \theta_1} = 0,$$
$$\frac{\partial L_T\left(\theta_1, \theta_2\right)}{\partial \theta_2} = \frac{\partial L_{2,T}\left(\theta_1, \theta_2\right)}{\partial \theta_2} = 0.$$
Fan et al. (2015) suggest estimating the parameters according to an iterative scheme that exploits the fact that one of the two parameter sets appears in only one of the two components of the likelihood, while at the same time avoiding neglect of the second term of $\partial L_T\left(\theta_1, \theta_2\right)/\partial \theta_1$. The MbP approach estimates the parameters by iterating between the two following systems of first-order conditions, where $j$ is the iteration index:
$$\frac{\partial L_{1,T}\left(\theta_1\right)}{\partial \theta_1} + \frac{\partial L_{2,T}\left(\theta_1, \hat{\theta}_{j-1,2}\right)}{\partial \theta_1} = 0,$$
$$\frac{\partial L_{2,T}\left(\hat{\theta}_{j,1}, \theta_2\right)}{\partial \theta_2} = 0.$$
The DCC log-likelihood is coherent with that framework, as Engle (2002) shows that the conditional variance contribution to the overall likelihood can be separated from the conditional correlation contribution. Moreover, if we let the conditional variance parameters be included in the set $\theta_1$ and the conditional correlation parameters in $\theta_2$, coherently with our previous derivations, the conditional variance parameters enter the log-likelihood of the conditional correlations. The availability of the analytical gradient for a DCC model therefore allows adopting the MbP approach with a computational advantage over the use of numerical methods; a toy illustration of the MbP iteration is sketched below.
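The following toy sketch (our illustration; the bivariate Gaussian example and all values are assumptions, not the paper's application) shows the mechanics of the MbP iteration: $\theta_1$ collects the two variances, $\theta_2$ the correlation, and the two first-order-condition systems are solved alternately.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(5)
T, rho_true = 2000, 0.6
C = np.linalg.cholesky(np.array([[1.0, rho_true], [rho_true, 1.0]]))
x = rng.standard_normal((T, 2)) @ C.T * np.array([1.5, 0.8])  # true sds 1.5, 0.8

def L1(s):                   # variance block: depends on theta1 = (s1^2, s2^2)
    return -0.5 * np.sum(np.log(s) + x ** 2 / s)

def L2(s, rho):              # correlation block: depends on (theta1, theta2)
    eta = x / np.sqrt(s)
    q = (eta[:, 0] ** 2 - 2 * rho * eta[:, 0] * eta[:, 1]
         + eta[:, 1] ** 2) / (1 - rho ** 2)
    return -0.5 * (T * np.log(1 - rho ** 2) + np.sum(q) - np.sum(eta ** 2))

u, rho = np.log(np.var(x, axis=0)), 0.0   # u = log theta1 keeps variances > 0
for j in range(10):
    # step 1: maximise L1(theta1) + L2(theta1, theta2_hat) over theta1
    u = minimize(lambda u_: -(L1(np.exp(u_)) + L2(np.exp(u_), rho)), u).x
    # step 2: maximise L2(theta1_hat, theta2) over theta2
    rho = minimize_scalar(lambda r: -L2(np.exp(u), r),
                          bounds=(-0.99, 0.99), method="bounded").x

print(np.sqrt(np.exp(u)), rho)            # close to (1.5, 0.8) and 0.6
```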
However, as the gradient with respect to the conditional variance parameters depends on the GARCH-type specifications adopted, from now on we concentrate only on the implementation of the gradient with respect to the conditional correlation parameters.

6. Software Implementation of Analytical Gradients

The implementation of computer code translating the formulae of the previous sections into actual algorithms is not trivial. First, it should be noted that, for an efficient numerical implementation of the score, several of the special matrices used here (such as $D_n$, $K_{nn}$ or $\Delta_n$) are selection/permutation matrices. Therefore, pre- or post-multiplication of a matrix (say, $A$) by a special matrix simply amounts to selecting/reordering the rows and columns of $A$, thereby avoiding floating-point operations altogether; the sketch below makes the point concrete.
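For instance (our sketch, with an arbitrary small size), applying $K_{nn}$ is just a transpose, and post-multiplying a row vector by $G_n$ is just a stride-$(n+1)$ selection:

```python
import numpy as np

vec = lambda M: M.flatten(order="F")

def Knn(n):                      # explicit commutation matrix: O(n^4) storage
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i + j * n, j + i * n] = 1.0
    return K

rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n))

slow = Knn(n) @ vec(A)           # matrix product: O(n^4) flops
fast = vec(A.T)                  # pure data movement: K_nn vec(A) = vec(A')
assert np.allclose(slow, fast)

# post-multiplying a 1 x n^2 row by G_n just picks the n diagonal positions
row = rng.standard_normal(n * n)
assert np.allclose(row.reshape(n, n, order="F").diagonal(), row[:: n + 1])
```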
Moreover, the full generality of the law of motion for $Q_t$ provided in Equation (1) is rarely needed, and practitioners have focused on several sub-cases, which allow for different reparametrisations, some of which involve parameter restrictions. In all these cases, the parameter vector $\theta$ is re-expressed as $\theta = \theta(\psi)$, where $\psi$ contains the unrestricted parameters for the numerical maximisation algorithm. We present here the analysis for the DCC model by Engle (2002), but similar adaptations are possible for the other DCC specifications as well.
The first example of these restrictions involves the $\Gamma$ matrix in (1), which in general can be parametrised as a positive semidefinite symmetric matrix with ones on the diagonal. However, the $\frac{n(n-1)}{2}$ free parameters that $\Gamma$ contains are often constrained in such a way that $\Gamma$ equals some prescribed matrix, usually obtained from the data. This idea, introduced in Engle and Mezrich (1996), is very common among practitioners and goes under the name of "variance targeting".²
Moreover, the $\mathbf{A}$ and $\mathbf{B}$ matrices are also often reparametrised, in several ways:
Cholesky factorisation. The $\mathbf{A}$ and $\mathbf{B}$ matrices are expressed as
$$\mathbf{A} = H_A H_A', \qquad \mathbf{B} = H_B H_B',$$
where $H_A$ and $H_B$ are lower triangular matrices, ensuring that $\mathbf{A}$ and $\mathbf{B}$ are positive semi-definite. In this case, $\psi = \left[\mathrm{vech}\left(H_A\right)' : \mathrm{vech}\left(H_B\right)'\right]'$, $\theta = \theta(\psi)$ is a bijective transformation and no constraints are imposed.
Rank-1 specification. The $\mathbf{A}$ and $\mathbf{B}$ matrices are expressed as
$$\mathbf{A} = a a', \qquad \mathbf{B} = b b',$$
where $a$ and $b$ are column $n$-vectors. Note that this is a special case of the one above, where $\mathbf{A}$ and $\mathbf{B}$ are restricted to be rank-1 matrices. Here, $\psi = \left[a' : b'\right]'$ is a $2n \times 1$ vector and the transformation is not invertible. This parametrisation implies $n^2 - n$ constraints; as a consequence, the dimension of $\psi$ is $O(n)$.
Scalar DCC. The $\mathbf{A}$ and $\mathbf{B}$ matrices are expressed as
$$\mathbf{A} = \alpha \cdot \iota_n\iota_n', \qquad \mathbf{B} = \beta \cdot \iota_n\iota_n',$$
where $\alpha$ and $\beta$ are positive scalars. Note that this is a restricted case of the above, where $a = \sqrt{\alpha} \cdot \iota_n$ and $b = \sqrt{\beta} \cdot \iota_n$. Here, $\psi = \left[\alpha : \beta\right]'$ (a $2 \times 1$ vector) and $n^2 + n - 2$ constraints are implied; hence, the dimension of $\psi$ is fixed at 2, irrespective of $n$.
All these reparametrisations have the advantage that the elements of $\psi$ can vary freely, so that no numerical check on the symmetry and positive semi-definiteness of $\mathbf{A}$ and $\mathbf{B}$ is required.
The analytical gradient with respect to the unconstrained parameters $\psi$ is readily obtained by post-multiplying Equation (4) by the Jacobian term $\Psi = \partial\theta/\partial\psi'$, which can be written more extensively as
$$\underset{\left[\dim(\theta) \times \dim(\psi)\right]}{\Psi} = \frac{\partial \theta}{\partial \psi'} = \begin{bmatrix} \partial\,\mathrm{vechl}\left(\Gamma\right)/\partial \psi' \\ \partial\,\mathrm{vech}\left(\mathbf{A}\right)/\partial \psi' \\ \partial\,\mathrm{vech}\left(\mathbf{B}\right)/\partial \psi' \end{bmatrix} = \begin{bmatrix} \Psi_\Gamma \\ \Psi_A \\ \Psi_B \end{bmatrix}.$$
Note that the matrix $\Psi$ is time-invariant, so it can be computed just once: in practice, once a loop over $t = 1, \ldots, T$ has been run on Equation (25) and the results have been stored into a $T \times \dim(\theta)$ matrix, all that is needed to compute the analytical score with respect to $\psi$ is to post-multiply this matrix by $\Psi$.
If the $\Gamma$ matrix is estimated without restrictions, then $\Psi_\Gamma = I_{\frac{n(n-1)}{2}}$; under variance targeting, instead, $\Gamma$ contains no parameters at all and $\Psi_\Gamma$ simply vanishes.
As for $\Psi_A$, for the three cases listed above we have
$$d\,\mathrm{vech}\left(\mathbf{A}\right) = \begin{cases} 2 L_n N_n \left(H_A \otimes I_n\right) L_n'\; d\,\mathrm{vech}\left(H_A\right) & \text{Cholesky} \\ 2 L_n N_n \left(a \otimes I_n\right) da & \text{rank 1} \\ L_n\,\iota_{n^2}\, d\alpha = \iota_{\frac{n(n+1)}{2}}\, d\alpha & \text{scalar} \end{cases}$$
where, in the Cholesky case, we used the fact that $\mathrm{vec}\left(H_A\right) = L_n'\,\mathrm{vech}\left(H_A\right)$ for a lower triangular matrix, so that
$$\Psi_A = \begin{cases} 2 L_n N_n \left(H_A \otimes I_n\right) L_n' & \text{Cholesky} \\ 2 L_n N_n \left(a \otimes I_n\right) & \text{rank 1} \\ \iota_{\frac{n(n+1)}{2}} & \text{scalar} \end{cases}$$
and similar expressions hold, mutatis mutandis, for the matrix $\mathbf{B}$.
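These Jacobians can be checked by finite differences. The sketch below (our code; the matrix helpers follow the definitions of Section 2.1) validates the Cholesky and rank-1 cases on one coordinate each.

```python
import numpy as np

vec = lambda M: M.flatten(order="F")

def vech(M):
    n = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(n)])

def Ln(n):                          # elimination matrix: Ln @ vec(A) = vech(A)
    out, k = np.zeros((n * (n + 1) // 2, n * n)), 0
    for j in range(n):
        for i in range(j, n):
            out[k, i + j * n] = 1.0
            k += 1
    return out

def Nn(n):                          # symmetriser N_n = (I + K_nn)/2
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i + j * n, j + i * n] = 1.0
    return 0.5 * (np.eye(n * n) + K)

rng = np.random.default_rng(7)
n, h = 3, 1e-6
I = np.eye(n)

# Cholesky case: Psi_A = 2 Ln Nn (H_A ⊗ I) Ln'
H = np.tril(rng.standard_normal((n, n)))
Psi = 2 * Ln(n) @ Nn(n) @ np.kron(H, I) @ Ln(n).T
E = np.zeros((n, n)); E[1, 0] = 1.0     # perturb H[1,0]: position 1 in vech
num = (vech((H + h * E) @ (H + h * E).T) -
       vech((H - h * E) @ (H - h * E).T)) / (2 * h)
assert np.allclose(num, Psi[:, 1], atol=1e-6)

# rank-1 case: Psi_A = 2 Ln Nn (a ⊗ I)
a = rng.standard_normal(n)
Psi = 2 * Ln(n) @ Nn(n) @ np.kron(a.reshape(-1, 1), I)
e = np.zeros(n); e[2] = h               # perturb a[2]
num = (vech(np.outer(a + e, a + e)) - vech(np.outer(a - e, a - e))) / (2 * h)
assert np.allclose(num, Psi[:, 2], atol=1e-6)
```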

7. Empirical Example

In order to give the reader an idea of the computational advantages that analytical scores bring about, we ran the following experiment: we generated random data for $T = 512, 1024, 2048$ and evaluated the score at a given $\psi$, for four different DCC specifications.
We evaluate CPU time and precision in the computation of the derivative of $l(\theta; x_t)$; this task is accomplished by taking a specific data point $x_t$, $t = 1, \ldots, T$. Note that the evaluation of CPU time and precision has nothing to do with the way the specific sample $x_t$ was generated, as long as the point $x_t$ belongs to the support of the density whose log-likelihood we are evaluating.
However, to make sure this is the case, we ran the experiment several times on data points that belong to the support of densities that can conceivably be of interest to practitioners, such as those that feature persistence at high frequencies, jumps and long memory (see, for example, Gradojevic et al. 2020);³ we found no appreciable differences, and the additional tables, not included in the paper for the sake of brevity but available upon request from the authors, are completely in line with the ones reported in this section.
The DGPs we chose can be ordered by increasing restrictedness; we use the following sets of parameters:
  • Unrestricted (no transformation): $\Gamma = I_n$, $\mathbf{A} = 0.1 \cdot P_n$ and $\mathbf{B} = 0.9 \cdot P_n$, where $P_n = I_n + n^{-1}\,\iota_n\iota_n'$; therefore,
    $\psi = \left[\,0_{\frac{n(n-1)}{2}}' : 0.1 \cdot \mathrm{vech}\left(P_n\right)' : 0.9 \cdot \mathrm{vech}\left(P_n\right)'\,\right]'$;
  • Cholesky: $\Gamma = I_n$, $H_A = 0.1 \cdot H_n$ and $H_B = 0.9 \cdot H_n$, where $H_n$ is a lower-triangular matrix equal to $P_n$ in its non-zero elements; therefore,
    $\psi = \left[\,0_{\frac{n(n-1)}{2}}' : 0.1 \cdot \mathrm{vech}\left(P_n\right)' : 0.9 \cdot \mathrm{vech}\left(P_n\right)'\,\right]'$;
  • rank-1: $\Gamma = I_n$, $a = 0.1 \cdot \iota_n$ and $b = 0.9 \cdot \iota_n$; therefore,
    $\psi = \left[\,0_{\frac{n(n-1)}{2}}' : 0.1 \cdot \iota_n' : 0.9 \cdot \iota_n'\,\right]'$;
  • scalar: $\Gamma = I_n$, $\alpha = 0.1$ and $\beta = 0.9$; therefore,
    $\psi = \left[\,0_{\frac{n(n-1)}{2}}' : 0.1 : 0.9\,\right]'$.
The same specifications were also analysed under the variance targeting setup, where of course the first block of ψ drops out.
The score was computed both analytically and numerically. For each parameter configuration, Table 1 and Table 2 report two indicators, respectively:
  • the CPU-time ratio $\rho = t_N / t_A$, where $t_N$ and $t_A$ are the CPU times for the numerical and the analytical gradient, respectively;
  • an indicator of the accuracy of the numerical derivatives, $d_{\max}$, given by the maximum absolute difference between the numerical and the analytical gradient.⁴
The results show quite clearly that analytical differentiation is uniformly advantageous in terms of precision, and increasingly so as $\dim(\psi)$ gets large, either because of a larger $n$ or fewer constraints on the model. The quality of the numerical approximation also degrades uniformly as the sample size $T$ grows. The comparison was performed by using the central-difference formula for numerical derivatives (see, for example, Epperson 2013, Section 2.2); obviously, more sophisticated methods, such as Richardson extrapolation, would yield more accuracy but would also imply heavier CPU usage. A minimal version of this scheme is sketched below.
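The following helper (our sketch; the step size h is an assumption and would need tuning in practice) implements the textbook central-difference scheme, with $O(h^2)$ truncation error and two likelihood evaluations per parameter.

```python
import numpy as np

def num_gradient(f, psi, h=1e-5):
    """Central differences: g_i = (f(psi + h e_i) - f(psi - h e_i)) / (2h)."""
    psi = np.asarray(psi, dtype=float)
    g = np.zeros_like(psi)
    for i in range(psi.size):
        e = np.zeros_like(psi); e[i] = h
        g[i] = (f(psi + e) - f(psi - e)) / (2 * h)
    return g

# example on a known gradient: f(x) = x'x has gradient 2x
x = np.array([1.0, -2.0, 0.5])
print(num_gradient(lambda z: z @ z, x))   # approximately [2, -4, 1]
```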
On the contrary, analytical derivatives seem to yield a speed premium only when $\dim(\psi)$ is $O(n^2)$ or more, as the formulae presented in this paper entail a computational overhead that seems to be roughly $O(n)$. As can be seen, analytical derivatives dominate numerical ones in all those situations where the number of unrestricted parameters gets large. Interestingly, the opposite happens when the constraints are pushed to the extreme: in the scalar model with variance targeting, the number of unrestricted parameters is fixed at 2 no matter what the model size is, and analytical derivatives are actually detrimental, increasingly so as $n$ gets larger. Sample size, instead, does not seem to make much difference.
For any realistic estimation problem, therefore, the combined effect of increased precision and quicker computation makes it advisable to employ analytical derivatives whenever the model being estimated is more general than the simple scalar model: even if CPU time were roughly equivalent in the two cases, the deterioration in precision incurred with numerical derivatives would imply a substantial increase in the number of iterations needed to reach the maximum of the log-likelihood function.
A word of warning: our results should be considered as merely suggestive, since the actual hardware and software setup may make quite a large difference. For example, the results reported in Table 2 were obtained on a Linux machine with 20 physical CPU cores, using the gretl package linked to the OpenBLAS matrix library (Wang et al. 2013). Results may be different on other hardware/software combinations.

8. Concluding Remarks

The formulae we developed make it possible to compute the analytical score for the DCC model quickly and efficiently. Compared to the numerical derivatives modern statistical packages provide, the advantages are twofold: increased precision and computational speed.
However, while the gain in precision depends rather uniformly on the size of the model being estimated, the effects on CPU time deserve some more detailed comment: generally speaking, analytical derivatives are especially beneficial for larger models, when the parameters of the law of motion for the conditional correlation matrix are not overly constrained. In the extreme case of the scalar DCC model with variance targeting, the computational overhead implied by analytically differentiating the log-likelihood completely offsets the advantages. In all other cases, analytical derivatives provide substantial, often dramatic, savings in computing time. Sample size, on the other hand, does not seem to affect the relative speed of analytical versus numerical gradient calculation very much.
In the light of the enormous importance of modelling covariances for large numbers of variables using parametric structures which do not impose an excessive number of constraints, the usage of analytical derivatives seems imperative in all cases where accurately modelling the time profile of conditional correlations is essential.
Such situations arise frequently in practice, and a few examples can be given. First, conditional correlation models are relevant for risk management, particularly so when the portfolio includes a great number of assets. Second, these models may prove highly beneficial in asset allocation problems, since predictions of both the conditional mean and the conditional covariance matrix are required to determine the optimal portfolio composition; in these cases the number of assets in the investment universe is often rather large, so computational complexity is likely to be a very real issue. Third, the possibility of estimating full DCC models in medium-sized problems makes it possible to account for spillovers and interdependence between conditional variances and within conditional correlations, and thereby analyse the diffusion of shocks within and/or between financial markets in greater detail.

Author Contributions

Conceptualization, M.C.; methodology, M.C. and R.J.L.; software, R.J.L. and G.P.; investigation, G.P.; data curation, R.J.L. and G.P.; writing—original draft preparation, M.C.; writing—review and editing, M.C., R.J.L. and G.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aielli, Gian Piero. 2013. Dynamic conditional correlation: On properties and estimation. Journal of Business & Economic Statistics 31: 282–99.
  2. Bauwens, Luc, Sebastien Laurent, and Jeroen K. Rombouts. 2006. Multivariate GARCH models: A survey. Journal of Applied Econometrics 21: 79–109.
  3. Bollerslev, Tim. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Review of Economics and Statistics 72: 498–505.
  4. Caporin, Massimiliano, and Michael McAleer. 2008. Scalar BEKK and Indirect DCC. Journal of Forecasting 27: 537–49.
  5. Caporin, Massimiliano, and Paolo Paruolo. 2015. Proximity-structured multivariate volatility models. Econometric Reviews 34: 559–93.
  6. Cappiello, Lorenzo, Robert F. Engle, and Kevin Sheppard. 2006. Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics 4: 537–72.
  7. Engle, Robert. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20: 339–50.
  8. Engle, Robert F., and Kenneth F. Kroner. 1995. Multivariate simultaneous generalized ARCH. Econometric Theory 11: 122–50.
  9. Engle, Robert F., and J. Mezrich. 1996. GARCH for Groups. Risk 9: 36–40.
  10. Engle, Robert F., and Kevin Sheppard. 2001. Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH. Technical Report. Cambridge: National Bureau of Economic Research.
  11. Epperson, James F. 2013. An Introduction to Numerical Methods and Analysis, 2nd ed. Hoboken: Wiley Publishing.
  12. Fan, Yanqin, Sergio Pastorello, and Eric Renault. 2015. Maximization by parts in extremum estimation. The Econometrics Journal 18: 147–71.
  13. Gradojevic, Nikola, Deniz Erdemlioglu, and Ramazan Gençay. 2020. A new wavelet-based ultra-high-frequency analysis of triangular currency arbitrage. Economic Modelling 85: 57–73.
  14. Hafner, Christian M., and Helmut Herwartz. 2008. Analytical quasi maximum likelihood inference in multivariate volatility models. Metrika 67: 219–39.
  15. Ling, Shiqing, and Michael McAleer. 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19: 280–310.
  16. Lucchetti, Riccardo. 2002. Analytical score for multivariate GARCH models. Computational Economics 19: 133–43.
  17. Lütkepohl, Helmut. 1996. Handbook of Matrices. Hoboken: John Wiley & Sons.
  18. Magnus, Jan R., and Heinz Neudecker. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics, 2nd ed. Hoboken: John Wiley.
  19. Pelletier, Denis. 2006. Regime switching for dynamic correlations. Journal of Econometrics 131: 445–73.
  20. Wang, Qian, Xianyi Zhang, Yunquan Zhang, and Qing Yi. 2013. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. In SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 17–22 November 2013, pp. 1–12.
Notes
1. Obviously, in a realistic application the conditional expectation of the observables $x_t$ has to be modelled somehow.
2. A possible alternative reparametrisation, based on hyperbolic trigonometric functions, is the one proposed in Pelletier (2006).
3. We thank an anonymous referee for bringing up this point.
4. One may think that a discrepancy does not, in itself, mean that numerical derivatives are "wrong" and analytical ones are "right". The analytical results were carefully checked for small values of $n$ and were found to be correct; since the formulae employed in the score calculations in Section 3 and Section 4 do not contain operations that should be particularly sensitive to rounding errors, we are confident in their validity. Nevertheless, to be on the safe side, we repeated the comparison using a slower but more accurate algorithm for computing numerical derivatives and found that $d_{\max}$ decreased uniformly, as expected.
Table 1. Maximum distance indices ($d_{\max}$).

| Model | 1: No Transformation | | | | | | 2: Cholesky | | | | | |
| Targeting | No Targeting | | | Targeting | | | No Targeting | | | Targeting | | |
| T | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 |
| n = 2 | 0.0003 | 0.0003 | 0.0025 | 0.0003 | 0.0003 | 0.0032 | 0.0002 | 0.0002 | 0.0020 | 0.0002 | 0.0002 | 0.0020 |
| n = 3 | 0.0012 | 0.0025 | 0.0052 | 0.0012 | 0.0025 | 0.0078 | 0.0010 | 0.0029 | 0.0035 | 0.0010 | 0.0029 | 0.0035 |
| n = 4 | 0.0014 | 0.0052 | 0.0082 | 0.0014 | 0.0052 | 0.0164 | 0.0017 | 0.0051 | 0.0123 | 0.0017 | 0.0051 | 0.0123 |
| n = 5 | 0.0040 | 0.0077 | 0.0193 | 0.0040 | 0.0077 | 0.0391 | 0.0049 | 0.0086 | 0.0304 | 0.0049 | 0.0086 | 0.0304 |
| n = 6 | 0.0044 | 0.0239 | 0.0410 | 0.0044 | 0.0239 | 0.0564 | 0.0053 | 0.0124 | 0.0648 | 0.0053 | 0.0124 | 0.0648 |
| n = 7 | 0.0048 | 0.0218 | 0.0709 | 0.0048 | 0.0218 | 0.0580 | 0.0075 | 0.0265 | 0.0622 | 0.0075 | 0.0265 | 0.0622 |
| n = 8 | 0.0126 | 0.0377 | 0.1104 | 0.0126 | 0.0377 | 0.0776 | 0.0104 | 0.0286 | 0.0925 | 0.0104 | 0.0286 | 0.0925 |
| n = 9 | 0.0129 | 0.0570 | 0.1127 | 0.0129 | 0.0570 | 0.0927 | 0.0131 | 0.0375 | 0.1216 | 0.0131 | 0.0375 | 0.1216 |
| n = 10 | 0.0253 | 0.0654 | 0.1746 | 0.0253 | 0.0654 | 0.1492 | 0.0212 | 0.0621 | 0.2129 | 0.0212 | 0.0621 | 0.2129 |
| n = 15 | 0.0399 | 0.1289 | 0.4150 | 0.0399 | 0.1289 | 0.3692 | 0.0395 | 0.1086 | 0.3534 | 0.0395 | 0.1086 | 0.3534 |
| n = 20 | 0.0927 | 0.2619 | 0.8656 | 0.0927 | 0.2619 | 0.7093 | 0.0919 | 0.2900 | 0.6051 | 0.0919 | 0.2900 | 0.6052 |
| n = 25 | 0.1850 | 0.4536 | 1.1483 | 0.1850 | 0.4536 | 0.9630 | 0.1321 | 0.4474 | 0.9838 | 0.1321 | 0.4474 | 0.9838 |

| Model | 3: Rank-1 | | | | | | 4: Scalar | | | | | |
| Targeting | No Targeting | | | Targeting | | | No Targeting | | | Targeting | | |
| T | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 |
| n = 2 | 8.3 × 10⁻⁵ | 0.0002 | 0.0007 | 8.3 × 10⁻⁵ | 0.0002 | 0.0007 | 0.0002 | 0.0002 | 0.0006 | 0.0002 | 0.0002 | 0.0006 |
| n = 3 | 6.6 × 10⁻⁵ | 0.0005 | 0.0014 | 6.6 × 10⁻⁵ | 0.0005 | 0.0014 | 1.9 × 10⁻⁵ | 0.0004 | 0.0006 | 1.9 × 10⁻⁵ | 0.0004 | 0.0006 |
| n = 4 | 0.0002 | 0.0009 | 0.0008 | 0.0002 | 0.0009 | 0.0008 | 3.5 × 10⁻⁵ | 0.0004 | 0.0031 | 3.5 × 10⁻⁵ | 0.0004 | 0.0031 |
| n = 5 | 0.0004 | 0.0009 | 0.0022 | 0.0004 | 0.0009 | 0.0022 | 1.4 × 10⁻⁵ | 0.0005 | 5.4 × 10⁻⁵ | 1.4 × 10⁻⁵ | 0.0005 | 5.4 × 10⁻⁵ |
| n = 6 | 0.0002 | 0.0012 | 0.0045 | 0.0002 | 0.0012 | 0.0045 | 0.0003 | 0.0005 | 0.0003 | 0.0003 | 0.0005 | 0.0003 |
| n = 7 | 0.0006 | 0.0018 | 0.0026 | 0.0006 | 0.0018 | 0.0026 | 0.0007 | 0.0017 | 0.0026 | 0.0007 | 0.0017 | 0.0026 |
| n = 8 | 0.0004 | 0.0016 | 0.0043 | 0.0004 | 0.0016 | 0.0043 | 0.0010 | 0.0014 | 0.0006 | 0.0010 | 0.0014 | 0.0006 |
| n = 9 | 0.0009 | 0.0026 | 0.0058 | 0.0009 | 0.0026 | 0.0058 | 3.3 × 10⁻⁵ | 0.0005 | 0.0018 | 3.3 × 10⁻⁵ | 0.0005 | 0.0018 |
| n = 10 | 0.0006 | 0.0022 | 0.0031 | 0.0006 | 0.0022 | 0.0031 | 0.0005 | 0.0007 | 0.0055 | 0.0005 | 0.0007 | 0.0055 |
| n = 15 | 0.0010 | 0.0039 | 0.0152 | 0.0010 | 0.0039 | 0.0152 | 0.0005 | 0.0020 | 0.0005 | 0.0005 | 0.0020 | 0.0005 |
| n = 20 | 0.0018 | 0.0063 | 0.0112 | 0.0018 | 0.0063 | 0.0112 | 0.0026 | 0.0031 | 0.0311 | 0.0026 | 0.0031 | 0.0311 |
| n = 25 | 0.0022 | 0.0055 | 0.0174 | 0.0022 | 0.0055 | 0.0174 | 0.0065 | 0.0329 | 0.0254 | 0.0065 | 0.0329 | 0.0254 |

T is the sample size, while n is the number of series in the model.
Table 2. Time ratios $t_N / t_A$.

| Model | 1: No Transformation | | | | | | 2: Cholesky | | | | | |
| Targeting | No Targeting | | | Targeting | | | No Targeting | | | Targeting | | |
| T | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 |
| n = 2 | 1.47 | 1.53 | 1.48 | 1.47 | 1.42 | 1.27 | 1.59 | 1.49 | 6.54 | 1.51 | 1.43 | 1.40 |
| n = 3 | 2.96 | 3.03 | 2.94 | 2.77 | 2.67 | 2.49 | 3.12 | 2.96 | 2.98 | 2.83 | 2.73 | 2.61 |
| n = 4 | 5.03 | 5.07 | 4.89 | 4.42 | 4.44 | 3.92 | 5.21 | 4.96 | 5.03 | 4.57 | 4.33 | 4.28 |
| n = 5 | 7.55 | 7.59 | 7.30 | 6.41 | 6.44 | 5.97 | 7.60 | 7.36 | 7.49 | 6.60 | 6.30 | 6.17 |
| n = 6 | 5.57 | 5.68 | 5.55 | 4.71 | 4.73 | 4.25 | 5.58 | 5.54 | 5.75 | 5.14 | 4.71 | 4.53 |
| n = 7 | 7.31 | 6.91 | 7.01 | 5.88 | 5.91 | 5.40 | 7.04 | 7.06 | 7.12 | 5.94 | 5.84 | 5.50 |
| n = 8 | 8.54 | 8.24 | 8.33 | 6.91 | 7.46 | 6.33 | 8.45 | 8.48 | 8.61 | 7.06 | 6.90 | 6.45 |
| n = 9 | 9.47 | 9.23 | 9.41 | 7.73 | 8.37 | 6.60 | 9.51 | 9.65 | 9.63 | 7.85 | 7.74 | 7.02 |
| n = 10 | 10.41 | 9.79 | 9.47 | 8.06 | 8.26 | 8.14 | 10.72 | 9.96 | 10.40 | 8.19 | 8.11 | 7.69 |
| n = 15 | 11.65 | 11.14 | 12.42 | 8.78 | 9.03 | 8.33 | 12.25 | 11.48 | 11.57 | 9.14 | 8.97 | 8.33 |
| n = 20 | 8.84 | 8.45 | 8.89 | 6.13 | 6.03 | 6.25 | 9.02 | 8.71 | 8.93 | 6.40 | 5.97 | 6.38 |
| n = 25 | 7.50 | 7.71 | 7.68 | 5.06 | 4.99 | 5.13 | 7.72 | 7.63 | 7.54 | 5.88 | 5.19 | 5.13 |

| Model | 3: Rank-1 | | | | | | 4: Scalar | | | | | |
| Targeting | No Targeting | | | Targeting | | | No Targeting | | | Targeting | | |
| T | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 | 512 | 1024 | 2048 |
| n = 2 | 1.11 | 1.07 | 1.09 | 0.93 | 1.01 | 0.92 | 0.70 | 0.68 | 0.71 | 0.58 | 0.55 | 0.56 |
| n = 3 | 1.87 | 1.90 | 1.80 | 1.30 | 1.32 | 1.42 | 1.06 | 1.04 | 1.07 | 0.54 | 0.53 | 0.55 |
| n = 4 | 2.78 | 3.15 | 1.32 | 1.71 | 1.59 | 1.83 | 1.68 | 1.57 | 1.67 | 0.55 | 0.54 | 0.54 |
| n = 5 | 3.94 | 4.33 | 3.78 | 2.07 | 1.97 | 2.19 | 2.37 | 2.26 | 2.32 | 0.53 | 0.51 | 0.53 |
| n = 6 | 2.73 | 2.61 | 2.69 | 1.33 | 1.26 | 1.40 | 1.71 | 1.67 | 1.80 | 0.27 | 0.28 | 0.28 |
| n = 7 | 3.35 | 3.14 | 3.31 | 1.41 | 1.35 | 1.52 | 2.15 | 2.10 | 2.16 | 0.25 | 0.26 | 0.26 |
| n = 8 | 3.82 | 4.05 | 3.83 | 1.53 | 1.45 | 1.58 | 2.60 | 2.54 | 2.57 | 0.24 | 0.24 | 0.24 |
| n = 9 | 4.14 | 3.97 | 4.12 | 1.50 | 1.30 | 1.61 | 2.89 | 2.82 | 2.90 | 0.21 | 0.21 | 0.22 |
| n = 10 | 4.48 | 4.19 | 4.42 | 1.51 | 1.39 | 1.50 | 3.06 | 2.93 | 3.14 | 0.19 | 0.18 | 0.19 |
| n = 15 | 4.48 | 4.36 | 4.76 | 1.07 | 1.02 | 1.15 | 3.44 | 3.55 | 2.02 | 0.09 | 0.09 | 0.10 |
| n = 20 | 3.33 | 3.41 | 3.35 | 0.60 | 0.63 | 0.58 | 2.79 | 2.69 | 3.13 | 0.04 | 0.04 | 0.04 |
| n = 25 | 2.87 | 2.98 | 2.72 | 0.42 | 0.42 | 0.40 | 2.45 | 2.44 | 2.48 | 0.02 | 0.02 | 0.02 |

T is the sample size, while n is the number of series in the model.
