Abstract
To model multivariate, possibly heavy-tailed data, we compare the multivariate normal model (N) with two versions of the multivariate Student model: the independent multivariate Student (IT) and the uncorrelated multivariate Student (UT). After recalling some facts about these distributions and models, known but scattered in the literature, we prove that the maximum likelihood estimator of the covariance matrix in the UT model is asymptotically biased and propose an unbiased version. We provide implementation details for an iterative reweighted algorithm to compute the maximum likelihood estimators of the parameters of the IT model. We present a simulation study to compare the bias and root mean squared error of the ensuing estimators of the regression coefficients and covariance matrix under several scenarios of the potential data-generating process, misspecified or not. We propose a graphical tool and a test based on the Mahalanobis distance to guide the choice between the competing models. We also present an application to model vectors of financial asset returns.
1. Introduction
Many applications involving models for multivariate data underline the limitations of the classical multivariate Gaussian model, mainly due to its inability to model heavy tails. It is then natural to turn attention to a more flexible family of distributions, for example the multivariate Student distribution.
In one dimension, the generalized Student distribution encompasses the Gaussian distribution as a limit when the number of degrees of freedom, or shape parameter, tends to infinity, and allows for heavier tails when the shape parameter is small. As we will see, a first difficulty in higher dimensions is that there are several kinds of multivariate Student distributions; see, for example, Johnson and Kotz (1972) and, more recently, Kotz and Nadarajah (2004). A nice summary of the properties of the multivariate Student distribution that we will use later on in this paper, and of its comparison with the multivariate Gaussian, can be found in Roth (2013).
Before going further, let us mention that it is not so easy to have a clear overview of the results in terms of Student regression models, for at least three reasons. The first reason is that this topic is scattered, with some papers in the statistical literature and others in the econometrics literature, sometimes without cross-referencing. The second reason is that the word “multivariate” is sometimes misleading since, as we will see, the multivariate Student distribution can be used to define a univariate regression model. The third reason is that the distinction between the UT and IT models (see below) is not always clearly stated in the papers. Further complications are that some authors fit the distribution without covariates and that some authors consider the degrees of freedom as fixed, whereas others estimate it. Our first purpose here is to lead the reader through this literature and gather the results concerning the maximum likelihood estimators of the parameters in the multivariate UT and IT models with a common notation. In the present paper, we consider a multivariate dependent vector and a linear regression model with different assumptions on the error term distribution. The most common and convenient assumption is the Gaussian distribution. For a Gaussian vector, the assumption of independent coordinates is equivalent to the assumption of uncorrelated coordinates. Such an equivalence is no longer true when considering a multivariate Student distribution. We thus consider two cases: uncorrelated (UT) error vectors on the one hand and independent Student (IT) error vectors on the other hand.
The purpose of this paper is to contribute to the UT and IT models as well as to their comparison. First of all, for the UT model, we extend to the multivariate case the results of Zellner (1976) for the derivation of the maximum likelihood estimators and Zellner's formula (Zellner 1976) for the bias of the covariance matrix estimator, and we prove that this bias does not vanish asymptotically. For the multivariate IT model, in the same spirit as (), we provide details for the implementation of an iterative reweighted algorithm to compute the maximum likelihood estimators of the parameters. We devise a simulation study to measure the impact of misspecification on the bias, variance, and mean squared error of the estimators of these different parameters under several data-generating processes (Gaussian, UT, and IT) and try to answer the question: what are the consequences of a wrong specification? Finally, we introduce a new procedure for model selection based on the knowledge of the distribution of the Mahalanobis distances under the different data-generating processes (DGP).
One application attracted our attention in the finance literature. The work in Platen and Rendek (2008) identified the Student distribution with between three and five degrees of freedom, with a concentration around four, as the typical distribution for modeling the distribution of log-returns of world stock indices. They embedded the Student t in the class of generalized hyperbolic distributions, itself a subclass of the normal/independent family. For bivariate returns, the work in Fung and Seneta (2010) compared a multivariate Student IT model with an alternative model obtained by a more complex mixing representation from the point of view of asymptotic tail dependence. The work in Hu and Kercheval (2009) insisted on the fact that the choice of distribution matters when optimizing the portfolio. They found that the Student UT model performs the best in the class of symmetric generalized hyperbolic distributions. The work in Kan and Zhou (2017) advocated using a multivariate IT model for fitting the joint distribution of stock returns for a few fixed values of the degrees of freedom parameter and showed that this model outperforms the multivariate Gaussian.
In Section 2, after recalling the univariate results, we extend the results of Zellner (1976) for the derivation of the maximum likelihood estimators and their properties in the UT model and propose an iterative implementation for the IT model. We present the results of the simulation study in Section 3 and of the model selection strategy in Section 4, using a toy example and a dataset from finance. Section 5 summarizes the findings and gives recommendations.
2. Multivariate Regression Models
2.1. Literature Review
In order to define a Student regression model, even in the univariate case (single dependent variable), one needs to use the multivariate Student distribution to describe the joint distribution of the vector of observations for the set of statistical units. There are mainly two options, which were described in Kelejian and Prucha (1985) for the case of univariate regression. Indeed, the equivalence between independence and uncorrelatedness for the components of a Gaussian vector is not satisfied anymore for a multivariate Student vector. One option, which we will call the IT model (for independent t-distribution) in the sequel, considers that the components of the random disturbance vector of the regression model are independent with the same marginal Student distribution. The second option, which we will call the UT model (for uncorrelated t-distribution), postulates a joint multivariate Student distribution for the vector of disturbances. Note that in both models, the marginal distribution of each component is still univariate Student.
The work in Zellner (1976) introduced a univariate Student regression model of the type UT with known degrees of freedom and studied the corresponding maximum likelihood and Bayesian estimators (with some adapted priors). The work in Singh (1988) considered the case of univariate Student regression with the UT model and with unknown degrees of freedom and derived an estimator of the degrees of freedom and subsequent estimators of the other parameters. However, () showed that this estimator was not consistent. Using one possible representation of the multivariate Student distribution, () embedded univariate Student regression with the UT model in a larger family of regression models (with normal/independent error distributions) and developed EM algorithms to compute their maximum likelihood estimates, as in Dempster et al. (1978).
In the framework of the spherical error distribution, which includes the Student error model as a special case, the work in Fraser and Ng (1980) proved an extension to the multivariate case of Zellner's result stating that inference about the parameters corresponds closely to that under normal theory. Motivated by a financial application, the work in Sutradhar and Ali (1986) used a multivariate UT Student regression model with moment estimators instead of maximum likelihood, allowing the degrees of freedom to be unknown.
The univariate IT model was introduced in Prucha and Kelejian (1984) and compared to the UT model in Kelejian and Prucha (1985).
Concerning multivariate IT Student distributions, there was first a collection of results and applications for the case without regressors. The work in () used a representation of the multivariate IT Student distribution to derive an algorithm of the EM type for computing the maximum likelihood parameter estimators. They used the framework of normal mixture distributions, in which the Student distribution can be expressed as a combination of a Gaussian random variable and an inverse gamma random variable. More recently, the work in () proposed a more robust extension, replacing maximum likelihood by a kind of M-estimation method based on the minimization of a q-entropy criterion. For the multivariate Student IT model, the work in Prucha and Kelejian (1984) derived the normal equations for the maximum likelihood estimators and their asymptotic properties with known degrees of freedom, in a framework that encompasses our multivariate Student regression case. The work in Lange et al. (1989) illustrated this multivariate IT model on several examples. The work in Lange and Sinsheimer (1993) considered the framework of normal/independent error distributions (the same as normal variance mixtures) and derived the EM algorithm for the maximum likelihood estimators in a model with covariates. The works in Liu and Rubin (1995) and Liu (1997) developed extensions of the EM algorithm for the multivariate IT model with known or unknown degrees of freedom, with or without covariates, and with or without missing data. The work in Katz and King (1999) fit a multivariate IT distribution to multiparty electoral data. The work in Fernandez and Steel (1999) attracted attention to the fact that maximum likelihood inference can encounter problems of unbounded likelihood when the number of degrees of freedom is considered unknown and has to be estimated. Before engaging in the use of the multivariate Student distribution, it is wise to read Hofert (2013), which explained some traps to be avoided. One difficulty indeed is to be aware that some authors parametrize the multivariate Student distribution using the covariance matrix, while others use the scatter matrix, sometimes with the same notation for either one.
We consider the following version of the Student p-multivariate distribution, denoted by $t_p(\mu, \Sigma, d)$, with $\mu$ being the p-vector of means, $\Sigma$ being the $p \times p$ covariance matrix, and $d > 2$ the degrees of freedom. It is defined, for a p-vector $y$, by the probability density function:
$$f(y) = \frac{\Gamma\left(\frac{d+p}{2}\right)}{\Gamma\left(\frac{d}{2}\right)\,\left[(d-2)\pi\right]^{p/2}\,|\Sigma|^{1/2}} \left[1 + \frac{(y-\mu)^\top \Sigma^{-1} (y-\mu)}{d-2}\right]^{-\frac{d+p}{2}}, \quad (1)$$
where $^\top$ denotes the transpose operator and $\Gamma$ is the usual Gamma function.
Note that the assumption $d > 2$ implies the existence of the first two moments of the distribution and that the above density function is parametrized in terms of the covariance matrix $\Sigma$. In most of the literature on multivariate Student distributions, the density is rather parametrized as a function of the scatter matrix $\frac{d-2}{d}\Sigma$. Using the covariance matrix parametrization facilitates the comparison with the Gaussian distribution. We first recall some results in the univariate regression context.
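To make the two parametrizations concrete, the following minimal R sketch (our own illustration, not taken from the Supplementary Material) evaluates density (1) through its scatter-matrix form; it assumes, as its documentation states, that dmvt from the mvnfast package is parametrized by the scatter matrix:

```r
## Evaluating density (1): convert the covariance matrix Sigma into the
## scatter matrix ((d - 2)/d) * Sigma before calling mvnfast::dmvt(),
## which (per its documentation) expects the scatter (scale) matrix.
library(mvnfast)

d     <- 4                                   # degrees of freedom (d > 2)
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)     # covariance matrix

y       <- c(0.3, -1.2)
scatter <- ((d - 2) / d) * Sigma             # scatter-matrix parametrization
dmvt(matrix(y, 1), mu = mu, sigma = scatter, df = d, log = FALSE)
```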
2.2. Univariate Regression Case Reminder
In the univariate regression case and for a sample of size n, we have a one-dimensional dependent variable $y_i$, $i = 1, \ldots, n$, whose values are stacked in a vector $Y$, and K explanatory variables defining a design matrix $X$ including the constant.
The regression model is written as $Y = X\beta + \varepsilon$, where $\beta$ is a $(K+1)$-dimensional vector of parameters and the error term $\varepsilon$ is an n-dimensional vector. If we consider that the design matrix is fixed with rank $K+1$, or look at the distribution of $Y$ conditional on $X$, the usual assumptions are the following. The errors $\varepsilon_i$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.) with expectation zero and equal variance $\sigma^2$. In this context, it is well known that the least squares estimator of $\beta$ is equal to:
$$\hat\beta = (X^\top X)^{-1} X^\top Y, \quad (2)$$
while the classical estimator of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{n-K-1}\hat\varepsilon^\top\hat\varepsilon$, where $\hat\varepsilon = Y - X\hat\beta$. These estimators are unbiased. In the case of a Gaussian error distribution, the estimator $\hat\beta$ coincides with the maximum likelihood estimator of $\beta$, while the maximum likelihood estimator of $\sigma^2$ is equal to $\hat\sigma^2$ multiplied by $\frac{n-K-1}{n}$ and is only asymptotically unbiased. In the Gaussian case, there is an equivalence between the $\varepsilon_i$ being independent or uncorrelated. However, this property is no longer true for a Student distribution. This means that one should distinguish the case of uncorrelated errors from the case of independent errors. The case where the errors $\varepsilon_i$, $i = 1, \ldots, n$, follow a joint n-dimensional Student distribution with diagonal covariance matrix $\sigma^2 I_n$ and equal variance $\sigma^2$ is called the UT model; its coordinates are uncorrelated, but not independent. Interestingly, the maximum likelihood method for the UT model with known degrees of freedom leads to the least squares estimator (2) of $\beta$ (Zellner 1976). This property is true for more general distributions as long as the likelihood is a decreasing function of $\|Y - X\beta\|^2$. Concerning the error variance, the maximum likelihood estimator is biased, even asymptotically (Zellner 1976). For the independent case, we assume that the errors $\varepsilon_i$, $i = 1, \ldots, n$, are i.i.d. with a Student univariate distribution and known degrees of freedom. The maximum likelihood estimators belong to the class of M-estimators, which are studied in detail in Chapter 7 of Huber and Ronchetti (2009). These estimators are defined through implicit equations and can be computed using an iterative reweighted algorithm.
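As a simple illustration of these univariate estimators, one may proceed as follows in R (a minimal sketch on simulated data; variable names are ours):

```r
## OLS coefficients and the two variance estimators of the reminder above:
## the unbiased estimator and the (only asymptotically unbiased) Gaussian MLE.
set.seed(1)
n <- 200; K <- 2
X    <- cbind(1, matrix(rnorm(n * K), n, K))   # design with the constant
beta <- c(1, 2, -1)
Y    <- X %*% beta + rnorm(n)

beta_hat <- solve(crossprod(X), crossprod(X, Y))  # (X'X)^{-1} X'Y
res      <- Y - X %*% beta_hat
s2_unb   <- sum(res^2) / (n - K - 1)   # unbiased estimator of sigma^2
s2_ml    <- sum(res^2) / n             # Gaussian ML estimator of sigma^2
```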
In what follows, we consider the case of a multivariate dependent variable and propose to gather and complete the results from the literature. As we will see, the results derived in the multivariate case are very similar to their univariate counterparts. In particular, the maximum likelihood estimator of the error covariance matrix is biased for the uncorrelated Student model, while an iterative algorithm is needed for the independent Student model.
2.3. The Multivariate Regression Model
Let us consider a sample of size n, and for $i = 1, \ldots, n$, let us denote the L-dimensional dependent vector by:
$$y_i = (y_{i1}, \ldots, y_{iL})^\top.$$
For K explanatory variables, the design matrix of observation i is of size $L \times (K+1)L$ and is given by:
$$X_i = x_i^\top \otimes I_L,$$
for $i = 1, \ldots, n$, with the $(K+1)$-vector $x_i = (1, x_{i1}, \ldots, x_{iK})^\top$, the identity matrix $I_L$ with dimension L, and ⊗ the usual Kronecker product. The parameter of interest is a $(K+1)L$-vector given by:
$$\beta = (\beta_0^\top, \beta_1^\top, \ldots, \beta_K^\top)^\top,$$
where $\beta_k = (\beta_{k1}, \ldots, \beta_{kL})^\top$, for $k = 0, \ldots, K$, and the L-vector of errors is denoted by:
$$\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iL})^\top,$$
for $i = 1, \ldots, n$. We consider the linear model:
$$y_i = (x_i^\top \otimes I_L)\,\beta + \varepsilon_i, \quad (3)$$
with $\mathbb{E}(\varepsilon_i) = 0$ and $\mathbb{V}(\varepsilon_i) = \Sigma$. Using matrix notations, we can write Model (3) as:
$$Y = X\beta + \varepsilon, \quad (4)$$
with the $nL$-vectors:
$$Y = (y_1^\top, \ldots, y_n^\top)^\top, \qquad \varepsilon = (\varepsilon_1^\top, \ldots, \varepsilon_n^\top)^\top,$$
and the $nL \times (K+1)L$ matrix:
$$X = (X_1^\top, \ldots, X_n^\top)^\top = \tilde X \otimes I_L, \qquad \text{where } \tilde X = (x_1, \ldots, x_n)^\top.$$
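To make the stacking explicit, the following minimal R sketch (simulated data; names are ours) builds $X = \tilde X \otimes I_L$ with a Kronecker product and simulates Model (4):

```r
## Building the stacked multivariate design matrix of Model (4):
## Y = X beta + eps, with Y of length n*L and X of size (nL) x ((K+1)L).
set.seed(2)
n <- 5; K <- 2; L <- 3
Xu   <- cbind(1, matrix(rnorm(n * K), n, K))  # univariate n x (K+1) design
X    <- Xu %x% diag(L)                        # Kronecker product with I_L
beta <- rnorm((K + 1) * L)
eps  <- rnorm(n * L)
Y    <- X %*% beta + eps                      # stacked nL-vector of responses
```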
In what follows, we make different assumptions on the distribution of the errors and recall (for Gaussian and IT) or derive (for UT) the maximum likelihood estimators of the parameter $\beta$ and of the covariance matrix $\Sigma$.
2.4. Multivariate Normal Error Vector
Let us first consider Model (4) with independent and identically distributed error vectors $\varepsilon_{N,i}$, $i = 1, \ldots, n$, following a multivariate normal distribution with an L-vector of means equal to zero and an $L \times L$ covariance matrix $\Sigma_N$. This model is denoted by N, and the subscript N is used to denote the error terms $\varepsilon_{N,i}$, $i = 1, \ldots, n$, and the parameters $\beta_N$ and $\Sigma_N$ of the model. The maximum likelihood estimators of $\beta_N$ and $\Sigma_N$ are:
$$\hat\beta_N = (X^\top X)^{-1} X^\top Y, \qquad \hat\Sigma_N = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_{N,i}\,\hat\varepsilon_{N,i}^\top,$$
where $\hat\varepsilon_{N,i} = y_i - (x_i^\top \otimes I_L)\hat\beta_N$ (see, e.g., Theorem 8.4 from Seber (2008)).
The estimator $\hat\beta_N$ is an unbiased estimator of $\beta_N$, while the bias of $\hat\Sigma_N$ is equal to $-\frac{K+1}{n}\Sigma_N$ and tends to zero when n tends to infinity (see, e.g., Theorems 8.1 and 8.2 from Seber (2008)).
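A minimal R sketch of these Gaussian estimators, using the fact that $\hat\beta_N$ reduces to equation-by-equation OLS (simulated data; names are ours):

```r
## Gaussian MLEs of Section 2.4: OLS coefficients equation by equation
## and the (biased) ML covariance of the L-dimensional residuals.
set.seed(3)
n <- 500; K <- 2; L <- 2
Xu <- cbind(1, matrix(rnorm(n * K), n, K))          # n x (K+1) design
B  <- matrix(c(1, 2, -1, 0.5, -0.5, 1), K + 1, L)   # coefficients by equation
E  <- matrix(rnorm(n * L), n, L)                    # Gaussian errors
Ymat <- Xu %*% B + E                                # n x L responses

B_hat     <- solve(crossprod(Xu), crossprod(Xu, Ymat))  # OLS = Gaussian MLE
R         <- Ymat - Xu %*% B_hat                        # residual matrix
Sigma_ml  <- crossprod(R) / n                # ML estimator (biased)
Sigma_unb <- crossprod(R) / (n - K - 1)      # unbiased version
```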
For data such as financial data, it is well known that the Gaussian distribution does not fit the error term well. Student distributions are known to be more appropriate because they have heavier tails than the Gaussian. As in the univariate case, for Student distributions, the independence of the coordinates is not equivalent to their uncorrelatedness, and we consider below two types of Student distributions for the error term. In Section 2.5, the error vector $\varepsilon$ is assumed to follow a Student distribution with $nL$ dimensions and a particular block diagonal covariance matrix. More precisely, we assume that the error vectors $\varepsilon_i$, $i = 1, \ldots, n$, are identically distributed and uncorrelated but not independent. In Section 2.6, however, we consider independent and identically distributed error vectors $\varepsilon_i$, $i = 1, \ldots, n$, with an L-dimensional Student distribution.
2.5. Uncorrelated Multivariate Student (UT) Error Vector
Let us consider Model (4) with uncorrelated and identically distributed error vectors $\varepsilon_{UT,i}$, $i = 1, \ldots, n$, such that the stacked vector $\varepsilon_{UT} = (\varepsilon_{UT,1}^\top, \ldots, \varepsilon_{UT,n}^\top)^\top$ follows a multivariate Student distribution $t_{nL}(0, I_n \otimes \Sigma_{UT}, d)$ with known degrees of freedom d and covariance matrix $I_n \otimes \Sigma_{UT}$. The matrix $\Sigma_{UT}$ is the common covariance matrix of the $\varepsilon_{UT,i}$, $i = 1, \ldots, n$. This model is denoted by UT, and the subscript UT is used to denote the error terms $\varepsilon_{UT,i}$, $i = 1, \ldots, n$, and the parameters $\beta_{UT}$ and $\Sigma_{UT}$ of the model. This model generalizes the model proposed by Zellner (1976) to the case of multivariate dependent vectors. We derive the maximum likelihood estimators of $\beta_{UT}$ and $\Sigma_{UT}$ in Proposition 1 and give the bias of the covariance estimator in Proposition 2. The proofs of the propositions are given in Appendix A.
Proposition 1.
The maximum likelihood estimators of $\beta_{UT}$ and $\Sigma_{UT}$ are given by:
$$\hat\beta_{UT} = (X^\top X)^{-1} X^\top Y, \qquad \hat\Sigma_{UT} = \frac{d}{(d-2)\,n}\sum_{i=1}^n \hat\varepsilon_{UT,i}\,\hat\varepsilon_{UT,i}^\top,$$
where $\hat\varepsilon_{UT,i} = y_i - (x_i^\top \otimes I_L)\hat\beta_{UT}$.
The next proposition gives the bias of the maximum likelihood estimators and generalizes Zellner's result (Zellner 1976, p. 402) to the multivariate UT model. The maximum likelihood estimator of $\beta_{UT}$ coincides with the least squares and with the method of moments estimators and is unbiased. This is no longer the case for the maximum likelihood estimator of $\Sigma_{UT}$, which is biased even asymptotically. This gives an example of a maximum likelihood estimator that is not asymptotically unbiased in a context where the random variables are not independent. It illustrates that the independence assumption is crucial for deriving the usual properties of the maximum likelihood estimators. Note that the method of moments estimator is a consistent estimator of $\Sigma_{UT}$ (see Sutradhar and Ali (1986)).
Proposition 2.
The estimator $\hat\beta_{UT}$ is unbiased for $\beta_{UT}$. The estimator $\hat\Sigma_{UT}$ is biased for $\Sigma_{UT}$, even asymptotically. More precisely,
$$\mathbb{E}\big(\hat\Sigma_{UT}\big) = \frac{d}{d-2}\cdot\frac{n-K-1}{n}\,\Sigma_{UT} \;\xrightarrow[n \to \infty]{}\; \frac{d}{d-2}\,\Sigma_{UT} \neq \Sigma_{UT}.$$
A consequence of Proposition 2 is that an asymptotically unbiased estimator of $\Sigma_{UT}$ is given by $\frac{d-2}{d}\hat\Sigma_{UT}$.
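The following R sketch illustrates Propositions 1 and 2 on one simulated UT sample. Note that a UT sample is a single draw of an $nL$-variate Student vector; it is simulated here with mvnfast::rmvt, which is scatter-parametrized, hence the $(d-2)/d$ rescaling (names are ours):

```r
## UT estimators: OLS for beta, the ML covariance with its d/(d-2) factor,
## and the asymptotically unbiased correction of Proposition 2.
library(mvnfast)
set.seed(4)
n <- 500; K <- 2; L <- 2; d <- 4
Xu    <- cbind(1, matrix(rnorm(n * K), n, K))
B     <- matrix(c(1, 2, -1, 0.5, -0.5, 1), K + 1, L)
Sigma <- matrix(c(1, 0.3, 0.3, 1), L, L)

## One realization of the nL-dimensional t with covariance I_n x Sigma.
scatter <- ((d - 2) / d) * (diag(n) %x% Sigma)   # scatter parametrization
eps  <- drop(rmvt(1, mu = rep(0, n * L), sigma = scatter, df = d))
E    <- matrix(eps, n, L, byrow = TRUE)          # back to n x L errors
Ymat <- Xu %*% B + E

B_hat <- solve(crossprod(Xu), crossprod(Xu, Ymat))   # = OLS
R     <- Ymat - Xu %*% B_hat
Sigma_UT     <- (d / (d - 2)) * crossprod(R) / n     # ML, asympt. biased
Sigma_UT_unb <- ((d - 2) / d) * Sigma_UT             # corrected version
```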
2.6. Independent Multivariate Student Error Vector
Let us consider Model (4), using the notations of Section 2.3, with i.i.d. error vectors $\varepsilon_{IT,i}$, $i = 1, \ldots, n$, following a Student distribution $t_L(0, \Sigma_{IT}, d)$ with L dimensions and known degrees of freedom d. We denote this model by IT and the parameters of the model by $\beta_{IT}$ and $\Sigma_{IT}$. The IT model is a particular case of Prucha and Kelejian (1984), where the B matrix in Expression (2.1) of that paper is equal to zero.
Following Prucha and Kelejian (1984), we derive the maximum likelihood estimators for the IT model.
Proposition 3.
The maximum likelihood estimators of $\beta_{IT}$ and $\Sigma_{IT}$ in the IT regression model satisfy the following implicit equations:
$$\hat\beta_{IT} = \left(\sum_{i=1}^n w_i\, x_i x_i^\top \otimes I_L\right)^{-1}\sum_{i=1}^n w_i\,(x_i \otimes y_i), \qquad \hat\Sigma_{IT} = \frac{1}{n}\sum_{i=1}^n w_i\,\hat\varepsilon_{IT,i}\,\hat\varepsilon_{IT,i}^\top,$$
with:
$$w_i = \frac{d+L}{d-2+\hat\varepsilon_{IT,i}^\top\,\hat\Sigma_{IT}^{-1}\,\hat\varepsilon_{IT,i}}, \qquad \hat\varepsilon_{IT,i} = y_i - (x_i^\top \otimes I_L)\hat\beta_{IT}.$$
These estimators are consistent estimators of $\beta_{IT}$ and $\Sigma_{IT}$ (see Theorem 3.2 in Prucha and Kelejian (1984)). In order to compute them, we propose to implement the following iterative reweighted algorithm, in the same spirit as in () for the univariate case (see also ()).
Step 0: Let:
$$\hat\beta^{(0)} = \hat\beta_N, \qquad \hat\Sigma^{(0)} = \hat\Sigma_N.$$
Step k → Step (k+1), $k \geq 0$: for $i = 1, \ldots, n$, compute:
$$\hat\varepsilon_i^{(k)} = y_i - (x_i^\top \otimes I_L)\hat\beta^{(k)}, \qquad w_i^{(k)} = \frac{d+L}{d-2+\hat\varepsilon_i^{(k)\top}\big(\hat\Sigma^{(k)}\big)^{-1}\hat\varepsilon_i^{(k)}},$$
and update:
$$\hat\beta^{(k+1)} = \left(\sum_{i=1}^n w_i^{(k)}\, x_i x_i^\top \otimes I_L\right)^{-1}\sum_{i=1}^n w_i^{(k)}\,(x_i \otimes y_i), \qquad \hat\Sigma^{(k+1)} = \frac{1}{n}\sum_{i=1}^n w_i^{(k)}\,\hat\varepsilon_i^{(k+1)}\,\hat\varepsilon_i^{(k+1)\top}.$$
The process is iterated until convergence. Note that this algorithm is given in detail in Section 7.8 of Huber and Ronchetti (2009) for a general class of univariate regression M-estimators. It is also sometimes called IRLS, for iteratively reweighted least squares, and can be seen as a particular case of the EM algorithm (Dempster et al. 1978).
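A compact R implementation of this iterative reweighted algorithm could look as follows (a minimal sketch under the covariance parametrization; function and variable names are ours, with Ymat stacking the $y_i^\top$ as rows and Xu the $x_i^\top$ as rows):

```r
## Iterative reweighted algorithm for the IT model with fixed degrees of
## freedom d; exploits the fact that the beta step is a weighted least
## squares fit, equation by equation.
fit_IT <- function(Ymat, Xu, d, tol = 1e-8, maxit = 500) {
  n <- nrow(Ymat); L <- ncol(Ymat)
  ## Step 0: Gaussian MLEs as starting values.
  B     <- solve(crossprod(Xu), crossprod(Xu, Ymat))
  R     <- Ymat - Xu %*% B
  Sigma <- crossprod(R) / n
  for (k in seq_len(maxit)) {
    ## Weights w_i = (d + L) / (d - 2 + eps_i' Sigma^{-1} eps_i).
    q <- rowSums((R %*% solve(Sigma)) * R)
    w <- (d + L) / (d - 2 + q)
    ## Weighted least squares update of the coefficients ...
    B_new <- solve(crossprod(Xu * w, Xu), crossprod(Xu * w, Ymat))
    R     <- Ymat - Xu %*% B_new
    ## ... and weighted covariance update with the new residuals.
    Sigma_new <- crossprod(R * sqrt(w)) / n
    done <- max(abs(B_new - B)) < tol && max(abs(Sigma_new - Sigma)) < tol
    B <- B_new; Sigma <- Sigma_new
    if (done) break
  }
  list(beta = B, Sigma = Sigma, iterations = k)
}
## Example use on the data simulated above: fit <- fit_IT(Ymat, Xu, d = 4)
```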
Table 1 gathers the likelihoods and thus summarizes the three models of interest.
Table 1.
Distribution of the error vector in the Gaussian, UT, and IT models.
| Model | Distribution of the errors |
| N | $\varepsilon_{N,i}$ i.i.d. $N_L(0, \Sigma_N)$, $i = 1, \ldots, n$ |
| UT | $(\varepsilon_{UT,1}^\top, \ldots, \varepsilon_{UT,n}^\top)^\top \sim t_{nL}(0, I_n \otimes \Sigma_{UT}, d)$ |
| IT | $\varepsilon_{IT,i}$ i.i.d. $t_L(0, \Sigma_{IT}, d)$, $i = 1, \ldots, n$ |
3. Simulation Study
3.1. Design
This study aims at comparing the properties of the estimators of $\beta$ and $\Sigma$ as defined in the previous section for the multivariate Gaussian (N), the uncorrelated multivariate Student (UT), and the independent multivariate Student (IT) error distributions, under several scenarios for the DGP. Note that for the UT model, we used the asymptotically unbiased estimator $\frac{d-2}{d}\hat\Sigma_{UT}$ to estimate $\Sigma$. We considered a variety of degrees of freedom for the Student IT and UT models, with a focus on values between three and five. We used the function rmvt from the R package mvnfast to simulate the Student distributions. For a sample of size n and a number of replications N = 10,000, we simulated an explanatory variable following a Gaussian distribution. The parameter vector $\beta$ and the covariance matrix $\Sigma$ are held fixed across replications.
Note that similar results are obtained with other choices of parameters.
For each DGP, we calculate a number of Monte Carlo performance measures of the estimators proposed in Section 2. The performances are measured by the Monte Carlo relative bias (RB) and the mean squared error (MSE), which are defined for an estimator $\hat\theta$ of a parameter $\theta$ by:
$$\mathrm{RB}(\hat\theta) = \frac{1}{N}\sum_{s=1}^{N}\frac{\hat\theta^{(s)} - \theta}{\theta}, \qquad \mathrm{MSE}(\hat\theta) = \frac{1}{N}\sum_{s=1}^{N}\left(\hat\theta^{(s)} - \theta\right)^2,$$
where $\hat\theta^{(s)}$ is the estimate obtained in the s-th replication.
We also compute a relative root mean squared error (RRMSE) with respect to a baseline estimator $\hat\theta_{\mathrm{base}}$ as:
$$\mathrm{RRMSE}(\hat\theta) = \sqrt{\frac{\mathrm{MSE}(\hat\theta)}{\mathrm{MSE}(\hat\theta_{\mathrm{base}})}}.$$
In our case, the baseline estimator is the maximum likelihood estimator (MLE) corresponding to the DGP. For example, in Table 2, the RRMSE of the IT estimator of $\beta$ for the Gaussian DGP is the square root of the ratio of the MSE of the IT estimator with the given degrees of freedom to the MSE of the Gaussian MLE. Note that if the estimator is the MLE corresponding to the DGP, then its RRMSE is equal to one.
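These performance measures translate directly into R (a minimal sketch; est denotes the N-vector of replicated estimates of a scalar parameter, and base the corresponding vector for the baseline estimator):

```r
## Monte Carlo performance measures used throughout Section 3.
rb    <- function(est, theta) mean((est - theta) / theta)   # relative bias
mse   <- function(est, theta) mean((est - theta)^2)         # mean sq. error
rrmse <- function(est, base, theta) sqrt(mse(est, theta) / mse(base, theta))
```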
Table 2.
Relative bias and relative root mean squared error of the estimators of $\beta$ for the corresponding DGP (Gaussian, UT, and IT).
3.2. Estimators of the Parameters
Table 3 reports the bias and the MSE of the Gaussian MLE, the UT MLE, and the IT MLE of $\beta$ when the model is well specified, i.e., under the corresponding DGP. The bias and MSE of the estimators of $\beta$ are small and comparable under the Gaussian and the UT DGPs, but smaller under the IT DGP. Note that, in our implementation, the results of the algorithm for the IT estimators are very similar to those obtained using the function heavyLm from the R package heavy.
Table 3.
Bias and MSE of the maximum likelihood estimators of $\beta$ for the corresponding DGP (Gaussian, UT, and IT).
In Table 2, we start considering misspecifications and report the corresponding relative values RB and RRMSE for the same estimators and the same DGPs as in Table 3, with all possible combinations of DGP and estimation method. The results indicate that the RBs of the estimators of $\beta$ are all very small. If the DGP is Gaussian and the estimator is IT, the RRMSE of the coordinates of the estimator of $\beta$ remains close to one. However, if the DGP is IT and the estimator is Gaussian, the RRMSE of the coordinates is higher (from 1.46–1.48). Hence, for the Gaussian DGP, we do not lose too much efficiency using the IT estimator with three degrees of freedom. Conversely, we lose much more efficiency when using the Gaussian estimator for the IT DGP with three degrees of freedom.
In order to consider more degrees of freedom (3, 4, and 5), we now drop the bias and focus on the RRMSE. Table 4 indicates that the RRMSEs of the estimators of $\beta$ are all very similar and close to one, except for the case of the N estimator under the IT DGP, where they are markedly larger. The work in () provided theoretical asymptotic efficiencies of the Student versus the Gaussian estimators, given by the ratio of the asymptotic variances. The values obtained in Table 2 are very similar to these asymptotic values.
Table 4.
The relative root mean squared errors of the estimators of $\beta$.
Figure 1 shows the performance in terms of RRMSE of the IT estimators under different DGPs as a function of the degrees of freedom of the IT estimator. The considered DGPs are the Gaussian, the UT, and the IT DGPs, with the degrees of freedom of the UT and IT DGPs increasing from the left plot to the right plot. Overall, the RRMSE of the IT estimator of $\beta$ under the IT DGP first decreases and then increases, while under the Gaussian and the UT DGPs, the RRMSE decreases as the degrees of freedom of the estimator increase. The maximum RRMSE is around 1.09 under the UT DGP and around 1.08 under the Gaussian DGP. It then decreases to one as the degrees of freedom increase to twenty under the Gaussian and the UT DGPs; thus, the risk under misspecification is not very high. The curve is U-shaped under the IT DGP, with a minimum when the degrees of freedom of the estimator match those of the DGP. The worst performance occurs when the degrees of freedom of the DGP are small and those of the estimator are large. The RRMSE curves are similar from one plot to the other.
Figure 1.
The RRMSE of the IT estimator of $\beta$ for the UT DGP (solid line), the IT DGP (dashed line), and the Gaussian DGP (dotted line), with the degrees of freedom of the UT and IT DGPs increasing from the left plot to the middle and right plots.
3.3. Estimators of the Variance Parameters
Table 5 reports the biases and the MSEs of the estimators of $\Sigma$ for the Gaussian DGP, the UT DGP, and the IT DGP. The bias and the MSE of the estimators of the covariance term are very similar and small in all cases. The MSEs of the Gaussian estimators of the variances are small under the Gaussian DGP, but they are higher under the UT and IT DGPs. The biases and MSEs of the IT estimators of the variances are small under the IT DGP, but high under the Gaussian and the UT DGPs. Besides, Table 5 also indicates that there is no method that estimates the variances well under the UT DGP.
Table 5.
The bias and the MSE of the estimators of $\Sigma$.
As before, we now consider misspecified cases and focus on the relative bias in Table 6. We observe that the relative bias of the covariance estimator is negligible in all situations. The RBs of the variance estimators are also quite small (less than around 5%) when using the Gaussian estimator, for all DGPs. This is also true when using the IT estimator for the IT DGP with the same degrees of freedom. There are some biases for the variance estimators if the DGP is Gaussian or UT and the estimator is IT. For this estimator, the relative bias of the variances is around 100% for the Gaussian DGP, and around 96% or 22% for the UT DGP, depending on the degrees of freedom of the DGP and of the estimator. The RBs of the variance estimators are also quite high (up to 50%) for the IT estimator when the DGP is IT with a different number of degrees of freedom. To summarize, in terms of the RB of the variance estimators, the Gaussian estimator yields better results than the IT estimator.
Table 6.
The RB of the estimators of $\Sigma$.
Finally, Table 7 presents the RRMSE in the same cases. It shows that the RRMSE of the covariance estimator varies from 0.94–1.09 for all DGPs, except for the case of the IT DGP with the Gaussian estimator, where it ranges between 1.42 and 3.21. Besides, if the DGP is Gaussian and the estimator is IT, or if the DGP is IT and the estimator is Gaussian, the RRMSEs of the variance estimators are high, in particular for small degrees of freedom: we lose a lot of efficiency in these misspecified cases. To conclude, we have seen from Table 6 that the RBs of the variance estimators are smaller for the Gaussian estimator than for the IT estimator. However, in terms of RRMSE, there is no clear advantage in using the Gaussian estimator with respect to the IT estimator.
Table 7.
The RRMSE of the estimators of $\Sigma$ in the Gaussian DGP, the UT DGP, and the IT DGP.
It should be noted that for degrees of freedom less than or equal to four, the Student distribution has no fourth-order moment, which may explain why the covariance estimators have a large MSE.
In order to allow the reproducibility of the empirical analyses contained in the present and the following sections, some Supplementary Material is available at the following link: http://www.thibault.laurent.free.fr/code/jrfm/.
4. Selection between the Gaussian and IT Models
In this section, we propose a methodology to select a model between the Gaussian and independent Student models and to select the degrees of freedom for the Student among a short list of possibilities. Following the warnings of Fernandez and Steel (1999) and the empirical results of Platen and Rendek (2008), Hu and Kercheval (2009), and Kan and Zhou (2017), we decided to focus on a small selection of degrees of freedom and to fit our models without estimating this parameter, considering that a second step of model selection will make the choice. Indeed, there is a limited number of interesting values, which lie between three and eight (for larger values, the distribution gets close to being Gaussian). The work in Lange et al. (1989), p. 883, proposed the likelihood ratio test for the univariate case. In what follows, we use the fact that the distribution of the Mahalanobis distances is known under the two DGPs, which allows building a Kolmogorov–Smirnov test and using Q-Q plots. Unfortunately, this technique does not apply to the UT model, for which the n observations are a single realization of the multivariate distribution. One advantage of this approach is that the Mahalanobis distance is a one-dimensional variable, whereas the original observations have L dimensions.
4.1. Distributions of Mahalanobis Distances
For an L-dimensional random vector $y$ with mean $\mu$ and covariance matrix $\Sigma$, the squared Mahalanobis distance is defined by:
$$d^2(y) = (y - \mu)^\top \Sigma^{-1} (y - \mu).$$
If $y_1, \ldots, y_n$ is a sample of size n from the L-dimensional Gaussian distribution $N_L(\mu, \Sigma)$, the squared Mahalanobis distance of observation i, denoted by $d_i^2$, follows a $\chi^2_L$ distribution. If $\mu$ and $\Sigma$ are unknown, then the squared Mahalanobis distance of observation i can be estimated by:
$$\hat d_i^2 = (y_i - \bar y)^\top S^{-1} (y_i - \bar y),$$
where $\bar y$ is the sample mean and $S$ is the sample covariance matrix. The work in Gnanadesikan and Kettenring (1972) (see also Small (1978)) proved that this squared distance follows a Beta distribution, up to a multiplicative constant:
$$\frac{n}{(n-1)^2}\,\hat d_i^2 \sim \mathrm{Beta}\left(\frac{L}{2}, \frac{n-L-1}{2}\right),$$
where L is the dimension of $y_i$. For large n, this Beta distribution can be approximated by the chi-square distribution $\chi^2_L$. According to () (p. 172), a moderate sample size already suffices for this approximation, which is the case in all our examples below. If we now assume that $y_1, \ldots, y_n$ is a sample of size n from the L-dimensional Student distribution $t_L(\mu, \Sigma, d)$, then the squared Mahalanobis distance of observation i, denoted by $d_i^2$ and properly scaled, follows a Fisher distribution (see ()):
$$\frac{d}{(d-2)\,L}\,d_i^2 \sim F(L, d).$$
If $\mu$ and $\Sigma$ are unknown, then the squared Mahalanobis distance of observation i can be estimated by:
$$\hat d_i^2 = (y_i - \hat\mu)^\top \hat\Sigma_{IT}^{-1} (y_i - \hat\mu),$$
where $\hat\mu$ and $\hat\Sigma_{IT}$ are the MLEs of $\mu$ and $\Sigma$. Note that in the IT model, $\hat\mu$ is no longer equal to $\bar y$. To our knowledge, there is no result about the distribution of $\hat d_i^2$ in this case.
In the elliptical distribution family, the distribution of the Mahalanobis distances characterizes the distribution of the observations. Thus, in order to test the normality of the data, we can test whether the Mahalanobis distances follow a chi-square distribution. Similarly, testing the Student distribution is equivalent to testing whether the Mahalanobis distances follow the Fisher distribution. There are two difficulties with this approach. The first one is that the estimated Mahalanobis distances are not a sample from the chi-square (respectively, the Fisher) distribution because there is dependence due to the estimation of the parameters. The second one is that, in our case, we not only estimate $\mu$ and $\Sigma$, but we are in a regression framework where the mean is a linear combination of the regressors, and we indeed estimate its coefficients. In what follows, we will ignore these two difficulties and consider that, for large n, the distributions of the estimated Mahalanobis distances behave as if $\mu$ and $\Sigma$ were known.
We propose to implement several Kolmogorov–Smirnov tests in order to test different null hypotheses: Gaussian, Student with three degrees of freedom, and Student with four degrees of freedom. As an exploratory tool, we also propose drawing Q-Q plots of the Mahalanobis distances with respect to the chi-square and the Fisher distributions.
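The following R sketch illustrates the proposed tests and Q-Q plots on data simulated from an L-dimensional Student distribution, using the scaling derived above (true parameters are used for simplicity; names are ours):

```r
## Kolmogorov-Smirnov tests of the squared Mahalanobis distances against
## their theoretical laws, plus a Q-Q plot against the Fisher quantiles.
library(mvnfast)
set.seed(5)
n <- 1000; L <- 2; d <- 3
Sigma <- matrix(c(1, 0.3, 0.3, 1), L, L)
Y <- rmvt(n, mu = rep(0, L), sigma = ((d - 2) / d) * Sigma, df = d)

## Squared Mahalanobis distances (known parameters here, for simplicity).
d2 <- mahalanobis(Y, center = rep(0, L), cov = Sigma)

## Student null: d / ((d - 2) L) * d2 ~ F(L, d); Gaussian null: d2 ~ chi2_L.
ks.test(d * d2 / ((d - 2) * L), "pf", df1 = L, df2 = d)  # expected: no reject
ks.test(d2, "pchisq", df = L)                            # expected: reject

## Q-Q plot against the F(L, d) quantiles.
qqplot(qf(ppoints(n), df1 = L, df2 = d), d * d2 / ((d - 2) * L),
       xlab = "Theoretical F quantiles", ylab = "Scaled squared distances")
abline(0, 1)
```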
4.2. Examples
This section illustrates some applications of the proposed methodology for selecting a model. We use a real dataset from finance and three simulated datasets with the same DGP as in Section 3.
The real dataset consists of the daily closing share prices of IBM and MSFT, which are imported from Yahoo Finance from 3 January 2007–27 September 2018 using the quantmod package in R. Let $P_t$ be the daily share price of IBM or MSFT and $r_t$ be the log-price increment (return) (see ()) over a one-day period; then:
$$r_t = \ln P_t - \ln P_{t-1}.$$
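A minimal R sketch of the data retrieval and return computation, relying on getSymbols from quantmod (which defaults to Yahoo Finance), could be:

```r
## Daily closing prices and log-returns for IBM and MSFT.
library(quantmod)
getSymbols(c("IBM", "MSFT"), from = "2007-01-03", to = "2018-09-27")
r_ibm   <- na.omit(diff(log(Cl(IBM))))    # log-price increments r_t
r_msft  <- na.omit(diff(log(Cl(MSFT))))
returns <- cbind(r_ibm, r_msft)           # bivariate vector of returns
```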
The three other datasets are simulated using the same model as in Section 3, with the Gaussian DGP, the IT DGP with three degrees of freedom, and the IT DGP with four degrees of freedom, and with sample size n. Figure 2 (respectively, Figure 3) displays the scatterplots of the financial data (respectively, of the three toy datasets).
Figure 2.
Financial data: scatterplot of returns.
Figure 3.
Toy data: scatterplots of residuals in the Gaussian DGP (respectively, the IT DGP with three degrees of freedom, the IT DGP with four degrees of freedom) on the first row (respectively, the second row, the third row).
We compute the Gaussian and the IT estimators as in Section 3. We then calculate the squared Mahalanobis distances of the residuals and use a Kolmogorov–Smirnov test for deciding between the models. For the financial data, we have no predictor. We test the Gaussian (respectively, the Student with three degrees of freedom, the Student with four degrees of freedom) null hypothesis. When testing one of the null hypotheses, we use the estimator corresponding to the null. Moreover, when the null hypothesis is Student, we use the corresponding degrees of freedom for computing the maximum likelihood estimator. We reject the null hypothesis if the p-value is smaller than the significance level $\alpha$. Note that we could adjust the level $\alpha$ by taking into account multiple testing.
Table 8 shows the p-values of these tests. For the simulated data, we do not reject the Gaussian assumption when the DGP is Gaussian. Similarly, we do not reject the Student distribution with three (respectively, four) degrees of freedom when the DGP is the IT with three (respectively, four) degrees of freedom. For the financial data, we do not reject the Student distribution with three degrees of freedom, but we do reject the Gaussian distribution and the Student distribution with four degrees of freedom.
Table 8.
All datasets: the p-values of the Mahalanobis distance tests with the null hypotheses and the corresponding estimators.
Figure 4 shows the Q-Q plots comparing the empirical quantiles of the Mahalanobis distances for the normal (respectively, the IT with three degrees of freedom, the IT with four degrees of freedom) estimators on the horizontal axis to the corresponding theoretical quantiles on the vertical axis for the financial data. These Q-Q plots are coherent with the results of the tests in Table 8. The IT model with three degrees of freedom fits our financial data well.
Figure 4.
Financial data: Q-Q plots of the Mahalanobis distances for the normal, IT (three degrees of freedom), and IT (four degrees of freedom) estimators.
Figure 5 displays the Q-Q plots for the toy DGPs: the Gaussian DGP in the first column, the IT DGP with three degrees of freedom in the second column, and the IT DGP with four degrees of freedom in the third column. The first row compares the empirical quantiles to the normal-case theoretical quantiles, the second row to the Student-case quantiles with three degrees of freedom, and the third row to the Student-case quantiles with four degrees of freedom. The Q-Q plots on the diagonal confirm that the fit is good when the model is correct. The Q-Q plots outside the diagonal correctly reveal a clear deviation from the hypothesized model.
Figure 5.
Toy data: Q-Q plots of the Mahalanobis distances of the residuals for the normal (respectively, the IT with three degrees of freedom, the IT with four degrees of freedom) case empirical quantiles against the normal (respectively, the IT with three degrees of freedom, the IT with four degrees of freedom) case theoretical quantiles in the first row (respectively, the second row, the third row).
To summarize the findings of this study, let us first say that there may be an overuse of the Gaussian distribution in applications due to its simplicity. We have seen that considering the Student distribution instead is only slightly more complex, but feasible, and that one can test this choice. Concerning the two Student models, we have seen that the UT model is simpler to fit than the IT model, but it has limitations due to the fact that it relies on a single realization of the multivariate distribution, which restricts the properties of the maximum likelihood estimators and prevents the use of tests against the other two models.
5. Conclusions
We have compared three different models: the multivariate Gaussian model and two different multivariate Student models (uncorrelated or independent). We have derived some theoretical properties of the Student UT model and proposed a simple iterative reweighted algorithm to compute the maximum likelihood estimators in the IT model. Our simulations show that using a multivariate Student IT model instead of a multivariate Gaussian model for heavy-tailed data is simple and can be viewed as a safeguard against misspecification, in the sense that there is more to lose if the DGP is Student and one uses a Gaussian model than in the reverse situation. Finally, we have proposed some graphical tools and a test to choose between the Gaussian and the IT models. The IT model fits our finance dataset quite well. There is still work to do in the direction of improving the model selection procedure to overcome the fact that the parameters are estimated and hence the hypothesized distribution is only approximate. Let us mention that it is also possible to adapt our algorithm for the IT model to the case of missing data. We intend to work in the direction of allowing different degrees of freedom for each coordinate. It may also be relevant to consider an alternative estimation method by generalizing the one proposed in () to the multivariate regression case. Finally, another perspective is to consider multivariate errors-in-variables models, which allow incorporating measurement errors in the response and the explanatory variables. A possible approach is proposed in Croux et al. (2010).
Supplementary Materials
In order to allow the reproducibility of the empirical analyses contained in the present paper, some Supplementary Material is available at the following link: http://www.thibault.laurent.free.fr/code/jrfm/.
Author Contributions
T.H.A.N., C.T.-A. and A.R.-G., Methodology, analysis, review, and editing; T.H.A.N., writing, original draft preparation; C.T.-A. and A.R.-G. supervision and validation; T.L. and T.H.A.N., data curation.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| EM | Expectation-maximization |
| MLE | Maximum likelihood estimator |
| N | Normal (Gaussian) model |
| IT | Independent multivariate Student |
| UT | Uncorrelated multivariate Student |
| RB | Relative bias |
| MSE | Mean squared error |
| RRMSE | Relative root mean squared error |
| DGP | Data-generating process |
Appendix A
Proof of Proposition 1.
Using Expression (1), the joint density function of $\varepsilon_{UT} = Y - X\beta$ is:
$$f(\varepsilon_{UT}) = \frac{\Gamma\big(\frac{d+nL}{2}\big)}{\Gamma\big(\frac{d}{2}\big)\,[(d-2)\pi]^{nL/2}\,|I_n \otimes \Sigma_{UT}|^{1/2}} \left[1 + \frac{\varepsilon_{UT}^\top (I_n \otimes \Sigma_{UT})^{-1} \varepsilon_{UT}}{d-2}\right]^{-\frac{d+nL}{2}}.$$
Therefore, the logarithm of the likelihood is:
$$l(\beta, \Sigma_{UT}) = c - \frac{n}{2}\log|\Sigma_{UT}| - \frac{d+nL}{2}\log\left[1 + \frac{(Y - X\beta)^\top (I_n \otimes \Sigma_{UT})^{-1} (Y - X\beta)}{d-2}\right]. \quad (A1)$$
In order to maximize (A1) as a function of $\beta$, we follow the same argument as in Theorem 8.4 from Seber (2008) for the Gaussian case and obtain that the minimum of $(Y - X\beta)^\top (I_n \otimes \Sigma_{UT})^{-1} (Y - X\beta)$ is obtained for:
$$\hat\beta_{UT} = (X^\top X)^{-1} X^\top Y,$$
which does not depend on $\Sigma_{UT}$. Besides, we now maximize (A1) as a function of $\Sigma_{UT}$. Let:
$$A = \sum_{i=1}^n \hat\varepsilon_{UT,i}\,\hat\varepsilon_{UT,i}^\top, \qquad Q = \operatorname{tr}\big(\Sigma_{UT}^{-1} A\big).$$
We have:
$$\frac{\partial l}{\partial \Sigma_{UT}} = -\frac{n}{2}\,\Sigma_{UT}^{-1} + \frac{d+nL}{2}\cdot\frac{\Sigma_{UT}^{-1} A\, \Sigma_{UT}^{-1}}{d-2+Q}. \quad (A2)$$
Solving $\partial l / \partial \Sigma_{UT} = 0$ and letting $\hat\Sigma_{UT} = \frac{c}{n}A$ for a scalar $c$, we have:
$$c = \frac{d+nL}{d-2+Q}. \quad (A3)$$
The expression of $Q$ in (A3) can be simplified by noting that:
$$Q = \operatorname{tr}\big(\hat\Sigma_{UT}^{-1} A\big) = \frac{nL}{c}. \quad (A4)$$
Replacing the expression of $Q$ from (A4) into (A3), we get:
$$c\,(d-2) + nL = d + nL, \quad \text{that is,} \quad c = \frac{d}{d-2}.$$
Finally,
$$\hat\Sigma_{UT} = \frac{d}{(d-2)\,n}\sum_{i=1}^n \hat\varepsilon_{UT,i}\,\hat\varepsilon_{UT,i}^\top. \qquad \square$$
Proof of Proposition 2.
The property $\mathbb{E}(\hat\beta_{UT}) = \beta_{UT}$ is immediate. In order to facilitate the derivation of the proof for $\hat\Sigma_{UT}$, we write Model (4) as:
$$Y = (\tilde X \otimes I_L)\,\beta + \varepsilon_{UT},$$
where:
$$\tilde X = (x_1, \ldots, x_n)^\top$$
is the $n \times (K+1)$ univariate design matrix. Let $M = I_n - \tilde X(\tilde X^\top \tilde X)^{-1}\tilde X^\top$ and $\hat\varepsilon_{UT} = (M \otimes I_L)\,\varepsilon_{UT}$. We have $\mathbb{E}(\varepsilon_{UT}\varepsilon_{UT}^\top) = I_n \otimes \Sigma_{UT}$, and following Seber (2008), Theorem 8.2,
$$\mathbb{E}\left(\sum_{i=1}^n \hat\varepsilon_{UT,i}\,\hat\varepsilon_{UT,i}^\top\right) = \operatorname{tr}(M)\,\Sigma_{UT}.$$
Since $\operatorname{tr}(M) = n - K - 1$ and $\hat\Sigma_{UT} = \frac{d}{(d-2)\,n}\sum_{i=1}^n \hat\varepsilon_{UT,i}\,\hat\varepsilon_{UT,i}^\top$, we obtain:
$$\mathbb{E}\big(\hat\Sigma_{UT}\big) = \frac{d}{d-2}\cdot\frac{n-K-1}{n}\,\Sigma_{UT} \xrightarrow[n \to \infty]{} \frac{d}{d-2}\,\Sigma_{UT} \neq \Sigma_{UT}. \qquad \square$$
References
- Bilodeau, Martin, and David Brenner. 1999. Theory of Multivariate Statistics. Springer Texts in Statistics. Berlin: Springer. ISBN 978-0-387-22616-3.
- Croux, Christophe, Mohammed Fekri, and Anne Ruiz-Gazen. 2010. Fast and Robust Estimation of the Multivariate Errors in Variables Model. Test 19: 286–303.
- Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1978. Iteratively Reweighted Least Squares for Linear Regression when Errors are Normal/Independent Distributed. Multivariate Analysis V 5: 35–37.
- Dogru, Fatma Zehra, Y. Murat Bulut, and Olcay Arslan. 2018. Double Reweighted Estimators for the Parameters of the Multivariate t Distribution. Communications in Statistics-Theory and Methods 47: 4751–71.
- Fernandez, Carmen, and Mark F. J. Steel. 1999. Multivariate Student-t Regression Models: Pitfalls and Inference. Biometrika 86: 153–67.
- Fung, Thomas, and Eugene Seneta. 2010. Modeling and Estimating for Bivariate Financial Returns. International Statistical Review 78: 117–33.
- Fraser, Donald Alexander Stuart. 1979. Inference and Linear Models. New York: McGraw Hill. ISBN 9780070219106.
- Fraser, Donald Alexander Stuart, and Kai Wang Ng. 1980. Multivariate Regression Analysis with Spherical Error. Multivariate Analysis 5: 369–86.
- Gnanadesikan, Ram, and Jon R. Kettenring. 1972. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics 28: 81–124.
- Hofert, Marius. 2013. On Sampling from the Multivariate t Distribution. The R Journal 5: 129–36.
- Hu, Wenbo, and Alec N. Kercheval. 2009. Portfolio Optimization for Student t and Skewed t Returns. Quantitative Finance 10: 129–36.
- Huber, Peter J., and Elvezio M. Ronchetti. 2009. Robust Statistics. Hoboken: Wiley. ISBN 9780470129906.
- Johnson, Norman L., and Samuel Kotz. 1972. Student Multivariate Distribution. In Distributions in Statistics: Continuous Multivariate Distributions. New York: Wiley. ISBN 9780471443704.
- Kan, Raymond, and Guofu Zhou. 2017. Modeling Non-Normality Using Multivariate t: Implications for Asset Pricing. China Finance Review International 7: 2–32.
- Katz, Jonathan N., and Gary King. 1999. A Statistical Model for Multiparty Electoral Data. American Political Science Review 93: 15–32.
- Kelejian, Harry H., and Ingmar R. Prucha. 1985. Independent or Uncorrelated Disturbances in Linear Regression. Economics Letters 19: 35–38.
- Kent, John T., David E. Tyler, and Yahuda Vardi. 1994. A Curious Likelihood Identity for the Multivariate t-Distribution. Communications in Statistics-Simulation and Computation 23: 441–53.
- Kotz, Samuel, and Saralees Nadarajah. 2004. Multivariate t Distributions and Their Applications. Cambridge: Cambridge University Press. ISBN 9780511550683.
- Lange, Kenneth, Roderick J. A. Little, and Jeremy Taylor. 1989. Robust Statistical Modeling Using the t-Distribution. Journal of the American Statistical Association 84: 881–96.
- Lange, Kenneth, and Janet S. Sinsheimer. 1993. Normal/Independent Distributions and Their Applications in Robust Regression. Journal of Computational and Graphical Statistics 2: 175–98.
- Liu, Chuanhai, and Donald B. Rubin. 1995. ML Estimation of the t Distribution Using EM and Its Extensions, ECM and ECME. Statistica Sinica 5: 19–39.
- Liu, Chuanhai. 1997. ML Estimation of the Multivariate t Distribution and the EM Algorithm. Journal of Multivariate Analysis 63: 296–312.
- Maronna, Ricardo Antonio. 1976. Robust M-Estimators of Multivariate Location and Scatter. The Annals of Statistics 4: 51–67.
- McNeil, Alexander J., Rüdiger Frey, and Paul Embrechts. 2005. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton: Princeton University Press.
- Platen, Eckhard, and Renata Rendek. 2008. Empirical Evidence on Student-t Log-Returns of Diversified World Stock Indices. Journal of Statistical Theory and Practice 2: 233–51.
- Prucha, Ingmar R., and Harry H. Kelejian. 1984. The Structure of Simultaneous Equation Estimators: A Generalization Towards Nonnormal Disturbances. Econometrica 52: 721–36.
- Roth, Michael. 2013. On the Multivariate t Distribution. Report Number LiTH-ISY-R-3059. Linköping: Department of Electrical Engineering, Linköping University.
- Seber, George Arthur Frederick. 2008. Multivariate Observations. Hoboken: John Wiley & Sons. ISBN 9780471881049.
- Singh, Radhey. 1988. Estimation of Error Variance in Linear Regression Models with Errors Having Multivariate Student t-Distribution with Unknown Degrees of Freedom. Economics Letters 27: 47–53.
- Small, N. J. H. 1978. Plotting Squared Radii. Biometrika 65: 657–58.
- Sutradhar, Brajendra C., and Mir M. Ali. 1986. Estimation of the Parameters of a Regression Model with a Multivariate t Error Variable. Communications in Statistics-Theory and Methods 15: 429–50.
- Zellner, Arnold. 1976. Bayesian and Non-Bayesian Analysis of the Regression Model with Multivariate Student-t Error Terms. Journal of the American Statistical Association 71: 400–5.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).