Next Article in Journal
A Novel Method of Vein Detection with the Use of Digital Image Correlation
Next Article in Special Issue
BFF: Bayesian, Fiducial, and Frequentist Analysis of Cognitive Engagement among Cognitively Impaired Older Adults
Previous Article in Journal
Phase Transitions in Transfer Learning for High-Dimensional Perceptrons
Previous Article in Special Issue
Bayesian Analysis of Finite Populations under Simple Random Sampling
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

New Estimators of the Bayes Factor for Models with High-Dimensional Parameter and/or Latent Variable Spaces

1
Department of Mathematics, Cracow University of Economics, ul. Rakowicka 27, 31-510 Kraków, Poland
2
Department of Financial Mathematics, Jagiellonian University in Kraków, ul. Prof. Stanisława Łojasiewicza 6, 30-348 Kraków, Poland
Entropy 2021, 23(4), 399; https://doi.org/10.3390/e23040399
Submission received: 4 March 2021 / Revised: 23 March 2021 / Accepted: 24 March 2021 / Published: 27 March 2021
(This article belongs to the Special Issue Bayesian Inference and Computation)

Abstract

:
Formal Bayesian comparison of two competing models, based on the posterior odds ratio, amounts to estimation of the Bayes factor, which is equal to the ratio of respective two marginal data density values. In models with a large number of parameters and/or latent variables, they are expressed by high-dimensional integrals, which are often computationally infeasible. Therefore, other methods of evaluation of the Bayes factor are needed. In this paper, a new method of estimation of the Bayes factor is proposed. Simulation examples confirm good performance of the proposed estimators. Finally, these new estimators are used to formally compare different hybrid Multivariate Stochastic Volatility–Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models which have a large number of latent variables. The empirical results show, among other things, that the validity of reduction of the hybrid MSV-MGARCH model to the MGARCH specification depends on the analyzed data set as well as on prior assumptions about model parameters.

1. Introduction

The Bayes factor, defined as a ratio of the marginal likelihoods for the two models being compared, is an important quantity in Bayesian model comparison and hypothesis testing, see, e.g., in [1]. The posterior odds ratio, used for comparing two competing models, is equal to the product of the prior odds and the Bayes’ factor. Therefore, one of the challenges for Bayesians is to accurately estimate the factor, especially in models with a large number of parameters and/or latent variables. Most popular methods of calculating the Bayes factor, based on estimation of the marginal likelihoods of each model separately, are very time-consuming (or even computationally infeasible) in high-dimensional cases. Therefore, methods for direct estimation of the Bayes factor, instead of estimating two marginal likelihoods, have been devised. There already exist different approaches, such as the product-space approach, reversible jump MCMC, and path sampling (see, e.g., in [2,3,4,5]), which make it possible to estimate the Bayes factor without calculation of the marginal densities of the data, but in many empirical cases (especially in the case of a large number of model parameters) they require extremely complicated and extensively time-consuming operations, with no general guarantees of success.
Note that for nested models with a large number of latent variables such as stochastic volatility models, Jacquier et al. [6] have shown how to estimate the Bayes factor directly using Markov Chain Monte Carlo (MCMC) simulations from the posterior distribution. They pointed out that the Bayes factor can be expressed as the expected value of the ratio of corresponding densities with respect to the posterior distribution of parameters and latent variables. This expected value can be estimated by averaging over the MCMC draws. Thus, using the special structure of stochastic volatility models and exploiting prior assumptions about parameters, they have estimated Bayes factors for comparing the basic stochastic volatility model to the models with fat tails and correlated errors. Unfortunately, the finite sample properties of the estimator have not been analyzed by the authors. In this paper, we will show that even in very simple models it does not perform well, but it can be improved by trimming the space of parameters and latent variables.
Now, the issue of a direct estimation of the Bayes factor will be described more formally. Let us consider a Bayesian model, in which
(1)
the space of parameters is denoted by Θ and p ( θ ) is a prior density function of the parameters collected in θ Θ ,
(2)
the space of latent variables is denoted by H and p ( h | θ ) is a prior density function of the latent variables collected in h H , and
(3)
y is a vector of observations.
The marginal data density, p ( y ) , is defined as the integral (calculated over the whole space of parameters and latent variables, Θ × H ) of the conditional data density given the vector of parameters and latent variables, p ( y | θ , h ) , with respect to the prior distribution of the parameters and latent variables:
p ( y ) = Θ × H p ( y , h , θ ) d θ d h = Θ × H p ( y | h , θ ) p ( h | θ ) p ( θ ) d θ d h .
In the majority of models it is impossible to analytically integrate out parameters and latent variables from the joint distribution for y, h, and θ , p ( y , h , θ ) . It very often results from the number of dimensions of the space of the latent variables and parameters being too large. Furthermore, a correct assessment of the marginal likelihood is computationally challenging (see, e.g., in [7,8,9], where summaries of various methods can be found). In the presence of a large number of latent variables (e.g., in stochastic volatility models) popular methods of Monte Carlo estimation of p ( y ) for the observed vector y are computationally extremely intensive, and often they turn out to be infeasible. Fortunately, we can consider the ratio of marginal data densities instead of p ( y ) . In fact, the main criterion of comparison of two competing models M i and M j is the posterior odds ratio between the two models:
O i j = p ( M i | y ) p ( M j | y ) = p ( y | M i ) p ( y | M j ) × p ( M i ) p ( M j ) ,
where p ( M i | y ) is the posterior probability of the model M i , p ( M i ) is the prior probability of the model M i , and p ( y | M i ) = Θ i × H i p ( y | h i , θ i , M i ) p ( h i | θ i , M i ) p ( θ i | M i ) d θ i d h i is the marginal data density value (the marginal likelihood) in the model M i . The ratio B i j = p ( y | M i ) p ( y | M j ) is referred to as the Bayes factor in favor of the model M i against the model M j , in turn, the ratio p ( M i ) p ( M j ) is the prior odds ratio. Thus, if the prior odds ratio is equal to 1 (i.e., both models are equally probable a priori, p ( M i ) = p ( M j ) ) , the posterior odds ratio between the two models is then equal to the Bayes factor. Moreover, if we assume that the set of models { M 1 , , M m } is exhaustive, Bayes factors can be used to obtain the posterior model probabilities:
p ( M i | y ) = p ( M i ) p ( y | M i ) p ( y | M k ) j = 1 m p ( M j ) p ( y | M j ) p ( y | M k ) = p ( M i ) B i k j = 1 m p ( M j ) B j k ,
for i , k { 1 , , m } . By choosing k = i , we obtain
p ( M i | y ) = p ( M i ) j = 1 m p ( M j ) B j i ,
for i { 1 , , m } .
Thus, the Bayes factors can be used instead of the marginal data density values. Note that in many situations it is easier to estimate the Bayes factor than the marginal likelihood. In this paper, we show how to estimate the Bayes factor in models with a large number of parameters and/or latent variables, in which calculation of the marginal data density value is numerically impossible. It will be shown that the Bayes factor is equal to the posterior mean, restricted to a certain subset D of the parameter and latent variable space, of the ratio of conditional densities of the corresponding quantities times the reciprocal of the posterior probability of the subset D. This fact leads the researcher to using arithmetic mean estimator of the ratio based on simulation from the posterior distribution restricted to the subset D.
It is well known that the Savage–Dickey density ratio and its generalizations (see in [10,11,12]) are relatively simple and widely used methods for computing Bayes factors for nested models. The Savage–Dickey method requires an estimate of the value of the posterior distribution at a single point. It is not reliable when this point lies in the tail of the posterior distribution. As has already been mentioned, the Savage–Dickey method can be used only for nested models. Our method can be applied for both nested and non-nested ones.
Note that although the posterior odds principle is fundamental, there are other Bayesian approaches to model comparison or hypothesis testing, e.g., the so-called Lindley type test (see [13]) or the Full Bayesian Significance Test (FBST), introduced by Pereira and Stern [14] as a Bayesian significance test for precise (sharp) hypotheses. A solid theoretical background of the FBST can be found in [15]. Detailed discussions and extensions of the Pereira–Stern test as well as of its applications are presented in, e.g., [16,17,18,19,20,21]. Our approach to Bayesian model comparison (or comparing hypotheses) is based on posterior probabilities associated with models (or hypotheses). This leads directly to the so-called posterior odds approach. Motivations for the formal Bayesian approach to model selection and for the use of Bayes factors are discussed in [22]. First of all, the Bayes factors and posterior model probabilities are easy to understand. Moreover, this approach is consistent, penalizes model complexity, remains conceptually the same regardless of the number of models (or hypotheses) under consideration, and does not require nested models or regular asymptotics. This approach allows one not only to test hypotheses, but also to compare them—“the process of revising prior probabilities associated with alternative hypotheses does not necessarily involve a decision to reject or accept these hypotheses” (see [1], p. 291). Furthermore, the Bayes factors make it possible to compare models whose prior distributions are different. Finally, the posterior probabilities of models or the Bayes factors can be used in the so-called Bayesian pooling approach (or Bayesian model averaging, see, e.g., in [23]). This paper is organized as follows. In Section 2, our new method is presented. Section 3 contains the simulation study. In Section 4, we present results of comparison of hybrid MSV-MGARCH models. The last section contains conclusions and direction of further research.

2. New Estimators of the Bayes Factor

It is easy to show that the Bayes factor in favor of the model M i against the model M j can be expressed as an integral:
p ( y | M i ) p ( y | M j ) = Θ i × H i p ( y | θ i , h i , M i ) p ( h i | θ i , M i ) p ( θ i | M i ) p ( y | θ j , h j , M j ) p ( h j | θ j , M j ) p ( θ j | M j ) p ( θ j , h j | y , M j ) d θ i d h i ,
where p ( y | θ i , h i , M i ) , p ( h i | θ i , M i ) , and p ( θ i | M i ) denote the conditional probability density function of the vector of observations (conditional sampling density), the conditional probability density function of latent variables, and the prior density of parameters in the model M i , respectively. Note that when the two competing models M i and M j have the same parameter vector and the vector of latent variables (i.e., θ i = θ j = θ , h i = h j = h , Θ i = Θ j = Θ , H i = H j = H ) , then the Bayes factor can be computed in a relatively straightforward way which does not require estimation of marginal data density values for each model separately. Of course, it is possible only if draws from one of the posterior distributions are available (see in [24]). We have
B F i j = Θ × H r i j ( θ , h ) p ( θ , h | y , M j ) d θ d h ,
where
r i j ( θ , h ) = p ( y | θ , h , M i ) p ( h | θ , M i ) p ( θ | M i ) p ( y | θ , h , M j ) p ( h | θ , M j ) p ( θ | M j ) .
Therefore, given the sample { ( θ ( q ) , h ( q ) ) } q = 1 k from the posterior distribution p ( θ , h | y , M j ) , an estimator of B F i j can be expressed as
B F ^ i j = 1 k q = 1 k r i j ( θ ( q ) , h ( q ) ) .
As was pointed out in [24], it is crucial that the variability of the ratio r i j ( θ , h ) is small under the posterior distribution p ( θ , h | y , M j ) ; otherwise, estimates of B F i j might be driven by few values of h ( q ) and θ ( q ) , and thus an extremely large simulation sample could be required to obtain adequate result. To deal with this problem, we propose a certain modification of the estimator (7). The idea of this modification is based on trimming the posterior sample to a certain subset of parameter and latent variable space, similarly to the idea of the correction of arithmetic mean estimator, proposed in [25]. Let us assume that D Θ × H and 0 < P r ( D | y , M i ) < . The equality
p ( y | M i ) p ( y | M j ) Pr ( D | y , M i ) = Θ × H I D ( θ , h ) p ( y | M i ) p ( y | M j ) p ( θ , h | y , M i ) d θ d h ,
implies that
p ( y | M i ) p ( y | M j ) Pr ( D | y , M i ) = Θ × H I D ( θ , h ) p ( y , θ , h | M i ) p ( y | M j ) d θ d h .
We can see immediately that
p ( y | M i ) p ( y | M j ) Pr ( D | y , M i ) = Θ × H I D ( θ , h ) p ( y | θ , h , M i ) p ( θ , h | M i ) p ( y | θ , h , M j ) p ( θ , h | M j ) p ( θ , h | y , M j ) d θ d h ,
thus
p ( y | M i ) p ( y | M j ) = 1 Pr ( D | y , M i ) Θ × H I D ( θ , h ) r i j ( θ , h ) p ( θ , h | y , M j ) d θ d h .
Equality (9) means that the Bayes factor can be expressed as a product of the reciprocal of the posterior probability of the subset D in the model M i , P r ( D | y , M i ) , and of the expected value of the indicator function of subset D times the ratio r i j ( θ , h ) , I D ( θ , h ) r i j ( θ , h ) . This expected value is calculated with respect to the posterior distribution of the model parameters and latent variables in the model M j . Therefore, given the sample { ( h ( q ) , θ ( q ) ) } q = 1 k from the posterior distribution p ( θ , h | y , M j ) , an estimator of B F i j can be expressed as
B F ^ i j , D = 1 P ^ r ( D | y , M i ) 1 k q = 1 k I D ( θ ( q ) , h ( q ) ) r i j ( θ ( q ) , h ( q ) ) ,
where P ^ r ( D | y , M i ) is an assessment of the posterior probability of the subset D in the model M i . Unfortunately, this assessment requires also sampling from the posterior distribution in the model M i .
Note that equality (9) can obviously be written as
p ( y | M i ) p ( y | M j ) = Pr ( D | y , M j ) Pr ( D | y , M i ) D r i j ( θ , h ) p ( θ , h | D , y , M j ) d θ d h ,
provided that 0 < P r ( D | y , M j ) < . This equality suggests that the Bayes factor can be estimated by the product of the ratio of posterior probabilities of the subset D and the sample arithmetic mean of the ratio r i j ( θ , h ) :
B F ^ i j , D = P ^ r ( D | y , M j ) P ^ r ( D | y , M i ) 1 k q = 1 k r i j ( θ ( q ) , h ( q ) ) ,
based on { ( h ( q ) , θ ( q ) ) } q = 1 k drawn from the posterior distribution given by p ( θ , h | y , M j ) , truncated to the subset D, that is, p ( θ , h | D , y , M j ) .
As a by-product of our considerations, we have just shown that if a Monte Carlo sampler only visits a subset of the support of the posterior distribution, the correction of the sample arithmetic mean of the ratio r i j ( θ , h ) is needed.
Now, we will extend our analysis to models which contain two groups of parameters: the parameters common to both models and parameters specific to only one of them. Moreover, additional assumptions pertaining to conditional probability density functions will be discussed.
Let us assume that
(i)
θ j = θ U = ( θ A , θ R ) Θ U = Θ A × Θ R denotes all the parameters of the model M j (since called M U ) , while θ i = θ R contains the parameters common to both models: M U and M i (since called M R ) , and the vector θ A denotes specific parameters of M U ;
(ii)
p ( θ U | M U ) = p ( θ A , θ R | M U ) = p ( θ A | M U ) p ( θ R | M U ) , i.e., the random vectors θ R and θ A are a priori independent;
(iii)
Θ A p ( θ A | M U ) d θ A = 1 , i.e., the prior distribution for the vector of parameters θ A is proper; and
(iv)
h i = h j = h ; H i = H j = H , i.e., competing models have the same vector of latent variables.

2.1. Different Prior Distributions of Latent Variables in Both Models

In this section, we additionally assume that
(v)
p ( y | θ U , h U , M U ) = p ( y | θ R , h R , M R ) , i.e., competing models have the same conditional sampling density and
(vi)
p ( θ R | M U ) = p ( θ R | M R ) , i.e., in both models, the common parameters have the same prior distribution.
Under above assumptions the Bayes factor in favor of the model M R versus the model M U is given by the following integral:
p ( y | M R ) p ( y | M U ) = Θ A × Θ R × H p ( h | θ R , M R ) p ( h | θ A , θ R , M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h .
Thus, the estimator of the Bayes factor takes the form
B F ^ R , U , h = 1 k q = 1 k p ( h ( q ) | θ R ( q ) , M R ) p ( h ( q ) | θ A ( q ) , θ R ( q ) , M U ) ,
where { ( θ A ( q ) , θ R ( q ) , h ( q ) ) } q = 1 k are drawn from the posterior distribution of parameters and latent variables in the model M U , i.e., from the distribution given by p ( θ A , θ R , h | y , M U ) .
On the other hand, it is easy to show that the Bayes factor in favor of the model M U against the model M R is as follows:
p ( y | M U ) p ( y | M R ) = Θ A × Θ R × H p ( h | θ A , θ R , M U ) p ( h | θ R , M R ) p ( θ R , h | y , M R ) p ( θ A | M U ) d θ A d θ R d h ,
and consequently
B F ^ U , R , h = 1 k q = 1 k p ( h ( q ) | θ A ( q ) , θ R ( q ) , M U ) p ( h ( q ) | θ R ( q ) , M R ) ,
where { ( θ R ( q ) , h ( q ) ) } q = 1 k and { θ A ( q ) } q = 1 k are drawn from the posterior distribution p ( θ R , h | y , M R ) and from the prior distribution, p ( θ A | M U ) , respectively. However, if the dimension of the vector ( θ , h ) is high, then the estimators B F ^ U , R , h and B F ^ R , U , h tend to suffer from the so-called “simulation pseudo-bias”, similarly to the harmonic mean estimator (see in [26]). This “simulation pseudo-bias” of the estimators will be illustrated on the basis of simulation studies in the next section. To deal with the problem of “pseudo-bias”, we propose drawing from the posterior and prior distributions restricted to a subset of the space of parameters and latent variables with non-zero posterior probability, and correcting the arithmetic mean of the ratio by the posterior probability of the subset visited by the sampler.
Let us assume that D R Θ R × H and D A Θ A , 0 < P r ( D A × D R | M U ) < , 0 < P r ( D A × D R | y , M U ) < .
Starting from the identity
p ( y | M R ) p ( y | M U ) Pr ( D R | y , M R ) Pr ( D A | M U ) = = D A × D R p ( y | M R ) p ( y | M U ) p ( θ R , h | y , M R ) p ( θ A | M U ) d θ A d θ R d h ,
we obtain
p ( y | M R ) p ( y | M U ) Pr ( D R | y , M R ) Pr ( D A | M U ) =
= Θ A × Θ R × H I D A × D R ( θ A , θ R , h ) p ( θ R , h , y | M R ) p ( θ A | M U ) p ( y | M U ) p ( θ A , θ R , h | y , M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h =
= Θ A × Θ R × H I D A × D R ( θ A , θ R , h ) p ( θ R , h , y | M R ) p ( θ A | M U ) p ( θ A , θ R , h , y | M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h .
On the basis of the fact that
p ( θ R , h , y | M R ) p ( θ A | M U ) p ( θ A , θ R , h , y | M U ) = p ( y | θ R , h , M R ) p ( h | θ R , M R ) p ( θ R | M R ) p ( θ A | M U ) p ( y | θ A , θ R , h , M U ) p ( h | θ A , θ R , M U ) p ( θ A , θ R | M U ) ,
and under assumptions (i)–(vi), we have
p ( θ R , h , y | M R ) p ( θ A | M U ) p ( θ A , θ R , h , y | M U ) = p ( h | θ R , M R ) p ( h | θ A , θ R , M U ) .
Therefore,
p ( y | M R ) p ( y | M U ) = 1 Pr ( D R | y , M R ) Pr ( D A | M U ) × × Θ A × Θ R × H I D A × D R ( θ A , θ R , h ) p ( h | θ R , M R ) p ( h | θ A , θ R , M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h .
Identity (18) naturally leads to the following estimator of the Bayes factor:
B F ^ R , U , D , h = 1 P ^ r ( D R | y , M R ) P ^ r ( D A | M U ) × × q = 1 k p ( h ( q ) | θ R ( q ) , M R ) p ( h ( q ) | θ A ( q ) , θ R ( q ) , M U ) I D A × D R ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ A ( q ) ) , θ R ( q ) , h ( q ) ) } q = 1 k are drawn from p ( θ A , θ R , h | y , M U ) , or equivalently
B F ^ R , U , D , h = P ^ r ( D A × D R | y , M U ) P ^ r ( D R | y , M R ) P ^ r ( D A | M U ) × × 1 k q = 1 k p ( h D A × D R ( q ) | θ R , D A × D R ( q ) , M R ) p ( h D A × D R ( q ) | θ A , D A × D R ( q ) , θ R , D A × D R ( q ) , M U ) ,
where { ( θ A , D A × D R ( q ) , θ R , D A × D R ( q ) , h D A × D R ( q ) ) } q = 1 k are drawn from p ( θ A , θ R , h | y , M U ) truncated to the subset D A × D R . Because the posterior simulation support is the subset D A × D R of the parameter and latent variable space, most of the { ( θ A , D A × D R ( q ) , θ R , D A × D R ( q ) , h D A × D R ( q ) ) } q = 1 k have “similar” values of the ratio of the conditional densities of latent variables, so that, having probabilities P ^ r ( D A × D R | y , M U ) , P ^ r ( D R | y , M R ) , P ^ r ( D A | M U ) , the simulation process will be numerically more efficient than in the case of unrestricted space of parameters and latent variables. The estimates will not be dominated by few values of the ratio. Therefore, a smaller simulation sample will be required to obtain adequate precision in B F ^ R , U , D , h .
Now suppose that D Θ A × Θ R × H and 0 < P r ( D | M U ) < , 0 < P r ( D | y , M U ) < . This time we start with the identity
p ( y | M U ) p ( y | M R ) Pr ( D | y , M U ) = D p ( y | M U ) p ( y | M R ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h ,
and we obtain
p ( y | M U ) p ( y | M R ) Pr ( D | y , M U ) =
= Θ A × Θ R × H I D ( θ A , θ R , h ) p ( θ A , θ R , h , y | M U ) p ( y | M R ) p ( θ R , h | y , M R ) p ( θ R , h | y , M R ) d θ A d θ R d h =
= Θ A × Θ R × H I D ( θ A , θ R , h ) p ( θ A , θ R , h , y | M U ) p ( θ R , h , y | M R ) p ( θ R , h | y , M R ) d θ A d θ R d h .
In this case,
p ( θ A , θ R , h , y | M U ) p ( θ R , h , y | M R ) = p ( y | θ A , θ R , h , M U ) p ( h | θ A , θ R , M U ) p ( θ A , θ R | M U ) p ( y | θ R , h , M R ) p ( h | θ R , M R ) p ( θ R | M R ) ,
and under assumptions (i)–(vi)
p ( θ A , θ R , h , y | M U ) p ( θ R , h , y | M R ) = p ( h | θ A , θ R , M U ) p ( θ A | M U ) p ( h | θ R , M R ) .
Thus,
p ( y | M U ) p ( y | M R ) = 1 Pr ( D | y , M U ) × × Θ A × Θ R × H I D ( θ A , θ R , h ) p ( h | θ A , θ R , M U ) p ( h | θ R , M R ) p ( θ R , h | y , M R ) p ( θ A | M U ) d θ A d θ R d h .
Given a sample { ( θ R ( q ) , h ( q ) ) } q = 1 k , { θ A ( q ) } q = 1 k from the distributions p ( θ R , h | y , M R ) and p ( θ A | M U ) , respectively, then, as results from (22), an estimator of the Bayes factor for the model M U against the model M R can be
B F ^ U , R , D , h = 1 P ^ r ( D | y , M U ) 1 k q = 1 k p ( h ( q ) | θ A ( q ) , θ R ( q ) , M U ) p ( h ( q ) | θ R ( q ) , M R ) I D ( θ A ( q ) , θ R ( q ) , h ( q ) ) .
Additionally, if D = D A × D R Θ A × Θ R × H , then
B F ^ U , R , D , h = P ^ r ( D R | y , M R ) P ^ r ( D A | M U ) P ^ r ( D | y , M U ) 1 k q = 1 k p ( h D ( q ) | θ A , D ( q ) , θ R , D ( q ) , M U ) p ( h D ( q ) | θ R , D ( q ) , M R ) ,
where { ( θ R , D ( q ) , h D ( q ) ) } q = 1 k , { θ A , D ( q ) } q = 1 k are drawn from the distributions determined by p ( θ R , h | y , M R ) and p ( θ A | M U ) , respectively, restricted to the subset D.

2.2. Different Prior Distributions for Common Parameters in Both Models

Now, let us assume that
(v)
p ( y | θ U , h U , M U ) = p ( y | θ R , h R , M R ) , i.e., competing models have the same sampling density and
(vii)
p ( h | θ A , θ R , M U ) = p ( h | θ R , M R ) , i.e., in both models, the latent variables have the same prior distribution, instead of (vi) p ( θ R | M U ) = p ( θ R | M R ) .
Then, it is easy to show that in the first case
p ( y | M R ) p ( y | M U ) = 1 Pr ( D R | y , M R ) Pr ( D A | M U ) ×
× Θ A × Θ R × H I D A × D R ( θ A , θ R , h ) p ( θ R | M R ) p ( θ R | M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h ,
and in the second case
p ( y | M U ) p ( y | M R ) = 1 Pr ( D | y , M U ) × × Θ A × Θ R × H I D ( θ A , θ R , h ) p ( θ R | M U ) p ( θ R | M R ) p ( θ R , h | y , M R ) p ( θ A | M U ) d θ A d θ R d h .
Basing on identities (25) and (26), the following estimators of the Bayes factors can be formulated:
B ^ F R , U , D , θ = = 1 P ^ r ( D R | y , M R ) P ^ r ( D A | M U ) 1 k q = 1 k p ( θ R ( q ) | M R ) p ( θ R ( q ) | M U ) I D A × D R ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ A ( q ) , θ R ( q ) , h ( q ) ) } q = 1 k are drawn from p ( θ A , θ R , h | y , M U ) ,
B ^ F U , R , D , θ = 1 P ^ r ( D | y , M U ) 1 k q = 1 k p ( θ R ( q ) | M U ) p ( θ R ( q ) | M R ) I D ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ R ( q ) , h ( q ) ) } q = 1 k , { θ A ( q ) } q = 1 k are drawn from p ( θ R , h | y , M R ) and p ( θ A | M U ) , respectively.

2.3. Different Conditional Sampling Distributions of the Vector of Observations

Now we assume that conditions (i)–(iv), (vi), and (vii) hold. Thus, the conditional distribution of the observable vector can be different, but the prior distributions for vector of latent variables and common parameters are the same in both competing models. Then, it is easy to show that
p ( y | M R ) p ( y | M U ) = 1 Pr ( D R | y , M R ) Pr ( D A | M U ) × × Θ A × Θ R × H I D A × D R ( θ A , θ R , h ) p ( y | θ R , h , M R ) p ( y | θ A , h , M U ) p ( θ A , θ R , h | y , M U ) d θ A d θ R d h ,
and for the reciprocal of this Bayes factor we have
p ( y | M U ) p ( y | M R ) = 1 Pr ( D | y , M U ) × × Θ A × Θ R × H I D ( θ A , θ R , h ) p ( y | θ R , h , M U ) p ( y | θ R , h , M R ) p ( θ R , h | y , M R ) p ( θ A | M U ) d θ A d θ R d h .
Similarly to the above, basing on identities (29) and (30), the following estimators of the Bayes factors can be formulated:
B F ^ R , U , D , y = 1 P ^ r ( D R | y , M R ) P ^ r ( D A | M U ) × × 1 k q = 1 k p ( y | θ R ( q ) , h ( q ) , M R ) p ( y | θ A ( q ) , h ( q ) , M U ) I D A × D R ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ A ( q ) , θ R ( q ) , h ( q ) ) } q = 1 k are drawn from p ( θ A , θ R , h | y , M U ) , and
B F ^ U , R , D , y = 1 P ^ r ( D | y , M U ) 1 k q = 1 k p ( y | θ R ( q ) , h ( q ) , M U ) p ( y | θ R ( q ) , h ( q ) , M R ) I D ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ R ( q ) , h ( q ) ) } q = 1 k , { θ A ( q ) } q = 1 k are drawn from p ( θ R , h | y , M R ) and p ( θ A | M U ) , respectively.
Note that the estimators (19), (22), (27), (28), (31), and (32) are based on the arithmetic mean of the ratio of densities times the indicator function of an arbitrary subset of the space of model parameters and latent variables. Additionally, this arithmetic mean is corrected by the reciprocal of the posterior probability of the subset.

3. Simulation Study

In this section, we present a simple simulation study for models with latent variables in which, after having integrated out latent variables analytically, the true values of the marginal likelihoods can easily be assessed using, e.g., the corrected arithmetic mean estimator (CAME [25]). These assessments will be applied to obtain estimates of Bayes factors used as benchmark values for evaluation of the performance of the new method proposed.
Let us consider the Student t model
y t | μ , v i i t ( μ , 1 , v ) , t = 1 , 2 , . . . , T , μ N ( μ 0 , σ 0 2 ) , v E x p ( λ 0 )
where t ( μ , 1 , v ) denotes the Student t distribution with mode μ , precision 1, and v degrees of freedom. N ( μ 0 , σ 0 2 ) denotes the Normal distribution with mean μ 0 and variance σ 0 2 , in turn, E x p ( λ 0 ) stands for the Exponential distribution with mean λ 0 1 and variance λ 0 2 . The symbol iid stands for independent and identically distributed. Thus, the random variables y 1 , , y T are independent and have the same Student t distribution.
The Student t distribution can be expressed as a scale mixture of Gaussian distributions by introducing a random variable h t that is inverse-gamma distributed. The model can be written as
y t | ω t , μ , v i i N ( μ , h t 1 ) , h t | v i i G ( v / 2 , v / 2 ) , μ N ( μ 0 , σ 0 2 ) , v E x p ( λ 0 ) ,
where G ( v / 2 , v / 2 ) denotes the gamma distribution with shape v / 2 and rate v / 2 .
To simulate datasets we generated samples of size T = 500 , 1000 , 2000 , data points from model (33) with μ = 0 , 0.25 , 0.5 , v = 8 , and μ 0 = 0 ; σ 0 2 = 1 ; λ 0 = 0.1 .
In turn, to simulate from the posterior distribution the Gibbs algorithm was used and run for 20,000 iterations. The conditional posterior distribution for μ is Normal, i.e.,
μ | h , v , y N K ( μ 1 , σ 1 2 ) ,
where σ 1 2 = ( t = 1 T h t + σ 0 2 ) 1 , μ 1 = σ 1 2 ( t = 1 T y t h t + μ 0 σ 0 2 ) , while the conditional posterior distribution for h t is Gamma:
h t | μ , v , y G ( ( v + 1 ) / 2 , [ ( y t μ ) 2 + v ] / 2 ) , t = 1 , . . . , T .
An easy computation shows that the conditional posterior distribution of v is nonstandard:
v | h , μ , y p ( v | h , y , μ ) v 2 T v 2 Γ v 2 T e v κ / 2 ,
where κ = t = 1 T h t t = 1 T ln h t + 2 λ 0 .
However, reliable numerical methods for generating from this distribution do exist. We use one of them, the rejection sampling proposed by [27].
In the model under consideration, θ U = ( μ , v ) , θ R = v , and h = ( h 1 , , h T ) ; thus, the number of latent variables is equal to the number of observations: 500, 1000, and 2000, respectively. The direct computation of the marginal likelihood for model (34) is intensive and unstable due to the presence of latent variables. However, by integrating out the latent variables from the joint distribution of parameters, latent variables, and data, p ( y | h , μ , v ) p ( h , μ , v ) , the conditional density of the data given parameters only can be written in the following form:
p ( y | μ , v ) = t = 1 T Γ v + 1 2 Γ v 2 v π 1 + ( y t μ ) 2 v v + 1 2 .
Unfortunately, the marginal density of the data (i.e., p ( y ) ) cannot be presented in closed form. However, due to the lack of latent variables as well as thanks to the small number of parameters, the marginal data density value can easily and precisely be evaluated by using the corrected arithmetic mean estimator (CAME [25]). Estimates obtained by the use of the CAME are treated as benchmark values.
In order to show performance of our new estimators of the Bayes factor, we assume that the subset D is an intersection of the space of parameters and latent variables, obtained from the condition that the ratio of conditional density functions p ( y | θ U , h , M U ) and p ( y | θ R , h , M R ) is between the lowest and the highest values of the ratio evaluated on the basis of pseudo-random sample { θ ( q ) , h ( q ) } q = 1 k in both models, and the hyper-cuboid limited by the range of the sampler output, i.e.,
D = ( A × B ) C ,
where
A = i [ max m { M U , M R } min q m { θ i ( q m ) } , min m { M U , M R } max q m { θ i ( q m ) } ] ,
B = t [ max m { M U , M R } min q m { h t ( q m ) } , min m { M U , M R } max q m { h t ( q m ) } ] ,
C = { ( θ U , h ) : L p ( y | θ U , h , M U ) p ( y | θ R , h , M R ) Q } ,
L = max m { M U , M R } min q m { p ( y | θ U ( q m ) , h ( q m ) , M U ) p ( y | θ R ( q m ) , h ( q m ) , M R ) } ,
Q = min m { M U , M R } max q m { p ( y | θ U ( q m ) , h ( q m ) , M U ) p ( y | θ R ( q m ) , h ( q m ) , M R ) } .
In the restricted models ( M R ) , it is assumed that μ = 0 , whereas in the unrestricted model ( M U ) μ 0 . The ratio of density functions of the conditional distributions of the observable vector is as follows:
p ( y | θ U , h , M U ) p ( y | θ R , h , M R ) = e 1 2 t = 1 T [ ( y t μ ) 2 y t 2 ] h t = e 1 2 μ 2 t = 1 T h t + μ t = 1 T y t h t .
In Table 1, Table 2 and Table 3, we present results obtained by using newly proposed estimators of Bayes factors (i.e., the corrected arithmetic means of the ratio of the densities of the conditional distributions of the observable vector, B F ^ U , R , D , y and B F ^ R , U , D , y ) and uncorrected arithmetic means of the ratios B F ^ U , R , y and B F ^ R , U , y . Table 1, Table 2 and Table 3 present means, standard deviations, root mean square errors and average errors (relative to the CAM estimates) of the decimal logarithm of estimates obtained in models under consideration. As mentioned earlier, closed-form expressions for the marginal likelihoods do not exist. Therefore, the root mean square errors and average errors are determined relative to the CAM estimates obtained for each marginal likelihood separately.
To estimate Pr ( D | y , M U ) , Pr ( D R | y , M R ) , and Pr ( D A | M U ) , we use simulation from appropriate posterior or prior distributions, e.g.,
P ^ r ( D | y , M U ) = 1 k q = 1 k I D ( θ A ( q ) , θ R ( q ) , h ( q ) ) ,
where { ( θ A ( q ) , θ R ( q ) , h ( q ) ) } q = 1 k are drawn from the posterior distribution p ( θ A , θ R , h | y , M U ) . The remaining probabilities are calculated in a similar manner. Furthermore, we consider uncorrected arithmetic means of the ratios B F ^ U , R , y and B F ^ R , U , y to investigate sampling properties of these estimators.
Figure 1 indicates that even in such simple models performance of the uncorrected estimators are not satisfactory. The estimator B F ^ R , U , y is downwardly “pseudo-biased” (respective average errors are negative). On the other hand, our simulation study demonstrates that the performance of the estimators B F ^ U , R , y and B F ^ R , U , y can be improved by trimming the posterior simulation support to a given subset of the space of latent variables and parameters, and next making the correction of the arithmetic mean of the ratios using the posterior probabilities of the subset. As can be seen from Table 1, Table 2 and Table 3 and Figure 1, Figure 2 and Figure 3, corrected estimators of the Bayes factors perform very well in comparison to uncorrected ones. The estimators B F ^ U , R , D , y and B F ^ R , U , D , y produce estimates which are very close to those obtained on the basis of the CAME, while the estimators B F ^ U , R , y and B F ^ R , U , y , based on uncorrected arithmetic mean, provide biased estimates.
Finally, note that by using both estimators for the ratio of densities and their reciprocals, i.e., B F ^ U , R , D , y and B F ^ R , U , D , y , one can check the accuracy of assessments and of adequate (from numerical point of view) selection of the subset D. This is because, roughly speaking, the equality B F U , R = 1 B F R , U implies that it is natural to use B F ^ U , R , D , y and 1 B F ^ R , U , D , y to estimate B F U , R . Different estimates for B F U , R can indicate numerical inadequacy of selection of the subset D and/or too small simulation sample.

4. Empirical Illustration: Formal Bayesian Comparison of Hybrid MSV-MGARCH Models

In this section, the new method will be applied in order to formally compare the hybrid Multivariate Stochastic Volatility–Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models and thus to show that it can be used in practice. The hybrid MSV-MGARCH models were proposed in [28,29,30] for modeling financial time series. These hybrid models are characterized by relatively simple model structures that exploit advantages of both model classes: flexibility of the Multivariate Stochastic Volatility (MSV) class, where volatility is modeled by latent stochastic processes, and relative simplicity of the Multivariate GARCH (MGARCH) class. The simplest specification of MSV-MGARCH model (called LN-MSF-SBEKK) is based on one latent process (Multiplicative Stochastic Factor (MSF)) and on the scalar BEKK [31] covariance structure. The LN-MSF-SBEKK structure is obtained by multiplying the SBEKK conditional covariance matrix H t by a scalar random variable h t such that { ln h t } is a Gaussian AR(1) latent process with autoregression parameter ϕ . The hybrid LN-MSF-SBEKK specification has been recognized in the literature [32,33,34,35] and proved to be useful in multivariate modeling of financial time series as well as of macroeconomic data [36,37,38,39,40,41]. The drawback of the LN-MSF-MGARCH process is that it cannot be treated as a direct extension of the MGARCH process with the Student t conditional distribution. When ϕ = 0 , the LN-MSF-SBEKK process reduces itself to the SBEKK process with the conditional distribution being a continuous mixture of multivariate normal distributions with covariance matrices H t h t and h t log-normally distributed. However, the multivariate Student t distribution can be expressed as a scale mixture of normal distributions with the inverted gamma as a mixing distribution. This fact was exploited in [42,43], where the IG-MSF-SBEKK specification was proposed as a natural hybrid extension of the SBEKK process with the Student t conditional distribution (t-SBEKK). In the new specification, the latent process { ln h t } remains an autoregressive process of order one, but it is non-Gaussian. For ϕ = 0 the latent variables h t (where t Z ) are independent and have inverted gamma (IG) distribution. Unfortunately, for ϕ 0 the unconditional distribution of the latent variables h t is unknown. To summarize, the IG-MSF-SBEKK model is obtained by multiplying the SBEKK conditional covariance matrix H t by a scalar random variable h t coming from such latent process (with autoregression parameter ϕ ) that, for ϕ = 0 , h t has an inverted gamma distribution. Thus, ϕ = 0 leads to the t-SBEKK specification, in which the conditional distribution is represented as a continuous mixture of multivariate normal distributions with covariance matrices H t h t , where h t is inverted gamma distributed. If ϕ 0 , the latent variables h t ( t Z ) are dependent, so (in comparison to the t-SBEKK model) in the IG-MSF-SBEKK model there exists an additional source of dependence.
Let us consider a two-dimensional autoregressive process r t = ( r 1 , t , r 2 , t ) defined by the equation
r t = δ 0 + r t 1 Δ + ε t , t = 1 , , T ,
where δ 0 and Δ are, respectively, 2 × 1 and 2 × 2 matrix parameters, and T is the length of the observed time series. The hybrid MSF-MGARCH specification for the disturbance term ε t is defined by the following equality:
ε t = ζ t H t 1 / 2 h t 1 / 2 ,
where
H t = ( 1 β 1 β 2 ) A + β 1 ε t 1 ε t 1 + β 2 H t 1 ,
ln g t = ϕ ln g t 1 + ln γ t ,
{ ζ t } is a Gaussian white noise sequence with mean vector zero and unit covariance matrix; { γ t } is a sequence of independent positive random variables; γ t is inverted gamma distributed with mean v v 2 for v > 2 , i.e., { γ t } i i I G ( v / 2 , v / 2 ) , ζ t γ s for all t , s { 1 , , T } , 0 < | ϕ | < 1 ; β 1 and β 2 are positive scalar parameters such that β 1 + β 2 < 1 ; and A is a free symmetric positive definite matrix of order 2. Under (41) and (42), the conditional distribution of r t (given the past of r t and the current latent variable h t ) is Normal with mean vector μ t = δ 0 + r t 1 Δ and covariance matrix Σ t = H t h t . For ϕ = 0 (this value is excluded in the definition of the hybrid models under consideration) h t = γ t , so the distribution of h t is an inverted gamma. In this case, r t in (41) is, given its past, an IG mixture of two-variate normal distributions N ( μ t , h t H t ) , i.e., it has the two-variate Student t distribution.
The assumptions presented so far determine the conditional distribution of the observations and latent variables given the parameters. Thus, it remains to formulate the prior distributions of parameters. We use the same prior distributions as in [42,43]. Six elements of δ = ( δ 0 v e c ( Δ ) ) are assumed to be a priori independent of other parameters, with the N ( 0 , I 6 ) prior. Matrix A has an inverted Wishart prior distribution such that A 1 has the Wishart prior distribution with mean I 2 ; the elements of β = ( β 1 , β 2 ) are jointly uniformly distributed over the unit simplex. As regards the initial conditions for H t , we take H 0 = h 0 I 2 and treat h 0 > 0 as an additional parameter, a priori exponentially distributed with mean 1; ϕ has the uniform distribution over (−1, 1), and for v we assume the gamma distribution with mean λ a / λ v , truncated to ( 2 , + ) . We assume that λ v = 0.1 with two cases: λ a = 3 and λ a = 1 (note that λ a = 1 leads to exponential distribution for v).
Furthermore, we use the same bivariate data sets as those modeled in [30,42,43]. The first data set consists of the official daily exchange rates of the National Bank of Poland (NBP fixing rates) for the US dollar and German mark in the period 1 February 1996–31 December 2001. The length of the modeled time series of their daily growth rates (logarithmic return rates) is 1482. The second data set consists of the daily quotations of the main index of the Warsaw Stock Exchange (WIG) and the S&P500 index of NYSE—1727 logarithmic returns are modeled from the period 8 January 1999–1 February 2006.
In order to obtain pseudo-random sample from the posterior distribution of parameters and latent variables, we use MCMC simulation techniques described in [42,44] and implemented within the GAUSS programming environment.
Using fully Bayesian methods, we want to answer the question whether the IG-MSF-SBEKK model can be reduced to the t-SBEKK case. Thus, we consider the hypothesis that a scalar parameter ϕ = 0 (the t-SBEKK model, M R ) and the alternative hypothesis that ϕ 0 (the IG-MSF-SBEKK model, M U ) . For the exchange rate data, the posterior probability that ϕ < 0 is approximately 0.001 only and ϕ = 0 is included in the highest posterior density (HPD) interval of probability content as high as 0.996 . Thus, Lindley’s procedure leads to the conclusion that the t-SBEKK is inadequate. But for the stock data, the posterior probability that ϕ < 0 is 0.017 for λ a = 3 and 0.054 for λ a = 1 , and ϕ = 0 is included in the HPD interval of probability content 0.87 or 0.80 , depending on the prior hyperparameter λ a . In the case of the stock data, Lindley’s testing procedure yields results that the t-SBEKK model cannot be rejected by the data.
Now, our new estimators of the Bayes factors will be used. We start with the assumption that the subset D is an intersection of the subspace of parameters and latent variables, obtained by the condition that the ratio of conditional density functions p ( h | θ U , M U ) and p ( h | θ R , M R ) is between the lowest and the highest values of the ratio evaluated at pseudo-random sample { θ ( q ) , h ( q ) } q = 1 k in both models, and the hyper-cuboid limited by the range of the sampler output, i.e., D = ( A × B ) C , where
A = i [ max m { M U , M R } min q m { θ i ( q m ) } , min m { M U , M R } max q m { θ i ( q m ) } ] ,
B = t [ max m { M U , M R } min q m { h t ( q m ) } , min m { M U , M R } max q m { h t ( q m ) } ] ,
C = { ( θ U , h ) : L p ( h | θ U , M U ) p ( h | θ R , M R ) Q } ,
L = max m { M U , M R } min q m { p ( h ( q m ) | θ U ( q m ) , M U ) p ( h ( q m ) | θ R ( q m ) , M R ) } ,
Q = min m { M U , M R } max q m { p ( h ( q m ) | θ U ( q m ) , M U ) p ( h ( q m ) | θ R ( q m ) , M R ) } .
The ratio of the densities of the conditional distributions of the vector of latent variables is as follows:
p ( h | θ U , M U ) p ( h | θ R , M R ) = t = 1 T h t 1 v 2 ϕ e v 2 h t ( 1 h t 1 ϕ ) .
In Table 4, we present results obtained by using newly proposed estimators of Bayes factors—the corrected arithmetic means of the ratio of the densities of the conditional distributions of latent variables, B ^ F U , R , D and B F ^ R , U , D ; the subscript h is omitted for convenience. Results obtained on the basis of our new method confirm that in the case of exchange rates, the t-SBEKK model is strongly rejected by the data, whereas it seems that the t-SBEKK specification can be adequate for stock indices. Under equal prior model probabilities, the IG-MSF-SBEKK model for exchange rates is about 14–15 times more probable a posteriori than the t-SBEKK model, indicating a strong (but not very strong) evidence against the t-SBEKK model. In turn, for stock indices the decimal logarithm of the Bayes factor of t-SBEKK relative to IG-MSF-SBEKK depends on prior distribution of the number of degrees of freedom, v. The Bayes factor in favor of t-SBEKK is equal to about 1.25 and 0.4 in models with λ a = 1 and λ a = 3 , respectively. Of course, according to scale of Kass and Raftery (1995) these results indicate evidence for a given model that is negligible: “not worth more than a bare of mention”.
For the sake of comparison, in the last row of the Table 4 the estimates of the Savage–Dickey ratio is presented. The marginal posterior density of ϕ is evaluated at ϕ = 0 on the basis of the histogram of the sampled values of ϕ from the posterior distribution. The main conclusions pertaining to the validity of the reduction of the IG-MSF-SBEKK model to the t-SBEKK one are very similar.
Additionally, it is very important to stress that our method makes it possible to compare IG-MSF-SBEKK models whose prior distributions are different (e.g., in respect of the number of degrees of freedom). As can be seen from Table 5, for exchange rates the IG-MSF-SBEKK and t-SBEKK models with exponential distribution for v are more probable a posteriori than those with the gamma distribution for v with the hyperparameter λ a = 3 . In contrary, for stock indices the IG-MSF-SBEKK model with the hyperparameter λ a = 3 is more probable a posteriori than the same model with λ a = 1 , but this improvement is negligible.

5. Discussion

In this paper, a new method of the estimation of the Bayes factor is proposed. The idea of proposed estimators is based on correction of the arithmetic mean estimator of the ratio of conditional distributions by trimming the posterior sample to a certain subset of the space of parameters and latent variables, D, and correcting the arithmetic mean by the posterior probabilities of this subset. The new method makes it possible to compare a finite number of different models with a large number of parameters and/or latent variables in respect to the goodness of fit measured by their posterior probabilities. Our simulation study and empirical illustration show that the method performs well.
In this paper, the question of an adequate selection of the subset D used in the method proposed is not addressed. The choice of the subset which minimizes the variance of the estimator remains an open question. The simulation study and empirical example considered in the paper indicate that the choice of D as an intersection of the parameter space (with additional restrictions) and the hyper-cuboid limited by the range of the posterior sampler outputs leads to acceptable results.

Funding

Research supported by a grant from the National Science Center in Poland under decision no. UMO-2018/31/B/HS4/00730.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are publicly available.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Zellner, A. An Introduction to Bayesian Inference in Econometrics; John Wiley and Sons, Inc.: New York, NY, USA, 1971. [Google Scholar]
  2. Gelman, A.; Meng, X.L. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Stat. Sci. 1998, 13, 163–185. [Google Scholar] [CrossRef]
  3. Green, P.J. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika 1995, 82, 711–732. [Google Scholar] [CrossRef]
  4. Han, C.; Carlin, B.P. Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. J. Am. Stat. Assoc. 2001, 96, 1122–1132. [Google Scholar] [CrossRef]
  5. Sinharay, S.; Stern, H.S. An empirical comparison of methods for computing Bayes factors in generalized linear mixed models. J. Comput. Graph. Stat. 2005, 14, 415–435. [Google Scholar] [CrossRef] [Green Version]
  6. Jacquier, E.; Polson, N.G.; Rossi, P.E. Bayesian analysis of stochastic volatility models with fat-tails and correlated errors. J. Econom. 2004, 122, 185–212. [Google Scholar] [CrossRef]
  7. Ardia, D.; Baştürk, N.; Hoogerheide, L.F.; van Dijk, H.K. A Comparative Study of Monte Carlo Methods for Efficient Evaluation of Marginal Likelihood. Comput. Stat. Data Anal. 2012, 56, 3398–3414. [Google Scholar] [CrossRef] [Green Version]
  8. Chen, M.H.; Shao, Q.M.; Ibrahim, J.G. Monte Carlo Methods in Bayesian Computation; Springer: New York, NY, USA, 2012. [Google Scholar]
  9. Llorente, F.; Martino, L.; Delgado, D.; Lopez-Santiago, J. Marginal Likelihood Computation for Model Selection and Hypothesis Testing: An Extensive Review. Available online: https://arxiv.org/abs/2005.08334 (accessed on 3 March 2021).
  10. Cameron, E. A generalized Savage-Dickey density ratio. arXiv 2013, arXiv:1311.1292. [Google Scholar]
  11. Dickey, J. The Weighted Likelihood Ratio, Linear Hypotheses on Normal Location Parameters. Ann. Math. Stat. 1971, 42, 204–223. [Google Scholar] [CrossRef]
  12. Verdinelli, I.; Wasserman, L. Computing Bayes factors by using a generalization of the Savage-Dickey density ratio. J. Am. Stat. Assoc. 1995, 90, 614–618. [Google Scholar] [CrossRef]
  13. Lindley, D.V. Introduction to Probability and Statistics from a Bayesian Viewpoint. Part 2. Inference; Cambridge University Press: Cambridge, UK, 1965. [Google Scholar]
  14. De Bragança Pereira, C.; Stern, J.M. Evidence and Credibility: Full Bayesian Significance Test for Precise Hypotheses. Entropy 1999, 1, 99–110. [Google Scholar] [CrossRef]
  15. Borges, W.; Stern, J.M. The Rules of Logic Composition for the Bayesian Epistemic e-Values. Logic J. IGPL 2007, 15, 401–420. [Google Scholar] [CrossRef]
  16. Diniz, M.A.; Pereira, C.A.B.; Stern, J.M. Cointegration and Unit Root Tests: A Fully Bayesian Approach. Entropy 2020, 22, 968. [Google Scholar] [CrossRef]
  17. Esteves, L.G.; Izbicki, R.; Stern, J.M.; Stern, R.B. The Logical Consistency of Simultaneous Agnostic Hypothesis Tests. Entropy 2016, 18, 256. [Google Scholar] [CrossRef] [Green Version]
  18. Lauretto, M.S.; Faria, S.R.; Pereira, C.A.B.; Stern, J.M. The Problem of Separate Hypotheses via Mixtures Models. Am. Inst. Phys. Conf. Proc. 2007, 954, 268–275. [Google Scholar]
  19. Madruga, M.R.; Esteves, L.G.; Wechsler, S. On the Bayesianity of Pereira-Stern tests. Test 2001, 10, 291–299. [Google Scholar] [CrossRef]
  20. Madruga, M.R.; Pereira, C.A.B.; Stern, J.M. Bayesian Evidence Test for Precise Hypotheses. J. Stat. Plan. Inference 2003, 117, 185–198. [Google Scholar] [CrossRef] [Green Version]
  21. Pereira, C.A.B.; Stern, J.M.; Wechsler, S. Can a significance test be genuinely Bayesian? Bayesian Anal. 2008, 3, 79–100. [Google Scholar] [CrossRef]
  22. Berger, J.O.; Pericchi, C.R. Objective Bayesian methods for model selection: Introduction andcomparison (with discussion). In Model Selection; Lecture Notes—Monograph Series 38; Lahiri, P., Ed.; Institute of Mathematical Statistics: Hayward, CA, USA, 2001; pp. 135–207. [Google Scholar]
  23. Osiewalski, J.; Steel, M. Una perspectiva bayesiana en selección de modelos. Cuad. Econ. 1993, 55, 327–351. Available online: https://home.cyf-kr.edu.pl/eeosiewa/ABayesianPerspectiveonModelSelection.pdf (accessed on 17 March 2021).
  24. Chan, J.C.C.; Eisenstat, E. Marginal Likelihood Estimation with the Cross-Entropy Method. Econom. Rev. 2015, 34, 256–285. [Google Scholar] [CrossRef]
  25. Pajor, A. Estimating the Marginal Likelihood Using the Arithmetic Mean Identity. Bayesian Anal. 2017, 12, 261–287. [Google Scholar] [CrossRef]
  26. Lenk, P. Simulation pseudo-bias correction to the harmonic mean estimator of integrated likelihoods. J. Comput. Graph. Stat. 2009, 18, 941–960. [Google Scholar] [CrossRef]
  27. Geweke, J. Priors for macroeconomic time series and their application. In Institute for Empirical Macroeconomics Discussion Paper 64; Federal Reserve Bank of Minneapolis: Minneapolis, MN, USA, 1992; Available online: https://www.minneapolisfed.org/research/dp/dp64.pdf (accessed on 27 March 2021).
  28. Osiewalski, J. New hybrid models of multivariate volatility (a Bayesian perspective). Przegląd Stat. Stat. Rev. 2009, 56, 15–22. [Google Scholar]
  29. Osiewalski, J.; Pajor, A. Flexibility and parsimony in multivariate financial modelling: A hybrid bivariate DCC–SV model. In Financial Markets. Principles of Modeling, Forecasting and Decision-Making; FindEcon Monograph Series No.3; Milo, M., Wdowiński, P., Eds.; Łódź University Press: Łódź, Poland, 2007; pp. 11–26. [Google Scholar]
  30. Osiewalski, J.; Pajor, A. Bayesian analysis for hybrid MSF–SBEKK models of multivariate volatility. Cent. Eur. J. Econ. Model. Econom. 2009, 1, 179–202. [Google Scholar]
  31. Baba, Y.; Engle, R.; Kraft, D.; Kroner, K. Multivariate Simultaneous Generalised ARCH; Department of Economics, University of California at San Diego: San Diego, CA, USA, 1989. [Google Scholar]
  32. Amado, C.; Teräsvirta, T. Modelling volatility by variance decomposition. J. Econom. 2013, 175, 142–153. [Google Scholar] [CrossRef] [Green Version]
  33. Amado, C.; Teräsvirta, T. Specification and testing of multiplicative time-varying GARCH models with applications. Econom. Rev. 2017, 36, 421–446. [Google Scholar] [CrossRef] [Green Version]
  34. Carriero, A.; Clark, T.; Marcellino, M. Common drifting volatility in large Bayesian vars. J. Bus. Econ. Stat. 2016, 34, 375–390. [Google Scholar] [CrossRef]
  35. Teräsvirta, T. Nonlinear models for autoregressive conditional heteroscedasticity. In Handbook of Volatility Models and Their Applications; Bauwens, L., Hafner, C., Laurent, S., Eds.; Wiley: New York, NY, USA, 2012; pp. 49–69. [Google Scholar]
  36. Osiewalski, K.; Osiewalski, J. A long-run relationship between daily prices on two markets: The Bayesian VAR(2)-MSF-SBEKK model. Cent. Eur. J. Econ. Model. Econom. 2013, 5, 65–83. [Google Scholar]
  37. Osiewalski, J.; Osiewalski, K. Hybrid MSV-MGARCH models—General remarks and the GMSF-SBEKK specification. Cent. Eur. J. Econ. Model. Econom. 2016, 8, 241–271. [Google Scholar]
  38. Osiewalski, J.; Pajor, A. Bayesian Value-at-Risk for a portfolio: Multi- and univariate approaches using MSF-SBEKK models. Cent. Eur. J. Econ. Model. Econom. 2010, 2, 253–277. [Google Scholar]
  39. Pajor, A.; Osiewalski, J. Bayesian Value-at-Risk and Expected Shortfall for a large portfolio (multi- and univariate approaches). Acta Phys. Pol. A 2012, 121, 101–109. [Google Scholar] [CrossRef]
  40. Pajor, A.; Wróblewska, J. VEC-MSF models in Bayesian analysis of short- and long-run relationships. Stud. Nonlinear Dyn. E 2017, 21, 1–22. [Google Scholar] [CrossRef] [Green Version]
  41. Wróblewska, J.; Pajor, A. One-period joint forecasts of Polish inflation, unemployment and interest rate using Bayesian VEC-MSF models. Cent. Eur. J. Econ. Model. Econom. 2019, 11, 23–45. [Google Scholar]
  42. Osiewalski, J.; Pajor, A. A hybrid MSV-MGARCH generalisation of the t-MGARCH model. In The 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena: Conference Proceedings, No. 1; Papież, M., Śmiech, S., Eds.; Foundation of the Cracow University of Economics: Cracow, Poland, 2018; pp. 345–354. [Google Scholar]
  43. Osiewalski, J.; Pajor, A. On Sensitivity of Inference in Bayesian MSF-MGARCH Models. Cent. Eur. J. Econ. Model. Econom. 2019, 11, 173–197. [Google Scholar]
  44. Pajor, A. MCMC Method for the IG-MSF-SBEKK Model. In Quantitative Methods in the Contemporary Issues of Economics; Ciałowicz, B., Ed.; Wydawnictwo edu-Libri s.c.: Kraków, Poland, 2020; pp. 77–88. Available online: http://Quantitative-methods.pdf (accessed on 3 March 2021).
  45. Kass, R.E.; Raftery, A.E. Bayes factors. J. Am. Stat. Assoc. 1995, 90, 773–795. [Google Scholar] [CrossRef]
Figure 1. Estimates of the log-Bayes factor in the Student t model, obtained with the use of uncorrected ratios (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 2000 simulated data points ( T = 2000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The l o g B F R , U denotes the decimal logarithm of the Bayes factor in favor of model M R against model M U .
Figure 1. Estimates of the log-Bayes factor in the Student t model, obtained with the use of uncorrected ratios (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 2000 simulated data points ( T = 2000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The l o g B F R , U denotes the decimal logarithm of the Bayes factor in favor of model M R against model M U .
Entropy 23 00399 g001
Figure 2. Estimates of the log Bayes factor in the Student t model, obtained with the use of the new estimators (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 1000 simulated data points ( T = 1000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The symbol B F ^ R , U , D denotes corrected estimator of the Bayes factor in favor of model M R against model M U . The symbol B F ^ U , R , D denotes corrected estimator of the Bayes factor in favor of model M U against model M R .
Figure 2. Estimates of the log Bayes factor in the Student t model, obtained with the use of the new estimators (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 1000 simulated data points ( T = 1000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The symbol B F ^ R , U , D denotes corrected estimator of the Bayes factor in favor of model M R against model M U . The symbol B F ^ U , R , D denotes corrected estimator of the Bayes factor in favor of model M U against model M R .
Entropy 23 00399 g002
Figure 3. Estimates of the log Bayes factor in the Student t model, obtained with the use of the new estimators (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 1000 simulated data points ( T = 2000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The symbol B F ^ R , U , D denotes corrected estimator of the Bayes factor in favor of model M R against model M U . The symbol B F ^ U , R , D denotes corrected estimator of the Bayes factor in favor of model M U against model M R .
Figure 3. Estimates of the log Bayes factor in the Student t model, obtained with the use of the new estimators (blue line), and of the corrected arithmetic mean estimator after having integrated out latent variables (red line). Simulation study of the Student t model using 1000 simulated data points ( T = 2000 , which is also equal to the number of latent variables); 20,000 Gibbs sampler iterations for estimation were used. The symbol B F ^ R , U , D denotes corrected estimator of the Bayes factor in favor of model M R against model M U . The symbol B F ^ U , R , D denotes corrected estimator of the Bayes factor in favor of model M U against model M R .
Entropy 23 00399 g003
Table 1. Mean (M), standard deviation (SD), average error (AE; corrected arithmetic mean estimator (CAME)—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 500 .
Table 1. Mean (M), standard deviation (SD), average error (AE; corrected arithmetic mean estimator (CAME)—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 500 .
Estimator and Its Sample Characteristicsfor μ = 0 for μ = 1 / 2
log BF U , R log BF R , U log BF U , R log BF R , U
CAME−1.2561.25621.191−21.191
l o g B F ^ U , R : M−1.25419.826
SD0.0120.748
AE−0.001−1.340
RMSE0.0111.631
l o g B F ^ U , R , D : M−1.25521.02
SD0.0130.357
AE−0.003−0.209
RMSE0.0130.589
l o g B F ^ R , U : M0.498−22.981
SD0.1650.505
AE−0.776−1.870
RMSE0.7831.935
l o g B F ^ R , U , D : M1.249−21.238
SD0.0320.284
AE−0.010−0.100
RMSE0.0360.246
Notes: Results obtained for T = 500 observations from a Student t distribution with mean μ, precision equal to 1 and 8 degrees of freedom. Estimations of the Bayes factors were repeated 100 times. The Bayes factors were estimated with Monte Carlo sampler based on 20,000 iterations. The log BFR,U denotes the decimal logarithm of the Bayes factor in favor of model MR against model MU. The log BFU,R denotes the decimal logarithm of the Bayes factor in favor of model model MU against model MR. B F ^ U , R denotes the estimator of the Bayes factor calculated using simulation from untruncated distributions, whereas B F ^ U , R , D denotes the estimator of the Bayes factor based on simulation from truncated distributions to the set D (the subscript y is omitted for convenience).
Table 2. Mean (M), standard deviation (SD), average error (AE; CAME—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 1000 .
Table 2. Mean (M), standard deviation (SD), average error (AE; CAME—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 1000 .
Estimator and Its Sample Characteristicsfor μ = 0 for μ = 1 / 2
log BF U , R log BF R , U log BF U , R log BF R , U
CAME−1.3091.30937.266−37.266
l o g B F ^ U , R : M−1.30534.806
SD0.0160.738
AE0.004−2.485
RMSE0.0202.612
l o g B F ^ U , R , D : M−1.30637.093
SD0.0170.423
AE0.002−0.151
RMSE0.0180.403
l o g B F ^ R , U : M0.415−40.079
SD0.2010.598
AE−0.813−2.626
RMSE0.8502.753
l o g B F ^ R , U , D : M1.304−37.341
SD0.0320.258
AE0.0003−0.137
RMSE0.0280.265
Notes: Results obtained for T = 1000 observations from a Student t distribution with mean μ, precision equal to 1 and 8 degrees of freedom. Meaning of symbols used the same as in Table 1.
Table 3. Mean (M), standard deviation (SD), average error (AE; CAME—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 2000 .
Table 3. Mean (M), standard deviation (SD), average error (AE; CAME—estimated), and root mean square error (RMSE) in the Student t model. The number of observations and of latent variables is equal to T = 2000 .
Estimator and Its Sample Characteristicsfor μ = 0 for μ = 1 / 4
log BF U , R log BF R , U log BF U , R log BF R , U
CAME−1.5001.50023.837−23.837
l o g B F ^ U , R : M−1.48222.683
SD0.0170.53
AE0.021−1.204
RMSE0.0301.321
l o g B F ^ U , R , D : M−1.48023.638
SD0.0160.371
AE0.024−0.134
RMSE0.0310.325
l o g B F ^ R , U : M0.451−25.521
SD0.1080.517
AE−1.070−1.715
RMSE1.0721.755
l o g B F ^ R , U , D : M1.497−23.874
SD0.0390.226
AE−0.010−0.051
RMSE0.0390.283
Notes: Results obtained for T = 2000 observations from a Student t distribution with mean μ, precision equal to 1 and 8 degrees of freedom. Meaning of symbols used the same as in Table 1.
Table 4. Estimates of Bayes factors for the IG-MSF-SBEKK ( M U ) and t-SBEKK ( M R ) models.
Table 4. Estimates of Bayes factors for the IG-MSF-SBEKK ( M U ) and t-SBEKK ( M R ) models.
EstimatorExchange Rates, T = 1482 Stock Indices, T = 1742
λ a = 1 λ a = 3 λ a = 1 λ a = 3
log B F ^ U , R , D 1.1471.191−0.1100.400
(NSE)(0.017)(0.024)(0.012)(0.014)
log B F ^ R , U , D −1.134−1.1820.092−0.388
(NSE)(0.009)(0.019)(0.011)(0.024)
log SD ratio−1.159−1.1670.118−0.396
Notes: The Bayes factors were estimated with Monte Carlo sampler based on 2,000,000 iterations. The log B F ^ R , U , D denotes the decimal logarithm of the estimator of the Bayes factor in favor of model MR against model MU. NSE denotes the numerical standard error and the log SD ratio stands for the decimal logarithm of the Savage–Dickey density ratio. Scale for the strength of evidence against MR [45]: 0 < log BFU,R ≤ 1/2—negligible; 1/2 < log BFU,R ≤ 1—mild; 1 < log BFU,R ≤ 2—strong; 2 < log BFU,R—very strong.
Table 5. Estimates of Bayes factors for the IG-MSF-SBEKK models with different prior distributions of the number of degrees of freedom.
Table 5. Estimates of Bayes factors for the IG-MSF-SBEKK models with different prior distributions of the number of degrees of freedom.
Exchange Rates, T = 1482 Stock Indices, T = 1742
Estimator ϕ 0 ϕ = 0 ϕ 0 ϕ = 0
(IG-MSF-SBEKK)(t-SBEKK)(IG-MSF-SBEKK)(t-SBEKK)
B F ^ λ a = 1 , λ a = 3 , D 8.1239.5170.5241.487
log B F ^ λ a = 1 , λ a = 3 , D 0.9100.979−0.2810.172
B F ^ λ a = 3 , λ a = 1 , D 0.1230.1051.8500.622
log B F ^ λ a = 3 , λ a = 1 , D −0.909−0.9780.269−0.206
Notes: The Bayes factors were estimated with Monte Carlo sampler based on 2,000,000 iterations. The log B F ^ λ a = 1 , λ a = 3 , D denotes the decimal logarithm of the estimator of the Bayes factor in favor of the IG-MSF-SBEKK model with λ a = 1 against the IG-MSF-SBEKK model with λ a = 3 .
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Pajor, A. New Estimators of the Bayes Factor for Models with High-Dimensional Parameter and/or Latent Variable Spaces. Entropy 2021, 23, 399. https://doi.org/10.3390/e23040399

AMA Style

Pajor A. New Estimators of the Bayes Factor for Models with High-Dimensional Parameter and/or Latent Variable Spaces. Entropy. 2021; 23(4):399. https://doi.org/10.3390/e23040399

Chicago/Turabian Style

Pajor, Anna. 2021. "New Estimators of the Bayes Factor for Models with High-Dimensional Parameter and/or Latent Variable Spaces" Entropy 23, no. 4: 399. https://doi.org/10.3390/e23040399

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop