New Estimators of the Bayes Factor for Models with High-Dimensional Parameter and/or Latent Variable Spaces

Formal Bayesian comparison of two competing models, based on the posterior odds ratio, amounts to estimation of the Bayes factor, which is equal to the ratio of the two models' marginal data density values. In models with a large number of parameters and/or latent variables, these densities are given by high-dimensional integrals, which are often computationally infeasible. Therefore, other methods of evaluating the Bayes factor are needed. In this paper, a new method of estimation of the Bayes factor is proposed. Simulation examples confirm good performance of the proposed estimators. Finally, these new estimators are used to formally compare different hybrid Multivariate Stochastic Volatility–Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models, which have a large number of latent variables. The empirical results show, among other things, that the validity of reducing the hybrid MSV-MGARCH model to the MGARCH specification depends on the analyzed data set as well as on the prior assumptions about model parameters.


Introduction
The Bayes factor, defined as the ratio of the marginal likelihoods of the two models being compared, is an important quantity in Bayesian model comparison and hypothesis testing; see, e.g., in [1]. The posterior odds ratio, used for comparing two competing models, is equal to the product of the prior odds and the Bayes factor. Therefore, one of the challenges for Bayesians is to accurately estimate the Bayes factor, especially in models with a large number of parameters and/or latent variables. The most popular methods of calculating the Bayes factor, based on estimation of the marginal likelihood of each model separately, are very time-consuming (or even computationally infeasible) in high-dimensional cases. Therefore, methods for direct estimation of the Bayes factor, instead of estimating the two marginal likelihoods, have been devised. There already exist different approaches, such as the product-space approach, reversible jump MCMC, and path sampling (see, e.g., in [2][3][4][5]), which make it possible to estimate the Bayes factor without calculation of the marginal densities of the data, but in many empirical cases (especially in the case of a large number of model parameters) they require extremely complicated and extensively time-consuming operations, with no general guarantees of success.
Note that for nested models with a large number of latent variables, such as stochastic volatility models, Jacquier et al. [6] have shown how to estimate the Bayes factor directly using Markov Chain Monte Carlo (MCMC) simulations from the posterior distribution. They pointed out that the Bayes factor can be expressed as the expected value of the ratio of corresponding densities with respect to the posterior distribution of parameters and latent variables. This expected value can be estimated by averaging over the MCMC draws. Thus, using the special structure of stochastic volatility models and exploiting prior assumptions about parameters, they have estimated Bayes factors for comparing the basic stochastic volatility model to the models with fat tails and correlated errors.

Throughout the paper, the following notation is used:
1. the space of parameters is denoted by Θ, and p(θ) is a prior density function of the parameters collected in θ ∈ Θ;
2. the space of latent variables is denoted by H, and p(h|θ) is a prior density function of the latent variables collected in h ∈ H; and
3. y is a vector of observations.
The marginal data density, p(y), is defined as the integral (calculated over the whole space of parameters and latent variables, Θ × H) of the conditional data density given the vector of parameters and latent variables, p(y|θ, h), with respect to the prior distribution of the parameters and latent variables:

p(y) = ∫_{Θ×H} p(y|θ, h) p(h|θ) p(θ) dh dθ.

In the majority of models it is impossible to analytically integrate out parameters and latent variables from the joint distribution of y, h, and θ, p(y, h, θ). This very often results from the number of dimensions of the space of the latent variables and parameters being too large. Furthermore, a correct assessment of the marginal likelihood is computationally challenging (see, e.g., in [7][8][9], where summaries of various methods can be found). In the presence of a large number of latent variables (e.g., in stochastic volatility models), popular methods of Monte Carlo estimation of p(y) for the observed vector y are computationally extremely intensive, and often they turn out to be infeasible. Fortunately, we can consider the ratio of marginal data densities instead of p(y). In fact, the main criterion of comparison of two competing models M_i and M_j is the posterior odds ratio between the two models:

P(M_i|y) / P(M_j|y) = BF_ij × p(M_i)/p(M_j),

where BF_ij = p(y|M_i)/p(y|M_j) is the Bayes factor and p(M_i)/p(M_j) is the prior odds ratio. Thus, if the prior odds ratio is equal to 1 (i.e., both models are equally probable a priori, p(M_i) = p(M_j)), the posterior odds ratio between the two models is equal to the Bayes factor. Moreover, if we assume that the set of models {M_1, . . . , M_m} is exhaustive, Bayes factors can be used to obtain the posterior model probabilities:

P(M_i|y) = p(M_i) BF_ik / Σ_{l=1}^{m} p(M_l) BF_lk, for i, k ∈ {1, . . . , m}.

By choosing k = i, we obtain

P(M_i|y) = p(M_i) / Σ_{l=1}^{m} p(M_l) BF_li, for i ∈ {1, . . . , m}.

Thus, the Bayes factors can be used instead of the marginal data density values. Note that in many situations it is easier to estimate the Bayes factor than the marginal likelihood.
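The conversion from Bayes factors to posterior model probabilities can be coded directly. Below is a minimal sketch (the function name and interface are our own, not from the paper); it works on decimal or natural logs of Bayes factors against a common reference model and subtracts the maximum before exponentiating for numerical stability:

```python
import numpy as np

def posterior_model_probs(log_bf_vs_ref, log_prior_odds_vs_ref=None):
    """Posterior model probabilities from Bayes factors versus a reference model.

    log_bf_vs_ref[i] = log BF_{i,ref}. If log prior odds are omitted, all
    models are taken to be equally probable a priori.
    """
    log_w = np.asarray(log_bf_vs_ref, dtype=float)
    if log_prior_odds_vs_ref is not None:
        log_w = log_w + np.asarray(log_prior_odds_vs_ref, dtype=float)
    log_w = log_w - log_w.max()   # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()
```

For example, with equal prior probabilities and BF_21 = 3, the two posterior probabilities are 0.25 and 0.75.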
In this paper, we show how to estimate the Bayes factor in models with a large number of parameters and/or latent variables, in which calculation of the marginal data density value is numerically impossible. It will be shown that the Bayes factor is equal to the posterior mean, restricted to a certain subset D of the parameter and latent variable space, of the ratio of conditional densities of the corresponding quantities, times the reciprocal of the posterior probability of the subset D. This fact leads to an arithmetic mean estimator of the ratio based on simulation from the posterior distribution restricted to the subset D.
It is well known that the Savage-Dickey density ratio and its generalizations (see in [10][11][12]) are relatively simple and widely used methods for computing Bayes factors for nested models. The Savage-Dickey method requires an estimate of the value of the posterior density at a single point, and it is not reliable when this point lies in the tail of the posterior distribution. Moreover, as has already been mentioned, the Savage-Dickey method can be used only for nested models. Our method can be applied to both nested and non-nested ones.
Note that although the posterior odds principle is fundamental, there are other Bayesian approaches to model comparison or hypothesis testing, e.g., the so-called Lindley type test (see [13]) or the Full Bayesian Significance Test (FBST), introduced by Pereira and Stern [14] as a Bayesian significance test for precise (sharp) hypotheses. A solid theoretical background of the FBST can be found in [15]. Detailed discussions and extensions of the Pereira-Stern test as well as of its applications are presented in, e.g., [16][17][18][19][20][21]. Our approach to Bayesian model comparison (or comparing hypotheses) is based on posterior probabilities associated with models (or hypotheses). This leads directly to the so-called posterior odds approach. Motivations for the formal Bayesian approach to model selection and for the use of Bayes factors are discussed in [22]. First of all, the Bayes factors and posterior model probabilities are easy to understand. Moreover, this approach is consistent, penalizes model complexity, remains conceptually the same regardless of the number of models (or hypotheses) under consideration, and does not require nested models or regular asymptotics. This approach allows one not only to test hypotheses, but also to compare them: "the process of revising prior probabilities associated with alternative hypotheses does not necessarily involve a decision to reject or accept these hypotheses" (see [1], p. 291). Furthermore, the Bayes factors make it possible to compare models whose prior distributions are different. Finally, the posterior probabilities of models or the Bayes factors can be used in the so-called Bayesian pooling approach (or Bayesian model averaging; see, e.g., in [23]).

This paper is organized as follows. In Section 2, our new method is presented. Section 3 contains the simulation study. In Section 4, we present results of the comparison of hybrid MSV-MGARCH models.
The last section contains conclusions and directions for further research.

New Estimators of the Bayes Factor
It is easy to show that the Bayes factor in favor of the model M_i against the model M_j can be expressed as a ratio of integrals:

BF_ij = [∫_{Θ_i×H_i} p(y|θ_i, h_i, M_i) p(h_i|θ_i, M_i) p(θ_i|M_i) dh_i dθ_i] / [∫_{Θ_j×H_j} p(y|θ_j, h_j, M_j) p(h_j|θ_j, M_j) p(θ_j|M_j) dh_j dθ_j],

where p(y|θ_i, h_i, M_i), p(h_i|θ_i, M_i), and p(θ_i|M_i) denote the conditional probability density function of the vector of observations (conditional sampling density), the conditional probability density function of latent variables, and the prior density of parameters in the model M_i, respectively. Note that when the two competing models M_i and M_j have the same parameter vector and the same vector of latent variables (i.e., θ_i = θ_j = θ and h_i = h_j = h), then the Bayes factor can be computed in a relatively straightforward way which does not require estimation of marginal data density values for each model separately. Of course, it is possible only if draws from one of the posterior distributions are available (see in [24]). We have

BF_ij = E[r_ij(θ, h) | y, M_j] = ∫_{Θ×H} r_ij(θ, h) p(θ, h|y, M_j) dh dθ,

where

r_ij(θ, h) = [p(y|θ, h, M_i) p(h|θ, M_i) p(θ|M_i)] / [p(y|θ, h, M_j) p(h|θ, M_j) p(θ|M_j)].

Therefore, given the sample {(θ^(q), h^(q))}_{q=1}^{k} from the posterior distribution p(θ, h|y, M_j), an estimator of BF_ij can be expressed as

BF^_ij = (1/k) Σ_{q=1}^{k} r_ij(θ^(q), h^(q)).   (7)

As was pointed out in [24], it is crucial that the variability of the ratio r_ij(θ, h) be small under the posterior distribution p(θ, h|y, M_j); otherwise, estimates of BF_ij might be driven by a few values of h^(q) and θ^(q), and thus an extremely large simulation sample could be required to obtain an adequate result. To deal with this problem, we propose a certain modification of the estimator (7). The idea of this modification is based on trimming the posterior sample to a certain subset of the parameter and latent variable space, similarly to the idea of the correction of the arithmetic mean estimator proposed in [25]. Let us assume that D ⊆ Θ × H and 0 < Pr(D|y, M_i) < ∞.
The equality

Pr(D|y, M_i) = ∫_D p(θ, h|y, M_i) dh dθ

implies that

∫_D p(y|θ, h, M_i) p(h|θ, M_i) p(θ|M_i) dh dθ = Pr(D|y, M_i) p(y|M_i).   (8)

We can see immediately that

BF_ij = [1/Pr(D|y, M_i)] E[I_D(θ, h) r_ij(θ, h) | y, M_j].   (9)

Equality (9) means that the Bayes factor can be expressed as a product of the reciprocal of the posterior probability of the subset D in the model M_i, Pr(D|y, M_i), and of the expected value of the indicator function of the subset D times the ratio r_ij(θ, h), i.e., I_D(θ, h) r_ij(θ, h). This expected value is calculated with respect to the posterior distribution of the model parameters and latent variables in the model M_j. Therefore, given the sample {(h^(q), θ^(q))}_{q=1}^{k} from the posterior distribution p(θ, h|y, M_j), an estimator of BF_ij can be expressed as

BF^_ij = [1/Pr^(D|y, M_i)] (1/k) Σ_{q=1}^{k} I_D(θ^(q), h^(q)) r_ij(θ^(q), h^(q)),   (10)

where Pr^(D|y, M_i) is an assessment of the posterior probability of the subset D in the model M_i. Unfortunately, this assessment also requires sampling from the posterior distribution in the model M_i. Note that equality (9) can obviously be written as

BF_ij = [Pr(D|y, M_j)/Pr(D|y, M_i)] E[r_ij(θ, h) | D, y, M_j],   (11)

provided that 0 < Pr(D|y, M_j) < ∞. This equality suggests that the Bayes factor can be estimated by the product of the ratio of posterior probabilities of the subset D and the sample arithmetic mean of the ratio r_ij(θ, h):

BF^_ij = [Pr^(D|y, M_j)/Pr^(D|y, M_i)] (1/k) Σ_{q=1}^{k} r_ij(θ^(q), h^(q)),   (12)

based on {(h^(q), θ^(q))}_{q=1}^{k} drawn from the posterior distribution given by p(θ, h|y, M_j), truncated to the subset D, that is, p(θ, h|D, y, M_j).
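Both the plain arithmetic mean of the ratio and its trimmed-and-corrected variant can be illustrated in a deliberately simple conjugate setting of our own construction (not from the paper), in which the exact Bayes factor is available for checking: two models share a scalar parameter θ with prior N(0, 1), and differ only in the sampling variance of a single observation y.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
y = 1.0   # single observation; hypothetical models:
          # M_i: y | theta ~ N(theta, 1),  M_j: y | theta ~ N(theta, 4)

def norm_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def norm_cdf(x, m, v):
    return 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2 * v)))

# Exact Bayes factor: the marginals are y ~ N(0, 2) under M_i, y ~ N(0, 5) under M_j
bf_exact = norm_pdf(y, 0.0, 2.0) / norm_pdf(y, 0.0, 5.0)

# Conjugate posteriors of theta (Normal in both models)
m_i, v_i = y / 2.0, 0.5            # under M_i: precision 1 + 1
m_j, v_j = 0.25 * y / 1.25, 0.8    # under M_j: precision 1 + 1/4

# r_ij(theta): the priors are identical and cancel, leaving the sampling densities
r = lambda th: norm_pdf(y, th, 1.0) / norm_pdf(y, th, 4.0)

# Plain arithmetic mean of the ratio over draws from p(theta | y, M_j)
th_j = rng.normal(m_j, math.sqrt(v_j), 200_000)
r_vals = np.array([r(t) for t in th_j])
bf_plain = r_vals.mean()

# Trimmed-and-corrected variant: restrict draws to D = [-1, 1.5] and rescale by
# the ratio of the posterior probabilities of D under the two models
lo, hi = -1.0, 1.5
in_D = (th_j >= lo) & (th_j <= hi)
pD_j = norm_cdf(hi, m_j, v_j) - norm_cdf(lo, m_j, v_j)
pD_i = norm_cdf(hi, m_i, v_i) - norm_cdf(lo, m_i, v_i)
bf_trimmed = (pD_j / pD_i) * r_vals[in_D].mean()
```

In this toy case both probabilities of D are available in closed form; in the high-dimensional models targeted by the paper they would themselves be estimated as the fraction of posterior draws falling in D.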
As a by-product of our considerations, we have just shown that if a Monte Carlo sampler only visits a subset of the support of the posterior distribution, the correction of the sample arithmetic mean of the ratio r ij (θ,h) is needed. Now, we will extend our analysis to models which contain two groups of parameters: the parameters common to both models and parameters specific to only one of them. Moreover, additional assumptions pertaining to conditional probability density functions will be discussed.
Let us assume that (i) the parameter vector of the (unrestricted) model M_i = M_U is θ_U = (θ_A′, θ_R′)′ ∈ Θ_A × Θ_R, while the parameter vector of the (restricted) model M_j = M_R is θ_R ∈ Θ_R; (ii) p(θ_A, θ_R|M_U) = p(θ_A|M_U) p(θ_R|M_U), i.e., the random vectors θ_R and θ_A are a priori independent; (iii) ∫_{Θ_A} p(θ_A|M_U) dθ_A = 1, i.e., the prior distribution for the vector of parameters θ_A is proper; and (iv) h_i = h_j = h, H_i = H_j = H, i.e., the competing models have the same vector of latent variables.

Different Prior Distributions of Latent Variables in Both Models
In this section, we additionally assume that (v) p(y|θ_A, θ_R, h, M_U) = p(y|θ_R, h, M_R), i.e., the competing models have the same conditional sampling density, and (vi) p(θ_R|M_U) = p(θ_R|M_R), i.e., in both models, the common parameters have the same prior distribution.
Under the above assumptions, the Bayes factor in favor of the model M_R versus the model M_U is given by the following integral:

BF_RU = ∫_{Θ_A×Θ_R×H} [p(h|θ_R, M_R) / p(h|θ_A, θ_R, M_U)] p(θ_A, θ_R, h|y, M_U) dh dθ_R dθ_A.

Thus, the estimator of the Bayes factor takes the form

BF^_{R,U,h} = (1/k) Σ_{q=1}^{k} p(h^(q)|θ_R^(q), M_R) / p(h^(q)|θ_A^(q), θ_R^(q), M_U),

where {(θ_A^(q), θ_R^(q), h^(q))}_{q=1}^{k} are drawn from the posterior distribution of parameters and latent variables in the model M_U, i.e., from the distribution given by p(θ_A, θ_R, h|y, M_U).
On the other hand, it is easy to show that the Bayes factor in favor of the model M_U against the model M_R is as follows:

BF_UR = ∫_{Θ_A×Θ_R×H} [p(h|θ_A, θ_R, M_U) / p(h|θ_R, M_R)] p(θ_R, h|y, M_R) p(θ_A|M_U) dh dθ_R dθ_A,

and consequently

BF^_{U,R,h} = (1/k) Σ_{q=1}^{k} p(h^(q)|θ_A^(q), θ_R^(q), M_U) / p(h^(q)|θ_R^(q), M_R),

where {(θ_R^(q), h^(q))}_{q=1}^{k} and {θ_A^(q)}_{q=1}^{k} are drawn from the posterior distribution p(θ_R, h|y, M_R) and from the prior distribution p(θ_A|M_U), respectively. However, if the dimension of the vector (θ′, h′)′ is high, then the estimators BF^_{U,R,h} and BF^_{R,U,h} tend to suffer from the so-called "simulation pseudo-bias", similarly to the harmonic mean estimator (see in [26]). This "simulation pseudo-bias" of the estimators will be illustrated on the basis of simulation studies in the next section. To deal with the problem of "pseudo-bias", we propose drawing from the posterior and prior distributions restricted to a subset of the space of parameters and latent variables with non-zero posterior probability, and correcting the arithmetic mean of the ratio by the posterior probability of the subset visited by the sampler.
Let us assume that D_A ⊆ Θ_A, D_R ⊆ Θ_R × H, 0 < Pr(D_A|M_U) < ∞, 0 < Pr(D_R|y, M_R) < ∞, and 0 < Pr(D_A × D_R|y, M_U) < ∞. Starting from the identity

Pr(D_A × D_R|y, M_U) = ∫_{D_A×D_R} p(θ_A, θ_R, h|y, M_U) dh dθ_R dθ_A,

we obtain

E[p(h|θ_R, M_R)/p(h|θ_A, θ_R, M_U) | D_A × D_R, y, M_U] = [1/Pr(D_A × D_R|y, M_U)] ∫_{D_A×D_R} [p(h|θ_R, M_R)/p(h|θ_A, θ_R, M_U)] p(θ_A, θ_R, h|y, M_U) dh dθ_R dθ_A.

On the basis of the fact that

∫_{D_R} p(y|θ_R, h, M_R) p(h|θ_R, M_R) p(θ_R|M_R) dh dθ_R = Pr(D_R|y, M_R) p(y|M_R),

and under assumptions (i)-(vi), we have

E[p(h|θ_R, M_R)/p(h|θ_A, θ_R, M_U) | D_A × D_R, y, M_U] = BF_RU × Pr(D_A|M_U) Pr(D_R|y, M_R) / Pr(D_A × D_R|y, M_U).

Therefore,

BF_RU = [Pr(D_A × D_R|y, M_U) / (Pr(D_A|M_U) Pr(D_R|y, M_R))] E[p(h|θ_R, M_R)/p(h|θ_A, θ_R, M_U) | D_A × D_R, y, M_U].   (18)

Identity (18) naturally leads to the following estimator of the Bayes factor:

BF^_{R,U,D,h} = [Pr^(D_A × D_R|y, M_U) / (Pr^(D_A|M_U) Pr^(D_R|y, M_R))] (1/k) Σ_{q=1}^{k} p(h^(q)|θ_R^(q), M_R)/p(h^(q)|θ_A^(q), θ_R^(q), M_U),   (19)

where {(θ_A^(q), θ_R^(q), h^(q))}_{q=1}^{k} are drawn from the posterior distribution p(θ_A, θ_R, h|y, M_U) restricted to D_A × D_R. Because the posterior simulation support is the subset D_A × D_R of the parameter and latent variable space, most of the {(θ_A^(q), θ_R^(q), h^(q))}_{q=1}^{k} have "similar" values of the ratio of the conditional densities of latent variables, so that, having the probabilities Pr^(D_A × D_R|y, M_U), Pr^(D_R|y, M_R), and Pr^(D_A|M_U), the simulation process will be numerically more efficient than in the case of the unrestricted space of parameters and latent variables. The estimates will not be dominated by a few values of the ratio. Therefore, a smaller simulation sample will be required to obtain adequate precision of BF^_{R,U,D,h}.

Now suppose that D ⊆ Θ_A × Θ_R × H and 0 < Pr(D|M_U) < ∞, 0 < Pr(D|y, M_U) < ∞. This time we start with the identity

Pr(D|y, M_U) = ∫_{D} p(θ_A, θ_R, h|y, M_U) dh dθ_R dθ_A,

and we obtain

E[p(h|θ_A, θ_R, M_U)/p(h|θ_R, M_R) | D] = [1/Pr_{R,A}(D)] ∫_{D} [p(h|θ_A, θ_R, M_U)/p(h|θ_R, M_R)] p(θ_R, h|y, M_R) p(θ_A|M_U) dh dθ_R dθ_A,

where Pr_{R,A}(D) = ∫_D p(θ_R, h|y, M_R) p(θ_A|M_U) dh dθ_R dθ_A denotes the probability of D under the simulation scheme that draws (θ_R, h) from the posterior in M_R and θ_A from its prior in M_U. In this case,

∫_{D} p(y|θ_A, θ_R, h, M_U) p(h|θ_A, θ_R, M_U) p(θ_A, θ_R|M_U) dh dθ_R dθ_A = Pr(D|y, M_U) p(y|M_U),

and under assumptions (i)-(vi),

E[p(h|θ_A, θ_R, M_U)/p(h|θ_R, M_R) | D] = BF_UR × Pr(D|y, M_U)/Pr_{R,A}(D).

Thus,

BF_UR = [Pr_{R,A}(D)/Pr(D|y, M_U)] E[p(h|θ_A, θ_R, M_U)/p(h|θ_R, M_R) | D].   (22)

Given samples {(θ_{R,D}^(q), h_D^(q))}_{q=1}^{k} and {θ_{A,D}^(q)}_{q=1}^{k} from the distributions p(θ_R, h|y, M_R) and p(θ_A|M_U), respectively, restricted to the subset D, then, as results from (22), an estimator of the Bayes factor for the model M_U against the model M_R can be expressed as

BF^_{U,R,D,h} = [Pr^_{R,A}(D)/Pr^(D|y, M_U)] (1/k) Σ_{q=1}^{k} p(h_D^(q)|θ_{A,D}^(q), θ_{R,D}^(q), M_U)/p(h_D^(q)|θ_{R,D}^(q), M_R).

Different Prior Distributions for Common Parameters in Both Models
Now, let us assume that (v) p(y|θ_A, θ_R, h, M_U) = p(y|θ_R, h, M_R), i.e., the competing models have the same sampling density, and (vii) p(h|θ_A, θ_R, M_U) = p(h|θ_R, M_R), i.e., in both models, the latent variables have the same prior distribution, instead of (vi) p(θ_R|M_U) = p(θ_R|M_R). Then, it is easy to show that in the first case

BF_RU = E[p(θ_R|M_R)/p(θ_R|M_U) | y, M_U],   (25)

where the expectation is taken with respect to the posterior distribution p(θ_A, θ_R, h|y, M_U), and in the second case

BF_UR = ∫_{Θ_A×Θ_R×H} [p(θ_R|M_U)/p(θ_R|M_R)] p(θ_R, h|y, M_R) p(θ_A|M_U) dh dθ_R dθ_A.   (26)

Basing on identities (25) and (26), the following estimators of the Bayes factors can be formulated:

BF^_{R,U,D,θ} = [Pr^(D|y, M_U)/Pr^_{R,A}(D)] (1/k) Σ_{q=1}^{k} p(θ_R^(q)|M_R)/p(θ_R^(q)|M_U),   (27)

BF^_{U,R,D,θ} = [Pr^_{R,A}(D)/Pr^(D|y, M_U)] (1/k) Σ_{q=1}^{k} p(θ_R^(q)|M_U)/p(θ_R^(q)|M_R),   (28)

where, in (27), the draws {(θ_A^(q), θ_R^(q), h^(q))}_{q=1}^{k} come from the posterior distribution p(θ_A, θ_R, h|y, M_U) restricted to D, while, in (28), {(θ_R^(q), h^(q))}_{q=1}^{k} and {θ_A^(q)}_{q=1}^{k} are drawn from p(θ_R, h|y, M_R) and p(θ_A|M_U), respectively, restricted to D; Pr_{R,A}(D) denotes the probability of D under the latter simulation scheme.

Different Conditional Sampling Distributions of the Vector of Observations
Now we assume that conditions (i)-(iv), (vi), and (vii) hold. Thus, the conditional distributions of the observable vector can be different, but the prior distributions of the vector of latent variables and of the common parameters are the same in both competing models. Then, it is easy to show that

BF_RU = E[p(y|θ_R, h, M_R)/p(y|θ_A, θ_R, h, M_U) | y, M_U],   (29)

where the expectation is taken with respect to the posterior distribution p(θ_A, θ_R, h|y, M_U), and for the reciprocal of this Bayes factor we have

BF_UR = ∫_{Θ_A×Θ_R×H} [p(y|θ_A, θ_R, h, M_U)/p(y|θ_R, h, M_R)] p(θ_R, h|y, M_R) p(θ_A|M_U) dh dθ_R dθ_A.   (30)

Similarly to the above, basing on identities (29) and (30), the following estimators of the Bayes factors can be formulated:

BF^_{R,U,D,y} = [Pr^(D|y, M_U)/Pr^_{R,A}(D)] (1/k) Σ_{q=1}^{k} p(y|θ_R^(q), h^(q), M_R)/p(y|θ_A^(q), θ_R^(q), h^(q), M_U),   (31)

BF^_{U,R,D,y} = [Pr^_{R,A}(D)/Pr^(D|y, M_U)] (1/k) Σ_{q=1}^{k} p(y|θ_A^(q), θ_R^(q), h^(q), M_U)/p(y|θ_R^(q), h^(q), M_R),   (32)

where, in (31), the draws come from the posterior distribution p(θ_A, θ_R, h|y, M_U) restricted to D, while, in (32), {(θ_R^(q), h^(q))}_{q=1}^{k} and {θ_A^(q)}_{q=1}^{k} are drawn from p(θ_R, h|y, M_R) and p(θ_A|M_U), respectively, restricted to D; Pr_{R,A}(D) again denotes the probability of D under the latter simulation scheme. Note that the estimators (19), (22), (27), (28), (31), and (32) are based on the arithmetic mean of the ratio of densities times the indicator function of an arbitrary subset of the space of model parameters and latent variables. Additionally, this arithmetic mean is corrected by the reciprocal of the posterior probability of the subset.

Simulation Study
In this section, we present a simple simulation study for models with latent variables in which, after having integrated out latent variables analytically, the true values of the marginal likelihoods can easily be assessed using, e.g., the corrected arithmetic mean estimator (CAME [25]). These assessments will be applied to obtain estimates of Bayes factors used as benchmark values for evaluation of the performance of the new method proposed.
Let us consider the Student t model:

y_t | µ, v ~ iid t(µ, 1, v), t = 1, . . . , T,
µ ~ N(µ_0, σ_0²),
v ~ Exp(λ_0),   (33)

where t(µ, 1, v) denotes the Student t distribution with mode µ, precision 1, and v degrees of freedom; N(µ_0, σ_0²) denotes the Normal distribution with mean µ_0 and variance σ_0²; in turn, Exp(λ_0) stands for the Exponential distribution with mean λ_0^{-1} and variance λ_0^{-2}. The symbol iid stands for independent and identically distributed. Thus, the random variables y_1, . . . , y_T are independent and have the same Student t distribution.
The Student t distribution can be expressed as a scale mixture of Gaussian distributions by introducing a latent variable h_t whose reciprocal is inverse-gamma distributed. The model can be written as

y_t | µ, h_t ~ N(µ, h_t^{-1}), h_t | v ~ iid G(v/2, v/2), t = 1, . . . , T,   (34)

where G(v/2, v/2) denotes the gamma distribution with shape v/2 and rate v/2.
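The scale-mixture representation is easy to verify by simulation. In the sketch below (parameter values are arbitrary), drawing a precision h_t ~ G(v/2, v/2) and then y_t | h_t ~ N(µ, 1/h_t) reproduces the Student t distribution, as the match with the t variance v/(v − 2) shows:

```python
import numpy as np

rng = np.random.default_rng(42)
n, mu, v = 1_000_000, 0.0, 8.0

# h ~ G(v/2, rate v/2); NumPy uses the scale parametrization, so scale = 2/v.
# The reciprocal 1/h is then inverse-gamma distributed.
h = rng.gamma(shape=v / 2, scale=2 / v, size=n)
y = mu + rng.standard_normal(n) / np.sqrt(h)

# For v > 2 the Student t(mu, 1, v) distribution has variance v / (v - 2)
print(y.var(), v / (v - 2))
```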
To simulate datasets, we generated samples of size T = 500, 1000, 2000 data points from model (33) with µ = 0, 0.25, 0.5, v = 8, and µ_0 = 0, σ_0² = 1, λ_0 = 0.1. In turn, to simulate from the posterior distribution, the Gibbs algorithm was used and run for 20,000 iterations. The conditional posterior distribution for µ is Normal, i.e.,

µ | h, v, y ~ N(µ̄, σ̄²),

where σ̄² = (σ_0^{-2} + Σ_{t=1}^{T} h_t)^{-1} and µ̄ = σ̄² (µ_0/σ_0² + Σ_{t=1}^{T} h_t y_t), while the conditional posterior distribution for h_t is Gamma:

h_t | µ, v, y ~ G((v + 1)/2, (v + (y_t − µ)²)/2).

An easy computation shows that the conditional posterior distribution of v is nonstandard:

p(v | µ, h, y) ∝ (v/2)^{Tv/2} Γ(v/2)^{−T} exp{−v [λ_0 + (1/2) Σ_{t=1}^{T} (h_t − ln h_t)]}.

However, reliable numerical methods for generating from this distribution do exist. We use one of them, the rejection sampling proposed by [27].
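The Gibbs steps above can be sketched as follows. For the nonstandard conditional of v we substitute a simple griddy-Gibbs step (sampling v on a fixed grid) in place of the rejection sampler used in the paper; the data size, grid, and run length are illustrative only:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(1)

# Simulated data from the Student t model (small T for speed)
T, mu_true, v_true = 300, 0.25, 8.0
h_true = rng.gamma(v_true / 2, 2 / v_true, T)
y = mu_true + rng.standard_normal(T) / np.sqrt(h_true)

mu0, s20, lam0 = 0.0, 1.0, 0.1                 # prior hyperparameters as in the text
v_grid = np.linspace(2.1, 40.0, 400)           # grid for the griddy-Gibbs step on v
lgam = np.array([lgamma(x / 2) for x in v_grid])

mu, v, h = 0.0, 10.0, np.ones(T)
mu_draws, v_draws = [], []
for it in range(3000):
    # mu | h, v, y ~ Normal: precision-weighted combination of prior and data
    prec = 1 / s20 + h.sum()
    mu = rng.normal((mu0 / s20 + (h * y).sum()) / prec, 1 / np.sqrt(prec))
    # h_t | mu, v, y ~ G((v + 1)/2, rate (v + (y_t - mu)^2)/2)
    h = rng.gamma((v + 1) / 2, 2 / (v + (y - mu) ** 2), T)
    # v | h: nonstandard; evaluate the log-density on the grid and sample from it
    logp = (-lam0 * v_grid
            + T * (v_grid / 2) * np.log(v_grid / 2) - T * lgam
            - (v_grid / 2) * (h - np.log(h)).sum())
    p = np.exp(logp - logp.max())
    v = rng.choice(v_grid, p=p / p.sum())
    if it >= 1000:                              # discard burn-in
        mu_draws.append(mu)
        v_draws.append(v)

print(np.mean(mu_draws), np.mean(v_draws))
```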
In the model under consideration, θ_U = (µ, v)′, θ_R = v, and h = (h_1, . . . , h_T)′; thus, the number of latent variables is equal to the number of observations: 500, 1000, and 2000, respectively. The direct computation of the marginal likelihood for model (34) is intensive and unstable due to the presence of latent variables. However, by integrating out the latent variables from the joint distribution of parameters, latent variables, and data, p(y|h, µ, v)p(h, µ, v), the conditional density of the data given the parameters only can be written in the following form:

p(y | µ, v) = Π_{t=1}^{T} [Γ((v + 1)/2) / (Γ(v/2) √(πv))] [1 + (y_t − µ)²/v]^{−(v+1)/2}.

Unfortunately, the marginal density of the data (i.e., p(y)) cannot be presented in closed form. However, due to the absence of latent variables as well as the small number of parameters in this representation, the marginal data density value can easily and precisely be evaluated by using the corrected arithmetic mean estimator (CAME [25]). Estimates obtained by the use of the CAME are treated as benchmark values.
In order to show the performance of our new estimators of the Bayes factor, we assume that the subset D is the intersection of two sets: the subset of the space of parameters and latent variables obtained from the condition that the ratio of the conditional density functions p(y|θ_U, h, M_U) and p(y|θ_R, h, M_R) lies between the lowest and the highest values of the ratio evaluated on the basis of the pseudo-random samples {θ^(q), h^(q)}_{q=1}^{k} in both models, and the hyper-cuboid limited by the range of the sampler output. In the restricted model (M_R), it is assumed that µ = 0, whereas in the unrestricted model (M_U), µ ≠ 0. The ratio of the density functions of the conditional distributions of the observable vector is as follows:

p(y|θ_U, h, M_U) / p(y|θ_R, h, M_R) = exp{µ Σ_{t=1}^{T} h_t y_t − (µ²/2) Σ_{t=1}^{T} h_t}.

In Tables 1-3, we present results obtained by using the newly proposed estimators of Bayes factors (i.e., the corrected arithmetic means of the ratio of the densities of the conditional distributions of the observable vector, BF^_{U,R,D,y} and BF^_{R,U,D,y}) and the uncorrected arithmetic means of the ratios, BF^_{U,R,y} and BF^_{R,U,y}. Tables 1-3 present means, standard deviations, root mean square errors, and average errors (relative to the CAM estimates) of the decimal logarithm of the estimates obtained in the models under consideration. As mentioned earlier, closed-form expressions for the marginal likelihoods do not exist. Therefore, the root mean square errors and average errors are determined relative to the CAM estimates obtained for each marginal likelihood separately.
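Writing out the ratio of the two Normal sampling densities (with µ = 0 under M_R) gives exp{µ Σ h_t y_t − (µ²/2) Σ h_t}, since the normalizing constants depend only on the shared precisions h_t and cancel. This algebra can be checked numerically on arbitrary test inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
T, mu = 50, 0.25
h = rng.gamma(4.0, 0.25, T)      # arbitrary positive precisions h_t
y = rng.standard_normal(T)       # arbitrary data

# Direct log-ratio of the conditional sampling densities (variances 1/h_t);
# the normalizing constants are identical under both models and cancel
log_ratio_direct = -0.5 * (h * (y - mu) ** 2).sum() + 0.5 * (h * y ** 2).sum()

# Closed form: mu * sum(h_t y_t) - (mu^2 / 2) * sum(h_t)
log_ratio_closed = mu * (h * y).sum() - 0.5 * mu ** 2 * h.sum()

print(log_ratio_direct, log_ratio_closed)
```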
To estimate Pr(D|y, M_U), Pr(D_R|y, M_R), and Pr(D_A|M_U), we use simulation from the appropriate posterior or prior distributions; e.g.,

Pr^(D|y, M_U) = (1/k) Σ_{q=1}^{k} I_D(θ_A^(q), θ_R^(q), h^(q)),

where {(θ_A^(q), θ_R^(q), h^(q))}_{q=1}^{k} are drawn from the posterior distribution p(θ_A, θ_R, h|y, M_U). The remaining probabilities are calculated in a similar manner. Furthermore, we consider the uncorrected arithmetic means of the ratios, BF^_{U,R,y} and BF^_{R,U,y}, to investigate the sampling properties of these estimators. Figure 1 indicates that even in such simple models the performance of the uncorrected estimators is not satisfactory. The estimator BF^_{R,U,y} is downwardly "pseudo-biased" (the respective average errors are negative). On the other hand, our simulation study demonstrates that the performance of the estimators BF^_{U,R,y} and BF^_{R,U,y} can be improved by trimming the posterior simulation support to a given subset of the space of latent variables and parameters, and then correcting the arithmetic mean of the ratios using the posterior probabilities of the subset. As can be seen from Tables 1-3 and Figures 1-3, the corrected estimators of the Bayes factors perform very well in comparison to the uncorrected ones. The estimators BF^_{U,R,D,y} and BF^_{R,U,D,y} produce estimates which are very close to those obtained on the basis of the CAME, while the estimators BF^_{U,R,y} and BF^_{R,U,y}, based on the uncorrected arithmetic mean, provide biased estimates. Finally, note that by using both estimators for the ratio of densities and their reciprocals, i.e., BF^_{U,R,D,y} and BF^_{R,U,D,y}, one can check the accuracy of the assessments and the adequacy (from a numerical point of view) of the selection of the subset D. This is because, roughly speaking, the equality BF_UR = 1/BF_RU implies that it is natural to use BF^_{U,R,D,y} and 1/BF^_{R,U,D,y} to estimate BF_UR. Different estimates of BF_UR can indicate numerical inadequacy of the selection of the subset D and/or too small a simulation sample.
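Estimating any of these probabilities amounts to computing the fraction of draws that fall in the subset. A minimal sketch with a known answer (standard Normal draws, D = [−1.96, 1.96], so the true probability is about 0.95):

```python
import numpy as np

def estimate_pr(draws, in_D):
    """Monte Carlo estimate of Pr(D): the fraction of draws for which the
    vectorized membership test `in_D` is True."""
    return np.asarray(in_D(draws), dtype=float).mean()

rng = np.random.default_rng(0)
theta = rng.standard_normal(100_000)
p_hat = estimate_pr(theta, lambda x: (x > -1.96) & (x < 1.96))
print(p_hat)   # close to 0.95
```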

Empirical Illustration: Formal Bayesian Comparison of Hybrid MSV-MGARCH Models
In this section, the new method will be applied in order to formally compare the hybrid Multivariate Stochastic Volatility-Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models and thus to show that it can be used in practice. The hybrid MSV-MGARCH models were proposed in [28][29][30] for modeling financial time series. These hybrid models are characterized by relatively simple model structures that exploit the advantages of both model classes: the flexibility of the Multivariate Stochastic Volatility (MSV) class, where volatility is modeled by latent stochastic processes, and the relative simplicity of the Multivariate GARCH (MGARCH) class. The simplest specification of the MSV-MGARCH model (called LN-MSF-SBEKK) is based on one latent process (Multiplicative Stochastic Factor (MSF)) and on the scalar BEKK [31] covariance structure. The LN-MSF-SBEKK structure is obtained by multiplying the SBEKK conditional covariance matrix H_t by a scalar random variable h_t such that {ln h_t} is a Gaussian AR(1) latent process with autoregression parameter φ. The hybrid LN-MSF-SBEKK specification has been recognized in the literature [32][33][34][35] and has proved to be useful in multivariate modeling of financial time series as well as of macroeconomic data [36][37][38][39][40][41]. The drawback of the LN-MSF-SBEKK process is that it cannot be treated as a direct extension of the MGARCH process with the Student t conditional distribution. When φ = 0, the LN-MSF-SBEKK process reduces to the SBEKK process with the conditional distribution being a continuous mixture of multivariate normal distributions with covariance matrices H_t h_t and h_t log-normally distributed. However, the multivariate Student t distribution can be expressed as a scale mixture of normal distributions with the inverted gamma as the mixing distribution.
This fact was exploited in [42,43], where the IG-MSF-SBEKK specification was proposed as a natural hybrid extension of the SBEKK process with the Student t conditional distribution (t-SBEKK). In the new specification, the latent process {ln h_t} remains an autoregressive process of order one, but it is non-Gaussian. For φ = 0 the latent variables h_t (where t ∈ Z) are independent and have the inverted gamma (IG) distribution. Unfortunately, for φ ≠ 0 the unconditional distribution of the latent variables h_t is unknown. To summarize, the IG-MSF-SBEKK model is obtained by multiplying the SBEKK conditional covariance matrix H_t by a scalar random variable h_t coming from a latent process (with autoregression parameter φ) such that, for φ = 0, h_t has an inverted gamma distribution. Thus, φ = 0 leads to the t-SBEKK specification, in which the conditional distribution is represented as a continuous mixture of multivariate normal distributions with covariance matrices H_t h_t, where h_t is inverted gamma distributed. If φ ≠ 0, the latent variables h_t (t ∈ Z) are dependent, so (in comparison to the t-SBEKK model) in the IG-MSF-SBEKK model there exists an additional source of dependence.
Let us consider a two-dimensional autoregressive process r_t = (r_1,t, r_2,t) defined by the equation

r_t = δ_0 + r_{t−1}∆ + ε_t, t = 1, . . . , T,   (41)

where δ_0 and ∆ are, respectively, 2 × 1 and 2 × 2 matrix parameters, and T is the length of the observed time series. The hybrid MSF-MGARCH specification for the disturbance term ε_t is defined by the following equality:

ε_t = ζ_t (h_t H_t)^{1/2}, ln h_t = φ ln h_{t−1} + ln γ_t, H_t = (1 − β_1 − β_2)A + β_1 ε_{t−1}′ε_{t−1} + β_2 H_{t−1},   (42)

where {ζ_t} is a Gaussian white noise sequence with mean vector zero and unit covariance matrix; {γ_t} is a sequence of independent positive random variables; γ_t is inverted gamma distributed with mean v/(v − 2) for v > 2, i.e., {γ_t} ~ iid IG(v/2, v/2); ζ_t ⊥ γ_s for all t, s ∈ {1, . . . , T}; 0 < |φ| < 1; β_1 and β_2 are positive scalar parameters such that β_1 + β_2 < 1; and A is a free symmetric positive definite matrix of order 2. Under (41) and (42), the conditional distribution of r_t (given the past of r_t and the current latent variable h_t) is Normal with mean vector µ_t = δ_0 + r_{t−1}∆ and covariance matrix Σ_t = H_t h_t. For φ = 0 (this value is excluded in the definition of the hybrid models under consideration), h_t = γ_t, so the distribution of h_t is inverted gamma. In this case, r_t in (41) is, given its past, an IG mixture of two-variate normal distributions N(µ_t, h_t H_t), i.e., it has the two-variate Student t distribution.
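Under these assumptions, a bivariate IG-MSF-SBEKK path can be simulated as in the sketch below, written in column-vector form. All parameter values and the initial conditions (H_0 = A, h_0 = 1) are our own illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
T, v, phi = 1000, 8.0, 0.4                  # illustrative values
beta1, beta2 = 0.05, 0.90                   # beta1 + beta2 < 1
A = np.array([[1.0, 0.3], [0.3, 1.0]])      # symmetric positive definite
delta0 = np.zeros(2)
Delta = 0.1 * np.eye(2)

# gamma_t ~ IG(v/2, v/2): the reciprocal of a Gamma(v/2, rate v/2) draw
gamma = 1.0 / rng.gamma(v / 2, 2 / v, T)

r = np.zeros((T + 1, 2))
eps_prev = np.zeros(2)
H = A.copy()                                # H_0 = A (assumption for this sketch)
ln_h = 0.0                                  # h_0 = 1
for t in range(1, T + 1):
    # SBEKK recursion for the conditional covariance matrix
    H = (1 - beta1 - beta2) * A + beta1 * np.outer(eps_prev, eps_prev) + beta2 * H
    # latent factor: ln h_t = phi * ln h_{t-1} + ln gamma_t
    ln_h = phi * ln_h + np.log(gamma[t - 1])
    # eps_t ~ N(0, h_t * H_t)
    eps = np.linalg.cholesky(np.exp(ln_h) * H) @ rng.standard_normal(2)
    r[t] = delta0 + Delta @ r[t - 1] + eps
    eps_prev = eps

print(r[1:].mean(axis=0))
```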
The assumptions presented so far determine the conditional distribution of the observations and latent variables given the parameters. Thus, it remains to formulate the prior distributions of the parameters. We use the same prior distributions as in [42,43]. The six elements of δ = (δ_0′, vec(∆)′)′ are assumed to be a priori independent of the other parameters, with the N(0, I_6) prior. The matrix A has an inverted Wishart prior distribution such that A^{-1} has the Wishart prior distribution with mean I_2; the elements of β = (β_1, β_2)′ are jointly uniformly distributed over the unit simplex. As regards the initial conditions for H_t, we take H_0 = h_0 I_2 and treat h_0 > 0 as an additional parameter, a priori exponentially distributed with mean 1; φ has the uniform distribution over (−1, 1), and for v we assume the gamma distribution with mean λ_a/λ_v, truncated to (2, +∞). We assume that λ_v = 0.1, with two cases: λ_a = 3 and λ_a = 1 (note that λ_a = 1 leads to the exponential distribution for v).
Furthermore, we use the same bivariate data sets as those modeled in [30,42,43]. The first data set consists of the official daily exchange rates of the National Bank of Poland (NBP fixing rates) for the US dollar and the German mark in the period 1 February 1996-31 December 2001. The length of the modeled time series of their daily growth rates (logarithmic return rates) is 1482. The second data set consists of the daily quotations of the main index of the Warsaw Stock Exchange (WIG) and the S&P500 index of the NYSE; 1727 logarithmic returns from the period 8 January 1999-1 February 2006 are modeled.
In order to obtain pseudo-random sample from the posterior distribution of parameters and latent variables, we use MCMC simulation techniques described in [42,44] and implemented within the GAUSS programming environment.
Using fully Bayesian methods, we want to answer the question of whether the IG-MSF-SBEKK model can be reduced to the t-SBEKK case. Thus, we consider the hypothesis that the scalar parameter φ = 0 (the t-SBEKK model, M_R) against the alternative hypothesis that φ ≠ 0 (the IG-MSF-SBEKK model, M_U). For the exchange rate data, the posterior probability that φ < 0 is only approximately 0.001, and φ = 0 is included in the highest posterior density (HPD) interval of probability content as high as 0.996. Thus, Lindley's procedure leads to the conclusion that the t-SBEKK model is inadequate. But for the stock data, the posterior probability that φ < 0 is 0.017 for λ_a = 3 and 0.054 for λ_a = 1, and φ = 0 is included in the HPD interval of probability content 0.87 or 0.80, depending on the prior hyperparameter λ_a. In the case of the stock data, Lindley's testing procedure thus indicates that the t-SBEKK model cannot be rejected by the data. Now, our new estimators of the Bayes factors will be used. We start with the assumption that the subset D is the intersection of the subset of the space of parameters and latent variables obtained from the condition that the ratio of the conditional density functions p(h|θ_U, M_U) and p(h|θ_R, M_R) lies between the lowest and the highest values of the ratio evaluated at the pseudo-random samples {θ^(q), h^(q)}_{q=1}^{k} in both models, and the hyper-cuboid limited by the range of the sampler output. The estimators are based on the ratio of the densities of the conditional distributions of the vector of latent variables, p(h|θ_U, M_U)/p(h|θ_R, M_R). In Table 4, we present results obtained by using the newly proposed estimators of Bayes factors, i.e., the corrected arithmetic means of this ratio, BF^_{U,R,D} and BF^_{R,U,D}; the subscript h is omitted for convenience.
Results obtained on the basis of our new method confirm that in the case of exchange rates, the t-SBEKK model is strongly rejected by the data, whereas it seems that the t-SBEKK specification can be adequate for stock indices. Under equal prior model probabilities, the IG-MSF-SBEKK model for exchange rates is about 14-15 times more probable a posteriori than the t-SBEKK model, indicating strong (but not very strong) evidence against the t-SBEKK model. In turn, for stock indices the decimal logarithm of the Bayes factor of t-SBEKK relative to IG-MSF-SBEKK depends on the prior distribution of the number of degrees of freedom, v. The Bayes factor in favor of t-SBEKK is equal to about 1.25 and 0.4 in the models with λ_a = 1 and λ_a = 3, respectively. Of course, according to the scale of Kass and Raftery (1995), these results indicate evidence for a given model that is negligible: "not worth more than a bare mention". For the sake of comparison, in the last row of Table 4 the estimates of the Savage-Dickey density ratio are presented. The marginal posterior density of φ is evaluated at φ = 0 on the basis of the histogram of the sampled values of φ from the posterior distribution. The main conclusions pertaining to the validity of the reduction of the IG-MSF-SBEKK model to the t-SBEKK one are very similar.
Additionally, it is very important to stress that our method makes it possible to compare IG-MSF-SBEKK models whose prior distributions are different (e.g., with respect to the number of degrees of freedom). As can be seen from Table 5, for exchange rates the IG-MSF-SBEKK and t-SBEKK models with the exponential prior for v are more probable a posteriori than those with the gamma prior for v with hyperparameter λ_a = 3. On the contrary, for stock indices the IG-MSF-SBEKK model with the hyperparameter λ_a = 3 is more probable a posteriori than the same model with λ_a = 1, but this difference is negligible.

Discussion
In this paper, a new method of estimation of the Bayes factor has been proposed. The idea of the proposed estimators is based on correcting the arithmetic mean estimator of the ratio of conditional distributions: the posterior sample is trimmed to a certain subset, D, of the space of parameters and latent variables, and the arithmetic mean is corrected by the posterior probabilities of this subset. The new method makes it possible to compare a finite number of different models with a large number of parameters and/or latent variables with respect to goodness of fit as measured by their posterior probabilities. Our simulation study and empirical illustration show that the method performs well.
In this paper, the question of an adequate selection of the subset D used in the method proposed is not addressed. The choice of the subset which minimizes the variance of the estimator remains an open question. The simulation study and empirical example considered in the paper indicate that the choice of D as an intersection of the parameter space (with additional restrictions) and the hyper-cuboid limited by the range of the posterior sampler outputs leads to acceptable results.