Article

Bayesian Hierarchical Copula Models with a Dirichlet–Laplace Prior

Department MEMOTEF, Sapienza University of Rome, 00161 Roma, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Stats 2022, 5(4), 1062-1078; https://doi.org/10.3390/stats5040063
Submission received: 22 September 2022 / Revised: 20 October 2022 / Accepted: 26 October 2022 / Published: 1 November 2022
(This article belongs to the Section Bayesian Methods)

Abstract

We discuss a Bayesian hierarchical copula model for clusters of financial time series. A similar approach has been developed in a recent paper; however, the prior distributions proposed there do not always provide a proper posterior. In order to circumvent the problem, we adopt a proper global–local shrinkage prior, which is also able to account for potential dependence structures among different clusters. The performance of the proposed model is illustrated via simulations and a real data analysis.

1. Introduction

There is a large body of literature on hierarchical model settings. The idea of pulling the mean of a single group towards the mean across different groups goes back at least to Kelley [1]. Tiao and Tan [2] and Hill [3] consider the one-way random effects model and discuss a Bayesian approach to the analysis of variance, motivated by the fact that the frequentist unbiased estimator of the variance of the random effects can be negative. For the same model, Stone and Springer [4] discuss and resolve a paradox that arises with the use of Jeffreys' prior. The foundation for the Bayesian hierarchical linear model is established in Lindley and Smith [5]. More recently, Gelman [6] reviews prior distributions for variance parameters in hierarchical models.
Zhuang et al. [7] introduced a hierarchical model in a copula framework; for the variance parameters, they suggest using one of two different priors: (i) the standard improper prior for scale parameters, which is proportional to $\sigma^{-2}$, or (ii) a vaguely informative prior, say an inverse gamma density with both parameters equal to a small value.
However, both of the above proposals can be problematic: in the first case, the posterior is simply not proper (as we show in Appendix A); in the second case, the use of small parameter values in the inverse gamma priors merely hides the problem without actually solving it; see, for example, Berger [8].
Hobert and Casella [9] provide a further review of the effect of improper priors on the Gibbs sampling algorithm.
In this paper, we propose a Bayesian hierarchical copula model using a different prior. In particular, we adopt a global–local shrinkage prior. These prior distributions arise naturally in linear regression with high-dimensional data, where a sparsity constraint on the vector of coefficients is necessary. Several global–local shrinkage families of priors have been proposed: Park and Casella [10] and Hans [11] discuss the Bayesian LASSO; Carvalho et al. [12] introduce the Horseshoe prior; Armagan et al. [13] propose a Generalized Double Pareto prior. Here, we use the Dirichlet–Laplace prior proposed in Bhattacharya et al. [14], with a slight modification: while in a regression framework it is natural to adopt a prior that shrinks the parameters towards zero, this is not the case for our hierarchical copula model, where the zero value has no particular interpretation. For this reason, we introduce a further level of hierarchy, assuming a prior distribution on the location of the shrinkage point.
The rest of this paper is organized as follows. The next section illustrates the statistical model and the prior distribution, highlighting the differences from the approach described in Zhuang et al. [7]; we conclude the section with a description of the sampling algorithm. In the third section, we perform a simulation study comparing the mean squared error of the estimates produced by our model with that of a standard maximum likelihood approach. Then, we reconsider a dataset discussed in Zhuang et al. [7] and compare the results of the two approaches. We conclude with another illustration of the model, applied to the problem of clustering financial time series.

2. Materials and Methods

2.1. The Statistical Model

2.1.1. Likelihood and Prior Distributions

The copula representation recasts a multivariate distribution in such a way that the dependence structure is not influenced by the shape, the parametrization, or the unit of measurement of the marginal distributions. Applications in statistical inference and a review of the most popular approaches can be found in Hofert et al. [15]. In this paper, we consider several different parametric forms of copula functions: in the bivariate case, we use the standard Archimedean families, namely the Joe, Clayton, Gumbel, and Frank copulae; for more than two dimensions, we concentrate on the most popular elliptical versions, namely the Gaussian and Student's t copulae. Since the main objective of the paper is the clusterization of the dependence structure, for the sake of simplicity and without loss of generality, we assume that all marginal distributions are known or, equivalently, that their parameters have been previously estimated. In this way, we can work directly with the transformed variables $U_j = F_{X_j}(x_j)$, $j \in \{1, \ldots, n\}$.
Let $c_i(\cdot \mid \psi_i)$ be the generic copula density function associated with the i-th group. The statistical model can be stated as follows:
$$ (U_{1i}, U_{2i}, \ldots, U_{d_i i}) \mid \psi_i \sim c_i(\cdot \mid \psi_i), \quad i \in \{1, \ldots, m\}, $$
where m denotes the number of groups or clusters. Set
$$ \gamma_i = \log \frac{\psi_i - b_i}{B_i - \psi_i}, $$
and assume
$$ \gamma_i \mid \xi, \tau, \alpha_i \overset{ind}{\sim} \mathrm{Laplace}(\xi, \tau \alpha_i), \quad i \in \{1, \ldots, m\}, $$
$$ \tau \sim \mathrm{Gamma}\left(ma, \tfrac{1}{2}\right), \qquad (\alpha_1, \alpha_2, \ldots, \alpha_m) \sim \mathrm{Dirichlet}(a, a, \ldots, a), \qquad \xi \sim \mathrm{Logistic}(0, 1). $$
In the previous expressions, $b_i$ and $B_i$ denote, respectively, the lower and the upper bound of the parameter space of the corresponding $\psi_i$, so that $\gamma_i$ maps $\psi_i$ onto the real axis; $d_i$ is the dimension of the i-th group, and a is a hyperparameter, which we typically set to 1, although different values can be used. In general, the Archimedean copulae are parametrized in terms of Kendall's tau, whose range of values has been restricted to $(0, 1)$ for the Clayton, Joe, and Gumbel copulae, while it is $(-1, 1)$ for the Frank copula. In the elliptical case, the Gaussian copula is parametrized in terms of the correlation coefficient $\rho$, which ranges in $(-1, 1)$; finally, Student's t copula has the additional parameter $\nu$, the number of degrees of freedom, for which a discrete uniform prior on $\{1, 2, \ldots, 35\}$ has been used here. When the dimension d of a specific group is larger than two, we restrict the analysis to elliptical copulae with an equi-correlation matrix: in that case, it is well known that the range of the correlation parameter is $(-1/(d-1), 1)$.
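To make the prior specification concrete, the following sketch (our illustration, not code from the paper) draws once from the hierarchy above and maps each $\gamma_i$ back to the copula parameter space; it assumes the second Gamma parameter $1/2$ is a rate and uses hypothetical bounds $b_i = 0$, $B_i = 1$, as for a Kendall's tau restricted to $(0, 1)$:

```python
import numpy as np
from scipy.special import expit  # numerically stable inverse logit

rng = np.random.default_rng(0)

def sample_prior(m, a=1.0, b=0.0, B=1.0):
    """Draw (gamma, psi) once from the hierarchical prior of the model.

    Assumes the Gamma's second parameter 1/2 is a rate (so the mean is 2*m*a),
    and uses hypothetical bounds b = 0, B = 1 for every group.
    """
    xi = rng.logistic(0.0, 1.0)                     # xi ~ Logistic(0, 1)
    tau = rng.gamma(shape=m * a, scale=2.0)         # tau ~ Gamma(ma, rate 1/2)
    alpha = rng.dirichlet(np.full(m, a))            # (alpha_1..m) ~ Dirichlet(a,...,a)
    gamma = rng.laplace(loc=xi, scale=tau * alpha)  # gamma_i ~ Laplace(xi, tau*alpha_i)
    # Invert gamma_i = log((psi_i - b)/(B - psi_i)):
    # psi_i = (B*exp(gamma_i) + b)/(1 + exp(gamma_i)) = b + (B - b)*expit(gamma_i).
    psi = b + (B - b) * expit(gamma)
    return gamma, psi

gamma, psi = sample_prior(m=5)
```

The inverse map is computed through the stable inverse-logit `expit` to avoid overflow for large $|\gamma_i|$.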
Let U be the entire observed sample, let $U_{kij}$ denote the j-th observation of the k-th component in the i-th group, and let $n_i$ be the number of observations in the i-th group. The posterior distribution of the parameter vector $(\gamma, \xi, \alpha, \tau)$ is then
$$ p(\gamma, \xi, \alpha, \tau \mid U) \propto \prod_{i=1}^{m} \left[ \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, p(\gamma_i \mid \xi, \tau, \alpha_i) \right] p(\xi) \, p(\tau) \, p(\alpha), $$
where $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_m)$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$.
The complex form of the posterior distribution requires simulation-based methods of inference. In particular, we adapt the algorithm of Bhattacharya et al. [14] with a minor modification for the updates of $\gamma$ and the shrinkage location $\xi$. Following Bhattacharya et al. [14], we introduce a vector $\beta = (\beta_1, \beta_2, \ldots, \beta_m) \in \mathbb{R}^m$ in order to obtain a latent variable representation of the prior on $\gamma$; then,
$$ \gamma_i \mid \xi, \tau, \alpha_i, \beta_i \overset{ind}{\sim} \mathrm{Normal}(\xi, \beta_i \tau^2 \alpha_i^2), \quad i \in \{1, \ldots, m\}, \qquad \beta_i \overset{iid}{\sim} \mathrm{Exp}\left(\tfrac{1}{2}\right), \quad i \in \{1, \ldots, m\}. $$
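This latent representation can be checked numerically (our sketch, with illustrative values of $\xi$, $\tau$, and $\alpha_i$ that are not from the paper): mixing a Normal over an $\mathrm{Exp}(1/2)$-distributed variance multiplier should recover the $\mathrm{Laplace}(\xi, \tau\alpha_i)$ prior.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

xi, tau, alpha_i = 0.5, 2.0, 0.3   # illustrative values, not from the paper
n = 200_000

# Latent representation: beta ~ Exp(rate 1/2), gamma | beta ~ N(xi, beta * (tau*alpha_i)^2).
beta = rng.exponential(scale=2.0, size=n)   # rate 1/2 <=> mean 2
gamma = rng.normal(loc=xi, scale=np.sqrt(beta) * tau * alpha_i)

# Marginally, gamma should be Laplace(xi, tau*alpha_i); compare via a KS test.
ks = stats.kstest(gamma, stats.laplace(loc=xi, scale=tau * alpha_i).cdf)
```

With 200,000 draws the Kolmogorov–Smirnov statistic is tiny, consistent with the exact scale-mixture identity.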
Here, we briefly describe the algorithm. Start the chain at time 0 by drawing a sample from the prior. At time t, we use the following updating procedure:
1. Update $\gamma \mid \xi, \tau, \alpha, \beta$:
(a) Sample $\tilde\gamma_i$ from a proposal $\mathrm{Cauchy}(\gamma_i^t, \delta_\gamma)$, $\forall i \in \{1, \ldots, m\}$;
(b) Set $\tilde\gamma = (\tilde\gamma_1, \tilde\gamma_2, \ldots, \tilde\gamma_m)$ and compute
$$ q = \frac{\prod_{i=1}^{m} \left[\prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \tilde\gamma_i)\right] p(\tilde\gamma_i \mid \xi^t, \tau^t, \alpha_i^t, \beta_i^t)}{\prod_{i=1}^{m} \left[\prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i^t)\right] p(\gamma_i^t \mid \xi^t, \tau^t, \alpha_i^t, \beta_i^t)}; $$
(c) Sample $u \sim U(0, 1)$;
(d) Set $\gamma^{t+1} = \tilde\gamma$ if $u \le q$; otherwise, $\gamma^{t+1} = \gamma^t$.
2. Update $\xi \mid \gamma, \tau, \alpha, \beta$:
(a) Sample $\tilde\xi$ from a proposal $\mathrm{Cauchy}(\xi^t, \delta_\xi)$;
(b) Compute
$$ q = \frac{\prod_{i=1}^{m} p(\gamma_i^{t+1} \mid \tilde\xi, \tau^t, \alpha_i^t, \beta_i^t) \, p(\tilde\xi)}{\prod_{i=1}^{m} p(\gamma_i^{t+1} \mid \xi^t, \tau^t, \alpha_i^t, \beta_i^t) \, p(\xi^t)}; $$
(c) Sample $u \sim U(0, 1)$;
(d) Set $\xi^{t+1} = \tilde\xi$ if $u \le q$; otherwise, $\xi^{t+1} = \xi^t$.
3. Update $\tau \mid \gamma, \xi, \alpha, \beta$: sample
$$ \tau^{t+1} \sim GIG\left(0, 1, 2 \sum_{i=1}^{m} \frac{|\gamma_i^{t+1} - \xi^{t+1}|}{\alpha_i^t}\right). $$
4. Update $\alpha \mid \gamma, \xi, \tau, \beta$: sample $\tilde\alpha_i \sim GIG(0, 1, 2|\gamma_i^{t+1} - \xi^{t+1}|)$, $\forall i \in \{1, \ldots, m\}$, and set
$$ \alpha_i^{t+1} = \frac{\tilde\alpha_i}{\sum_{j=1}^{m} \tilde\alpha_j}, \quad i \in \{1, \ldots, m\}. $$
5. Update $\beta_i \mid \gamma, \xi, \tau, \alpha$, $\forall i \in \{1, \ldots, m\}$: sample
$$ \tilde\beta_i \sim IG\left(\frac{\tau^{t+1} \alpha_i^{t+1}}{|\gamma_i^{t+1} - \xi^{t+1}|}, 1\right) $$
and set $\beta_i^{t+1} = 1/\tilde\beta_i$, $i \in \{1, \ldots, m\}$.
In the previous statements, $\mathrm{Cauchy}(a, b)$ denotes a one-dimensional Cauchy distribution with location a and scale b, while $GIG(p, a, b)$ is the generalized inverse Gaussian distribution with density function
$$ f(x) \propto x^{p-1} \exp\left(-\frac{1}{2} a x - \frac{1}{2} \frac{b}{x}\right). $$
Moreover, $IG(a, b)$ denotes the inverse Gaussian distribution, and it is known that $X \sim IG(a, b) \iff X \sim GIG\left(-\frac{1}{2}, \frac{b}{a^2}, b\right)$. Finally, $\delta_\gamma$ and $\delta_\xi$ are scalar tuning parameters.
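Sampling from the GIG distribution is the only nonstandard ingredient of steps 3–5. One possible sketch (our illustration, not the authors' code) uses SciPy's `geninvgauss`, whose density is proportional to $x^{p-1} e^{-c(x + 1/x)/2}$; rescaling by $\sqrt{b/a}$ maps it to the $GIG(p, a, b)$ parametrization used above:

```python
import numpy as np
from scipy.stats import geninvgauss

def rgig(p, a, b, size, rng):
    """Sample GIG(p, a, b) with density proportional to x^(p-1) exp(-(a*x + b/x)/2).

    scipy's geninvgauss(p, c) has density proportional to x^(p-1) exp(-c*(x + 1/x)/2);
    rescaling by sqrt(b/a) maps it to the (p, a, b) parametrization used here.
    """
    scale = np.sqrt(b / a)
    return geninvgauss.rvs(p, np.sqrt(a * b), scale=scale, size=size, random_state=rng)

rng = np.random.default_rng(2)
# Sanity check via IG(a, b) <=> GIG(-1/2, b/a^2, b): with a = 2, b = 3,
# the draws should have mean a = 2 (mean of the inverse Gaussian).
x = rgig(-0.5, 3.0 / 2.0**2, 3.0, size=200_000, rng=rng)
```

The same helper covers the $\tau$ and $\alpha$ updates, which use $p = 0$.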
In the case of the Student's t copula, we need to add another step between steps 1 and 2 in order to update $\nu = (\nu_1, \nu_2, \ldots, \nu_m)$:
• Update $\nu_i \mid \gamma, \xi, \tau, \alpha, \beta$, $\forall i \in \{1, \ldots, m\}$:
(a) Sample $\tilde\nu$ from the discrete uniform distribution on $\{1, 2, \ldots, 35\}$;
(b) Compute
$$ q = \frac{\prod_{j=1}^{n_i} c(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i^{t+1}, \tilde\nu)}{\prod_{j=1}^{n_i} c(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i^{t+1}, \nu_i^t)}; $$
(c) Sample $u \sim U(0, 1)$;
(d) Set $\nu_i^{t+1} = \tilde\nu$ if $u \le q$; otherwise, $\nu_i^{t+1} = \nu_i^t$.

2.1.2. Prior Distribution of ξ

The choice of the prior distribution for the shrinkage location $\xi$ needs some explanation. First of all, notice that, according to our prior specification,
$$ P(\gamma_i \le \xi) = \frac{1}{2}, \quad i \in \{1, \ldots, m\}; $$
however, $\gamma_i = \log \frac{\psi_i - b_i}{B_i - \psi_i}$, so, equivalently,
$$ P\left(\psi_i \le \frac{B_i e^{\xi} + b_i}{1 + e^{\xi}}\right) = \frac{1}{2}. $$
Therefore, given $\xi$, the median of $\psi_i$ is $Y_i = (B_i e^{\xi} + b_i)/(1 + e^{\xi})$, $\forall i \in \{1, \ldots, m\}$. It is then easy to show that the natural choice of a uniform prior $Y_i \sim U(b_i, B_i)$ for all $i \in \{1, \ldots, m\}$ implies a standard logistic density for $\xi$.
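This implication can be verified by Monte Carlo (our sketch, with hypothetical bounds $b = -1$, $B = 1$): drawing the median $Y$ uniformly on $(b, B)$ and solving $Y = (B e^{\xi} + b)/(1 + e^{\xi})$ for $\xi$ should reproduce a standard logistic sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
b, B = -1.0, 1.0                       # hypothetical parameter bounds

# Y = (B*exp(xi) + b)/(1 + exp(xi)) ~ U(b, B); inverting gives
# xi = log((Y - b)/(B - Y)).
y = rng.uniform(b, B, size=200_000)
xi = np.log((y - b) / (B - y))

# Compare the implied xi sample with the standard logistic distribution.
ks = stats.kstest(xi, stats.logistic(loc=0, scale=1).cdf)
```

The KS statistic is small, as expected, since $(Y - b)/(B - b) \sim U(0, 1)$ and its logit is exactly standard logistic.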

2.1.3. Previous Work

Apart from the prior specification, the model described in the previous sections is the one proposed by Zhuang et al. [7]. We restrict our discussion to the case where each copula has one parameter only. Their prior can be stated as follows.
$$ \gamma_i \mid \mu_i, \sigma_i^2 \overset{ind}{\sim} N(\mu_i, \sigma_i^2), \quad i \in \{1, \ldots, m\}, \qquad \mu_i \mid \lambda, \delta^2 \overset{iid}{\sim} N(\lambda, \delta^2), \quad i \in \{1, \ldots, m\}, $$
$$ \sigma_i^2 \overset{iid}{\sim} \pi_{\sigma^2}(\cdot), \quad i \in \{1, \ldots, m\}, \qquad \lambda \sim \pi_\lambda(\cdot), \qquad \delta^2 \sim \pi_{\delta^2}(\cdot). $$
There is no unique choice for the distributions of $(\sigma^2, \lambda, \delta)$, although the authors suggest using weakly informative priors, for example, inverse gamma densities with small hyperparameter values, or, as an alternative, an objective prior, for example, an improper uniform prior. However, one can prove that, in the second case, the posterior distribution cannot be proper, no matter what the sample size is. We show this result in Appendix A. When the posterior distribution is improper, the resulting summary statistics are meaningless: the Markov chain implied by the MCMC algorithm does not have a limiting distribution, so the ergodic theorem does not hold and the posterior output is completely useless. Moreover, even the first solution is not feasible: when an improper prior produces an improper posterior, using a vague proper prior typically hides, rather than solves, the problem. In these cases, as shown in Berger [8] (p. 398), the use of a vague prior approximating an improper prior typically concentrates the posterior mass on some boundary of the parameter space.

3. Results

3.1. Simulation Study

We compare the performance of our approach with that of a maximum likelihood approach in a simulation study. We use a Student's t copula with an equi-correlation matrix and set the number of groups m equal to five. We repeat the procedure 100 times; at iteration j, for the i-th group, we sample the true value $\gamma_{ij}^T$ from a standard normal distribution, the degrees of freedom $\nu_{ij}^T$ from the prior distribution, and the dimension $d_{ij}$ of the group from the discrete uniform distribution on $\{1, 2, \ldots, 5\}$. Given the parameters and dimensions of the groups, we sample 20 observations for each group. In the maximum likelihood framework, we estimate
$$ (\hat\gamma_{ij}^{\mathrm{mle}}, \hat\nu_{ij}^{\mathrm{mle}}) = \arg\max_{(\gamma_i, \nu_i)} \prod_{k=1}^{20} c(U_{1ik}, U_{2ik}, \ldots, U_{d_i ik} \mid \gamma_i, \nu_i), \quad i \in \{1, \ldots, 5\}, $$
and compute the squared errors
$$ \widehat{SE}_{ij}^{\mathrm{mle}} = \left(\gamma_{ij}^T - \hat\gamma_{ij}^{\mathrm{mle}}\right)^2, \quad i \in \{1, \ldots, 5\}. $$
In the Bayesian framework, we use as point estimate the posterior mean, obtained from the MCMC algorithm described above. We ran six independent chains of $2.5 \times 10^5$ scans, discarded the first $5 \times 10^4$ as burn-in, and finally computed $\hat\gamma_{ij}^{\mathrm{Bay}}$ as the sample mean of the simulation output, for all $i \in \{1, \ldots, 5\}$. As tuning parameters, we set $\delta_\gamma = 10^{-3}$ and $\delta_\xi = 10^{-1}$. Then, we compute
$$ \widehat{SE}_{ij}^{\mathrm{Bay}} = \left(\gamma_{ij}^T - \hat\gamma_{ij}^{\mathrm{Bay}}\right)^2, \quad i \in \{1, \ldots, 5\}. $$
The comparison is performed in terms of the corresponding mean squared errors,
$$ \widehat{MSE}_i^{\mathrm{mle}} = \frac{1}{100} \sum_{j=1}^{100} \widehat{SE}_{ij}^{\mathrm{mle}}, \qquad \widehat{MSE}_i^{\mathrm{Bay}} = \frac{1}{100} \sum_{j=1}^{100} \widehat{SE}_{ij}^{\mathrm{Bay}}. $$
Table 1 reports the values of $\widehat{MSE}_i^{\mathrm{mle}}$ against $\widehat{MSE}_i^{\mathrm{Bay}}$ for all groups, based on the 100 simulations.

3.2. Real Data Applications

This section is devoted to the implementation of the method in two different applications. The first one is the same as in Zhuang et al. [7], and we include it for comparative purposes; to this end, we quantify the goodness of fit of the model using a predictive approach based on the conditional version of the Widely Applicable Information Criterion (WAIC) in a hierarchical setting, as discussed in Millar [16]. The second one deals with clustering financial time series.

3.2.1. Vertebral Column Data

We apply our model to the Vertebral Column Data, available at the UCI Machine Learning Repository. The dataset consists of 60 patients with disk hernia, 150 subjects with spondylolisthesis, and 100 healthy individuals; data are available for the following variables: angle of pelvic incidence (PI), angle of pelvic tilt (PT), lumbar lordosis angle (LL), sacral slope (SS), pelvic radius (PR), and degree of spondylolisthesis (DS). As in Zhuang et al. [7], we adopt the generalized skew-t distribution for the marginals, use maximum likelihood estimation to calibrate their parameters, and then transform the data via the fitted cumulative distribution functions. Computations were performed using the R package sgt, available on CRAN. Table 2 reports the fitted parameter values for the marginals.
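The probability integral transform step can be sketched as follows (our Python illustration on synthetic data, using a Student's t marginal as a simple stand-in for the generalized skew-t fitted by the sgt package):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = 40.0 + 10.0 * rng.standard_t(df=5, size=500)   # synthetic "angle" measurements

# Fit the marginal by maximum likelihood, then apply the probability
# integral transform via the fitted CDF to obtain pseudo-observations in (0, 1).
df, loc, scale = stats.t.fit(x)
u = stats.t.cdf(x, df, loc=loc, scale=scale)
```

The resulting pseudo-observations u are then fed to the copula model in place of the raw measurements.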
Following Zhuang et al. [7], we consider the same parametric copulae for the bivariate distributions of the features of interest and, for each of these, we construct our Bayesian hierarchical copula model for the three groups of subjects. We ran six independent chains of $2.5 \times 10^6$ simulations and discarded the first $5 \times 10^5$. We also set $\delta_\gamma = 10^{-3}$ and $\delta_\xi = 10^{-1}$. We did not observe any convergence issues, and the multiple-chain Gelman–Rubin scores (Gelman and Rubin [17]) for each of the six implemented models were very close to the optimal value 1. In terms of goodness of fit, we computed the WAIC index for all six models; our finding is that the most significant relation is the one between PI and PT. Table 3 compares the results of Zhuang et al. [7] (model A) with ours (model B). The main difference between the results obtained with the two methods concerns the posterior uncertainty quantification: credible intervals obtained with model B are systematically larger than those obtained with model A. Our feeling is that this depends on the fact that the results of model A are obtained by running a chain where some hyperparameters are fixed at estimated values, as explained in Zhuang et al. [7]. Fixing the values of the hyperparameters eliminates a critical source of variation, inducing shrinkage in the size of the credible intervals.
For ease of comparison, we follow Zhuang et al. [7] and report the results not in terms of the parameter $\gamma$ but rather in terms of the natural parameter of each copula, that is, $\rho$ for the Gaussian copula and $\theta$ for the Archimedean ones.

3.2.2. Financial Data Application

Grouping financial time series is important for diversification purposes; a portfolio manager should avoid investing in instruments with a high degree of positive dependence, and clustering procedures allow the construction of groups according to some specific risk measure. In this way, financial instruments that belong to the same group will show a certain degree of association; however, the strength of dependence within groups may well be different in different groups. It is then important to assess the strength of the association for each single cluster, and a method to perform this is to use a hierarchical structure, such as the one discussed in this paper.
As a risk measure, we consider the so-called tail index, which measures the strength of dependence between two variables when one of them takes extremely low values. Following De Luca and Zuccolotto [18], we construct a dissimilarity measure based on the lower tail coefficient. Let $(Y_1, Y_2)$ be a bivariate random vector; the lower tail coefficient $\lambda_L$ of $(Y_1, Y_2)$ is defined as follows:
$$ \lambda_L = \lim_{u \to 0^+} P\left(F_{Y_1}(Y_1) \le u \mid F_{Y_2}(Y_2) \le u\right), $$
or, equivalently,
$$ \lambda_L = \lim_{u \to 0^+} \frac{C(u, u)}{u}, $$
where $C(\cdot, \cdot)$ is the cumulative distribution function of the copula associated with $(Y_1, Y_2)$. In order to estimate $\lambda_L$, we use the empirical estimator discussed in [19]:
$$ \hat\lambda_L = \frac{\hat C\left(\frac{\sqrt{n}}{n}, \frac{\sqrt{n}}{n}\right)}{\frac{\sqrt{n}}{n}}, $$
where $\hat C(\cdot, \cdot)$ is the empirical copula and n is the sample size. The dissimilarity measure is then defined as
$$ d(Y_1, Y_2) = 1 - \lambda_L(Y_1, Y_2). $$
The preliminary clustering procedure has been implemented using the complete linkage method. Notice that the bivariate lower tail coefficient is not the only way to model dependence on extremely low values: Durante et al. [20] proposed a conditioned correlation coefficient estimated using a nonparametric approach, and Fuchs et al. [21] analyzed dissimilarity measures based on a multivariate lower tail coefficient.
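The empirical lower tail coefficient and the dissimilarity above can be sketched as follows (our illustration; the threshold $\sqrt{n}/n$ follows the estimator in the text):

```python
import numpy as np
from scipy.stats import rankdata

def lower_tail_dissimilarity(y1, y2):
    """Empirical lower tail coefficient lambda_hat and dissimilarity d = 1 - lambda_hat."""
    n = len(y1)
    u = np.sqrt(n) / n                            # threshold u = sqrt(n)/n
    r1, r2 = rankdata(y1) / n, rankdata(y2) / n   # pseudo-observations (scaled ranks)
    c_hat = np.mean((r1 <= u) & (r2 <= u))        # empirical copula C_hat(u, u)
    lam = c_hat / u
    return lam, 1.0 - lam

rng = np.random.default_rng(5)
y = rng.normal(size=400)
lam_dep, d_dep = lower_tail_dissimilarity(y, y)   # perfectly comonotone pair
```

For a perfectly comonotone pair the estimator equals 1, so the dissimilarity is 0; a dissimilarity matrix built from all pairs can then feed a standard hierarchical clustering routine.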
We consider the “S&P 500 Full Dataset” available on Kaggle, which contains the relevant information for the components of the S&P 500 index. We take the daily closing prices from 5 June 2000 to 5 June 2020 and discard the instruments without a complete record for this period, restricting our analysis to the remaining 379 components. For all of them, we computed the log-returns by taking log-differences and filtered the data by fitting, for each time series, an ARMA(1,1)–GJR-GARCH(1,1) model with Student's t innovations; we then extracted the residuals and transformed them via the fitted cumulative distribution function in order to obtain pseudo-data. Computations were performed using the CRAN package rugarch. Hence, we computed the empirical estimator of the lower tail coefficient for every possible pair, together with the associated dissimilarity measure, and used them to feed the clustering algorithm. Due to computational complexity, we used the coarsest partition under the constraint that the largest group have at most 10 components. We obtained 30 groups with more than one component and discarded the instruments belonging to singleton groups. The final number of instruments was thus reduced to 93.
We ran the MCMC algorithm described above on the 30 clusters, performing 12 independent chains of $10^5$ scans and discarding the first $1.5 \times 10^4$ as burn-in. The tuning parameters were set to $\delta_\gamma = 10^{-6}$ and $\delta_\xi = 10^{-3}$. In this example as well, we did not observe any convergence issues, and the Gelman–Rubin test score was 1.02. For each scan and for each group, we compute the lower tail coefficient via the following formula:
$$ \lambda_L = 2\, T_{\nu+1}\!\left(-\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}}\right), $$
where $T_\nu(\cdot)$ is the univariate cumulative distribution function of a Student's t random variable with $\nu$ degrees of freedom. The copula used in this example is a Student's t copula with an equi-correlation matrix; as a consequence, we obtain a single value of the lower tail coefficient for each cluster. Table 4 reports the results for each pair that belongs to the same group. Finally, we report the estimation results.
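The tail coefficient formula is straightforward to evaluate (our sketch, using SciPy's Student's t CDF):

```python
import numpy as np
from scipy.stats import t

def t_copula_lower_tail(rho, nu):
    """Lower tail coefficient of a bivariate Student's t copula:
    lambda_L = 2 * T_{nu+1}(-sqrt((nu + 1)(1 - rho) / (1 + rho)))."""
    return 2.0 * t.cdf(-np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho)), df=nu + 1.0)

lam = t_copula_lower_tail(rho=0.5, nu=4)
```

As expected, the coefficient increases with $\rho$ and equals 1 at $\rho = 1$.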

4. Conclusions

We discussed and improved a fully Bayesian analysis of the hierarchical copula model proposed in Zhuang et al. [7]. We proposed the use of a proper prior, which is able to induce shrinkage and, at the same time, dependence among different clusters of observations. This prior does not mimic the behavior of an improper prior and is better suited to objectively representing the information coming from the data. Our prior belongs to the large family of global–local shrinkage densities, with an extra stage in the hierarchy due to the absence of an obvious shrinkage value; we found this approach to be very effective and useful in the case of parametric copulae depending on a single parameter. In more general situations the approach needs to be modified, and this can be easily accommodated.
Finally, we presented an application in a financial context, where the goal was to estimate the lower tail coefficient of several financial time series in a parametric way using the Student’s t copula.

Author Contributions

This work has been conceived and realized by the two authors. B.L. wrote Section 1 and Section 4; P.O. wrote Section 2 and Section 3. All authors have read and agreed to the published version of the manuscript.

Funding

B. Liseo acknowledges the financial support of Sapienza Università di Roma, Italy, grant n. RG12117A85687F4D, year 2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset Vertebral Column can be found at the website http://archive.ics.uci.edu/ml/datasets/vertebral+column (accessed on 1 June 2021). Dataset S&P stock can be found at the website https://www.kaggle.com/datasets/nroll12/sp-500-full-dataset (accessed on 1 June 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
S&P — Standard and Poor's 500 stock exchange index;
mle or MLE — maximum likelihood estimator;
MSE — mean squared error;
MCMC — Markov chain Monte Carlo.

Appendix A

Here, we show that the prior proposed in Zhuang et al. [7] leads to an improper posterior.
The statistical model consists of m d-dimensional copulae governing different sets of observations:
$$ (U_{1i}, U_{2i}, \ldots, U_{d_i i}) \mid \theta_i \sim c_i(\cdot \mid \theta_i), \quad i \in \{1, \ldots, m\}. $$
Let $\gamma_i = \eta_i g_i(\theta_i)$; here, $\eta_i$ is a scaling parameter that can be considered known. The one-to-one mapping functions $g_i(\cdot)$ are needed to put all dependence parameters on the real line. Zhuang et al. [7] make the following assumptions:
$$ \gamma_i \mid \mu_i, \sigma_i^2 \overset{ind}{\sim} N(\mu_i, \sigma_i^2), \quad i \in \{1, \ldots, m\}; \qquad \mu_i \mid \lambda, \delta^2 \overset{iid}{\sim} N(\lambda, \delta^2), \quad i \in \{1, \ldots, m\}. $$
The hyperparameters $\sigma_i^2$, $\lambda$, and $\delta^2$ are given suitable prior distributions. For the moment, we do not specify the priors and set
$$ \sigma_i^2 \overset{iid}{\sim} \pi_{\sigma^2}(\cdot), \quad i \in \{1, \ldots, m\}; \qquad \lambda \sim \pi_\lambda(\cdot), \qquad \delta^2 \sim \pi_{\delta^2}(\cdot). $$
Since the $g_i(\theta_i)$'s are one-to-one, we write $c_i(\cdot \mid \gamma_i)$ instead of $c_i(\cdot \mid \theta_i)$. Let U be the observed sample, and let $U_{kij}$ be the j-th observation of the k-th component in the i-th group, with $n_i$ the sample size of the i-th group. Furthermore, let $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_m)$, $\mu = (\mu_1, \mu_2, \ldots, \mu_m)$, and $\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2)$. Finally, let $S(\omega)$ denote the parameter space of the generic parameter $\omega$.
The next proposition shows that, using standard noninformative priors for scale and location parameters, the resulting posterior will be improper independently of the sample size.
Proposition A1.
If $\pi_{\sigma_i^2}(\sigma_i^2) \propto \sigma_i^{-2}$ for $i \in \{1, \ldots, m\}$, $\pi_{\delta^2}(\delta^2) \propto \delta^{-2}$, and $\pi_\lambda(\lambda) \propto 1$, then the posterior distribution of $\gamma \mid U$ is improper for any choice of the copula densities $c_i(\cdot \mid \gamma_i)$ and independently of the sample size.
Proof. 
For the sake of clarity, set $d\sigma^2 = d\sigma_1^2 \, d\sigma_2^2 \cdots d\sigma_m^2$ and $d\mu = d\mu_1 \, d\mu_2 \cdots d\mu_m$. We need to show that the following pseudo-marginal posterior distribution of $\gamma$ is not integrable:
$$ \pi(\gamma \mid U) = \int_{S(\mu)} \int_{S(\sigma^2)} \int_{S(\delta^2)} \int_{S(\lambda)} \pi(\gamma, \mu, \sigma^2, \lambda, \delta^2 \mid U) \, d\lambda \, d\delta^2 \, d\sigma^2 \, d\mu \propto \int_{S(\mu)} \int_{S(\sigma^2)} \int_{S(\delta^2)} \int_{S(\lambda)} \pi(U \mid \gamma, \mu, \sigma^2, \lambda, \delta^2) \, \pi(\gamma, \mu, \sigma^2, \lambda, \delta^2) \, d\lambda \, d\delta^2 \, d\sigma^2 \, d\mu, $$
where $\pi(U \mid \gamma, \mu, \sigma^2, \lambda, \delta^2)$ represents the likelihood function. Then, we obtain:
$$ \pi(\gamma \mid U) \propto \int_{S(\mu)} \int_{S(\sigma^2)} \int_{S(\delta^2)} \int_{S(\lambda)} \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, \pi(\gamma \mid \mu, \sigma^2) \, \pi(\mu \mid \lambda, \delta^2) \, \pi(\sigma^2) \, \pi(\lambda) \, \pi(\delta^2) \, d\lambda \, d\delta^2 \, d\sigma^2 \, d\mu $$
$$ = \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \int_{S(\mu)} \int_{S(\sigma^2)} \pi(\gamma \mid \mu, \sigma^2) \, \pi(\sigma^2) \left[ \int_{S(\delta^2)} \int_{S(\lambda)} \pi(\mu \mid \lambda, \delta^2) \, \pi(\lambda) \, \pi(\delta^2) \, d\lambda \, d\delta^2 \right] d\sigma^2 \, d\mu = \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, \pi(\gamma), $$
with
$$ \pi(\gamma) = \int_{S(\mu)} \int_{S(\sigma^2)} \pi(\gamma \mid \mu, \sigma^2) \, \pi(\sigma^2) \, \pi(\mu) \, d\sigma^2 \, d\mu $$
and
$$ \pi(\mu) = \int_{S(\delta^2)} \int_{S(\lambda)} \pi(\mu \mid \lambda, \delta^2) \, \pi(\lambda) \, \pi(\delta^2) \, d\lambda \, d\delta^2. $$
Consider first
$$ \pi(\mu) = \int_0^{+\infty} \int_{-\infty}^{+\infty} \pi(\mu \mid \lambda, \delta^2) \, \pi(\lambda) \, \pi(\delta^2) \, d\lambda \, d\delta^2 \propto \int_0^{+\infty} \int_{-\infty}^{+\infty} (2\pi\delta^2)^{-\frac{m}{2}} \exp\left(-\frac{1}{2\delta^2} \sum_{i=1}^{m} (\mu_i - \lambda)^2\right) \frac{1}{\delta^2} \, d\lambda \, d\delta^2 $$
$$ \propto \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m}{2}+1} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2\delta^2} \sum_{i=1}^{m} (\mu_i^2 - 2\lambda\mu_i + \lambda^2)\right) d\lambda \, d\delta^2 = \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m}{2}+1} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2\delta^2} \left(\sum_{i=1}^{m} \mu_i^2 - 2\lambda \sum_{i=1}^{m} \mu_i + m\lambda^2\right)\right) d\lambda \, d\delta^2; $$
setting $\bar\mu = \frac{1}{m}\sum_{i=1}^{m} \mu_i$, we obtain
$$ \pi(\mu) \propto \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m}{2}+1} \exp\left(-\frac{1}{2\delta^2} \sum_{i=1}^{m} \mu_i^2\right) \int_{-\infty}^{+\infty} \exp\left(-\frac{m}{2\delta^2} \left(\lambda^2 - 2\lambda\bar\mu + \bar\mu^2 - \bar\mu^2\right)\right) d\lambda \, d\delta^2 $$
$$ = \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m}{2}+1} \exp\left(-\frac{1}{2\delta^2} \left(\sum_{i=1}^{m} \mu_i^2 - m\bar\mu^2\right)\right) \int_{-\infty}^{+\infty} \exp\left(-\frac{m}{2\delta^2} (\lambda - \bar\mu)^2\right) d\lambda \, d\delta^2 $$
$$ = \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m}{2}+1} \exp\left(-\frac{m}{2\delta^2} \left(\frac{1}{m}\sum_{i=1}^{m} \mu_i^2 - \bar\mu^2\right)\right) \sqrt{\frac{2\pi\delta^2}{m}} \, d\delta^2 \propto \int_0^{+\infty} \left(\frac{1}{\delta^2}\right)^{\frac{m-1}{2}+1} \exp\left(-\frac{1}{2\delta^2} \sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right) d\delta^2. $$
For any choice of $m > 1$, $\pi(\mu)$ can then be written as
$$ \pi(\mu) \propto \left(\frac{1}{2}\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} \Gamma\left(\frac{m-1}{2}\right) \propto \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}}. $$
Now, we compute
$$ \pi(\gamma) = \int_{S(\sigma_1^2)} \cdots \int_{S(\sigma_m^2)} \int_{S(\mu_1)} \cdots \int_{S(\mu_m)} \pi(\gamma \mid \mu, \sigma^2) \, \pi(\mu) \, \pi(\sigma^2) \, d\sigma^2 \, d\mu $$
$$ \propto \int \cdots \int \prod_{i=1}^{m} (2\pi\sigma_i^2)^{-\frac{1}{2}} \exp\left(-\frac{(\gamma_i - \mu_i)^2}{2\sigma_i^2}\right) \prod_{i=1}^{m} \sigma_i^{-2} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} d\sigma^2 \, d\mu $$
$$ \propto \int_{S(\mu_1)} \cdots \int_{S(\mu_m)} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} \prod_{i=1}^{m} \int_{S(\sigma_i^2)} \left(\frac{1}{\sigma_i^2}\right)^{\frac{3}{2}} \exp\left(-\frac{(\gamma_i - \mu_i)^2}{2\sigma_i^2}\right) d\sigma_i^2 \, d\mu $$
$$ \propto \int_{S(\mu_1)} \cdots \int_{S(\mu_m)} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} \prod_{i=1}^{m} \left((\gamma_i - \mu_i)^2\right)^{-\frac{1}{2}} d\mu = \int_{S(\mu_1)} \frac{1}{|\gamma_1 - \mu_1|} \int_{S(\mu_2)} \frac{1}{|\gamma_2 - \mu_2|} \cdots \int_{S(\mu_m)} \frac{1}{|\gamma_m - \mu_m|} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} d\mu. $$
Notice that
$$ \sum_{i=1}^{m} (\mu_i - \bar\mu)^2 = \sum_{i=1}^{m} \mu_i^2 - m\bar\mu^2 = \mu_m^2 + \sum_{i=1}^{m-1} \mu_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} \mu_i\right)^2 = \mu_m^2 + \sum_{i=1}^{m-1} \mu_i^2 - \frac{1}{m}\left(\left(\sum_{i=1}^{m-1} \mu_i\right)^2 + 2\mu_m \sum_{i=1}^{m-1} \mu_i + \mu_m^2\right); $$
setting $K = \sum_{i=1}^{m-1} \mu_i^2$ and $H = \sum_{i=1}^{m-1} \mu_i$, we obtain
$$ \sum_{i=1}^{m} (\mu_i - \bar\mu)^2 = \mu_m^2 + K - \frac{1}{m}\left(H^2 + 2H\mu_m + \mu_m^2\right) = \frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}. $$
Hence, $\sum_{i=1}^{m} (\mu_i - \bar\mu)^2$ is a convex parabolic function of $\mu_m$ and, by the Weierstrass theorem, it attains a global maximum on every closed and bounded set. Integrating with respect to $\mu_m$, for any $\epsilon > \gamma_m$, one obtains:
$$ \int_{S(\mu_m)} \frac{1}{|\gamma_m - \mu_m|} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} d\mu_m = \int_{-\infty}^{+\infty} \frac{1}{|\gamma_m - \mu_m|} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)^{-\frac{m-1}{2}} d\mu_m $$
$$ = \int_{-\infty}^{\gamma_m} \frac{1}{|\gamma_m - \mu_m|} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)^{-\frac{m-1}{2}} d\mu_m + \int_{\gamma_m}^{\epsilon} \frac{1}{|\gamma_m - \mu_m|} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)^{-\frac{m-1}{2}} d\mu_m + \int_{\epsilon}^{+\infty} \frac{1}{|\gamma_m - \mu_m|} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)^{-\frac{m-1}{2}} d\mu_m. $$
Let $A = \max_{\mu_m \in [\gamma_m, \epsilon]} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)$. The second term of the last expression satisfies:
$$ \int_{\gamma_m}^{\epsilon} \frac{1}{|\gamma_m - \mu_m|} \left(\frac{m-1}{m}\mu_m^2 - \frac{2H}{m}\mu_m + K - \frac{H^2}{m}\right)^{-\frac{m-1}{2}} d\mu_m \ge \int_{\gamma_m}^{\epsilon} \frac{1}{|\gamma_m - \mu_m|} \, A^{-\frac{m-1}{2}} \, d\mu_m = A^{-\frac{m-1}{2}} \int_{\gamma_m}^{\epsilon} \frac{1}{\mu_m - \gamma_m} \, d\mu_m = A^{-\frac{m-1}{2}} \left.\log(\mu_m - \gamma_m)\right|_{\gamma_m}^{\epsilon} = +\infty, $$
which also implies
$$ \int_{S(\mu_m)} \frac{1}{|\gamma_m - \mu_m|} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} d\mu_m = +\infty. $$
By the same argument, one can also see that
$$ \pi(\gamma) \propto \int_{S(\mu_1)} \frac{1}{|\gamma_1 - \mu_1|} \int_{S(\mu_2)} \frac{1}{|\gamma_2 - \mu_2|} \cdots \int_{S(\mu_m)} \frac{1}{|\gamma_m - \mu_m|} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-1}{2}} d\mu = +\infty. $$
It follows that
$$ \pi(\gamma \mid U) \propto \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, \pi(\gamma) = +\infty. $$
A similar argument can be used to prove the following result.
Proposition A2.
If $\pi_{\sigma_i^2}(\sigma_i^2) \propto 1$ for $i \in \{1, \ldots, m\}$, $\pi_{\delta^2}(\delta^2) \propto 1$, and $\pi_\lambda(\lambda) \propto 1$, then the posterior distribution of $\gamma \mid U$ is improper for any choice of the copula densities $c_i(\cdot \mid \gamma_i)$ and independently of the sample size.
Proof. 
As before, one needs to show that the following pseudo-marginal posterior distribution of $\gamma$ does not have a finite integral:
$$ \pi(\gamma \mid U) = \int_{S(\mu)} \int_{S(\sigma^2)} \int_{S(\delta^2)} \int_{S(\lambda)} \pi(\gamma, \mu, \sigma^2, \lambda, \delta^2 \mid U) \, d\lambda \, d\delta^2 \, d\sigma^2 \, d\mu \propto \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, \pi(\gamma). $$
We use the same notation as in Proposition A1 and assume $m > 3$ (when $m \le 3$, the theorem is trivially true since $\pi(\mu)$ itself is not defined). With a slight modification of the proof of that proposition, we obtain:
$$ \pi(\mu) = \int_{S(\delta^2)} \int_{S(\lambda)} \pi(\mu \mid \lambda, \delta^2) \, \pi(\lambda) \, \pi(\delta^2) \, d\lambda \, d\delta^2 \propto \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-3}{2}}, $$
and
$$ \pi(\gamma) = \int_{S(\sigma_1^2)} \cdots \int_{S(\sigma_m^2)} \int_{S(\mu_1)} \cdots \int_{S(\mu_m)} \pi(\gamma \mid \mu, \sigma^2) \, \pi(\mu) \, \pi(\sigma^2) \, d\mu \, d\sigma^2 $$
$$ \propto \int \cdots \int \prod_{i=1}^{m} (2\pi\sigma_i^2)^{-\frac{1}{2}} \exp\left(-\frac{(\gamma_i - \mu_i)^2}{2\sigma_i^2}\right) \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-3}{2}} d\mu \, d\sigma^2 $$
$$ \propto \int_{S(\mu_1)} \cdots \int_{S(\mu_m)} \left(\sum_{i=1}^{m} (\mu_i - \bar\mu)^2\right)^{-\frac{m-3}{2}} \prod_{i=1}^{m} \int_{S(\sigma_i^2)} \left(\frac{1}{\sigma_i^2}\right)^{\frac{1}{2}} \exp\left(-\frac{(\gamma_i - \mu_i)^2}{2\sigma_i^2}\right) d\sigma_i^2 \, d\mu. $$
However, for all $i \in \{1, \ldots, m\}$, the integral with respect to $\sigma_i^2$ is not finite, and this again implies
$$ \pi(\gamma \mid U) \propto \prod_{i=1}^{m} \prod_{j=1}^{n_i} c_i(U_{1ij}, U_{2ij}, \ldots, U_{d_i ij} \mid \gamma_i) \, \pi(\gamma) = +\infty. $$

References

  1. Kelley, L.T. The Interpretation of Educational Measurement; Measurement and Adjustment Series; World Book Company: Yonkers-on-Hudson, NY, USA, 1927. [Google Scholar]
  2. Tiao, G.C.; Tan, W.Y. Bayesian Analysis of Random-Effect Models in the Analysis of Variance. i. Posterior Distribution of Variance-Components. Biometrika 1965, 52, 37–53. [Google Scholar] [CrossRef]
  3. Hill, B.M. Inference About Variance Components in the One-Way Model. J. Am. Stat. Assoc. 1965, 60, 806–825. [Google Scholar] [CrossRef]
  4. Stone, M.; Springer, B.G.F. A Paradox Involving Quasi Prior Distributions. Biometrika 1965, 52, 623–627. [Google Scholar] [CrossRef]
  5. Lindley, D.V.; Smith, A.F.M. Bayes Estimates for the Linear Model. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 1–41. [Google Scholar] [CrossRef]
  6. Gelman, A. Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper. Bayesian Anal. 2006, 1, 515–534. [Google Scholar] [CrossRef]
  7. Zhuang, H.; Diao, L.; Yi, G.Y. A Bayesian Hierarchical Copula Model. Electron. J. Stat. 2020, 14, 4457–4488. [Google Scholar] [CrossRef]
  8. Berger, J. The Case for Objective Bayesian Analysis. Bayesian Anal. 2006, 1, 385–402. [Google Scholar] [CrossRef]
  9. Hobert, J.P.; Casella, G. The Effect of Improper Priors on Gibbs Sampling in Hierarchical Linear Mixed Models. J. Am. Stat. Assoc. 1996, 91, 1461–1473. [Google Scholar] [CrossRef]
  10. Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008, 103, 681–686. [Google Scholar] [CrossRef]
  11. Hans, C. Bayesian Lasso Regression. Biometrika 2009, 96, 835–845. [Google Scholar] [CrossRef]
  12. Carvalho, C.M.; Polson, N.G.; Scott, J.G. The Horseshoe Estimator for Sparse Signals. Biometrika 2010, 97, 465–480. [Google Scholar] [CrossRef] [Green Version]
  13. Armagan, A.; Dunson, D.; Lee, J. Generalized Double Pareto Shrinkage. Stat. Sin. 2013, 23, 119–143. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Bhattacharya, A.; Pati, D.; Pillai, N.S.; Dunson, D.B. Dirichlet–Laplace Priors for Optimal Shrinkage. J. Am. Stat. Assoc. 2016, 110, 1479–1490. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Hofert, M.; Kojadinovic, I.; Maechler, M.; Yan, J. Elements of Copula Modeling with R; Springer Use R! Series; Springer: New York, NY, USA, 2018. [Google Scholar]
  16. Millar, R. Conditional vs. marginal estimation of the predictive loss of hierarchical models using WAIC and cross-validation. Stat. Comput. 2018, 28, 375–385. [Google Scholar] [CrossRef]
  17. Gelman, A. and Rubin, D.B. Inference from iterative simulation using multiple sequences (with discussion). Stat. Sci. 1992, 1, 457–472. [Google Scholar]
  18. De Luca, G.; Zuccolotto, P. A Tail Dependence-Based Dissimilarity Measure for Financial Time Series Clustering. Adv. Data Anal. Classif. 2011, 5, 323–340. [Google Scholar] [CrossRef]
  19. Joe, H.; Smith, R.L.; Weissman, I. Bivariate Threshold Methods for Extremes. J. R. Stat. Soc. Ser. B (Methodol.) 1992, 54, 171–183. [Google Scholar] [CrossRef]
  20. Durante, F.; Pappadà, R.; Torelli, N. Clustering of Financial Time Series in Risky Scenarios. Adv. Data Anal. Classif. 2014, 8, 359–376. [Google Scholar] [CrossRef]
  21. Fuchs, S.; Di Lascio, F.M.L.; Durante, F. Dissimilarity Functions for Rank-Invariant Hierarchical Clustering of Continuous Variables. Comput. Stat. Data Anal. 2021, 159, 107201. [Google Scholar] [CrossRef]
Table 1. MSE of the proposed Bayesian hierarchical model and of the likelihood-based one.

| Method | 1 | 2 | 3 | 4 | 5 | Mean |
|---|---|---|---|---|---|---|
| Bayes | 0.1449 | 0.1514 | 0.1104 | 0.1106 | 0.1283 | 0.1291 |
| MLE | 0.1861 | 0.1832 | 0.1251 | 0.1477 | 0.1854 | 0.1655 |
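The MSE values in Table 1 are averages of squared estimation errors across replications. A minimal generic sketch of the computation (the function and the example numbers are illustrative, not from the paper's code):

```python
def mse(estimates, truth):
    """Mean squared error of parameter estimates against the true values."""
    assert len(estimates) == len(truth)
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

# e.g., comparing two estimators of the same copula-parameter vector:
truth = [0.50, 0.30, 0.75]
print(mse([0.52, 0.31, 0.70], truth))
print(mse([0.58, 0.35, 0.65], truth))
```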
Table 2. Fitted parameters for each margin distribution.

| Group | Feature | μ | σ | λ | p | q |
|---|---|---|---|---|---|---|
| Disk Hernia | PI | 50.2874 | 13.9408 | 0.9992 | 104.9370 | 50.7792 |
| | PT | 17.3686 | 6.9609 | 0.3137 | 1.8070 | 68.7768 |
| | LL | 32.8948 | 11.7179 | 1.0000 | 5.2906 | 364.8091 |
| | SS | 30.4401 | 7.8546 | −0.1599 | 3.5617 | 1.4520 |
| | PR | 116.5142 | 12.9605 | −0.1742 | 5.9304 | 0.4001 |
| | DS | 2.4849 | 5.4948 | −0.1557 | 1.7725 | 358.2803 |
| Spondylolisthesis | PI | 71.6191 | 15.0308 | −0.0261 | 1.6375 | 67.3817 |
| | PT | 20.7980 | 11.4766 | 0.2862 | 1.9411 | 44.5023 |
| | LL | 64.0920 | 16.3405 | 0.2633 | 2.1057 | 73.7317 |
| | SS | 49.5130 | 13.1427 | 0.3057 | 46.4772 | 0.0649 |
| | PR | 114.6216 | 15.5666 | 0.0259 | 1.4962 | 32.5924 |
| | DS | 51.6375 | 52.3930 | 0.5757 | 42.0584 | 0.0520 |
| Healthy | PI | 51.5086 | 12.4646 | 0.6837 | 2.5388 | 24.2468 |
| | PT | 12.8140 | 6.7551 | −0.1121 | 1.7036 | 71.8428 |
| | LL | 44.9715 | 187.1274 | 0.3583 | 28.3301 | 0.0707 |
| | SS | 38.8785 | 9.6135 | 0.2867 | 1.9040 | 17.9808 |
| | PR | 124.0712 | 53.4395 | 0.1274 | 55.3812 | 0.0364 |
| | DS | 2.1427 | 6.1430 | 0.3069 | 1.2030 | 7.8901 |
Table 3. Fitted parameters of copulae.

| Group | Features | Copula | Model A Mean | Model A s.d. | Model A 95% CI | Model B Mean | Model B s.d. | Model B 95% CI |
|---|---|---|---|---|---|---|---|---|
| Disk Hernia | PI vs. PT | Gaussian | 0.696 | 0.046 | (0.599, 0.775) | 0.632 | 0.073 | (0.469, 0.751) |
| | PI vs. SS | Gaussian | 0.726 | 0.040 | (0.633, 0.793) | 0.680 | 0.076 | (0.506, 0.789) |
| | DS vs. PI | Gaussian | 0.161 | 0.098 | (−0.031, 0.339) | 0.229 | 0.126 | (−0.041, 0.450) |
| | DS vs. PT | Frank | −0.511 | 0.577 | (−1.489, 0.522) | −0.245 | 0.820 | (−1.858, 1.340) |
| | DS vs. LL | Gaussian | 0.244 | 0.103 | (0.031, 0.435) | 0.265 | 0.109 | (0.037, 0.462) |
| | DS vs. PR | Gaussian | −0.055 | 0.113 | (−0.263, 0.175) | −0.075 | 0.126 | (−0.315, 0.174) |
| Spondylolisthesis | PI vs. PT | Frank | 5.718 | 0.505 | (0.599, 0.775) | 5.719 | 0.756 | (4.383, 7.138) |
| | PI vs. SS | Gumbel | 1.729 | 0.099 | (1.554, 1.943) | 1.725 | 0.128 | (1.490, 1.984) |
| | DS vs. PI | Frank | 3.427 | 0.431 | (2.552, 4.245) | 3.674 | 0.867 | (2.447, 4.897) |
| | DS vs. PT | Survival Clayton | 0.887 | 0.143 | (0.608, 1.174) | 1.036 | 0.193 | (0.679, 1.422) |
| | DS vs. LL | Frank | 3.230 | 0.426 | (2.437, 4.104) | 3.191 | 0.801 | (2.016, 4.370) |
| | DS vs. PR | Joe | 1.466 | 0.115 | (1.265, 1.698) | 1.421 | 0.154 | (1.121, 1.734) |
| Healthy | PI vs. PT | Gaussian | 0.633 | 0.038 | (0.555, 0.699) | 0.621 | 0.057 | (0.496, 0.717) |
| | PI vs. SS | Gumbel | 2.574 | 0.178 | (2.239, 2.910) | 2.552 | 0.235 | (2.115, 3.023) |
| | DS vs. PI | Frank | 1.822 | 0.430 | (0.936, 2.632) | 1.794 | 1.100 | (0.465, 3.139) |
| | DS vs. PT | Gaussian | 0.242 | 0.080 | (0.085, 0.401) | 0.210 | 0.102 | (−0.000, 0.394) |
| | DS vs. LL | Frank | 1.409 | 0.570 | (0.335, 2.538) | 1.661 | 0.680 | (0.362, 2.970) |
| | DS vs. PR | Gaussian | −0.111 | 0.093 | (−0.289, 0.065) | −0.076 | 0.123 | (−0.310, 0.169) |
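The posterior mean, standard deviation, and 95% credible interval columns in Table 3 are the standard summaries of MCMC output. A minimal sketch of how such summaries are computed from a vector of posterior draws (equal-tailed interval; illustrative, not the paper's code):

```python
def posterior_summary(draws, level=0.95):
    """Posterior mean, sd, and equal-tailed credible interval from MCMC draws."""
    n = len(draws)
    mean = sum(draws) / n
    sd = (sum((d - mean) ** 2 for d in draws) / (n - 1)) ** 0.5
    s = sorted(draws)
    alpha = (1.0 - level) / 2.0
    lo = s[int(alpha * (n - 1))]          # lower empirical quantile
    hi = s[int((1.0 - alpha) * (n - 1))]  # upper empirical quantile
    return mean, sd, (lo, hi)
```

In practice one would apply this to the retained (post burn-in) draws of each copula parameter, one chain at a time or pooled across chains.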
Table 4. Posterior distributions for lower tail coefficients.

| Group | Components | Posterior Mean | Posterior s.d. | Posterior CI (95%) |
|---|---|---|---|---|
| 1 | NTRS, STT | 0.5001 | 0.0592 | (0.4153, 0.5918) |
| 2 | CVX, XOM | 0.4833 | 0.0592 | (0.4061, 0.5715) |
| 3 | AMAT, LRCX | 0.4499 | 0.0633 | (0.3648, 0.5573) |
| 4 | BEN, TROW | 0.4259 | 0.0649 | (0.3457, 0.5359) |
| 5 | CMS, PNW | 0.4256 | 0.0661 | (0.3347, 0.5296) |
| 6 | APD, LIN | 0.4198 | 0.0655 | (0.3389, 0.5274) |
| 7 | PEAK, VTR, WELL | 0.4170 | 0.0636 | (0.3538, 0.5097) |
| 8 | DHI, LEN, PHM | 0.3942 | 0.0643 | (0.3137, 0.4895) |
| 9 | MLM, VMC | 0.3827 | 0.0678 | (0.2881, 0.4963) |
| 10 | HD, LOW | 0.3757 | 0.0675 | (0.2828, 0.4851) |
| 11 | COP, MRO | 0.3685 | 0.0681 | (0.2765, 0.4880) |
| 12 | ADP, PAYX | 0.3532 | 0.0692 | (0.2663, 0.4704) |
| 13 | CSX, NSC, UNP | 0.3395 | 0.0674 | (0.2672, 0.4535) |
| 14 | T, VZ | 0.3338 | 0.0699 | (0.2368, 0.4509) |
| 15 | CAH, MCK | 0.3337 | 0.0691 | (0.2414, 0.4401) |
| 16 | BAC, C, JMP, MS | 0.3235 | 0.0671 | (0.2590, 0.4203) |
| 17 | AIV, AVB, EQR, ESS, UDR | 0.3221 | 0.0668 | (0.2593, 0.4187) |
| 18 | RSG, WM | 0.3168 | 0.0694 | (0.2275, 0.4255) |
| 19 | DVN, EOG, NBL | 0.2979 | 0.0682 | (0.2166, 0.4103) |
| 20 | D, SO | 0.2932 | 0.0708 | (0.1953, 0.4113) |
| 21 | NI, SRE | 0.2920 | 0.0700 | (0.2022, 0.4032) |
| 22 | IP, PKG | 0.2914 | 0.0713 | (0.1957, 0.4145) |
| 23 | CB, TRV | 0.2839 | 0.0715 | (0.1815, 0.4132) |
| 24 | GL, LNC, MET, UNM | 0.2818 | 0.0677 | (0.2177, 0.3804) |
| 25 | CMA, FITB, HBAN, KEY, MTB, PNC, RF, TFC, USB | 0.2294 | 0.0666 | (0.1526, 0.3273) |
| 26 | ATO, EVRG | 0.2201 | 0.0692 | (0.1256, 0.3412) |
| 27 | ETR, NEE, PEG | 0.1923 | 0.0652 | (0.1175, 0.2953) |
| 28 | AEE, AEP, DTE, DUK, ED, ES, LNT, WEC, XEL | 0.1768 | 0.0633 | (0.1174, 0.2855) |
| 29 | ARE, BXP, DRE, FRT, KIM, MAA, PLD, REG, SPG | 0.1522 | 0.0605 | (0.0874, 0.2439) |
| 30 | EW, SYK | 0.0008 | 0.0011 | (0.0000, 0.0028) |
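For context, a lower tail dependence coefficient measures $\lambda_L = \lim_{q \to 0^+} P(U \le q \mid V \le q)$. For a Clayton copula with parameter $\theta > 0$ it is available in closed form, $\lambda_L = 2^{-1/\theta}$, and an empirical counterpart can be computed from pseudo-observations at a small threshold $q$. A minimal sketch of both (illustrative, not the paper's estimation code, which summarizes posterior draws instead):

```python
def clayton_lower_tail(theta):
    """Closed-form lower tail dependence of a Clayton copula: 2**(-1/theta)."""
    return 2.0 ** (-1.0 / theta) if theta > 0 else 0.0

def empirical_lower_tail(u, v, q=0.05):
    """Empirical estimate of P(U <= q | V <= q) from pseudo-observations."""
    below_v = [a for a, b in zip(u, v) if b <= q]
    if not below_v:
        return 0.0
    return sum(1 for a in below_v if a <= q) / len(below_v)
```

For instance, a Clayton parameter of $\theta = 1$ gives $\lambda_L = 0.5$, close to the strongest coefficient in Table 4 (group 1).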

Onorati, P.; Liseo, B. Bayesian Hierarchical Copula Models with a Dirichlet–Laplace Prior. Stats 2022, 5, 1062-1078. https://doi.org/10.3390/stats5040063

