Article

Using Copula to Model Dependence When Testing Multiple Hypotheses in DNA Microarray Experiments: A Bayesian Approximation

1 Departamento de Matemática e Estatística, Faculdade de Ciências Naturais, Matemática e Estatística, Universidade Rovuma, 3100 Nampula, Mozambique
2 Departamento de Producción Animal, Facultad de Veterinaria, Universidad Complutense de Madrid, 28040 Madrid, Spain
3 Departamento de Estadística e IO, Facultad de Ciencias Matemáticas, Universidad Complutense de Madrid, 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(9), 1514; https://doi.org/10.3390/math8091514
Submission received: 9 August 2020 / Revised: 26 August 2020 / Accepted: 28 August 2020 / Published: 4 September 2020
(This article belongs to the Special Issue Mathematical Biology: Modeling, Analysis, and Simulations)

Abstract
Many experiments require simultaneously testing many hypotheses. This is particularly relevant in the context of DNA microarray experiments, where it is common to analyze many genes to determine which of them are differentially expressed under two conditions. Another important problem in this context is how to model the dependence at the level of gene expression. In this paper, we propose a Bayesian procedure for simultaneously testing multiple hypotheses, modeling the dependence through copula functions, where all available information, both objective and subjective, can be used. The approach has the advantage that it can be used with different dependency structures. A simulated data analysis was performed to examine the performance of the proposed approach. The results show that our procedure captures the dependence appropriately, adequately classifying a high percentage of true and false null hypotheses when a right-skewed beta prior distribution is chosen for the initial probability of each null hypothesis, resulting in a very powerful procedure. The procedure is also illustrated with real data.

1. Introduction

There are many experiments that require simultaneously testing many hypotheses. In that context, if each hypothesis is tested individually at a given significance level $\alpha$, the probability of erroneously rejecting at least one hypothesis increases rapidly with the number of hypotheses; i.e., a problem arises if the multiplicity of the problem is not taken into account when simultaneously evaluating all of the hypotheses. DNA microarray experiments exhibit this problem, as data analysis often requires simultaneously testing many hypotheses, one for each gene. The first to warn of this problem was [1]. The literature regarding this subject is extensive, especially under the assumption of independence.
From a frequentist point of view, procedures for testing multiple hypotheses are based on controlling a measure related to Type I errors, such as the familywise error rate (FWER). However, this usually leads to especially conservative procedures, in the sense that few false null hypotheses are rejected, thus reducing the power of the test.
When testing multiple hypotheses, the false discovery rate (FDR) was proposed by [2] as a measure of error that results in less conservative procedures than those controlling the FWER. The authors argue that, in some situations, it may be acceptable to tolerate some false positives, provided that there are few in relation to the number of rejected null hypotheses. A review of multiple hypothesis tests is presented in [3], whose authors also proposed methods based on ordered p-values for multiple comparisons between means of normal populations.
Continuing with frequentist approaches, the different error rates were analyzed in [4], which also compared the different procedures in the context of DNA microarrays, and a general statistical framework is proposed in [5] for multiple tests of association between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest.
In the context of DNA microarray experiments, the FDR is the most commonly used error type in the frequentist approach. This is because multiple hypothesis tests are used, in many situations, as a first exploratory step to identify groups of differentially expressed genes on which further research is subsequently performed. Thus, it may be acceptable to tolerate a higher number of false positives. Furthermore, the authors of [2] derived a procedure for controlling the FDR at a certain level $\alpha$ for independent test statistics. There are many published studies in this field. For a detailed review of the multiple testing problem, see [4,5].
In some cases, however, test statistics are dependent. For instance, in the context of DNA microarray experiments, genes usually present high correlation and the number of hypotheses is very high, which poses a challenging and important problem in statistics. The following research considers multiple hypothesis testing through a frequentist approach under dependence. The case in which the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses is analyzed in [6]. A step-down procedure that controls the FDR under independence of the test statistics, while also controlling the FDR under positive dependence, can be seen in [7]. An application of Archimedean copulas to resampled p-values generated by permutations in the context of multiple testing is shown in [8]. A detailed review of the multiple testing problem aimed at controlling the FDR under Archimedean copulas can be found in [9] and the references therein.
From a Bayesian perspective, the posterior probability of each null hypothesis must be obtained to make decisions. Several publications offer essential insights on this approach under the assumption of independence. For example, a mixture of a discrete and a continuous component as the distribution for the observations is proposed in a Bayesian approach by [10]. Hierarchical Bayesian models can be seen in [11,12]; the former is robust with respect to extreme values and powerful even with a small number of observations, while the latter is based on a mixture of two distributions along with an empirical Bayes approach.
Moreover, the sensitivity regarding the choice of the prior distribution on the probability of each null hypothesis was analyzed in [13]. On the other hand, the multiple hypothesis testing problem was dealt with from the perspective of Bayesian decision theory in [14], where a decision criterion based on an estimate of the number of false null hypotheses is proposed. Finally, procedures that control the Bayes FDR and the Bayes FNR, which are applicable in any situation, are suggested in [15].
Under the assumption of dependence in the field of genomics, an empirical Bayesian method is proposed in [16], which essentially combines multiple testing with a clustering technique. Similarly, a procedure that combines a clustering technique with multiple testing from a Bayesian perspective to deal with the correlation effect in data analysis is presented by [17].
Other recent approaches have been developed to address dependence when testing multiple hypotheses, such as graphical models. The hidden Markov model (HMM), in the context of graphical models, has emerged as a tool to support the structure of data dependence when testing multiple hypotheses, and it has had a considerable impact on the field of genomics. The potential offered by the HMM to support the dependency structure has been explored in [18,19,20,21], among others. The dependency structure was explored through Markov chain models in [18], demonstrating the optimality of an FDR-controlling procedure under certain conditions at an appropriate level, along with the empirical realization of these models. This procedure was extended in [21,22,23] by developing a graphical model based on the multiple test procedure and a Markov-random-field-coupled mixture model. The extended procedure allows for an arbitrary dependency structure ($N \geq 2$) and heterogeneously dependent parameters. The effect of the dependency structure of the finite states of the HMM on the likelihood ratio for optimal multiple tests in hidden states is analyzed in [19].
The main aim of this paper is to provide a Bayesian procedure for testing multiple hypotheses in cases with many hypotheses, under the assumption of dependency, modeling this dependence through copula functions. Copulas are attractive because they can be used to model a wide range of dependency structures. The remainder of this paper is organized as follows. In Section 2, we propose a full Bayesian approach to the problem of testing multiple hypotheses, and we describe the theory regarding copula functions. In Section 3, we present the full Bayesian approach for modeling dependence through an N-dimensional Gaussian copula with normal marginal densities, together with the prior and conditional posterior distributions necessary to apply a Markov chain Monte Carlo (MCMC) algorithm (the Metropolis-Hastings-within-Gibbs algorithm), a summary of this algorithm (the details of which are given in Appendix A), and a simulation study that evaluates the performance of our approach. Section 4 shows the Bayesian approach for modeling the dependence through an N-dimensional Clayton copula with normal marginal densities, together with a simulation study and a model-selection comparison based on the Deviance Information Criterion (DIC). Section 5 applies the proposed methodology to a real data set from DNA microarray experiments. Finally, Section 6 presents the main conclusions.

2. A Bayesian Approach: Model Specification

Suppose $N$ dependent random variables are measured under two different independent treatment conditions. In particular, suppose $X = (X_1, \ldots, X_N)$ is an $N$-dimensional random vector of dependent variables measured under one condition, and $Y = (Y_1, Y_2, \ldots, Y_N)$ is an $N$-dimensional random vector of dependent variables measured under the other condition, where $X$ and $Y$ arise independently from distributions $F_X(X \mid \Theta_X, \lambda_X)$ and $F_Y(Y \mid \Theta_Y, \lambda_Y)$, respectively, and where $\Theta_X = (\theta_{X_1}, \ldots, \theta_{X_N})$ and $\Theta_Y = (\theta_{Y_1}, \ldots, \theta_{Y_N})$ are the parameter vectors of interest and $\Lambda = (\lambda_X, \lambda_Y)$ is the other group of parameter vectors for the model. We consider the problem of simultaneous testing as follows:
$$H_{0i}: \theta_{X_i} = \theta_{Y_i} \quad \text{versus} \quad H_{1i}: \theta_{X_i} \neq \theta_{Y_i}, \qquad i = 1, 2, \ldots, N \tag{1}$$
We decide which null hypotheses to accept through the posterior probability of each null hypothesis. Thus, we build a probability distribution model for $X$ and $Y$ before making inferences about the parameters once the variables $X_i$ and $Y_i$ have been observed.
The joint probability density of $X$ and $Y$ is defined as the product of the joint probability densities $f_X$ and $f_Y$, because we previously assumed independence between the two conditions. Thus,
$$f(X, Y \mid \Theta) = f(X \mid \Theta_X, \lambda_X) \, f(Y \mid \Theta_Y, \lambda_Y) \tag{2}$$
where $\Theta = (\Theta_X, \lambda_X, \Theta_Y, \lambda_Y)$ denotes the model parameters.

2.1. Copula Function

We build a multivariate distribution for each treatment condition using a copula function. According to [24,25], a copula is a joint distribution function defined on the unit cube $[0,1]^n$ with standard uniform univariate margins. This concept was introduced by [26], and other recent works include [27,28,29], among others. Copulas are especially useful because they can fully model the dependence in the data.
According to Sklar's Theorem (1959), given a joint cumulative distribution function $F(x_1, \ldots, x_N)$ for random variables $X_1, X_2, \ldots, X_N$ with marginal cumulative distribution functions (CDFs) $F_1(x_1), F_2(x_2), \ldots, F_N(x_N)$, $F$ can be written as a function of its marginals:
$$F(x_1, x_2, \ldots, x_N) = C(F_1(x_1), F_2(x_2), \ldots, F_N(x_N)) = C(u_1, u_2, \ldots, u_N)$$
where $C(u_1, u_2, \ldots, u_N)$ is a joint distribution function with uniform marginals, $u_i = F_i(x_i)$ for $i = 1, \ldots, N$, and $C$ is called a copula. If $F_1, \ldots, F_N$ are continuous, then the copula $C$ is unique, and if each $F_i$ is discrete, then $C$ is unique on $\mathrm{Ran}(F_1) \times \cdots \times \mathrm{Ran}(F_N)$, where $\mathrm{Ran}(F_i)$ is the range of $F_i$. For a rigorous treatment of copulas, see [29].
As a consequence of Sklar's Theorem (without loss of generality, we treat the absolutely continuous case), the joint probability density can be written as the product of the marginal densities and the copula density. Thus, an $N$-dimensional joint density function is defined as follows:
$$f(x_1, \ldots, x_N) = \frac{\partial^N C(F_1(x_1), F_2(x_2), \ldots, F_N(x_N))}{\partial F_1(x_1) \cdots \partial F_N(x_N)} \prod_{i=1}^{N} \frac{\partial}{\partial x_i} F_i(x_i) = c(u_1, u_2, \ldots, u_N) \prod_{i=1}^{N} f_i(x_i) \tag{3}$$
where
$$u_i = F_i(x_i), \qquad f_i(x_i) = \frac{\partial}{\partial x_i} F_i(x_i), \qquad c(u_1, u_2, \ldots, u_N) = \frac{\partial^N C(u_1, u_2, \ldots, u_N)}{\partial u_1 \cdots \partial u_N} \tag{4}$$
The dependence function $c(u_1, u_2, \ldots, u_N)$ is called the copula density, and it encodes the dependence among the variables $(x_1, x_2, \ldots, x_N)$. For instance, if the random variables $x_1, x_2, \ldots, x_N$ are independent, $c(u_1, u_2, \ldots, u_N) = 1$, so that $f(x_1, \ldots, x_N) = \prod_{i=1}^{N} f_i(x_i)$.
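As a minimal illustration of these definitions (not part of our implementation), the following R sketch uses the copula package, an illustrative package choice on our part, to evaluate a bivariate Gaussian copula density and to verify that, under independence, the copula density is identically 1:

library(copula)                               # illustrative package choice, assumed here
gc <- normalCopula(param = 0.8, dim = 2)      # bivariate Gaussian copula with rho = 0.8
u  <- c(0.3, 0.7)                             # a point (u1, u2) in the unit square
dCopula(u, gc)                                # copula density c(u1, u2) at that point
dCopula(u, normalCopula(param = 0, dim = 2))  # rho = 0: independence, density equals 1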

2.2. Modeling Dependence with N-Dimensional Copulas

From (3) and (4), the N-dimensional joint density function (2) for X and Y can be expressed as follows:
$$f(x_1, \ldots, x_N; y_1, \ldots, y_N \mid \Theta) = c_X(u_{X_1}, u_{X_2}, \ldots, u_{X_N}; \omega_X) \prod_{i=1}^{N} f_i(x_i \mid \theta_{X_i}, \lambda_{X_i}) \times c_Y(u_{Y_1}, u_{Y_2}, \ldots, u_{Y_N}; \omega_Y) \prod_{i=1}^{N} f_i(y_i \mid \theta_{Y_i}, \lambda_{Y_i}) \tag{5}$$
where $u_{X_i} = F_i(x_i)$, $u_{Y_i} = F_i(y_i)$, $i = 1, \ldots, N$, and $\omega_X$ and $\omega_Y$ denote the copula density parameter vectors for conditions $X$ and $Y$, respectively. Next, we update the parameter vector for the model: $\Theta = (\Theta_X, \lambda_X, \Theta_Y, \lambda_Y, \omega_X, \omega_Y)$. To simplify the notation, throughout the following we write $c_X(u_X; \omega_X)$ and $c_Y(u_Y; \omega_Y)$ rather than $c_X(u_{X_1}, u_{X_2}, \ldots, u_{X_N}; \omega_X)$ and $c_Y(u_{Y_1}, u_{Y_2}, \ldots, u_{Y_N}; \omega_Y)$, respectively.
In the Bayesian framework, to proceed with the inference, all unknown quantities Θ must be estimated from the posterior distribution:
$$\pi(\Theta \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) \propto \pi(\Theta) \, L(\Theta \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y})$$
Therefore, a prior distribution $\pi(\Theta)$ is needed, as well as the likelihood $L(\Theta \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y})$, the observations $x_{\cdot j} = (x_{1j}, x_{2j}, \ldots, x_{Nj})$, $j = 1, 2, \ldots, n_x$, and $y_{\cdot k} = (y_{1k}, y_{2k}, \ldots, y_{Nk})$, $k = 1, 2, \ldots, n_y$, being samples from $X = (X_1, \ldots, X_N)$ and $Y = (Y_1, \ldots, Y_N)$, where $n_x$ and $n_y$ represent the number of samples of $X$ and $Y$, respectively. Thus, the likelihood is derived as follows:
$$L(\Theta \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) = \prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \omega_X) \prod_{i=1}^{N} f_i(x_{ij} \mid \theta_{X_i}, \lambda_{X_i}) \times \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \omega_Y) \prod_{i=1}^{N} f_i(y_{ik} \mid \theta_{Y_i}, \lambda_{Y_i}) \tag{6}$$
As we can see, this likelihood is complex because it depends on $H_{0i}$ and $H_{1i}$, defined in (1). To make it tractable, we introduce $N$ independent latent variables $\tau_i$ [30] following a $\mathrm{Bernoulli}(1 - p_i)$ distribution for all $i = 1, 2, \ldots, N$, where $p_i$ is the initial probability of each null hypothesis:
$$\tau_i = \begin{cases} 0 & \text{if } \theta_{X_i} = \theta_{Y_i} \\ 1 & \text{if } \theta_{X_i} \neq \theta_{Y_i} \end{cases} \tag{7}$$
Then, $\Pr(\tau_i = 0 \mid p_i) = p_i$ and $\Pr(\tau_i = 1 \mid p_i) = 1 - p_i$. Thus, each vector of observations $(x_{i \cdot}, y_{i \cdot})$ comes from a distribution under $H_{0i}$ when $\tau_i = 0$, and under $H_{1i}$ when $\tau_i = 1$, for $i = 1, 2, \ldots, N$, i.e.,
$$\begin{aligned} X_{ij} \mid \tau_i, \Theta_X, \lambda_X &\sim f_i(x_{ij} \mid \theta_{X_i}, \lambda_{X_i}), & i = 1, \ldots, N, \; j = 1, \ldots, n_x \\ Y_{ik} \mid \tau_i = 0, \Theta_X, \lambda_Y &\sim f_i(y_{ik} \mid \theta_{X_i}, \lambda_{Y_i}), & k = 1, \ldots, n_y \\ Y_{ik} \mid \tau_i = 1, \Theta_Y, \lambda_Y &\sim f_i(y_{ik} \mid \theta_{Y_i}, \lambda_{Y_i}) & \end{aligned}$$
In a Bayesian framework, we can consider the latent variables $\tau = (\tau_1, \ldots, \tau_N)$ as an additional group of parameters. As a result, the likelihood (6) is written as follows:
$$L(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) = \prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \omega_X) \prod_{i=1}^{N} f_i(x_{ij} \mid \theta_{X_i}, \lambda_{X_i}) \times \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \omega_Y) \prod_{i: \tau_i = 0} f_i(y_{ik} \mid \theta_{X_i}, \lambda_{Y_i}) \prod_{i: \tau_i = 1} f_i(y_{ik} \mid \theta_{Y_i}, \lambda_{Y_i}) \tag{8}$$
Then, the posterior distribution is
$$\pi(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) \propto \pi(\Theta) \, \pi(\tau \mid \Theta) \, L(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y})$$
where $\Theta = (\Theta_X, \lambda_X, \Theta_Y, \lambda_Y, \omega_X, \omega_Y, p)$, with $p = (p_1, \ldots, p_N)$.
Given $\pi(\Theta)$, we seek to obtain the posterior probability of each null hypothesis through the corresponding marginal distributions of the $\tau_i$.
In the following sections, we consider normal marginal densities for the model, as is usually done when modeling gene expression data, and the means are the parameters of interest. In this context, assuming that the joint distribution is normal may seem reasonable. Then, in Section 3, the dependency between the variables of each treatment condition is modeled using an N-dimensional Gaussian copula. However, the Gaussian copula is not always the most appropriate for modeling dependency even if the marginals are normal, because normal marginal distributions do not imply a normal joint distribution, as can be seen, for example, in [31,32,33]. For this reason, in Section 4, the dependency is modeled using an N-dimensional Clayton copula.

3. Modeling Dependence Through N-Dimensional Gaussian Copulas with Normal Marginal Densities

The typical objective when analyzing data arising from microarray experiments is to identify genes that are differentially expressed. Normal marginal distributions have been widely used in the field of genomics to model gene expression data [13,14,34,35], among others.
Thus, we may assume a normal distribution for the variables $X_i$ and $Y_i$, $i = 1, 2, \ldots, N$. We consider that the vector of observations $(x_{i \cdot}, y_{i \cdot})$ comes from a distribution under $H_{0i}$ when $\tau_i = 0$, and the marginal density of each observation of this vector is defined by the same law, $N(\mu_{X_i}, \sigma_i^2)$, for both treatment conditions. Likewise, we consider that the vector of observations $(x_{i \cdot}, y_{i \cdot})$ comes from a distribution under $H_{1i}$ when the latent variable $\tau_i = 1$, for each $i = 1, 2, \ldots, N$; in this case, the marginal densities of the random variables $X_i$ and $Y_i$ are $N(\mu_{X_i}, \sigma_i^2)$ and $N(\mu_{Y_i}, \sigma_i^2)$, respectively. Please note that we take the variances of the populations from the two treatment conditions to be equal, $\sigma_i^2 = \sigma_{X_i}^2 = \sigma_{Y_i}^2$, $i = 1, 2, \ldots, N$; however, they could differ across hypotheses. (For simplicity's sake, we have considered $\sigma_{X_i}^2 = \sigma_{Y_i}^2$; the procedure is also applicable when the variances $\sigma_{X_i}^2$ and $\sigma_{Y_i}^2$ are different.)
The main objective of this paper is to identify genes differentially expressed under two experimental conditions. Therefore, we developed multiple hypothesis tests to decide between treatments, which is equivalent to testing the following hypotheses:
$$H_{0i}: \mu_{X_i} = \mu_{Y_i} \quad \text{versus} \quad H_{1i}: \mu_{X_i} \neq \mu_{Y_i}, \qquad i = 1, 2, \ldots, N \tag{9}$$
where $\mu_{X_i}$ and $\mu_{Y_i}$ are the means of $X_i$ and $Y_i$, respectively.
To model the dependence between the variables, we assume that the density for each condition is defined by an $N$-dimensional Gaussian copula, since it uses only the pairwise correlations among variables. This is done in precisely the same way that a multivariate normal distribution encodes the dependence between variables, and it allows for any marginal distribution and any positive-definite correlation matrix [36]. The copula densities are defined as follows:
$$c_X(u_X; \Sigma_X) = \frac{1}{\sqrt{|\Sigma_X|}} \exp\left\{-\frac{1}{2} \, \xi_X' \left(\Sigma_X^{-1} - I_N\right) \xi_X\right\}, \qquad c_Y(u_Y; \Sigma_Y) = \frac{1}{\sqrt{|\Sigma_Y|}} \exp\left\{-\frac{1}{2} \, \psi_Y' \left(\Sigma_Y^{-1} - I_N\right) \psi_Y\right\}$$
where $\xi_X = \left(\xi_{X_1} = \Phi^{-1}(u_{X_1}), \ldots, \xi_{X_N} = \Phi^{-1}(u_{X_N})\right)$ and $\psi_Y = \left(\psi_{Y_1} = \Phi^{-1}(u_{Y_1}), \ldots, \psi_{Y_N} = \Phi^{-1}(u_{Y_N})\right)$, $\Sigma_X$ and $\Sigma_Y$ are the copula correlation matrices for conditions $X$ and $Y$, respectively, and $\Phi$ is the standard normal CDF.
For the sake of simplicity, we considered the same dependency structure for the two treatment conditions when building the model. Consequently, the correlation matrix is denoted by $\Sigma_X = \Sigma_Y = \Sigma$. The normal scores $\xi_{X_i}$ and $\psi_{Y_i}$ are the quantiles of orders $u_{X_i}$ and $u_{Y_i}$, respectively, of the standard normal distribution $N(0, 1)$, $i = 1, 2, \ldots, N$.
Then, the joint density (5) defined through the Gaussian copula with normal marginal densities is
$$f(x_1, \ldots, x_N; y_1, \ldots, y_N \mid \Theta) = c_X(u_X; \Sigma) \prod_{i=1}^{N} f_i(x_i \mid \mu_{X_i}, \sigma_i^2) \times c_Y(u_Y; \Sigma) \prod_{i=1}^{N} f_i(y_i \mid \mu_{Y_i}, \sigma_i^2)$$
where $\Theta = (\mu_X, \mu_Y, \sigma^2, \Sigma)$, $\mu_X = (\mu_{X_1}, \ldots, \mu_{X_N})$, $\mu_Y = (\mu_{Y_1}, \ldots, \mu_{Y_N})$, and $\sigma^2 = (\sigma_1^2, \ldots, \sigma_N^2)$.
Suppose we observe $x_{\cdot j} = (x_{1j}, x_{2j}, \ldots, x_{Nj})$, $j = 1, 2, \ldots, n_x$, and $y_{\cdot k} = (y_{1k}, y_{2k}, \ldots, y_{Nk})$, $k = 1, 2, \ldots, n_y$, two independent random samples from $X = (X_1, \ldots, X_N)$ and $Y = (Y_1, \ldots, Y_N)$, where $n_x$ and $n_y$ represent the number of samples of $X$ and $Y$, respectively.
To obtain the likelihood, we consider the latent variables defined in (7), with $\mu_{X_i}$ and $\mu_{Y_i}$, $i = 1, \ldots, N$, the parameters of interest.
In accordance with the parametric model defined above (in Section 2), the likelihood (8) for the parameters $\Theta = (\mu_X, \mu_Y, \sigma^2, \Sigma)$ is defined as follows:
$$L(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) = \prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \Sigma) \prod_{i=1}^{N} f_i(x_{ij} \mid \mu_{X_i}, \sigma_i^2) \times \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \Sigma) \prod_{i: \tau_i = 0} f_i(y_{ik} \mid \mu_{X_i}, \sigma_i^2) \prod_{i: \tau_i = 1} f_i(y_{ik} \mid \mu_{Y_i}, \sigma_i^2) \tag{10}$$
Due to the multiplicity of the problem and the need to estimate a large number of parameters, we used the uniform correlation structure matrix, in accordance with [36], to build a multivariate Gaussian copula that depends on a single correlation parameter $\rho$, defined as follows:
$$c_X(u_{X \cdot j}; \rho) = \left[\frac{1}{(1 + (N-1)\rho)(1-\rho)^{N-1}}\right]^{1/2} \exp\left\{-\frac{\rho}{2(1-\rho)} \cdot \frac{1}{1 + (N-1)\rho} \left[(N-1)\rho \sum_{i=1}^{N} \xi_{ij}^2 - 2 \sum_{i=1}^{N} \sum_{m > i} \xi_{ij} \xi_{mj}\right]\right\}$$
$$c_Y(u_{Y \cdot k}; \rho) = \left[\frac{1}{(1 + (N-1)\rho)(1-\rho)^{N-1}}\right]^{1/2} \exp\left\{-\frac{\rho}{2(1-\rho)} \cdot \frac{1}{1 + (N-1)\rho} \left[(N-1)\rho \sum_{i=1}^{N} \psi_{ik}^2 - 2 \sum_{i=1}^{N} \sum_{m > i} \psi_{ik} \psi_{mk}\right]\right\}$$
where
$$\rho \in \left(-\frac{1}{N-1}, \, 1\right) \tag{11}$$
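As a quick numerical check (a sketch, not our production code), the closed-form exchangeable Gaussian copula density above can be compared against the copula package, whose dispstr = "ex" option implements the same uniform correlation structure:

library(copula)
N   <- 5
rho <- 0.8                                      # must lie in (-1/(N-1), 1)
u   <- runif(N)                                 # a point in the unit hypercube
xi  <- qnorm(u)                                 # normal scores
S1  <- sum(xi^2)
S2  <- sum(outer(xi, xi)[upper.tri(diag(N))])   # sum of xi_i * xi_m over pairs i < m
dens <- ((1 + (N - 1) * rho) * (1 - rho)^(N - 1))^(-1/2) *
  exp(-rho / (2 * (1 - rho) * (1 + (N - 1) * rho)) *
        ((N - 1) * rho * S1 - 2 * S2))
all.equal(dens, dCopula(u, normalCopula(rho, dim = N, dispstr = "ex")))  # TRUE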
We use the copula function as a tool to describe the dependency relation between variables. There can be many ways of quantifying this relation. The most common is the Pearson correlation, although this measure is the most limited, as it reflects only linear dependence. As alternatives, the Kendall rank correlation or Spearman correlation can be used, as they are invariant under monotonic transformations.
Thus, the posterior distribution of the parameters is as follows:
$$\pi(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) \propto \pi(\Theta) \, \pi(\tau \mid p) \, L(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y})$$
To develop the Bayesian approach, we need to specify the prior distribution for the model parameters $(\Theta, \tau) = (\mu_X, \mu_Y, \sigma^2, p, \rho, \tau)$.

3.1. Prior and Posterior Distributions

To keep the model as simple as possible, complex as it already is, in this paper we assume independence among the prior distributions of the components of $\Theta$. However, it is possible to consider prior distributions that reflect some kind of dependency between parameters, as, for example, in [12,13]. Then, the joint prior distribution for the model parameters $(\Theta, \tau)$ is
$$\pi(\Theta, \tau) = \pi(\rho) \prod_{i=1}^{N} \pi(\mu_{X_i}) \, \pi(\mu_{Y_i}) \, \pi(\sigma_i^2) \, \pi(p_i) \, \pi(\tau_i \mid p_i) \tag{12}$$
where the prior distributions for the means of both treatment conditions, $\mu_{X_i}$ and $\mu_{Y_i}$, are uniform over the ranges $(a_{X_i}, b_{X_i})$ and $(a_{Y_i}, b_{Y_i})$, respectively, as indicated by [37].
The prior distribution for the variance $\sigma_i^2$ is defined through Jeffreys' prior density function, $\pi(\sigma_i^2) \propto \frac{1}{\sigma_i^2}$, $i = 1, 2, \ldots, N$.
The parameter $\tau_i$ is then introduced into the model, assuming that each latent variable follows a Bernoulli distribution, $\tau_i \mid p_i \sim \mathrm{Bernoulli}(1 - p_i)$.
We assume the prior distribution $p_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$, following [12,35], among others.
As explained in the previous section, we use an $N$-dimensional Gaussian copula function as the tool for modeling the dependence between variables through the Pearson correlation. The same dependency structure is used for the two treatment conditions, $\Sigma_X = \Sigma_Y = \Sigma$; likewise, the same correlation coefficient is used for both treatment conditions, $\rho_X = \rho_Y = \rho$, and it follows a uniform prior distribution over the range $(a, b)$.
Thus, the joint prior distribution for the model parameters (12) is defined as follows:
$$\pi(\Theta, \tau) \propto \frac{1}{b-a} \prod_{i=1}^{N} \frac{p_i^{1-\tau_i} (1-p_i)^{\tau_i} \, p_i^{\alpha_i - 1} (1-p_i)^{\beta_i - 1}}{(b_{X_i} - a_{X_i})(b_{Y_i} - a_{Y_i}) \, \sigma_i^2} \tag{13}$$
Then, given the data, the joint prior densities (13), and the likelihood (10), the posterior distribution of the parameters $(\Theta, \tau) = (\mu_X, \mu_Y, \sigma^2, p, \rho, \tau)$ is defined as follows:
$$\begin{aligned} \pi(\Theta, \tau \mid x_{\cdot 1}, \ldots, x_{\cdot n_x}; y_{\cdot 1}, \ldots, y_{\cdot n_y}) \propto{} & \frac{1}{b-a} \prod_{i=1}^{N} \frac{p_i^{1-\tau_i} (1-p_i)^{\tau_i} \, p_i^{\alpha_i - 1} (1-p_i)^{\beta_i - 1}}{(b_{X_i} - a_{X_i})(b_{Y_i} - a_{Y_i}) \, \sigma_i^2} \\ & \times \prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{i=1}^{N} f_i(x_{ij} \mid \mu_{X_i}, \sigma_i^2) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \\ & \times \prod_{i: \tau_i = 0} f_i(y_{ik} \mid \mu_{X_i}, \sigma_i^2) \prod_{i: \tau_i = 1} f_i(y_{ik} \mid \mu_{Y_i}, \sigma_i^2) \end{aligned} \tag{14}$$
As we can see, this joint posterior distribution is complex and has no known analytical expression. However, Bayesian inference may be performed using the MCMC approach, which can produce a Markov chain $\{(\Theta^{(l)}, \tau^{(l)}): l = 1, \ldots, M\}$ that converges to the joint posterior distribution. Consequently, we can estimate the parameters from the generated sample, for example, from the marginal means of these samples. In particular, we used the Metropolis-Hastings-within-Gibbs algorithm; for details, see [38,39] and the references therein.

3.2. Conditional Posterior Distributions

In this subsection, we derive the conditional posterior distributions of the model parameters to construct the MCMC chain.
Given the data and the remaining parameters, the conditional posterior probability of $\tau_i = 0$, $i = 1, 2, \ldots, N$, is derived as follows:
$$\Pr(\tau_i = 0 \mid \Theta, \tau_{-i}, X, Y) = \frac{p_i \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \exp\left\{-\frac{n_y}{2\sigma_i^2} (\bar{y}_{i \cdot} - \mu_{X_i})^2\right\}}{p_i \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \exp\left\{-\frac{n_y}{2\sigma_i^2} (\bar{y}_{i \cdot} - \mu_{X_i})^2\right\} + (1 - p_i) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \exp\left\{-\frac{n_y}{2\sigma_i^2} (\bar{y}_{i \cdot} - \mu_{Y_i})^2\right\}} \tag{15}$$
Therefore, the conditional posterior probability of $\tau_i = 1$, for $i = 1, 2, \ldots, N$, is
$$\Pr(\tau_i = 1 \mid \Theta, \tau_{-i}, X, Y) = 1 - \Pr(\tau_i = 0 \mid \Theta, \tau_{-i}, X, Y)$$
The conditional posterior distribution of each $p_i$, for $i = 1, 2, \ldots, N$, given the data and the remaining parameters, is
$$p_i \mid X, Y, \Theta_{-p_i} \sim \mathrm{Beta}(\alpha_i + 1 - \tau_i, \, \beta_i + \tau_i) \tag{16}$$
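Since (16) is a standard beta distribution, this step reduces to a direct Gibbs draw. A minimal sketch (the function and variable names are ours, for illustration only):

update_p <- function(tau, alpha, beta) {
  # One Gibbs draw per hypothesis from Beta(alpha + 1 - tau_i, beta + tau_i)
  rbeta(length(tau), shape1 = alpha + 1 - tau, shape2 = beta + tau)
}
update_p(c(0, 1, 0), alpha = 2, beta = 0.5)   # e.g., under a Beta(2, 0.5) prior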
The conditional posterior distributions of each $\mu_{X_i}$, for $i = 1, 2, \ldots, N$, given the data and the remaining parameters, when $\tau_i = 0$ and $\tau_i = 1$, are respectively defined by
$$\pi(\mu_{X_i} \mid X, Y, \tau_i = 0, \tau_{-i}, \Theta_{-\mu_{X_i}}) = \frac{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\mu_{X_i})}{E_{f_i(\mu_{X_i})}\left[\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho)\right]} \tag{17}$$
where
$$f_i(\mu_{X_i}) \sim N\left(\frac{n_x \bar{x}_{i \cdot} + n_y \bar{y}_{i \cdot}}{n_x + n_y}, \; \frac{\sigma_i}{\sqrt{n_x + n_y}}\right) \tag{18}$$
$$\pi(\mu_{X_i} \mid X, Y, \tau_i = 1, \tau_{-i}, \Theta_{-\mu_{X_i}}) = \frac{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \, f_i(\mu_{X_i})}{E_{f_i(\mu_{X_i})}\left[\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho)\right]} \tag{19}$$
and where
$$f_i(\mu_{X_i}) \sim N\left(\bar{x}_{i \cdot}, \; \frac{\sigma_i}{\sqrt{n_x}}\right) \tag{20}$$
The conditional posterior distributions for each $\mu_{Y_i}$ of the model, when $\tau_i = 1$, for $i = 1, 2, \ldots, N$, given the data and the remaining model parameters, are defined by
$$\pi(\mu_{Y_i} \mid X, Y, \tau_i = 1, \tau_{-i}, \Theta_{-\mu_{Y_i}}) = \frac{\prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\mu_{Y_i})}{E_{f_i(\mu_{Y_i})}\left[\prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho)\right]} \tag{21}$$
where
$$f_i(\mu_{Y_i}) \sim N\left(\bar{y}_{i \cdot}, \; \frac{\sigma_i}{\sqrt{n_y}}\right) \tag{22}$$
The conditional posterior distributions for each $\sigma_i^2$ of the model, for $i = 1, 2, \ldots, N$, given the data and the remaining model parameters, when $\tau_i = 0$ and $\tau_i = 1$, are respectively defined by
$$\pi(\sigma_i^2 \mid X, Y, \tau_i = v, \tau_{-i}, \Theta_{-\sigma_i^2}) = \frac{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\sigma_i^2)}{E_{f_i(\sigma_i^2)}\left[\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho)\right]}, \qquad v \in \{0, 1\} \tag{23}$$
where
if $v = 0$: $f_i(\sigma_i^2) \sim \mathrm{InverseGamma}\left(\frac{n_x + n_y}{2}, \frac{A}{2}\right)$, with $A = \sum_{j=1}^{n_x} (x_{ij} - \mu_{X_i})^2 + \sum_{k=1}^{n_y} (y_{ik} - \mu_{X_i})^2 \tag{24}$
if $v = 1$: $f_i(\sigma_i^2) \sim \mathrm{InverseGamma}\left(\frac{n_x + n_y}{2}, \frac{B}{2}\right)$, with $B = \sum_{j=1}^{n_x} (x_{ij} - \mu_{X_i})^2 + \sum_{k=1}^{n_y} (y_{ik} - \mu_{Y_i})^2 \tag{25}$
Finally, the conditional posterior distribution for $\rho$, given the data and the remaining parameters, is
$$\pi(\rho \mid X, Y, \Theta_{-\rho}, \tau) = \frac{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho)}{\displaystyle\int \prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, d\rho} \tag{26}$$
Please note that the conditional posterior distributions in Equations (17), (19), (21), (23) and (26) have no analytic form.
For the computational implementation of the algorithm, we used R, as it is free statistical software and provides an easy structure for manipulating complex models.
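For instance, the update of $\rho$ reduces to a single Metropolis-Hastings step with a uniform independence candidate, mirroring Algorithm 8 in Appendix A. In the sketch below, log_copula_prod() is an assumed helper returning the log of the product of all copula densities appearing in (26) at a given $\rho$; it is not a function from our code or from any package:

mh_step_rho <- function(rho_curr, a, b, log_copula_prod) {
  rho_cand <- runif(1, a, b)                # independence candidate from U(a, b)
  log_A4 <- log_copula_prod(rho_cand) -     # log acceptance ratio; the uniform
            log_copula_prod(rho_curr)       # proposal densities cancel, as does
                                            # the normalizing integral in (26)
  if (log(runif(1)) <= min(0, log_A4)) rho_cand else rho_curr
}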

3.3. MCMC Algorithm (Metropolis-Hastings-within-Gibbs Algorithm)

We make use of the Metropolis-Hastings-within-Gibbs algorithm, based on MCMC sampling strategies, to obtain a sample from the joint posterior distribution (14). The structure of the proposed MCMC method is implemented as follows; the detailed algorithm is described in Appendix A.
Algorithm 1 MCMC Algorithm
Require: initial values $(\Theta^{(0)}, \tau^{(0)}) = (\mu_X^{(0)}, \mu_Y^{(0)}, \sigma^{2(0)}, p^{(0)}, \rho^{(0)}, \tau^{(0)})$, where $\tau^{(0)} = (\tau_1^{(0)}, \ldots, \tau_N^{(0)})$, $p^{(0)} = (p_1^{(0)}, \ldots, p_N^{(0)})$, $\mu_X^{(0)} = (\mu_{X_1}^{(0)}, \ldots, \mu_{X_N}^{(0)})$, $\mu_Y^{(0)} = (\mu_{Y_1}^{(0)}, \ldots, \mu_{Y_N}^{(0)})$, $\sigma^{2(0)} = (\sigma_1^{2(0)}, \ldots, \sigma_N^{2(0)})$
Procedure
1: Let the current state of the Markov chain be $(\Theta^{(l)}, \tau^{(l)}) = (\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \rho^{(l)}, \tau^{(l)})$
2: for $l \leftarrow 1:M$ do
3:  Update $\tau_i^{(l)}$, for $i = 1, \ldots, N$  ▹ by sampling from (15)
4:  Update $p_i^{(l)}$, for $i = 1, \ldots, N$  ▹ by sampling from (16)
5:  Update $\mu_{X_i}^{(l)}$, for $i = 1, \ldots, N$  ▹ by sampling from (17) and (19) when $\tau_i^{(l+1)} = 0$ and $\tau_i^{(l+1)} = 1$, respectively
6:  Update $\mu_{Y_i}^{(l)}$, for $i = 1, \ldots, N$  ▹ by sampling from (21)
7:  Update $\sigma_i^{2(l)}$, for $i = 1, \ldots, N$  ▹ by sampling from (23), with (24) when $\tau_i^{(l+1)} = 0$ and with (25) when $\tau_i^{(l+1)} = 1$
8:  Update $\rho^{(l)}$  ▹ by sampling from (26)
9: end for
End Procedure: Return $\{(\Theta^{(l)}, \tau^{(l)}): l = 1, \ldots, M\}$
Given an MCMC sample $\{(\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \tau^{(l)}, \rho^{(l)})\}_{l=1}^{M}$, with $\mu_X = (\mu_{X_1}, \ldots, \mu_{X_N})$, $\mu_Y = (\mu_{Y_1}, \ldots, \mu_{Y_N})$, $\sigma^2 = (\sigma_1^2, \ldots, \sigma_N^2)$, $p = (p_1, \ldots, p_N)$ and $\tau = (\tau_1, \ldots, \tau_N)$, obtained from the Metropolis-Hastings-within-Gibbs sampling algorithm, we can obtain estimates of the posterior marginal means as follows:
$$\hat{\mu}_{X_i} = E[\mu_{X_i} \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} \mu_{X_i}^{(l)} \tag{27}$$
$$\hat{\mu}_{Y_i} = E[\mu_{Y_i} \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} \mu_{Y_i}^{(l)} \tag{28}$$
$$\hat{\sigma}_i^2 = E[\sigma_i^2 \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} \sigma_i^{2(l)} \tag{29}$$
$$\hat{p}_i = E[p_i \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} p_i^{(l)}, \qquad \hat{\rho} = E[\rho \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} \rho^{(l)} \tag{30}$$
for each $i = 1, 2, \ldots, N$.
We can also approximate the posterior probability of the alternative hypothesis by
$$\Pr(\tau_i = 1 \mid X, Y) = \Pr(H_{1i} \mid X, Y) \approx \frac{1}{M} \sum_{l=1}^{M} I(\tau_i^{(l)} = 1) \tag{31}$$
for $i = 1, 2, \ldots, N$.
Please note that we can use these posterior probabilities to solve the multiple hypothesis testing problem.
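In practice, if the $\tau$ draws are stored as an $M \times N$ matrix (an assumed layout: one row per MCMC iteration, one column per hypothesis), (31) is simply a column mean:

post_prob_H1 <- colMeans(tau_draws)     # estimates of Pr(H1i | X, Y), i = 1, ..., N
rejected <- which(post_prob_H1 > 0.5)   # e.g., reject H0i when this probability exceeds 1/2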

3.4. Simulation Study

We performed a simulation with $N = 50$ simultaneous hypotheses and $n = 20$ observations per hypothesis/gene, where $n_x = 7$ and $n_y = 13$ observations correspond to Treatment Conditions 1 and 2, respectively, since microarray data typically have a smaller number of samples than genes/hypotheses.
In this context, the datasets for both experimental conditions were generated following multivariate Gaussian distributions $N_{50}(\mu_X, \Sigma)$ and $N_{50}(\mu_Y, \Sigma)$, with the same variance-covariance matrix $\Sigma$ and correlation coefficient $\rho = 0.8$. The vector of means $\mu_X$ for Condition 1 was defined in the range (1150, 1160), and the vector of means $\mu_Y$ for Condition 2 was defined in the range (1165, 1180). The vector of standard deviations was defined in the range (8, 16).
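A sketch of this data-generating step (not the code used for the study) is shown below; mvrnorm() is from the MASS package, and the 80% line illustrates how a dataset with a given proportion of true null hypotheses can be obtained:

library(MASS)
N <- 50; n_x <- 7; n_y <- 13; rho <- 0.8
mu_X <- runif(N, 1150, 1160)                  # Condition 1 means
mu_Y <- runif(N, 1165, 1180)                  # Condition 2 means
sdev <- runif(N, 8, 16)                       # standard deviations
idx0 <- sample(N, 0.8 * N)                    # e.g., 80% true null hypotheses:
mu_Y[idx0] <- mu_X[idx0]                      # equalize those means
R <- matrix(rho, N, N); diag(R) <- 1          # uniform (exchangeable) correlation
Sigma <- diag(sdev) %*% R %*% diag(sdev)      # variance-covariance matrix
X <- mvrnorm(n_x, mu_X, Sigma)                # n_x observations, Condition 1
Y <- mvrnorm(n_y, mu_Y, Sigma)                # n_y observations, Condition 2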
We considered simultaneous updates of the vector $(\Theta, \tau) = (\mu_X, \mu_Y, \sigma^2, p, \tau, \rho)$ of model parameters. The proposed algorithm was run for 15,000 iterations of the Metropolis-Hastings-within-Gibbs sampler, discarding the first 7500 as burn-in.
To implement the proposed algorithm (i.e., the Metropolis-Hastings-within-Gibbs algorithm), we drew candidates using an independence chain, exploiting the form of the conditional posterior distributions in Equations (17), (19), (21), (23), and (26), and setting the candidate-generating densities in Equations (18), (20), (22), (24), and (25) for the parameters $\mu_X$, $\mu_Y$, $\sigma^2$. In addition, we selected the uniform distribution on the range $(0.6, 0.9)$ as the candidate-generating density for the dependence parameter $\rho$.
The simulation compared the performance of our approach on three simulated datasets, assuming 80%, 50%, and 20% true null hypotheses, all generated with $N = 50$. For the prior distribution $p_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$, we assumed the same $\alpha$ and $\beta$ parameters for all $i = 1, \ldots, N$ for the sake of simplicity, i.e., $p_i \sim \mathrm{Beta}(\alpha, \beta)$ for the three simulated datasets. Furthermore, to analyze the sensitivity with respect to the choice of the beta distribution parameters, we selected the following values for $(\alpha, \beta)$: $(0.5, 1)$, $(1, 1)$, $(1, 0.5)$, and $(2, 0.5)$; these parameters yield very different distributions. As in [14,40,41], we obtain the FDR from the expected false discovery rate introduced by [42,43]. The FDR was estimated using an MCMC sample obtained from the Metropolis-Hastings-within-Gibbs algorithm. Table 1, Table 2 and Table 3 present the simulation results.
As shown in Table 1 and Table 2, for both data structures the results were highly sensitive to the choice of the parameters $\alpha$ and $\beta$ of the prior distribution on $p_i$. These parameters have a considerable influence on the results, since we can observe significant differences in the estimates. The parameters $\alpha$ and $\beta$ perform better when the prior distribution is skewed to the right, for instance, when we assume the prior distributions $\mathrm{Beta}(1, 0.5)$ and $\mathrm{Beta}(2, 0.5)$.
To confirm that this is in fact better for our approach, we simulated a dataset assuming 20% true null hypotheses. The simulation results are presented in Table 3. These results are similar to those in Table 1 and Table 2; i.e., the estimates are closer to the truth when the prior distribution is skewed to the right, even with a low percentage of true null hypotheses. Therefore, we conclude that our model yields good results whenever the prior distribution for $p_i$ is skewed to the right, regardless of the number of true null hypotheses.
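For completeness, one standard way to estimate the expected FDR from the posterior probabilities of (31) is sketched below (the explicit formula is our rendering of the usual posterior-expected-FDR estimator, under the assumption that it matches the estimator of [42,43] cited above): among the hypotheses rejected at a threshold t, average the posterior probabilities that the nulls are true.

bayes_fdr <- function(post_prob_H1, t = 0.5) {
  rej <- post_prob_H1 > t                # rejection region at threshold t
  if (!any(rej)) return(0)
  mean(1 - post_prob_H1[rej])            # average posterior null probability among rejections
}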

4. Modeling Dependence Through N-Dimensional Clayton Copulas with Normal Marginal Densities

The Archimedean family includes a large number of copulas with different characteristics, and it allows multivariate distributions to be modeled using a single univariate function, thus simplifying the calculations.
As in the previous section, we consider normal marginal densities, but the dependency is modeled using an N-dimensional Clayton copula of the Archimedean family. This copula has already been used to model dependency in gene expression analysis, as can be seen in [8].
In this case, the likelihood is expressed as in (10), but now $c_X(u_{X \cdot j}; \theta_c)$ and $c_Y(u_{Y \cdot k}; \theta_c)$ are the Clayton copula densities, given by:
$$c_X(u_{X \cdot j}; \theta_c) = \left(1 - N + \sum_{i=1}^{N} u_{X_{ij}}^{-\theta_c}\right)^{-N - (1/\theta_c)} \prod_{l=1}^{N} u_{X_{lj}}^{-\theta_c - 1} \left(\theta_c (l - 1) + 1\right)$$
$$c_Y(u_{Y \cdot k}; \theta_c) = \left(1 - N + \sum_{i=1}^{N} u_{Y_{ik}}^{-\theta_c}\right)^{-N - (1/\theta_c)} \prod_{l=1}^{N} u_{Y_{lk}}^{-\theta_c - 1} \left(\theta_c (l - 1) + 1\right) \tag{32}$$
where $\theta_c$ is the dependency parameter of the Clayton copula, $u_{X \cdot j} = (u_{X_{1j}}, \ldots, u_{X_{Nj}})$ and $u_{Y \cdot k} = (u_{Y_{1k}}, \ldots, u_{Y_{Nk}})$, with $u_{X_{ij}} = F(x_{ij})$ and $u_{Y_{ik}} = F(y_{ik})$, $i = 1, \ldots, N$, $j = 1, 2, \ldots, n_x$ and $k = 1, 2, \ldots, n_y$.
Then, the parameter vector for the model is $\Theta = (\mu_X, \mu_Y, \sigma^2, p, \theta_c)$, where $\mu_X = (\mu_{X_1}, \ldots, \mu_{X_N})$, $\mu_Y = (\mu_{Y_1}, \ldots, \mu_{Y_N})$, $\sigma^2 = (\sigma_1^2, \ldots, \sigma_N^2)$ and $p = (p_1, \ldots, p_N)$, $p_i$ being the initial probability of each null hypothesis.
Thus, given the data, the joint prior densities as in (13), with a uniform distribution for $\theta_c$ over $(a, b)$, and the likelihood, the posterior distribution takes the same form as (14), with $c_X(u_{X \cdot j}; \theta_c)$ and $c_Y(u_{Y \cdot k}; \theta_c)$ the Clayton copulas.
As in Section 3.1, this posterior distribution has no analytic form, but Bayesian inference may be performed using MCMC methods. The conditional posterior distributions of the parameters, which are necessary to build the algorithm, are the same as those described in Section 3.2, replacing the Gaussian copulas by Clayton copulas. Therefore, the same Metropolis-Hastings-within-Gibbs algorithm is used, substituting the Clayton copulas defined in (32) for the Gaussian copulas.
Likewise, given an MCMC sample $\{(\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \tau^{(l)}, \theta_c^{(l)})\}_{l=1}^{M}$, with $\mu_X = (\mu_{X_1}, \ldots, \mu_{X_N})$, $\mu_Y = (\mu_{Y_1}, \ldots, \mu_{Y_N})$, $\sigma^2 = (\sigma_1^2, \ldots, \sigma_N^2)$, $p = (p_1, \ldots, p_N)$ and $\tau = (\tau_1, \ldots, \tau_N)$, obtained from the Metropolis-Hastings-within-Gibbs sampling algorithm, we can obtain estimates of the posterior marginal means as in (27), (28), (29) and (30). Analogously, we can estimate the parameter of the Clayton copula as follows:
$$\hat{\theta}_c = E[\theta_c \mid X, Y] \approx \frac{1}{M} \sum_{l=1}^{M} \theta_c^{(l)}$$
Finally, we can also approximate the posterior probability of the alternative hypothesis through (31).

4.1. Simulation Study

In this section, the same data sets from Section 3.4 were used. When working with non-elliptical distributions, Kendall's $\tau$ coefficient should be used instead of the Pearson correlation coefficient [36]. Under normality, the two coefficients are related as follows:
$$\tau = \frac{2}{\pi} \arcsin(\rho) \quad \Longleftrightarrow \quad \rho = \sin\left(\frac{\pi}{2} \tau\right) \tag{33}$$
while Kendall's coefficient and the Clayton copula parameter $\theta_c$ are also related:
$$\tau(\theta_c) = \frac{\theta_c}{\theta_c + 2} \tag{34}$$
Therefore, we select the uniform prior distribution over the interval ( 1.3 , 4 ) for θ c as the candidate-generating density for the MCMC algorithm, since, from (33) and (34), this interval corresponds to the interval ( 0.58 , 0.87 ) for Pearson coefficient.
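This correspondence is easy to verify numerically; the small sketch below chains (34) and (33) to map the endpoints of the $\theta_c$ interval to the Pearson scale:

theta_to_rho <- function(theta) {
  tau <- theta / (theta + 2)   # Kendall's tau from theta_c, Equation (34)
  sin(pi * tau / 2)            # Pearson rho from tau, Equation (33)
}
theta_to_rho(c(1.3, 4))        # approximately 0.58 and 0.87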
To carry out the sensitivity analysis, the same values as in Section 3.4 were considered for the parameters $(\alpha, \beta)$ of the beta prior distribution for $p_i$: $(0.5, 1)$, $(1, 1)$, $(1, 0.5)$ and $(2, 0.5)$.
The Metropolis-Hastings-within-Gibbs algorithm described in Section 3.3 was used, replacing the Gaussian copulas by Clayton copulas. The algorithm was run for 15,000 iterations, discarding the first 7500 as burn-in. The FDR was also obtained as in Section 3.4.
As shown in Table 4, Table 5 and Table 6, the results were highly sensitive to the choice of the parameters $\alpha$ and $\beta$ of the prior distribution on $p_i$, as happened for the model with Gaussian copulas. Likewise, the procedure is more accurate when the prior distribution of $p_i$ is skewed to the right, for instance, when the priors $\mathrm{Beta}(1, 0.5)$ and $\mathrm{Beta}(2, 0.5)$ are assumed. However, the FDR obtained is higher than in the Gaussian copula model.

4.2. Model Selection

To compare the models with normal marginal distributions using Gaussian and Clayton copulas, we use the Deviance Information Criterion (DIC). A model with a smaller DIC should be preferred to models with a larger DIC [44]. The DIC value is given by:
$$DIC = -4 \, E_{\Theta, \tau}\left[\log L(\Theta, \tau \mid X, Y) \,\middle|\, X, Y\right] + 2 \log L\left(E_{\Theta, \tau}\left[\Theta, \tau \,\middle|\, X, Y\right] \,\middle|\, X, Y\right) \tag{35}$$
Given an MCMC sample of size $M$ from the posterior distribution, the DIC value (35) can be approximated by
$$DIC \approx -\frac{4}{M} \sum_{l=1}^{M} \log L(\Theta^{(l)}, \tau^{(l)} \mid X, Y) + 2 \log L\left(\frac{1}{M} \sum_{l=1}^{M} \Theta^{(l)}, \frac{1}{M} \sum_{l=1}^{M} \tau^{(l)} \,\middle|\, X, Y\right) \tag{36}$$
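A sketch of this approximation is given below; loglik() is an assumed helper returning $\log L(\Theta, \tau \mid X, Y)$, and the posterior draws are assumed to be stored one per row:

dic_mcmc <- function(theta_draws, tau_draws, loglik) {
  M  <- nrow(theta_draws)
  ll <- sapply(seq_len(M), function(l) loglik(theta_draws[l, ], tau_draws[l, ]))
  # Equation (36): -4 * posterior mean of log L + 2 * log L at the posterior means
  -4 * mean(ll) + 2 * loglik(colMeans(theta_draws), colMeans(tau_draws))
}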
The DIC value was obtained for the model with Gaussian copulas and for the model with Clayton copulas, in both cases using the right-skewed prior distributions for $p_i$, $\mathrm{Beta}(1, 0.5)$ and $\mathrm{Beta}(2, 0.5)$. The results are shown in Table 7 for the data sets with 80%, 50% and 20% true null hypotheses, respectively.
As can be seen in Table 7, the smallest DIC values correspond to the model with Gaussian copulas, as expected given how the data were generated. On the other hand, there were no important differences between the DIC values corresponding to the parameters $(1, 0.5)$ and $(2, 0.5)$ of the beta prior distribution. Furthermore, the FDR values are lower using Gaussian copulas than Clayton copulas. Thus, the model with Gaussian copulas is more suitable for our simulated data.

5. Application to a Real Data Set: DNA Microarrays

The procedure described in the previous sections is applied to a DNA microarray data set. This data set consists of 38 genes obtained from duodenal biopsy tissues of 13 children with celiac disease with a mean age of 5.6 (±0.6) years and 7 control children with a mean age of 8.1 (±2.2) years, belonging to part of the study carried out in [45]. The data are available in the NCBI-GEO datasets (The National Center for Biotechnology Information-Gene Expression Omnibus) under accession number GSE76168.
The aim is to identify differentially expressed genes. Thus, we simultaneously tested whether there are significant differences between celiac patients and controls in the mean expression level of the 38 genes; i.e., we consider the multiple hypothesis tests given in (9), and the main objective is to obtain the posterior probability of each null hypothesis.
The Bayesian procedure described in [12,14] was applied in [45], considering independence between the gene expression levels. However, there is correlation between the expression levels of these genes. To model the data, we consider normal marginal densities, as in [45], and the dependency is modeled first through Gaussian copulas and then through Clayton copulas.

5.1. Modeling Dependency through N-Dimensional Gaussian Copulas

To apply the Metropolis-Hastings-within-Gibbs algorithm described in Section 3.3, we need to consider a candidate-generating density for the dependency parameter $\rho$. Since there are positive and negative correlations, the most natural choice would be the uniform distribution on the range $(-1, 1)$. However, due to the constraint (11), we consider as the candidate-generating distribution for $\rho$ the uniform distribution on the range $(-0.027, 1)$.
For the prior distribution on $p_i$, we consider the $\mathrm{Beta}(2, 0.5)$ distribution because, according to the results obtained in Section 3.4, this is the prior distribution that produces the most accurate results when using Gaussian copulas.
The algorithm was run for 40,000 iterations, discarding the first 20,000 as burn-in. Comparing our results, which use a data dependency structure, with those of [45], which assume independence, different conclusions are obtained: we identified 16 differentially expressed genes, while they found 15; additionally, 4 of the 16 genes we identified do not coincide with those identified by their procedure.

5.2. Modeling Dependence Through N-Dimensional Clayton Copulas

In this subsection, the same Metropolis-Hastings-within-Gibbs algorithm, with the Gaussian copulas replaced by Clayton copulas, is applied to the data of [45].
For $\theta_c$, the uniform distribution over $(0.01, 5)$ was taken as the candidate-generating density. For the parameters $p_i$, we chose a $\mathrm{Beta}(1, 0.5)$ prior distribution because, according to the results obtained in Section 4.1, this is the prior distribution that produces the most accurate results when using Clayton copulas. Likewise, the algorithm was run for 40,000 iterations, discarding the first 20,000 as burn-in.
In this case, we found 22 differentially expressed genes, 6 more than when using Gaussian copulas; comparing with the results of [45], we identified the 15 genes that they had already identified, plus 7 additional ones.
Finally, DIC = 1487.213 was obtained for the model with Gaussian copulas and DIC = 1504.445 for the model with Clayton copulas. Therefore, the model with Gaussian copulas and normal marginal densities turns out to be the most suitable for the celiac disease data of [45], because this model attains the smallest DIC value.

6. Conclusions

The proposed approach is very useful when many hypotheses are tested simultaneously under the assumption of dependence. In the proposed procedure for testing multiple hypotheses, the full data are used directly, rather than test statistics, especially for modeling the dependency structure. Therefore, the modeling process is more complex than when using test statistics, and this presents computational problems when thousands of hypotheses are tested simultaneously with a large sample size. In any case, all available information, both objective and subjective, can be used.
In the field of genomics, the normal distribution is widely used to model gene expression data. Thus, we adopted normal marginal distributions and modeled the dependency structure through the Gaussian copula, which shares the properties of a multivariate normal distribution. We opted to use a uniform correlation matrix to reduce the dimensionality of the parameters. To model the dependency structure, we also considered the Clayton copula of the Archimedean family, which enables modeling of multivariate distributions using a single univariate function, thus simplifying calculations. The proposed approach is flexible insofar as it can be used with other, more realistic correlation matrices, with other copula functions to model the dependence, and with other marginal distributions.
For the model with Gaussian copulas, which had the lower DIC value and was therefore the most appropriate for our simulated data, the results demonstrated that the procedure fits the dependence well: the estimated correlation coefficient was close to the true value with which the data were generated. However, the procedure is not robust with respect to the choice of prior distribution for the initial probability of each null hypothesis. Nevertheless, in all simulated examples, the procedure rejected almost all false null hypotheses when we used a right-skewed beta prior distribution. Therefore, our proposal turns out to be a very powerful procedure for testing multiple hypotheses.
In the cases analyzed using the Gaussian copula model, the highest FDR value obtained was 0.079 when using a right-skewed beta prior distribution. However, this need not be an inconvenience in the context of DNA microarray experiments because, as explained above, the main objective of many of these studies is to obtain the largest possible number of potentially expressed genes, on which more detailed studies can subsequently be carried out. As a result, at this phase of the analysis, we can tolerate more false positives in order to obtain the largest possible number of interesting genes.

Author Contributions

Conceptualization, E.C.J.M., I.S., L.S. and M.A.G.-V.; methodology, E.C.J.M., I.S., L.S. and M.A.G.-V.; software, E.C.J.M. and I.S.; validation, E.C.J.M.; formal analysis, E.C.J.M., I.S., L.S. and M.A.G.-V.; writing—original draft preparation, E.C.J.M. and I.S.; writing—review and editing, I.S., L.S. and M.A.G.-V.; visualization, E.C.J.M., I.S., L.S. and M.A.G.-V.; supervision, I.S. and L.S.; project administration, I.S., L.S. and M.A.G.-V.; funding acquisition, I.S., L.S. and M.A.G.-V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Universidad Complutense de Madrid, Spain, grant GR29/20 and research group HUMLOG UCM-GR970643.

Conflicts of Interest

The authors declare no conflicts of interest, financial or ethical, of any kind.

Appendix A. MCMC Algorithm

In this appendix, we explain the proposed MCMC algorithm in detail. We constructed a Metropolis-Hastings-within-Gibbs sampling scheme that combines the Gibbs and Metropolis-Hastings sampling strategies. The scheme involves iteratively sampling from the Gibbs algorithm for standard conditional posterior distributions and from a single iteration of the Metropolis-Hastings algorithm for non-standard conditional posterior distributions.
Algorithm 2 MCMC
Require: initial values $(\Theta^{(0)}, \tau^{(0)}) = (\mu_X^{(0)}, \mu_Y^{(0)}, \sigma^{2(0)}, p^{(0)}, \rho^{(0)}, \tau^{(0)})$, where $\tau^{(0)} = (\tau_1^{(0)}, \ldots, \tau_N^{(0)})$, $p^{(0)} = (p_1^{(0)}, \ldots, p_N^{(0)})$, $\mu_X^{(0)} = (\mu_{X_1}^{(0)}, \ldots, \mu_{X_N}^{(0)})$, $\mu_Y^{(0)} = (\mu_{Y_1}^{(0)}, \ldots, \mu_{Y_N}^{(0)})$, $\sigma^{2(0)} = (\sigma_1^{2(0)}, \ldots, \sigma_N^{2(0)})$
Procedure
1: Let the current state of the Markov chain be $(\Theta^{(l)}, \tau^{(l)}) = (\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \rho^{(l)}, \tau^{(l)})$
2: for $l \leftarrow 1:M$ do
3:  Update $\tau_i$ by sampling $\tau_i^{(l+1)}$, for $i = 1, \ldots, N$  ▹ Algorithm 3
4:  Update $p_i$ by sampling $p_i^{(l+1)}$, for $i = 1, \ldots, N$  ▹ Algorithm 4
5:  Update $\mu_{X_i}$ by sampling $\mu_{X_i}^{(l+1)}$, for $i = 1, \ldots, N$  ▹ Algorithm 5
6:  Update $\mu_{Y_i}$ by sampling $\mu_{Y_i}^{(l+1)}$, for $i = 1, \ldots, N$  ▹ Algorithm 6
7:  Update $\sigma_i^2$ by sampling $\sigma_i^{2(l+1)}$, for $i = 1, \ldots, N$  ▹ Algorithm 7
8:  Update $\rho$ by sampling $\rho^{(l+1)}$  ▹ Algorithm 8
9: end for
End Procedure: Return $\{(\Theta^{(l)}, \tau^{(l)}): l = 1, \ldots, M\}$
Algorithm 3 MCMC for $\tau_i$, $i = 1, \ldots, N$
Require: initial values $(\Theta^{(0)}, \tau^{(0)}) = (\mu_X^{(0)}, \mu_Y^{(0)}, \sigma^{2(0)}, p^{(0)}, \rho^{(0)}, \tau^{(0)})$, where $\tau^{(0)} = (\tau_1^{(0)}, \ldots, \tau_N^{(0)})$, $p^{(0)} = (p_1^{(0)}, \ldots, p_N^{(0)})$, $\mu_X^{(0)} = (\mu_{X_1}^{(0)}, \ldots, \mu_{X_N}^{(0)})$, $\mu_Y^{(0)} = (\mu_{Y_1}^{(0)}, \ldots, \mu_{Y_N}^{(0)})$, $\sigma^{2(0)} = (\sigma_1^{2(0)}, \ldots, \sigma_N^{2(0)})$
Procedure
1: Update $\mu_Y^{(0)} = (\mu_{Y_1}^{(0)}, \ldots, \mu_{Y_N}^{(0)})$  ▹ if $\tau_i^{(0)} = 0$, set $\mu_{Y_i}^{(0)} = \mu_{X_i}^{(0)}$; otherwise keep $\mu_{Y_i}^{(0)}$
2: Calculate the copula $c_Y(u_Y; \rho^{(0)})$
3: Let the current state of the Markov chain be $(\Theta^{(l)}, \tau^{(l)}) = (\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \rho^{(l)}, \tau^{(l)})$
4: for $i \leftarrow 1:N$ do
5:  Calculate $K_i = \Pr(\tau_i^{(l+1)} = 0 \mid \mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p_i^{(l)}, \rho^{(l)}, \tau_{j<i}^{(l+1)}, \tau_{j>i}^{(l)})$  ▹ Equation (15)
6:  Generate a random uniform number $U_i \sim U(0, 1)$
7:  if $U_i \le K_i$ then
8:   $\tau_i^{(l+1)} = 0$
9:   Update $\mu_{Y_i}^{(l)} = \mu_{X_i}^{(l)}$
10:  else
11:   $\tau_i^{(l+1)} = 1$
12:  end if
13:  Update $\mu_Y^{(l)} = (\mu_{Y_1}^{(l)}, \ldots, \mu_{Y_N}^{(l)})$  ▹ if $\tau_i^{(l+1)} = 0$, $\mu_{Y_i}^{(l)} = \mu_{X_i}^{(l)}$; otherwise keep $\mu_{Y_i}^{(l)}$
14:  Calculate the copula $c_Y(u_Y; \rho^{(l)})$
15: end for
End Procedure: Return $\tau_i^{(l+1)}$, $i = 1, \ldots, N$
Algorithm 4 MCMC: Gibbs for $p_i$, $i = 1, \ldots, N$
Require: current values $(\Theta^{(l)}, \tau^{(l+1)})$, where $\tau^{(l+1)} = (\tau_1^{(l+1)}, \ldots, \tau_N^{(l+1)})$, $p^{(l)} = (p_1^{(l)}, \ldots, p_N^{(l)})$, $\mu_X^{(l)} = (\mu_{X_1}^{(l)}, \ldots, \mu_{X_N}^{(l)})$, $\mu_Y^{(l)} = (\mu_{Y_1}^{(l)}, \ldots, \mu_{Y_N}^{(l)})$, $\sigma^{2(l)} = (\sigma_1^{2(l)}, \ldots, \sigma_N^{2(l)})$
Procedure
1: Let the current state of the Markov chain be $(\Theta^{(l)}, \tau^{(l+1)}) = (\mu_X^{(l)}, \mu_Y^{(l)}, \sigma^{2(l)}, p^{(l)}, \rho^{(l)}, \tau^{(l+1)})$
2: for $i \leftarrow 1:N$ do
3:  Update $p_i$ by sampling $p_i^{(l+1)} \sim \pi(p_i \mid \tau_i^{(l+1)})$  ▹ Equation (16)
4: end for
End Procedure: Return $p_i^{(l+1)}$, $i = 1, \ldots, N$
Algorithm 5 Single iteration of the Metropolis-Hastings for $\mu_{X_i}$, $i = 1, \ldots, N$
Require: current values $\tau^{(l+1)} = (\tau_1^{(l+1)}, \ldots, \tau_N^{(l+1)})$, $\mu_X^{(l)} = (\mu_{X_1}^{(l)}, \ldots, \mu_{X_N}^{(l)})$, $\mu_Y^{(l)} = (\mu_{Y_1}^{(l)}, \ldots, \mu_{Y_N}^{(l)})$, $\sigma^{2(l)} = (\sigma_1^{2(l)}, \ldots, \sigma_N^{2(l)})$, $\rho^{(l)}$
Procedure
1: for $i \leftarrow 1:N$ do
2:  Sample a candidate $\mu_{X_i}^{(c)}$  ▹ from (18) if $\tau_i^{(l+1)} = 0$; from (20) otherwise
3:  Calculate the copula $c_X(u_X; \rho^{(l)})$  ▹ with $\mu_{X_i}^{(c)}$
4:  Calculate the copula $c_Y(u_Y; \rho^{(l)})$  ▹ if $\tau_i^{(l+1)} = 0$, with $\mu_{Y_i}^{(l)} = \mu_{X_i}^{(c)}$; otherwise with $\mu_{Y_i}^{(l)}$
5:  Sample a random uniform number $U_i \sim U(0, 1)$
6:  if $U_i \le \alpha(\mu_{X_i}^{(l)}, \mu_{X_i}^{(c)})$ then  ▹ $\alpha(\mu_{X_i}^{(l)}, \mu_{X_i}^{(c)}) = \min\{1, A_1\}$
7:   $\mu_{X_i}^{(l+1)} = \mu_{X_i}^{(c)}$
8:  else
9:   $\mu_{X_i}^{(l+1)} = \mu_{X_i}^{(l)}$
10:  end if
11:  Update $\mu_Y^{(l)} = (\mu_{Y_1}^{(l)}, \ldots, \mu_{Y_N}^{(l)})$  ▹ if $\tau_i^{(l+1)} = 0$, $\mu_{Y_i}^{(l)} = \mu_{X_i}^{(l+1)}$; otherwise keep $\mu_{Y_i}^{(l)}$
12:  Calculate the copula $c_X(u_X; \rho^{(l)})$
13:  Calculate the copula $c_Y(u_Y; \rho^{(l)})$
14: end for
End Procedure: Return $\mu_{X_i}^{(l+1)}$, $i = 1, \ldots, N$
with
$$A_1 = \frac{\prod_{j=1}^{n_x} c_X^{c}(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y^{c}(u_{Y \cdot k}; \rho) \, f_i(\mu_{X_i}^{(c)})}{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\mu_{X_i}^{(l)})}$$
where $c_X^{c}(u_{X \cdot j}; \rho)$ and $c_Y^{c}(u_{Y \cdot k}; \rho)$ are computed with $\mu_{X_i}^{(l)} = \mu_{X_i}^{(c)}$.
Algorithm 6 Single iteration of the Metropolis-Hastings for $\mu_{Y_i}$, $i = 1, \ldots, N$
Require: current values $\tau^{(l+1)} = (\tau_1^{(l+1)}, \ldots, \tau_N^{(l+1)})$, $\mu_X^{(l+1)} = (\mu_{X_1}^{(l+1)}, \ldots, \mu_{X_N}^{(l+1)})$, $\mu_Y^{(l)} = (\mu_{Y_1}^{(l)}, \ldots, \mu_{Y_N}^{(l)})$, $\sigma^{2(l)} = (\sigma_1^{2(l)}, \ldots, \sigma_N^{2(l)})$, $\rho^{(l)}$
Procedure
1: for $i \leftarrow 1:N$ do
2:  if $\tau_i^{(l+1)} = 0$ then
3:   $\mu_{Y_i}^{(l+1)} = \mu_{X_i}^{(l+1)}$
4:  else
5:   Sample a candidate $\mu_{Y_i}^{(c)}$  ▹ candidate-generating density (22)
6:   Calculate the copula $c_Y(u_Y; \rho^{(l)})$  ▹ with $\mu_{Y_i}^{(c)}$
7:   Sample a random uniform number $U_i \sim U(0, 1)$
8:   if $U_i \le \alpha(\mu_{Y_i}^{(l)}, \mu_{Y_i}^{(c)})$ then  ▹ $\alpha(\mu_{Y_i}^{(l)}, \mu_{Y_i}^{(c)}) = \min\{1, A_2\}$
9:    $\mu_{Y_i}^{(l+1)} = \mu_{Y_i}^{(c)}$
10:   else
11:    $\mu_{Y_i}^{(l+1)} = \mu_{Y_i}^{(l)}$
12:   end if
13:   Calculate the copula $c_Y(u_Y; \rho^{(l)})$
14:  end if
15: end for
End Procedure: Return $\mu_{Y_i}^{(l+1)}$, $i = 1, \ldots, N$
with
$$A_2 = \frac{\prod_{k=1}^{n_y} c_Y^{c}(u_{Y \cdot k}; \rho) \, f_i(\mu_{Y_i}^{(c)})}{\prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\mu_{Y_i}^{(l)})}$$
where $c_Y^{c}(u_{Y \cdot k}; \rho)$ is computed with $\mu_{Y_i}^{(l)} = \mu_{Y_i}^{(c)}$.
Algorithm 7 Single iteration of the Metropolis-Hastings for $\sigma_i^2$, $i = 1, \ldots, N$
Require: current values $\tau^{(l+1)} = (\tau_1^{(l+1)}, \ldots, \tau_N^{(l+1)})$, $\mu_X^{(l+1)} = (\mu_{X_1}^{(l+1)}, \ldots, \mu_{X_N}^{(l+1)})$, $\mu_Y^{(l+1)} = (\mu_{Y_1}^{(l+1)}, \ldots, \mu_{Y_N}^{(l+1)})$, $\sigma^{2(l)} = (\sigma_1^{2(l)}, \ldots, \sigma_N^{2(l)})$, $\rho^{(l)}$
Procedure
1: for $i \leftarrow 1:N$ do
2:  Sample a candidate $\sigma_i^{2(c)}$  ▹ from (24) if $\tau_i^{(l+1)} = 0$; from (25) otherwise
3:  Calculate the copulas $c_X(u_X; \rho^{(l)})$ and $c_Y(u_Y; \rho^{(l)})$  ▹ with $\sigma_i^{2(c)}$
4:  Sample a random uniform number $U_i \sim U(0, 1)$
5:  if $U_i \le \alpha(\sigma_i^{2(l)}, \sigma_i^{2(c)})$ then  ▹ $\alpha(\sigma_i^{2(l)}, \sigma_i^{2(c)}) = \min\{1, A_3\}$
6:   $\sigma_i^{2(l+1)} = \sigma_i^{2(c)}$
7:  else
8:   $\sigma_i^{2(l+1)} = \sigma_i^{2(l)}$
9:  end if
10:  Calculate the copula $c_X(u_X; \rho^{(l)})$
11:  Calculate the copula $c_Y(u_Y; \rho^{(l)})$
12: end for
End Procedure: Return $\sigma_i^{2(l+1)}$, $i = 1, \ldots, N$
with
$$A_3 = \frac{\prod_{j=1}^{n_x} c_X^{c}(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y^{c}(u_{Y \cdot k}; \rho) \, f_i(\sigma_i^{2(c)})}{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho) \, f_i(\sigma_i^{2(l)})}$$
where $c_X^{c}(u_{X \cdot j}; \rho)$ and $c_Y^{c}(u_{Y \cdot k}; \rho)$ are computed with $\sigma_i^{2(l)} = \sigma_i^{2(c)}$.
Algorithm 8 Single iteration of the Metropolis-Hastings for $\rho$
Require: current values $\mu_X^{(l+1)} = (\mu_{X_1}^{(l+1)}, \ldots, \mu_{X_N}^{(l+1)})$, $\mu_Y^{(l+1)} = (\mu_{Y_1}^{(l+1)}, \ldots, \mu_{Y_N}^{(l+1)})$, $\sigma^{2(l+1)} = (\sigma_1^{2(l+1)}, \ldots, \sigma_N^{2(l+1)})$
Procedure
1: Sample a candidate $\rho^{(c)}$  ▹ from $U(a, b)$, $a, b \in (0, 1)$
2: Sample a random uniform number $U \sim U(0, 1)$
3: if $U \le \alpha(\rho^{(l)}, \rho^{(c)})$ then  ▹ $\alpha = \min\{1, A_4\}$
4:  $\rho^{(l+1)} = \rho^{(c)}$
5: else
6:  $\rho^{(l+1)} = \rho^{(l)}$
7: end if
End Procedure: Return $\rho^{(l+1)}$
with
$$A_4 = \frac{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho^{(c)}) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho^{(c)})}{\prod_{j=1}^{n_x} c_X(u_{X \cdot j}; \rho^{(l)}) \prod_{k=1}^{n_y} c_Y(u_{Y \cdot k}; \rho^{(l)})}$$

References

  1. Fisher, R.A. The Design of Experiments, 9th ed.; Macmillan: New York, NY, USA, 1971 [1935]; ISBN 0-02-844690-9.
  2. Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1995, 57, 289–300.
  3. Shaffer, J.P. Multiple hypothesis testing. Annu. Rev. Psychol. 1995, 46, 561–584.
  4. Dudoit, S.; Shaffer, J.P.; Boldrick, J.C. Multiple hypothesis testing in microarray experiments. Stat. Sci. 2003, 18, 71–103.
  5. Dudoit, S.; Keleş, S.; van der Laan, M.J. Multiple tests of association with biological annotation metadata. In Probability and Statistics: Essays in Honor of David A. Freedman; Collections; Institute of Mathematical Statistics: Beachwood, OH, USA, 2008; Volume 2, pp. 153–218.
  6. Benjamini, Y.; Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001, 29, 1165–1188.
  7. Gavrilov, Y.; Benjamini, Y.; Sarkar, S.K. An adaptive step-down procedure with proven FDR control under independence. Ann. Stat. 2009, 37, 619–629.
  8. Dickhaus, T.; Gierl, J. Simultaneous test procedures in terms of p-value copulae. In Proceedings of the 2nd Annual International Conference on Computational Mathematics, Computational Geometry & Statistics (CMCGS), Paris, France, 4–5 February 2013.
  9. Bodnar, T.; Dickhaus, T. False discovery rate control under Archimedean copula. Electron. J. Stat. 2014, 8, 2207–2241.
  10. Ibrahim, J.G.; Chen, M.H.; Gray, R.J. Bayesian models for gene expression with DNA microarray data. J. Am. Stat. Assoc. 2002, 97, 88–99.
  11. Gottardo, R.; Raftery, A.E.; Yee Yeung, K.; Bumgarner, R.E. Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 2006, 62, 10–18.
  12. Ausín, M.C.; Gómez-Villegas, M.A.; González-Pérez, B.; Rodríguez-Bernal, M.T.; Salazar, I.; Sanz, L. Bayesian analysis of multiple hypothesis testing with applications to microarray experiments. Commun. Stat. Theory Methods 2011, 40, 2276–2291.
  13. Scott, J.G.; Berger, J.O. An exploration of aspects of Bayesian multiple testing. J. Stat. Plan. Infer. 2006, 136, 2144–2162.
  14. Gómez-Villegas, M.A.; Salazar, I.; Sanz, L. A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments. Stat. Appl. Genet. Mol. Biol. 2014, 13, 49–65.
  15. Sarkar, S.K.; Zhou, T.; Ghosh, D. A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Stat. Sin. 2008, 18, 925–945.
  16. Yuan, M.; Kendziorski, C. A unified approach for simultaneous gene clustering and differential expression identification. Biometrics 2006, 62, 1089–1098.
  17. Marín, J.M.; Rodríguez-Bernal, M.T. Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis. Comput. Stat. Data Anal. 2012, 56, 1898–1907.
  18. Sun, W.; Cai, T.T. Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser. B-Stat. Methodol. 2009, 71, 393–424.
  19. Chi, Z. Effects of statistical dependence on multiple testing under a hidden Markov model. Ann. Stat. 2011, 39, 439–473.
  20. Rayaprolu, S.; Chi, Z. Multiple Testing under Dependence with Approximate Conditional Likelihood. arXiv 2014, arXiv:1412.7778.
  21. Liu, J.; Zhang, C.; Burnside, E.S.; Page, D. Learning Heterogeneous Hidden Markov Random Fields. In Proceedings of the JMLR Workshop and Conference Proceedings, Nha Trang City, Vietnam, 26–28 November 2014; Volume 33, pp. 576–584.
  22. Liu, J.; Peissig, P.; Zhang, C.; Burnside, E.; McCarty, C.; Page, D. Graphical-model based multiple testing under dependence, with applications to genome-wide association studies. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, 14–18 August 2012; pp. 511–522.
  23. Liu, J.; Zhang, C.; Page, D. Multiple testing under dependence via graphical models. Ann. Appl. Stat. 2016, 10, 1699–1724.
  24. Genest, C.; MacKay, J. The joy of copulas: Bivariate distributions with uniform marginals. Am. Stat. 1986, 40, 280–283.
  25. Genest, C.; Ghoudi, K.; Rivest, L.P. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 1995, 82, 543–552.
  26. Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
  27. Joe, H. Multivariate Models and Dependence Concepts; Chapman & Hall/CRC: New York, NY, USA; London, UK; Washington, DC, USA, 1997; ISBN-10: 0412073315.
  28. Cherubini, U.; Luciano, E.; Vecchiato, W. Copula Methods in Finance; John Wiley & Sons: New York, NY, USA, 2004; ISBN 978-0-470-86345-9.
  29. Nelsen, R.B. An Introduction to Copulas; Springer Science & Business Media: New York, NY, USA, 2007; ISBN 0-387-28678-5.
  30. Diebolt, J.; Robert, C.P. Estimation of finite mixture distributions through Bayesian sampling. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1994, 56, 363–375.
  31. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: New York, NY, USA; London, UK; Sydney, Australia, 1966; Volume 2; ISBN-10: 0471257095.
  32. Kowalski, C.J. Non-normal bivariate distributions with normal marginals. Am. Stat. 1973, 27, 103–106.
  33. Gelman, A.; Meng, X.L. A note on bivariate distributions that are conditionally normal. Am. Stat. 1991, 45, 125–126.
  34. Zhao, H.; Chan, K.L.; Cheng, L.M.; Yan, H. Multivariate hierarchical Bayesian model for differential gene expression analysis in microarray experiments. BMC Bioinform. 2008, 9, S9.
  35. Salazar, I. Aproximación Bayesiana a los Contrastes de Hipótesis Múltiples con Aplicaciones a los Microarrays; E-Prints Complutense: Madrid, Spain, 2011; ISBN 978-84-694-6254-6.
  36. Žežula, I. On multivariate Gaussian copulas. J. Stat. Plan. Infer. 2009, 139, 3942–3946.
  37. Broët, P.; Richardson, S.; Radvanyi, F. Bayesian hierarchical model for identifying changes in gene expression from microarray experiments. J. Comput. Biol. 2002, 9, 671–683.
  38. Patz, R.J.; Junker, B.W. A straightforward approach to Markov chain Monte Carlo methods for item response models. J. Educ. Behav. Stat. 1999, 24, 146–178.
  39. Robert, C.; Casella, G. Monte Carlo Statistical Methods; Springer Science & Business Media: New York, NY, USA, 2013; ISBN 978-1-4757-3073-9.
  40. Müller, P.; Parmigiani, G.; Robert, C.; Rousseau, J. Optimal Sample Size for Multiple Testing: The Case of Gene Expression Microarrays. J. Am. Stat. Assoc. 2004, 99, 990–1001.
  41. Do, K.A.; Müller, P.; Tang, F. A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. Ser. C-Appl. Stat. 2005, 54, 627–644.
  42. Genovese, C.; Wasserman, L. Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B-Stat. Methodol. 2002, 64, 499–517.
  43. Genovese, C.; Wasserman, L. Bayesian and Frequentist Multiple Testing. In Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting, 2–6 June 2002; Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M., Eds.; Oxford University Press: Oxford, UK, 2003; ISBN 0-19-852615-6.
  44. Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; Van Der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B-Stat. Methodol. 2002, 64, 583–639.
  45. Pascual, V.; Medrano, L.; López-Palacios, N.; Bodas, A.; Dema, B.; Fernández-Arquero, M.; González-Pérez, B.; Salazar, I.; Núñez, C. Different gene expression signatures in children and adults with celiac disease. PLoS ONE 2016, 11, e0146276.
Table 1. Results for the model with Gaussian copulas and for data corresponding to 80% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
ρ^       0.73       0.77     0.752      0.76
FDR^     0.219      0.298    0.045      0.023

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 39     14 / 25   37 / 2     38 / 1     39
False   0 / 11     0 / 11    0 / 11     0 / 11     11
Total   0 / 50     14 / 36   37 / 13    38 / 12    50
Table 2. Results for the model with Gaussian copulas and for data corresponding to 50% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
ρ^       0.803      0.813    0.804      0.773
FDR^     0.089      0.17     0.079      0.065

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 25     1 / 24    19 / 6     22 / 3     25
False   0 / 25     0 / 25    0 / 25     0 / 25     25
Total   0 / 50     1 / 49    19 / 31    22 / 28    50
Table 3. Results for the model with Gaussian copulas and for data corresponding to 20% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
ρ^       0.773      0.78     0.775      0.795
FDR^     0.04       0.085    0.054      0.07

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 12     0 / 12    8 / 4      11 / 1     12
False   0 / 38     0 / 38    0 / 38     1 / 37     38
Total   0 / 50     0 / 50    8 / 42     12 / 38    50
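Tables 4–6 repeat the experiment with Clayton copulas, whose dependence parameter θ_c replaces ρ. For completeness, here is a sketch of the N-dimensional Clayton log-density, which could stand in for gaussian_copula_logdensity in the sketches above; the formula is the standard one (see, e.g., [29]), while the function itself is our illustration:

def clayton_copula_logdensity(u, theta):
    # Log-density of the N-dimensional Clayton copula, theta > 0:
    # c(u) = prod_{k=1}^{N} ((k - 1) * theta + 1) * (prod_i u_i)^(-(theta + 1))
    #        * (sum_i u_i^(-theta) - N + 1)^(-(N + 1/theta))
    N = len(u)
    log_c = np.sum(np.log(np.arange(N) * theta + 1.0))
    log_c -= (theta + 1.0) * np.sum(np.log(u))
    log_c -= (N + 1.0 / theta) * np.log(np.sum(u ** (-theta)) - N + 1.0)
    return log_c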
Table 4. Results for the model with Clayton copulas and for data corresponding to 80% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
θ_c^     1.94       1.99     1.94       1.99
FDR^     0.28       0.385    0.258      0.27

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 39     10 / 29   39 / 0     39 / 0     39
False   0 / 11     0 / 11    1 / 10     6 / 5      11
Total   0 / 50     10 / 40   40 / 10    45 / 5     50
Table 5. Results for the model with Clayton copulas and for data corresponding to 50% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
θ_c^     1.89       1.93     2.01       1.97
FDR^     0.226      0.29     0.166      0.177

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 25     6 / 19    25 / 0     25 / 0     25
False   0 / 25     1 / 24    8 / 17     11 / 14    25
Total   0 / 50     7 / 43    33 / 17    36 / 14    50
Table 6. Results for the model with Clayton copulas and for data corresponding to 20% of true null hypotheses, under different prior distributions of p_i.

(α, β)   (0.5, 1)   (1, 1)   (1, 0.5)   (2, 0.5)
θ_c^     2.59       2.19     2.29       2.27
FDR^     0.203      0.26     0.22       0.194

Accepted / Rejected counts:
        (0.5, 1)   (1, 1)    (1, 0.5)   (2, 0.5)   Total
True    0 / 12     4 / 8     12 / 0     12 / 0     12
False   0 / 38     1 / 37    15 / 23    24 / 14    38
Total   0 / 50     5 / 45    27 / 23    36 / 14    50
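The FDR^ values reported in Tables 1–6 are model-based estimates, which is why they need not coincide with the realized error counts in the same column. One common Bayesian estimator, in the spirit of [40,42], averages the posterior null probabilities over the rejected hypotheses; a minimal sketch under that assumption (the paper's exact estimator may differ):

import numpy as np

def bayes_fdr(post_null, reject):
    # post_null: posterior probabilities P(H0_i | data) from the MCMC output;
    # reject: boolean mask of rejected hypotheses.
    return float(post_null[reject].mean()) if reject.any() else 0.0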
Table 7. DIC values for the different percentages of true null hypotheses and for the prior distributions of p_i skewed to the right.

Model                         Gaussian copulas        Clayton copulas
(α, β)                        (1, 0.5)   (2, 0.5)     (1, 0.5)   (2, 0.5)
80% of true null hypotheses   8548.326   8540.206     8961.517   8973.411
50% of true null hypotheses   8682.981   8696.98      9033.87    9034.77
20% of true null hypotheses   8565.80    8543.045     9114.768   9130.447
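The model comparison in Table 7 uses the deviance information criterion of Spiegelhalter et al. [44]: $DIC = \bar{D} + p_D$ with $p_D = \bar{D} - D(\bar{\theta})$, where $D(\theta) = -2 \log L(\theta)$ and $\bar{\theta}$ is the posterior mean. A minimal sketch from generic MCMC output; log_lik is an assumed user-supplied log-likelihood:

import numpy as np

def dic(theta_draws, log_lik):
    # theta_draws: posterior samples, shape (n_draws, dim);
    # log_lik: maps a parameter vector to its log-likelihood.
    # Assumes a continuous parameter vector for which the posterior mean
    # is a meaningful plug-in; discrete indicators need special handling.
    deviances = np.array([-2.0 * log_lik(t) for t in theta_draws])
    d_bar = deviances.mean()                     # posterior mean deviance
    d_hat = -2.0 * log_lik(theta_draws.mean(axis=0))
    p_d = d_bar - d_hat                          # effective number of parameters
    return d_bar + p_d

Under this convention smaller is better, so the DIC values in Table 7 favor the Gaussian-copula model over the Clayton alternative in every scenario considered.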
