Article

On the Convergence of the Benjamini–Hochberg Procedure

1 Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
2 Big Data for Smart Society (GATE) Institute, Sofia University “St. Kliment Ohridski”, 1113 Sofia, Bulgaria
3 Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 1164 Sofia, Bulgaria
* Authors to whom correspondence should be addressed.
Mathematics 2021, 9(17), 2154; https://doi.org/10.3390/math9172154
Submission received: 27 July 2021 / Revised: 30 August 2021 / Accepted: 31 August 2021 / Published: 3 September 2021
(This article belongs to the Section Probability and Statistics)

Abstract:
The Benjamini–Hochberg procedure is one of the most widely used statistical methods to date. It is ubiquitous in genetics and other areas where the problem of multiple comparisons arises frequently. In this paper we show that, under fairly general assumptions on the distribution of the test statistic under the alternative hypothesis, the power of the Benjamini–Hochberg procedure converges to a previously established limit at an exponential rate as the number of tests increases. We give a theoretical lower bound for the probability that, for a fixed number of tests, the power lies within a given interval around its limit, together with a software routine that calculates these values. This result is important when planning costly experiments and when estimating the achieved power after performing them.

1. Introduction

In many modern studies scientists perform the same statistical test on many objects, e.g., genes or single-nucleotide polymorphisms. The number of these objects can be very large, say millions, resulting in a multiple comparison issue. This is due to the fact that even under the null hypothesis we expect some of these tests to have p-values below a predetermined significance level $\alpha$. There are various techniques to overcome this problem, with the Benjamini–Hochberg method [1] being the most used. Given $\alpha$, for a set of $m$ p-values, the method rejects the tests corresponding to the $k$ smallest p-values, where $k$ is the largest index such that $p_{(k)} \le k\alpha/m$. The method controls the FDR (false discovery rate) in the sense that the expected proportion of false discoveries among all discoveries (rejected hypotheses) is at most $\alpha$ when the tests are independent, as shown in the original paper [1]. In some cases, rather than finding the cutoff index for rejecting tests, the so-called Benjamini–Hochberg-adjusted p-values are used; these are the p-values transformed by the formula on the right-hand side of the above inequality and truncated at 1.
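The step-up rule and the adjusted p-values just described can be sketched as follows (a minimal NumPy illustration of our own, not the authors' software; `benjamini_hochberg` and `bh_adjust` are hypothetical helper names):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of rejections: reject the k smallest p-values,
    where k is the largest index with p_(k) <= k * alpha / m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest passing index
        reject[order[:k + 1]] = True
    return reject

def bh_adjust(pvals):
    """BH-adjusted p-values: p_(j) * m / j, made monotone from the right
    and truncated at 1."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = np.minimum.accumulate((p[order] * m / np.arange(1, m + 1))[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```

Rejecting the hypotheses whose adjusted p-value is at most $\alpha$ yields the same set as the step-up rule.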
In the field of genetics the procedure is used in many software packages, for example, DESeq2 [2], edgeR [3], cuffdiff [4], and GenomeStudio (Illumina). It is also often used when studying various types of images, e.g., PET or CT [5], fMRI [6], or astrophysical [7].
Despite the fact that there are improved methods, either in terms of weaker assumptions (e.g., allowing dependency) or of better statistical power (e.g., [8,9,10]), the Benjamini–Hochberg paper remains one of the most cited scientific articles, determined to be the 59th most cited article in [11]. We should underline that in this work we are not interested in improving the Benjamini–Hochberg method, but rather in inferring the statistical power of the original method, as it is the one most used in practice. Moreover, article [12] shows that in certain cases no correction of the standard methods is needed, even when there is some degree of dependence among the tests. See Section 5 for more information and related work.
Calculating the statistical power, that is, the probability of correctly rejecting the null when the alternative hypothesis is true, is an important task when designing scientific studies. For example, it allows us to estimate the required sample size, and therefore the budget needed to perform the experiment, in order to make sure that we have a reasonable chance of detecting the objects for which there is indeed a difference between the two groups. After performing the study, it allows us to determine whether we have found a relatively large proportion of the significant objects.
For practical applications we are interested not only in the limit of the statistical power, as established in previous works (see Section 5), but also in the speed of convergence to that limit. The speed of convergence allows us to find the probability that the statistical power is within a given range around its limit. Consequently, this allows us to calculate the probability that the statistical power is above a pre-determined threshold, typically 80%. In cases when this probability is relatively low, one would have to increase the sample size in order to obtain more favorable parameters of the Beta distribution that models the p-values under the alternative hypothesis.
In more detail, in this work we show that under a natural condition, which we call (M), the convergence is exponential, and we give explicit expressions for the constants involved in the asymptotics. This condition allows the use of suitable Beta distributions, among others. The relationship between our condition and the non-criticality condition in [13] will be discussed below. Condition (M) allows us to find a lower bound for the probability that, for fixed parameters of the model, the power is within a given interval around that asymptotic limit. In this sense our work can be understood as a tool for the planning of experiments, complementing [14], another article by the authors of [15], which deals with a variety of questions such as the estimation of the proportion of significant tests, the effect of the sample size on the quality of the results, etc.

2. Main Results and Their Proofs

2.1. Some Preliminary Notation

Let $X_1, X_2, \ldots, X_m$, $m \in \mathbb{N}^+$, with $\mathbb{N}^+ = \{1, 2, \ldots\}$, be independent random variables defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$. For some $1 \le m_0 < m$ the random variables $(X_j)_{j \le m_0}$ are assumed to have a common cumulative distribution function (c.d.f.) $F$, whereas $(X_j)_{m_0 < j \le m}$ are assumed to have a common c.d.f. $G$. In this work we set $F$ to be the c.d.f. of the uniform random variable on $(0,1)$ and let, for the time being, $G$ be any continuous c.d.f. We note that even if $G$ is discrete, one can always approximate it with a suitable continuous distribution. We also work with the condition that $\frac{m_0}{m} = \gamma \in (0,1)$ is fixed and set $m_1 := m - m_0 \in (0, m)$, which is strictly non-binding since we develop exponential bounds that are valid with different degrees of accuracy for any fixed $m_0, m$ and the respective ratio $\frac{m_0}{m}$. It will usually be the case that $\gamma$ is larger than $1/2$, as the first $m_0$ random variables represent the non-significant observations. If $X_m^{(i)}$, $i \le m$, is the $i$-th order statistic of $X_1, X_2, \ldots, X_m$, then for a fixed level of rejection $\alpha \in (0,1)$ the Benjamini–Hochberg procedure declares significant the first $R_m$ order statistics, where
\[ R_m = \max\left\{ i : X_m^{(i)} \le \frac{\alpha i}{m} \right\}. \tag{1} \]
The truly non-significant tests have c.d.f. $F$ and the truly significant tests possess c.d.f. $G$. It is well known from [15] (1.2) that for $x \in (0,1)$ we can express the event $\left\{ \frac{R_m}{m} \ge x \right\}$ in the following fashion
\[ \left\{ \frac{R_m}{m} \ge x \right\} = \left\{ \max_{t \in A_\alpha(m,x)} \frac{H_m(t) - t}{t} \ge \frac{1}{\alpha} - 1 \right\}, \tag{2} \]
where $H_m(t) = \frac{1}{m}\sum_{j=1}^m I_{\{X_j \le t\}}$ is the empirical c.d.f. of $X_1, X_2, \ldots, X_m$ and
\[ A_\alpha(m,x) = \left\{ \frac{k\alpha}{m} : \lceil mx \rceil \le k \le m \right\}, \tag{3} \]
where $\lceil y \rceil = \min\{ m \in \mathbb{N}^+ : m \ge y \}$ is the ceiling function. We can express $H_m$ as
\[ H_m(t) = \frac{\gamma}{m_0} \sum_{j=1}^{m_0} I_{\{X_j \le t\}} + \frac{1-\gamma}{m_1} \sum_{j=m_0+1}^{m} I_{\{X_j \le t\}} = \gamma F_{m_0}(t) + (1-\gamma) G_{m_1}(t), \]
where $F_{m_0}$ is the empirical c.d.f. of $X_1, X_2, \ldots, X_{m_0}$ and $G_{m_1}$ is the empirical c.d.f. of $X_{m_0+1}, \ldots, X_m$. Therefore a simple rearrangement yields that
\[ \frac{H_m(t) - t}{t} = \gamma \frac{F_{m_0}(t) - t}{t} + (1-\gamma)\frac{G_{m_1}(t) - G(t)}{t} + (1-\gamma)\frac{G(t) - t}{t}. \tag{4} \]

2.2. General Inequalities and Information concerning the Distribution of $R_m$

We introduce the function $c(\epsilon) = (1+\epsilon)\ln(1+\epsilon) - \epsilon$, for $\epsilon > 0$, which is investigated in Proposition A1 in Appendix A. Then the following inequalities hold true.
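For concreteness, $c$ is easy to evaluate numerically; the sketch below checks, on a grid, its positivity and monotonicity, together with the lower bound $c(\epsilon)\ge\min\{1,\epsilon^2/(2e)\}$ quoted later in Remark 2 (Proposition A1 itself is in the Appendix, which we do not reproduce; the check is ours):

```python
import numpy as np

def c(eps):
    """c(eps) = (1 + eps) * ln(1 + eps) - eps."""
    return (1.0 + eps) * np.log1p(eps) - eps

eps = np.linspace(0.01, 20.0, 2000)
vals = c(eps)
# expected: vals is positive, strictly increasing, and dominates min(1, eps^2/(2e))
```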
Proposition 1.
For any distribution function $T$ with $T(\alpha x) > 0$ and empirical cumulative distribution function $T_n(\cdot) = \frac{1}{n}\sum_{j=1}^n I_{\{X_j \le \cdot\}}$, where $(X_j)_{j\ge1}$ are i.i.d. random variables with c.d.f. $T$, any $k, m \in \mathbb{N}^+$, any $\alpha, x, \rho \in (0,1)$ and $A_\alpha(m,x)$ as in (3) above, we have, for every $\epsilon > 0$, that
\[ \mathbb{P}\left( \rho \max_{t \in A_\alpha(m,x)} \frac{|T_k(t) - T(t)|}{t} > \epsilon \right) \le 2\min\left\{ \lceil m(1-x) \rceil e^{-k \inf_{t \in A_\alpha(m,x)} T(t)\, c\left(\frac{\epsilon t}{\rho T(t)}\right)}, \; e^{-2k\left(\frac{\epsilon \alpha x}{\rho}\right)^2} \right\} =: K_T(x,m,k,\rho,\epsilon,\alpha). \tag{5} \]
Finally, if $T(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)$ as a function of $t$ is non-decreasing on $[\alpha x, \alpha]$, we get that
\[ \mathbb{P}\left( \rho \max_{t \in A_\alpha(m,x)} \frac{|T_k(t) - T(t)|}{t} > \epsilon \right) \le 2\min\left\{ \lceil m(1-x)\rceil e^{-kT(\alpha x)\, c\left(\frac{\epsilon \alpha x}{\rho T(\alpha x)}\right)}, \; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\} = K_T(x,m,k,\rho,\epsilon,\alpha). \tag{6} \]
Remark 1.
Let us compute $K_T$ when $T(t) = F(t) = t$ for $t \in [0,1]$. Then $T(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right) = t\,c(\epsilon/\rho)$ is non-decreasing in $t$ and, substituting in the right-hand side of (6), we deduce that
\[ K_F(x,m,k,\rho,\epsilon,\alpha) = 2\min\left\{ \lceil m(1-x)\rceil e^{-k\alpha x\, c(\epsilon/\rho)},\; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\}. \tag{7} \]
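Under our reading of (7), $K_F$ can be evaluated directly (a hypothetical helper of ours, convenient later for the routines of Section 3):

```python
import math

def c(eps):
    """c(eps) = (1 + eps) * ln(1 + eps) - eps."""
    return (1.0 + eps) * math.log1p(eps) - eps

def K_F(x, m, k, rho, eps, alpha):
    """The bound of Equation (7): twice the minimum of the Bennett-type
    and the DKW-type parts."""
    bennett = math.ceil(m * (1 - x)) * math.exp(-k * alpha * x * c(eps / rho))
    dkw = math.exp(-2.0 * k * (eps * alpha * x / rho) ** 2)
    return 2.0 * min(bennett, dkw)
```

The bound is only informative once it drops below 1, which for fixed $(x, \rho, \epsilon, \alpha)$ happens as $k$ grows.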
Remark 2.
Given that $c(\epsilon) \ge \min\left\{1, \frac{\epsilon^2}{2e}\right\}$, see Proposition A1, we may remove the dependence on the function $c$ in $K_T$, see (5), by providing an upper bound for it:
\[ K_T(x,m,k,\rho,\epsilon,\alpha) \le 2\min\left\{ \lceil m(1-x)\rceil e^{-k\min\left\{T(\alpha x),\, \inf_{t\in A_\alpha(m,x)} \frac{\epsilon^2 t^2}{2e\rho^2 T(t)}\right\}},\; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\}. \tag{8} \]
Remark 3.
Inspecting the proof below, one easily discovers that for computational purposes it is better to use the sharper estimate
\[ \mathbb{P}\left( \rho \max_{t\in A_\alpha(m,x)} \frac{|T_k(t)-T(t)|}{t} > \epsilon \right) \le 2\min\left\{ \sum_{t\in A_\alpha(m,x)} e^{-kT(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)},\; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\}. \tag{9} \]
For theoretical purposes the bounds stated in the theorem are more suitable.
Proof of Proposition 1.
The second part of the minimum in (5) follows from the Dvoretzky–Kiefer–Wolfowitz inequality, see [16], which gives, for any $x, \epsilon > 0$, that
\[ \mathbb{P}\left(\rho\max_{t\in A_\alpha(m,x)}\frac{|T_k(t)-T(t)|}{t} > \epsilon\right) \le \mathbb{P}\left(\rho\sup_{t\in[\alpha x,\alpha]}\frac{|T_k(t)-T(t)|}{t} > \epsilon\right) \le \mathbb{P}\left(\sup_{t\in[\alpha x,\alpha]}|T_k(t)-T(t)| > \frac{\epsilon\alpha x}{\rho}\right) \le 2e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2}. \tag{10} \]
In Proposition A1 we use the function $c(\epsilon) = (1+\epsilon)\ln(1+\epsilon) - \epsilon$. The first part of the minimum in (5) is immediate from Proposition A1, since $kT_k(t) = \sum_{j=1}^k I_{\{X_j\le t\}}$ is binomially distributed with parameters $k$ and $T(t) = \mathbb{P}(X_1\le t)$, and thus
\[ \mathbb{P}\left(\rho\max_{t\in A_\alpha(m,x)}\frac{|T_k(t)-T(t)|}{t} > \epsilon\right) \le \sum_{t\in A_\alpha(m,x)}\mathbb{P}\left(|kT_k(t)-kT(t)| > \frac{\epsilon t}{\rho T(t)}\, kT(t)\right) \le 2\sum_{t\in A_\alpha(m,x)} e^{-kT(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)} \le 2\,|A_\alpha(m,x)|\, e^{-k\inf_{t\in A_\alpha(m,x)}T(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)} \le 2\lceil m(1-x)\rceil\, e^{-k\inf_{t\in A_\alpha(m,x)}T(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)}, \]
where we have used the fact that the number of elements of $A_\alpha(m,x)$ is bounded from above as $|A_\alpha(m,x)| \le \lceil m(1-x)\rceil$. Next, (6) is deduced from the assumption that $T(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)$ as a function of $t$ is non-decreasing on $[\alpha x, \alpha]$ and the simple fact that $A_\alpha(m,x)\subset[\alpha x,\alpha]$.  □
Applying Proposition 1 with $T = F$ and $T = G$, respectively $k = m_0 = m\gamma$ and $k = m_1 = m(1-\gamma)$, and $\rho = \gamma$ and $\rho = 1-\gamma$, we can confirm from (4) the proof of Theorem 3.1 in [15] in the sense that we have the convergence in probability
\[ \lim_{m\to\infty} \max_{t\in\left\{\frac{\alpha\lceil mx\rceil}{m},\,\ldots,\,\frac{\alpha(m-1)}{m},\,\alpha\right\}} \frac{H_m(t)-t}{t} \overset{\mathbb{P}}{=} (1-\gamma)\sup_{t\in[\alpha x,\alpha]} \frac{G(t)-t}{t} =: u_{\alpha,\gamma}(x). \tag{11} \]
Clearly, $u_{\alpha,\gamma}:(0,1]\to\mathbb{R}$, with $u_{\alpha,\gamma}(1) = (1-\gamma)\frac{G(\alpha)-\alpha}{\alpha}$, is a non-increasing function on $(0,1]$. Denote
\[ \underline{x}_{\alpha,\gamma}(\beta) = \inf\left\{x\in(0,1) : u_{\alpha,\gamma}(x) \le \frac{1}{\beta}-1\right\}, \quad \alpha,\beta\in(0,1), \tag{12} \]
and
\[ \overline{x}_{\alpha,\gamma}(\beta) = \inf\left\{x\in(0,1) : u_{\alpha,\gamma}(x) < \frac{1}{\beta}-1\right\}, \quad \alpha,\beta\in(0,1). \tag{13} \]
Since $u_{\alpha,\gamma}$ is non-increasing we have that $\overline{x}_{\alpha,\gamma}(\beta) \ge \underline{x}_{\alpha,\gamma}(\beta)$. Therefore from relations (2) and (4) we can easily get that
\[ \lim_{m\to\infty}\mathbb{P}\left(\frac{R_m}{m}\ge x\right) = 1, \text{ for } x < \underline{x}_{\alpha,\gamma}(\alpha); \qquad \lim_{m\to\infty}\mathbb{P}\left(\frac{R_m}{m}\ge x\right) = 0, \text{ for } x > \overline{x}_{\alpha,\gamma}(\alpha). \tag{14} \]
We can provide more precise information about probabilities of the type $\mathbb{P}\left(\frac{R_m}{m} < x\right)$ in the general case. In the sequel we use $I_A$ for the indicator function of the set $A$.
Proposition 2.
We have for any $\epsilon > 0$ and $x \in (0,1)$ that
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) \ge \left(1 - K_F(x,m,m_0,\gamma,\epsilon,\alpha) - K_G(x,m,m_1,1-\gamma,\epsilon,\alpha)\right) I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} \le \frac{1}{\alpha}-1-2\epsilon\right\}}. \tag{15} \]
Proof. 
Using successively (2), (4) and (10), with $\rho = \gamma$, $k = m_0 = \gamma m$ for the probability involving $F_{m_0}$ and $\rho = 1-\gamma$, $k = m_1 = (1-\gamma)m$ for the expression concerning $G_{m_1}$, we arrive at the inequalities
\begin{align*} \mathbb{P}\left(\frac{R_m}{m} < x\right) &= 1 - \mathbb{P}\left(\frac{R_m}{m}\ge x\right) = 1 - \mathbb{P}\left(\max_{t\in A_\alpha(m,x)}\frac{H_m(t)-t}{t} \ge \frac{1}{\alpha}-1\right) \\ &\ge \left(1 - \mathbb{P}\left(\gamma\max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t} > \epsilon\right) - \mathbb{P}\left((1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t} > \epsilon\right)\right) I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} \le \frac{1}{\alpha}-1-2\epsilon\right\}} \\ &\ge \left(1 - K_F(x,m,m_0,\gamma,\epsilon,\alpha) - K_G(x,m,m_1,1-\gamma,\epsilon,\alpha)\right) I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} \le \frac{1}{\alpha}-1-2\epsilon\right\}}. \tag{16} \end{align*}
We only note that, once the deviations of $F_{m_0}$ and $G_{m_1}$ are estimated away, by (4) the remaining term is deterministic and thus the indicator function appears.  □

2.3. Condition (M)

In view of the fact that Theorem 3.1 in [15] holds, the limit in probability $\lim_{m\to\infty}\frac{R_m}{m} = x_{\alpha,\gamma}$ exists if and only if $\underline{x}_{\alpha,\gamma}(\alpha) = \overline{x}_{\alpha,\gamma}(\alpha) = x_{\alpha,\gamma}$. We shall therefore restrict our attention to the case when, for fixed $\alpha$, the function $\frac{G(t)-t}{t}$ is decreasing on $(0,\alpha]$. In this case we have from (11) that
\[ u_{\alpha,\gamma}(x) = (1-\gamma)\frac{G(\alpha x)-\alpha x}{\alpha x}. \tag{17} \]
If in addition $\lim_{t\downarrow0}\frac{G(t)-t}{t} = \infty$ then
\[ x_{\alpha,\gamma}(\beta) := \underline{x}_{\alpha,\gamma}(\beta) = \overline{x}_{\alpha,\gamma}(\beta), \quad \text{for any } \beta < 1, \tag{18} \]
and if $\beta = \alpha$ then we set $x_{\alpha,\gamma} := x_{\alpha,\gamma}(\alpha)$. From now on we work under this condition, which we call condition (M) and which is synthesized as follows.
Definition 1.
We say that condition (M) holds for fixed $\alpha\in(0,1)$ if and only if $H(t) := \frac{G(t)-t}{t}$ is decreasing on $(0,\alpha]$ with $\lim_{t\downarrow0} H(t) = \infty$.
Remark 4.
As mentioned in [17], “[u]nder the alternative hypothesis, the p-values will have a distribution that has high density for small p-values and the density will decrease as the p-values increase”. Therefore, when modeling the p-values under the alternative hypothesis with a Beta distribution $B(a,b)$, its parameters should be such that $a < 1$ and $b \ge 1$. Note that condition (M) also holds when $b < 1$ and $a < (1-2\alpha)/(1-\alpha)$, although that case should not occur for well-behaved statistical procedures.
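This can be checked numerically for a concrete Beta choice (parameters below are ours, for illustration): with $a < 1$ and $b \ge 1$, the function $H(t) = (G(t)-t)/t$ is decreasing on $(0,\alpha]$ and blows up at 0.

```python
import numpy as np
from scipy.stats import beta

a, b_par, alpha = 0.1, 5.0, 0.05         # illustrative Beta(a, b) with a < 1, b >= 1
t = np.linspace(1e-6, alpha, 10000)
H = (beta.cdf(t, a, b_par) - t) / t      # H(t) = (G(t) - t) / t
# expected: H strictly decreasing on the grid and very large near 0
```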
Remark 5.
Note that in our context Equation (2.2) of [13] translates to $\alpha^* = \inf_{u>0}\{u/G(u)\}$ and, as is mentioned there, the asymptotics of the Benjamini–Hochberg procedure exhibit a very different behavior on the regions $\alpha\in(0,\alpha^*]$ and $\alpha > \alpha^*$. In the same paper it is demonstrated that when $G(t)/t \to 0$ the overwhelming proportion of discoveries will be amongst the non-significant features. Under our assumptions it is always true that $\alpha^* = 0$ and therefore we are always in the second regime, for which [13] provides a law of the iterated logarithm for $R_n$, see Theorem 2.2 of [13]. In more detail, it is shown that the precise estimate
\[ \limsup_{n\to\infty} \frac{\pm\left(R_n - x_{\alpha,\gamma}\, n\right)}{\sqrt{n\log\log(n)}} = \frac{\sqrt{x_{\alpha,\gamma}\left(1-x_{\alpha,\gamma}\right)}}{1-\alpha G'(x_{\alpha,\gamma})} \]
holds true.
A very interesting contribution is made in Theorems 3.1 and 3.2 of [18]. The author finds a Donsker-type convergence for the so-called threshold procedures, which are essentially functionals of the empirical function $H_m(t)$ that recover, for example, the false discovery rate, see Theorem 3.2 in [18]. This implies a convergence rate of $1/\sqrt{n}$ for $|R_n/n - x_{\alpha,\gamma}|$ and is applicable in a variety of other examples. Moreover, those results go beyond the scope of the Benjamini–Hochberg procedure and capture different procedures, some of which are of higher power than the Benjamini–Hochberg procedure. In the particular case of the Benjamini–Hochberg procedure, however, we know from [15] that the probabilities of the events $\left\{\left|\frac{R_n}{n} - x_{\alpha,\gamma}\right| \le \beta x_{\alpha,\gamma}\right\}$ and $\left\{\left|\frac{S_n}{n} - x_{\alpha,\gamma}\frac{1-\alpha\gamma}{1-\gamma}\right| \le \beta x_{\alpha,\gamma}\frac{1-\alpha\gamma}{1-\gamma}\right\}$, where $S_n/n$ is the power of that procedure, converge to 1. Our Theorems 1 and 3 strengthen this by showing that the speed of convergence is of exponential order. As discussed earlier, this allows a direct estimate of the probability that $S_n/n$ lies in a prescribed interval.
Remark 6.
We note that upon the validity of condition (M) it is impossible for $G$ to have atoms on $(0,\alpha]$: otherwise, if $a\in(0,\alpha]$ were an atom, then
\[ \lim_{t\uparrow a} H(t) = \frac{G(a-)-a}{a} < \frac{G(a)-a}{a} = H(a), \]
which contradicts the assumption that $H$ is decreasing. Thus, our initial requirement that $G$ is continuous is included in condition (M). This also means that $H$ is continuous on $(0,\alpha]$ and thus $\underline{x}_{\alpha,\gamma}(\beta) = \overline{x}_{\alpha,\gamma}(\beta) = x_{\alpha,\gamma}(\beta)$.
We then have the following elementary lemma.
Lemma 1.
Condition (M) is valid for fixed $\alpha\in(0,1)$ if $G$ has on $(0,\alpha]$ a density $g = G'$ satisfying $\lim_{x\downarrow0} g(x) = g(0+) = \infty$ and $tg(t) - G(t) < 0$ on $(0,\alpha]$. The latter is satisfied if, for example, $g'(t) < 0$ on $(0,\alpha]$.
Proof. 
From the assumptions it is clear that on $(0,\alpha]$
\[ \frac{dH(t)}{dt} = \frac{(g(t)-1)\,t - \int_0^t (g(s)-1)\,ds}{t^2} < 0 \iff t g(t) - G(t) < 0. \]
Furthermore, since $\lim_{t\downarrow0}\frac{G(t)}{t} = g(0+) = \infty$, we conclude that $\lim_{t\downarrow0} H(t) = \infty$. Thus the first part of the proof is settled. Clearly, $tg(t) - G(t) = tg(t) - \int_0^t g(s)\,ds < 0$ if $g'(t) < 0$ on $(0,\alpha]$, and the overall proof is thus completed.  □
The next lemma provides an easier expression for $K_G$, see (5) and (6), provided condition (M) holds.
Lemma 2.
Let condition (M) hold for $G$ and $m_1 = m(1-\gamma)$. Then we have that
\[ K_G^M(x,m,1-\gamma,\epsilon,\alpha) = K_G(x,m,m_1,1-\gamma,\epsilon,\alpha) = 2\min\left\{ \lceil m(1-x)\rceil e^{-m(1-\gamma)G(\alpha x)\,c\left(\frac{\epsilon\alpha x}{(1-\gamma)G(\alpha x)}\right)},\; e^{-2m(1-\gamma)\left(\frac{\epsilon\alpha x}{1-\gamma}\right)^2} \right\}. \tag{19} \]
The same holds for the c.d.f. of the uniform distribution $F$. In particular, if $\gamma > 1/2$ then
\[ K_F^M(x,m,\gamma,\epsilon,\alpha) = K_F(x,m,m_0,\gamma,\epsilon,\alpha) = 2\min\left\{ \lceil m(1-x)\rceil e^{-m\gamma\,\alpha x\, c(\epsilon/\gamma)},\; e^{-2m\gamma\left(\frac{\epsilon\alpha x}{\gamma}\right)^2} \right\}. \tag{20} \]
Remark 7.
Under condition (M) we have that $G(t)\,c\left(\frac{dt}{G(t)}\right)$ is non-decreasing on $(0,\alpha]$ for any $d > 0$, since $\frac{t}{G(t)}$ is non-decreasing on $(0,\alpha]$ and $c$ is increasing on $(0,\infty)$, see Proposition A1. Hence the function $K_G^M(x,m,1-\gamma,\epsilon,\alpha)$ is decreasing in $x$ as long as the other parameters stay fixed. The same is valid for $K_F^M$.
Proof of Lemma 2.
The proof is immediate from (6), since under condition (M) the function $G(t)\,c\left(\frac{\epsilon t}{(1-\gamma)G(t)}\right)$ is non-decreasing on $(0,\alpha]$, see Remark 7. We just note that by definition $m_1 = m(1-\gamma)$, and we have employed this for $k$ in (6) to derive (19). Finally, (20) follows from (7).  □

2.4. Theoretical Bounds on the Distribution of $R_m$ under Condition (M)

If condition (M) holds we have the identity, equivalent to (17),
\[ \max_{t\in A_\alpha(m,x)} H(t) = \frac{G\left(\frac{\alpha\lceil mx\rceil}{m}\right) - \frac{\alpha\lceil mx\rceil}{m}}{\frac{\alpha\lceil mx\rceil}{m}} = H\left(\frac{\alpha\lceil mx\rceil}{m}\right), \tag{21} \]
and with $\beta = \alpha$ relation (18), together with (12) and (13), gives that
\[ \frac{1}{\alpha}-1 = (1-\gamma)H(\alpha x_{\alpha,\gamma}) = (1-\gamma)\frac{G(\alpha x_{\alpha,\gamma}) - \alpha x_{\alpha,\gamma}}{\alpha x_{\alpha,\gamma}}. \tag{22} \]
Let $\beta > 0$ and denote, using (21),
\[ m_G(x,\beta,\alpha,\gamma) := \inf\left\{k\in\mathbb{N}^+ : \max_{m\ge k}\, I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}H(t) > \beta\right\}} = 0\right\} = \inf\left\{k\in\mathbb{N}^+ : \max_{m\ge k}\, I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) > \beta\right\}} = 0\right\}. \tag{23} \]
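Equation (22) pins down $x_{\alpha,\gamma}$ as the unique root of a decreasing function, so it is easy to solve numerically. The sketch below (illustrative parameters of our choosing, with the Beta alternative of Section 3) also verifies the power-limit identity $G(\alpha x_{\alpha,\gamma}) = x_{\alpha,\gamma}\frac{1-\alpha\gamma}{1-\gamma}$ obtained later in Theorem 3:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import beta

a, b_par = 0.1, 5.0                  # illustrative Beta(a, b) alternative
alpha, gamma = 0.05, 0.8

def H(t):
    return (beta.cdf(t, a, b_par) - t) / t

# Equation (22): (1 - gamma) H(alpha x) = 1/alpha - 1; H decreasing => unique root
x_ag = brentq(lambda x: (1 - gamma) * H(alpha * x) - (1 / alpha - 1),
              1e-12, 1 - 1e-12)
power_limit = beta.cdf(alpha * x_ag, a, b_par)   # G(alpha x_{alpha,gamma})
```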
We then have the following key result:
Theorem 1.
Let condition (M) hold for given $G$ and fixed $\alpha\in(0,1)$ with $G(\alpha) < 1$. Fix $\gamma\in(0,1)$. Recall from (18) that $x_{\alpha,\gamma} = \underline{x}_{\alpha,\gamma}(\alpha) = \overline{x}_{\alpha,\gamma}(\alpha)$. Then $\lim_{m\to\infty}\frac{R_m}{m} = x_{\alpha,\gamma}\in(0,1)$ in probability. Moreover, for any $x\in(x_{\alpha,\gamma},1)$ there exists $\epsilon(x)$ such that for any $0\le\epsilon\le\epsilon(x)$ the inequality $(1-\gamma)H(\alpha x) < \frac{1}{\alpha}-1-2\epsilon$ holds, and then for any $m\ge m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right)$ we have that
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) \ge 1 - K_F^M(x,m,\gamma,\epsilon,\alpha) - K_G^M(x,m,1-\gamma,\epsilon,\alpha). \tag{24} \]
Finally, for any $x\in(0,x_{\alpha,\gamma})$ there exists $\epsilon(x)$ such that for any $0\le\epsilon\le\epsilon(x)$ the inequality $(1-\gamma)H(\alpha x) > \frac{1}{\alpha}-1+2\epsilon$ holds, and then for any $m\ge m_G\left(x,\frac{1}{\alpha}-1+2\epsilon,\alpha,\gamma\right)$
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) \le K_F^M(x,m,\gamma,\epsilon,\alpha) + K_G^M(x,m,1-\gamma,\epsilon,\alpha). \tag{25} \]
Remark 8.
It seems important to evaluate or approximate $m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right)$ with, say, $m'$ so that (24) is valid beyond this estimate, that is for $m\ge m'$. If $H_d := \inf_{t\in(0,\alpha]}|H'(t)| > 0$ then, since $\alpha x \le \frac{\alpha\lceil mx\rceil}{m}$, $1 > x > x_{\alpha,\gamma}$ and $H$ is decreasing on $(0,\alpha]$, we get, using (22) in the first relation,
\[ \frac{1}{\alpha}-1 = (1-\gamma)H(\alpha x_{\alpha,\gamma}) = (1-\gamma)\left(H(\alpha x_{\alpha,\gamma}) - H(\alpha x)\right) + (1-\gamma)H(\alpha x) \ge (1-\gamma)H_d\,\alpha\left(x - x_{\alpha,\gamma}\right) + (1-\gamma)H(\alpha x). \]
However, if $(1-\gamma)H_d\,\alpha\left(x-x_{\alpha,\gamma}\right) > 2\epsilon$ we get from the last observation that
\[ (1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) \le (1-\gamma)H(\alpha x) \le \frac{1}{\alpha}-1-(1-\gamma)H_d\,\alpha\left(x-x_{\alpha,\gamma}\right) < \frac{1}{\alpha}-1-2\epsilon \]
for any $m\ge1$, and therefore if $(1-\gamma)\alpha H_d\left(x-x_{\alpha,\gamma}\right) > 2\epsilon$ then
\[ m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right) = \inf\left\{k\in\mathbb{N}^+ : \max_{m\ge k}\, I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) > \frac{1}{\alpha}-1-2\epsilon\right\}} = 0\right\} = 1 \]
and (24) is always valid. Of course various similar and more precise estimates can be achieved, but these are beyond the scope of this work.
Remark 9.
Considering Remark 3, the inequalities (24) and (25) can be improved for computational purposes by using as an upper bound the expression
\[ 2\min\left\{ \sum_{t\in A_\alpha(m,x)} e^{-kT(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)},\; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\} \]
for $K_T^M$, $T = G, F$, with $k = m_1, m_0$ and $\rho = 1-\gamma, \gamma$, respectively.
Remark 10.
We point out that in our setting $\lim_{m\to\infty}\frac{R_m}{m} > 0$, and that there have been a number of works that attempt to control the FDR even under a weak dependence structure between the random variables following the uniform distribution, that is $(X_j)_{j\le m_0}$; we refer, for example, to [19]. There is also an interesting contribution, see [20], which discusses difficulties in controlling the FDR under weak dependence in the case when $\lim_{m\to\infty}\frac{R_m}{m} = 0$.
Proof of Theorem 1.
Under condition (M) we have that $\underline{x}_{\alpha,\gamma}(\alpha) = \overline{x}_{\alpha,\gamma}(\alpha) =: x_{\alpha,\gamma}$, see (18). Clearly, $x_{\alpha,\gamma}\in(0,1)$ since $u_{\alpha,\gamma}(0+) = \infty$ and $u_{\alpha,\gamma}(1) = (1-\gamma)\frac{G(\alpha)-\alpha}{\alpha} < \frac{1}{\alpha}-1$, see (17), by the assumption $G(\alpha) < 1$. The fact that $\lim_{m\to\infty}\frac{R_m}{m} = x_{\alpha,\gamma}\in(0,1)$ follows from Theorem 3.1 in [15]. Henceforth, it remains to show (24) and (25). The former, thanks to (16) and (19), follows for any $m\ge m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right)$, if $m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right) < \infty$, since then
\[ I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} > \frac{1}{\alpha}-1-2\epsilon\right\}} = I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) > \frac{1}{\alpha}-1-2\epsilon\right\}} = 0. \]
However, since $\lim_{m\to\infty}\frac{\alpha\lceil mx\rceil}{m} = \alpha x$, $x > x_{\alpha,\gamma}$ and $H$ is decreasing on $(0,\alpha]$, then
\[ \lim_{m\to\infty}(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) = (1-\gamma)H(\alpha x) < (1-\gamma)H(\alpha x_{\alpha,\gamma}) = \frac{1}{\alpha}-1. \]
Thus, for all $\epsilon$ small enough, we conclude that $(1-\gamma)H(\alpha x) < \frac{1}{\alpha}-1-2\epsilon$ and thus $m_G\left(x,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right) < \infty$, see (23). Hence, (24) follows.
To prove (25) for any $x\in(0,x_{\alpha,\gamma})$ we work with the complementary event, that is $\left\{\frac{R_m}{m}\ge x\right\}$. Similarly as in (16), and utilizing (4) and (21), we get that on the event
\[ \left\{\max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t} \le \frac{\epsilon}{\gamma}\;;\; \max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t} \le \frac{\epsilon}{1-\gamma}\right\} \]
we have that $\max_{t\in A_\alpha(m,x)}\frac{H_m(t)-t}{t} \ge (1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) - 2\epsilon$ and therefore
\begin{align*} \mathbb{P}\left(\frac{R_m}{m}\ge x\right) &\ge \mathbb{P}\left(\frac{R_m}{m}\ge x \;;\; \max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t}\le\frac{\epsilon}{\gamma} \;;\; \max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t}\le\frac{\epsilon}{1-\gamma}\right) \\ &\ge \mathbb{P}\left(\max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t}\le\frac{\epsilon}{\gamma} \;;\; \max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t}\le\frac{\epsilon}{1-\gamma}\right) I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) \ge \frac{1}{\alpha}-1+2\epsilon\right\}}. \end{align*}
Again, as $\frac{\lceil mx\rceil}{m}\to x < x_{\alpha,\gamma}$ and $H(\alpha x) > H(\alpha x_{\alpha,\gamma})$, we conclude that for all $\epsilon$ small enough the inequality $(1-\gamma)H(\alpha x) \ge \frac{1}{\alpha}-1+2\epsilon$ holds and we have that
\[ \lim_{m\to\infty} I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) \ge \frac{1}{\alpha}-1+2\epsilon\right\}} = 1. \tag{26} \]
From Proposition 1, the expressions for $K_F^M, K_G^M$ in Lemma 2 and the independence of $(X_j)_{j\le m_0}$ and $(X_j)_{m\ge j > m_0}$ we have that
\[ \mathbb{P}\left(\max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t}\le\frac{\epsilon}{\gamma}\;;\;\max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t}\le\frac{\epsilon}{1-\gamma}\right) = \mathbb{P}\left(\max_{t\in A_\alpha(m,x)}\frac{|F_{m_0}(t)-t|}{t}\le\frac{\epsilon}{\gamma}\right)\mathbb{P}\left(\max_{t\in A_\alpha(m,x)}\frac{|G_{m_1}(t)-G(t)|}{t}\le\frac{\epsilon}{1-\gamma}\right) \ge \left(1-K_F^M(x,m,\gamma,\epsilon,\alpha)\right)\left(1-K_G^M(x,m,1-\gamma,\epsilon,\alpha)\right), \]
and we conclude the proof of the theorem via the now trivial relations
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) = 1 - \mathbb{P}\left(\frac{R_m}{m}\ge x\right) \le 1 - \left(1-K_F^M(x,m,\gamma,\epsilon,\alpha)\right)\left(1-K_G^M(x,m,1-\gamma,\epsilon,\alpha)\right) I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right)\ge\frac{1}{\alpha}-1+2\epsilon\right\}} \tag{27} \]
and an appeal to (26).  □
For practical reasons, and especially for the accompanying software routine, it is optimal to choose $\epsilon$ in Theorem 1 as large as possible. This is due to the fact that the bounds $K_F^M, K_G^M$, see (19) and (20), are decreasing in $\epsilon$. Since in practical computations $m$ is usually available, it is then beneficial to choose the largest $\epsilon$ as a function of $x, m$ such that the previous theorem holds. In this direction we have the following result:
Theorem 2.
Under the conditions of Theorem 1, for $m\in\mathbb{N}^+$ and $x\in(x_{\alpha,\gamma},1)$ fixed, we have that
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) \ge 1 - K_F^M(x,m,\gamma,\epsilon_G,\alpha) - K_G^M(x,m,1-\gamma,\epsilon_G,\alpha), \tag{28} \]
where
\[ \epsilon_G = \epsilon(x,m) = \frac{1}{2}\left(\frac{1}{\alpha}-1\right) - \frac{1-\gamma}{2}H\left(\frac{\alpha\lceil mx\rceil}{m}\right) = \frac{1-\gamma}{2}\left(H(\alpha x_{\alpha,\gamma}) - H\left(\frac{\alpha\lceil mx\rceil}{m}\right)\right). \tag{29} \]
Similarly, if $m\in\mathbb{N}^+$ and $x\in(0,x_{\alpha,\gamma})$ are fixed and $\frac{\lceil mx\rceil}{m} < x_{\alpha,\gamma}$, then
\[ \mathbb{P}\left(\frac{R_m}{m} < x\right) \le K_F^M(x,m,\gamma,\epsilon_G,\alpha) + K_G^M(x,m,1-\gamma,\epsilon_G,\alpha), \tag{30} \]
where
\[ \epsilon_G = \epsilon(x,m) = \frac{1-\gamma}{2}H\left(\frac{\alpha\lceil mx\rceil}{m}\right) - \frac{1}{2}\left(\frac{1}{\alpha}-1\right) = \frac{1-\gamma}{2}\left(H\left(\frac{\alpha\lceil mx\rceil}{m}\right) - H(\alpha x_{\alpha,\gamma})\right). \tag{31} \]
Proof. 
To show (28), with $\epsilon_G$ as in (29), it suffices to show that for the chosen $\epsilon_G$ and the given $m, x$ we have that
\[ I_{\left\{(1-\gamma)\max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} > \frac{1}{\alpha}-1-2\epsilon_G\right\}} = 0 \]
in (16). However, as condition (M) holds, then
\[ \max_{t\in A_\alpha(m,x)}\frac{G(t)-t}{t} = \max_{t\in A_\alpha(m,x)} H(t) = H\left(\frac{\alpha\lceil mx\rceil}{m}\right) \]
and we see that $\epsilon_G$ is the maximal possible choice such that
\[ I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) > \frac{1}{\alpha}-1-2\epsilon_G\right\}} = 0. \]
For the second part of (29) we only recall that $(1-\gamma)H(\alpha x_{\alpha,\gamma}) = \frac{1}{\alpha}-1$, see (22). The rest of the theorem, that is (30) and (31), follows in a similar fashion from (27), wherein the expression in (31) is the maximal $\epsilon$ such that for the fixed $m$ and $x\in(0,x_{\alpha,\gamma})$
\[ I_{\left\{(1-\gamma)H\left(\frac{\alpha\lceil mx\rceil}{m}\right) \ge \frac{1}{\alpha}-1+2\epsilon\right\}} = 1. \]
We only note that in this instance we need to have that $x \le \frac{\lceil mx\rceil}{m} < x_{\alpha,\gamma}$, which is part of the assumptions of the theorem.  □

2.5. Exponential Convergence to the Theoretical Limit of the Power of the Benjamini–Hochberg Procedure

We proceed by studying the question of the power of the test. It is clear that if we denote by
\[ S_m = \sum_{j=1}^{m_1} I_{\left\{X_j \le \frac{\alpha R_m}{m}\right\}}, \tag{32} \]
where $(X_j)_{1\le j\le m_1}$ are i.i.d. with c.d.f. $G$, then
\[ S_m^* = \frac{S_m}{m_1} \tag{33} \]
is the proportion of truly significant elements that we have correctly identified as significant, that is, the power of the test. If one wishes to understand the behavior of $S_m^*$ as $m$ increases, one ought to decouple its dependence on $\frac{R_m}{m}$ in (32). Theorem 1 aids in this matter. We note that if we choose $x_1 < \lim_{m\to\infty}\frac{R_m}{m} = x_{\alpha,\gamma} < x_2$, see (18) for the definition of $x_{\alpha,\gamma}$, then from Theorem 1 we have for every
\[ m \ge \max\left\{ m_G\left(x_1,\frac{1}{\alpha}-1+2\epsilon,\alpha,\gamma\right) ;\; m_G\left(x_2,\frac{1}{\alpha}-1-2\epsilon,\alpha,\gamma\right) \right\} =: m^*(x_1,x_2,\epsilon,\alpha,\gamma) \]
with $\epsilon \le \min\{\epsilon(x_1), \epsilon(x_2)\}$ that
\[ \mathbb{P}\left(\frac{R_m}{m}\in[x_1,x_2]\right) = \mathbb{P}\left(\frac{R_m}{m}\le x_2\right) - \mathbb{P}\left(\frac{R_m}{m} < x_1\right) \ge 1 - \left(K_F^M(x_2,m,\gamma,\epsilon,\alpha) + K_G^M(x_2,m,1-\gamma,\epsilon,\alpha)\right) - \left(K_F^M(x_1,m,\gamma,\epsilon,\alpha) + K_G^M(x_1,m,1-\gamma,\epsilon,\alpha)\right). \tag{34} \]
Then we can formulate the following fundamental result, which offers exponential bounds on the convergence of $S_m^*$.
Theorem 3.
Let condition (M) be valid and assume the notation of Theorem 1. Let next $x_{\alpha,\gamma}\in(x_1,x_2)\subset(0,1)$. Then there exist $m^* = m^*(x_1,x_2,\epsilon,\alpha,\gamma)$ and $\epsilon^* = \epsilon(x_1,x_2)$ such that for any $m\ge m^*$ and any $\epsilon\le\epsilon^*$
\[ \mathbb{P}\left(\frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_1\}} \le S_m^* \le \frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_2\}}\right) \ge 1 - \left(K_F^M(x_1,m,\gamma,\epsilon,\alpha) + K_G^M(x_1,m,1-\gamma,\epsilon,\alpha)\right) - \left(K_F^M(x_2,m,\gamma,\epsilon,\alpha) + K_G^M(x_2,m,1-\gamma,\epsilon,\alpha)\right). \tag{35} \]
Clearly, the random variables $\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x\}}$ are Binomial with parameters $m_1$ and $G(\alpha x) = \mathbb{P}(X_1\le\alpha x)$. Henceforth, in probability,
\[ \lim_{m\to\infty} S_m^* = G(\alpha x_{\alpha,\gamma}) = x_{\alpha,\gamma}\,\frac{1-\alpha\gamma}{1-\gamma}. \tag{36} \]
Moreover, for any $\eta\in(0,1)$, with $m, \epsilon$ as above, we have that
\[ \mathbb{P}\left((1-\eta)G(\alpha x_1) \le S_m^* \le (1+\eta)G(\alpha x_2)\right) \ge 1 - \left(K_F^M(x_1,m,\gamma,\epsilon,\alpha) + K_G^M(x_1,m,1-\gamma,\epsilon,\alpha)\right) - \left(K_F^M(x_2,m,\gamma,\epsilon,\alpha) + K_G^M(x_2,m,1-\gamma,\epsilon,\alpha)\right) - e^{-mc(\eta)(1-\gamma)G(\alpha x_1)} - e^{-mc(\eta)(1-\gamma)G(\alpha x_2)}, \tag{37} \]
where $c(\eta) = (1+\eta)\ln(1+\eta) - \eta$, $\eta > 0$, is considered in Proposition A1.
Remark 11.
We note that (36) has been proved in Corollary 3.3 in [15]; here we additionally offer exponential bounds on this convergence.
Proof of Theorem 3.
The proof of (35) follows immediately from (32), (34) and the considerations thereabout, since
\[ \mathbb{P}\left(\frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_1\}} \le S_m^* \le \frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_2\}}\right) \ge \mathbb{P}\left(\frac{R_m}{m}\in[x_1,x_2]\right). \tag{38} \]
The fact that $\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x\}}$ is Binomial with parameters $m_1$ and $G(\alpha x) = \mathbb{P}(X_1\le\alpha x)$ is straightforward, which, with the help of (35) and
\[ \lim_{m\to\infty}\left(K_F^M(x_1,m,\gamma,\epsilon,\alpha) + K_G^M(x_1,m,1-\gamma,\epsilon,\alpha)\right) + \lim_{m\to\infty}\left(K_F^M(x_2,m,\gamma,\epsilon,\alpha) + K_G^M(x_2,m,1-\gamma,\epsilon,\alpha)\right) = 0, \]
proves (36). Indeed, the strong law of large numbers gives, almost surely,
\[ \lim_{m_1\to\infty}\frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x\}} = G(\alpha x), \]
and from (35) $\lim_{m\to\infty} S_m^* = G(\alpha x_{\alpha,\gamma})$. Next, from (22) we get that
\[ H(\alpha x_{\alpha,\gamma}) = \frac{G(\alpha x_{\alpha,\gamma}) - \alpha x_{\alpha,\gamma}}{\alpha x_{\alpha,\gamma}} = \frac{1-\alpha}{\alpha(1-\gamma)}, \]
and henceforth $G(\alpha x_{\alpha,\gamma}) = x_{\alpha,\gamma}\frac{1-\alpha\gamma}{1-\gamma}$. The latter is the limit on the right-hand side of (36). The very last relation, that is (37), follows again from (35) and an application of Proposition A1, which yields in our case, with $m_1 = m(1-\gamma)$, the bounds
\[ \mathbb{P}\left(\frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_2\}} \ge (1+\eta)G(\alpha x_2)\right) \le e^{-c(\eta)(1-\gamma)mG(\alpha x_2)} \]
and
\[ \mathbb{P}\left(\frac{1}{m_1}\sum_{j=1}^{m_1} I_{\{X_j\le\alpha x_1\}} \le (1-\eta)G(\alpha x_1)\right) \le e^{-c(\eta)(1-\gamma)mG(\alpha x_1)}. \]
This concludes the proof.  □
As in the case of $\frac{R_m}{m}$, we state a more practical version of Theorem 3, aimed at computing the largest $\epsilon$ for fixed $m, x_1, x_2$. It is in the spirit of Theorem 2.
Theorem 4.
Let condition (M) be valid and assume the notation of Theorem 2. Let $m\in\mathbb{N}^+$ be fixed. Let next $x_{\alpha,\gamma}\in(x_1,x_2)\subset(0,1)$, where $x_1$ is such that $\frac{\lceil mx_1\rceil}{m} < x_{\alpha,\gamma}$. Finally, let $\eta\in(0,1)$ be fixed. Then with
\[ \epsilon_1 = \epsilon(x_1,m) = \frac{1-\gamma}{2}H\left(\frac{\alpha\lceil mx_1\rceil}{m}\right) - \frac{1}{2}\left(\frac{1}{\alpha}-1\right) = \frac{1-\gamma}{2}\left(H\left(\frac{\alpha\lceil mx_1\rceil}{m}\right) - H(\alpha x_{\alpha,\gamma})\right) \tag{39} \]
and
\[ \epsilon_2 = \epsilon(x_2,m) = \frac{1}{2}\left(\frac{1}{\alpha}-1\right) - \frac{1-\gamma}{2}H\left(\frac{\alpha\lceil mx_2\rceil}{m}\right) = \frac{1-\gamma}{2}\left(H(\alpha x_{\alpha,\gamma}) - H\left(\frac{\alpha\lceil mx_2\rceil}{m}\right)\right) \tag{40} \]
we have that
\[ \mathbb{P}\left((1-\eta)G(\alpha x_1)\le S_m^*\le(1+\eta)G(\alpha x_2)\right) \ge \max\left\{ 1 - \left(K_F^M(x_1,m,\gamma,\epsilon_1,\alpha) + K_G^M(x_1,m,1-\gamma,\epsilon_1,\alpha)\right) - \left(K_F^M(x_2,m,\gamma,\epsilon_2,\alpha) + K_G^M(x_2,m,1-\gamma,\epsilon_2,\alpha)\right) - e^{-mc(\eta)(1-\gamma)G(\alpha x_1)} - e^{-mc(\eta)(1-\gamma)G(\alpha x_2)} ;\; 0 \right\}, \tag{41} \]
where $c(\eta) = (1+\eta)\ln(1+\eta) - \eta$ is considered in Proposition A1. For a one-sided bound we get the inequality
\[ \mathbb{P}\left((1-\eta)G(\alpha x_1)\le S_m^*\right) \ge \max\left\{ 1 - K_F^M(x_1,m,\gamma,\epsilon_1,\alpha) - K_G^M(x_1,m,1-\gamma,\epsilon_1,\alpha) - e^{-mc(\eta)(1-\gamma)G(\alpha x_1)} ;\; 0 \right\}. \tag{42} \]
Remark 12.
Noting Remark 9, we see that the expressions involving $K_F^M, K_G^M$ can be substituted with
\[ 2\min\left\{ \sum_{t\in A_\alpha(m,x)} e^{-kT(t)\,c\left(\frac{\epsilon t}{\rho T(t)}\right)},\; e^{-2k\left(\frac{\epsilon\alpha x}{\rho}\right)^2} \right\}, \]
where $T$ is either $F$ or $G$.
Proof of Theorem 4.
The proof is immediate; we only note that the assumption $\frac{\lceil mx_1\rceil}{m} < x_{\alpha,\gamma}$ and the choice of $\epsilon_1, \epsilon_2$ are such that Theorem 2 applies to the right-hand side of (38), which yields an inequality identical to (35). The rest is as in the proof of Theorem 3.  □

3. Workflow

In this section we consider the case when the non-uniform distribution of $(X_j)_{m_0 < j \le m}$ is Beta with parameters $a, b$; that is, $G(\cdot, a, b)$ is the c.d.f. of $B(a,b)$, where $a < 1$ and $b > 1$ are taken so that condition (M) is satisfied. In this case Lemma 1 is valid and condition (M) holds, since elementary calculations yield that $g'_{a,b} < 0$ on $(0,\alpha)$ and $\lim_{x\downarrow0} g_{a,b}(x) = \infty$, where $g_{a,b}$ is the probability density function of $B(a,b)$.
For a given number of hypotheses $m$, level of significance $\alpha$ and proportion $\gamma$ of truly non-significant hypotheses, we are interested in finding a theoretical bound for the probability that the power is between a number $l$ and 1, where of course only $l < G(\alpha x_{\alpha,\gamma}, a, b)$ makes sense to be considered. We shall only provide a step-by-step description of the implementation and shall refrain from developing a particular explicit theoretical example for given values of $a, b$, since the latter involves tedious, non-instructive and lengthy computations. We proceed with describing the workflow that has been implemented numerically in R:
  • we calculate $x_{\alpha,\gamma}$ by numerically solving Equation (22), wherein $G(\cdot) = G(\cdot\,; a, b)$;
  • using the value of $x_{\alpha,\gamma}$, we calculate the power limit of $S_m^*$, see (33), i.e., $G(\alpha x_{\alpha,\gamma}; a, b)$, according to (36);
  • for given $l < G(\alpha x_{\alpha,\gamma}; a, b)$, any given η and $x_1$ such that $\lceil m x_1\rceil/m < x_{\alpha,\gamma}$, we have an estimate based on (42), that is
    $$P\left((1-\eta)\,G(\alpha x_1) \le S_m^*\right) \ge \max\left\{1 - K_F^M(x_1, m, \gamma, \epsilon_1, \alpha) - K_G^M(x_1, m, 1-\gamma, \epsilon_1, \alpha) - e^{-m c(\eta)(1-\gamma)G(\alpha x_1; a, b)};\ 0\right\};$$
  • fixing η, we attempt to solve (numerically) in $x_1$ the equation
    $$l = (1-\eta)\,G(\alpha x_1; a, b);$$
  • if $\lceil m x_1\rceil/m < x_{\alpha,\gamma}$, then (42) gives a valid theoretical lower bound, that is
    $$p(\eta) = \max\left\{1 - K_F^M(x_1, m, \gamma, \epsilon_1, \alpha) - K_G^M(x_1, m, 1-\gamma, \epsilon_1, \alpha) - e^{-m c(\eta)(1-\gamma)G(\alpha x_1; a, b)};\ 0\right\}$$
    with $P\left(l \le S_m^*\right) \ge p(\eta)$;
  • finally, we optimize $p(\eta)$ in η and choose $\eta^*$ so that the estimated probability bound is the largest, that is $P\left(l \le S_m^*\right) \ge \sup_{\eta\in(0,1)} p(\eta) = p(\eta^*)$.
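The root-finding steps of this workflow can be sketched as follows. This is an illustrative Python sketch, not the paper's R implementation: `beta_cdf` is a hand-rolled Beta cdf, and `bh_limit` assumes that Equation (22) has the standard fixed-point form $(1-\pi_0)G(\alpha x) + \pi_0\alpha x = x$ known from the literature, where `pi0` denotes the proportion of true nulls; these names and this assumed form do not appear in the paper.

```python
from math import gamma

def beta_cdf(t, a, b, n=2000):
    """G(t; a, b) by trapezoidal quadrature after the substitution u = x**a,
    which removes the integrable singularity of the density at 0 when a < 1:
    G(t) = C/a * integral_0^{t**a} (1 - u**(1/a))**(b-1) du,
    with C = Gamma(a+b) / (Gamma(a) * Gamma(b))."""
    C = gamma(a + b) / (gamma(a) * gamma(b))
    h = t**a / n
    s = 0.5 * (1.0 + (1.0 - t)**(b - 1))      # integrand at u = 0 and u = t**a
    for i in range(1, n):
        s += (1.0 - (i * h)**(1.0 / a))**(b - 1)
    return C / a * s * h

def bh_limit(alpha, pi0, a, b):
    """Limiting rejection fraction, ASSUMING Equation (22) is the fixed point
    (1 - pi0) * G(alpha * x) + pi0 * alpha * x = x; solved by bisection."""
    f = lambda x: (1 - pi0) * beta_cdf(alpha * x, a, b) + pi0 * alpha * x - x
    lo, hi = 1e-8, 1.0        # f > 0 near 0 (G has infinite slope there), f < 0 at 1
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# parameters of Table 1: B(0.1, 100), 10% alternatives (pi0 = 0.9), alpha = 0.05
x_lim = bh_limit(0.05, 0.9, 0.1, 100)
power_limit = beta_cdf(0.05 * x_lim, 0.1, 100)   # power limit G(alpha * x_lim); cf. Table 1

def solve_x1(l, eta, alpha, a, b):
    """Step 4 of the workflow: solve l = (1 - eta) * G(alpha * x1) for x1."""
    g = lambda x: (1 - eta) * beta_cdf(alpha * x, a, b) - l
    lo, hi = 1e-8, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return (lo + hi) / 2
```

Under the assumed form of the limit equation, the computed `power_limit` for the parameters of Table 1 comes out close to the real power limit reported there.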

4. Results

Table 1, Table 2, Table 3 and Table 4 give empirical results for some values of $m, a, b, \gamma$. We have included different cases for the real power limit, taking large, moderate and small values. In all cases the significance level is α = 0.05. These tables also show the empirical probability based on numerical simulations with 1 million repetitions. An empirical probability of 1 means that no simulation repetition had power lower than l. A theoretical probability of 0 or an empty value of $\eta^*$ indicates that some of the conditions of the method were not met.
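The simulation scheme behind these tables can be sketched as follows. This is a minimal Python sketch (the paper's actual code is in R; `bh_reject` and `empirical_power` are our own illustrative names): null p-values are drawn from Uniform(0, 1), alternative p-values from $B(a, b)$, the Benjamini–Hochberg step-up rule is applied, and the power is the proportion of rejected alternatives.

```python
import random

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule: the number of rejections is the
    largest k with p_(k) <= k * alpha / m (0 if there is no such k)."""
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / m:
            k = i
    return k

# deterministic check: thresholds are 0.0125, 0.025, 0.0375, 0.05, so k = 2
assert bh_reject([0.01, 0.02, 0.04, 0.5], 0.05) == 2

def empirical_power(m, prop_sig, a, b, alpha, reps, seed=0):
    """Average proportion of rejected alternatives when null p-values are
    Uniform(0, 1) and alternative p-values are Beta(a, b)."""
    rng = random.Random(seed)
    m1 = int(m * prop_sig)          # number of alternative hypotheses
    total = 0.0
    for _ in range(reps):
        alt = [rng.betavariate(a, b) for _ in range(m1)]
        null = [rng.random() for _ in range(m - m1)]
        pv = alt + null
        k = bh_reject(pv, alpha)
        if k > 0:
            cutoff = sorted(pv)[k - 1]      # BH rejects the k smallest p-values
            total += sum(p <= cutoff for p in alt) / m1
    return total / reps
```

With the parameters of Table 1 and a moderate m, the average power produced by such a simulation is already close to the reported power limit.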
A larger sample size would result in a smaller parameter a of the Beta distribution, thus squeezing the distribution towards 0 and increasing the statistical power and its theoretical estimates, e.g., as in Table 2 and Table 3.
We have carried out numerical simulations for many values of the parameters, and based on the results it appears that the theoretical bound works well when l is not close to the real power limit $G(\alpha x_{\alpha,\gamma}; a, b)$. A link to our R code is provided in the Supplementary Materials section.
Interestingly, when m increases and all other parameters are kept fixed, the expected power decreases for most values of m, as shown in Figure 1. Perhaps this is due to the fact that the set over which we take the maximum in (2) grows in size for most values of m (unless the ceiling function jumps).
We note that, as expected, when a is closer to 1 and/or γ is smaller, the theoretical estimates become significantly worse. This is natural, as the closer the alternative distribution gets to the uniform, the harder it is to distinguish that the respective random variable X originates from it. The same holds if the lower bound l is very close to the true limit, as it then becomes harder to account for the possible error. All these points reflect natural restrictions when one deals with universal theoretical bounds, which are seemingly derived in an optimal way.

5. Discussion

5.1. Related Work

There are a number of results concerning the statistical power of the Benjamini–Hochberg procedure that relate to ours. Paper [15] discusses different properties of the procedure and shows that when the number of tests m increases, while the underlying distributions and the proportion of truly significant tests are kept the same, the statistical power under some conditions converges to a limit that can be found as the solution of a particular equation, see (3.8) of Corollary 3.3 of that article. A step forward is made in [13], wherein it is shown that the proportions related to the power of the test even obey a law of the iterated logarithm. Another notable contribution is [18], which deals with various properties of the Benjamini–Hochberg procedure, including a Donsker-type convergence; it is discussed in more detail in Remark 5. We wish to highlight explicitly that, to our knowledge, there are no bounds, albeit conservative, for the probability of uncovering a given proportion of the truly significant tests (that is, $S_n/n$) with respect to the limit of $S_n/n$ as n goes to infinity. We achieve this by evaluating the speed of convergence, whereas the results under more general assumptions, including under weak dependence between the tests, are limit results. The latter do not allow for the derivation of any concrete intervals, depending solely on the set of parameters, which give a certain probability for the proportion of correctly identified significant tests.
While we argue below that for genetic studies the assumption of independence is not binding at all, we wish to mention some achievements under the weak dependence assumption, since a natural future development is the extension of our results to this scenario. The limit for the power of the Benjamini–Hochberg procedure is proved in [15]. An interesting contribution is [12], which shows that for a large number of tests the FDR is controlled even when there is some degree of dependence between the test statistics. A conservative estimator for the proportion of false nulls among the hypotheses (the quantity 1 − γ in our work) is found there (the proof uses the Dvoretzky–Kiefer–Wolfowitz inequality [16], as we do for different purposes in our work). The estimator is then used in a plug-in procedure in order to boost the power of the procedure while keeping the FDR controlled. An estimator for γ may precede the application of our procedure.
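The Dvoretzky–Kiefer–Wolfowitz inequality mentioned above is easy to illustrate numerically. The sketch below (Python, with our own illustrative names `dkw_bound` and `ks_sup`) compares the bound with the empirical exceedance frequency of the Kolmogorov–Smirnov statistic for Uniform(0, 1) samples:

```python
import random
from math import exp

def dkw_bound(n, eps):
    """DKW inequality: P(sup_x |F_n(x) - F(x)| > eps) <= 2 * exp(-2 * n * eps**2)."""
    return 2 * exp(-2 * n * eps**2)

def ks_sup(sample):
    """sup_x |F_n(x) - F(x)| for a Uniform(0, 1) sample, where F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # at the i-th order statistic the deviation is (i+1)/n - x from above, x - i/n from below
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(7)
n, eps = 2000, 0.05
exceed = sum(ks_sup([rng.random() for _ in range(n)]) > eps for _ in range(200)) / 200
assert exceed <= dkw_bound(n, eps)   # the empirical frequency respects the DKW bound
```

For these n and ε the bound is already of order $10^{-4}$, which is why such concentration inequalities yield useful finite-sample guarantees.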
We are not interested in improving the procedure itself, but rather in estimating the power of the original procedure, which, as mentioned in the Introduction, is the most widely used among similar procedures.
As mentioned in [20], the results in [12] assume that $R_n/n$ is asymptotically larger than zero, where $R_n$ is the number of rejected tests out of all n tests (the R notation is consistent with ours throughout this work). A counterexample is given of a case in which $R_n/n$ converges to zero with positive probability and in which the FDR is larger than α. Such a situation does not appear to be relevant to the RNA-seq data used for the single-step differential expression methods that prompted our work. Moreover, paper [12] is concerned with controlling the FDR rather than with estimates of the power. The latter is the scope of our work: we are interested in the power of the procedure that scientists predominantly use, regardless of whether the FDR is controlled.

5.2. Considerations about Our Work

Because this is the first result of this exact type that we are aware of, we start our considerations under the independence assumption. We want to point out that, as mentioned above, there are more general convergence results, but they do not include explicit bounds for a given set of parameters. The independence assumption holds approximately in the case of RNA-seq expression level analysis, where there are groups of genes that are correlated within the group, but most pairs of genes are weakly correlated or not correlated at all. Even in the case of weak dependence, given real-world RNA-seq expression data, one cannot reliably estimate or replicate the dependence unless the sample size is much larger than those of the currently available datasets. In addition, more complicated assumptions about the structure of the dependence would make the numerical procedure impossible to set up, as there would be too many parameters in the input, and these parameters cannot be easily estimated for the small sample sizes that are typical for RNA-seq experiments.
Theoretical results under more complicated assumptions about the dependence could be the subject of follow-up studies. Numerical simulations of RNA-seq data under different basic types of weak dependence could also be considered in subsequent studies.
It is important to also emphasize that our considerations differ from the basic situations in which statistical power is usually studied. In the case of a simple test, the statistical power is typically considered as a function of the sample size. However, in the case of complex tests such as the ones cited above, it is often not possible to estimate the distribution of the test statistic, even for a single test in the multiple comparison setup. In order to model a general setup, we model the test statistic under the alternative hypothesis using the Beta family of distributions. To use our current results, one would first have to approximately or exactly estimate the parameters of the Beta distribution as a function of the sample size of the particular test. Because such estimates are test-specific and cannot be studied in general, this would be the subject of another article. One would also have to estimate the proportion of significant tests and, having achieved that, apply our results for that proportion and those parameters of the Beta distribution, taking into account the possible values of the number of tests. For example, in the differential expression analysis methods mentioned above, the sample size corresponds to the number of individuals and the number of tests may correspond to the number of genes.

Supplementary Materials

The R script and instructions are available at the following page: http://www.math.bas.bg/~palejev/BH.

Funding

D.P. was supported by the European Union Horizon 2020 WIDESPREAD-2018-2020 TEAMING Phase 2 programme under Grant Agreement No. 857155 and by the Operational Programme Science and Education for Smart Growth under Grant Agreement No. BG05M2OP001-1.003-0002-C01. M.S. was partially supported by the financial funds allocated to the Sofia University "St. Kliment Ohridski", grant No. 80-10-87/2021. The computational resources and infrastructure were provided by NCHDC, part of the Bulgarian National Roadmap of RIs, with the financial support of Grant No. DO1-387/18.12.2020.

Acknowledgments

The extensive numerical simulations were done on the Avitohol supercomputer that is described in [21].

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FDR    False Discovery Rate
PET    Positron Emission Tomography
CT     Computerized Tomography
fMRI   Functional Magnetic Resonance Imaging

Appendix A. Petrov’s Inequality

In this part we state and outline the proof of a celebrated inequality which can be found in the monograph [22].
Proposition A1.
Let X be binomially distributed with parameters n, p. Then for any ϵ > 0,
$$P\left(X - np > \epsilon np\right) \le e^{-np\,c(\epsilon)}, \qquad P\left(np - X > \epsilon np\right) \le e^{-np\,c(\epsilon)},$$
where $c(\epsilon) = (1+\epsilon)\ln(1+\epsilon) - \epsilon$, $c : [0,\infty) \to [0,\infty)$, is an increasing function and $\lim_{\epsilon\to 0} 2c(\epsilon)/\epsilon^2 = 1$ holds. Finally, $c(\epsilon) \le \epsilon^2$ and $c(\epsilon) \ge \min\{1, \epsilon^2\}/(2e)$ for any ϵ > 0.
Proof. 
We consider
$$P\left(X > (1+\epsilon)np\right) \le e^{-\lambda np(1+\epsilon)}\,E\!\left[e^{\lambda X}\right] = e^{n\ln\left(1-p+pe^{\lambda}\right) - \lambda np(1+\epsilon)},$$
where the latter is simply Markov's inequality with some λ > 0. Then, using $\ln(1+\epsilon) \le \epsilon$, ϵ > 0, we deduce that
$$P\left(X > (1+\epsilon)np\right) \le e^{np\left(e^{\lambda} - 1 - \lambda(1+\epsilon)\right)}.$$
The minimum of $f(\lambda) = e^{\lambda} - 1 - \lambda(1+\epsilon)$ is attained at $\lambda = \ln(1+\epsilon)$ and thus we conclude that
$$P\left(X > (1+\epsilon)np\right) \le e^{-np\left((1+\epsilon)\ln(1+\epsilon) - \epsilon\right)}.$$
A similar computation works for $P\left(np - X > \epsilon np\right)$. Clearly, setting the function
$$c(\epsilon) = (1+\epsilon)\ln(1+\epsilon) - \epsilon,$$
we arrive at the limit
$$\lim_{\epsilon\to 0} \frac{2\left((1+\epsilon)\ln(1+\epsilon) - \epsilon\right)}{\epsilon^2} = 1,$$
and $(1+\epsilon)\ln(1+\epsilon) - \epsilon$ is increasing in ϵ. Using again $\ln(1+\epsilon) \le \epsilon$, ϵ > 0, we also conclude that $c(\epsilon) \le \epsilon^2$ for any ϵ > 0. Considering the properties of the function $f(\epsilon) = c(\epsilon) - \epsilon^2/(2e)$ for $\epsilon \in (0, e-1]$ and noting that $c(\epsilon) \ge 1$ for $\epsilon > e-1$, we deduce that $c(\epsilon) \ge \min\{1, \epsilon^2\}/(2e)$.  □
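The inequality and the elementary properties of c are easy to verify numerically. The sketch below (our own Python illustration, not part of the original text) compares the bound with the exact binomial upper tail and checks the stated properties of c:

```python
from math import log, exp, comb

def c(eps):
    """The rate function c(eps) = (1 + eps) * ln(1 + eps) - eps of Proposition A1."""
    return (1 + eps) * log(1 + eps) - eps

def binom_upper_tail(n, p, eps):
    """Exact P(X > (1 + eps) * n * p) for X ~ Bin(n, p)."""
    thr = (1 + eps) * n * p
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1) if k > thr)

n, p, eps = 50, 0.3, 0.5
# the exact tail is dominated by the Petrov bound e^{-np c(eps)}
assert binom_upper_tail(n, p, eps) <= exp(-n * p * c(eps))

# elementary properties of c used in the proof
for e_ in (0.01, 0.1, 0.5, 1.0, 3.0):
    assert c(e_) <= e_**2
    assert c(e_) >= min(1.0, e_**2) / (2 * exp(1))
assert abs(2 * c(1e-4) / 1e-4**2 - 1) < 1e-3   # 2 c(eps) / eps^2 -> 1 as eps -> 0
```

The bound is of large-deviations type: it is loose in the bulk of the distribution but captures the correct exponential decay of the tail in np.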

References

  1. Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300.
  2. Love, M.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550.
  3. Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140.
  4. Trapnell, C.; Hendrickson, D.G.; Sauvageau, M.; Goff, L.; Rinn, J.L.; Pachter, L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 2012, 31, 46–53.
  5. Chalkidou, A.; O'Doherty, M.J.; Marsden, P.K. False Discovery Rates in PET and CT Studies with Texture Features: A Systematic Review. PLoS ONE 2015, 10, e0124165.
  6. Bennett, C.M.; Wolford, G.L.; Miller, M.B. The principled control of false positives in neuroimaging. Soc. Cogn. Affect. Neurosci. 2009, 4, 417–422.
  7. Miller, C.J.; Genovese, C.; Nichol, R.C.; Wasserman, L.; Connolly, A.; Reichart, D.; Hopkins, A.; Schneider, J.; Moore, A. Controlling the False-Discovery Rate in Astrophysical Data Analysis. Astron. J. 2001, 122, 3492.
  8. Benjamini, Y.; Yekutieli, D. The Control of the False Discovery Rate in Multiple Testing under Dependency. Ann. Stat. 2001, 29, 1165–1188.
  9. Storey, J.D.; Taylor, J.E.; Siegmund, D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2004, 66, 187–205.
  10. Pollard, K.S.; van der Laan, M.J. Choice of a null distribution in resampling-based multiple testing. J. Stat. Plan. Inference 2004, 125, 85–100.
  11. Van Noorden, R.; Maher, B.; Nuzzo, R. The top 100 papers. Nature 2014, 514, 550–553.
  12. Farcomeni, A. Some Results on the Control of the False Discovery Rate under Dependence. Scand. J. Stat. 2007, 34, 275–297.
  13. Chi, Z. On the performance of FDR control: Constraints and a partial solution. Ann. Stat. 2007, 35, 1409–1431.
  14. Ferreira, J.A.; Zwinderman, A.H. Approximate Power and Sample Size Calculations with the Benjamini–Hochberg Method. Int. J. Biostat. 2006, 2, 8.
  15. Ferreira, J.A.; Zwinderman, A.H. On the Benjamini–Hochberg method. Ann. Stat. 2006, 34, 1827–1849.
  16. Dvoretzky, A.; Kiefer, J.; Wolfowitz, J. Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator. Ann. Math. Stat. 1956, 27, 642–669.
  17. Pounds, S.; Morris, S.W. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 2003, 19, 1236–1242.
  18. Neuvial, P. Asymptotic properties of false discovery rate controlling procedures under independence. Electron. J. Stat. 2008, 2, 1065–1110.
  19. Finner, H.; Dickhaus, T.; Roters, M. On the false discovery rate and an asymptotically optimal rejection curve. Ann. Stat. 2009, 37, 596–618.
  20. Gontscharuk, V.; Finner, H. Asymptotic FDR control under weak dependence: A counterexample. Stat. Probab. Lett. 2013, 83, 1888–1893.
  21. Atanassov, E.; Gurov, T.; Karaivanova, A.; Ivanovska, S.; Durchova, M.; Dimitrov, D. On the parallelization approaches for Intel MIC architecture. AIP Conf. Proc. 2016, 1773, 070001.
  22. Petrov, V.V. Sums of Independent Random Variables; Brown, A.A., Translator; Ergebnisse der Mathematik und ihrer Grenzgebiete; Springer: Berlin/Heidelberg, Germany, 1975; Volume 82.
Figure 1. Empirical results when the p-values under the alternative hypothesis have B ( 0.5 , 100 ) distribution and the proportion of alternative tests is 2 % . The x-axis shows the common logarithm of the number of tests m. The y-axis shows the empirical power. The dots depict the mean of the empirical power, the vertical lines are of length one standard deviation below and above the mean. The dotted line shows the asymptotic power.
Table 1. Simulation results for B(0.1, 100), number of genes = 20,000, proportion significant = 0.1, real power limit = 0.940359.
Lower Bound l    Empirical Probability    Theoretical Estimate p(η*)    η*
0.750            1.000                    1.000                         0.098
0.760            1.000                    1.000                         0.097
0.770            1.000                    1.000                         0.096
0.780            1.000                    1.000                         0.096
0.790            1.000                    1.000                         0.095
0.800            1.000                    1.000                         0.095
0.810            1.000                    1.000                         0.094
0.820            1.000                    1.000                         0.094
0.830            1.000                    1.000                         0.093
0.840            1.000                    1.000                         0.093
0.850            1.000                    0.999                         0.085
0.860            1.000                    0.996                         0.078
0.870            1.000                    0.985                         0.068
0.880            1.000                    0.952                         0.058
0.890            1.000                    0.873                         0.048
0.900            1.000                    0.714                         0.037
0.910            1.000                    0.476                         0.027
0.920            1.000                    0.204                         0.016
0.930            0.967                    0.018                         0.005
0.940            0.550                    0.000
Table 2. Simulation results for B(0.25, 2), number of genes = 20,000, proportion significant = 0.1, real power limit = 0.233737.
Lower Bound l    Empirical Probability    Theoretical Estimate p(η*)    η*
0.190            1.000                    0.779                         0.105
0.200            0.997                    0.434                         0.072
0.210            0.972                    0.000                         0.001
0.220            0.868                    0.000                         0.001
0.230            0.626                    0.000                         0.001
Table 3. Simulation results for B(0.1, 2), number of genes = 20,000, proportion significant = 0.1, real power limit = 0.620014.
Lower Bound l    Empirical Probability    Theoretical Estimate p(η*)    η*
0.500            1.000                    1.000                         0.119
0.510            1.000                    1.000                         0.117
0.520            1.000                    1.000                         0.116
0.530            1.000                    1.000                         0.115
0.540            1.000                    0.999                         0.106
0.550            1.000                    0.996                         0.097
0.560            1.000                    0.979                         0.081
0.570            1.000                    0.926                         0.067
0.580            1.000                    0.786                         0.052
0.590            0.994                    0.521                         0.036
0.600            0.953                    0.189                         0.02
0.610            0.803                    0.000                         0.001
0.620            0.511                    0.000
Table 4. Simulation results for B(0.1, 10), number of genes = 20,000, proportion significant = 0.1, real power limit = 0.755055.
Lower Bound l    Empirical Probability    Theoretical Estimate p(η*)    η*
0.600            1.000                    1.000                         0.109
0.610            1.000                    1.000                         0.108
0.620            1.000                    1.000                         0.107
0.630            1.000                    1.000                         0.106
0.640            1.000                    1.000                         0.105
0.650            1.000                    1.000                         0.105
0.660            1.000                    1.000                         0.104
0.670            1.000                    0.999                         0.096
0.680            1.000                    0.995                         0.085
0.690            1.000                    0.980                         0.074
0.700            1.000                    0.934                         0.062
0.710            1.000                    0.821                         0.049
0.720            0.999                    0.608                         0.037
0.730            0.991                    0.309                         0.024
0.740            0.924                    0.048                         0.009
0.750            0.693                    0.000                         0.001

Share and Cite

MDPI and ACS Style

Palejev, D.; Savov, M. On the Convergence of the Benjamini–Hochberg Procedure. Mathematics 2021, 9, 2154. https://doi.org/10.3390/math9172154

