Article

Asymptotic Properties of MSE Estimate for the False Discovery Rate Controlling Procedures in Multiple Hypothesis Testing

by Sofia Palionnaya 1,2,* and Oleg Shestakov 1,2,3,*
1 Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, 119991 Moscow, Russia
2 Moscow Center for Fundamental and Applied Mathematics, 119991 Moscow, Russia
3 Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 119333 Moscow, Russia
* Authors to whom correspondence should be addressed.
Mathematics 2020, 8(11), 1913; https://doi.org/10.3390/math8111913
Submission received: 14 October 2020 / Revised: 27 October 2020 / Accepted: 29 October 2020 / Published: 1 November 2020
(This article belongs to the Special Issue Analytical Methods and Convergence in Probability with Applications)

Abstract:
Problems with analyzing and processing high-dimensional random vectors arise in a wide variety of areas. Important practical tasks are economical representation, searching for significant features, and removal of insignificant (noise) features. These tasks are fundamentally important for a wide class of practical applications, such as genetic chain analysis, encephalography, spectrography, video and audio processing, and a number of others. Current research in this area includes a wide range of papers devoted to various filtering methods based on the sparse representation of the obtained experimental data and statistical procedures for their processing. One of the most popular approaches to constructing statistical estimates of regularities in experimental data is the procedure of multiple testing of hypotheses about the significance of observations. In this paper, we consider a procedure based on the false discovery rate (FDR) measure that controls the expected percentage of false rejections of the null hypothesis. We analyze the asymptotic properties of the mean-square error estimate for this procedure and prove the statements about the asymptotic normality of this estimate. The obtained results make it possible to construct asymptotic confidence intervals for the mean-square error of the FDR method using only the observed data.

1. Introduction

The problems involved in testing statistical hypotheses occupy an important place in applied statistics and are used in such areas as genetics, biology, astronomy, radar, computer graphics, etc. The classical methods for solving these problems are based on a single hypothesis test. There is a sample X of size m, and the null hypothesis H_0 is tested against the general alternative H_1. The hypothesis is tested using a statistic T, a function of the sample with a known distribution under the null hypothesis (the null distribution). For a given null distribution, the attained p-values are calculated, and the decision to reject the null hypothesis is made on their basis. Errors arising from this single-test procedure are divided into two types, and the probability of falsely rejecting a true null hypothesis (the probability of a type I error) is bounded by a given significance level α:
P(type I error) = P(T ≥ t | H_0) ≤ α,
where t is the critical threshold value.
With this approach, we can often not only find the rejection region for which the α-constraint on the probability of a type I error is satisfied, but also minimize the probability of a type II error, i.e., maximize the statistical power.
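The single-test decision rule above can be sketched numerically. The following sketch assumes a two-sided z-test with known noise level; the function name and the sample values are illustrative, not taken from the paper:

```python
import math

def z_test_p_value(sample, sigma=1.0, mu0=0.0):
    """Two-sided p-value for H0: mean = mu0, with known noise level sigma."""
    m = len(sample)
    t = (sum(sample) / m - mu0) * math.sqrt(m) / sigma  # test statistic T
    # P(|Z| >= |t|) for a standard normal Z, via the complementary error function
    return math.erfc(abs(t) / math.sqrt(2.0))

alpha = 0.05
p = z_test_p_value([0.1, -0.2, 0.05, 0.3])
reject_h0 = p <= alpha  # reject H0 when the attained p-value is at most alpha
```

For this sample the mean is close to zero, so the p-value is large and H_0 is retained.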
When considering the problem of multiple hypothesis testing, the task becomes more complicated: now we are dealing with n different null hypotheses {H_0i, i = 1, …, n} and the alternatives {H_1i, i = 1, …, n}. These hypotheses are tested by statistics T_i with given null distributions. Thus, for each hypothesis the attained p-values {p_i, i = 1, …, n} can be calculated, as well as the type II error probabilities.
Let us introduce the notation: M_0 is the set of indices of the true null hypotheses, and R is the set of indices of the rejected hypotheses. Then V = |M_0 ∩ R| is the number of type I errors. The task is to keep V small by an appropriate choice of the rejection set R.
There are many statistical procedures that offer different ways to solve the multiple hypothesis testing problem. One of the first measures proposed to generalize the type I error was the family-wise error rate (FWER) [1]. This value is defined as the probability of making at least one type I error, i.e., instead of controlling the probability of a type I error at the level α for each test, the overall FWER is controlled: FWER = P(V ≥ 1) ≤ α. However, such a strict criterion significantly increases the type II error for a large number of tested hypotheses.
In [2], an alternative measure called the false discovery rate (FDR) was proposed. This measure controls the expected proportion of false rejections:
FDR = E[V / max(R, 1)],
where R here denotes the number of rejected hypotheses.
This approach is widely used in situations where the number of tested hypotheses is so large that it is preferable to allow a certain number of type I errors in order to increase the statistical power.
To control the FDR, the Benjamini–Hochberg [2] multiple hypothesis testing procedure is often used, which, under the condition of independence of the test statistics, allows the FDR value to be bounded by the parameter α, i.e.,
E[V / max(R, 1)] ≤ α.
In this procedure, the significance levels change linearly:
α_i = (i/n) α, i = 1, …, n.
To apply the Benjamini–Hochberg method, a variational series is constructed from the attained p-values:
p_(1) ≤ p_(2) ≤ … ≤ p_(n).
All hypotheses H_0(1), …, H_0(k) (ordered by their p-values) are rejected, where k ∈ {1, …, n} is the maximum index for which
p_(k) ≤ α_k.
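The step-up rule described above admits a direct implementation. A minimal sketch (the function name is ours) that returns a boolean rejection mask given the attained p-values:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                                  # indices sorting p ascending
    below = p[order] <= alpha * np.arange(1, n + 1) / n    # p_(i) <= (i/n) * alpha
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()                       # largest i with p_(i) <= alpha_i
        reject[order[:k + 1]] = True                       # reject the k smallest p-values
    return reject

mask = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
```

Note that the rule is "step-up": every hypothesis whose p-value is at most p_(k) is rejected, even if some intermediate p_(i) exceeds its own α_i.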
There are other measures to control the total number of type I errors. In [1], a q-value is considered that provides control of the positive false discovery rate (pFDR). Controlling the false coverage rate (FCR) involves solving the problem of multiple hypothesis testing in terms of confidence intervals [3]. The papers [4,5] are devoted to the harmonic mean p-value (HMP) method. However, in this paper we focus on the properties of the FDR method. The widespread use of the FDR measure is commonly attributed to the development of technologies that allow collecting and analyzing large amounts of data. Computing power makes it easy to perform hundreds or thousands of statistical tests on a given data set, and in this setting the FWER loses its relevance.
In this paper, we study the asymptotic properties of the mean-square risk estimate for the FDR method in the problem of multiple hypothesis testing for the mathematical expectation of a Gaussian vector with independent components. The consistency of this estimate was proved in [6]; here, we prove its asymptotic normality.
The paper is organized as follows. Section 2 provides some basic information about the statement of the problem and the considered vector classes. In Section 3 we define the mean-square risk of the thresholding method and describe the properties of the FDR-threshold. Section 4 considers the asymptotic properties of the mean-square risk estimate, and Section 5 contains some concluding remarks.

2. Preliminaries

Consider the problem of estimating the mathematical expectation of a Gaussian vector
X_i = μ_i + W_i, i = 1, …, n,   (1)
where the W_i are independent normally distributed random variables with zero expectation and known variance σ², and μ = (μ_1, …, μ_n) is an unknown vector belonging to some given set (class). The key assumption adopted in this paper is the “sparsity” of the vector μ, i.e., it is assumed that only a relatively small number of its components are significantly large. A similar problem statement arises, for example, in the analysis and processing of signals containing noise. In this case, the sparsity or “economical” representation of the signal is achieved using some special preprocessing, for example, a discrete wavelet transform of the signal vector.
In this paper, we consider the following definitions of sparsity. Let ‖μ‖_0 denote the number of nonzero components of μ. Fixing η_n, define the class
L_0(η_n) = {μ ∈ R^n : ‖μ‖_0 ≤ η_n n}.
For small values of η n , only a small number of vector components are nonzero.
Another possible way to define sparsity is to limit the absolute values of the μ_i. To do this, consider the sorted absolute values
|μ|_(1) ≥ |μ|_(2) ≥ … ≥ |μ|_(n),
and for 0 < p < 2 define the class
L_p(η_n) = {μ ∈ R^n : |μ|_(k) ≤ η_n n^(1/p) k^(−1/p) for all k = 1, …, n}.
In addition, sparsity can be modeled using the ℓ_p-norm
‖μ‖_p = (Σ_{i=1}^n |μ_i|^p)^(1/p).
In this case, the sparse class is defined as
M_p(η_n) = {μ ∈ R^n : Σ_{i=1}^n |μ_i|^p ≤ η_n^p n}.
There are important relationships between these classes. As p → 0, the p-th power of the ℓ_p-norm approaches the ℓ_0-“norm”: ‖μ‖_p^p → ‖μ‖_0. The embedding M_p(η_n) ⊂ L_p(η_n) is also valid.
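The three class definitions can be checked directly from the observed vector. A small sketch (function names are ours); the final assertions illustrate the embedding M_p(η_n) ⊂ L_p(η_n) on a toy vector:

```python
import numpy as np

def in_L0(mu, eta_n):
    """mu ∈ L0(eta_n): at most eta_n * n nonzero components."""
    mu = np.asarray(mu, dtype=float)
    return bool(np.count_nonzero(mu) <= eta_n * mu.size)

def in_Lp(mu, eta_n, p):
    """mu ∈ Lp(eta_n): |mu|_(k) <= eta_n * n^(1/p) * k^(-1/p) for all k."""
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    sorted_abs = np.sort(np.abs(mu))[::-1]        # |mu|_(1) >= ... >= |mu|_(n)
    k = np.arange(1, n + 1)
    return bool(np.all(sorted_abs <= eta_n * n ** (1.0 / p) * k ** (-1.0 / p)))

def in_Mp(mu, eta_n, p):
    """mu ∈ Mp(eta_n): sum of |mu_i|^p at most eta_n^p * n."""
    mu = np.asarray(mu, dtype=float)
    return bool(np.sum(np.abs(mu) ** p) <= eta_n ** p * mu.size)

mu = [3.0] + [0.0] * 9   # one large component out of n = 10
```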

3. Mean-Square Risk and Properties of the FDR Threshold

In the considered problem, one of the widespread and well-proven methods for constructing an estimate of μ is the method of (hard) thresholding of each vector component:
μ̂_i = ρ_H(X_i, T) = X_i if |X_i| > T, and 0 if |X_i| ≤ T,   (2)
i.e., the vector component is zeroed if its absolute value does not exceed the critical threshold T. This procedure is equivalent to testing the hypothesis of zero mathematical expectation for each component of the vector, and when using the FDR method, the threshold value T is selected according to the following rule. The initial sample is used to construct a variational series of decreasing absolute values
|X|_(1) ≥ |X|_(2) ≥ … ≥ |X|_(n),
and the |X|_(k) are compared with the right-tail Gaussian quantiles t_k = σ z(α/2 · k/n), where z(q) denotes the value exceeded by a standard normal variable with probability q. Let k_F be the largest index k for which |X|_(k) ≥ t_k; then the threshold T_F = t_{k_F} is chosen.
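The threshold selection rule can be sketched as follows. Here `upper_quantile` is a hypothetical helper computing the right-tail Gaussian quantile z(·) by bisection, and the fallback to the universal threshold σ√(2 log n) when no |X|_(k) exceeds t_k is our convention, not specified in the paper:

```python
import math

def upper_quantile(q):
    """z(q): the value with P(Z > z) = q for a standard normal Z (bisection)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fdr_threshold(x, sigma=1.0, alpha=0.05):
    """T_F = t_{k_F}, where k_F is the largest k with |X|_(k) >= t_k."""
    n = len(x)
    sorted_abs = sorted((abs(v) for v in x), reverse=True)  # |X|_(1) >= ... >= |X|_(n)
    t = [sigma * upper_quantile(alpha / 2 * k / n) for k in range(1, n + 1)]
    k_f = max((k for k in range(n) if sorted_abs[k] >= t[k]), default=None)
    if k_f is None:
        # assumption: fall back to the universal threshold when nothing is significant
        return sigma * math.sqrt(2.0 * math.log(n))
    return t[k_f]
```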
In combination with hypothesis testing methods, the penalty method is also widely used, in which the target loss function is minimized with the addition of a penalty term [7,8,9]. In a particular case, this method leads to the so-called soft thresholding: the estimates of the vector components are calculated according to the rule
μ̂_i = ρ_S(X_i, T) = X_i − T if X_i > T; X_i + T if X_i < −T; 0 if |X_i| ≤ T.   (3)
This approach is in some cases more adequate than (2), since the function ρ_S in (3) is continuous in T.
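The two component-wise rules (2) and (3) can be written compactly (function names are ours):

```python
def rho_hard(x, t):
    """Hard thresholding (2): keep x unchanged if |x| > t, else zero."""
    return x if abs(x) > t else 0.0

def rho_soft(x, t):
    """Soft thresholding (3): shrink x toward zero by t, zero inside [-t, t]."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0
```

Unlike `rho_hard`, `rho_soft` is continuous both in x and in the threshold t, which is the property noted above.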
The mean-square error (or risk) of the considered procedures is defined as
R(T) = Σ_{i=1}^n E(μ̂_i − μ_i)².   (4)
Methods for selecting the threshold value T are usually focused on minimizing the risk (4) provided that the vector μ belongs to a given class. A “perfect” value of the threshold is
T_min: R(T_min) = min_T R(T).
Note that the expression (4) contains the unknown values μ_i, and it is impossible to calculate R(T) and T_min in practice. Therefore, a minimax approach is used. The threshold T_F is calculated based on the observed values of X_i and has the property of adaptive minimax optimality in the considered sparse classes [7]. In addition, T_F has the following important property [7], which we will use later in proving the asymptotic normality of the risk estimate.
Theorem 1.
[7] Suppose that μ ∈ L_0(η_n) or μ ∈ L_p(η_n), 0 < p < 2, where η_n ∈ [n^(−1)(log n)^5, n^(−γ)] for L_0(η_n) and η_n^p ∈ [n^(−1)(log n)^5, n^(−γ)] for L_p(η_n), 0 < γ < 1. Then there exists c > 0 such that for the FDR threshold T_F with a controlling parameter α_n → 0 and large n,
sup_{μ ∈ L_0(η_n)} P(T_F < T_1) ≤ 2n exp{−c α_n κ_n γ_n²},
sup_{μ ∈ L_p(η_n)} P(T_F < T_1) ≤ 2n exp{−c α_n κ_n γ_n²},
where
γ_n = 1/log log n, κ_n = n η_n (1 − α_n) γ_n, T_1 = σ (2 log η_n^(−1))^(1/2)   (5)
for L_0(η_n) and
γ_n = 1/log log n, τ_η = σ (2 log η_n^(−p))^(1/2), κ_n = n η_n^p τ_η^(−p) (1 − α_n) γ_n, T_1 = σ (2 log η_n^(−p))^(1/2)   (6)
for L_p(η_n).
Thus, if α_n is chosen so that α_n κ_n γ_n² / log n → ∞, the right-hand sides of these inequalities tend to zero, and T_1 serves as a lower bound for the threshold T_F with probability tending to one.
Note also that the so-called universal threshold T_U = σ√(2 log n) is popular as well. This threshold is, in a certain sense, the maximum useful one (it was shown in [10,11] that thresholds T > T_U can be ignored). Based on this, we will assume everywhere that T ≤ T_U.

4. Asymptotic Properties of the Risk Estimate

As already mentioned, since the expression (4) explicitly depends on the unknown values μ_i, it cannot be calculated in practice. However, it is possible to construct an estimate of it that is calculated using only the observed data. This estimate is determined by the expression
R̂(T) = Σ_{i=1}^n F[X_i, T],   (7)
where F[X_i, T] = (X_i² − σ²) 1(|X_i| ≤ T) + σ² 1(|X_i| > T) for hard thresholding and F[X_i, T] = (X_i² − σ²) 1(|X_i| ≤ T) + (σ² + T²) 1(|X_i| > T) for soft thresholding [12].
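The estimate is a simple sum over the components; a minimal sketch (function name is ours):

```python
def risk_estimate(x, t, sigma=1.0, mode="soft"):
    """Data-driven estimate of the mean-square risk R(T):
    each term is (x_i^2 - sigma^2) if |x_i| <= t (component zeroed),
    else sigma^2 (hard thresholding) or sigma^2 + t^2 (soft thresholding)."""
    total = 0.0
    for xi in x:
        if abs(xi) <= t:
            total += xi * xi - sigma ** 2
        else:
            total += sigma ** 2 if mode == "hard" else sigma ** 2 + t * t
    return total
```

In practice one evaluates `risk_estimate` at the FDR threshold T_F obtained from the same data.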
In [6] it is proved that the estimate (7) is consistent.
Theorem 2.
[6] Let the conditions of Theorem 1 be satisfied and α_n → 0 as n → ∞ so that α_n κ_n γ_n² / log n → ∞; then
(R̂(T_F) − R(T_min)) / n → 0 in probability.
Let us prove a statement about the asymptotic normality of the estimate (7), which, in particular, allows constructing asymptotic confidence intervals for the mean-square risk (4). In the proof, we will use the same notation C for different positive constants that may depend on the parameters of the classes and methods under consideration, but do not depend on n.
First, consider the class L 0 ( η n ) .
Theorem 3.
Let μ ∈ L_0(η_n), η_n ∈ [n^(−1)(log n)^5, n^(−γ)], 1/2 < γ < 1. Let T_F be the FDR threshold with a controlling parameter α_n → 0 such that α_n κ_n γ_n² / log n → ∞ as n → ∞, where κ_n and γ_n are defined in (5). Then
(R̂(T_F) − R(T_min)) / (σ² √(2n)) → N(0, 1) in distribution.
Proof. 
Let us prove the theorem for the soft thresholding method. In the case of hard thresholding, the proof is similar.
Denote
U(T) = R̂(T) − R̂(T_min) = Σ_{i=1}^n H_i(T, T_min),
where
H_i(T, T_min) = F[X_i, T] − F[X_i, T_min],
and write
R̂(T_F) − R(T_min) = R̂(T_F) − R̂(T_min) + R̂(T_min) − R(T_min) = (R̂(T_min) − R(T_min)) + U(T_F).
Let us show that
(R̂(T_min) − R(T_min)) / (σ² √(2n)) → N(0, 1) in distribution.   (8)
With soft thresholding, R̂(T_min) is an unbiased estimate of R(T_min), and with hard thresholding, under the conditions of the theorem, the bias tends to zero when divided by √n [12]. For the variance of the numerator [13],
lim_{n→∞} D[Σ_{i=1}^n (F[X_i, T_min] − E F[X_i, T_min])] / D[Σ_{i=1}^n X_i²] = 1.   (9)
Moreover, since the X_i are independent, D X_i² = 2σ⁴ + 4σ² μ_i², and the number of nonzero μ_i does not exceed η_n n, we obtain
lim_{n→∞} D[Σ_{i=1}^n X_i²] / (2σ⁴ n) = 1.   (10)
Finally, the Lindeberg condition is met: for any ε > 0, as n → ∞,
(1/V_n²) Σ_{i=1}^n E[(F[X_i, T_min] − E F[X_i, T_min])² 1(|F[X_i, T_min] − E F[X_i, T_min]| > ε V_n)] → 0,   (11)
where V_n² = D[Σ_{i=1}^n (F[X_i, T_min] − E F[X_i, T_min])]. Indeed, due to (9) and (10), and since the summands in R̂(T_min) are bounded in absolute value by T_U² + σ², starting from some n all the indicators in (11) vanish.
Therefore, (8) holds, and to prove the theorem it remains to show that
U(T_F) / √n → 0 in probability.   (12)
Repeating the reasoning from [14,15,16], it can be shown that T_min ≥ T_1 − Δ_n, where |Δ_n| ≤ C log log n / log n. To shorten the notation without compromising the proof, we omit Δ_n and assume that T_min ≥ T_1.
For any ε > 0,
P(|U(T_F)| / √n > ε) ≤ P(T_F < T_1) + P(sup_{T∈[T_1,T_U]} |U(T)| / √n > ε)
≤ P(T_F < T_1) + P([sup_{T∈[T_1,T_U]} |U(T) − E U(T)| + sup_{T∈[T_1,T_U]} |E U(T)|] / √n > ε).
Let U(T) = S_1(T) + S_2(T), T ∈ [T_1, T_U], where the sum S_2(T) contains the terms with μ_i = 0 and S_1(T) contains all other terms. By the definition of the class L_0(η_n), the number of terms in S_1(T) does not exceed n_1 ≤ η_n n. Moreover, the absolute value of each term is bounded by T_U² + σ². For convenience, we assume that S_1(T) contains the terms with indices from 1 to n_1, i.e.,
S_1(T) = Σ_{i=1}^{n_1} H_i(T, T_min), S_2(T) = Σ_{i=n_1+1}^n H_i(T, T_min).
Next, E U(T) = E S_1(T) + E S_2(T) and sup_{T∈[T_1,T_U]} |E S_1(T)| ≤ n_1 (T_U² + σ²). Given the definition of the class L_0(η_n) and the form of T_1, it can be shown that for the terms of S_2 the estimate |E H_i(T, T_min)| ≤ C (log n)^(1/2) n^(−γ) is valid for T ∈ [T_1, T_U]. So
sup_{T∈[T_1,T_U]} |E U(T)| ≤ C n^(1−γ) log n,
and for γ > 1/2,
sup_{T∈[T_1,T_U]} |E U(T)| / √n → 0
as n → ∞.
Next, take T′ < T and denote
Z_1(T) = S_1(T) − E S_1(T), N_1(T′, T) = Σ_{i=1}^{n_1} 1(T′ < |X_i| ≤ T).
Then [10]
|Z_1(T) − Z_1(T′)| ≤ 4σ² N_1(T′, T) + 2 n_1 (T² − T′²) a.s.   (13)
Divide the segment [T_1, T_U] into equal parts: T_j = T_1 + j δ_{n_1} ∈ [T_1, T_U], j = 1, …, n_1 − 1, δ_{n_1} = (T_U − T_1)/n_1. Then
A_n = {sup_{T∈[T_1,T_U]} |S_1(T) − E S_1(T)| ≥ 5ε√n} ⊂ D_n ∪ E_n,
where
D_n = {sup_j |Z_1(T_j)| > ε√n}, E_n = {sup_j sup_{T∈[T_j, T_j+δ_{n_1})} |Z_1(T) − Z_1(T_j)| ≥ 4ε√n}.
Applying the Hoeffding inequality [17] to D_n, we obtain
P(D_n) ≤ Σ_j P(|Z_1(T_j)| > ε√n) ≤ 2 n_1 exp{−ε² n / (2 (T_U² + 2σ²)² n_1)} ≤ 2 n^(1−γ) exp{−C ε² n^γ / log² n}.
Then, given (13),
E_n ⊂ {sup_j sup_{T∈[T_j, T_j+δ_{n_1})} [4σ² N_1(T_j, T) + 4 n_1 T_U δ_{n_1}] ≥ 4ε√n}
⊂ {sup_j [σ² N_1(T_j, T_j + δ_{n_1}) + n_1 T_U δ_{n_1}] ≥ ε√n} = {sup_j σ² N_1(T_j, T_j + δ_{n_1}) ≥ ε√n − n_1 T_U δ_{n_1}}.
It is easy to show that E N_1(T_j, T_j + δ_{n_1}) ≤ C n_1 δ_{n_1}. Hence,
{sup_j σ² N_1(T_j, T_j + δ_{n_1}) ≥ ε√n − n_1 T_U δ_{n_1}}
⊂ {sup_j (1/n_1) |N_1(T_j, T_j + δ_{n_1}) − E N_1(T_j, T_j + δ_{n_1})| ≥ ε√n/(σ² n_1) − T_U δ_{n_1}/σ² − C δ_{n_1}} =: Ẽ_n.
Applying the Hoeffding inequality to the event Ẽ_j corresponding to a fixed j, we obtain
P(Ẽ_j) = P((1/n_1) |N_1(T_j, T_j + δ_{n_1}) − E N_1(T_j, T_j + δ_{n_1})| ≥ ε√n/(σ² n_1) − T_U δ_{n_1}/σ² − C δ_{n_1})
≤ 2 exp{−2 n_1 (ε√n/(σ² n_1) − T_U δ_{n_1}/σ² − C δ_{n_1})²} ≤ 2 exp{−C ε² n^γ}.
Hence,
P(Ẽ_n) ≤ Σ_{j=1}^{n_1} P(Ẽ_j) ≤ C n^(1−γ) exp{−C ε² n^γ}.
Thus, for an arbitrary ε > 0,
P(sup_{T∈[T_1,T_U]} |S_1(T) − E S_1(T)| / √n > ε) → 0   (14)
as n → ∞.
Let us now consider the sum S_2(T). For large n, the number of terms in this sum is n − n_1 ≤ n. Repeating the above reasoning, we divide the segment [T_1, T_U] into equal parts: T_j = T_1 + j δ_n ∈ [T_1, T_U], j = 1, …, n − 1, δ_n = (T_U − T_1)/n. Then
Z_2(T) = S_2(T) − E S_2(T), N_2(T′, T) = Σ_{i=n_1+1}^n 1(T′ < |X_i| ≤ T),
A′_n = {sup_{T∈[T_1,T_U]} |S_2(T) − E S_2(T)| ≥ 5ε√n} ⊂ D′_n ∪ E′_n,
where
D′_n = {sup_j |Z_2(T_j)| > ε√n}, E′_n = {sup_j sup_{T∈[T_j, T_j+δ_n)} |Z_2(T) − Z_2(T_j)| ≥ 4ε√n}.
Taking into account the definition of the class L_0(η_n) and the form of T_1, we can bound the variance of the terms in S_2 (and hence in Z_2): D H_i(T, T_min) ≤ C (log n)^(3/2) n^(−γ). Then, applying Bernstein’s inequality [18] to D′_n, we obtain
P(D′_n) ≤ Σ_j P(|Z_2(T_j)| > ε√n) ≤ 2n exp{−ε² n / (2 [C (log n)^(3/2) n^(1−γ) + 2 (T_U² + σ²) ε√n])}.
Next,
E′_n ⊂ {sup_j |N_2(T_j, T_j + δ_n) − E N_2(T_j, T_j + δ_n)| ≥ ε√n/σ² − n T_U δ_n/σ² − C n δ_n} =: Ẽ′_n.
The variance of the terms in N_2(T_j, T_j + δ_n) is bounded by C (log n)^(1/2) n^(−γ).
Applying Bernstein’s inequality, we obtain, for the event Ẽ′_j corresponding to a fixed j,
P(Ẽ′_j) = P(|N_2(T_j, T_j + δ_n) − E N_2(T_j, T_j + δ_n)| ≥ ε√n/σ² − n T_U δ_n/σ² − C n δ_n)
≤ 2 exp{−ε² n / (2 [C (log n)^(1/2) n^(1−γ) + 2√n])}.
Hence,
P(Ẽ′_n) ≤ Σ_{j=1}^n P(Ẽ′_j) ≤ 2n exp{−ε² n / (2 [C (log n)^(1/2) n^(1−γ) + 2√n])}.
Thus, for an arbitrary ε > 0,
P(sup_{T∈[T_1,T_U]} |S_2(T) − E S_2(T)| / √n > ε) → 0   (15)
as n → ∞.
Combining (8), (12), (14) and (15), we obtain the statement of the theorem. □
A similar statement is true for the class L p ( η n ) .
Theorem 4.
Let μ ∈ L_p(η_n), 0 < p < 2, η_n^p ∈ [n^(−1)(log n)^5, n^(−γ)], 1/2 < γ < 1. Let T_F be the FDR threshold with a controlling parameter α_n → 0 such that α_n κ_n γ_n² / log n → ∞ as n → ∞, where κ_n and γ_n are defined in (6). Then
(R̂(T_F) − R(T_min)) / (σ² √(2n)) → N(0, 1) in distribution.
Proof. 
The main steps of the proof repeat the proof of Theorem 3. We again write
R̂(T_F) − R(T_min) = R̂(T_F) − R̂(T_min) + R̂(T_min) − R(T_min) = (R̂(T_min) − R(T_min)) + U(T_F).
The statement
(R̂(T_min) − R(T_min)) / (σ² √(2n)) → N(0, 1) in distribution
is proved exactly as (8). Let U(T) = S_1(T) + S_2(T), T ∈ [T_1, T_U], where the sum S_2(T) contains the terms with |μ_i| ≤ C/T_1 and S_1(T) contains all other terms. By the definition of the class L_p(η_n), the number of terms in S_1(T) does not exceed n_1 ≤ C η_n^p n, and each term is bounded in absolute value by T_U² + σ². Considering the form of T_1, it can be shown that the mathematical expectations of the terms in S_2 do not exceed C (log n)^(1/2) n^(−γ), and their variances do not exceed C (log n)^(3/2) n^(−γ). Next, arguing as in Theorem 3, we see that
sup_{T∈[T_1,T_U]} |E U(T)| / √n → 0
and
P(sup_{T∈[T_1,T_U]} |U(T) − E U(T)| / √n > ε) → 0
for an arbitrary ε > 0 as n → ∞. Thus, since P(T_F < T_1) → 0,
U(T_F) / √n → 0 in probability
as n → ∞. □
The above statements demonstrate that the considered method for constructing estimates in the model (1) has very similar properties to the method based on minimizing the estimate (7) in the parameter T (see [19]).
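Theorems 3 and 4 say that R̂(T_F), centered at R(T_min), is asymptotically normal with scale σ²√(2n), so an asymptotic confidence interval for the risk can be formed from the observed estimate alone. A sketch (function name is ours; z = 1.96 corresponds to an approximate 95% level):

```python
import math

def risk_confidence_interval(risk_est, sigma, n, z=1.96):
    """Asymptotic confidence interval for R(T_min):
    risk_est ± z * sigma^2 * sqrt(2 * n), per Theorems 3 and 4."""
    half_width = z * sigma ** 2 * math.sqrt(2.0 * n)
    return risk_est - half_width, risk_est + half_width

lo, hi = risk_confidence_interval(100.0, 1.0, 50)
```

Here `risk_est` would be the value of the estimate (7) computed at the FDR threshold T_F.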

5. Conclusions

In this paper, we considered a method of estimating the mean of a Gaussian vector based on the procedure of multiple hypothesis testing. The estimation is based on the false discovery rate measure, which controls the expected percentage of false rejections of the null hypothesis. It is common to use the mean-square risk for evaluating the performance of this approach. Its value cannot be calculated in practice, so its estimate must be considered instead. We analyzed the asymptotic properties of this estimate and proved that it is asymptotically normal for the classes of sparse vectors. This result justifies the use of the mean-square risk estimate for practical purposes and allows constructing asymptotic confidence intervals for a theoretical mean-square risk. For more accurate analysis it is desirable to have guaranteed confidence intervals. These intervals could be constructed based on the estimates of the convergence rate in Theorems 3 and 4. Guaranteed confidence intervals would help to understand how the results of Theorems 3 and 4 affect the risk estimation for a finite sample size. We therefore leave the problem of estimating the rate of convergence and numerical simulation for future work.

Author Contributions

Conceptualization, O.S.; methodology, S.P. and O.S.; formal analysis, S.P. and O.S.; investigation, S.P. and O.S.; writing—original draft preparation, S.P. and O.S.; writing—review and editing, S.P. and O.S.; supervision, O.S.; funding acquisition, O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Storey, J.D. A direct approach to false discovery rates. J. Roy. Statist. Soc. Ser. B 2002, 64, 479–498.
  2. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Stat. Soc. Ser. B 1995, 57, 289–300.
  3. Benjamini, Y.; Yekutieli, D. False discovery rate-adjusted multiple confidence intervals for selected parameters. J. Am. Stat. Assoc. 2005, 100, 71–93.
  4. Wilson, D.J. The harmonic mean p-value for combining dependent tests. Proc. Natl. Acad. Sci. USA 2019, 116, 1195–1200.
  5. Wilson, D.J. Reply to Held: When is a harmonic mean p-value a Bayes factor? Proc. Natl. Acad. Sci. USA 2019, 116, 5857–5858.
  6. Zaspa, A.Y.; Shestakov, O.V. Consistency of the risk estimate of the multiple hypothesis testing with the FDR threshold. Her. Tver State Univ. Ser. Appl. Math. 2017, 1, 5–16.
  7. Abramovich, F.; Benjamini, Y.; Donoho, D.; Johnstone, I. Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 2006, 34, 584–653.
  8. Donoho, D.; Jin, J. Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data. Ann. Statist. 2006, 34, 2980–3018.
  9. Neuvial, P.; Roquain, E. On false discovery rate thresholding for classification under sparsity. Ann. Statist. 2012, 40, 2572–2600.
  10. Donoho, D.; Johnstone, I.M. Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 1995, 90, 1200–1224.
  11. Marron, J.S.; Adak, S.; Johnstone, I.M.; Neumann, M.H.; Patil, P. Exact risk analysis of wavelet regression. J. Comput. Graph. Stat. 1998, 7, 278–309.
  12. Mallat, S. A Wavelet Tour of Signal Processing; Academic Press: New York, NY, USA, 1999.
  13. Markin, A.V. Limit distribution of risk estimate of wavelet coefficient thresholding. Inform. Appl. 2009, 3, 57–63.
  14. Jansen, M. Noise Reduction by Wavelet Thresholding; Lecture Notes in Statistics, Volume 161; Springer: New York, NY, USA, 2001.
  15. Kudryavtsev, A.A.; Shestakov, O.V. Asymptotic behavior of the threshold minimizing the average probability of error in calculation of wavelet coefficients. Dokl. Math. 2016, 93, 295–299.
  16. Kudryavtsev, A.A.; Shestakov, O.V. Asymptotically optimal wavelet thresholding in models with non-Gaussian noise distributions. Dokl. Math. 2016, 94, 615–619.
  17. Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 1963, 58, 13–30.
  18. Bennett, G. Probability inequalities for the sum of independent random variables. J. Am. Stat. Assoc. 1962, 57, 33–45.
  19. Shestakov, O.V. Asymptotic normality of adaptive wavelet thresholding risk estimation. Dokl. Math. 2012, 86, 556–558.
