Reproducibility Probability Estimation and RP-Testing for Some Nonparametric Tests

Several reproducibility probability (RP)-estimators for the binomial, sign, Wilcoxon signed rank and Kendall tests are studied. Their behavior in terms of MSE is investigated, as well as their performances for RP-testing. Two classes of estimators are considered: the semi-parametric one, where RP-estimators are derived from the expression of the exact or approximated power function, and the non-parametric one, whose RP-estimators are obtained on the basis of the nonparametric plug-in principle. In order to evaluate the precision of RP-estimators for each test, the MSE is computed, and the best overall estimator turns out to belong to the semi-parametric class. Then, in order to evaluate the RP-testing performances provided by RP estimators for each test, the disagreement between the RP-testing decision rule, i.e., “accept H0 if the RP-estimate is lower than, or equal to, 1/2, and reject H0 otherwise”, and the classical one (based on the critical value or on the p-value) is obtained. It is shown that the RP-based testing decision for some semi-parametric RP estimators exactly replicates the classical one. In many situations, the RP-estimator replicating the classical decision rule also provides the best MSE.


Introduction
Statistical tests are applied in almost all fields of science to evaluate experimental results. The reproducibility probability (RP) is the true power of a statistical test, and its estimation provides useful information for evaluating the stability of test results. Indeed, when the Neyman-Pearson approach is adopted, that is, when the Type I error probability is fixed before starting the experiment, the outcome of the statistical test is a Bernoulli random variable (viz. significant/non-significant) whose parameter is the RP. Therefore, looking at the RP estimate is the natural perspective for evaluating the stability of test results: the higher the estimated RP, the more stable the observed result is estimated to be; see [1]. RP estimation has been applied, for example, in the context of clinical trials [2][3][4][5][6]. Moreover, RP-testing, that is, the adoption of the RP estimate to evaluate the significance of statistical test results, can replace p-value testing [7,8]. In detail, the RP-testing decision rule, which is very intuitive, states: "accept H0 if the RP-estimate is lower than, or equal to, 1/2, and reject H0 otherwise". We argue that the RP-testing rule can be adopted in order to bypass the many well-known criticisms raised against the p-value [9][10][11][12][13].
In the context of nonparametric tests, RP estimation has not yet been widely studied. The only works in this field concern RP estimation and testing for the Wilcoxon rank sum test [14,15].
In this paper, some RP estimators for the most commonly used nonparametric tests are introduced and studied. Specifically, the sign test, the binomial test, the Kendall test and the Wilcoxon signed rank test are considered. Both nonparametric and semi-parametric RP estimators are presented for each test. Focus is placed on two features: (1) the behavior of different estimators for a given test and their consequent comparison, for example in terms of MSE; (2) the validity, exact or approximated, of the RP-testing rule based on the RP estimators presented here. For the first task, we resort to simulation studies, whereas appropriate theoretical results are developed for the second one.
The theoretical framework of nonparametric RP estimation and testing is introduced in Section 2, where the problems that can be encountered are explained in depth; then, the class of semi-parametric estimators and that of nonparametric plug-in estimators are introduced, and some theoretical results on RP-testing are provided. In Section 3, the sign test and the binomial test are considered: semi-parametric RP estimation and testing for the binomial test are studied first; then, nonparametric estimation techniques are studied for the same aim; finally, the sign test is considered, showing that the results obtained for the binomial test also hold for the sign test. RP estimation and testing for the Wilcoxon signed rank test is studied in Section 4, where semi-parametric and nonparametric plug-in estimators are considered and studied separately; then, the behavior of different estimators is compared through simulation. The last test considered (Section 5) is the Kendall test of monotonic association. As in the previous sections, semi-parametric and nonparametric estimators are studied separately; then, a simulation is run to compare the behavior of different estimators in terms of MSE and RP-testing performances. An example of application is shown in Section 6, and the conclusions are reported in Section 7.

The General Nonparametric Framework
Let ${}_tF$ be the true cumulative distribution function of a study variable $X$. This distribution function is unknown and belongs to the class of distributions $\mathcal{F}$. Assume that, starting from a random sample $X_n = (X_1, \dots, X_n)$ drawn from ${}_tF$, it is of interest to solve the testing problem:
$$H_0: {}_tF \in \mathcal{F}_0 \quad \text{vs.} \quad H_1: {}_tF \in \mathcal{F} \setminus \mathcal{F}_0, \tag{1}$$
where $\mathcal{F}_0 \subset \mathcal{F}$. Let $T_n = T(X_n)$ be the test statistic used to solve (1). There are two typical cases that can be encountered when considering nonparametric tests: (A) the exact and asymptotic distributions of $T_n$ are known both under $H_0$ and $H_1$; (B) the exact and asymptotic distributions of $T_n$ are known under $H_0$, whereas under $H_1$, only the asymptotic distribution can be derived.
Case (A) is rather an exception. The binomial and sign tests are examples of tests under this case. Case (B) is the common situation: for almost all of the distribution-free tests, the exact null distribution of $T_n$ can be derived by using permutations, combinatorics and ad hoc algorithms (see, e.g., [16]). On the contrary, the non-null distribution can be derived only by resorting to large-sample approximations. A few examples include the Wilcoxon signed rank test, the Wilcoxon rank sum test and the Kendall test.
Under both Cases (A) and (B), knowledge of the exact null distribution of $T_n$ allows the definition of the exact test:
$$\Psi_\alpha(X_n) = \begin{cases} 1 & \text{if } T_n \in R_{n,\alpha} \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$
where, as usual, $\alpha$ denotes the Type I error probability and $R_{n,\alpha}$ is a level-$\alpha$ critical region corresponding to the sample size $n$. For example, if the testing problem (1) is one-sided, the critical region takes, without loss of generality, the form $R_{n,\alpha} = (t_{n,1-\alpha}, \infty)$, where $t_{n,1-\alpha}$ is the $(1-\alpha)$-quantile of the null distribution $G_0$ of $T_n$. Note that, if $T_n$ is a discrete random variable, the critical region $R_{n,\alpha}$ is exact, but conservative, i.e., its Type I error probability can be lower than $\alpha$, since $P_{F}(T_n \in R_{n,\alpha}) \le \alpha$ for $F \in \mathcal{F}_0$.
In practice, if the sample size $n$ is sufficiently high, an asymptotic test is usually preferred to avoid the computational effort needed to compute the exact distribution of $T_n$. In particular, the following test is used:
$$\tilde\Psi_\alpha(X_n) = \begin{cases} 1 & \text{if } T_n \in \tilde R_{n,\alpha} \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$
where $\tilde R_{n,\alpha}$ is the level-$\alpha$ asymptotic critical region, which, considering the one-sided example mentioned above, takes the form $\tilde R_{n,\alpha} = (\tilde t_{n,1-\alpha}, \infty)$, where $\tilde t_{n,1-\alpha}$ denotes the $(1-\alpha)$-quantile of the large-sample null distribution of $T_n$. Obviously, the tests $\Psi_\alpha$ and $\tilde\Psi_\alpha$ become closer as the sample size $n$ increases, and they are asymptotically equivalent. However, whatever the sample size is, there is a certain probability of disagreement between (2) and (3). To define this probability precisely, consider the sets $A_1$, $A_2$ and $A$ defined as follows:
$$A_1 = \{x_n : \Psi_\alpha(x_n) = 0,\ \tilde\Psi_\alpha(x_n) = 1\}, \quad A_2 = \{x_n : \Psi_\alpha(x_n) = 1,\ \tilde\Psi_\alpha(x_n) = 0\}, \quad A = A_1 \cup A_2.$$
Set $A_2$ collects the realizations $x_n$ of $X_n$ for which the null hypothesis is accepted by the asymptotic test and rejected by the exact one. Conversely, set $A_1$ collects the realizations $x_n$ of $X_n$ for which the null hypothesis is accepted by the exact test and rejected by the asymptotic one. Therefore, the probability of disagreement between (2) and (3) is:
$$D = P_{{}_tF}(X_n \in A) = P_{{}_tF}(X_n \in A_1) + P_{{}_tF}(X_n \in A_2). \tag{4}$$
The differences between Cases (A) and (B) do not impact the definition of the statistical test to solve (1), but they determine the way the power of the test and, therefore, the RP can be evaluated:
under Case (A), the power of the test can be exactly computed; under Case (B), the power can be evaluated only approximately. In detail, under Case (A), the exact power of $\Psi_\alpha$ corresponding to the distribution $F \in \mathcal{F}$ can be computed as $\pi(n, \alpha, F) = P_F(T_n \in R_{n,\alpha}) = E_F[\Psi_\alpha(X_n)]$. Consequently, the exact RP of the test (i.e., the exact "true power" of the test) coincides with $RP = \pi(n, \alpha, {}_tF)$. Under Case (B), the exact power of $\Psi_\alpha$ can be approximated by $\tilde\pi(n, \alpha, F) = \tilde P_F(T_n \in R_{n,\alpha}) = \tilde E_F[\Psi_\alpha(X_n)]$, where the symbols $\tilde P_F$ and $\tilde E_F$ emphasize that probability and expectation are computed according to the asymptotic distribution of $T_n$. In this case, the approximated RP is $\widetilde{RP} = \tilde\pi(n, \alpha, {}_tF)$. Analogously, under Case (B), the power of $\tilde\Psi_\alpha$ can be approximated by $\tilde\pi_a(n, \alpha, F) = \tilde P_F(T_n \in \tilde R_{n,\alpha}) = \tilde E_F[\tilde\Psi_\alpha(X_n)]$, and the approximate RP is $\widetilde{RP}_a = \tilde\pi_a(n, \alpha, {}_tF)$. Obviously, the approximate powers $\tilde\pi(n, \alpha, F)$ and $\tilde\pi_a(n, \alpha, F)$ and the approximate RPs $\tilde\pi(n, \alpha, {}_tF)$ and $\tilde\pi_a(n, \alpha, {}_tF)$ can be computed under Case (A) as well. Moreover, in this latter case, it is also possible to compute the exact power of the approximate test. However, in practice, under Case (A), if the computational burden is acceptable, the exact test and power are usually computed. If the computational cost is too high, the asymptotic test and its approximate power are used. To summarize, Table 1 reports the possible approaches to computing the power of a test under the different scenarios that can arise under Cases (A) and (B). The cells representing the approaches commonly employed in practice are shaded in gray.
Under both Cases (A) and (B), it is possible to get an RP-estimator following several methodologies. These methodologies can be divided into two main subgroups: semi-parametric estimators and non-parametric estimators.

Semi-Parametric RP-Estimation and RP-Testing
As for the Wilcoxon rank sum (WRS) test (see [14,15]), in common nonparametric tests, the asymptotic/exact distribution of $T_n$ depends on a vector $\theta_t$ of parameters defined as particular functionals of ${}_tF$. In such cases, the asymptotic/exact power can be interpreted as a function of $\theta_t$ instead of a functional of ${}_tF$. A semi-parametric RP-estimator can then be obtained by plugging an appropriate point estimator $\hat\theta$ of $\theta_t$ into the expression of the exact/asymptotic power:
• under Case (A.1), the semi-parametric RP-estimator is $\hat\pi = \pi(n, \alpha, \hat\theta)$;
• under Case (A.2) and Case (B.2), the semi-parametric RP-estimator is $\hat\pi_a = \tilde\pi_a(n, \alpha, \hat\theta)$;
• under Case (B.1), the semi-parametric RP-estimator is obtained from the approximated power of the exact test, $\tilde\pi(n, \alpha, \hat\theta)$.
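As will be explained later, if the estimator $\hat\theta$ is appropriately chosen and the testing problem (1) is one-sided, the semi-parametric RP-estimators $\hat\pi$ and $\hat\pi_a$ can be used to replicate the tests $\Psi_\alpha$ and $\tilde\Psi_\alpha$ through the RP-testing technique: "accept H0 if the RP-estimate is lower than, or equal to, 1/2 and reject H0 otherwise". For several non-parametric tests (see [8] for the general parametric case), if the estimator $\hat\theta$ is appropriately chosen, it is possible to demonstrate that
$$\hat\pi > 1/2 \iff T_n \in R_{n,\alpha}, \qquad \hat\pi_a > 1/2 \iff T_n \in \tilde R_{n,\alpha}.$$
Then, the exact and asymptotic tests can be rewritten as
$$\Psi_\alpha(X_n) = \begin{cases} 1 & \text{if } \hat\pi > 1/2 \\ 0 & \text{otherwise,} \end{cases} \qquad \tilde\Psi_\alpha(X_n) = \begin{cases} 1 & \text{if } \hat\pi_a > 1/2 \\ 0 & \text{otherwise.} \end{cases}$$
The above identities cover Cases (A.1), (A.2) and (B.2). In Case (B.1), the exact test cannot generally be replicated through the RP-testing technique based on semi-parametric estimators. However, the following lemma (proved in the Supplementary Material) describes a case in which this is possible:
Lemma 1. Assume that the testing problem (1) is one sided and that the exact test based on the test statistic
As pointed out in [17] and in [3,18], it is possible to estimate the RP by using a non-parametric plug-in estimator. Under Cases (A.1) and (B.1), it is possible to consider the plug-in estimator
$$\hat\pi^{PI}_e = \pi(n, \alpha, \hat F_n) = P_{\hat F_n}(T_n \in R_{n,\alpha}),$$
where $\hat F_n$ denotes the empirical cumulative distribution function (ecdf). In practice, $\hat\pi^{PI}_e$ coincides with the rejection rate computed by performing the test $\Psi_\alpha$ over all $n^n$ possible samples of size $n$ that can be drawn from the ecdf:
$$\hat\pi^{PI}_e = \frac{1}{n^n} \sum_{x_n \in \mathcal{X}(X_n)} \Psi_\alpha(x_n),$$
where $\mathcal{X}(X_n)$ denotes the set of all of the samples of size $n$ that can be drawn with replacement from the ecdf corresponding to $X_n$. Apart from some special cases, the analytical expression of $\hat\pi^{PI}_e$ cannot be derived. Consequently, it is usually approximated by the Monte Carlo method: $B$ samples of size $n$ are drawn from the ecdf, the test $\Psi_\alpha$ is performed on each of the $B$ samples, and the plug-in RP-estimate is computed as the rejection rate. In detail:
$$\hat\pi^{PI} = \frac{1}{B} \sum_{j=1}^{B} \Psi_\alpha(X_n^{j}), \tag{5}$$
where $X_n^{j}$ denotes the $j$-th re-sample drawn from the ecdf. Similarly, under Case (A.2) and Case (B.2), it is possible to define the plug-in RP-estimator starting from the asymptotic test, obtaining
$$\hat\pi^{PI}_{a,e} = \frac{1}{n^n} \sum_{x_n \in \mathcal{X}(X_n)} \tilde\Psi_\alpha(x_n), \qquad \hat\pi^{PI}_a = \frac{1}{B} \sum_{j=1}^{B} \tilde\Psi_\alpha(X_n^{j}). \tag{6}$$
The plug-in RP-estimators introduced above can be used to define the RP-based tests $\Psi^{PI,e}_\alpha$, $\tilde\Psi^{PI,e}_\alpha$, $\Psi^{PI}_\alpha$ and $\tilde\Psi^{PI}_\alpha$, each of which rejects H0 when the corresponding plug-in RP-estimate is greater than 1/2. However, there are no general theoretical results assuring that $\Psi^{PI,e}_\alpha$ and $\tilde\Psi^{PI,e}_\alpha$, or $\Psi^{PI}_\alpha$ and $\tilde\Psi^{PI}_\alpha$, are level-$\alpha$ tests equivalent to $\Psi_\alpha$ and $\tilde\Psi_\alpha$, respectively.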
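The Monte Carlo approximation of the plug-in RP-estimator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `plug_in_rp`, `rp_decision` and `toy_test` are hypothetical, and `toy_test` is a placeholder decision rule standing in for any test that returns 1 (reject) or 0 (accept).

```python
import random

def plug_in_rp(sample, test, B=1000, seed=0):
    """Monte Carlo plug-in RP-estimate: rejection rate of `test` over B
    re-samples of size n drawn with replacement from the ecdf of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    rejections = 0
    for _ in range(B):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        rejections += test(resample)  # test returns 1 (reject) or 0 (accept)
    return rejections / B

def rp_decision(rp_estimate):
    """RP-testing rule: reject H0 (1) only if the RP-estimate exceeds 1/2."""
    return 1 if rp_estimate > 0.5 else 0

def toy_test(x, threshold=0.3):
    """Hypothetical placeholder test (an assumption, not from the paper):
    reject when the sample mean exceeds a fixed threshold."""
    return 1 if sum(x) / len(x) > threshold else 0

data = [0.9, 1.4, -0.2, 0.7, 1.1, 0.3, 0.8, -0.5, 1.6, 0.4]
rp_hat = plug_in_rp(data, toy_test, B=500)
decision = rp_decision(rp_hat)
```

In use, `toy_test` would be replaced by the exact or asymptotic test of interest; the number of re-samples `B` trades Monte Carlo error against computing time.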

RP-Estimation and Testing for the Binomial and Sign Test
In this section, the performances of the semi-parametric and non-parametric RP estimators for the binomial and sign tests are evaluated. First, the binomial test is considered. Let $X_n = (X_1, \dots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p_t$. The statistical hypotheses of interest are:
$$H_0: p_t \le p_0 \quad \text{vs.} \quad H_1: p_t > p_0. \tag{7}$$
These hypotheses can be tested by using the statistic $\hat P = \frac{1}{n} \sum_{i=1}^{n} X_i$. The exact and asymptotic distributions of $\hat P$ are known both under $H_0$ and under $H_1$, and consequently, this test falls under Case (A). Specifically, $n\hat P \sim \text{Binomial}(n, p_t)$ and $\sqrt{n}\,(\hat P - p_t)/\sqrt{p_t(1-p_t)}$ converges in distribution to $N(0,1)$. The exact test rejects $H_0$ when $\hat P > c_\alpha$, where $c_\alpha = b_{(1-\alpha;n,p_0)}/n$ and $b_{(q;n,p)}$ is the $q$-quantile of the binomial distribution with parameters $n$ and $p$ (the test so defined is conservative). The asymptotic test rejects $H_0$ when $\hat P > \tilde c_\alpha$, where $\tilde c_\alpha = \lfloor n p_0 + z_{1-\alpha}\sqrt{n p_0 (1-p_0)} \rfloor / n$, $z_q$ is the $q$-quantile of the standard normal distribution and $\lfloor \cdot \rfloor$ denotes the floor function. Obviously, the exact and the asymptotic critical regions are not equivalent. Their disagreement can be evaluated using Expression (4), which, in this case, can be exactly evaluated. In Table S1 of the Supplementary Material, the values of $D(p_t, n, \alpha, p_0)$ are computed by fixing $\alpha = 0.05$ for some values of $n$, $p_0$ and $p_t$.
From this table, it emerges that the probability of disagreement between the tests $\Psi_\alpha(X_n)$ and $\tilde\Psi_\alpha(X_n)$ can be very high for some combinations of $n$, $p_0$ and $p_t$: it is often higher than 10% (up to 20%) with sample size $n = 15$, and it remains higher than 10%, for just a few cases, even with $n = 30$.
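As a rough illustration of how Expression (4) can be evaluated exactly for the binomial test, the sketch below assumes the one-sided problem H0: p_t ≤ p_0 vs. H1: p_t > p_0, an exact test that rejects when the number of successes S exceeds the (1 − α)-quantile of Binomial(n, p_0), and an asymptotic test that rejects when S exceeds the floor of n p_0 + z_{1−α} √(n p_0 (1 − p_0)). All function names are hypothetical, and the sketch is not claimed to reproduce the values of Table S1.

```python
from math import comb, sqrt, floor
from statistics import NormalDist

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

def exact_critical_count(n, p0, alpha):
    """(1 - alpha)-quantile of Binomial(n, p0): smallest c with cdf >= 1 - alpha."""
    for c in range(n + 1):
        if binom_cdf(c, n, p0) >= 1 - alpha:
            return c
    return n

def asymptotic_critical_count(n, p0, alpha):
    """Normal-approximation threshold on the count scale (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return floor(n * p0 + z * sqrt(n * p0 * (1 - p0)))

def disagreement(p_t, n, alpha, p0):
    """Probability, under p_t, that exactly one of the two tests rejects H0."""
    c_exact = exact_critical_count(n, p0, alpha)
    c_asym = asymptotic_critical_count(n, p0, alpha)
    lo, hi = sorted((c_exact, c_asym))
    # Both tests reject when S > threshold, so they disagree iff lo < S <= hi.
    return sum(binom_pmf(s, n, p_t) for s in range(lo + 1, min(hi, n) + 1))

d = disagreement(0.5, 15, 0.05, 0.5)
```

With n = 15 and p_0 = 0.5, the two thresholds under these assumptions are 11 (exact) and 10 (asymptotic), so the tests disagree only when S = 11.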

Semi-Parametric RP-Estimation and Testing for the Binomial Test
The exact power function and the exact RP of $\Psi_\alpha$ (Case (A.1)) are $\pi(n, \alpha, p) = 1 - B(nc_\alpha; n, p)$ and $RP = \pi(n, \alpha, p_t)$, where $B(\cdot; n, p)$ is the binomial cumulative distribution function with parameters $n$ and $p$. Following [8], the semi-parametric RP-estimator based on the exact power is obtained by plugging the median estimator for $p_t$ into the expression of $\pi(n, \alpha, p)$. The median estimator $\hat P^\bullet$ is defined as the solution of the equation $B(n\hat P; n, \hat P^\bullet) = 1/2$, and the resulting RP-estimator is $\hat\pi = 1 - B(nc_\alpha; n, \hat P^\bullet)$. Similarly, the approximate power function of $\tilde\Psi_\alpha$ (Case (A.2)) is $\tilde\pi_a(n, \alpha, p) = \Phi\left(\sqrt{n}\,(p - \tilde c_\alpha)/\sqrt{p(1-p)}\right)$, where $\Phi(\cdot)$ is the standard normal cdf, and the approximate RP is $\widetilde{RP}_a = \tilde\pi_a(n, \alpha, p_t)$.
The corresponding RP-estimator is then $\hat\pi_a = \tilde\pi_a(n, \alpha, \hat P)$. Note that, in this case, the probability distributions of $\hat\pi$ and $\hat\pi_a$ can be obtained analytically. In particular, the support of $\hat\pi$ is given by the values $\pi(s) = 1 - B(nc_\alpha; n, p^\bullet_s)$, $s = 0, 1, \dots, n$, where $p^\bullet_s$ is the solution of $B(s; n, p^\bullet) = 1/2$. The probability function of $\hat\pi$ is given by $P(\hat\pi = \pi(s)) = \binom{n}{s} p_t^s (1 - p_t)^{n-s}$, $s = 0, 1, \dots, n$.
Both the asymptotic and the exact tests can be replicated by using the RP-estimators defined above. Specifically, thanks to the results in [8] (which require the adoption of the median estimator $\hat P^\bullet$ in the definition of the RP-estimator $\hat\pi$), it results that $\Psi_\alpha(X_n) = 1 \iff \hat\pi > 1/2$. Similarly, it is easy to verify that $\tilde\Psi_\alpha(X_n) = 1 \iff \hat\pi_a > 1/2$. Note that the use of the point estimator $\hat P$ in the definition of $\hat\pi_a$ is fundamental for the validity of this last identity as well.
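The median-estimator construction can be checked numerically. The following is a minimal sketch, assuming the critical count c = n c_α is given and solving B(s; n, p) = 1/2 by bisection; the function names are hypothetical, and the convention p = 1 for s = n (where the equation has no solution in (0, 1)) is an assumption made here for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def median_estimate(s, n, tol=1e-12):
    """Solve B(s; n, p) = 1/2 for p by bisection (B is decreasing in p).
    For s = n there is no solution in (0, 1); p = 1 is used by convention."""
    if s >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(s, n, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    # Return the lower bracket end: in exact arithmetic the estimate equals 1/2
    # when s = c, and this choice keeps the computed value on the "accept" side.
    return lo

def rp_semiparametric(s, n, c):
    """RP-estimate 1 - B(c; n, p_median), with c the critical count n * c_alpha."""
    return 1.0 - binom_cdf(c, n, median_estimate(s, n))

n, c = 15, 11  # c = 11 is the 0.95-quantile of Binomial(15, 0.5)
decisions = [rp_semiparametric(s, n, c) > 0.5 for s in range(n + 1)]
```

Under these assumptions, the RP-based decision coincides with the rule "reject when S > c" for every possible number of successes, which is the replication property stated above.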

Non-Parametric RP-Estimation and Testing for the Binomial Test
The case of the binomial test is particularly interesting when studying the features of the plug-in RP-estimators, since, in this context, the probability functions of the estimators $\hat\pi^{PI}_e$ and $\hat\pi^{PI}_{a,e}$ can be analytically derived, and the RP-based decision rules based on them can be analytically studied. Lemma 2 below describes the analytical expression of the non-parametric plug-in RP-estimator for the exact binomial test (Point 1), provides the probability distribution of this estimator (Point 2) and establishes the equivalence between the exact binomial test and the RP-based decision rule derived from the non-parametric plug-in estimator (Point 3). Similar results concerning the asymptotic binomial test are provided in Lemma 3.
Lemma 2. Let $X_n = (X_1, \dots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p_t$ in order to test Hypotheses (7). It results that:
Lemma 3. Let $X_n = (X_1, \dots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p_t$ in order to test Hypotheses (7). It results that:
The proofs of Lemma 2 and Lemma 3 are reported in the Supplementary Material.

Evaluating the Performances of the RP-Estimators for the Binomial Test
In the case of the binomial test, it is possible to compute the exact expectation and the MSE of $\hat\pi$, $\hat\pi_a$, $\hat\pi^{PI}_e$ and $\hat\pi^{PI}_{a,e}$. In order to compare these estimators, their exact bias and MSE are represented in Figures S1 and S2 of the Supplementary Material. Here, in Figure 1, only the MSE curves with n = 15, α = 0.05, and p = 0.2, 0.5, are given. From these figures, it emerges that no RP-estimator performs uniformly best. Concerning the estimators for the power of $\Psi_\alpha(X_n)$, there is a tangible difference between the performance of $\hat\pi$ and $\hat\pi^{PI}_e$. For a wide range of small values of $p_t$, $\hat\pi$ has a bias and MSE greater than those of $\hat\pi^{PI}_e$; for large values of $p_t$, $\hat\pi$ generally performs better than $\hat\pi^{PI}_e$. In contrast, the performances of $\hat\pi_a$ and $\hat\pi^{PI}_a$ for the power of $\tilde\Psi_\alpha(X_n)$ are very similar. Regarding RP-testing, we recall that there is no disagreement between the classical binomial tests (exact or approximated) and their RP-based versions. The results obtained here for the binomial test still hold for the sign test. The interested reader is referred to the Supplementary Material, where the connection between these tests is explained in depth.
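The exact bias and MSE mentioned above can be obtained by summing over the n + 1 possible values of the number of successes. The sketch below does this for the plug-in estimator of the exact test, assuming that this estimator equals the exact power evaluated at the sample proportion s/n (which is what re-sampling with replacement from Bernoulli data yields, since the re-sampled count is Binomial(n, s/n)). Function names and the critical count c are hypothetical inputs.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(n, c, p):
    """Exact power of the test rejecting when S > c: 1 - B(c; n, p)."""
    return 1.0 - sum(binom_pmf(j, n, p) for j in range(c + 1))

def plug_in_moments(n, c, p_t):
    """True RP, exact bias and exact MSE of the plug-in estimator power(n, c, S/n)."""
    rp_true = power(n, c, p_t)
    bias = 0.0
    mse = 0.0
    for s in range(n + 1):
        estimate = power(n, c, s / n)   # plug-in RP-estimate when S = s
        weight = binom_pmf(s, n, p_t)   # probability of observing S = s
        bias += weight * (estimate - rp_true)
        mse += weight * (estimate - rp_true) ** 2
    return rp_true, bias, mse

rp_true, bias, mse = plug_in_moments(15, 11, 0.5)
```

The same loop applies to any RP-estimator that is a function of S alone; only the line computing `estimate` changes.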

RP-Estimation and Testing for the Wilcoxon Signed Rank Test
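Let $X_n = (X_1, \dots, X_n)$ be a random sample from a continuous and symmetric cdf $F_{\theta_t}$ with median $\theta_t$. In order to test $H_0: \theta_t \le \theta_0$ vs. $H_1: \theta_t > \theta_0$, it is possible to apply the Wilcoxon signed rank (WSR) test, which is based on the statistic $W = \sum_{i=1}^{n} \psi_i R_i$, where $R_i$ is the rank of $|X_i - \theta_0|$ among $|X_1 - \theta_0|, \dots, |X_n - \theta_0|$ and $\psi_i = 1$ if $X_i - \theta_0 > 0$, $\psi_i = 0$ otherwise. Following the classification proposed in Section 2, the WSR test falls under Case (B), since the exact distribution of $W$ can be derived by enumeration (see [19] on p. 126) under $H_0$, but, under $H_1$, it can only be approximated by using a central limit theorem. In particular, it is well known (see [19] on p. 166) that $(W - E_{F_\theta}[W])/\sqrt{Var_{F_\theta}[W]}$ converges in distribution to $N(0,1)$, where:
$$E_{F_\theta}[W] = e(p, p_1) = n p + \frac{n(n-1)}{2} p_1, \tag{9}$$
$$Var_{F_\theta}[W] = v(p, p_1, p_2) = n p (1-p) + \frac{n(n-1)}{2}\left[2(p - p_1)^2 + 3 p_1 (1 - p_1)\right] + n(n-1)(n-2)(p_2 - p_1^2), \tag{10}$$
with:
$$p = P(Z > 0), \quad p_1 = P(Z + Z' > 0), \quad p_2 = P(Z + Z' > 0,\ Z + Z'' > 0), \tag{11}$$
being $Z = X - \theta_0$, and $Z'$ and $Z''$ i.i.d. to $Z$. Note that, under $H_0$, $p = p_1 = \frac{1}{2}$ and $p_2 = \frac{1}{3}$. These results allow the use of $W$ in order to define the exact and asymptotic tests, which reject $H_0$ when $W > w_\alpha$ and when $W > \tilde w_\alpha$, respectively, where $w_\alpha$ denotes the $(1-\alpha)$-quantile of the exact null distribution of $W$ and $\tilde w_\alpha = \frac{n(n+1)}{4} + z_{1-\alpha}\sqrt{\frac{n(n+1)(2n+1)}{24}}$. Obviously, the exact and asymptotic tests are not equivalent, and their disagreement is evaluated using Expression (4). In Table S2 of the Supplementary Material, the values of $D(\alpha, n, F_{\theta_t}, \theta_0)$ are computed by fixing $\alpha = 0.05$ and $\theta_0 = 0$ for some values of $n$ and $\theta_t$ and by considering $X \sim N(\theta_t, 1)$ (light tails) and $X \sim \text{Cauchy}(\theta_t)$ (fat tails).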

Semi-Parametric RP-Estimation and Testing for the WSR Test
As mentioned above, the WSR test is classified under Case (B). Therefore, its exact power function cannot be generally determined. However, it can be approximated thanks to the asymptotic normality of $W$. The approximation of the power function of the exact test is $\tilde\pi(n, \alpha, F_\theta) = 1 - \Phi\left((w_\alpha - E_{F_\theta}[W])/\sqrt{Var_{F_\theta}[W]}\right)$. Analogously, the approximation of the power function of the asymptotic test is $\tilde\pi_a(n, \alpha, F_\theta) = 1 - \Phi\left((\tilde w_\alpha - E_{F_\theta}[W])/\sqrt{Var_{F_\theta}[W]}\right)$. In order to define semi-parametric RP-estimators starting from the approximated power functions reported above, it is necessary to derive estimators for $E_{F_\theta}[W]$ and $Var_{F_\theta}[W]$. They can be obtained by plugging into Expressions (9) and (10) the estimators for the parameters $p$, $p_1$ and $p_2$ defined in (11). Below, two different estimators for these parameters are considered.
• Analogic estimators: . Let G n be the empirical distribution function of the Z i 's (i.e., of the X i − θ 0 's).By plugging G n into the above expressions, the following estimators are obtained: Now, the following RP-estimators for the exact test can be introduced: π1 , where Ê = e( p, p1 ), V = v( p, p1 , p2 ), Ẽ = e( p, p1 ) and Ṽ = v( p, p1 , p2 ).Analogously, the following RP-estimators for the asymptotic test can be introduced: . Following the idea in [20], the approximated power of nonparametric tests can be simplified by assuming that the variance of the test statistic is close to its value under H 0 (see [19] on pp.72 and 167, for other applications of Noether's approach).In that case, the approximated and simplified power , and π a (n, α, . These expressions give rise to the following additional RP-estimators: -RP-estimators for the exact test: ; -RP-estimators for the asymptotic test: .
Finally, the estimators based on the following Noether's power approximation π(n, α, are also considered here.In particular, the estimators are applied to estimate the RP of both the exact and asymptotic WSR tests. Concerning the RP-based version of the WSR test based on the introduced semi-parametric RP-estimators, the following corollary (proven in the Supplementary Material) can be stated: Corollary 4. The decision rules based on the RP-estimators π1 and π1S exactly replicate the exact WSR test Ψ α .Analogously, the decision rules based on the RP-estimators πa1 and πaS1 exactly replicate the asymptotic WSR test Ψ α .
Concerning the RP-based decision rules stemming from the remaining semi-parametric RP-estimators (i.e., π2 , π2S , πa2 , πaS2 , πN1 and πN2 ), they do not replicate the exact/asymptotic WSR tests, and their disagreement probabilities will be evaluated in Section 4.3.
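To make the replication property of Corollary 4 concrete for the asymptotic test, the sketch below builds a simplified (null-variance) RP-estimate from sample-proportion estimates of p and p_1. Everything here is an illustration under assumptions: p = P(Z > 0) and p_1 = P(Z + Z' > 0) with Z = X − θ_0, E[W] = n p + n(n − 1) p_1 / 2, no ties or zeros in the data, and hypothetical function names. With these choices the estimated mean coincides with the observed W, so the RP-estimate exceeds 1/2 exactly when W exceeds the normal-approximation threshold.

```python
from math import sqrt
from statistics import NormalDist

def wsr_statistic(x, theta0=0.0):
    """W as the number of positive Walsh sums Z_i + Z_j, i <= j; without ties
    or zeros this equals the sum of the ranks of |Z_i| over the positive Z_i."""
    z = [xi - theta0 for xi in x]
    n = len(z)
    return sum(1 for i in range(n) for j in range(i, n) if z[i] + z[j] > 0)

def estimated_mean(x, theta0=0.0):
    """e(p_hat, p1_hat) = n * p_hat + n(n-1)/2 * p1_hat with sample proportions."""
    z = [xi - theta0 for xi in x]
    n = len(z)
    p_hat = sum(1 for zi in z if zi > 0) / n
    pairs = n * (n - 1) / 2
    p1_hat = sum(1 for i in range(n) for j in range(i + 1, n)
                 if z[i] + z[j] > 0) / pairs
    return n * p_hat + pairs * p1_hat

def rp_simplified_asymptotic(x, alpha=0.05, theta0=0.0):
    """Simplified RP-estimate: normal power approximation with null variance."""
    n = len(x)
    mu0 = n * (n + 1) / 4
    sd0 = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((estimated_mean(x, theta0) - mu0) / sd0 - z_crit)

def asymptotic_wsr_rejects(x, alpha=0.05, theta0=0.0):
    n = len(x)
    threshold = n * (n + 1) / 4 + NormalDist().inv_cdf(1 - alpha) * sqrt(
        n * (n + 1) * (2 * n + 1) / 24)
    return wsr_statistic(x, theta0) > threshold

sample = [1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 1.9, 0.6, -0.2, 1.1]
rp_hat = rp_simplified_asymptotic(sample)
```

The estimators that use other estimates of the mean or a sample-based variance do not share this exact equivalence, which is consistent with the disagreement rates discussed in Section 4.3.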

Non-Parametric RP-Estimation and Testing for the WSR Test
As explained in Section 2, the RP of the exact and asymptotic WSR tests can be estimated by using (5) and (6), respectively. Here, we consider the non-parametric RP-estimators $\hat\pi^{PI}_5$, $\hat\pi^{PI}_{10}$, $\hat\pi^{PI}_{20}$, $\hat\pi^{PI}_{a5}$, $\hat\pi^{PI}_{a10}$ and $\hat\pi^{PI}_{a20}$. The first three estimators coincide with (5) with B = 500, B = 1000 and B = 2000, respectively. The last three estimators coincide with (6) with B = 500, B = 1000 and B = 2000, respectively. As mentioned above, the RP-based decision rules based on these estimators do not replicate the exact and asymptotic WSR tests, respectively, and their disagreement probabilities will be evaluated in Section 4.3.

Evaluating the Performances of the RP-Estimators for the WSR Test
In order to evaluate the performances of the RP-estimators introduced above for the exact and asymptotic WSR tests, a simulation study is carried out. The scenarios considered in the simulation study regard the testing problem $H_0: \theta_t \le 0$ vs. $H_1: \theta_t > 0$ with α = 0.05. The considered sample sizes are n = 15, 30, 60, 120, 240. Data are drawn from the normal distribution with unit variance and mean (median) $\theta_t$ and from the shifted Cauchy distribution with median $\theta_t$. For each of the considered sample sizes and distributions (normal or Cauchy), 19 values for $\theta_t$ have been considered. These values have been obtained by simulation and have been chosen in order to provide the following prefixed values for the power of the exact/asymptotic test: (α, 0.1, 0.15, 0.20, 0.25, ..., 0.85, 0.9, 0.95). In each simulation, $10^4$ replications are considered.
The results of the simulation study are summarized in Tables S3 and S4 in the Supplementary Material, where the averages (computed over the 19 different values of $\theta_t$) of the simulated MSE, simulated bias and disagreement rate are provided. Here, in Table 2, only the simulated MSE and disagreement rate related to the Cauchy distribution are provided. Note that the disagreement between the exact Wilcoxon signed rank test and its approximated version is often higher than 2% with n = 15 (up to 2.5%) and can reach 0.8% with n = 30 (see Table S2 in the Supplementary Material). By contrast, with n = 15, the averaged disagreement between the classical tests and their RP-based versions exceeds 2% for only two estimators, whereas for some estimators, no disagreement is shown; with n = 30, some RP estimators provide a disagreement between the classical test and the RP-based one that is slightly higher than 1%, but no disagreement is shown for two of them.
Regarding RP estimation, the estimators that globally have the lowest MSE are πaS2 for the approximated test and πS2 for the exact test.However, these estimators do not exactly replicate the corresponding classical test.By considering both the estimation performance and the disagreement probability, we suggest using the estimators πaS1 for the approximated test and πS1 for the exact test, since their MSE is very similar to the ones of πaS2 and πS2 , but their disagreement probability is null.As a final remark, note the good performance of the non-parametric plug-in estimators, which is not far from the one of the semi-parametric ones, even if they are not ad hoc estimators.

RP-Estimation and Testing for the Kendall Test of Monotonic Association
Let $(X, Y)$ be a bivariate continuous random variable with joint distribution $F_t$ and margins ${}_tF_X$ and ${}_tF_Y$. Let $(X, Y)_n = \{(X_i, Y_i), i = 1, \dots, n\}$ be a random sample drawn from $F_t$. To test the presence of positive or negative monotone association between $X$ and $Y$, the Kendall test can be adopted. Without loss of generality, consider the alternative hypothesis of positive monotone association. In that case, the testing problem of interest is $H_0: \tau_t \le 0$ vs. $H_1: \tau_t > 0$, where $\tau_t$ is the Kendall rank-correlation coefficient, which, under the assumption of absolute continuity of $F_t$, is defined as the difference between the probability of concordance $p_1$ and the probability of discordance $1 - p_1$: $\tau_t = p_1 - (1 - p_1) = 2 p_1 - 1$, where $p_1 = P[(X - X')(Y - Y') > 0]$. As for the WSR test, the Kendall test falls under Case (B). The exact distribution of $\hat\tau$ can be derived, under $H_0$, by enumeration or by using a recurrence relation (see [21]), but generally, it can only be approximated through a central limit theorem under $H_1$. In particular, it is well known (see [22]) that $(\hat\tau - E_{F_t}[\hat\tau])/\sqrt{Var_{F_t}[\hat\tau]}$ converges in distribution to $N(0,1)$, where:
$$E_{F_t}[\hat\tau] = \tau_t \tag{12}$$
and
$$Var_{F_t}[\hat\tau] = u(\tau_t, p_2) = \frac{2}{n(n-1)}\left[1 - \tau_t^2 + 2(n-2)(2 p_2 - 1 - \tau_t^2)\right], \tag{13}$$
with
$$p_2 = P[(X - X')(Y - Y')(X - X'')(Y - Y'') > 0], \tag{14}$$
being $(X', Y')$ and $(X'', Y'')$ i.i.d. as $(X, Y)$. Note that, under $H_0$, $p_2 = \frac{5}{9}$, and consequently: $Var_0[\hat\tau] = u\left(0, \frac{5}{9}\right) = \frac{2(2n+5)}{9n(n-1)}$. These results allow the introduction of the exact and asymptotic Kendall tests, which reject $H_0$ when $\hat\tau > t_\alpha$ and when $\hat\tau > \tilde t_\alpha$, respectively, and accept it otherwise, where $t_\alpha$ denotes the $(1-\alpha)$-quantile of the exact null distribution of $\hat\tau$ and $\tilde t_\alpha = z_{1-\alpha}\sqrt{\frac{2(2n+5)}{9n(n-1)}}$. Note that the computational burden necessary to compute the exact null distribution of $\hat\tau$ increases rapidly with $n$. From a practical point of view, the exact test can be performed, if n < 9, by computing the exact $(1-\alpha)$-quantile from the null distribution of $\hat\tau$ using, for example, the function qKendall of the package SuppDists [24] of the software R [23]. If n > 9, the asymptotic test $\tilde\Psi_\alpha$ is generally performed, or an Edgeworth expansion (see [25]) is used to obtain a better approximation of $t_\alpha$. The $(1-\alpha)$-quantile from the null distribution of $\hat\tau$ approximated by the Edgeworth expansion is also computed by qKendall. When n > 9, it is common practice to refer to the test based on the Edgeworth expansion as the "exact" Kendall test, even if it is actually an approximated test. From here onwards, this commonly used terminology will be adopted.
Obviously, the exact and asymptotic tests are not equivalent, and their disagreement is evaluated, again, using Expression (4).In Table S5 of the Supplementary Material, the probabilities of disagreement between the asymptotic and exact (based on Edgeworth expansion) tests are computed by fixing α = 0.05 for some values of n and τ t when sampling from the bivariate normal distribution with correlation coefficient ρ and from the bivariate Student's t distribution with three degrees of freedom (df) and correlation coefficient ρ.

Semi-Parametric RP-Estimation and Testing for the Kendall Test
The exact power function of the Kendall test cannot be generally determined, but it can be approximated thanks to the asymptotic normality of τ.In particular, the approximation of the power function of the exact test Analogously, the approximation of the power function of the asymptotic test . Now, in order to define semi-parametric RP-estimators starting from the approximated power function reported above, it is necessary to derive estimators for E F t [ τ] and Var F t [ τ].From Expressions ( 12) and (13), it follows that E F t [ τ] can be estimated by τ, while an estimator for Var F t [ τ] can be introduced once an estimator for p 2 has been defined.Two different estimators for p 2 are considered here: • Analogic estimators: Remembering Expression ( 14), the analogic estimator for p 2 results: p2 = ∑ n k=1 I ijk where otherwise .
• Plug-in estimators: In order to introduce these estimators, the following alternative expression for $p_2$ is useful: Now, let $F_{nX}$, $F_{nY}$ and $F_n$ be the ecdfs of $X$, $Y$ and $(X, Y)$, respectively. The plug-in estimator for $p_2$ results: $\tilde p_2$ =
Now, the following RP-estimators for the exact test can be introduced: $\hat\pi_1$, where $\hat U = u(\hat\tau, \hat p_2)$, $\tilde U = u(\hat\tau, \tilde p_2)$. Analogously, the following RP-estimators for the asymptotic test can be introduced: $\hat\pi_{a1}$
As for the WSR test, the approximated power of nonparametric tests can be simplified following Noether's idea by assuming that the variance of the test statistic is close to the value it assumes under $H_0$, obtaining the estimators for the exact and asymptotic tests, respectively. Note that the above estimators are very simple, since they do not require the estimation of $p_2$.
Another approach that can be followed in order to introduce an approximation of the power functions $\tilde\pi(n, \alpha, F_t)$ and $\tilde\pi_a(n, \alpha, F_t)$ is described below. From Expression (14), it is clear that $p_2$ is not independent of $\tau_t$, and there is no unique function describing the behavior of $p_2$ as a function of $\tau_t$, since the relation between $p_2$ and $\tau_t$ depends on the entire shape of $F_t$. However, if $\tau_t = \pm 1$, then $p_2 = 1$, while if $\tau_t = 0$, then $p_2 = 5/9$. Then, the relation between $\tau_t$ and $p_2$ can be intuitively represented by the parabola passing through the points $(-1, 1)$, $(0, 5/9)$ and $(1, 1)$: $p_2 = \frac{5}{9} + \frac{4}{9}\tau_t^2$. By substituting this expression into (13), one obtains the RP-estimators for the exact and asymptotic tests, respectively. For completeness, the estimator of [20] is also considered and applied both to the exact and asymptotic tests: Finally, the estimators deduced from the power approximation provided in [2] are considered:

Non-Parametric RP-Estimation and Testing for the Kendall Test
As for the WSR test, the RP of the exact and asymptotic Kendall test can be estimated by using the non-parametric RP-estimators πPI 5 , πPI 10 , πPI 20 , πPI a5 , πPI a10 and πPI a20 .It is recalled once again that the RP-based decision rules based on these estimators do not replicate the exact and asymptotic Kendall tests, respectively, and their disagreement probabilities will be evaluated next.

Evaluating the Performances of the RP-Estimators for the Kendall Test
In order to evaluate the performances of the RP-estimators introduced above for the exact and asymptotic Kendall tests, a simulation study is carried out. The scenarios considered in the simulation study regard the testing problem $H_0: \tau_t \le 0$ vs. $H_1: \tau_t > 0$. The considered sample sizes are n = 15, 30, 60, 120. Data are drawn from the bivariate standard normal distribution with correlation coefficient ρ and from the bivariate Student's t distribution with three df and correlation coefficient ρ. For each of the considered sample sizes and distributions, 19 values for ρ have been considered. These values have been obtained by simulation and have been chosen in order to provide the following prefixed values for the power of the exact/asymptotic test: (α, 0.1, 0.15, 0.20, 0.25, ..., 0.85, 0.9, 0.95). In each simulation, $10^4$ replications are considered. The results of the simulation study are summarized in Tables S6 and S7 of the Supplementary Material. In these tables, the averages (computed over the 19 different values of ρ) of the simulated MSE, simulated bias and disagreement rate are provided. Here (see Table 3), only the simulated MSE and disagreement rate obtained under the bivariate Student's t distribution with three df are reported. As for the binomial and sign tests, the disagreement between the exact Kendall test and its approximated version is quite high: often higher than 5% and in some cases higher than 10%, both with n = 15 and n = 30. The disagreement is still remarkable even with n = 120. On the contrary, the averaged disagreement between the classical asymptotic test and its RP-based version is between 3% and 7% for just three estimators, for each sample size, whereas the other estimators provide a disagreement often lower than 1%. The disagreement between the classical exact test and the RP-based one is slightly higher than in the previous case, but still lower than the disagreement between the classical tests.
Regarding RP estimation, the simulation results suggest that the best estimators for the approximated and exact tests are πa2 and π2 , respectively.Indeed, these two estimators generally have the least MSE and a null probability of disagreement.Also in these cases, the good performance of the general non-parametric plug-in estimators should be noted.
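As an illustration of how a power value can be obtained by simulation for a given ρ, the following minimal sketch estimates the power of a one-sided asymptotic Kendall test under bivariate normal sampling. It is not the code used in the study: the function names, the default number of replications, the seed, and the use of the normal approximation without continuity correction are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def kendall_s(x, y):
    """Kendall's S: number of concordant pairs minus number of discordant pairs."""
    s = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1
            elif prod < 0:
                s -= 1
    return s


def asymptotic_kendall_rejects(x, y, alpha):
    """One-sided test: reject H0 when S / sd(S) exceeds the normal quantile.

    Assumption: no ties, null variance n(n-1)(2n+5)/18, no continuity correction.
    """
    n = len(x)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = kendall_s(x, y) / math.sqrt(var_s)
    return z > NormalDist().inv_cdf(1 - alpha)


def simulated_power(rho, n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the rejection probability for correlation rho."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has unit variance and correlation rho with x
        y = [rho * xi + math.sqrt(1 - rho * rho) * rng.gauss(0, 1) for xi in x]
        rejections += asymptotic_kendall_rejects(x, y, alpha)
    return rejections / reps


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, simulated_power(rho, n=15))
```

Scanning a grid of ρ values with such a function, and keeping those whose estimated power is closest to the prefixed targets, is one possible way to select the 19 values mentioned above.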

Example of Applications
Let us consider the data reported in Table 4 (see [26], p. 38), concerning the Hamilton depression scale factor (HDSF) in nine patients with mixed anxiety and depression, observed at a first visit before the initiation of a therapy (X) and at a second visit after administration of a tranquilizer (Y). An improvement due to the tranquilizer corresponds to a reduction of the HDSF. Six patients out of nine showed a reduction; one was almost invariant; and two gave small increments. The sign test and the WSR test have been applied to evaluate HDSF reduction and the Kendall test to evaluate the association between X and Y.
For each test, the RP estimates given by the best semiparametric estimator (among those studied above) and by the nonparametric πPI 20 are computed at three levels of α: 0.01, 0.05, 0.1 (see Table 5). First, note that RP estimates decrease as tests become stricter, i.e., as α decreases. Second, RP estimates are consistent with the RP-testing rule. Third, RP estimates might differ from one technique to another: the nonparametric technique is not the most reliable, but it is a general one, whereas the best RP estimation technique should be customized for each test.
As concerns the interpretation of the results, RP estimates highlight that significant outcomes are often less reproducible than one may think. For example, when α = 0.05, the significance threshold for the WSR test with n = 9 data is w_0.05 = 36, and the significant result W_ob = 40, although providing a quite small p-value (i.e., 0.02), gave an RP estimate of about 2/3: it is thus estimated that, under the same experimental conditions, about one out of three test replications will not show significance.
On the other hand, non-significant outcomes might be highly variable, and significance can be found with non-negligible probability when replicating the experiment. Continuing the example above, and assuming that α was 0.01, the observed test statistic provides a non-significant p-value of about two times α, but also gives an RP estimate not far from 50%. Finally, we remark that even when p-values are considerably smaller than α, RP estimates may not be high, that is, the test results (viz. significances) are estimated to be quite variable. For example, when p-values are about one order of magnitude smaller than α, RP estimates are still close to 80%.
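The WSR figures quoted in this example (threshold 36 at α = 0.05 and a p-value of about 0.02 for the observed statistic 40, with n = 9) can be checked against the exact null distribution of the signed rank statistic. The sketch below is only illustrative: it assumes no ties and no zero differences, takes W as the sum of the ranks of the positive differences, and adopts the convention "reject H0 when W exceeds the threshold"; the function names are hypothetical.

```python
from fractions import Fraction


def wsr_null_counts(n):
    """counts[s] = number of subsets of the ranks {1, ..., n} whose sum is s.

    Under H0 every subset of ranks is equally likely to carry a positive
    sign, so counts[s] / 2**n is the null probability that W equals s.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # iterate downwards so each rank is used at most once
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def upper_tail(n, w):
    """Exact null probability P(W >= w)."""
    counts = wsr_null_counts(n)
    return Fraction(sum(counts[max(w, 0):]), 2 ** n)


def critical_value(n, alpha):
    """Smallest threshold w such that P(W > w) <= alpha under H0."""
    max_sum = n * (n + 1) // 2
    for w in range(max_sum + 1):
        if upper_tail(n, w + 1) <= alpha:
            return w
    return max_sum


if __name__ == "__main__":
    print(critical_value(9, 0.05))      # 36
    print(float(upper_tail(9, 40)))     # 0.01953125
    print(critical_value(9, 0.01))      # 41
```

Under these assumptions the exact one-sided p-value is 10/512, so the observed value 40 is significant at α = 0.05 but not at α = 0.01, in line with the discussion above.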

Conclusions
Several results have been obtained concerning the precision of both RP estimators and RP-testing in the cases of the binomial, sign, Wilcoxon signed rank and Kendall tests.
For both the binomial and sign tests, the RP-testing rule holds exactly, even when nonparametric estimators of RP are adopted.
In terms of estimation performances, semi-parametric and nonparametric estimators behave similarly.
For the WSR and Kendall tests, the RP-testing rule holds exactly for just some RP estimators, and for the remaining ones, the disagreement is very small. It is worth noting that the disagreement between these two classical exact tests and their respective approximated versions is often higher than the disagreement between the classical tests (exact or approximated) and their RP-based versions.

Table 1.
Possible approaches to compute the power of a test under the different scenarios related to Cases (A) and (B). The cells with gray background represent the possible approaches commonly employed in practice.

Table 2.
Averaged MSE and disagreement rate for the asymptotic and exact Wilcoxon signed rank (WSR) test when sampling from the Cauchy distribution. The averages are computed over the 19 different values of θ considered in the simulation study. The smallest values for the averaged MSE and disagreement are highlighted in bold.

Table 3.
Averaged MSE and disagreement rate for the asymptotic and exact Kendall's test when sampling from the t copula with 3 degrees of freedom. The averages are computed over the 19 different values of ρ considered in the simulation study. The smallest values for the averaged MSE and disagreement are highlighted in bold.

Table 5.
RP estimates for the example data.