Article

A Non-Parametric Sequential Procedure for the Generalized Partition Problem

Department of Mathematics, University of New Orleans, New Orleans, LA 70148, USA
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(4), 591; https://doi.org/10.3390/math12040591
Submission received: 29 December 2023 / Revised: 29 January 2024 / Accepted: 13 February 2024 / Published: 17 February 2024
(This article belongs to the Special Issue Nonparametric Statistical Methods and Their Applications)

Abstract

In selection and ranking, the partitioning of treatments by comparing them to a control treatment is an important statistical problem. For over eighty years, this problem has been investigated by a number of researchers via various statistical designs to specify the partitioning criteria and optimal strategies for data collection. Many researchers have proposed designs in order to generalize formulations known at that time. One such generalization adopted the indifference-zone formulation to designate the region between the boundaries for “good” and “bad” treatments as the indifference zone. Since then, this formulation has been adopted by a number of researchers to study various aspects of the partition problem. In this paper, a non-parametric purely sequential procedure is formulated for the partition problem. The “first-order” asymptotic properties of the proposed non-parametric procedure are derived. The performance of the proposed non-parametric procedure for small and moderate sample sizes is studied via Monte Carlo simulations. An example is provided to illustrate the proposed procedure.

1. Introduction

The statistical problem of comparing treatments with a control population has been an active area of research for nearly eighty years. One of the earliest studies to propose a formal statistical design for comparing treatments with a control is reported in [1]. Soon after this, Ref. [2] investigated this problem for normal means and binomial proportions with an idea of spacing between treatments. Ref. [3] extended this further by exploring the idea of multiple comparisons and formulated a procedure to carry out comparisons with a control population. The idea of spacing was further refined in [4], which formally conceptualized the “indifference zone” formulation for selecting the best normal population from a group of several normally distributed populations in the preference zone with a predetermined probability. In statistical literature, the region outside the indifference zone is referred to as the preference zone. Also in the 1950s, another formulation was proposed in [5] for the problem of selecting or isolating the best population; it did not restrict the selection to the preference zone, but rather carried out the selection over the entire parameter space. This formulation of the problem, known as the “subset-selection formulation”, selects a subset of the populations of random size which includes the best treatment with a prespecified probability. A number of researchers have studied this problem by formulating it under various requirements and goals while adopting various sampling methodologies. One such formulation that has been extensively studied in the literature is the one in which the experimenter wants the selected population to be some “specified amount better” than another treatment, which is referred to as a control or standard. This area of research is typically known as the problem of “comparisons with a control” or the “partition problem” in statistical literature.
For the partition problem formulation, one formulation that has been used by a number of practitioners and researchers is the one introduced in [6] for the populations that follow a normal distribution.
In Section 2, we have summarized the [6] formulation and provided a summary of the current research in the area. In Section 3, we have proposed a distribution-free version of the [6] formulation and proposed a purely sequential methodology and derived its first-order asymptotic properties. In Section 4, we have studied the performance of the proposed non-parametric procedure by picking different values of design constants to study how the asymptotic expansions provided in Theorem 1 compare with the observed values when the procedure is simulated for small and moderate sample sizes. In Section 5, we have provided an example to illustrate an application of the proposed non-parametric purely sequential procedure.

2. Normal Populations Case

Assume that we have $k+1$ independent normal populations, denoted by $\pi_0, \pi_1, \ldots, \pi_k$, with respective means $\mu_0, \mu_1, \ldots, \mu_k$ and a common variance $\sigma^2$. We will assume that all the parameters are unknown. The population $\pi_0$ is referred to as the control or standard population. The formulation presented in [6] starts by mathematically defining the “good” and “bad” populations based on the input from practitioners or experts in the area of the application.
Next, for fixed but arbitrary constants $\delta_1$ and $\delta_2$, with $\delta_2 > \delta_1$, ref. [6] defined the “good” and “bad” populations via three sets by adopting the indifference-zone formulation of [4], as defined below:
$$\begin{aligned} \Omega_B &= \{\pi_i : \mu_i \le \mu_0 + \delta_1,\ i = 1, \ldots, k\}, \\ \Omega_G &= \{\pi_i : \mu_i \ge \mu_0 + \delta_2,\ i = 1, \ldots, k\}, \\ \Omega_I &= \{\pi_i : \mu_0 + \delta_1 < \mu_i < \mu_0 + \delta_2,\ i = 1, \ldots, k\}. \end{aligned} \tag{1}$$
The set $\Omega_G$ is termed the set of “good” populations, while the set $\Omega_B$ is termed the set of “bad” populations. Note that the two constants $\delta_1$ and $\delta_2$ are determined based on the input of experts in the area, specifying how much better or worse a population has to be compared to the control to be termed a good population or a bad population. The goal in [6] was to partition the populations that belong to $\Omega_G$ or $\Omega_B$ correctly with a prespecified probability. On the other hand, the set $\Omega_I$ is termed the indifference-zone set, and the experimenter is indifferent to the correct partition of the populations that fall in $\Omega_I$. The partition problem is designed to partition the set $\Omega = \{\pi_i,\ i = 1, \ldots, k\}$ into two mutually disjoint sets $S_B$ and $S_G$, with high accuracy, so that all populations in $\Omega_B$ fall inside $S_B$ and all populations in $\Omega_G$ fall inside $S_G$. That is, when all the populations in $\Omega_B$ and $\Omega_G$ are partitioned correctly, such a partition is defined as a correct decision (CD). Let us denote by $P^*$ the probability of correct decision that the experimenter wants to achieve. Note that $2^{-k} < P^* < 1$, as the probability of a correct selection made at random is $1/2$ for each of the $k$ populations.
Next, using a sampling design, determine $N$ as the sample size from each of the $k$ populations and the control population, and compute the sample mean $\bar{X}_{iN}$ from $\pi_i$, $i = 0, 1, \ldots, k$. Define $d = (\delta_1 + \delta_2)/2$; then, the decision rule proposed by [6] to partition all the populations in $\Omega$ took the following form:
$$S_B = \{\pi_i : \bar{X}_{iN} - \bar{X}_{0N} < d,\ i = 1, \ldots, k\}, \qquad S_G = \{\pi_i : \bar{X}_{iN} - \bar{X}_{0N} \ge d,\ i = 1, \ldots, k\}. \tag{2}$$
Ref. [6] has shown that if the sample size $N$ satisfies $N \ge 2\sigma^2 b^2 / a^2$, and we partition the $k$ populations according to the partition rule (2), then
$$P(CD) \ge P^*, \quad \forall\, \mu \in \mathbb{R}^{k+1},\ \sigma \in \mathbb{R}^{+}.$$
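As a small illustration of the known-variance case, the sketch below computes the smallest integer sample size satisfying $N \ge 2\sigma^2 b^2/a^2$ and applies the partition rule (2) to a set of sample means. The function names, the example means, and the choice to send ties to the “good” set are assumptions made for this illustration; the value $b = 2.44177$ is the design constant quoted in Section 4 for $k = 8$ and $P^* = 0.95$.

```python
import math

def tong_fixed_sample_size(sigma, b, delta1, delta2):
    """Smallest integer N with N >= 2 * sigma^2 * b^2 / a^2, a = (delta2 - delta1) / 2."""
    a = (delta2 - delta1) / 2.0
    return math.ceil(2.0 * sigma ** 2 * b ** 2 / a ** 2)

def partition_by_means(control_mean, treatment_means, delta1, delta2):
    """Rule (2): population i is 'bad' if its mean difference from the
    control is below d = (delta1 + delta2) / 2, and 'good' otherwise."""
    d = (delta1 + delta2) / 2.0
    bad, good = [], []
    for i, mean in enumerate(treatment_means, start=1):
        (bad if mean - control_mean < d else good).append(i)
    return bad, good

# Hypothetical inputs: sigma known and equal to 1, symmetric delta values.
n_required = tong_fixed_sample_size(sigma=1.0, b=2.44177, delta1=-0.5, delta2=0.5)
bad, good = partition_by_means(0.0, [-0.6, 0.7, 0.1], delta1=-0.5, delta2=0.5)
print(n_required, bad, good)
```

The third hypothetical population has a mean inside the indifference zone, so either classification of it would count as a correct decision.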
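Note that $a = (\delta_2 - \delta_1)/2$, $l = k/2$ when $k$ is even and $l = (k+1)/2$ when $k$ is odd, and the $k \times k$ covariance matrix $\Sigma = (\sigma_{ij})$ is given by
$$\sigma_{ij} = \begin{cases} 1 & \text{when } i = j, \\ 1/2 & \text{when } i \ne j \text{ and } 0 < i, j \le l \text{ or } l < i, j \le k, \\ -1/2 & \text{when } 0 < i \le l \text{ and } l < j \le k, \end{cases}$$
and $b$ is a constant satisfying the integral equation given by
$$P^* = \int_{-\infty}^{b} \cdots \int_{-\infty}^{b} |\Sigma|^{-1/2} (2\pi)^{-k/2} \exp\left(-y' \Sigma^{-1} y / 2\right) dy_1 \cdots dy_k.$$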
Ref. [6] has tabulated the values of the design constant $b$ for various choices of $k$ and $P^*$. For the unknown $\sigma^2$ case, ref. [6] also constructed a two-stage and a purely sequential procedure.
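The integral equation for $b$ can be checked numerically without a $k$-dimensional integration. The sketch below assumes the representation used later in the proof of Theorem 1, $Y_i = (Z_i - Z_0)/\sqrt{2}$ for $i \le l$ and $Y_j = (Z_0 - Z_j)/\sqrt{2}$ for $j > l$ with independent standard normal $Z$'s, which reproduces the covariance structure above with $-1/2$ between the two groups. Conditioning on $Z_0 = z$ reduces the probability to a one-dimensional integral. The function names, the integration range and the step count are choices made for this illustration, not part of [6].

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_all_below(b, k, lo=-8.0, hi=8.0, steps=4000):
    """P(Y_1 <= b, ..., Y_k <= b), computed by conditioning on Z_0 = z:
    integral of phi(z) * Phi(c + z)^l * Phi(c - z)^(k - l), c = sqrt(2) * b."""
    l = k // 2 if k % 2 == 0 else (k + 1) // 2
    c = math.sqrt(2.0) * b
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        z = lo + s * h
        # Composite Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        weight = 1 if s in (0, steps) else (4 if s % 2 else 2)
        total += weight * std_normal_pdf(z) \
            * std_normal_cdf(c + z) ** l * std_normal_cdf(c - z) ** (k - l)
    return total * h / 3.0

def solve_b(p_star, k, lo=0.0, hi=10.0, tol=1e-8):
    """Bisection for the b that gives prob_all_below(b, k) = p_star."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_all_below(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(prob_all_below(2.44177, 8))  # compare with the target P* = 0.95
print(solve_b(0.95, 1))            # for k = 1 this is the 0.95 normal quantile
```

For $k = 1$ the integral collapses to $\Phi(b)$, which gives a quick sanity check of the conditioning argument.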
For the normal distributions case, ref. [7] constructed several multistage methodologies focusing on the second-order asymptotic expansions. For references on the partition problem for binomial treatments, the reader is referred to [8]. In [9], a generalization of Tong's formulation was introduced so that the treatments that fall between the “good” and “bad” treatments can be partitioned as a separately identifiable group by introducing two indifference zones. Ref. [10] extended this generalization by constructing an asymptotically unbiased fine-tuned purely sequential procedure to guarantee the probability requirement.
Next, we construct a non-parametric procedure to partition the $k$ populations compared to a control population that does not require the populations to be normally distributed. However, we assume that the unknown distributions are symmetric. In Section 3, we propose a distribution-free version of the formulation in [6], propose a purely sequential methodology, and derive its first-order asymptotic properties.

3. Non-Parametric Partition Problem

Assume that we are given $(k+1)$ independent populations $\pi_0, \pi_1, \pi_2, \ldots, \pi_k$, where the control population is denoted as $\pi_0$. Assume that the cumulative distribution function (cdf) of $\pi_i$ is $F(x - \Delta_i)$ for $i = 0, 1, \ldots, k$. We will assume the cdf $F(\cdot)$ is continuous and symmetric. Note that the function $F(\cdot)$ and all the centers of symmetry, namely, $\Delta_0, \Delta_1, \ldots, \Delta_k$, are assumed to be unknown. Following [6], we define below what an experimenter may regard as “good” and “bad” populations compared to a control, based on the input from experts in the area of application. As in Section 2 for the normal populations, we will partition all $k$ populations by comparing the centers of symmetry $\Delta_i$, $i = 1, \ldots, k$, with the control population's center of symmetry $\Delta_0$ to define the sets of “good” and “bad” populations, with a probability of correct decision $(CD)$ of at least $P^*$. As before, $2^{-k} < P^* < 1$.
Based on the input from experts in the area, the statistical design starts by selecting two arbitrary but fixed design constants, $\delta_1$ and $\delta_2$, with $\delta_2 > \delta_1$. Next, as in [6], we define three subsets of $\Omega = \{\pi_1, \ldots, \pi_k\}$, following the idea of spacing from the indifference-zone formulation of [4], as follows:
$$\begin{aligned} \Omega_L &= \{\pi_i : \Delta_i \le \Delta_0 + \delta_1,\ i = 1, \ldots, k\}, \\ \Omega_R &= \{\pi_i : \Delta_i \ge \Delta_0 + \delta_2,\ i = 1, \ldots, k\}, \\ \Omega_I &= \{\pi_i : \Delta_0 + \delta_1 < \Delta_i < \Delta_0 + \delta_2,\ i = 1, \ldots, k\}. \end{aligned} \tag{5}$$
Note that $\Omega_R$ and $\Omega_L$ are the sets of “good” populations and “bad” populations, respectively, whereas $\Omega_I$ is the set of populations the experimenter would be indifferent to. We define two constants based on $\delta_1$ and $\delta_2$ as $d = (\delta_1 + \delta_2)/2$ and $\delta^* = (\delta_2 - \delta_1)/2$. Let $\Lambda$ denote a class of symmetric and continuous distributions which satisfy some regularity conditions to be specified below. Next, we propose a purely sequential procedure for the partition problem described in (5). The procedure starts with an initial sample of $m \ge 2$ observations from each of the $(k+1)$ populations. Next, implementing the “vector-at-a-time” sampling procedure, we sample one observation at a time from each of the $(k+1)$ populations according to the stopping rule defined below in (7). Having recorded an independent sample $X_{i1}, X_{i2}, \ldots, X_{in}$ of size $n$ from $\pi_i$, $i = 0, 1, \ldots, k$, a statistic $L_i(n)$, to be defined below, is proposed to estimate the center of symmetry $\Delta_i$, $i = 0, 1, \ldots, k$. The estimator $L_i(n)$ has an asymptotic normal distribution, that is, $N\left(\Delta_i, \frac{1}{nA^2}\right)$ as $n \to \infty$ for $i = 0, 1, \ldots, k$ and $F(\cdot) \in \Lambda$. Note that the unknown constant $A$ is a finite and positive function of $F$. For the literature on non-parametric procedures in the area of selecting the best population, the reader is referred to [11]. One may also refer to [12], who constructed a non-parametric accelerated sequential procedure to select the population with the largest center of symmetry.
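The asymptotic variance $1/(nA^2)$ can be examined by simulation. The sketch below takes $L_i(n)$ to be the Hodges–Lehmann estimator, which is the choice made in the proof of Theorem 1, and estimates $n\,\mathrm{Var}(L(n))$ for standard normal data. For that distribution Section 4 gives $A^2 = 0.9549$, so the value can be compared with $1/A^2 \approx 1.047$. The function names, the sample size, the number of replications and the seed are illustrative assumptions.

```python
import random
import statistics

def hodges_lehmann(sample):
    """Median of the n(n+1)/2 pairwise averages (X_j + X_l)/2 with j <= l."""
    n = len(sample)
    averages = [(sample[j] + sample[l]) / 2.0
                for j in range(n) for l in range(j, n)]
    return statistics.median(averages)

def scaled_variance(n, reps, seed=0):
    """Monte Carlo estimate of n * Var(L(n)) for standard normal samples."""
    rng = random.Random(seed)
    estimates = [hodges_lehmann([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(reps)]
    return n * statistics.variance(estimates)

print(hodges_lehmann([1.0, 2.0, 3.0]))
print(scaled_variance(n=30, reps=400))  # compare with 1 / 0.9549
```

With only a few hundred replications the estimate is noisy, so it should be read as a rough check rather than a precise value.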
Based on a sample of size $n$, the decision rule is to compare each $L_i(n)$ with $L_0(n)$, $i = 1, \ldots, k$, and then partition the $k$ populations following the partition rule given by:
$$P_L = \{\pi_i : L_i(n) - L_0(n) < d,\ i = 1, \ldots, k\}, \qquad P_R = \{\pi_i : L_i(n) - L_0(n) \ge d,\ i = 1, \ldots, k\}. \tag{6}$$
Next, as in [11], we will assume that the following regularity conditions are satisfied by the unknown distribution $F(\cdot)$ and the purely sequential stopping rule, which is implemented to obtain the sample size $N$:
Regularity Conditions: We will assume the following three conditions hold for all $\omega(\delta^*) \subset \Omega$ and $F(\cdot) \in \Lambda$:
  • $n^{1/2}\left(L_i(n) - \Delta_i\right) = A^{-1} Z_{in} + o(1)$ a.s. as $n \to \infty$, where $Z_{in}$ is a standardized average of independent and identically distributed random variables having a finite second moment and $0 < A = A(F) < \infty$.
  • For an estimator $S_n^2$ of $A^{-2}$, as $n \to \infty$, we have $\lim S_n^2 = A^{-2}$ a.s.
  • The set $\{\delta^{*2} N(\delta^*) : \delta^* > 0\}$ is uniformly integrable.
Next, following [7], one can obtain that $P(CD)$ is asymptotically at least $P^*$ if the sample size $n$ is at least $2b^2/(A\delta^*)^2$. Here, $b$ is the constant reported earlier, which is a function of $k$ and $P^*$. Let us denote $n^* = 2b^2/(A\delta^*)^2$. The expression $n^*$ is known as the optimal sample size. However, it is unknown, as $A$ is unknown. Next, to estimate $A$, a purely sequential procedure is constructed which satisfies the correct decision probability requirement and has $\liminf P(CD) \ge P^*$ whenever $\theta \in \omega(\delta^*)$ and the unknown cdf $F(\cdot) \in \Lambda$, as $\delta^* \to 0$. The purely sequential procedure starts with $m$ observations from each population, and it samples one observation at a time from each of the $(k+1)$ populations according to the stopping rule:
$$N = \inf\left\{n \ge m : n \ge \frac{2b^2 S_n^2}{\delta^{*2}}\right\}, \tag{7}$$
where $S_n^2$, which estimates $A^{-2}$, is computed using the control and all $k$ populations. Also, $S_n^2$ depends on the estimator of the center of symmetry $\Delta_i$, $i = 0, 1, \ldots, k$. Next, we present a theorem on the first-order properties of the proposed purely sequential procedure (7).
Theorem 1.
The purely sequential procedure defined in (7), under the assumptions as outlined above, satisfies the following properties for all $F(\cdot) \in \Lambda$ and $\omega(\delta^*) \subset \Omega$:
(i) $N(\delta^*) \to \infty$ monotonically as $\delta^* \to 0$ a.s.
(ii) $E[N(\delta^*)] \to \infty$ as $\delta^* \to 0$.
(iii) $\lim \delta^{*2} N(\delta^*) = \dfrac{2b^2}{A^2}$ a.s.
(iv) $\liminf P(CD) \ge P^*$ as $\delta^* \to 0$.
Proof. 
We start with the estimators. Based on a sample of size $n$, let $L_i(n)$ denote the Hodges–Lehmann estimator for the center of symmetry $\Delta_i$ of the $i$th population, $i = 0, 1, \ldots, k$; that is, the sample median of the $n(n+1)/2$ pairwise averages $(X_{ij} + X_{il})/2$ for $j \le l$; $j, l = 1, \ldots, n$; $i = 0, 1, \ldots, k$. Then, the estimator of $A^{-2}$ is given by
$$S_n^2 = \frac{n}{(k+1) K_\alpha^2} \cdot \frac{1}{4} \sum_{i=0}^{k} \left(W^{i}_{n, a(n)} - W^{i}_{n, b(n)}\right)^2, \tag{8}$$
where $W^{i}_{n,1} \le W^{i}_{n,2} \le \cdots \le W^{i}_{n, n(n+1)/2}$ are the ordered $(X_{ij} + X_{il})/2$ for $1 \le j \le l \le n$ and for $i = 0, 1, \ldots, k$. The sequences $a(n)$ and $b(n)$ are specified as
$$b(n) = \max\left\{1, \left\lfloor \frac{n(n+1)}{4} - K_\alpha \left(\frac{n(n+1)(2n+1)}{24}\right)^{1/2} \right\rfloor\right\}, \qquad a(n) = \frac{n(n+1)}{2} - b(n) + 1,$$
where $\lfloor x \rfloor$ is defined as the largest integer less than or equal to $x$, and $K_\alpha$ is defined by $\Phi(K_\alpha) = 1 - \alpha$ for some $1/2 < \alpha < 1$. The Hodges–Lehmann estimator has been used extensively in statistical literature, and it is well known that $L_i(n)$ is a consistent estimator of the center of symmetry. The reader is referred to [13] for details.
Next, note that $N(\delta_1^*) \ge N(\delta_2^*)$ w.p. 1 if $0 < \delta_1^* < \delta_2^*$; that is, $N(\delta^*)$ is non-decreasing as $\delta^*$ decreases. Now, assumption 1.1 of [13] in the regularity conditions leads to part (i). Part (ii) follows by applying the monotone convergence theorem. Since the stopping rule is
$$N(\delta^*) = \inf\left\{n \ge m : n \ge \frac{2b^2 S_n^2}{\delta^{*2}}\right\},$$
the basic inequality simplifies to
$$\frac{2b^2 S_N^2}{\delta^{*2}} \le N \le m + \frac{2b^2 S_{N-1}^2}{\delta^{*2}}. \tag{10}$$
Now, multiply by $\delta^{*2}$ throughout (10) and take limits as $\delta^* \to 0$; this leads to part (iii). For the population $\pi_i$, the statistic $L_i(N)$ is proposed to estimate $\Delta_i$. For $\theta \in \Omega(\delta^*)$, we have
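$$\begin{aligned} P\left(CD \mid \theta \in \Omega(\delta^*)\right) &= P\left\{L_i(N) - L_0(N) < d,\ 0 < i \le r;\ L_j(N) - L_0(N) \ge d,\ r < j \le k\right\} \\ &= P\Big\{\left[(L_i(N) - \Delta_i) - (L_0(N) - \Delta_0)\right] \tfrac{\sqrt{n^*} A}{\sqrt{2}} < \left[d - (\Delta_i - \Delta_0)\right] \tfrac{\sqrt{n^*} A}{\sqrt{2}},\ 0 < i \le r; \\ &\qquad\ \left[(L_j(N) - \Delta_j) - (L_0(N) - \Delta_0)\right] \tfrac{\sqrt{n^*} A}{\sqrt{2}} \ge \left[d - (\Delta_j - \Delta_0)\right] \tfrac{\sqrt{n^*} A}{\sqrt{2}},\ r < j \le k\Big\} \\ &= P\left\{\frac{Z_i - Z_0}{\sqrt{2}} < \frac{\sqrt{n^*} A \delta^*}{\sqrt{2}},\ 0 < i \le r;\ \frac{Z_j - Z_0}{\sqrt{2}} \ge -\frac{\sqrt{n^*} A \delta^*}{\sqrt{2}},\ r < j \le k\right\} \\ &= P\left\{Y_i(N) \le \frac{\sqrt{n^*} A \delta^*}{\sqrt{2}},\ i = 1, \ldots, k\right\}, \end{aligned}$$
where
$$Z_i(N) = \sqrt{n^*}\, A \left(L_i(N) - \Delta_i\right)$$
for $i = 0, 1, \ldots, k$,
$$Y_i(N) = \frac{Z_i(N) - Z_0(N)}{\sqrt{2}}, \qquad Y_j(N) = \frac{Z_0(N) - Z_j(N)}{\sqrt{2}}$$
for $0 < i \le r$, $r < j \le k$. If we define the $(k \times k)$ covariance matrix $\Sigma_r = (\sigma_{ij})$ by
$$\sigma_{ij} = \begin{cases} 1, & \text{for } i = j; \\ \tfrac{1}{2}, & \text{for } i \ne j \text{ with } 0 < i, j \le r \text{ or } r < i, j \le k; \\ -\tfrac{1}{2}, & \text{for } 0 < i \le r \text{ and } r < j \le k, \end{cases}$$
then
$$P\left(CD \mid \theta \in \Omega(\delta^*)\right) = \int_{-\infty}^{\sqrt{n^*} A \delta^* / \sqrt{2}} \cdots \int_{-\infty}^{\sqrt{n^*} A \delta^* / \sqrt{2}} (2\pi)^{-k/2} |\Sigma_r|^{-1/2} \exp\left(-\tfrac{1}{2} y' \Sigma_r^{-1} y\right) \prod_{i=1}^{k} dy_i. \tag{12}$$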
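Equation (12) gives the infimum of $P(CD)$ over the set of all configurations with $r$ populations from $\Omega_L$ (bad populations) and $k - r$ populations from $\Omega_R$ (good populations). The right side of (12) achieves a minimum over all $r$ $(0 < r \le k)$ under the least favorable configuration (LFC). Let $b = b(P^*, k)$ be the solution of the equation
$$P^* = \int_{-\infty}^{b} \cdots \int_{-\infty}^{b} (2\pi)^{-k/2} |\Sigma_k|^{-1/2} \exp\left(-\tfrac{1}{2} y' \Sigma_k^{-1} y\right) \prod_{i=1}^{k} dy_i.$$
Also, for any real number $c$ and positive integer $q$, let
$$P_q(c) = \int_{-\infty}^{c} \cdots \int_{-\infty}^{c} (2\pi)^{-q/2} |\Sigma_q|^{-1/2} \exp\left(-\tfrac{1}{2} y' \Sigma_q^{-1} y\right) \prod_{i=1}^{q} dy_i,$$
where the $q \times q$ covariance matrix $\Sigma_q = (\sigma_{ij})$ is such that
$$\sigma_{ij} = \begin{cases} 1, & \text{for } i = j; \\ \tfrac{1}{2}, & \text{for } i \ne j. \end{cases}$$
Define the events
$$A = \{Y_i \le b,\ i = 1, \ldots, r\}, \qquad B = \{Y_i \le b,\ i = r + 1, \ldots, k\};$$
then
$$P_r(b) + P_{k-r}(b) = 1 + P^*,$$
which leads to
$$P(A \cap B) = P\{Y_i(N) \le b,\ i = 1, \ldots, k\} = P\left(CD \mid \theta \in \Omega(\delta^*)\right) \ge P^*,$$
i.e., $\liminf P(CD) \ge P^*$, which is part (iv). This completes the proof of the theorem.  □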
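The pieces of the procedure (the estimator (8), the stopping rule (7), and the partition rule (6)) can be put together in a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `k_alpha` is passed directly as a positive constant rather than derived from $\alpha$, `max_n` is a safety cap added for the example, and the simulated centers and design constants are arbitrary.

```python
import math
import random
import statistics

def walsh_averages(sample):
    """Sorted pairwise averages (X_j + X_l)/2 for j <= l."""
    n = len(sample)
    return sorted((sample[j] + sample[l]) / 2.0
                  for j in range(n) for l in range(j, n))

def s2_estimate(samples, k_alpha):
    """Pooled estimator (8) from the order statistics W_{n,a(n)}, W_{n,b(n)}."""
    n = len(samples[0])
    b_n = max(1, math.floor(n * (n + 1) / 4.0
              - k_alpha * math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)))
    a_n = n * (n + 1) // 2 - b_n + 1
    total = 0.0
    for sample in samples:
        w = walsh_averages(sample)
        total += (w[a_n - 1] - w[b_n - 1]) ** 2  # order statistics are 1-indexed
    return n / (len(samples) * k_alpha ** 2) * total / 4.0

def sequential_partition(draw_vector, b, delta1, delta2, m, k_alpha, max_n=10_000):
    """Sample one vector at a time until n >= 2 b^2 S_n^2 / delta*^2 (rule (7)),
    then classify each population against the control with rule (6)."""
    d = (delta1 + delta2) / 2.0
    delta_star = (delta2 - delta1) / 2.0
    samples = [list(col) for col in zip(*[draw_vector() for _ in range(m)])]
    while True:
        n = len(samples[0])
        bound = 2.0 * b ** 2 * s2_estimate(samples, k_alpha) / delta_star ** 2
        if n >= bound or n >= max_n:
            break
        for population, x in zip(samples, draw_vector()):
            population.append(x)
    centers = [statistics.median(walsh_averages(s)) for s in samples]
    bad = [i for i in range(1, len(samples)) if centers[i] - centers[0] < d]
    good = [i for i in range(1, len(samples)) if centers[i] - centers[0] >= d]
    return n, bad, good

# Hypothetical setup: control at 0, one clearly bad and one clearly good population.
rng = random.Random(1)
true_centers = [0.0, -5.0, 5.0]
n_stop, bad, good = sequential_partition(
    lambda: [rng.gauss(c, 1.0) for c in true_centers],
    b=2.0, delta1=-1.0, delta2=1.0, m=5, k_alpha=1.0)
print(n_stop, bad, good)
```

Recomputing all pairwise averages at every step keeps the sketch short; for large sample sizes an incremental update of the sorted averages would be the more economical design.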

4. Monte Carlo Simulation Results

In this section, using a Monte Carlo simulation study, the purely sequential procedure (7) is replicated independently 5000 times for different values of the design constants to study how the asymptotic expansions provided in Theorem 1 compare with the observed values when the procedure is simulated for small and moderate sample sizes. In our simulation study, we considered $k = 8$ independent populations and one control population. To construct the LFC, we generated four populations with the center of symmetry equal to $\mu_0 - \delta$, and the remaining four populations were generated to have the center of symmetry $\mu_0 + \delta$. The control population was generated to have the center of symmetry $\mu_0$. Without loss of generality, we set $\mu_0 = 0$. For $k = 8$ and $P^* = 0.95$, the value of the constant $b$ equals 2.44177 from [6]. Next, we considered the following symmetric distributions: normal distribution, Laplace distribution, t-distribution, uniform distribution, and a mixture of two normal distributions. For these distributions, the parameter $A^2$ is given by
$$A^2 = 12 \left(\int f^2(x)\, dx\right)^2,$$
where $f(x)$ is the density function of the normal distribution, Laplace distribution, t-distribution, uniform distribution, or mixture of two normal distributions, respectively. In our simulations, we used $\text{Normal}(0, 1)$, the Laplace distribution with $\mu = 0$ and scale $b = \sqrt{2}/2$, the t-distribution with $df = 5$, $U(-1, 1)$, and two mixed normal distributions: $0.35\, N(x_1; 0, 1) + 0.65\, N(x_2; 0, 2)$ and $0.8\, N(x_1; 0, 1) + 0.2\, N(x_2; 0, 5)$.
$$\begin{aligned} A^2_{Normal} &= 12 \left(\int_{-\infty}^{+\infty} \left(\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\right)^2 dx\right)^2 = 12 \left(\int_{-\infty}^{+\infty} \frac{1}{2\pi} e^{-x^2}\, dx\right)^2 = 0.9549 \\ A^2_{Laplace} &= 12 \left(\int_{-\infty}^{+\infty} \left(\frac{1}{\sqrt{2}} e^{-\sqrt{2}|x|}\right)^2 dx\right)^2 = 12 \left(\int_{-\infty}^{+\infty} \frac{1}{2} e^{-2\sqrt{2}|x|}\, dx\right)^2 = 1.5 \\ A^2_{Uniform} &= 12 \left(\int_{-1}^{1} \left(\frac{1}{b - a}\right)^2 dx\right)^2 = 12 \left(\int_{-1}^{1} \left(\frac{1}{2}\right)^2 dx\right)^2 = 3 \\ A^2_{t} &= 12 \left(\int_{-\infty}^{+\infty} \left(\frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\, \Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{x^2}{v}\right)^{-\frac{v+1}{2}}\right)^2 dx\right)^2 \Bigg|_{v = 5} = 0.7447 \\ A^2_{Mixed1} &= 12 \left(\int_{-\infty}^{+\infty} \left(0.35\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2} + 0.65\, \frac{1}{2\sqrt{2\pi}} e^{-x^2/(2 \cdot 2^2)}\right)^2 dx\right)^2 = 0.3689 \\ A^2_{Mixed2} &= 12 \left(\int_{-\infty}^{+\infty} \left(0.80\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2} + 0.20\, \frac{1}{5\sqrt{2\pi}} e^{-x^2/(2 \cdot 5^2)}\right)^2 dx\right)^2 = 0.5183 \end{aligned}$$
After obtaining the value of $A^2$ for each distribution, the value of $\delta$ was determined by $\delta = \sqrt{2b^2/(n^* A^2)}$. The values of $n^*$ which we selected were 50, 100, 200, 400, and 800. For each value of $n^*$, the corresponding value of $\delta$ was obtained, and those values are summarized in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6. As described earlier, the estimator $S_n^2$ in (8) is used to estimate the unknown quantity $A^{-2}$. Note that the purely sequential rule does not rely upon the knowledge of $A^2$. Next, we generated data from the normal distribution with $\sigma = 1$, the Laplace distribution with $\lambda = \sqrt{2}/2$, the t-distribution with $df = 5$, the uniform distribution, and two mixed normal distributions given by $0.35\, N(x_1; 0, 1) + 0.65\, N(x_2; 0, 2)$ and $0.8\, N(x_1; 0, 1) + 0.2\, N(x_2; 0, 5)$, respectively. Note that the Hodges–Lehmann estimator holds for $1/2 < \alpha < 1$. In the simulations, we considered several possible choices of $\alpha$ and studied the impact of $\alpha$ on the estimation of $A^2$. The simulation results are reported in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6.
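The values of $A^2$ and $\delta$ can be reproduced approximately with a one-dimensional numerical integration. The sketch below uses a composite Simpson rule on $[0, 60]$ and doubles the result, which relies on each density being symmetric about zero. The function names, the integration range and the step count are choices made for this illustration. It also assumes that the second parameter of each mixture component is a standard deviation, which is how the displayed formulas for the two mixtures read.

```python
import math

def simpson(f, lo, hi, steps):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * f(lo + s * h)
    return total * h / 3.0

def a_squared(pdf, upper=60.0, steps=60_000):
    """A^2 = 12 * (integral of f^2)^2 for a density symmetric about 0."""
    integral = 2.0 * simpson(lambda x: pdf(x) ** 2, 0.0, upper, steps)
    return 12.0 * integral ** 2

def normal_pdf(x, sd=1.0):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def laplace_pdf(x, scale=math.sqrt(2.0) / 2.0):
    return math.exp(-abs(x) / scale) / (2.0 * scale)

def t_pdf(x, df=5):
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def mixture_pdf(x, p, sd2):
    """p * N(0, 1) + (1 - p) * N(0, sd2^2); sd2 is a standard deviation."""
    return p * normal_pdf(x, 1.0) + (1.0 - p) * normal_pdf(x, sd2)

def delta_for(n_star, a2, b=2.44177):
    """delta = sqrt(2 b^2 / (n* A^2)), with b for k = 8 and P* = 0.95."""
    return math.sqrt(2.0 * b ** 2 / (n_star * a2))

a2 = {
    "normal": a_squared(normal_pdf),
    "laplace": a_squared(laplace_pdf),
    "t5": a_squared(t_pdf),
    "uniform(-1,1)": 12.0 * 0.5 ** 2,  # closed form: integral of f^2 is 1/2
    "mixture 0.35/0.65": a_squared(lambda x: mixture_pdf(x, 0.35, 2.0)),
    "mixture 0.80/0.20": a_squared(lambda x: mixture_pdf(x, 0.80, 5.0)),
}
for name, value in a2.items():
    print(name, round(value, 4), round(delta_for(100, value), 3))
```

The printed $\delta$ column corresponds to $n^* = 100$ and can be compared with the matching rows of Tables 1 to 6.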
From Table 1 and Table 2, note that the purely sequential procedure (7) oversamples by roughly two to three observations when the population is normally distributed and by just below 10 observations for the Laplace distribution. Also, note that the estimated probability of correct selection is below the target value of 0.95 for the normal case. However, for the Laplace distribution, the estimated probability of correct selection meets the target value of 0.95. This feature of the statistical estimation should not come as a surprise. The Hodges–Lehmann estimator is more appropriate when the distribution has tails longer than normal distribution tails. That is, when the distribution is close to normal, the partition procedures designed for normally distributed populations, such as the ones described in [7], are more appropriate. However, if the tails are significantly longer than the normal tails, as for the Laplace distribution, then the non-parametric partition procedures are more appropriate.
In Table 3, the underlying distribution is the t-distribution with 5 degrees of freedom. The distribution has tails longer than a normal distribution but shorter than the Laplace distribution. Note that the estimated probability of correct selection is somewhat below the target value of 0.95 for smaller values of $\alpha$. However, as $\alpha$ increases, the estimated probability of correct selection approaches the target value of 0.95.
Next, we considered the uniform distribution case (Table 4), which has tails even shorter than the normal tails. One will note that the estimated probability of correct selection is well below the target value of 0.95. This feature is again along the lines of comments made earlier in this section about the Hodges–Lehmann estimator being more appropriate when the distribution has tails longer than normal distribution tails. Next, we considered mixtures of two normal populations. In the first case (Table 5), we considered $0.35\, N(x_1; 0, 1) + 0.65\, N(x_2; 0, 2)$, which is a mixture of two normal populations with somewhat long tails. The first component of the mixture has a standard deviation of 1, and the second has a standard deviation of 2. In the second mixture of two normal populations (Table 6), we have $0.8\, N(x_1; 0, 1) + 0.2\, N(x_2; 0, 5)$. This second mixture again has two normal components, but the two standard deviations, 1 and 5, are farther apart. These two mixtures are symmetric, with heavier tails than a single normal distribution. Table 5 and Table 6 again exhibit the same behavior: the longer the tails, the better the performance of the Hodges–Lehmann estimator.

5. An Example

In this section, we study the performance of the non-parametric sequential procedure on a real-world dataset. Ref. [14] conducted a pilot investigation to see if active exercise can preserve walking beyond the 2nd month. In this experiment, newborn children were randomly placed into one of four treatment groups: (1) active exercise group; (2) passive exercise group; (3) no exercise group (these were observed weekly); and (4) control group (observed once after 8 weeks). Traditionally, 12 months has been taken as the mean age at which infants begin to walk. The statistical analysis confirmed that the walking data are normally distributed with roughly equal variances. We adopted a 12.5% improvement as significant and anything below 8% as not significant. We took $\delta_1 = 1.5$ months, $\delta_2 = 1.0$ months, $k = 3$, and the starting sample size $m = 5$. The data were analyzed via the following three procedures: (1) the two-stage procedure of [6]; (2) the purely sequential procedure of [7]; (3) the non-parametric sequential procedure proposed in this manuscript. Additional samples as needed were generated via simple random sampling with replacement (SRSWR) and saved so that all the procedures used the same data. Note that all three sampling methodologies yielded the same result: the active exercise group was partitioned as better than the control, while the passive and no exercise groups were partitioned as bad compared to the control, since the improvement was lower than 8%. The sample sizes for these procedures are reported in Table 7. One will note that the sample size was somewhat larger for the non-parametric sequential procedure, and it increased further when the parameter $\alpha$ was increased. However, this was quite expected, since the data are normally distributed in this case, and the procedures based on the normal distribution assumption are bound to perform better. Note that, from the simulations, the true advantage of the non-parametric procedure arises when the data are not normal and have long tails.

Author Contributions

Conceptualization, T.K.S.S. and J.Z.; formal analysis, T.K.S.S. and J.Z.; methodology, T.K.S.S. and J.Z.; writing—review and editing, T.K.S.S. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the editor and two referees for their invaluable feedback.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Roessler, E.B. Testing the significance of observations compared with a control. Proc. Am. Soc. Hortic. Sci. 1946, 47, 249–251.
  2. Paulson, E. On the comparison of several experimental categories with a control. Ann. Math. Stat. 1952, 23, 239–246.
  3. Dunnett, C.W. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 1955, 50, 1096–1121.
  4. Bechhofer, R.E. A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Stat. 1954, 25, 16–39.
  5. Gupta, S.S. On a Decision Rule for a Problem in Ranking Means. Ph.D. Thesis, University of North Carolina, Chapel Hill, NC, USA, 1956.
  6. Tong, Y.L. On partitioning a set of normal populations by their locations with respect to a control. Ann. Math. Stat. 1969, 40, 1300–1324.
  7. Datta, S.; Mukhopadhyay, N. Second-order asymptotics for multistage methodologies in partitioning a set of normal populations having a common unknown variance. Stat. Decis. 1998, 16, 191–205.
  8. Buzaianu, E.M. Selection among Bernoulli populations in comparison with a standard. Seq. Anal. 2019, 38, 184–198.
  9. Solanky, T.K.S.; Zhou, J. A generalization of the partition problem. Seq. Anal. 2015, 34, 483–503.
  10. Solanky, T.K.S. Second Order Asymptotics of a Fine-Tuned Purely Sequential Procedure for the Generalized Partition Procedure. Stat. Appl. 2021, 19, 401–415.
  11. Geertsema, J.C. Nonparametric Sequential Procedures for Selecting the Best of K Populations. J. Am. Stat. Assoc. 1972, 67, 614–616.
  12. Mukhopadhyay, N.; Solanky, T.K.S. A nonparametric accelerated sequential procedure for selecting the largest center of symmetry. Nonparametric Stat. 1993, 3, 155–166.
  13. Hodges, J.L.; Lehmann, E.L. Estimation of location based on ranks. Ann. Math. Stat. 1963, 34, 598–611.
  14. Zelazo, P.R.; Zelazo, N.A.; Kolb, S. “Walking” in the Newborn. Science 1972, 176, 314–315.
Table 1. Simulation results for normal distribution with $\sigma = 1$.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.75 | 0.499 | 50 | 52.050 | 0.143 | 0.867 | 0.011 |
| 0.75 | 0.353 | 100 | 102.298 | 0.189 | 0.870 | 0.011 |
| 0.75 | 0.250 | 200 | 202.597 | 0.263 | 0.870 | 0.011 |
| 0.75 | 0.177 | 400 | 402.507 | 0.376 | 0.877 | 0.010 |
| 0.75 | 0.125 | 800 | 803.636 | 0.492 | 0.847 | 0.011 |
| 0.85 | 0.499 | 50 | 52.958 | 0.122 | 0.865 | 0.011 |
| 0.85 | 0.353 | 100 | 103.046 | 0.180 | 0.865 | 0.011 |
| 0.85 | 0.250 | 200 | 203.638 | 0.255 | 0.855 | 0.011 |
| 0.85 | 0.177 | 400 | 403.382 | 0.365 | 0.857 | 0.011 |
Table 2. Simulation results for Laplace distribution with $\lambda = \sqrt{2}/2$.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.75 | 0.399 | 50 | 55.570 | 0.183 | 0.970 | 0.005 |
| 0.75 | 0.282 | 100 | 106.486 | 0.264 | 0.978 | 0.005 |
| 0.75 | 0.199 | 200 | 206.231 | 0.351 | 0.969 | 0.005 |
| 0.75 | 0.141 | 400 | 408.060 | 0.514 | 0.975 | 0.005 |
| 0.75 | 0.099 | 800 | 808.374 | 0.687 | 0.975 | 0.005 |
| 0.85 | 0.399 | 50 | 56.872 | 0.175 | 0.976 | 0.005 |
| 0.85 | 0.282 | 100 | 107.685 | 0.244 | 0.975 | 0.005 |
| 0.85 | 0.199 | 200 | 207.481 | 0.347 | 0.978 | 0.005 |
| 0.85 | 0.141 | 400 | 409.598 | 0.505 | 0.969 | 0.006 |
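Table 3. Simulation results for t-distribution with $df = 5$.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.75 | 0.566 | 50 | 52.981 | 0.159 | 0.896 | 0.010 |
| 0.75 | 0.400 | 100 | 103.358 | 0.224 | 0.898 | 0.010 |
| 0.75 | 0.283 | 200 | 202.923 | 0.269 | 0.893 | 0.010 |
| 0.75 | 0.200 | 400 | 403.129 | 0.423 | 0.901 | 0.009 |
| 0.85 | 0.566 | 50 | 54.494 | 0.147 | 0.901 | 0.009 |
| 0.85 | 0.400 | 100 | 104.488 | 0.209 | 0.909 | 0.009 |
| 0.85 | 0.283 | 200 | 204.676 | 0.293 | 0.913 | 0.009 |
| 0.85 | 0.200 | 400 | 404.660 | 0.413 | 0.918 | 0.009 |
| 0.90 | 0.566 | 50 | 54.605 | 0.144 | 0.928 | 0.008 |
| 0.90 | 0.400 | 100 | 105.242 | 0.213 | 0.893 | 0.010 |
| 0.90 | 0.283 | 200 | 204.816 | 0.280 | 0.913 | 0.009 |
| 0.95 | 0.566 | 50 | 55.769 | 0.135 | 0.929 | 0.008 |
| 0.95 | 0.400 | 100 | 105.988 | 0.208 | 0.912 | 0.009 |
| 0.95 | 0.283 | 200 | 205.799 | 0.279 | 0.926 | 0.008 |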
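Table 4. Simulation results for uniform distribution.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.60 | 0.282 | 50 | 42.792 | 0.564 | 0.487 | 0.016 |
| 0.60 | 0.199 | 100 | 104.732 | 0.409 | 0.599 | 0.016 |
| 0.60 | 0.141 | 200 | 210.747 | 0.236 | 0.621 | 0.015 |
| 0.75 | 0.282 | 50 | 56.769 | 0.117 | 0.641 | 0.015 |
| 0.75 | 0.199 | 100 | 110.106 | 0.129 | 0.64 | 0.015 |
| 0.75 | 0.141 | 200 | 214.045 | 0.175 | 0.62 | 0.015 |
| 0.85 | 0.282 | 50 | 58.122 | 0.094 | 0.653 | 0.015 |
| 0.85 | 0.199 | 100 | 111.698 | 0.114 | 0.610 | 0.015 |
| 0.85 | 0.141 | 200 | 216.071 | 0.146 | 0.604 | 0.015 |
| 0.99 | 0.282 | 50 | 63.737 | 0.070 | 0.719 | 0.014 |
| 0.99 | 0.199 | 100 | 118.374 | 0.089 | 0.648 | 0.015 |
| 0.99 | 0.141 | 200 | 224.796 | 0.119 | 0.654 | 0.015 |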
Table 5. Simulation results for mixture of two normal distributions: $X = 0.35\, N(x_1; 0, 1) + 0.65\, N(x_2; 0, 2)$.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.75 | 0.804 | 50 | 52.859 | 0.162 | 0.903 | 0.009 |
| 0.75 | 0.569 | 100 | 103.243 | 0.213 | 0.905 | 0.009 |
| 0.75 | 0.402 | 200 | 203.962 | 0.303 | 0.911 | 0.009 |
| 0.85 | 0.804 | 50 | 53.685 | 0.140 | 0.916 | 0.007 |
| 0.85 | 0.569 | 100 | 104.216 | 0.216 | 0.926 | 0.008 |
| 0.85 | 0.402 | 200 | 204.205 | 0.285 | 0.912 | 0.009 |
| 0.90 | 0.804 | 50 | 54.817 | 0.143 | 0.909 | 0.009 |
| 0.90 | 0.569 | 100 | 104.823 | 0.203 | 0.902 | 0.009 |
| 0.90 | 0.402 | 200 | 204.928 | 0.290 | 0.900 | 0.009 |
| 0.95 | 0.804 | 50 | 55.676 | 0.142 | 0.928 | 0.008 |
| 0.95 | 0.569 | 100 | 105.801 | 0.202 | 0.918 | 0.009 |
| 0.95 | 0.402 | 200 | 206.601 | 0.271 | 0.913 | 0.009 |
Table 6. Simulation results for mixture of two normal distributions: $X = 0.8\, N(x_1; 0, 1) + 0.2\, N(x_2; 0, 5)$.

| $\alpha$ | $\delta$ | $n^*$ | $\bar{n}$ | std($\bar{n}$) | $\bar{P}$ | std($\bar{P}$) |
|---|---|---|---|---|---|---|
| 0.75 | 0.678 | 50 | 54.424 | 0.187 | 0.952 | 0.007 |
| 0.75 | 0.480 | 100 | 104.593 | 0.259 | 0.935 | 0.008 |
| 0.75 | 0.339 | 200 | 205.031 | 0.351 | 0.932 | 0.008 |
| 0.85 | 0.678 | 50 | 55.826 | 0.169 | 0.934 | 0.008 |
| 0.85 | 0.480 | 100 | 106.334 | 0.254 | 0.926 | 0.008 |
| 0.85 | 0.339 | 200 | 206.534 | 0.332 | 0.933 | 0.008 |
| 0.85 | 0.240 | 400 | 406.497 | 0.486 | 0.942 | 0.007 |
| 0.90 | 0.678 | 50 | 56.762 | 0.177 | 0.955 | 0.007 |
| 0.90 | 0.480 | 100 | 106.746 | 0.244 | 0.924 | 0.008 |
| 0.90 | 0.339 | 200 | 207.888 | 0.351 | 0.935 | 0.008 |
| 0.95 | 0.678 | 50 | 58.671 | 0.173 | 0.959 | 0.006 |
| 0.95 | 0.480 | 100 | 108.742 | 0.235 | 0.947 | 0.007 |
| 0.95 | 0.339 | 200 | 208.019 | 0.332 | 0.931 | 0.008 |
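Table 7. Comparison of various statistical methodologies.

| Procedure | Sample Size |
|---|---|
| Two-stage | 71 |
| Purely Sequential | 66 |
| Non-Parametric Sequential, $\alpha = 0.75$ | 42 |
| Non-Parametric Sequential, $\alpha = 0.80$ | 52 |
| Non-Parametric Sequential, $\alpha = 0.85$ | 53 |
| Non-Parametric Sequential, $\alpha = 0.90$ | 60 |
| Non-Parametric Sequential, $\alpha = 0.95$ | 67 |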