Composite Estimators for the Population Mean Under Ranked Set Sampling

Mostafa, Sayed A.; Johnson, Thomas; Jennings, Sanford

doi:10.3390/math13193071

by Sayed A. Mostafa *, Thomas Johnson III and Sanford Jennings

Department of Mathematics & Statistics, North Carolina A&T State University, Greensboro, NC 27411, USA

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(19), 3071; https://doi.org/10.3390/math13193071

Submission received: 22 July 2025 / Revised: 17 September 2025 / Accepted: 19 September 2025 / Published: 24 September 2025

(This article belongs to the Special Issue Innovations in Survey Statistics and Survey Sampling)

Abstract

Ranked Set Sampling (RSS) is an effective sampling technique, particularly when precise measurement of the study variable is costly or time-consuming, but ranking the units is relatively easy. Estimation of the finite population mean from RSS, with and without auxiliary information, has been widely studied, with estimators such as the RSS sample mean, ratio estimator, and regression estimators receiving considerable attention. While the RSS sample mean does not utilize auxiliary information, the ratio and regression estimators rely heavily on its quality. To address these limitations, this study proposes a shrinkage-type (composite) estimator for the finite population mean. The proposed estimator adaptively combines the RSS sample mean and the ratio estimator, leveraging auxiliary information when it is useful while maintaining robustness when it is not. We derive its statistical properties, including bias, variance, and mean squared error. Simulation studies demonstrate that the proposed estimator can outperform conventional estimators across a range of scenarios. We illustrate the method through a real data application.

Keywords: composite estimators; ranked set sampling; ratio estimator; regression estimator; simple random sampling

MSC: 62D05

1. Introduction

Efficient sampling methods are essential in statistical analysis, especially when acquiring data is costly or time-consuming. In such contexts, alternative sampling designs that leverage auxiliary information or partial measurements can offer significant advantages. One such approach is Ranked Set Sampling (RSS), first introduced by McIntyre [1] in the early 1950s to estimate pasture yields in Australia. Since its inception, RSS has gained considerable attention due to its ability to increase estimation efficiency with minimal additional cost or effort, particularly in environments where precise measurement is costly, but ranking units is relatively easy [2]. RSS has been successfully applied in a variety of fields—including agriculture, environmental science, and economics—where it has outperformed Simple Random Sampling (SRS) in estimation efficiency.

To implement RSS, a total of $p^{2}$ units are randomly selected from the population and divided into $p$ sets, each containing $p$ units. Within each set, the units are ranked based on visual inspection, expert judgment, concomitant variables, or other means that do not involve precise quantification. The process continues as follows: the smallest unit from the first set is measured, the second smallest unit from the second set is measured, and so forth, until the largest unit from the $p$-th set is measured. This procedure constitutes a single cycle of RSS, which can be repeated for $m$ independent cycles to obtain a balanced RSS of size $n = p m$: $\{Y_{[k] l} : k = 1, 2, \dots, p; \; l = 1, 2, \dots, m\}$, where $Y_{[k] l}$ denotes the $k$-th judgment order statistic of the units in the $k$-th set of the $l$-th cycle. The use of square brackets instead of parentheses signifies that ranking may be subject to error or imperfection. If ranking is perfect, the notation $Y_{(k) l}$ may be used instead.

To illustrate the technique, consider a balanced RSS with $p = 4$ and $m = 2$, with ranking based on an auxiliary (or concomitant) variable $X$. The selection proceeds as follows:

Cycle 1:

$\begin{matrix} X_{(1) 1} \leq X_{(2) 1} \leq X_{(3) 1} \leq X_{(4) 1} & \Rightarrow (X_{(1) 1}, Y_{[1] 1}) \\ X_{(1) 1} \leq X_{(2) 1} \leq X_{(3) 1} \leq X_{(4) 1} & \Rightarrow (X_{(2) 1}, Y_{[2] 1}) \\ X_{(1) 1} \leq X_{(2) 1} \leq X_{(3) 1} \leq X_{(4) 1} & \Rightarrow (X_{(3) 1}, Y_{[3] 1}) \\ X_{(1) 1} \leq X_{(2) 1} \leq X_{(3) 1} \leq X_{(4) 1} & \Rightarrow (X_{(4) 1}, Y_{[4] 1}) \end{matrix}$
Cycle 2:

$\begin{matrix} X_{(1) 2} \leq X_{(2) 2} \leq X_{(3) 2} \leq X_{(4) 2} & \Rightarrow (X_{(1) 2}, Y_{[1] 2}) \\ X_{(1) 2} \leq X_{(2) 2} \leq X_{(3) 2} \leq X_{(4) 2} & \Rightarrow (X_{(2) 2}, Y_{[2] 2}) \\ X_{(1) 2} \leq X_{(2) 2} \leq X_{(3) 2} \leq X_{(4) 2} & \Rightarrow (X_{(3) 2}, Y_{[3] 2}) \\ X_{(1) 2} \leq X_{(2) 2} \leq X_{(3) 2} \leq X_{(4) 2} & \Rightarrow (X_{(4) 2}, Y_{[4] 2}) \end{matrix}$

producing a balanced RSS:

$\begin{matrix} s = \{ & (X_{(1) 1}, Y_{[1] 1}), (X_{(2) 1}, Y_{[2] 1}), (X_{(3) 1}, Y_{[3] 1}), (X_{(4) 1}, Y_{[4] 1}), \\ & (X_{(1) 2}, Y_{[1] 2}), (X_{(2) 2}, Y_{[2] 2}), (X_{(3) 2}, Y_{[3] 2}), (X_{(4) 2}, Y_{[4] 2}) \} . \end{matrix}$
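The selection scheme above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation; the function name `balanced_rss` and the representation of the population as a list of `(x, y)` pairs are assumptions made for the example. Each cycle draws $p^{2}$ units without replacement, splits them into $p$ sets, ranks each set on $X$, and keeps the unit of rank $k$ from the $k$-th set.

```python
import random


def balanced_rss(population, p, m, rng=None):
    """Draw a balanced ranked set sample of size n = p * m.

    population: list of (x, y) pairs (hypothetical representation).
    Ranking inside each set uses x only, so the y-ranks may be imperfect.
    Returns a list of (k, l, x, y) tuples: rank k, cycle l, and the values.
    """
    rng = rng or random.Random()
    sample = []
    for l in range(1, m + 1):
        # One cycle: p^2 distinct units, split into p sets of p units each.
        units = rng.sample(population, p * p)
        for k in range(1, p + 1):
            one_set = units[(k - 1) * p : k * p]
            ranked = sorted(one_set, key=lambda u: u[0])  # rank on X
            x, y = ranked[k - 1]  # measure only the unit with rank k
            sample.append((k, l, x, y))
    return sample
```

With `p = 4` and `m = 2`, the function returns eight measured units, two for each rank, mirroring the sample $s$ listed above.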

There is a vast body of literature on statistical inference using RSS, including parametric estimation of population parameters such as the mean (e.g., [3]), variance (e.g., [4]), and quantiles (e.g., [5]), and nonparametric estimation and hypothesis testing (e.g., [6,7,8,9,10]). For a comprehensive review, see [11].

In this paper, we consider the problem of estimating a finite population mean using RSS. Let $U = \{1, 2, \dots, N\}$ denote a finite population of size $N$, where each unit $i \in U$ is associated with a value $y_{i}$ of a study variable $Y$ and a value $x_{i}$ of an auxiliary variable $X$. The parameter of interest is the finite population mean:

$\bar{Y} = \frac{1}{N} \sum_{i = 1}^{N} y_{i} .$

The goal is to estimate $\bar{Y}$ based on an RSS of size $n$. Numerous estimators have been proposed for this purpose, including the simple mean of the RSS [12], the ratio estimator [13,14], and the regression estimator [15].

In this study, we propose a shrinkage-type (composite) estimator for estimating the finite population mean in the context of RSS. This estimator builds upon the idea of combining two or more estimators using shrinkage weights, offering a compromise that can outperform the individual components under certain conditions. Specifically, we extend the work of Lui [16], who proposed a composite estimator that blends the simple mean and the ratio estimator under SRS. Lui [16] showed that the composite approach can offer improved performance when certain conditions are met. Their simulation study suggested that the SRS composite estimator could outperform the simple and ratio estimators individually, especially when the auxiliary variable is moderately to strongly correlated with the study variable. We extend and study this idea under RSS. We provide theoretical results that characterize the performance of the proposed estimator and explore its properties through simulation and real data analysis.

The remainder of this paper is organized as follows. Section 2 reviews the idea of composite estimators under SRS. Section 3 introduces the proposed composite estimator under RSS and derives its theoretical properties. Section 4 presents a simulation study investigating the finite sample performance of the proposed estimator relative to several other estimators. Section 5 illustrates the performance of the proposed estimator on a real dataset. Section 6 concludes the paper with a discussion of key findings and future research directions.

2. Composite Estimators Under SRS

In this section, we review the idea of composite estimators under SRS. The simplest estimator for the finite population mean under SRS is the simple mean estimator defined by [12]:

${\bar{y}}_{SRS} = \frac{1}{n} \sum_{i = 1}^{n} y_{i} .$

The simple mean estimator is known to be unbiased, and its variance (equivalently, its mean squared error, MSE) is given by ([12], ch. 2):

$MSE ({\bar{y}}_{SRS}) = Var ({\bar{y}}_{SRS}) = (1 - \frac{n}{N}) \frac{S_{y}^{2}}{n},$

where $S_{y}^{2} = \frac{1}{N - 1} \sum_{i = 1}^{N} {(y_{i} - \bar{Y})}^{2}$ is the population variance of the study variable $Y$.

An alternative estimator that is commonly used to take advantage of available auxiliary information is the ratio estimator, which is defined as follows under SRS [12]:

{\bar{y}}_{rSRS} = \frac{{\bar{y}}_{SRS}}{{\bar{x}}_{SRS}} \bar{X} .

The approximate variance of the ratio estimator under SRS is given by ([12], ch. 6):

$\begin{matrix} Var ({\bar{y}}_{rSRS}) & ≃ & (1 - \frac{n}{N}) \frac{1}{n} [S_{y}^{2} + R^{2} S_{x}^{2} - 2 R S_{x y}] \\ & = & (1 - \frac{n}{N}) \frac{1}{n} {\bar{Y}}^{2} [C_{y}^{2} + C_{x}^{2} - 2 ρ C_{x} C_{y}], \end{matrix}$

(1)

where $R = \bar{Y} / \bar{X}$, $S_{x y} = \frac{1}{N - 1} \sum_{i = 1}^{N} (x_{i} - \bar{X}) (y_{i} - \bar{Y})$, $C_{z}^{2} = S_{z}^{2} / {\bar{Z}}^{2}$ is the squared population coefficient of variation of a generic variable $Z$, and $ρ = \frac{S_{x y}}{S_{x} S_{y}}$ is the population Pearson correlation coefficient between $X$ and $Y$. Under SRS, Lui [16] suggested combining the simple mean estimator and the ratio estimator into a composite estimator defined by

{\bar{y}}_{cSRS} = w {\bar{y}}_{SRS} + (1 - w) {\bar{y}}_{rSRS},

where $0 \leq w \leq 1$ is an unknown weight parameter. Because ${\bar{y}}_{SRS}$ is unbiased, the bias of the composite estimator never exceeds that of the ratio estimator in absolute value:

$| Bias ({\bar{y}}_{cSRS}) | = | E ({\bar{y}}_{cSRS} - \bar{Y}) | = | (1 - w) E ({\bar{y}}_{rSRS} - \bar{Y}) | \leq | E ({\bar{y}}_{rSRS} - \bar{Y}) | = | Bias ({\bar{y}}_{rSRS}) | .$

Since the ratio estimator under SRS is also asymptotically unbiased, Lui [16] derived the optimal weight by minimizing the variance of the composite estimator:

$\begin{matrix} w_{SRS}^{*} & = & \frac{Var ({\bar{y}}_{rSRS}) - Cov ({\bar{y}}_{SRS}, {\bar{y}}_{rSRS})}{Var ({\bar{y}}_{rSRS}) + Var ({\bar{y}}_{SRS}) - 2 Cov ({\bar{y}}_{SRS}, {\bar{y}}_{rSRS})} \\ & = & 1 - ρ C_{y} / C_{x}, \end{matrix}$

(2)

where $Cov (U, V) = E [(U - E (U)) (V - E (V))]$. According to Lui [16], the composite estimator surpasses the ratio estimator under SRS under specific conditions on the correlation between $X$ and $Y$ and on their coefficients of variation. Lui [16] also carried out a simulation study to demonstrate the efficiency of the composite estimator relative to the simple mean estimator and the ratio estimator under SRS. In this study, we extend the work of [16] to the context of RSS. In the following section, we review the existing estimators for the population mean under RSS and introduce our proposed composite estimator.
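The weight in Equation (2) and the resulting composite estimate are simple to compute once $ρ$, $C_{y}$, and $C_{x}$ are available. The sketch below is a hedged illustration: the function names are hypothetical, and the population quantities are passed in directly, whereas in practice they would have to be estimated from the sample.

```python
def optimal_weight_srs(rho, c_y, c_x):
    """Weight from Equation (2): w* = 1 - rho * C_y / C_x."""
    return 1.0 - rho * c_y / c_x


def composite_srs(y_bar, x_bar, x_bar_pop, w):
    """Composite estimate: w * simple mean + (1 - w) * ratio estimate."""
    ratio_est = y_bar / x_bar * x_bar_pop  # ratio estimator under SRS
    return w * y_bar + (1.0 - w) * ratio_est
```

For example, with $ρ = 0.8$ and $C_{y} = C_{x}$, the weight on the simple mean is 0.2, so most of the weight goes to the ratio estimator. The formula itself is not restricted to $[0, 1]$: it exceeds 1 when $ρ < 0$ and is negative when $ρ C_{y} / C_{x} > 1$, and this sketch does not clip it.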

3. Composite Estimators Under RSS

3.1. Simple Mean Estimator Under RSS

The unbiased mean estimator under RSS is given by

${\bar{y}}_{RSS} = \frac{1}{p m} \sum_{k = 1}^{p} \sum_{l = 1}^{m} y_{[k] l} .$

(3)

Under RSS, the MSE of ${\bar{y}}_{RSS}$ is given by Samawi and Muttlak [13] as follows:

$MSE ({\bar{y}}_{RSS}) = \frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}),$

(4)

where $τ_{y [k]} = μ_{y [k]} - \bar{Y}$, with $μ_{y [k]}$ being the mean of the $k$-th order statistic of the study variable.

3.2. Ratio Estimator Under RSS

The conventional ratio estimator for the population mean $\bar{Y}$ of the study variable $Y$ multiplies the ratio of the sample mean of the study variable to the sample mean of the auxiliary variable by the known population mean $\bar{X}$ of the auxiliary variable. This estimator is widely used across applied fields, as it offers an efficient way to estimate population means when the auxiliary variable is strongly and positively correlated with the study variable.

The ratio estimator under RSS is given by

\begin{matrix} {\bar{y}}_{rRSS} & = \frac{{\bar{y}}_{RSS}}{{\bar{x}}_{RSS}} \bar{X}, \end{matrix}

(5)

where ${\bar{x}}_{RSS} = \frac{1}{p m} \sum_{k = 1}^{p} \sum_{l = 1}^{m} x_{(k) l}$. First-order approximations (assuming a large $n = p m$) of the bias and MSE of the ratio estimator in Equation (5) are given by Kadilar et al. [14]:

Bias ({\bar{y}}_{rRSS}) ≃ \bar{Y} [\frac{1}{p m} (C_{x}^{2} - C_{y x}) - (W_{x}^{2} - W_{y x})],

(6)

MSE ({\bar{y}}_{rRSS}) ≃ \frac{1}{p m} (S_{y}^{2} - 2 R S_{x y} + R^{2} S_{x}^{2}) - \frac{1}{p^{2} m} (\sum_{k = 1}^{p} τ_{y [k]}^{2} - 2 R \sum_{k = 1}^{p} τ_{y x (k)} + R^{2} \sum_{k = 1}^{p} τ_{x (k)}^{2}),

(7)

where

$C_{y x} = ρ C_{y} C_{x} = \frac{S_{x y}}{\bar{Y} \bar{X}}, \quad W_{x}^{2} = \frac{1}{p^{2} m {\bar{X}}^{2}} \sum_{k = 1}^{p} τ_{x (k)}^{2}, \quad W_{y x} = \frac{1}{p^{2} m \bar{X} \bar{Y}} \sum_{k = 1}^{p} τ_{y x (k)},$

with

$τ_{x (k)} = μ_{x (k)} - \bar{X}, \quad τ_{y x (k)} = (μ_{x (k)} - \bar{X}) (μ_{y [k]} - \bar{Y}), \quad μ_{x (k)} = E (X_{(k)}), \quad μ_{y [k]} = E (Y_{[k]}) .$

3.3. Proposed Composite Estimator

In this study, we propose a composite estimator, defined as a mixture of the simple mean estimator and the ratio estimator, for estimating the finite population mean $\bar{Y}$ under RSS. The estimator is defined as follows:

${\ddot{\bar{y}}}_{cRSS} = w {\bar{y}}_{RSS} + (1 - w) {\bar{y}}_{rRSS},$

(8)

where $0 \leq w \leq 1$ represents an unknown weight parameter. To derive the statistical properties of this estimator, we assume $w$ to be fixed, hence the dot notation in ${\ddot{\bar{y}}}_{cRSS}$. Later in this section, we derive the optimal weight and propose a corresponding estimator.

The following theorem and corollary summarize the statistical properties of the composite estimator in (8).

Theorem 1.

Under RSS, first-order approximations of the design-bias and variance of the composite estimator ${\ddot{\bar{y}}}_{cRSS}$ are given by

$\begin{matrix} Bias ({\ddot{\bar{y}}}_{cRSS}) & = & (1 - w) Bias ({\bar{y}}_{rRSS}) \\ & ≃ & (1 - w) \bar{Y} [\frac{1}{p m} (C_{x}^{2} - C_{y x}) - (W_{x}^{2} - W_{y x})] \end{matrix}$

(9)

and

$Var ({\ddot{\bar{y}}}_{cRSS}) ≃ w^{2} A + {(1 - w)}^{2} B + 2 w (1 - w) C,$

(10)

where

$A := Var ({\bar{y}}_{RSS}) = \frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}),$

(11)

$\begin{matrix} B & := & Var ({\bar{y}}_{rRSS}) \\ & ≃ & \frac{1}{p m} (S_{y}^{2} - 2 R S_{y x} + R^{2} S_{x}^{2}) - \frac{1}{p^{2} m} (\sum_{k = 1}^{p} τ_{y [k]}^{2} - 2 R \sum_{k = 1}^{p} τ_{y x (k)} + R^{2} \sum_{k = 1}^{p} τ_{x (k)}^{2}) \\ & & - {\bar{Y}}^{2} {[\frac{1}{p m} (C_{x}^{2} - C_{y x}) - (W_{x}^{2} - W_{y x})]}^{2} \end{matrix}$

(12)

and

$C := Cov ({\bar{y}}_{RSS}, {\bar{y}}_{rRSS}) ≃ \frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}) - R \frac{1}{p m} (S_{y x} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y x (k)}) .$

(13)

Proof.

We start with the bias statement. First, notice that

$\begin{matrix} E ({\ddot{\bar{y}}}_{cRSS}) & = & w E ({\bar{y}}_{RSS}) + (1 - w) E ({\bar{y}}_{rRSS}) \\ & = & w \bar{Y} + (1 - w) E ({\bar{y}}_{rRSS}) . \end{matrix}$

Therefore,

$\begin{matrix} Bias ({\ddot{\bar{y}}}_{cRSS}) & = & E ({\ddot{\bar{y}}}_{cRSS}) - \bar{Y} \\ & = & w \bar{Y} + (1 - w) E ({\bar{y}}_{rRSS}) - (w \bar{Y} + (1 - w) \bar{Y}) \\ & = & (1 - w) [E ({\bar{y}}_{rRSS}) - \bar{Y}] \\ & = & (1 - w) Bias ({\bar{y}}_{rRSS}) \\ & ≃ & (1 - w) \bar{Y} [\frac{1}{p m} (C_{x}^{2} - C_{y x}) - (W_{x}^{2} - W_{y x})], \end{matrix}$

where the last step (an approximation) follows from Equation (6).

Next, we derive the variance statement. Observe that

$\begin{matrix} Var ({\ddot{\bar{y}}}_{cRSS}) & = & w^{2} Var ({\bar{y}}_{RSS}) + {(1 - w)}^{2} Var ({\bar{y}}_{rRSS}) + 2 w (1 - w) Cov ({\bar{y}}_{RSS}, {\bar{y}}_{rRSS}) \\ & = & w^{2} A + {(1 - w)}^{2} B + 2 w (1 - w) C . \end{matrix}$

Now, $A$ in Equation (11) follows directly from Equation (4), whereas $B$ in Equation (12) follows from combining Equations (6) and (7) into $B = Var ({\bar{y}}_{rRSS}) = MSE ({\bar{y}}_{rRSS}) - {Bias}^{2} ({\bar{y}}_{rRSS})$. Finally, using $\bar{Y} = R \bar{X}$ and the first-order approximation $\bar{X} / {\bar{x}}_{RSS} ≃ 1$,

$\begin{matrix} C & = & Cov ({\bar{y}}_{RSS}, {\bar{y}}_{rRSS}) \\ & = & E [({\bar{y}}_{RSS} - \bar{Y}) ({\bar{y}}_{rRSS} - \bar{Y})] \\ & = & E [({\bar{y}}_{RSS} - \bar{Y}) (\frac{{\bar{y}}_{RSS}}{{\bar{x}}_{RSS}} \bar{X} - \bar{Y})] \\ & = & E [({\bar{y}}_{RSS} - \bar{Y}) \frac{\bar{X}}{{\bar{x}}_{RSS}} ({\bar{y}}_{RSS} - \frac{\bar{Y}}{\bar{X}} {\bar{x}}_{RSS})] \\ & ≃ & E [({\bar{y}}_{RSS} - \bar{Y}) \cdot 1 \cdot ({\bar{y}}_{RSS} - R {\bar{x}}_{RSS})] \\ & = & E [({\bar{y}}_{RSS} - \bar{Y}) \{({\bar{y}}_{RSS} - \bar{Y}) - R ({\bar{x}}_{RSS} - \bar{X})\}] \\ & = & E [{({\bar{y}}_{RSS} - \bar{Y})}^{2}] - E [R ({\bar{x}}_{RSS} - \bar{X}) ({\bar{y}}_{RSS} - \bar{Y})] \\ & = & Var ({\bar{y}}_{RSS}) - R \, Cov ({\bar{x}}_{RSS}, {\bar{y}}_{RSS}) \\ & = & \frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}) - R \frac{1}{p m} (S_{y x} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y x (k)}) . \end{matrix}$

The proof is complete. □

Corollary 1.

Under RSS, the first-order approximation of the MSE of the composite estimator ${\ddot{\bar{y}}}_{cRSS}$ is given by

$\begin{matrix} MSE ({\ddot{\bar{y}}}_{cRSS}) & ≃ & w^{2} \frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}) \\ & & + {(1 - w)}^{2} [\frac{1}{p m} (S_{y}^{2} - 2 R S_{y x} + R^{2} S_{x}^{2}) \\ & & \quad - \frac{1}{p^{2} m} (\sum_{k = 1}^{p} τ_{y [k]}^{2} - 2 R \sum_{k = 1}^{p} τ_{y x (k)} + R^{2} \sum_{k = 1}^{p} τ_{x (k)}^{2})] \\ & & + 2 w (1 - w) [\frac{1}{p m} (S_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y [k]}^{2}) - R \frac{1}{p m} (S_{y x} - \frac{1}{p} \sum_{k = 1}^{p} τ_{y x (k)})] \\ & = & w^{2} A + {(1 - w)}^{2} D + 2 w (1 - w) C, \end{matrix}$

(14)

where

$D := \frac{1}{p m} (S_{y}^{2} - 2 R S_{y x} + R^{2} S_{x}^{2}) - \frac{1}{p^{2} m} (\sum_{k = 1}^{p} τ_{y [k]}^{2} - 2 R \sum_{k = 1}^{p} τ_{y x (k)} + R^{2} \sum_{k = 1}^{p} τ_{x (k)}^{2}) .$

(15)

The result follows from Theorem 1 because the MSE equals the variance plus the squared bias: the squared-bias term subtracted in $B$ in Equation (12) cancels against the squared bias from Equation (9), leaving $D$ in place of $B$.

3.4. Composite Estimator with Optimal Weight

Let $w^{*}$ represent the optimal weight that minimizes the MSE of the composite estimator in Equation (14). The proposed composite estimator with optimal weight under RSS is given by

${\bar{y}}_{cRSS}^{*} = w^{*} {\bar{y}}_{RSS} + (1 - w^{*}) {\bar{y}}_{rRSS} .$

(16)

Theorem 2.

The optimal weight $w^{*}$ that minimizes the (approximate) design-based MSE of the composite estimator under RSS is given by

$w^{*} = \frac{D - C}{A + D - 2 C},$

(17)

and the corresponding minimum MSE is given by

$MSE ({\bar{y}}_{cRSS}^{*}) = \frac{A {(D - C)}^{2} + D {(A - C)}^{2} + 2 C (D - C) (A - C)}{{(A + D - 2 C)}^{2}},$

(18)

where $A$, $C$, and $D$ are given in Equations (11), (13), and (15), respectively.

Proof.

To find the weight $w$ that minimizes $f (w) := MSE ({\ddot{\bar{y}}}_{cRSS})$, we take the derivative of the MSE expression in Equation (14) with respect to $w$:

$\begin{matrix} \frac{d}{d w} MSE ({\ddot{\bar{y}}}_{cRSS}) & = & \frac{d}{d w} (w^{2} A + {(1 - w)}^{2} D + 2 w (1 - w) C) \\ & = & 2 w A - 2 D + 2 D w + 2 C - 4 w C \\ & = & 2 w (A + D - 2 C) + 2 C - 2 D . \end{matrix}$

Setting this derivative to zero gives the weight corresponding to the minimum MSE:

$w^{*} = \frac{D - C}{A + D - 2 C} .$

The second derivative, $2 (A + D - 2 C)$, is positive whenever $A + D - 2 C > 0$, so this stationary point is a minimum. Substituting $w^{*}$ into Equation (14) gives the minimum MSE in Equation (18) after some basic algebra. □
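The closed forms in Theorem 2 are easy to check numerically. The following sketch (function names are illustrative, and any values supplied for $A$, $D$, and $C$ are arbitrary inputs rather than quantities from the paper) evaluates the MSE in Equation (14) and the weight in Equation (17). It also uses the observation that Equation (18) simplifies algebraically to $(A D - C^{2}) / (A + D - 2 C)$, which can be seen by writing $a = A - C$ and $d = D - C$.

```python
def composite_mse(w, A, D, C):
    """Approximate MSE of the composite estimator, Equation (14)."""
    return w ** 2 * A + (1 - w) ** 2 * D + 2 * w * (1 - w) * C


def optimal_weight(A, D, C):
    """Minimizing weight from Equation (17); requires A + D - 2C > 0."""
    denom = A + D - 2 * C
    if denom <= 0:
        raise ValueError("A + D - 2C must be positive for a minimum")
    return (D - C) / denom


def minimum_mse(A, D, C):
    """Simplified form of Equation (18): (A*D - C^2) / (A + D - 2C)."""
    return (A * D - C ** 2) / (A + D - 2 * C)
```

For instance, with $A = 2$, $D = 3$, and $C = 1$, the weight is $2/3$ and both `composite_mse(2/3, 2, 3, 1)` and `minimum_mse(2, 3, 1)` equal $5/3$, which is below both $A$ and $D$.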

3.5. Composite Estimator with Estimated Weight

Since the optimal weight $w^{*}$ is unknown, we define the composite estimator with estimated weight as follows:

${\bar{y}}_{cRSS} = \hat{w} {\bar{y}}_{RSS} + (1 - \hat{w}) {\bar{y}}_{rRSS},$

(19)

where the estimated optimal weight is given by

$\hat{w} = \frac{\hat{D} - \hat{C}}{\hat{A} + \hat{D} - 2 \hat{C}},$

(20)

with

$\hat{A} = \frac{1}{p m} ({\hat{S}}_{y}^{2} - \frac{1}{p} \sum_{k = 1}^{p} {\hat{τ}}_{y [k]}^{2}),$

$\hat{D} = \frac{1}{p m} ({\hat{S}}_{y}^{2} - 2 \hat{R} {\hat{S}}_{y x} + {\hat{R}}^{2} {\hat{S}}_{x}^{2}) - \frac{1}{p^{2} m} (\sum_{k = 1}^{p} {\hat{τ}}_{y [k]}^{2} - 2 \hat{R} \sum_{k = 1}^{p} {\hat{τ}}_{y x (k)} + {\hat{R}}^{2} \sum_{k = 1}^{p} {\hat{τ}}_{x (k)}^{2}),$

$\hat{C} = \hat{A} - \hat{R} \frac{1}{p m} ({\hat{S}}_{y x} - \frac{1}{p} \sum_{k = 1}^{p} {\hat{τ}}_{y x (k)}),$

where

${\hat{S}}_{z}^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} {(z_{i} - {\bar{z}}_{RSS})}^{2},$

${\hat{S}}_{y x} = \frac{1}{n - 1} \sum_{i = 1}^{n} (y_{i} - {\bar{y}}_{RSS}) (x_{i} - {\bar{x}}_{RSS}),$

$\hat{R} = \frac{{\bar{y}}_{RSS}}{{\bar{x}}_{RSS}},$

${\hat{τ}}_{x (k)} = {\hat{μ}}_{x (k)} - {\bar{x}}_{RSS},$

${\hat{τ}}_{y [k]} = {\hat{μ}}_{y [k]} - {\bar{y}}_{RSS},$

and

${\hat{τ}}_{y x (k)} = ({\hat{μ}}_{x (k)} - {\bar{x}}_{RSS}) ({\hat{μ}}_{y [k]} - {\bar{y}}_{RSS}),$

for $k = 1, 2, \dots, p$.
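The plug-in quantities above translate directly into code. The sketch below is one possible arrangement, not the authors' implementation: the function name `composite_rss`, the nested-list layout `x[k][l]` and `y[k][l]` (rank by cycle), and the choice to pass the order-statistic mean estimates ${\hat{μ}}_{x (k)}$ and ${\hat{μ}}_{y [k]}$ as arguments are all assumptions made for illustration.

```python
def composite_rss(x, y, mu_x_hat, mu_y_hat, x_bar_pop):
    """Composite estimate with estimated weight, Equations (19)-(20).

    x, y: p-by-m nested lists; row k holds the rank-(k+1) units of each cycle.
    mu_x_hat, mu_y_hat: length-p estimates of the order-statistic means.
    x_bar_pop: known population mean of the auxiliary variable.
    Returns (estimate, w_hat).
    """
    p, m = len(x), len(x[0])
    n = p * m
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s2_x = sum((v - x_bar) ** 2 for v in xs) / (n - 1)
    s2_y = sum((v - y_bar) ** 2 for v in ys) / (n - 1)
    s_yx = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)
    r_hat = y_bar / x_bar
    tau_x = [mu - x_bar for mu in mu_x_hat]
    tau_y = [mu - y_bar for mu in mu_y_hat]
    tau_yx = [tx * ty for tx, ty in zip(tau_x, tau_y)]
    sum_tx2 = sum(t * t for t in tau_x)
    sum_ty2 = sum(t * t for t in tau_y)
    sum_tyx = sum(tau_yx)
    a_hat = (s2_y - sum_ty2 / p) / n
    d_hat = ((s2_y - 2 * r_hat * s_yx + r_hat ** 2 * s2_x) / n
             - (sum_ty2 - 2 * r_hat * sum_tyx + r_hat ** 2 * sum_tx2) / (p * p * m))
    c_hat = a_hat - r_hat * (s_yx - sum_tyx / p) / n
    w_hat = (d_hat - c_hat) / (a_hat + d_hat - 2 * c_hat)
    ratio_est = r_hat * x_bar_pop  # ratio estimator, Equation (5)
    return w_hat * y_bar + (1 - w_hat) * ratio_est, w_hat
```

The function does not guard against a zero denominator in $\hat{w}$ or restrict $\hat{w}$ to $[0, 1]$; whether and how to do so is left open here.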

4. Simulation Study

In this section, we present the results of our simulation study. We aimed to evaluate the performance of various mean estimators under three different sampling techniques: SRS, RSS with perfect ranking, and RSS with imperfect ranking. Both versions of RSS were implemented using the RSSampling package ver. 1.0 in R [17].

4.1. Simulation Settings

4.1.1. Normal Case

To evaluate the effectiveness and robustness of the proposed estimators under normality, we considered a finite population of size $N = 10{,}000$. The population means of the auxiliary variable $X$ and the study variable $Y$ were both set to 10. The coefficients of variation $CV_{x}$ ($=: C_{x}$) and $CV_{y}$ ($=: C_{y}$) were varied to reflect moderate-to-high variability. The correlation coefficient between $X$ and $Y$ was set to $ρ = 0.60$ and $0.80$, representing moderate-to-strong positive associations where the ratio and regression estimators are known to perform well. Sample sizes of $n = 15$, 30, 45, and 60 were used to assess estimators’ performance under varying sampling rates. Under each of the resulting finite population and sample size combinations, we drew 10,000 samples using (1) SRS, (2) imperfect-ranking RSS where the ranking is based on the auxiliary variable $X$, and (3) perfect-ranking RSS where the ranking is based on the study variable $Y$.
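A finite population of this kind can be generated as follows. This is a sketch under stated assumptions: the text does not give the generating mechanism or the specific coefficients of variation, so the bivariate normal construction, the function name `make_normal_population`, and any CV values passed to it are hypothetical.

```python
import math
import random


def make_normal_population(N, mean_x, mean_y, cv_x, cv_y, rho, seed=None):
    """Generate N correlated (x, y) pairs from a bivariate normal model.

    Standard deviations are derived from the coefficients of variation
    (sd = CV * mean); rho is the target correlation between x and y.
    """
    rng = random.Random(seed)
    sd_x, sd_y = cv_x * mean_x, cv_y * mean_y
    population = []
    for _ in range(N):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * z1
        # Mixing z1 and z2 gives y correlation rho with x.
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        population.append((x, y))
    return population
```

The realized finite-population mean and correlation differ slightly from the targets because the population is a single random draw.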

4.1.2. Lognormal Case

To evaluate the estimators’ performance under non-normal, skewed data, we generated a finite population of size $N = 10{,}000$ from the lognormal distribution. The sample size settings were kept the same as those used for the normal case. The population means of the log-transformed variables $U$ and $V$ were set to 0, and their standard deviations were set to $0.5$ and $1.5$, respectively. The correlation coefficient between $U$ and $V$ was set to $ρ = 0.60$ and $0.80$, capturing moderate-to-strong linear associations on the logarithmic scale. Under each simulation scenario, we again drew 10,000 samples using (1) SRS, (2) imperfect-ranking RSS, and (3) perfect-ranking RSS.

4.2. Estimators

The following five estimators were compared in our simulations:

1: The composite estimator with optimal weight in Equation (16) (denoted by ${\bar{y}}_{c}^{*}$ );
2: The composite estimator with estimated weight in Equation (19) (denoted by ${\bar{y}}_{c}$ );
3: The simple mean estimator in Equation (3) (denoted by $\bar{y}$ );
4: The ratio estimator in Equation (5) (denoted by ${\bar{y}}_{r}$ );
5: The regression estimator (denoted by ${\bar{y}}_{reg}$ ), which is defined under RSS as follows:

${\bar{y}}_{regRSS} = {\bar{y}}_{RSS} + {\hat{β}}_{RSS} (\bar{X} - {\bar{x}}_{RSS}),$

(21)

where

${\hat{β}}_{RSS} = \frac{\sum_{k = 1}^{p} \sum_{l = 1}^{m} (x_{(k) l} - {\bar{x}}_{RSS}) (y_{[k] l} - {\bar{y}}_{RSS})}{\sum_{k = 1}^{p} \sum_{l = 1}^{m} {(x_{(k) l} - {\bar{x}}_{RSS})}^{2}},$

and its bias and variance are given in Theorem 1 of [15]. Similarly, under SRS, the regression estimator is defined as follows:

${\bar{y}}_{regSRS} = {\bar{y}}_{SRS} + {\hat{β}}_{SRS} (\bar{X} - {\bar{x}}_{SRS}),$

(22)

where

${\hat{β}}_{SRS} = \frac{\sum_{k = 1}^{n} (x_{k} - {\bar{x}}_{SRS}) (y_{k} - {\bar{y}}_{SRS})}{\sum_{k = 1}^{n} {(x_{k} - {\bar{x}}_{SRS})}^{2}},$

and its bias and variance are given by ([12], ch. 7).

4.3. Approximation of the Order Statistics Moments

Note that the calculation of the optimal (or estimated) weight for the composite RSS estimator requires the values (or estimates) of $μ_{x (k)}$, the first-order moment of the $k$-th order statistic. For the normal case, an approximation for these moments is given by [18]:

$μ_{x (k)} = E [X_{(k)}] ≃ μ_{x} + Φ^{- 1} (\frac{k - α}{p - 2 α + 1}) σ_{x},$

where $E [X_{(k)}]$ is the expected value of the $k$-th order statistic from a normal distribution in a sample of size $p$; $μ_{x}$ and $σ_{x}$ are the mean and the standard deviation of $X$; $Φ^{- 1} (\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution; and $α$ is a correction constant, typically 0.375. This adjustment improves the accuracy of the approximation, especially for extreme ranks in small samples. A similar approximation was used for $μ_{y [k]}$. For the estimated weight $\hat{w}$, we use ${\hat{μ}}_{x (k)}$, which has the same form as $μ_{x (k)}$ but replaces $μ_{x}$ and $σ_{x}$ with the sample mean and sample standard deviation, respectively. This is also the case for ${\hat{μ}}_{y [k]}$.
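The normal-case approximation is straightforward to evaluate with the Python standard library. The sketch below is illustrative (the function name is hypothetical); it returns the approximate means of all $p$ order statistics for given $μ$, $σ$, and $α$.

```python
from statistics import NormalDist


def normal_order_stat_means(p, mu, sigma, alpha=0.375):
    """Approximate E[X_(k)], k = 1..p, for a normal sample of size p.

    Uses mu + sigma * Phi^{-1}((k - alpha) / (p - 2*alpha + 1)).
    """
    std_normal = NormalDist()
    return [
        mu + sigma * std_normal.inv_cdf((k - alpha) / (p - 2 * alpha + 1))
        for k in range(1, p + 1)
    ]
```

By construction, the approximated means are symmetric about $μ$; for odd $p$, the middle order statistic is assigned the value $μ$ itself because its plotting position equals 0.5.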

The approximation for the lognormal case according to [19] is given by

\begin{matrix} μ_{x (k)} = \frac{p!}{(k - 1)! (p - k)!} \times \int_{0}^{\infty} {(Φ (\frac{log (z) - μ_{x}}{σ_{x}}))}^{k - 1} {(1 - Φ (\frac{log (z) - μ_{x}}{σ_{x}}))}^{p - k} ϕ (\frac{log (z) - μ_{x}}{σ_{x}}) d z, \end{matrix}

(23)

where $μ_{x}$ and $σ_{x}$ are the mean and the standard deviation of $X$, and the integral is evaluated using a standard numerical integration method implemented in the ‘integrate()’ function in R. A similar approximation was used for $μ_{y [k]}$.
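For readers who want to experiment outside R, the expected value of a lognormal order statistic can also be computed with a simple quadrature rule. The sketch below is not a transcription of Equation (23): it uses the textbook order-statistic expectation after the substitution $u = (\log z - μ) / σ$, with $μ$ and $σ$ taken as the mean and standard deviation on the log scale, and composite Simpson integration over a truncated range. The function name, the truncation limits, and the number of steps are assumptions, and its normalization may differ from the expression printed above.

```python
import math
from statistics import NormalDist


def lognormal_order_stat_mean(k, p, mu_log, sigma_log, lo=-10.0, hi=10.0, steps=4000):
    """E[X_(k)] for a lognormal sample of size p (parameters on the log scale).

    Integrates exp(mu + sigma*u) * Phi(u)^(k-1) * (1 - Phi(u))^(p-k) * phi(u)
    over u in [lo, hi] with composite Simpson's rule (steps must be even).
    """
    z = NormalDist()
    coef = math.factorial(p) / (math.factorial(k - 1) * math.factorial(p - k))

    def integrand(u):
        cdf = z.cdf(u)
        return (math.exp(mu_log + sigma_log * u)
                * cdf ** (k - 1) * (1.0 - cdf) ** (p - k) * z.pdf(u))

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return coef * total * h / 3.0
```

Two sanity checks are available: for $p = 1$ the result should equal the lognormal mean $\exp (μ + σ^{2} / 2)$, and for any $p$ the $p$ order-statistic means should sum to $p$ times that mean.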

4.4. Performance Metrics

Under each scenario, we calculated the Monte Carlo bias, variance, and MSE for each of the five estimators as follows:

Bias ({\bar{y}}_{est}) = \frac{1}{M} \sum_{j = 1}^{M} ({\bar{y}}_{est}^{j} - \bar{Y}),

Var ({\bar{y}}_{est}) = \frac{1}{M} \sum_{j = 1}^{M} {({\bar{y}}_{est}^{j} - \frac{1}{M} \sum_{k = 1}^{M} {\bar{y}}_{est}^{k})}^{2},

MSE ({\bar{y}}_{est}) = \frac{1}{M} \sum_{j = 1}^{M} {({\bar{y}}_{est}^{j} - \bar{Y})}^{2},

where the number of simulations M = 10,000.
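These three Monte Carlo summaries can be computed from a list of replicate estimates as in the following sketch (the function name `mc_metrics` is illustrative). Note that the variance uses the divisor $M$, as in the formula above, so that the MSE equals the variance plus the squared bias exactly.

```python
def mc_metrics(estimates, true_mean):
    """Monte Carlo bias, variance (divisor M), and MSE of replicate estimates."""
    M = len(estimates)
    avg = sum(estimates) / M
    bias = avg - true_mean
    var = sum((e - avg) ** 2 for e in estimates) / M
    mse = sum((e - true_mean) ** 2 for e in estimates) / M
    return bias, var, mse
```

For example, `mc_metrics([9, 11, 10, 12], 10)` gives a bias of 0.5, a variance of 1.25, and an MSE of 1.5.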

4.5. Simulation Results

4.5.1. The Normal Case

In this section, we summarize the simulation results under the normal population scenario. Figure 1, Figure 2 and Figure 3 display the Monte Carlo distribution of five estimators under SRS, imperfect-ranking RSS, and perfect-ranking RSS, respectively. The MSE results for the five estimators are presented in Figure 4, Figure 5 and Figure 6. Additional simulation results, specifically bias and variance plots, for the normal case are deferred to Appendix A.1.

First, examining the sampling distributions of the estimators, we observe that in all three sampling designs, the ratio estimator (${\bar{y}}_{r}$) consistently shows the largest interquartile range, longer whiskers, and more outliers, indicating lower efficiency and robustness; this is especially evident under SRS (see Figure 1). On the other hand, the composite (${\bar{y}}_{c}$ and ${\bar{y}}_{c}^{*}$) and regression (${\bar{y}}_{reg}$) estimators show the shortest interquartile ranges and the fewest outliers across most scenarios, indicating relatively higher efficiency than the other estimators. Although the simple mean estimator ($\bar{y}$) has relatively wide interquartile ranges under SRS and imperfect-ranking RSS, it shows much-improved behavior under perfect-ranking RSS, as it benefits from improved ranking accuracy. As the sample size increases, the precision of all estimators improves, as shown by the tighter interquartile ranges. In general, all estimators demonstrate a good central tendency, as the centers of the boxplots are mostly aligned with the true mean line.

Figure 4, Figure 5 and Figure 6 illustrate the MSE performance of the five estimators under different sampling designs and parameter settings. Under all three sampling designs, the MSE decreases with increasing sample size for all estimators, supporting the consistency of these estimators. In the SRS setting (Figure 4), the simple mean ($\bar{y}$) and ratio (${\bar{y}}_{r}$) estimators consistently exhibit the highest MSE values across all sample sizes, indicating lower efficiency. In contrast, the composite estimators (${\bar{y}}_{c}$ and ${\bar{y}}_{c}^{*}$) and the regression estimator (${\bar{y}}_{reg}$) demonstrate notably lower MSE values, with the two composite estimators performing nearly identically and slightly outperforming the regression estimator. Under the imperfect-ranking RSS design (Figure 5), the simple mean and ratio estimators remain the least efficient across all conditions. The composite and regression estimators show substantial improvements in efficiency, with the two composite estimators again nearly indistinguishable and generally exhibiting the lowest MSEs. Under perfect-ranking RSS (Figure 6), the simple mean estimator shows superior relative performance for the lower-correlation setting ($ρ = 0.6$), especially at low-to-moderate sample sizes. Under this design, the ratio estimator has the highest MSE across all parameter values. The composite estimators consistently outperform other estimators under the high-correlation setting ($ρ = 0.8$), closely followed by the regression estimator.

4.5.2. The Lognormal Case

Figure 7, Figure 8 and Figure 9 present the sampling distributions of five mean estimators under lognormal populations across different sampling designs, correlation levels ($ρ = 0.6$ and $ρ = 0.8$), and sample sizes ($n = 15, 30, 45, 60$). In the SRS setting (Figure 7), the composite estimators (${\bar{y}}_{c}$ and ${\bar{y}}_{c}^{*}$) exhibit strong performance, with tight interquartile ranges, relatively symmetric distributions, and limited outliers. The ratio estimator (${\bar{y}}_{r}$) maintains a good central tendency, but with slightly broader variability. In contrast, the regression (${\bar{y}}_{reg}$) and simple mean ($\bar{y}$) estimators show wider spreads and more frequent extreme values, especially for smaller sample sizes. Under the imperfect-ranking RSS design (Figure 8), all five estimators show similar behavior, with clear underestimation, as seen from the centers of all the boxplots falling below the true mean line. A similar pattern is observed under perfect-ranking RSS (Figure 9), with the regression estimator showing occasional negative outlying values and the simple mean estimator showing somewhat larger variability under the higher-correlation scenario.

Figure 10, Figure 11 and Figure 12 illustrate the MSE behavior of the five target estimators across varying sample sizes and correlation levels for lognormal populations. In the SRS setting (Figure 10), the regression (${\bar{y}}_{reg}$) estimator consistently achieves the lowest MSE across all sample sizes and both correlation levels, especially under strong correlation ($ρ = 0.8$). The composite (${\bar{y}}_{c}$ and ${\bar{y}}_{c}^{*}$) and ratio (${\bar{y}}_{r}$) estimators follow closely behind, showing comparable MSE performance at moderate correlation ($ρ = 0.6$), but lagging under the stronger-correlation ($ρ = 0.8$) scenario. The simple mean ($\bar{y}$) estimator consistently has the highest MSE values. Figure 11, which corresponds to the imperfect-ranking RSS design, shows a similar overall trend, but with subtle differences. The regression estimator remains dominant, especially under $ρ = 0.8$, showing the lowest MSE across all sample sizes. The ratio estimator comes next, showing the second-lowest MSE values across scenarios. The composite estimators still perform well, but their advantage is less pronounced under the stronger-correlation and smaller-sample-size scenarios. The simple mean estimator again shows relatively higher MSE values, remaining the least efficient. Under perfect ranking (Figure 12), while all estimators have similar MSE curves, the ratio estimator, followed by the composite estimator (${\bar{y}}_{c}^{*}$), achieves the lowest MSE values at $ρ = 0.6$. When $ρ = 0.8$, the pattern closely replicates that observed under imperfect-ranking RSS.

Across all figures, the MSE decreases with increasing sample size for all estimators, reflecting improved estimation precision. These results highlight that while the regression estimator offers the best performance overall—particularly in the high-correlation setting—composite and ratio estimators provide competitive alternatives, especially as the ranking accuracy improves.

Additional simulation results, including bias and variance graphs under lognormal data, are located in Appendix A.2.
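As a hedged illustration of how the bias, variance, and MSE summaries discussed above can be obtained from replicated estimates, the following Python sketch computes the three Monte Carlo metrics and checks the decomposition MSE = variance + bias². The helper name `mc_metrics`, the lognormal parameters, the population size, and the number of replicates are illustrative assumptions, not the settings used in the paper (whose computations were carried out in R).

```python
import numpy as np

def mc_metrics(estimates, true_mean):
    """Monte Carlo bias, variance, and MSE of a set of replicated estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_mean
    variance = est.var()  # divisor R (not R - 1), so MSE = variance + bias**2 holds exactly
    mse = np.mean((est - true_mean) ** 2)
    return bias, variance, mse

# Hypothetical finite population (illustrative parameters only).
rng = np.random.default_rng(0)
population = rng.lognormal(mean=1.0, sigma=0.5, size=5000)
mu_y = population.mean()

# Replicate the SRS sample mean at several sample sizes and summarize it.
results = {}
for n in (15, 30, 60):
    reps = [rng.choice(population, size=n, replace=False).mean() for _ in range(2000)]
    results[n] = mc_metrics(reps, mu_y)
    bias, variance, mse = results[n]
    print(f"n={n}: bias={bias:.4f}, variance={variance:.4f}, MSE={mse:.4f}")
```

The same helper can be applied to the replicated values of any of the five estimators; only the list of estimates changes.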

5. Real Data Application

For the real data application, we use a modified version of the longleaf pine dataset comprising

N = 396

trees, as analyzed by Jafari Jozani and Johnson [20]. The original dataset can be found in Chen et al. [21] and Platt et al. [22]. In this dataset, the variable X denotes tree diameter at breast height, while Y represents tree height [20]. To remove ties when ranking based on the X variable, a small amount of random noise was added to the X variable:

ε_{x} \sim N (0, 0.000001)

, recentered at zero. The Pearson correlation between the transformed X variable and the Y variable was

ρ = 0.91

. Considering the

N = 396

observations as the finite population, we drew 10,000 samples using each of three sampling designs (SRS, RSS with imperfect ranking based on X, and RSS with perfect ranking based on Y) and computed each of the five estimators considered in the simulation study in the previous section. Goodness-of-fit tests, using the Shapiro-Wilk (‘sw’) option in the ‘gofTest()’ function of the EnvStats package ver. 3.1.0 in R [23], indicated that both X and Y can be adequately modeled by the lognormal distribution, justifying the application of the lognormal approximation of the moments of sample order statistics (see Equation (23)) for this dataset.
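The sampling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the population is a synthetic stand-in (not the longleaf pine data), the second argument of N(0, 0.000001) is read as a variance, sets are drawn independently without replacement within each set, the ratio estimator is taken in its classical form, and 1,000 replicates are used instead of 10,000 to keep the run short. The names `rss_sample` and `ratio_estimate` are hypothetical.

```python
import numpy as np

def rss_sample(x, y, n, p, rank_by, rng):
    """Balanced ranked set sample of size n = m * p (m cycles, set size p).

    For each cycle and each rank k, draw p units by SRS without replacement,
    order them by `rank_by`, and keep only the unit holding the k-th smallest value.
    """
    if n % p != 0:
        raise ValueError("n must be a multiple of the set size p")
    chosen = []
    for _ in range(n // p):          # cycles
        for k in range(p):           # rank measured in this set
            candidates = rng.choice(len(x), size=p, replace=False)
            ordered = candidates[np.argsort(rank_by[candidates])]
            chosen.append(ordered[k])
    idx = np.array(chosen)
    return x[idx], y[idx]

def ratio_estimate(xs, ys, mu_x):
    """Classical ratio estimator: (ybar / xbar) times the known population mean of X."""
    return ys.mean() / xs.mean() * mu_x

rng = np.random.default_rng(1)
N = 396
# Synthetic stand-in population with positively related X and Y.
x = rng.lognormal(mean=3.0, sigma=0.6, size=N)
y = 5.0 * x ** 0.6 * rng.lognormal(mean=0.0, sigma=0.15, size=N)

# Tie-breaking jitter: variance 1e-6 (sd 1e-3), recentered at zero.
noise = rng.normal(0.0, 1e-3, size=N)
x = x + (noise - noise.mean())
mu_x, mu_y = x.mean(), y.mean()

# Ranking variable per design: None = SRS, x = imperfect ranking, y = perfect ranking.
designs = {"SRS": None, "RSS ranked on X": x, "RSS ranked on Y": y}
R, n, p = 1000, 15, 5
mse = {}
for name, ranker in designs.items():
    means, ratios = [], []
    for _ in range(R):
        if ranker is None:
            idx = rng.choice(N, size=n, replace=False)
            xs, ys = x[idx], y[idx]
        else:
            xs, ys = rss_sample(x, y, n, p, ranker, rng)
        means.append(ys.mean())
        ratios.append(ratio_estimate(xs, ys, mu_x))
    mse[name] = {
        "mean": np.mean((np.array(means) - mu_y) ** 2),
        "ratio": np.mean((np.array(ratios) - mu_y) ** 2),
    }
    print(name, mse[name])
```

Ranking on X while measuring Y corresponds to the imperfect-ranking design, and ranking on Y itself to the perfect-ranking design; the regression and composite estimators would be added to the inner loop in the same way.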

Figure 13, Figure 14 and Figure 15 present the distribution of five mean estimators under different sampling designs applied to the longleaf pine dataset, with the blue horizontal line indicating the true population mean. In Figure 13 (SRS), the simple mean estimator (

\bar{y}

) exhibits the greatest variance, as its boxplots are widely spread. The ratio (

{\bar{y}}_{r}

) and composite (

{\bar{y}}_{c}

and

{\bar{y}}_{c}^{*}

) estimators perform best under SRS, showing fewer outliers and relatively low variance. The regression (

{\bar{y}}_{reg}

) estimator follows with slightly more outliers, especially at small sample sizes. All estimators show little to no bias, as the boxplots are mostly centered at the true population mean. Similar patterns are observed under imperfect-ranking RSS (Figure 14), but with the composite estimators switching places with the regression estimator. Under perfect-ranking RSS (Figure 15), the simple mean estimator still shows the largest variability across different sample sizes, but it possesses the least bias overall. The ratio and regression estimators again show slightly lower variability than the composite estimators. However, the composite estimators are more centered around the true mean, indicating lower bias than the ratio and regression estimators.

Table 1 reports the MSE of the five estimators under the SRS and RSS designs. The bias and variance results are included in Appendix A.3. Across all three sampling designs, the simple mean estimator (

\bar{y}

) exhibits notably higher MSE values than the other estimators at every sample size, confirming its inefficiency. The sampling design nevertheless affects the simple mean estimator most visibly: its MSE is lower under RSS than under SRS, and lower under perfect-ranking RSS than under imperfect-ranking RSS (at n = 15, 212.71 under SRS, 107.72 under imperfect-ranking RSS, and 93.05 under perfect-ranking RSS). The remaining four estimators have similar MSE values under SRS. The ratio (

{\bar{y}}_{r}

) and regression (

{\bar{y}}_{reg}

) estimators show superior performance under imperfect-ranking RSS across all sample sizes. The superiority of the ratio and regression estimators remains under perfect-ranking RSS, especially for the smaller sample sizes, with the composite estimators (

{\bar{y}}_{c}

and

{\bar{y}}_{c}^{*}

) approaching them for the larger sample sizes (

n = 45, 60

).

6. Discussion

In this study, we introduced a composite estimator for estimating the finite population mean under RSS. The estimator is defined as a weighted average of the simple mean and the ratio estimator. We derived its MSE and obtained the optimal weight that minimizes the MSE, along with a plug-in estimator for this optimal weight. We evaluated the performance of the composite estimator, both with the known optimal weight and with the estimated (plug-in) weight, via Monte Carlo simulations using bivariate normal and lognormal populations, as well as a real finite population of longleaf pine trees. Comparisons were made against the simple mean, ratio, and regression estimators under SRS, imperfect-ranking RSS, and perfect-ranking RSS.

The results indicate that the proposed composite estimator is highly efficient, particularly for moderate-to-large sample sizes, even in cases where the simple mean and ratio estimators perform poorly. The regression estimator consistently emerged as a strong competitor across all settings. For all three designs, the performance of the composite, ratio, and regression estimators improved as the correlation between the study and auxiliary variables increased. This trend also benefited the simple mean estimator under imperfect-ranking RSS, as higher correlation improved ranking accuracy. These findings were consistent across normal, lognormal, and real data scenarios.

Although the plug-in version of the composite estimator performed well, its efficiency may be further enhanced through refinements in estimating the optimal weight. A key step in this process involves estimating the moments of the order statistics,

μ_{x (k)}

and

μ_{y [k]}

, which we approximated using known formulas for normal and lognormal distributions. Similar approximations exist for other distributions and could be explored in future work. Another promising direction is extending composite estimators to settings with multivariate auxiliary variables, where one or more variables could be used for ranking, estimation, or both. Such generalizations could further enhance the utility and flexibility of composite estimators for survey data.
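For reference, the optimal-weight result summarized in this discussion can be sketched in generic notation. The symbols below are assumptions of this sketch and may differ from the notation and weight convention of Section 3: the weight w is placed on the simple mean, M_1 and M_2 denote the MSEs of the simple mean and the ratio estimator, and M_{12} denotes their mean cross-product error.

```latex
\begin{align*}
\bar{y}_{c}(w) &= w\,\bar{y} + (1 - w)\,\bar{y}_{r}, \\
\mathrm{MSE}\{\bar{y}_{c}(w)\} &= w^{2} M_{1} + (1 - w)^{2} M_{2} + 2 w (1 - w) M_{12}, \\
M_{1} &= E(\bar{y} - \mu_{y})^{2}, \quad
M_{2} = E(\bar{y}_{r} - \mu_{y})^{2}, \quad
M_{12} = E\{(\bar{y} - \mu_{y})(\bar{y}_{r} - \mu_{y})\}, \\
\frac{\partial\,\mathrm{MSE}}{\partial w} = 0
\;&\Longrightarrow\;
w^{*} = \frac{M_{2} - M_{12}}{M_{1} + M_{2} - 2 M_{12}}, \\
\mathrm{MSE}\{\bar{y}_{c}(w^{*})\} &= \frac{M_{1} M_{2} - M_{12}^{2}}{M_{1} + M_{2} - 2 M_{12}}.
\end{align*}
```

The denominator equals E(\bar{y} - \bar{y}_{r})^{2}, which is non-negative, so w^{*} is a minimizer whenever the two estimators are not identical. A plug-in version replaces M_1, M_2, and M_{12} with sample-based estimates.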

Author Contributions

Conceptualization, S.A.M. and S.J.; methodology, S.A.M. and S.J.; software, S.A.M., T.J.III, and S.J.; formal analysis, S.A.M. and S.J.; investigation, S.A.M., T.J.III, and S.J.; resources, S.A.M.; data curation, T.J.III; writing—original draft preparation, T.J.III and S.J.; writing—review and editing, S.A.M., T.J.III, and S.J.; supervision, S.A.M.; project administration, S.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Thomas Johnson III was funded by the North Carolina A&T State University Chancellor’s Distinguished Fellowship, a Title III HBGI grant from the U.S. Department of Education.

Data Availability Statement

The original data presented in the study are openly available in [5].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Additional Simulation Results

Appendix A.1. The Normal Case

Figure A1. Bias of five mean estimators under SRS from normal population.

Figure A2. Bias of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from normal population.

Figure A3. Bias of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from normal population.

Figure A4. Variance of five mean estimators under SRS from normal population.

Figure A5. Variance of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from normal population.

Figure A6. Variance of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from normal population.

Appendix A.2. The Lognormal Case

Figure A7. Bias of five mean estimators under SRS from lognormal population.

Figure A8. Bias of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from lognormal population.

Figure A9. Bias of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from lognormal population.

Figure A10. Variance of five mean estimators under SRS from lognormal population.

Figure A11. Variance of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from lognormal population.

Figure A12. Variance of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from lognormal population.

Appendix A.3. Real Data Application

Figure A13. Bias of five mean estimators under SRS from longleaf pine dataset.

Figure A14. Bias of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

Figure A15. Bias of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

Figure A16. Variance of five mean estimators under SRS from longleaf pine dataset.

Figure A17. Variance of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

Figure A18. Variance of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

References

1. McIntyre, G.A. A method for unbiased selective sampling using ranked sets. Aust. J. Agric. Res. 1952, 3, 385–390.
2. Wolfe, D.A. Ranked set sampling: An approach to more efficient data collection. Stat. Sci. 2004, 19, 636–643.
3. Takahasi, K.; Wakimoto, K. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Stat. Math. 1968, 20, 1–31.
4. Stokes, S.L. Estimation of Variance Using Judgment Ordered Ranked Set Samples. Biometrics 1980, 36, 35–42.
5. Chen, Z. On ranked-set sample quantiles and their applications. J. Stat. Plan. Inference 2000, 83, 125–135.
6. Stokes, S.L.; Sager, T.W. Characterization of a Ranked-Set Sample with Application to Estimating Distribution Functions. J. Am. Stat. Assoc. 1988, 83, 374–381.
7. Bohn, L.L.; Wolfe, D.A. Nonparametric Two-Sample Procedures for Ranked-Set Samples Data. J. Am. Stat. Assoc. 1992, 87, 552–561.
8. Hettmansperger, T.P. The ranked-set sample sign test. J. Nonparametric Stat. 1995, 4, 263–270.
9. Koti, K.M.; Jogesh Babu, G. Sign test for ranked-set sampling. Commun. Stat.-Theory Methods 1996, 25, 1617–1630.
10. Chen, Z. Density estimation using ranked-set sampling data. Environ. Ecol. Stat. 1999, 6, 135–146.
11. Wolfe, D.A. Ranked set sampling. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 460–466.
12. Cochran, W.G. Sampling Techniques, 3rd ed.; Wiley: New York, NY, USA, 1977.
13. Samawi, H.M.; Muttlak, H.A. Estimation of ratio using rank set sampling. Biom. J. 1996, 38, 753–764.
14. Kadilar, C.; Unyazici, Y.; Cingi, H. Ratio estimator for the population mean using ranked set sampling. Stat. Pap. 2009, 50, 301–309.
15. Philip, L.; Lam, K. Regression estimator in ranked set sampling. Biometrics 1997, 53, 1070–1080.
16. Lui, K.J. Notes on Use of the Composite Estimator: An Improvement of the Ratio Estimator. J. Off. Stat. 2020, 36, 137–149.
17. Sevinc, B.; Cetintav, B.; Esemen, M.; Gurler, S. RSSampling: Ranked Set Sampling, R package version 1.0; 2018. Available online: https://CRAN.R-project.org/package=RSSampling (accessed on 21 July 2025).
18. Royston, J.P. Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate). J. R. Stat. Soc. Ser. C (Appl. Stat.) 1982, 31, 161–165.
19. Nadarajah, S. Explicit expressions for moments of log normal order statistics. Econ. Qual. Control 2008, 23, 267–279.
20. Jafari Jozani, M.; Johnson, B.C. Design based estimation for ranked set sampling in finite populations. Environ. Ecol. Stat. 2011, 18, 663–685.
21. Chen, Z.; Bai, Z.; Sinha, B.K. Ranked Set Sampling: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2004; Volume 176.
22. Platt, W.J.; Evans, G.W.; Rathbun, S.L. The population dynamics of a long-lived conifer (Pinus palustris). Am. Nat. 1988, 131, 491–525.
23. Millard, S.P. EnvStats: An R Package for Environmental Statistics; Springer: New York, NY, USA, 2013.

Figure 1. Distribution of five mean estimators under SRS from normal population.

Figure 2. Distribution of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from normal population.

Figure 3. Distribution of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from normal population.

Figure 4. MSE of five mean estimators under SRS from normal population.

Figure 5. MSE of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from normal population.

Figure 6. MSE of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from normal population.

Figure 7. Distribution of five mean estimators under SRS from lognormal population. All estimates with values ≤0 or ≥30 are excluded.

Figure 8. Distribution of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from lognormal population. All estimates with values ≥30 are excluded.

Figure 9. Distribution of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from lognormal population. All estimates with values ≥30 are excluded.

Figure 10. MSE of five mean estimators under SRS from lognormal population.

Figure 11. MSE of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from lognormal population.

Figure 12. MSE of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from lognormal population.

Figure 13. Distribution of five mean estimators under SRS from longleaf pine dataset.

Figure 14. Distribution of five mean estimators under imperfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

Figure 15. Distribution of five mean estimators under perfect-ranking RSS (fixing

p = 5

) from longleaf pine dataset.

Table 1. MSE of five mean estimators under different sampling designs from longleaf pine dataset. For RSS, set size is fixed at

p = 5

.

Sampling Method	Estimator	$n = 15$	$n = 30$	$n = 45$	$n = 60$
SRS	${\bar{y}}_{c}$	$42.42$	$19.68$	$12.57$	$8.948$
	${\bar{y}}_{c}^{*}$	$40.77$	$19.53$	$12.54$	$8.944$
	${\bar{y}}_{r}$	$40.77$	$19.53$	$12.54$	$8.944$
	${\bar{y}}_{reg}$	$44.33$	$19.33$	$12.17$	$8.530$
	$\bar{y}$	$212.71$	$99.80$	$63.40$	$45.346$
Imperfect-ranking RSS	${\bar{y}}_{c}$	$60.31$	$30.49$	$20.12$	$15.389$
	${\bar{y}}_{c}^{*}$	$62.30$	$31.06$	$20.37$	$15.534$
	${\bar{y}}_{r}$	$40.37$	$19.90$	$13.08$	$9.821$
	${\bar{y}}_{reg}$	$41.39$	$19.44$	$12.68$	$9.409$
	$\bar{y}$	$107.72$	$53.87$	$35.48$	$27.143$
Perfect-ranking RSS	${\bar{y}}_{c}$	$45.08$	$24.63$	$16.93$	$13.231$
	${\bar{y}}_{c}^{*}$	$51.91$	$26.21$	$17.55$	$13.576$
	${\bar{y}}_{r}$	$36.42$	$19.36$	$13.64$	$11.108$
	${\bar{y}}_{reg}$	$39.10$	$20.60$	$14.66$	$12.164$
	$\bar{y}$	$93.05$	$47.09$	$30.80$	$23.752$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
