A Novel Family of CDF Estimators Under PPS Sampling: Computational, Theoretical, and Applied Perspectives

Shah, Salman; Mahmoudi, Eisa; Iftikhar, Hasnain; Rodrigues, Paulo Canas; Gonzales Medina, Ronny Ivan; López-Gonzales, Javier Linkolk

doi:10.3390/axioms14110796

by Salman Shah ¹, Eisa Mahmoudi ¹, Hasnain Iftikhar ²,³,*, Paulo Canas Rodrigues ⁴, Ronny Ivan Gonzales Medina ⁵ and Javier Linkolk López-Gonzales ⁶

¹ Department of Statistics, Yazd University, Yazd 89158-18411, Iran
² Department of Statistics, University of Peshawar, Peshawar 25120, Pakistan
³ Department of Statistics, Quaid-i-Azam University, Islamabad 45320, Pakistan
⁴ Department of Statistics, Federal University of Bahia, Salvador 40170-110, Brazil
⁵ Facultad de Ciencias e Ingenierías Físicas y Formales, Universidad Católica de Santa María, Arequipa 04013, Peru
⁶ Escuela de Posgrado, Universidad Peruana Unión, Lima 15024, Peru
* Author to whom correspondence should be addressed.

Axioms 2025, 14(11), 796; https://doi.org/10.3390/axioms14110796

Submission received: 14 September 2025 / Revised: 18 October 2025 / Accepted: 23 October 2025 / Published: 29 October 2025

(This article belongs to the Special Issue Computational Statistics and Its Applications, 2nd Edition)

Abstract

Accurate estimation of population distribution characteristics is a fundamental task in survey sampling and statistical inference. This paper introduces a new family of estimators for the cumulative distribution function (CDF) under probability proportional to size (PPS) sampling, incorporating auxiliary information to enhance efficiency. The proposed approach employs dual auxiliary variables in the estimation phase, while the sampling design relies on a single auxiliary variable. Theoretical properties, including bias and mean squared error (MSE), are rigorously derived to establish the efficiency of the new class. An extensive empirical evaluation using three distinct populations—fisheries data, wine chemistry data, and demographic records—demonstrates the superiority of the proposed estimators. In terms of accuracy, the best-performing proposed estimator achieves an MSE of 0.0012, compared to 0.0127 for the widely used GK estimator. Percentage relative efficiency (PRE) values further underscore these improvements, with gains ranging from 123% to over 328% across the three populations. Graphical comparisons confirm these trends, illustrating that the proposed estimators consistently dominate conventional approaches. Overall, the findings highlight both the theoretical soundness and practical utility of the proposed family, offering robust and computationally efficient improvements for CDF estimation in complex survey designs.

Keywords: probability proportional to size (PPS) sampling; cumulative distribution function (CDF) estimation; auxiliary information; dual auxiliary variables; efficiency analysis; simulation study; computational statistics; survey sampling applications

MSC: 62D05; 62G30; 62G32; 62P20

1. Introduction

Accurate inference on population distribution features is central to survey statistics and numerous applied domains. When unit sizes exhibit pronounced heterogeneity, probability proportional to size (PPS) sampling [1,2] delivers substantial efficiency gains over equal-probability designs by prioritizing information-rich units. Beyond totals or means, reliable inference on the cumulative distribution function (CDF) is crucial for quantile estimation, inequality assessment, risk profiling, and tail probability reporting in economics, health, and environmental monitoring [3,4]. However, standard distribution function estimators often suffer from inflated mean squared error (MSE) and diminished percentage relative efficiency (PRE) when size variability is pronounced [5,6,7,8,9].

Existing research has enriched distribution function estimation by leveraging auxiliary information, primarily under simple random sampling (SRS). Hartley–Ross-type approaches have been extended to handle nonresponse and robustness [10,11,12], while auxiliary-variable calibration has yielded efficient families of CDF estimators [13,14]. Within PPS designs, most attention has focused on mean estimation, where logarithmic, predictive, and regression-type estimators have been proposed to reduce bias, minimize MSE, and improve robustness [15,16,17,18,19]. Recent empirical work has further demonstrated the advantages of PPS-based mean estimators using real-world data, such as the radiation dataset [20]. Extensions to ranked set sampling (RSS) have also shown promise in survey applications, offering lower variances and higher efficiencies than conventional designs [21]. Yet, despite these advances, a clear methodological gap persists in CDF estimation under PPS: no existing approach integrates multiple auxiliary variables with tractable theoretical guarantees and extensive empirical validation. To address this gap, the present study develops a new class of CDF estimators explicitly designed for PPS sampling. At the design stage, one auxiliary variable governs the inclusion probabilities, while at the estimation stage, two further auxiliary variables are incorporated to enhance precision (three auxiliary variables in total). The proposed estimators are computationally straightforward, admit closed-form first-order approximations for bias and MSE, and can be easily implemented within standard survey workflows.

Parallel to these traditional developments, advances in computational and statistical learning have introduced flexible, data-driven methodologies that align conceptually with PPS-based estimation principles [22,23,24]. For instance, Ref. [25] proposed a regression-based conditional independence test employing adaptive kernels, providing a mechanism for data-driven bandwidth selection that parallels efficiency optimization in PPS estimation. Similarly, Ref. [26] introduced an adaptive tempered reversible-jump algorithm for Bayesian curve fitting, demonstrating that flexible computational structures can enhance estimator stability and convergence. Furthermore, Ref. [27] developed a Gaussian kernel similarity approach for multisource information fusion, emphasizing the role of weighted auxiliary information in improving estimation accuracy. This idea resonates with our proposed dual-auxiliary-variable framework, wherein the inclusion of multiple auxiliary sources systematically reduces estimator bias and MSE. Likewise, Ref. [28] examined belief-based fuzzy and imprecise clustering for arbitrary data distributions, offering valuable insights into managing uncertainty and variability—issues central to complex survey designs.

Complementary to these methodological contributions, Ref. [29] investigated the dimensional efficiency of noncontrastive learning, offering computational perspectives on balancing model complexity and scalability—an aspect particularly relevant for large-scale survey datasets. In the same vein, Ref. [30] extended the Merton option-pricing framework to deposit insurance modeling, demonstrating that statistically rigorous models can address practical problems in economics and finance. Collectively, these studies highlight the convergence between traditional survey estimation and emerging computational paradigms. They underscore the growing emphasis on adaptability, efficiency, and scalability—principles directly motivating our proposed PPS-based distribution estimation framework. By aligning theoretical efficiency with adaptive computational techniques, the proposed work advances modern survey methodology and integrates it into the broader landscape of computational statistics.

This paper makes four main contributions. First, it introduces a flexible family of PPS-based CDF estimators that combines multi-auxiliary calibration with interpretable association measures. Second, it derives analytical expressions for first-order bias and MSE, identifies conditions for estimator dominance, and provides optimal tuning strategies. Third, it demonstrates computational tractability, requiring only standard survey quantities that are easily computed in existing pipelines. Fourth, it provides extensive numerical evidence from benchmark populations (fisheries, wine chemistry, and demographic data), where the proposed estimators achieve empirical MSEs as low as 0.0012 (versus 0.0127 for comparators) and PRE gains ranging from 123% to 328%.

The proposed methodology is broadly applicable to domains where PPS sampling is natural and CDF-based inference is decision-critical, such as income distribution analysis, poverty and inequality assessment, disease prevalence, hospital resource allocation, crop yield risk, deforestation monitoring, election polling, and transportation planning. In such contexts, unit sizes (e.g., household size, facility capacity, stand area, or farm acreage) align with PPS designs, while auxiliary data are increasingly accessible from administrative or satellite sources. By jointly leveraging PPS and auxiliary calibration, the proposed estimators offer enhanced accuracy, reduced sample size requirements, and broad practical relevance.

The remainder of the paper is organized as follows: Section 2 presents the proposed class of estimators within the PPS framework. Section 3 reports empirical and simulation results, including efficiency comparisons. Section 4 provides practical guidance and broader implications, while Section 5 concludes the study.

2. Methodology

Let the finite population be denoted by $Φ = \{1, 2, \dots, i, \dots, N\}$, where $N$ is the population size. For the $i$th unit, let $y_{i}$ denote the value of the study variable of interest, while $(x_{i}, z_{i}, q_{i})$ represent the auxiliary variables associated with the same unit. We consider probability proportional to size (PPS) sampling, where the selection probability of the $i$th unit is defined as

$P_{i} = \frac{q_{i}}{\sum_{j = 1}^{N} q_{j}} .$

To standardize the PPS design, we introduce the transformed variables

$y_{i}^{*} = \frac{y_{i}}{N P_{i}}, \quad x_{i}^{*} = \frac{x_{i}}{N P_{i}}, \quad w_{i}^{*} = \frac{z_{i}}{N P_{i}} .$
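As a rough illustration of the two definitions above, the following Python sketch computes the selection probabilities and the transformed variables for a small population. The function name `pps_transform` and the toy values are hypothetical and are not taken from the paper; only the formulas for $P_{i}$, $y_{i}^{*}$, $x_{i}^{*}$, and $w_{i}^{*}$ come from the text.

```python
def pps_transform(y, x, z, q):
    """Return the selection probabilities P_i and the transformed values y*, x*, w*."""
    N = len(q)
    total_q = sum(q)
    P = [qi / total_q for qi in q]                      # P_i = q_i / sum_j q_j
    y_star = [yi / (N * pi) for yi, pi in zip(y, P)]    # y_i* = y_i / (N P_i)
    x_star = [xi / (N * pi) for xi, pi in zip(x, P)]    # x_i* = x_i / (N P_i)
    w_star = [zi / (N * pi) for zi, pi in zip(z, P)]    # w_i* = z_i / (N P_i)
    return P, y_star, x_star, w_star


# Hypothetical toy population (values invented for illustration only).
y = [12.0, 20.0, 33.0, 41.0]
x = [10.0, 18.0, 30.0, 45.0]
z = [11.0, 21.0, 29.0, 40.0]
q = [1.0, 2.0, 3.0, 4.0]
P, y_star, x_star, w_star = pps_transform(y, x, z, q)
```

The sketch assumes strictly positive size values $q_{i}$, since otherwise the selection probabilities are not well defined.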

To characterize the cumulative distribution function (CDF) in the PPS framework, we define indicator functions for the medians of the transformed variables:

$I (y_{i}^{*} \leq {\tilde{y}}^{*}), \quad I (x_{i}^{*} \leq {\tilde{x}}^{*}), \quad I (w_{i}^{*} \leq {\tilde{w}}^{*}),$

where ${\tilde{y}}^{*}$, ${\tilde{x}}^{*}$, and ${\tilde{w}}^{*}$ denote the population medians of $y_{i}^{*}$, $x_{i}^{*}$, and $w_{i}^{*}$, respectively.

The corresponding finite population distribution functions and their sample-based estimators are then given as

G_{y}^{*} = \frac{1}{N} \sum_{i = 1}^{N} I (y_{i}^{*} \leq {\tilde{y}}^{*}), {\hat{G}}_{y}^{*} = \frac{1}{n} \sum_{i = 1}^{n} I (y_{i}^{*} \leq {\tilde{y}}^{*}),

G_{x}^{*} = \frac{1}{N} \sum_{i = 1}^{N} I (x_{i}^{*} \leq {\tilde{x}}^{*}), {\hat{G}}_{x}^{*} = \frac{1}{n} \sum_{i = 1}^{n} I (x_{i}^{*} \leq {\tilde{x}}^{*}),

G_{w}^{*} = \frac{1}{N} \sum_{i = 1}^{N} I (w_{i}^{*} \leq {\tilde{w}}^{*}), {\hat{G}}_{w}^{*} = \frac{1}{n} \sum_{i = 1}^{n} I (w_{i}^{*} \leq {\tilde{w}}^{*}) .

The variability of these indicator functions across the population is captured through their variances:

S_{G_{y}^{*}}^{2} = \frac{1}{N} \sum_{i = 1}^{N} {(I (y_{i}^{*} \leq {\tilde{y}}^{*}) - G_{y}^{*})}^{2},

S_{G_{x}^{*}}^{2} = \frac{1}{N} \sum_{i = 1}^{N} {(I (x_{i}^{*} \leq {\tilde{x}}^{*}) - G_{x}^{*})}^{2},

S_{G_{w}^{*}}^{2} = \frac{1}{N} \sum_{i = 1}^{N} {(I (w_{i}^{*} \leq {\tilde{w}}^{*}) - G_{w}^{*})}^{2} .

The coefficients of variation are therefore defined as

C_{G_{y}^{*}} = \frac{S_{G_{y}^{*}}}{G_{y}^{*}}, C_{G_{x}^{*}} = \frac{S_{G_{x}^{*}}}{G_{x}^{*}}, C_{G_{w}^{*}} = \frac{S_{G_{w}^{*}}}{G_{w}^{*}} .

To account for dependence among the transformed indicators, we define population covariances:

S_{G_{y}^{*} G_{x}^{*}} = \frac{1}{N - 1} \sum_{i = 1}^{N} (I (y_{i}^{*} \leq {\tilde{y}}^{*}) - G_{y}^{*}) (I (x_{i}^{*} \leq {\tilde{x}}^{*}) - G_{x}^{*}),

S_{G_{y}^{*} G_{w}^{*}} = \frac{1}{N - 1} \sum_{i = 1}^{N} (I (y_{i}^{*} \leq {\tilde{y}}^{*}) - G_{y}^{*}) (I (w_{i}^{*} \leq {\tilde{w}}^{*}) - G_{w}^{*}),

S_{G_{x}^{*} G_{w}^{*}} = \frac{1}{N - 1} \sum_{i = 1}^{N} (I (x_{i}^{*} \leq {\tilde{x}}^{*}) - G_{x}^{*}) (I (w_{i}^{*} \leq {\tilde{w}}^{*}) - G_{w}^{*}) .

The corresponding correlation coefficients are

R_{G_{y}^{*} G_{x}^{*}} = \frac{S_{G_{y}^{*} G_{x}^{*}}}{S_{G_{y}^{*}} S_{G_{x}^{*}}}, R_{G_{y}^{*} G_{w}^{*}} = \frac{S_{G_{y}^{*} G_{w}^{*}}}{S_{G_{y}^{*}} S_{G_{w}^{*}}}, R_{G_{x}^{*} G_{w}^{*}} = \frac{S_{G_{x}^{*} G_{w}^{*}}}{S_{G_{x}^{*}} S_{G_{w}^{*}}} .
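The population quantities defined so far (distribution function values at the median, variances, coefficients of variation, covariances, and correlations of the indicators) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `indicator` and `indicator_summary` are hypothetical, the median is computed with the standard-library `statistics.median`, and the divisors follow the formulas as printed ($N$ for the variances, $N - 1$ for the covariances).

```python
import math
import statistics


def indicator(values, cutoff):
    """I(v <= cutoff) for every value."""
    return [1.0 if v <= cutoff else 0.0 for v in values]


def indicator_summary(a_star, b_star):
    """G, coefficient of variation and correlation for two transformed variables."""
    N = len(a_star)
    Ia = indicator(a_star, statistics.median(a_star))
    Ib = indicator(b_star, statistics.median(b_star))
    Ga, Gb = sum(Ia) / N, sum(Ib) / N
    Sa = math.sqrt(sum((u - Ga) ** 2 for u in Ia) / N)       # divisor N, as printed
    Sb = math.sqrt(sum((v - Gb) ** 2 for v in Ib) / N)
    Sab = sum((u - Ga) * (v - Gb) for u, v in zip(Ia, Ib)) / (N - 1)  # divisor N - 1, as printed
    return {"G": (Ga, Gb), "C": (Sa / Ga, Sb / Gb), "R": Sab / (Sa * Sb)}


# Hypothetical transformed values, invented for illustration only.
summary = indicator_summary([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
```

Because the cutoff is the population median, each $G$ is close to one half by construction, which is why the coefficients of variation in this setting are close to one.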

To study the bias and mean squared error (MSE) properties of the estimators of $G_{y}^{*}$, we introduce the relative error components

$ξ_{0} = \frac{{\hat{G}}_{y}^{*} - G_{y}^{*}}{G_{y}^{*}}, \quad ξ_{1} = \frac{{\hat{G}}_{x}^{*} - G_{x}^{*}}{G_{x}^{*}}, \quad ξ_{2} = \frac{{\hat{G}}_{w}^{*} - G_{w}^{*}}{G_{w}^{*}},$

with expectations $E (ξ_{i}) = 0$ for $i = 0, 1, 2$.

The second-order moments of these error terms are given by

$\begin{matrix} E (ξ_{0}^{2}) & = λ {(C_{G_{y}^{*}})}^{2} = ψ_{0}, \\ E (ξ_{1}^{2}) & = λ {(C_{G_{x}^{*}})}^{2} = ψ_{1}, \\ E (ξ_{2}^{2}) & = λ {(C_{G_{w}^{*}})}^{2} = ψ_{2}, \\ E (ξ_{0} ξ_{1}) & = λ R_{G_{y}^{*} G_{x}^{*}} C_{G_{y}^{*}} C_{G_{x}^{*}} = ψ_{01}, \\ E (ξ_{0} ξ_{2}) & = λ R_{G_{y}^{*} G_{w}^{*}} C_{G_{y}^{*}} C_{G_{w}^{*}} = ψ_{02}, \\ E (ξ_{1} ξ_{2}) & = λ R_{G_{x}^{*} G_{w}^{*}} C_{G_{x}^{*}} C_{G_{w}^{*}} = ψ_{12}, \end{matrix}$

where $λ = \frac{1}{n}$ and $n$ is the sample size.
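The six moment terms are simple products of $λ$, the coefficients of variation, and the correlations. A minimal Python sketch follows; the function name `psi_terms` and the numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def psi_terms(C_y, C_x, C_w, R_yx, R_yw, R_xw, n):
    """Second-order moment terms psi_0, ..., psi_12 with lambda = 1 / n."""
    lam = 1.0 / n
    return {
        "psi0": lam * C_y ** 2,
        "psi1": lam * C_x ** 2,
        "psi2": lam * C_w ** 2,
        "psi01": lam * R_yx * C_y * C_x,
        "psi02": lam * R_yw * C_y * C_w,
        "psi12": lam * R_xw * C_x * C_w,
    }


# Hypothetical inputs: unit coefficients of variation and moderate correlations.
psi = psi_terms(C_y=1.0, C_x=1.0, C_w=1.0, R_yx=0.6, R_yw=0.5, R_xw=0.4, n=20)
```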

The above framework provides the foundational notations for developing new classes of CDF estimators under PPS sampling. The inclusion of auxiliary information through dual variables ($x_{i}, z_{i}$) facilitates efficiency gains in finite-sample inference. From an application perspective, these formulations are directly applicable in survey settings where unit sizes are heterogeneous (e.g., household income surveys, enterprise-level trade statistics, or agricultural crop yield studies). The subsequent sections will utilize these expressions to derive bias, MSE, and efficiency results, and to demonstrate their empirical advantages through both simulations and real-world datasets.

We next review several standard estimators of the finite population cumulative distribution function (CDF) under probability proportional to size (PPS) sampling, which serve as benchmarks. Their biases and mean squared errors (MSEs) are given up to the first order of approximation. The existing estimators considered are as follows:

(1): Consider the traditional mean estimator for the finite population CDF:

$\hat{G} (y^{*}) = \frac{1}{n} \sum_{i = 1}^{n} I (y_{i}^{*} \leq {\tilde{y}}^{*}) .$

(1)

The variance of $\hat{G} (y^{*})$ is given by

$V (\hat{G} (y^{*})) = {G_{y}^{*}}^{2} ψ_{0} .$

(2)

(2): The ratio estimator under PPS sampling [31] is defined as

${\hat{G}}_{R} (y^{*}) = {\hat{G}}_{y}^{*} (\frac{G_{x}^{*}}{{\hat{G}}_{x}^{*}}) .$

(3)

The bias and mean squared error (MSE) of ${\hat{G}}_{R} (y^{*})$, up to the first order of approximation, are given by

$Bias ({\hat{G}}_{R} (y^{*})) = G_{y}^{*} (ψ_{1} - ψ_{01}),$

and

$MSE ({\hat{G}}_{R} (y^{*})) = {G_{y}^{*}}^{2} (ψ_{0} + ψ_{1} - 2 ψ_{01}) .$

(4)

(3): The Bahl and Tuteja ratio-type exponential estimator [32] under PPS sampling is defined as

${\hat{G}}_{BTR} (y^{*}) = {\hat{G}}_{y^{*}} \exp (\frac{G_{x^{*}} - {\hat{G}}_{x^{*}}}{G_{x^{*}} + {\hat{G}}_{x^{*}}}) .$

(5)

The bias and mean squared error of this estimator are given by

$Bias ({\hat{G}}_{BTR} (y^{*})) = G_{y^{*}} (\frac{3}{8} ψ_{1} - \frac{1}{2} ψ_{01}),$

and

$MSE ({\hat{G}}_{BTR} (y^{*})) = G_{y^{*}}^{2} (ψ_{0} + \frac{1}{4} ψ_{1} - ψ_{01}) .$

(6)

(4): The regression estimator for $G_{y^{*}}$ under PPS sampling is given by

${\hat{G}}_{Reg} (y^{*}) = {\hat{G}}_{y^{*}} + m (G_{x^{*}} - {\hat{G}}_{x^{*}}),$

(7)

where $m$ is an unknown constant to be determined.

The minimum variance of ${\hat{G}}_{Reg} (y^{*})$ is achieved at the optimal value

$m_{opt} = \frac{G_{y^{*}} ψ_{01}}{G_{x^{*}} ψ_{1}},$

and is given by

${Var}_{min} ({\hat{G}}_{Reg} (y^{*})) = \frac{G_{y^{*}}^{2} (ψ_{0} ψ_{1} - ψ_{01}^{2})}{ψ_{1}} .$

This can also be expressed as

${Var}_{min} ({\hat{G}}_{Reg} (y^{*})) = G_{y^{*}}^{2} ψ_{0} (1 - \frac{ψ_{01}^{2}}{ψ_{0} ψ_{1}}) .$

(8)
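To make the comparison among the first four benchmarks concrete, the expressions in (2), (4), (6), and (8) can be evaluated side by side. The following Python sketch does this for hypothetical inputs; the function name `classical_mse` and the numeric values are illustrative assumptions, not results from the paper.

```python
def classical_mse(G_y, psi0, psi1, psi01):
    """First-order variance/MSE expressions (2), (4), (6) and (8)."""
    return {
        "mean": G_y ** 2 * psi0,                                   # Eq. (2)
        "ratio": G_y ** 2 * (psi0 + psi1 - 2.0 * psi01),           # Eq. (4)
        "btr": G_y ** 2 * (psi0 + 0.25 * psi1 - psi01),            # Eq. (6)
        "reg": G_y ** 2 * psi0 * (1.0 - psi01 ** 2 / (psi0 * psi1)),  # Eq. (8)
    }


# Hypothetical inputs: G_y = 0.5, psi0 = psi1 = 0.05, psi01 = 0.03.
mse = classical_mse(G_y=0.5, psi0=0.05, psi1=0.05, psi01=0.03)
```

For these inputs the regression estimator has the smallest value of the four, which is consistent with the general first-order result that the regression estimator is at least as efficient as the mean, ratio, and ratio-type exponential estimators.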

(5): The Rao difference-type estimator [33] for $G (y^{*})$ is defined as

${\hat{G}}_{RD} (y^{*}) = m_{1} {\hat{G}}_{y^{*}} + m_{2} (G_{x^{*}} - {\hat{G}}_{x^{*}}),$

(9)

where $m_{1}$ and $m_{2}$ are unknown constants to be determined.

The bias and mean squared error (MSE) of ${\hat{G}}_{RD} (y^{*})$, up to the first order of approximation, are given by

$Bias ({\hat{G}}_{RD} (y^{*})) = G_{y^{*}} (m_{1} - 1),$

and

$MSE ({\hat{G}}_{RD} (y^{*})) = G_{y^{*}}^{2} {(m_{1} - 1)}^{2} + m_{1}^{2} G_{y^{*}}^{2} ψ_{0} + m_{2}^{2} G_{x^{*}}^{2} ψ_{1} - 2 m_{1} m_{2} G_{y^{*}} G_{x^{*}} ψ_{01} .$

The optimal values of $m_{1}$ and $m_{2}$ that minimize the MSE are

$m_{1 (opt)} = \frac{ψ_{1}}{ψ_{0} ψ_{1} - ψ_{01}^{2} + ψ_{1}},$

$m_{2 (opt)} = \frac{G_{y^{*}} ψ_{01}}{G_{x^{*}} (ψ_{0} ψ_{1} - ψ_{01}^{2} + ψ_{1})} .$

At these optimum values, the minimum mean squared error is

${MSE}_{min} ({\hat{G}}_{RD} (y^{*})) = \frac{G_{y^{*}}^{2} (ψ_{0} ψ_{1} - ψ_{01}^{2})}{ψ_{0} ψ_{1} - ψ_{01}^{2} + ψ_{1}} .$

(10)

(6): Grover and Kaur [34] introduced a generalized class of exponential estimators for $G (y^{*})$, defined as

${\hat{G}}_{GK} (y^{*}) = \{m_{3} {\hat{G}}_{y^{*}} + m_{4} (G_{x^{*}} - {\hat{G}}_{x^{*}})\} \exp (\frac{a (G_{x^{*}} - {\hat{G}}_{x^{*}})}{a (G_{x^{*}} + {\hat{G}}_{x^{*}}) + 2 b}),$

(11)

where $m_{3}$ and $m_{4}$ are unknown constants and $a$ and $b$ are known constants. Writing $θ = \frac{a G_{x^{*}}}{a G_{x^{*}} + b}$, the bias of ${\hat{G}}_{GK} (y^{*})$, up to the first order of approximation, is

$Bias ({\hat{G}}_{GK} (y^{*})) = G_{y^{*}} (m_{3} - 1) + \frac{3}{8} θ^{2} m_{3} G_{y^{*}} ψ_{1} + \frac{1}{2} θ m_{4} G_{x^{*}} ψ_{1} - \frac{1}{2} θ m_{3} G_{y^{*}} ψ_{01} .$

The mean squared error (MSE) is given by

$\begin{matrix} MSE ({\hat{G}}_{GK} (y^{*})) = & G_{y^{*}}^{2} - 2 m_{3} G_{y^{*}}^{2} + m_{3}^{2} G_{y^{*}}^{2} + m_{3}^{2} G_{y^{*}}^{2} ψ_{0} + m_{4}^{2} G_{x^{*}}^{2} ψ_{1} \\ & + 2 θ m_{3} m_{4} G_{y^{*}} G_{x^{*}} ψ_{1} - 2 m_{3} m_{4} G_{y^{*}} G_{x^{*}} ψ_{01} + θ m_{3} G_{y^{*}}^{2} ψ_{01} - θ m_{4} G_{y^{*}} G_{x^{*}} ψ_{1} \\ & - 2 θ m_{3}^{2} G_{y^{*}}^{2} ψ_{01} - \frac{3}{4} θ^{2} m_{3} G_{y^{*}}^{2} ψ_{1} + θ^{2} m_{3}^{2} G_{y^{*}}^{2} ψ_{1} . \end{matrix}$

The optimal values of $m_{3}$ and $m_{4}$ are

$m_{3 (opt)} = \frac{ψ_{1} (8 - θ^{2} ψ_{1})}{8 (ψ_{0} ψ_{1} - ψ_{01}^{2} + ψ_{1})},$

$m_{4 (opt)} = \frac{G_{y^{*}} (θ^{3} ψ_{1}^{2} - θ^{2} ψ_{1} ψ_{01} + 4 θ ψ_{0} ψ_{1} - 4 θ ψ_{01}^{2} - 4 θ ψ_{1} + 8 ψ_{01})}{8 G_{x^{*}} (ψ_{0} ψ_{1} - ψ_{01}^{2} + ψ_{1})} .$

The minimum MSE of ${\hat{G}}_{GK} (y^{*})$ at these optimal values is

${MSE}_{min} ({\hat{G}}_{GK} (y^{*})) = \frac{G_{y^{*}}^{2}}{64} (64 - 16 θ^{2} ψ_{1} - \frac{ψ_{1} {(- 8 + θ^{2} ψ_{1})}^{2}}{ψ_{1} (1 + ψ_{0}) - ψ_{01}^{2}}) .$

(12)
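The minimum MSE expressions in (10) and (12) share the same denominator, and setting $θ = 0$ in (12) recovers (10). The Python sketch below evaluates both; the function names and numeric inputs are hypothetical, and $θ$ is treated simply as a user-supplied constant.

```python
def rd_min_mse(G_y, psi0, psi1, psi01):
    """Minimum MSE of the Rao difference-type estimator, Eq. (10)."""
    num = psi0 * psi1 - psi01 ** 2
    return G_y ** 2 * num / (num + psi1)


def gk_min_mse(G_y, psi0, psi1, psi01, theta):
    """Minimum MSE of the Grover and Kaur estimator, Eq. (12)."""
    denom = psi1 * (1.0 + psi0) - psi01 ** 2
    inner = 64.0 - 16.0 * theta ** 2 * psi1 - psi1 * (theta ** 2 * psi1 - 8.0) ** 2 / denom
    return G_y ** 2 / 64.0 * inner


# Hypothetical inputs, invented for illustration only.
rd = rd_min_mse(0.5, 0.05, 0.05, 0.03)
gk_theta0 = gk_min_mse(0.5, 0.05, 0.05, 0.03, theta=0.0)
gk_theta1 = gk_min_mse(0.5, 0.05, 0.05, 0.03, theta=1.0)
```

With these inputs the value for $θ = 1$ is slightly below the Rao difference-type value, illustrating how the exponential component can reduce the first-order MSE.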

2.1. Proposed Family of Estimators Under PPS Sampling

In this subsection, we propose a novel family of estimators for the finite population distribution function (DF) within the framework of probability proportional to size (PPS) sampling. This development is motivated by the need to construct efficient estimators that simultaneously exploit auxiliary information and maintain computational tractability, thereby aligning with both theoretical contributions and practical applications in large-scale survey sampling and computational statistics.

We define the proposed class of estimators as

$\begin{matrix} {\hat{G}}_{SM} (y^{*}) = & [{\hat{G}}_{y^{*}} + β_{1} (G_{x^{*}} - {\hat{G}}_{x^{*}}) + β_{2} (G_{w^{*}} - {\hat{G}}_{w^{*}})] \\ & \times {[\exp (\frac{G_{x^{*}} - {\hat{G}}_{x^{*}}}{G_{x^{*}} + {\hat{G}}_{x^{*}}})]}^{γ_{1}} \times {[\exp (\frac{G_{w^{*}} - {\hat{G}}_{w^{*}}}{G_{w^{*}} + {\hat{G}}_{w^{*}}})]}^{γ_{2}}, \end{matrix}$

(13)

where ${\hat{G}}_{SM} (y^{*})$ denotes the proposed estimator of the population DF at $y^{*}$, while $x^{*}$ and $w^{*}$ denote the transformed auxiliary variables, whose population distribution functions $G_{x^{*}}$ and $G_{w^{*}}$ are assumed known. The tuning parameters $β_{1}, β_{2}, γ_{1},$ and $γ_{2}$ are user-specified constants chosen to optimize estimator performance in terms of efficiency and robustness.
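A direct transcription of (13) into Python may help clarify how the difference-type and exponential components combine. This is a minimal sketch: the function name `g_sm` and its argument names are hypothetical, and the inputs are assumed to be already-computed sample estimates and known population values.

```python
import math


def g_sm(G_hat_y, G_hat_x, G_hat_w, G_x, G_w, beta1, beta2, gamma1, gamma2):
    """Evaluate the proposed estimator in Eq. (13) for given inputs."""
    # Difference-type part: sample estimate adjusted by the two auxiliary deviations.
    linear = G_hat_y + beta1 * (G_x - G_hat_x) + beta2 * (G_w - G_hat_w)
    # Exponential ratio-type factors raised to the powers gamma1 and gamma2.
    exp_x = math.exp((G_x - G_hat_x) / (G_x + G_hat_x)) ** gamma1
    exp_w = math.exp((G_w - G_hat_w) / (G_w + G_hat_w)) ** gamma2
    return linear * exp_x * exp_w


# Hypothetical sample estimates, invented for illustration only.
estimate = g_sm(0.46, 0.44, 0.52, 0.5, 0.5, beta1=0.5, beta2=0.5, gamma1=1.0, gamma2=1.0)
```

When the auxiliary sample estimates coincide with their population values, or when all four tuning parameters are zero, the function returns the unadjusted estimate ${\hat{G}}_{y^{*}}$.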

2.1.1. First-Order Approximation

Expanding the estimator in a Taylor series in terms of the error terms $ξ_{0}$, $ξ_{1}$, and $ξ_{2}$, and retaining terms up to the second order, we obtain

$\begin{matrix} {\hat{G}}_{SM} (y^{*}) = & [G_{y^{*}} + G_{y^{*}} ξ_{0} - β_{1} G_{x^{*}} ξ_{1} - β_{2} G_{w^{*}} ξ_{2}] \\ & \times [(1 - \frac{1}{2} γ_{1} ξ_{1} + \frac{1}{4} γ_{1} ξ_{1}^{2} + \frac{1}{8} γ_{1}^{2} ξ_{1}^{2}) (1 - \frac{1}{2} γ_{2} ξ_{2} + \frac{1}{4} γ_{2} ξ_{2}^{2} + \frac{1}{8} γ_{2}^{2} ξ_{2}^{2})] . \end{matrix}$

(14)

Retaining only the terms that are linear in the errors, the deviation of ${\hat{G}}_{SM} (y^{*})$ from $G_{y^{*}}$ is expressed as

${\hat{G}}_{SM} (y^{*}) - G_{y^{*}} = G_{y^{*}} [ξ_{0} - \frac{1}{2} γ_{1} ξ_{1} - \frac{1}{2} γ_{2} ξ_{2}] - β_{1} G_{x^{*}} ξ_{1} - β_{2} G_{w^{*}} ξ_{2} + O (ξ^{2}) .$

(15)
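The second-order expansion of each exponential factor in (14) can be checked numerically. Writing ${\hat{G}}_{x^{*}} = G_{x^{*}} (1 + ξ_{1})$, the factor equals $\exp (- γ_{1} ξ_{1} / (2 + ξ_{1}))$, and the Python sketch below compares it with the quadratic approximation. The function names are hypothetical and the values of $ξ$ and $γ$ are chosen only for illustration.

```python
import math


def exact_factor(xi, gamma):
    """[exp((G - G_hat) / (G + G_hat))]^gamma with G_hat = G * (1 + xi)."""
    return math.exp(gamma * (-xi) / (2.0 + xi))


def second_order_factor(xi, gamma):
    """Quadratic approximation used for each factor in Eq. (14)."""
    return 1.0 - 0.5 * gamma * xi + 0.25 * gamma * xi ** 2 + 0.125 * gamma ** 2 * xi ** 2


# The approximation error shrinks roughly like xi^3 as xi gets smaller.
err_large = abs(exact_factor(0.05, 1.0) - second_order_factor(0.05, 1.0))
err_small = abs(exact_factor(0.01, 1.0) - second_order_factor(0.01, 1.0))
```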

2.1.2. Bias and Mean Squared Error

The bias of ${\hat{G}}_{SM} (y^{*})$, up to the first order of approximation, is obtained by taking expectations in (14):

$\begin{matrix} Bias ({\hat{G}}_{SM} (y^{*})) = & G_{y^{*}} [(\frac{1}{4} γ_{1} + \frac{1}{8} γ_{1}^{2}) ψ_{1} + (\frac{1}{4} γ_{2} + \frac{1}{8} γ_{2}^{2}) ψ_{2} + \frac{1}{4} γ_{1} γ_{2} ψ_{12} - \frac{1}{2} γ_{1} ψ_{01} - \frac{1}{2} γ_{2} ψ_{02}] \\ & + β_{1} G_{x^{*}} (\frac{1}{2} γ_{1} ψ_{1} + \frac{1}{2} γ_{2} ψ_{12}) + β_{2} G_{w^{*}} (\frac{1}{2} γ_{1} ψ_{12} + \frac{1}{2} γ_{2} ψ_{2}) . \end{matrix}$

(16)

Similarly, the mean squared error (MSE) of ${\hat{G}}_{SM} (y^{*})$ is approximated by

$\begin{matrix} MSE ({\hat{G}}_{SM} (y^{*})) = & \frac{1}{4} (γ_{1}^{2} ψ_{1} + (2 γ_{2} ψ_{12} - 4 ψ_{01}) γ_{1} + γ_{2}^{2} ψ_{2} - 4 γ_{2} ψ_{02} + 4 ψ_{0}) G_{y^{*}}^{2} \\ & + β_{1}^{2} G_{x^{*}}^{2} ψ_{1} + β_{2}^{2} G_{w^{*}}^{2} ψ_{2} + 2 β_{1} β_{2} G_{x^{*}} G_{w^{*}} ψ_{12} . \end{matrix}$

(17)

The values of $β_{1}$ and $β_{2}$ that minimize the MSE are obtained as

β_{1} = \frac{1}{2 G_{x^{*}} ψ_{1}} (- G_{y^{*}} γ_{1} ψ_{1} + G_{y^{*}} γ_{2} ψ_{12} - 2 G_{y^{*}} ψ_{01} + 2 G_{w^{*}} ψ_{12}),

(18)

β_{2} = - \frac{G_{y^{*}}}{2 G_{w^{*}} ψ_{2}} (γ_{1} ψ_{12} + γ_{2} ψ_{2} - 2 ψ_{02}) .

(19)

Substituting these into the MSE expression yields the minimized MSE:

\begin{matrix} {MSE}_{m i n} ({\hat{G}}_{SM} (y^{*})) = & G_{y^{*}}^{2} (ψ_{0} ψ_{1} ψ_{2} - ψ_{1} ψ_{02}^{2} - ψ_{2} ψ_{01}^{2} + γ_{1} ψ_{1} ψ_{02} + γ_{2} ψ_{2} ψ_{01} \\ - 4 γ_{1}^{2} ψ_{1} ψ_{12}^{2} - 2 γ_{1} γ_{2} ψ_{1} ψ_{2} ψ_{12} - γ_{2}^{2} ψ_{2} ψ_{12}^{2}) \\ + G_{y^{*}} G_{w^{*}}^{2} (- 4 γ_{1} ψ_{1} ψ_{2} ψ_{12} - 4 γ_{2} ψ_{2} ψ_{12} + 8 ψ_{2} ψ_{01} ψ_{12}) \\ - 4 G_{w^{*}}^{2} ψ_{2} ψ_{12}^{2} \end{matrix}

(20)

2.1.3. Theoretical and Application Relevance

Theoretically, this proposed family of estimators provides a flexible framework that generalizes ratio-type and exponential-type estimators under PPS sampling. The introduction of the tuning parameters $(β_{1}, β_{2}, γ_{1}, γ_{2})$ enables adaptability to different sampling designs and error structures.

From an application standpoint, the estimator is particularly suitable for survey settings where PPS designs are frequently employed, such as household income surveys, business establishment surveys, and agricultural statistics. By incorporating auxiliary information effectively, the estimator reduces bias and MSE, thereby ensuring more reliable estimation of population distribution functions in practice.

3. Results

In this section, we present the findings of our empirical and simulation-based investigations, focusing on efficiency comparisons across multiple estimator families. The results are structured into three complementary parts: efficiency comparison, empirical study, and simulation study.

3.1. Efficiency Comparison

To evaluate the theoretical performance of the proposed estimator ${\hat{G}}_{SM} (y^{*})$ under the PPS framework, we compare its minimum mean squared error (MSE) with that of the existing competing estimators and state the conditions under which the proposed estimator attains the smaller MSE, and hence the higher precision. Specifically, the following results hold:

1. From (2) and (20),

$M S E_{min} ({\hat{G}}_{S M} (y^{*})) < V (\hat{G} (y^{*})),$

provided that

$V (\hat{G} (y^{*})) - M S E_{min} ({\hat{G}}_{S M} (y^{*})) > 0 .$

Proof.

$\begin{matrix} G_{y^{*}}^{2} (ψ_{0} ψ_{1} ψ_{2} - ψ_{1} ψ_{02}^{2} - ψ_{2} ψ_{01}^{2} + γ_{1} ψ_{1} ψ_{02} + γ_{2} ψ_{2} ψ_{01} \\ - 4 γ_{1}^{2} ψ_{1} ψ_{12}^{2} - 2 γ_{1} γ_{2} ψ_{1} ψ_{2} ψ_{12} - γ_{2}^{2} ψ_{2} ψ_{12}^{2}) \\ + G_{y^{*}} G_{w^{*}}^{2} (- 4 γ_{1} ψ_{1} ψ_{2} ψ_{12} - 4 γ_{2} ψ_{2} ψ_{12} + 8 ψ_{2} ψ_{01} ψ_{12}) \\ - 4 G_{w^{*}}^{2} ψ_{2} ψ_{12}^{2} < {G_{y}^{*}}^{2} ψ_{0} . \end{matrix}$

(21)

Collecting terms, the above expression can be written as

$G_{y^{*}}^{2} A + G_{y^{*}} G_{w^{*}}^{2} B + G_{w^{*}}^{2} C < G_{y^{*}}^{2} ψ_{0},$

(22)

where $A$ denotes the bracketed coefficient of $G_{y^{*}}^{2}$ on the left-hand side of (21), that is, $A = ψ_{0} ψ_{1} ψ_{2} - ψ_{1} ψ_{02}^{2} - ψ_{2} ψ_{01}^{2} + γ_{1} ψ_{1} ψ_{02} + γ_{2} ψ_{2} ψ_{01} - …$,

$B = - 4 γ_{1} ψ_{1} ψ_{2} ψ_{12} - 4 γ_{2} ψ_{2} ψ_{12} + 8 ψ_{2} ψ_{01} ψ_{12},$

(23)

and

$C = - 4 ψ_{2} ψ_{12}^{2} .$

(24)

Because $G_{y^{*}}$ and $G_{w^{*}}$ are positive and $C \leq 0$ always holds, a sufficient condition for (22) is $A < ψ_{0}$ and $B \leq 0$. Under this condition,

$MSE_{min} ({\hat{G}}_{SM} (y^{*})) < V (\hat{G} (y^{*})) .$

(25)

□
2. From (4) and (20),

$M S E_{min} ({\hat{G}}_{S M} (y^{*})) < M S E ({\hat{G}}_{R} (y^{*})),$

whenever

$M S E ({\hat{G}}_{R} (y^{*})) - M S E_{min} ({\hat{G}}_{S M} (y^{*})) > 0 .$

Proof.
Substituting (4) and (20), the difference $MSE ({\hat{G}}_{R} (y^{*})) - MSE_{min} ({\hat{G}}_{SM} (y^{*}))$ is positive when

$G_{y^{*}}^{2} (ψ_{0} + ψ_{1} - 2 ψ_{01} - D) - G_{y^{*}} G_{w^{*}}^{2} E + 4 G_{w^{*}}^{2} ψ_{2} ψ_{12}^{2} > 0,$

(26)

where $D = A$ and $E = B$, with $A$ and $B$ as defined in the proof of the first statement, so that

$E = - 4 γ_{1} ψ_{1} ψ_{2} ψ_{12} - 4 γ_{2} ψ_{2} ψ_{12} + 8 ψ_{2} ψ_{01} ψ_{12} .$

(27)

A sufficient condition is obtained by requiring

$ψ_{0} + ψ_{1} - 2 ψ_{01} - D > 0,$

(28)

together with $E \leq 0$, because (26) can be rearranged as

$G_{y^{*}}^{2} (ψ_{0} + ψ_{1} - 2 ψ_{01} - D) > G_{w^{*}}^{2} (G_{y^{*}} E - 4 ψ_{2} ψ_{12}^{2}),$

(29)

whose left-hand side is then positive and whose right-hand side is non-positive. Under these conditions,

$MSE ({\hat{G}}_{R} (y^{*})) > MSE_{min} ({\hat{G}}_{SM} (y^{*})) .$

(30)

□
3. From (6) and (20),

$MSE_{min} ({\hat{G}}_{SM} (y^{*})) < MSE ({\hat{G}}_{BTR} (y^{*})),$

if

$MSE ({\hat{G}}_{BTR} (y^{*})) - MSE_{min} ({\hat{G}}_{SM} (y^{*})) > 0 .$

4. From (8) and (20),

$MSE_{min} ({\hat{G}}_{SM} (y^{*})) < Var_{min} ({\hat{G}}_{Reg} (y^{*})),$

provided that

$Var_{min} ({\hat{G}}_{Reg} (y^{*})) - MSE_{min} ({\hat{G}}_{SM} (y^{*})) > 0 .$

5. From (10) and (20),

$MSE_{min} ({\hat{G}}_{SM} (y^{*})) < MSE_{min} ({\hat{G}}_{RD} (y^{*})),$

whenever

$MSE_{min} ({\hat{G}}_{RD} (y^{*})) - MSE_{min} ({\hat{G}}_{SM} (y^{*})) > 0 .$

6. From (12) and (20),

$MSE_{min} ({\hat{G}}_{SM} (y^{*})) < MSE_{min} ({\hat{G}}_{GK} (y^{*})),$

provided that

$MSE_{min} ({\hat{G}}_{GK} (y^{*})) - MSE_{min} ({\hat{G}}_{SM} (y^{*})) > 0 .$

Proof.
Substituting the minimum MSEs of both estimators from (12) and (20) and simplifying, we get

$G_{y^{*}}^{2} K - (G_{y^{*}}^{2} A + G_{y^{*}} G_{w^{*}}^{2} B + G_{w^{*}}^{2} C) > 0,$

(31)

where

$K = \frac{1}{64} (64 - 16 θ^{2} ψ_{1} - \frac{ψ_{1} {(- 8 + θ^{2} ψ_{1})}^{2}}{ψ_{1} (1 + ψ_{0}) - ψ_{01}^{2}}),$

(32)

and $A$, $B$ and $C$ are defined in the proof of the first statement. Because $G_{y^{*}}$ and $G_{w^{*}}$ are positive, (31) holds if $A < K$, $B \leq 0$, and $C \leq 0$.

□

These theoretical inequalities give the conditions under which the proposed estimator achieves a lower MSE than its counterparts across multiple established classes, including ratio-type, regression-type, and Grover and Kaur exponential-type estimators. In other words, whenever these conditions are satisfied, the proposed estimator is more efficient under the PPS framework.

From a practical perspective, such efficiency gains translate directly into more reliable estimation of cumulative distribution functions (CDFs) in real-world survey applications. For instance,

In income distribution studies, a more efficient estimator reduces the required sample size, thereby lowering data collection costs.
In epidemiological surveys, improved precision ensures more accurate prevalence estimates, which are critical for healthcare policy and resource allocation.
In agricultural yield forecasting, efficiency improvements minimize the risk of biased or imprecise yield estimates when unit sizes (e.g., farm sizes) vary widely.

Thus, the theoretical superiority of the proposed estimator directly strengthens its applicability in domains where PPS sampling is indispensable and estimator efficiency is crucial.

3.2. Simulation Study

To evaluate the finite-sample performance of the proposed family of estimators, a Monte Carlo simulation study was conducted using three synthetic populations with distinct mean and covariance structures. These populations were generated from multivariate normal distributions to mimic diverse correlation patterns and size heterogeneity frequently encountered in survey data.

Population I was designed with a simple increasing mean vector $μ_{1} = {(1, 2, 3, 4)}^{⊤}$ and a moderately correlated covariance matrix, thereby representing a balanced structure with mild heterogeneity. Population II was constructed with zero means and higher off-diagonal correlations, reflecting stronger dependency among variables and potential multicollinearity effects. Population III had decreasing negative means $μ_{3} = {(- 1, - 2, - 3, - 4)}^{⊤}$ and a denser covariance matrix with stronger correlations, thereby representing more heterogeneous and complex conditions (see Table 1). Table 2 reports the mean squared errors (MSEs) for the proposed estimators ${\hat{G}}_{j, SM} (y^{*})$ ($j = 1, \dots, 8$) and for the benchmark competitors: the ratio (${\hat{G}}_{R} (y^{*})$), Bahl and Tuteja ratio-type exponential (${\hat{G}}_{BTR} (y^{*})$), regression (${\hat{G}}_{Reg} (y^{*})$), Rao difference-type (${\hat{G}}_{RD} (y^{*})$), and Grover and Kaur (${\hat{G}}_{GK} (y^{*})$) estimators. The results demonstrate that across all three populations, the proposed estimators consistently yield smaller MSEs than the classical alternatives. In particular, ${\hat{G}}_{6, SM} (y^{*})$ and ${\hat{G}}_{7, SM} (y^{*})$ achieve the lowest error magnitudes, highlighting the robustness of the SM-based framework.
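The general shape of such a Monte Carlo comparison can be sketched as follows. This is only an illustration under several assumptions that are not taken from the paper: the synthetic population is generated with positive sizes from simple linear relationships rather than from the multivariate normal settings of Table 1, PPS sampling is carried out with replacement via `random.choices`, and only the baseline and ratio estimators are simulated. All function and variable names are hypothetical.

```python
import random
import statistics


def indicator(values, cutoff):
    """I(v <= cutoff) for every value."""
    return [1.0 if v <= cutoff else 0.0 for v in values]


def simulate_mse(y, x, q, n, reps, seed=0):
    """Empirical MSE of the baseline and ratio CDF estimators under PPS sampling."""
    rng = random.Random(seed)
    N = len(q)
    total = sum(q)
    P = [qi / total for qi in q]
    y_star = [yi / (N * p) for yi, p in zip(y, P)]
    x_star = [xi / (N * p) for xi, p in zip(x, P)]
    Iy = indicator(y_star, statistics.median(y_star))
    Ix = indicator(x_star, statistics.median(x_star))
    G_y, G_x = sum(Iy) / N, sum(Ix) / N
    se_mean = se_ratio = 0.0
    used = 0
    for _ in range(reps):
        idx = rng.choices(range(N), weights=P, k=n)    # PPS draws with replacement
        g_y = sum(Iy[i] for i in idx) / n
        g_x = sum(Ix[i] for i in idx) / n
        if g_x == 0.0:
            continue                                    # ratio estimator undefined, skip replicate
        se_mean += (g_y - G_y) ** 2
        se_ratio += (g_y * G_x / g_x - G_y) ** 2
        used += 1
    return se_mean / used, se_ratio / used


# Hypothetical synthetic population with positive sizes (assumption, not the paper's design).
pop_rng = random.Random(1)
q = [pop_rng.uniform(1.0, 10.0) for _ in range(200)]
x = [5.0 * qi + pop_rng.gauss(0.0, 2.0) for qi in q]
y = [2.0 * xi + pop_rng.gauss(0.0, 3.0) for xi in x]
mse_mean, mse_ratio = simulate_mse(y, x, q, n=30, reps=2000)
```

The same loop could in principle be extended to the other benchmark estimators and to the proposed family by adding one squared-error accumulator per estimator.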

To complement the MSE results, Table 3 presents the percentage relative efficiencies (PREs) of the estimators with respect to the baseline ${\hat{G}}_{y^{*}}$. Values greater than 100 indicate efficiency gains. The superiority of the proposed estimators is evident, with PREs ranging from 114% to 171% in Population I, 146% to 160% in Population II, and 121% to 158% in Population III. In particular, ${\hat{G}}_{6, SM} (y^{*})$ and ${\hat{G}}_{7, SM} (y^{*})$ consistently outperform the benchmarks across different populations, demonstrating notable gains in precision under heterogeneous sampling conditions.
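For reference, the PRE of an estimator relative to a baseline is conventionally computed as 100 times the ratio of the baseline MSE to the estimator MSE. The short Python sketch below assumes this standard definition, which is consistent with the statement that values above 100 indicate efficiency gains; the function name `pre` and the example values are hypothetical.

```python
def pre(mse_baseline, mse_estimator):
    """Percentage relative efficiency of an estimator with respect to a baseline."""
    return 100.0 * mse_baseline / mse_estimator


# Hypothetical values: halving the MSE doubles the efficiency.
example = pre(0.5, 0.25)
```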

Overall, the simulation study highlights that the proposed SM-based estimators substantially reduce estimation error and improve efficiency, especially in scenarios with high heterogeneity and strong inter-variable correlations. These gains reaffirm the theoretical advantages of incorporating auxiliary size information into distribution function estimation under PPS sampling.

In addition to the tabulated results, Figure 1 and Figure 2 provide a visual comparison of the performance of the proposed family of estimators relative to the existing benchmarks. In each figure, panel (a) presents the mean squared error (MSE) values, while panel (b) depicts the percentage relative efficiencies (PREs) for all three simulated populations.

In Figure 1a and Figure 2a, the proposed estimators $\hat{G}_{j,SM}(y^{*})$, $j = 1, \dots, 8$, consistently achieve lower MSE values across all the populations when compared to traditional estimators such as $\hat{G}_{y^{*}}$, $\hat{G}_{Reg(y^{*})}$, and $\hat{G}_{BTR(y^{*})}$. This demonstrates the robustness of the new class of estimators in reducing estimation error even under diverse covariance structures. The improvement is most pronounced for Populations I and III, where heterogeneity in the covariance matrix is higher, highlighting the adaptability of the proposed methods to more complex correlation patterns.

Figure 1b and Figure 2b further corroborate these findings by showing that the proposed estimators substantially outperform classical benchmarks in terms of PRE. For instance, the estimator $\hat{G}_{6,SM}(y^{*})$ achieves up to 170.92% efficiency relative to the simple expansion estimator $\hat{G}_{y^{*}}$, while $\hat{G}_{1,SM}(y^{*})$ and $\hat{G}_{7,SM}(y^{*})$ also exhibit consistently high efficiency gains across populations. These improvements indicate that the proposed estimators not only reduce bias and variance but also enhance efficiency, making them preferable choices for practical applications in survey inference.

Overall, the visual evidence aligns well with the numerical results reported in Tables 2 and 3, reinforcing the conclusion that the proposed family of estimators provides superior performance in terms of both error minimization and efficiency enhancement.

3.3. Empirical Study

To examine the empirical performance of the proposed family of estimators, we consider three distinct populations commonly used in survey sampling research. These populations are drawn from fisheries, enology, and demographic contexts, ensuring diversity in both application domains and correlation structures. The corresponding descriptive statistics are summarized in Table 5.

Population I (Adapted from [35]). This population is based on fisheries data, where auxiliary information is naturally available across different years. The variables are defined as follows: Y = Quantity of fish caught in 1995; X = Quantity of fish caught in 1992; Z = Quantity of fish caught by fishermen in 1993; Q = Quantity of fish caught in 1994. This dataset is characterized by strong temporal correlations across years, making it well-suited for evaluating auxiliary-variable-based estimators.

Population II ([36]). This population is constructed from the UCI Wine dataset, with chemical composition attributes serving as the study and auxiliary variables: Y = Aspartame; X = Leucine; Z = Isoleucine; Q = Valine. This setting captures biochemical correlation structures, with strong relationships among amino acids, providing a different test case from fisheries and demographic data.

Population III (Adapted from [37]). This population reflects demographic data with institutional auxiliary variables: Y = Population (in thousands) in 1985; X = Population in 1975; Z = Population in 1977; Q = Total number of seats in the municipal council. This population represents the type of socio-economic datasets where auxiliary information from administrative records enhances survey inference.

As shown in Table 5, Populations I and II exhibit comparatively strong correlations, most visibly between the two auxiliary variables (e.g., $R_{G_{x}^{*} G_{w}^{*}} = 0.420$ and $0.582$, respectively), whereas Population III shows a much weaker correlation (e.g., $0.0380$). These differences create a useful test bed for assessing how the proposed estimators respond to varying strengths of auxiliary information. The coefficients of variation ($C_{G}$) also highlight structural heterogeneity, especially in Population III, where demographic and institutional measures are less aligned. These variations across populations provide an informative benchmark for evaluating the robustness and efficiency of the proposed estimators (see Table 6).
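Summary measures of the kind listed in Tables 1 and 5 can be computed directly from indicator variables. The following Python sketch assumes that each indicator equals 1 when a unit's value does not exceed a chosen threshold, that the standard deviation uses the N − 1 divisor, and that $R$ is the ordinary Pearson correlation between two indicators. The function name, the synthetic data, and the use of medians as thresholds are illustrative assumptions. Under these assumptions, a population of N = 5000 with $G = 0.5$ gives $C_{G} = \sqrt{N/(N-1)} \approx 1.0001$, which is consistent with the value shown in Table 1.

```python
import numpy as np


def indicator_summaries(y, x, t_y, t_x):
    """Mean, standard deviation, CV, and correlation of CDF indicator variables."""
    i_y = (y <= t_y).astype(float)  # indicator for the study variable
    i_x = (x <= t_x).astype(float)  # indicator for an auxiliary variable
    g_y = float(i_y.mean())         # population CDF value at t_y
    s_y = float(i_y.std(ddof=1))    # standard deviation with N - 1 divisor
    c_y = s_y / g_y                 # coefficient of variation of the indicator
    r_yx = float(np.corrcoef(i_y, i_x)[0, 1])  # correlation between indicators
    return {"G_y": g_y, "S_y": s_y, "C_Gy": c_y, "R_GyGx": r_yx}


# Hypothetical correlated population, with thresholds set at the medians.
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
y = 0.6 * x + rng.normal(size=5000)

summary = indicator_summaries(y, x, np.median(y), np.median(x))
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```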

The empirical findings provide a comprehensive comparison of the proposed family of distribution function estimators with the existing methods in terms of efficiency, precision, and robustness (see Table 7). Table 4 reports the mean squared errors (MSEs) across Populations I–III, while Table 8 presents the percentage relative efficiencies (PREs). The corresponding graphical representations are presented in Figure 3, which illustrates both the MSE patterns (panel a) and the efficiency gains (panel b).

From Table 4, it is evident that the proposed estimators consistently achieve lower MSE values than conventional estimators such as $\hat{G}_{y^{*}}$, $\hat{G}_{Reg(y^{*})}$, and $\hat{G}_{RD(y^{*})}$. For example, $\hat{G}_{7,SM}(y^{*})$ records low MSEs in all three populations, with values of $0.0093$, $0.0057$, and $0.0015$ for Populations I, II, and III, respectively. Similarly, $\hat{G}_{6,SM}(y^{*})$ and $\hat{G}_{1,SM}(y^{*})$ perform well, with substantial reductions in error relative to their classical counterparts: the smallest entries in Table 4 are $0.0066$ and $0.0032$ for $\hat{G}_{6,SM}(y^{*})$ in Populations I and II, and $0.0012$ for $\hat{G}_{1,SM}(y^{*})$ in Population III. These improvements underscore the adaptability of the proposed estimators in diverse population settings.

The efficiency comparison, summarized in Table 8, further confirms these findings. The proposed family outperforms baseline estimators, often by a wide margin. For instance, $\hat{G}_{1,SM}(y^{*})$ achieves PREs of $188.47$, $255.60$, and $328.16$ across the three populations, highlighting its robustness and superior efficiency relative to the standard $\hat{G}_{y^{*}}$ benchmark, which is normalized to 100. Likewise, $\hat{G}_{6,SM}(y^{*})$ and $\hat{G}_{7,SM}(y^{*})$ consistently show high PRE values, demonstrating their efficiency gains across both small and large populations.

These numerical insights are supported by the graphical results in Figure 3. Panel (a) illustrates the MSE comparisons, where the proposed estimators demonstrate clear superiority over the existing methods, resulting in significant reductions in error levels. Panel (b) illustrates the PRE values, which further highlight the relative efficiency improvements. Notably, the proposed estimators not only outperform traditional regression and ratio-based estimators but also maintain robustness across repeated simulation runs, reinforcing their practical utility in survey sampling applications. Panels (c) and (d) show the patterns and trends in the mean squared errors and percentage relative efficiencies, respectively.

Taken together, the results indicate that the proposed family of estimators achieves substantial improvements in both precision and efficiency, offering a versatile and effective alternative to conventional approaches. Their consistently high performance across multiple populations and simulation settings suggests strong potential for application in real-world survey data, especially in contexts where auxiliary information is available and reliable.

4. Discussion

The performance evaluation of the proposed estimator requires not only theoretical justification but also empirical evidence that reflects its robustness in practical applications. To this end, we carried out an extensive simulation study, complemented by an empirical investigation across three distinct populations, to validate the theoretical properties established in the earlier sections. The discussion presented here emphasizes both the statistical implications and the applied relevance of the findings.

From the theoretical perspective, the derivations in Section 3 demonstrated that the proposed family of estimators consistently achieves a lower minimum mean squared error (MSE) compared to conventional alternatives such as $\hat{G}(y^{*})$, $\hat{G}_{R}(y^{*})$, $\hat{G}_{BT,R}(y^{*})$, $\hat{G}_{Reg}(y^{*})$, $\hat{G}_{RD}(y^{*})$, and $\hat{G}_{GK}(y^{*})$. This advantage was shown to hold under general conditions, thereby establishing a strong foundation for the efficiency gains observed in practice.

On the empirical side, the results provide compelling numerical evidence. As depicted in Figure 3, the proposed estimators $\hat{G}_{1,SM}(y^{*})$ through $\hat{G}_{8,SM}(y^{*})$ consistently yield substantially lower MSE values across all three populations, thereby confirming the robustness of the theoretical results. The magnitude of improvement is not marginal; rather, it is systematic and pronounced. In particular, the efficiency gains are clearly illustrated in Figure 3, where the percentage relative efficiency (PRE) values of the proposed estimators exceed those of the existing estimators, ranging from about 124% to 328% across populations (Table 8). Such high PRE values underscore the substantial improvements in precision and efficiency achieved by incorporating auxiliary information into the PPS sampling framework.

From an applied standpoint, these findings are particularly significant. The improved estimation of the population cumulative distribution function (CDF) has direct implications for fields where accurate distributional inference is critical, such as economics, social sciences, official statistics, and biomedical studies. For instance, survey practitioners working with unequal probability designs often face challenges in balancing design efficiency and estimator bias; the proposed family of estimators provides a viable solution that reduces estimation error while maintaining design consistency. Moreover, the graphical comparisons not only enhance the interpretability of the results but also underscore the stability of the proposed estimators across heterogeneous population structures, making them more attractive for practical implementation.

Beyond these theoretical and applied contributions, the study also underscores the role of computational advances in modern survey sampling and inference. The use of simulation-based validation enables the exploration of estimator performance across a wide range of population structures and sampling designs, providing insights that would be difficult to obtain through purely analytical derivations. Furthermore, the integration of auxiliary information into both the design and estimation stages exemplifies how computational statistics can bridge theory and application: leveraging additional covariates enhances estimator efficiency. At the same time, simulation frameworks ensure reproducibility and scalability in practice. This synergy between computational methods, auxiliary information, and theoretical efficiency directly aligns with the goals of contemporary computational statistics, as emphasized in this special issue.

Therefore, the discussion consolidates both theoretical and empirical perspectives: the proposed estimators achieve provable efficiency gains under PPS sampling, supported by extensive numerical validation. The joint evidence from analytical derivations, numerical tables, and graphical results confirms that the proposed family of estimators offers a substantial improvement over the existing approaches, thereby advancing the toolkit available for distribution function estimation in modern computational statistics and its applications.

Looking ahead, the methodological framework introduced here could be further extended to meet the challenges posed by contemporary data environments. In particular, integrating the proposed estimators with big data survey designs, Bayesian computational frameworks, or machine learning-based auxiliary information extraction presents exciting avenues for future research. Such extensions would not only enhance scalability and adaptability in high-dimensional or complex population settings but also strengthen the role of computational statistics in bridging traditional theory with modern data-driven applications. Likewise, the framework can be extended to other scenarios with different datasets [38,39,40,41,42].

5. Conclusions

This study emphasizes the fundamental importance of probability proportional to size (PPS) sampling for improving the estimation of distribution functions by strategically incorporating auxiliary information. By extending the theoretical framework to cumulative distribution functions (CDFs) for both study and auxiliary variables, we developed a novel, computationally efficient family of estimators specifically tailored to estimate the population CDF of the study variable. Theoretical analyses reveal that the proposed estimators achieve notable reductions in mean squared error (MSE). At the same time, empirical evidence across multiple populations shows percentage relative efficiencies (PREs) ranging from about 124% to 328% relative to the usual PPS-based estimator. Beyond methodological refinement, this work advances modern survey methodology by using a single auxiliary variable at the design stage and dual auxiliary information at the estimation stage, thereby improving estimator precision without increasing sample size requirements. Although the current empirical evaluations were based on moderately sized datasets (e.g., fisheries, wine chemistry, and demographics), future research will extend this framework to large-scale survey environments to assess computational scalability and performance under high-dimensional data structures. Thus, the proposed PPS-based estimation framework bridges theoretical innovation with computational practicality, providing a robust and efficient tool for contemporary survey statisticians and data scientists.

Author Contributions

Conceptualization, S.S., E.M. and H.I.; Methodology, S.S., E.M., H.I. and P.C.R.; Software, S.S.; Validation, S.S., E.M., H.I., R.I.G.M. and J.L.L.-G.; Formal analysis, S.S., E.M., H.I., P.C.R., R.I.G.M. and J.L.L.-G.; Investigation, S.S., E.M., H.I., P.C.R. and R.I.G.M.; Resources, H.I., P.C.R. and J.L.L.-G.; Data curation, S.S.; Writing—original draft, S.S., E.M. and H.I.; Writing—review & editing, S.S., E.M., H.I., P.C.R., R.I.G.M. and J.L.L.-G.; Visualization, S.S., H.I., R.I.G.M. and J.L.L.-G.; Supervision, E.M., H.I. and P.C.R.; Project administration, E.M., P.C.R., R.I.G.M. and J.L.L.-G.; Funding acquisition, P.C.R., R.I.G.M. and J.L.L.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available on Yahoo Finance at https://finance.yahoo.com (accessed on 15 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Horvitz, D.G.; Thompson, D.J. A Generalization of Sampling Without Replacement from a Finite Universe. J. Am. Stat. Assoc. 1952, 47, 663–685.
2. Hájek, J. Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population. Ann. Math. Stat. 1964, 35, 1491–1523.
3. Rao, J.N.K.; Wu, C.F.J. Resampling Inference with Complex Survey Data. J. Am. Stat. Assoc. 1988, 83, 231–241.
4. Food and Agriculture Organization of the United Nations. Sampling Manual for Forest and Range Inventory; FAO Forestry Paper 82; FAO: Rome, Italy, 1998.
5. Abbas, N.; Shabbir, J.; Hanif, M. Estimation of cumulative distribution function under complex survey designs using auxiliary information. PLoS ONE 2025, 20, e0322660.
6. Flood, J.; Mostafa, S. Survey Data Integration for Distribution Function Estimation. arXiv 2024, arXiv:2409.14284.
7. Dagdoug, A.; Goga, C.; Haziza, D. Model-assisted estimation in survey sampling using random forests. arXiv 2020, arXiv:2002.09736.
8. Xia, J.; Li, S.; Huang, J.; Yang, Z.; Jaimoukha, I.M.; Gündüz, D. Metalearning-Based Alternating Minimization Algorithm for Nonconvex Optimization. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 5366–5380.
9. Abbas, M.; Shehzad, M.A.; Iftikhar, H.; Rodrigues, P.C.; Alharbi, A.A.; Allohibi, J. Efficient Estimators of Finite Population Variance Using Raw Moments under Two- and Three-Stage Cluster Sampling Schemes. AIMS Math. 2025, 10, 23429–23466.
10. Ahmad, S.; Qureshi, M.; Iftikhar, H.; Rodrigues, P.C.; Rehman, M.Z. An improved family of unbiased ratio estimators for a population distribution function. AIMS Math. 2025, 10, 1061–1084.
11. Hussain, S.; Ahmad, S.; Akhtar, S.; Javed, A.; Yasmeen, U. Estimation of finite population distribution function with dual use of auxiliary information under non-response. PLoS ONE 2020, 15, e0243584.
12. Xia, W.; Pu, L.; Zou, X.; Shilane, P.; Li, S.; Zhang, H.; Wang, X. The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression. ACM Trans. Storage 2023, 19, 1–30.
13. Singh, H.P.; Singh, S.; Kozak, M. A family of estimators of finite-population distribution function using auxiliary information. Acta Appl. Math. 2008, 104, 115–130.
14. Yaqub, M.; Sohil, F.; Shabbir, J.; Sohail, M.U. Estimation of population distribution function in the presence of non-response using stratified random sampling. Commun. Stat.-Simul. Comput. 2024, 53, 2498–2526.
15. Mustafa, M.; Ahmad, S.; Zahid, E.; Shabbir, J.; Masood, S. Novel Methods For Estimation Of Population Mean Using Auxiliary Information Under PPS Sampling: Application With Real And Simulated Data Sets. Kurd. Stud. 2024, 12, 1553–1562.
16. Khan, S.; Farooq, M.; Ahmad, S.; Khan, S. Improved estimator for estimation of population mean using predictive approach under PPS sampling. VFAST Trans. Math. 2024, 12, 1–16.
17. Wang, J.; Ahmad, S.; Arslan, M.; Lone, S.A.; Abd Ellah, A.; Aldahlan, M.A.; Elgarhy, M. Estimation of finite population mean using double sampling under probability proportional to size sampling in the presence of extreme values. Heliyon 2023, 9, e21418.
18. Zeng, Z.; Goetz, S.M. A General Modeling and Analysis of Impacts of Unbalanced Inductance on PWM Schemes for Two-Parallel Interleaved Power Converters. IEEE Trans. Power Electron. 2024, 39, 12235–12248.
19. Ahmad, S.; Shabbir, J. Use of extreme values to estimate finite population mean under pps sampling scheme. J. Reliab. Stat. Stud. 2018, 11, 99–112.
20. Azeem, M.; Iftikhar, S.; Ijaz, M.; Salahuddin, N.; Ilyas, M. An improved estimator of population mean under PPS sampling with application to radiation data sets. J. Radiat. Res. Appl. Sci. 2025, 18, 101543.
21. Amiri, S.; Hassani, H.; Heravi, S. An efficient variant of ranked set sampling, probability proportional to size with application to economic data. Austrian J. Stat. 2024, 53, 17–31.
22. Li, L.; Xia, Y.; Ren, S.; Yang, X. Homogeneity Pursuit in the Functional-Coefficient Quantile Regression Model for Panel Data with Censored Data. Stud. Nonlinear Dyn. Econom. 2025, 29, 323–348.
23. Gan, X.; Li, T.; Gong, C.; Li, D.; Dong, D.; Liu, J.; Lu, K. GraphCSR: A Degree-Equalized CSR Format for Large-scale Graph Processing. Proc. VLDB Endow. 2025, 18, 4255–4268.
24. Wu, X.; Li, L.; Tao, X.; Yuan, J.; Xie, H. Towards the Explanation Consistency of Citizen Groups in Happiness Prediction via Factor Decorrelation. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 1392–1405.
25. Ren, Y.; Zhang, J.; Xia, Y.; Wang, R.; Xie, F.; Guan, J.; Zhou, S. Regression-based conditional independence test with adaptive kernels. Artif. Intell. 2025, 347, 104391.
26. Tian, Z.; Lee, A.; Zhou, S. Adaptive tempered reversible jump algorithm for Bayesian curve fitting. Inverse Probl. 2024, 40, 045024.
27. Yang, R.; Li, H.; Huang, H. Multisource information fusion considering the weight of focal element’s beliefs: A Gaussian kernel similarity approach. Meas. Sci. Technol. 2024, 35, 025136.
28. Zhang, Z.; Liu, Z.; Ning, L.; Tian, H.; Wang, B. Belief-based fuzzy and imprecise clustering for arbitrary data distributions. IEEE Trans. Fuzzy Syst. 2025, 33, 2755–2767.
29. Cao, Z.; Huang, L.; Wang, T.; Wang, Y.; Shi, J.; Zhu, A.; Snoussi, H. Understanding the Dimensional Need of Noncontrastive Learning. IEEE Trans. Cybern. 2025, 55, 4089–4102.
30. Yang, X.; Han, Q.; Ni, J.; Li, L. Research on the Expansion of Deposit Insurance Pricing Model Based on the Merton Option Pricing Framework. Comput. Econ. 2025, 8, 1–23.
31. Cochran, W. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 1940, 30, 262–275.
32. Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
33. Rao, T. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
34. Grover, L.K.; Kaur, P. A generalized class of ratio type exponential estimators of population mean under linear transformation of auxiliary variable. Commun. Stat.-Simul. Comput. 2014, 43, 1552–1574.
35. Singh, S. Advanced Sampling Theory with Applications: How Michael “Selected” Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
36. Herbert, A. Wine Dataset. UCI Machine Learning Repository. 2009. Available online: http://archive.ics.uci.edu/ml/datasets/wine (accessed on 20 January 2025).
37. Särndal, C.E. Methods for estimating the precision of survey estimates when imputation has been used. Surv. Methodol. 1992, 18, 241–252.
38. Zhang, H.; Bao, X.; Zhao, H.; Hao, Y.; Huang, H.; Dai, M.; Liu, W. High-Precision Deblending of 3-D Simultaneous Source Data Based on Prior Information Constraint. IEEE Geosci. Remote Sens. Lett. 2025, 22, 1–5.
39. Cai, X.; Zhang, C. An Innovative Differentiated Creative Search Based on Collaborative Development and Population Evaluation. Biomimetics 2025, 10, 260.
40. Zhang, Y.; Feng, S.; Wang, P.; Tan, Z.; Luo, X.; Ji, Y.; Cheung, Y. Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering. IEEE Trans. Neural Networks Learn. Syst. 2025, 36, 16049–16061.
41. Li, S.; Yang, J.; Bao, H.; Xia, D.; Zhang, Q.; Wang, G. Cost-Sensitive Neighborhood Granularity Selection for Hierarchical Classification. IEEE Trans. Knowl. Data Eng. 2025, 37, 4471–4484.
42. Zhang, H.; Ren, Y.; Xia, Y.; Zhou, S.; Guan, J. Towards Effective Causal Partitioning by Edge Cutting of Adjoint Graph. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10259–10271.

Figure 1. (a) MSE comparison and (b) PRE comparison for the proposed family of estimators and the existing methods based on simulation results.

Figure 2. Scatter plots of (a) the MSE comparison and (b) the PRE comparison for the proposed family of estimators and the existing methods, based on simulation results.

Figure 3. Graphical illustration of simulation outcomes: (a) mean squared error (MSE) comparison, highlighting the lower error rates achieved by the proposed family of estimators; (b) percentage relative efficiency (PRE) comparison, where values above 100 confirm substantial improvements over the traditional estimator; (c) MSE comparison based on scatter plot; and (d) PRE comparison based on scatter plot with regression curve. The consistency of these patterns across different populations underscores the robustness and superiority of the proposed estimators.

Table 1. Summary measure.

Measure	Pop. I	Pop. II	Pop. III
Population size (N)	5000	5000	5000
Sample size (n)	120	120	120
$G_{y^{*}}$	0.5	0.5	0.5
$G_{x^{*}}$	0.5	0.5	0.5
$G_{w^{*}}$	0.5	0.5	0.5
$C_{G_{y}^{*}}$	1.0001	1.0001	1.0001
$M_{y^{*}}$	1.0101	$- 0.0142$	$- 0.9810$
$M_{x^{*}}$	1.9767	0.000228	$- 1.998$
$M_{w^{*}}$	2.99041	$- 0.0003$	$- 2.9747$
$C_{G_{x}^{*}}$	1.0001	1.0001	1.0001
$C_{G_{w}^{*}}$	1.0001	1.0001	1.0001
$λ$	0.0081	0.0081	0.0081
$R_{G_{y}^{*} G_{x}^{*}}$	0.1104	0.5072	0.3352
$R_{G_{y}^{*} G_{w}^{*}}$	0.1504	0.2624	0.114
$R_{G_{x}^{*} G_{w}^{*}}$	0.2752	$- 0.0112$	$- 0.212$

Table 2. Mean squared errors (MSEs) from simulation study. Smaller values indicate higher efficiency.

Estimators	Population I	Population II	Population III
${\hat{G}}_{8, S M} (y^{*})$	0.00178	0.00138	0.00139
${\hat{G}}_{7, S M} (y^{*})$	0.00147	0.00138	0.00129
${\hat{G}}_{6, S M} (y^{*})$	0.00119	0.00137	0.00165
${\hat{G}}_{5, S M} (y^{*})$	0.00175	0.00138	0.00127
${\hat{G}}_{4, S M} (y^{*})$	0.00163	0.00138	0.00153
${\hat{G}}_{3, S M} (y^{*})$	0.00176	0.00138	0.00147
${\hat{G}}_{2, S M} (y^{*})$	0.00173	0.00138	0.00131
${\hat{G}}_{1, S M} (y^{*})$	0.00137	0.00127	0.00167
${\hat{G}}_{R (y^{*})}$	0.00277	0.00201	0.00270
${\hat{G}}_{B T R (y^{*})}$	0.00194	0.00155	0.00190
${\hat{G}}_{y^{*}}$	0.00203	0.00204	0.00206
${\hat{G}}_{R e g (y^{*})}$	0.00183	0.00151	0.00180
${\hat{G}}_{R D (y^{*})}$	0.00181	0.00150	0.00179
${\hat{G}}_{G K (y^{*})}$	0.00181	0.00150	0.00179

Table 3. Percentage relative efficiencies (PREs) with respect to ${\hat{G}}_{y^{*}}$ (baseline = 100).

Estimators	Population I	Population II	Population III
${\hat{G}}_{8, S M} (y^{*})$	114.06	146.87	146.49
${\hat{G}}_{7, S M} (y^{*})$	138.59	147.55	157.80
${\hat{G}}_{6, S M} (y^{*})$	170.92	148.20	123.57
${\hat{G}}_{5, S M} (y^{*})$	115.90	147.43	160.68
${\hat{G}}_{4, S M} (y^{*})$	124.73	147.82	133.15
${\hat{G}}_{3, S M} (y^{*})$	115.57	147.82	138.05
${\hat{G}}_{2, S M} (y^{*})$	117.37	147.69	155.10
${\hat{G}}_{1, S M} (y^{*})$	148.31	159.85	121.44
${\hat{G}}_{R (y^{*})}$	73.36	93.03	75.39
${\hat{G}}_{B T R (y^{*})}$	104.06	104.25	106.88
${\hat{G}}_{y^{*}}$	100.00	100.00	100.00
${\hat{G}}_{R e g (y^{*})}$	111.28	127.22	112.79
${\hat{G}}_{R D (y^{*})}$	112.10	133.96	113.61
${\hat{G}}_{G K (y^{*})}$	112.34	136.44	113.61

Table 4. Mean squared errors (MSEs) of the proposed and existing estimators under Populations I–III. Lower MSE indicates higher efficiency.

Estimator	Population I	Population II	Population III
${\hat{G}}_{8, S M} (y^{*})$	0.0141	0.0096	0.0030
${\hat{G}}_{7, S M} (y^{*})$	0.0093	0.0057	0.0015
${\hat{G}}_{6, S M} (y^{*})$	0.0066	0.0032	0.0020
${\hat{G}}_{5, S M} (y^{*})$	0.0125	0.0074	0.0018
${\hat{G}}_{4, S M} (y^{*})$	0.0118	0.0082	0.0016
${\hat{G}}_{3, S M} (y^{*})$	0.0141	0.0087	0.0027
${\hat{G}}_{2, S M} (y^{*})$	0.0138	0.0090	0.0026
${\hat{G}}_{1, S M} (y^{*})$	0.0093	0.0068	0.0012
${\hat{G}}_{R (y^{*})}$	0.0263	0.0187	0.0106
${\hat{G}}_{B T R (y^{*})}$	0.0212	0.0167	0.0072
${\hat{G}}_{y^{*}}$	0.0175	0.0174	0.0040
${\hat{G}}_{R e g (y^{*})}$	0.0164	0.0136	0.0040
${\hat{G}}_{R D (y^{*})}$	0.0154	0.0130	0.0039
${\hat{G}}_{G K (y^{*})}$	0.0151	0.0127	0.0039

Table 5. Key parameters and summary measures for the three selected populations.

Measure	Pop. I	Pop. II	Pop. III
Population size (N)	69	67	284
Sample size (n)	12	12	50
$S_{y^{*}}$	0.5034	0.504	0.494
$S_{x^{*}}$	0.5021	0.504	0.5019
$S_{w^{*}}$	0.5034	0.500	0.5012
$G_{y^{*}}$	0.500	0.507	0.500
$G_{x^{*}}$	0.500	0.507	0.500
$G_{w^{*}}$	0.500	0.507	0.500
$C_{G_{y}^{*}}$	0.993	0.993	0.845
$C_{G_{x}^{*}}$	0.993	0.993	1.002
$C_{G_{w}^{*}}$	0.993	0.993	1.002
$λ$	0.0688	0.0688	0.0165
$M_{y^{*}}$	4782.01	25.68	28.81
$M_{x^{*}}$	4788.68	19.59	36.85
$M_{w^{*}}$	4810.81	61.04	12.77
$R_{G_{y}^{*} G_{x}^{*}}$	0.246	0.463	0.186
$R_{G_{y}^{*} G_{w}^{*}}$	0.072	0.403	0.186
$R_{G_{x}^{*} G_{w}^{*}}$	0.420	0.582	0.0380

Table 6. Proposed family of distribution function estimators for selected values of $α_{1}$ and $α_{2}$.

$γ_{1}$	$γ_{2}$	Proposed Estimator ${\hat{G}}_{prop} (y^{*})$
$R_{G_{y}^{*} G_{x}^{*}}$	$C_{G_{y}^{*} G_{w}^{*}}$	${\hat{G}}_{8, prop} (y^{*})$
$S_{y^{*}}$	1	${\hat{G}}_{7, prop} (y^{*})$
1	$G_{x^{*}}$	${\hat{G}}_{6, prop} (y^{*})$
$R_{G_{x}^{*} G_{w}^{*}}$	$C_{G_{x}^{*} G_{w}^{*}}$	${\hat{G}}_{5, prop} (y^{*})$
$G_{w^{*}}$	$C_{G_{v}^{*} G_{w}^{*}}$	${\hat{G}}_{4, prop} (y^{*})$
$C_{G_{y}^{*} G_{x}^{*}}$	$R_{G_{x}^{*} G_{w}^{*}}$	${\hat{G}}_{3, prop} (y^{*})$
$R_{G_{y}^{*} G_{x}^{*}}$	$G_{x^{*}}$	${\hat{G}}_{2, prop} (y^{*})$
1	0	${\hat{G}}_{1, prop} (y^{*})$

Table 7. Proposed family of distribution function estimators for selected values of $α_{1}$ and $α_{2}$.

$γ_{1}$	$γ_{2}$	Proposed Estimator ${\hat{G}}_{prop} (y^{*})$
$R_{G_{y}^{*} G_{x}^{*}}$	$C_{G_{y}^{*} G_{w}^{*}}$	${\hat{G}}_{8, prop} (y^{*})$
$S_{y^{*}}$	1	${\hat{G}}_{7, prop} (y^{*})$
1	$G_{x^{*}}$	${\hat{G}}_{6, prop} (y^{*})$
$R_{G_{x}^{*} G_{w}^{*}}$	$C_{G_{x}^{*} G_{w}^{*}}$	${\hat{G}}_{5, prop} (y^{*})$
$G_{w^{*}}$	$C_{G_{v}^{*} G_{w}^{*}}$	${\hat{G}}_{4, prop} (y^{*})$
$C_{G_{y}^{*} G_{x}^{*}}$	$R_{G_{x}^{*} G_{w}^{*}}$	${\hat{G}}_{3, prop} (y^{*})$
$R_{G_{y}^{*} G_{x}^{*}}$	$G_{x^{*}}$	${\hat{G}}_{2, prop} (y^{*})$
1	0	${\hat{G}}_{1, prop} (y^{*})$

Table 8. Percentage relative efficiencies (PREs) of the proposed estimators compared with the usual distribution estimator ${\hat{G}}_{y^{*}}$. Values greater than 100 indicate improvement over the baseline.

Estimator	Population I	Population II	Population III
${\hat{G}}_{8, S M} (y^{*})$	123.98	181.75	136.09
${\hat{G}}_{7, S M} (y^{*})$	187.27	306.62	263.52
${\hat{G}}_{6, S M} (y^{*})$	266.25	236.13	205.52
${\hat{G}}_{5, S M} (y^{*})$	139.17	233.82	222.20
${\hat{G}}_{4, S M} (y^{*})$	148.49	211.46	257.92
${\hat{G}}_{3, S M} (y^{*})$	123.86	199.55	147.36
${\hat{G}}_{2, S M} (y^{*})$	126.39	192.61	152.11
${\hat{G}}_{1, S M} (y^{*})$	188.47	255.60	328.16
${\hat{G}}_{R (y^{*})}$	66.33	93.03	37.82
${\hat{G}}_{B T R (y^{*})}$	82.30	104.25	56.04
${\hat{G}}_{y^{*}}$	100.00	100.00	100.00
${\hat{G}}_{R e g (y^{*})}$	106.45	127.22	101.01
${\hat{G}}_{R D (y^{*})}$	113.24	133.96	102.19
${\hat{G}}_{G K (y^{*})}$	115.33	136.44	102.65

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
