Article

Prospective Inference of Central Tendency Through Data-Adaptive Mechanisms

by Huda M. Alshanbari 1 and Malik Muhammad Anas 2,*

1 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
2 Department of Economics and Statistics, University of Salerno, 84084 Fisciano, Salerno, Italy
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(22), 3622; https://doi.org/10.3390/math13223622
Submission received: 18 October 2025 / Revised: 8 November 2025 / Accepted: 9 November 2025 / Published: 12 November 2025

Abstract

In the modern age of data enrichment, survey-based estimation systems need adaptive inference procedures to produce efficient and consistent population summaries. This work presents a new data-adaptive approach to the prospective estimation of central tendency under stratified random sampling (StRS). The proposed structure exploits auxiliary information through locally tuned, non-parametric smoothing that adapts dynamically to heterogeneity across sampled and unsampled domains. By combining variable bandwidth functions with stratified weighting schemes, the estimator responds to intricate data patterns and remains resilient to model misspecification and outlier effects. Extensive Monte Carlo simulations and two empirical studies, on solar radiation data and fish market data, are used to assess its performance across a variety of bandwidth and sample size settings. The findings consistently show that the proposed adaptive inference mechanism is significantly more precise and stable than traditional estimators, both when the auxiliary expectations are known and when they have to be estimated. The study offers a flexible, design-conscious framework that connects model-driven estimation with design-driven survey inference, which is relevant to contemporary data collection settings characterized by informational diversity and enrichment.

1. Introduction

In most complex surveys, the study population provides information that can be used at both the design stage and the estimation stage to develop effective estimators of finite population parameters, such as the population total or the population mean. Such information may come from sources such as a national census, remote sensing, or a natural resource inventory. Because estimation is central to survey sampling, much attention has been paid to the use of supplementary information. The most common approach posits a hypothetical model, typically a linear functional form, for the relationship between the auxiliary and survey variables, and the estimator is constructed from this relationship. These estimators rely on prior information about the specific form of the population model, which can pose challenges when several variables each require model development (Wu and Sitter [1]). For this reason, more emphasis has been placed on non-parametric models, which allow for more complex associations between auxiliary and survey variables (Dorfman [2]). The idea of non-parametric modeling in this setting was introduced by Dorfman and Hall [3].
Non-parametric inference procedures typically differ from model or probability design more sensitively than parametric ones (Nadaraya [4]). The theoretical literature offers two broad approaches to constructing more efficient estimators: model-based and design-based. The model-based approach assumes that the target population is a realization of a superpopulation model. The model is then used to predict the observations for the part of the population not included in the sample, so that finite population parameters can be computed (Dorfman and Hall [3]). The first use of non-parametric models in this context was by Nadaraya [4], who argued for the local polynomial regression (LPR) estimator as a generalized form of the regression estimator. A simulation analysis confirmed that the new estimator exhibits improved efficiency relative to the parametric estimators commonly used in the past. Following this, Rueda and Sanchez-Borrego [5] extended this line of work with a model-based LPR estimator appropriate for direct probability sampling designs.
Although local polynomial kernel regression can be applied to both continuous and discrete data, its use depends on the type of data being analyzed. Fitting a local polynomial at each point of interest is well suited to continuous data, because the kernel function gives more weight to nearby points. This approach also offers flexible smoothing with relatively few global requirements. For discrete data, some adaptations are necessary. For example, binary or categorical data can be handled by local polynomial logistic regression, and count data by local generalized linear models (GLMs) with appropriate link functions, such as Poisson regression. With sparse discrete data, choosing the right kernel bandwidth is important for good performance. For more information on this topic, see [5].
The arithmetic mean is a well-established statistical measure and one of the most popular and prominent types of average, with applications across all fields of science and the arts (Zaman and Iftikhar [6]). For this reason, mean estimation plays a significant role not only in survey sampling but also in many other sectors (Alqudah et al. [7]; Bhushan et al. [8]). For additional information on mean estimation, readers are referred to Bhushan et al. [8] and Koc and Koc [9]. Given these considerations, there is a need to improve mean estimation methods by employing model-based estimation techniques.
This paper focuses on the empirical literature on non-parametric model-based mean estimators. In model-based estimation, a model relating the dependent variable to the independent variables is specified. Another class of estimation methods builds the fundamental form of the statistical model on which the parameter estimation can be based (Srivastava [10]). Under some assumptions, such models are useful for imputing the non-sampled observations of the dependent variable at the micro and macro levels. When the data are collected through a sampling design, the design can also be brought into the estimation process, as in design-based estimation. In the same vein, Rueda and Sanchez-Borrego [5] combined simple random sampling (SRS) with an LPR-supported model-based estimator. This estimator has several desirable features within the current model-based framework.
It is worth noting that multiple LPR (MLPR) can handle several predictor variables by fitting a local polynomial surface rather than a local polynomial curve. Because of the curse of dimensionality, however, MLPR can give unreliable results. It is also crucial to choose an appropriate bandwidth for each variable; otherwise, noise variables can be over-smoothed or not smoothed at all. With only one predictor, the curse of dimensionality does not arise because the data points are dense in the lower-dimensional space, so the kernel function can smooth reliably without inflating the variance much. Bandwidth selection is also more convenient and controllable for a single variable, with less risk of over-smoothing or under-smoothing.
Despite extensive research, little work has focused on model-based calibration-type mean estimation in which the calibration constraints leverage auxiliary information. This paper aims to address this gap by using auxiliary information to enhance the precision of the mean through non-parametric regression. To accomplish this, we implement a model-oriented methodology that uses an LPR estimator to estimate the non-sampled values of y. The estimator demonstrates favorable properties, as shown by both theoretical and practical analysis. Because the new estimator is sensitive to the bandwidth, we explore several bandwidth selection methods to identify the most suitable one. The proposed procedure serves as a practical alternative to existing estimators. Additionally, the Gaussian function used here displays optimal characteristics for general non-parametric regression.
In recent years, concern about quantifying and estimating natural resources, fish stocks in particular, has changed the dynamics of aquaculture, fisheries, and fish science, and advanced estimation techniques have been reported in these fields. Physical attributes of fish such as size, weight, and potentially shape need to be measured and predicted for stock assessment, marketing, and decision making in the fish sector. The performance of the estimators is therefore assessed using a real-life fish market dataset that records various fish attributes. This study also includes a dataset that simulates the factors affecting solar ultraviolet (UV) radiation, with target labels describing UV risk levels. That dataset is intended for predictive modeling of UV radiation and risk assessment using tree-based or other machine learning approaches. This research applies both datasets to evaluate data-driven approaches to natural resource estimation, fisheries management, and environmental risk assessment. Lastly, this article provides an overview of survey sampling methods in aquaculture and fisheries, with special consideration given to environmental factors, especially UV radiation, that may influence aquatic ecosystems.
The remainder of this article details mean estimation approaches within the non-parametric model-based framework, focusing on stratified random sampling (StRS). Section 2 offers preliminary background, introducing existing model-based estimators and emphasizing the advantages of non-parametric methods. Section 3 presents an adapted calibrated estimator. Section 4 extends this by adding a further calibration constraint to improve the accuracy of model-based estimation under stratification. Section 5 considers double StRS, in which the characteristics of the auxiliary variable are estimated from a first-phase sample because their population values are unknown. Section 6 includes simulation studies that evaluate and compare the efficiency of the proposed estimators with others, using varying sample sizes and bandwidth selectors. This analysis demonstrates the practicality and robustness of the derived methods. Finally, conclusions are provided in Section 7.

2. Model-Based Estimator Under StRS

Alomair et al. [11] and Rueda and Sanchez-Borrego [5] suggest that the model-based approach assumes the population can be adequately represented through the predictive model $\xi$:
$$y_i = m(x_i) + \varpi_i$$
For representation under StRS, the prediction model $\xi_{\lambda_\zeta}$ for the $\lambda_\zeta$th stratum can be written as
$$y_{i\lambda_\zeta} = m(x_{i\lambda_\zeta}) + \varpi_{i\lambda_\zeta}$$
where the $\varpi_{i\lambda_\zeta}$ are iid with $E_{\xi_{\lambda_\zeta}}(\varpi_{i\lambda_\zeta}) = 0$ and variance $\sigma^{2}_{\lambda_\zeta} = 1$, $m(\cdot)$ is a smooth function of $x$, and $E_{\xi_{\lambda_\zeta}}$ denotes expectation under the model $\xi_{\lambda_\zeta}$.
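Once the sample has been collected, the stratum mean $\bar{Y}_{\lambda_\zeta}$ can be written as
$$\bar{Y}_{\lambda_\zeta} = f_{\lambda_\zeta}\,\bar{y}_{s\lambda_\zeta} + (1 - f_{\lambda_\zeta})\,\bar{y}_{\bar{s}\lambda_\zeta} \tag{1}$$
where $\bar{y}_{s\lambda_\zeta} = (n_{\lambda_\zeta})^{-1}\sum_{i_{\lambda_\zeta}\in s_{\lambda_\zeta}} y_{i\lambda_\zeta}$ and $\bar{y}_{\bar{s}\lambda_\zeta} = \frac{1}{N_{\lambda_\zeta}-n_{\lambda_\zeta}}\sum_{j_{\lambda_\zeta}\in \bar{s}_{\lambda_\zeta}} y_{j\lambda_\zeta}$, with $i_{\lambda_\zeta}$ indexing the sampled units $s_{\lambda_\zeta}$ and $j_{\lambda_\zeta}$ the non-sampled units $\bar{s}_{\lambda_\zeta}$ in the $\lambda_\zeta$th stratum. Further, $N_{\lambda_\zeta}$ is the size of the stratum, $n_{\lambda_\zeta}$ is the size of the sample in the stratum, the correction factor (sampling fraction) is $f_{\lambda_\zeta} = n_{\lambda_\zeta}/N_{\lambda_\zeta}$, and the overall population size is $N$. Note that the first part of Equation (1) is known, so estimating $\bar{Y}_{\lambda_\zeta}$ reduces to predicting the mean $\bar{y}_{\bar{s}\lambda_\zeta}$ of the units that have not been sampled. If all $x$ values are known, predictions can be made directly from a regression model, with the surrogate values $y^{*}_{j\lambda_\zeta} = m(x_{j\lambda_\zeta})$ serving as substitutes for the unobserved values $y_{j\lambda_\zeta}$ for $j_{\lambda_\zeta}\in\bar{s}_{\lambda_\zeta}$. In practice, however, the values $m(x_{j\lambda_\zeta})$ are unknown, so non-parametric kernel regression is used to obtain estimates $\hat{m}_{j\lambda_\zeta}$ for $j_{\lambda_\zeta}\in\bar{s}_{\lambda_\zeta}$ (Chambers et al. [12]). This idea has been adopted by many researchers, including Rueda and Sanchez-Borrego [5]. In light of these studies, the traditional model-based estimator under StRS for the $\lambda_\zeta$th stratum is
$$\bar{y}_{BR\lambda_\zeta} = f_{\lambda_\zeta}\,\bar{y}_{s\lambda_\zeta} + (1 - f_{\lambda_\zeta})\,\frac{1}{N_{\lambda_\zeta}-n_{\lambda_\zeta}}\sum_{j_{\lambda_\zeta}\in\bar{s}_{\lambda_\zeta}} \hat{m}_{j\lambda_\zeta}$$
Combining all strata, the estimator can be written as
$$\bar{y}_{BR} = \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta}\,\bar{y}_{BR\lambda_\zeta},$$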
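where $P_{\lambda_\zeta} = N_{\lambda_\zeta}/N$ is the conventional stratification weight (the share of the population that falls in the $\lambda_\zeta$th stratum) and $\lambda_\beta$ represents the total count of strata.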
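As a minimal sketch of the stratum-level and combined estimators just described, the Python below takes the sampled $y$ values and a vector of predictions for the non-sampled units in each stratum. The function names and the toy numbers are illustrative assumptions, the stratum weight is assumed to be the usual share $N_{\lambda_\zeta}/N$, and the predictions are treated as given rather than produced by any particular smoother.

```python
import numpy as np

def stratum_model_based_mean(y_sampled, m_hat_nonsampled):
    """Model-based mean for one stratum: f * ybar_s + (1 - f) * mean(m_hat).

    y_sampled: observed y for the n_h sampled units.
    m_hat_nonsampled: predictions for the N_h - n_h non-sampled units
    (assumed non-empty).
    """
    y_sampled = np.asarray(y_sampled, dtype=float)
    m_hat_nonsampled = np.asarray(m_hat_nonsampled, dtype=float)
    n_h = y_sampled.size
    N_h = n_h + m_hat_nonsampled.size
    f_h = n_h / N_h  # sampling fraction of the stratum
    return f_h * y_sampled.mean() + (1.0 - f_h) * m_hat_nonsampled.mean()

def stratified_model_based_mean(strata):
    """strata: list of (y_sampled, m_hat_nonsampled) pairs, one per stratum."""
    sizes = np.array([len(ys) + len(mh) for ys, mh in strata], dtype=float)
    weights = sizes / sizes.sum()  # assumed stratum weight N_h / N
    means = np.array([stratum_model_based_mean(ys, mh) for ys, mh in strata])
    return float(np.sum(weights * means))

# Hypothetical toy data: two strata with N_h = 6 and N_h = 4.
strata = [
    ([2.0, 4.0], [3.0, 3.0, 3.0, 3.0]),
    ([10.0, 12.0, 14.0], [11.0]),
]
y_br = stratified_model_based_mean(strata)  # 0.6 * 3.0 + 0.4 * 11.75 = 6.5
```

The sampled part enters through its observed mean, and only the non-sampled part depends on the fitted model, which is why the quality of the predictions matters most when the sampling fraction is small.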
It should be emphasized that LPR, which supplies $\hat{m}_{j\lambda_\zeta}$, is a generalization of kernel regression and is suitable for a wide array of problems. Inspired by the work of Ref. [13], as well as Rueda and Sanchez-Borrego [5], we use a kernel-based LPR estimator of order $p$ to generate predictions for the variable of interest. Define $K_h(u) = h^{-1}K(u/h)$, where $K$ denotes a Gaussian kernel function and $h$ is the bandwidth. For recent developments in kernel-related work, see Shahzad et al. [14], Ali [15], and Ali et al. [16]. The prediction $\hat{m}_{j\lambda_\zeta}$ is then
$$\hat{m}_{j\lambda_\zeta} = e_1^{\top}\left(X_{sj\lambda_\zeta}^{\top} W_{sj\lambda_\zeta} X_{sj\lambda_\zeta}\right)^{-1} X_{sj\lambda_\zeta}^{\top} W_{sj\lambda_\zeta} Y_{s\lambda_\zeta} = w_{sj\lambda_\zeta}^{\top} Y_{s\lambda_\zeta},$$
where $e_1 = (1, 0, \ldots, 0)^{\top}$ is the first unit vector of length $p+1$, $Y_{s\lambda_\zeta} = [y_{i\lambda_\zeta}]_{i_{\lambda_\zeta}\in s_{\lambda_\zeta}}$, $W_{sj\lambda_\zeta} = \mathrm{diag}\{K_h(x_{i\lambda_\zeta} - x_{j\lambda_\zeta})\}_{i_{\lambda_\zeta}\in s_{\lambda_\zeta}}$, and $X_{sj\lambda_\zeta} = [1,\ (x_{i\lambda_\zeta} - x_{j\lambda_\zeta}),\ \ldots,\ (x_{i\lambda_\zeta} - x_{j\lambda_\zeta})^{p}]_{i_{\lambda_\zeta}\in s_{\lambda_\zeta}}$.
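The local polynomial prediction can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, a standard Gaussian kernel is used, and the bandwidth `h` is simply passed in rather than selected from the data.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def lpr_weights(x_sample, x0, h, p=1):
    """Weights w such that m_hat(x0) = w @ y_sample for an order-p local fit."""
    x_sample = np.asarray(x_sample, dtype=float)
    d = x_sample - x0
    X = np.vander(d, N=p + 1, increasing=True)  # columns 1, d, ..., d^p
    k = gaussian_kernel(d / h) / h              # K_h(x_i - x0)
    XtW = X.T * k                               # X'W without building diag(k)
    e1 = np.zeros(p + 1)
    e1[0] = 1.0
    # e1' (X'WX)^{-1} X'W, computed with a linear solve instead of an inverse
    return e1 @ np.linalg.solve(XtW @ X, XtW)

def lpr_predict(x_sample, y_sample, x0, h, p=1):
    """Local polynomial prediction at x0."""
    return float(lpr_weights(x_sample, x0, h, p) @ np.asarray(y_sample, dtype=float))

# Hypothetical check: a local linear fit reproduces an exactly linear trend.
x_s = np.linspace(0.0, 1.0, 11)
y_s = 2.0 + 3.0 * x_s
w = lpr_weights(x_s, x0=0.37, h=0.2, p=1)
m_hat = float(w @ y_s)
```

Because the prediction is `w @ y_sample`, the weights depend only on the auxiliary values and the bandwidth, so they can be computed once per non-sampled unit and reused for any study variable.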
The estimator $\bar{y}_{BR\lambda_\zeta}$ can be improved under StRS through calibration, and the following sections extend it in that direction.

3. Adapted Estimator

Using supplementary information can substantially improve the accuracy of estimators of the mean. As presented by Refs. [17,18], in most practical circumstances a direct relationship is believed to exist between the variable of interest Y and the auxiliary variable X. For example, education level and income are known to have a direct or 'causal' relation (i.e., education level causes income): people with higher education levels are expected to earn more (Smits [19]). A similar example arises with health-related factors: there is a direct positive link between activity level and cardiovascular health, and Boehm and Kubzansky [20] have noted that an increased level of activity promotes a healthy heart. These common examples show how auxiliary variables help improve the precision of mean estimates.
Calibration estimation is recognized as one of the most efficient techniques for adjusting initial weights: it minimizes a specific distance measure while incorporating additional or auxiliary data. Many scholars have discussed calibration weighting within strata to increase the efficiency of population parameter estimates. Creating calibration weights requires two main ingredients: a distance function and associated constraints. Efficient weights for the auxiliary variable can also improve estimation of the study variable. Ref. [21] advanced calibration-based estimation employing multiple calibration constraints within the survey sampling framework, as discussed by Koyuncu [22], Singh et al. [23], Koyuncu [24], and Sinha et al. [25]. However, mean estimation within the calibrated model-based framework under StRS has not been extensively studied. Building on seminal work such as Koyuncu [24], this article proposes a calibrated model-based mean estimator using StRS from a non-parametric kernel regression perspective.
Under the StRS design, let $n$ and $N$ be the overall sample and population sizes, let $(\bar{x}_{\lambda_\zeta}, \bar{X}_{\lambda_\zeta})$ be the sample and population means of $X$ in the $\lambda_\zeta$th stratum, and let $(P_{\lambda_\zeta}, \Pi_{\lambda_\zeta})$ be the traditional and calibrated stratification weights. A random sample of size $n_{\lambda_\zeta}$ is drawn from the $N_{\lambda_\zeta}$ units of the $\lambda_\zeta$th stratum, where $\lambda_\zeta = 1, 2, \ldots, \lambda_\beta$, and the adapted calibrated estimator is
$$\bar{y}_{AR} = \sum_{\lambda_\zeta=1}^{\lambda_\beta} \Pi_{\lambda_\zeta}\,\bar{y}_{BR\lambda_\zeta} \tag{2}$$
subject to the constraints
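$$\sum_{\lambda_\zeta=1}^{\lambda_\beta} \Pi_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta}, \tag{3}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta} \Pi_{\lambda_\zeta}\,\bar{x}_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta}\,\bar{X}_{\lambda_\zeta}. \tag{4}$$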
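The justification for the use of loss functions in the calibration approach, discussed in particular by Ref. [21], is that they improve estimation accuracy by modifying the weights assigned to sampled data points. The procedure minimizes a distance measure between the base sampling weights and the post-calibration weights while satisfying the calibration constraints. A Lagrange function (LF) is therefore formulated with the multipliers $(\eta_{1(m)}, \eta_{2(m)})$ and a Chi-square loss function:
$$A_{(m1)} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\frac{\left(\Pi_{\lambda_\zeta}-P_{\lambda_\zeta}\right)^{2}}{\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}} - 2\eta_{1(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\right) - 2\eta_{2(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\bar{X}_{\lambda_\zeta}\right). \tag{5}$$
Setting the derivative of $A_{(m1)}$ with respect to $\Pi_{\lambda_\zeta}$ equal to zero gives
$$\Pi_{\lambda_\zeta} = P_{\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\left(\eta_{2(m)}\bar{x}_{\lambda_\zeta} + \eta_{1(m)}\right). \tag{6}$$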
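As a worked step, the first-order condition behind this weight formula can be written out explicitly. The sketch assumes the Lagrange function is the chi-square distance $\sum(\Pi-P)^2/(\hat{Q}P)$ minus twice each multiplier times its constraint, which is the form the closed-form weights rely on.

```latex
\begin{aligned}
\frac{\partial A_{(m1)}}{\partial \Pi_{\lambda_\zeta}}
  &= \frac{2\left(\Pi_{\lambda_\zeta}-P_{\lambda_\zeta}\right)}{\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}}
     - 2\eta_{1(m)} - 2\eta_{2(m)}\bar{x}_{\lambda_\zeta} = 0 \\
\Longrightarrow\quad
\Pi_{\lambda_\zeta}-P_{\lambda_\zeta}
  &= \hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\left(\eta_{1(m)} + \eta_{2(m)}\bar{x}_{\lambda_\zeta}\right).
\end{aligned}
```

Each calibrated weight is therefore the design weight plus an adjustment that is linear in the stratum sample mean of the auxiliary variable.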
Calibrated weights $\Pi_{\lambda_\zeta}$ have several significant characteristics, including low variance, reduced bias, and consistency with the available auxiliary information. They are designed so that the weighted totals of the auxiliary variables in the sample match known population totals, which makes survey estimates more accurate. However, calibrated weights are not guaranteed to be positive. Negative weights may arise, particularly when the sample and population characteristics differ substantially or when specific loss functions are used in the calibration process. The problem of negative weights is minimal with the Chi-square distance in contrast to other distance functions. Because it penalizes extreme deviations more severely, the Chi-square distance tends to keep the weights close to their initial values. The resulting adjustments are smoother and extreme or negative weights are less likely, which increases the stability and relevance of the calibration.
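Substituting (6) into (3) and (4) gives
$$\eta_{1(m)} = -\frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}, \tag{7}$$
$$\eta_{2(m)} = \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}. \tag{8}$$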
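The two multipliers come from a two-equation linear system, which may be easier to follow in compact form. The shorthand $a$, $b$, $c$, $d$ below is introduced only for this sketch and does not appear in the original notation.

```latex
% Shorthand for this sketch only:
% a = \sum \hat{Q}P,\quad b = \sum \hat{Q}P\bar{x},\quad
% c = \sum \hat{Q}P\bar{x}^{2},\quad d = \sum P(\bar{X}-\bar{x})
\begin{aligned}
\text{constraint (3):}\quad & a\,\eta_{1(m)} + b\,\eta_{2(m)} = 0,\\
\text{constraint (4):}\quad & b\,\eta_{1(m)} + c\,\eta_{2(m)} = d,\\
\Longrightarrow\quad & \eta_{1(m)} = -\frac{d\,b}{a c - b^{2}},
\qquad \eta_{2(m)} = \frac{d\,a}{a c - b^{2}}.
\end{aligned}
```

The common denominator $ac - b^{2}$ is positive whenever the stratum means $\bar{x}_{\lambda_\zeta}$ are not all equal (by the Cauchy-Schwarz inequality, for positive $\hat{Q}P$), so the system has a unique solution in that case.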
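Substituting $\eta_{1(m)}$ and $\eta_{2(m)}$ into (6) gives
$$\Pi_{\lambda_\zeta} = P_{\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\,\frac{\bar{x}_{\lambda_\zeta}\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}\,\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right). \tag{9}$$
Inserting $\Pi_{\lambda_\zeta}$ into $\bar{y}_{AR}$ gives its final form:
$$\bar{y}_{AR} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta} + \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}\,\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right). \tag{10}$$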
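This estimator can be rewritten as
$$\bar{y}_{AR} = \bar{y}_{st(m)} + R_{1}\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right), \tag{11}$$
where $\bar{y}_{st(m)} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta}$ and
$$R_{1} = \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{y}_{BR\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}. \tag{12}$$
Note that the adapted estimator $\bar{y}_{AR}$ can be converted into a generalized version by choosing different values of $\hat{Q}_{\lambda_\zeta}$. For simplicity, we use only $\hat{Q}_{\lambda_\zeta} = 1$ here. Other choices of $\hat{Q}_{\lambda_\zeta}$ based on known population characteristics can be used, and the shape of the estimator changes accordingly; see Garg and Pachori [26], Pal et al. [27], and Pandy et al. [28].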
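The adapted estimator can be sketched numerically as follows. This is a minimal illustration, assuming the stratum-level quantities (weights, auxiliary means, and model-based stratum means) are already available; the function names and the three-stratum numbers are hypothetical, and the tuning constants default to 1.

```python
import numpy as np

def calibrate_mean_only(P, x_bar, X_bar, Q=None):
    """Chi-square calibrated stratum weights under the two constraints
    sum(Pi) = sum(P) and sum(Pi * x_bar) = sum(P * X_bar)."""
    P, x_bar, X_bar = (np.asarray(v, dtype=float) for v in (P, x_bar, X_bar))
    Q = np.ones_like(P) if Q is None else np.asarray(Q, dtype=float)
    a = np.sum(Q * P)
    b = np.sum(Q * P * x_bar)
    c = np.sum(Q * P * x_bar**2)
    d = np.sum(P * (X_bar - x_bar))
    den = a * c - b**2          # positive when the x_bar values are not all equal
    eta1 = -d * b / den
    eta2 = d * a / den
    return P + Q * P * (eta1 + eta2 * x_bar)

def adapted_estimator(P, x_bar, X_bar, y_br, Q=None):
    """Calibrated estimator: sum of calibrated weights times stratum means."""
    Pi = calibrate_mean_only(P, x_bar, X_bar, Q)
    return float(np.sum(Pi * np.asarray(y_br, dtype=float)))

# Hypothetical three-stratum example.
P = np.array([0.5, 0.3, 0.2])
x_bar = np.array([10.0, 20.0, 30.0])   # sample means of X
X_bar = np.array([11.0, 19.0, 32.0])   # population means of X
y_br = np.array([5.0, 9.0, 14.0])      # model-based stratum means of Y

Pi = calibrate_mean_only(P, x_bar, X_bar)
y_ar = adapted_estimator(P, x_bar, X_bar, y_br)
```

After calibration, `Pi` reproduces both the total of the design weights and the weighted population mean of the auxiliary variable, and `y_ar` agrees with the regression-type form $\bar{y}_{st(m)} + R_1\sum P(\bar{X}-\bar{x})$.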

4. Proposed Estimator

A model-based approach is built on a so-called superpopulation model $\xi$: the studied population is assumed to be a realization of the random variables generated under $\xi$. The model takes advantage of information in the population and allows values that were not directly sampled to be predicted, especially when determining finite population quantities such as the mean of Y. Key advantages of this approach include the following:
1. Model-based theory, also known as prediction theory for survey sampling, is a well-rounded theoretical framework for making statistical inferences about finite populations. In this general framework, widely recognized estimators of population parameters emerge as predictors under specific models.
2. The framework is in line with dominant statistical approaches found in fields such as econometrics.
3. For large samples, and under certain distributional assumptions, model-based results are very close to design-based inferences.
4. The variance of model-based estimators is usually lower than that of design-based estimators.
The proposed dual calibration structure is designed to capture the central tendency and the relative dispersion of the auxiliary information simultaneously. The coefficient of variation (CV) constraint is particularly useful when the auxiliary variable is heteroscedastic or varies considerably between strata, because it allows the adjusted weights to reflect both level and scale. When the relationship between the study and auxiliary variables is not necessarily linear, as highlighted by Garg and Pachori [26] and Sinha et al. [25], calibrating on both the mean and the CV can greatly reduce the MSE. Hence, the CV-based adjustment acts as a scaling procedure that improves the efficiency of the estimator by stabilizing the variance across strata.
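To make the CV ingredient concrete, the sketch below computes a coefficient of variation per stratum. The helper name and the data are hypothetical, and the choice of divisor (`ddof`) is an assumption, since the text does not say which variance convention is used for the sample CV.

```python
import numpy as np

def coefficient_of_variation(x, ddof=1):
    """CV = standard deviation / mean.

    ddof=1 gives the usual sample standard deviation and ddof=0 the
    population version; which one applies is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    return float(np.std(x, ddof=ddof) / np.mean(x))

# Hypothetical auxiliary values in two strata.
strata_x = {
    "h1": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],  # mean 5, population sd 2
    "h2": [10.0, 10.0, 10.0],                        # no dispersion at all
}
cv_by_stratum = {h: coefficient_of_variation(x, ddof=0) for h, x in strata_x.items()}
```

Because the CV is scale-free, strata whose auxiliary values sit at very different levels can still be compared on their relative dispersion, which is what the additional calibration constraint exploits.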
Motivated by Refs. [22,24,25,26], the proposed calibrated model-based mean estimator using StRS under a non-parametric kernel regression framework is
$$\bar{y}_{PM} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\,\bar{y}_{BR\lambda_\zeta}, \tag{13}$$
subject to the constraints
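$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\,\bar{x}_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\,\bar{X}_{\lambda_\zeta}, \tag{14}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\,\hat{C}_{x\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\,C_{x\lambda_\zeta}, \tag{15}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}, \tag{16}$$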
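where $(\hat{C}_{x\lambda_\zeta}, C_{x\lambda_\zeta})$ are the sample and population CVs of the auxiliary variable $X$. An LF is formulated with the multipliers $\eta_{1(m)}$, $\eta_{2(m)}$, and $\eta_{3(m)}$:
$$A_{(m2)} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\frac{\left(\Pi_{\lambda_\zeta}-P_{\lambda_\zeta}\right)^{2}}{\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}} - 2\eta_{1(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\bar{X}_{\lambda_\zeta}\right) - 2\eta_{2(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}C_{x\lambda_\zeta}\right) - 2\eta_{3(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\right). \tag{17}$$
Setting the derivative of $A_{(m2)}$ with respect to $\Pi_{\lambda_\zeta}$ equal to zero gives
$$\Pi_{\lambda_\zeta} = P_{\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\left(\eta_{1(m)}\bar{x}_{\lambda_\zeta} + \eta_{2(m)}\hat{C}_{x\lambda_\zeta} + \eta_{3(m)}\right). \tag{18}$$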
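Substituting (18) into (14), (15), and (16), respectively, gives
$$G_{1(3\times 3)}\,\eta_{1(3\times 1)} = F_{1(3\times 1)}, \tag{19}$$
where
$$\eta_{1(3\times 1)} = \begin{pmatrix}\eta_{1(m)}\\ \eta_{2(m)}\\ \eta_{3(m)}\end{pmatrix},$$
$$F_{1(3\times 1)} = \begin{pmatrix}\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(\bar{X}_{\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\\ \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{\lambda_\zeta}\left(C_{x\lambda_\zeta}-\hat{C}_{x\lambda_\zeta}\right)\\ 0\end{pmatrix},$$
$$G_{1(3\times 3)} = \begin{pmatrix}
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta}\\
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta}\bar{x}_{\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta}^{2} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta}\\
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\bar{x}_{\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{\lambda_\zeta}
\end{pmatrix}.$$
Solving Equation (19) gives
$$\eta_{1(m)} = \frac{D_{71(m)}}{H_{1}},\qquad \eta_{2(m)} = \frac{D_{72(m)}}{H_{1}},\qquad \eta_{3(m)} = \frac{D_{73(m)}}{H_{1}},$$
where $D_{71(m)}$, $D_{72(m)}$, $D_{73(m)}$, and $H_{1}$ can be found in Appendix A.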
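In practice the three multipliers need not be expanded by hand, because the system is a small linear solve. The sketch below is illustrative only: the function name and the four-stratum inputs are hypothetical, and the tuning constants default to 1.

```python
import numpy as np

def calibrate_mean_and_cv(P, x_bar, X_bar, c_hat, C_pop, Q=None):
    """Chi-square calibrated weights under three constraints:
    weighted mean of X, weighted CV of X, and the sum of the weights."""
    P, x_bar, X_bar, c_hat, C_pop = (
        np.asarray(v, dtype=float) for v in (P, x_bar, X_bar, c_hat, C_pop)
    )
    Q = np.ones_like(P) if Q is None else np.asarray(Q, dtype=float)
    w = Q * P
    Z = np.column_stack([x_bar, c_hat, np.ones_like(P)])  # calibration variables
    G = Z.T @ (Z * w[:, None])                            # 3x3 cross-product matrix
    F = np.array([
        np.sum(P * (X_bar - x_bar)),
        np.sum(P * (C_pop - c_hat)),
        0.0,
    ])
    eta = np.linalg.solve(G, F)                           # (eta_1, eta_2, eta_3)
    Pi = P + w * (Z @ eta)
    return Pi, eta

# Hypothetical four-stratum example (at least three strata are needed for G
# to be invertible, and the columns of Z must not be collinear).
P = np.array([0.4, 0.3, 0.2, 0.1])
x_bar = np.array([10.0, 20.0, 30.0, 40.0])
X_bar = np.array([11.0, 19.0, 32.0, 41.0])
c_hat = np.array([0.30, 0.25, 0.40, 0.20])
C_pop = np.array([0.28, 0.27, 0.38, 0.22])

Pi, eta = calibrate_mean_and_cv(P, x_bar, X_bar, c_hat, C_pop)
```

Using a linear solver rather than explicit determinant ratios is a design choice for numerical stability; both routes give the same multipliers when the coefficient matrix is invertible.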
BY substituting these values in (18) and (13), we achieve
y ¯ P M = y ¯ s t ( m ) + η 1 ( m ) λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ y ¯ B R λ ζ + η 2 ( m ) λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ y ¯ B R λ ζ + η 3 ( m ) λ ζ = 1 λ β Q ^ λ ζ P λ ζ y ¯ B R λ ζ ,
= λ ζ = 1 λ β P λ ζ y ¯ B R λ ζ + R 2 λ ζ = 1 λ β P λ ζ X ¯ λ ζ x ¯ λ ζ + R 3 λ ζ = 1 λ β P λ ζ C x λ ζ C ^ x λ ζ ,
where
R 2 = D 74 ( m ) H 1 , R 3 = D 75 ( m ) H 1 ,
where
D 74 ( m ) = λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ 2 x λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ 2 λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ + λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ + λ ζ = 1 λ β Q ^ λ ζ P λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ 2 x λ ζ ,
D 75 ( m ) = λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ x ¯ λ ζ + λ ζ = 1 λ β Q ^ λ ζ P λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ C ^ x λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ 2 λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ + λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ 2 λ ζ = 1 λ β Q ^ λ ζ P λ ζ C ^ x λ ζ y ¯ B R λ ζ λ ζ = 1 λ β Q ^ λ ζ P λ ζ x ¯ λ ζ 2 .
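The equivalence between the weighted form and the regression-type form of the proposed estimator can be checked numerically. The sketch below is illustrative: names and inputs are hypothetical, and it assumes that the common denominator $H_1$ is the determinant of the 3 by 3 coefficient matrix (the appendix definition is not reproduced here), so that the coefficients follow from Cramer's rule.

```python
import numpy as np

def proposed_estimator_forms(P, x_bar, X_bar, c_hat, C_pop, y_br, Q=None):
    """Return (weighted form, regression form, R2, R3) of the estimator."""
    P, x_bar, X_bar, c_hat, C_pop, y_br = (
        np.asarray(v, dtype=float) for v in (P, x_bar, X_bar, c_hat, C_pop, y_br)
    )
    Q = np.ones_like(P) if Q is None else np.asarray(Q, dtype=float)
    w = Q * P
    Z = np.column_stack([x_bar, c_hat, np.ones_like(P)])
    G = Z.T @ (Z * w[:, None])
    F = np.array([np.sum(P * (X_bar - x_bar)), np.sum(P * (C_pop - c_hat)), 0.0])
    t = Z.T @ (w * y_br)                  # cross-products of Z with the stratum means

    eta = np.linalg.solve(G, F)
    weighted = np.sum((P + w * (Z @ eta)) * y_br)   # sum of Pi * y_br

    R = np.linalg.solve(G, t)             # G is symmetric, so F'G^{-1}t = R'F
    regression = np.sum(P * y_br) + R[0] * F[0] + R[1] * F[1]
    return float(weighted), float(regression), float(R[0]), float(R[1])

# Hypothetical four-stratum example.
P = np.array([0.4, 0.3, 0.2, 0.1])
x_bar = np.array([10.0, 20.0, 30.0, 40.0])
X_bar = np.array([11.0, 19.0, 32.0, 41.0])
c_hat = np.array([0.30, 0.25, 0.40, 0.20])
C_pop = np.array([0.28, 0.27, 0.38, 0.22])
y_br = np.array([5.0, 9.0, 14.0, 20.0])

weighted, regression, R2, R3 = proposed_estimator_forms(
    P, x_bar, X_bar, c_hat, C_pop, y_br
)
```

The two forms agree because the third entry of the right-hand side vector is zero, so only the first two components of the solved coefficient vector enter the regression-type expression.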

Properties of y ¯ P M

This proposed estimator has several practical properties:
  • First, the estimator is linear, as shown by its functional form, where the estimator incorporates both observed and predicted components of the population, making it a composite measure. The estimator uses a weight factor, say m, which is adjusted based on the influence of non-sampled units.
  • Second, the estimator is data-intensive: it requires knowledge of the $x_n$ values for all population elements and involves intensive computations.
  • Lastly, this estimator does not rely on the design-based probabilities, unlike conventional design-based estimators that typically incorporate these probabilities for inference. Instead, it substitutes them with new weights, say w, determined by the proximity of sample points, aiming to improve model predictiveness and precision. This adjustment aligns the inference process with the conditional principle, focusing on observed sample characteristics rather than average properties across possible samples.
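The linearity property can be illustrated for a single stratum: the model-based mean is a fixed linear combination of the sampled $y$ values, with coefficients that depend only on the auxiliary values and the bandwidth. The sketch below assumes a local linear fit with a Gaussian kernel; the function names and the data are hypothetical.

```python
import numpy as np

def local_linear_weights(x_s, x0, h):
    """Weights w with m_hat(x0) = w @ y_s for a local linear fit."""
    d = x_s - x0
    k = np.exp(-0.5 * (d / h) ** 2)           # Gaussian kernel (constants cancel)
    X = np.column_stack([np.ones_like(d), d])
    XtW = X.T * k
    return np.linalg.solve(XtW @ X, XtW)[0]   # first row: e1'(X'WX)^{-1}X'W

def linear_coefficients(x_s, x_ns, h):
    """Coefficients c such that the stratum estimator equals c @ y_s."""
    N_h = x_s.size + x_ns.size
    W = np.array([local_linear_weights(x_s, x0, h) for x0 in x_ns])
    return (1.0 + W.sum(axis=0)) / N_h

# Hypothetical stratum: four sampled and four non-sampled units.
x_s = np.array([1.0, 4.0, 6.0, 8.0])
x_ns = np.array([2.0, 3.0, 5.0, 7.0])
y_s = np.array([3.0, 9.0, 13.0, 18.0])
h = 2.0

c = linear_coefficients(x_s, x_ns, h)
y_br_linear = float(c @ y_s)

# Direct computation: f * ybar_s + (1 - f) * mean of the predictions.
m_hat = np.array([local_linear_weights(x_s, x0, h) @ y_s for x0 in x_ns])
f = x_s.size / (x_s.size + x_ns.size)
y_br_direct = float(f * y_s.mean() + (1.0 - f) * m_hat.mean())
```

The coefficients sum to one, since each set of local linear weights sums to one, and they replace the design-based inclusion probabilities with weights driven by how close each sampled point lies to the non-sampled points.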

5. Double StRS

Double sampling is especially useful when the population mean of X is not known and must be estimated. This section extends Sections 3 and 4 to cases where this population mean is not available. Building on Al-Omari [29], Alomair and Daraz [30], and Daraz et al. [31], the sampling scheme changes from StRS to a two-stage (or double) StRS: SRS is used in the first stage and StRS in the second. In this context, $N_{a\lambda_\zeta}$ is the first-phase sample size for the $\lambda_\zeta$th stratum, and $n_{a\lambda_\zeta}$ is the sample size in the subsequent phase. The first-phase characteristics of the auxiliary variable are $\bar{X}_{a\lambda_\zeta}$ and $\hat{C}_{xa\lambda_\zeta}$, and the second-phase characteristics of both variables within the $\lambda_\zeta$th stratum are $\bar{x}_{\lambda_\zeta}$, $\bar{y}_{BRa\lambda_\zeta}$, and $\hat{C}_{x\lambda_\zeta}$. The traditional model-based estimator for double StRS in the $\lambda_\zeta$th stratum is
$$\bar{y}_{BRa\lambda_\zeta} = f_{a\lambda_\zeta}\,\bar{y}_{sa\lambda_\zeta} + (1 - f_{a\lambda_\zeta})\,\frac{1}{N_{a\lambda_\zeta}-n_{a\lambda_\zeta}}\sum_{j_{\lambda_\zeta}\in\bar{s}_{a\lambda_\zeta}}\hat{m}_{ja\lambda_\zeta}$$
Combining all strata, the estimator can be written as
$$\bar{y}_{BRt} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\,\bar{y}_{BRa\lambda_\zeta}.$$
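A rough sketch of the two-phase computation for one stratum follows. It rests on several assumptions that go beyond the text: the auxiliary variable is observed for every first-phase unit, the study variable only for the second-phase subsample, and a local linear smoother with a Gaussian kernel stands in for the LPR predictor. All names and numbers are hypothetical.

```python
import numpy as np

def local_linear(x_s, y_s, x0, h):
    """Local linear prediction at x0 with a Gaussian kernel."""
    d = x_s - x0
    k = np.exp(-0.5 * (d / h) ** 2)
    X = np.column_stack([np.ones_like(d), d])
    beta = np.linalg.solve(X.T @ (X * k[:, None]), X.T @ (k * y_s))
    return beta[0]

def double_phase_stratum(x_first, idx_second, y_second, h):
    """First-phase and second-phase summaries plus the model-based mean."""
    x_first = np.asarray(x_first, dtype=float)
    idx = np.asarray(idx_second)
    y_s = np.asarray(y_second, dtype=float)
    x_s = x_first[idx]
    not_selected = np.ones(x_first.size, dtype=bool)
    not_selected[idx] = False
    m_hat = np.array([local_linear(x_s, y_s, x0, h) for x0 in x_first[not_selected]])
    f_a = idx.size / x_first.size          # second-phase sampling fraction
    return {
        "X_bar_a": float(x_first.mean()),  # first-phase mean of X
        "x_bar": float(x_s.mean()),        # second-phase mean of X
        "y_br_a": float(f_a * y_s.mean() + (1.0 - f_a) * m_hat.mean()),
    }

# Hypothetical stratum: 8 first-phase units, 4 of them measured on y.
x_first = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
idx_second = np.array([0, 3, 5, 7])
y_second = 1.0 + 2.0 * x_first[idx_second]   # exactly linear, for checking
out = double_phase_stratum(x_first, idx_second, y_second, h=2.0)
```

With an exactly linear relationship the local linear fit reproduces it, so the model-based mean equals the linear trend evaluated at the first-phase mean of $X$; with real data the result depends on the bandwidth.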

5.1. Double Adapted Estimator

The generalized class of estimators under double StRS is given below:
$$\bar{y}_{ARt} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\,\bar{y}_{BRa\lambda_\zeta}, \tag{24}$$
subject to the constraints
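$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}, \tag{25}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\,\bar{x}_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\,\bar{X}_{a\lambda_\zeta}. \tag{26}$$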
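An LF is formulated with the multipliers $\eta_{1(m)}$ and $\eta_{2(m)}$ as follows:
$$A_{(m3)} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\frac{\left(\Pi_{a\lambda_\zeta}-P_{a\lambda_\zeta}\right)^{2}}{\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}} - 2\eta_{1(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\right) - 2\eta_{2(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\bar{X}_{a\lambda_\zeta}\right). \tag{27}$$
Setting the derivative of $A_{(m3)}$ with respect to $\Pi_{a\lambda_\zeta}$ equal to zero gives
$$\Pi_{a\lambda_\zeta} = P_{a\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\left(\eta_{2(m)}\bar{x}_{\lambda_\zeta} + \eta_{1(m)}\right), \tag{28}$$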
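and substituting (28) into (25) and (26) gives
$$\eta_{1(m)} = -\frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}, \tag{29}$$
$$\eta_{2(m)} = \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}. \tag{30}$$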
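Substituting $\eta_{1(m)}$ and $\eta_{2(m)}$ into (28) gives
$$\Pi_{a\lambda_\zeta} = P_{a\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\,\frac{\bar{x}_{\lambda_\zeta}\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}\,\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right). \tag{31}$$
Inserting $\Pi_{a\lambda_\zeta}$ into $\bar{y}_{ARt}$ gives
$$\bar{y}_{ARt} = \bar{y}_{BRt} + \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{y}_{BRa\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{y}_{BRa\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}\,\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right). \tag{32}$$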
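This estimator can be rewritten as
$$\bar{y}_{ARt} = \bar{y}_{BRt} + R_{4}\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right), \tag{33}$$
where
$$R_{4} = \frac{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{y}_{BRa\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{y}_{BRa\lambda_\zeta}\right]}{\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2}\right]\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\right]-\left[\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\right]^{2}}. \tag{34}$$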
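The coefficient in the regression-type form has the algebraic shape of a weighted least-squares slope of the stratum means of $Y$ on the stratum means of $X$, which the short sketch below makes visible. The function name and the inputs are hypothetical, the first-phase means are assumed to be supplied, and the tuning constants default to 1.

```python
import numpy as np

def double_adapted_estimator(P_a, x_bar, X_bar_a, y_br_a, Q=None):
    """Regression-type form of the double-sampling adapted estimator.

    Returns (estimate, R4), where R4 is the weighted slope coefficient.
    """
    P_a, x_bar, X_bar_a, y_br_a = (
        np.asarray(v, dtype=float) for v in (P_a, x_bar, X_bar_a, y_br_a)
    )
    Q = np.ones_like(P_a) if Q is None else np.asarray(Q, dtype=float)
    w = Q * P_a
    a, b, c = w.sum(), np.sum(w * x_bar), np.sum(w * x_bar**2)
    R4 = (a * np.sum(w * x_bar * y_br_a) - b * np.sum(w * y_br_a)) / (a * c - b**2)
    y_br_t = np.sum(P_a * y_br_a)
    estimate = y_br_t + R4 * np.sum(P_a * (X_bar_a - x_bar))
    return float(estimate), float(R4)

# Hypothetical example in which the stratum means of Y are exactly linear
# in the second-phase means of X, so the slope is recovered exactly.
P_a = np.array([0.5, 0.3, 0.2])
x_bar = np.array([10.0, 20.0, 30.0])     # second-phase means of X
X_bar_a = np.array([11.0, 19.0, 32.0])   # first-phase means of X
y_br_a = 2.0 + 0.5 * x_bar

estimate, R4 = double_adapted_estimator(P_a, x_bar, X_bar_a, y_br_a)
```

In this constructed case the estimator shifts the weighted mean of $Y$ to the level implied by the first-phase means of $X$, which is the intended effect of the calibration adjustment.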

5.2. Double Proposed Estimator

The proposed estimator under double StRS is given below:
$$\bar{y}_{PMt} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\,\bar{y}_{BRa\lambda_\zeta}, \tag{35}$$
subject to the constraints
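$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\,\bar{x}_{\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\,\bar{X}_{a\lambda_\zeta}, \tag{36}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\,\hat{C}_{x\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\,\hat{C}_{xa\lambda_\zeta}, \tag{37}$$
$$\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}. \tag{38}$$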
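An LF is formulated with the multipliers $\eta_{1(m)}$, $\eta_{2(m)}$, and $\eta_{3(m)}$:
$$A_{(m4)} = \sum_{\lambda_\zeta=1}^{\lambda_\beta}\frac{\left(\Pi_{a\lambda_\zeta}-P_{a\lambda_\zeta}\right)^{2}}{\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}} - 2\eta_{1(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\bar{X}_{a\lambda_\zeta}\right) - 2\eta_{2(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}\hat{C}_{x\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\hat{C}_{xa\lambda_\zeta}\right) - 2\eta_{3(m)}\left(\sum_{\lambda_\zeta=1}^{\lambda_\beta}\Pi_{a\lambda_\zeta}-\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\right). \tag{39}$$
Setting the derivative of $A_{(m4)}$ with respect to $\Pi_{a\lambda_\zeta}$ equal to zero gives
$$\Pi_{a\lambda_\zeta} = P_{a\lambda_\zeta} + \hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\left(\eta_{1(m)}\bar{x}_{\lambda_\zeta} + \eta_{2(m)}\hat{C}_{x\lambda_\zeta} + \eta_{3(m)}\right). \tag{40}$$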
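Substituting (40) into (36), (37), and (38), respectively, gives
$$G_{2(3\times 3)}\,\eta_{2(3\times 1)} = F_{2(3\times 1)}, \tag{41}$$
where
$$\eta_{2(3\times 1)} = \begin{pmatrix}\eta_{1(m)}\\ \eta_{2(m)}\\ \eta_{3(m)}\end{pmatrix},$$
$$F_{2(3\times 1)} = \begin{pmatrix}\sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\bar{X}_{a\lambda_\zeta}-\bar{x}_{\lambda_\zeta}\right)\\ \sum_{\lambda_\zeta=1}^{\lambda_\beta}P_{a\lambda_\zeta}\left(\hat{C}_{xa\lambda_\zeta}-\hat{C}_{x\lambda_\zeta}\right)\\ 0\end{pmatrix},$$
$$G_{2(3\times 3)} = \begin{pmatrix}
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}^{2} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\hat{C}_{x\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta}\\
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\hat{C}_{x\lambda_\zeta}\bar{x}_{\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\hat{C}_{x\lambda_\zeta}^{2} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\hat{C}_{x\lambda_\zeta}\\
\sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\bar{x}_{\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}\hat{C}_{x\lambda_\zeta} & \sum_{\lambda_\zeta=1}^{\lambda_\beta}\hat{Q}_{\lambda_\zeta}P_{a\lambda_\zeta}
\end{pmatrix}.$$
Solving Equation (41) gives
$$\eta_{1(m)} = \frac{D_{81(m)}}{H_{2}},\qquad \eta_{2(m)} = \frac{D_{82(m)}}{H_{2}},\qquad \eta_{3(m)} = \frac{D_{83(m)}}{H_{2}},$$
where $D_{81(m)}$, $D_{82(m)}$, $D_{83(m)}$, and $H_{2}$ can be found in Appendix A.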
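Substituting these values in (40) and (35), we have
$$
\begin{aligned}
\bar{y}_{PMt} ={}& \bar{y}_{BRt} + \eta_{1(m)} \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} + \eta_{2(m)} \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} + \eta_{3(m)} \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \\
={}& \bar{y}_{BRt} + R_5 \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) + R_6 \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right),
\end{aligned}
$$
where
$$
R_5 = \frac{D_{84(m)}}{H_2}, \qquad R_6 = \frac{D_{85(m)}}{H_2},
$$
with
$$
\begin{aligned}
D_{84(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2},
\end{aligned}
$$
$$
\begin{aligned}
D_{85(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{y}_{BRa\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right)^{2}.
\end{aligned}
$$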
Note that, as with the adapted estimator, one can also generate many other versions of the proposed estimator in light of the references (Garg and Pachori [26]; Pal et al. [27]; Pandey et al. [28]) and by choosing different values of $\hat{Q}_{\lambda_\zeta}$.
Mean estimation is statistically important in fisheries and aquaculture, because trends in key indices of fish stocks, including size and growth, inform the choice of harvesting periods and sustainable fishing quotas. The fish market dataset provides a setting in which to compare the performance of estimation methods in a real context, where modeling fish characteristics bears directly on the economic returns of fish stocks and, in turn, on the capacity of fisheries to support the food needs of the population. While point estimates for such stocks based on customary assessments can be useful, the calibration approach suggested in this paper, which incorporates auxiliary information, can improve the accuracy of these estimates for commercial fishers and regulatory bodies managing fisheries and aquatic resources.
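The calibration step above amounts to solving a 3 × 3 linear system for the multipliers and plugging them back into the stationarity condition for the weights. The sketch below illustrates this numerically under stated assumptions: the function name `calibrated_weights` and every input value are hypothetical, $\hat{Q}_{\lambda_\zeta} = 1$ is an arbitrary choice, and four strata are used so that the matrix of weighted sums is invertible.

```python
import numpy as np

def calibrated_weights(p_a, q_hat, x_bar, c_hat, X_bar_a, C_hat_a):
    """Solve G eta = F for the three multipliers and return (weights, eta)."""
    w = q_hat * p_a                                    # Q_hat * P_a per stratum
    Z = np.column_stack([x_bar, c_hat, np.ones_like(x_bar)])
    G = Z.T @ (w[:, None] * Z)                         # 3 x 3 matrix of weighted sums
    F = np.array([p_a @ (X_bar_a - x_bar),             # right-hand side of the system
                  p_a @ (C_hat_a - c_hat),
                  0.0])
    eta = np.linalg.solve(G, F)
    weights = p_a + w * (Z @ eta)                      # stationarity condition for the LF
    return weights, eta

# Hypothetical inputs for four strata (not taken from the paper)
p_a = np.array([0.1, 0.2, 0.3, 0.4])
q_hat = np.ones(4)
x_bar = np.array([1.0, 2.0, 3.5, 5.0])        # second-phase auxiliary means
c_hat = np.array([0.30, 0.25, 0.40, 0.35])    # second-phase coefficients of variation
X_bar_a = np.array([1.1, 2.1, 3.4, 5.2])      # first-phase auxiliary means
C_hat_a = np.array([0.32, 0.24, 0.41, 0.33])  # first-phase coefficients of variation
y_bar = np.array([2.1, 3.9, 7.2, 9.8])        # stratum-level regression estimates

weights, eta = calibrated_weights(p_a, q_hat, x_bar, c_hat, X_bar_a, C_hat_a)
y_pmt = weights @ y_bar                       # calibrated estimate of the mean
```

A useful check on any implementation is that the returned weights reproduce the three calibration constraints, for example `weights @ x_bar` should equal `p_a @ X_bar_a` up to rounding.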

6. Numerical Illustration

6.1. Bandwidth Selectors

It is worth noting that $\bar{y}_{BR}$, $\bar{y}_{BRt}$, and all the other estimators considered depend on the bandwidth parameter $h$, which plays a critical role in balancing the bias–variance tradeoff of local polynomial regression. To ensure reliable and consistent results, we examined estimator performance under several bandwidth selection strategies. Specifically, bandwidths were chosen using (i) a fixed bandwidth approach, (ii) the direct plug-in ($dpik$) method described by Wand and Jones [32], and (iii) two data-driven cross-validation approaches, the biased ($bcv$) and unbiased ($ucv$) methods proposed by Scott and Terrell [33]. These methods are known to provide asymptotically consistent bandwidth choices in large samples. Evaluating results across these approaches allowed us to assess the effectiveness of the estimators ($\bar{y}_{PM}$, $\bar{y}_{PMt}$, $\bar{y}_{BR}$, $\bar{y}_{AR}$, $\bar{y}_{BRt}$, and $\bar{y}_{ARt}$) under varying bandwidth selection strategies.
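To make the role of $h$ concrete, the following is a minimal sketch of a local linear smoother evaluated with the two fixed bandwidths used in this study. It is an illustration only: the Gaussian kernel, the function name `local_linear`, and the demonstration data are assumptions, and the data-driven selectors ($dpik$, $bcv$, $ucv$) are not implemented here.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear fit at each grid point, Gaussian kernel with bandwidth h."""
    x, y, grid = (np.asarray(a, dtype=float) for a in (x, y, grid))
    fitted = np.empty(grid.shape)
    for i, g in enumerate(grid):
        d = x - g
        k = np.exp(-0.5 * (d / h) ** 2)               # kernel weights around g
        s0, s1, s2 = k.sum(), (k * d).sum(), (k * d * d).sum()
        t0, t1 = (k * y).sum(), (k * d * y).sum()
        # Intercept of the kernel-weighted least-squares line through (d, y)
        fitted[i] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return fitted

# Hypothetical demonstration data: a noisy sine curve on [0, 1]
rng = np.random.default_rng(1)
x_demo = rng.uniform(0.0, 1.0, 200)
y_demo = np.sin(2 * np.pi * x_demo) + rng.normal(0.0, 0.3, 200)
grid_demo = np.linspace(0.05, 0.95, 10)

fit_small_h = local_linear(x_demo, y_demo, grid_demo, h=0.2)
fit_large_h = local_linear(x_demo, y_demo, grid_demo, h=0.5)
```

A smaller $h$ follows local structure more closely at the cost of higher variance, while a larger $h$ gives a smoother but more biased fit; comparing `fit_small_h` and `fit_large_h` against the underlying curve shows this tradeoff.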

6.2. Simulation Experiments

The simulation experiments in this section aim to evaluate the effectiveness and efficiency of the estimators $\bar{y}_{PM}$ and $\bar{y}_{PMt}$ in comparison to $\bar{y}_{BR}$, $\bar{y}_{AR}$, $\bar{y}_{BRt}$, and $\bar{y}_{ARt}$. For this purpose, three simulated datasets (Sine, Bump, and Jump) were created based on the following regression functions:
$$
P_{ax} = \sin(2\pi x) + \kappa, \tag{43}
$$
$$
P_{bx} = 1 + 2(x - 0.5) + \exp\left( -200 (x - 0.5)^{2} \right) + \kappa, \tag{44}
$$
$$
P_{cx} = 1 + 2(x - 0.5)\, I_{\{x \le 0.65\}} + 0.65\, I_{\{x > 0.65\}} + \kappa, \tag{45}
$$
where $x$ is uniformly distributed on $[0, 1]$ and the error term $\kappa$ is independently and identically distributed with zero mean and unit standard deviation. For graphical representation, see Figure 1, Figure 2 and Figure 3.
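The three regression functions can be simulated as in the sketch below. This is an illustration rather than the authors' code: the error term is assumed to be normal (the text specifies only zero mean and unit standard deviation), and the function names `signal` and `simulate`, the seed, and the default size are hypothetical.

```python
import numpy as np

def signal(kind, x):
    """Deterministic part of the Sine, Bump, or Jump regression function."""
    x = np.asarray(x, dtype=float)
    if kind == "sine":
        return np.sin(2 * np.pi * x)
    if kind == "bump":
        return 1 + 2 * (x - 0.5) + np.exp(-200 * (x - 0.5) ** 2)
    if kind == "jump":
        return 1 + 2 * (x - 0.5) * (x <= 0.65) + 0.65 * (x > 0.65)
    raise ValueError(f"unknown dataset kind: {kind!r}")

def simulate(kind, size=1000, seed=0):
    """Draw x ~ Uniform[0, 1] and add an error term (assumed normal here)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size)
    kappa = rng.normal(0.0, 1.0, size)   # zero mean, unit standard deviation
    return x, signal(kind, x) + kappa

# One stratum of each type, each of size 1000
x_sine, y_sine = simulate("sine", seed=1)
x_bump, y_bump = simulate("bump", seed=2)
x_jump, y_jump = simulate("jump", seed=3)
```

Separating the deterministic `signal` from the noise makes it easy to plot the underlying curves alongside the simulated points.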
We considered two simulated populations based on the generated datasets (see Equations (43)–(45)) under StRS. In the first population, the Bump data $(P_{bx}, x)$ are taken as stratum-I and the Sine data $(P_{ax}, x)$ as stratum-II. In the second population, the Bump data $(P_{bx}, x)$ are taken as stratum-I and the Jump data $(P_{cx}, x)$ as stratum-II. Each stratum has size $\Omega = 1000$, and, following the methodology of Koyuncu [22,24], we draw samples from the specified stratified populations. To enable a fair comparison between the estimators, we collect a range of samples under the StRS scheme. The percentage sizes of these samples are detailed in Table 1, Table 2, Table 3 and Table 4, which also report the MSE and PRE results under single and double StRS, based on $R_b = 5000$ replications of the simulation experiment. For each sample, the numerical values of $\bar{y}_{PM}$, $\bar{y}_{PMt}$, $\bar{y}_{BR}$, $\bar{y}_{AR}$, $\bar{y}_{BRt}$, and $\bar{y}_{ARt}$ were obtained. The expressions for the MSEs and PREs are
$$
\mathrm{MSE}(\hat{c}_{(g1)}) = \frac{1}{R_b} \sum_{k_b=1}^{R_b} \left( \hat{c}_{(g1)} - \mu_b \right)^{2},
$$
$$
\mathrm{MSE}(\hat{c}_{(g2)}) = \frac{1}{R_b} \sum_{k_b=1}^{R_b} \left( \hat{c}_{(g2)} - \mu_b \right)^{2},
$$
$$
\mathrm{PRE}(\hat{c}_{(g1)}, \bar{y}_{BR}) = \frac{\mathrm{MSE}(\bar{y}_{BR})}{\mathrm{MSE}(\hat{c}_{(g1)})} \times 100,
$$
$$
\mathrm{PRE}(\hat{c}_{(g2)}, \bar{y}_{BRt}) = \frac{\mathrm{MSE}(\bar{y}_{BRt})}{\mathrm{MSE}(\hat{c}_{(g2)})} \times 100,
$$
where $\hat{c}_{(g1)} = \bar{y}_{PM}, \bar{y}_{BR}, \bar{y}_{AR}$, and $\hat{c}_{(g2)} = \bar{y}_{PMt}, \bar{y}_{BRt}, \bar{y}_{ARt}$. With this orientation, a PRE above 100 indicates that the estimator has a smaller MSE than the baseline, which is how the values in Table 3 and Table 4 are to be read.
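The two performance measures can be computed as in the following sketch. The PRE is taken here as the baseline MSE divided by the candidate MSE, times 100, which is the orientation under which the tabulated values exceed 100 for estimators with smaller MSE. The function names `mse` and `pre` are hypothetical; the two MSE inputs in the example are the rounded entries of Table 1 for $n = 10\%$ and $h = 0.2$.

```python
import numpy as np

def mse(estimates, target):
    """Average squared deviation of replicated estimates from the target mean."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - target) ** 2)

def pre(mse_candidate, mse_baseline):
    """Percentage relative efficiency of a candidate against a baseline."""
    return 100.0 * mse_baseline / mse_candidate

# Rounded MSE values from Table 1 (n = 10%, h = 0.2)
mse_br = 0.0002677   # basic regression estimator
mse_pm = 0.0002599   # proposed predictive mean estimator

pre_pm = pre(mse_pm, mse_br)
```

With these rounded inputs `pre_pm` is close to 103, in line with the corresponding entry of Table 3 (102.9805); the small difference is expected because the tabulated MSEs are rounded.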

6.3. Real-Life Applications Related to Fisheries and Radiations

Datasets such as the fish market dataset are well suited to kernel-based local polynomial regression (LPR), whose non-parametric nature gives it the flexibility to model complex associations between variables. The same approach can be used to analyze datasets in which the factors that influence solar ultraviolet (UV) radiation are simulated, in order to support more accurate risk assessment and environmental impact modeling. Using kernel regression to estimate the mean and other population parameters draws on an understanding of both fish characteristics and UV exposure risks, which can be operationalized within aquaculture, fisheries, and environmental science. These techniques are applied here to show how advanced statistical methods can be used across several areas to build a framework that improves the accuracy of fisheries assessments, helps the aquaculture industry achieve sustainability, and supports ecological monitoring. Combining fisheries and UV radiation studies through predictive modeling underlines the need for data-driven decision making in resource management and environmental sustainability.
Mean estimation based on kernel regression is now part of predictive calibration in fisheries science, because nonlinear dependencies among numerous variables are better captured by non-parametric models than by linear ones. Specifically, these models are responsive to unpredictability and changes in fish characteristics, e.g., weight or length, in a market where biological variability plays a significant role. The approach also applies to predictive modeling of environmental factors, such as solar UV radiation, that can alter not only aquatic ecosystems but the wider ecological balance. This paper uses datasets on UV radiation variables and fish market characteristics to analyze how non-parametric regression can improve prediction in fisheries science and environmental risk management. In addition to fish stock assessment, these estimation methods support decision making across aquaculture, fisheries management, and ecological monitoring.
The features of the fish market dataset are as follows: it is a detailed collection of various fish species and their characteristics, organized in a format suitable for analysis. Each row captures information about an individual fish and includes several physical attributes, such as length, height, and width, together with the species of that fish. The data were obtained from a publicly available website (https://www.kaggle.com/datasets/vipullrathod/fish-market), accessed on 21 October 2024, and approval was not needed. According to the description provided with the data, the dataset was collected to develop a predictive model for calculating a fish's weight from its species and certain standardized measurements. This description motivated us to use the dataset, as we are also interested in predictive model-based mean estimation.
The data were stratified using species type as the natural grouping variable: stratum-I represents the height of the Bream species (study variable), and stratum-II represents the width of the Bream species (auxiliary variable). This type of stratification allows for the investigation of heterogeneity across meaningful biological categories.
The features of the solar UV radiation dataset are as follows: it is a collection of environmental factors affecting ultraviolet (UV) radiation levels, structured for predictive analysis. Each row represents a unique set of atmospheric conditions, comprising several influencing factors (temperature, humidity, ozone levels, and solar angles), and corresponds to a specific UV risk level. The data were obtained from the publicly available website Kaggle, accessed on 22 February 2025, and approval was not needed. According to the description provided with the dataset, it was collected to support the development of predictive models for assessing UV radiation risk based on meteorological and environmental attributes. This dataset is particularly useful for our research, as we are also interested in predictive model-based mean estimation.
Stratum-I was constructed based on low UV risk levels, and stratum-II on moderate UV risk levels, which are realistic environmental groupings that can be compared. The study variable in this context is solar radiation intensity, which plays a critical role in evaluating UV exposure and its potential impact.
Accordingly, both examples are illustrative and comparative, and these approaches align with the standard pseudo-population principle used in simulation-based survey methodology. The auxiliary variable $x$ is generated from the uniform distribution on $[0, 1]$, following the guidelines of Rueda and Sanchez-Borrego [5], Younis and Shabbir [34], and Qureshi et al. [35]. The results of the fisheries and radiation data analyses are presented in Table 5, Table 6, Table 7 and Table 8.

6.4. Interpretation

Table 1 and Table 2 show that the basic regression estimator ($\bar{y}_{BR}$) and the adapted estimator ($\bar{y}_{AR}$) both underperform the predictive mean estimator ($\bar{y}_{PM}$), which attains lower MSE values across the sample sizes and bandwidth selectors considered. Secondly, double StRS is slightly more accurate than single-stage StRS, as confirmed by the smaller MSE values of the double-StRS estimators ($\bar{y}_{BRt}$, $\bar{y}_{ARt}$, and $\bar{y}_{PMt}$). Moreover, bandwidth selection plays a critical role in the accuracy of the estimators; in particular, the data-driven bandwidth selectors ($dpik$, $bcv$, and $ucv$) always result in lower MSE than the fixed ones ($h = 0.2, 0.5$), and $dpik$ performs the best. The same pattern of results can be seen in Table 3 and Table 4.
Table 5 and Table 6 present the mean squared error (MSE) values obtained from the fisheries and radiation datasets under both stratified random sampling (StRS) and double stratified random sampling (double StRS). The estimators assessed are $\bar{y}_{BR}$, $\bar{y}_{AR}$, and $\bar{y}_{PM}$, along with their double-StRS versions ($\bar{y}_{BRt}$, $\bar{y}_{ARt}$, and $\bar{y}_{PMt}$). The data were analyzed across different sample sizes ($n = 10\%$, $15\%$, and $20\%$) and bandwidth selectors ($h = 0.2$, $0.5$, $dpik$, $bcv$, $ucv$). The results indicate that the predictive mean estimator ($\bar{y}_{PM}$) and its double-StRS version ($\bar{y}_{PMt}$) outperform the basic and adapted regression estimators in terms of MSE reduction across all scenarios. The same pattern of results can be seen in Table 7 and Table 8.
Moreover, a detailed sensitivity analysis was conducted to assess the robustness of the proposed estimators under different experimental conditions. In particular, the analysis considered the impact of bandwidth variation, i.e., changing the fixed bandwidth values ($0.2$ and $0.5$), as well as the impact of sample size variation ($10\%$, $15\%$, and $20\%$). The PRE values computed at varying sample sizes serve as an indicator of sensitivity, showing how the estimators respond to changes in the scale of the data. The comparative results showed that the MSE-based PRE values varied very little even under pronounced under- or over-smoothing, and that the general efficiency patterns and the ranking of the proposed estimators remained stable. These findings support the robustness and stability of the proposed calibration-based framework across the bandwidth selectors and sample sizes considered.
As shown, the PRE values for all proposed estimators are above 100, demonstrating their enhanced performance compared to other estimators. Although this conclusion is drawn from our simulation study, we believe that similar outcomes would be likely in other scenarios as well.

7. Conclusions

The purpose of this work is to build a framework for mean estimation that combines a non-parametric model-based estimator, kernel regression, and calibration. The proposed method is based on StRS and uses a set of calibrated weights to improve the precision of the estimated mean. Simulations and two real datasets (fish market and solar UV radiation) showed this estimator to be superior to traditional and model-based adapted estimators. The method was shown to be effective on the fish market dataset, from the field of aquaculture and fisheries, where estimation of fish characteristics is critical for stock control and market evaluation. Similarly, when applied to the solar UV radiation dataset from environmental science, the method proved applicable to the modeling and assessment of solar radiation intensity and UV risk levels. It has been demonstrated that local polynomial kernel regression combined with calibration, using either the known or the estimated calibration function, enhances estimation efficiency, and that this improvement grows when auxiliary variables are included. The relative efficiency of the predictions showed that the proposed methodology can be used not only in fisheries science but also for predicting UV radiation risk. The kernel regression-based predictive calibration method is flexible and robust, and may be used for non-parametric mean estimation in survey sampling, resource management, and environmental risk assessment.

Author Contributions

Conceptualization, H.M.A. and M.M.A.; Methodology, H.M.A. and M.M.A.; Software, H.M.A. and M.M.A.; Validation, H.M.A.; Formal analysis, H.M.A.; Investigation, H.M.A.; Resources, H.M.A.; Data curation, H.M.A.; Writing—original draft, H.M.A. and M.M.A.; Writing—review & editing, H.M.A. and M.M.A.; Visualization, H.M.A. and M.M.A.; Supervision, H.M.A.; Project administration, H.M.A.; Funding acquisition, H.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research and Libraries in Princess Nourah bint Abdulrahman University for funding this research work through the Program for Supporting Publication in Top-Impact Journals (Grant No. SPTIF-2025-6).

Data Availability Statement

All the relevant data information is available within the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

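$$
\begin{aligned}
D_{71(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right),
\end{aligned}
$$
$$
\begin{aligned}
D_{72(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right)^{2} \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right),
\end{aligned}
$$
$$
\begin{aligned}
D_{73(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( \bar{X}_{\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{\lambda_\zeta} \left( C_{x\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right).
\end{aligned}
$$
$$
\begin{aligned}
H_1 ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right)^{2} \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&+ 2 \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right).
\end{aligned}
$$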
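$$
\begin{aligned}
D_{81(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right),
\end{aligned}
$$
$$
\begin{aligned}
D_{82(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right)^{2} \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right),
\end{aligned}
$$
$$
\begin{aligned}
D_{83(m)} ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \bar{X}_{a\lambda_\zeta} - \bar{x}_{\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&+ \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} P_{a\lambda_\zeta} \left( \hat{C}_{xa\lambda_\zeta} - \hat{C}_{x\lambda_\zeta} \right) \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right).
\end{aligned}
$$
$$
\begin{aligned}
H_2 ={}& \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right)^{2} \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta}^{2} \right) \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \\
&- \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right)^{2} \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta}^{2} \right) \\
&+ 2 \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right) \left( \sum_{\lambda_\zeta=1}^{\lambda_\beta} \hat{Q}_{\lambda_\zeta} P_{a\lambda_\zeta} \bar{x}_{\lambda_\zeta} \hat{C}_{x\lambda_\zeta} \right).
\end{aligned}
$$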

References

  1. Wu, C.; Sitter, R.R. A model-calibration approach to using complete auxiliary information from survey data. J. Am. Stat. Assoc. 2001, 96, 185–193.
  2. Dorfman, A.H. Nonparametric regression for estimating totals in finite populations. In Proceedings of the Section on Survey Research Methods; American Statistical Association: Alexandria, VA, USA, 1992; pp. 622–625.
  3. Dorfman, A.H.; Hall, P. Estimators of the finite population distribution function using nonparametric regression. Ann. Stat. 1993, 21, 1452–1475.
  4. Nadaraya, E.A. On estimating regression. Theory Probab. Its Appl. 1964, 9, 141–142.
  5. Rueda, M.; Sanchez-Borrego, I.R. A predictive estimator of finite population mean using nonparametric regression. Comput. Stat. 2009, 24, 1–14.
  6. Zaman, T.; Iftikhar, S. A New Logarithmic ratio type estimator of population mean for simple random sampling: A simulation study. J. Sci. Arts 2023, 23, 839–848.
  7. Alqudah, M.A.; Zayed, M.; Subzar, M.; Wani, S.A. Neutrosophic robust ratio type estimator for estimating finite population mean. Heliyon 2024, 10, e28934.
  8. Bhushan, S.; Kumar, A.; Alrumayh, A.; Khogeer, H.A.; Onyango, R. Evaluating the performance of memory type logarithmic estimators using simple random sampling. PLoS ONE 2022, 17, e0278264.
  9. Koc, T.; Koc, H. A new class of quantile regression ratio-type estimators for finite population mean in stratified random sampling. Axioms 2023, 12, 713.
  10. Srivastava, S.K. Predictive estimation of finite population mean using product estimator. Metrika 1983, 30, 93–99.
  11. Alomair, A.M.; Shahzad, U.; Al-Noor, N.H.; Zhu, H. Probability weighted moments and family of non-parametric regression estimators. Maejo Int. J. Sci. Technol. 2025, 19, 160–170.
  12. Chambers, R.L.; Dorfman, A.H.; Wehrly, T.E. Bias robust estimation in finite populations using nonparametric calibration. J. Am. Stat. Assoc. 1993, 88, 268–277.
  13. Breidt, F.J.; Opsomer, J.D. Local polynomial regression estimators in survey sampling. Ann. Stat. 2000, 28, 1026–1053.
  14. Shahzad, U.; Ahmad, I.; Almanjahie, I.M.; Al-Noor, N.H.; Hanif, M. Adaptive Nadaraya-Watson kernel regression estimators utilizing some non-traditional and robust measures: A numerical application of British food data. Hacet. J. Math. Stat. 2023, 52, 1425–1437.
  15. Ali, T.H. Modification of the adaptive Nadaraya-Watson kernel method for nonparametric regression (simulation study). Commun. Stat. Simul. Comput. 2022, 51, 391–403.
  16. Ali, T.H.; Hayawi, H.A.A.M.; Botani, D.S.I. Estimation of the bandwidth parameter in Nadaraya-Watson kernel non-parametric regression based on universal threshold level. Commun. Stat. Simul. Comput. 2023, 52, 1476–1489.
  17. Shahzad, U.; Ahmad, I.; Alshahrani, F.; Almanjahie, I.M.; Iftikhar, S. Calibration-based mean estimators under stratified median ranked set sampling. Mathematics 2023, 11, 1825.
  18. Koc, H.; Tanis, C.; Zaman, T. Poisson regression-ratio estimators of the population mean under double sampling, with application to COVID-19. Math. Popul. Stud. 2022, 29, 226–240.
  19. Smits, J. Social closure among the higher educated: Trends in educational homogamy in 55 countries. Soc. Sci. Res. 2003, 32, 251–277.
  20. Boehm, J.K.; Kubzansky, L.D. The heart’s content: The association between positive psychological well-being and cardiovascular health. Psychol. Bull. 2012, 138, 655.
  21. Deville, J.C.; Sarndal, C.E. Calibration estimators in survey sampling. J. Am. Stat. Assoc. 1992, 87, 376–382.
  22. Koyuncu, N. New difference-cum-ratio and exponential type estimators in median ranked set sampling. Hacet. J. Math. Stat. 2016, 45, 207–225.
  23. Singh, S.; Horn, S.; Yu, F. Estimation variance of general regression estimator: Higher level calibration approach. Surv. Methodol. 1998, 48, 41–50.
  24. Koyuncu, N. Calibration estimator of population mean under stratified ranked set sampling design. Commun. Stat. Theory Methods 2018, 47, 5845–5853.
  25. Sinha, N.; Sisodia, B.V.S.; Singh, S.; Singh, S.K. Calibration approach estimation of the mean in stratified sampling and stratified double sampling. Commun. Stat. Theory Methods 2017, 46, 4932–4942.
  26. Garg, N.; Pachori, M. Use of coefficient of variation in calibration estimation of population mean in stratified sampling. Commun. Stat. Theory Methods 2019, 49, 5842–5852.
  27. Pal, A.; Varshney, R.; Yadav, S.K.; Zaman, T. Improved memory-type ratio estimator for population mean in stratified random sampling under linear and non-linear cost functions. Soft Comput. 2024, 28, 7739–7754.
  28. Pandey, M.K.; Singh, G.N.; Zaman, T.; Al Mutairi, A.; Mustafa, M.S. Improved estimation of population variance in stratified successive sampling using calibrated weights under non-response. Heliyon 2024, 10, e27738.
  29. Al-Omari, A.I. Ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling. Stat. Probab. Lett. 2012, 82, 1883–1890.
  30. Alomair, M.A.; Daraz, U. Dual Transformation of Auxiliary Variables by Using Outliers in Stratified Random Sampling. Mathematics 2024, 12, 2839.
  31. Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New techniques for estimating finite population variance using ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741.
  32. Wand, M.P.; Jones, M.C. Kernel Smoothing; Chapman and Hall: London, UK, 1995.
  33. Scott, D.W.; Terrell, G.R. Biased and unbiased cross-validation in density estimation. J. Am. Stat. Assoc. 1987, 82, 1131–1146.
  34. Younis, F.; Shabbir, J. Estimation of general parameters under stratified adaptive cluster sampling based on dual use of auxiliary information. Sci. Iran. 2021, 28, 1780–1801.
  35. Qureshi, M.N.; Khalil, S.; Hanif, M. Joint influence of exponential ratio and exponential product estimator for the estimation clustered population mean in adaptive cluster sampling. Adv. Appl. Stat. 2018, 53, 13–28.
Figure 1. Sine dataset.
Figure 2. Bump dataset.
Figure 3. Jump dataset.
Table 1. MSE using $(P_{ax}, x)$ data.

| $n$ | $h$ | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 0.0002677 | 0.0002617 | 0.0002599 | 0.0002675 | 0.0002615 | 0.0002596 |
| 10% | 0.5 | 0.0002680 | 0.0002620 | 0.0002602 | 0.0002677 | 0.0002617 | 0.0002598 |
| 10% | dpik | 0.0002680 | 0.0002629 | 0.0002602 | 0.0002677 | 0.0002625 | 0.0002598 |
| 10% | bcv | 0.0002680 | 0.0002629 | 0.0002603 | 0.0002677 | 0.0002626 | 0.0002599 |
| 10% | ucv | 0.0002678 | 0.0002627 | 0.0002600 | 0.0002676 | 0.0002625 | 0.0002598 |
| 15% | 0.2 | 0.0003814 | 0.0003754 | 0.0003699 | 0.0003765 | 0.0003705 | 0.0003653 |
| 15% | 0.5 | 0.0003816 | 0.0003756 | 0.0003700 | 0.0003767 | 0.0003707 | 0.0003655 |
| 15% | dpik | 0.0003817 | 0.0003766 | 0.0003702 | 0.0003766 | 0.0003715 | 0.0003655 |
| 15% | bcv | 0.0003816 | 0.0003765 | 0.0003701 | 0.0003767 | 0.0003716 | 0.0003656 |
| 15% | ucv | 0.0003814 | 0.0003763 | 0.0003699 | 0.0003765 | 0.0003714 | 0.0003654 |
| 20% | 0.2 | 0.0004926 | 0.0004866 | 0.0004779 | 0.0004846 | 0.0004786 | 0.0004705 |
| 20% | 0.5 | 0.0004928 | 0.0004868 | 0.0004781 | 0.0004849 | 0.0004789 | 0.0004708 |
| 20% | dpik | 0.0004928 | 0.0004877 | 0.0004781 | 0.0004848 | 0.0004797 | 0.0004707 |
| 20% | bcv | 0.0004929 | 0.0004877 | 0.0004782 | 0.0004850 | 0.0004798 | 0.0004708 |
| 20% | ucv | 0.0004927 | 0.0004875 | 0.0004780 | 0.0004847 | 0.0004796 | 0.0004706 |
Table 2. MSE using $(P_{bx}, x)$ data.

| $n$ | $h$ | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 0.0002677 | 0.0002617 | 0.0002599 | 0.0002675 | 0.0002615 | 0.0002596 |
| 10% | 0.5 | 0.0002680 | 0.0002620 | 0.0002602 | 0.0002677 | 0.0002617 | 0.0002598 |
| 10% | dpik | 0.0002680 | 0.0002629 | 0.0002602 | 0.0002677 | 0.0002625 | 0.0002598 |
| 10% | bcv | 0.0002680 | 0.0002629 | 0.0002603 | 0.0002677 | 0.0002626 | 0.0002599 |
| 10% | ucv | 0.0002678 | 0.0002627 | 0.0002600 | 0.0002676 | 0.0002625 | 0.0002598 |
| 15% | 0.2 | 0.0003814 | 0.0003754 | 0.0003699 | 0.0003765 | 0.0003705 | 0.0003653 |
| 15% | 0.5 | 0.0003816 | 0.0003756 | 0.0003700 | 0.0003767 | 0.0003707 | 0.0003655 |
| 15% | dpik | 0.0003817 | 0.0003766 | 0.0003702 | 0.0003766 | 0.0003715 | 0.0003655 |
| 15% | bcv | 0.0003816 | 0.0003765 | 0.0003701 | 0.0003767 | 0.0003716 | 0.0003656 |
| 15% | ucv | 0.0003814 | 0.0003763 | 0.0003699 | 0.0003765 | 0.0003714 | 0.0003654 |
| 20% | 0.2 | 0.0004926 | 0.0004866 | 0.0004779 | 0.0004846 | 0.0004786 | 0.0004705 |
| 20% | 0.5 | 0.0004928 | 0.0004868 | 0.0004781 | 0.0004849 | 0.0004789 | 0.0004708 |
| 20% | dpik | 0.0004928 | 0.0004877 | 0.0004781 | 0.0004848 | 0.0004797 | 0.0004707 |
| 20% | bcv | 0.0004929 | 0.0004877 | 0.0004782 | 0.0004850 | 0.0004798 | 0.0004708 |
| 20% | ucv | 0.0004927 | 0.0004875 | 0.0004780 | 0.0004847 | 0.0004796 | 0.0004706 |
Table 3. PRE using $(P_{ax}, x)$ data.

| $n$ | $h$ | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 100 | 102.2885 | 102.9805 | 100 | 102.2904 | 103.0119 |
| 10% | 0.5 | 100 | 102.2858 | 102.9910 | 100 | 102.2885 | 103.0211 |
| 10% | dpik | 100 | 101.9531 | 102.9804 | 100 | 101.9556 | 103.0121 |
| 10% | bcv | 100 | 101.9528 | 102.9904 | 100 | 101.9552 | 103.0205 |
| 10% | ucv | 100 | 101.9546 | 102.9867 | 100 | 101.9561 | 103.0171 |
| 15% | 0.2 | 100 | 101.5953 | 103.1133 | 100 | 101.6166 | 103.0446 |
| 15% | 0.5 | 100 | 101.5946 | 103.1185 | 100 | 101.6156 | 103.0496 |
| 15% | dpik | 100 | 101.3634 | 103.1154 | 100 | 101.3820 | 103.0465 |
| 15% | bcv | 100 | 101.3637 | 103.1173 | 100 | 101.3816 | 103.0485 |
| 15% | ucv | 100 | 101.3644 | 103.1128 | 100 | 101.3823 | 103.0435 |
| 20% | 0.2 | 100 | 101.2309 | 103.0747 | 100 | 101.2513 | 102.9994 |
| 20% | 0.5 | 100 | 101.2302 | 103.0767 | 100 | 101.2505 | 103.0019 |
| 20% | dpik | 100 | 101.0527 | 103.0752 | 100 | 101.0702 | 102.9991 |
| 20% | bcv | 100 | 101.0526 | 103.0759 | 100 | 101.0699 | 103.0010 |
| 20% | ucv | 100 | 101.0531 | 103.0732 | 100 | 101.0706 | 102.9975 |
Table 4. PRE using $(P_{bx}, x)$ data.

| $n$ | $h$ | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 100 | 102.2885 | 102.9805 | 100 | 102.2904 | 103.0119 |
| 10% | 0.5 | 100 | 102.2858 | 102.9910 | 100 | 102.2885 | 103.0211 |
| 10% | dpik | 100 | 101.9531 | 102.9804 | 100 | 101.9556 | 103.0121 |
| 10% | bcv | 100 | 101.9528 | 102.9904 | 100 | 101.9552 | 103.0205 |
| 10% | ucv | 100 | 101.9546 | 102.9867 | 100 | 101.9561 | 103.0171 |
| 15% | 0.2 | 100 | 101.5953 | 103.1133 | 100 | 101.6166 | 103.0446 |
| 15% | 0.5 | 100 | 101.5946 | 103.1185 | 100 | 101.6156 | 103.0496 |
| 15% | dpik | 100 | 101.3634 | 103.1154 | 100 | 101.3820 | 103.0465 |
| 15% | bcv | 100 | 101.3637 | 103.1173 | 100 | 101.3816 | 103.0485 |
| 15% | ucv | 100 | 101.3644 | 103.1128 | 100 | 101.3823 | 103.0435 |
| 20% | 0.2 | 100 | 101.2309 | 103.0747 | 100 | 101.2513 | 102.9994 |
| 20% | 0.5 | 100 | 101.2302 | 103.0767 | 100 | 101.2505 | 103.0019 |
| 20% | dpik | 100 | 101.0527 | 103.0752 | 100 | 101.0702 | 102.9991 |
| 20% | bcv | 100 | 101.0526 | 103.0759 | 100 | 101.0699 | 103.0010 |
| 20% | ucv | 100 | 101.0531 | 103.0732 | 100 | 101.0706 | 102.9975 |
Table 5. MSE using fish data.

| $n$ | $h$ | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 0.006226 | 0.006220 | 0.006216 | 0.006201 | 0.006195 | 0.006186 |
| 10% | 0.5 | 0.006244 | 0.006238 | 0.006236 | 0.006226 | 0.006220 | 0.006203 |
| 10% | dpik | 0.005806 | 0.005804 | 0.005803 | 0.005779 | 0.005777 | 0.005761 |
| 10% | bcv | 0.005818 | 0.005813 | 0.005810 | 0.005798 | 0.005793 | 0.005775 |
| 10% | ucv | 0.005817 | 0.005812 | 0.005809 | 0.005798 | 0.005793 | 0.005775 |
| 15% | 0.2 | 0.008133 | 0.008127 | 0.008056 | 0.008158 | 0.008152 | 0.008086 |
| 15% | 0.5 | 0.008163 | 0.008163 | 0.008078 | 0.008196 | 0.008196 | 0.008116 |
| 15% | dpik | 0.007766 | 0.007761 | 0.007687 | 0.007790 | 0.007785 | 0.007716 |
| 15% | bcv | 0.007782 | 0.007777 | 0.007701 | 0.007813 | 0.007808 | 0.007735 |
| 15% | ucv | 0.007779 | 0.007773 | 0.007697 | 0.007809 | 0.007804 | 0.007731 |
| 20% | 0.2 | 0.009805 | 0.009799 | 0.009651 | 0.009890 | 0.009884 | 0.009725 |
| 20% | 0.5 | 0.009849 | 0.009843 | 0.009686 | 0.009936 | 0.009930 | 0.009764 |
| 20% | dpik | 0.009474 | 0.009469 | 0.009319 | 0.009559 | 0.009554 | 0.009395 |
| 20% | bcv | 0.009495 | 0.009490 | 0.009336 | 0.009585 | 0.009580 | 0.009417 |
| 20% | ucv | 0.009493 | 0.009488 | 0.009332 | 0.009581 | 0.009575 | 0.009411 |
Table 6. MSE using radiation data.
| n | h | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 23.28 | 22.73 | 22.73 | 23.28 | 22.73 | 22.73 |
| 10% | 0.5 | 23.47 | 22.91 | 22.92 | 23.47 | 22.91 | 22.91 |
| 10% | dpik | 21.86 | 21.64 | 21.65 | 21.86 | 21.64 | 21.65 |
| 10% | bcv | 22.03 | 21.51 | 21.52 | 22.03 | 21.51 | 21.52 |
| 10% | ucv | 22.01 | 21.49 | 21.49 | 22.01 | 21.49 | 21.49 |
| 15% | 0.2 | 30.79 | 30.23 | 30.23 | 30.79 | 30.23 | 30.23 |
| 15% | 0.5 | 30.94 | 30.38 | 30.38 | 30.94 | 30.38 | 30.38 |
| 15% | dpik | 29.55 | 29.33 | 29.33 | 29.55 | 29.33 | 29.33 |
| 15% | bcv | 29.65 | 29.13 | 29.14 | 29.65 | 29.13 | 29.13 |
| 15% | ucv | 29.66 | 29.14 | 29.14 | 29.66 | 29.14 | 29.14 |
| 20% | 0.2 | 37.09 | 36.53 | 36.53 | 37.09 | 36.53 | 36.53 |
| 20% | 0.5 | 37.43 | 36.87 | 36.88 | 37.43 | 36.87 | 36.88 |
| 20% | dpik | 36.04 | 35.82 | 35.83 | 36.04 | 35.82 | 35.82 |
| 20% | bcv | 36.31 | 35.79 | 35.80 | 36.31 | 35.79 | 35.75 |
| 20% | ucv | 36.31 | 35.79 | 35.79 | 36.31 | 35.79 | 35.78 |
Table 7. PRE using fish data.
| n | h | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 100 | 100.0961 | 100.1609 | 100 | 100.0965 | 100.2465 |
| 10% | 0.5 | 100 | 100.0959 | 100.1153 | 100 | 100.0961 | 100.3703 |
| 10% | dpik | 100 | 100.0362 | 100.0615 | 100 | 100.0364 | 100.3166 |
| 10% | bcv | 100 | 100.0882 | 100.1399 | 100 | 100.0885 | 100.3920 |
| 10% | ucv | 100 | 100.0882 | 100.1402 | 100 | 100.0885 | 100.3893 |
| 15% | 0.2 | 100 | 100.0737 | 100.9513 | 100 | 100.0735 | 100.8834 |
| 15% | 0.5 | 100 | 100.0011 | 101.0455 | 100 | 100.0011 | 100.9898 |
| 15% | dpik | 100 | 100.0662 | 101.0197 | 100 | 100.0659 | 100.9696 |
| 15% | bcv | 100 | 100.0660 | 101.0516 | 100 | 100.0658 | 101.0061 |
| 15% | ucv | 100 | 100.0660 | 101.0606 | 100 | 100.0658 | 101.0204 |
| 20% | 0.2 | 100 | 100.0611 | 101.5926 | 100 | 100.0606 | 101.6978 |
| 20% | 0.5 | 100 | 100.0608 | 101.6797 | 100 | 100.0603 | 101.7562 |
| 20% | dpik | 100 | 100.0542 | 101.6595 | 100 | 100.0537 | 101.7498 |
| 20% | bcv | 100 | 100.0541 | 101.7065 | 100 | 100.0536 | 101.7814 |
| 20% | ucv | 100 | 100.0541 | 101.7256 | 100 | 100.0536 | 101.8017 |
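The PRE columns can be related to the MSE columns with a short calculation. The sketch below assumes the usual definition of percent relative efficiency, PRE = 100 × MSE(baseline) / MSE(estimator), with $\bar{y}_{BR}$ as the baseline (which is why its column is fixed at 100). This definition is an assumption here, since it is not restated alongside the tables, and the function name `pre` is purely illustrative. The inputs are the rounded fish-data MSE values printed in Table 5 for StRS with n = 10% and h = 0.2.

```python
def pre(mse_baseline: float, mse_estimator: float) -> float:
    """Percent relative efficiency of an estimator against a baseline.

    Assumed definition (not restated next to the tables):
    PRE = 100 * MSE(baseline) / MSE(estimator).
    """
    if mse_estimator <= 0:
        raise ValueError("MSE of the estimator must be positive")
    return 100.0 * mse_baseline / mse_estimator


# Table 5, fish data, StRS, n = 10%, h = 0.2 (values as printed, six decimals).
mse = {"BR": 0.006226, "AR": 0.006220, "PM": 0.006216}

# The baseline is the BR estimator, so its own PRE is 100 by construction.
pre_values = {name: pre(mse["BR"], value) for name, value in mse.items()}

for name, value in pre_values.items():
    print(f"{name}: {value:.4f}")
```

From these rounded inputs the sketch gives roughly 100.0965 for AR and 100.1609 for PM, against 100.0961 and 100.1609 printed in the matching row of Table 7. The small gap for AR is consistent with the tabulated MSE values being rounded to six decimals before the ratio is taken.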
Table 8. PRE using radiation data.
| n | h | StRS $\bar{y}_{BR}$ | StRS $\bar{y}_{AR}$ | StRS $\bar{y}_{PM}$ | Double StRS $\bar{y}_{BRt}$ | Double StRS $\bar{y}_{ARt}$ | Double StRS $\bar{y}_{PMt}$ |
|---|---|---|---|---|---|---|---|
| 10% | 0.2 | 100 | 102.4638 | 102.4593 | 100 | 102.4638 | 102.4629 |
| 10% | 0.5 | 100 | 102.4434 | 102.4390 | 100 | 102.4426 | 102.4426 |
| 10% | dpik | 100 | 100.9952 | 100.9854 | 100 | 101.0003 | 100.9877 |
| 10% | bcv | 100 | 102.4146 | 102.3694 | 100 | 102.4170 | 102.3751 |
| 10% | ucv | 100 | 102.4143 | 102.4352 | 100 | 102.4166 | 102.4381 |
| 15% | 0.2 | 100 | 101.8524 | 101.8490 | 100 | 101.8538 | 101.8490 |
| 15% | 0.5 | 100 | 101.8430 | 101.8396 | 100 | 101.8447 | 101.8430 |
| 15% | dpik | 100 | 100.7344 | 100.7272 | 100 | 100.7368 | 100.7292 |
| 15% | bcv | 100 | 101.7830 | 101.7498 | 100 | 101.7844 | 101.7694 |
| 15% | ucv | 100 | 101.7806 | 101.7960 | 100 | 101.7834 | 101.7981 |
| 20% | 0.2 | 100 | 101.5326 | 101.5299 | 100 | 101.5346 | 101.5315 |
| 20% | 0.5 | 100 | 101.5184 | 101.5156 | 100 | 101.5198 | 101.5176 |
| 20% | dpik | 100 | 100.6013 | 100.5954 | 100 | 100.6027 | 100.5965 |
| 20% | bcv | 100 | 101.4512 | 101.4243 | 100 | 101.4526 | 101.5672 |
| 20% | ucv | 100 | 101.4499 | 101.4623 | 100 | 101.4521 | 101.4638 |
