Abstract
Adaptive cluster sampling is a methodology designed for data collection in contexts where the population is rare and spatially clustered. This approach has been effectively applied in various disciplines, including epidemiology and resource management. The present study introduces novel estimators that incorporate auxiliary variable information to improve estimation efficiency. These estimators were developed using the jackknife resampling technique to improve the performance of ratio-type estimators. Theoretical properties, including bias and mean square error (MSE), were derived, and a simulation study was conducted to validate the theoretical findings. The results demonstrated that the proposed estimators consistently outperformed conventional estimators that do not utilize auxiliary variables across all network sample sizes. Furthermore, in several scenarios, the proposed estimators also exhibited superior efficiency to existing ratio estimators that do incorporate auxiliary information.
MSC:
62D05; 62F40
1. Introduction
Sampling methods are employed when it is impractical to collect data from an entire population. To ensure the validity of statistical inferences, it is essential that the sample be obtained through probability-based sampling techniques. One of the most widely used probability sampling methods is simple random sampling. The population mean for a variable of interest is typically estimated using the sample mean, which is known to be an unbiased estimator. A central objective in sampling theory is to enhance the efficiency of such estimators.
In many cases, the estimation efficiency can be enhanced by utilizing an auxiliary variable that is correlated with the variable of interest. When both variables exhibit a high positive correlation, the ratio estimator, which incorporates the auxiliary variable’s mean, is widely adopted. Numerous studies have proposed improvements to the ratio estimator in the context of simple random sampling. For example, Sisodia and Dwivedi [1] introduced a modified ratio estimator based on the coefficient of variation of the auxiliary variable. Singh and Tailor [2] proposed an estimator that incorporates the population correlation coefficient between the target and auxiliary variables. Yadav et al. [3] developed ratio-cum-product estimators, while Jerajuddin and Kishun [4] enhanced ratio estimators by considering sample size. Soponviwatkul and Lawson [5] proposed further refinements by incorporating the coefficient of variation, correlation coefficient, and regression coefficient.
However, when the population is both rare and clustered, simple random sampling may not be optimal. To address this, Thompson [6] introduced adaptive cluster sampling (ACS) in 1990. In ACS, an initial sample is selected using simple random sampling without replacement. If a unit in this initial sample satisfies a prespecified condition for the variable of interest, its neighboring units are added to the sample. This expansion continues iteratively until no additional units meet the condition. The initial unit, together with the subsequently added units that satisfy the condition, forms a network. Added units that do not meet the condition are referred to as edge units. The union of a network and its edge units constitutes a cluster. If the initial unit fails to satisfy the condition, it forms a singleton network (a network of size one). For this study, neighborhoods were defined as the four orthogonally adjacent units (up, down, left, and right), with mutual neighborhood relationships assumed (Figure 1).
ACS has proven effective across a range of survey applications, especially where the target attribute is rare or spatially clustered. It has been employed in diverse fields, including forest ecosystem monitoring [7], herpetofaunal surveys in tropical rainforests [8], evaluations of sea lamprey larval distributions [9], and investigations of freshwater mussel populations [10]. The method has also been used in hydroacoustic sampling [11] and in research related to the COVID-19 pandemic [12,13,14]. Beyond ecological and epidemiological contexts, ACS has been adapted for autonomous systems and Internet of Things (IoT) applications [15,16].
Figure 1.
A neighborhood is defined based on the four units directly adjacent to a selected unit in the cardinal directions. An initially selected unit is marked with an asterisk; the condition for adding neighboring units could be a value greater than or equal to one. The blue units illustrate a single network formed under ACS.
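The expansion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `grow_cluster`, the toy grid, and the default condition `y >= 1` (taken from the caption's example) are assumptions made here for demonstration.

```python
from collections import deque

def grow_cluster(grid, start, condition=lambda y: y >= 1):
    """Return (network, edge_units) obtained by ACS expansion from `start`.

    `grid` is a list of rows of y-values; `start` is a (row, col) tuple.
    Neighbors are the four orthogonally adjacent units.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = start
    if not condition(grid[r0][c0]):
        # The initial unit fails the condition: singleton network, no expansion.
        return {start}, set()

    network, edges = {start}, set()
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the study region
            cell = (nr, nc)
            if cell in network:
                continue
            if condition(grid[nr][nc]):
                network.add(cell)   # satisfies the condition: keep expanding
                queue.append(cell)
            else:
                edges.add(cell)     # added to the sample but stops the expansion
    return network, edges

# Hypothetical 4 x 4 population used only for illustration.
grid = [
    [0, 0, 0, 0],
    [0, 3, 5, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]
network, edges = grow_cluster(grid, (1, 1))
```

In this toy grid the unit at (3, 3) satisfies the condition but is not orthogonally adjacent to the network grown from (1, 1), so it belongs to a separate network.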
Thompson also proposed an estimator and demonstrated that ACS yields improved efficiency in clustered populations. Analogously to simple random sampling, incorporating auxiliary variable information in ACS can further improve estimator performance. Chao [17] introduced a ratio estimator for ACS, while Dryver and Chao [18] introduced modified ratio estimators. Chutiman and Kumphon [19] presented a ratio estimator using two auxiliary variables. Chutiman [20] and Yadav et al. [21] proposed ratio estimators based on population parameters, including the coefficient of variation, kurtosis, skewness, and correlations with auxiliary variables. Chaudhry and Hanif [22,23] proposed generalized exponential-type estimators, while Bhat et al. [24] developed a generalized class of ratio-type estimators.
Although several studies have proposed enhancements to ratio-type estimators within the framework of ACS, the use of jackknife resampling to refine ratio-type estimators under ACS has received little attention in the existing literature. This study addresses this gap by developing enhanced versions of Chao’s ratio-type estimator for adaptive cluster sampling that apply the jackknife method to information from a single auxiliary variable, specifically the auxiliary variable’s mean. Section 2 outlines relevant estimators in ACS, Section 3 introduces the proposed jackknife-based estimators, Section 4 presents simulation results, and Section 5 concludes the study.
2. Adaptive Cluster Sampling
In adaptive cluster sampling, an initial sample of units is selected using simple random sampling without replacement.
Let and denote the variable of interest and the auxiliary variable, respectively. Let represent the initial sample size and be the final sample size. Let denote the network that includes unit , and let represent the number of units in that network.
Let and denote the averages of the y- and x-values, respectively, within the network that includes the initial sample unit , where for the variable of interest and for the auxiliary variable.
The Hansen–Hurwitz estimator of the population mean for the variable of interest is [25]:
The mean square error (MSE) of is:
where and .
The modified Hansen–Hurwitz estimator of the population mean of the auxiliary variable is:
Let be the population ratio between and , . Chao [17] introduced the following ratio estimator for the population mean:
where is a biased estimator of . The bias of is:
where , , , and , being the correlation coefficient between and . The ratio estimator () is biased; however, its bias diminishes as the correlation between and increases.
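As a rough illustration of the two estimators discussed in this section, the sketch below computes the modified Hansen–Hurwitz estimator as the average of network means over the n initial sample units, and a ratio estimator as the ratio of the two Hansen–Hurwitz estimates multiplied by the known population mean of the auxiliary variable. The names `w_y`, `w_x`, and `mu_x` are notation assumed here (network means of y, network means of x, and the known population mean of x); they may not match the symbols used in the original equations.

```python
def hansen_hurwitz(network_means):
    """Average of the network means attached to the n initial sample units."""
    if not network_means:
        raise ValueError("need at least one initial sample unit")
    return sum(network_means) / len(network_means)

def ratio_estimator(w_y, w_x, mu_x):
    """Ratio-type estimator of the population mean of y.

    w_y, w_x : network means of y and x for each initial sample unit
    mu_x     : known population mean of the auxiliary variable
    """
    x_bar = hansen_hurwitz(w_x)
    if x_bar == 0:
        # Happens when no sampled network contains a non-zero x-value.
        raise ValueError("Hansen-Hurwitz estimate of x is zero; ratio undefined")
    return hansen_hurwitz(w_y) / x_bar * mu_x

# Hypothetical initial sample of n = 4 units, two of which hit the same network.
w_y = [0.0, 4.0, 4.0, 0.0]
w_x = [0.0, 2.0, 2.0, 0.0]
y_hh = hansen_hurwitz(w_y)
y_ratio = ratio_estimator(w_y, w_x, mu_x=0.5)
```

The explicit zero check matters in practice: with rare populations and small initial samples, it is quite possible that every selected unit has a zero auxiliary value, in which case the ratio cannot be formed.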
The mean square error (MSE) of is:
3. Proposed Estimators in Adaptive Cluster Sampling Using the Jackknife Method
Motivated by the work of Banerjee and Tiwari [26], this study applies Quenouille’s jackknife method [27] to propose a set of ratio-type estimators aimed at improving estimation accuracy.
Three estimators are proposed based on variations of the jackknife method.
(1) First proposed estimator
This estimator employs a two-group jackknife partition of the sample network. The sample network of size n is randomly divided into two groups, each of size m = n/2.
Let , and , where and are the modified Hansen–Hurwitz estimators of the population mean based on a group of size m, for the y- and x-variables, respectively. Here, serves as a ratio-type estimator of the population mean based on group of size .
The first proposed estimator is
The bias of is:
The MSE of is:
(2) Second proposed estimator
Building upon the first estimator, this version incorporates a correction term inspired by Banerjee and Tiwari [26].
The second proposed estimator is
is an unbiased estimator of to the first order of approximation (based on Banerjee and Tiwari [26]).
The MSE of is:
(3) Third proposed estimator
This estimator utilizes the jackknife method through a delete-one-network resampling approach. In this case, jackknife samples are obtained by systematically omitting one network at a time. Therefore, and . The corresponding and represent modified Hansen–Hurwitz estimators of the population mean for the y- and x-variables in the delete network , respectively.
The third proposed estimator is
where is the ratio estimator of the population mean for the y-variable in the delete network .
The bias of is:
The MSE of is:
The detailed derivations of the bias and mean square error (MSE) for each of the proposed estimators are provided in Appendix B.
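To make the two resampling schemes in this section concrete, the sketch below applies the textbook Quenouille jackknife to a ratio estimator built from network means, once with a random two-group split and once with delete-one resampling. It should be read as an illustration of the general technique only: the exact forms of the three proposed estimators are not reproduced here, and the function names, the inputs `w_y`, `w_x`, `mu_x`, and the combination formulas (g times the full-sample estimate minus (g - 1) times the average of the leave-one-group-out estimates) are assumptions of this sketch.

```python
import random

def ratio_estimate(w_y, w_x, mu_x):
    """Ratio of totals of network means, scaled by the known mean of x."""
    sx = sum(w_x)
    if sx == 0:
        raise ValueError("auxiliary network means sum to zero; ratio undefined")
    return sum(w_y) / sx * mu_x

def two_group_jackknife(w_y, w_x, mu_x, rng=None):
    """Quenouille jackknife with two random groups of size m = n / 2."""
    rng = rng or random.Random(0)
    n = len(w_y)
    if n % 2 != 0:
        raise ValueError("n must be even to form two groups of size n/2")
    idx = list(range(n))
    rng.shuffle(idx)
    groups = (idx[: n // 2], idx[n // 2:])
    full = ratio_estimate(w_y, w_x, mu_x)
    # Deleting one group leaves the other, so the group estimates are the
    # leave-one-group-out estimates.
    parts = [
        ratio_estimate([w_y[i] for i in g], [w_x[i] for i in g], mu_x)
        for g in groups
    ]
    return 2 * full - sum(parts) / 2

def delete_one_jackknife(w_y, w_x, mu_x):
    """Quenouille jackknife omitting one sampled network mean at a time."""
    n = len(w_y)
    full = ratio_estimate(w_y, w_x, mu_x)
    leave_one_out = [
        ratio_estimate(w_y[:i] + w_y[i + 1:], w_x[:i] + w_x[i + 1:], mu_x)
        for i in range(n)
    ]
    return n * full - (n - 1) * sum(leave_one_out) / n

# Hypothetical network means in which y is exactly twice x.
w_x = [1.0, 2.0, 3.0, 4.0]
w_y = [2.0, 4.0, 6.0, 8.0]
est_two_group = two_group_jackknife(w_y, w_x, mu_x=2.5)
est_delete_one = delete_one_jackknife(w_y, w_x, mu_x=2.5)
```

When y is exactly proportional to x, every subsample ratio is identical, so both jackknife versions return the same value as the plain ratio estimator; differences only appear when the relationship is noisy. Subsamples whose auxiliary values are all zero make the ratio undefined, which is a real concern for rare populations and small n.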
4. Simulation Study and Discussion
Simulation studies were conducted under two scenarios, categorized by the degree of correlation between the auxiliary variable (X) and the variable of interest (Y):
Case 1: Medium correlation
Case 2: High correlation
Population I: The first population was based on the blue-winged teal dataset [28]. The study area, 5000 square kilometers in central Florida, was divided into 50 units of 100 square kilometers each. These data were assigned as y-values. The corresponding x-values [29] were simulated using the model , where , is the average of the y-values within the network that includes unit i of the initial sample. For consistency, x = 0 was assumed whenever y = 0. The population mean of y was 282.580, and the Pearson correlation coefficient between x and y was 0.47320. Initial sample sizes n considered were: 4, 6, 10, 16, and 20. Further insights into the distributional characteristics of the study variables are provided in Appendix A, where Figure A1 and Figure A2 depict Population I based on the variable of interest and the auxiliary variable, respectively.
Population II: The second population, adopted from Chao [17], was generated using a linked-pairs process in combination with a bivariate Poisson cluster process, resulting in a 20 × 20 grid (400 units). The population mean of y was 0.6475, and the Pearson correlation coefficient between x and y was 0.7070. The initial sample sizes examined were: 4, 8, 10, 16, 20, 26, 30, 40, 50, 100, and 200. The distributions of the variable of interest and the auxiliary variable are illustrated in Figure A3 and Figure A4 (see Appendix A).
For each simulation iteration, initial sample units were selected via simple random sampling without replacement. The expansion criterion in the adaptive cluster sampling was defined by the condition . A total of 10,000 iterations were conducted for each estimator. The initial sample size n was varied to evaluate the performance of each estimator under different sampling effort levels. Small initial sample sizes reflected scenarios constrained by limited budget, time, or personnel. Moderate initial sample sizes represented realistic survey conditions under practical resource availability. Large initial sample sizes applied to intensive studies such as epidemiological surveillance, where detailed data collection is necessary.
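The Monte Carlo loop described above can be sketched compactly for the Hansen–Hurwitz estimator, because that estimator depends only on the network mean attached to each initially selected unit. Under that observation, the network means can be computed once for the whole population and each iteration reduces to a simple random draw without replacement. The function name `simulate_hh` and the toy population are assumptions of this sketch, not the populations used in the study.

```python
import random

def simulate_hh(pop_w_y, n, iterations, seed=0):
    """Replicate the Hansen-Hurwitz estimator under SRSWOR of n initial units.

    pop_w_y : for every population unit, the mean of y over its network
    """
    rng = random.Random(seed)
    N = len(pop_w_y)
    if not 0 < n <= N:
        raise ValueError("initial sample size must satisfy 0 < n <= N")
    estimates = []
    for _ in range(iterations):
        sample = rng.sample(range(N), n)  # simple random sampling without replacement
        estimates.append(sum(pop_w_y[i] for i in sample) / n)
    return estimates

# Hypothetical population of N = 20 units: one network of four units with
# y-values 8, 4, 6, 2 (network mean 5) and sixteen empty units.
pop_w_y = [5.0] * 4 + [0.0] * 16
mu_y = (8 + 4 + 6 + 2) / 20  # population mean of y

estimates = simulate_hh(pop_w_y, n=5, iterations=10_000, seed=1)
mc_mean = sum(estimates) / len(estimates)
```

Because the Hansen–Hurwitz estimator is unbiased, `mc_mean` should sit close to `mu_y` for a large number of iterations. Ratio-type estimators can be replicated the same way by carrying the network means of x alongside those of y.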
The expected final sample size was computed as follows:
The estimated absolute relative bias was defined as:
The estimated mean square error of the estimator was defined as:
The relative efficiency of the proposed estimators was evaluated with respect to and , denoted as and , respectively.
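The three performance measures can be computed from the replicated estimates as in the sketch below. The definitions used here are the common Monte Carlo conventions (absolute relative bias as the absolute difference between the average estimate and the true mean, divided by the true mean; MSE as the average squared deviation from the true mean; relative efficiency as the MSE of the reference estimator divided by the MSE of the estimator under study). They are assumed rather than copied from the original equations, and the function names are hypothetical.

```python
def abs_relative_bias(estimates, mu_y):
    """|average estimate - true mean| / true mean."""
    mean_est = sum(estimates) / len(estimates)
    return abs(mean_est - mu_y) / mu_y

def mse(estimates, mu_y):
    """Average squared deviation of the replicated estimates from the true mean."""
    return sum((e - mu_y) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(reference, proposed, mu_y):
    """MSE(reference) / MSE(proposed); values above 1 favor the proposed estimator."""
    return mse(reference, mu_y) / mse(proposed, mu_y)

# Hypothetical replicated estimates for a population with true mean 2.
mu_y = 2.0
reference = [0.0, 4.0]   # e.g. an estimator without auxiliary information
proposed = [1.0, 3.0]    # e.g. an estimator with auxiliary information

arb = abs_relative_bias(proposed, mu_y)
mse_proposed = mse(proposed, mu_y)
re = relative_efficiency(reference, proposed, mu_y)
```

With this convention, a relative efficiency greater than one means the estimator under study has a smaller MSE than the reference.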
The flowchart in Figure 2 outlines the simulation procedure, while the estimated absolute relative bias, estimated mean square error (MSE), and relative efficiency of the proposed estimators are presented in Tables 1, 2, and 3, respectively. Additionally, bar charts in Figure 3 illustrate the behavior of the estimated MSE across various initial sample sizes.
Figure 2.
The steps of the simulation study for each initial sample size.
Table 1.
Absolute relative bias of population mean estimators.
Table 2.
Mean square error (MSE) of population mean estimators.
Table 3.
Relative efficiency of population mean estimators.
Figure 3.
Estimated mean square error (MSE) of the estimators across varying initial sample sizes.
Discussion
Based on the data studied, the variable of interest was positively correlated with the auxiliary variable. The results from the simulation data are presented as follows.
Table 1 presents the estimated absolute relative bias of the biased estimators, namely , , and . For both populations, it can be observed that as the sample size increased, the absolute relative bias for all estimators declined and approached zero.
Table 2 shows that the estimators incorporating auxiliary variable information, given the positive correlation with the variable of interest, consistently yielded lower MSEs than estimators that do not use such information. Notably, achieved a lower MSE than the traditional ratio estimator when the initial sample size was small. Among all estimators, provided the lowest MSE across all initial sample sizes. Although the estimator is unbiased, its MSE was higher than that of despite being lower than that of estimators that do not use auxiliary information. As illustrated in Figure 3, the MSE for all estimators decreased as the sample size increased.
Table 3 reports the relative efficiency of each estimator compared with that of and . Notably, all estimators that incorporate auxiliary variable information demonstrated higher relative efficiency than across both Population I and Population II. Among them, the estimator consistently exhibited the highest relative efficiency.
A comparative analysis indicated that both and outperformed in terms of RE, particularly at small to moderate sample sizes. Moreover, as the initial sample size increased, the efficiency of converged with that of the traditional ratio estimator .
In the case of Population I, while the overall correlation between the auxiliary variable (X) and the variable of interest (Y) was moderate, the correlation between their network-level averages and was substantially higher at 0.99. This strong association accounts for the trends in MSE and relative efficiency (RE) that closely mirror those observed in Population II.
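The gap between unit-level and network-level correlation can be illustrated with a small example. In the sketch below, each unit's value is replaced by the mean of its network before the correlation is computed, which is the quantity the ACS estimators actually work with. The data, network labels, and function names are hypothetical and chosen only to show that a modest unit-level correlation can coexist with a very high correlation between network averages.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def network_means(values, network_ids):
    """Replace each unit's value by the mean over the network it belongs to."""
    totals, counts = {}, {}
    for v, k in zip(values, network_ids):
        totals[k] = totals.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return [totals[k] / counts[k] for k in network_ids]

# Hypothetical population: four empty singleton networks and two non-empty
# networks (labels 4 and 5) whose unit values are noisy within the network.
y = [0, 0, 0, 0, 10, 2, 6, 1, 9]
x = [0, 0, 0, 0, 1, 5, 3, 4, 2]
network_ids = [0, 1, 2, 3, 4, 4, 4, 5, 5]

r_unit = pearson(y, x)
r_network = pearson(network_means(y, network_ids), network_means(x, network_ids))
```

In this toy population the unit-level correlation is about 0.21, while the correlation between network averages is about 0.99, because averaging within networks removes the within-network noise.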
5. Conclusions
Adaptive cluster sampling (ACS) is particularly effective for studying rare and spatially clustered populations. This research proposes three enhanced ratio-type estimators for ACS, building on Chao’s [17] original ratio estimator and employing the jackknife method to reduce bias and improve efficiency. Analytical derivations of bias and MSE were provided for each estimator, and their performance was evaluated through extensive simulation. The simulation results demonstrated that all three proposed estimators outperformed conventional estimators that do not utilize auxiliary variable information. Specifically, proved to be more efficient than Chao’s estimator for small initial sample sizes, while exhibited superior efficiency for both small and moderate initial sample sizes. In large-sample settings, the efficiency of became comparable to that of the traditional ratio estimator. Although is an unbiased estimator, its efficiency was the lowest among the estimators that incorporate auxiliary variable information.
The practical implications of these findings are particularly relevant for field studies with limited budgets. In such scenarios, the estimators and may be the most effective choices because of their superior performance with smaller initial sample sizes. For studies with moderate resource availability requiring a medium-sized initial sample, continued to demonstrate high efficiency. In cases necessitating a large initial sample size, the estimator maintained its high efficiency, although its performance became comparable to that of the traditional ratio estimator .
A limitation of the current study lies in the use of a two-group partitioning strategy within the jackknife procedure. The effects of alternative partitioning schemes—such as dividing the sample into three or more groups—have not been explored. Future research could investigate whether multigroup jackknife strategies offer additional improvements in estimator efficiency, particularly in scenarios involving larger sample sizes or more complex population structures. Moreover, this study focused on a single auxiliary variable. Future research could investigate the development of jackknife-based estimators that incorporate multiple auxiliary variables. Such extensions may further improve estimation accuracy, particularly in high-dimensional sampling contexts. Additionally, alternative resampling methods—such as the bootstrap or balanced repeated replication (BRR)—could be explored and compared with the jackknife approach within the framework of adaptive cluster sampling (ACS).
Author Contributions
Conceptualization, S.W. and N.C.; methodology, S.W.; software, N.C. and P.G.; investigation, A.N.; writing—original draft preparation, N.C. and A.N.; writing—review and editing, P.G. and S.W.; funding acquisition, N.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research project was financially supported by Mahasarakham University.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendix A
Figure A1 and Figure A2 illustrate Population I, depicting the distributions of the variable of interest and the auxiliary variable, respectively. Figure A1 presents the distribution of the variable of interest based on the blue-winged teal dataset [28], while Figure A2 shows the distribution of the auxiliary variable generated following the method proposed by Pochai [29].
Figure A1.
Distribution of the variable of interest (y) for Population I (blue-winged teal data).
Figure A2.
Distribution of the auxiliary variable (x) for Population I (simulated using Pochai [29]).
Figure A3 and Figure A4 present the distributions of the variable of interest and the auxiliary variable for Population II, respectively, as generated based on the procedure outlined by Chao [17].
Figure A3.
Distribution of the variable of interest (y) for Population II.
Figure A4.
Distribution of the auxiliary variable (x) for Population II.
Appendix B
Appendix B presents the detailed derivations of the bias and mean square error (MSE) for each of the proposed estimators introduced in Section 3. These derivations provide theoretical justification for the properties and performance of the estimators.
The bias and MSE of each proposed estimator are as follows:
The first proposed estimator:
Let and .
For i = 1:
Assuming , the term can be expanded as an infinite series.
Therefore, .
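For orientation, the series expansion referred to above follows the standard first-order argument for ratio estimators. The block below writes it out for a generic ratio estimator built from Hansen–Hurwitz estimates; the symbols $e_0$ and $e_1$ (relative errors of the y- and x-estimates) and $\hat{\mu}_R$ are notation assumed here and may differ from the notation of the original derivation, which applies the same steps to the group-based estimators.

```latex
\begin{align*}
e_0 &= \frac{\bar{y}_{HH} - \mu_y}{\mu_y}, \qquad
e_1 = \frac{\bar{x}_{HH} - \mu_x}{\mu_x}, \qquad
E(e_0) = E(e_1) = 0, \\
\hat{\mu}_R &= \frac{\bar{y}_{HH}}{\bar{x}_{HH}}\,\mu_x
  = \mu_y\,(1 + e_0)(1 + e_1)^{-1}, \\
(1 + e_1)^{-1} &= 1 - e_1 + e_1^{2} - e_1^{3} + \cdots
  \qquad (|e_1| < 1), \\
\hat{\mu}_R - \mu_y &\approx \mu_y \left( e_0 - e_1 + e_1^{2} - e_0 e_1 \right), \\
E(\hat{\mu}_R) - \mu_y &\approx \mu_y \left[ E(e_1^{2}) - E(e_0 e_1) \right].
\end{align*}
```

Terms of order higher than two are dropped in the last two lines, which is what "to the first order of approximation" refers to. The expectations $E(e_1^{2})$ and $E(e_0 e_1)$ are then expressed through the variance of the x-estimate and the covariance between the two estimates, both of which scale inversely with the number of sampled units.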
The bias of is given by:
where ,
Therefore, .
The bias of is derived in the same way as that of , and the is equal to the .
Therefore, the bias of is:
For the MSE of ,
where
The MSE of is:
The second proposed estimator: , where
From and , it follows that , and is an unbiased estimator of to the first order of approximation (based on Banerjee and Tiwari [26]).
For the MSE of ,
where ,
and .
Therefore, the MSE of is
The third proposed estimator:
Let and .
For i = 1:
Assuming , the term can be expanded as an infinite series.
The bias of is given by:
where , , , and .
The bias of is derived in the same way as that of , and the is equal to the . Therefore, the bias of is:
For the MSE of ,
where and .
Therefore, the MSE of is:
References
1. Sisodia, B.V.S.; Dwivedi, V.K. A Modified Ratio Estimator Using Coefficient of Variation of Auxiliary Variable. J. Indian Soc. Agric. Stat. 1981, 33, 13–18.
2. Singh, H.P.; Tailor, R. Use of Known Correlation Coefficient in Estimating the Finite Population Mean. Stat. Transit. 2003, 6, 553–560.
3. Yadav, S.K.; Subramani, J.; Mishra, S.S.; Shukla, A.K. Improved Ratio-Cum-Product Estimators of Population Mean Using Known Population Parameters of Auxiliary Variables. Am. J. Oper. Res. 2016, 6, 48–54.
4. Jerajuddin, M.; Kishun, J. Modified Ratio Estimators for Population Mean Using Size of the Sample, Selected from Population. Int. J. Sci. Res. Sci. Eng. Technol. 2016, 2, 10–16.
5. Soponviwatkul, K.; Lawson, N. New Ratio Estimators for Estimating Population Mean in Simple Random Sampling using a Coefficient of Variation, Correlation Coefficient and a Regression Coefficient. Gazi Univ. J. Sci. 2017, 30, 610–621.
6. Thompson, S.K. Adaptive cluster sampling. J. Am. Statist. Assoc. 1990, 85, 1050–1059.
7. Magnussen, S.; Kurz, W.; Leckie, D.G.; Paradine, D. Adaptive cluster sampling for estimation of deforestation rates. Eur. J. For. Res. 2005, 124, 207–220.
8. Noon, B.R.; Ishwar, N.M.; Vasudevan, K. Efficiency of adaptive cluster and random sampling in detecting terrestrial herpetofauna in a tropical rainforest. Wildl. Soc. Bull. 2006, 34, 59–68.
9. Sullivan, W.P.; Morrison, B.J.; Beamish, F.W.H. Adaptive cluster sampling: Estimating density of spatially autocorrelated larvae of the sea lamprey with improved precision. J. Great Lakes Res. 2008, 34, 86–97.
10. Smith, D.R.; Villella, R.F.; Lemarié, D.P. Application of adaptive cluster sampling to low-density populations of freshwater mussels. Environ. Ecol. Stat. 2003, 10, 7–15.
11. Conners, M.E.; Schwager, S.J. The use of adaptive cluster sampling for hydroacoustic surveys. ICES J. Mar. Sci. 2002, 59, 1314–1325.
12. Olayiwola, O.M.; Ajayi, A.O.; Onifade, O.C.; Wale-Orojo, O.; Ajibade, B. Adaptive cluster sampling with model based approach for estimating total number of Hidden COVID-19 carriers in Nigeria. Stat. J. IAOS 2020, 36, 103–109.
13. Chandra, G.; Tiwari, N.; Nautiyal, R. Adaptive cluster sampling-based design for estimating COVID-19 cases with random samples. Curr. Sci. 2021, 120, 1204–1210.
14. Stehlík, M.; Kiseľák, J.; Dinamarca, A.; Alvarado, E.; Plaza, F.; Medina, F.A.; Stehlíková, S.; Marek, J.; Venegas, B.; Gajdoš, A.; et al. REDACS: Regional emergency-driven adaptive cluster sampling for effective COVID-19 management. Stoch. Anal. Appl. 2022, 41, 474–508.
15. Hwang, J.; Bose, N.; Fan, S. AUV adaptive sampling methods: A Review. Appl. Sci. 2019, 9, 3145.
16. Giouroukis, D.; Dadiani, A.; Traub, J.; Zeuch, S.; Markl, V. A survey of adaptive sampling and filtering algorithms for the internet of things. In Proceedings of the 14th ACM International Conference on Distributed and Event-Based Systems, Montreal, QC, Canada, 13–17 July 2020; pp. 27–38.
17. Chao, C.T. Ratio estimation on adaptive cluster sampling. J. Chin. Stat. Assoc. 2004, 42, 307–327.
18. Dryver, A.L.; Chao, C.T. Ratio estimators in adaptive cluster sampling. Environmetrics 2007, 18, 607–620.
19. Chutiman, N.; Kumphon, B. Ratio estimator using two auxiliary variables for adaptive cluster sampling. Thail. Stat. 2008, 6, 241–256.
20. Chutiman, N. Adaptive cluster sampling using auxiliary variable. J. Math. Stat. 2013, 9, 249–255.
21. Yadav, S.K.; Misra, S.; Mishra, S. Efficient estimator for population variance using auxiliary variable. Am. J. Oper. Res. 2016, 6, 9–15.
22. Chaudhry, M.S.; Hanif, M. Generalized exponential-cum-exponential estimator in adaptive cluster sampling. Pak. J. Stat. Oper. Res. 2015, 11, 553–574.
23. Chaudhry, M.S.; Hanif, M. Generalized difference-cum-exponential estimator in adaptive cluster sampling. Pak. J. Stat. 2017, 33, 335–367.
24. Bhat, A.A.; Sharma, M.; Shah, M.; Bhat, M. Generalized ratio type estimator under adaptive cluster sampling. J. Sci. Res. 2023, 67, 46–51.
25. Thompson, S.K. Sampling, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012; pp. 319–337.
26. Banerjie, J.; Tiwari, N. Improved ratio type estimator using jack-knife method of estimation. J. Reliab. Stat. Stud. 2011, 4, 53–63.
27. Quenouille, M.H. Notes on Bias in Estimation. Biometrika 1956, 43, 353–360.
28. Smith, D.R.; Conroy, M.J.; Brakhage, D.H. Efficiency of adaptive cluster sampling for estimating density of wintering waterfowl. Biometrics 1995, 51, 777–788.
29. Pochai, N. Double and Resampling in Adaptive Cluster Sampling. Doctoral Dissertation, National Institute of Development Administration, Bangkok, Thailand, 2006.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).