1. Introduction
In the survey sampling literature, it is well established that the efficiency of estimators of population parameters for the characters of interest can be improved by incorporating information on a correlated auxiliary attribute x [1,2,3]. This auxiliary information can be used at the planning or design stage to obtain better estimators than those that ignore auxiliary data. Known population parameters of the auxiliary attribute are commonly exploited through ratio, product and regression estimation methods, applied across probability sampling designs such as simple random sampling, cluster sampling, systematic sampling, stratified sampling and two-phase sampling. The present study focuses on using knowledge of auxiliary attributes within the framework of systematic sampling.
Systematic sampling is a sampling design in which the first unit is selected at random and the remaining units are chosen automatically according to a predetermined pattern. It is one of the most widely used sampling techniques because it is simple to implement. Compared to simple random sampling, systematic sampling is easier to execute, especially in the field [4]. Additionally, systematic sampling can yield estimators with higher precision than simple random sampling when the sampling frame exhibits explicit or implicit stratification [4]. This is because systematic sampling effectively divides the population into n strata of k units each and selects one unit from each stratum, which resembles stratified random sampling with one unit per stratum. Systematic sampling is also efficient for sampling natural populations, for example when estimating timber volume in forest areas [5]. Many research organizations, including the Food and Agriculture Organization (FAO) of the United Nations, use systematic sampling in their surveys, such as the Survey of Global Forest Resources Assessment in 2010 [1].
Systematic sampling gives each unit in the population an equal chance of selection. In the systematic sampling approach, which originated in the work of [6], a sample of size n is selected from a finite population of size N. The first unit is selected at random from the first k units, where k = N/n. After this initial random selection, every kth unit of the population is included in the sample. However, if the population size cannot be expressed as the product of n and k, the sample size is not fixed. The sample mean, as an estimator of the population mean, then becomes biased [7], and this led to the development of modified systematic sampling procedures such as diagonal systematic sampling (DSS). Additionally, the properties of estimators (such as the sample mean) derived from systematic samples depend on the order of the units in the sampling frame. Under certain arrangements, such as linear, parabolic or periodic trends in the population, the systematic sample may be less efficient than other sampling methods. If the sampling interval k equals the period of a periodic trend, the efficiency of the systematic sample is approximately that of a single random observation from the population. However, if the sampling interval k is an odd multiple of the half-period, the systematic sample mean equals the true population mean [4].
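The linear systematic selection described above can be sketched in a few lines of Python. This is only an illustration under the assumption that N is an exact multiple of n; the function name linear_systematic_sample and the 0-based indexing are choices made here, not notation from the cited works.

```python
import random


def linear_systematic_sample(N, n, rng=None):
    """Return 0-based indices of a linear systematic sample of size n.

    Assumption for this sketch: N = n * k exactly, so the sample size is fixed.
    """
    if N <= 0 or n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N to be a positive multiple of n")
    k = N // n                    # sampling interval
    rng = rng or random.Random()
    start = rng.randrange(k)      # random start among the first k units
    # Take the start unit and every k-th unit after it.
    return [start + j * k for j in range(n)]


sample = linear_systematic_sample(N=20, n=5, rng=random.Random(1))
```

Because the start is the only random element, there are exactly k possible samples, which is why the ordering of the frame matters so much for the precision of the resulting estimator.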
The DSS scheme was first introduced by [8] and generalized by [9], followed by [10], who proposed a modified version of systematic sampling called diagonal circular systematic sampling. Ref. [11] examined the conditions under which the sampling scheme of [10] is applicable. Additionally, [11] presented a generalized type of systematic sampling and demonstrated that DSS is a special case of the generalized scheme. Beyond these specific developments, many researchers have investigated various aspects of systematic sampling over the years. Notable works include [4,12,13,14,15,16,17,18,19,20,21,22,23].
The literature on survey sampling has explored various techniques to improve the efficiency of estimators for population proportions and other parameters. These include using unknown weights, power/exponential/logarithmic transformations, linear combinations of estimators and robust measures. One particularly promising technique is calibration estimation. Calibration aims to enhance the accuracy of estimators by modifying the original design weights using auxiliary information. This is done by minimizing a distance measure subject to a set of calibration constraints. Significant research on estimation using calibration has been conducted by several researchers, including [24,25,26,27,28,29,30] and others.
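As a generic illustration of the calibration idea (not the specific schemes proposed in this paper), the sketch below uses the familiar chi-square distance with a single constraint. It assumes design weights d, auxiliary values x, a known population total total_x and optional tuning constants q, all of which are names introduced here for illustration. Minimizing sum((w - d)^2 / (d q)) subject to sum(w x) = total_x gives the closed form w = d (1 + lambda q x), with lambda = (total_x - sum(d x)) / sum(d q x^2).

```python
def calibrate_weights(d, x, total_x, q=None):
    """Chi-square distance calibration with one auxiliary constraint.

    d: design weights, x: auxiliary values in the sample,
    total_x: known population total of x, q: optional tuning constants.
    All names are illustrative assumptions for this sketch.
    """
    if q is None:
        q = [1.0] * len(d)
    # Design-weighted estimate of the auxiliary total before calibration.
    design_total = sum(di * xi for di, xi in zip(d, x))
    denom = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    if denom == 0:
        raise ValueError("auxiliary variable carries no information in this sample")
    lam = (total_x - design_total) / denom   # Lagrange multiplier
    return [di * (1.0 + lam * qi * xi) for di, qi, xi in zip(d, q, x)]


d = [5.0, 5.0, 5.0, 5.0]   # design weights (e.g. inverse inclusion probabilities)
x = [1, 0, 1, 1]           # binary auxiliary attribute in the sample
w = calibrate_weights(d, x, total_x=12.0)
calibrated_total = sum(wi * xi for wi, xi in zip(w, x))
```

After calibration the weighted sample total of x reproduces the known population total, and units with x = 0 keep their design weight because they play no part in the constraint.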
Recently, ref. [31] presented a new estimator based on DSS and compared its efficiency with that of the conventional estimators based on simple random sampling without replacement (SRSWOR) and linear systematic sampling. However, the estimators proposed there do not account for the presence of outliers. Ref. [4] defines outliers as observations that differ substantially from the rest of the data. Such observations can have a disproportionate influence on sample statistics, such as the mean, variance and proportion, and can lead to biased or inefficient estimates. Ref. [4] provides strategies for dealing with outliers in sampling, including trimming and robust estimation (median-based estimators, M-estimators and the calibration approach). Therefore, the current study focuses on modifying the estimators of population proportion of [31] in the presence of outliers using the calibration approach.
This study presents the calibration approach for incorporating auxiliary attributes into estimators of population proportion under diagonal systematic sampling, in order to obtain new estimators that are robust against outliers or extreme values, stable and efficient, and that estimate the population proportion with higher precision. The calibration approach is not limited to sampling theory; it also has applications in other fields, e.g., system reliability. For example, ref. [32] discussed the concepts of calibration and validation in the context of reliability engineering, providing insights into how these processes can enhance the reliability of systems through proper statistical methods. Ref. [33] discussed methodologies for assessing the reliability of measurement systems, including calibration techniques that ensure measurement accuracy over time. Ref. [34] outlined best practices for developing calibration procedures that enhance the reliability of measurement systems, emphasizing the role of regular calibration in maintaining system performance.
The paper is organized as follows:
Section 1, titled Introduction, discusses the background, the problem to be solved and the significance of the study, as well as the novelty of the proposed methods.
Section 2 discusses the procedure for diagonal systematic sampling (DSS) and the approach for defining the conventional estimator of population proportion.
Section 3 presents the proposed calibrated estimators for estimating population proportion when sampling is conducted using DSS, along with the members of the estimators in different situations.
Section 4 compares the performance of the proposed estimators against the conventional estimator numerically through simulation studies. Finally, Section 5 provides the conclusion and offers some recommendations based on the findings.
2. Diagonal Systematic Sampling Procedure and Estimator of Population Proportion
Consider a finite population of N elements from which a sample of size n is drawn without replacement by selecting one unit at random from the first k units and every kth unit thereafter.
Let
be pairs of study and auxiliary characters of population units belonging to one of two disjoint classes
H and
where
H is the class of units possessing the attribute of interest. That is, let
where
and
are the study and auxiliary attributes, respectively.
Then, the systematic sample proportions are defined as and are unbiased estimators of the population proportions and for and , respectively.
The usual sample proportion
estimator of population proportion
is given in (3) as
The variance of
denoted by
is given as in (4).
Ref. [
8] presented the procedure for DSS as shown below.
Classify units into rows and columns.
Draw a random number .
Then, the sample units are selected in the pattern defined in (5).
Under DSS, the first-order inclusion probability denoted by
and the second-order inclusion probability denoted by
are given by Equations (6) and (7) respectively.
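The selection pattern can be illustrated with a short Python sketch. It rests on assumptions that are not spelled out in the text above: the N = nk units are taken to be arranged row by row in a grid of n rows and k columns with n ≤ k, the random start r is drawn from 1 to k, and the sample moves one column to the right in each successive row, wrapping to the first column when it passes the last. The function names dss_sample_indices and dss_proportion are hypothetical.

```python
def dss_sample_indices(n, k, r):
    """0-based population indices of the diagonal systematic sample.

    Assumed layout: units filled row by row into an n x k grid, n <= k,
    random start r in 1..k, diagonal wraps around after column k.
    """
    if not 1 <= n <= k:
        raise ValueError("this sketch assumes 1 <= n <= k")
    if not 1 <= r <= k:
        raise ValueError("random start r must lie in 1..k")
    # In row `row`, take column (r - 1 + row) modulo k.
    return [row * k + (r - 1 + row) % k for row in range(n)]


def dss_proportion(y, n, k, r):
    """Sample proportion of a 0/1 attribute y for start r."""
    idx = dss_sample_indices(n, k, r)
    return sum(y[i] for i in idx) / n


n, k = 3, 5
all_samples = [dss_sample_indices(n, k, r) for r in range(1, k + 1)]

# Illustrative 0/1 attribute for the 15 population units.
y = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1]
proportions = [dss_proportion(y, n, k, r) for r in range(1, k + 1)]
```

Under this assumed layout the k possible samples partition the population, so each unit appears in exactly one sample, and the average of the k sample proportions equals the population proportion.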
Ref. [31] suggested a sample proportion based on the DSS scheme, denoted by
as given in (8).
where
.
The variance of
denoted by
is given by (9).
Following the Sen–Yates–Grundy approach suggested by Sen (1953), Yates and Grundy (1953) [
23], the
and estimate of
denoted by
are given in (10) and (11), respectively.
4. Empirical Study
In this section, simulation studies were conducted to evaluate the performance of the proposed estimators,
, and
, in comparison to the [
31] estimator. The simulation data consisted of 500 units generated using the binomial distribution, with success probabilities of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. The distributions of the simulated data are presented in
Figure 1. The data in populations I–IV are skewed to the left, while those in populations VI–IX are skewed to the right, and population V is uniformly distributed.
A sample of size 100 was selected using the diagonal systematic sampling method, and this process was repeated 500 times. The biases, mean squared errors (MSEs) and percentage relative efficiencies (PREs) of the considered estimators were computed using the formulas provided in Equations (37)–(39),
where
is any estimator,
is the sample proportion.
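Since Equations (37)–(39) are not reproduced here, the following Python sketch shows the usual Monte Carlo definitions of these three measures as an assumption: bias as the average deviation from the true proportion, MSE as the average squared deviation, and PRE as 100 times the ratio of the reference estimator's MSE to that of the estimator under study. The function names and the small input lists are purely illustrative and are not results from this study.

```python
def bias(estimates, true_value):
    """Average of the replicate estimates minus the true value."""
    return sum(estimates) / len(estimates) - true_value


def mse(estimates, true_value):
    """Average squared deviation of the replicate estimates."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency against a reference estimator."""
    return 100.0 * mse_reference / mse_estimator


# Illustrative replicate estimates only; not data from the paper.
true_p = 0.5
reference_estimates = [0.40, 0.60, 0.50, 0.70]
candidate_estimates = [0.45, 0.55, 0.50, 0.60]

reference_bias = bias(reference_estimates, true_p)
reference_mse = mse(reference_estimates, true_p)
candidate_mse = mse(candidate_estimates, true_p)
candidate_pre = pre(reference_mse, candidate_mse)
```

With this convention a PRE above 100 means the candidate estimator has a smaller MSE than the reference estimator.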
Tables 1–9 present numerical results comparing the biases, mean squared errors (MSEs) and percentage relative efficiencies (PREs) of the existing estimators and the new estimators proposed in the study. The findings indicate that, when the probability of success is 0.5, all the proposed estimators exhibit lower MSEs and higher PREs than the existing estimators considered in the investigation.
Furthermore, this pattern holds not only for p = 0.5 but also for the other success probabilities examined (0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8 and 0.9), which include extreme probabilities. The results indicate that the proposed estimators
,
and
exhibited lower MSEs with a significant percentage gain in efficiency compared to that of the estimator by [
31] in all the cases considered for empirical studies.
- i. Tables 1–9 present numerical results comparing the biases, mean squared errors (MSEs) and percentage relative efficiencies (PREs) of the existing estimator and the new estimators proposed in the study for success probabilities of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9, respectively.
- ii. The biases of the proposed estimators are lower than those of the estimator proposed by [31], with the exception of a few cases, indicating the robustness of the proposed estimators over the conventional one.
- iii. The MSEs of the proposed estimators are lower than those of the estimator proposed by [31] in all cases, indicating that the proposed estimators are more efficient and precise than the conventional one.
- iv. The PREs of the proposed estimators are higher than those of the estimator proposed by [31] in all cases, indicating efficiency gains of the proposed estimators over the conventional one.
5. Conclusions
This study introduced calibrated estimators for estimating the population proportion of a characteristic of interest under the diagonal systematic sampling scheme. Two novel calibration schemes were proposed, and the corresponding estimators were derived. Empirical studies were conducted through simulation to evaluate the biases, mean squared errors (MSEs) and percentage relative efficiencies (PREs) of the existing and the suggested estimators. The simulations considered success probabilities (p) ranging from 0.1 to 0.9 in increments of 0.1. The findings indicate that all the proposed estimators,
and
, exhibit lower MSEs and higher PREs compared to the Azeem (2021) [
31] estimator considered in this investigation. In conclusion, the proposed estimators under the calibration technique prove to be more efficient and precise than the existing estimator.
The proposed calibrated estimators, which use information on an auxiliary character, outperformed the existing conventional estimator in terms of bias, efficiency, robustness and stability, as well as efficiency gain.
The proposed calibrated estimators of population proportion based on DSS with an auxiliary attribute can be applied in many areas. In market research, they can be used to determine consumer preferences with demographic data (e.g., age, income) as auxiliary attributes. In public health, they can be used to assess vaccination rates or health behaviors by incorporating socioeconomic factors or geographic data. In the social sciences, they can be used to analyze public opinion on policies, with demographic characteristics refining the estimates. In environmental studies, they can be used to estimate the proportion of polluted sites from historical industrial activity data. In census data collection, they can be used to evaluate household access to services such as the internet using urban–rural classifications. In education, they can be used to estimate student performance or enrollment rates based on socioeconomic status and prior educational outcomes. In transportation, they can be used to assess traffic patterns or public transport usage using data on population density or urban planning.