1. Introduction
Systematic sampling is a widely used probability sampling design that balances operational simplicity with statistical efficiency while ensuring that the sample adequately represents natural populations. As discussed in [1,2,3], systematic sampling is frequently used either on its own or in combination with other sampling methods, and it is among the most extensively applied sampling techniques in practice. According to [4], systematic sampling may provide implicit stratification, which often leads to improved estimates when the sampling frame is ordered. Moreover, Buckland [5] agreed with Finney [1,2] that systematic sampling is efficient and practically convenient, especially for sampling natural populations such as forests. Systematic sampling has been applied in various fields (agriculture, forestry, and meteorology). Its theory was first studied and subsequently developed in [6,7,8] and later reviewed in [5,9], including a critical review of recent developments in systematic sampling.
The use of auxiliary information in systematic sampling helps improve the accuracy and efficiency of mean estimation. It enables researchers to make better use of available knowledge about the population and reduces the potential for bias and uncertainty in the estimation process. By leveraging auxiliary data effectively, systematic sampling can produce reliable and informative estimates of the population mean. Many authors have used systematic sampling with or without auxiliary data. In the presence of auxiliary information, refs. [
10,
11] proposed ratio and product estimators for the estimation of population mean using systematic sampling and discussed the properties as well. Ref. [
12] proposed modified ratio and product-type estimators for population mean, whereas ref. [
13] proposed exponential-type estimators for the estimation of study variables in systematic sampling design. Refs. [
14,
15], respectively, proposed chain ratio- and exponential ratio-type estimators, whereas ref. [
16] proposed a semi-exponential ratio estimator using two auxiliary variables under a systematic sampling design. Recently, refs. [
17,
18] suggested various estimators using two auxiliary variables for mean estimation in systematic sampling.
The exponentially weighted moving average (EWMA) statistic integrates current and previous sample information to generate smoothed estimates in time-scaled surveys, giving more weight to recent data without entirely discarding older observations. Estimators based on the EWMA statistic, also known as memory-type estimators, not only improve efficiency but also produce smoother estimates by weighting recent observations more heavily, which is useful when data are subject to short-term fluctuations. These estimators are less sensitive to sudden changes in the data and to outliers, making them robust in time-scaled surveys where data consistency is essential.
In recent years, ref. [
19] proposed memory-type ratio and product estimators, whereas ref. [
20] proposed a log-type estimator for time-scaled surveys. Ref. [
21] proposed memory-type variance estimators using auxiliary information in the presence and absence of measurement errors under simple random sampling (SRS). Ref. [
22] introduced memory-type estimators with dual auxiliary variables for mean estimation. Ref. [
22] extended memory-type estimators to a stratified design for heterogeneous populations. Ref. [
23] proposed memory-type estimators that incorporate
EWMA and regression imputation techniques to handle non-response. Ref. [
24] developed memory-type estimators for survey accuracy using
EWMA and related extended statistics.
In time-scaled surveys, where the population values may change over time, classical estimators may not capture temporal dynamics effectively, which can lead to suboptimal performance in terms of bias and mean squared error (MSE). To address this limitation, we propose memory-type estimators based on the EWMA statistic, which incorporate past information and improve the accuracy of mean estimation in time-scaled surveys. The rest of the paper is structured as follows. Section 2 presents the methodology, formulas, and basic notations of systematic sampling. Some classical estimators are reviewed in Section 3. The proposed memory-type estimators are presented in Section 4, along with the derivations of their bias and MSE expressions. Mathematical comparisons of the memory-type estimators with conventional estimators are presented in Section 5. To assess the performance of the memory-type estimators, a simulation study is conducted in Section 6, while Section 7 presents an application to real data. Concluding remarks and future directions are given in Section 8.
2. Sampling Methodology, Formulas, and Notations
Systematic sampling is straightforward to implement: once the sampling interval is determined, selecting the sample elements is a mechanical process, which makes the design convenient in practice. Consider a finite population P = (P1, P2, …, Pj, …, PN) of N distinct elements, labelled 1 to N and arranged in a particular order, and let Pj denote its jth element. Suppose that N can be written as the product of two positive integers, n and k, so that N = nk. Let Y denote the study variable and X the auxiliary variable, with population values y = (y1, y2, …, yN) and x = (x1, x2, …, xN), respectively. The values of the study and auxiliary variables for the ith unit (i = 1, 2, …, n) in the jth (j = 1, 2, …, k) systematic sample are denoted by yij and xij, respectively. To select a sample of size n, we first draw a random start j between 1 and k and then take every kth unit thereafter, so that the sample consists of the units labelled j, j + k, j + 2k, …, j + (n − 1)k. As a result, there are k possible samples, each containing n elements.
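The selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming N = nk exactly; the function names `systematic_sample` and `all_systematic_samples` are hypothetical and do not come from the paper.

```python
import random


def systematic_sample(population, n, start=None, rng=None):
    """Return (start, sample) for a linear systematic sample of size n.

    Assumes N = n * k exactly, as in the text. `start` is the 1-based
    random start in 1..k; it is drawn at random when not supplied.
    """
    N = len(population)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N = n * k with integer k")
    k = N // n
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # inclusive on both ends
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Units labelled start, start + k, ..., start + (n - 1)k (1-based labels).
    sample = [population[start - 1 + m * k] for m in range(n)]
    return start, sample


def all_systematic_samples(population, n):
    """Enumerate the k possible systematic samples, one per random start."""
    k = len(population) // n
    return [systematic_sample(population, n, start=j)[1] for j in range(1, k + 1)]


population = list(range(1, 13))                     # labels 1..12, so N = 12
samples = all_systematic_samples(population, n=4)   # k = 3 possible samples
```

With N = 12 and n = 4, the three possible samples partition the population, which is why only the random start needs to be drawn.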
The means and variances of the study and auxiliary variables for systematic (sys) samples may be obtained as
The means and variances of the study and auxiliary variables for the population may be obtained as
Ref. [25] first introduced the EWMA statistic to monitor changes in the process mean. The EWMA statistics for the non-sensitive study variable and the auxiliary variable are, respectively, defined as
$$w_{t,sys} = \lambda \bar{y}_{t,sys} + (1 - \lambda)\, w_{t-1,sys}$$
and
$$u_{t,sys} = \lambda \bar{x}_{t,sys} + (1 - \lambda)\, u_{t-1,sys},$$
where $\bar{y}_{t,sys}$ and $\bar{x}_{t,sys}$ are the means of the study variable and the auxiliary variable of the current sample using systematic sampling. Here, λ is the smoothing constant (0 < λ ≤ 1), which is the weight given to the current observations. The larger the value of λ, the larger the weight given to the current values and the smaller the weight given to the past values; conversely, the smaller the value of λ, the smaller the weight given to the current values and the larger the weight given to the past values. For λ = 1, the mean of the current sample receives all the weight, and the EWMA statistic equals the conventional sample mean estimator. Further, t indexes the sample, and the subscript “t − 1” indicates the previous value of the statistic. The expected mean or the average of the prior sample is taken as its initial value.
A survey is normally conducted only after the results of a pilot survey are judged trustworthy. Usually, the initial values of wt,sys and ut,sys are taken to be the expected means, which can be estimated from the pilot survey. In this study, we set them to zero.
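The EWMA recursion can be illustrated with a short Python sketch. It assumes the standard form in which the new statistic is λ times the current sample mean plus (1 − λ) times the previous statistic, with the initial value set to zero as in this study. The function name `ewma_series` and the numbers are hypothetical.

```python
def ewma_series(sample_means, lam, initial=0.0):
    """Return the EWMA statistic after each successive sample mean.

    Uses stat_t = lam * mean_t + (1 - lam) * stat_{t-1}, starting from
    `initial` (zero by default, matching the choice made in the text).
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    stats = []
    previous = initial
    for mean in sample_means:
        previous = lam * mean + (1.0 - lam) * previous
        stats.append(previous)
    return stats


# Hypothetical systematic-sample means of the study variable at t = 1, 2, 3.
y_means = [10.0, 12.0, 11.0]
w = ewma_series(y_means, lam=0.5)        # smoothed statistics
w_lam1 = ewma_series(y_means, lam=1.0)   # reduces to the sample means
```

Applying the same function to the auxiliary-variable means gives the second EWMA statistic. Note that with a zero initial value the early statistics are pulled toward zero, which matters more for small λ.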
Let
and
represent the sampling errors associated with the study variable and the auxiliary variable, respectively, defined as
Some important notations, formulas, and statistical properties are presented in Equation (1), and are required for the derivation of the bias and
MSE of the proposed estimator.
where ρy and ρx are the intra-class correlations for the study and the auxiliary variables, respectively, which measure the degree of homogeneity within clusters in the context of systematic sampling. The intra-class correlation depends on yij (the ith observation of the jth cluster) and yij* (the ith observation of the j*th cluster, which is different from the jth cluster), and ρyx is the correlation coefficient between the study variable and the auxiliary variable. The correction factor adjusts the sampling variance for a finite population. Moreover, Cy and Cx are the population coefficients of variation for the study and auxiliary variables, respectively.
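To make the role of the intra-class correlation concrete, the sketch below computes it for a small hypothetical population and checks a textbook identity. This uses the Cochran-style definition based on pairs of units within the same systematic sample, which is an assumption here because the paper's own formula is not reproduced above. Under that definition, the variance of the systematic-sample mean equals (S²/n)((N − 1)/N)[1 + (n − 1)ρ]. All function names and data values are illustrative.

```python
def systematic_samples(values, n):
    """Split an ordered population (N = n * k) into its k systematic samples."""
    N = len(values)
    if n < 2 or N % n != 0:
        raise ValueError("this sketch requires n >= 2 and N = n * k")
    k = N // n
    return [[values[j + m * k] for m in range(n)] for j in range(k)]


def population_moments(values):
    """Population mean and variance S^2 (divisor N - 1)."""
    N = len(values)
    mean = sum(values) / N
    s2 = sum((v - mean) ** 2 for v in values) / (N - 1)
    return mean, s2


def intraclass_correlation(values, n):
    """Within-sample intra-class correlation (textbook definition, assumed)."""
    N = len(values)
    mean, s2 = population_moments(values)
    cross = 0.0
    for sample in systematic_samples(values, n):
        dev = [v - mean for v in sample]
        # Sum of products over all ordered pairs of distinct units in a sample.
        cross += sum(dev) ** 2 - sum(d * d for d in dev)
    return cross / ((n - 1) * (N - 1) * s2)


def variance_direct(values, n):
    """Variance of the systematic-sample mean over all k possible samples."""
    mean, _ = population_moments(values)
    samples = systematic_samples(values, n)
    return sum((sum(s) / n - mean) ** 2 for s in samples) / len(samples)


def variance_from_rho(values, n):
    """The same variance via the intra-class correlation identity."""
    N = len(values)
    _, s2 = population_moments(values)
    rho = intraclass_correlation(values, n)
    return (s2 / n) * ((N - 1) / N) * (1 + (n - 1) * rho)


y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0, 5.0, 10.0, 12.0, 11.0]
rho_y = intraclass_correlation(y, n=4)
```

The two variance functions agree up to rounding, which shows why a negative intra-class correlation makes systematic sampling more precise than simple random sampling of the same size.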
3. Classical Estimators Under Systematic Sampling
In this section, we present the classical ratio, product, exponential ratio, and exponential product estimators, along with their bias and MSE expressions under systematic sampling.
The sample mean estimator along with the expression of its variance is given by
and
Refs. [
10,
11], respectively, proposed classical ratio and product estimators under systematic sampling, defined as follows:
and
The expressions of bias and
MSE of classical ratio and product estimators are given by
and
Ref. [
12] proposed classical exponential ratio and exponential product estimators under systematic sampling, defined as follows:
and
The expressions of bias and
MSE of classical exponential ratio and exponential product estimators are given by
and
4. Memory-Type Estimators Based on EWMA Statistic Using Systematic Sampling
In this section, we define the memory-type classical estimators and modified forms of classical estimators based on the EWMA statistic under systematic sampling. The expressions of approximate bias and MSE of the aforementioned estimators are also derived up to the second order using Taylor and exponential expansions.
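As a rough illustration of how the estimators in this section are formed, the sketch below substitutes EWMA statistics for the sample means in the usual textbook forms of the ratio, product, exponential ratio, and exponential product estimators. Those functional forms are assumed here, since the defining equations are not reproduced above. The function name `memory_type_estimates` and its argument names are hypothetical.

```python
import math


def memory_type_estimates(w_t, u_t, x_bar_pop):
    """Memory-type estimates of the population mean of y.

    w_t, u_t  : current EWMA statistics of the study and auxiliary variables
    x_bar_pop : known population mean of the auxiliary variable
    The four auxiliary-based forms below are the usual textbook ones (assumed).
    """
    if u_t == 0 or x_bar_pop == 0 or u_t + x_bar_pop == 0:
        raise ValueError("u_t, x_bar_pop and their sum must be non-zero")
    return {
        "mean": w_t,
        "ratio": w_t * x_bar_pop / u_t,
        "product": w_t * u_t / x_bar_pop,
        "exp_ratio": w_t * math.exp((x_bar_pop - u_t) / (x_bar_pop + u_t)),
        "exp_product": w_t * math.exp((u_t - x_bar_pop) / (u_t + x_bar_pop)),
    }


# Hypothetical values: the auxiliary EWMA statistic sits below its population mean.
est = memory_type_estimates(w_t=20.0, u_t=8.0, x_bar_pop=10.0)

# When the auxiliary EWMA statistic equals the population mean, no adjustment is made.
est_equal = memory_type_estimates(w_t=20.0, u_t=10.0, x_bar_pop=10.0)
```

In this example the ratio-type estimates adjust upward and the product-type estimates adjust downward, which is the direction appropriate for positively and negatively correlated auxiliary variables, respectively.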
The memory-type sample mean estimator based on the
EWMA statistic [
25] under systematic sampling is defined as
The variance of the memory-type mean estimator is given by
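As a point of reference, the standard EWMA variance result is sketched below. It assumes that successive sample means are independent with a common variance and that the initial value is a fixed constant. The symbol $\operatorname{Var}(\bar{y}_{sys})$ for the variance of a single systematic-sample mean is notation assumed here, and the expression is not claimed to match the paper's equation exactly.

```latex
\begin{align*}
w_{t,sys} &= \lambda \sum_{i=0}^{t-1} (1-\lambda)^{i}\, \bar{y}_{t-i,sys} + (1-\lambda)^{t}\, w_{0,sys}, \\
\operatorname{Var}(w_{t,sys}) &= \lambda^{2} \sum_{i=0}^{t-1} (1-\lambda)^{2i}\, \operatorname{Var}(\bar{y}_{sys})
 = \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right] \operatorname{Var}(\bar{y}_{sys}).
\end{align*}
```

For large t the bracketed term tends to 1, so the variance is reduced by the factor λ/(2 − λ) relative to a single sample mean. For λ = 1 the factor equals 1, and the result reduces to the variance of the conventional sample mean.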
The memory-type ratio estimator, based on the classical ratio estimator of [
10] under systematic sampling, replaces the sample means of
y and
x with their
EWMA statistics, and is defined as
To derive the expression of bias, we re-write Equation (17) in terms of sampling error using the notations given in Equation (1) as
Applying the Taylor series up to the second order in Equation (18), we have
Simplifying Equation (19) and retaining the terms up to the second order, we get
Applying expectation to both sides of Equation (20), we find that the expression for the bias of the memory-type ratio estimator based on the
EWMA statistic is given by
To derive the expression of the
MSE, we ignore terms beyond the first order, square both sides of Equation (19), and take expectations to get
The final expression for the
MSE of the memory-type ratio estimator based on
EWMA statistics is given by
The memory-type product estimator, based on the classical product estimator of [
11] under systematic sampling, replaces the sample means of
y and
x with their
EWMA statistics, and is defined as
To derive the expression of bias of the memory-type product estimator, we re-write Equation (24) in terms of sampling error using the notations given in Equation (1) as
On simplification of Equation (25), we have
Applying expectation to both sides of Equation (26), we find that the final expression for the bias of the memory-type product estimator based on
EWMA statistics is given by
To derive the expression of the
MSE, we ignore terms beyond the first order, square both sides of Equation (26), and take expectations to get
The final expression of
MSE of the memory-type product estimator based on
EWMA statistics is given by
The memory-type exponential ratio estimator is an adaptation of the traditional exponential ratio estimator proposed by [
12] under systematic sampling with sample means replaced by the
EWMA statistics of
y and
x, and is defined as
To derive the expression of bias of the memory-type exponential ratio estimator, we re-write Equation (30) in terms of sampling error using the notations given in Equation (1) as
Simplifying, and applying Taylor and exponential series up to the second order in Equation (31), we have
On simplification of Equation (32), we have
Applying expectation to both sides of Equation (33), we find that the final expression for the bias of the memory-type exponential ratio estimator based on the
EWMA statistic is given by
To derive the expression of
MSE of the memory-type exponential ratio estimator, simplifying Equation (32) and ignoring the terms beyond the first order, we get
Simplifying, squaring, and taking expectation on both sides of Equation (35), we obtain
The final expression for the
MSE of the memory-type exponential ratio estimator based on
EWMA statistics is given by
The memory-type exponential product estimator is an adaptation of the traditional exponential product estimator proposed by [
12] under systematic sampling, where
EWMA statistics are used in place of the sample means of
y and
x, and it is defined as
To derive the expression of bias of the memory-type exponential product estimator, we re-write Equation (38) in terms of sampling error using the notations given in Equation (1) as
Simplifying and applying Taylor and exponential series up to second order on Equation (39), we have
On simplification of Equation (40), we have
Applying expectation to both sides of Equation (41), we find that the final expression for the bias of the memory-type exponential product estimator based on
EWMA statistics is given by
In order to derive the expression of
MSE of the memory-type exponential product estimator, simplifying Equation (40) up to the first order, and expanding the exponential series, we have
Simplifying, squaring, and taking expectation on both sides of Equation (43), we obtain
The final expression for the
MSE of the memory-type exponential product estimator based on
EWMA statistics is given by
6. Simulation Study
We conducted an extensive simulation study to compare the performance of the proposed estimators with their conventional counterparts. A bivariate normal population is generated using [26] with the parameters below.
The intra-class correlations for the simulated populations at various levels of correlation are summarized in
Table 1.
The rmvnorm function from the mvtnorm library is used to generate the populations of x and y. The relative efficiency (RE) of the considered estimators is computed as follows:
and
where
The MSEs and REs are computed at different levels of correlation and for different values of the smoothing constant using the following simulation algorithm:
Step-1: Bivariate populations of size N (=5000) are generated with different positive/negative levels of correlation (=0.75, 0.80, 0.85, 0.90, and 0.95), and the true mean of the auxiliary variable is computed. Some robust parameters associated with the auxiliary variable are also computed.
Step-2: Different values of the constant λ (=0.1, 0.2, 0.3, 0.5, and 0.75) are selected to assign the weights to the current and previous sample means.
Step-3: Samples of size n (=50, 100, 200, 300, and 500) are selected using systematic sampling from the populations simulated in Step-1.
Step-4: Step-3 is repeated 50,000 times and the estimators considered in this study are computed.
Step-5: The MSE and RE of each estimator are computed for different sample sizes, different levels of correlation, and different weight constants, and are summarized in Tables 2–9.
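The simulation steps above can be sketched in plain Python as follows. This is a scaled-down, hypothetical version and not the authors' code. The population means and standard deviations, the number of time periods, the zero initial EWMA values, and the definition of RE as the ratio of the conventional MSE to the memory-type MSE are all assumptions made for illustration. Only the ratio estimator is shown.

```python
import math
import random


def bivariate_normal_population(N, rho, rng, mu=(50.0, 50.0), sd=(5.0, 5.0)):
    """Generate N (x, y) pairs with correlation rho (mu and sd are assumed)."""
    xs, ys = [], []
    for _ in range(N):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu[0] + sd[0] * z1)
        ys.append(mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return xs, ys


def systematic_means(xs, ys, n, rng):
    """Draw one systematic sample and return its (x mean, y mean)."""
    k = len(xs) // n
    start = rng.randrange(k)                  # 0-based random start
    idx = range(start, start + n * k, k)      # n equally spaced units
    return sum(xs[i] for i in idx) / n, sum(ys[i] for i in idx) / n


def simulate(N=1000, n=50, rho=0.9, lam=0.2, periods=10, reps=500, seed=1):
    """Return (MSE conventional ratio, MSE memory-type ratio, RE)."""
    rng = random.Random(seed)
    xs, ys = bivariate_normal_population(N, rho, rng)
    X_bar, Y_bar = sum(xs) / N, sum(ys) / N
    se_conv = se_mem = 0.0
    for _ in range(reps):
        w = u = 0.0                           # EWMA statistics start at zero
        for _ in range(periods):
            x_bar, y_bar = systematic_means(xs, ys, n, rng)
            w = lam * y_bar + (1.0 - lam) * w
            u = lam * x_bar + (1.0 - lam) * u
        # Conventional ratio estimator uses only the latest sample.
        se_conv += (y_bar * X_bar / x_bar - Y_bar) ** 2
        # Memory-type ratio estimator uses the EWMA statistics.
        se_mem += (w * X_bar / u - Y_bar) ** 2
    mse_conv, mse_mem = se_conv / reps, se_mem / reps
    return mse_conv, mse_mem, mse_conv / mse_mem


mse_conv, mse_mem, rel_eff = simulate()
```

The defaults are far smaller than the settings reported in the paper (N = 5000 and 50,000 repetitions) so that the sketch runs quickly. Looping `simulate` over the listed values of n, λ, and correlation would mirror the layout of the reported tables.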
Tables 2, 4, 6 and 8, together with Tables 3, 5, 7 and 9, report the MSE and RE of the conventional and memory-type estimators based on EWMA statistics for different combinations of sample size, correlation, and smoothing constant λ. The results indicate that the MSE of the memory-type estimators decreases as the sample size increases for fixed values of λ and correlation. Regardless of the degree of positive correlation and the weight constant, the memory-type ratio and exponential ratio estimators outperform the conventional ratio and exponential ratio estimators. For varying degrees of negative correlation and weight constants, the memory-type product and exponential product estimators perform better than the conventional product and exponential product estimators. The RE values further confirm that the proposed memory-type estimators are generally more efficient than the conventional estimators across the various values of n, λ, and correlation. The efficiency gain is more pronounced when the correlation between the variables is strong. In addition, the results suggest that smaller values of λ, which assign higher weight to past observations in the EWMA statistics, tend to improve estimator performance in time-scaled survey settings. However, when λ = 1, the EWMA statistic reduces to the current sample mean, and therefore the proposed estimators perform similarly to the conventional estimators.
Overall, the simulation results demonstrate that the proposed memory-type estimators, which employ the EWMA statistic, are effective and useful for estimating the population means using systematic sampling, particularly for time-scaled surveys where integrating past information can improve estimation accuracy.
7. Real Data Application
We considered a real dataset from [27] to assess the performance of the proposed memory-type estimators based on the EWMA statistic relative to the conventional estimators in systematic sampling. Density is taken as the study variable Y, while stiffness is taken as the auxiliary variable X, with the parameters given as
We considered only the ratio- and exponential ratio-type estimators because the correlation coefficient between the study variable and the auxiliary variable is positive. We considered 25 samples taken at regular time intervals, each of size 5, selected using systematic sampling. The chosen samples, together with the MSEs and REs of the conventional ratio and exponential ratio estimators and of the memory-type ratio and exponential ratio estimators, are presented in Table 10 and Table 11.
The results presented in Table 10 and Table 11 show that the proposed memory-type estimators achieve lower MSE and higher RE than the conventional estimators for systematic sampling. The estimators without auxiliary information have higher variability than the estimators that use auxiliary information. Moreover, the memory-type ratio and exponential ratio estimators exhibit lower variance than the conventional ratio and exponential ratio estimators, except in a few cases. Based on the 25 samples, the average MSE values for the conventional ratio, exponential ratio, memory-type ratio, and memory-type exponential ratio estimators are 384.54, 397.98, 27.54, and 32.65, respectively. These results show that the memory-type estimators performed much better than the conventional estimators. In particular, the proposed memory-type ratio estimator provides the best performance among all the estimators for estimating the population mean using systematic sampling for time-scaled surveys.
8. Conclusions
In this study, we have emphasized the significance of mean estimation in survey sampling, as it serves as a crucial indicator for understanding population variability. To address this, we have suggested memory-type estimators based on the EWMA statistic for time-scaled surveys using systematic sampling. By deriving the expressions of the bias and mean squared error of these estimators, we explored their performance under different scenarios. We identified mathematical conditions under which the proposed estimators outperform conventional estimators. The simulation study and the real data application supported the superiority of the proposed memory-type estimators over conventional estimators at different levels of correlation and weight constants. Our findings underscore the importance of efficient estimators in drawing accurate and reliable conclusions from diverse data populations.
In this study, we considered only one auxiliary variable for mean estimation using memory-type estimators. In the future, more than one auxiliary variable can be used in multi-phase sampling. Additionally, the proposed methodology can be extended to other sampling designs, such as ranked set sampling, and adapted to handle non-response, which is a common challenge in time-scaled surveys.