Article

Prediction Inferences for Finite Population Totals Using Longitudinal Survey Data

by Asokan M. Variyath * and Brajendra C. Sutradhar *

Department of Mathematics and Statistics, Memorial University, St. John’s, NL A1C 5S7, Canada
* Authors to whom correspondence should be addressed.
Stats 2025, 8(4), 110; https://doi.org/10.3390/stats8040110
Submission received: 23 September 2025 / Revised: 8 November 2025 / Accepted: 10 November 2025 / Published: 18 November 2025

Abstract

In an infinite-/super-population (SP) setup, regression analysis of longitudinal data, which involves repeated responses and covariates collected from a sample of independent individuals or correlated individuals belonging to a cluster such as a household/family, has been intensively studied in the statistics literature over the last three decades. In general, a longitudinal correlation structure, such as an auto-correlation structure for repeated responses from an individual or a two-way cluster–longitudinal correlation structure for repeated responses from the individuals belonging to a cluster/household, is exploited to obtain consistent and efficient regression estimates. However, as opposed to the SP setup, a similar regression analysis for finite population (FP)-based longitudinal or clustered longitudinal data, using a survey sample (SS) taken from the FP based on a suitable sampling design, becomes complex: it requires first defining the FP regression and correlation (both longitudinal and/or clustered) parameters and then estimating them using appropriate sampling weighted-design unbiased (SWDU) estimating equations. Finite sampling inferences, such as predictions of longitudinal changes in FP totals, become much more complex, because it is necessary to predict the non-sampled totals after accommodating the longitudinal and/or clustered longitudinal correlation structures. Our objective in this paper is to deal with this complex FP prediction inference by developing a design cum model (DCM)-based estimation approach. Two competitive FP total predictors, namely design-assisted model-based (DAMB) and design cum model-based (DCMB) predictors, are compared using an intensive simulation study. The regression and correlation parameters involved in these prediction functions are optimally estimated using the proposed DCM-based approach.

1. Introduction

Cluster/household-based longitudinal survey data analysis for finite population (FP) inferences is an important research topic. For example, to help develop public policy, to understand the determinants of health, and to understand the relationship between health status and health care use, Statistics Canada conducted the National Population Health Survey (NPHS) to gather information on the health of Canadians. The survey began in 1994, collecting biennial information from selected households/clusters under a state/province/stratum until 2012. The responses can be linear, binary, or multinomial. In this paper, however, we concentrate on the analysis of linear cluster longitudinal data, such as the repeated body mass index (bmi) measures (ranging in general from 18 to 40 kg/m$^2$) collected under the NPHS study from all members of the selected households over a period of time. Notice that the health status of an individual, measured by bmi at a given time, is likely to depend on (1) the health status at previous times, (2) a household/cluster random effect, and (3) certain time-dependent covariates such as gender, age group, education level, and lifestyle factors like smoking, diet, and physical activity. One may refer to this type of data as (a) a single-stage cluster-based longitudinal survey (SSCLS) sample. This is because the data set consists of repeated bmi responses collected from all individuals belonging to a sample of households/clusters, where the sample was chosen in a single stage from the specified FP ($F$) containing a large number of households/clusters. In a specialized case, when the household is considered to be the sampling unit, i.e., repeated responses are collected from the household leader only, one does not need to consider any household/cluster correlation. In such cases, one may refer to the data set as (b) a single-stage individual-based longitudinal survey (SSILS) sample.
In this paper, we study finite sampling inferences using both SSCLS and SSILS samples.
We remark that, except for some general discussion and exploratory analysis [1,2,3,4], there does not appear to be adequate discussion of $F$ inferences using the aforementioned SSILS and SSCLS samples. More specifically, to use these SSILS and SSCLS samples for $F$ inferences in a model-based approach, one needs to consider an appropriate super-population ($S$) longitudinal correlation model for the $F$-based hypothetical data. In this vein, for linear longitudinal survey data analysis, some studies, such as [5] (Section 2, eqn. 2), assumed that the repeated responses from an individual in the $F$ follow a random effects-based $S$ linear mixed model, where an individual’s common random effect causes an equi-correlation structure among the repeated responses from the individual. This cluster correlation-oriented model, however, fails to accommodate the time-lag-dependent decaying correlations [6] (chps. 2–3) that appear to be more appropriate than an equi-correlation structure for longitudinal responses. Similarly, some studies, such as [7] (Section 7.4; see also the references therein), following [8], have summarized a ‘working’ correlation model and the so-called GEE (generalized estimating equation) estimation approach to fit longitudinal survey sampled data; this approach appears to be appropriate for non-longitudinal clustered data and/or classical multivariate data. More specifically, [7] (Section 7.4) suggested computing the correlations of the longitudinal survey data using the standard Pearson correlation formula, which appears to be a naive approach, as these correlations fail to exhibit the auto-correlation pattern, i.e., correlations that decay as the time lag increases, appropriate for longitudinal data. A similar working correlations approach using ‘working’ odds ratio parameters for longitudinal binary survey data was used by [9] (ch. 20), which provides inconsistent regression estimates [10] (ch. 4, Section 4.2).
As a remedy, following [6] (Section 2.2) (see also [11] (Section 3)), we use a general stationary auto-correlation structure-based $S$ correlation model to fit the $F$-based hypothetical data involving repeated data from independent individuals. Similarly, we use a familial longitudinal correlation structure-based [6] (Section 3.1) $S$ correlation model to fit the $F$-based hypothetical data involving repeated data from all members of a cluster/family exhibiting two-way clustered longitudinal correlations, the clusters being independent of each other. These individual-based longitudinal (IL) and cluster-based longitudinal (CL) correlation models for the $F$, along with the estimation of the model parameters using the SSILS and SSCLS samples, are provided in Section 3 and Section 4, respectively. The construction of the SSILS and SSCLS samples from their respective $F$ is given in Section 2.
Prediction functions for $F$ totals up to a given time are then constructed by replacing the non-sampled response totals with their model-based as well as design cum model (DCM)-based expectations. These expectations involve $S$ regression parameters, which are not easy to estimate consistently using the survey sample. More specifically, as the responses in a survey sample are subject to randomness due to both sample selection (as covariates change from sample to sample) and $S$ model errors, unlike what is assumed in some of the existing studies [12,13,14,15], it is not possible to obtain any valid MUOLS/MUGLS (model unbiased ordinary/generalized least squares) estimators for the regression parameters [16]. In Section 3 and Section 4, we thus develop suitable design cum model unbiased (DCMU) estimators for the regression parameters involved in the $S$-based expectations using the SSILS and SSCLS samples, respectively. Next, in the same Section 3 and Section 4, we use these DCMU estimators to form the DCMU predictions for the $F$ total at a given time, using the SSILS and SSCLS samples, respectively. We also include an alternative theoretical result for predictions using the model unbiased (MU) function, but replacing the parameters in the expectations with DCMU estimators. We refer to this prediction approach as the design-assisted model-based (DAMB) approach. A simulation study in Section 5 shows that, as expected, the DCMU predictors perform better than the DAMB predictors. The paper concludes with a discussion and some concluding remarks in Section 6.

2. Materials: Individual or Cluster-Based Longitudinal Survey Data

In practice, there are many situations where longitudinal data are recorded at the $F$ level, and it may be of interest to know the longitudinal pattern of a response variable over a small period of time by using a survey sample ($s^*$) consisting of longitudinal responses chosen from the targeted $F$. However, the nature of the sample depends on the form of the underlying $F$. For example, (a) suppose that an electric power company is interested in analyzing the household power consumption pattern over the last $T = 4$ years for a city with $N$ (e.g., 10,000) households, where the sampling units are individual households. For this purpose, a single-stage individual household-based longitudinal sample (SSILS), say of size $n = 500$, may be taken from the $F$, and the responses, along with covariates, may be used to understand the longitudinal pattern. In other studies, for example in health-related cases, (b) a health organization such as Health Canada, as pointed out in the last section, may be interested in the longitudinal pattern of the health condition of all members of the households in a state/province. Suppose that there are $K$ households in the state/province and the $c$-th cluster/household has $N_c$ family members. Notice that these members are correlated and that the underlying correlation structure, unlike in case (a), has to be accommodated for any pattern-change analysis. In this example, one would take an SSCLS (single-stage cluster/household-based longitudinal sample) of $k$ households and use the sample to understand the longitudinal pattern of the $F$.
For a better understanding of the differences between the two aforementioned samples, SSILS and SSCLS, we present them in notational detail, including their respective $F$, as follows. We remark that these samples will be exploited in Section 3 and Section 4, respectively, for a model-based prediction of their $F$ totals at a given marginal time. As far as the sample selection from the $F$ is concerned, we use the well-known equal-probability-based SRSWOR (simple random sampling without replacement) design in both cases.

2.1. SSILS Sample ($s_1^*$) from Individual-Based $F_1$

Individual-based longitudinal finite population $F_1$. Let
$$F_1 : \{(y_i : T \times 1,\; x_i : T \times p),\ i = 1, \ldots, N\} \tag{1}$$
be a longitudinal $F$ with $y_i = (y_{i1}, \ldots, y_{it}, \ldots, y_{iT})'$ as the vector of $T$ repeated responses for the $i$-th individual, and let $x_i = (x_{i1}, \ldots, x_{it}, \ldots, x_{iT})'$ denote the $T \times p$ covariate matrix, with $x_{it}$ the $p$-dimensional covariate vector recorded at time $t$ for the $i$-th individual in the FP. In reality, this $F$ is unknown, and hence its data are hypothetical, unless a sample is taken to observe a part of the FP.
Survey sample $s_1^*$ from $F_1$ using the SRSWOR design. For the prediction of the $F_1$ total at a given time $t$, namely
$$\tau_{y,t} = \sum_{i=1}^{N} y_{it} \quad \text{for } t = 1, \ldots, T, \tag{2}$$
a sample $s_1^*$ may be chosen from the $F_1$ in (1), as follows:
$$s_1^* \equiv \{(y_i : T \times 1,\; x_i : T \times p);\ i = 1, \ldots, n\} \subset F_1 \tag{3}$$
according to a suitable design, say $D_{s_1^*}$, as
$$D_{s_1^*} : \ \Pr(s_1^* \subset F_1) = 1 \Big/ \binom{N}{n}; \quad \pi_i = \Pr(i \in s_1^*) = \Pr[\delta_{i s_1^*} = 1] = \frac{n}{N}, \tag{4}$$
with $\delta_{i s_1^*}$ being a sample inclusion indicator variable.
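As a hedged illustration of the design in (4), the following Python sketch draws one SRSWOR sample and records the inclusion indicators, the inclusion probability n/N, and the sampling weight N/n. The sizes N = 10,000 and n = 500 are those of the power-consumption example above; the function name `srswor_sample` and the seed are illustrative assumptions and do not come from the paper.

```python
import math
import random
from fractions import Fraction

def srswor_sample(N, n, seed=None):
    """Draw n distinct unit labels from {1, ..., N} with equal probability."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

N, n = 10_000, 500
s1 = srswor_sample(N, n, seed=1)

# Inclusion indicators delta_i (index 0 is unused so labels start at 1)
delta = [0] * (N + 1)
for i in s1:
    delta[i] = 1

pi_i = n / N    # first-order inclusion probability, identical for every unit
w_i = N / n     # sampling weight, the inverse of pi_i

# Probability of drawing any one particular sample: 1 / C(N, n), kept exact
sample_prob = Fraction(1, math.comb(N, n))
```

Under this design every unit carries the same weight, which is why a single constant N/n appears in the weighted estimating functions later on.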

2.2. SSCLS Sample ($s_2^*$) from Cluster-Based $F_2$

Cluster-based longitudinal finite population $F_2$. Let
$$F_2 \equiv \{(y_{ci} : T \times 1,\; x_{ci} : T \times p);\ i = 1, \ldots, N_c;\ c = 1, \ldots, K\} \tag{5}$$
be the targeted finite population, where $K\,(\to \infty)$ denotes the number of independent clusters/households with sizes $N_1, \ldots, N_c, \ldots, N_K$, $N_c$ being the size of the $c$-th cluster, which is small and fixed; and $y_{ci} = (y_{ci1}, \ldots, y_{cit}, \ldots, y_{ciT})'$ denotes a $T$-dimensional hypothetical linear response vector containing $T$ potential repeated responses from the $i$-th ($i = 1, \ldots, N_c$) individual of the $c$-th cluster/household under the finite population. In this setup, at a given time point $t$, the pair-wise hypothetical responses within the $c$-th cluster, namely $y_{cit}$ and $y_{cjt}$ for $i \neq j$; $i, j = 1, \ldots, N_c$, are likely to be correlated, as they share an invisible random cluster/household effect leading to the cross-sectional within-cluster correlations; and the repeated responses from the $i$-th individual under the $c$-th cluster, namely $y_{cit}$ and $y_{ci,t+\ell}$ for $t = 1, \ldots, T$; $\ell = 1, \ldots, T-1$, are also likely to be correlated, maintaining a dynamic dependence relationship, leading to the longitudinal correlations.
Survey sample $s_2^*$ from $F_2$ using the SRSWOR design. Unlike the construction of $s_1^*$ by (3), clusters are now considered to be the primary sampling units. Thus, $s_2^*$ may be chosen from the $F_2$ in (5) as follows:
$$s_2^* \equiv \{(y_{ci} : T \times 1,\; x_{ci} : T \times p);\ c = 1, \ldots, k;\ i = 1, \ldots, N_c\} \subset F_2 \tag{6}$$
according to a suitable design, say $D_{s_2^*}$, as
$$D_{s_2^*} : \ \Pr(s_2^* \subset F_2) = 1 \Big/ \binom{K}{k}; \quad \pi_c = \Pr(c \in s_2^*) = \Pr[\delta_{c s_2^*} = 1] = \frac{k}{K},$$
with $\delta_{c s_2^*}$ being a cluster inclusion indicator variable.
Our purpose is to develop a suitable prediction function and its estimator for the $F_2$ total, namely for
$$\tau_{y,t} = \sum_{c=1}^{K} \sum_{i=1}^{N_c} y_{cit} \quad \text{for } t = 1, \ldots, T.$$
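To make the cluster-level design concrete, the sketch below builds a small made-up cluster population, samples whole households by SRSWOR, and splits the time-t total into its sampled and non-sampled parts. All sizes and response values are hypothetical toy choices for illustration; only the structure (whole clusters as sampling units, inclusion probability k/K) follows the text.

```python
import random

rng = random.Random(0)
K, k, T = 6, 3, 3                                  # hypothetical toy sizes
sizes = [rng.randint(2, 4) for _ in range(K)]      # small, fixed cluster sizes N_c

# y[c][i][t]: made-up bmi-like response of member i in household c at wave t
y = [[[round(rng.uniform(18, 40), 1) for _ in range(T)]
      for _ in range(sizes[c])] for c in range(K)]

def total_at(y, clusters, t):
    """Sum of y_cit over all members of the listed clusters at wave t (0-based)."""
    return sum(member[t] for c in clusters for member in y[c])

# SRSWOR of k whole clusters: every member of a selected household is observed
s2 = sorted(rng.sample(range(K), k))
not_s2 = [c for c in range(K) if c not in s2]
pi_c = k / K                                       # cluster inclusion probability

tau_t0 = total_at(y, range(K), 0)                  # finite population total at the first wave
sampled_part = total_at(y, s2, 0)                  # observed after sampling
nonsampled_part = total_at(y, not_s2, 0)           # what has to be predicted
```

The non-sampled part is the quantity that the prediction functions of the later sections replace by an estimated expectation.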

3. Proposed DCMU Prediction Method Using SSILS Sample

Following (2), the $F_1$ total up to time $t$ has the formula
$$\tau_{y(t)} = \sum_{i=1}^{N} \sum_{u=1}^{t} y_{iu}.$$
Thus, once the SSILS $s_1^*$ is chosen, this longitudinal cumulative total (LCT) may be expressed in terms of survey sampled ($ss$) and non-sampled ($ns$) LCTs as
$$\tau_{y(t)} = \sum_{i=1}^{N} \sum_{u=1}^{t} y_{iu} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu}\ (ss) + \sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} y_{iu}\ (ns), \quad \text{for } t = 1, \ldots, T,$$
where the second term on the right-hand side is the $ns$ response total up to time $t$. For prediction inferences, this and similar $ns$ response totals are in general predicted by their model-based expectations. Notice that because the repeated responses $\{y_{i1}, \ldots, y_{iu}, \ldots, y_{iT}\}$ for the $i$-th individual ($i = 1, \ldots, N$) under the $F_1$ in (1) are supposed to be longitudinally correlated, we use a super-population correlation model $S_1$ as in Section 3.1 below. We may then use $E_{S_1}[\cdot] \equiv E_{F_1 \subset S_1}[\cdot]$ to denote the model-based expectations; hence, the model unbiased (MU) prediction function for the LCT in (9) has the following form:
$$\text{MU Prediction Function}: \quad \hat{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + E_{F_1 \subset S_1}\Big[\sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} y_{iu}\Big]. \tag{10}$$
However, as the estimation of the expected function in the second term in (10) has to be performed using the sample $s_1^*$ from (3), the existing studies of the past decades (e.g., see [12] for independent data with $T = 1$; [14] (Section 2.6.2), [17] (Section 2.2) for clustered correlated data), conditional on $s_1^*$, have estimated the expected function by using the sampling sequence $s_1^* \subset S_1$, obtaining the prediction estimator as
$$\text{MU Prediction Function Estimator}: \quad \hat{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \hat{E}_{s_1^* \subset S_1}\Big[\sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} y_{iu}\Big]. \tag{11}$$
Clearly, this estimation, based on the sequence $s_1^* \subset S_1$ (i.e., treating the sample $s_1^*$ as though it were taken directly from the super-population $S_1$), ignores the $F_1$ in (1) as the source of the sample during the estimation process. Thus, this existing MU prediction approach is flawed, yielding invalid predictions.
As a remedy to the aforementioned anomaly, we propose a design cum model unbiased (DCMU) prediction function and estimate the expectation involved in the prediction function based on the true sampling sequence $s_1^* \subset F_1 \subset S_1$, as follows:
$$\text{DCMU Prediction Function}: \quad \tilde{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + E_{s_1^* \subset F_1 \subset S_1}\Big[\sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} y_{iu}\Big] = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \frac{N-n}{N} \sum_{i=1}^{N} \sum_{u=1}^{t} E_{F_1 \subset S_1}[y_{iu}], \tag{12}$$
yielding the DCMU prediction estimator as
$$\text{DCMU Prediction Function Estimator}: \quad \hat{\tilde{\tau}}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \frac{N-n}{N} \sum_{i=1}^{N} \sum_{u=1}^{t} \hat{E}_{s_1^* \subset F_1 \subset S_1}[y_{iu}]. \tag{13}$$
We further remark that, as the estimation of the expected function as in (11) is flawed, one may modify it by estimating the expectation based on the true sampling sequence $s_1^* \subset F_1 \subset S_1$. We refer to this modified estimator as the design-assisted model-based (DAMB) prediction estimator, with its formula given by
$$\text{DAMB Prediction Estimator}: \quad \hat{\hat{\tau}}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} \hat{E}_{s_1^* \subset F_1 \subset S_1}(y_{iu}). \tag{14}$$
Note that, because of this validity concern, we no longer follow the MU prediction estimator from (11). As for the computation of the proposed DCMU prediction estimator in (13) and the DAMB prediction estimator in (14), we demonstrate them by considering an $S_1$ correlation model for the $F_1$ data in (1), as in the next section.

3.1. Super-Population ($S_1$) Longitudinal Auto-Correlation Model

As pointed out in Section 1, the so-called ‘working’ correlation structures used by some studies (e.g., [7] (Section 7.4), and [9] (ch. 20)) fail to accommodate the decay of correlations as the time lag increases. As a remedy, in this section, following [6] (Section 2.2) (see also [11] (Section 3)), we propose a lag-dependent correlation structure, i.e., a super-population ($S_1$) correlation model for the longitudinal data in the $F_1$. More specifically, we suggest using a general auto-correlation model, as follows, that accommodates the frequently encountered AR(1) (auto-regressive order 1), MA(1) (moving average order 1), and EQC (equi-correlation) correlation structures, among others. Thus, the hypothetical repeated responses for the $i$-th ($i = 1, \ldots, N$) individual, namely $y_i = (y_{i1}, \ldots, y_{it}, \ldots, y_{iT})'$, are assumed to follow the correlation model given by
$$F_1 \subset S_1 : \quad y_i = x_i \beta + \epsilon_i, \tag{15}$$
$$\text{with } \epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{it}, \ldots, \epsilon_{iT})' \sim (0, \sigma_\epsilon^2 R), \tag{16}$$
where $\mathrm{var}[\epsilon_{it}] = \sigma_\epsilon^2$ for all $i = 1, \ldots, N$ and $t = 1, \ldots, T$, and $R$ is the $T \times T$ lag-dependent auto-correlation matrix defined as
$$R(\rho) = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{T-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \cdots & 1 \end{pmatrix}, \tag{17}$$
where, for $\ell = 1, \ldots, T-1$, $\rho_\ell$ is the $\ell$-th lag auto-correlation. Notice that $x_i$ in (15) is the $T \times p$ covariate matrix, as defined in (1), for the $F_1$, and $\beta$ is referred to as the $S_1$ regression parameter vector. Further notice that the lag correlations in the general matrix $R$ in (17) take the following special patterns under the AR(1), MA(1), and EQC models, respectively:
$$\text{AR(1) model}: \quad \epsilon_{it} = \rho\, \epsilon_{i,t-1} + a_{it}, \ \text{with } a_{it} \overset{iid}{\sim} (0, \sigma_\epsilon^2) \ \Rightarrow\ \rho_\ell = \rho_1^{\ell} = \rho^{\ell}, \ \text{for } \ell = 1, \ldots, T-1.$$
$$\text{MA(1) model}: \quad \epsilon_{it} = a_{it} - \rho\, a_{i,t-1}, \ \text{with } a_{it} \overset{iid}{\sim} (0, \sigma_\epsilon^2) \ \Rightarrow\ \rho_1 = \frac{-\rho}{1 + \rho^2}, \quad \rho_\ell = 0 \ \text{for } \ell = 2, \ldots, T-1.$$
$$\text{EQC model}: \quad \epsilon_{it} = \rho\, a_{i0} + a_{it}, \ \text{with } a_{it} \overset{iid}{\sim} (0, \sigma_\epsilon^2),\ a_{i0} \overset{iid}{\sim} (0, \sigma_\epsilon^2),\ a_{it} \text{ and } a_{i0} \text{ independent} \ \Rightarrow\ \rho_\ell = \frac{\rho^2}{1 + \rho^2}, \ \text{for } \ell = 1, \ldots, T-1.$$
Thus, computing the correlation matrix $R$ in (17) is sufficient for all three of these (and other similar) processes. The parameters $\rho_\ell$ for $\ell = 1, \ldots, T-1$ are the $S_1$ lag correlation parameters.
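The lag correlation patterns above can be sketched in a few lines of Python. This is a minimal illustration, assuming the MA(1) process is written with a minus sign on the lagged innovation (so its lag-1 correlation is negative for positive rho) and that under EQC all lags share one value; the function names `lag_correlations` and `corr_matrix` are hypothetical.

```python
def lag_correlations(model, rho, T):
    """Return [rho_1, ..., rho_{T-1}] implied by the named error process (T >= 2)."""
    if model == "AR1":
        return [rho ** lag for lag in range(1, T)]
    if model == "MA1":
        # Only the lag-1 correlation is non-zero
        return ([-rho / (1 + rho ** 2)] + [0.0] * (T - 2))[: T - 1]
    if model == "EQC":
        # The same correlation at every lag
        return [rho ** 2 / (1 + rho ** 2)] * (T - 1)
    raise ValueError(f"unknown model: {model}")

def corr_matrix(lags):
    """Build the T x T matrix R(rho) whose (r, c) entry is rho_{|r - c|}."""
    rho = [1.0] + list(lags)
    T = len(rho)
    return [[rho[abs(r - c)] for c in range(T)] for r in range(T)]

ar1 = corr_matrix(lag_correlations("AR1", 0.5, 4))
ma1 = corr_matrix(lag_correlations("MA1", 0.5, 4))
eqc = corr_matrix(lag_correlations("EQC", 1.0, 4))
```

Because the matrix depends only on the lag |r - c|, estimating the T - 1 lag correlations is enough to fill in all of its entries, whichever of the three processes generated the data.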
We remark that, even though the $S_1$ model is fitted to the $F_1$ data, as in (15)–(17), the super-population regression parameters $\beta$, along with the lag correlations $\rho_\ell$, cannot be estimated using $F_1$ as a sample, because it is only a hypothetical sample. In reality, these parameters, therefore, have to be estimated as optimally as possible by using the survey sample (SS) $s_1^*$ constructed in (3). We provide this estimation in Section 3.3. Notice that these parameter estimates will then be used in (13) and (14) to obtain the DCMU and DAMB predictors in order to predict the targeted marginal FP totals. Now, because the super-population model is written as in (15) and (16), we can use the moment properties of the $F_1$ data and re-write the DCMU and DAMB prediction functions following (13) and (14), respectively, as
$$\text{DCMU Prediction Function}: \quad \tilde{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \frac{N-n}{N} \sum_{i=1}^{N} \sum_{u=1}^{t} x_{iu}'\beta \tag{21}$$
$$\text{DAMB Prediction Function}: \quad \hat{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} x_{iu}'\beta, \tag{22}$$
where $\beta$ must be estimated by accommodating the lag correlation matrix $R(\rho)$ given by (17).
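The two prediction functions differ only in how the non-sampled part is filled in, which a small numerical sketch can make visible. The example below assumes NumPy, treats the regression vector as given, and assumes that covariates are available for all N units (the prediction functions need them for non-sampled units); the function name and the toy numbers are illustrative assumptions.

```python
import numpy as np

def cumulative_predictions(y, x, sampled, beta, t):
    """
    y       : (N, T) responses; only rows with sampled == True are used
    x       : (N, T, p) covariates, assumed known for every unit
    sampled : (N,) boolean inclusion indicators
    beta    : (p,) regression vector (taken as given here)
    t       : number of waves to cumulate, 1 <= t <= T
    Returns (DCMU, DAMB) predictions of the cumulative total up to wave t.
    """
    N = x.shape[0]
    n = int(sampled.sum())
    observed = y[sampled, :t].sum()            # sampled cumulative total
    fitted = x[:, :t, :] @ beta                # (N, t) model means x_iu' beta
    dcmu = observed + (N - n) / N * fitted.sum()
    damb = observed + fitted[~sampled].sum()
    return dcmu, damb

# Toy data: N = 4 units, T = 2 waves, p = 1 covariate constant over time
x = np.array([1.0, 1.0, 3.0, 3.0])[:, None, None] * np.ones((4, 2, 1))
y = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, np.nan], [np.nan, np.nan]])
sampled = np.array([True, True, False, False])
beta = np.array([1.0])

dcmu2, damb2 = cumulative_predictions(y, x, sampled, beta, t=2)   # 18.0 and 22.0
dcmu1, damb1 = cumulative_predictions(y, x, sampled, beta, t=1)   # 8.0 and 10.0
```

In this toy case the non-sampled units happen to have larger covariate values than the sampled ones, so the DAMB version, which sums fitted means over the non-sampled units only, gives a larger prediction than the DCMU version, which scales the sum over all N units by (N - n)/N.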
We further remark that, for infinite-population-based inferences for longitudinal data, many studies (e.g., [11], [6] (ch. 2, 7)) have used the auto-correlation model (17). However, for finite sampling inferences for longitudinal data, this model is not adequately discussed in the literature. On the contrary, to model the correlations of the repeated data from the same individual in a finite population setup, some authors such as [5] (eqns. 2, 3) have used an individual-specific random-effects-based linear mixed model given by
$$y_{it} = x_{it}'\beta + \gamma_i + \epsilon_{it}, \tag{23}$$
where $\gamma_i$ denotes the random effect of the $i$-th individual, which is shared by all responses over time $t = 1, \ldots, T$. This model, however, produces equal correlations among the repeated responses, and hence, as pointed out in [18] (see also [19]), it fails to accommodate the time effects on correlations. Some other authors, such as [7] (Section 7.4), following the so-called ‘working’ correlations approach of [8], have suggested using an unstructured correlation matrix, say $R^*(\rho) = (\rho_{ut},\ u \neq t;\ u, t = 1, \ldots, T) : T \times T$, where there are $T(T-1)/2$ correlations to compute by using a method of moments. There are, however, many inference issues with this ‘working’ correlation approach. For example, (a) this unstructured correlation matrix ignores the time effects on the repeated responses and hence fails to accommodate the lag effect on the association between two repeated responses. Consequently, unlike the lagged correlation structure $R(\rho)$ shown in (17), which involves only $T-1$ lag correlations, this approach requires computing a larger number of pairwise correlations. (b) Also, as demonstrated in [6] (Section 6.4), this ‘working’ correlations approach may produce less efficient estimates than the simpler ‘working’ independence-based approach, which makes it an unacceptable inference approach for longitudinal data.
We now proceed to the next section for the estimation of the regression parameters $\beta$ involved in the prediction functions in (21) and (22). This estimation also requires the estimation of the lag correlations $\rho_\ell$ for $\ell = 1, \ldots, T-1$. More specifically, in Section 3.2, we demonstrate how one could optimally estimate $\beta$ using the $F_1$ (1)-based data, provided that these data were available. However, as these data are not available in practice, these hypothetical estimates turn out to be the so-called $F_1$ regression parameters [20,21]. In Section 3.3, we then use the SS $s_1^*$ from (3) to develop a sampling weighted-design unbiased (SWDU) estimate for the finite-population regression parameters. The SWDU estimator, therefore, becomes the DCMU (design cum model unbiased) estimate for the super-population regression parameter $\beta$.

3.2. Hypothetical Estimation of the $S_1$ Model Parameters Using $F_1$ Data

  • Estimation of $\beta$:
Notice that the estimation of the regression parameter $\beta$ involved in the prediction functions (21) and (22) depends on the available repeated data up to time $t$, for all $t = 1, \ldots, T$. Hence, it is convenient to use multiple parameters, namely $\beta_{(t)}$ for $\beta$, in these prediction functions, for all $t = 1, \ldots, T$. Note that, for $t = 1$, the auto-correlation matrix $R(\rho)$ in (17) does not play any role in estimating $\beta_{(1)}$. For $t = 2$, only one lag correlation, namely $\rho_1$, influences the estimation of $\beta_{(2)}$. Similarly, for $t = 3$, $\rho_1$ and $\rho_2$ play a role in the estimation of $\beta_{(3)}$, and so on. Thus, in view of the role of this time length in $\beta$ estimation, we first rewrite the DAMB and DCMU prediction functions from (22) and (21) using $\beta_{(t)}$ for $\beta$, as follows.
$$\text{DAMB Prediction Function}: \quad \hat{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} x_{iu}'\beta_{(t)}. \tag{24}$$
$$\text{DCMU Prediction Function}: \quad \tilde{\tau}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \frac{N-n}{N} \sum_{i=1}^{N} \sum_{u=1}^{t} x_{iu}'\beta_{(t)}. \tag{25}$$
We then obtain an optimal hypothetical estimator of $\beta_{(t)}$ using the $F_1$ (1)-based hypothetical data, as follows.
More specifically, as the $F_1$ is assumed to follow the $S_1$ regression model, as in (15) and (16), by writing
$$R_{(t)}(\rho) = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{t-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{t-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho_{t-1} & \rho_{t-2} & \rho_{t-3} & \cdots & 1 \end{pmatrix}, \tag{26}$$
for $\ell = 1, \ldots, t-1$, one could obtain an optimal HGLS (hypothetical generalized least squares) estimate of $\beta_{(t)}$, say $\hat{\beta}_{(t),HGQL} \equiv \beta_{(t),N}$, provided that $F_1$ was available, by solving the underlying HGLS estimating equation
$$G_{[N,(t)]}(\beta_{(t)} \mid (\rho_0, \rho_1, \ldots, \rho_{t-1})) = \sum_{i=1}^{N} x_i' R_{(t)}^{-1}(\rho)\,(y_i - x_i \beta_{(t)}) = 0 \ \Rightarrow\ \beta_{(t)} = \hat{\beta}_{(t),HGQL} \equiv \beta_{(t),N}, \quad \text{for } t = 1, \ldots, T. \tag{27}$$
Notice that this $\beta_{(t)}$ estimate is not computable; it is a hypothetical estimate only, because the $F_1$-based data are not available. Thus, it is referred to as the $N$-dependent $F_1$ regression parameter [21], which is yet to be unbiasedly estimated using the survey sample $s_1^*$ (3). This we do in Section 3.3.2.
  • Estimation of $\rho_\ell$:
Notice that $\rho_\ell$ is the lag-$\ell$ auto-correlation for $\ell = 1, \ldots, t-1$, as in (26) (see also (17)). More specifically, its $S_1$ model (15)–(17)-based formula is given by
$$\rho_\ell = \frac{E_{F_1 \subset S_1}\big[(y_{iu} - x_{iu}'\beta_{(t)})(y_{i,u+\ell} - x_{i,u+\ell}'\beta_{(t)})\big]}{E_{F_1 \subset S_1}\big(y_{iu} - x_{iu}'\beta_{(t)}\big)^2} \tag{28}$$
for all $u = 1, \ldots, t-\ell$, with $t = 2, \ldots, T$. Thus, if the $F_1$-based data were available, one could use the well-known method of moments and consistently estimate $\rho_\ell$ using the formula
$$\hat{\rho}_{\ell,HMM} = \frac{\sum_{i=1}^{N} \sum_{u=1}^{t-\ell} \big[(y_{iu} - x_{iu}'\beta_{(t),N})(y_{i,u+\ell} - x_{i,u+\ell}'\beta_{(t),N})\big] \big/ N(t-\ell)}{\sum_{i=1}^{N} \sum_{u=1}^{t} \big[y_{iu} - x_{iu}'\beta_{(t),N}\big]^2 \big/ Nt} \equiv \rho_{\ell,N} \ \text{(say)}, \quad \text{for } \ell = 1, \ldots, t-1, \tag{29}$$
where $\beta_{(t),N}$ is the hypothetical estimate of $\beta_{(t)}$ given by (27). Notice that $\beta_{(t),N}$ in (27) and $\hat{\rho}_{\ell,HMM} \equiv \rho_{\ell,N}$ in (29) are computed iteratively. Thus, one may use $\rho_\ell = 0$ to obtain the initial value of $\beta_{(t),N}$ by (27), which is then used in (29) to obtain the first-step estimate $\rho_{\ell,N}$ for $\rho_\ell$. The iteration continues until convergence.

3.3. Real-Life Estimation of the $S_1$ Model Parameters Using the Survey Sample $s_1^*$

We remark that, for a given $t$ ($t = 1, \ldots, T$), in the last section, we obtained the formulas for the hypothetical estimates of the super-population ($S_1$) regression parameter $\beta_{(t)}$ and lagged correlations $\rho_1, \ldots, \rho_\ell, \ldots, \rho_{t-1}$. These hypothetical estimates are $\beta_{(t),N}$ and $\rho_{1,N}, \ldots, \rho_{\ell,N}, \ldots, \rho_{(t-1),N}$, respectively, which are referred to as the finite-population ($F_1$) parameters. In reality, these parameters must be estimated from the sampled data. The purpose of this section is to demonstrate how to exploit the survey sample $s_1^*$ in (3), taken from the $F_1$ in (1) using the SRSWOR sampling design $D_{s_1^*}$ given by (4), in order to obtain design-optimal estimates for the $F_1$ parameters. Note that the sampling design $D_{s_1^*}$ in (4) is widely used in practice. However, other designs can also be applied when appropriate.

3.3.1. Estimating Function Approach for Design Unbiased (DU) Estimation of $\beta_{(t),N}$

Because $\beta_{(t),N}$ is the solution of the $F_1$-based estimating equation $G_{[N,(t)]}(\cdot) = 0$ given by (27), for its design-optimal estimation, one needs to develop an $s_1^*$-based estimating equation, say $g_{[n,(t)]}(\cdot) = 0$, such that
$$E_{D_{s_1^*}}\big[g_{[n,(t)]}(\cdot)\big] \equiv E_{s_1^* \subset F_1}\big[g_{[n,(t)]}(\cdot)\big] = G_{[N,(t)]}(\cdot)$$
(e.g., [21]). To achieve this goal, based on the sampling design $D_{s_1^*}$ from (4), we consider the sampling weight $w_{i s_1^*} = N/n$ for the selection of the $i$-th individual in the sample $s_1^*$ from $F_1$. Now, because the $N$ individuals belonging to the $F_1$ are independent, one may follow the structure of $G_{[N,(t)]}(\cdot)$ in (27) and develop the sampling weighted GLS (SWGLS) estimating function $g_{[n,(t)]}(\cdot)$ as
$$g_{[n,(t)]}(\beta_{(t),N} \mid (\rho_0, \rho_1, \ldots, \rho_{t-1})) = \sum_{i=1}^{n} w_{i s_1^*}\, x_i' R_{(t)}^{-1}(\rho)\,(y_i - x_i \beta_{(t),N}),$$
further producing the SWGLS estimating equation that yields the SWGLS estimate of $\beta_{(t),N}$ as
$$g_{[n,(t)]}(\beta_{(t),N} \mid (\rho_0, \rho_1, \ldots, \rho_{t-1})) = 0 \ \Rightarrow\ \hat{\beta}_{(t),N} = \Big[\sum_{i=1}^{n} w_{i s_1^*}\, x_i' R_{(t)}^{-1}(\rho)\, x_i\Big]^{-1} \sum_{i=1}^{n} w_{i s_1^*}\, x_i' R_{(t)}^{-1}(\rho)\, y_i, \tag{31}$$
which is DU (design unbiased) for the $F_1$ parameter $\beta_{(t),N}$. This is because
$$E_{D_{s_1^*}}\big[g_{[n,(t)]}(\cdot)\big] = E_{D_{s_1^*}}\Big[\sum_{i=1}^{n} w_{i s_1^*}\, x_i' R_{(t)}^{-1}(\rho)\,(y_i - x_i \beta_{(t),N})\Big] = \sum_{i=1}^{N} w_{i s_1^*}\, E_{D_{s_1^*}}[\delta_{i s_1^*}]\, x_i' R_{(t)}^{-1}(\rho)\,(y_i - x_i \beta_{(t),N}) = G_{[N,(t)]}(\beta_{(t),N} \mid (\rho_0, \rho_1, \ldots, \rho_{t-1})), \tag{32}$$
with $\delta_{i s_1^*}$ as the inclusion indicator from (4). Because $w_{i s_1^*}\, E_{D_{s_1^*}}[\delta_{i s_1^*}] = (N/n)(n/N) = 1$, the right-hand side is the same as the $F_1$-based estimating function in (27).
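The design-unbiasedness argument can be checked exactly on a tiny population by enumerating every possible SRSWOR sample. The sketch below assumes NumPy and uses made-up data, a fixed regression vector, and a fixed 2 x 2 correlation matrix; it only verifies the averaging identity and is not a reproduction of any computation in the paper.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
N, n, t, p = 6, 3, 2, 2                      # tiny hypothetical population
x = rng.normal(size=(N, t, p))               # covariates for all N units
y = rng.normal(size=(N, t))                  # hypothetical responses
beta = np.array([0.5, -1.0])                 # any fixed parameter value
R_inv = np.linalg.inv(np.array([[1.0, 0.4], [0.4, 1.0]]))

def unit_score(i):
    """Contribution x_i' R^{-1} (y_i - x_i beta) of unit i to the estimating function."""
    return x[i].T @ R_inv @ (y[i] - x[i] @ beta)

# Finite population estimating function: sum over all N units
G_N = sum(unit_score(i) for i in range(N))

# Weighted sample estimating function for every one of the C(N, n) samples
w = N / n
samples = list(combinations(range(N), n))
g_values = [w * sum(unit_score(i) for i in s) for s in samples]

# Each sample has probability 1 / C(N, n), so the design expectation is a plain mean
expected_g = np.mean(g_values, axis=0)
```

Here `expected_g` agrees with `G_N` up to floating-point rounding, because each unit appears in the same fraction n/N of all samples and is weighted by N/n whenever it appears.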

3.3.2. Estimating Function Approach for Design Unbiased (DU) Estimation of $\rho_{\ell,N}$

To obtain the sample $s_1^*$-based estimate for the finite population ($F_1$) lag correlation parameters $\rho_{\ell,N}$ given in (29), we use the same estimating function approach as used for the estimation of the regression parameters $\beta_{(t),N}$ in (27), with $\hat{\beta}_{(t),N}$ given by (31). More specifically, one may use the sampling weighted auto-covariance function as a DU estimator for the auto-covariance in the numerator in (29), because it can be shown that
$$E_{D_{s_1^*}}\Big[\sum_{i=1}^{n} w_{i s_1^*} \sum_{u=1}^{t-\ell} \big[(y_{iu} - x_{iu}'\beta_{(t),N})(y_{i,u+\ell} - x_{i,u+\ell}'\beta_{(t),N})\big]\Big] = \sum_{i=1}^{N} \sum_{u=1}^{t-\ell} \big[(y_{iu} - x_{iu}'\beta_{(t),N})(y_{i,u+\ell} - x_{i,u+\ell}'\beta_{(t),N})\big]. \tag{33}$$
Similarly, it can be shown that
$$E_{D_{s_1^*}}\Big[\sum_{i=1}^{n} w_{i s_1^*} \sum_{u=1}^{t} \big[y_{iu} - x_{iu}'\beta_{(t),N}\big]^2\Big] = \sum_{i=1}^{N} \sum_{u=1}^{t} \big[y_{iu} - x_{iu}'\beta_{(t),N}\big]^2 \tag{34}$$
with respect to DU estimation for the variance term in the denominator in (29).
By combining (33) and (34), one then obtains a first-order DU estimator of $\rho_{\ell,N}$ given in (29) as
$$\hat{\rho}_{\ell,N} = \frac{\sum_{i=1}^{n} w_{i s_1^*} \sum_{u=1}^{t-\ell} \big[(y_{iu} - x_{iu}'\beta_{(t),N})(y_{i,u+\ell} - x_{i,u+\ell}'\beta_{(t),N})\big] \big/ N(t-\ell)}{\sum_{i=1}^{n} w_{i s_1^*} \sum_{u=1}^{t} \big[y_{iu} - x_{iu}'\beta_{(t),N}\big]^2 \big/ Nt}. \tag{35}$$
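A minimal NumPy sketch of the weighted moment estimator of the lag correlations is given below. It assumes the residuals (responses minus fitted means) have already been computed for the sampled units; the function name `weighted_lag_correlation` and the toy residuals are illustrative assumptions.

```python
import numpy as np

def weighted_lag_correlation(resid, weights, N, lag):
    """
    resid   : (n, t) residuals y_iu - x_iu' beta for the sampled units
    weights : (n,) sampling weights (N / n under SRSWOR)
    N       : finite population size
    lag     : lag l, with 1 <= l <= t - 1
    Returns the weighted auto-covariance at the given lag divided by the
    weighted variance, each scaled as in the moment formula.
    """
    n, t = resid.shape
    w = weights[:, None]
    # Products of residuals that are `lag` waves apart: u = 1, ..., t - lag
    num = (w * resid[:, : t - lag] * resid[:, lag:]).sum() / (N * (t - lag))
    den = (w * resid ** 2).sum() / (N * t)
    return num / den

# Toy residuals that alternate in sign, so lag 1 is perfectly negative
resid = np.array([[1.0, -1.0, 1.0], [1.0, -1.0, 1.0]])
weights = np.array([2.0, 2.0])           # N / n with N = 4, n = 2
rho1 = weighted_lag_correlation(resid, weights, N=4, lag=1)   # -1.0
rho2 = weighted_lag_correlation(resid, weights, N=4, lag=2)   # 1.0
```

Setting all weights to one and n equal to N gives the hypothetical finite population version of the same moment formula.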
Finally, by using $\hat{\rho}_{\ell,N}$ from (35), we obtain the final DU regression estimator $\hat{\beta}_{(t),N}$ for $\beta_{(t),N}$ by (31). Further, because $\beta_{(t),N}$ from (27) is MU (model unbiased) for $\beta_{(t)}$, it follows that $\hat{\beta}_{(t),N}$ from (31) is a DCMU (design cum model unbiased) estimator for the $\beta_{(t)}$ involved in the prediction functions in (24) and (25). Hence, by using $\hat{\beta}_{(t),N}$ from (31) for $\beta_{(t)}$ in (24) and (25), we obtain the desired prediction function estimators as
$$\text{DAMB Prediction Function Estimator}: \quad \hat{\hat{\tau}}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \sum_{i \notin s_1^*}^{N-n} \sum_{u=1}^{t} x_{iu}'\hat{\beta}_{(t),N}$$
$$\text{Marginal Prediction at Time } t: \quad \hat{\tau}^*_{y(1)} = \hat{\hat{\tau}}_{y(1)}; \quad \hat{\tau}^*_{y(t)} = \big[\hat{\hat{\tau}}_{y(t)} - \hat{\hat{\tau}}_{y(t-1)}\big] \ \text{for } t = 2, \ldots, T.$$
$$\text{DCMU Prediction Function Estimator}: \quad \hat{\tilde{\tau}}_{y(t)} = \sum_{i \in s_1^*}^{n} \sum_{u=1}^{t} y_{iu} + \frac{N-n}{N} \sum_{i=1}^{N} \sum_{u=1}^{t} x_{iu}'\hat{\beta}_{(t),N}$$
$$\text{Marginal Prediction at Time } t: \quad \tilde{\tau}^*_{y(1)} = \hat{\tilde{\tau}}_{y(1)}; \quad \tilde{\tau}^*_{y(t)} = \big[\hat{\tilde{\tau}}_{y(t)} - \hat{\tilde{\tau}}_{y(t-1)}\big] \ \text{for } t = 2, \ldots, T.$$
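The steps of Section 3.3 can be strung together in a short NumPy sketch, given here under several explicit assumptions: the population is simulated with AR(1) errors purely for illustration, covariates are assumed known for all N units, the first t waves of each sampled unit are used when fitting the wave-t regression vector, and the iteration starts from zero lag correlations. Function names, sizes, and parameter values are hypothetical and are not taken from the paper's simulation study.

```python
import numpy as np

def toeplitz_corr(lags):
    """t x t matrix with (r, c) entry rho_{|r - c|}, given [rho_1, ..., rho_{t-1}]."""
    rho = np.concatenate(([1.0], lags))
    idx = np.abs(np.subtract.outer(np.arange(rho.size), np.arange(rho.size)))
    return rho[idx]

def swgls_fit(y, x, w, t, n_iter=50, tol=1e-8):
    """Alternate weighted GLS for beta and weighted moments for the lag correlations."""
    ys, xs = y[:, :t], x[:, :t, :]
    lags = np.zeros(t - 1)                       # start from independence
    for _ in range(n_iter):
        R_inv = np.linalg.inv(toeplitz_corr(lags))
        A = np.einsum("i,itp,ts,isq->pq", w, xs, R_inv, xs)
        b = np.einsum("i,itp,ts,is->p", w, xs, R_inv, ys)
        beta = np.linalg.solve(A, b)
        resid = ys - xs @ beta
        den = (w[:, None] * resid ** 2).sum() / t
        new_lags = np.array([
            (w[:, None] * resid[:, : t - l] * resid[:, l:]).sum() / (t - l) / den
            for l in range(1, t)
        ])
        if np.max(np.abs(new_lags - lags), initial=0.0) < tol:
            break
        lags = new_lags
    return beta, lags

def predict_totals(y_s, x_all, sampled, w, T):
    """Cumulative DAMB and DCMU predictions for t = 1..T, plus marginal differences."""
    N, n = x_all.shape[0], int(sampled.sum())
    damb, dcmu = [], []
    for t in range(1, T + 1):
        beta, _ = swgls_fit(y_s, x_all[sampled], w, t)
        fitted = x_all[:, :t, :] @ beta          # (N, t) fitted means
        observed = y_s[:, :t].sum()
        damb.append(observed + fitted[~sampled].sum())
        dcmu.append(observed + (N - n) / N * fitted.sum())
    damb, dcmu = np.array(damb), np.array(dcmu)
    # Marginal prediction at wave t: cumulative(t) - cumulative(t - 1)
    return damb, dcmu, np.diff(damb, prepend=0.0), np.diff(dcmu, prepend=0.0)

# Simulated toy population (all values are illustrative assumptions)
rng = np.random.default_rng(42)
N, n, T = 2000, 200, 4
x_all = np.stack([np.ones((N, T)), rng.normal(size=(N, T))], axis=2)
beta_true = np.array([25.0, 1.5])
eps = np.empty((N, T))
eps[:, 0] = rng.normal(size=N)
for u in range(1, T):                            # AR(1) errors, lag-1 correlation 0.6
    eps[:, u] = 0.6 * eps[:, u - 1] + np.sqrt(1 - 0.36) * rng.normal(size=N)
y_all = x_all @ beta_true + eps

sampled = np.zeros(N, dtype=bool)
sampled[rng.choice(N, size=n, replace=False)] = True
w = np.full(n, N / n)

damb, dcmu, damb_marg, dcmu_marg = predict_totals(y_all[sampled], x_all, sampled, w, T)
true_cum = np.cumsum(y_all.sum(axis=0))          # known only because the data are simulated
```

Comparing `damb` and `dcmu` with `true_cum` over many repeated samples would give a rough idea of the relative performance of the two predictors; a single draw, as here, only shows that both land near the true cumulative totals.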

4. Proposed DCMU Prediction Method Using SSCLS Sample: A Generalization

As described in Section 2, more specifically in Section 2.2, cluster-based $F$ inferences require the estimation of an additional cluster correlation parameter on top of the regression and longitudinal correlation estimation performed in Section 3. In an infinite population setup, this type of familial/cluster longitudinal data has been discussed extensively in the literature (e.g., [6] (chps. 3, 8, 9)).
Turning back to the $F$ setup, in Section 4.1 below, we write a cluster-based finite population (FP: $F_2$) and provide a cluster longitudinal super-population (SP: $S_2$) model to fit the $F_2$-based hypothetical data. Similar to the previous section, this SP model fitted to the FP data is used to obtain hypothetical estimates for the SP parameters, where these estimates are referred to as the FP parameters [20,21]. This hypothetical estimation is given in Section 4.2. Next, we exploit the SSCLS sample $s_2^*$ from (6) for DCMU estimation of all parameters, including the cluster regression parameters. We remark here that, in the context of MU prediction, this type of cluster-based $F$ was used by many authors in the past in a two-stage cluster sampling setup [22,23]. However, as demonstrated in [16], the MU estimation approach is flawed and would result in an invalid MU prediction. In contrast, as in Section 3, in this section we provide valid DCMU estimation-based prediction functions.

4.1. Cluster-Based Longitudinal FP and Its SP Model

As opposed to the individual-based FP (1) studied in the last section, here we consider a cluster-based, such as household-based, FP ($F_2$) involving a large number of independent households, with each member of a household having repeated potential responses in a longitudinal setup. Typically, cluster/household sizes are small. Next, because $F_2$ is unknown or hypothetical, any finite sampling inference requires a sample, say $s_2^*$, to be chosen from the FP and exploited to design optimal inferences for the targeted FP parameters. When the whole small-sized cluster is chosen as a sampling unit, the resulting longitudinal survey-based sample is referred to as the SSCLS (single-stage cluster-based longitudinal survey) sample. In Section 2.2, it was shown how one can construct the sample $s_2^*$ (6) from $F_2$ defined by (5).
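Now, by treating the $F_2$ in (5) as a large sample from a longitudinal super-population ($S_2$), its cluster-based hypothetical longitudinal data may be modeled as
$$\begin{aligned}F_2\subset S_2:\quad & y_{cit}=x_{cit}'\beta+\gamma_c+\epsilon_{cit},\quad t=1,\ldots,T;\ i=1,\ldots,N_c;\ c=1,\ldots,K\\ & \gamma_c\overset{iid}{\sim}(0,\sigma_\gamma^2),\quad \epsilon_{cit}\overset{iid}{\sim}(0,\sigma_\epsilon^2),\quad \gamma_c\ \text{and}\ \epsilon_{cit}\ \text{are independent};\\ & \mathrm{corr}(\epsilon_{cit},\epsilon_{ci,t+\ell})\equiv\rho_\ell,\quad \ell=1,\ldots,T-1,\end{aligned} \tag{40}$$
or, equivalently, in vector-matrix notation,
$$F_2\subset S_2:\quad y_{ci}=x_{ci}\beta+1_T\gamma_c+\epsilon_{ci}, \tag{41}$$
where, for $y_{ci}=(y_{ci1},\ldots,y_{cit},\ldots,y_{ciT})'$ and $\epsilon_{ci}=(\epsilon_{ci1},\ldots,\epsilon_{cit},\ldots,\epsilon_{ciT})'$, we write
$$\begin{aligned}&\epsilon_{ci}\sim\left(0\,1_T,\ \sigma_\epsilon^2R(\rho_1,\ldots,\rho_\ell,\ldots,\rho_{T-1})\right)\\ &E_{S_2}(Y_{ci})=x_{ci}\beta,\quad \mathrm{cov}_{S_2}(Y_{ci})=\sigma_\gamma^2 1_T1_T'+\sigma_\epsilon^2R(\rho)=\sigma_\gamma^2U_{TT}+\sigma_\epsilon^2R(\rho),\end{aligned} \tag{42}$$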
where $1_T$ is the $T$-dimensional unit vector and $R(\rho)$ is a $T\times T$ general lag correlation matrix, as in (17) under Section 3.1, $\rho_\ell$ being referred to as the lag $\ell$ longitudinal correlation, and $U_{TT}=1_T1_T'$ is the $T\times T$ unit matrix (all elements equal to one). Further notice that, at every point of time, two individuals, $i$ and $j$, belonging to the same cluster $c$ are structurally correlated, and hence
$$y_{ci}=x_{ci}\beta+1_T\gamma_c+\epsilon_{ci},\quad\text{and}\quad y_{cj}=x_{cj}\beta+1_T\gamma_c+\epsilon_{cj}$$
have their covariance matrix as
$$\mathrm{cov}_{S_2}(Y_{ci},Y_{cj})=\sigma_\gamma^2 1_T1_T'=\sigma_\gamma^2U_{TT}, \tag{43}$$
where $U_{TT}: T\times T$ is the unit matrix. Thus, by combining (42) and (43), we obtain the mean and covariance structure for the longitudinal response vector for the individuals in the $c$th cluster, namely for $y_c=(y_{c1}',\ldots,y_{ci}',\ldots,y_{cN_c}')': N_cT\times 1$, as
$$E_{S_2}[Y_c]=X_c\beta=\begin{pmatrix}x_{c1}\\ \vdots\\ x_{ci}\\ \vdots\\ x_{cN_c}\end{pmatrix}\beta: N_cT\times 1 \tag{44}$$
$$\begin{aligned}\mathrm{cov}_{S_2}[Y_c]&=\sigma_\epsilon^2\,\mathrm{diag}[R(\rho),\ldots,R(\rho),\ldots,R(\rho)]+\sigma_\gamma^2U_{N_cT,N_cT}\\ &\equiv\sigma_\epsilon^2[I_{N_c}\otimes R(\rho)]+\sigma_\gamma^2U_{N_cT,N_cT}=\Sigma_{c,N_cT}(\sigma^2,\phi,\rho): N_cT\times N_cT,\end{aligned} \tag{45}$$
where $U_{N_cT,N_cT}$ is the $N_cT\times N_cT$ unit matrix, $\sigma^2=[\sigma_\epsilon^2+\sigma_\gamma^2]$, $\phi=\sigma_\gamma^2/\sigma^2$, and $\rho$ is the longitudinal correlation index parameter representing all lagged correlations, namely $\rho\equiv(\rho_1,\ldots,\rho_\ell,\ldots,\rho_{T-1})'$.
We remark that the proposed cluster-based longitudinal model (40)–(43) is a generalization of the individual-based longitudinal model (15)–(17) under Section 3.1 to the cluster setup. This model (40)–(43) may also be treated as a generalization of the cluster regression model (e.g., [22] (eqns. 3.1, 3.2), [24] (eqn. 2.1), [25] (eqns. 2.1, 2.2), [23] (eqn. 1), and [14] (eqn. 9.11)) to the longitudinal setup. There is also a difference at the cluster level: we consider small-sized clusters, such as households, whereas these studies dealt with larger clusters, prompting the need for two-stage cluster-sampling-based inferences, as opposed to our single-stage cluster-sampling-based inference.
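As a small illustration of the covariance structure in (45), the sketch below assembles $\sigma_\epsilon^2[I_{N_c}\otimes R(\rho)]+\sigma_\gamma^2U$ for one cluster using plain Python lists. The function names and the numerical values ($N_c=2$, $T=3$, $\sigma_\epsilon^2=2$, $\sigma_\gamma^2=1$, AR(1)-type lags 0.5 and 0.25) are assumptions chosen for readability, not values prescribed by the paper.

```python
from typing import List

Matrix = List[List[float]]


def lag_correlation_matrix(rho: List[float]) -> Matrix:
    """T x T matrix R(rho) whose (u, v) entry is rho_{|u-v|}, with unit diagonal.

    rho holds the lag correlations [rho_1, ..., rho_{T-1}].
    """
    lags = [1.0] + list(rho)
    T = len(lags)
    return [[lags[abs(u - v)] for v in range(T)] for u in range(T)]


def cluster_covariance(n_c: int, rho: List[float],
                       sigma2_eps: float, sigma2_gamma: float) -> Matrix:
    """sigma_eps^2 [I (x) R(rho)] + sigma_gamma^2 U for a cluster with n_c members."""
    R = lag_correlation_matrix(rho)
    T = len(R)
    dim = n_c * T
    # Start from sigma_gamma^2 U: every pair of responses shares the cluster effect.
    cov = [[sigma2_gamma] * dim for _ in range(dim)]
    # Add the block-diagonal longitudinal part, one T x T block per member.
    for i in range(n_c):
        for u in range(T):
            for v in range(T):
                cov[i * T + u][i * T + v] += sigma2_eps * R[u][v]
    return cov


cov = cluster_covariance(n_c=2, rho=[0.5, 0.25], sigma2_eps=2.0, sigma2_gamma=1.0)
for row in cov:
    print(row)
```

With these assumed values, responses of the same member at lag 1 have covariance $1+2(0.5)=2$, while responses of two different members always have covariance $\sigma_\gamma^2=1$, whatever the time points.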

4.2. FP Data-Based Hypothetical Estimation Equations

Notice that if the FP ($F_2$) data (5) were available, the SP regression parameters $\beta$ could be optimally estimated by exploiting the SP model-based moment properties (44) and (45), more specifically by solving the hypothetical GLS (HGLS) estimating equation
$$\begin{aligned}\text{Hypothetical Estimating Equation for }\beta:\quad & G_N(\beta\mid\sigma^2,\phi,\rho)=\sum_{c=1}^{K}X_c'\,\Sigma^{-1}_{c,N_cT}(\sigma^2,\phi,\rho)\,(y_c-X_c\beta)=0\\ \Rightarrow\quad & \hat\beta_{(T),HGLS}=\left[\sum_{c=1}^{K}X_c'\,\Sigma^{-1}_{c,N_cT}(\sigma^2,\phi,\rho)\,X_c\right]^{-1}\sum_{c=1}^{K}X_c'\,\Sigma^{-1}_{c,N_cT}(\sigma^2,\phi,\rho)\,y_c=\beta_{(T),N},\ \text{(say)}\end{aligned} \tag{46}$$
where $\hat\beta_{(T),HGLS}$ denotes the hypothetical GLS estimator of $\beta$ involving the $F_2$-based hypothetical responses up to time $T$ in a cluster setup, which is also denoted as $\beta_{(T),N}$, an $N$-dependent FP regression parameter [16,20,21]. Note that the computation of $\Sigma^{-1}_{c,N_cT}(\sigma^2,\phi,\rho)$ in (46) may be simplified by recalling the formula for $\Sigma_{c,N_cT}(\cdot)$ from (45) as
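$$\Sigma_{c,N_cT}(\sigma^2,\phi,\rho)=\sigma_\epsilon^2[I_{N_c}\otimes R(\rho)]+\sigma_\gamma^2U$$
and hence writing
$$\Sigma^{-1}_{c,N_cT}(\sigma^2,\phi,\rho)=\frac{1}{\sigma_\epsilon^2}\left\{[I_{N_c}\otimes R^{-1}]-\frac{\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^2}[I_{N_c}\otimes R^{-1}]\,U\,[I_{N_c}\otimes R^{-1}]}{1+\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^2}\,1_c'[I_{N_c}\otimes R^{-1}]1_c}\right\}$$
(e.g., [6] (Section 3.1)), where
$$\sigma^2=[\sigma_\gamma^2+\sigma_\epsilon^2],\quad \phi=\sigma_\gamma^2/\sigma^2,\quad \rho\equiv(\rho_1,\ldots,\rho_\ell,\ldots,\rho_{T-1})'.$$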
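The inverse formula above is an instance of the Sherman–Morrison rank-one update. As a hedged sketch of the intermediate step, assume (our reading of the notation, not stated explicitly in the text) that $1_c$ is the $N_cT$-dimensional vector of ones, so that $U=1_c1_c'$, and write $A=I_{N_c}\otimes R(\rho)$:

```latex
\begin{aligned}
\Sigma_{c,N_cT} &= \sigma_\epsilon^2 A + \sigma_\gamma^2\, 1_c 1_c',
  \qquad A^{-1} = I_{N_c}\otimes R^{-1}(\rho), \\
\Sigma_{c,N_cT}^{-1}
  &= \frac{1}{\sigma_\epsilon^2}A^{-1}
     - \frac{\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^4}\,A^{-1}1_c1_c'A^{-1}}
            {1+\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^2}\,1_c'A^{-1}1_c}
   = \frac{1}{\sigma_\epsilon^2}\left\{A^{-1}
     - \frac{\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^2}\,A^{-1}UA^{-1}}
            {1+\dfrac{\sigma_\gamma^2}{\sigma_\epsilon^2}\,1_c'A^{-1}1_c}\right\}, \\
1_c'A^{-1}1_c &= N_c\, 1_T'R^{-1}(\rho)1_T .
\end{aligned}
```

Under these assumptions, only the $T\times T$ matrix $R(\rho)$ has to be inverted numerically, whatever the cluster size $N_c$.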
As the optimal estimation of $\beta$ by (46) depends on the estimates of the cluster variance and longitudinal correlation parameters, we estimate these remaining parameters using an MU estimating equation approach. More specifically, by generalizing the individual-based hypothetical MM (HMM) Formula (29) under Section 3.2 to the cluster/household setup (see also [6] (Section 3.1.2)), we obtain the HMM estimating formula for $\rho_\ell$ as
$$\begin{aligned}&\text{Hypothetical Method of Moments Estimating Formula for }\rho_\ell:\ \ell=1,\ldots,t-1\\ &\hat\rho_{\ell,HMM}=\frac{1}{[1-\hat\phi_{0,HMM}]}\times\left[\frac{\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t-\ell}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{ci,u+\ell}-x_{ci,u+\ell}'\beta_{(t),N})\right]/N(t-\ell)}{\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2/Nt}-\hat\phi_{0,HMM}\right]\\ &\phantom{\hat\rho_{\ell,HMM}}=\rho_{\ell,N},\ \text{(say)},\ \text{for }\ell=1,\ldots,t-1,\end{aligned} \tag{48}$$
with $N=\sum_{c=1}^{K}N_c$, and where $\beta_{(t),N}$ is the hypothetical estimate of $\beta_{(t)}$ obtained by the formula in (46) but using cluster-based individuals’ responses up to time $t$, for $t=1,\ldots,T$. Furthermore, $\hat\phi_{0,HMM}$ in (48) is an initial estimate of $\phi$, a specialized value of $\hat\phi_{HMM}$ using $\hat\rho_\ell=0$. The formulas for $\hat\phi_{HMM}$ and $\hat\phi_{0,HMM}$ are given below in (49).
To formulate the $\phi$ estimate, notice from (47) that the $\phi$ parameter is involved in the variances and covariances of the within-cluster responses. Thus, by pooling the $F_2$-based sum of squares and sum of products and equating it to its $S_2$-based expectation, after some algebra, we obtain
$$\begin{aligned}&\text{Hypothetical Method of Moments Estimating Formula for }\phi:\\ &\tilde S=\frac{\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u,u'=1}^{t}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{ciu'}-x_{ciu'}'\beta_{(t),N})\right]+2\sum_{c=1}^{K}\sum_{i<j}^{N_c}\sum_{u,u'=1}^{t}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{cju'}-x_{cju'}'\beta_{(t),N})\right]}{\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2}\\ &\hat\phi_{HMM}=\frac{\tilde S-\frac{1}{t}\left[t+2\{(t-1)\rho_1+\cdots+2\rho_{t-2}+\rho_{t-1}\}\right]}{t\left(\sum_{c=1}^{K}N_c^2\right)/N-\frac{1}{t}\left[t+2\{(t-1)\rho_1+\cdots+2\rho_{t-2}+\rho_{t-1}\}\right]}=\phi_N,\ \text{(say)},\end{aligned} \tag{49}$$
with the initial $\phi$ estimate used in (48) as
$$\hat\phi_{0,HMM}=\frac{\tilde S-1}{t\left(\sum_{c=1}^{K}N_c^2\right)/N-1}.$$
The remaining parameter $\sigma^2=[\sigma_\gamma^2+\sigma_\epsilon^2]$, as it is a variance parameter for all individuals under all households, has its HMM formula already used in the denominators of (48) and (49). For the sake of completeness, we give its HMM formula as follows:
$$\hat\sigma^2_{HMM}=\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2/Nt=\sigma_N^2,\ \text{(say)}, \tag{50}$$
where $N=\sum_{c=1}^{K}N_c$.
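The moment formulas (48)–(50) can be sketched in a few lines once the residuals $y_{ciu}-x_{ciu}'\beta_{(t),N}$ are available. The function below is a minimal illustration under stated assumptions: the residuals are supplied directly (the toy values are invented), the estimation is done in a single pass ($\hat\phi_0$, then $\hat\rho_\ell$, then $\hat\phi$) without iteration, and the name `hmm_moments` is ours. It uses the identity that the numerator of $\tilde S$ equals the sum over clusters of the squared cluster residual total.

```python
from typing import List, Tuple

Cluster = List[List[float]]  # residuals r[i][u] for member i at time u


def hmm_moments(residuals: List[Cluster], t: int) -> Tuple[float, List[float], float, float]:
    """Return (sigma2, [rho_1..rho_{t-1}], phi0, phi) from pooled residual moments.

    Requires t >= 1 and t * sum(N_c^2) / N different from 1 (else phi0 is undefined).
    """
    N = sum(len(cluster) for cluster in residuals)
    sum_nc2 = sum(len(cluster) ** 2 for cluster in residuals)
    # Pooled sum of squares (denominator of S-tilde and of the lag ratios).
    ss = sum(r ** 2 for cluster in residuals for member in cluster for r in member[:t])
    # Within-member plus twice between-member products = (cluster total)^2.
    cross = sum(sum(sum(member[:t]) for member in cluster) ** 2 for cluster in residuals)
    s_tilde = cross / ss
    scale = t * sum_nc2 / N
    phi0 = (s_tilde - 1.0) / (scale - 1.0)          # initial estimate, all rho set to 0
    rho = []
    for lag in range(1, t):
        lagged = sum(member[u] * member[u + lag]
                     for cluster in residuals for member in cluster
                     for u in range(t - lag))
        ratio = (lagged / (N * (t - lag))) / (ss / (N * t))
        rho.append((ratio - phi0) / (1.0 - phi0))
    d = (t + 2.0 * sum((t - lag) * rho[lag - 1] for lag in range(1, t))) / t
    phi = (s_tilde - d) / (scale - d)
    sigma2 = ss / (N * t)
    return sigma2, rho, phi0, phi


# Invented residuals: two clusters, two members each, t = 2 time points.
toy = [[[2.0, 0.0], [0.0, 2.0]], [[-2.0, 0.0], [0.0, -2.0]]]
sigma2, rho, phi0, phi = hmm_moments(toy, t=2)
print(sigma2, rho, phi0, phi)
```

A sampling-weighted version would multiply each cluster's contribution to `ss`, `cross` and `lagged` by its weight; with a constant weight such as $K/k$ the weight cancels in the ratios.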

4.3. Survey Sample ($s_2^*$)-Based DU Estimating Equations

Notice that all SP parameter estimates obtained in the last section, namely $\beta_{(T),N}$, $\rho_{\ell,N}$, $\phi_N$, and $\sigma_N^2$, computed by (46), (48), (49), and (50), respectively, are FP parameters. Their real-life estimation has to be performed using the sampled data $s_2^*$ from (6), as well as the remaining covariate information available from the sampling frame. Now, to obtain DU (design unbiased) estimates for these FP parameters, we simply use the sampling weighted (SW) total for each of the FP-based total functions involved in the formulas from (46) to (50). Thus, by using the sampling weight, say $w_{c\in s_2^*}=K/k$ following (7) (which is the inverse of the inclusion probability for the $c$th cluster in the sample), we obtain the DU estimates for the aforementioned FP parameters as
$$\hat{\hat\beta}_{(T),SWGLS}=\left[\sum_{c=1}^{k}w_{c\in s_2^*}X_c'\,\Sigma^{-1}_{c,N_cT}(\cdot)\,X_c\right]^{-1}\sum_{c=1}^{k}w_{c\in s_2^*}X_c'\,\Sigma^{-1}_{c,N_cT}(\cdot)\,y_c=\hat\beta_{(T),N}; \tag{51}$$
$$\hat{\hat\rho}_{\ell,SWMM}\equiv\hat\rho_{\ell,N}=\frac{1}{1-\hat\phi_{0,SWMM}}\times\left[\frac{\sum_{c=1}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t-\ell}w_{c\in s_2^*}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{ci,u+\ell}-x_{ci,u+\ell}'\beta_{(t),N})\right]/N(t-\ell)}{\sum_{c=1}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}w_{c\in s_2^*}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2/Nt}-\hat\phi_{0,SWMM}\right] \tag{52}$$
$$\hat{\hat\phi}_{SWMM}\equiv\hat\phi_N=\frac{\hat{\tilde S}_{SWMM}-\frac{1}{t}\left[t+2\{(t-1)\rho_1+\cdots+2\rho_{t-2}+\rho_{t-1}\}\right]}{t\left(\sum_{c=1}^{K}N_c^2\right)/N-\frac{1}{t}\left[t+2\{(t-1)\rho_1+\cdots+2\rho_{t-2}+\rho_{t-1}\}\right]} \tag{53}$$
$$\hat{\hat\sigma}^2_{SWMM}=\sum_{c=1}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}w_{c\in s_2^*}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2/Nt=\hat\sigma_N^2, \tag{54}$$
where in (54)
$$\hat{\tilde S}_{SWMM}=\frac{\sum_{c=1}^{k}\sum_{i=1}^{N_c}\sum_{u,u'=1}^{t}w_{c\in s_2^*}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{ciu'}-x_{ciu'}'\beta_{(t),N})\right]+2\sum_{c=1}^{k}\sum_{i<j}^{N_c}\sum_{u,u'=1}^{t}w_{c\in s_2^*}\left[(y_{ciu}-x_{ciu}'\beta_{(t),N})(y_{cju'}-x_{cju'}'\beta_{(t),N})\right]}{\sum_{c=1}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}w_{c\in s_2^*}\left[y_{ciu}-x_{ciu}'\beta_{(t),N}\right]^2},$$
and in (53)
$$\hat\phi_{0,SWMM}=\frac{\hat{\tilde S}_{SWMM}-1}{t\left(\sum_{c=1}^{K}N_c^2\right)/N-1}.$$

4.4. Formulation of the DCMU Prediction Functions Using SSCLS Sample

As $F_2$ is unknown, for the prediction of the $F_2$ total at a given time $t$, we first write its LCT (longitudinal cumulative total) and split this total in terms of sampled (ss) and non-sampled (ns) response totals as
$$\tau_{y(t)}=\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}=\sum_{c\in s_2^*}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}\ (ss)+\sum_{c\notin s_2^*}^{K-k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}\ (ns),\quad\text{for } t=1,\ldots,T, \tag{55}$$
which is similar to, but different from, the LCT split in (9) based on the SSILS sample. By following the same technique used in the SSILS setup, more specifically by following (14) and (13) from Section 3, we write the final DAMB and DCMU predictors for the cumulative and marginal totals as follows:
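$$\text{DAMB Prediction Function Estimator:}\quad \hat{\hat\tau}_{y(t)}=\sum_{c\in s_2^*}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}+\sum_{c\notin s_2^*}^{K-k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}\hat E_{s_2^*\subset F_2\subset S_2}(y_{ciu}) \tag{56}$$
$$=\sum_{c\in s_2^*}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}+\sum_{c\notin s_2^*}^{K-k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}x_{ciu}'\hat\beta_{(t),N}(\hat\rho_N,\hat\phi_N) \tag{57}$$
$$\text{DAMB Marginal Prediction at Time } t:\quad \hat\tau^{*}_{y(1)}=\hat{\hat\tau}_{y(1)};\quad \hat\tau^{*}_{y(t)}=\left[\hat{\hat\tau}_{y(t)}-\hat{\hat\tau}_{y(t-1)}\right]\ \text{for } t=2,\ldots,T. \tag{58}$$
$$\text{DCMU Prediction Function Estimator:}\quad \hat{\tilde\tau}_{y(t)}=\sum_{c\in s_2^*}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}+\frac{K-k}{K}\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}\hat E_{s_2^*\subset F_2\subset S_2}(y_{ciu}) \tag{59}$$
$$=\sum_{c\in s_2^*}^{k}\sum_{i=1}^{N_c}\sum_{u=1}^{t}y_{ciu}+\frac{K-k}{K}\sum_{c=1}^{K}\sum_{i=1}^{N_c}\sum_{u=1}^{t}x_{ciu}'\hat\beta_{(t),N}(\hat\rho_N,\hat\phi_N) \tag{60}$$
$$\text{DCMU Marginal Prediction at Time } t:\quad \tilde\tau^{*}_{y(1)}=\hat{\tilde\tau}_{y(1)};\quad \tilde\tau^{*}_{y(t)}=\left[\hat{\tilde\tau}_{y(t)}-\hat{\tilde\tau}_{y(t-1)}\right]\ \text{for } t=2,\ldots,T. \tag{61}$$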
Notice that $\hat\beta_{(t),N}(\hat\rho_N,\hat\phi_N)$ in (57) and (60) is a DCMU estimate of $\beta$, as in (51), which is computed based on the SSCLS sample $s_2^*$ from (6). As this estimate depends on the estimates of the longitudinal correlations $\rho_\ell$ and the cluster correlation $\phi$, these latter estimates were computed step by step as in (52) and (53), respectively.

5. Prediction Comparison Using Simulation Results

Our objective in this section is to examine the finite sampling performance of the proposed DCMU and DAMB prediction estimators for the FP totals using both SSILS and SSCLS survey data. The precise formulas for these predictors are developed in Section 3.3.2 and Section 4.4 based on the SSILS and SSCLS samples, respectively. As a criterion to understand the performance of the parameter estimators, we checked the amount of bias of an estimator from its true value. However, because a large bias combined with a small standard error indicates the worst performance of an estimator, we use the percentage relative bias (see (63) below under Section 5.1), which expresses the bias relative to the standard error, as the criterion to compare the performance of the DCMU and DAMB total predictors.
We now proceed to the simulation studies, first for SSILS sample-based prediction and then for SSCLS sample-based prediction. Details, including how the data are generated and how the estimators are obtained to compute the prediction functions, are given in Section 5.1 using the SSILS sample and in Section 5.2 using the SSCLS sample.

5.1. Simulation Study 1: Prediction Performance Using Individual-Based Longitudinal Survey Sample

Recall from Section 3 that, even though the longitudinal responses from the individuals in $F_1$ (1) are hypothetical, in a model-based approach, it is assumed that the repeated responses of an individual are likely to follow an auto-correlation structure given in (15)–(17). In this section, we conduct a simulation study to examine first the performance of a longitudinal correlation-based SWGLS (sampling weighted GLS) approach (31) in estimating the $F_1$ regression parameters over time by using the survey sample $s_1^*$ given by (3) and (4). We then examine the performance of two competing predictors, namely the DAMB (design-assisted model-based) and DCMU (design cum model unbiased) predictors given by (36)–(39), for FP ($F_1$) totals over time. We remark that these predictors are formulated using DCMU estimates for the SP ($S_1$) regression parameters involved in the prediction functions.

5.1.1. Simulation Design in Steps (S1–S7)

S1. 
We specify the SP ($S_1$) model (15) with a set of regression parameters as $\beta_0=1.0$, $\beta_1=0.5$, $\beta_2=0.2$, and $\beta_3=0.5$, and its longitudinal correlation structure (17) based on $T=4$ time periods involving lag correlations $\rho_1$, $\rho_2$, and $\rho_3$.
S2. 
We consider three widely used correlation structures, namely AR(1) ($\rho=0.7$), MA(1) ($\rho=0.4$), and EQC ($\rho=0.4$). As our estimation method is not model-dependent, we estimate the lag correlations $\rho_1$, $\rho_2$, and $\rho_3$ irrespective of the model. For example, when data are generated under a true model, say AR(1) ($\rho=0.7$), estimates of lag correlations are obtained for the true SP lag correlations $\rho_1=\rho=0.7$, $\rho_2=0.49$, and $\rho_3=0.34$, and so on.
S3. 
We consider $N=1000$ household leaders with responses on household annual electricity consumption for $T=4$ years, in the FP ($F_1$) (1), and their four time-independent household-related covariates, with $x_2$ as the size of the household and $(x_3,x_4)\in[(0,0)\ \text{or}\ (1,0)\ \text{or}\ (0,1)]$ as two categorical covariates representing three household income levels, more specifically with
$$\begin{aligned}x_{i1}&=1\ \text{(an intercept covariate) for } i=1,\ldots,1000\\ x_{i2}&=\begin{cases}1&\text{for } i=1,\ldots,100\\ 2&\text{for } i=101,\ldots,500\\ 3&\text{for } i=501,\ldots,700\\ 4&\text{for } i=701,\ldots,1000\end{cases}\\ (x_{i3},x_{i4})&=\begin{cases}(0,0)&\text{for } i=1,\ldots,50;\ 101,\ldots,150;\ 501,\ldots,550;\ 701,\ldots,750\\ (1,0)&\text{for } i=51,\ldots,75;\ 151,\ldots,450;\ 551,\ldots,650;\ 751,\ldots,950\\ (0,1)&\text{for } i=76,\ldots,100;\ 451,\ldots,500;\ 651,\ldots,700;\ 951,\ldots,1000\end{cases}\end{aligned} \tag{62}$$
S4. 
Using $\sigma_\epsilon^2=1$, and the parameter values, correlation structures, and covariates from steps 1 to 3 above, we generate, for a given $t$ ($t=1,\ldots,T$), the longitudinal responses $\{y_{iu},\ u=1,\ldots,t;\ i=1,\ldots,N\}$.
S5. 
We then choose a sample $s_1^*$ (3) of size $n=100$ households from the $F_1$ of size $N=1000$, using the SRSWOR sampling design, as in (4), along with their responses and covariates $\{(y_{iu},x_{iu}),\ u=1,\ldots,t;\ i=1,\ldots,n\}$. The covariate values for the non-sampled individuals are assumed to be known from an underlying sampling frame.
S6. 
Finally, the sample $s_1^*$ from step 5 is used to compute the SWGLS estimate of $\beta_{(t),N}$ (FP parameter), namely $\hat\beta_{(t),N}$ by (31), and the lag correlation estimate $\hat\rho_{\ell,N}$ for $\rho_{\ell,N}$ (FP parameter) by (35). These estimates, more specifically $\hat\beta_{(t),N}$, are then used to compute the marginal (at a given time $t$) total prediction estimates $\hat\tau^{*}_{(t)}$ by (37) and $\tilde\tau^{*}_{(t)}$ by (39). The percentage relative biases (PRB), namely
$$PRB(\hat\tau^{*}_{(t)})=\frac{|\hat\tau^{*}_{(t)}-\tau_{(t)}|}{s.e.(\hat\tau^{*}_{(t)})}\times 100;\qquad PRB(\tilde\tau^{*}_{(t)})=\frac{|\tilde\tau^{*}_{(t)}-\tau_{(t)}|}{s.e.(\tilde\tau^{*}_{(t)})}\times 100, \tag{63}$$
are also computed.
S7. 
We then repeat steps 5 and 6 25 times and compute the simulation average of the regression and correlation estimates under three correlation structures, AR(1), MA(1), and EQC, which are reported in Table 1, Table 2 and Table 3, respectively. The simulation averages of the prediction estimates, along with their percentage relative biases (PRBs), are reported in Table 4.
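Parts of steps S2, S3 and S5 above can be sketched as follows: the covariate layout in (62), the AR(1) lag correlations $\rho_\ell=\rho^\ell$, and one SRSWOR draw with its sampling weight $N/n$. This is only an illustration; the helper names, the random seed and the use of Python's `random.sample` for SRSWOR are our assumptions, and the response generation of step S4 is not reproduced.

```python
import random
from collections import Counter

N, n, T = 1000, 100, 4


def household_size(i: int) -> int:
    """x_{i2} in (62) for the 1-based household label i."""
    if i <= 100:
        return 1
    if i <= 500:
        return 2
    if i <= 700:
        return 3
    return 4


def income_dummies(i: int) -> tuple:
    """(x_{i3}, x_{i4}) in (62) for the 1-based household label i."""
    low_blocks = [(1, 50), (101, 150), (501, 550), (701, 750)]
    mid_blocks = [(51, 75), (151, 450), (551, 650), (751, 950)]
    if any(lo <= i <= hi for lo, hi in low_blocks):
        return (0, 0)
    if any(lo <= i <= hi for lo, hi in mid_blocks):
        return (1, 0)
    return (0, 1)  # remaining labels: 76-100, 451-500, 651-700, 951-1000


# Covariate vector (x1, x2, x3, x4) per household; x1 = 1 is the intercept.
covariates = {i: (1, household_size(i)) + income_dummies(i) for i in range(1, N + 1)}

# AR(1) lag correlations for rho = 0.7: rho_1, rho_2, rho_3.
rho = 0.7
ar1_lags = [rho ** lag for lag in range(1, T)]

# One SRSWOR draw of n households and the common sampling weight N / n.
rng = random.Random(2025)  # arbitrary seed, for reproducibility of the sketch only
sample_labels = sorted(rng.sample(range(1, N + 1), n))
weight = N / n

size_counts = Counter(x[1] for x in covariates.values())
income_counts = Counter(x[2:] for x in covariates.values())
print(size_counts, income_counts, [round(r, 3) for r in ar1_lags], weight)
```

Repeating the SRSWOR draw (and the estimation that follows it) 25 times and averaging would mirror step S7.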

5.1.2. Simulated Prediction Performance for Marginal FP Totals

Note that, as the FP total predictors given by (36) and (38) depend on the sample ($s_1^*$)-based estimates $\hat\beta_{(t),N}$ (31) for the SP regression parameters $\beta_{(t)}\equiv\beta$ for $t=1,\ldots,T$, it is important to examine the performance of this estimator $\hat\beta_{(t),N}$ in estimating the FP parameter $\beta_{(t),N}$ given by (27) corresponding to $\beta_{(t)}$. Furthermore, as $\beta_{(t),N}$ in (27) depends on the auto-correlation structure $R(\rho)$ in (26) up to time $t$, for $t=1,\ldots,T$, we first exploit the FP generated by (15)–(17) to compute the FP regression parameters $\beta_{(t),N}$ by (27), as well as the FP lag correlation parameters $\rho_{\ell,N}$ by (29). These FP parameters $\beta_{(t),N}$ corresponding to the SP parameter $\beta$, and the FP correlation parameters $\rho_{\ell,N}$ corresponding to the SP correlation parameters $\rho_\ell$ for $t=1,\ldots,4$, under all three auto-correlation models, namely the AR(1), MA(1), and EQC models, are displayed in the upper halves of Table 1, Table 2, and Table 3, respectively. For example, Table 1, based on the AR(1) model, shows the four-dimensional (using four covariates) $\beta_{(t),N}$ values at $t=4$ as
$$\beta_{(4),N}\equiv[1.0491,\ 0.5007,\ 0.1581,\ 0.4468]$$
corresponding to the SP regression parameters
$$\beta\equiv[1.0,\ 0.5,\ 0.2,\ 0.5].$$
The FP and SP lag correlations may be interpreted similarly. Next, because the sample $s_1^*$-based (see step 5 in the last sub-section) estimation of $\beta$ amounts to the estimation of $\beta_{(t),N}$, the final estimates $\hat\beta_{(t),N}$ computed by (31) (each a simulation average based on 25 repetitions) under the AR(1), MA(1), and EQC processes are displayed in the lower halves of Table 1, Table 2, and Table 3, respectively. For example, the aforementioned $\beta_{(4),N}$ under the AR(1) process is estimated as
$$\hat\beta_{(4),N}\equiv[1.026,\ 0.514,\ 0.134,\ 0.394].$$
This and the other similar estimates from Table 2 and Table 3 appear to perform well, reflecting their design unbiasedness for the FP regression parameters $\beta_{(t),N}$.
Next, as indicated in Step 6 in the last sub-section, the aforementioned sample-based regression estimates $\hat\beta_{(t),N}$ are used to compute the design-assisted model-based (DAMB), as well as design cum model-based (DCMB), marginal (at a given time $t$) total predictors $\hat\tau^{*}_{(t)}$ (37) and $\tilde\tau^{*}_{(t)}$ (39), for predicting/estimating the marginal total $\tau_{(t)}=\sum_{i=1}^{N}y_{it}$. The results from Table 4 show that these two predictors perform almost the same, and the estimates are very close to the true FP totals. For example, under the AR(1) process (displayed in the extreme left block), the FP total $\tau_{(t)}$ at time $t=2$, i.e., $\tau_{(2)}=2574.4$, is predicted by the DAMB predictor as $\hat\tau^{*}_{(2)}=2559.1$ with a PRB of $27.3$, and by the DCMB predictor as $\tilde\tau^{*}_{(2)}=2559.1$ with a slightly different PRB of $27.6$. Thus, they perform almost the same for the targeted prediction. Their equivalent performances under the MA(1) and EQC structures can be interpreted similarly.
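The PRB criterion in (63) can be sketched as below. Two points are assumptions on our part, since the text does not spell them out: the estimate entering the numerator is taken as the simulation average over the replications, and the standard error is taken as the simulation standard deviation of the replicated estimates. The numbers in the toy call are invented.

```python
from statistics import mean, stdev
from typing import Sequence


def percentage_relative_bias(estimates: Sequence[float], truth: float) -> float:
    """|simulation mean - truth| / s.e. * 100, with s.e. as the simulation std. dev."""
    return abs(mean(estimates) - truth) / stdev(estimates) * 100.0


# Invented replicated predictions of a total whose true value is 101.
toy_estimates = [98.0, 100.0, 102.0]
prb = percentage_relative_bias(toy_estimates, truth=101.0)
print(prb)
```

Read this way, the reported AR(1) values at $t=2$ ($\tau_{(2)}=2574.4$, $\hat\tau^{*}_{(2)}=2559.1$, PRB $=27.3$) would correspond to a standard error of roughly $15.3/0.273\approx 56$.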

5.2. Simulation Study 2: Prediction Performance Using Cluster-Based Longitudinal Survey Sample

In the last section, we examined the prediction performances using the individual-based longitudinal survey sample. As a generalization, we now examine the behavior of the cluster-based FP total predictors developed in (56)–(61) over the longitudinal period of the study. Notice that, as opposed to the individual-based FP ($F_1$) (1), a cluster-based FP ($F_2$) is given by (5) with its SP model $S_2$ given by (40)–(42). The $N_c$ individuals under the $c$th cluster are now correlated with the cluster correlation coefficient $\phi=\sigma_\gamma^2/\sigma^2$ defined in (45). As an illustration in the present simulation, we have generated a cluster-correlated population and, hence, the sample with $\phi=0.33$. As far as longitudinal correlations are concerned, we use the widely used AR(1) structure with $\rho=0.5$, leading to three lag-dependent correlation coefficients $\rho_1=0.50$, $\rho_2=0.25$, $\rho_3=0.125$, under the total time period $T=4$.
We now construct our hypothetical cluster-based $F_2$ as follows.
  • Let $F_2$ consist of $K=275$ independent clusters/families. We label these clusters in sequence from 1 to 275. Following the notation from (40), we consider four different family structures (FS1 to FS4) with their family/cluster sizes ($N_c$) as follows:
  • FS1: $N_c=4$, $c=1,\ldots,175$, each with 2 parents (say, father (F) and mother (M)) and 2 children (C1 and C2);
  • FS2: $N_c=3$, $c=176,\ldots,225$, each with 2 parents (F and M) and 1 child (C1);
  • FS3: $N_c=3$, $c=226,\ldots,250$, each with 1 parent (F) and 2 children (C1 and C2);
  • FS4: $N_c=3$, $c=251,\ldots,275$, each with 1 parent (M) and 2 children (C1 and C2).
As far as the covariates are concerned, we consider three covariates, namely age ($x_{cit,1}\equiv x_1$), smoking status of the individual member at the initial time point ($x_{cit,2}\equiv x_2$), and gender ($x_{cit,3}\equiv x_3$). We explain below how we generated the covariates under FS1. The covariates under the remaining FS2–FS4 are generated similarly.
1.
Generation of $x_1$ for FS1: (a) For the father’s age, we generated 175 ages from a uniform distribution with range 50–60. (b) For the mother’s age, in sequence (following the father’s label), we generated one age difference indicator value, say $d_a$, randomly from seven different age difference indicators ($d_a\in[-2,-1,0,1,2,3,4]$), and computed the selected mother’s age as $x_1(M)=x_1(F)-d_a$. (c) For the age of C1, we used $x_1(C1)=x_1(\text{of the younger between F and M})-d_a$, where now $d_a$ was chosen as a randomly selected value from the set of age difference values $[20,21,22,23,24,25]$. (d) For the age of C2 (corresponding to C1), we used the formula $x_1(C2)=x_1(C1)-d_a$, with $d_a$ as a randomly selected value from the set of age difference values $[1,2,3,4]$.
2.
Generation of $x_2$ for FS1: Smoking habits for the members were determined using the binary distribution with smoking probability $\pi$, say. More specifically, we used
$$x_2(F)\sim\text{bin}(\pi=0.5);\quad x_2(M)\sim\text{bin}(\pi=0.5);\quad x_2(C1)\sim\text{bin}(\pi=0.1);\quad x_2(C2)\sim\text{bin}(\pi=0.05).$$
3.
Generation of $x_3$ for FS1: We considered $x_3(F)=1.0$, $x_3(M)=0$, and to determine gender for both C1 and C2, we used $x_3(C1),\ x_3(C2)\sim\text{bin}(\pi=0.5)$.
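The three generation steps above can be sketched for the FS1 families as follows. This is a hedged illustration: the function name, the seed, the use of continuous uniform ages (the text does not say whether ages are rounded), and the independent draws of gender for C1 and C2 are assumptions on our part.

```python
import random
from typing import Dict, Tuple

Member = Tuple[float, int, int]  # (age x1, smoking status x2, gender x3)


def generate_fs1_family(rng: random.Random) -> Dict[str, Member]:
    """Covariates for father (F), mother (M) and children (C1, C2) of one FS1 family."""

    def bernoulli(p: float) -> int:
        return 1 if rng.random() < p else 0

    age_f = rng.uniform(50.0, 60.0)
    age_m = age_f - rng.choice([-2, -1, 0, 1, 2, 3, 4])          # mother's age offset
    age_c1 = min(age_f, age_m) - rng.choice([20, 21, 22, 23, 24, 25])
    age_c2 = age_c1 - rng.choice([1, 2, 3, 4])
    return {
        "F": (age_f, bernoulli(0.5), 1),                # x3(F) = 1
        "M": (age_m, bernoulli(0.5), 0),                # x3(M) = 0
        "C1": (age_c1, bernoulli(0.1), bernoulli(0.5)),
        "C2": (age_c2, bernoulli(0.05), bernoulli(0.5)),
    }


rng = random.Random(42)  # arbitrary seed, for reproducibility of the sketch only
families = [generate_fs1_family(rng) for _ in range(175)]
print(families[0])
```

For FS2–FS4 the same steps would be applied to the members actually present in the family.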
Based on the aforementioned covariate values and using their effects as $\beta_0\ (\text{intercept})=1.0$, $\beta_1\ (\text{age effect})=0.5$, $\beta_2\ (\text{smoking effect})=0.2$, $\beta_3\ (\text{gender effect})=0.5$; and further using the cluster correlation $\phi=0.33$ and the AR(1) longitudinal correlation process with $\rho=0.5$, the $F_2$ responses $\{y_{cit}\}$, namely body mass index (bmi) over a longitudinal period $T=4$ (equivalent to, say, 2 years), were generated using the SP ($S_2$) correlation model (40). Before one can examine the prediction performance of the DAMB (56)–(58) and DCMU (59)–(61) marginal predictors at a given time, it is necessary to first compute the FP data-based regression estimates (FPRE) and then the survey sample ($s_2^*$) ((51)–(53))-based regression estimates (SSRE) after accommodating the cluster correlations (indexed by $\phi$) and the AR(1) longitudinal correlations indexed by $\rho\equiv(\rho_1,\rho_2,\rho_3)$.
As far as the SS $s_2^*$ (6) is concerned, we use SRSWOR and choose 32 families from the 175 families under FS1; 12 families from the 50 families under FS2; 6 families from the 25 families under FS3; and 6 other families from the 25 families under FS4. Thus, altogether, we choose $k=56$ clusters/families with sample size $n=200$ from the $K=275$ clusters under the FP $F_2$ of size $N=1000$. The SP parameter estimates (i.e., the FP correlation and regression parameters) obtained by using the estimating Equations (49) (for the cluster correlation $\phi$), (48) (for the longitudinal correlations), and (46) (for the regression parameter estimates) from Section 4.2, and their corresponding sample ($s_2^*$)-based estimates computed by solving the SS-based estimating Equations (51)–(53) from Section 4.3, are provided in Table 5. Sampling was repeated 25 times to compute the sample-based parameter estimates. All sample-based estimates appear to be close to the FP-based estimates. In general, cluster-correlation estimates appear to work well when $T$ is small, as more clusters cause more variation over the longitudinal period. However, the main regression parameter estimates, shown in the bottom rectangular box in Table 5, are not negatively affected by this slight difference in correlation estimates.
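The sample allocation described above can be checked, and one draw of clusters reproduced, with the short sketch below. The tuple layout, variable names and seed are illustrative assumptions; the per-structure SRSWOR draw uses `random.sample`, and the weight shown is the overall $K/k$ mentioned in Section 4.3.

```python
import random

# (family structure, cluster size N_c, first label, last label, clusters sampled)
design = [
    ("FS1", 4, 1, 175, 32),
    ("FS2", 3, 176, 225, 12),
    ("FS3", 3, 226, 250, 6),
    ("FS4", 3, 251, 275, 6),
]

K = sum(last - first + 1 for _, _, first, last, _ in design)          # clusters in F2
k = sum(m for *_, m in design)                                        # sampled clusters
N = sum(size * (last - first + 1) for _, size, first, last, _ in design)
n = sum(size * m for _, size, _, _, m in design)                      # sampled individuals

rng = random.Random(7)  # arbitrary seed, for reproducibility of the sketch only
sampled_clusters = []
for _, _, first, last, m in design:
    # SRSWOR of m cluster labels within this family structure.
    sampled_clusters.extend(sorted(rng.sample(range(first, last + 1), m)))

weight = K / k
print(K, k, N, n, round(weight, 3))
```

The totals agree with the figures quoted in the text: 56 sampled families containing 200 individuals, out of 275 families containing 1000 individuals.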
Finally, the regression parameter estimates, both FP- and sample-based from Table 5, are used to compute the DAMB and DCMU prediction estimates by using (57) and (60), respectively. These estimates, along with the actual FP totals at all time points, are displayed in Table 6. The DCMU predictions shown in column 3 appear to have a smaller PRB (percentage relative bias) than the DAMB predictions exhibited in column 1 for $T=1$, 2, and 3, showing the relative superiority of the DCMU predictions over the DAMB predictions in the single-stage cluster setup.

6. Discussion and Concluding Remarks

In a finite population (FP) setup, the prediction of an FP total is a difficult problem, as one is required to predict the non-sampled response total well, which is customarily performed by replacing such a non-sampled total with its model-based expectation estimate computed from a survey sample. Following the estimating function approach [21] for independent data, the super-population (SP) model-based (for FP data) regression parameters involved in the model-based expectation (equivalently in the prediction function) may be estimated based on the survey sample using a sampling weighted OLS (ordinary least squares) (SWOLS) estimator. This SWOLS is DCMU (design cum model unbiased) for the SP regression parameter, and is DU (design unbiased) for the FP parameter [20] corresponding to the SP parameter. However, in this paper, we have considered an FP with independent individuals or households, where each individual or cluster/household member provides a set of longitudinally correlated responses, the members in a household being structurally cluster correlated. Clearly, as opposed to an SP regression model for independent data, in the proposed setup, one requires an SP correlation model, more specifically, longitudinal correlation and combined cluster-longitudinal correlation models. Some existing studies use a so-called ‘working’ correlation model (e.g., [7] (Section 7.4), [8]), for example an unstructured or standard Pearson correlation model, whereas [5] has used a random effects-based mixed model to accommodate the longitudinal correlations. However, as explained in Section 3.2, these models fail to accommodate the time effects on the correlations. More specifically, these models fail to produce a correlation structure with decaying correlations as the time lag between two repeated responses increases. As a remedy, we have considered lag-based correlation models following [11] (Section 3), for example, to incorporate the time effects on the correlations.
Also, in a cluster-based longitudinal setup, we have generalized this lag-based correlation model to a dynamic mixed-model setup where, conditionally, on the cluster random effects, the repeated responses follow a lag-based correlation structure.
The aforementioned correlation structures and their sample-based estimates are discussed in detail, and it is demonstrated how to obtain DCMU (design cum model unbiased) estimators, namely, the SWGLS (sampling weighted GLS) estimators for the regression parameters after accommodating the longitudinal or cluster-longitudinal correlations. Subsequently, such DCMU regression estimators are used to develop design-assisted model-based (DAMB) and DCMB (design cum model-based) prediction functions. Also, the relative performance of these DAMB as well as DCMU predictors for FP total estimation is examined both theoretically and numerically.
In conclusion, this paper advances longitudinal survey data analysis for both individual-based and cluster-based FPs. More specifically, it is demonstrated that the DCMU estimation of the parameters involved in a prediction function provides DCMU-valid prediction for the FP total. The step-by-step development of the estimation and prediction methods should be useful to practitioners from statistical agencies such as Statistics and Health, Canada, and the Bureau of Statistics, USA, or similar organizations in other countries.

Author Contributions

Conceptualization, B.C.S.; Methodology, B.C.S.; Software, A.M.V.; Formal analysis, A.M.V.; Investigation, A.M.V. and B.C.S.; Writing—Original draft, A.M.V. and B.C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors would like to thank the three reviewers for their comments and suggestions, which led to the improvement of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Binder, D. Longitudinal surveys: Why are these surveys different from all other surveys? Surv. Methodol. 1998, 24, 101–108.
  2. Lynn, P. Methods for longitudinal surveys. In Methodology of Longitudinal Surveys; Lynn, P., Ed.; John Wiley and Sons: Hoboken, NJ, USA, 2009; pp. 1–18.
  3. Smith, P.W.F.; Berrington, A.; Sturgis, P. A comparison of graphical models and structural equation models for the analysis of longitudinal survey data. In Methodology of Longitudinal Surveys; Lynn, P., Ed.; John Wiley and Sons: Hoboken, NJ, USA, 2009; pp. 381–391.
  4. Thompson, M.E. Using longitudinal complex survey data. Annu. Rev. Stat. Appl. 2015, 2, 305–320.
  5. Skinner, C.J.; de Toledo Vieira, M. Variance estimation in the analysis of clustered longitudinal survey data. Surv. Methodol. 2007, 33, 3–12.
  6. Sutradhar, B.C. Dynamic Mixed Models for Familial Longitudinal Data; Springer: New York, NY, USA, 2011.
  7. Wu, C.; Thompson, M.E. Sampling Theory and Practice; Springer Nature: Cham, Switzerland, 2020.
  8. Liang, K.Y.; Zeger, S.L. Longitudinal data analysis using generalized linear models. Biometrika 1986, 73, 13–22.
  9. Roberts, G.; Ren, Q.; Rao, J.N.K. Using marginal mean models for data from longitudinal surveys with a complex design: Some advances in methods. In Methodology of Longitudinal Surveys; Lynn, P., Ed.; John Wiley and Sons: Hoboken, NJ, USA, 2009; pp. 351–366.
  10. Sutradhar, B.C. Longitudinal Categorical Data Analysis; Springer: New York, NY, USA, 2014.
  11. Sutradhar, B.C.; Das, K. On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 1999, 86, 459–465.
  12. Bellhouse, D.R. Model-based estimation in finite population sampling. Am. Stat. 1987, 41, 260–262.
  13. Prasad, N.G.N.; Rao, J.N.K. The estimation of the mean squared error of small-area estimators. J. Am. Stat. Assoc. 1990, 85, 163–171.
  14. Valliant, R.; Dorfman, A.H.; Royall, R.M. Finite Population Sampling and Inference: A Prediction Approach; John Wiley and Sons, Inc.: New York, NY, USA, 2000.
  15. Melville, G.J.; Welsh, A.H. Model-based prediction in ecological surveys including those with incomplete detection. Aust. N. Z. J. Stat. 2014, 56, 257–281.
  16. Sutradhar, B.C. Doubly weighted estimation approach for linear regression analysis with two-stage cluster samples. Sankhya B Indian J. Stat. 2024, 86, 55–90.
  17. Kennel, T.L.; Valliant, R. Robust variance estimators for generalized regression estimators in cluster samples. Surv. Methodol. 2019, 45, 427–450.
  18. Jowaheer, V.; Sutradhar, B.C. Analyzing longitudinal count data with overdispersion. Biometrika 2002, 89, 389–399.
  19. Thall, P.F.; Vail, S.C. Some covariance models for longitudinal count data with overdispersion. Biometrics 1990, 46, 657–671.
  20. Binder, D. On the variances of asymptotically normal estimators from complex surveys. Int. Stat. Rev. 1983, 51, 279–292.
  21. Godambe, V.P.; Thompson, M.E. Parameters of super-population and survey population: Their relationships and estimation. Int. Stat. Rev. 1986, 54, 127–138.
  22. Royall, R.M. The linear least-squares prediction approach to two-stage sampling. J. Am. Stat. Assoc. 1976, 71, 657–664.
  23. Valliant, R. Generalized variance functions in stratified two-stage sampling. J. Am. Stat. Assoc. 1987, 82, 499–508.
  24. Isaki, C.T.; Fuller, W.A. Survey design under the regression super-population model. J. Am. Stat. Assoc. 1982, 77, 89–96.
  25. Scott, A.J.; Holt, D. The effect of two-stage sampling on ordinary least squares methods. J. Am. Stat. Assoc. 1982, 77, 848–854.
Table 1. AR(1) ($\rho = 0.7$) correlation structure-based $F_1$ regression parameters and their SWGLS (sampling weighted GLS) estimates along with standard errors (given in parentheses) using 25 samples each ($s_1^*$) of size $n = 100$ chosen from the $F_1$ of size $N = 1000$.

Super-Population Regression Parameters and Super-Population Lag Correlation

| Time | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\rho_1$ | $\rho_2$ | $\rho_3$ |
|---|---|---|---|---|---|---|---|
| - | 1 | 0.5 | 0.2 | 0.5 | 0.70 | 0.49 | 0.34 |

Finite Population Regression Parameters and Finite Population Lag Correlation

| Time | $\beta_{0,N}$ | $\beta_{1,N}$ | $\beta_{2,N}$ | $\beta_{3,N}$ | $\rho_{1,N}$ | $\rho_{2,N}$ | $\rho_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 1.0491 | 0.5007 | 0.1581 | 0.4468 | 0.7082 | 0.4924 | 0.3509 |
| 3 | 1.0402 | 0.5059 | 0.1573 | 0.4246 | 0.6972 | 0.4908 | - |
| 2 | 1.0252 | 0.5038 | 0.1906 | 0.4418 | 0.6840 | - | - |
| 1 | 0.9902 | 0.5098 | 0.2246 | 0.4712 | - | - | - |

Sample Estimate of Regression Parameters and Sample Estimate of Lag Correlation

| Time | $\hat{\beta}_{0,N}$ | $\hat{\beta}_{1,N}$ | $\hat{\beta}_{2,N}$ | $\hat{\beta}_{3,N}$ | $\hat{\rho}_{1,N}$ | $\hat{\rho}_{2,N}$ | $\hat{\rho}_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 1.0262 | 0.5142 | 0.1336 | 0.3941 | 0.6827 | 0.4745 | 0.3265 |
|   | (0.1980) | (0.0679) | (0.1396) | (0.2120) | (0.0391) | (0.0534) | (0.0519) |
| 3 | 1.0254 | 0.5171 | 0.1328 | 0.3568 | 0.6785 | 0.4760 | - |
|   | (0.1965) | (0.0700) | (0.1500) | (0.2087) | (0.0475) | (0.0635) | - |
| 2 | 1.0073 | 0.5139 | 0.1737 | 0.3785 | 0.6697 | - | - |
|   | (0.1889) | (0.0715) | (0.1568) | (0.2082) | (0.0472) | - | - |
| 1 | 0.9610 | 0.5239 | 0.2062 | 0.4323 | - | - | - |
|   | (0.1789) | (0.0673) | (0.1708) | (0.2244) | - | - | - |
Table 2. MA(1) ($\rho = 0.4$) correlation structure-based $F_1$ regression parameters and their SWGLS (sampling weighted GLS) estimates along with standard errors (given in parentheses) using 25 samples each ($s_1^*$) of size $n = 100$ chosen from the $F_1$ of size $N = 1000$.

Super-Population Regression Parameters and Super-Population Lag Correlation

| Time | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\rho_1$ | $\rho_2$ | $\rho_3$ |
|---|---|---|---|---|---|---|---|
| - | 1 | 0.5 | 0.2 | 0.5 | 0.4 | 0 | 0 |

Finite Population Regression Parameters and Finite Population Lag Correlation

| Time | $\beta_{0,N}$ | $\beta_{1,N}$ | $\beta_{2,N}$ | $\beta_{3,N}$ | $\rho_{1,N}$ | $\rho_{2,N}$ | $\rho_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 0.9595 | 0.4987 | 0.2387 | 0.5495 | 0.4103 | 0.0061 | 0.0293 |
| 3 | 0.9793 | 0.4955 | 0.2136 | 0.5438 | 0.3815 | −0.0291 | - |
| 2 | 0.9904 | 0.4923 | 0.2057 | 0.5693 | 0.3643 | - | - |
| 1 | 1.0011 | 0.4742 | 0.2573 | 0.6486 | - | - | - |

Sample Estimate of Regression Parameters and Sample Estimate of Lag Correlation

| Time | $\hat{\beta}_{0,N}$ | $\hat{\beta}_{1,N}$ | $\hat{\beta}_{2,N}$ | $\hat{\beta}_{3,N}$ | $\hat{\rho}_{1,N}$ | $\hat{\rho}_{2,N}$ | $\hat{\rho}_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 0.9752 | 0.4891 | 0.2566 | 0.5945 | 0.3824 | −0.0147 | −0.0182 |
|   | (0.1570) | (0.0535) | (0.1092) | (0.1617) | (0.0459) | (0.0734) | (0.1011) |
| 3 | 0.9991 | 0.4846 | 0.2308 | 0.5887 | 0.3560 | −0.0557 | - |
|   | (0.1482) | (0.0555) | (0.1242) | (0.1702) | (0.0651) | (0.1165) | - |
| 2 | 0.9927 | 0.4886 | 0.2162 | 0.6356 | 0.3229 | - | - |
|   | (0.1728) | (0.0624) | (0.1482) | (0.1788) | (0.0868) | - | - |
| 1 | 0.9669 | 0.4795 | 0.2742 | 0.7516 | - | - | - |
|   | (0.2668) | (0.0762) | (0.2120) | (0.2339) | - | - | - |
Table 3. EQ ($\rho = 0.4$) correlation structure-based $F_1$ regression parameters and their SWGLS (sampling weighted GLS) estimates along with standard errors (given in parentheses) using 25 samples each ($s_1^*$) of size $n = 100$ chosen from the $F_1$ of size $N = 1000$.

Super-Population Regression Parameters and Super-Population Lag Correlation

| Time | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\rho_1$ | $\rho_2$ | $\rho_3$ |
|---|---|---|---|---|---|---|---|
| - | 1 | 0.5 | 0.2 | 0.5 | 0.4 | 0.4 | 0.4 |

Finite Population Regression Parameters and Finite Population Lag Correlation

| Time | $\beta_{0,N}$ | $\beta_{1,N}$ | $\beta_{2,N}$ | $\beta_{3,N}$ | $\rho_{1,N}$ | $\rho_{2,N}$ | $\rho_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 1.0443 | 0.5007 | 0.1616 | 0.4517 | 0.4253 | 0.3960 | 0.3668 |
| 3 | 1.0560 | 0.4949 | 0.1606 | 0.4730 | 0.4065 | 0.4154 | - |
| 2 | 1.0354 | 0.4928 | 0.2023 | 0.5156 | 0.4063 | - | - |
| 1 | 1.0532 | 0.4751 | 0.2373 | 0.5441 | - | - | - |

Sample Estimate of Regression Parameters and Sample Estimate of Lag Correlation

| Time | $\hat{\beta}_{0,N}$ | $\hat{\beta}_{1,N}$ | $\hat{\beta}_{2,N}$ | $\hat{\beta}_{3,N}$ | $\hat{\rho}_{1,N}$ | $\hat{\rho}_{2,N}$ | $\hat{\rho}_{3,N}$ |
|---|---|---|---|---|---|---|---|
| 4 | 1.0241 | 0.5128 | 0.1393 | 0.4038 | 0.3971 | 0.3825 | 0.3252 |
|   | (0.1780) | (0.0609) | (0.1253) | (0.1911) | (0.0501) | (0.0690) | (0.0950) |
| 3 | 1.0265 | 0.5097 | 0.1382 | 0.4395 | 0.3977 | 0.3999 | - |
|   | (0.2047) | (0.0648) | (0.1316) | (0.2128) | (0.0615) | (0.0955) | - |
| 2 | 0.9889 | 0.5121 | 0.1827 | 0.5067 | 0.3872 | - | - |
|   | (0.1929) | (0.0609) | (0.1524) | (0.2238) | (0.0752) | - | - |
| 1 | 1.0037 | 0.4899 | 0.2363 | 0.5340 | - | - | - |
|   | (0.2775) | (0.0907) | (0.1673) | (0.2545) | - | - | - |
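The super-population lag correlation rows of Tables 1 to 3 follow the standard forms of the three correlation structures: powers of $\rho$ for AR(1), a single nonzero lag-1 value for MA(1), and a constant value at every lag for EQ (equicorrelation). The short Python sketch below reproduces those rows. The function name `lag_correlations` and its arguments are illustrative assumptions, not part of the paper, and $\rho$ is taken to be the lag-1 correlation in every case, as the tables suggest.

```python
def lag_correlations(structure, rho, max_lag=3):
    """Return [rho_1, ..., rho_max_lag] for a named correlation structure.

    Hypothetical helper for illustration; rho is treated as the lag-1 correlation.
    """
    lags = range(1, max_lag + 1)
    if structure == "AR1":
        # Autoregressive order 1: correlation decays geometrically with lag.
        return [rho ** lag for lag in lags]
    if structure == "MA1":
        # Moving average order 1: only adjacent responses are correlated.
        return [rho if lag == 1 else 0.0 for lag in lags]
    if structure == "EQ":
        # Equicorrelation: the same correlation at every lag.
        return [rho for _ in lags]
    raise ValueError(f"unknown structure: {structure}")


ar = lag_correlations("AR1", 0.7)  # Table 1 reports these rounded to two decimals
ma = lag_correlations("MA1", 0.4)  # Table 2
eq = lag_correlations("EQ", 0.4)   # Table 3
print([round(r, 3) for r in ar], ma, eq)
```

Comparing these values with the finite population and sample-estimate panels shows how closely each finite population lag correlation tracks its super-population counterpart.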
Table 4. Design-assisted model-based ($\hat{\tau}(t)^{*}$) and design cum model unbiased ($\tilde{\tau}(t)^{*}$) predictions along with their standard errors (given in parentheses) and 100% relative absolute biases [given in square brackets] for $F_1$ totals ($\tau(t)$) over time $t = 1, \ldots, 4$, using 25 samples each ($s_1^*$) of size $n = 100$ chosen from the $F_1$ of size $N = 1000$.

| Time | AR (0.7) $\hat{\tau}(t)^{*}$ | AR (0.7) $\tilde{\tau}(t)^{*}$ | AR (0.7) $\tau(t)$ | MA (0.4) $\hat{\tau}(t)^{*}$ | MA (0.4) $\tilde{\tau}(t)^{*}$ | MA (0.4) $\tau(t)$ | EQ (0.4) $\hat{\tau}(t)^{*}$ | EQ (0.4) $\tilde{\tau}(t)^{*}$ | EQ (0.4) $\tau(t)$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2580.2 | 2579.9 | 2589.4 | 2564.5 | 2563.9 | 2555.9 | 2567.5 | 2567.1 | 2579.6 |
|   | (54.4) | (56.5) | - | (72.9) | (72.0) | - | (55.8) | (55.1) | - |
|   | [16.9] | [16.8] | - | [11.8] | [11.1] | - | [21.7] | [22.7] | - |
| 2 | 2559.1 | 2559.1 | 2574.4 | 2551.9 | 2551.8 | 2539.8 | 2581.3 | 2581.0 | 2585.6 |
|   | (56.0) | (55.4) | - | (48.6) | (47.0) | - | (54.4) | (56.6) | - |
|   | [27.3] | [27.6] | - | [24.9] | [25.5] | - | [7.9] | [8.1] | - |
| 3 | 2562.1 | 2562.1 | 2573.0 | 2548.3 | 2548.3 | 2543.4 | 2549.3 | 2549.4 | 2561.1 |
|   | (59.2) | (59.2) | - | (51.5) | (51.0) | - | (63.1) | (61.7) | - |
|   | [18.4] | [18.4] | - | [9.5] | [9.6] | - | [18.7] | [18.9] | - |
| 4 | 2567.0 | 2566.7 | 2576.2 | 2576.0 | 2575.5 | 2572.5 | 2568.2 | 2568.2 | 2578.9 |
|   | (60.2) | (59.9) | - | (67.8) | (69.2) | - | (64.0) | (64.8) | - |
|   | [15.3] | [15.9] | - | [5.6] | [4.3] | - | [16.7] | [16.5] | - |
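As a reading aid for Table 4, the bracketed entries appear to be reproduced by scaling the absolute difference between a prediction and the true total by the reported standard error and multiplying by 100. This relation is inferred from the tabulated numbers only and is an assumption, not a definition quoted from the paper. The function name below is hypothetical.

```python
def relative_abs_bias_pct(prediction, true_total, std_error):
    """100 * |prediction - true total| / standard error.

    Assumed form, inferred from the Table 4 entries rather than stated in the text.
    """
    return 100.0 * abs(prediction - true_total) / std_error


# Table 4, AR (0.7), time 1: true total 2589.4
damb = relative_abs_bias_pct(2580.2, 2589.4, 54.4)  # design-assisted model-based
dcmb = relative_abs_bias_pct(2579.9, 2589.4, 56.5)  # design cum model-based
print(round(damb, 1), round(dcmb, 1))
```

Under this assumed form, the two printed values agree with the bracketed AR (0.7) entries at time 1. Small differences elsewhere in the table are to be expected because the tabulated predictions and standard errors are themselves rounded.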
Table 5. AR(1) ($\rho = 0.5$) correlation structure-based $F_2$ regression parameters and their SWGLS (sampling weighted GLS) estimates along with standard errors (given in parentheses) using 25 samples each ($s_2^*$) of size $n = 200$ chosen from the cluster-based $F_2$ of size $N = 1000$.

Super-Population Reg. Parameters and Super-Population Corr

| Time | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\rho_1$ | $\rho_2$ | $\rho_3$ | $\phi$ |
|---|---|---|---|---|---|---|---|---|
| - | 1 | 0.5 | 0.2 | 0.5 | 0.50 | 0.25 | 0.125 | 0.33 |

Finite Population Reg. Parameters and Finite Population Corr

| Time | $\beta_{0,N}$ | $\beta_{1,N}$ | $\beta_{2,N}$ | $\beta_{3,N}$ | $\rho_{1,N}$ | $\rho_{2,N}$ | $\rho_{3,N}$ | $\phi_N$ |
|---|---|---|---|---|---|---|---|---|
| 4 | 0.9647 | 0.5011 | 0.2452 | 0.4929 | 0.5168 | 0.2765 | 0.1747 | 0.3023 |
| 3 | 0.9659 | 0.5012 | 0.2215 | 0.4772 | 0.5134 | 0.2657 | - | 0.3081 |
| 2 | 1.0315 | 0.4998 | 0.2583 | 0.4442 | 0.4894 | - | - | 0.3350 |
| 1 | 0.9316 | 0.5025 | 0.2532 | 0.3469 | - | - | - | 0.3496 |

Sample Estimate of Reg. Parameters and Sample Estimate of Corr

| Time | $\hat{\beta}_{0,N}$ | $\hat{\beta}_{1,N}$ | $\hat{\beta}_{2,N}$ | $\hat{\beta}_{3,N}$ | $\hat{\rho}_{1,N}$ | $\hat{\rho}_{2,N}$ | $\hat{\rho}_{3,N}$ | $\hat{\phi}_N$ |
|---|---|---|---|---|---|---|---|---|
| 4 | 1.0255 | 0.5006 | 0.2155 | 0.4471 | 0.5176 | 0.2881 | 0.1789 | 0.2560 |
|   | (0.1878) | (0.0041) | (0.0806) | (0.1414) | (0.0421) | (0.0584) | (0.0798) | (0.0484) |
| 3 | 1.0409 | 0.5007 | 0.1765 | 0.4360 | 0.4966 | 0.2635 | - | 0.2769 |
|   | (0.1941) | (0.0043) | (0.0958) | (0.1569) | (0.0318) | (0.0633) | - | (0.0407) |
| 2 | 1.1103 | 0.4993 | 0.2265 | 0.3947 | 0.4745 | - | - | 0.2986 |
|   | (0.1753) | (0.0042) | (0.1064) | (0.1723) | (0.0465) | - | - | (0.0492) |
| 1 | 0.9797 | 0.5028 | 0.2251 | 0.2934 | - | - | - | 0.3189 |
|   | (0.2279) | (0.0041) | (0.1270) | (0.1767) | - | - | - | (0.0583) |
Table 6. Design-assisted model-based ($\hat{\tau}(t)^{*}$) and design cum model unbiased ($\tilde{\tau}(t)^{*}$) predictions, along with their standard errors (given in parentheses) and 100% relative absolute biases [given in square brackets] for $F_2$ totals ($\tau(t)$) over time $t = 1, \ldots, 4$, using 25 samples each ($s_2^*$) of size $n = 200$ chosen from the cluster-based $F_2$ of size $N = 1000$ for the AR(0.5) model.

| Time | Using SSRE: $\hat{\tau}(t)^{*}$ | Using SSRE: $\tilde{\tau}(t)^{*}$ | Using FPRE: $\hat{\tau}(t)^{*}$ | Using FPRE: $\tilde{\tau}(t)^{*}$ | $\tau(t)$ |
|---|---|---|---|---|---|
| 1 | 22,403 | 22,405 | 22,377 | 22,380 | 22,371 |
|   | (109) | (126) | (22) | (54) | - |
|   | [29.4] | [27.0] | [27.3] | [16.7] | - |
| 2 | 22,432 | 22,434 | 22,410 | 22,413 | 22,408 |
|   | (91) | (110) | (18.6) | (52) | - |
|   | [26.4] | [13.6] | [10.8] | [9.6] | - |
| 3 | 22,345 | 22,349 | 22,346 | 22,348 | 22,368 |
|   | (105) | (129) | (19) | (56) | - |
|   | [21.9] | [14.7] | [115] | [35.7] | - |
| 4 | 22,403 | 22,406 | 22,413 | 22,416 | 22,399 |
|   | (104) | (131) | (21) | (56) | - |
|   | [3.8] | [5.3] | [66.7] | [30.4] | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Variyath, A.M.; Sutradhar, B.C. Prediction Inferences for Finite Population Totals Using Longitudinal Survey Data. Stats 2025, 8, 110. https://doi.org/10.3390/stats8040110
