Adjustment of Anticipatory Covariates in Retrospective Surveys: An Expected Likelihood Approach

: We address an inference issue where the value of a covariate is measured at the date of the survey but is used to explain behavior that has occurred long before the survey. This causes bias because the value of the covariate does not follow the temporal order of events. We propose an expected likelihood approach to adjust for such bias and illustrate it with data on the effects of educational level achieved by the time of marriage on risks of divorce. For individuals with anticipatory educational level (whose reported educational level was completed after marriage), conditional probabilities of having attained the reported level before marriage are computed. These are then used as weights in the expected likelihood to obtain adjusted estimates of relative risks. For our illustrative data set, the adjusted estimates of relative risks of divorce did not differ signiﬁcantly from those obtained from anticipatory analysis that ignores the temporal order of events. Our results are slightly different from those in two other studies that analyzed the same data set in a Bayesian framework, though the studies are not fully comparable to each other.


Introduction
Consider a retrospective survey where the interest is to investigate differentials in the risk of divorce across educational levels attained before marriage, but where available information is only respondents' highest educational level at the time of the survey.Common practice is to use the available information on educational level as a covariate in modeling the risk of divorce, an event that took place before the survey.Educational progress is likely to occur between the time of entry into marriage and the date of the survey.The questions then are, to what extent do changes in patterns of divorce across educational levels reflect real differences in divorce due to differences in educational level and what portion of these changes can be attributed to misclassification of respondents across education levels?These questions can be answered by dealing with the fact that the covariate (education) is anticipatory and adjusting the corresponding parameters to correct the bias inherent in the time inconsistency of the anticipatory covariate.
Hoem [1] warns that using anticipatory covariates is misleading, but concludes that the adverse effects may be smaller in some situations.In their study of mortality clustering in India using past births and deaths, Arulampalam and Bhalotra [2] discard anticipatory regressors: household asset, toilet facility, electricity, or access to piped water at the date of the survey.However, much valuable information may be lost by ignoring such covariates.Hoem and Kreyenfeld [3,4] argue that anticipatory covariates may provide useful information.They propose data imputation, but the procedure requires unrealistic assumptions.Faucett et al. [5] treat anticipatory covariates as missing data and their results, based on Bayesian techniques, gave interval estimates with higher coverage probabilities compared to imputation.Todesco [6] used anticipatory covariates like area of residence, education, and religious commitment to analyze marital dissolution in Italy and concluded that use of anticipatory analysis did not jeopardize the results.
Ghilagaber and Koskinen [7] and Munezero and Ghilagaber [8] used Bayesian modeling approaches and found that anticipatory analysis can lead to overestimation or underestimation of the relative risks associated with the anticipatory covariate.More recently, Hoem and Nedoluzhko [9] highlighted that use of anticipatory analysis (or "negative duration") in estimating rates in childbearing before and after migration leads to incorrect results and should be abandoned.In a related study, Pina-Sánchez et al. [10] suggested an adjustment method based on a mixture Bayesian model in their analysis of restrospectively reported work histories in Swedish register data.
In the present study, we use a maximum likelihood approach to examine any effects of anticipatory covariates.We model divorce risks among 1312 Swedish men born between 1936 and 1964 in a piece-wise constant hazard model framework.For individuals with anticipatory educational levels, we compute conditional probabilities that these levels were attained before marriage.These probabilities are then used as weights in their contributions to the likelihood function.Adjusted relative risks of divorce across educational levels are then estimated by maximizing the weighted likelihood function.For our illustrative data set, such adjusted estimates of relative risks did not differ significantly from those obtained in the anticipatory analysis based on unweighted likelihood.The sign of the estimates is the same as in the Bayesian analysis of the same data set by Ghilagaber and Koskinen [7] and Munezero and Ghilagaber [8], but the differences between the estimates in the anticipatory and adjusted models were significant in the two previous studies [7,8].
In Section 2, we introduce the simple two-factor multiplicative model with piecewise constant hazard and derive its likelihood function from which relevant parameters are estimated.We extend this standard likelihood function to a weighted likelihood function in Section 3 and derive the corresponding parameter estimates.In Section 4, we illustrate our proposed approach using data on the effect of anticipatory educational level on divorce risks among Swedish men.We summarize our findings in Section 5 by way of concluding remarks and suggestions for future investigations.An appendix contains omitted derivations as well as tables and figures of of our empirical findings.

The Multiplicative Two-Factor Hazard Model
For a sample of individuals, consider J educational levels and let D ij be the number of divorces at marriage duration i, i = 1, . . ., I and education level j, j = 1, . . ., J for T ij years of observed exposure to the risk of divorce.The covariate indexed by i is the grouped-time variable (duration of marriage) measured from the date of marriage until the date of divorce or until the interview date, whichever comes first. Define and let T i+ , T +j , and T ++ represent similar quantities for the exposure variable T. Divorce risks are assumed to be piece-wise constant in each of the the I time intervals, but may vary between intervals.The time to divorce then follows a piece-wise exponential distribution for each educational level.The density function of the time to divorce in duration group i for a person k with educational level j is then given by A multiplicative model for the hazard rate λ ij [11,12] corresponds to where β i is the baseline risk of divorce at marriage-duration i and α j is the relative risk of divorce for individuals with education level j (relative to those with the educational level that is used as the baseline/reference).The model in Equation ( 4) has I + J parameters β 1 , β 2 , . . ., β I and α 1 , α 3 , . . ., α J .To construct the likelihood function when Equation ( 4) holds, we define δ ijk as an indicator of whether the kth sample member having the jth level of education is divorced (δ ijk = 1) or is still married (δ ijk = 0) in the ith marriage-duration.From Equations ( 3) and ( 4), the contribution to the likelihood of the n ij sub-sample of individuals in the ith marriageduration and having the jth level of education, is obtained as where The likelihood for the entire sample is then the product of the Λ ij over all levels of i and j: so that Differentiating ln(Λ) in Equation (7) with respect to β i and, separately, with respect to α j , equating the resulting derivatives to 0, and solving for the unknown parameters leads to the following equations and This system of I + J equations has no analytical solution in general, but can be solved iteratively using numerical methods.

Expected Likelihood
From Equations ( 8) and ( 9), we note that the maximum likelihood estimates of the baseline hazards β i and relative hazards α j are functions of the total number of events, D ij , and exposure times, T ij .Therefore, misclassification of events or exposure times into wrong intervals or into wrong levels of the covariate, as with anticipatory covariates, can lead to incorrect estimates of the parameters.
Consider those individuals who have completed their reported highest educational level after marriage.Inference is based on the individuals' education levels at the date of marriage, but only their highest level of education at the date of interview is observed.This is an incomplete data problem, which is handled by maximizing the expected likelihood, conditional on available information [13].
Equation ( 3) denotes the density conditional on the level of education x k (T k ) for individual k at the age of marriage T k .Since we are interested on when the reported educational level j was completed (before or after marriage), j = j(k), the unconditional density is given by We also impose the distribution P{x k (T k ) = j(k)}, which adds a term to the log-likelihood: where ln(Λ) is as in Equation ( 7), and the last term in Equation ( 11) reflects contribution, to the likelihood function, of the distributions of times to educational progress (time to complete primary level, time between primary-and secondary-levels, and time between secondary-and post-secondary levels).

Parameter Estimation in the Expected Likelihood
To calculate the components of the last term, ), in Equation (11), assume that J = 3 and, as in [7,8], let S jk denote the time of transition from educational level j − 1 to educational level j for individual k, let f j denote the density function of S jk , and let F j denote its distribution function.We introduce Bernoulli variables Z j of parameters φ j as indicators of whether or not the level x k (T k ) = j is the highest educational level.Then, ruling out the possibility that x k (T k ) = 0 leads to Proposition 1.
To calculate all P(x k (T k ) = j) explicitly, distributions are imposed on the times, S jk , spent at the different educational levels.As in [7,8], a piece-wise gamma distribution with density function and distribution function is assumed, where is the incomplete gamma function.
Equations ( 7) and ( 11) lead to To make inference on the parameters β i and α j , the conditional expectations of the sufficient statistics D i+ , D +j , and T ij need to be calculated.We re-write where A i is the set of individuals with common first index i, B j is the set of individuals with common second index j, m(i) is the lower duration limit in group i, and I(A) is the indicator function of the event A. For each individual k, only one marriage duration group i = i(k) and one educational level j = j(k) are observed.However, if that educational level is completed after marriage, the educational level at time of marriage is unknown.By introducing distributional assumptions on time to complete a certain educational level, the probabilities are calculated as where t k is the age at completion of the reported level of education for individual k.This feature does not affect D i+ , since summation is over j.However, it affects D +j and T ij , since different individuals may belong to different B j depending on their unknown education level at time of marriage.
Hence, we need to compute Re-writing the right-hand side as gives Similarly, In order to maximize the adjusted log-likelihood in Equation ( 11) over the parameters β i and α j , we first plug in the expression in Equation ( 22) into Equation ( 9) and the expression in Equation ( 23) into Equations ( 8) and ( 9), and proceed in the usual manner.
To make inference on ζ j , η j , and φ j , note first that (see Equation (11)) Each set of ζ j , η j , and φ j values produces a numerical value of Equation (24) as well as expected values of the sufficient statistics in Equations ( 22) and ( 23), which, in turn, are used to maximize ln Λ in Equation (11).This way, the expected log-likelihood for any set of ζ j , η j , and φ j is calculated and maximized with respect to the α j and β i parameters.
The next step is to maximize over ζ j , η j , and φ j using the Newton-Raphson algorithm.According to Orchard and Woodbury [13], such a procedure yields the desired maximum likelihood estimates.The values of the P(x k (T k ) = j, x k (t k ) = y) are obtained using Proposition 2.

Proposition 2. Writing p jyk ≡
Then, the P(x k (t k ) = y) are expressed in terms of the probabilities in Equation (25) to get and

Data set
An extract from the 1985 Mail Survey of Swedish men is used as our data for illustration.The survey contained background variables as well as retrospective information on entry into and exit from marital and non-marital unions.Analysis is based on 1312 evermarried men who were either divorced or still married by the survey time.Their distribution across age at marriage and age at attainment of the reported educational level is shown in Figure A1.Those below the diagonal are the 245 (19%) observations whose reported educational level was completed after they married.Anticipatory analysis, common in the analysis of such types of data, amounts to moving the values below the diagonal in Figure A1 to the left-all the way to the diagonal reference line.A cross tabulation of the sample, as displayed in Table A1, shows differentials in percentage divorced across the anticipatory status of education.The main goal of our work is investigating the role of misclassification on such differentials in educational gradients of divorce.

Models
The time variable, duration of marriage in years, is categorized into five intervals: 0-1 − , 1-2 − , 2-3 − , 3-6 − , and 6 + years.Primary level of education is used as baseline level and, hence, its corresponding relative hazard, α 1 , is set to 1.To make the proposed adjustment comparable to previous works on the same data set [7,8], the parameters from the common anticipatory approach and from a reduced model are also estimated.Using anticipatory analysis amounts to "back-dating" the times of highest educational achievement, τ k , for a number of individuals, while, in the reduced model, observations whose reported educational level was completed after marriage, T k < t k , are discarded.
Neither the anticipatory manipulations nor the proposed adjustment change the observed marginal occurrences D i+ and exposures T i+ .In the reduced model, these marginals are reduced due to the reduction of the total number of respondents.The time at which the highest educational level is achieved is irrelevant as long as it precedes the time of marriage, but it is relevant for calculating exposure times whenever it occurs after the time of marriage.
The conditions required for using the Fisher information to construct confidence intervals may not be fulfilled, due to the nature of the problem at hand.Instead, the bootstrap method is used.The individuals are bootstrapped B times and the expected likelihood for each such sample is maximized.We then calculate empirical 95% confidence intervals for each parameter by computing the 2.5% and 97.5% percentiles of the bootstrap distributions.An alternative approach is to compute standard deviations and form normal approximation confidence intervals.But, this approach was considered problematic because of outliers in the bootstrap distributions.In the reduced and anticipatory cases, the number of bootstrap replications were B = 10,000.In the adjusted case, B = 1000 because estimation of the additional parameters ζ and η required heavy computation.In the bootstrap replications, integrals and derivatives were calculated in the same manner as in the estimation.The estimated parameters were taken as starting values for each bootstrap replication.

Numerical Considerations
Numerical integration was used to calculate the integrals for given parameter values.When maximizing the expected likelihood over the φ parameters, one problem may arise.All the P{x(T) = j} are linear in the φ i and, hence, for individuals with t k ≤ T k , the derivatives of ln P{x k (T k ) = j(k)} may be non-zero for all possible φ i .Hence, if all individuals would have t k ≤ T k , and, thus, conditioning would be unnecessary, the likelihood may be maximized at the boundary of the parameter space, φ i = 0 or φ i = 1, for i = 1 and 2. These solutions might be considered unrealistic.For individuals with T k < t k , the situation is less clear-cut, but if the majority of the individuals have t k ≤ T k , there is a risk that their contribution dominates the likelihood, and this seems to be the case in the data set used for our illustration.
Hence, the φ i parameters were fixed to the empirical proportions of men who did not continue to the next higher level (see Table A1).We assigned φ 1 = 0.34, obtained from Table A1 as 442  1312 , and φ 2 = 0.66, obtained from Table A1 as 488+94 1312−442 .These are also almost identical to the Bayesian estimates obtained in [7,8].
Instead of considering the gamma distribution parameters ζ j , η j explicitly in the maximization, it was more convenient to use the transformed parameters µ j = ζ j η j and . These correspond to the expectations and standard deviations, respectively.We found that the likelihood has an asymptote as σ 3 → 0. To alleviate this problem, σ 3 was fixed and maximization was carried over the other parameters.This turned out to work out well.Unlike the φ parameter, there was no natural choice of ad hoc value of σ 3 .However, the maximum with respect to all other parameters, including α j and β i , was very robust to different choices.We decided to put σ 3 = 0.5, which seemed to be a reasonable value.
To obtain the maximum of the expected likelihood, several methods were combined.Each likelihood calculation was performed for a fixed set of the five gamma distribution parameters, µ 1 , µ 2 , µ 3 , σ 1 , and σ 2 , including a maximization with respect to the α j and β i parameters.Thus, in principle, the required task was to maximize over the gamma parameters.Initially, a random choice of parameter values was performed, and the ones that gave the largest value of the likelihood were selected.Then, using the Newton-Raphson maximization algorithm, the likelihood was maximized over the five gamma distribution parameters, one at a time, keeping the others fixed.Finally, we maximized over µ 1 , µ 2 , µ 3 , σ 1 , and σ 2 simultaneously, using the Newton-Raphson maximization algorithm in five dimensions.
To obtain approximate confidence intervals for the parameters, a bootstrap procedure was used to sample the individuals with replacement.We performed the maximization procedure described above for each bootstrap replication.While bootstrapping, the µ 3 parameter drifted towards zero in about 6% of the bootstrap replications during the Newton-Raphson iterations.As a negative value of µ 3 is not reasonable, an extra condition, µ 3 ≥ 0.001, was imposed to overcome the problem.Further details on numerical issues, including Matlab codes, can be obtained from the authors upon request.

Conditional Probabilities
Figure A2 displays the distribution of conditional probabilities of various educational levels at marriage, given a corresponding reported educational level at time of survey, based on the 245 individuals who have completed their reported educational level after marriage.For comparison purposes, corresponding conditional probabilities computed by using estimates of the covariate-model parameters in a Bayesian analysis of the same data set [7] are also plotted.
The plots show that the probabilities of having had a lower educational level at marriage, given some higher level at interview, decrease with age at marriage.Thus, for someone who reports a post-secondary educational level at interview but has married early, say, below age 20, the probability that he had primary-level education at marriage, P 1|3 , is almost 1 (see Figure A2).The probability that he had secondary-level education at marriage, P 2|3 , is also high, but not as high as P 1|3 .The combined probability that he had a lower-level education, primary or secondary, is almost 1.
A comparison of the ML-and Bayes-estimates of the conditional probabilities indicates that the Bayes-estimates are, in general, higher than their corresponding ML-estimates.This is especially the case at older ages of marriage and for men who reported post-secondarylevel of education at time of survey.

Model Parameters
Table A2 contains estimates of baseline risks, β i expressed per 1000 exposure units, and relative risks of divorce, α j , across the three models, together with their corresponding 95% confidence intervals.Except for the effect of the second interval, β 2 , which is much lower in the reduced model, the estimates of the baseline risks, β i , are close to each other across the three models.
The estimates of the relative risks α j , on the other hand, vary appreciably across models.For instance, men with secondary-level education have about the same risk of divorce as those with primary level education in the reduced model, 10% higher risk in the anticipatory model, and a negligible 5% higher risk in the adjusted model.Those with post-secondary education have much higher risks of divorce relative to the baseline men with primary education.The excess risk is 57% in the reduced model, 35% in the anticipatory model, and 34% in the adjusted model.
More interesting for the present purpose are differences in the estimates of relative risks across the models.That the anticipatory model leads to the same estimates of the relative risks, α j , as in the adjusted model (1.13 and 1.05 for α 2 ; 1.35 and 1.34 for α 3 ) indicates that anticipatory analysis is harmless in our data, in the sense that it does not lead to substantial bias in the estimates of the relative risks.These estimates of relative risks are somewhat higher than those obtained by Bayesian adjustment by of the same data set [7].This is especially true for the estimate of α 3 , but it should be borne in mind that the 95% confidence intervals for α 2 and α 3 in both the current study and those based on Bayesian approach include 1 and, hence, are not significant at 5% significance level.Further, a reestimation of α 3 using the Bayes-estimated parameters of the gamma distribution yielded α 3 = 1.30, which is almost identical to the α 3 = 1.34 obtained through the maximum likelihood approach.
The combined effects of the differences in the estimates of the model parameters, β i and α j , across the three models and for the three educational levels are depicted in Figures A3-A8, which contain estimates λ ij = β i α j under various configurations.Figures A3-A5 show educational profiles of divorce risks over marriage durations for each of the three models (reduced, anticipatory, and adjusted).In Figures A6-A8, differences in the estimates of divorce risks across the three models are depicted for each educational level (primary, secondary, and post-secondary).Divorce risks increase over the first four time intervals, except in the reduced model, which exhibits decrease in the risk between first and second interval, and decrease after about 6 years (Figures A3-A5).Further, the levels and trends are alike across the anticipatory and adjusted models (Figures A6-A8).

Covariate-Model Parameters
In the adjusted model, the parameters of the gamma-distribution for educational career, ζ j and η j , are also estimated, in addition to the model parameters β i and α j .These are shown in the lower half of the last column in Table A2.These parameters are then used to compute estimates of the expected duration to complete the various educational levels, as displayed in Table A3.Thus, it takes, on average, 16.4 years after birth to complete primary-level education, 3.6 years to complete secondary-level education after completing primary level, and 1.9 years to complete post-secondary-level education after completing secondary level.The corresponding figures from the Bayesian analysis, extracted from the estimates in [7], were 14.8, 6.5, and 6.3 years., while those in [8] were 3.56 years to complete secondary-level education (after completing primary level) and 2.43 years to complete post-secondary-level education (after completing secondary-level education).The Bayesian-based estimates differ much from our current expected-likelihood-based estimates.However, as can be seen in Table A5, where estimates of the model parameters α j and β i , given the Bayesian estimates of the gamma-distribution parameters, are presented, it is easy to observe, by comparing to Table A2, that the estimates of the model-parameters from our expected likelihood approach are very robust to this kind of changes in the gamma parameters.
In Tables A3 and A4, in addition to estimates of means µ j and standard deviations σ j of education times, 80% bootstrap confidence intervals are also presented.The reason behind choosing 80% instead of 95% is due to problems in estimating µ 3 as already described in Section 4.4.No confidence interval is provided for σ 3 because it was fixed to 0.5.This might also have contributed towards the relatively narrow confidence intervals for ζ 3 and η 3 in Table A2 .
Finally, in Table A6, the results from our adjusted model in Table A2 are presented together with corresponding results from previous studies-Ghilagaber and Koskinen (G & K) [7] and Munezero and Ghilagaber, (M & G) [8] for comparison purposes.Our results show that anticipatory analysis is harmless because it leads to the same conclusion as the adjusted model.In the two previous studies, however, the conclusions from the respective adjusted models were different, at least partly, from those based on anticipatory analysis.However, the differences between the results in the adjusted and the anticipatory models were in opposite directions in the two previous studies.While results in [7] show that anticipatory analysis leads to overestimation of the relative risk of divorce for those with post-secondary education, results in [8] show that it leads to underestimation of the relative risks for the same group of individuals.But, it is worth noting that the three studies are not comparable to each other (even the two that use Bayesian methods) because they use different approaches and the results are presented using different baseline levels (though we transformed the relatives risks from previous to have the same baseline as ours in Table A6).

Summary and Concluding Remarks
In this study, we addressed a problem in inference, where the value of a covariate is measured at the date of the survey but is used to explain behavior that has occurred long before the survey.
To correct for bias due to the lack of temporal order of events, we proposed an expected likelihood approach, where conditional probabilities of having attained the reported level before marriage were computed and used as weights in the likelihood.Adjusted relative hazards were then estimated by maximizing the expected likelihood which, in turn, was based on some distributions for the time to complete various educational levels.
Our proposed analytic procedure was illustrated with data on the effects of educational level achieved at time of marriage on risks of divorce.Our results showed that anticipatory analysis is harmless because the adjusted estimates of relative risks are close to those from anticipatory analysis.
Whether such results can be replicated on other data sets or on the same data set but with different events of interest, like family formation, through cohabitation or marriage, or childbearing, is an open question for future investigation.
Further, our results are slightly different from two other studies that analyzed the same data set in a Bayesian framework.
Thus, further studies that examine the effects of anticipatory analyses more deeply are needed.Such studies can, for instance, use simulated data in order to draw stronger conclusions about the behaviour of anticipatory analysis under various data configurations.and, similarly, We obtain (by dropping the index k for ease of exposition) Moreover, and the rest of the equalities are trivial.Table A2.Estimated baseline and relative risks of divorce across the three models (95% confidence intervals in parentheses).

Figure A2 .
Figure A2.Distribution of estimated conditional probabilities of various educational levels at marriage-given a corresponding reported level at survey.

Figure A3 .
Figure A3.Educational profiles of divorce risks over marriage duration in the reduced model.

Figure A4 .
Figure A4.Educational profiles of divorce risks over marriage duration in the anticipatory model.

Figure A5 .
Figure A5.Educational profiles of divorce risks over marriage duration in the adjusted model.

Figure A6 .
Figure A6.Estimates of divorce risks for men with primary-level education across the three models.

Figure A7 .Figure A8 .
Figure A7.Estimates of divorce risks for men with secondary-level education across the three models.

Table A1 .
Distribution of the sample of 1312 Swedish men across anticipatory status of education and status of marriage.

Table A3 .
Expected duration to complete various educational levels.

Table A4 .
Standard deviation of the duration to complete various educational levels.

Table A5 .
ML-estimates of model parameters given Bayesian estimated gamma parameters.

Table A6 .
Educational gradients of divorce-risks from three studies.A scatter plot of the data in TableA1: age at completion of reported highest educational levels vs. age at marriage.