This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Epidemiological studies often produce false positive results due to use of statistical approaches that either ignore or distort time. The three time-related issues of focus in this discussion are: (1) cross-sectional

Epidemiology has been defined as the “study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems” [

In what follows, we address three interlocking areas of such concern in observational studies in human populations: (1) cross-sectional

To show that some factor is a “risk factor” for a disorder, it must be shown both that: (a) the factor precedes onset of the disorder, and (b) it is correlated with the disorder [

For a disorder that is chronic and persistent after onset, prevalence at a certain age equals the incidence between birth and that age for those who survive to that age. Otherwise, for a disorder from which there may be remissions or recoveries, or one associated with removals from the population, cross-sectional correlations may relate as much to factors associated with treatment availability and response, or to inconsistency of expression over time, as to risk of incurring the disorder. Consequently, cross-sectional studies investigating correlates of prevalence of episodic disorders are of limited use in identifying determinants of those disorders or ways of preventing them, although they may be vital in setting the stage for prospective studies to accomplish those purposes.

One exception relates to the detection of “fixed markers” [

The ideal, but admittedly unrealistic, approach to demonstrate risk factor status would be to sample the relevant population disorder free at a designated time zero (t = 0), to evaluate potential risk factors at that time, to follow each sampled individual over his/her subsequent lifetime, in order to evaluate and compare survival curves to the onset of the disorder [_{1}(t) in those with RF = 1) and the “low risk” subpopulation (S_{0}(t) in those with RF = 0) for all values of t.

Then one could not only compare the overall survival curves, but could compare the incidence between time 0 and any fixed time T [1 − S_{1}(T) versus 1 − S_{0}(T)]. In this hypothetical example, (1) the survival curves cross at t = T*, an unusual, although not an unknown, situation, and (2) onset is not inevitable for every individual, which is a quite common situation. In this example, 50% in one group have onset during their lifetimes compared to 30% in the other group, but the latter are likely to have their onset early if at all, which results in the crossing of survival curves, here at T* ≈ 18.

Here if incidences prior to T* are compared, RF would appropriately be described as a

Because it is more convenient to have short follow-up times, epidemiologists often assume that the relative positions of the two survival curves near zero hold for all follow-up times. This is one of those extrapolations that often mislead both subsequent research efforts and clinical decision making. In general, the inferences from a cohort study apply to those in the disorder-free subpopulation represented in the sample at time 0, and followed for as long as the study chooses to follow the individuals, whether that be 1, 2, 5 or 10 years, but not necessarily to another population, nor to any longer follow-up. Explicit presentation of the estimated survival curves up to the end of follow-up, as a general practice, would not only inform medical consumers of exactly what the study found up to the duration of follow-up, but also remind them of the time limitations on inferences.
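The dependence of the conclusion on follow-up time can be sketched numerically. The following is a minimal illustration, not the curves used in this article: it assumes that onset, when it occurs at all, follows an exponential waiting time, and it borrows only the lifetime onset fractions (50% and 30%) from the hypothetical example. The rates 0.05 and 0.20 per year and the names `GROUPS`, `survival` and `incidence_difference` are assumptions made for the sketch.

```python
import math

# Hypothetical parameters: RF value -> (fraction ever affected, onset rate per year).
# Only the fractions 0.5 and 0.3 echo the text; the rates are assumed for illustration.
GROUPS = {1: (0.5, 0.05), 0: (0.3, 0.20)}


def survival(rf, t):
    """S_rf(t): probability of remaining disorder-free through time t."""
    fraction, rate = GROUPS[rf]
    return 1.0 - fraction * (1.0 - math.exp(-rate * t))


def incidence_difference(t):
    """[1 - S_1(t)] - [1 - S_0(t)], which equals S_0(t) - S_1(t)."""
    return survival(0, t) - survival(1, t)


if __name__ == "__main__":
    for t in (1, 5, 10, 17, 18, 20, 40):
        # A negative value means RF = 1 looks protective at that follow-up time.
        print(t, round(incidence_difference(t), 4))
```

Under these assumed parameters the difference is negative for short follow-up and positive for long follow-up, changing sign between 17 and 18 years, so a study stopped early and a study continued for decades would describe the same factor in opposite terms.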

Standard survival methods [

Moreover, how the zero-point of time (t = 0) is defined makes a major difference in conclusions. For example, in the hypothetical situation in

More problematic is the situation in which disorder-free individuals over a wide span of ages, say 20–80 years, are sampled, and t = 0 refers to the more or less arbitrary time each individual enters the research study. Time of entry to an observational research study (here the focus) has no clinical relevance to the individuals in the population to which inferences are to be drawn. In that case, the observed survival curve is a mixture of the survival curves of those who enter at each age disorder-free with the risk factors as they are at that age, the mixture determined by the age distribution in the samples. Different studies are unlikely to reproduce and confirm the same findings, particularly when both the factor (e.g., use of HRT) and the disorder itself (e.g., heart disease, diabetes, cancer, Alzheimer’s disease) are age-related.

Again, there is one rare situation that is an exception to these concerns: the constant hazards situation with fixed markers. If the risk factor is constant over the lifetime of the individual and the probability of survival for any time span is exactly the same regardless of when an individual enters a study (exponential survival curve, Poisson distribution of events), it doesn’t matter at what times the individuals are sampled, or whether they are sampled at the same time, or what the distribution of entry times in the high- and low-risk subpopulations was. It doesn’t even matter how long individuals are followed or why they drop out. This is also the one and only case in which the incidence rate (“events per person-year”) estimates an interpretable population parameter, namely the reciprocal of the mean time to event, regardless of entry and exit times [

However, not only are many risk factors of interest not fixed (e.g., HRT use), there are few, if any, real onset distributions that follow a constant hazards model. Only an inevitable outcome (e.g., death) can possibly follow a constant hazards model. No age-related disorder (e.g., heart disease, cancer, Alzheimer’s disease or even death) can. Nevertheless epidemiological studies still occasionally use the incidence rate to compare the high- and low-risk subgroups [

In a prospective observational study, with a representative sample from the disorder-free population of interest, all entered at a _{1}(T) _{0}(T).

Over the last 20 years, considerable attention has been paid to the overuse, misuse, and abuse of “statistical significance” [

However, unless the null hypothesis is absolutely true, the expected value of the p-value approaches zero as the sample size increases, rapidly for a strong effect, slowly for an effect of trivial public health significance. Meehl and others [

For example:

AUC [_{0}(T) − S_{1}(T) + 1).

Success Rate Difference (SRD) [_{0}(T) − S_{1}(T) = 2AUC − 1 (in epidemiology, SRD is usually called the risk difference).

Number Needed to Take (NNT) [

While these are three mathematically equivalent effect sizes, generally NNT is easier to interpret in terms of public health significance, and SRD and AUC are easier to use in computations (e.g., for confidence intervals).
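The equivalence of the three effect sizes can be made concrete with a short sketch. It uses only the relations stated in the text, SRD = S_{0}(T) − S_{1}(T), SRD = 2AUC − 1 and NNT = 1/SRD; the function name `effect_sizes` and the example survival values are assumptions for illustration.

```python
def effect_sizes(s1_T, s0_T):
    """Return (SRD, AUC, NNT) from the two survival probabilities at follow-up time T.

    s1_T: S_1(T), survival in the RF = 1 group; s0_T: S_0(T), survival in the RF = 0 group.
    """
    srd = s0_T - s1_T                 # risk difference
    auc = (srd + 1.0) / 2.0           # rearranged from SRD = 2*AUC - 1
    nnt = float("inf") if srd == 0 else 1.0 / srd
    return srd, auc, nnt


if __name__ == "__main__":
    # Hypothetical values: 50% of the RF = 1 group and 70% of the RF = 0 group
    # are still disorder-free at time T.
    print(effect_sizes(0.5, 0.7))
```

With these hypothetical values SRD is 0.2, AUC is 0.6 and NNT is 5; any one of the three determines the other two.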

There are, of course, other viable effect sizes applicable in special circumstances. For example, Cohen’s d [_{1} − μ_{0})/σ, where μ_{1} and μ_{0} are the two group mean times of onset, and σ^{2} is the average of the two group variances. Then SRD = 2Φ(d/√2) − 1, where Φ() is the cumulative standard normal distribution. In the rare situation in which the constant hazards model holds in both groups, SRD = (μ_{1} − μ_{0})/(μ_{1} + μ_{0}) = (RR − 1)/(RR + 1), where RR, a relative risk, is the ratio of the incidence rates in the two groups. There is limited applicability of such specialized effect sizes, but, when applicable, they are easily converted to SRD (NNT, AUC).
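The two conversions just described can be sketched as follows. The formulas are those given in the text; the function names are assumptions, and Φ is evaluated through the error function in Python's standard `math` module, using Φ(x) = ½[1 + erf(x/√2)].

```python
import math


def standard_normal_cdf(x):
    """Phi(x), the cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def srd_from_cohens_d(d):
    """SRD = 2*Phi(d / sqrt(2)) - 1."""
    return 2.0 * standard_normal_cdf(d / math.sqrt(2.0)) - 1.0


def srd_from_rate_ratio(rr):
    """SRD = (RR - 1) / (RR + 1); meaningful only under constant hazards in both groups."""
    return (rr - 1.0) / (rr + 1.0)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print("d =", d, "SRD =", round(srd_from_cohens_d(d), 3))
    for rr in (1.5, 2.0, 3.0):
        print("RR =", rr, "SRD =", round(srd_from_rate_ratio(rr), 3))
```

For instance, a ratio of incidence rates of 3 corresponds to SRD = 0.5, that is, NNT = 2, but only if the constant hazards model actually holds.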

While the expected value of the p-value approaches zero as sample size increases, the sample estimate of an effect size, e.g., SRD, estimates the same population parameter regardless of the sample size. Instead, as sample size increases, the width of its confidence interval decreases to zero,

A question that deserves careful future consideration is which values of NNT indicate public health significance, and which are trivial. For example, one would question any recommendation for costly and risky surgery on 500 patients to prevent one onset of coronary disease,

Epidemiologists often use the Odds Ratio (OR) as such an effect size, but OR is not viable in this role. Historically, OR was introduced as the likelihood-ratio test statistic to test the null hypothesis of randomness. OR remains useful as a detector of non-randomness, for example, in logistic regression analysis models: OR equal to 1 indicates random association; greater than 1, positive association, and less than 1, negative association. However, there is no magnitude of Odds Ratio unequal to 1 that unequivocally indicates public health significance.

Many arguments have been put forward in recent years to support that contentious point [_{1}(t) is graphed against 1-S_{0}(t) for all values of t. These values connected with each other and with the two endpoints at (0,0) and (1,1) form the ROC curve comparing the RF = 1 and RF = 0 groups on time to onset (AUC is the area under this ROC curve). If there were only random association between the risk factor and onset, the ROC curve would coincide with the diagonal line from (0,0) to (1,1): the Random ROC. Here the ROC curve clearly indicates non-random association. The ROC curve crosses the Random ROC when t = T*, and as t increases, all points converge to the single point (0.5,0.3) determined by the proportion of the two groups who will eventually have this non-inevitable onset.

For any fixed follow-up time, T, there is one point on the ROC curve (1-S_{1}(T), 1-S_{0}(T)) that indicates the strength of association between the risk factor and that particular incidence. The SRD for such a binary outcome is proportional to the distance between that point and the Random ROC[

To demonstrate this and to understand why this is often so, it is necessary to put OR and SRD on comparable scales. In _{1}, p_{0}) that would yield OR = 4 (OR = p_{1}(1 − p_{0})/((1 − p_{1})p_{0})), and all pairs of probabilities that would yield SRD = 1/3 or NNT = 1/SRD = 3 (equipotency curves[^{1/2} − 1)/(OR^{1/2} + 1) = Y (Yule’s Index).

What is in _{1} = 1 − p_{0}, and beginning and ending at the points of the Random ROC at (0,0) and (1,1). For fixed OR > 1, the ^{1/2} − 1)/(OR^{1/2} + 1). That distance then decreases to zero at both corners of the ROC plane. Thus it is always true that for positive association, SRD ≤ (OR^{1/2} − 1)/(OR^{1/2} + 1). In the special case when p_{1} = 1 − p_{0}, the SRD = Y = (OR^{1/2} − 1)/(OR^{1/2} + 1). When p_{0} and p_{1} are both of moderate size (say between 0.25 and 0.75) then SRD is approximately equal to Y = (OR^{1/2} − 1)/(OR^{1/2} + 1).
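The bound SRD ≤ Y can be checked numerically. The sketch below holds OR fixed, walks p_{0} across the ROC plane, solves the Odds Ratio definition for p_{1}, and compares the resulting SRD = p_{1} − p_{0} with Y = (OR^{1/2} − 1)/(OR^{1/2} + 1). The function names and the grid of p_{0} values are assumptions for illustration.

```python
import math


def p1_given_odds_ratio(p0, odds_ratio):
    """Solve OR = p1(1 - p0) / ((1 - p1) p0) for p1."""
    odds1 = odds_ratio * p0 / (1.0 - p0)
    return odds1 / (1.0 + odds1)


def yule_y(odds_ratio):
    """Y = (sqrt(OR) - 1) / (sqrt(OR) + 1)."""
    root = math.sqrt(odds_ratio)
    return (root - 1.0) / (root + 1.0)


if __name__ == "__main__":
    odds_ratio = 4.0
    print("Y =", round(yule_y(odds_ratio), 4))
    for p0 in (0.001, 0.01, 0.1, 1.0 / 3.0, 0.5, 0.9, 0.99):
        p1 = p1_given_odds_ratio(p0, odds_ratio)
        print("p0 =", round(p0, 3), "p1 =", round(p1, 4), "SRD =", round(p1 - p0, 4))
```

For OR = 4, Y equals 1/3, and SRD reaches that value only at p_{0} = 1/3, p_{1} = 2/3, where p_{1} = 1 − p_{0}. Toward either corner the same OR corresponds to an SRD close to zero.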

In a ROC comparing survival curves as in _{1}(T), 1 − S_{0}(T)) is bounded away from random association. To be informed that OR = 4 allows the possibility of being as far away from random as is NNT = 3, but also allows the possibility of being arbitrarily close to random association, particularly when T is small.

In the comparison of any two survival curves, points corresponding to very short follow-up times are always near the lower left corner of the ROC plane, and very near the Random ROC. These points will have SRD = 1/NNT near zero, indicating weak association, but since all the OR > 1 equipotency curves converge in that corner, these points often have very large OR. Simply stated, the problem is that the denominator of the Odds Ratio approaches zero as T approaches zero, and division by zero tends both to “explode” the magnitude of any ratio and to make it very unstable. For this reason and all its many consequences, the Odds Ratio should continue to be used as an indicator of non-randomness, to test null hypotheses of randomness, but should not be used as an effect size.
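The behaviour at short follow-up times can be illustrated with a small sketch. It assumes, purely for illustration, constant hazards of 0.04 and 0.01 events per year in the two groups; these rates and the function names are not from the article. For each follow-up time T it reports the Odds Ratio, the Y value that OR would suggest, and the actual SRD and NNT.

```python
import math

# Assumed (hypothetical) constant hazards, in events per person-year.
RATE_HIGH, RATE_LOW = 0.04, 0.01


def incidence(rate, T):
    """1 - exp(-rate * T): probability of onset by time T under a constant hazard."""
    return -math.expm1(-rate * T)


def odds_ratio(p1, p0):
    return p1 * (1.0 - p0) / ((1.0 - p1) * p0)


def summary(T):
    """Return (OR, Y, SRD, NNT) at follow-up time T."""
    p1, p0 = incidence(RATE_HIGH, T), incidence(RATE_LOW, T)
    oratio = odds_ratio(p1, p0)
    root = math.sqrt(oratio)
    srd = p1 - p0
    return oratio, (root - 1.0) / (root + 1.0), srd, 1.0 / srd


if __name__ == "__main__":
    for T in (0.1, 1.0, 10.0, 50.0):
        oratio, y, srd, nnt = summary(T)
        print(T, round(oratio, 2), round(y, 3), round(srd, 4), round(nnt, 1))
```

Under these assumptions, at T = 0.1 the Odds Ratio is already about 4, which would suggest Y near 1/3, while SRD is roughly 0.003, an NNT in the hundreds.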

In ^{1/2} − 1)/(OR^{1/2} + 1) for all follow-up times T, for the two survival curves in ^{1/2} − 1)/(OR^{1/2} + 1).

Since the common use of the Odds Ratio as if it were an effect size tends to exaggerate the association between risk factors and disorders, moving to use of SRD, NNT or AUC instead will only diminish the apparent clinical importance of many risk factors. This is unwelcome news, but probably reflective of the truth. A few special cases such as infectious diseases or single gene disorders aside, there are probably very few disorders for which a single risk factor can completely explain onset. It is likely that for complex disorders (heart disease, cancer, psychiatric disorders) multiple risk factors “work together” in parallel or in sequence to influence the onset of a disorder. Thus examining how risk factors “work together” is crucial to prediction and prevention efforts.

To date, multiple possible risk factors are often simply included as independent variables in a linear model, completely ignoring (1) their timing relative to each other, (2) possible correlations between risk factors, (3) their possible interactive effects on incidence. Moreover the linearity assumptions and the link function selected (e.g., log-

The MacArthur Model is an alternative approach that takes these factors into consideration [_{1} and RF_{2} in the population of interest is shown in _{1} = 1 is Q, and that RF_{2} = 1 is P. The parameter ρ (the product moment correlation or phi coefficient between the risk factors) is an indicator of non-random association between the two risk factors in that population, with ρ = 0 indicating stochastic independence between them.

In _{1} for the two values of RF_{2}, and the conditional SRDs for RF_{2} for the two values of RF_{1} are also shown. The “main effect of RF_{1}” (ME_{1}), and the “main effect of RF_{2}” (ME_{2}) are respectively the averages of the corresponding conditional SRDs, and the “interaction effect of RF_{1} and RF_{2}” (INT) is the difference between those conditional SRDs (the same for both sets of conditional SRDs).

If RF_{2} were ignored, the SRD relating RF_{1} to outcome (for each possible value of T) is equal to:

If RF_{1} were ignored, the SRD relating RF_{2} to outcome is equal to:

SRD_{1} and SRD_{2} are called the “raw”, “overall”, “marginal”, or “univariate” effects of RF_{1} and RF_{2} on the outcome, indicating the association of that risk factor on outcome in the population sampled when all other risk factors are ignored. While the formulas above are exact only for two binary risk factors predicting incidence between 0 and fixed T, the principle is true in general. The “raw” effect of a risk factor (SRD_{1} or SRD_{2}) comprises three sources: the unique effect of the risk factor itself, the main effects of other risk factors correlated with the risk factor of interest, and the interactions of the risk factor of interest with other risk factors. How much of each source is represented depends on the joint distributions of the risk factors in the population (here P and Q and their correlation as indicated in

The main effect of a risk factor of interest (ME_{1} or ME_{2}) does not convey the effect size in the total population unless the other risk factor(s) neither correlate (ρ = 0) nor interact (INT = 0) with the risk factor of interest. Nor does the main effect of a risk factor convey the strength of association in each subpopulation “matched” on the other risk factor, unless there is no interactive effect. In short, the research questions addressed by the raw effect size SRD_{1}, the conditional SRDs for risk factor 1 in the subpopulations with RF_{2} = 1 and 0, and the main effect of RF_{1}, are all usually different, not because one is “right” and the others “wrong”, but because they address the association of RF_{1} with outcome in different populations. Thus “adjusting” for RF_{2} in a linear model does not usually “remove the effect of RF_{2}”. It changes the research question from the effect size of RF_{1} in the total population sampled to the common effect size of RF_{1} in the subpopulations defined by RF_{2} in the absence of an interaction, or to some weighting of those effect sizes in the presence of an interaction, the weighting determined by whether or not the interaction was included in the linear model and by the joint distribution of the risk factors.
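The distinction between raw, conditional and main effects can be sketched numerically. The code below does not reproduce the closed-form expressions for SRD_{1} and SRD_{2}; it assumes only the joint distribution of two binary risk factors parameterized by P, Q and ρ, and obtains the raw effect of RF_{1} by averaging the cell incidences over that distribution. The cell incidences, the function names, and the dictionary keys `(rf1, rf2)` are hypothetical.

```python
import math


def joint_distribution(P, Q, rho):
    """Cell probabilities keyed by (rf1, rf2), with P = Prob(RF2 = 1), Q = Prob(RF1 = 1)."""
    c = rho * math.sqrt(P * (1 - P) * Q * (1 - Q))
    cells = {
        (1, 1): P * Q + c,
        (0, 1): P * (1 - Q) - c,
        (1, 0): (1 - P) * Q - c,
        (0, 0): (1 - P) * (1 - Q) + c,
    }
    if min(cells.values()) < 0:
        raise ValueError("rho is not attainable for these marginals")
    return cells


def rf1_effects(incidence, P, Q, rho):
    """incidence[(rf1, rf2)] is the probability of onset by time T in that cell."""
    cells = joint_distribution(P, Q, rho)
    cond_rf2_1 = incidence[(1, 1)] - incidence[(0, 1)]   # SRD for RF1 when RF2 = 1
    cond_rf2_0 = incidence[(1, 0)] - incidence[(0, 0)]   # SRD for RF1 when RF2 = 0
    # Incidence in each RF1 group when RF2 is ignored.
    inc_rf1_1 = (cells[(1, 1)] * incidence[(1, 1)] + cells[(1, 0)] * incidence[(1, 0)]) / Q
    inc_rf1_0 = (cells[(0, 1)] * incidence[(0, 1)] + cells[(0, 0)] * incidence[(0, 0)]) / (1 - Q)
    return {
        "conditional": (cond_rf2_1, cond_rf2_0),
        "main_effect": (cond_rf2_1 + cond_rf2_0) / 2.0,
        "interaction": cond_rf2_1 - cond_rf2_0,
        "raw": inc_rf1_1 - inc_rf1_0,
    }


if __name__ == "__main__":
    # Hypothetical incidences with no interaction: each risk factor adds 0.2.
    incidence = {(1, 1): 0.5, (0, 1): 0.3, (1, 0): 0.3, (0, 0): 0.1}
    for rho in (0.0, 0.5):
        print(rho, rf1_effects(incidence, P=0.5, Q=0.5, rho=rho))
```

In this hypothetical case the main effect of RF_{1} is 0.2 whatever the correlation, while the raw effect is 0.2 when ρ = 0 and 0.3 when ρ = 0.5, because the raw effect then also carries part of the effect of the correlated RF_{2}.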

When risk factors are coded +/−1/2 in a linear model (chosen because SRD is here the effect size):

If there is no _{1} and RF_{2}, then there are three possible roles for RF_{1} and RF_{2} for the incidence in question:

_{1} and RF_{2} are “independent risk factors”_{1} and ME_{2} are non-zero. In such cases, the two risk factors play a joint role in determining the outcome and would continue to be of parallel interest. For many disorders, gender and ethnicity are independent risk factors.

_{1} or RF_{2} “is proxy to” the other_{1} and RF_{2} are correlated risk factors), and only one matters. Thus if ME_{1} = INT = 0, then RF_{1} is proxy to RF_{2}. If ME_{2} = INT = 0, then RF_{2} is proxy to RF_{1}. In such cases, the proxy variable should be set aside from further consideration. For example, a measure of family income is often proxy to a well-measured socio-economic index for the family, because family income is usually one component of the index, but less reliably measured than the index as a whole.

_{1} and RF_{2} are overlapping risk factors_{1}, ME_{2} and INT are not equal to zero, RF_{1} and RF_{2} are overlapping. This situation often arises when two risk factors tap the same underlying construct with about equal, but less than perfect, reliability/validity. Then it is preferable to combine the two risk factors to generate a more reliable/valid measure of whatever their common construct. Combining two somewhat unreliable measures of the same construct would disattenuate reliability and thus increase the effect size. Moreover, to do so might focus attention more precisely on the appropriate underlying causal factor. For example, infant birth-weight and gestational age tend to be highly correlated and risk factors for many of the same subsequent outcomes. Some measure of birth maturity that included consideration of both, perhaps even including other indicators of physiological and neurological maturity at birth, might serve prediction and prevention purposes better than either separately.

On the other hand, when there is temporal precedence, with RF_{1} preceding RF_{2} in time, there are four possibilities:

_{1} moderates RF_{2}_{2} differ depending on whether RF_{1} = 1 or = 0. Since a later risk factor, RF_{2}, “works differently” depending on what earlier RF_{1} is, this suggests that the population should be stratified on RF_{1} for further studies. For example, a genotype may be a susceptibility factor for a later environmental risk factor. For those with one genotype, RF_{2} may be a strong risk factor for outcome; otherwise, RF_{2} may be a much weaker risk factor, may not be a risk factor at all, or may even be a protective factor. Indeed, seeking genetic moderators of drug on therapeutic response is the basis for current interest in pharmacogenetics. Moderators are also the basis of personalized medicine[

_{2} mediates RF_{1}_{2} is non-zero. In this case, RF_{2} explains part of the effect of earlier RF_{1} on the outcome. When a mediator is identified, this suggests the possibility of a chain leading from RF_{1} through RF_{2} to the outcome. For example, unsafe sex practices lead to HIV infection that leads to onset of AIDS: HIV infection mediates the effect of unsafe sex practices on AIDS. Mediator relationships are important in that the chain provides multiple opportunities for preventive intervention: one might break the chain by breaking any of the links in the chain.

_{2} is proxy to RF_{1}_{2} and INT are both zero. As is the case for proxy risk factors in the absence of temporal precedence, the proxy factor should be set aside. Gender, for example, is a risk factor for teen onset of depression. There are many correlates of gender during the pre-teen years, e.g., ball-throwing ability at age 10, that might be found to be risk factors for teen-onset depression when considered individually, but would be found proxy to gender when both were considered. It is probably not worthwhile to teach young girls to throw a ball better to prevent teen depression.

_{1} and RF_{2} are independent risk factors_{1} does not moderate RF_{2}. As is the case for independent risk factors in absence of temporal precedence, both factors would continue to be of parallel interest.

It should be noted that these definitions are more precise than certain current usages in epidemiology. For example, “independent risk factors” in the MacArthur model are required to be stochastically independent of each other. Usual usage of the term does not require such independence. In the way the term is often used, only proxy risk factors would

The Last[

Moreover the term “confounder” is avoided. Last defines the term as “A variable that can cause or prevent the outcome of interest, is not an intermediate variable (mediator), and is associated with the factor under investigation.” (Page 35). That would preclude mediators explicitly, and preclude moderators and independent risk factors because they are not associated with the factor under investigation, leaving proxies or overlapping factors. However, in practice, the term “confounder” is often used more loosely to refer to risk factors in which the researcher is not specifically interested. Thus in a study examining the relationship of diet and exercise to onset of obesity, a dietician might designate exercise as a “confounder”, while an exercise physiologist might designate diet as a “confounder”.

There is as yet little history of seeking moderating/mediating relationships between risk factors for specific outcomes. Consequently, how to conduct such a search to generate moderator/mediator hypotheses, and how to conduct studies to formally test such hypotheses, is still a work in progress. However, there are a few examples in the literature that might suggest possible options [

Many of the problems discussed here are long-standing and well known, and yet continue to occur. For example, Caspi and colleagues [

Science progresses by identifying its weaknesses and repairing them. However, it is always very hard to give up methods long used in previous studies and thus very familiar, to be replaced with new and unfamiliar methods. “That’s not the way it is always done, or the way everyone does it!” is a common rejoinder to suggestions for alternative approaches to deal with the problems here discussed.

Such resistance is particularly and predictably strong when the alternative approaches are more difficult and costly to implement, which is here true. To be asked to stratify a mixed age sample, say 20–80 years of age, into relatively short age strata (say entry at 20–24, 25–29,

Resistance is particularly, and again, understandably, strong when alternative methods are

The rejoinder hardest to refute is that the effect of time can be dealt with simply by “adjusting for time” in a linear model. In some cases this may indeed be possible. However, the assumptions and fit of such models should be carefully checked, for if the assumptions that underlie valid results from application of such models do not hold in the population sampled, the results of such adjustments may be more biased than the results in absence of adjustment.

In summary, the issue of time should be central to all thinking in epidemiology research, which would necessitate careful thinking about sampling, measurement, design, analysis, and, perhaps most important, about the interpretation of the results from such studies that might influence clinical decision-making and subsequent clinical research.

Comparison of two hypothetical survival curves for the subpopulations with RF = 1 and RF = 0 (non-proportional hazards).

ROC curve comparing survival in the “high” and “low” risk groups.

Equipotency curves comparing the locus of all points (p_{0}, p_{1}) with p_{1} > p_{0}, the ROC plane with Odds Ratio = p_{1}(1 − p_{0})/[(1 − p_{1})p_{0}] and NNT = 1/(p_{1} − p_{0}) = 3. (The ^{1/2} + 1)/(OR^{1/2} − 1)).

Comparison of Y = (OR^{1/2} − 1)/(OR^{1/2} + 1) with 1/NNT for various follow-up times for the survival curves shown in

The joint distribution of two binary risk factors (RF1 and RF2), with marginal probabilities P = Prob(RF2 = 1) and Q = Prob(RF1 = 1), and the product moment correlation coefficient (ρ) between them.

| | RF1 = 1 | RF1 = 0 | |
|---|---|---|---|
| RF2 = 1 | PQ + ρ(PP’QQ’)^{1/2} | PQ’ − ρ(PP’QQ’)^{1/2} | P |
| RF2 = 0 | P’Q − ρ(PP’QQ’)^{1/2} | P’Q’ + ρ(PP’QQ’)^{1/2} | P’ = 1 − P |
| | Q | Q’ = 1 − Q | |

The incidence of disorder by time T for each combination of RF1 and RF2, and the marginal SRDs.

| | RF1 = 1 | RF1 = 0 | Marginal SRD for RF1 |
|---|---|---|---|
| RF2 = 1 | 1 − S_{11}(T) | 1 − S_{10}(T) | S_{10}(T) − S_{11}(T) |
| RF2 = 0 | 1 − S_{01}(T) | 1 − S_{00}(T) | S_{00}(T) − S_{01}(T) |
| Marginal SRDs for RF2 | S_{01}(T) − S_{11}(T) | S_{00}(T) − S_{10}(T) | |