Observational studies have brought important insight into disease etiology. During the past decade, however, the validity of observational studies has been questioned [1]. This is because the role of selected risk (or protective) factors identified via observational studies could not be confirmed by subsequent large randomized controlled trials. For instance, hormonal replacement therapy appeared to protect women against coronary heart disease in observational studies [2], whereas randomized trials showed no such protection [3]. Other examples are given by antioxidant vitamin supplementation [4
One cannot, for ethical and technical reasons, randomize risk factors using controlled trials in humans. The identification of risk factors therefore relies on observational studies, which are prone to spurious results due to confounding factors, reverse causation, and/or selection biases [7]. As a consequence, it is difficult to firmly establish causal relationships between risk factors and disease. Most common diseases (e.g., cancer, cardiovascular disease, etc.) are complex and are influenced by multiple risk factors that may be correlated with each other. In this context, each factor is expected to have a small influence on disease risk. Epidemiologists face the difficult task of determining whether a putative risk factor is causally related to a specific disease, independently of all other risk factors. A promising approach to help epidemiologists in this task is Mendelian randomization. In this review, we first recall the principles of a “Mendelian randomization” approach in observational epidemiology (Section 2). We then provide some technical explanation of the method of instrumental variables (Section 3), followed by simulations and an example with real data (Section 4). Finally, we present the results of a systematic search for original articles that used this approach (Section 5), discuss its limitations (Section 6) and present concluding remarks (Section 7).
2. Mendelian Randomization in Observational Epidemiology
Mendelian randomization refers to the random allocation of alleles at the time of gamete formation. A specific genotype carried by a person therefore results from two such randomized transmissions, one from the paternally inherited allele and the other from the maternally inherited allele. A logical consequence of these randomizations is that genotypes are not expected to be associated with known (measurable or not) or unknown confounders for any outcome of interest, except those lying on the causal pathway between the genotype and the outcome. This should hence allow analyzing the genotype-risk factor association and the genotype-outcome association in an unconfounded manner. By appropriately combining the results of these two analyses, one can obtain an estimate of the risk factor-outcome association that is itself not confounded. This is analogous to randomized controlled trials (of sufficient sample size), in which the random allocation of treatment (or preventive measure) is expected to lead to an even distribution of (known or unknown) confounding factors across groups. The term “Mendelian randomization” is now frequently used in observational epidemiology to refer to the use of genetic variants to estimate the causal effect of a specific modifiable risk factor on a trait or disease of interest. The idea is to overcome some of the problems encountered in observational epidemiology, such as residual confounding and reverse causation, by taking advantage of the natural random allocation of alleles during meiosis [8
We here provide an example to illustrate this approach. The aldehyde dehydrogenase 2 (ALDH2) gene encodes the enzyme aldehyde dehydrogenase, which catalyzes the chemical transformation from acetaldehyde to acetic acid. Carriers of the ALDH2 *2*2 genotype have reduced alcohol consumption because of adverse reactions (facial flush, headache, nausea and drowsiness) due to acetaldehyde accumulation. This fact has been used to show that alcohol intake increases the risk of esophageal cancer [9] or head and neck cancer [10], which is consistent with the findings from observational studies. Whereas reported alcohol consumption may be subject to measurement errors, ALDH2 genotypes can be measured accurately, are present since birth, result from the random allocation of the paternally and maternally inherited alleles, are strongly associated with alcohol consumption, and therefore provide a unique opportunity to assess, in an unconfounded manner, the risk of disease associated with alcohol consumption. As we shall discuss in Section 6, such an approach, although appealing, also raises some methodological issues.
Historically, the first description of the concept of Mendelian randomization in observational epidemiology is attributed to Katan [11], who suggested using the APOE gene to infer causality between cholesterol and cancer. The concept was further developed by Davey Smith and Ebrahim [7], who have shown that the causal effect of a risk factor (X) on an outcome (Y) can be estimated by combining the effects of a genetic variant (Z) on X and on Y, provided that certain assumptions are met (see Figure 1). Thomas and Conti [14] have shown that the Mendelian randomization approach is in fact an application of the instrumental variable approach, which has been used for more than 70 years by econometricians. Wehby et al. have recently advocated that the term “Mendelian randomization” should be replaced by “instrumental variable analysis with genetic instruments” [15]. We tend to agree with this latter statement after having reviewed the medical literature and observed that the term “Mendelian randomization” was used with different meanings by different researchers, which might be confusing.
3. The Method of Instrumental Variables
We consider the case where an association between a continuous (or binary) modifiable exposure X and a continuous response Y is measured via a beta coefficient in a linear regression, defined as the average increase in Y when X is increased by one unit (respectively, when changing the category of X if the exposure is binary). When observing such an association in epidemiological research, however, it is often difficult to determine which of the two variables (X or Y) is the cause and which the effect, or whether a third variable (a confounder, U) related to both variables is responsible for the observed association. Moreover, measurement error could attenuate the beta coefficient. Thus, it is not obvious how a significantly non-zero (e.g., positive) beta coefficient obtained from a classical (ordinary) least squares estimate should be interpreted. Here are five possible interpretations (among many others):
The beta coefficient is a consistent estimate of the causal effect of X on Y.
The beta coefficient is actually underestimating the true causal effect of X on Y because of measurement error.
The beta coefficient is overestimating the true causal effect of X on Y because of the presence of a confounder which is positively related to both X and Y.
The non-zero beta coefficient is entirely due to the presence of a confounder which is related to both X and Y: in fact there is no causal effect of X on Y.
The beta coefficient is non-zero because of a causal effect of Y on X, not of X on Y (i.e., reverse causation).
In other words, if the interest lies in assessing “the causal effect of X on Y”, i.e., the effect that would be observed if one could intervene and change someone’s X level by one unit, leaving other characteristics unchanged, no definitive conclusion can be drawn from such an analysis. We shall see below, in the context described by Figure 1, how the method of instrumental variables can help in this regard.
A linear model (consistent with Figure 1) is given by:

$$Y = \alpha_1 + \beta_1 X + \gamma_1 U$$

where β1 is the causal effect of X on Y and where γ1U plays the role of the error term, U being some unobserved confounder. Whenever X is correlated with the error term (see Figure 1), the expectation of the least squares estimate of the slope in this model, which we denote by β1*, will be different from β1.
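To make the source of this bias explicit, the following short derivation gives the large-sample limit of the least squares slope. It is a sketch that assumes the first model has the form Y = α1 + β1X + γ1U with no further error term, as the text describes; the label β1* for the limit is used here for convenience.

```latex
\beta_1^{*}
  = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = \frac{\operatorname{Cov}(X,\; \alpha_1 + \beta_1 X + \gamma_1 U)}{\operatorname{Var}(X)}
  = \beta_1 + \gamma_1 \, \frac{\operatorname{Cov}(X, U)}{\operatorname{Var}(X)}
```

Under this assumption, the bias term vanishes only if γ1 = 0 or if X is uncorrelated with U; its sign is that of γ1 Cov(X, U).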
The method of instrumental variables has been proposed to correct for the bias of the least squares estimate. For this, we need to have at our disposal an “instrumental variable”, or instrument Z, for the time being continuous or binary, satisfying the following conditions: (1) Z is correlated with X, (2) Z is independent of U, and (3) Z and Y are independent given X and U. Note that the first of these conditions is verifiable from the data, whereas the other two are largely not.
A second linear model (consistent with Figure 1) is then as follows:

$$X = \alpha_2 + \beta_2 Z + \gamma_2 U$$

where γ2U plays the role of the error term in the model. Since Z is by assumption uncorrelated with this error term, the coefficients of this second model are estimated without bias by least squares. Note that the first model can be rewritten as:

$$Y = (\alpha_1 + \beta_1 \alpha_2) + \beta_1 \beta_2 Z + (\gamma_1 + \beta_1 \gamma_2) U$$

Letting α3 = α1 + β1α2, β3 = β1β2 and γ3 = γ1 + β1γ2, we hence obtain a third linear model:

$$Y = \alpha_3 + \beta_3 Z + \gamma_3 U$$

where γ3U is the error term. Since Z is by assumption uncorrelated with this error term, the coefficients of this third model are also estimated without bias by least squares.
Finally, the parameters of the first model can be consistently estimated using the relationships α1 = α3 − β1α2 and β1 = β3/β2, the denominator β2 being nonzero by assumption. In particular, the instrumental variable (IV) estimate of the causal effect β1 in the first model is the quotient of the two least squares estimates of the slope parameters β3 and β2 in the third and second models. Since the expectation of a quotient of two estimates is asymptotically equal to the quotient of the expectations of these estimates, the IV estimates are asymptotically unbiased, but they may be biased in finite samples.
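The quotient β3/β2 can be illustrated numerically. The sketch below is not the authors' code: it simulates data with an unobserved confounder, with all slopes set to 1 and an independent N(0,1) noise term added to each equation (an assumption made for illustration), and compares the naive least squares slope with the IV quotient. The helper names `slope` and `simulate_confounded` are hypothetical.

```python
import random


def slope(x, y):
    """Least squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_confounded(n, beta1=1.0, seed=0):
    """Draw (Z, X, Y) with an unobserved confounder U (illustrative design)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                     # instrument, independent of U
        ui = rng.gauss(0, 1)                     # unobserved confounder
        xi = zi + ui + rng.gauss(0, 1)           # X depends on Z and U
        yi = beta1 * xi + ui + rng.gauss(0, 1)   # Y depends on X and U
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate_confounded(n=20000)
beta_ls = slope(x, y)              # naive least squares, biased by U
beta2_hat = slope(z, x)            # slope of X on Z (second model)
beta3_hat = slope(z, y)            # slope of Y on Z (third model)
beta_iv = beta3_hat / beta2_hat    # IV estimate of beta1

print(f"LS estimate: {beta_ls:.3f}")
print(f"IV estimate: {beta_iv:.3f}")
```

With this design the least squares slope converges to 4/3 rather than to the causal value of 1, whereas the IV quotient converges to 1.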
Asymptotically, the IV estimates are normally distributed and explicit formulae for the standard errors are available, making it possible to calculate confidence intervals and to test whether the causal effect β1 in the first model is zero (as calculated, e.g., with the ivregress 2sls command implemented in Stata 10.0). The standard error of the estimates will depend, among other things, on the percentage of explained variance in the second model (itself related to the percentage of explained variance in the third model). If this percentage is low, the instrument is said to be weak, the standard errors will be large and the test above will have low power. Moreover, when the instrument is weak, the bias of the IV estimates is typically larger and the asymptotic normal distribution of the IV estimates may be a poor approximation to the true distribution, making the inference unreliable [17]. In practice, an instrument is said to be weak if the F-statistic for testing whether parameter β2 in the second model is zero is below 10 [18
Another equivalent way to calculate the IV estimates (but without their standard errors!) is to perform “two-stage least squares”, regressing X on Z in a first stage (this is the second model above), and regressing Y on the obtained fitted values X̂(Z) in a second stage. The method of instrumental variables can be readily extended to the case of several instrumental variables (and therefore to the case of a qualitative instrument), which may be useful to improve the precision of the instrumental variable estimate. One can also adjust for additional covariates in each of the above models.
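The equivalence between the two-stage procedure and the quotient of slopes can be checked on simulated data. The sketch below uses a single continuous instrument and an illustrative confounded design (all slopes 1, independent N(0,1) noise in each equation, which is an assumption); it also computes the first-stage F-statistic used in the weak-instrument rule of thumb. The helper names `fit_line` and `first_stage_f` are hypothetical.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def first_stage_f(z, x):
    """F-statistic for a zero slope in the regression of x on a single z."""
    n = len(x)
    a, b = fit_line(z, x)
    mx = sum(x) / n
    fitted = [a + b * zi for zi in z]
    ss_model = sum((f - mx) ** 2 for f in fitted)
    ss_resid = sum((xi - f) ** 2 for xi, f in zip(x, fitted))
    return ss_model / (ss_resid / (n - 2))


rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and keep the fitted values.
a2, b2 = fit_line(z, x)
x_hat = [a2 + b2 * zi for zi in z]

# Stage 2: regress Y on the fitted values from stage 1.
_, beta_2sls = fit_line(x_hat, y)

# Quotient form: slope of Y on Z divided by slope of X on Z.
_, b3 = fit_line(z, y)
beta_ratio = b3 / b2

f_stat = first_stage_f(z, x)
print(f"2SLS estimate:  {beta_2sls:.6f}")
print(f"Ratio estimate: {beta_ratio:.6f}")
print(f"First-stage F:  {f_stat:.1f}")
```

The two point estimates agree up to floating-point error. The residual variance from the second stage is not the appropriate one for inference, which is why the naive second-stage standard errors should not be used.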
In addition to testing whether the causal effect β1 is zero, one may also test for the absence of correlation between X and the error term in the first model, which implies the equality of β1 and β1* (the expectation of the least squares estimate of the slope), using the Durbin-Wu-Hausman test. This may be of some interest when comparing several candidate models which may have generated the data (see the simulations below).
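One common way to carry out such a test is the regression-based (control function) form: the first-stage residuals are added as a covariate in the regression of Y on X, and the coefficient of these residuals is tested against zero. The sketch below illustrates this variant on simulated confounded data. It is an assumption that this form matches the text's intent; it uses a normal approximation for the p-value, and the output of a statistical package may differ in small-sample details. The helper names `ols` and `dwh_t_statistic` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ols(columns, y):
    """Least squares fit of y on the given columns plus an intercept.

    Returns (coefficients, standard_errors), intercept first.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[j] + [1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]
    for j in range(p):
        pivot = max(range(j, p), key=lambda i: abs(aug[i][j]))
        aug[j], aug[pivot] = aug[pivot], aug[j]
        d = aug[j][j]
        aug[j] = [v / d for v in aug[j]]
        for i in range(p):
            if i != j:
                f = aug[i][j]
                aug[i] = [vi - f * vj for vi, vj in zip(aug[i], aug[j])]
    inv = [row[p:] for row in aug]
    coef = [sum(inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    se = [math.sqrt(s2 * inv[j][j]) for j in range(p)]
    return coef, se


def dwh_t_statistic(z, x, y):
    """t-statistic of the first-stage residuals in the augmented regression."""
    (a, b), _ = ols([z], x)
    v_hat = [xi - (a + b * zi) for xi, zi in zip(x, z)]
    coef, se = ols([x, v_hat], y)
    return coef[2] / se[2]


rng = random.Random(2)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

t_stat = dwh_t_statistic(z, x, y)
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"t = {t_stat:.2f}, approximate p-value = {p_value:.4g}")
```

In this confounded design X is correlated with the error term, so the statistic is expected to be far from zero.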
4. Simulations and Example
To illustrate that the method of instrumental variables is effective, we simulated data from five models consistent with the five above-mentioned interpretations (Table 1). In each case, we simulated an instrument Z satisfying the conditions. For simplicity, we took all intercepts in these models to be 0, all slopes to be 1, and the variables which were generated at each step were taken to be N(0,1), i.e., normally distributed with mean 0 and variance 1.
The causal effect of X on Y that we are looking for is β1 = 1 under the first three models, and is β1 = 0 under the last two models. Boxplots of the least squares (LS) estimates and of the instrumental variable (IV) estimates of parameter β1 obtained from 1,000 samples of size n = 100 under each of the five models are shown in the top panel of Figure 2. The LS estimate is unbiased under the first model, is systematically too small under the second model, and is systematically too large under the last three models. By contrast, the IV estimate is almost unbiased under each of the five models, which is remarkable. One can also notice that the IV estimate shows a higher variability than the LS estimate, which is the price to pay for correcting the bias of the latter. Writing β1* for the expectation of the LS estimate of the slope, the Durbin-Wu-Hausman test was significant in 4.1% of the samples generated from the first model (close to the nominal 5% level), for which β1* = β1 holds, in 66% of the samples generated from the second model, for which β1* < β1 holds, and in 88%, 90% and 100% of the samples generated respectively from the third, fourth and fifth models, for which β1* > β1 holds.
To provide an idea of what may happen when using a weak instrument, we considered the same five models, but the slopes involving Z were set to 0.25 (instead of 1) in each model. In addition, we reduced the sample size to n = 25. Under that setting, the F-statistic in the first stage regression was smaller than 10 in more than 95% of the generated samples. Boxplots of the estimates obtained from 1,000 samples are shown in the bottom panel of Figure 2. One can see that the variance of the IV estimates dramatically increased (compared to the top panel), while some non-negligible bias appeared.
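A simulation of this kind can be sketched as follows. The five data-generating models below are assumptions chosen to match the five interpretations listed in Section 3; they are not taken from Table 1, so the numbers will not reproduce the figures reported above. The sketch uses 300 replications rather than 1,000 to keep the run short, and reports medians because the IV quotient can be heavy-tailed. All function names are hypothetical.

```python
import random
import statistics

TRUE_BETA1 = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0}


def sums(x, y):
    """Centered sums of squares and cross-products of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def slope(x, y):
    """Least squares slope of y on x."""
    sxx, _, sxy = sums(x, y)
    return sxy / sxx


def first_stage_f(z, x):
    """F-statistic for a zero slope in the regression of x on one z."""
    szz, sxx, szx = sums(z, x)
    r2 = szx ** 2 / (szz * sxx)
    return r2 * (len(x) - 2) / (1 - r2)


def draw_sample(model, n, slope_z, rng):
    """Return (z, x, y) for one of five illustrative models (assumed designs)."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        if model == 1:      # X causes Y, no confounding
            xi = slope_z * zi + rng.gauss(0, 1)
            yi = xi + rng.gauss(0, 1)
        elif model == 2:    # X causes Y, but X is observed with error
            x_true = slope_z * zi + rng.gauss(0, 1)
            yi = x_true + rng.gauss(0, 1)
            xi = x_true + rng.gauss(0, 1)
        elif model == 3:    # X causes Y and a confounder U affects both
            ui = rng.gauss(0, 1)
            xi = slope_z * zi + ui + rng.gauss(0, 1)
            yi = xi + ui + rng.gauss(0, 1)
        elif model == 4:    # no causal effect, association due to U only
            ui = rng.gauss(0, 1)
            xi = slope_z * zi + ui + rng.gauss(0, 1)
            yi = ui + rng.gauss(0, 1)
        elif model == 5:    # reverse causation: Y causes X
            yi = rng.gauss(0, 1)
            xi = slope_z * zi + yi + rng.gauss(0, 1)
        else:
            raise ValueError("model must be 1..5")
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def run(n, slope_z, reps, seed=0):
    """Per model: (median LS, median IV, IQR of IV, share with F < 10)."""
    rng = random.Random(seed)
    out = {}
    for model in range(1, 6):
        ls, iv, weak = [], [], 0
        for _ in range(reps):
            z, x, y = draw_sample(model, n, slope_z, rng)
            ls.append(slope(x, y))
            iv.append(slope(z, y) / slope(z, x))
            weak += first_stage_f(z, x) < 10
        q = statistics.quantiles(iv, n=4)
        out[model] = (statistics.median(ls), statistics.median(iv),
                      q[2] - q[0], weak / reps)
    return out


strong = run(n=100, slope_z=1.0, reps=300)
weak = run(n=25, slope_z=0.25, reps=300)
for label, res in (("strong", strong), ("weak", weak)):
    for model, (m_ls, m_iv, iqr_iv, share) in res.items():
        print(f"{label} model {model}: LS {m_ls:.2f}, IV {m_iv:.2f}, "
              f"IQR(IV) {iqr_iv:.2f}, F<10 in {share:.0%}")
```

Under these assumed designs, the strong-instrument run shows IV medians close to the true β1 in every model, while the weak-instrument run shows a much wider spread of IV estimates and a first-stage F-statistic below 10 in most samples.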
We next provide an example with real data to illustrate that the method of instrumental variables is able to correct for the bias of least squares in a case of reverse causation. We used the 1,268 participants of the population-based CoLaus study [19] who reported that they consumed alcohol regularly and who had available data for genetic markers located within the gamma-glutamyl transferase 1 (GGT1) gene as well as circulating GGT levels (X). CoLaus participants were genotyped using the Affymetrix 500 K chip, alcohol consumption was assessed using a standardized questionnaire and coded in units of alcohol per week, and GGT levels were measured using standard procedures as previously described [19]. As we were interested in exploring an example of reverse causation, we chose Y to be the reported alcohol consumption and tested whether circulating GGT (X) could cause alcohol consumption (which we know is the opposite of reality) using the best GGT1 marker as our instrument (Z). Rs2017869 explained 1.12% of the variance in circulating GGT levels. The parameter β1 was estimated using least squares and the method of instrumental variables (the latter with the ivregress 2sls command implemented in Stata 10.0). The LS estimate (95%CI) was 5.53 (4.73;6.33) mmol/L per risk allele. The IV estimate (95%CI) was −4.60 (−13.82; 4.63) mmol/L per risk allele, which was significantly different from the LS estimate in a Durbin-Wu-Hausman test (P = 0.03), and not significantly different from zero. Thus, while the result provided by least squares was highly significant, the instrumental variable approach did not show any evidence for a positive causal effect of GGT on alcohol consumption.
5. Review of Observational Studies Using Mendelian Randomization
We searched MEDLINE using the following query: “Mendelian randomization” OR “Mendelian randomisation”, which retrieved 99 citations (January 13, 2009). We acknowledge that this search strategy might not have retrieved all publications using the concept of Mendelian randomization, but it should provide a good overview of what has been published. The aim was to identify original articles reporting results from an observational study using a Mendelian randomization approach. We also searched references from review papers and original articles, as well as citations of these papers.
We identified 23 studies with a dichotomous trait as the outcome of interest (Table 2) and 15 studies with a continuous trait as the outcome of interest (Table 3). Considering that the instrumental variable approach was introduced, and is well understood, for a continuous outcome, it was somewhat surprising to find that a majority of studies in fact applied this method to a dichotomous outcome (using non-linear models and odds ratios to quantify the associations, for which the method has not been fully validated; see also the next section). Thirteen out of 23 studies focusing on binary outcomes (Table 2) reported results compatible with a causal association. Most studies were in the field of cardiovascular epidemiology and cancer epidemiology. For continuous outcomes (Table 3), half of the studies reported some evidence for causality and most studies were in the field of cardiovascular epidemiology. Most instruments reported in these studies were weak (Figure 3). We also found many studies that claimed to use a Mendelian randomization approach although they only analyzed the genotype-outcome association, hence focusing on hypothesis testing (i.e., to confirm or disprove causality). Yet, what is of interest in the Mendelian randomization approach is to estimate the causal effect of X, the modifiable factor, on Y, and not simply the association between Z and Y.
6. Some Limitations of Mendelian Randomization
In order to use Mendelian randomization to infer causality in observational epidemiology, numerous conditions need to be fulfilled [13]. A major limitation of this approach is that it is difficult, in practice, to meet all these conditions for a given risk factor-outcome association. To fulfill the first condition, Z and X should be correlated (genetic instruments for common complex diseases are typically quite weak). This indirectly implies that there is some level of allelic homogeneity (i.e., common variants rather than rare variants). Note that for many exposures, no suitable genetic instrument is available. The second and third conditions are the problematic ones. They state that Z is (marginally) independent of all potential confounders U, and that Z and Y are independent conditionally on X and U. In an excellent introduction to Mendelian randomization, Didelez and Sheehan [58] wrote that “if we know a gene closely linked to the phenotype without direct effect on the disease, it can often be reasonably assumed that the gene is not itself associated with any confounding factors”. See however Section 7 of that paper for situations in which these conditions are not satisfied. Mendel’s second law (i.e., the law of independent assortment of alleles at the time of gamete formation) is not always true in that genetic variants located on the same chromosome, particularly for close loci, do not segregate independently (i.e., they are linked), as detailed in Lawlor et al. At the population level, such physical linkage patterns result in linkage disequilibrium, i.e., correlations between alleles at nearby loci. In genetic epidemiology, the second condition implies, among other things, that there should be no confounding due to linkage disequilibrium (i.e., instrument Z should not be correlated with other genetic variants having an effect on the outcome of interest, Y). However, the instrument Z does not necessarily need to be causally associated with X, in that another genetic variant associated with both Z and X might be the true causal variant [13]. Similarly, population stratification, i.e., the existence of population subgroups with different allele frequencies and outcome distributions, may violate this second condition as well. In the Mendelian randomization context, confounding may exist if the subgroups (these often correspond to ethnic groups) are associated with both Z and Y.
Also, there should be no pleiotropy (i.e., Z having multiple effects that do not pass through X). This is however only a problem if the other functions of Z are associated with Y. There should be no canalization (also called developmental compensation), which corresponds to a functional adaptation to a specific genotype influencing the expected genotype-disease association [13]. For instance, a gene expressed during fetal development may enhance the expression of other genes having compensatory effects on the outcome [13]. For most genetic variants involved in complex traits, the effect size is small and we do not know if such modifications would lead to developmental compensation. Furthermore, there should be no segregation distortion at the locus of interest. Although unlikely, it has been reported that some loci in the human genome show some evidence of such distortion [59]. Of course, there should be no selective survival due to the genetic variant of interest. Considering that the randomization occurred many years before the analysis is conducted, if a specific genotype were associated with increased early mortality, the genotypic distribution at the time of the study might not reflect the initial distribution. For instance, the C677T MTHFR variant has been associated with fetal viability [60]. Finally, although this has rarely been assessed so far, there should be no parent-of-origin effect (i.e., the effect of the paternally transmitted allele should be the same as the effect of the maternally transmitted allele).
A practical condition is that there should be enough data to establish reliable genotype-intermediate phenotype, or genotype-outcome, associations. In our literature review, we observed that for many publications, estimates for these two associations came from different studies. Whenever independent studies have analyzed these two relationships, separate meta-analyses can be conducted. For studies having assessed both relationships, a multivariate model is needed in order to take into account the correlation between the genotype-phenotype and genotype-disease associations. Minelli et al. proposed a method to use meta-analysis results in a multivariate Mendelian randomization approach [62]. Note that their approach is based on odds ratios (see below). According to some authors, the advantages of using the same study (or studies) to estimate both associations include (1) being in a better position to examine whether the assumptions underlying the instrumental variable method have been violated and (2) having greater precision [13
Many of the studies we identified applied a Mendelian randomization approach with a binary outcome. While econometricians have proposed instrumental variables methods for binary outcomes (see Lawlor et al. for a nice review), the generalization of instrumental variables to non-linear systems is not at all straightforward and may require additional assumptions [13]. One possibility is to build a linear model using risk differences, instead of risk ratios [64]. Another is to use a latent model, in which the underlying outcome variable is assumed to be continuous and the observed binary outcome reflects whether or not a specific threshold has been reached (e.g., probit models). Log-linear and logistic structural mean models for binary outcomes were also developed [65], where it was not possible to avoid some bias. Palmer et al. proposed an adjusted IV estimate to reduce the bias of the classical IV estimate applied to a binary outcome, but acknowledged that it is not known whether, and under what conditions, the estimated parameter has a strictly causal interpretation. They also noted that “instrumental variable theory has not been fully generalized to non-linear situations”. Finally, one may obtain bounds on the causal effect using a non-parametric method whenever the instrument, the risk factor and the disease are all categorical [58]. Note that none of the published studies of binary outcomes we found used these methods.