Review

Mendelian Randomization Studies: A Metric for Quality Evaluation

by Fiorella Rosas-Chavez 1 and Tony R. Merriman 1,2,*
1 Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
2 Department of Microbiology and Immunology, University of Otago, Dunedin 9016, New Zealand
* Author to whom correspondence should be addressed.
Gout Urate Cryst. Depos. Dis. 2025, 3(2), 8; https://doi.org/10.3390/gucdd3020008
Submission received: 26 December 2024 / Revised: 1 May 2025 / Accepted: 12 May 2025 / Published: 20 May 2025

Abstract

Background: Mendelian randomization (MR) is a genetic epidemiological method used to infer causal relationships between exposures and outcomes. Its application in hyperuricemia and gout has grown exponentially owing to the ready availability of summary statistics from genome-wide association studies and the ease of applying the two-sample MR technique. However, indications of poor study quality suggest the need for systematic evaluation. Objective: This study evaluated the quality of two-sample MR studies on hyperuricemia and gout and developed a scoring system to help reviewers and readers assess their quality and validity. Methods: A systematic review was conducted on 86 two-sample MR studies published between 2016 and 2024. Studies were assessed using a scoring system encompassing study design, statistical methods, result interpretation, and adherence to STROBE-MR guidelines. Scores could range between −9 and 21. Trends in quality over time were analyzed using regression models. Results: Study quality scores ranged from 0 to 19, with a mean of 9.1 and median of 11, demonstrating wide variability. High-quality studies adhered to MR assumptions, used independent datasets, and conducted replication analyses, while lower-quality studies often failed to correct the p-value when needed, test for confounders, address dataset overlap, or report study power. Despite the increased publication of MR studies, overall quality has not improved over time. Conclusion: There is variability in two-sample MR study quality. Our proposed scoring system offers a practical framework for evaluating MR studies, aiding researchers and clinicians in identifying robust findings while promoting higher methodological standards.

1. Introduction

Gout is caused by a response of the innate immune system to monosodium urate crystals deposited in the joints of people with hyperuricemia [1]. Hyperuricemia and gout are strongly comorbid with renal and cardiometabolic conditions [2,3] and are also associated with various other diseases, including cancer and neurological conditions [4]. Hence, there is much interest in understanding causal relationships between gout, hyperuricemia, and other conditions.
Mendelian randomization (MR) is a genetic epidemiological method that aims to assess causal relationships between an exposure and an outcome [5]. Instead of the exposure itself, it uses inherited genetic variants as instrumental variables (IVs). The variants are randomly assigned at conception and remain unaffected by environmental influences. This randomness, described by Mendel’s second law (the law of independent assortment), is what gives MR its name and makes it analogous to a randomized controlled trial, where groups are assigned randomly to different exposures. In application to urate control and gout, the “intervention” would be the inheritance of a urate-increasing or gout risk allele, and the “control group” would consist of individuals who inherit the other allele. MR analysis requires the instrumental variable to influence the outcome only through the exposure, which is a difficult requirement to evaluate [6].
MR has been particularly useful in challenging previously assumed causal associations. For instance, MR has provided evidence against a causal role for high-density lipoprotein in cardiovascular disease [7] and evidence against a causal role for C-reactive protein in coronary heart disease [8], while supporting causality for low-density lipoprotein [9]. In gout, MR has shown that circulating urate is not causally associated with chronic kidney disease (CKD) [10,11], aligning with findings from randomized clinical trials that found no benefit of urate-lowering treatments on CKD progression [12,13]. MR studies have demonstrated a causal role for increased BMI and insulin resistance in hyperuricemia and gout [5,14]. An umbrella review of MR and other studies concluded that, while hyperuricemia is causal for gout and nephrolithiasis, it does not play a causal role in other disease phenotypes [5,15].
The first MR studies used simple linear or logistic regression [5]. If individual-level data were available, the two-stage-least-squares method was often used, which allowed an estimate of the effect size of any causal relationship. More recently, MR studies have virtually exclusively used readily available summary statistics from genome-wide association studies (GWASs) for each of the exposure and outcome in an approach termed two-sample MR. An alternative, also using GWAS summary statistics, is a likelihood-based method where summary statistics are directly modeled with a likelihood function.
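To make the two-sample approach concrete, the following is a minimal sketch of the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimator, which is one commonly used way of combining GWAS summary statistics. It is not taken from any of the reviewed studies: the function names and the three-variant input are hypothetical, and the numbers are invented purely for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Per-variant causal estimate: SNP-outcome effect / SNP-exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Invented summary statistics for three variants (illustration only).
bx = [0.10, 0.20, 0.15]      # SNP-exposure effects (exposure GWAS)
by = [0.05, 0.10, 0.075]     # SNP-outcome effects (outcome GWAS)
se_y = [0.01, 0.02, 0.015]   # standard errors of the SNP-outcome effects

ratios = [wald_ratio(x, y) for x, y in zip(bx, by)]
estimate, se = ivw_estimate(bx, by, se_y)
```

In this toy input every variant has the same Wald ratio, so the IVW estimate coincides with it; with real data the ratios differ and the spread between them is what sensitivity methods examine.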
The availability of user-friendly statistical packages, for example, MendelianRandomization in R [16], and freely downloadable GWAS summary statistics from, for example, the UK Biobank [17], FinnGen [18], and the Veterans Affairs Million Veteran Program [19], has commoditized two-sample MR studies, substantially increasing the volume of publications in the literature. This creates challenges in the evaluation of study quality for people without a background in genetic epidemiology. Therefore, the aim of this paper was to systematically evaluate two-sample MR studies in hyperuricemia and gout, and to provide guidelines for researchers and clinicians who may not be experts in MR to critically assess these studies. While our focus is on hyperuricemia and gout, we anticipate that our guidelines will have broad applicability in other medical fields.

2. Methods

2.1. Study Selection and Eligibility Criteria

We conducted a systematic review of Mendelian randomization (MR) studies focusing on hyperuricemia and gout as either exposure or outcome variables. We carried out a PubMed search on 27 March 2024, using the terms “Mendelian randomization urate” and “Mendelian randomization gout”. Studies were included only if they were peer-reviewed, used two-sample MR analysis, and assessed urate or gout as an exposure or outcome variable. Studies not focused primarily on urate or gout were excluded. After full-text review of an initial pool of 234 studies, 56 were excluded because they were preprints, errata, commentaries, MR methodology articles, or review articles. Another 47 studies were excluded for not using two-sample MR, and 45 were excluded because urate or gout was not the primary focus. Ultimately, 86 studies met the inclusion criteria (Figure 1).

2.2. Scoring System for Study Quality Assessment

We developed a scoring system with a possible range of −9 to 21 to evaluate the studies. This system assessed factors such as study design, statistical methods, interpretation of results, and adherence to the STROBE-MR guideline [20] (Table 1). Below, we describe the various factors.
The scoring system was designed to prioritize the two most critical aspects that ensure the validity of Mendelian randomization studies: appropriate methodology and data analysis. Approximately 40% of the total score is allocated to the study design category and another 40% to statistical methods, which score the quality of data analysis. The remaining points evaluate whether the study’s conclusions align with its statistical evidence. The weighting of scores for each individual criterion was designed to reward a factor that contributed to a higher quality two-sample MR study.
The STROBE-MR criterion was added at a later stage as a bonus point. Because many studies were published before the guideline’s release in 2021, we chose not to penalize those that lacked compliance. Instead, we awarded +1 point only when explicit adherence was reported, ensuring that it would not impact the total score of older studies significantly.

2.2.1. Study Design

The study design category included an assessment of the study rationale (−1 to 2 points) by evaluating the quality of prior evidence supporting the association. Bidirectional studies, which test both the effect of the exposure on the outcome and the effect of the outcome on the exposure, were assigned 1 point because positive associations in both directions may indicate the presence of confounding factors [21]. We also evaluated datasets based on several criteria: use of the most recent GWAS data (0 or 1 point), matching ancestries (−1 to 1 point), and absence of participant overlap (−1 or 1 point). Matching ancestries was emphasized to minimize bias from ancestral differences in allele frequencies and linkage disequilibrium patterns [21]. For studies comparing a multi-ancestry dataset to a single-ancestry dataset, 0 points were assigned if the specific ancestry of interest was not extracted or analyzed from the multi-ancestry dataset, as detailed in Table 1. In the same way, dataset independence was also assessed, as no participant overlap between datasets reduces overestimation of genetic associations [22]. Lastly, we evaluated whether replication was included in the study design, assigning 3 points for the inclusion of a replication strategy and −1 if replication was not conducted.

2.2.2. Statistical Methods

The statistical methods evaluation included whether the authors had addressed the three core MR assumptions (adequate strength of the instrumental variable, exclusion of genetic variants associated with known confounders of the relationship between exposure and outcome, and consideration of whether the genetic variants affect the outcome only through the exposure) [6], whether p-values were appropriately corrected for multiple testing, and whether the power of the study was considered. One point was assigned if the selected SNPs demonstrated genome-wide significance (p < 5 × 10⁻⁸) or an F-statistic > 10. Additionally, we evaluated whether the SNPs were pruned for linkage disequilibrium (LD) using an R² threshold of <0.1 (1 point), ensuring that only independent genetic variants were included in the instrumental variable. This step is important as it reduces bias in effect estimates [23]. The second assumption states that no confounders should affect the causal relationship being assessed. This is challenging to test objectively, as all observational associations derived from epidemiological studies have unmeasured confounders [6,24]. Nevertheless, we assigned 1 point to studies that either adjusted for potential or known confounders or excluded SNPs associated with these confounders. The third assumption requires that the genetic variant influences the outcome exclusively through the exposure [6]. It is commonly tested using methods such as MR-Egger and MR-PRESSO, which detect outlier SNPs that may influence the outcome through pathways unrelated to the exposure. We did not directly score the use of these methods because most Mendelian randomization packages available in R incorporate them. However, an additional point was given to studies that conducted a mediator analysis alongside these MR methods (Table 1).
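The instrument-strength criterion above can be sketched as a simple filter. This is an illustration under stated assumptions, not the procedure of any reviewed study: the per-variant F-statistic is approximated here as (beta / se)², a common shortcut when only summary statistics are available, and the variant identifiers and values are invented.

```python
GENOME_WIDE_P = 5e-8     # genome-wide significance threshold from the criterion
MIN_F_STATISTIC = 10.0   # F-statistic cutoff from the criterion


def f_statistic(beta, se):
    """Approximate per-variant F-statistic (assumption: F ~ (beta / se) ** 2)."""
    return (beta / se) ** 2


def is_strong_instrument(snp):
    """True if the variant is genome-wide significant or has F > 10."""
    return (snp["p"] < GENOME_WIDE_P
            or f_statistic(snp["beta"], snp["se"]) > MIN_F_STATISTIC)


# Hypothetical SNP-exposure summary statistics (illustration only).
snps = [
    {"rsid": "snp_a", "beta": 0.12, "se": 0.01, "p": 1e-30},
    {"rsid": "snp_b", "beta": 0.02, "se": 0.01, "p": 0.045},
    {"rsid": "snp_c", "beta": 0.05, "se": 0.01, "p": 6e-7},
]

instruments = [s["rsid"] for s in snps if is_strong_instrument(s)]
```

Here the second variant fails both conditions, while the third passes on the F-statistic alone even though it is not genome-wide significant, which mirrors the "or" in the criterion. LD pruning is a separate step that requires a reference panel and is not shown.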
The statistical methods evaluation also comprised the presentation of the SNPs associated with the exposure along with effect sizes, effect alleles, and p-values. Additionally, up to 3 points were given for appropriate multiple testing corrections (e.g., Bonferroni), and 2 points for considering study power in the methods or results sections.

2.2.3. Interpretation of Results

We evaluated whether the results were interpreted correctly by considering several factors, such as the significance of the results after applying a Bonferroni correction when appropriate, if the authors accounted for evidence of high pleiotropy (e.g., a significant MR-Egger intercept or high distortion values in MR-PRESSO), and if the findings were consistent across multiple MR methods. The score for this criterion was 2 or −2 points.
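As a worked illustration of the Bonferroni check referred to above, the sketch below divides the significance level by the number of tests and flags which results remain significant. The function names and the five p-values are hypothetical and do not come from any reviewed study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant (True) or not (False) after correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from five exposure-outcome tests in one study.
p_values = [0.004, 0.03, 0.2, 0.011, 0.0007]
flags = significant_after_bonferroni(p_values)
```

With five tests the corrected threshold is 0.01, so results at p = 0.03 and p = 0.011 would be nominally significant at 0.05 but not after correction. This is the situation in which an uncorrected analysis can lead to an overstated causal claim.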

2.2.4. STROBE Guidelines

One point was assigned to studies that presented a table describing compliance with the STROBE-MR guideline criteria.

2.3. Data Extraction and Statistical Analysis

Our data extraction included study year, definitions of exposure and outcome, dataset sources, MR methods, results, and score components (Supplementary Table S1). We used Shapiro–Wilk tests for score distribution and examined trends in article quality using linear regression, with significance set at p < 0.05.
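The trend analysis can be illustrated with an ordinary least-squares fit of mean score on publication year. The sketch below is written with the standard library only and is an assumption about how such a fit could be computed; it does not reproduce the software used for the analysis, and the yearly mean scores are invented, not the values from this review.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented mean quality scores per publication year (illustration only).
years = [2019, 2020, 2021, 2022, 2023]
mean_scores = [10.0, 9.5, 9.0, 8.5, 8.0]

slope, intercept = ols_slope_intercept(years, mean_scores)
```

A negative slope corresponds to a decline in mean score per year. A significance test for the slope and the Shapiro–Wilk normality test would normally be taken from a statistics package rather than written by hand.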

3. Results

PMID and summary information of the 86 articles, published between 2016 and March 2024, are presented in Supplementary Table S1. Among them, 70 focused on urate as either the exposure or outcome variable (Figure 2). In 59 of the 70 studies, urate was the exposure, with 27% (16/59) reporting a causal relationship. Supplementary Table S2 provides a detailed list of the phenotypes analyzed in the reviewed articles, along with the direction of comparison. Common phenotypes with evidence for causal association with increased serum urate included coronary heart disease, hypertension, heart failure, and myocardial infarction (Table 2). Of the 70 studies, 34 conducted bidirectional analyses, while 25 focused solely on urate as the exposure and 11 solely on urate as the outcome (Figure 3). Conversely, 31 studies used urate as the outcome variable, of which 22 (71%) found causal associations with BMI, fasting insulin, HDL cholesterol, and triglycerides (Table 2).
For gout-related MR analyses, 46 studies were included. Of these, 29 used gout as the exposure variable (Figure 3), with only one reporting a causal relationship, which was with coronary heart disease (Table 2). Additionally, 39 studies investigated potential causes of gout (Figure 3), of which 14 reported causal associations, most commonly with tea intake, coffee intake, BMI, and high blood pressure.
The scores assigned to the 86 studies ranged from 0 to 19 (Supplementary Table S3), with a mean of 9.1 and median of 11, and were normally distributed (Shapiro–Wilk W = 0.99, p = 0.92) (Figure 2). The following paragraphs describe the results of our scoring criteria in detail.
The first aspect evaluated in the study design was the rationale. We assessed the plausibility of the studied phenotypes by reviewing prior evidence of the associations. Strong observational evidence, such as findings from large observational studies or small clinical trials supporting the hypothesized association, earned 2 points. Mixed evidence or findings from small observational studies earned 1 point, while evidence based on fewer than five small studies received −1 point (Table 1). Overall, 55 phenotypes (64%) had a strong rationale, 29 (34%) showed mixed or weak evidence, and 2 (2%) were given the lowest score.
Another aspect evaluated in the study design was dataset quality. The most commonly used urate dataset was the GWAS of 110,347 individuals published by Köttgen et al. [25] in 2013 (n = 44), followed by the GWAS of 457,690 individuals published by Tin et al. [26] in 2019 (n = 20) (Table 3). Overall, 34% of the studies used outdated datasets, meaning that they used the Köttgen et al. dataset when the four-times-larger Tin et al. dataset was already available. The main ancestry in the two datasets was European, which was also the most studied ancestry among the MR studies. Table 3 summarizes the datasets and ancestries used in the articles included in our review.
We also considered whether the exposure and outcome datasets studied the same ancestry. Eight percent of the studies did not use datasets with matching ancestries, and 5% compared a mixed-ancestry dataset against a single-ancestry dataset. Additionally, we reviewed whether participants overlapped between the exposure and outcome datasets and found that only 51% of the studies used independent datasets.
The statistical methods category contained the evaluation of MR assumptions. Of the studies reviewed, nine (17%) satisfied all our criteria addressing the three assumptions and received the highest scores in those criteria. Additionally, 66% of the studies satisfied the first assumption, also known as relevance, while 50% tested for confounders.
Regarding the power of the study, we assigned 2 points to studies that addressed it in the methods or the results section, or if they actually calculated it anywhere in the manuscript. A total of 59% fulfilled this criterion.
Another criterion in our scoring system focused on the interpretation of results. Our evaluation considered whether the conclusions presented in the studies still held after the p-value was corrected for multiple testing. Of the 86 studies, 56 (65%) reported a causal association. However, 12 studies (14%) drew incorrect conclusions about the causal association because a necessary Bonferroni correction of the p-value was not applied. We also assessed whether the results were replicated in independent datasets and found that only 12 studies (14%) conducted a replication analysis.

Score Trends per Year and Place of Origin

An analysis of mean article scores by year of publication revealed a significant downward trend, indicating a decrease in average article quality over time (β = −0.29, p = 0.0009). Additionally, we compared the scores before and after the publication of the STROBE guidelines for MR [20] and found no difference in the scores (p = 0.58). When analyzing the score variability per year, it is of note that article quality is becoming more diverse. The highest scores showed a positive trend (β = 0.74, p < 0.05), indicating an increase over time, while the lowest scores declined at a faster rate (β = −1.25, p < 0.05), suggesting a widening gap in the quality of MR studies in urate and gout (Figure 4).
Articles were also categorized by the first author’s country of origin. Most were from China (70%), followed by the USA (6%) and the UK (5%). For comparative purposes, articles were grouped by continent. Asia had the lowest mean score (8.9 ± 0.5), while Oceania had the highest (10 ± 0.6). Notably, Asia exhibited the widest range of scores, reflecting both the lowest and highest scoring articles (IQR = 5), while Oceania had the least variability (IQR = 2) (Table 4). No significant differences in scores were observed between continents (p = 0.8). Over time, the geographic distribution of studies shifted, with considerably more MR studies recently originating from Asia and fewer from other continents (Figure 5); since the beginning of 2022, 96% of MR studies were from Asia.
Finally, we calculated the correlation between the most recent 2 yr journal impact factor and quality scores. There was no correlation (r = −0.066, p = 0.49).

4. Discussion

We developed a scoring system to provide a guide for the quality of two-sample MR studies. Supplementary Figure S1 presents a “how-to” implementation scheme. Possible classification of studies into low vs. medium vs. high quality is subjective, relative to the tranche of papers we evaluated, and would be expected to change over time. Nevertheless, we suggest that studies with a score greater than 12 can be considered high quality, and those with scores less than 7 can be considered low quality. While we developed the scoring system in urate and gout, we expect the scoring system to be readily transferable to other phenotypes. A limitation of our scoring system is the subjective way in which scores were developed. Score weighting (e.g., +3 for replication vs. −1 for no replication) was used to reward a factor that contributed to study robustness, and the size of the score reflected what we considered to be more important (e.g., replication being more important than, for example, whether or not exposure and outcome sample sets were independent).
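The suggested cut-offs can be written as a small helper for readers who want to apply them to a set of scored papers. This is a sketch only: the function name is hypothetical, and labeling scores from 7 to 12 as "medium" is an inference from the low/medium/high wording above rather than a rule stated explicitly.

```python
SCORE_MIN, SCORE_MAX = -9, 21   # possible range of the scoring system
HIGH_ABOVE = 12                 # scores greater than 12: high quality
LOW_BELOW = 7                   # scores less than 7: low quality


def classify_quality(score):
    """Map a total quality score to 'low', 'medium', or 'high'."""
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError(f"score {score} is outside {SCORE_MIN}..{SCORE_MAX}")
    if score > HIGH_ABOVE:
        return "high"
    if score < LOW_BELOW:
        return "low"
    return "medium"   # assumption: 7 to 12 inclusive


# Example: the extreme scores observed in the review and two boundary values.
labels = {s: classify_quality(s) for s in (0, 7, 12, 19)}
```

Because the thresholds are relative to the tranche of papers evaluated, they are kept as named constants so they can be adjusted if the score distribution shifts over time.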
Over the review period, we found a steady increase in two-sample MR studies, rising from one in 2016 to 22 in 2022. However, the quality of these studies varies widely, with scores ranging from 0 to 19 and an average score of 9.1 (Table 4). This growth is largely driven by the expanding availability of GWAS and the relative ease of conducting MR analyses [24,27]. While this rise has certainly contributed to the understanding of causal relationships, it has also led to studies with poor rationale, skewed estimates, weak instrumental variables, results lacking robustness, and inaccurate conclusions [6,24].
To address inconsistencies in study quality, guidelines such as the 2021 STROBE-MR guideline have been introduced [20,24,28]. These guidelines focus on increasing the quality of design and presentation of MR studies, rather than a framework to increase the quality of an MR study per se. However, the overall quality of published articles has not improved, likely due to a lack of adherence, as only 11% of articles published after 2021 reported following the guideline. Studies that followed it demonstrated higher quality, with compliant articles averaging a score of 10.1, compared to the overall post-2021 average of 8.8.
The lowest-scoring articles in our analysis, each receiving a score of 0, revealed significant gaps in study design. Both studies lacked result replication, confounder testing, adequate p-value corrections, and an assessment of statistical power. Neither study presented the specific SNPs used or their association values with the exposure. The first study, which focused on sepsis and gout/urate (Article 6 in Supplementary Table S1), used a dataset with European and Japanese ancestry for the exposure and a European ancestry dataset for the outcome. The second study, on sex hormones, breast cancer, and gout (Article 202 in Supplementary Table S1), used datasets with European ancestry for most phenotypes but included an East Asian ancestry dataset for the urate trait, resulting in mismatched datasets. Related to this, we note that the considerable majority of the published MR analyses included in this paper used datasets of European ancestry, which reflects the Eurocentric nature of published GWAS.
In contrast, the highest-scoring articles received scores of 19 and 17. Article 62 (Supplementary Table S1), which reported a causal relationship between urate and heart failure, achieved the highest score overall. This study satisfied all the core assumptions of MR and achieved the highest score in most criteria except for the presentation of the STROBE guidelines. Articles 21 and 91 (Supplementary Table S1) tied for the second-highest score. Article 21 found that metformin had a preventive effect on high urate levels but not on gout, while Article 91 identified hyperinsulinemia as a cause of elevated urate levels. One of the weaknesses we identified in Article 21 was that it was unidirectional. Also, they failed to use the latest dataset for gout and urate and did not present adherence to the STROBE guideline. Similarly, Article 91 fulfilled most criteria but lacked a mediator analysis and also did not include the STROBE guidelines. This last study was published only 1 month after the publication of the guideline, which may explain the omission.
Ensuring that MR assumptions are met is essential for valid results. In our review, the relevance assumption was met by 96% of studies. However, only 50% addressed the independence assumption by explicitly discussing confounders. The third assumption requires that the genetic instrumental variables influence the outcome solely through exposure. The exclusion restriction assumption may not be met in cases where exposures are also affected by environmental factors (e.g., education, physical activity, vitamin D levels). Eight studies (9%) violated this assumption due to implausible exposures (see Supplementary Table S3).
Additionally, a common challenge in two-sample MR studies is overlapping datasets, which cause inflation of effect estimates [29,30]. This issue is particularly relevant for non-European ancestry studies, where limited large datasets often necessitate overlap, whereas European ancestry studies benefit from broader dataset availability. Most studies in our analysis relied on datasets from large GWAS consortia, which enhances study power but can lead to participants overlapping between exposure and outcome datasets. Among the 86 studies reviewed, only 52% of the studies used fully independent datasets.
Our analysis also demonstrates how the 2021 STROBE-MR guideline [20] and our scoring system complement one another while serving distinct purposes. The STROBE-MR guideline is a comprehensive checklist to guide authors during study preparation, which attempts to standardize all aspects of the manuscript, including a statement of objectives, participant eligibility criteria, and the software used for the analysis. In contrast, our scoring system aims to quantitatively evaluate the quality of MR studies after their completion, focusing on specific key aspects like study design, SNP selection, and result interpretation (Table 1). For example, while STROBE-MR includes descriptions of MR assumptions and covers many types of sensitivity analyses, our scoring system approaches the assumptions through specific criteria, such as p-value thresholds for SNP selection and explicit confounder testing. Similarly, while STROBE-MR includes a category for sensitivity analyses that covers comparisons of effect estimates from different methods, independent replication, bias analyses, validation of instruments, or simulations [20], we focused only on replication, both for simplicity and because we consider replication the superior metric. Additionally, the two systems assess generalizability differently. Our scoring system focuses on matching dataset ancestries and ensuring dataset independence, while STROBE-MR approaches generalizability in a less specific manner and includes discussions on biological mechanisms. These two frameworks serve distinct purposes at different stages of MR studies. The STROBE-MR guideline is intended for researchers preparing manuscripts, whereas our scoring system is a practical tool for reviewers or readers who need to assess the quality and validity of completed MR studies.
Others have also identified the impact of the current load of two-sample MR studies on the medical literature. Stender et al. [31], in a paper entitled “Reclaiming mendelian randomization from the deluge of papers and misleading findings”, pointed out that the public availability of GWAS summary statistics has prompted “an explosion of low-quality two-sample mendelian randomization studies”. They state that “These studies add minimal—if any—value and overwhelm reviewers and journals.” Stender et al. also advise editors to reject without review two-sample MR papers that only report the MR findings per se with no additional supporting evidence. We fully support these views.
Table 2 summarizes the most common phenotypes linked to urate and gout identified in our review. BMI was consistently identified as a causal factor for elevated serum urate in populations of European ancestry, with supporting studies averaging a score of 9. These findings align with those from older MR studies using different MR techniques that support a causal relationship between BMI and urate, such as the ones performed by Lyngdoh et al. [32], Palmer et al. [33], and Oikonen et al. [34]. Further evidence comes from a randomized controlled trial involving 235 patients, which found that weight loss led to decreased urate levels, regardless of diet type [35]. This example demonstrates that MR results in urate and gout can be applicable and informative in clinical settings. However, it is important to note that the effect of BMI on urate levels is relatively small, and, clinically, the use of urate-lowering therapy to manage gout is considerably more effective.
Other phenotypes with more than one study supporting a causal association included gout as a potential cause of coronary heart disease (CHD). However, the number of two-sample MR studies was limited (n = 2), and their mean quality score was below the overall average (Table 2). This association may be mediated by hyperuricemia, as urate has been postulated to act as a pro-oxidant under hyperuricemic conditions and in the presence of chronic diseases such as metabolic syndrome, chronic heart failure, and chronic kidney disease. In these settings, urate contributes to endothelial dysfunction by reducing nitric oxide production and activating the renin–angiotensin system, processes that promote atherosclerosis [36]. However, in the clinical setting, the role of urate in cardiovascular disease is not yet fully understood. Results from the CARES trial (2018), which compared febuxostat and allopurinol, showed that, even though a larger percentage of patients receiving febuxostat had urate levels under 5.0 mg/dL, there was no difference in cardiovascular mortality [37]. Furthermore, a recent one-sample MR study found no significant effect of lowering urate levels through xanthine dehydrogenase-related SNPs on the risk of ischemic heart disease [38]. Similar to the current literature, our review yielded mixed results regarding the role of urate in CHD. Three studies with a mean score of 8 supported this association, while five studies with a mean score of 2.3 did not (Table 2).
Three studies investigated a causal relationship between blood lipids and serum urate levels, consistently identifying high-density lipoprotein cholesterol as a preventive factor against elevated serum urate and triglycerides as a causal factor for increased serum urate (Table 2). However, older studies, including one-sample MR analyses, have shown mixed findings, with some studies favoring the direction of blood lipids influencing serum urate, while others support the reverse relationship [39,40,41]. This variation across studies raises concerns about horizontal pleiotropy and suggests that the observed associations may be influenced by confounding factors or shared pathways.
Similarly, the effect of urate-lowering drugs on blood pressure has also yielded mixed results [42,43,44]. MR studies, along with our analysis of two studies (Table 2), identified blood pressure as a potential cause of gout. It would be valuable to investigate whether managing blood pressure in patients with risk factors for gout or administering antihypertensive medications to those with hyperuricemia could reduce the risk of developing gout.
While MR has advanced our understanding of causal relationships, many studies fall short in meeting core assumptions, using independent datasets, or conducting replication analyses. BMI stood out as a consistent causal factor for hyperuricemia, supported by both MR and clinical evidence. Our scoring system provides a practical tool for evaluating MR study quality, helping researchers and clinicians identify strengths and weaknesses in design, methodology, and interpretation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/gucdd3020008/s1, Figure S1: Quick Guide. Is This Mendelian Randomization Study Reliable?; Table S1: The 86 studies assessed; Table S2: Key details of the 86 papers assessed; Table S3: Scores of the 86 papers assessed.

Author Contributions

F.R.-C. and T.R.M. conceptualized the study and reviewed and edited the manuscript. In addition, F.R.-C. carried out the formal analysis and prepared the original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Muntiu, M.; Joosten, L.A.; Crişan, T.O. Gout basic research: 2023 in review. Gout Urate Cryst. Depos. Dis. 2024, 2, 220–235.
  2. Drivelegka, P.; Jacobsson, L.T.; Dehlin, M. Gout and gout-related comorbidities: Insight and limitations from population-based registers in Sweden. Gout Urate Cryst. Depos. Dis. 2024, 2, 144–156.
  3. Andrés, M. Gout and cardiovascular disease: Mechanisms, risk estimations, and the impact of therapies. Gout Urate Cryst. Depos. Dis. 2023, 1, 152–166.
  4. Robinson, P.C.; Horsburgh, S. Gout: Joints and beyond, epidemiology, clinical features, treatment and co-morbidities. Maturitas 2014, 78, 245–251.
  5. Robinson, P.C.; Choi, H.K.; Do, R.; Merriman, T.R. Insight into rheumatological cause and effect through the use of Mendelian randomization. Nat. Rev. Rheumatol. 2016, 12, 486–496.
  6. Richmond, R.C.; Davey Smith, G. Mendelian randomization: Concepts and scope. Cold Spring Harb. Perspect. Med. 2022, 12, a040501.
  7. Voight, B.F.; Peloso, G.M.; Orho-Melander, M.; Frikke-Schmidt, R.; Barbalic, M.; Jensen, M.K.; Hindy, G.; Hólm, H.; Ding, D.L.; Johnson, T.; et al. Plasma HDL cholesterol and risk of myocardial infarction: A mendelian randomisation study. Lancet 2012, 380, 572–580.
  8. Elliott, P.; Chambers, J.C.; Zhang, W.; Clarke, R.; Hopewell, J.C.; Peden, J.F.; Erdmann, J.; Braund, P.; Engert, J.C.; Bennett, D.; et al. Genetic loci associated with C-reactive protein levels and risk of coronary heart disease. JAMA 2009, 302, 37–48.
  9. Holmes, M.V.; Asselbergs, F.W.; Palmer, T.M.; Drenos, F.; Lanktree, M.B.; Nelson, C.P.; Dale, C.E.; Padmanabhan, S.; Finan, C.; Swerdlow, D.I.; et al. Mendelian randomization of blood lipids for coronary heart disease. Eur. Heart J. 2015, 36, 539–550.
  10. Jordan, D.M.; Choi, H.K.; Verbanck, M.; Topless, R.; Won, H.H.; Nadkarni, G.; Merriman, T.R.; Do, R. No causal effects of serum urate levels on the risk of chronic kidney disease: A Mendelian randomization study. PLoS Med. 2019, 16, e1002725.
  11. Hughes, K.; Flynn, T.; de Zoysa, J.; Dalbeth, N.; Merriman, T.R. Mendelian randomization analysis associates increased serum urate, due to genetic variation in uric acid transporters, with improved renal function. Kidney Int. 2014, 85, 344–351.
  12. Badve, S.V.; Pascoe, E.M.; Tiku, A.; Boudville, N.; Brown, F.G.; Cass, A.; Clarke, P.; Dalbeth, N.; Day, R.O.; de Zoysa, J.R.; et al. Effects of allopurinol on the progression of chronic kidney disease. N. Engl. J. Med. 2020, 382, 2504–2513.
  13. Doria, A.; Galecki, A.T.; Spino, C.; Pop-Busui, R.; Cherney, D.Z.; Lingvay, I.; Parsa, A.; Rossing, P.; Sigal, R.J.; Afkarian, M.; et al. Serum urate lowering with allopurinol and kidney function in type 1 diabetes. N. Engl. J. Med. 2020, 382, 2493–2503.
  14. McCormick, N.; O’Connor, M.J.; Yokose, C.; Merriman, T.R.; Mount, D.B.; Leong, A.; Choi, H.K. Assessing the causal relationships between insulin resistance and hyperuricemia and gout using bidirectional Mendelian randomization. Arthritis Rheumatol. 2021, 73, 2096–2104.
  15. Li, X.; Meng, X.; Timofeeva, M.; Tzoulaki, I.; Tsilidis, K.K.; Ioannidis, P.; Campbell, H.; Theodoratou, E. Serum uric acid levels and multiple health outcomes: Umbrella review of evidence from observational studies, randomised controlled trials, and Mendelian randomisation studies. BMJ 2017, 357, j2376.
  16. Yavorska, O.O.; Burgess, S. MendelianRandomization: An R package for performing Mendelian randomization analyses using summarized data. Int. J. Epidemiol. 2017, 46, 1734–1739.
  17. Sudlow, C.; Gallacher, J.; Allen, N.; Beral, V.; Burton, P.; Danesh, J.; Downey, D.; Elliott, P.; Green, J.; Landray, M.; et al. UK biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015, 12, e1001779.
  18. Kurki, M.I.; Karjalainen, J.; Palta, P.; Sipila, T.P.; Kristiansson, K.; Donner, K.M.; Reeve, M.P.; Laivuori, H.; Aavikko, M.; Kaunisto, M.A.; et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023, 613, 508–518.
  19. Gaziano, J.M.; Concato, J.; Brophy, M.; Fiore, L.; Pyarajan, S.; Breeling, J.; Whitbourne, S.; Deen, J.; Shannon, C.; Humphries, D.; et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 2016, 70, 214–223.
  20. Skrivankova, V.W.; Richmond, R.C.; Woolf, B.A.R.; Yarmolinsky, J.; Davies, N.M.; Swanson, S.A.; VanderWeele, T.J.; Higgins, J.P.T.; Timpson, N.J.; Dimou, N.; et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: The STROBE-MR statement. JAMA 2021, 326, 1614–1621.
  21. Hemani, G.; Bowden, J.; Davey Smith, G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum. Mol. Genet. 2018, 27, R195–R208.
  22. Bowden, J.; Davey Smith, G.; Burgess, S. Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 2015, 44, 512–525.
  23. Burgess, S.; Thompson, S.G. Use of allele scores as instrumental variables for Mendelian randomization. Int. J. Epidemiol. 2013, 42, 1134–1144.
  24. Hartwig, F.P.; Davies, N.M.; Hemani, G.; Davey Smith, G. Two-sample Mendelian randomization: Avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int. J. Epidemiol. 2016, 45, 1717–1726.
  25. Köttgen, A.; Albrecht, E.; Teumer, A.; Vitart, V.; Krumsiek, J.; Hundertmark, C.; Pistis, G.; Ruggerio, D.; O’Seaghdha, C.M.; Haller, T.; et al. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat. Genet. 2013, 45, 145–154.
  26. Tin, A.; Marten, J.; Halperin Kuhns, V.L.; Li, Y.; Wüttke, M.; Kirsten, H.; Sieber, K.B.; Qiu, C.; Gorski, M.; Yu, Z.; et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 2019, 51, 1459–1474.
  27. Burgess, S.; Woolf, B.; Mason, A.M.; Ala-Korpela, M.; Gill, D. Addressing the credibility crisis in Mendelian randomization. BMC Med. 2024, 22, 374.
  28. Burgess, S.; Davey Smith, G.; Davies, N.M.; Dudbridge, F.; Gill, D.; Glymour, M.M.; Hartwig, F.; Kutalik, Z.; Holmes, M.V.; Minelli, C.; et al. Guidelines for performing Mendelian randomization investigations: Update for summer 2023. Wellcome Open Res. 2019, 4, 186.
  29. Burgess, S.; Davies, N.M.; Thompson, S.G. Bias due to participant overlap in two-sample Mendelian randomization. Genet. Epidemiol. 2016, 40, 597–608.
  30. Jiang, T.; Gill, D.; Butterworth, A.S.; Burgess, S. An empirical investigation into the impact of winner’s curse on estimates from Mendelian randomization. Int. J. Epidemiol. 2023, 52, 1209–1219.
  31. Stender, S.; Gellert-Kristensen, H.; Davey Smith, G. Reclaiming mendelian randomization from the deluge of papers and misleading findings. Lipids Health Dis. 2024, 23, 286.
  32. Lyngdoh, T.; Vuistiner, P.; Marques-Vidal, P.; Rousson, V.; Waeber, G.; Vollenweider, P.; Bochud, M. Serum uric acid and adiposity: Deciphering causality using a bidirectional Mendelian randomization approach. PLoS ONE 2012, 7, e39321.
  33. Palmer, T.M.; Nordestgaard, B.G.; Benn, M.; Tybjaerg-Hansen, A.; Davey Smith, G.; Lawlor, D.A.; Timpson, N.J. Association of plasma uric acid with ischaemic heart disease and blood pressure: Mendelian randomisation analysis of two large cohorts. BMJ 2013, 347, f4262.
  34. Oikonen, M.; Wendelin-Saarenhovi, M.; Lyytikainen, L.P.; Siitonen, N.; Loo, B.M.; Jula, A.; Seppälä, I.; Saarikoski, L.; Lehtimäki, T.; Hutri-Kähönen, N.; et al. Associations between serum uric acid and markers of subclinical atherosclerosis in young adults. The cardiovascular risk in Young Finns study. Atherosclerosis 2012, 223, 497–503.
  35. Yokose, C.; McCormick, N.; Rai, S.K.; Lu, N.; Curhan, G.; Schwarzfuchs, D.; Shai, I.; Choi, H.K. Effects of low-fat, Mediterranean, or low-carbohydrate weight loss diets on serum urate and cardiometabolic risk factors: A secondary analysis of the Dietary Intervention Randomized Controlled Trial (DIRECT). Diabetes Care 2020, 43, 2812–2820.
  36. Neogi, T.; George, J.; Rekhraj, S.; Struthers, A.D.; Choi, H.; Terkeltaub, R.A. Are either or both hyperuricemia and xanthine oxidase directly toxic to the vasculature? A critical appraisal. Arthritis Rheumatol. 2012, 64, 327–338.
  37. White, W.B.; Saag, K.G.; Becker, M.A.; Borer, J.S.; Gorelick, P.B.; Whelton, A.; Hunt, B.; Castillo, M.; Gunawardhana, L. Cardiovascular safety of febuxostat or allopurinol in patients with gout. N. Engl. J. Med. 2018, 378, 1200–1210.
  38. Kim, J.; Lee, S.Y.; Lee, J.; Yoon, S.; Kim, E.G.; Lee, E.; Kim, N.; Lee, S.; Park, S.-I. Effects of uric acid on ischemic diseases, stratified by lipid levels: A drug-target, nonlinear Mendelian randomization study. Sci. Rep. 2024, 14, 1338.
  39. Li, X.; Meng, X.; He, Y.; Spiliopoulou, A.; Timofeeva, M.; Wei, W.Q.; Gifford, A.; Yang, T.; Varley, T.; Tzoulaki, I.; et al. Genetically determined serum urate levels and cardiovascular and other diseases in UK Biobank cohort: A phenome-wide mendelian randomization study. PLoS Med. 2019, 16, e1002937.
  40. Rasheed, H.; Hughes, K.; Flynn, T.J.; Merriman, T.R. Mendelian randomization provides no evidence for a causal role of serum urate in increasing serum triglyceride levels. Circ. Cardiovasc. Genet. 2014, 7, 830–837.
  41. Li, X.; Meng, X.; Spiliopoulou, A.; Timofeeva, M.; Wei, W.Q.; Gifford, A.; Shen, X.; He, Y.; Varley, T.; McKeigue, P.; et al. MR-PheWAS: Exploring the causal effect of SUA level on multiple disease outcomes by using genetic instruments in UK Biobank. Ann. Rheum. Dis. 2018, 77, 1039–1047.
  42. Gaffo, A.L.; Calhoun, D.A.; Rahn, E.J.; Oparil, S.; Li, P.; Dudenbostel, T.; Feig, D.I.; Redden, D.T.; Muntner, P.; Foster, P.J.; et al. Effect of serum urate lowering with allopurinol on blood pressure in young adults: A randomized, controlled, crossover trial. Arthritis Rheumatol. 2021, 73, 1514–1522.
  43. Feig, D.I.; Soletsky, B.; Johnson, R.J. Effect of allopurinol on blood pressure of adolescents with newly diagnosed essential hypertension: A randomized trial. JAMA 2008, 300, 924–932.
  44. Beattie, C.J.; Fulton, R.L.; Higgins, P.; Padmanabhan, S.; McCallum, L.; Walters, M.R.; Dominiczak, A.F.; Touyz, R.M.; Dawson, J. Allopurinol initiation and change in blood pressure in older adults with hypertension. Hypertension 2014, 64, 1102–1107.
Figure 1. Flow diagram of study selection.
Figure 2. Histogram of scores.
Figure 3. Number of studies using urate and gout as variables.
Figure 4. Trends in scores by year.
Figure 5. Continent of origin over time.
Table 1. Scoring system for two-sample MR studies.

| Domain | Criterion | Score | Description |
| --- | --- | --- | --- |
| Study design | Rationale | 2 | Strong observational evidence |
| | | 1 | Small sample studies or mixed evidence (some studies support the association, while others do not) |
| | | −1 | Minimal information or unclear rationale |
| | Comparison direction | 1 | Bidirectional |
| | | 0 | Unidirectional |
| | Datasets | 1 | Uses the most recent and largest GWAS dataset |
| | | 0 | Does not use latest GWAS dataset |
| | Ancestry comparison | 1 | Comparison involves the same ethnicities |
| | | 0 | Ethnicity information is either absent in one or all datasets, or study compares a mixed ancestry database against a single ancestry without appropriate adjustments |
| | | −1 | Comparisons between different ethnicities |
| | Dataset independence | 1 | Exposure and outcome datasets are independent |
| | | −1 | Not independent |
| | Replication | 3 | Replication study included |
| | | −1 | No replication |
| Statistical methods | SNP selection | 1 | SNPs were associated with exposure at genome-wide significance (p < 5 × 10⁻⁸) or F-statistic > 10 |
| | | and 1 | SNPs were pruned for LD with R² < 0.1 |
| | Mediator analysis | 1 | If a mediator variable analysis was conducted |
| | Confounder analysis | 1 | If testing for confounders was performed |
| | Presented SNPs | 2 | SNPs significantly associated with the exposure were clearly listed, including their effect alleles, effect sizes, and p-values |
| | | 1 | SNPs associated with the exposure were listed but without complete information on effect alleles, effect sizes, and p-values |
| | | −1 | SNPs were not listed |
| | p-value correction | 2 | Applied |
| | | −1 | When correction required was <10 tests but not applied |
| | | −3 | When correction required was ≥10 tests but not applied |
| | | 0 | Not required |
| | Was the study power considered? | 2 | Yes |
| | | −1 | No |
| Interpretation of results | | 2 | Results concluded appropriately according to statistical evidence |
| | | −2 | Results not concluded appropriately according to statistical evidence |
| STROBE guidelines presented? | | 1 | Yes |
| | | 0 | No |
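As an illustration of how the rubric in Table 1 could be applied, the following minimal Python sketch validates per-criterion scores against the allowed values and sums them. It is not the authors' implementation. The criterion keys, the function names, and the example study are hypothetical. Three choices are assumptions: an implicit score of 0 for criteria that list only a positive score, treating the two SNP-selection items as separate one-point items, and treating correction as "required" whenever more than one test is run.

```python
# Allowed scores per criterion, transcribed from Table 1.
# Keys are hypothetical short names chosen for this sketch.
# Assumption: criteria that list only "1" in Table 1 score 0 otherwise.
# Assumption: the two SNP-selection items are scored separately.
RUBRIC = {
    "rationale": {2, 1, -1},
    "direction": {1, 0},
    "datasets": {1, 0},
    "ancestry": {1, 0, -1},
    "independence": {1, -1},
    "replication": {3, -1},
    "snp_significance": {1, 0},
    "snp_ld_pruning": {1, 0},
    "mediator_analysis": {1, 0},
    "confounder_analysis": {1, 0},
    "presented_snps": {2, 1, -1},
    "p_value_correction": {2, 0, -1, -3},
    "power": {2, -1},
    "interpretation": {2, -2},
    "strobe_mr": {1, 0},
}


def p_value_correction_score(n_tests, corrected):
    """Score the 'p-value correction' row of Table 1.

    Assumption: correction counts as required when more than one test is run.
    """
    if n_tests <= 1:
        return 0  # not required
    if corrected:
        return 2  # applied
    return -3 if n_tests >= 10 else -1  # required but not applied


def total_score(scores):
    """Check every criterion is present with an allowed value, then sum."""
    missing = RUBRIC.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name, value in scores.items():
        if name not in RUBRIC:
            raise ValueError(f"unknown criterion: {name}")
        if value not in RUBRIC[name]:
            raise ValueError(f"{name}: {value} is not an allowed score")
    return sum(scores.values())


# Hypothetical study, invented purely to exercise the functions.
example_study = {
    "rationale": 2,
    "direction": 0,
    "datasets": 1,
    "ancestry": 1,
    "independence": 1,
    "replication": -1,
    "snp_significance": 1,
    "snp_ld_pruning": 1,
    "mediator_analysis": 0,
    "confounder_analysis": 0,
    "presented_snps": 1,
    "p_value_correction": p_value_correction_score(12, corrected=False),
    "power": -1,
    "interpretation": 2,
    "strobe_mr": 0,
}

print(total_score(example_study))  # prints 5
```

Keeping the allowed values in one dictionary means a scorer cannot silently enter a value the rubric does not define, which matters when several reviewers score the same set of papers.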
Table 2. Most frequently studied phenotypes in gout and urate MR studies.
Exposure | Outcome | N articles | Found association: mean score | Found association: articles | Found no association: mean score | Found no association: articles
Urate → trait
UrateCoronary heart disease 89.375, 176, 2002.312, 100, 129, 131, 136
UrateHypertension62.146, 7810.565, 97, 129, 131
UrateBMI4--1.95, 31, 60, 65
UrateHeart failure 49.162, 782.11, 26
UrateCKD 3--10.797, 41, 137
UrateGut microbiota4--6.252, 30, 43, 199
UrateMyocardial infarction3147510129, 131
UrateFasting insulin3--12.365, 91, 99
Gout → trait
GoutCoronary heart disease 2578, 200--
Trait → Urate
BMIUrate795, 15, 31, 60, 65, 93, 139--
CoffeeUrate479, 38873, 106
Gut MicrobiotaUrate4--6.252, 30, 43, 199
Fasting InsulinUrate312.365, 91, 99--
Waist/Hip ratioUrate39319139, 65
HDLcUrate311.365, 93, 102--
TGUrate311.365, 93, 102--
T2DMUrate29991165
Trait → Gout
Tea intakeGout41026, 215316, 211
BMIGout39.6731, 65, 93--
CoffeeGout25.573, 142--
Blood pressureGout21365, 198--
Gut microbiotaGout2--9.530, 43
Table 3. Datasets used by MR studies.

| Dataset | Ancestry | Year | Urate sample size | Gout sample size (cases/controls) | Freq (%) | PMID |
| --- | --- | --- | --- | --- | --- | --- |
| Köttgen | European | 2013 | 110,347 | 2115/67,259 | 44 (51.16%) | 23263486 |
| Tin | European | 2019 | 288,649 | 13,179/750,634 | 20 (23.26%) | 31578528 |
| Japan Biobank | East Asian | 2019 | 109,029 | 3053/4554 | 6 (6.98%) | 32238385 |
| UK Biobank | European | | NA | 6542/456,391 | 12 (13.95%) | |
| Sakaue | European + East Asian | 2021 | 343,836 | - | 2 (2.33%) | 34594039 |
| FinnGen | European | | - | 3576/147,221 | 8 (9.3%) | |
| UK Biobank | African | 2021 | 6206 | - | 1 (1.16%) | |
| Taiwan Biobank | East Asian | 2008 | 3483 | - | 1 (1.16%) | 18370851 |
| Nakatochi | East Asian | 2019 | 121,745 | - | 1 (1.16%) | 30993211 |
| Kolz | European | 2009 | 28,141 | - | 1 (1.16%) | 19503597 |
| Huffman | European | 2015 | 42,569 | | 2 (2.33%) | 25811787 |
| White | European | 2016 | 166,486 | - | 2 (2.33%) | 26781229 |
| Leon-Mimila | Hispanic | 2013 | 1073 adults, 1080 children | - | 1 (1.16%) | 23950976 |
| Dönertaş | European | 2021 | - | 488,295 | 1 (1.16%) | 33959723 |
| Zhou | European + East Asian | 2022 | - | 30,549/1,039,290 | 1 (1.16%) | 36777996 |
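The Freq (%) column of Table 3 appears to give each dataset's usage as a share of the 86 studies assessed. The short Python sketch below reproduces the first four percentages under that assumption. The denominator of 86 comes from the abstract. The dictionary, its labels, and the function name are illustrative only. A plausible reading, not stated in the table, is that the percentages can sum to more than 100% because one study may draw on several datasets.

```python
# Number of MR studies assessed, as stated in the abstract.
N_STUDIES = 86

# Dataset -> number of studies using it (Freq column, first four rows of Table 3).
# The "(European)" label is added here only to distinguish the two UK Biobank rows.
dataset_counts = {
    "Köttgen": 44,
    "Tin": 20,
    "Japan Biobank": 6,
    "UK Biobank (European)": 12,
}


def usage_percent(count, total=N_STUDIES):
    """Share of assessed studies that used a dataset, as a percentage."""
    return round(100 * count / total, 2)


for name, count in dataset_counts.items():
    print(f"{name}: {count} ({usage_percent(count)}%)")
```

Run as written, this prints 51.16%, 23.26%, 6.98%, and 13.95%, matching the values reported in the table.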
Table 4. Summary of mean scores and variability by continent.

| Continent | Mean | IQR |
| --- | --- | --- |
| Asia | 8.9 ± 0.5 | 5 |
| Europe | 9.8 ± 0.7 | 4 |
| North America | 9.5 ± 2 | 3 |
| Oceania | 10 ± 0.6 | 2 |
| Total | 9.1 ± 4 | 4 |
IQR = Interquartile range.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rosas-Chavez, F.; Merriman, T.R. Mendelian Randomization Studies: A Metric for Quality Evaluation. Gout Urate Cryst. Depos. Dis. 2025, 3, 8. https://doi.org/10.3390/gucdd3020008
