Systematic Review

Methodologies for the Emulation of Biomarker-Guided Trials Using Observational Data: A Systematic Review

by Faye D. Baldwin 1,*, Rukun K. S. Khalaf 2, Ruwanthi Kolamunnage-Dona 1 and Andrea L. Jorgensen 1

1 Department of Health Data Science, University of Liverpool, Liverpool L69 3GL, UK
2 Department of Public Health, Policy and Systems, University of Liverpool, Liverpool L69 3GL, UK
* Author to whom correspondence should be addressed.
J. Pers. Med. 2025, 15(5), 195; https://doi.org/10.3390/jpm15050195
Submission received: 25 March 2025 / Revised: 25 April 2025 / Accepted: 1 May 2025 / Published: 10 May 2025
(This article belongs to the Section Disease Biomarker)

Abstract
Background: Target trial emulation involves the application of design principles from randomised controlled trials (RCTs) to observational data, and is particularly useful in situations where an RCT would be unfeasible. Biomarker-guided trials, which incorporate biomarkers within their design to either guide treatment and/or determine eligibility, are often unfeasible in practice due to sample size requirements or ethical concerns. Here, we undertake a systematic review of methodologies used in target trial emulations, comparing treatment effectiveness, critically appraising them, and considering their applicability to the emulation of biomarker-guided trials. Methods: A comprehensive search strategy was developed to identify studies reporting on methods for target trial emulation comparing the effectiveness of treatments using observational data, and applied to the following bibliographic databases: PubMed, Scopus, Web of Science, and Ovid MEDLINE. A narrative description of methods identified in the review was undertaken alongside a critique of their relative strengths and limitations. Results: We identified a total of 59 papers: 47 emulating a target trial (‘application’ studies), and 12 detailing methods to emulate a target trial (‘methods’ studies). A total of 25 papers were identified as emulating a biomarker-guided trial (42%). While all papers reported methods to adjust for baseline confounding, 40% of application papers did not specify methods to adjust for time-varying confounding. Conclusions: This systematic review has identified a range of methods used to control for baseline, time-varying, and residual/unmeasured confounding within target trial emulation and provides a guide for researchers interested in emulation of biomarker-guided trials.

1. Introduction

While randomised controlled trials (RCTs) are considered the gold standard for identifying causal relationships, in some scenarios they may not be a feasible, ethical, or cost-effective option [1]. In circumstances where it is not possible to carry out an RCT, observational studies may be used to examine the effectiveness of an intervention. However, due to a lack of randomisation, observational studies are prone to confounding bias, making the ability to infer causal relationships from such research challenging [2]. Furthermore, immortal time bias, which results from the failure to align the start of follow-up with the time that eligibility criteria are met and treatment is assigned, is another common source of bias that affects the validity of observational research [3]. The target trial emulation framework has been proposed to address these challenges, and involves specifying the hypothetical trial that would be conducted to investigate treatment effectiveness (the “target trial”) and comparing this with the observational data proposed to emulate the target trial, allowing researchers to identify potential sources of bias at the design stage [1]. Many different methodologies aiming to control for different sources of bias within an emulated trial have been proposed, and it can be challenging for researchers to identify the most appropriate methods to use, particularly as there are no systematic reviews that critically review and compare them.
Biomarkers, characteristics that are measured and evaluated as indicators of biological processes, pathogenic processes, or pharmacological responses, are increasingly used within research aimed at tailoring treatment to the individual, an area known as personalised medicine. This growing area of research often adopts biomarker-guided trials, trials that incorporate one or more biomarkers in their design to guide treatment and/or determine eligibility, to demonstrate the utility of using the biomarker(s) to inform treatment [4]. However, many biomarker-guided trial designs are unfeasible in practice, for example where long-term outcomes are studied, or when the outcome and/or biomarker is particularly rare, thus requiring unachievably large sample sizes. Biomarker-guided trials often have complex designs involving large numbers of treatment arms and subgroups, which can result in insufficient power to detect a clinically meaningful difference [5,6,7]. Challenges associated with conducting biomarker-guided trials, including the cost and complexity of biomarker analysis, difficulty in estimating recruitment rates, and dropout of participants with advanced disease, have been reported within the literature [8]. Additionally, the incorporation of dynamic treatment strategies, where treatment decisions change over time based on an individual’s biomarker status or treatment response, can be difficult to implement within biomarker-guided trials due to logistical and analytical challenges. This is particularly the case when such strategies rely on biomarker thresholds that are not known a priori, which must either be estimated during the trial or accommodated through additional treatment arms in which participants are randomised to different biomarker thresholds or biomarker-guided treatment strategies. These complexities may also raise ethical concerns, especially when preliminary evidence suggests potential benefit or harm in a biomarker-defined subgroup but definitive validation is lacking.
Echoing the reasons outlined above, emulation of a biomarker-guided target trial using observational data could be a viable alternative in situations where it is not feasible to run a biomarker-guided clinical trial. Emulation of biomarker-guided trials may be particularly suited to the study of dynamic treatment strategies that would be unethical to implement within a clinical trial, including the effect of initiating, stopping, or switching treatment based on biomarker response. Emulation of biomarker-guided trials that compare dynamic treatment regimens could help address clinical questions that are unfeasible or unethical to study in randomised trials. This approach is particularly applicable to critical care settings, where randomisation is not feasible or ethical, and rapid clinical decisions are required. For example, emulation of a biomarker-guided trial could be used to determine the best time to initiate antibiotic therapy based on procalcitonin-guided strategies in patients with sepsis, incorporating the study of rapid implementation of treatment alongside the minimisation of unnecessary antibiotic use [9].
The objective of this systematic review was to identify and evaluate the various methodologies that have been proposed and/or used in target trial emulation studies comparing treatment effectiveness, with a focus on methods to control for bias from baseline, and time-varying, unmeasured, and residual confounding, in addition to immortal time bias. Furthermore, we identified emulated target trials utilising biomarkers and considered the applicability of the methods used to control for bias within emulated target trials more generally to those featuring biomarkers.

2. Materials and Methods

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and checklist were followed within this manuscript (Supplementary Table S2). A protocol was written before starting the review and, after the review was completed, was registered on the Open Science Framework (OSF) on 12 March 2025 (registration number: 92vhq) [10].

2.1. Eligibility Criteria

We included studies published in English which either detail the emulation of a target trial of treatment effectiveness using observational data (e.g., prospective and retrospective cohort studies, case-control research studies, real-world data registries, or databases), classified as ‘application studies’, or studies which propose methodology to be used within target trial emulation using observational data, classified as ‘methods studies’. Editorials, letters, commentaries, reviews, conference abstracts, case reports, and papers that did not detail the methods used to emulate a target trial were excluded, as well as studies not using observational data to emulate a target trial, for example those which used data exclusively from previous randomised controlled trials. Studies not assessing treatment effectiveness were also excluded. Additionally, animal studies were excluded.

2.2. Search Strategy

Four bibliographic databases (PubMed, Scopus, Web of Science, and Ovid MEDLINE) were searched for articles published up to 17 March 2023, using predefined search terms relating to target trial emulation and observational data (Supplementary File S1).

2.3. Data Collection

Search results were entered into EndNote X9 reference management software for the removal of duplicates before being exported and screened in two stages using Rayyan [11]. Two reviewers (FDB and RKSK) independently screened titles, abstracts, and full-text articles against the inclusion criteria. In the event of uncertainty regarding eligibility, discrepancies were resolved through discussion between reviewers and referral to third and fourth reviewers (ALJ and RKD).
Due to time constraints, we randomly selected 50% of the application papers for inclusion in the review, along with all remaining methods papers. Application papers were chosen by inputting their record numbers into R version 4.3.1 and using the “sample()” function to select a random subset.
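To make this selection step concrete, a minimal R sketch is given below; the seed and the vector of record numbers are hypothetical illustrations, as only the use of R version 4.3.1 and the sample() function is reported above.

# Minimal sketch of the random 50% selection of application papers;
# the seed and record numbers are hypothetical, only sample() is reported in the text.
set.seed(2023)                                   # assumed seed for reproducibility (not reported)
application_ids <- 1:94                          # hypothetical record numbers of eligible application papers
n_selected <- ceiling(length(application_ids) / 2)
selected_ids <- sort(sample(application_ids, size = n_selected))
selected_ids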

2.4. Data Extraction

A data extraction spreadsheet was created and used by two reviewers (FDB and RKSK) to extract data from all eligible studies. Data were extracted regarding:
  • Basic information about the study;
  • Study design, including specifics of the trial design emulated;
  • Statistical analysis, including details of statistical tests applied;
  • Methods to control for confounding/biases, including specific details and rationale of methods to adjust for baseline, time-varying, unmeasured and residual confounding, alongside immortal time bias;
  • Outcomes measured;
  • Use of biomarkers, including details of any biomarkers measured, and whether the study would be identified as a biomarker-guided trial, as defined above.

2.5. Data Analysis

A descriptive analysis of studies included in the review was undertaken, together with a narrative description of the methods detailed in the papers, in addition to a critique of their relative strengths and limitations.

3. Results

The flow of studies in this review is presented in Figure 1. A total of 557 records were retrieved from all databases. After removing duplicates, 302 titles and abstracts were screened, and 157 were excluded; reasons for exclusion included not emulating or proposing methods for trial emulation (n = 39), not using observational data to emulate a trial (n = 24), not assessing treatment effectiveness (n = 30), wrong publication type (n = 59), or being a duplicate record (n = 5). The full texts of 145 papers were retrieved, with a further 38 excluded based on full-text screening; reasons for exclusion included not emulating or proposing methods to emulate a trial (n = 15), not using observational data to emulate a trial (n = 7), not assessing treatment effectiveness (n = 9), wrong publication type (n = 6), or being a duplicate record (n = 1). Due to time constraints, we randomly selected 50% of the application papers to include in the review, using the R function “sample()”, while including all remaining methods papers. This resulted in 47 application studies [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58] and 12 methods studies [59,60,61,62,63,64,65,66,67,68,69,70] in the final review.

3.1. Observations from the Included Studies

3.1.1. Specification of the Target Trial Protocol

Of the application studies, 29 (62%) referred to a target trial protocol, most commonly included in the methods or supplementary sections, of which 4 (9%) were published in clinical trial registries or research repositories. Of the methods papers, seven (58%) also specified a target trial protocol, again either in the methods or supplementary material sections.

3.1.2. Specification of Causal Contrasts of Interest

Causal contrasts of interest are a key component of the target trial protocol, and are clearly defined comparisons made between different treatment strategies as they would ideally be implemented in a hypothetical RCT [1]. The most commonly assessed causal contrasts within RCTs involve the effect of being randomised to a treatment strategy, the intention-to-treat effect (ITT), and the effect of receiving the treatment strategy as specified in the protocol, the per-protocol effect, which excludes individuals who deviate from the protocol, drop out, or become non-adherent [1].
Of the application papers, the most commonly assessed causal contrast of interest was a combination of the intention-to-treat and per-protocol effects (n = 14, 30%), followed by the per-protocol effect alone (n = 12, 26%) and the intention-to-treat (ITT) effect alone (n = 11, 23%). Other causal contrasts of interest included an as-treated approach (the effect of the treatment a participant actually received; n = 1, 2%) and an as-started approach (the effect of the treatment initially commenced; n = 2, 4%). Of the methods papers, the majority described assessment of the per-protocol effect (n = 5, 42%), followed by the intention-to-treat effect (n = 4, 33%). Several application papers (n = 7, 15%) and methods papers (n = 2, 16%) did not specify the causal contrast of interest, an important component of the target trial protocol.

3.1.3. Specification of Follow-Up

While most application papers specified the length of follow-up (n = 45, 96%), a lower number specified the time zero of follow-up (n = 36, 77%). All methodological papers specified the length of follow-up (n = 12, 100%), and most methods papers specified the time zero of follow-up (n = 11, 92%). We classified a paper as specifying time zero if it was clear that the start of follow-up aligned with meeting the eligibility criteria and being assigned to a treatment strategy (i.e., we did not require specific reporting of ‘time zero’ within the text).

3.2. Methods to Control for Confounding

3.2.1. Baseline Confounding

Baseline confounding arises when one or more pre-intervention prognostic variables (variables measured before starting the intervention of interest) are predictive of the intervention received at baseline and the start of follow-up [71]. Failure to adjust for baseline confounding can result in biased estimates of the association between exposure and outcome, and can lead to invalid conclusions about causal relationships. Baseline confounding cannot be directly solved by target trial emulation and requires the use of specific methods to appropriately adjust for it [72].
All application studies reported the use of at least one statistical method to control for baseline confounding, which was most commonly achieved through the use of inverse probability weighting (IPW) within a marginal structural model (IPW-MSM, n = 19, 40%). Other methods to adjust for baseline confounding included standard adjustment within a regression model (n = 18, 38%), propensity score matching (n = 10, 21%), the clone-censor-weight method (n = 9, 19%), inverse probability of treatment weighting (IPTW, n = 7, 15%), and the parametric G-formula (n = 4, 9%). Most application studies used more than one method to control for baseline confounding (n = 22, 51%).
Similarly, all methods studies presented the use of at least one method to control for baseline confounding. This was also mostly achieved by use of IPW within a marginal structural model (n = 4, 33%) or the clone-censor-weight method (n = 4, 33%), followed by the parametric G-formula (n = 3, 25%). Half of methodological studies utilised more than one method to control for baseline confounding (n = 6, 50%). Details of the main methodologies applied to control for baseline confounding in both application and methods studies, alongside their advantages and disadvantages, can be found in Table 1.
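To illustrate one of the approaches listed above, the following base-R sketch estimates a propensity score and applies inverse probability of treatment weighting before fitting a weighted outcome model; the data frame df, its variable names, and the model forms are hypothetical, and in practice a robust (sandwich) variance estimator would be used for confidence intervals.

# Minimal IPTW sketch for baseline confounding; df, variable names and model
# forms are hypothetical. df holds one row per participant with baseline
# covariates (age, sex, ldl0), a binary treatment (treat) and a binary outcome (event).

ps_model <- glm(treat ~ age + sex + ldl0, family = binomial, data = df)  # propensity score model
ps <- predict(ps_model, type = "response")

# Unstabilised inverse probability of treatment weights
df$iptw <- ifelse(df$treat == 1, 1 / ps, 1 / (1 - ps))

# Weighted outcome model estimating the marginal treatment effect;
# quasibinomial avoids warnings about non-integer weights.
msm_baseline <- glm(event ~ treat, family = quasibinomial, data = df, weights = iptw)
summary(msm_baseline)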

3.2.2. Time-Varying Confounding

Time-varying confounding occurs when a confounder changes over time and is associated with both the exposure (e.g., treatment) and the outcome. Examples of time-varying confounders include biomarker measurements, such as BMI and blood pressure. Time-varying confounding often arises alongside time-varying exposures, such as treatment dose [3]. In personalised medicine, time-varying confounding by prior exposure is a common challenge due to the dynamic nature of biomarker measurements. Prior biomarker levels often influence treatment decisions at baseline, and biomarker levels post-baseline can influence the nature of the exposure, such as treatment, and its relationship with the outcome [72]. Time-varying confounding cannot be solved by target trial emulation alone or conventional methods to control for confounding, and requires the use of more statistically advanced, generalised ‘G-methods’, which appropriately model the effect of time-varying confounders on outcome [3].
Most application studies reported using at least one method to control for time-varying confounding (n = 28, 60%). The most commonly used method was inverse probability weighting (IPW) within a marginal structural model (n = 19, 40%), followed by the clone-censor-weight method (n = 8, 17%), implementation of a sequential trial design (n = 8, 17%), inverse probability of censoring weighting (IPCW, n = 4, 9%) and the parametric G-formula (n = 4, 9%). Several studies used more than one method to control for time-varying confounding (n = 16, 34%). Over a third of studies (n = 18, 38%) did not report a method to control for time-varying confounding.
Similarly, the majority of methods studies presented the use of at least one method to control for time-varying confounding (n = 11, 92%). Methods included the use of IPW as part of an MSM (n = 4, 33%), the clone-censor-weight method (n = 4, 33%), IPCW (n = 3, 25%), and the parametric G-formula (n = 3, 25%). Most methods studies utilised more than one method to adjust for time-varying confounding (n = 7, 58%). Details of the main methodologies applied to control for time-varying confounding in both application and methods studies, alongside their advantages and disadvantages, can be found in Table 1.
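As an illustration of how IPW within a marginal structural model handles a time-varying biomarker, the sketch below computes stabilised weights from treatment models fitted to person-interval data and fits a weighted pooled logistic regression; the data frame long, its variables, and the model forms are hypothetical, and the data are assumed to be sorted by id and time.

# Minimal sketch of stabilised inverse probability weights for a time-varying
# treatment (IPW-MSM); all data and variable names are hypothetical.
# long: person-interval data sorted by id and time, with columns id, time,
# treat (current treatment), lag_treat (previous treatment), biomarker
# (time-varying confounder), age (baseline covariate) and event (outcome).

denom_fit <- glm(treat ~ lag_treat + biomarker + age + factor(time),
                 family = binomial, data = long)   # treatment given confounder and treatment history
num_fit   <- glm(treat ~ lag_treat + factor(time),
                 family = binomial, data = long)   # treatment given treatment history only

p_denom <- predict(denom_fit, type = "response")
p_num   <- predict(num_fit, type = "response")

# Probability of the treatment actually received in each interval
w_denom <- ifelse(long$treat == 1, p_denom, 1 - p_denom)
w_num   <- ifelse(long$treat == 1, p_num, 1 - p_num)

# Stabilised weights: cumulative product of interval-specific ratios within each person
long$sw <- ave(w_num / w_denom, long$id, FUN = cumprod)

# Weighted pooled logistic regression approximating the marginal structural model
msm_tv <- glm(event ~ treat + factor(time), family = quasibinomial,
              data = long, weights = sw)
summary(msm_tv)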

3.2.3. Residual and Unmeasured Confounding

Unmeasured confounding occurs when a variable related to both the exposure and the outcome is not measured or adjusted for in an analysis. Residual confounding refers to bias due to measurement error following adjustment. Neither type of confounding can be directly resolved by target trial emulation and therefore requires the adoption of methods to appropriately adjust for them.
Most application studies reported one or more methods to control for residual and/or unmeasured confounding (n = 42, 89%), the most common method being the use of sensitivity analyses (n = 36, 77%), followed by the use of negative outcome controls (n = 10, 21%), creation of directed acyclic graphs (DAGs) to identify unmeasured confounders (n = 7, 15%), and calculation of the E-value for unmeasured confounding (n = 6, 13%). Other methods included the use of tracer outcomes (n = 2, 4%), instrumental variable analysis (n = 1, 2%), and use of positive outcome controls (n = 1, 2%). A minority of application studies did not specify a method to control for residual and/or unmeasured confounding (n = 5, 11%).
Similarly, the majority of methods studies presented the use of at least one method to control for residual and/or unmeasured confounding (n = 7, 58%), using sensitivity analyses (n = 5, 42%), DAGs (n = 5, 42%), or negative outcome controls (n = 1, 8%) to minimise the risk of residual and/or unmeasured confounding. The remaining methods studies (n = 5, 42%) did not report or describe the use of a method to control for residual and/or unmeasured confounding.

3.2.4. Use of Biomarker-Guided Trial Designs

Of the application studies, 31 (66%) reported measurement of a biomarker in their target trial. Of these, 20 incorporated one or more biomarkers in their design (43% of all application studies). Of the papers that incorporated biomarkers in their design, 7 (35%) used a biomarker to determine eligibility, 2 (10%) used a biomarker to guide treatment, and 11 (55%) used a biomarker both to determine trial eligibility and to guide treatment.
Of the methods studies, 6 (50%) reported and/or described measurement of a biomarker in a target trial. Of these, five papers (83%) incorporated one or more biomarkers in their design. Of the papers that incorporated biomarkers in their design, two (40%) used a biomarker to determine eligibility, and three (60%) used a biomarker both to determine trial eligibility and to guide treatment. Characteristics of studies identified as biomarker-guided trials can be found in Supplementary Table S1.

4. Discussion

4.1. Overview

This systematic review identified, summarised, and critically appraised the methods used and/or described to emulate a target trial comparing the effectiveness of treatments using observational data. A total of 59 studies were included, of which 47 applied methods to emulate a target trial (‘application studies’) and 12 described and/or utilised methods to emulate a target trial (‘methods studies’). A range of statistical methods used to account for baseline and time-varying confounding, as well as immortal time bias within the included studies, were identified and are summarised in Table 1.
While all application studies emulated a target trial, over a third (38%) did not specify the target trial protocol, the main component of the target trial framework. Failure to specify the target trial protocol raises questions regarding the validity of the target trial emulation and what an ideal hypothetical trial would look like if conducted in a real-life scenario. Furthermore, a failure to specify the target trial protocol means that key components of the target trial framework, including specification of the time zero of follow-up and causal contrasts of interest, are not specified. Without a clear definition of the start of follow-up, it is challenging to determine whether treatment assignment and the initiation of follow-up align as they would in a true randomised trial [2,72]. Furthermore, failure to align treatment assignment with the start of follow-up can result in immortal time bias, whereby participants assigned to the treatment group have a period of follow-up during which they cannot experience the event of interest, where they are deemed ‘immortal’, which can result in biased treatment effects [2,72].
Some studies also failed to specify the causal contrasts of interest, such as whether the target trial estimates the intention-to-treat effect (the effect of treatment assignment regardless of adherence) or the per-protocol effect (the effect of adhering to the assigned treatment strategy). A failure to specify these key components of the target trial protocol reduces the validity of a target trial, as it raises the question of whether the trial is truly being emulated, and whether results are different to those that would be achieved in a standard observational study, given that the target trial protocol is not fully specified. Furthermore, it casts doubt on whether the results of the emulated trial are comparable to those from randomised trials, which would specify their trial protocols in detail.

4.2. Applicability of Methods in a Biomarker-Guided Target Trial Setting

Target trial emulation offers a unique opportunity to compare and validate the clinical utility of new and existing biomarkers for guiding treatment decisions, particularly in situations where a randomised trial is unethical or unfeasible. Many of the studies which included biomarker measurements did not explicitly mention biomarker-guided trials or personalised medicine, despite fitting into these categories. While many real-life biomarker-guided randomised trials have focused on time-fixed genetic biomarkers to guide treatment decisions, many of the identified biomarker-guided target trials have utilised time-varying, non-genetic biomarkers, such as CD4 cell count, estimated glomerular filtration rate (eGFR), and low-density lipoprotein cholesterol (LDL-C) [8,20,21,26,27,30,49]. This may be because such measurements are routinely collected in electronic health records (EHRs), which are often used for target trial emulation, whereas genetic biomarkers require genotyping, which is less commonly conducted within healthcare settings, meaning genetic data are usually unavailable via EHRs.
Of note, many of the biomarker-guided target trials identified compared the effect of dynamic treatment strategies, where treatment initiation, discontinuation, and switching are dependent on biomarker thresholds. For example, Cain et al. compared the effect of dynamic regimes in the context of initiation of combined antiretroviral therapy (cART) dependent on CD4 cell count, specified as follows: ‘Initiate treatment within m months after the recorded CD4 cell count first drops below x cells/mm3’, where x takes values from 200 to 500 in increments of 10, and m takes values of 0 or 3 [20]. Comparing these different strategies would be challenging in a randomised trial, but feasible within an emulated target trial scenario. Furthermore, target trial emulation could be useful when there is uncertainty surrounding the most optimal biomarker threshold to use within a trial, and allows for the comparison of several biomarker thresholds, which would be difficult to do in a randomised trial.
Several studies used biomarker thresholds to determine time of trial entry and to guide treatment decisions, such as McGrath et al., who required a platelet count of ≤30 × 109/L after treatment initiation to enter the trial, and Fu et al., who compared strategies of ‘stopping RASi within 6 months and remaining off treatment after an eGFR decrease <30 mL/min per 1.73 m2’ versus ‘continuing RASi for the entire follow-up’ [30,43]. Again, these strategies are challenging to implement in a randomised trial and within observational studies due to ethical concerns related to early treatment cessation and switching. However, applying the target trial framework to study biomarker-guided strategies helps mitigate design-related biases—such as immortal time bias and selection bias—by aligning treatment assignment with the start of follow-up [1]. Given that a recent review found 57% of observational studies suffer from immortal time bias, using the target trial framework to emulate biomarker-guided trials, rather than relying on standard observational analyses, could improve the validity of results by more closely approximating a randomised trial using observational data [72,78].
A common theme amongst biomarker-guided target trials, specifically studies that used a biomarker to guide treatment, was the use of methods to control for time-varying confounding, particularly the use of inverse probability weighting (IPW), either within a clone-censor-weight (CCW) design or as part of a marginal structural model (IPW-MSM). Biomarkers are often time-varying in nature and share complex relationships with other confounding variables, alongside past exposure to treatment. Use of standard statistical methods, such as regression, and even more advanced methods, such as random-effects models, has been shown to be biased in the presence of time-varying confounding [3]. To accurately adjust for the time-varying effect of the biomarker without blocking the pathway through which past treatment affects the biomarker, and in turn the outcome, the use of causal ‘G-methods’ is recommended [3]. G-methods include inverse probability weighting, parametric and non-parametric G-computation (e.g., parametric G-formula), and G-estimation [3,84].
Alongside their ability to adjust for baseline and time-varying confounding, G-methods can also be used to compare counterfactual outcomes, allowing the emulation of ‘what would have happened’ situations depending on treatment strategies [3]. G-methods can be used to compare counterfactual outcomes in specific risk groups, such as individuals with a genetic predisposition to a disease, by predicting what would happen under different treatment or exposure scenarios. For example, the scenarios ‘What if all individuals with a genetic mutation experienced poor treatment response?’ and ‘Would implementation of a genotype-informed treatment strategy versus standard care reduce the risk of adverse events?’ could be implemented using G-methods. As such, the use of G-methods within an emulated biomarker-guided trial could be beneficial for researchers aiming to evaluate personalised treatment strategies.
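To sketch how such counterfactual contrasts can be computed, the following base-R example applies G-computation (standardisation) for a single point exposure, predicting the outcome risk if everyone had been treated versus if no one had been treated; the data frame, variable names, and outcome model are hypothetical, and a full parametric G-formula for sustained strategies would additionally model the time-varying covariates at each interval.

# Minimal G-computation (standardisation) sketch for a point exposure;
# df and its variable names are hypothetical (baseline covariates age, sex,
# biomarker0; binary treatment treat; binary outcome event).

out_fit <- glm(event ~ treat + age + sex + biomarker0,
               family = binomial, data = df)       # outcome model

df_treated   <- transform(df, treat = 1)           # "what if everyone were treated"
df_untreated <- transform(df, treat = 0)           # "what if no one were treated"

risk1 <- mean(predict(out_fit, newdata = df_treated, type = "response"))
risk0 <- mean(predict(out_fit, newdata = df_untreated, type = "response"))

# Standardised (marginal) counterfactual contrasts
c(risk_difference = risk1 - risk0, risk_ratio = risk1 / risk0)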

5. Recommendations and Future Directions

Although most application studies specified a target trial protocol, 38% did not. Without explicitly defining the target trial and detailing how observational data is used for emulation, studies may be more susceptible to confounding. Full specification of the target trial protocol within a paper’s methods or supplementary section is recommended to enhance replicability. Alternatively, researchers could register the target trial protocol on a data sharing repository or a trial registration site. Within the target trial protocol, we also recommend specification of the causal contrasts of interest, such as whether intention-to-treat (ITT) and/or per-protocol (PP) effects were estimated. As the goal of target trial emulation is to emulate the ideal, hypothetical RCT, specifying causal contrasts of interest enhances both the replicability and validity of the emulated trial. A reporting guideline for researchers emulating a target trial (TARGET) is in development, which seeks to improve and standardise the approaches adopted by researchers in the field [86]. We anticipate that adherence to the TARGET guideline will improve the reporting and replicability of target trial emulation studies.
A common limitation within application and methods studies was the presence of residual and unmeasured confounding. While many studies aimed to investigate the impact of residual and unmeasured confounding using sensitivity analyses, we recommend that researchers identify potential sources of confounding in the planning stages of target trial emulation through use of directed acyclic graphs (DAGs). DAGs, also known as causal graphs, present all assumed potential causal relationships between potential confounders, outcome, and the exposure/treatment, using unidirectional arrows [60,75]. DAGs can include measured and unmeasured confounders, alongside causes and effects of exposures and outcomes [60]. We particularly recommend the use of DAGs to illustrate potential confounders that change over time, such as biomarkers, and how these are affected by prior treatment, in addition to other confounders, and their effect on the outcome. An example of a simple DAG is shown in Figure 2. The DAG highlights the time-varying nature of statin use, measured both at baseline (T0) and follow-up (T1), and incorporates a time-varying biomarker, LDL cholesterol. The graph shows how baseline LDL cholesterol (LDL Cholesterol T0) influences statin use at baseline (Statin Use T0) and that follow-up LDL cholesterol (LDL Cholesterol T1) can be affected by prior statin use (Statin Use T0) via LDL cholesterol measurements at baseline. Furthermore, the DAG depicts the clinical feedback loop where LDL Cholesterol T1 directly influences Statin Use T1, representing a clinician’s decision to adjust statin treatment based on follow-up cholesterol levels. Additionally, the DAG accounts for smoking status as a time-varying confounder, measured at both baseline (Smoking Status T0) and follow-up (Smoking Status T1). Age is included as a covariate influencing baseline and follow-up LDL cholesterol, statin use, smoking status, and the development of cardiovascular disease, and sex is included as a covariate to account for potential sex differences in statin prescribing and cardiovascular disease risk. While this example DAG incorporates some key confounders, it does not represent unmeasured confounders (such as diet and physical activity) that could influence the relationships depicted.
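A DAG such as the one described above can also be encoded programmatically; the sketch below uses the dagitty R package to write down a simplified version of Figure 2 (node names abbreviated, and the edge list is our assumed reading of the figure) and to query minimal adjustment sets for the effect of baseline statin use on cardiovascular disease.

# Sketch of the Figure 2 DAG encoded with the dagitty package; the edge list
# is a simplified, assumed reading of the figure and node names are abbreviated.
library(dagitty)

g <- dagitty('dag {
  Age -> LDL_T0
  Age -> LDL_T1
  Age -> Statin_T0
  Age -> Statin_T1
  Age -> Smoking_T0
  Age -> Smoking_T1
  Age -> CVD
  Sex -> Statin_T0
  Sex -> Statin_T1
  Sex -> CVD
  Smoking_T0 -> Smoking_T1
  Smoking_T0 -> Statin_T0
  Smoking_T0 -> CVD
  Smoking_T1 -> Statin_T1
  Smoking_T1 -> CVD
  LDL_T0 -> Statin_T0
  LDL_T0 -> LDL_T1
  Statin_T0 -> LDL_T1
  Statin_T0 -> Statin_T1
  Statin_T0 -> CVD
  LDL_T1 -> Statin_T1
  LDL_T1 -> CVD
  Statin_T1 -> CVD
}')

# Minimal covariate sets needed to identify the effect of baseline statin use on CVD
adjustmentSets(g, exposure = "Statin_T0", outcome = "CVD")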
We also recommend that researchers implement methods to identify residual and unmeasured confounding within their analysis. This could include investigation of negative controls (exposure controls and outcome controls), which may share the same sources of bias present in the original association between exposure and outcome, yet no causal effect is expected [1,87]. For example, in a target trial investigating the effect of antidepressants on severity of depression, a negative exposure control could include an alternative treatment which is not prescribed for depression, such as antihistamines, and a negative outcome control could include a health condition unrelated to depression, such as a broken bone [87]. In this example, the negative outcome control and exposure are not expected to influence depression severity, but may share common confounders with the exposure and outcome, such as age, gender or physical activity [87]. Alternatively, the E-value can be calculated to assess the level of unmeasured confounding within target trial emulation. The E-value assesses the minimal strength an unmeasured confounder would need to explain away an association, conditional on measured covariates [88].
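For a risk ratio greater than 1, the E-value takes the closed form RR + sqrt(RR*(RR-1)) [88]; a minimal R sketch is shown below, where the observed risk ratio of 1.8 is purely a hypothetical value.

# Minimal E-value sketch for a point estimate on the risk ratio scale [88];
# the example estimate is hypothetical.
e_value <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)   # for protective effects, invert first
  rr + sqrt(rr * (rr - 1))
}

e_value(1.8)   # an observed risk ratio of 1.8 gives an E-value of about 3.0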
We strongly recommend the use of advanced methods to account for time-varying confounding, particularly if a biomarker has been identified as a potential confounder, the biomarker is used as a measure of treatment response, and if aiming to emulate a biomarker-guided trial. If researchers are unsure of how to accurately adjust for time-varying confounders, several papers in the literature provide step-by-step instructions on methods for doing so, including clone-censor-weighting, sequential trial designs, marginal structural models, and the parametric G-formula [62,67,79,81,84,89]. Additionally, statistical packages designed for use in target trial emulation have been developed, including the R packages TrialEmulation, which combines data preparation for the emulation of sequential trials and calculation of inverse probability weights, and gfoRmula, which can be used to estimate the effects of sustained treatment strategies over time using the parametric G-formula [80,90].
While target trial emulation is a relatively new methodological framework, there are potential uses of target trial emulation that have not been identified in this systematic review. For example, future emulated trials could incorporate other emerging methodologies such as machine learning, which could be used to predict treatment response based on biomarker information, such as physiological biomarkers (e.g., CD4 cell count, eGFR). Alternatively, artificial intelligence (AI) methods, such as AlphaFold3, a neural-network model used to predict the structure of biological molecules and their interactions, could be utilised alongside biomarker-guided trial emulation to better understand how treatments affect specific protein targets, or to help define biomarker subtypes based on protein structure and function (e.g., by identifying individuals with clinically relevant mutations) [91,92]. Emulated biomarker-guided trials could also be used to assess the clinical utility of AI-based interventions, such as AI interpretation of clinical images (e.g., X-rays), and could be extended to account for the time-varying nature of imaging data over the course of disease progression or treatment [93]. All of these applications of biomarker-guided trial emulation have the potential to support the development of innovative personalised approaches to treating disease.

6. Conclusions

The implementation of the target trial emulation framework in observational research has enabled the estimation of causal effects that would otherwise be difficult or impossible to obtain, for example when conducting a trial would be unethical or unfeasible. Beyond standard two-arm trials, the use of target trial emulation has resulted in the emulation of trials with more complex designs, including comparison of static and dynamic treatment strategies dependent on levels of time-varying biomarkers, alongside trials with multiple treatment arms. However, to ensure that the results of target trial emulation would resemble those achieved in the ideal RCT, methods to adjust for baseline, time-varying, residual, and unmeasured confounding are required to be implemented while following the target trial framework. By doing so, emulated trials provide an efficient alternative to prospectively conducted trials and can greatly enhance the evidence base for both current and new treatments.
We believe that the target trial emulation framework provides unique opportunities for the demonstration of personalised approaches to treatments, in particular how biomarkers can be used to guide treatment strategies. We hope to see more biomarker-guided target trials in the future, especially in underfunded areas such as rare diseases.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jpm15050195/s1, Table S1: Characteristics of biomarker-guided trials; File S1: Full search strategy used in each database; Table S2: PRISMA 2020 checklist.

Author Contributions

Conceptualisation, F.D.B., A.L.J. and R.K.-D.; methodology, F.D.B., A.L.J. and R.K.-D.; validation, F.D.B., R.K.S.K., A.L.J. and R.K.-D.; data screening and extraction, F.D.B. and R.K.S.K.; data synthesis, F.D.B., A.L.J. and R.K.-D.; writing—original draft preparation, F.D.B.; writing—review and editing, F.D.B., R.K.S.K., A.L.J. and R.K.-D.; visualisation, F.D.B., A.L.J., R.K.S.K. and R.K.-D.; supervision, A.L.J. and R.K.-D. All authors have read and agreed to the published version of the manuscript.

Funding

F.D.B. is funded by the Medical Research Council (MRC) Trials Methodology Research Partnership (TMRP), grant number: MR/W006049/1, and was also provided with additional financial support from Health Data Research UK North (HDR UK North).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This systematic review did not generate or analyse any new datasets. The study synthesised findings from existing published literature, all of which are cited within the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
RCT: Randomised controlled trial
ITT: Intention-to-treat
PP: Per-protocol
IPW: Inverse probability weighting
IPTW: Inverse probability of treatment weighting
MSM: Marginal structural model
IPW-MSM: Inverse probability weighting as part of a marginal structural model
PSM: Propensity score matching
ATE: Average treatment effect
ATT: Average treatment effect on the treated
IPCW: Inverse probability of censoring weighting
SMD: Standardised mean difference
MI: Multiple imputation
MCMC: Markov chain Monte Carlo
BART: Bayesian additive regression trees
MAR: Missing at Random
MNAR: Missing Not at Random
EHR: Electronic health record
eGFR: Estimated glomerular filtration rate
DAG: Directed acyclic graph
TARGET: TrAnsparent ReportinG of observational studies Emulating a Target trial reporting guideline
AI: Artificial intelligence

References

  1. Hernán, M.A.; Robins, J.M. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. Am. J. Epidemiol. 2016, 183, 758–764. [Google Scholar] [CrossRef]
  2. Hernán, M.A.; Sauer, B.C.; Hernández-Díaz, S.; Platt, R.; Shrier, I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J. Clin. Epidemiol. 2016, 79, 70–75. [Google Scholar] [CrossRef] [PubMed]
  3. Mansournia, M.A.; Etminan, M.; Danaei, G.; Kaufman, J.S.; Collins, G. Handling time varying confounding in observational research. BMJ 2017, 359, j4587. [Google Scholar] [CrossRef]
  4. Brown, L.C.; Jorgensen, A.L.; Antoniou, M.; Wason, J. Biomarker-Guided Trials. In Principles and Practice of Clinical Trials; Piantadosi, S., Meinert, C.L., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 1145–1170. [Google Scholar]
  5. Le Tourneau, C.; Delord, J.-P.; Gonçalves, A.; Gavoille, C.; Dubot, C.; Isambert, N.; Campone, C.; Trédan, O.; Massiani, M.-A.; Mauborgne, C.; et al. Molecularly targeted therapy based on tumour molecular profiling versus conventional therapy for advanced cancer (SHIVA): A multicentre, open-label, proof-of-concept, randomised, controlled phase 2 trial. Lancet Oncol. 2015, 16, 1324–1334. [Google Scholar] [CrossRef] [PubMed]
  6. Duijkers, R.; Prins, H.J.; Kross, M.; Snijders, D.; van den Berg, J.W.K.; Werkman, G.M.; van der Veen, N.; Schoorl, M.; Bonten, M.J.M.; van Werkhoven, C.H.; et al. Biomarker guided antibiotic stewardship in community acquired pneumonia: A randomized controlled trial. PLoS ONE 2024, 19, e0307193. [Google Scholar] [CrossRef] [PubMed]
  7. O’Dwyer, P.J.; Gray, R.J.; Flaherty, K.T.; Chen, A.P.; Li, S.; Wang, V.; McShane, L.M.; Patton, D.R.; Tricoli, J.V.; Williams, P.M.; et al. The NCI-MATCH trial: Lessons for precision oncology. Nat. Med. 2023, 29, 1349–1357. [Google Scholar] [CrossRef]
  8. Antoniou, M.; Kolamunnage-Dona, R.; Wason, J.; Bathia, R.; Billingham, C.; Bliss, J.M.; Brown, L.C.; Gillman, A.; Paul, J.; Jorgensen, A.L.; et al. Biomarker-guided trials: Challenges in practice. Contemp. Clin. Trials Commun. 2019, 16, 100493. [Google Scholar] [CrossRef]
  9. Papp, M.; Kiss, N.; Baka, M.; Trásy, D.; Zubek, L.; Fehérvári, P.; Harnos, A.; Turan, C.; Heygi, P.; Molnár, Z.; et al. Procalcitonin-guided antibiotic therapy may shorten length of treatment and may improve survival-a systematic review and meta-analysis. Crit. Care 2023, 27, 394. [Google Scholar] [CrossRef]
  10. Jorgensen, A.L.; Khalaf, R.K.; Kolamunnage-Dona, R.; Baldwin, F.D. Methodologies for the Emulation of Biomarker-Guided Trials Using Observational Data: A Systematic Review Protocol [Internet]; Open Science Framework: Charlottesville, VA, USA, 2025. [Google Scholar] [CrossRef]
  11. Ouzzani, M.; Hammady, H.; Fedorowicz, Z.; Elmagarmid, A. Rayyan—A web and mobile app for systematic reviews. Syst. Rev. 2016, 5, 210. [Google Scholar] [CrossRef]
  12. Ahn, N.; Nolde, M.; Günter, A.; Güntner, F.; Gerlach, R.; Tauscher, M.; Amann, U.; Linseisen, J.; Meisinger, C.; Rückert-Eheberg, I.M.; et al. Emulating a target trial of proton pump inhibitors and dementia risk using claims data. Eur. J. Neurol. 2022, 29, 1335–1343. [Google Scholar] [CrossRef]
  13. Althunian, T.A.; de Boer, A.; Groenwold, R.H.H.; Rengerink, K.O.; Souverein, P.C.; Klungel, O.H. Rivaroxaban was found to be noninferior to warfarin in routine clinical care: A retrospective noninferiority cohort replication study. Pharmacoepidemiol. Drug Saf. 2020, 29, 1263–1272. [Google Scholar] [CrossRef]
  14. Aubert, C.E.; Sussman, J.B.; Hofer, T.P.; Cushman, W.C.; Ha, J.K.; Min, L. Adding a New Medication Versus Maximizing Dose to Intensify Hypertension Treatment in Older Adults: A Retrospective Observational Study. Ann. Intern. Med. 2021, 174, 1666–1673. [Google Scholar] [CrossRef] [PubMed]
  15. Barbulescu, A.; Askling, J.; Saevarsdottir, S.; Kim, S.C.; Frisell, T. Combined Conventional Synthetic Disease Modifying Therapy vs. Infliximab for Rheumatoid Arthritis: Emulating a Randomized Trial in Observational Data. Clin. Pharmacol. Ther. 2022, 112, 836–845. [Google Scholar] [CrossRef] [PubMed]
  16. Becker, W.C.; Li, Y.; Caniglia, E.C.; Vickers-Smith, R.; Feinberg, T.; Marshall, B.D.L.; Edelman, E.J. Cannabis use, pain interference, and prescription opioid receipt among persons with HIV: A target trial emulation study. AIDS Care 2022, 34, 469–477. [Google Scholar] [CrossRef]
  17. Börnhorst, C.; Reinders, T.; Rathmann, W.; Bongaerts, B.; Haug, U.; Didelez, V.; Kollhorst, B. Avoiding Time-Related Biases: A Feasibility Study on Antidiabetic Drugs and Pancreatic Cancer Applying the Parametric g-Formula to a Large German Healthcare Database. Clin. Epidemiol. 2021, 13, 1027–1038. [Google Scholar] [CrossRef]
  18. Bosch, N.A.; Law, A.C.; Vail, E.A.; Gillmeyer, K.R.; Gershengorn, H.B.; Wunsch, H.; Walkey, A. Inhaled Nitric Oxide vs Epoprostenol During Acute Respiratory Failure: An Observational Target Trial Emulation. Chest 2022, 162, 1287–1296. [Google Scholar] [CrossRef] [PubMed]
  19. Boyne, D.J.; Brenner, D.R.; Gupta, A.; Mackay, E.; Arora, P.; Wasiak, R.; Cheung, W.Y.; Hernán, M.A. Head-to-head comparison of FOLFIRINOX versus gemcitabine plus nab-paclitaxel in advanced pancreatic cancer: A target trial emulation using real-world data. Ann. Epidemiol. 2023, 78, 28–34. [Google Scholar] [CrossRef]
  20. Cain, L.E.; Saag, M.S.; Petersen, M.; May, M.T.; Ingle, S.M.; Logan, R.; Robins, J.M.; Abgrall, S.; Shepherd, B.E.; Deeks, S.G.; et al. Using observational data to emulate a randomized trial of dynamic treatment-switching strategies: An application to antiretroviral therapy. Int. J. Epidemiol. 2016, 45, 2038–2049. [Google Scholar] [CrossRef]
  21. Caniglia, E.C.; Robins, J.M.; Cain, L.E.; Sabin, C.; Logan, R.; Abgrall, S.; Mugavero, M.J.; Hernández-Díaz, S.; Meyer, L.; Seng, R.; et al. Emulating a trial of joint dynamic strategies: An application to monitoring and treatment of HIV-positive individuals. Stat. Med. 2019, 38, 2428–2446. [Google Scholar] [CrossRef]
  22. Caniglia, E.C.; Rojas-Saunero, L.P.; Hilal, S.; Licher, S.; Logan, R.; Stricker, B.; Ikram, M.A.; Swanson, S.A. Emulating a target trial of statin use and risk of dementia using cohort data. Neurology 2020, 95, e1322–e1332. [Google Scholar] [CrossRef]
  23. Caniglia, E.C.; Zash, R.; Jacobson, D.L.; Diseko, M.; Mayondi, G.; Lockman, S.; Chen, J.Y.; Mmalane, M.; Makhema, J.; Hernán, M.A.; et al. Emulating a target trial of antiretroviral therapy regimens started before conception and risk of adverse birth outcomes. Aids 2018, 32, 113–120. [Google Scholar] [CrossRef] [PubMed]
  24. Cheng-Lai, A.; Prlesi, L.; Murthy, S.; Bellin, E.Y.; Sinnett, M.J.; Goriacko, P. Evaluating Pharmacist-Led Heart Failure Transitions of Care Clinic: Impact of Analytic Approach on Readmission Rate Endpoints. Curr. Probl. Cardiol. 2023, 48, 101507. [Google Scholar] [CrossRef]
  25. Danaei, G.; García Rodríguez, L.A.; Cantero, O.F.; Logan, R.W.; Hernán, M.A. Electronic medical records can be used to emulate target trials of sustained treatment strategies. J. Clin. Epidemiol. 2018, 96, 12–22. [Google Scholar] [CrossRef]
  26. Dickerman, B.A.; García-Albéniz, X.; Logan, R.W.; Denaxas, S.; Hernán, M.A. Avoidable flaws in observational analyses: An application to statins and cancer. Nat. Med. 2019, 25, 1601–1606. [Google Scholar] [CrossRef]
  27. Dickerman, B.A.; García-Albéniz, X.; Logan, R.W.; Denaxas, S.; Hernán, M.A. Emulating a target trial in case-control designs: An application to statins and colorectal cancer. Int. J. Epidemiol. 2020, 49, 1637–1646. [Google Scholar] [CrossRef] [PubMed]
  28. Dickerman, B.A.; Gerlovin, H.; Madenci, A.L.; Kurgansky, K.E.; Ferolito, B.R.; Figueroa Muñiz, M.J.; Gagnon, D.R.; Gaziano, J.M.; Cho, K.; Casas, J.P.; et al. Comparative Effectiveness of BNT162b2 and mRNA-1273 Vaccines in U.S. Veterans. N. Engl. J. Med. 2022, 386, 105–115. [Google Scholar] [CrossRef]
  29. Franklin, J.M.; Patorno, E.; Desai, R.J.; Glynn, R.J.; Martin, D.; Quinto, K.; Pawar, A.; Bessette, L.G.; Lee, H.; Garry, E.M.; et al. Emulating Randomized Clinical Trials with Nonrandomized Real-World Evidence Studies. Circulation 2021, 143, 1002–1013. [Google Scholar] [CrossRef] [PubMed]
  30. Fu, E.L.; Evans, M.; Clase, C.M.; Tomlinson, L.A.; van Diepen, M.; Dekker, F.W.; Carrero, J.J. Stopping Renin-Angiotensin System Inhibitors in Patients with Advanced CKD and Risk of Adverse Outcomes: A Nationwide Study. J. Am. Soc. Nephrol. 2021, 32, 424–435. [Google Scholar] [CrossRef]
  31. Hernán, M.A.; Alonso, A.; Logan, R.; Grodstein, F.; Michels, K.B.; Willett, W.C.; Manson, J.E.; Robins, J.M. Observational studies analyzed like randomized experiments: An application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008, 19, 766–779. [Google Scholar] [CrossRef]
  32. Ioannou, G.N.; Bohnert, A.S.B.; O’Hare, A.M.; Boyko, E.J.; Maciejewski, M.L.; Smith, V.A.; Bowling, C.A.; Viglianti, E.; Iwashyna, T.J.; Hynes, D.M.; et al. Effectiveness of mRNA COVID-19 Vaccine Boosters Against Infection, Hospitalization, and Death: A Target Trial Emulation in the Omicron (B.1.1.529) Variant Era. Ann. Intern. Med. 2022, 175, 1693–1706. [Google Scholar] [CrossRef]
  33. Ioannou, G.N.; Locke, E.R.; Green, P.K.; Berry, K. Comparison of Moderna versus Pfizer-BioNTech COVID-19 vaccine outcomes: A target trial emulation study in the U.S. Veterans Affairs healthcare system. EClinicalMedicine 2022, 45, 101326. [Google Scholar] [CrossRef] [PubMed]
  34. Kirchgesner, J.; Desai, R.J.; Beaugerie, L.; Kim, S.C.; Schneeweiss, S. Calibrating Real-World Evidence Studies Against Randomized Trials: Treatment Effectiveness of Infliximab in Crohn’s Disease. Clin. Pharmacol. Ther. 2022, 111, 179–186. [Google Scholar] [CrossRef] [PubMed]
  35. Kirchgesner, J.; Desai, R.J.; Schneeweiss, M.C.; Beaugerie, L.; Kim, S.C.; Schneeweiss, S. Emulation of a randomized controlled trial in ulcerative colitis with US and French claims data: Infliximab with thiopurines compared to infliximab monotherapy. Pharmacoepidemiol. Drug Saf. 2022, 31, 167–175. [Google Scholar] [CrossRef] [PubMed]
  36. Kirchgesner, J.; Desai, R.J.; Schneeweiss, M.C.; Beaugerie, L.; Schneeweiss, S.; Kim, S.C. Decreased risk of treatment failure with vedolizumab and thiopurines combined compared with vedolizumab monotherapy in Crohn’s disease. Gut 2022, 71, 1781–1789. [Google Scholar] [CrossRef]
  37. Kraglund, F.; Christensen, D.H.; Eiset, A.H.; Villadsen, G.E.; West, J.; Jepsen, P. Effects of statins and aspirin on HCC risk in alcohol-related cirrhosis: Nationwide emulated trials. Hepatol. Commun. 2023, 7, e0013. [Google Scholar] [CrossRef]
  38. Kuehne, F.; Arvandi, M.; Hess, L.M.; Faries, D.E.; Matteucci Gothe, R.; Gothe, H.; Beyrer, J.; Zeimet, A.G.; Stojkov, I.; Mühlberger, N.; et al. Causal analyses with target trial emulation for real-world evidence removed large self-inflicted biases: Systematic bias assessment of ovarian cancer treatment effectiveness. J. Clin. Epidemiol. 2022, 152, 269–280. [Google Scholar] [CrossRef]
  39. Kwee, S.A.; Wong, L.L.; Ludema, C.; Deng, C.K.; Taira, D.; Seto, T.; Landsittel, D. Target Trial Emulation: A Design Tool for Cancer Clinical Trials. JCO Clin. Cancer Inform. 2023, 7, e2200140. [Google Scholar] [CrossRef]
  40. Lyu, B.; Chan, M.R.; Yevzlin, A.S.; Gardezi, A.; Astor, B.C. Arteriovenous Access Type and Risk of Mortality, Hospitalization, and Sepsis Among Elderly Hemodialysis Patients: A Target Trial Emulation Approach. Am. J. Kidney Dis. 2022, 79, 69–78. [Google Scholar] [CrossRef]
  41. Massol, J.; Simon-Tillaux, N.; Tohme, J.; Hariri, G.; Dureau, P.; Duceau, B.; Belin, L.; Hajage, D.; De Rycke, Y.; Charfeddine, A.; et al. Levosimendan in patients undergoing extracorporeal membrane oxygenation after cardiac surgery: An emulated target trial using observational data. Crit. Care 2023, 27, 51. [Google Scholar] [CrossRef]
  42. Mazzotta, V.; Cozzi-Lepri, A.; Colavita, F.; Lanini, S.; Rosati, S.; Lalle, E.; Mastrorosa, I.; Cimaglia, C.; Vergori, A.; Bevilacqua, N.; et al. Emulation of a Target Trial From Observational Data to Compare Effectiveness of Casirivimab/Imdevimab and Bamlanivimab/Etesevimab for Early Treatment of Non-Hospitalized Patients with COVID-19. Front. Immunol. 2022, 13, 868020. [Google Scholar] [CrossRef]
  43. McGrath, L.J.; Nielson, C.; Saul, B.; Breskin, A.; Yu, Y.; Nicolaisen, S.K.; Kilpatrick, K.; Ghanima, W.; Christiansen, C.F.; Bahmanyar, S.; et al. Lessons Learned Using Real-World Data to Emulate Randomized Trials: A Case Study of Treatment Effectiveness for Newly Diagnosed Immune Thrombocytopenia. Clin. Pharmacol. Ther. 2021, 110, 1570–1578. [Google Scholar] [CrossRef] [PubMed]
  44. Nolde, M.; Ahn, N.; Dreischulte, T.; Rückert-Eheberg, I.M.; Güntner, F.; Günter, A.; Gerlach, R.; Tauscher, M.; Amann, U.; Linseisen, J.; et al. The long-term risk for myocardial infarction or stroke after proton pump inhibitor therapy (2008–2018). Aliment. Pharmacol. Ther. 2021, 54, 1033–1040. [Google Scholar] [CrossRef]
  45. Puéchal, X.; Iudici, M.; Perrodeau, E.; Bonnotte, B.; Lifermann, F.; Le Gallou, T.; Karras, A.; Blanchard-Delaunay, C.; Quéméneur, T.; Aouba, A.; et al. Rituximab vs Cyclophosphamide Induction Therapy for Patients with Granulomatosis with Polyangiitis. JAMA Netw. Open 2022, 5, e2243799. [Google Scholar] [CrossRef]
  46. Schroeder, E.B.; Neugebauer, R.; Reynolds, K.; Schmittdiel, J.A.; Loes, L.; Dyer, W.; Pimental, N.; Desai, J.R.; Vazquez-Benitez, G.; Ho, P.M.; et al. Association of Cardiovascular Outcomes and Mortality with Sustained Long-Acting Insulin Only vs Long-Acting Plus Short-Acting Insulin Treatment. JAMA Netw. Open 2021, 4, e2126605. [Google Scholar] [CrossRef] [PubMed]
  47. Smith, L.H.; García-Albéniz, X.; Chan, J.M.; Zhao, S.; Cowan, J.E.; Broering, J.M.; Cooperberg, M.R.; Carroll, P.R.; Hernán, M.A. Emulation of a target trial with sustained treatment strategies: An application to prostate cancer using both inverse probability weighting and the g-formula. Eur. J. Epidemiol. 2022, 37, 1205–1213. [Google Scholar] [CrossRef] [PubMed]
  48. Takeuchi, Y.; Kumamaru, H.; Hagiwara, Y.; Matsui, H.; Yasunaga, H.; Miyata, H.; Matsuyama, Y. Sodium-glucose cotransporter-2 inhibitors and the risk of urinary tract infection among diabetic patients in Japan: Target trial emulation using a nationwide administrative claims database. Diabetes Obes. Metab. 2021, 23, 1379–1388. [Google Scholar] [CrossRef]
  49. Talmor-Barkan, Y.; Yacovzada, N.S.; Rossman, H.; Witberg, G.; Kalka, I.; Kornowski, R.; Segal, E. Head-to-head efficacy and safety of rivaroxaban, apixaban, and dabigatran in an observational nationwide targeted trial. Eur. Heart J. Cardiovasc. Pharmacother. 2022, 9, 26–37. [Google Scholar] [CrossRef]
  50. Trevisan, M.; Fu, E.L.; Xu, Y.; Savarese, G.; Dekker, F.W.; Lund, L.H.; Clase, C.M.; Sjölander, A.; Carrero, J.J. Stopping mineralocorticoid receptor antagonists after hyperkalaemia: Trial emulation in data from routine care. Eur. J. Heart Fail. 2021, 23, 1698–1707. [Google Scholar] [CrossRef] [PubMed]
  51. van Santen, D.K.; Boyd, A.; Matser, A.; Maher, L.; Hickman, M.; Lodi, S.; Prins, M. The effect of needle and syringe program and opioid agonist therapy on the risk of HIV, hepatitis B and C virus infection for people who inject drugs in Amsterdam, the Netherlands: Findings from an emulated target trial. Addiction 2021, 116, 3115–3126. [Google Scholar] [CrossRef]
  52. van Santen, D.K.; Lodi, S.; Dietze, P.; van den Boom, W.; Hayashi, K.; Dong, H.; Cui, Z.; Maher, L.; Hickman, M.; Boyd, A.; et al. Comprehensive needle and syringe program and opioid agonist therapy reduce HIV and hepatitis c virus acquisition among people who inject drugs in different settings: A pooled analysis of emulated trials. Addiction 2023, 118, 1116–1126. [Google Scholar] [CrossRef]
  53. Xie, Y.; Bowe, B.; Gibson, A.K.; McGill, J.B.; Maddukuri, G.; Yan, Y.; Al-Aly, Z. Comparative Effectiveness of SGLT2 Inhibitors, GLP-1 Receptor Agonists, DPP-4 Inhibitors, and Sulfonylureas on Risk of Kidney Outcomes: Emulation of a Target Trial Using Health Care Databases. Diabetes Care 2020, 43, 2859–2869. [Google Scholar] [CrossRef]
  54. Xu, Y.; Fu, E.L.; Trevisan, M.; Jernberg, T.; Sjölander, A.; Clase, C.M.; Carrero, J.J. Stopping renin-angiotensin system inhibitors after hyperkalemia and risk of adverse outcomes. Am. Heart J. 2022, 243, 177–186. [Google Scholar] [CrossRef]
  55. Yarnell, C.J.; Angriman, F.; Ferreyro, B.L.; Liu, K.; De Grooth, H.J.; Burry, L.; Munshi, L.; Mehta, S.; Celi, L.; Elbers, P.; et al. Oxygenation thresholds for invasive ventilation in hypoxemic respiratory failure: A target trial emulation in two cohorts. Crit. Care 2023, 27, 67. [Google Scholar] [CrossRef] [PubMed]
  56. Yiu, Z.Z.N.; Mason, K.J.; Hampton, P.J.; Reynolds, N.J.; Smith, C.H.; Lunt, M.; Griffiths, C.E.M.; Warren, R.B.; BADBIR Study Group. Randomized Trial Replication Using Observational Data for Comparative Effectiveness of Secukinumab and Ustekinumab in Psoriasis: A Study From the British Association of Dermatologists Biologics and Immunomodulators Register. JAMA Dermatol. 2021, 157, 66–73. [Google Scholar] [CrossRef]
  57. Young, J.; Wong, S.; Janjua, N.Z.; Klein, M.B. Comparing direct acting antivirals for hepatitis C using observational data—Why and how? Pharmacol. Res. Perspect. 2020, 8, e00650. [Google Scholar] [CrossRef] [PubMed]
  58. Zhang, Y.; Young, J.G.; Thamer, M.; Hernán, M.A. Comparing the Effectiveness of Dynamic Treatment Strategies Using Electronic Health Records: An Application of the Parametric g-Formula to Anemia Management Strategies. Health Serv. Res. 2018, 53, 1900–1918. [Google Scholar] [CrossRef] [PubMed]
  59. Admon, A.J.; Donnelly, J.P.; Casey, J.D.; Janz, D.R.; Russell, D.W.; Joffe, A.M.; Vonderhaar, D.J.; Dischert, K.M.; Stempek, S.B.; Dargin, J.M.; et al. Emulating a Novel Clinical Trial Using Existing Observational Data. Predicting Results of the PreVent Study. Ann. Am. Thorac. Soc. 2019, 16, 998–1007. [Google Scholar] [CrossRef]
  60. Bakker, L.; Goossens, L.; O’Kane, M.; Groot, C.; Redekop, W. Analysing Electronic Health Records: The Benefits of Target Trial Emulation. Health Policy Technol. 2021, 10, 100545. [Google Scholar] [CrossRef]
  61. Cain, L.E.; Robins, J.M.; Lanoy, E.; Logan, R.; Costagliola, D.; Hernán, M.A. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. Int. J. Biostat. 2010, 6, 18. [Google Scholar] [CrossRef]
  62. Hernán, M.A. How to estimate the effect of treatment duration on survival outcomes using observational data. BMJ 2018, 360, k182. [Google Scholar] [CrossRef]
  63. Hernández-Díaz, S.; Huybrechts, K.F.; Chiu, Y.H.; Yland, J.J.; Bateman, B.T.; Hernán, M.A. Emulating a Target Trial of Interventions Initiated During Pregnancy with Healthcare Databases: The Example of COVID-19 Vaccination. Epidemiology 2023, 34, 238–246. [Google Scholar] [CrossRef] [PubMed]
  64. Kalia, S.; Saarela, O.; Escobar, M.; Moineddin, R.; Greiver, M. Estimation of marginal structural models under irregular visits and unmeasured confounder: Calibrated inverse probability weights. BMC Med. Res. Methodol. 2023, 23, 4. [Google Scholar] [CrossRef] [PubMed]
  65. Kuehne, F.; Jahn, B.; Conrads-Frank, A.; Bundo, M.; Arvandi, M.; Endel, F.; Popper, N.; Endel, G.; Urach, C.; Gyimesi, M.; et al. Guidance for a causal comparative effectiveness analysis emulating a target trial based on big real world evidence: When to start statin treatment. J. Comp. Eff. Res. 2019, 8, 1013–1025. [Google Scholar] [CrossRef]
  66. Lodi, S.; Phillips, A.; Lundgren, J.; Logan, R.; Sharma, S.; Cole, S.R.; Babiker, A.; Law, M.; Chu, H.; Byrne, D.; et al. Effect Estimates in Randomized Trials and Observational Studies: Comparing Apples with Apples. Am. J. Epidemiol. 2019, 188, 1569–1577. [Google Scholar] [CrossRef] [PubMed]
  67. Maringe, C.; Benitez Majano, S.; Exarchakou, A.; Smith, M.; Rachet, B.; Belot, A.; Leyrat, C. Reflection on modern methods: Trial emulation in the presence of immortal-time bias. Assessing the benefit of major surgery for elderly lung cancer patients using observational data. Int. J. Epidemiol. 2020, 49, 1719–1729. [Google Scholar] [CrossRef]
  68. Schnitzer, M.E.; Guerra, S.F.; Longo, C.; Blais, L.; Platt, R.W. A potential outcomes approach to defining and estimating gestational age-specific exposure effects during pregnancy. Stat. Methods Med. Res. 2022, 31, 300–314. [Google Scholar] [CrossRef]
  69. Wendling, T.; Jung, K.; Callahan, A.; Schuler, A.; Shah, N.H.; Gallego, B. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases. Stat. Med. 2018, 37, 3309–3324. [Google Scholar] [CrossRef]
  70. Wintzell, V.; Svanström, H.; Pasternak, B. Selection of Comparator Group in Observational Drug Safety Studies: Alternatives to the Active Comparator New User Design. Epidemiology 2022, 33, 707–714. [Google Scholar] [CrossRef]
  71. Sterne, J.A.C.; Hernán, M.A.; McAleenan, A.; Reeves, B.C.; Higgins, J.P.T. Chapter 25: Assessing risk of bias in a non-randomized study [last updated October 2019]. In Cochrane Handbook for Systematic Reviews of Interventions Version 6.5. Cochrane; Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., Welch, V.A., Eds.; Cochrane: London, UK, 2024; Available online: www.training.cochrane.org/handbook (accessed on 10 March 2025).
  72. Fu, E.L. Target Trial Emulation to Improve Causal Inference from Observational Data: What, Why, and How? J. Am. Soc. Nephrol. 2023, 34, 1305–1314. [Google Scholar] [CrossRef]
  73. Chesnaye, N.C.; Stel, V.S.; Tripepi, G.; Dekker, F.W.; Fu, E.L.; Zoccali, C.; Jager, K.J. An introduction to inverse probability of treatment weighting in observational research. Clin. Kidney J. 2022, 15, 14–20. [Google Scholar] [CrossRef]
  74. Austin, P.C. The performance of different propensity score methods for estimating marginal hazard ratios. Stat. Med. 2013, 32, 2837–2849. [Google Scholar] [CrossRef] [PubMed]
  75. Hernán, M.A.; Robins, J.M. Causal Inference: What If; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar]
  76. Austin, P.C. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivar. Behav. Res. 2011, 46, 399–424. [Google Scholar] [CrossRef] [PubMed]
  77. Lu, B. Propensity Score Matching with Time-Dependent Covariates. Biometrics 2005, 61, 721–728. [Google Scholar] [CrossRef]
  78. Bykov, K.; Patorno, E.; D’Andrea, E.; He, M.; Lee, H.; Graff, J.S.; Franklin, J.M. Prevalence of Avoidable and Bias-Inflicting Methodological Pitfalls in Real-World Studies of Medication Safety and Effectiveness. Clin. Pharmacol. Ther. 2022, 111, 209–217. [Google Scholar] [CrossRef]
  79. Keil, A.P.; Edwards, J.K.; Richardson, D.B.; Naimi, A.I.; Cole, S.R. The parametric g-formula for time-to-event data: Intuition and a worked example. Epidemiology 2014, 25, 889–897. [Google Scholar] [CrossRef]
  80. McGrath, S.; Lin, V.; Zhang, Z.; Petito, L.C.; Logan, R.W.; Hernán, M.A.; Young, J.G. gfoRmula: An R Package for Estimating the Effects of Sustained Treatment Strategies via the Parametric g-formula. Patterns 2020, 1, 100008. [Google Scholar] [CrossRef]
  81. Williamson, T.; Ravani, P. Marginal structural models in clinical research: When and how to use them? Nephrol. Dial. Transplant. 2017, 32 (Suppl. 2), ii84–ii90. [Google Scholar] [CrossRef] [PubMed]
  82. Sterne, J.A.C.; White, I.R.; Carlin, J.B.; Spratt, M.; Royston, P.; Kenward, M.G.; Wood, A.M.; Carpenter, J.R. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ 2009, 338, b2393. [Google Scholar] [CrossRef]
  83. Huque, M.H.; Carlin, J.B.; Simpson, J.A.; Lee, K.J. A comparison of multiple imputation methods for missing data in longitudinal studies. BMC Med. Res. Methodol. 2018, 18, 168. [Google Scholar] [CrossRef]
  84. Daniel, R.M.; Cousens, S.N.; De Stavola, B.L.; Kenward, M.G.; Sterne, J.A.C. Methods for dealing with time-dependent confounding. Stat. Med. 2013, 32, 1584–1618. [Google Scholar] [CrossRef]
  85. Greifer, N.; Stuart, E.A. Matching Methods for Confounder Adjustment: An Addition to the Epidemiologist’s Toolbox. Epidemiol. Rev. 2022, 43, 118–129. [Google Scholar] [CrossRef] [PubMed]
  86. Hansford, H.J.; Cashin, A.G.; Jones, M.D.; Swanson, S.A.; Islam, N.; Dahabreh, I.J.; Dickerman, B.A.; Egger, M.; Garcia-Albeniz, X.; Golub, R.M.; et al. Development of the TrAnsparent ReportinG of observational studies Emulating a Target trial (TARGET) guideline. BMJ Open 2023, 13, e074626. [Google Scholar] [CrossRef] [PubMed]
  87. Lipsitch, M.; Tchetgen, E.T.; Cohen, T. Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology 2010, 21, 383–388. [Google Scholar] [CrossRef]
  88. VanderWeele, T.J.; Ding, P. Sensitivity Analysis in Observational Research: Introducing the E-Value. Ann. Intern. Med. 2017, 167, 268–274. [Google Scholar] [CrossRef] [PubMed]
  89. Keogh, R.H.; Gran, J.M.; Seaman, S.R.; Davies, G.; Vansteelandt, S. Causal inference in survival analysis using longitudinal observational data: Sequential trials and marginal structural models. Stat. Med. 2023, 42, 2191–2225. [Google Scholar] [CrossRef]
  90. Su, L.; Rezvani, R.; Seaman, S.R.; Starr, C.; Gravestock, I. TrialEmulation: An R Package to Emulate Target Trials for Causal Analysis of Observational Time-to-Event Data. arXiv 2024, arXiv:2402.12083. [Google Scholar]
  91. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
  92. Desai, D.; Kantliwala, S.V.; Vybhavi, J.; Ravi, R.; Patel, H.; Patel, J. Review of AlphaFold 3: Transformative Advances in Drug Design and Therapeutics. Cureus 2024, 16, e63646. [Google Scholar] [CrossRef]
  93. Johnson, K.B.; Wei, W.Q.; Weeraratne, D.; Frisse, M.E.; Misulis, K.; Rhee, K.; Zhao, J.; Snowdon, J.L. Precision Medicine, AI, and the Future of Personalized Health Care. Clin. Transl. Sci. 2021, 14, 86–93. [Google Scholar] [CrossRef]
Figure 1. PRISMA flow chart.
Figure 2. Directed acyclic graph (DAG) depicting the hypothesised causal relationships between statin use (exposure) and cardiovascular disease (outcome), accounting for time-varying statin use (at baseline, T0 and follow-up, T1) alongside time-varying variables LDL cholesterol and smoking status, and age and sex as baseline confounders.
Table 1. Advantages and disadvantages of the methods used to control for confounding within application and methodological studies.
For each method, the table gives an overview of the method; the types of confounding adjusted for; the number of application studies (of N = 47) and methodological studies (of N = 12) using it, with references; and its advantages and disadvantages. Illustrative code sketches for several of the methods are provided after the table.
Standard regression adjustment (adjusting for covariates in a regression model)
Overview: Inclusion of confounders as covariates in a regression model, e.g., logistic, linear, and Cox regression.
Confounding adjusted for: Baseline confounding only.
Application studies: 18 [13,16,17,18,19,20,23,25,26,27,31,32,33,37,38,41,42,45]; methodological studies: 2 [60,67].
Advantages:
  • Easy to implement and interpret
  • Less computationally intensive than other methods
  • Versatile with other statistical approaches (e.g., machine learning)
Disadvantages:
  • Cannot adjust for time-varying covariates or their time-dependent relationship with both treatment and outcome as it assumes covariates remain fixed over time.
  • Can introduce collider stratification bias when adjusting for a time-varying confounder that shares a common cause with the outcome [3].
  • Not appropriate when the exposure is time-varying, as it can introduce time-varying confounding by past exposure through over-adjustment bias [3].
Inverse probability of treatment weighting (IPTW)
Overview: Each individual is weighted by the inverse of the probability of being assigned the treatment they actually received, conditional on their baseline covariates [73]. The weights are then applied in an outcome regression model (e.g., weighted logistic regression, weighted Cox model) to estimate the average treatment effect (ATE).
Confounding adjusted for: Baseline and time-varying confounding.
Application studies: 7 [15,40,44,45,48,53,56]; methodological studies: 0 (references: Not Applicable *).
Advantages:
  • Mimics a randomised trial by creating a pseudo-population where treatment is independent of confounders [73].
  • Can be applied to time-varying exposures and confounders.
  • Unlike propensity score matching, IPTW retains sample size by weighting observations instead of removing individuals who do not match [73].
  • Has been shown to reduce bias more effectively than standard regression adjustment [74].
Disadvantages:
  • Possibility of extreme weights if there are large differences in characteristics between groups, leading to biased results [73].
  • Requires extra adjustment for within-individual correlation and inflation of sample size (e.g., bootstrapping and robust variance estimation) [73].
  • Assumes there is no unmeasured confounding, that the propensity score model is correctly specified, and positivity (every individual must have some chance of receiving the treatment of interest) [75]. These assumptions may be difficult to understand for less-experienced researchers.
Propensity score matching (PSM)
Overview: Participants receiving the intervention are matched with those receiving the comparator based on similar propensity scores, which represent the probability of treatment assignment given observed covariates [76]. This approach estimates the average treatment effect on the treated (ATT). Covariate balance is assessed using the standardised mean difference (SMD).
Confounding adjusted for: Primarily baseline confounding; however, time-dependent propensity scores have been developed [77].
Application studies: 10 [29,32,33,34,35,36,39,49,56,57]; methodological studies: 2 [59,70] **.
Advantages:
  • Straightforward to implement and interpret.
  • Less computationally intensive than other methods.
  • Improves covariate balance without use of outcome data, allowing a separation of design from the analysis, which may be preferable to some researchers [76].
  • Can be applied to time-varying covariates using time-dependent PSM [77].
Disadvantages:
  • Removes unmatched individuals from analysis, reducing sample size.
  • Can only improve balance on confounders provided they are included in the propensity score model [76].
  • Assumes there is no unmeasured confounding, that the propensity score model is correctly specified, and assumes positivity (every individual must have some chance of receiving the treatment of interest) [75]. These assumptions may be difficult to understand for less-experienced researchers.
Clone-censor-weight
Overview: Each individual is assigned to all treatment strategies compatible with their data at time zero, creating clones for each strategy. Clones deviating from their assigned strategy are artificially censored, and inverse probability of censoring weighting adjusts for the resulting selection bias [62].
Confounding adjusted for: Baseline and time-varying confounding, and immortal time bias.
Application studies: 9 [19,20,21,30,38,41,47,50,54]; methodological studies: 4 [61,62,65,67].
Advantages:
  • Allows comparison of static and dynamic treatment strategies [62].
  • Well documented within the target trial emulation literature, and more accessible than other G-methods [62,67].
Disadvantages:
  • Can be computationally intensive if applied to large datasets.
  • Assumes there is no unmeasured confounding, that the propensity score model is correctly specified, and assumes positivity (every individual must have some chance of receiving the treatment of interest). These assumptions may be difficult to understand for less-experienced researchers [78].
  • Requires extra adjustment for within-individual correlation and inflation of sample size (e.g., bootstrapping and robust variance estimation) [62].
Inverse probability of censoring weighting (IPCW)
Overview: To account for selection bias from artificial censoring, inverse probability of censoring weights are calculated for each individual at all time points, based on the probability of being uncensored given prior exposure and censoring-related characteristics [62]. These weights are then used in the outcome model (e.g., weighted linear, logistic, or Cox regression).
Confounding adjusted for: Time-varying confounding only, but can be combined with IPTW in a marginal structural model or included as part of clone-censor-weighting to adjust for both baseline and time-varying confounding.
Application studies: 10 [12,22,23,25,26,27,31,43,48,56]; methodological studies: 3 [55,58,63].
Advantages:
  • Controls for bias introduced by censoring (loss to follow-up or dropout).
  • Can be combined with other methods, including IPTW, PSM and standard covariate adjustment.
  • Useful for modelling dynamic treatment strategies that include time-varying covariates or exposures.
Disadvantages:
  • Possibility of extreme weights if there are large differences in characteristics between groups, leading to biased results [73].
  • Assumes exchangeability, positivity, correct model specification and no unmeasured confounding. If these assumptions are not met, the model could fail to accurately adjust for selection bias [75].
Parametric G-formula
Overview: The g-formula adjusts for time-varying confounders affected by prior exposures using a three-step algorithm. First, it models conditional probabilities from the observed data. Next, it uses these probabilities and baseline covariates to simulate time-varying covariates and outcomes via Monte Carlo sampling for each treatment group. Finally, the datasets are combined, and treatment effects are estimated by comparing hazard ratios across groups using a Cox model [79].
Confounding adjusted for: Baseline and time-varying confounding.
Application studies: 4 [17,21,47,58]; methodological studies: 3 [64,66,68].
Advantages:
  • Flexibility to model complex treatment-confounder relationships across time, accounting for interactions between exposures and confounders, alongside modelling non-linear relationships [3,79].
  • Ideal for studies examining interventions on multiple risk factors (joint interventions) [3,79].
  • Dual role of adjusting for baseline and time-varying confounding.
  • Provides estimates of counterfactual outcomes under different treatment scenarios and can be used to simulate real-world scenarios with complex, time-varying patterns of treatment use [79,80].
Disadvantages:
  • Harder to implement and interpret, and may not be ideal for researchers less familiar with methods for time-varying confounding.
  • Computationally intensive.
  • Requires large sample sizes for more stable estimates, to minimise simulation error due to use of Monte Carlo sampling [79].
  • Requires explicit modelling of outcomes and covariates under specified treatment patterns, including whether they are static or dynamic, offering less flexibility than the likes of IPTW [3].
  • Reliant on correct model specification for outcome and confounder models [3,79].
Inverse probability weighting as part of a marginal structural model (IPW-MSM)
Overview: Rather than using a single treatment weight for baseline confounding, separate treatment weights are calculated at each time point, based on prior treatment, time-varying confounders, and baseline covariates, using a pooled logistic regression model [81]. A separate pooled logistic regression is used to calculate inverse probability of censoring weights for informative censoring. The final weights are obtained by multiplying the treatment and censoring weights, which are then applied in the outcome model (e.g., weighted logistic or Cox regression) [81].
Confounding adjusted for: Baseline and time-varying confounding, and immortal time bias.
Application studies: 19 [14,20,21,22,25,26,27,30,31,36,38,42,46,47,48,51,52,53,58]; methodological studies: 4 [61,64,65,68].
Advantages:
  • Allows estimation of unbiased estimates of treatment effects in longitudinal data, where confounders change over time because of treatment decisions [3].
  • Can be applied to both static and dynamic treatment regimes, making them ideal for comparing personalised treatment strategies or assessing the impact of interventions that depend on changing biomarkers or treatment decisions [3,62].
  • Provides robust estimates of treatment effects in longitudinal settings with time-varying treatments, making them useful for observational studies aiming to estimate the effect of treatment on long-term outcomes.
Disadvantages:
  • Can result in biased effect estimates if either outcome or weight models (e.g., propensity score models) are mis-specified [81].
  • Reliance on assumptions such as no unmeasured confounding, positivity (treatment must be possible for all individuals at all time points), and exchangeability (treatment assignment is independent of potential outcomes given confounders) [75]. Violations of these assumptions can compromise the validity of the results.
  • Requires the calculation of inverse probability weights for each individual at each time point, which can be computationally demanding.
  • Inter-individual correlation must be accounted for, as the same individual can contribute multiple observations. Additionally, to properly estimate the variability in treatment effects, methods such as bootstrapping and robust variance estimation are required [3,81].
Multiple imputation (MI)
Overview: Multiple imputation (MI) handles missing data by generating multiple datasets, where missing values are replaced with imputed values drawn from their predicted distribution based on available data [82].
Confounding adjusted for: Can be used to adjust for time-varying confounding by imputing missing confounder values, but is primarily used as a missing data technique.
Application studies: 1 [55]; methodological studies: 1 [60].
Advantages:
  • Utilises all available data instead of discarding individuals with missing values, thereby maximising sample size and statistical power [82].
  • Reduces the risk of bias compared to complete case analysis, which removes individuals with missing data [82].
  • Compatible with various data types, including longitudinal, ordinal, and categorical data.
Disadvantages:
  • Does not explicitly adjust for time-varying confounding in the same way as g-methods (e.g., IPTW or the parametric g-formula), which are able to model relationships between treatment, confounders, and outcome. As a result, use of MI alone may not fully account for time-dependent confounding, potentially leading to biased effect estimates.
  • Assumes data are missing at random (MAR). If data are missing not at random (MNAR), results may be biased. This issue is particularly relevant when measuring biomarkers associated with adverse events, where missing data may be directly related to extreme or critical values [82].
  • Standard MI assumes independent observations. In the event of repeated measurements over time, an MI method that accounts for within-subject correlation should be used (e.g., standard fully conditional specification, joint multivariate normal imputation) [83].
  • Many MI methods assume normality, making imputation less reliable for skewed variables or extreme biomarker values; transformations (e.g., log transformation) are recommended [82,83].
  • Computationally intensive, especially when using large datasets or complex models.
Nonparametric Bayesian G-computation
Overview: Like the parametric G-formula, nonparametric Bayesian G-computation models conditional probabilities from observed data. Using Markov chain Monte Carlo (MCMC) sampling, it simulates time-varying covariates and outcomes, with the flexibility to employ models such as Bayesian additive regression trees (BART) and Hilbert space Gaussian processes to capture complex, high-dimensional relationships between covariates, outcome, and treatment [69,84].
Confounding adjusted for: Baseline and time-varying confounding.
Application studies: 1 [55]; methodological studies: 1 [69].
Advantages:
  • Can model complex, non-linear relationships between covariates, outcomes and treatments without assuming a fixed functional form [69].
  • More flexible than the parametric G-formula, with no additional assumptions besides the no unmeasured confounding assumption [69].
  • Compatible with different types of outcome variables, including continuous, binary, categorical and time-to-event [84].
  • Allows prediction of counterfactual outcomes, allowing comparison of ‘what would have happened’ scenarios under different hypothetical interventions [69,84].
  • Applicable to joint interventions [84].
Disadvantages:
  • Computationally intensive due to the requirement for MCMC sampling and bootstrapping [84].
  • Results are harder to interpret than those of other methods such as standard regression, due to the large number of possible hypothetical interventions being compared [84].
  • Models may be incompatible with other parametric models, including MSMs, due to differences in model specification [84].
Exact matching
Overview: Matching treated individuals to controls with identical values for all covariates of interest [85].
Confounding adjusted for: Baseline confounding only.
Application studies: 2 [32,33]; methodological studies: 1 [63].
Advantages:
  • Simple to understand and implement.
  • Less prone to bias from model misspecification [85].
  • Results are easier to interpret due to matched pairs being directly comparable.
Disadvantages:
  • Can lead to a significant reduction in sample size, as only individuals with exactly matching values across all covariates are retained [85].
  • Challenging to use in studies with many covariates or covariates with varying measurements, especially in biomarker-guided designs, where finding exact matches across biomarkers is difficult.
  • Results in a large proportion of data being excluded when exact matches are not found.
  • Low external validity, as exact matching on clinical factors may not be representative of real-world clinical practice.
* In the event of no studies (application or methods) reporting the method, the References column is defined as Not Applicable (NA). ** Time-dependent propensity score matching (time-dependent PSM).
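To make the methods in Table 1 more concrete, the sketches below illustrate, in Python with simulated data, how several of them might be implemented. They are minimal, hypothetical examples under simplifying assumptions (all variable names, data-generating models, and values are invented) and are not the analyses used in the reviewed studies. The first sketch shows standard regression adjustment: baseline confounders are simply included as covariates in an outcome regression.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
# Simulated baseline data for a hypothetical emulated trial (all names and values invented).
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "sex": rng.integers(0, 2, n),
    "ldl": rng.normal(3.5, 1.0, n),
})
# Treatment and outcome both depend on the baseline covariates (confounding).
p_treat = 1 / (1 + np.exp(-(-3 + 0.03 * df["age"] + 0.4 * df["ldl"])))
df["treatment"] = rng.binomial(1, p_treat)
p_outcome = 1 / (1 + np.exp(-(-4 + 0.04 * df["age"] + 0.3 * df["ldl"] - 0.5 * df["treatment"])))
df["outcome"] = rng.binomial(1, p_outcome)

# Standard regression adjustment: include the baseline confounders as covariates.
fit = smf.logit("outcome ~ treatment + age + C(sex) + ldl", data=df).fit(disp=False)
print(fit.params["treatment"])  # adjusted (conditional) log-odds ratio for treatment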
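A minimal sketch of inverse probability of treatment weighting (IPTW), assuming a single baseline (point) exposure and simulated, hypothetical data: a propensity model is fitted, stabilised weights are derived, and a weighted outcome model with a robust variance estimator is fitted in the resulting pseudo-population.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({"age": rng.normal(60, 10, n), "ldl": rng.normal(3.5, 1.0, n)})
p_treat = 1 / (1 + np.exp(-(-3 + 0.03 * df["age"] + 0.4 * df["ldl"])))
df["treatment"] = rng.binomial(1, p_treat)
p_outcome = 1 / (1 + np.exp(-(-4 + 0.04 * df["age"] + 0.3 * df["ldl"] - 0.5 * df["treatment"])))
df["outcome"] = rng.binomial(1, p_outcome)

# Step 1: propensity score model - probability of treatment given baseline covariates.
ps = smf.logit("treatment ~ age + ldl", data=df).fit(disp=False).predict(df)

# Step 2: stabilised inverse probability of treatment weights.
p_marginal = df["treatment"].mean()
df["iptw"] = np.where(df["treatment"] == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

# Step 3: weighted outcome model in the pseudo-population; robust (sandwich) variance
# accounts for the weighting.
wtd = smf.glm("outcome ~ treatment", data=df, family=sm.families.Binomial(),
              freq_weights=np.asarray(df["iptw"])).fit(cov_type="HC0")
print(wtd.params["treatment"])  # marginal log-odds ratio for treatment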
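A minimal propensity score matching sketch, assuming 1:1 nearest-neighbour matching with replacement on the logit of the propensity score (simulated data; hypothetical variable names). In practice, a caliper and covariate-balance checks (e.g., standardised mean differences) would normally be applied before estimating the effect.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({"age": rng.normal(60, 10, n), "ldl": rng.normal(3.5, 1.0, n)})
p_treat = 1 / (1 + np.exp(-(-3 + 0.03 * df["age"] + 0.4 * df["ldl"])))
df["treatment"] = rng.binomial(1, p_treat)
p_outcome = 1 / (1 + np.exp(-(-4 + 0.04 * df["age"] + 0.3 * df["ldl"] - 0.5 * df["treatment"])))
df["outcome"] = rng.binomial(1, p_outcome)

# Propensity score and its logit (matching on the logit of the score is common practice).
ps = smf.logit("treatment ~ age + ldl", data=df).fit(disp=False).predict(df)
df["logit_ps"] = np.log(ps / (1 - ps))

treated = df[df["treatment"] == 1]
control = df[df["treatment"] == 0]

# 1:1 nearest-neighbour matching with replacement on the logit of the propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(control[["logit_ps"]])
_, idx = nn.kneighbors(treated[["logit_ps"]])
matched_controls = control.iloc[idx.ravel()]

# Crude ATT estimate (risk difference among the treated) in the matched sample.
att = treated["outcome"].mean() - matched_controls["outcome"].mean()
print(att)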
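A structural sketch of the clone-censor-weight approach under simplified, hypothetical strategies ("initiate treatment within a three-period grace period" vs. "never initiate"): each individual is cloned into both strategies at time zero, and clones are artificially censored when their observed data deviate from the assigned strategy. Estimation of the inverse probability of censoring weights (as in the IPCW sketch below) and the weighted outcome model are omitted for brevity.

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n, k, grace = 300, 8, 3
long = pd.DataFrame({"id": np.repeat(np.arange(n), k), "period": np.tile(np.arange(k), n)})

# Observed period of treatment initiation (np.inf = never initiated during follow-up).
init_period = rng.choice([0, 1, 2, 4, np.inf], size=n, p=[0.2, 0.1, 0.1, 0.1, 0.5])
long["initiated"] = (long["period"] >= init_period[long["id"]]).astype(int)

# Clone every eligible individual into both strategies at time zero.
arm_a = long.assign(strategy="initiate within grace period")
arm_b = long.assign(strategy="never initiate")
clones = pd.concat([arm_a, arm_b], ignore_index=True)

# Artificial censoring: a clone is censored once its observed data deviate from its strategy.
deviates_a = ((clones["strategy"] == "initiate within grace period")
              & (clones["period"] >= grace) & (clones["initiated"] == 0))
deviates_b = (clones["strategy"] == "never initiate") & (clones["initiated"] == 1)
clones["artificially_censored"] = (deviates_a | deviates_b).astype(int)

# Keep person-periods up to and including each clone's first artificial censoring event.
prior = (clones.groupby(["id", "strategy"])["artificially_censored"].cumsum()
         - clones["artificially_censored"])
clones = clones[prior == 0]

# Inverse probability of censoring weights would then be estimated within each strategy
# (as in the IPCW sketch below) and used in a weighted outcome model.
print(clones.groupby("strategy")["artificially_censored"].mean())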
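A minimal sketch of inverse probability of censoring weighting (IPCW), assuming person-period ("long") data in which censoring depends on a time-varying covariate (all data simulated and hypothetical): a pooled logistic model estimates the probability of remaining uncensored at each period, and the weights are the cumulative product of the inverse of these probabilities.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n, k = 500, 6
# Hypothetical person-period data: one row per individual per interval.
long = pd.DataFrame({
    "id": np.repeat(np.arange(n), k),
    "period": np.tile(np.arange(k), n),
    "ldl": rng.normal(3.5, 1.0, n * k),        # time-varying covariate
    "treatment": rng.binomial(1, 0.5, n * k),  # time-varying exposure
})
# Censoring depends on the time-varying covariate (informative censoring).
p_cens = 1 / (1 + np.exp(-(-3 + 0.3 * long["ldl"])))
long["censored"] = rng.binomial(1, p_cens)

# Keep each individual's rows up to and including their first censoring event.
prior_cens = long.groupby("id")["censored"].cumsum() - long["censored"]
long = long[prior_cens == 0].copy()
long["uncensored"] = 1 - long["censored"]

# Pooled logistic model for the probability of remaining uncensored at each period.
cens_model = smf.logit("uncensored ~ ldl + treatment + C(period)", data=long).fit(disp=False)
long["p_uncens"] = cens_model.predict(long)

# IPC weights: cumulative product of 1 / P(uncensored) over each individual's follow-up.
long["ipcw"] = (1 / long["p_uncens"]).groupby(long["id"]).cumprod()
# The weights for uncensored person-periods are then applied in a weighted outcome model
# (e.g., pooled logistic or Cox regression).
print(long["ipcw"].describe())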
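A minimal g-computation sketch for the special case of a single baseline treatment decision, with simulated data and hypothetical variable names. The full parametric g-formula additionally models the time-varying covariates and simulates them forward under each strategy via Monte Carlo sampling, as implemented in the gfoRmula R package [80]; only the baseline standardisation step is shown here.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({"age": rng.normal(60, 10, n), "ldl": rng.normal(3.5, 1.0, n)})
p_treat = 1 / (1 + np.exp(-(-3 + 0.03 * df["age"] + 0.4 * df["ldl"])))
df["treatment"] = rng.binomial(1, p_treat)
p_outcome = 1 / (1 + np.exp(-(-4 + 0.04 * df["age"] + 0.3 * df["ldl"] - 0.5 * df["treatment"])))
df["outcome"] = rng.binomial(1, p_outcome)

# Step 1: model the outcome conditional on treatment and confounders.
out_model = smf.logit("outcome ~ treatment + age + ldl", data=df).fit(disp=False)

# Step 2: predict each individual's outcome risk under each counterfactual strategy.
df_all_treated, df_all_untreated = df.copy(), df.copy()
df_all_treated["treatment"] = 1
df_all_untreated["treatment"] = 0
risk_treated = out_model.predict(df_all_treated).mean()
risk_untreated = out_model.predict(df_all_untreated).mean()

# Step 3: contrast the standardised (marginal) risks.
print(risk_treated - risk_untreated)  # marginal risk difference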
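A minimal sketch of inverse probability weighting for a marginal structural model with a time-varying treatment, using simulated person-period data and hypothetical variable names. Stabilised weights are the cumulative product of period-specific treatment-weight ratios; in a full analysis, censoring weights (as in the IPCW sketch) would be multiplied in, and extreme weights would typically be truncated.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n, k = 500, 5
long = pd.DataFrame({
    "id": np.repeat(np.arange(n), k),
    "period": np.tile(np.arange(k), n),
    "age": np.repeat(rng.normal(60, 10, n), k),  # baseline covariate
    "ldl": rng.normal(3.5, 1.0, n * k),          # time-varying confounder
})
# Treatment at each period depends on the current value of the time-varying confounder.
long["treatment"] = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.3 * long["ldl"]))))
long["prev_treatment"] = long.groupby("id")["treatment"].shift(fill_value=0)
long["outcome"] = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.2 * long["ldl"] - 0.3 * long["treatment"]))))

# Denominator: treatment given baseline covariates, treatment history and time-varying confounder.
den = smf.logit("treatment ~ age + prev_treatment + ldl + C(period)", data=long).fit(disp=False)
# Numerator (stabilisation): treatment given baseline covariates and treatment history only.
num = smf.logit("treatment ~ age + prev_treatment + C(period)", data=long).fit(disp=False)

p_den, p_num = den.predict(long), num.predict(long)
ratio = np.where(long["treatment"] == 1, p_num / p_den, (1 - p_num) / (1 - p_den))
# Stabilised weights: cumulative product over each individual's follow-up.
long["sw"] = pd.Series(ratio, index=long.index).groupby(long["id"]).cumprod()

# Weighted pooled logistic outcome model; cluster-robust variance accounts for repeated measures.
msm = smf.glm("outcome ~ treatment + age + C(period)", data=long,
              family=sm.families.Binomial(), freq_weights=np.asarray(long["sw"])).fit(
    cov_type="cluster", cov_kwds={"groups": long["id"]})
print(msm.params["treatment"])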
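A minimal multiple imputation sketch using scikit-learn's IterativeImputer with posterior sampling to create several completed datasets (simulated data with values set missing at random; hypothetical variable names). In a full analysis, the emulated-trial analysis would be run on each completed dataset and the estimates pooled, e.g., using Rubin's rules.

import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "ldl": rng.normal(3.5, 1.0, n),
    "crp": rng.normal(5.0, 2.0, n),
})
# Set ~20% of one biomarker's values to missing (assumed missing at random for illustration).
df.loc[rng.random(n) < 0.2, "ldl"] = np.nan

# Generate m completed datasets; sample_posterior=True draws imputations from the
# predictive distribution, which is what distinguishes multiple from single imputation.
m = 5
imputed_sets = [
    pd.DataFrame(IterativeImputer(sample_posterior=True, random_state=i).fit_transform(df),
                 columns=df.columns)
    for i in range(m)
]
# Pooling of the m analysis results is omitted from this sketch for brevity.
print(len(imputed_sets), int(imputed_sets[0]["ldl"].isna().sum()))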
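A minimal exact matching sketch on a small set of categorical baseline covariates (simulated data; hypothetical variable names): only covariate strata containing both treated and control individuals are retained, and the effect among the treated is estimated by averaging within-stratum contrasts weighted by the number of treated individuals per stratum.

import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
n = 2000
df = pd.DataFrame({
    "age_group": rng.integers(0, 4, n),        # categorical covariates: exact matching is feasible
    "sex": rng.integers(0, 2, n),
    "biomarker_positive": rng.integers(0, 2, n),
    "treatment": rng.binomial(1, 0.3, n),
})
df["outcome"] = rng.binomial(1, 0.10 + 0.05 * df["biomarker_positive"] - 0.03 * df["treatment"])

covariates = ["age_group", "sex", "biomarker_positive"]
# Keep only covariate strata that contain both treated and control individuals.
both_arms = df.groupby(covariates)["treatment"].transform(lambda t: t.nunique() == 2)
matched = df[both_arms]
print(f"Discarded {len(df) - len(matched)} individuals with no exact match")

# ATT: within-stratum treated-vs-control differences, weighted by the number of treated per stratum.
def stratum_difference(g):
    return (g.loc[g["treatment"] == 1, "outcome"].mean()
            - g.loc[g["treatment"] == 0, "outcome"].mean())

per_stratum = matched.groupby(covariates)[["treatment", "outcome"]].apply(stratum_difference)
weights = matched[matched["treatment"] == 1].groupby(covariates).size()
att = (per_stratum * weights).sum() / weights.sum()
print(att)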