1. Introduction
The convergence of artificial intelligence (AI) and precision medicine is transforming drug therapy through scalable analysis of high-dimensional molecular data, with pharmacogenomics serving as a critical molecular foundation for predicting drug efficacy, toxicity, and individualized dosing. Early advances were driven by large-scale pharmacological screening resources such as the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE), which established foundational benchmarks for machine-learning-based genotype–phenotype prediction [1,2,3]. Since then, population-scale resources such as the UK Biobank [4], ancestrally diverse initiatives such as the NIH All of Us Research Program [5], and genomic foundation models pre-trained on millions of DNA sequences [6] have markedly expanded the scale, representativeness, and predictive capacity of the genomic resources available for model training. Together, these advances are accelerating the integration of AI-driven pharmacogenomic tools into increasingly diverse clinical settings [7].
Yet a fundamental equity problem pervades this field: the genomic datasets used to train these AI models are overwhelmingly derived from individuals of European ancestry. As of mid-2025, European-ancestry participants account for approximately 87–88% of all GWAS participants, while all other ancestry groups remain markedly underrepresented, with continental African populations below 1% [8,9]. Figure 1 illustrates this imbalance in contemporary GWAS datasets. As a result, pharmacogenomic AI models often generalize poorly across ancestry groups. European-derived polygenic risk scores, for example, lose approximately 39–73% of their predictive accuracy in African-ancestry cohorts, a clinically consequential form of algorithmic bias [10].
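The accuracy loss quoted above is a relative figure. As a minimal sketch, assuming accuracy is summarized by a single metric such as variance explained (R²) that is reported in both the discovery and the target cohort, the relative loss can be computed as follows. The function name and the cohort values are hypothetical and are not taken from [10].

```python
def relative_accuracy_loss(source_metric: float, target_metric: float) -> float:
    """Fraction of a model's accuracy lost when it is transferred from the
    population it was developed in (source) to a new population (target).

    Both arguments must be on the same scale, e.g. R^2 in each cohort.
    """
    if source_metric <= 0:
        raise ValueError("source_metric must be positive")
    return (source_metric - target_metric) / source_metric


# Hypothetical values: R^2 = 0.10 in a European-ancestry cohort and
# R^2 = 0.04 in an African-ancestry cohort.
loss = relative_accuracy_loss(0.10, 0.04)
print(f"relative accuracy loss: {loss:.0%}")  # relative accuracy loss: 60%
```

A 60% loss in this toy case means the score retains 40% of its source-population accuracy; an absolute difference in R² would describe a different quantity.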
Multi-omics integration, the combined analysis of genomic, transcriptomic, proteomic, and metabolomic data layers, has increasingly been explored as a strategy to mitigate the limitations of single-omics pharmacogenomics. Integrating multiple molecular layers may capture aspects of drug-response biology that no single modality fully represents [11]. Each layer contributes distinct yet complementary molecular information: germline genomics provides relatively stable variant–drug response associations; transcriptomics captures dynamic regulatory states and disease-specific expression patterns; proteomics reflects functional protein abundance and post-translational modifications relevant to drug target engagement; and metabolomics profiles the downstream biochemical output of these upstream processes within the patient’s physiological context. Metabolomics, in particular, may offer a functionally broader molecular layer across heterogeneous clinical settings because it reflects the downstream functional state of biological systems. Unlike ancestry-stratified germline variants, it captures both genetic and non-genetic influences, including environmental, microbiome, and lifestyle factors. When incorporated into multi-omics AI frameworks, metabolomic data may therefore help reduce reliance on ancestry-specific genomic features, although population-level validation is still needed.
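A common baseline for combining these layers is early (concatenation) fusion. The sketch below assumes every layer has been measured on the same samples in the same order; it standardizes each layer and down-weights layers with many features so that, for example, a large transcriptomic panel does not numerically swamp a small metabolite panel. The function names and toy data are hypothetical, and this is only one of several integration strategies (intermediate and late fusion are common alternatives).

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a samples-by-features matrix to mean 0, SD 1."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # guard SD = 0
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(layers):
    """Concatenate omics layers into one feature vector per sample.

    `layers` maps a layer name to a samples-by-features matrix. Each layer is
    z-scored and scaled by 1/sqrt(n_features) to balance layer contributions.
    """
    sizes = {len(matrix) for matrix in layers.values()}
    if len(sizes) != 1:
        raise ValueError("all layers must contain the same samples")
    fused = [[] for _ in range(sizes.pop())]
    for name in sorted(layers):  # fixed layer order keeps columns reproducible
        block = zscore_columns(layers[name])
        scale = len(block[0]) ** -0.5
        for i, row in enumerate(block):
            fused[i].extend(v * scale for v in row)
    return fused


# Hypothetical toy data: 3 samples, 1 genomic feature, 2 metabolite features.
layers = {
    "genomics": [[0], [1], [2]],
    "metabolomics": [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
}
fused = early_fusion(layers)
print(len(fused), len(fused[0]))  # 3 3
```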
This review examines ancestry-related bias in AI-driven pharmacogenomics and evaluates the emerging rationale for metabolomics-inclusive multi-omics integration as a strategy for improving cross-population generalizability. Although the primary focus is pharmacogenomics and pharmacometabolomics, selected examples from adjacent precision-medicine settings are included where they help illuminate whether metabolomics-informed prediction remains reproducible across heterogeneous cohorts, sites, or population groups. Given the current state of the literature, these examples are hypothesis-generating rather than definitive evidence that metabolomics-inclusive models reduce ancestry-related bias in clinical AI outputs. Relevant literature was identified through iterative searches of PubMed, Scopus, and Google Scholar, supplemented by citation tracking, using combinations of terms related to pharmacogenomics, pharmacometabolomics, metabolomics, multi-omics integration, artificial intelligence, machine learning, bias, fairness, ancestry, race, ethnicity, cross-population generalizability, and precision medicine. Studies were selected for their relevance to ancestry-related bias, cross-population or cross-site validation, metabolomics-informed prediction, and clinical or translational significance. No formal risk-of-bias assessment, study weighting, or quantitative meta-analysis was performed.
4. Algorithmic Bias Auditing Framework
Even with improved biological representation through multi-omics integration, equitable clinical deployment cannot be assumed without explicit algorithmic auditing. Multi-omics models may still inherit bias from imbalanced training cohorts, site-specific practice patterns, or population differences in data quality and missingness, making subgroup-level evaluation essential [29].
Model performance should therefore be assessed separately within ancestry groups, sexes, age strata, and clinically relevant subpopulations rather than through aggregate metrics alone. In pharmacogenomics, even modest prediction errors may contribute to unsafe dosing or adverse drug events, particularly for therapies with narrow therapeutic windows such as warfarin [30,31]. Recent benchmarking studies further show that genomic prediction models developed predominantly in European-ancestry cohorts often lose measurable performance when transferred to African, South Asian, or admixed populations [16].
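Subgroup-stratified evaluation can be sketched in a few lines. The example below, using only the Python standard library, computes ROC AUC overall and within each ancestry group; the helper names, group labels, and toy predictions are hypothetical. In this toy case the pooled AUC looks acceptable while one group performs at chance level, which is the pattern that aggregate metrics can hide.

```python
from collections import defaultdict


def auc(y_true, y_score):
    """ROC AUC computed as the Mann-Whitney probability that a random positive
    case scores higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both outcome classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(y_true, y_score, groups):
    """AUC computed separately within each subgroup label."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    return {
        g: auc([y_true[i] for i in idx], [y_score[i] for i in idx])
        for g, idx in members.items()
    }


# Hypothetical labels (1 = adverse drug event) and model scores.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3]
groups = ["EUR"] * 4 + ["AFR"] * 4

overall = auc(y_true, y_score)
per_group = auc_by_group(y_true, y_score, groups)
gap = max(per_group.values()) - min(per_group.values())
print(overall, per_group, gap)  # 0.875 {'EUR': 1.0, 'AFR': 0.5} 0.5
```

With realistically small subgroups, per-group AUC estimates are noisy, so they are best reported with confidence intervals (for example, from bootstrap resampling).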
Beyond discrimination, calibration across populations is critical because models may systematically overestimate or underestimate risk in specific patient groups. Multicenter evaluations have shown that models with strong internal performance may still exhibit calibration drift after external deployment, underscoring the importance of recalibration and ongoing monitoring [32]. At the same time, emerging multi-omics foundation models suggest that more stable cross-cohort generalization may be achievable when external validation is incorporated. For example, the SeNMo model maintained nearly identical survival prediction performance in internal and independent external oncology cohorts (C-index 0.760 vs. 0.758), supporting the feasibility of more transferable multi-omics systems [33].
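A first-pass calibration check per subgroup compares the mean predicted risk with the observed event rate. The sketch below reports the observed-to-expected (O/E) ratio for each group: a ratio near 1 suggests adequate calibration-in-the-large, while values well above or below 1 indicate systematic under- or overestimation that may call for recalibration. The function name, group labels, and data are hypothetical, and a full assessment would also examine calibration curves or slopes rather than this single summary.

```python
def calibration_by_group(y_true, y_prob, groups):
    """Per-group mean predicted risk (expected), observed event rate, and O/E ratio."""
    report = {}
    for g in sorted(set(groups)):
        ys = [y for y, gg in zip(y_true, groups) if gg == g]
        ps = [p for p, gg in zip(y_prob, groups) if gg == g]
        expected = sum(ps) / len(ps)
        observed = sum(ys) / len(ys)
        report[g] = {
            "expected": expected,
            "observed": observed,
            # O/E > 1: model underestimates risk; O/E < 1: model overestimates it
            "o_e": observed / expected if expected > 0 else float("nan"),
        }
    return report


# Hypothetical: identical predicted risks, different observed event rates.
y_prob = [0.2] * 10
y_true = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0]
groups = ["group_A"] * 5 + ["group_B"] * 5

report = calibration_by_group(y_true, y_prob, groups)
for group, stats in report.items():
    print(group, round(stats["o_e"], 2))
# group_A 1.0
# group_B 2.0
```

Here the same predicted risk of 0.2 is well calibrated for one group but underestimates the true event rate by half for the other, a discrepancy that a single pooled calibration statistic would dilute.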
For high-stakes clinical decision-making, predictive systems should communicate not only classification outputs but also calibrated uncertainty, particularly for rare or underrepresented molecular profiles. Interpretability methods such as SHAP values and pathway-level feature attribution may help clinicians assess whether predictions rest on biologically plausible signals, such as CYP variants, inflammatory transcriptomic states, or metabolomic indicators of impaired drug clearance [34,35]. Recent multimodal biomedical AI studies have explored whether uncertainty-aware architectures can improve robustness to missing or heterogeneous omics modalities by dynamically weighting modality reliability during inference [36]. Representative considerations for auditing equitable multi-omics pharmacogenomics AI systems are summarized in Table 2.
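One simple way to weight modalities by their reliability is inverse-variance weighting, assuming each modality-specific model returns a prediction together with an estimated error variance and that these errors are independent. The sketch below is a generic illustration and is not taken from the studies cited in [36]; it also shows how a missing modality can be dropped while the remaining ones are reweighted. The function name, modality keys, and numbers are hypothetical.

```python
def fuse_modalities(predictions):
    """Inverse-variance fusion of per-modality predictions.

    `predictions` maps a modality name to (mean, variance), or to None when that
    modality is missing for the patient. Assumes independent, unbiased errors.
    Returns (fused_mean, fused_variance, weights).
    """
    available = {m: mv for m, mv in predictions.items() if mv is not None}
    if not available:
        raise ValueError("at least one modality must be available")
    if any(var <= 0 for _, var in available.values()):
        raise ValueError("variances must be positive")
    precision = {m: 1.0 / var for m, (_, var) in available.items()}
    total = sum(precision.values())
    weights = {m: p / total for m, p in precision.items()}
    fused_mean = sum(weights[m] * mu for m, (mu, _) in available.items())
    return fused_mean, 1.0 / total, weights


# Hypothetical patient: proteomics missing, metabolomics prediction more certain.
fused_mean, fused_var, weights = fuse_modalities({
    "genomics": (0.8, 0.04),
    "metabolomics": (0.5, 0.01),
    "proteomics": None,
})
print(round(fused_mean, 3), round(fused_var, 4), weights)
```

The fused variance is smaller than either input variance, which follows from the independence assumption; if modality errors are correlated, this estimate will be overconfident.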
5. Computational Strategies for Bias Mitigation in Multi-Omics AI
Several computational strategies may help reduce performance disparities in AI-driven precision medicine while more representative global multi-omics datasets are being developed. Rather than replacing the need for diverse data collection, these approaches aim to improve cross-population generalizability, preserve privacy, and mitigate structural imbalance in current training resources [37,38].
Transfer learning and domain adaptation are among the most direct approaches for improving model portability across populations. Models pretrained on large majority-population cohorts can be fine-tuned on smaller datasets from underrepresented populations, retaining broadly shared biological patterns while adapting to population-specific variation. Domain adaptation methods further aim to align feature distributions between source and target populations, thereby reducing errors caused by ancestry-related dataset shift [37]. These approaches may be especially relevant for multi-omics precision medicine, where sequencing platforms, metabolomic assays, and clinical workflows frequently differ across institutions [39]. Recent population-aware frameworks such as PhyloFrame illustrate that explicitly modeling genetic structure can improve predictive equity across ancestry groups [10].
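Feature-distribution alignment can be illustrated with a deliberately simple, per-feature version of the idea behind correlation alignment (CORAL): each target-population feature is shifted and rescaled to match the mean and standard deviation of the source population, so that a model trained on the source can be applied to the aligned target data. This is a hypothetical sketch and not the method of PhyloFrame or any other cited framework. Full CORAL also aligns covariances, and any such alignment risks removing genuine biological differences between populations along with technical batch effects.

```python
from statistics import mean, pstdev


def moment_match(source_cols, target_cols):
    """Align each target feature to the source feature's mean and SD.

    Inputs are lists of feature columns (one list of values per feature).
    Only first and second moments are matched; correlations are ignored.
    """
    if len(source_cols) != len(target_cols):
        raise ValueError("source and target must have the same features")
    aligned = []
    for src, tgt in zip(source_cols, target_cols):
        mu_s, sd_s = mean(src), pstdev(src)
        mu_t, sd_t = mean(tgt), pstdev(tgt) or 1.0  # avoid division by zero
        aligned.append([(x - mu_t) / sd_t * sd_s + mu_s for x in tgt])
    return aligned


# Hypothetical feature columns: the target assay is shifted and rescaled.
source_cols = [[1.0, 2.0, 3.0], [100.0, 110.0, 120.0]]
target_cols = [[11.0, 12.0, 13.0], [50.0, 55.0, 60.0]]
aligned = moment_match(source_cols, target_cols)
print([[round(v, 6) for v in col] for col in aligned])
# [[1.0, 2.0, 3.0], [100.0, 110.0, 120.0]]
```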
Federated learning offers a complementary strategy by enabling collaborative model training across hospitals, biobanks, and countries without transferring raw patient-level data. This is particularly relevant where privacy regulations, cost-intensive assays, and fragmented sample collections limit the assembly of centralized datasets [38]. By allowing geographically distributed institutions to contribute to shared model development, federated frameworks may accelerate the construction of more globally representative multi-omics AI systems. However, successful deployment requires methods that can accommodate heterogeneous data quality, non-identically distributed populations, and unequal cohort sizes across participating sites [37,40].
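The aggregation step of federated averaging (FedAvg), a widely used federated learning baseline, can be sketched as follows: each site trains locally and shares only its parameter vector, and the coordinator averages these vectors weighted by local sample size. Local training, secure aggregation, and communication are omitted, and the function name and site values are hypothetical. The size weighting also makes the equity concern concrete: a single large majority-population site can dominate the global model, which is one motivation for heterogeneity-aware variants.

```python
def federated_average(site_params, site_sizes):
    """FedAvg aggregation: sample-size-weighted mean of site parameter vectors."""
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one parameter vector per site")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("all sites must share the same model architecture")
    total = sum(site_sizes)
    return [
        sum(params[j] * n for params, n in zip(site_params, site_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical round: three sites (e.g. hospitals or biobanks) of unequal size.
site_params = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
site_sizes = [100, 300, 100]
global_params = federated_average(site_params, site_sizes)
print(global_params)  # [2.4, 2.0]
```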
Generative and synthetic data approaches may also help address underrepresentation by augmenting scarce minority-population datasets with statistically realistic molecular profiles. In principle, conditional generative models could learn ancestry-associated genomic or metabolomic variation and thereby support model training in low-resource settings [41]. However, synthetic augmentation should be approached with caution: statistical realism does not guarantee biological validity or clinical fairness, and poorly validated synthetic data may amplify rather than reduce existing bias [42]. Overall, these computational strategies are best understood as interim accelerators, not substitutes for prospective recruitment of ancestrally diverse cohorts, standardized multi-omics data generation, and direct real-world validation of equitable precision medicine systems [29,41].
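To make this caution concrete, the sketch below implements about the simplest conditional generative model possible: an independent Gaussian per feature, fitted separately within each population group and then sampled to augment that group. The function names, group labels, and toy values are hypothetical. Because it treats metabolites as independent, it can reproduce each feature's mean and spread while discarding the correlation structure that reflects shared pathways, an example of statistical plausibility without biological validity. Any synthetic profiles would need validation against held-out real data before use.

```python
import random
from statistics import mean, stdev


def fit_group_gaussians(rows, groups):
    """For each group, estimate a (mean, SD) pair for every feature.

    Features are treated as independent; each group needs at least 2 samples.
    """
    params = {}
    for g in sorted(set(groups)):
        members = [row for row, gg in zip(rows, groups) if gg == g]
        params[g] = [(mean(col), stdev(col)) for col in zip(*members)]
    return params


def sample_group(params, group, n, seed=0):
    """Draw n synthetic profiles for one group from its fitted Gaussians."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [[rng.gauss(mu, sd) for mu, sd in params[group]] for _ in range(n)]


# Hypothetical metabolite levels (2 features) for two groups.
rows = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.5], [2.0, 7.0], [2.4, 8.0], [1.6, 6.0]]
groups = ["group_A"] * 3 + ["group_B"] * 3
params = fit_group_gaussians(rows, groups)
synthetic_b = sample_group(params, "group_B", n=5)
print(len(synthetic_b), len(synthetic_b[0]))  # 5 2
```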
7. Conclusions and Future Directions
The evidence synthesized in this review supports a reconceptualization of the equity challenge in AI-driven precision medicine. The problem is not solely one of representation in genomic discovery cohorts but also one of feature choice: models built predominantly on ancestry-linked germline variants inherit the limitations of those variants, including reduced transferability across populations and limited ability to capture dynamic determinants of health. Across oncology, metabolic disease, infectious disease, nephrology, cardiovascular medicine, and pharmacometabolomics, recurrent host-response signals, including kynurenine pathway activation, TCA-cycle perturbation, and lipid remodeling, suggest that metabolomic profiles capture biological information that may be less constrained by genetic ancestry than germline variant features alone [59,62,68]. This review proposes that incorporating such signatures as a complementary functional layer alongside genomic data may support the development of AI models with improved transferability in precision medicine. It should be noted, however, that reduced ancestry constraint in metabolomic features is a necessary biological precondition for, rather than direct evidence of, reduced ancestry-related bias in AI model outputs; the latter requires prospective, head-to-head demonstration of equitable predictive performance, which remains largely absent from the current literature.
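A head-to-head fairness comparison of this kind can be expressed with standard group-fairness metrics. The sketch below computes the equal-opportunity difference, the gap in true-positive rate between the best- and worst-served groups, for two models evaluated on the same patients (for instance, genomics-only and metabolomics-inclusive versions). The function names, model labels, groups, and predictions are hypothetical toy values, not results from any study; a real comparison would also require adequately sized groups and uncertainty estimates such as bootstrap confidence intervals.

```python
def true_positive_rate_by_group(y_true, y_pred, groups):
    """Share of truly positive cases predicted positive, within each group."""
    rates = {}
    for g in sorted(set(groups)):
        preds_on_positives = [
            p for y, p, gg in zip(y_true, y_pred, groups) if gg == g and y == 1
        ]
        if not preds_on_positives:
            raise ValueError(f"group {g!r} has no positive cases")
        rates[g] = sum(preds_on_positives) / len(preds_on_positives)
    return rates


def equal_opportunity_difference(y_true, y_pred, groups):
    """Largest minus smallest group TPR; 0 means equal opportunity holds."""
    rates = true_positive_rate_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: same patients, binary predictions from two models.
y_true = [1, 1, 1, 1, 0, 0] * 2
groups = ["EUR"] * 6 + ["AFR"] * 6
predictions = {
    "model_a": [1, 1, 1, 1, 0, 0] + [1, 0, 0, 1, 0, 1],
    "model_b": [1, 1, 1, 0, 0, 0] + [1, 1, 0, 1, 0, 0],
}
for name, y_pred in predictions.items():
    print(name, equal_opportunity_difference(y_true, y_pred, groups))
# model_a 0.5
# model_b 0.0
```

Note that model_b closes the gap partly by lowering the true-positive rate in the better-served group, which is why fairness gaps are best reported alongside each group's absolute performance.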
Several translational implications emerge from this framework. Because metabolomic profiles reflect the integrated effects of genetics, diet, microbiome ecology, inflammation, medication exposure, and environmental context [77], they may help improve prediction in populations for whom ancestry-matched genomic reference resources remain limited. Pharmacometabolomic phenotyping further offers a functional approach to treatment individualization by directly capturing the metabolic consequences of drug action and CYP activity in the individual patient rather than relying exclusively on ancestry-dependent pharmacogenomic reference models. Beyond treatment individualization, these metabolomics-informed approaches may also help identify biologically relevant pathways and candidate biomarkers that could inform future drug-development efforts. More broadly, the ability of metabolomics to identify biologically distinct subgroups within shared clinical diagnoses may expand precision medicine into areas where genotype-based stratification alone has shown limited clinical utility. Importantly, these approaches are best viewed as complementary to genomics rather than replacements for existing multi-omics frameworks.
Several limitations should also be acknowledged. The conclusions drawn here are based on heterogeneous studies that differed substantially in cohort composition, analytical platforms, validation rigor, and study design. Direct head-to-head comparisons of fairness metrics between genomics-only and metabolomics-inclusive models, the standard required to demonstrate bias reduction rather than merely improved feature transferability, remain limited. Moreover, many metabolomics-inclusive AI systems remain at the discovery or early validation stage, and prospective evidence for equitable clinical implementation across diverse populations is still emerging. Future progress will require prospective multi-ancestry validation studies, harmonized metabolomics acquisition and preprocessing pipelines, and clinically deployable frameworks that support portable multi-omics prediction across heterogeneous healthcare settings. Ultimately, whether metabolomics-inclusive AI can meaningfully improve cross-population generalizability and support more equitable precision medicine will depend on rigorous real-world validation across the full diversity of patient populations these systems are intended to serve.