Review

Symbiosis in Health: The Powerful Alliance of AI and Propensity Score Matching in Real World Medical Data Analysis

1
Community Healthcare Center Dr. Adolf Drolc Maribor, 2000 Maribor, Slovenia
2
Faculty of Electrical Engineering and Computer Sciences, University of Maribor, 2000 Maribor, Slovenia
3
Faculty ECM (European Center Maribor), Alma Mater Europaea University, 2000 Maribor, Slovenia
4
Department of Cardiology and Angiology, University Medical Center Maribor, 2000 Maribor, Slovenia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1524; https://doi.org/10.3390/app16031524
Submission received: 1 December 2025 / Revised: 29 January 2026 / Accepted: 30 January 2026 / Published: 3 February 2026
(This article belongs to the Special Issue Health Informatics: Human Health and Health Care Services)

Abstract

The rapid expansion of real-world medical data is driving a transformative shift toward integrating artificial intelligence (AI) with propensity score matching (PSM) to enhance clinical research. While AI provides advanced capabilities in diagnostics and prediction, PSM serves as a critical statistical tool for mitigating confounding bias in quasi-experimental studies, thereby approximating the reliability of randomized controlled trials. This study utilized synthetic thematic analysis (STA) and bibliometric mapping via VOSviewer and Bibliometrix to analyze 433 documents retrieved from the Scopus database. The findings reveal an exponential growth in this field between 2020 and 2024, with the United States and China emerging as the primary contributors to global research output. Four central thematic clusters were identified: prediction, cancer management, diagnostics, and deep learning. The integration is bidirectional, characterized by AI algorithms optimizing propensity score estimation and PSM frameworks being used to enhance AI-driven models. This methodological convergence is significantly improving the rigour of observational studies, particularly in complex clinical domains such as cardiovascular disease and chronic illness management. Ultimately, the AI-PSM symbiosis represents a critical trend in medical informatics, refining the accuracy of predictive modelling and strengthening the evidentiary value of real-world data in global health research.

1. Introduction

The rapid expansion of real-world data and evidence [1,2,3] has driven the adoption of advanced technologies and methods in medicine, including artificial intelligence (AI) and propensity score matching (PSM). AI is applied in various medical fields, such as diagnostics, prediction, imaging and pattern recognition, risk assessment, robotics, education, treatment planning and many others [4,5,6,7,8,9,10,11,12,13,14]. In contrast, PSM is primarily a statistical methodology with a more specific purpose. It is widely used in quasi-experimental studies, such as retrospective analyses of medical claims, data from disease or product registries, digital health technologies, routine healthcare datasets, observational studies, and electronic medical records. PSM leverages individual patient covariates to balance potential confounding factors when comparing different patient groups. Adjusting for baseline disparities enables analyses to approximate the reliability of randomized, prospective studies, which are often considered the gold standard in research [15,16,17,18,19,20,21].
The use of PSM and AI in data analysis has garnered significant research interest, evolving from traditional methods with baseline use of PSM to more advanced AI-enhanced techniques and hybrid approaches [22,23]. Researchers typically begin with conventional propensity score matching to construct a methodological baseline, as this established technique provides fundamental insights into the data structure before alternative approaches are applied [24,25]. However, there is growing interest in using AI and machine learning methods to enhance propensity score estimation (for example, by utilizing neural networks and decision trees to automate variable selection), especially when dealing with high-dimensional data or complex relationships [26,27].
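As a minimal sketch of the conventional baseline described above, the following Python code estimates propensity scores with a logistic regression fitted by plain gradient descent. The covariates, treatment labels, learning rate, and function names are hypothetical and serve illustration only; the reviewed studies would normally rely on established statistical packages, and a machine learning estimator (such as a decision tree or a neural network) would take the place of the logistic model in the same role.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity_model(X, treated, lr=0.5, epochs=2000):
    """Fit a logistic regression for P(treated = 1 | x) by batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, treated):
            p = sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))
            err = p - t  # gradient of the log-loss with respect to the linear term
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def propensity_scores(X, weights, bias):
    """Predicted probability of treatment for every patient."""
    return [sigmoid(bias + sum(w * xj for w, xj in zip(weights, x))) for x in X]

# Hypothetical standardized covariates per patient: [age, comorbidity index]
X = [[-1.0, -0.5], [-0.5, 0.2], [0.0, -0.1], [0.3, 0.4],
     [0.5, 0.1], [1.0, 0.8], [-0.2, 0.6], [0.8, -0.3]]
treated = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical treatment assignment

weights, bias = fit_propensity_model(X, treated)
scores = propensity_scores(X, weights, bias)
```

The resulting `scores` are the inputs to the matching step; swapping the estimator changes only `fit_propensity_model`, which is why ML-based estimation can be compared directly against this baseline.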
When combined in this way, AI and PSM enhance medical research symbiotically. First, PSM improves AI applications by increasing the reliability and efficiency of machine learning algorithms [28,29,30,31]. Second, integrating artificial intelligence with propensity score matching presents a promising avenue to overcome limitations of traditional PSM and broaden its utility in medical and healthcare research. AI can augment multiple stages of the PSM workflow, including data preprocessing (e.g., natural language, image, and signal processing [32]), propensity score estimation, covariate selection, and post-matching analysis. By mitigating biases, refining causal inference, and fostering more robust and interpretable models [33], AI-enhanced PSM holds substantial potential for advancing methodological rigour in observational studies [34].
The above symbiotic association between AI and PSM has not yet been thoroughly and holistically investigated. To close this gap, synthetic thematic analysis (STA), derived from synthetic knowledge synthesis [35,36], was used in this study to review, for the first time, the scope and content of the existing research literature. STA transforms literature reviews by merging quantitative bibliometrics with qualitative analysis. Its primary advantage is scalability, allowing researchers to synthesize thousands of papers that would overwhelm manual methods. By employing semi-automated mapping, it uncovers hidden thematic connections and emerging trends with high efficiency and fewer resources. While classical reviews focus on aggregating the findings of a few high-quality studies to answer specific questions, STA maps the “big picture” of a research field and reduces subjective bias through algorithmic clustering. Through triangulation, which links descriptive data, visual maps, and content analysis, STA ensures findings are objective, reproducible, and more insightful for navigating the modern scientific knowledge explosion.
The objective of this study is to holistically analyze the symbiotic association between PSM and AI in the medical field. To that end, STA was used to answer the main research question, namely how and in which contexts AI and PSM complement each other. The main research question was further decomposed into the following more specific research questions:
  • What are the dynamic and spatial features of the research literature production on the use of AI and PSM in medicine?
  • How is the symbiotic association between AI and PSM reflected in the most prolific source titles and most productive countries?
  • What research themes emerge in studies combining AI and PSM for medical data analysis?
  • What are the most prolific AI methods, medical applications, and diagnoses in combined AI and PSM analyses?
  • What are the dominant research trends in the combined use of PSM and AI?

2. Methodology

Synthetic thematic analysis (STA) in this study was performed with the algorithm shown in Figure 1:
To obtain the bibliometric data needed for an extensive bibliometric study, Scopus was used due to its wider interdisciplinary coverage of reviewed research literature, inclusion of PubMed, advanced search options including AI, and the ability to export larger chunks of data for bibliometric analysis in one export run. For the bibliometric analysis, two main bibliometric tools were used, namely Bibliometrix version 5.1.1 and VOSviewer version 1.6.20 [37,38]. VOSviewer was used to create the bibliometric and thematic mapping based on the author keywords found in the bibliometric data, while the Bibliometrix library was used to perform the remaining part of the bibliometric analysis, including base bibliometric information gathering, publication overview, citation analysis, countries’ production, Lotka’s Law, and Bradford’s Law. Bibliometrix was run through the Biblioshiny interface, which allows for interactive web-based engagement.

3. Results

3.1. Dynamic and Spatial Features of the Research Literature Production

The search based on the provided search string in Scopus was performed at the end of September 2025. The search yielded 433 documents from 283 sources, authored by 3858 authors. Of the 433 documents, 415 were articles, 1 was a book, 11 were conference papers, 1 was a conference review, and 4 were reviews. The results encompass documents from 2011 to 2025. The average number of citations was 10.73 per document, and the average document age was 2.02 years, indicating a relatively young and interesting field of study. The average number of co-authors per document was 21, and the international co-authorship rate was 18.94%, indicating global interest and collaboration.
In Figure 2, there is a noticeable spike in the number of published papers over the last few years, indicating a heightened interest in the observed field from 2020 to 2024. There is a small dip in 2025. This is to be expected, as the results encompass data only up to the end of September, several months before the end of the year. It is important to note that while the dip in the graph may suggest a larger disparity, the actual difference is just one paper. Given the remaining time for publication, the total number of publications in 2025 is expected to far exceed the number published in 2024. The average number of citations, as seen in Figure 3, declines over time; this is natural, since citations accumulate over time and, as the number of publications rises, the average number of citations decreases. However, there were three noticeable spikes in the average annual number of citations in the years 2011, 2014, and 2018. Those dates correspond to the top three most cited documents in the field [24,39,40].
Observing the most relevant authors through Lotka’s Law, there is a classical and clear power-law distribution, with only a very small number of authors contributing several papers. The top five contributors, Li C. (five articles), Liu M. (four articles), Wang Y. (four articles), Qu J. (four articles), and Zhang Y. (four articles), all started publishing in this particular field of study in 2021 or later. All of them also contributed in 2025. When looking at the authors’ local impact based on the h-index and additionally taking into account their relevance, Li C. is the most prominent contributor, closely followed by the remaining four contributors. The most productive affiliations were Fujian Medical University (54 articles), Guangxi Medical University (54 articles), The Second Hospital of Xian Jiaotong University (46 articles), Capital Medical University (45 articles), and Tongji Medical College of Huazhong University of Science and Technology (39 articles). Production by these affiliations began to grow in 2021 and has increased noticeably each year.
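For orientation, Lotka’s Law states that the number of authors with n publications is approximately proportional to 1/n^a, with the exponent a close to 2 in the classical formulation. The sketch below computes the theoretical distribution for a hypothetical population of 1000 authors; the function name, the population size, and the cut-off of five papers are assumptions for illustration and do not reproduce the observed distribution in this study.

```python
def lotka_expected_authors(total_authors, max_papers, exponent=2.0):
    """Expected number of authors with n = 1..max_papers papers under Lotka's law."""
    raw = [n ** -exponent for n in range(1, max_papers + 1)]
    norm = sum(raw)  # normalize so that the expected counts sum to total_authors
    return {n: total_authors * r / norm for n, r in enumerate(raw, start=1)}

# Hypothetical author population, truncated at five papers per author
expected = lotka_expected_authors(total_authors=1000, max_papers=5)
for n_papers, n_authors in expected.items():
    print(n_papers, round(n_authors, 1))
```

Comparing such expected counts with the observed author productivity is one simple way to judge how closely a corpus follows the law.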

3.2. Productive Countries and Source Titles

Looking at Figure 4, it is clear that China and the United States of America are dominating in this field of study, with the former far outcompeting the other areas and the latter having a far more international presence in relation to cumulative production. Also noteworthy is that Germany has far more Multiple Country Publications (MCP) than Single Country Publications (SCP). Looking at the countries’ production over time, there was very little publication from 2011 until 2020. After 2020, the production in China and the United States of America increased exponentially. Looking at the citations based on the country, the United States of America and China again dominate the research field and are followed by the United Kingdom and Canada.
Table 1 shows that the ranking of countries’ productivity in PSM-related research corresponds well with the rankings of countries in overall productivity across all disciplines, as well as in medicine, AI, and statistics and probability. This might indicate that systemic national factors (such as consistent R&D funding, shared infrastructure, high-quality education, research capacity, and supportive policies) uniformly drive research productivity across the academic disciplines and topics shown in Table 1. It might also signal co-operation in symbiotic multidisciplinary research across medicine, AI, and statistics focusing on PSM.
As seen in Figure 5, there are more than 30 core source titles publishing research on PSM and AI, according to Bradford’s law [42]. Among them, Frontiers in Oncology, Frontiers in Cardiovascular Medicine, BMC Infectious Diseases, Frontiers in Public Health, Journal of Clinical Medicine, and JMIR Medical Informatics published five or more papers, and there were 14 journals publishing more than three papers. When considering the local impact of the sources in relation to the h-index, the above journals are complemented by Cancers, Frontiers in Pharmacology, and BMC Public Health. The publication rate for the top five journals has increased steadily over time, with Frontiers in Oncology and JMIR Medical Informatics at the forefront since 2018.
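Bradford’s law is commonly applied by ranking sources by productivity and splitting them into three zones that each hold roughly one third of all articles, with the first zone forming the core. The following sketch illustrates that partition; the journal names and article counts are hypothetical, and the exact zoning rule used by a given bibliometric tool may differ from this simplified version.

```python
def bradford_zones(article_counts):
    """Split sources into three zones, each holding roughly one third of all articles."""
    ranked = sorted(article_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(article_counts.values())
    zones, cumulative, zone = [[], [], []], 0, 0
    for source, count in ranked:
        zones[zone].append(source)
        cumulative += count
        # Move to the next zone once its share of the articles has been reached
        while zone < 2 and cumulative >= total * (zone + 1) / 3:
            zone += 1
    return zones

# Hypothetical number of articles per source
counts = {"Journal A": 9, "Journal B": 6, "Journal C": 3, "Journal D": 3,
          "Journal E": 2, "Journal F": 2, "Journal G": 1, "Journal H": 1,
          "Journal I": 1, "Journal J": 1, "Journal K": 1}
core, middle, periphery = bradford_zones(counts)
```

In this toy example a few highly productive sources form the core, while many low-output sources are needed to fill the last zone, which is the pattern the law describes.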
Table 2 and Supplementary Table S1 enumerate the most prolific core journals contributing to research on the use of artificial intelligence in PSM. The majority of these journals are ranked in the first quartile of their respective subject categories, underscoring the high scholarly impact and methodological rigour characterizing this interdisciplinary domain. Most of these journals are situated within medical research fields, with the notable exception of the Journal of Medical Internet Research (JMIR), which is classified under health informatics. Given the large number of medical journals relative to the much smaller number of health informatics journals, this distribution might suggest a structural interdependence between medical and informatics research, highlighting the possible synergistic nature of AI and PSM scholarship, wherein AI methodologies and clinical applications converge to advance more innovative health care.

3.3. More Prolific Themes

The results of the STA are presented in Figure 6 and Table 3 and Table 4. Following Zipf’s Law [34], bibliometric mapping was applied to all keywords appearing in four or more publications, yielding a landscape of 45 author keywords grouped into four distinct clusters (Figure 1). Implementation of steps 2–6 in Figure 1 generated 4 thematic categories, 12 association sub-networks, and 28 high-impact publications (Table 4). A presentation of these 28 publications is provided in Table 4.
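The keyword selection step described above can be sketched as follows: author keywords are counted once per publication, only those appearing in at least four publications are kept, and co-occurrence links are then counted among the retained keywords. The sample records and function names are hypothetical, and the sketch does not reproduce the normalization or clustering performed by VOSviewer.

```python
from collections import Counter
from itertools import combinations

def frequent_keywords(documents, min_docs=4):
    """Keywords that appear in at least min_docs publications."""
    counts = Counter(kw for doc in documents for kw in set(doc))
    return {kw for kw, c in counts.items() if c >= min_docs}

def cooccurrence_links(documents, vocabulary):
    """Count how often two retained keywords appear in the same publication."""
    links = Counter()
    for doc in documents:
        kept = sorted(set(doc) & vocabulary)
        links.update(combinations(kept, 2))
    return links

# Hypothetical author-keyword lists, one per publication
documents = [
    ["machine learning", "propensity score matching", "sepsis"],
    ["machine learning", "propensity score matching", "deep learning"],
    ["machine learning", "propensity score matching"],
    ["machine learning", "propensity score matching", "prognosis"],
    ["deep learning", "prognosis"],
]
vocabulary = frequent_keywords(documents, min_docs=4)
links = cooccurrence_links(documents, vocabulary)
```

The weighted links are what a mapping tool would subsequently cluster into themes.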
Table 5 demonstrates the widespread integration of propensity score matching with machine learning, especially deep learning methods, across prediction, diagnosis, and causal inference in healthcare. In cardiovascular and metabolic disease research, PSM is frequently combined with regression models, random forests, deep neural networks, and survival analysis to reduce confounding in observational data. These approaches enable accurate prediction of outcomes such as cardiovascular events, hypoglycaemia, mortality, and treatment effects in high-risk populations, including patients with diabetes, atrial fibrillation, sepsis, stroke, and cancer. Several studies highlight individualized risk estimation and heterogeneous treatment effect modelling, supporting precision medicine and clinical decision-making.
In cancer management, large population-based datasets such as SEER are commonly used. Researchers apply PSM alongside Cox regression, decision trees, random forests, and explainable AI techniques to identify prognostic factors and evaluate treatment effects in breast, hepatocellular, and gastric cancers. These studies demonstrate how ML-enhanced analyses improve survival prediction, biomarker discovery, and treatment stratification while addressing selection bias inherent in retrospective datasets.
Natural language processing and clinical text mining represent a growing theme, particularly for outcome prediction, drug repurposing, and comparative effectiveness research. By combining PSM with NLP-derived features and transformer-based models (e.g., Clinical BERT), studies show improved predictive performance and fairness in real-world electronic health record analyses.
In diagnostic and public health contexts, PSM is used with causal forests, generalized additive models, and deep learning to study cardiovascular risk markers, infectious disease diagnosis, perioperative outcomes, and chronic kidney disease prediction. Methodological contributions further advance the field, including deep learning-based propensity score estimation, genetic matching optimized via Monte Carlo simulation, and novel algorithms addressing limitations of traditional PSM.
The deductive thematic analysis of author-supplied keywords (Table 5) initially demonstrated a lack of granularity, as a significant majority of publications utilized the broad descriptor “Machine Learning”. To achieve a more nuanced understanding of the computational landscape, a secondary lexical analysis was performed on publication abstracts and author keywords.
This refined inquiry revealed a preference for ensemble methods and high-dimensional classifiers, specifically, Boosting, Support Vector Machines, Nearest Neighbours and Random Forests.
Notably, SHAP (SHapley Additive exPlanations) was identified in 25 documents, signifying a critical shift toward explainable AI (XAI) to mitigate “black box” limitations in clinical settings. The primary applied focus remains on prognosis, predictive modelling, and risk assessment, predominantly within the domains of oncology and chronic cardiovascular pathologies.

3.4. Prolific Term and Topic Trend Analysis

Figure 7 is a burst plot that shows the prevalence and duration of author keywords. Each row represents a specific author keyword, and the horizontal bars indicate the time span during which that keyword has been prolific. The blue nodes represent the peak popularity for specific keywords. As expected, machine learning is the most prominent keyword in the set, showing a massive burst around 2022. Artificial intelligence, deep learning, and prognosis all share a similar trajectory, peaking slightly later in 2023. Focusing on the keyword trajectories, a clear trend has been developing since 2022 in which AI, machine learning, and deep learning are introduced and considered for application in PSM. The symbiosis seems to evolve first from the association of PSM and machine learning, then through the introduction of AI and deep learning, and finally to the use of both approaches in concrete medical applications (cancer and sepsis).
Figure 8 presents a bibliometric quadrant, typically used in science mapping and co-word analysis to visualize the conceptual evolution and structural importance of specific research themes within a given field. The visualization is structured along two axes, dividing the research landscape into four distinct quadrants based on thematic maturity and impact:
  • Vertical Axis (Y): Development Degree (Density). This measures the internal strength and cohesion of a theme (the strength of links within the cluster).
  • Horizontal Axis (X): Centrality. (While not explicitly labelled, the layout represents the theme’s external relevance or “relevance degree”). This measures the interaction between a theme and other research topics.
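The two axes can be illustrated with a simplified computation: density as the total weight of links inside a cluster divided by the number of its keywords, and centrality as the total weight of links connecting the cluster to other clusters. This is a sketch under stated assumptions; the keywords, cluster labels, and link weights are hypothetical, and bibliometric tools may normalize link weights and rescale both measures differently.

```python
from collections import defaultdict

def theme_metrics(links, cluster_of):
    """Return {cluster: (centrality, density)} from weighted keyword links."""
    internal = defaultdict(float)   # link weight inside each cluster
    external = defaultdict(float)   # link weight crossing the cluster boundary
    sizes = defaultdict(int)        # number of keywords per cluster
    for cluster in cluster_of.values():
        sizes[cluster] += 1
    for (a, b), weight in links.items():
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            internal[ca] += weight
        else:
            external[ca] += weight
            external[cb] += weight
    return {c: (external[c], internal[c] / sizes[c]) for c in sizes}

# Hypothetical keyword clusters and co-occurrence weights
cluster_of = {"machine learning": "methods", "deep learning": "methods",
              "stroke": "clinical", "mortality": "clinical"}
links = {("machine learning", "deep learning"): 6,
         ("stroke", "mortality"): 2,
         ("machine learning", "mortality"): 3,
         ("deep learning", "stroke"): 1}
metrics = theme_metrics(links, cluster_of)
```

Plotting each cluster by its (centrality, density) pair and splitting the plane at the median of each axis yields the four quadrants of the thematic map.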
The diagram categorizes research themes into four domains. Motor themes are highly developed and central to the research field. Niche themes are also well developed, but specialized and isolated from other themes. Basic themes are foundational and transversal, widely researched but lacking deeper specialized development. Finally, emerging or declining themes are poorly developed and marginal to the overall field. Based on the above, Figure 8 shows the following distribution of PSM-AI themes:
  • Methodological Focus: There is a significant presence of PSM and AI computational methods in the “Motor” and “Basic” quadrants, specifically deep learning, machine learning, and propensity score matching.
  • Clinical Applications: High-impact themes are heavily weighted toward chronic conditions and mortality, including Stroke, Diabetes Mellitus, Atrial Fibrillation, and Hepatocellular Carcinoma.
  • Thematic Transition: The cluster containing “artificial intelligence” appears in both the Emerging and Motor quadrants, suggesting a transition where generic AI research is emerging in specific sub-fields, while specialized applications (like AI in chronic kidney disease) have become “Motor” themes.
The presence of deep learning, causal inference, and artificial intelligence in relation to chronic kidney disease among motor themes suggests that these themes are no longer experimental but are integrated, mainstream tools for clinical research in this dataset. The appearance of themes like Lumbar Fusion, Spine Surgery, and Migrant Health Services among niche themes shows that these groups are mature and cohesive but do not frequently overlap with the more “central” themes of machine learning or chronic disease management seen in other quadrants. Interestingly, epidemiology and personalized medicine appear among emerging/declining themes, which might suggest that general epidemiological topics are being replaced by more specific clinical or computational themes, or that specific datasets like NHIRD are becoming less central compared to newer data sources. Topics like machine learning, atrial fibrillation, and mortality are found among the foundational themes, meaning that they might provide common research matter for the PSM-AI field, acting as a bridge between various specialized niche or motor themes. The symbiosis of AI and PSM in chronic diseases is further shown by the fact that both artificial intelligence/machine learning and PSM appear in the same quadrants (Motor and Basic themes).

4. Discussion

This study systematically examined the evolving intersection between AI and PSM in medical research and identified a rapidly expanding methodological domain characterized by bidirectional methodological enhancement. Evidence from the thematic and bibliometric synthesis indicates that the combined use of AI and PSM has accelerated sharply since 2020, coinciding with the broader adoption of large-scale electronic health records, registry data, and digital health infrastructures. The findings suggest that the integration of AI into PSM workflows, and conversely the incorporation of PSM into AI-driven medical studies, is emerging as a critical methodological approach for improving causal inference, mitigating confounding bias, and advancing real-world digital medical evidence/big data analysis, aggregation and synthesis.
Deductive analysis showed that the current research landscape is characterized by the overwhelming dominance of machine learning (ML) as the foundational methodological framework. This prevalence suggests that while specialized subfields such as deep learning and Natural Language Processing are garnering significant attention, the core of PSM–AI symbiosis remains rooted in the analysis of structured, tabular data, likely derived from electronic health records. Furthermore, the emergence of explainable AI, specifically through the implementation of SHAP values, marks a critical pivot in the field. Although the current frequency of such studies is relatively low, their presence indicates an evolving demand for “white-box” transparency. This transition is essential for bridging the gap between computational research and “at-the-bedside” clinical application, where interpretability is a prerequisite for physician trust and informed decision-making. The focus of PSM–AI symbiosis appears to be shifting from retrospective automated detection toward prospective, risk-based medicine. The high frequency of studies regarding risk assessment and survival analysis underscores a broader movement toward precision medicine. Furthermore, the PSM–AI symbiosis is utilized not merely as a diagnostic tool, but as a prognostic approach capable of predicting disease trajectory and mortality. This shift is further evidenced by the integration of Decision Support Systems and nomograms, which provide visual, statistically grounded insights that augment, rather than replace, human clinical judgement. Deductive analysis also reveals a significant concentration of PSM–AI implementations within cardiovascular health, spanning various heart-related pathologies. This dominance is likely attributable to the historical availability of high-fidelity, signal-based data, such as electrocardiograms.
Beyond cardiology, the analysis exhibits a strong focus on chronic disease management, particularly regarding kidney disease and diabetes. These areas necessitate continuous monitoring and data-intensive risk stratification, making them ideal candidates for AI-driven PSM analysis. Additionally, the continued presence of COVID-19 in the literature serves as a testament to the pandemic’s role as a catalyst for rapid PSM–AI integration in infectious disease research.
A key observation of this knowledge synthesis study is that AI techniques are increasingly being adopted to augment the PSM pipeline. Across included studies, machine learning and deep learning algorithms were employed to support covariate selection, estimate propensity scores in high-dimensional settings, identify non-linear and interaction effects, and improve preprocessing of structured and unstructured clinical data. Natural language processing models, particularly domain-specific adaptations of BERT, ChatGPT, and Gemini, were used to extract clinically meaningful covariates from free-text clinical notes, thereby enhancing the completeness and quality of matching sets. The results demonstrate that AI-enhanced PSM is particularly valuable in complex observational datasets, where classical regression-based estimation techniques may be limited by covariate sparsity, non-linearity, or multidimensional feature relationships.
Importantly, the inverse symbiotic relationship—PSM used to strengthen AI applications—was also evident. In a substantial proportion of studies, PSM was applied prior to machine learning model training to reduce baseline imbalances, control confounding, and ensure that subsequent AI-based prediction models were trained on balanced and comparable cohorts. This sequencing reflects a methodological recognition that AI systems, when trained on imbalanced observational data, risk amplifying bias and generating unreliable or unfair predictions. PSM therefore functions as a foundational causal-inference layer that improves the validity, interpretability, and clinical acceptability of AI-based models. Notably, several studies coupled PSM with fairness-aware machine learning frameworks, suggesting an emerging alignment between causal inference methodology and ethical AI development.
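The sequencing described above, matching first and training afterwards, can be sketched with a greedy 1:1 nearest-neighbour match on previously estimated propensity scores. The patient identifiers, scores, caliper width, and function name are hypothetical; greedy matching without replacement is only one of several matching strategies and is not necessarily the one used in the reviewed studies.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    available = dict(control)
    pairs = []
    # Process treated patients in order of increasing propensity score
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control patient is used at most once
    return pairs

# Hypothetical propensity scores
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.10}

pairs = greedy_match(treated, control)
matched_ids = {pid for pair in pairs for pid in pair}
```

Only the patients in `matched_ids` would then be passed to the downstream prediction model; patients such as `t3`, for whom no comparable control exists, are left out of the training cohort.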
Comparison with the broader literature confirms alignment with current methodological transitions in medical data science, including the adoption of target trial emulation, doubly robust learning methods, and explainable AI techniques. The increasing appearance of SHAP-based model interpretability approaches in the analyzed corpus reflects a growing acknowledgement that explainability is essential when AI is deployed in clinical decision-making environments. Furthermore, the results correspond with regulatory trends emphasizing transparency, fairness, and accountability in medical AI systems, as seen in evolving FDA and EMA guidance on machine learning-based medical technologies.
Several implications arise from these findings. First, the observed symbiosis between AI and PSM underscores the value of hybrid analytical pipelines for generating robust evidence from real-world clinical data. Such approaches may support precision medicine, enable reliable risk stratification, and enhance translational research by strengthening causal assumptions in observational studies. Second, the combined use of AI and PSM holds promise for hospitals and healthcare systems seeking to leverage routinely collected data for predictive modelling, outcome evaluation, and clinical decision support. Finally, the results highlight a methodological trajectory that can inform future clinical research training and the development of multidisciplinary analytical frameworks integrating biostatistics, machine learning, and clinical epidemiology.
From a practical point of view in combining AI and PSM effectively, researchers should perform the following:
  • Restrict input variables by manually excluding those that predict treatment but not the outcome, to prevent the AI from creating “perfect separation” that constrains matching.
  • Use AI algorithms that are mathematically tuned to achieve clinical similarity rather than just high accuracy.
  • Proactively discard patients with propensity scores at the extremes to ensure matching only the patients who truly had a clinical “choice” of receiving either treatment.
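Two of the recommendations above can be sketched in a few lines: trimming patients with extreme propensity scores, and judging the result by covariate balance (here the standardized mean difference, SMD) rather than by predictive accuracy. The trimming bounds of 0.1 and 0.9, the patient identifiers, and the covariate values are hypothetical assumptions; suitable thresholds depend on the study.

```python
import math
import statistics

def trim_extremes(scores, low=0.1, high=0.9):
    """Keep only patients whose propensity score lies within [low, high]."""
    return {pid: ps for pid, ps in scores.items() if low <= ps <= high}

def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    mean_diff = statistics.mean(treated_values) - statistics.mean(control_values)
    pooled_sd = math.sqrt((statistics.variance(treated_values)
                           + statistics.variance(control_values)) / 2)
    return mean_diff / pooled_sd

# Hypothetical propensity scores: p1 and p4 lie at the extremes
scores = {"p1": 0.03, "p2": 0.25, "p3": 0.55, "p4": 0.97}
kept = trim_extremes(scores)

# Hypothetical covariate values in the treated and control groups
smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
```

An absolute SMD below 0.1 is a commonly used rule of thumb for adequate balance, so a propensity model can be tuned until the SMDs of the key covariates fall under that level.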
Despite substantial momentum, several gaps and challenges remain. The reviewed literature indicates limited methodological standardization; AI-enhanced PSM implementations vary considerably across studies, and benchmarking frameworks for assessing performance and bias reduction are not yet mature. Although causal machine learning techniques were identified, their use remained limited relative to classical PSM frameworks, signalling a need for broader adoption of causal representation learning, counterfactual inference, and heterogeneous treatment-effect modelling. Furthermore, dependency on cross-sectional or static datasets limits applicability to time-varying clinical processes. Dynamic PSM and online learning approaches capable of updating matches as new data accumulate remain under-developed. Ethical considerations—including privacy protection, bias mitigation, and responsible model governance—were acknowledged only sporadically, reflecting another key research frontier.
This study also has methodological limitations. Although Scopus provides broad multidisciplinary coverage, relevant studies indexed exclusively in other databases may have been excluded. Another limitation involves the broad and often inconsistent labelling of computational methods within the analyzed literature, which are frequently grouped under broad labels like AI, ML, and DL [75]. However, those labels might also denote concepts like “Aortic Insufficiency”, “Maximum Likelihood”, or “Decilitre” and thus cannot be used in the search string, which might weaken specific conclusions regarding the evolution of distinct algorithmic trends within the field. Additionally, keyword-based bibliometric retrieval may not fully capture studies using alternative terminologies for matching or AI methods. Author keyword heterogeneity limited granular quantification of algorithm-specific trends, necessitating complementary abstract-level analysis. Furthermore, synthetic thematic analysis provides interpretive depth but does not replicate the exhaustive appraisal of systematic reviews. Nonetheless, the triangulation of bibliometric mapping with thematic synthesis strengthens the credibility of the findings. Finally, AI-enhanced matching can create “perfect separation” between groups, destroying the overlap needed to find comparable patients and potentially pairing clinically different individuals who share only random statistical noise.
The future integration of AI into PSM in clinical research must transcend simple predictive modelling to embrace the “causal revolution.” By prioritizing automated and scalable AI frameworks for propensity score (PS) estimation, the field can address the inherent limitations of traditional logistic regression, particularly when dealing with high-dimensional confounders and non-linear relationships. The integration of Causal Machine Learning (CausalML) systems represents a pivotal shift; these systems do not merely identify associations but aim to estimate heterogeneous treatment effects, thereby facilitating a more granular approach to individualized patient care. To achieve this, the development of shared benchmark datasets is paramount. These datasets will serve as a “gold standard” for evaluating hybrid AI-PSM pipelines, ensuring that new algorithmic developments are validated against consistent metrics of balance and bias reduction. Furthermore, the adoption of multi-modal clinical data (genomics, medical imaging, and real-world evidence from electronic health records) will necessitate the use of federated architectures. These decentralized frameworks allow robust models to be trained across multiple institutions without direct data sharing, thereby preserving patient privacy while maximizing statistical power. As these methodologies mature, the focus must shift from in silico performance to prospective clinical validation. Three requirements follow from this shift. First, standardized protocols for “white-box” interpretability should be established so that the logic behind causal claims is auditable by clinical stakeholders. Second, algorithmic fairness must be ensured by implementing rigorous checks that prevent the propagation of historical biases inherent in clinical datasets, so that AI-PSM models perform equitably across diverse demographic strata. Finally, regulatory alignment should be pursued by collaborating with bodies such as the FDA and EMA to create frameworks that accommodate the dynamic nature of AI-driven causal inference in post-market surveillance and drug efficacy trials.
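Whatever model produces the propensity scores (logistic regression or a machine learning estimator), the matching step itself can be stated explicitly so that pipelines remain comparable and auditable. The following is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement; the caliper of 0.05 on the raw score scale, the function name, and the example scores are illustrative assumptions, not a protocol from the reviewed literature.

```python
def greedy_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Each control is used at most once. Returns a list of
    (treated_index, control_index) pairs; treated units with no control
    within the caliper are left unmatched.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for t_idx, t_score in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        c_idx = min(available, key=lambda j: (abs(ps_control[j] - t_score), j))
        if abs(ps_control[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            available.remove(c_idx)
    return pairs


# Hypothetical propensity scores.
ps_treated = [0.62, 0.35, 0.90]
ps_control = [0.30, 0.60, 0.33, 0.10]

pairs = greedy_match(ps_treated, ps_control)
print(pairs)
```

In this toy example the third treated unit (score 0.90) has no control within the caliper and stays unmatched, which illustrates how poor overlap reduces the matched sample. Greedy matching depends on the order of the treated units; optimal matching is an alternative when that matters.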

5. Conclusions

In conclusion, this study reviewed and synthesized the current scientific literature to present the state of integration and use of AI in the domain of propensity score matching. This synthetic knowledge review revealed several clear research themes, which demonstrate the bidirectional and complementary relationship between AI and propensity score matching and their combined use in prognosis and diagnostics. The integration of AI into propensity score matching proved integral to improving diagnostic accuracy and the understanding of model reasoning, and it helped overcome limitations intrinsic to propensity score methods. On the other hand, critical shortcomings were outlined, for instance the need for advanced techniques that integrate AI seamlessly into propensity score matching, the need for clarity and interpretability of AI models, the need for robust solutions for dealing with unmeasured confounding factors, and the need for standardized evaluation. Another major problem, which remains one of the greatest obstacles to AI use in medicine, is the ethical concern. Future research will have to address these critical points: integrating AI, improving robustness, and formulating ethical principles that make AI safe and acceptable. These new approaches will have to be implemented and verified on different longitudinal medical datasets, since the synergistic use of AI and propensity score matching opens new avenues for progress in the field of precision medicine. The main goal must be to improve decision-making in healthcare and to optimize patient outcomes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16031524/s1, Table S1: Most prolific Core Journals.

Author Contributions

Writing—review and editing, Writing—original draft, Supervision, Conceptualization: P.K., B.Ž. and T.Z.; Data analysis, Methodology development, Visualization: P.K., H.B.V. and B.Ž.; Writing—review and editing, Supervision: J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dang, A. Real-World Evidence: A Primer. Pharm. Med. 2023, 37, 25–36.
  2. Li, Q.; Lin, J.; Chi, A.; Davies, S. Practical Considerations of Utilizing Propensity Score Methods in Clinical Development Using Real-World and Historical Data. Contemp. Clin. Trials 2020, 97, 106123.
  3. Rivas, J.G.; Kraft, P.; Evans-Axelsson, S.; Hijazy, A.; Beyer, K.; De Meulder, B.; Liu, A.Q.; Golozar, A.; Harbachou, A.; Feng, Q.; et al. Real-World Evidence on Baseline Characteristics and Treatment in Metastatic Hormone-Sensitive Prostate Cancer: Findings from the PIONEER 2.0 Big Data Investigation Group. Eur. Urol. Open Sci. 2025, 81, 82–91.
  4. Al-Antari, M.A. Artificial Intelligence for Medical Diagnostics—Existing and Future AI Technology. Diagnostics 2023, 13, 688.
  5. Artificial Intelligence Meets Medical Robotics|Science. Available online: https://www.science.org/doi/full/10.1126/science.adj3312?casa_token=HoLADs-riL4AAAAA%3AlU3aQJbwQEQy0iPYzPU33NHeoF8CLJxIq8kJonOrHDAyKUZ1yYmEgCiA1wbPSyJFsiEKks2hnpeys2U (accessed on 13 December 2024).
  6. Bonkhoff, A.K.; Grefkes, C. Precision Medicine in Stroke: Towards Personalized Outcome Predictions Using Artificial Intelligence. Brain 2022, 145, 457–475.
  7. Briganti, G.; Le Moine, O. Artificial Intelligence in Medicine: Today and Tomorrow. Front. Med. 2020, 7, 27.
  8. Liao, J.; Li, X.; Gan, Y.; Han, S.; Rong, P.; Wang, W.; Li, W.; Zhou, L. Artificial Intelligence Assists Precision Medicine in Cancer Treatment. Front. Oncol. 2023, 12, 998222.
  9. Muehlematter, U.J.; Daniore, P.; Vokinger, K.N. Approval of Artificial Intelligence and Machine Learning-Based Medical Devices in the USA and Europe (2015–2020): A Comparative Analysis. Lancet Digit. Health 2021, 3, e195–e203.
  10. Shick, A.A.; Webber, C.M.; Kiarashi, N.; Weinberg, J.P.; Deoras, A.; Petrick, N.; Saha, A.; Diamond, M.C. Transparency of Artificial Intelligence/Machine Learning-Enabled Medical Devices. npj Digit. Med. 2024, 7, 21.
  11. Tian, M.; Shen, Z.; Wu, X.; Wei, K.; Liu, Y. The Application of Artificial Intelligence in Medical Diagnostics: A New Frontier. Acad. J. Sci. Technol. 2023, 8, 57–61.
  12. van de Sande, D.; Van Genderen, M.E.; Smit, J.M.; Huiskens, J.; Visser, J.J.; Veen, R.E.R.; van Unen, E.; BA, O.H.; Gommers, D.; van Bommel, J. Developing, Implementing and Governing Artificial Intelligence in Medicine: A Step-by-Step Approach to Prevent an Artificial Intelligence Winter. BMJ Health Care Inform. 2022, 29, e100495.
  13. Lu, Y.; Jin, J.; Zhang, H.; Lu, Q.; Zhang, Y.; Liu, C.; Liang, Y.; Tian, S.; Zhao, Y.; Fan, H. Traumatic Brain Injury: Bridging Pathophysiological Insights and Precision Treatment Strategies. Neural Regen. Res. 2026, 21, 887–907.
  14. Xiong, X.; Zheng, L.-W.; Ding, Y.; Chen, Y.-F.; Cai, Y.-W.; Wang, L.-P.; Huang, L.; Liu, C.-C.; Shao, Z.-M.; Yu, K.-D. Breast Cancer: Pathogenesis and Treatments. Signal Transduct. Target. Ther. 2025, 10, 49.
  15. Katip, W.; Rayanakorn, A.; Oberdorfer, P.; Taruangsri, P.; Nampuan, T. Short versus Long Course of Colistin Treatment for Carbapenem-Resistant A. baumannii in Critically Ill Patients: A Propensity Score Matching Study. J. Infect. Public Health 2023, 16, 1249–1255.
  16. Krenzien, F.; Schmelzle, M.; Pratschke, J.; Feldbrügge, L.; Liu, R.; Liu, Q.; Zhang, W.; Zhao, J.J.; Tan, H.-L.; Cipriani, F.; et al. Propensity Score-Matching Analysis Comparing Robotic Versus Laparoscopic Limited Liver Resections of the Posterosuperior Segments: An International Multicenter Study. Ann. Surg. 2024, 279, 297–305.
  17. Langworthy, B.; Wu, Y.; Wang, M. An Overview of Propensity Score Matching Methods for Clustered Data. Stat. Methods Med. Res. 2023, 32, 641–655.
  18. Meneguzzo, P.; Antoniades, A.; Garolla, A.; Tozzi, F.; Todisco, P. Predictors of Psychopathology Response in Atypical Anorexia Nervosa Following Inpatient Treatment: A Propensity Score Matching Study of Weight Suppression and Weight Loss Speed. Int. J. Eat. Disord. 2024, 57, 1002–1007.
  19. Wang, S.V.; Schneeweiss, S.; Franklin, J.M.; Desai, R.J.; Feldman, W.; Garry, E.M.; Glynn, R.J.; Lin, K.J.; Paik, J.; Patorno, E.; et al. Emulation of Randomized Clinical Trials with Nonrandomized Database Analyses. JAMA 2023, 329, 1376–1385.
  20. Zhu, P.; Liao, W.; Zhang, W.-G.; Chen, L.; Shu, C.; Zhang, Z.-W.; Huang, Z.-Y.; Chen, Y.-F.; Lau, W.Y.; Zhang, B.-X.; et al. A Prospective Study Using Propensity Score Matching to Compare Long-Term Survival Outcomes After Robotic-Assisted, Laparoscopic, or Open Liver Resection for Patients with BCLC Stage 0-A Hepatocellular Carcinoma. Ann. Surg. 2023, 277, e103–e111.
  21. Jochum, F.; Dumas, É.; Gougis, P.; Hamy, A.-S.; Querleu, D.; Lecointre, L.; Gaillard, T.; Reyal, F.; Lecuru, F.; Laas, E.; et al. Survival Outcomes of Primary vs. Interval Cytoreductive Surgery for International Federation of Gynecology and Obstetrics Stage IV Ovarian Cancer: A Nationwide Population-Based Target Trial Emulation. Am. J. Obstet. Gynecol. 2025, 232, 194.e1–194.e11.
  22. Yang, S.; Hussain, M.; Ammar Zahid, R.M.; Maqsood, U.S. The Role of Artificial Intelligence in Corporate Digital Strategies: Evidence from China. Kybernetes 2025, 54, 3062–3082.
  23. Park, J.-B.; Bae, J.H. Effectiveness of a Novel Artificial Intelligence-Assisted Colonoscopy System for Adenoma Detection: A Prospective, Propensity Score-Matched, Non-Randomized Controlled Study in Korea. Clin. Endosc. 2025, 58, 112–120.
  24. Benedetto, U.; Head, S.J.; Angelini, G.D.; Blackstone, E.H. Statistical Primer: Propensity Score Matching and Its Alternatives. Eur. J. Cardio-Thorac. Surg. 2018, 53, 1112–1117.
  25. Kim, D.W. Statistical Methods for Baseline Adjustment and Cohort Analysis in Korean National Health Insurance Claims Data: A Review of PSM, IPTW, and Survival Analysis with Future Directions. J. Korean Med. Sci. 2025, 40, e110.
  26. Ghimire, L.; Waller, E. The Future of Health Physics: Trends, Challenges, and Innovation. Health Phys. 2025, 128, 167–189.
  27. Xiao, X.; Alharbi, K.; Zhang, P.; Qin, H.; Yue, X. Bayesian Federated Causal Inference and Its Application in Manufacturing. J. Intell. Manuf. 2025.
  28. Hennecken, J. Predicting Subclinical Atrial Fibrillation Using Artificial Intelligence and Validate Using Propensity-Score Matching and Explainable AI. Master’s Thesis, Utrecht University, Utrecht, The Netherlands, 2024. Available online: https://studenttheses.uu.nl/handle/20.500.12932/47904 (accessed on 25 January 2026).
  29. Ishiyama, M.; Kudo, S.; Misawa, M.; Mori, Y.; Maeda, Y.; Ichimasa, K.; Kudo, T.; Hayashi, T.; Wakamura, K.; Miyachi, H.; et al. Impact of the Clinical Use of Artificial Intelligence–Assisted Neoplasia Detection for Colonoscopy: A Large-Scale Prospective, Propensity Score–Matched Study (with Video). Gastrointest. Endosc. 2022, 95, 155–163.
  30. Kim, H.; Choi, J.S.; Kim, K.; Ko, E.S.; Ko, E.Y.; Han, B.-K. Effect of Artificial Intelligence–Based Computer-Aided Diagnosis on the Screening Outcomes of Digital Mammography: A Matched Cohort Study. Eur. Radiol. 2023, 33, 7186–7198.
  31. Prosperi, M.; Ghosh, S.; Chen, Z.; Salemi, M.; Lyu, T.; Zhao, J.; Bian, J. Causal AI with Real World Data: Do Statins Protect from Alzheimer’s Disease Onset? In Proceedings of the 5th International Conference on Medical and Health Informatics, Kyoto, Japan, 14–16 May 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 296–303.
  32. Karim, M.E. Can Supervised Deep Learning Architecture Outperform Autoencoders in Building Propensity Score Models for Matching? BMC Med. Res. Methodol. 2024, 24, 167.
  33. Lourenço, L.; Weber, L.; Garcia, L.; Ramos, V.; Souza, J. Machine Learning Algorithms to Estimate Propensity Scores in Health Policy Evaluation: A Scoping Review. Int. J. Environ. Res. Public Health 2024, 21, 1484.
  34. Whata, A.; Chimedza, C. Evaluating Uses of Deep Learning Methods for Causal Inference. IEEE Access 2022, 10, 2813–2827.
  35. Kokol, P.; Kokol, M.; Zagoranski, S. Machine Learning on Small Size Samples: A Synthetic Knowledge Synthesis. Sci. Prog. 2022, 105, 00368504211029777.
  36. Kokol, P. Synthetic Knowledge Synthesis in Hospital Libraries. J. Hosp. Libr. 2024, 24, 10–17.
  37. Van Eck, N.J.; Waltman, L. Software Survey: VOSviewer, a Computer Program for Bibliometric Mapping. Scientometrics 2010, 84, 523–538.
  38. Aria, M.; Cuccurullo, C. Bibliometrix: An R-Tool for Comprehensive Science Mapping Analysis. J. Informetr. 2017, 11, 959–975.
  39. Austin, P.C. Comparing Paired vs. Non-paired Statistical Methods of Analyses When Making Inferences About Absolute Risk Reductions in Propensity-Score Matched Samples. Stat. Med. 2011, 30, 1292–1301.
  40. Austin, P.C.; Small, D.S. The Use of Bootstrapping When Using Propensity-Score Matching Without Replacement: A Simulation Study. Stat. Med. 2014, 33, 4306–4319.
  41. Scimago Journal & Country Rank. Available online: https://www.scimagojr.com/ (accessed on 16 November 2025).
  42. Islam, N.; Islam, S.; Roy, P.B. A Bibliometric Technique for Analyzing Trends in Public Health Research. Data Sci. Inf. 2024, 4, 89–103.
  43. Xie, Y.; Shen, H.; Xu, Q.; Tu, C.; Yang, R.; Liu, T.; Tang, H.; Miao, Z.; Zhang, J. Evaluating Coronary Arteries and Predicting MACEs Using CCTA in Lung Cancer Patients Receiving Chemotherapy or Chemoradiotherapy. Radiother. Oncol. 2024, 200, 110498.
  44. Lim, J.; Choi, Y.-J.; Kim, B.S.; Rhee, T.-M.; Lee, H.-J.; Han, K.-D.; Park, J.-B.; Na, J.O.; Kim, Y.-J.; Lee, H.; et al. Comparative Cardiovascular Outcomes in Type 2 Diabetes Patients Taking Dapagliflozin Versus Empagliflozin: A Nationwide Population-Based Cohort Study. Cardiovasc. Diabetol. 2023, 22, 188.
  45. Squiccimarro, E.; Lorusso, R.; Consiglio, A.; Labriola, C.; Haumann, R.G.; Piancone, F.; Speziale, G.; Whitlock, R.P.; Paparella, D. Impact of Inflammation After Cardiac Surgery on 30-Day Mortality and Machine Learning Risk Prediction. J. Cardiothorac. Vasc. Anesth. 2025, 39, 683–691.
  46. Ngufor, C.; Zhang, N.; Van Houten, H.K.; Holmes, D.R.; Graff-Radford, J.; Alkhouli, M.; Friedman, P.A.; Noseworthy, P.A.; Yao, X. Causal Machine Learning for Left Atrial Appendage Occlusion in Patients with Atrial Fibrillation. JACC Clin. Electrophysiol. 2025, 11, 977–986.
  47. Pettus, J.; Roussel, R.; Liz Zhou, F.; Bosnyak, Z.; Westerbacka, J.; Berria, R.; Jimenez, J.; Eliasson, B.; Hramiak, I.; Bailey, T.; et al. Rates of Hypoglycemia Predicted in Patients with Type 2 Diabetes on Insulin Glargine 300 U/ML Versus First- and Second-Generation Basal Insulin Analogs: The Real-World LIGHTNING Study. Diabetes Ther. 2019, 10, 617–633.
  48. Kumar, S.; Gupta, M.P.; Dekker, A.L.; Bermejo, I.; Kar, S. Development and Validation of Multicenter Study on Novel Artificial Intelligence Based Cardiovascular Risk Score (AICVD). Res. Sq. 2021.
  49. Wang, Z.; Zhang, L.; Chao, Y.; Xu, M.; Geng, X.; Hu, X. Development of a machine learning model for predicting 28-day mortality of septic patients with atrial fibrillation. Shock 2023, 59, 400–408.
  50. Ruan, H.; Ran, X.; Li, S.; Zhang, Q. Dyslipidemia Versus Obesity as Predictors of Ischemic Stroke Prognosis: A Multi-Center Study in China. Lipids Health Dis. 2024, 23, 72.
  51. Liang, H.; Pan, K.; Wang, J.; Lin, J. Association between Neutrophil Percentage-to-Albumin Ratio and Breast Cancer in Adult Women in the US: Findings from the NHANES. Front. Nutr. 2025, 12, 1533636.
  52. Gao, Z.; Winhusen, T.J.; Gorenflo, M.; Ghitza, U.E.; Davis, P.B.; Kaelber, D.C.; Xu, R. Repurposing Ketamine to Treat Cocaine Use Disorder: Integration of Artificial Intelligence-Based Prediction, Expert Evaluation, Clinical Corroboration and Mechanism of Action Analyses. Addiction 2023, 118, 1307–1319.
  53. Pundi, K.; Fan, J.; Kabadi, S.; Din, N.; Blomström-Lundqvist, C.; Camm, A.J.; Kowey, P.; Singh, J.P.; Rashkin, J.; Wieloch, M.; et al. Dronedarone Versus Sotalol in Antiarrhythmic Drug-Naive Veterans with Atrial Fibrillation. Circ. Arrhythmia Electrophysiol. 2023, 16, 456–467.
  54. Qu, J.; Li, C.; Liu, M.; Wang, Y.; Feng, Z.; Li, J.; Wang, W.; Wu, F.; Zhang, S.; Zhao, X. Prognostic Models Using Machine Learning Algorithms and Treatment Outcomes of Occult Breast Cancer Patients. J. Clin. Med. 2023, 12, 3097.
  55. Park, S.W.; Park, Y.-L.; Lee, E.-G.; Chae, H.; Park, P.; Choi, D.-W.; Choi, Y.H.; Hwang, J.; Ahn, S.; Kim, K.; et al. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers 2024, 16, 3799.
  56. Hu, J.; Gong, N.; Li, D.; Deng, Y.; Chen, J.; Luo, D.; Zhou, W.; Xu, K. Identifying Hepatocellular Carcinoma Patients with Survival Benefits from Surgery Combined with Chemotherapy: Based on Machine Learning Model. World J. Surg. Oncol. 2022, 20, 377.
  57. Huang, C.; Liu, Z.; Xiao, L.; Xia, Y.; Huang, J.; Luo, H.; Zong, Z.; Zhu, Z. Clinical Significance of Serum CA125, CA19-9, CA72-4, and Fibrinogen-to-Lymphocyte Ratio in Gastric Cancer with Peritoneal Dissemination. Front. Oncol. 2019, 9, 1159.
  58. Xu, S.; Xiang, C.; Wu, J.; Teng, Y.; Wu, Z.; Wang, R.; Lu, B.; Zhan, Z.; Wu, H.; Zhang, J. Tongue Coating Bacteria as a Potential Stable Biomarker for Gastric Cancer Independent of Lifestyle. Dig. Dis. Sci. 2021, 66, 2964–2980.
  59. Makhnevich, A.; Perrin, A.; Talukder, D.; Liu, Y.; Izard, S.; Chiuzan, C.; D’Angelo, S.; Affoo, R.; Rogus-Pulia, N.; Sinvani, L. Thick Liquids and Clinical Outcomes in Hospitalized Patients with Alzheimer Disease and Related Dementias and Dysphagia. JAMA Intern. Med. 2024, 184, 778–785.
  60. Digumarthi, V.; Amin, T.; Kanu, S.; Mathew, J.; Edwards, B.; Peterson, L.A.; Lundy, M.E.; Hegarty, K.E. Preoperative Prediction Model for Risk of Readmission After Total Joint Replacement Surgery: A Random Forest Approach Leveraging NLP and Unfairness Mitigation for Improved Patient Care and Cost-Effectiveness. J. Orthop. Surg. Res. 2024, 19, 287.
  61. Pimentel, S.D.; Yu, R. Re-Evaluating the Impact of Hormone Replacement Therapy on Heart Disease Using Match-Adaptive Randomization Inference. arXiv 2024, arXiv:2403.01330.
  62. Feller, D.J.; Zucker, J.; Yin, M.T.; Gordon, P.; Elhadad, N. Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment. J. Acquir. Immune Defic. Syndr. 2018, 77, 160–166.
  63. Zoccali, C.; Tripepi, G. Clinical Trial Emulation in Nephrology. J. Nephrol. 2024, 38, 11–23.
  64. Patel, S.S.; Raman, V.K.; Zhang, S.; Sheriff, H.M.; Fonarow, G.C.; Heidenreich, P.A.; Faselis, C.; Lam, P.H.; Morgan, C.J.; Moore, H.; et al. Renin Angiotensin Inhibition and Lower Risk of Kidney Failure in Patients with Heart Failure. Am. J. Med. 2025, 138, 1384–1393.e5.
  65. Inoue, K.; Seeman, T.E.; Horwich, T.; Budoff, M.J.; Watson, K.E. Heterogeneity in the Association between the Presence of Coronary Artery Calcium and Cardiovascular Events: A Machine-Learning Approach in the MESA Study. Circulation 2023, 147, 132–141.
  66. Pietropaoli, D.; Monaco, A.; D’Aiuto, F.; Aguilera, E.M.; Ortu, E.; Giannoni, M.; Czesnikiewicz-Guzik, M.; Guzik, T.J.; Ferri, C.; Del Pinto, R.D. Active Gingival Inflammation Is Linked to Hypertension. J. Hypertens. 2020, 38, 2018–2027.
  67. Fu, S.; Chen, L.; Lin, H.; Jiang, X.; Zhang, S.; Zhong, F.; Liu, D. Prediction Model for Delayed Behavior of Early Ambulation After Surgery for Varicose Veins of the Lower Extremity: A Prospective Case-Control Study. Arch. Phys. Med. Rehabil. 2024, 105, 1908–1920.
  68. Krishnamurthy, S.; Kapeleshh, K.S.; Dovgan, E.; Luštrek, M.; Gradišek Piletič, B.; Srinivasan, K.; Li, Y.-C.; Gradišek, A.; Syed-Abdul, S. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare 2021, 9, 546.
  69. Ghosh, S.; Bian, J.; Guo, Y.; Prosperi, M. Deep Propensity Network Using a Sparse Autoencoder for Estimation of Treatment Effects. J. Am. Med. Inform. Assoc. 2021, 28, 1197–1206.
  70. Luo, Q.; Zheng, Z.; Luo, W.; Zhu, J. Development and External Validation of Interpretable Machine Learning Models for Personalized Multiple Treatment Recommendations in Non-Small Cell Lung Cancer. Int. J. Med. Inform. 2026, 206, 106160.
  71. Weymann, D.; Chan, B.; Regier, D.A. Genetic Matching for Time-Dependent Treatments: A Longitudinal Extension and Simulation Study. BMC Med. Res. Methodol. 2023, 23, 181.
  72. Cui, X.; Shi, Y.; He, X.; Zhang, M.; Zhang, H.; Yang, J.; Leng, Y. Abdominal Physical Examinations in Early Stages Benefit Critically Ill Patients without Primary Gastrointestinal Diseases: A Retrospective Cohort Study. Front. Med. 2024, 11, 1338061.
  73. Chen, M.; Yang, J.; Lu, J.; Zhou, Z.; Huang, K.; Zhang, S.; Yuan, G.; Zhang, Q.; Li, Z. Ureteral Calculi Lithotripsy for Single Ureteral Calculi: Can DNN-Assisted Model Help Preoperatively Predict Risk Factors for Sepsis? Eur. Radiol. 2022, 32, 8540–8549.
  74. Colaneri, M.; Fama, F.; Fassio, F.; Holmes, D.; Scaglione, G.; Mariani, C.; Galli, L.; Lai, A.; Antinori, S.; Gori, A.; et al. Impact of Early Antiviral Therapy on SARS-CoV-2 Clearance Time in High-Risk COVID-19 Subjects: A Propensity Score Matching Study. Int. J. Infect. Dis. 2024, 149, 107265.
  75. Khan, S.; Ali, H.; Shah, Z. Identifying the Role of Vision Transformer for Skin Cancer—A Scoping Review. Front. Artif. Intell. 2023, 6, 1202990.
Figure 1. Procedure for STA [37].
Figure 2. Number of publications over time.
Figure 3. Average number of citations over time.
Figure 4. Most productive corresponding authors’ countries in terms of cumulative, international, and domestic publications.
Figure 5. Presentation of core sources using Bradford’s Law.
Figure 6. The author keyword network.
Figure 7. Mapping of trend topics based on author keywords.
Figure 8. Bibliometric quadrant: niche, emerging, motor, and basic themes.
Table 1. Research rankings of the most productive countries according to Scimago [41].
| Country | Rank All Disciplines | Rank in Medicine | Rank in Artificial Intelligence | Rank in Statistics and Probability |
|---|---|---|---|---|
| China | 2 | 2 | 1 | 2 |
| United States | 1 | 1 | 2 | 1 |
| South Korea | 13 | 14 | 12 | 16 |
| Japan | 5 | 5 | 4 | 10 |
| Germany | 4 | 4 | 6 | 4 |
| France | 7 | 7 | 7 | 5 |
| Canada | 9 | 8 | 9 | 8 |
| Italy | 8 | 6 | 8 | 5 |
| India | 6 | 9 | 3 | 7 |
Table 2. The bibliometric profile of journals publishing four or more papers.
| Journal Name | SNIP | Quartile | Research Area |
|---|---|---|---|
| Frontiers in Oncology | 0.831 | 2 | Cancer Research; Oncology |
| Frontiers in Cardiovascular Medicine | 0.742 | 2 | Cardiology and Cardiovascular Medicine |
| BMC Infectious Diseases | 1.106 | 1 | Infectious Diseases |
| Frontiers in Public Health | 0.938 | 2 | Public Health, Environmental and Occupational Health |
| Journal of Clinical Medicine | 1.022 | 1 | Medicine (all) |
| JMIR Medical Informatics | 1.035 | 2 | Health Information Management; Health Informatics |
| Cancers | 1.030 | 2 | Cancer Research; Oncology |
| Frontiers in Pharmacology | 0.999 | 1 | Pharmacology; Pharmacology (medical) |
| BMC Public Health | 1.386 | 1 | Public Health, Environmental and Occupational Health |
| European Radiology | 1.775 | 1 | Radiology, Nuclear Medicine and Imaging |
| Frontiers in Endocrinology | 1.122 | 2 | Endocrinology, Diabetes and Metabolism |
| Frontiers in Medicine | 0.879 | 1 | Medicine (all) |
Table 3. Themes, subthemes, and prolific publications.
| Theme | Prolific Author Keyword Association Sub-Networks | Publications Describing AI Use in Combination with PSM | Publications Describing PSM Use in AI |
|---|---|---|---|
| Prediction (blue, 14 author keywords) | Cardiovascular diseases—Diabetes mellitus | [43,44,45] | [47,48] |
| Prediction (blue, 14 author keywords) | Atrial fibrillation—sepsis-prediction | [46] | [49,50] |
| Cancer management (red, 15 author keywords) | Breast cancer—SEER | [51] | [54,55] |
| Cancer management (red, 15 author keywords) | Hepatocellular carcinoma—SEER—chemotherapy-survival | - | [56] |
| Cancer management (red, 15 author keywords) | Gastric cancer—random forest | - | [57,58] |
| Cancer management (red, 15 author keywords) | Natural language model, prediction modelling | [52,53] | [59,60] |
| Diagnosing (green, 14 author keywords) | Coronary heart diseases—diagnosis | [61] | [65,66] |
| Diagnosing (green, 14 author keywords) | Diagnosis—Intensive care unit—Public health | [62,63] | [67] |
| Diagnosing (green, 14 author keywords) | Chronic kidney disease—Electronic health record | [64] | [68] |
| Deep learning (yellow, 7 author keywords) | Causal inference—Big data—deep learning | [69,70] | [72] |
| Deep learning (yellow, 7 author keywords) | Monte Carlo simulation | [71] | - |
| Deep learning (yellow, 7 author keywords) | Computer tomography—deep learning | - | [73] |
Table 4. The synthesis of publications concerned with identified themes.
Theme | Author Keyword Association Sub-Networks | Synthesis of the High-Impact Publications Identified in Table 3
Prediction | Cardiovascular diseases—Diabetes mellitus | Binomial regression models and random forest regression were applied to a dataset of high-risk COVID-19 subjects after PSM based on whether or not they had been treated early. Inclusion criteria were age over 65 years, solid or haematological cancer, chronic kidney disease, chronic liver disease, chronic lung disease, uncontrolled diabetes, neurological disease, cardiovascular disease, obesity, cerebrovascular disease, or an immunocompromised state (AIDS, solid organ or blood stem cell transplantation, and all conditions requiring corticosteroids or other immunosuppressive medications) [74].

In a cohort study, cardiovascular outcomes between new and existing users of dapagliflozin and empagliflozin in type 2 diabetes patients were compared. Using a Korean cohort dataset, the authors employed a nearest-neighbours machine learning approach for propensity score matching prior to statistical analysis [44].

The LIGHTNING study modelled, predicted, and compared hypoglycaemia rates in people with type 2 diabetes, comparing patients using first- or second-generation insulin preparations. During the analysis, the authors first used conventional PSM and then advanced machine learning [47].

A large-scale Indian patient database was analyzed using the Spearman correlation coefficient and deep learning to build a hazard model that predicted CVD events and their time of occurrence, reportedly with good performance. PSM was used to match patients with and without CVD [48].

The utility of coronary computed tomography angiography was investigated for detecting cancer treatment-related coronary artery impairments and predicting major adverse cardiovascular events in lung cancer patients undergoing chemotherapy or chemoradiotherapy. The methodology involved (1) AI-driven image recognition for initial assessment, (2) PSM to match patients with and without carcinoma, and (3) Cox regression modelling to evaluate differences in survival rates [43].

The authors examined the impact of systemic inflammatory response syndrome on 30-day mortality after cardiac surgery and developed predictive machine learning models. PSM was used to balance the training set [45].
Prediction | Atrial fibrillation—sepsis-prediction | A model to predict the risk of mortality in septic patients with atrial fibrillation was developed using different ML algorithms. PSM was used to reduce the imbalance between the external and internal validation data sets [49].

Five different ML algorithms were used to determine whether dyslipidaemia or obesity contributes more towards unfavourable clinical outcomes in patients suffering a first-ever ischemic stroke. PSM was employed to ascertain associations between indicators and prognosis [50].

PSM and a causal machine learning framework were applied to predict heterogeneous treatment effects of left atrial appendage occlusion (LAAO) versus direct oral anticoagulants (DOAC) in atrial fibrillation patients, enabling AI-driven individualized benefit estimation for improved patient selection and clinical decision-making [46].
Cancer managementBreast cancer—SEER (Surveillance, Epidemiology, and End Results) databaseSEER database was used to identify the prognostic variables for patients with occult breast cancer, which is an uncommon malignant tumour for which the prognosis and treatment remain a controversial topic. Cox regression analysis was performed to construct prognostic models with the help of six machine learning algorithms to predict overall survival. The authors further examined the impact of chemotherapy and surgery on survival outcomes in occult breast cancer patients stratified by molecular subtype, utilizing Kaplan–Meier survival analysis and propensity score matching. These findings were subsequently validated through subgroup Cox regression analysis [54].
 
South Korean investigators used machine learning-based risk factor detection and breast cancer mortality prediction with the Shappley Additive Explanation, which is an explainable artificial intelligence technique, to identify and interpret key features that have a significant impact on breast cancer mortality. To enhance the robustness and generalizability of their primary findings and balance the baseline covariates, they employed an exposure-driven 1:3 PSM analysis while minimizing a logistic regression model with the implications of potential confounders [55].
 
A total of 18,726 participants were examined to assess the relationship between breast cancer prevalence and the neutrophil-percentage-to-albumin ratio. The study revealed a significant positive association, potentially mediated by sex hormone levels, which was validated through advanced multivariate, subgroup, and PSM analyses [51].
Hepatocellular carcinoma - SEER (Surveillance, Epidemiology, and End Results) database - chemotherapy - survival
Patients diagnosed with hepatocellular carcinoma were identified in the Surveillance, Epidemiology, and End Results database. The authors first conducted univariate and multivariate logistic regression analyses to assess prognostic factors, then developed a 5-year survival risk prediction model using classical decision tree methodology. To address potential confounding variables related to chemotherapy administration, PSM was used for both high-risk and low-risk patient cohorts [56].
Gastric cancer - random forests
Clinical data from 391 gastric cancer patients were analyzed using PSM. The authors subsequently performed both univariate and multivariate conditional logistic regression analyses. Their methodology further incorporated classification tree analysis to establish decision rules, followed by random forest algorithm implementation to extract significant risk factors for peritoneal dissemination in gastric cancer [57].
 
The association of the tongue coating microbiota with serum metabolic features and inflammatory cytokines in gastric cancer patients was explored to identify potential non-invasive biomarkers for diagnosing gastric cancer. The tongue coating microbiota was profiled by 16S rRNA and 18S rRNA gene sequencing in the original population of 181 patients and 112 healthy controls. PSM was used to eliminate potential confounders, including age, gender, and six lifestyle factors, and a matched population was created. A random forest model was used for diagnosis classification [58].
Natural language model prediction modelling
An innovative integrated strategy was developed to identify FDA-approved drugs for repurposing in cocaine use disorder treatment. The study combined AI-driven drug prediction with clinical validation through the National Drug Abuse Treatment Clinical Trials Network, incorporating expert panel review and mechanistic action analysis. Based on combined AI prioritization and clinical expertise, ketamine emerged as the top candidate for further evaluation. The team conducted electronic health record analysis comparing outcomes in patients prescribed ketamine (for anaesthesia/depression) against PSM-identified controls receiving alternative treatments [52].
 
PSM was used to balance the covariates across two groups of hospitalized patients with Alzheimer’s disease and related dementias and oropharyngeal dysphagia, defined by whether at least 75% of their hospital diet consisted of thick liquids or of thin liquids. Machine learning was used to predict hospital outcomes such as mortality, length of stay, and complications [59].
 
Data from 38,581 shoulder and hip replacement patients were analyzed to develop a random forest model predicting 30-day post-discharge outcomes (emergency department visits, unplanned readmissions, or discharge to skilled nursing facilities). The study incorporated 98 features spanning laboratory results, diagnoses, vital signs, medications, and utilization history. A fine-tuned ClinicalBERT NLP model was used to generate risk scores from clinical notes. To address potential biases, the methodology combined PSM with comprehensive feature bias analysis, implementing the Fairlearn toolkit’s threshold optimization to mitigate gender and payer-related prediction disparities [75].
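The idea behind threshold-based bias mitigation can be sketched in a few lines: instead of applying one risk cut-off to everyone, a separate cut-off is chosen per group so that a fairness criterion is approximately met. The example below is a simplified stand-in written in plain Python, not Fairlearn's own algorithm or API. The function names, the choice of equal selection rates as the criterion, the target rate, and the scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def selection_rate(scores: Sequence[float], threshold: float) -> float:
    """Share of subjects flagged as high risk at a given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def group_thresholds(
    scores_by_group: Dict[str, List[float]], target_rate: float
) -> Dict[str, float]:
    """Per group, pick the observed score whose selection rate is closest to target_rate."""
    thresholds = {}
    for group, scores in scores_by_group.items():
        candidates = sorted(set(scores))
        thresholds[group] = min(
            candidates, key=lambda t: abs(selection_rate(scores, t) - target_rate)
        )
    return thresholds


# Hypothetical model risk scores for two patient groups.
risk_scores = {
    "group_a": [0.10, 0.20, 0.60, 0.80],
    "group_b": [0.30, 0.50, 0.70, 0.90],
}
single_cutoff_rates = {g: selection_rate(s, 0.5) for g, s in risk_scores.items()}
chosen = group_thresholds(risk_scores, target_rate=0.5)
rates = {g: selection_rate(risk_scores[g], t) for g, t in chosen.items()}
```

With a single cut-off of 0.5 the two groups are flagged at different rates, whereas the group-specific cut-offs bring both to the same rate. A real analysis would typically optimize thresholds against a chosen fairness constraint while tracking predictive performance.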
 
Natural language processing on clinical records of antiarrhythmic drug-naïve patients from the Veterans Health Administration database was used to identify and compare baseline left ventricular ejection fraction between treatments with different drugs. PSM on patient demographics, comorbidities, and medications was combined with Cox regression to compare treatments. A falsification analysis with non-plausible outcomes was performed to evaluate residual confounding [53].
Diagnosing
Coronary heart diseases - diagnosis
A study examined whether the predictive value of coronary artery calcium (CAC) varies across demographic subgroups in the Multi-Ethnic Study of Atherosclerosis cohort. After using PSM, the team employed causal forest modelling to (1) quantify heterogeneity in the associations between CAC and cardiovascular disease (CVD), and (2) predict individualized 10-year CVD risk increases. These machine learning estimates were subsequently benchmarked against absolute 10-year risks calculated via the 2013 ACC/AHA pooled cohort equations [65].
 
A recent study explored whether gingival bleeding, a simple clinical indicator of periodontal disease, might serve as a marker for hypertension. Given the established link between cardiovascular diseases and systemic inflammation, with periodontitis potentially exacerbating this inflammatory burden, researchers analyzed data from 5396 adults aged ≥30 years who completed both blood pressure assessments and periodontal exams. Using survey-based PSM that accounted for key confounders shared by hypertension and periodontal disease, the authors created matched cohorts with and without gingival bleeding. The analysis employed generalized additive models adjusted for inflammatory markers to evaluate associations between bleeding gums and both systolic blood pressure and uncontrolled hypertension. Further stratification by periodontal status (healthy, gingivitis, stable periodontitis, unstable periodontitis) provided additional insights, while machine learning techniques helped determine variable importance in these relationships [66].
 
A new algorithm was developed to improve propensity score matching by correctly sampling treatment distributions while accounting for “Z-dependence.” Unlike fixed-pair designs, it addresses how post-treatment matching changes with each permutation, ensuring valid inference where traditional methods fail due to fluid matched sets [61].
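The dependence of the matched set on the treatment vector Z can be made concrete with a toy example. The sketch below is not the algorithm from [61]. It only shows that re-running a simple greedy 1:1 match after permuting the treatment labels yields different pairs, which is why holding the pairs fixed across permutations can misrepresent the design. The function `match_pairs` and the data are hypothetical.

```python
from typing import List, Tuple


def match_pairs(z: List[int], x: List[float]) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching of treated (z=1) to control (z=0) units on x.

    Returns (treated_index, control_index) pairs.
    """
    controls = [i for i, zi in enumerate(z) if zi == 0]
    pairs = []
    for t in (i for i, zi in enumerate(z) if zi == 1):
        if not controls:
            break
        # Closest still-unmatched control on the covariate.
        c = min(controls, key=lambda j: abs(x[j] - x[t]))
        controls.remove(c)
        pairs.append((t, c))
    return pairs


covariate = [0.1, 0.2, 0.5, 0.6, 0.9]
observed_z = [1, 0, 1, 0, 0]
permuted_z = [0, 1, 0, 0, 1]  # same number of treated units, different assignment

observed_pairs = match_pairs(observed_z, covariate)
permuted_pairs = match_pairs(permuted_z, covariate)
```

Here the observed assignment leaves unit 4 unmatched, while the permuted assignment leaves unit 2 unmatched, so the matched sample itself is a function of Z.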
Diagnosing
Diagnosis - Intensive care unit - Public health
A study evaluated the added value of natural language processing for enhancing HIV diagnosis prediction models. The study included 181 HIV-positive patients, along with 543 PSM-selected HIV-negative controls. The authors extracted structured EHR data (demographics, laboratory results, diagnosis codes) and unstructured clinical notes from the pre-diagnosis period. Next, they developed three machine learning models: (1) a baseline model using only structured EHR data, (2) baseline plus NLP-derived topics, and (3) baseline plus NLP-extracted clinical keywords. Results demonstrated that incorporating NLP features significantly improved predictive accuracy for HIV risk assessment [62].
 
A review paper claims that trial emulation with PSM in observational studies represents a significant advancement in epidemiology and can support improving public health outcomes. However, traditional PSM techniques face challenges such as data quality, unmeasured confounding, and implementation complexity, which the authors suggest could be addressed with machine learning techniques [63].
 
In a recent study, information on selected participants, divided into a normal and a delayed ambulation group, was collected before surgery and followed up until the day after surgery. PSM was applied to all participants by type of surgery and anaesthesia. The characteristics of the two groups were compared using logistic regression, back propagation neural network, and decision tree models [67].
Chronic kidney disease - Electronic health record
A machine learning model to predict incident chronic kidney disease (CKD) 6–12 months before clinical onset was developed using Taiwan’s National Health Insurance claims data. The study employed PSM to select 18,000 CKD cases and 72,000 matched controls, analyzing two years of demographic, medication, and comorbidity history for each subject. Among various algorithms tested, convolutional neural networks demonstrated superior predictive performance [68].
 
In a PSM study of 168,860 veterans with heart failure phenotyped by AI, high-dose RAS inhibitors were associated with a lower 5-year risk of kidney failure compared to low doses. This benefit was primarily driven by patients with existing chronic kidney disease, regardless of ejection fraction, potentially informing future clinical guidelines [64].
Deep Learning
Causal inference - deep learning
A large-scale analysis of ICU patients without primary gastrointestinal diseases using the MIMIC-IV database was performed to evaluate the prognostic value of abdominal physical examinations (palpation and auscultation). Patients were stratified based on examination status, with 28-day mortality as the primary endpoint. The researchers employed multiple analytical approaches: Cox proportional hazards models, PSM, and inverse probability of treatment weighting. Six machine learning algorithms (Random Forest, Gradient Boosting Decision Trees, AdaBoost, Extra Trees, Bagging, and Multilayer Perceptron) were subsequently implemented to develop predictive models for in-hospital mortality [72].
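Inverse probability of treatment weighting uses the same propensity score as PSM but reweights subjects instead of discarding unmatched ones: an exposed subject with propensity score e receives weight 1/e, and an unexposed subject receives 1/(1 - e). The following minimal sketch computes these unstabilized weights and a weighted difference in outcome proportions. The variable names and the four-subject data set are hypothetical and are not taken from [72].

```python
from typing import List


def iptw_weights(treated: List[int], propensity: List[float]) -> List[float]:
    """Unstabilized IPTW weights: 1/e for treated subjects, 1/(1 - e) for untreated."""
    return [1.0 / e if z == 1 else 1.0 / (1.0 - e) for z, e in zip(treated, propensity)]


def weighted_risk_difference(
    treated: List[int], outcome: List[int], weights: List[float]
) -> float:
    """Difference in weighted outcome means between treated and untreated subjects."""

    def weighted_mean(group: int) -> float:
        num = sum(w * y for z, y, w in zip(treated, outcome, weights) if z == group)
        den = sum(w for z, w in zip(treated, weights) if z == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Hypothetical data: exposure indicator, estimated propensity score, binary outcome.
z = [1, 1, 0, 0]
e = [0.5, 0.25, 0.5, 0.2]
died = [1, 0, 1, 0]

w = iptw_weights(z, e)
rd = weighted_risk_difference(z, died, w)
```

Subjects who received an exposure that was unlikely given their covariates get large weights, so in practice extreme propensity scores are often handled with stabilized or truncated weights.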
 
To reduce the underlying bias in observational studies, Ghosh et al. [71] developed a new deep learning architecture for propensity score matching and counterfactual prediction.
 
Machine learning enhanced propensity score estimation by improving covariate balance and reducing bias in observational studies, enabling robust causal inference. This advanced the methodological rigour of treatment effect analysis in patients with non-small cell lung cancer and across complex, high-dimensional healthcare and social science datasets [70].
Monte Carlo simulation
To address the limitations of manual PSM, the authors developed a machine learning enhanced genetic matching approach that automatically optimizes covariate history balancing. Through Monte Carlo simulation studies, they demonstrated superior performance of their automated method compared to traditional manual matching techniques [71].
Computed tomography - deep learning
Radiomics and deep learning approaches were investigated for predicting sepsis risk following stone removal procedures in patients with ureteral calculi. After applying PSM, the authors developed (1) a radiomics model for sepsis prediction, and (2) an enhanced deep learning model to boost predictive accuracy. LASSO regression identified 26 key predictive variables. The deep neural network (DNN) showed improved AUC in internal validation, and subsequent external validation supported model generalizability and addressed overfitting concerns [73].
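Covariate balance is commonly summarized with the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation. The sketch below computes the SMD for one continuous covariate before and after a hypothetical matching step. The ages are invented for illustration, and the 0.1 cut-off mentioned afterwards is a widely used rule of thumb rather than something stated in [70].

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def standardized_mean_difference(
    treated: Sequence[float], control: Sequence[float]
) -> float:
    """SMD for a continuous covariate, using the average of the two sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages in the treated group and in the control group before/after matching.
age_treated = [60, 62, 64, 66]
age_control_before = [50, 52, 54, 56]
age_control_after = [61, 62, 64, 65]

smd_before = standardized_mean_difference(age_treated, age_control_before)
smd_after = standardized_mean_difference(age_treated, age_control_after)
```

An absolute SMD below about 0.1 is often read as acceptable balance. In this toy example the matched controls bring the SMD for age from a large value down to zero.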
Table 5. Deductive thematic synthesis of AI methodologies and clinical domains.
| Concepts, AI Algorithms and Techniques | n | Medical Applications | n | Diagnoses | n |
|---|---|---|---|---|---|
| Machine Learning (Generic) | 66 | Risk Assessment | 14 | Cardiovascular Diseases | 18 |
| Artificial Intelligence (Generic) | 23 | Prognosis | 11 | Heart Diseases | 12 |
| Deep Learning | 11 | Prediction Model | 11 | Kidney Diseases | 11 |
| Random Forest | 6 | Survival Analysis | 11 | Diabetes | 9 |
| Decision Tree | 5 | Mortality | 10 | Coronary Diseases | 7 |
| NLP | 5 | Decision Support | 6 | Gastric Cancer | 6 |
| Big Data | 3 | Health Services | 6 | Hepatocellular Carcinoma | 6 |
| SHAP/Explainable AI | 2 | Nomogram | 5 | Atrial Fibrillation | 6 |
| Feature Selection | 2 | Epidemiology | 5 | COVID-19 | 6 |

Share and Cite

MDPI and ACS Style

Kokol, P.; Žlahtič, B.; Blažun Vošner, H.; Završnik, J.; Završnik, T. Symbiosis in Health: The Powerful Alliance of AI and Propensity Score Matching in Real World Medical Data Analysis. Appl. Sci. 2026, 16, 1524. https://doi.org/10.3390/app16031524
