Life
  • Review
  • Open Access

14 January 2025

Transforming Cardiovascular Risk Prediction: A Review of Machine Learning and Artificial Intelligence Innovations

Department of Nutrition and Dietetics, School of Physical Education, Sports and Dietetics, University of Thessaly, 42132 Trikala, Greece
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advancements in Cardiovascular Epidemiology: Integrating Predictive Modeling into Public Health

Abstract

Cardiovascular diseases (CVDs) remain a leading cause of global mortality and morbidity. Traditional risk prediction models, while foundational, often fail to capture the multifaceted nature of risk factors or leverage the expanding pool of healthcare data. Machine learning (ML) and artificial intelligence (AI) approaches represent a paradigm shift in risk prediction, offering dynamic, scalable solutions that integrate diverse data types. This review examines advancements in AI/ML for CVD risk prediction, analyzing their strengths, limitations, and the challenges associated with their clinical integration. Recommendations for standardization, validation, and future research directions are provided to unlock the potential of these technologies in transforming precision cardiovascular medicine.

1. Introduction

Cardiovascular diseases (CVDs) continue to represent a major global health challenge, contributing significantly to morbidity and mortality worldwide, despite advances in prevention and early disease intervention [1]. Traditional risk prediction models, like the Framingham Risk Score (FRS) [2] and the Systematic Coronary Risk Evaluation (SCORE) model [3], have long been used to estimate an individual’s risk of developing CVD. These models rely heavily on classical statistical regression approaches to assess common risk factors, like age, sex, blood pressure, and lipid profiles. However, such methods, while effective to an extent, come with limitations. They tend to oversimplify complex, nonlinear relationships between risk factors and outcomes and may lack the flexibility to incorporate evolving patient data and more advanced biomarkers. In addition, these models are often region-specific, leading to challenges in generalizability across diverse populations. Moreover, traditional models are prone to underestimating risk in younger individuals [4] or certain demographic groups, such as women [5] and racial minorities [6], due to a lack of personalized or population-specific predictors.
In recent years, the increasing complexity of healthcare data coupled with the rapid digitalization of medical records have exposed the inadequacies of traditional CVD risk models. These models often rely on static and linear relationships between a limited number of risk factors, making them insufficient for capturing the intricate and multifactorial nature of CVDs [7]. To address these challenges, machine learning (ML) and artificial intelligence (AI) have emerged as promising tools in risk prediction. More precisely, ML, with its ability to process large, multidimensional datasets and uncover non-obvious patterns, offers a dynamic and scalable alternative to conventional statistical methods [8].
A key advantage of ML is its capacity to manage diverse and complex data types—ranging from electronic health records (EHRs) to imaging, genetic, and lifestyle information—which allows for more holistic and individualized risk assessments. In particular, deep learning (DL), a specialized subset of ML, employs neural networks that simulate the human brain’s structure and function. These networks are designed to analyze and learn from complex medical data in layers, making it possible to detect subtle, nonlinear relationships between variables [9]. This ability to extract features automatically from raw data facilitates personalized and more accurate risk assessments, particularly when applied to high-dimensional data such as medical imaging, genomics, and proteomics [10]. Moreover, natural language processing (NLP), another branch of AI, can mine unstructured data from medical records, such as physician notes, clinical reports, and patient histories, to extract meaningful insights and further enhance the predictive power of ML models [11]. By integrating these different forms of structured and unstructured data, AI models can move beyond traditional risk factors to include novel predictors, such as genetic polymorphisms [12], proteomic signatures [10], and real-time physiological monitoring data from wearable devices [13].
These advancements are transforming the landscape of cardiovascular risk prediction. For example, AI models have shown promise in integrating imaging data, such as those from coronary computed tomography angiography (CCTA) [14] and cardiac magnetic resonance (CMR) imaging, to identify early signs of subclinical atherosclerosis and plaque vulnerability, providing more precise predictions of future cardiovascular events. Additionally, incorporating genetic information through AI models allows for the identification of polygenic risk scores that can stratify individuals at high genetic risk for CVD [15]. Biomarker profiles, such as levels of inflammatory markers or cardiac-specific proteins like troponins, can also be integrated to improve prediction accuracy [16]. In this evolving context, ML-based CVD risk prediction models are not only improving risk stratification but are also opening the door for real-time, continuous monitoring and prediction [17,18].
As the field moves toward AI-driven models, the potential to improve CVD risk prediction and prevention is immense. This review examines the application of ML and AI in CVD risk prediction, emphasizing their potential to transform precision medicine. It analyzes diverse AI techniques, such as deep learning, ensemble methods, and natural language processing, and highlights their capacity to integrate and analyze multidimensional datasets, including genomic, proteomic, imaging, and wearable device data. The literature search was conducted in PubMed and focused on studies published between 6 April 2019 and 9 August 2024. Search terms included “cardiovascular disease”, “machine learning”, and “risk prediction”, with English-language publications prioritized. The included studies span various methodologies and populations, providing an overview of the strengths, limitations, and clinical implications of AI-driven CVD risk prediction.

2. Theoretical Background of Machine Learning and Artificial Intelligence

ML and AI have revolutionized various domains, including healthcare, by enabling the analysis of complex datasets to uncover patterns and make predictions. ML, a subset of AI, involves the use of algorithms that learn from data to improve their performance without explicit programming. It is broadly categorized into supervised learning (using labeled data), unsupervised learning (detecting patterns in unlabeled data), and reinforcement learning (optimizing decisions through trial and error). The application of AI and ML in healthcare began to gain significant attention in the scientific literature around the early 2000s, with a notable increase in publications from the mid-2010s. This surge coincided with advancements in computational power, the proliferation of big data in healthcare, and the development of more sophisticated algorithms such as deep learning. In cardiovascular medicine, interest in AI and ML has grown exponentially over the last decade, particularly as studies have demonstrated their ability to enhance risk prediction, personalize treatment strategies, and analyze complex datasets, such as medical imaging and genomic data. In the context of cardiovascular medicine, ML models, like random forests, support vector machines, and neural networks, are used for predictive analytics, while advanced approaches such as deep learning, particularly convolutional and recurrent neural networks, facilitate the analysis of high-dimensional data, like medical imaging and genomics. Techniques such as ensemble learning combine multiple models to enhance prediction accuracy and robustness. AI complements ML through methodologies like natural language processing (NLP) and computer vision. NLP enables the extraction of insights from unstructured text data (e.g., medical notes), while computer vision analyzes imaging data for diagnostic and predictive purposes. These capabilities make AI integral to modern healthcare, allowing for holistic, patient-specific insights. 
Despite their transformative potential, ML and AI face challenges, including the need for high-quality, diverse datasets to prevent model bias and the demand for interpretability to foster trust among clinicians. Addressing these challenges is crucial to ensure equitable and reliable applications in clinical settings.

3. Advancements in AI/ML for CVD Risk Prediction

3.1. Historical Context and Evolution

Traditional CVD risk models laid the groundwork for systematic risk assessment but are constrained by their reliance on linear relationships and limited data types. For instance, models such as the Framingham Risk Score (FRS) primarily focus on static variables, like age, sex, and blood pressure, neglecting the dynamic and multifactorial nature of cardiovascular risks. However, the evolution of computational tools has introduced models capable of handling these complexities. The majority of studies reviewed here were published in the last five years, reflecting the rapid advancements in this field. Notably, 30.8% of the studies emerged from the United States and Europe, with a trend toward larger sample sizes (46.2% of studies included over 10,000 participants). The increasing diversity of study settings—spanning community-based and institutional frameworks—has also broadened the applicability of AI in various healthcare environments. Ensemble learning methods, such as random forest and gradient boosting, have further advanced predictive modeling by combining multiple weak learners to enhance accuracy and robustness. Deep neural networks, including convolutional and recurrent architectures, have enabled the analysis of complex datasets, such as imaging and longitudinal records, offering insights into both static and temporal patterns of cardiovascular risk. Studies such as Lauber et al. (2022) [19] exemplify the use of advanced lipidomic profiling, employing Ridge regression to achieve significant predictive improvements. By integrating 184 lipid species, the study not only outperformed traditional models but also provided insights into early pathological changes through biomarkers like triacylglycerides and diacylglycerides. Hoogeveen et al. (2020) [20] further demonstrated the application of proteomic data, where biomarkers like GDF-15 and IL-6 revealed inflammatory and metabolic pathways crucial for risk assessment. 
These advancements mark a shift from reliance on clinical factors alone to a more comprehensive inclusion of molecular data.
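The Ridge regression mentioned in this section can be illustrated with a minimal sketch. It is not the pipeline used in the cited studies: the toy data, the function name `ridge_fit`, the penalty values, and the gradient-descent settings are all assumptions chosen for illustration. The sketch only shows that the L2 penalty shrinks coefficients toward zero, which is the property that makes the method practical when many correlated predictors (such as lipid species) are fitted together.

```python
def ridge_fit(X, y, lam, lr=0.01, steps=5000):
    """Fit linear coefficients by minimizing
    (1/n) * sum((X w - y)^2) + lam * sum(w^2)
    with plain gradient descent (no intercept, for brevity)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
        grad = [0.0] * p
        for row, pred, target in zip(X, preds, y):
            err = pred - target
            for j in range(p):
                grad[j] += 2.0 * err * row[j] / n
        for j in range(p):
            grad[j] += 2.0 * lam * w[j]   # gradient of the L2 penalty
            w[j] -= lr * grad[j]
    return w


# Hypothetical toy data: outcome = 2 * x1 + 1 * x2 (no noise).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 1.0, 3.0, 5.0]

w_unpenalized = ridge_fit(X, y, lam=0.0)  # recovers roughly [2, 1]
w_ridge = ridge_fit(X, y, lam=1.0)        # both coefficients are shrunk
print(w_unpenalized, w_ridge)
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.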

3.2. Innovations in Predictive Modeling

AI and ML techniques have revolutionized CVD risk prediction by facilitating the integration of diverse data modalities. Advanced lipidomic and proteomic profiling, for instance, has been incorporated into ML models, significantly enhancing predictive accuracy. Studies utilizing Ridge regression demonstrated substantial improvements when integrating these biomarkers. Wu et al. (2024) [21] showcased how lipidomic risk scores (LRS) outperformed traditional models like the FRS, achieving an AUC increase from 0.545 to 0.659, with robust external validation. Proteomic data have also been leveraged to capture inflammatory and metabolic pathways relevant to CVD, as demonstrated by Hoogeveen et al. (2020) [20]. By incorporating biomarkers like GDF-15 and IL-6, the study achieved an AUC of 0.754, underscoring the role of advanced molecular profiling in improving short-term risk predictions. Additionally, the integration of polygenic risk scores, as highlighted in studies like Lauber et al. (2022) [19], has further enriched the predictive power of AI models, showcasing their potential to enhance early disease detection. Temporal modeling has also emerged as a critical innovation. Models such as Dynamic-DeepHit employ attention mechanisms to capture longitudinal data, ensuring that predictions remain relevant even as patient conditions evolve. Similarly, sex-specific models have leveraged predictors like systolic blood pressure variability and BMI to address unique risk profiles. These advancements ensure nuanced predictions, especially in high-risk groups often overlooked by conventional methods. Moreover, the incorporation of non-traditional predictors, such as social and environmental factors, has expanded the scope of risk assessment. Factors like socioeconomic status and environmental exposures are increasingly recognized as critical determinants of health. Atehortúa et al. (2023) [22] demonstrated the utility of exposome-based ML models, which integrated these variables to outperform traditional methods in predicting CVD risk across diverse populations. Dong et al. (2024) [23] further emphasized the importance of tailored models, achieving robust calibration and improved discrimination metrics in sex-specific analyses for Chinese populations with type 2 diabetes mellitus.
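Because the AUC is the discrimination metric quoted throughout this section, a short sketch of what it measures may help. The function below computes the AUC as the probability that a randomly chosen person who had an event received a higher predicted risk than a randomly chosen person who did not, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from any reviewed study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risks; labels: 1 = event, 0 = no event.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # event case ranked above non-event case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for four people, two of whom had an event.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect ranking
print(auc([0.2, 0.6, 0.4, 0.8], [0, 1, 1, 0]))  # no better than chance
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the reported move from 0.545 to 0.659.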

3.3. Application in Diverse Populations

Addressing disparities in healthcare outcomes remains a pressing challenge, and AI models hold promise for mitigating these inequities. Studies have shown that ML algorithms tailored to population-specific characteristics significantly improve predictive performance. For instance, sex-specific ML models, like those developed by Dong et al. (2024) [23], achieved superior discrimination metrics in predicting 10-year CVD risk in women compared to men. This highlights the ability of AI to fill gaps left by traditional models, which often fail to consider gender-specific variations. Similarly, models trained on datasets including racially and ethnically diverse populations have identified unique risk factors, such as genetic polymorphisms and culturally specific health behaviors. However, 69.2% of studies reviewed were conducted in institutional frameworks, emphasizing the need for community-based research to capture broader, real-world data. The work by An et al. (2021) [24] on the DeepRisk model exemplifies the integration of heterogeneous clinical, diagnostic, and demographic data to achieve superior predictive performance (AUC of 0.8375). This attention-based deep learning framework provided interpretable insights, identifying diabetes, hypertension, and hyperlipidemia as key predictors. Such models ensure that predictions remain adaptable to evolving clinical and demographic contexts, providing a template for addressing disparities in CVD outcomes. Efforts to address data variability, such as the standardization of feature selection and validation protocols, are essential for ensuring equitable model performance. Collaborations across institutions and geographical regions are crucial to developing large, representative datasets. Furthermore, leveraging advanced architectures like attention-based models ensures that predictions remain adaptable to the evolving clinical and demographic landscapes of diverse populations.
Tables 1, 2, and 3 provide a brief overview of the most recently developed AI and ML methodologies for CVD risk prediction.
Table 1. Comparison of ML, DL, and traditional models for CVD risk prediction: a summary of key studies and performance metrics.
Table 2. Overview of predictors, validation methods, and endpoints in ML, DL, and traditional models for CVD risk prediction.
Table 3. Characteristics of study populations, regions, and study designs in CVD risk prediction research.

4. AI and ML in Cardiovascular Risk Prediction: A Paradigm Shift in Precision Medicine

The emergence of ML and AI approaches in CVD risk prediction represents a paradigm shift in how risk is assessed and managed, marking a transition from traditional linear modelling frameworks to data-driven, multidimensional, and highly dynamic methodologies [32,33]. The majority of the studies reviewed herein demonstrate that ML-based models outperform traditional methods in predictive accuracy, as evidenced by higher AUC values, improved calibration, and better net reclassification improvement (NRI). These metrics reflect the ability of ML models to offer more nuanced and individualized risk assessments, particularly for subpopulations where traditional models often falter, such as intermediate-risk groups and underserved populations [6]. For instance, models such as Dynamic-DeepHit and DeepRisk leverage longitudinal data and attention mechanisms to capture complex temporal patterns, which static, traditional models are unable to address. By doing so, these approaches account for the dynamic nature of patient risk profiles over time, providing real-time, adaptive predictions that align with the evolving clinical context. Similarly, models incorporating lipidomic and proteomic data extend the predictive scope by offering insights into the molecular mechanisms underlying CVD [34]. These approaches enhance our understanding of pathophysiological processes, such as inflammation, lipid metabolism, and vascular remodeling, providing unique predictive value beyond established clinical and genetic risk factors [10]. For example, lipidomic risk scores have shown significant potential in identifying high-risk individuals, even in cohorts where traditional models, such as the FRS, underperform [35]. The integration of these novel biomarkers underscores the role of ML and AI in bridging the gap between molecular research and clinical application [36].
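Reclassification is less familiar than the AUC, so a small worked sketch may be useful. The function below computes the categorical net reclassification improvement as (proportion of events moved to a higher risk category minus proportion moved lower) plus (proportion of non-events moved lower minus proportion moved higher). The function name, the category coding (0 = low, 1 = high), and the example data are assumptions for illustration only.

```python
def net_reclassification_improvement(old_cat, new_cat, events):
    """Categorical NRI of a new model relative to an old one.

    old_cat / new_cat: integer risk categories (higher = riskier).
    events: 1 if the person had the event, else 0.
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, event in zip(old_cat, new_cat, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical example: the new model moves one event case up
# and one non-event case down; everyone else stays put.
old = [0, 0, 1, 1]
new = [1, 0, 0, 1]
had_event = [1, 1, 0, 0]
print(net_reclassification_improvement(old, new, had_event))
```

A positive value means the new model moves people, on balance, in the correct direction; the result depends on the chosen category thresholds, so those should always be reported alongside it.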

5. Challenges at Development, Validation, and Impact Stages

Despite these advancements, several challenges persist across the translational pathway, encompassing the development, validation, and clinical impact of ML models [37].

5.1. Development Stage

Current research on ML applications in CVD risk prediction has been constrained by inadequate sample sizes, geographic and racial disparities, and poor data quality. In many cases, the selection of variables for ML algorithms is performed arbitrarily, raising concerns about model development rigor and interpretability. Ensuring the generalizability of data and reproducibility of methods is critical in epidemiological studies to facilitate clinical application, yet these issues remain inadequately addressed in most studies to date [38]. The consequence is a significant risk of bias, undermining the reliability of findings and their broader applicability. Moreover, the absence of standardized approaches for integrating heterogeneous data, such as imaging, omics, and clinical data, further complicates model development.

5.2. Validation Stage

The robustness of ML models relies heavily on internal and external validation. While cross-validation can mitigate some imbalances within datasets, it often fails to capture the full range of real-world clinical scenarios [39]. External validation is pivotal to ensure that ML models perform consistently across diverse populations, yet it remains notably absent in many studies [40]. A frequent challenge is model shrinkage, where predictive performance declines sharply when models are applied to independent external datasets. This limitation poses a significant barrier to clinical adoption, as models that lack validation cannot be confidently deployed in real-world settings.
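The distinction between internal and external validation can be made concrete with a short sketch. The generator below produces k-fold cross-validation splits over the indices of a single development cohort; every fold comes from the same source population, which is why good cross-validated performance does not by itself demonstrate performance in an independent cohort. The function name and the seed are hypothetical choices.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Internal validation: each person in the development cohort is held
# out exactly once, but all folds share the same population.
for train, test in k_fold_indices(n=10, k=3):
    print(len(train), len(test))

# External validation would instead fit on the whole development cohort
# and evaluate once on data collected elsewhere (not shown here).
```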

5.3. Impact Stage

The assessment of clinical impact, including effectiveness and cost-effectiveness, remains underexplored. There is a significant gap in evaluating the clinical utility of ML models, particularly in terms of accessibility, interpretability, and outcomes relevant to healthcare practitioners [41]. Moreover, the lack of standardized reporting frameworks further hinders the translation of ML findings into actionable tools [42]. Addressing these gaps requires large-scale studies with access to comprehensive databases, covering various CVD subtypes and employing standardized validation methodologies.

5.4. Interpretability and Transparency

Interpretability is a central challenge for the adoption of ML models in clinical practice. While techniques such as SHapley Additive exPlanations (SHAP) and attention mechanisms have improved transparency, many black-box models remain difficult for clinicians to understand and trust [43]. This lack of transparency raises ethical concerns, as predictions that cannot be explained are harder to justify, particularly in cases with critical clinical or legal implications. The adoption of interpretable frameworks and rigorous reporting standards, such as the utilization of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) [44] and the MINimum Information for Medical AI Reporting (MINIMAR) [45], is essential for addressing inconsistencies in data coding and fostering trust in ML systems.
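To show what a Shapley-based explanation attributes to each predictor, the sketch below computes exact Shapley values for one prediction by enumerating all feature subsets. This is an illustration of the underlying idea and not the SHAP library itself, which uses approximations and a background dataset; here, "absent" features are simply replaced by a single reference patient. The toy risk model, its coefficients, and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition take their baseline value.
    Cost grows as 2^n, so this is only feasible for a few features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear risk score on [age, smoker, systolic BP].
def toy_model(z):
    return 0.02 * z[0] + 0.5 * z[1] + 0.01 * z[2]


patient = [60, 1, 150]
reference = [50, 0, 120]
print(shapley_values(toy_model, patient, reference))
```

For a linear model each attribution reduces to the coefficient times the difference from the reference value, and the attributions sum to the difference between the two predictions, which is the additivity property that makes such explanations easy to present to clinicians.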

5.5. Integration of Diverse Data and Standardization

To enhance the predictive capabilities of ML models, future research should focus on integrating diverse data types, including polygenic risk scores and lipidomic, proteomic, transcriptomic, and metabolomic datasets. Combining these data with phenotypic features, such as coronary artery imaging, will facilitate a more comprehensive understanding of individual CVD phenotypes. Additionally, challenges related to incomplete and inconsistent coding in medical databases, such as those involving ICD terminologies, must be addressed through standardized approaches for data normalization, employing, for example, the Unified Medical Language System (UMLS) [46]. Standardized methodologies will ensure that ML models are robust, reproducible, and clinically relevant. One of the key challenges in the application of ML in CVD risk prediction is the lack of transparency in algorithmic methodologies and the reporting of features used in the studies. As seen in Table 1 and Table 2, many studies do not provide detailed descriptions of the ML algorithms employed or the specific features selected for model development. This lack of transparency poses significant barriers to the reproducibility and validation of these models, limiting their clinical and research utility. To address these challenges, standardized reporting frameworks are essential. Initiatives such as TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) and MINIMAR (MINimum Information for Medical AI Reporting) provide clear guidelines for reporting prediction model studies, including details on feature selection, data preprocessing, and model development. Adopting such frameworks ensures consistency and completeness in reporting, thereby enhancing the interpretability and reliability of ML models. 
Furthermore, method development in ML research should prioritize explainability and user-friendly tools that enable clinicians to understand the rationale behind predictions. Explainable AI (XAI) techniques, for instance, can provide insights into the importance of features and the decision-making process of ML algorithms. By fostering transparency, these approaches can help bridge the gap between computational advancements and practical clinical applications. Future studies must also address the heterogeneity in feature selection methods and provide comprehensive documentation of the data and algorithms used. This effort will not only facilitate external validation but also encourage the development of generalizable and equitable models that can be reliably implemented across diverse populations and settings.

5.6. Addressing Censoring and Data Limitations

Another critical challenge in long-term CVD risk prediction is censoring, where patients are lost to follow-up due to withdrawal, death, or incomplete data capture. If improperly handled, censoring can bias model performance estimates. Traditional survival analysis models, such as the one underlying the QRISK3 score (version 3 of the QResearch cardiovascular risk algorithm), are equipped to address censoring. Among ML approaches, methods such as random survival forests offer a promising way to incorporate time-to-event data [47]. Ensuring the inclusion of survival outcomes in ML models will enhance their clinical relevance, particularly in longitudinal studies.
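How censored follow-up enters a survival estimate can be sketched with the Kaplan-Meier estimator, the standard non-parametric building block of survival analysis. This is not a random survival forest or any model from the reviewed studies; the function name and the follow-up data are hypothetical. Censored patients contribute to the number at risk until they leave the study but never count as events.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up duration per patient.
    events: 1 if the event was observed, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = leaving = 0
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]   # events at time t
            leaving += 1             # events plus censored at time t
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in years; 0 marks a censored patient.
follow_up = [2, 3, 3, 5, 8]
observed = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(follow_up, observed):
    print(t, round(s, 3))
```

Dropping the censored patients, or treating them as event-free for the whole horizon, would give different and biased estimates, which is the problem the paragraph above describes.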

5.7. Influence of Sample Size on ML Models

The variation in sample sizes across studies, as highlighted in Table 3, plays a crucial role in the development, validation, and testing of machine learning (ML) models for cardiovascular disease (CVD) risk prediction. Larger sample sizes generally provide a more diverse and representative dataset, enabling models to learn complex patterns and relationships while reducing the risk of overfitting. This enhances the generalizability of the model to new, unseen data. Conversely, studies with smaller sample sizes face challenges, such as limited representativeness, increased model variance, and overfitting, where the model performs well on the training dataset but poorly on external or independent datasets. Small sample sizes may also exacerbate biases, especially when the dataset is imbalanced with respect to critical variables, such as gender, ethnicity, or age. To address these challenges, researchers can employ strategies such as the following:
- Data Augmentation: Generating synthetic data to increase the effective sample size while preserving the characteristics of the original dataset.
- Transfer Learning: Leveraging pre-trained models from larger datasets to improve performance on smaller datasets.
- External Validation: Validating models on independent datasets from diverse populations to ensure robustness and generalizability.
Additionally, future research should prioritize the development of collaborative, multi-center datasets to enhance the scalability and reliability of ML models. Standardized reporting of sample size and its impact on model performance is essential for advancing the transparency and reproducibility of ML-based CVD risk prediction.

6. Practical Implications and Challenges of Machine Learning-Based Risk Prediction Models

The development and application of ML-based models in CVD risk prediction raise several important questions about their practical implications in clinical management. One key consideration is how improved prediction capabilities will translate into actionable interventions. Traditional, clinically based prediction models rely on modifiable risk factors, such as blood pressure, cholesterol levels, and smoking status, allowing clinicians to target these factors for treatment and behavioral modifications. In contrast, ML-based models may identify predictors such as genetic factors, race, or socioeconomic variables that are not directly modifiable through current interventions. To address this challenge, future research and clinical strategies will need to focus on integrating ML-based insights into actionable care pathways. For instance, while genetic predispositions cannot be altered, their identification could lead to earlier screening and more personalized preventive measures. Similarly, socio-environmental predictors could inform public health initiatives aimed at addressing systemic inequalities that contribute to cardiovascular risk. Another important consideration is whether a universal ML-based model can be effectively applied across diverse populations or whether region-specific models will be necessary. Variations in genetic, environmental, and healthcare factors between countries, regions, or even hospitals may limit the generalizability of a single model. Developing localized models tailored to specific populations may offer greater accuracy and clinical relevance, but this approach also presents challenges in terms of resource requirements, validation, and scalability. Finally, the integration of ML-based models into clinical workflows must balance predictive accuracy with interpretability and clinical utility. Tools such as explainable AI and frameworks for translating complex predictors into actionable insights will be critical for fostering clinician trust and facilitating the adoption of these models in everyday practice.

7. Future Directions and Recommendations

To reduce the global burden of CVD, prevention efforts should emphasize primordial and primary prevention strategies informed by robust ML models. Future research must prioritize the following:
- Scalability and Validation: Conduct large-scale studies with diverse and geographically representative populations to ensure external validity.
- Model Integration: Develop frameworks for integrating heterogeneous data types, including imaging, omics, and clinical records, into unified predictive models.
- Standardization: Implement standardized reporting guidelines and coding systems, such as TRIPOD and UMLS, respectively, to improve data quality and model generalizability.
- Transparency and Interpretability: Foster clinician trust through interpretable algorithms that clearly elucidate the rationale behind predictions.
- Impact Assessment: Evaluate the effectiveness, cost-effectiveness, and real-world clinical utility of ML models to inform healthcare policies and decision-making.
By addressing these challenges and focusing on a data-driven, standardized approach to ML development, validation, and implementation, researchers can unlock the full potential of AI in improving CVD risk prediction and management. Integrating novel biomarkers, such as lipidomic and proteomic data, alongside advanced computational methods offers a transformative opportunity to enhance precision medicine and deliver equitable cardiovascular care.

8. Conclusions

The integration of ML and AI into CVD risk prediction represents a significant advancement in precision medicine, offering unprecedented opportunities to improve the management of patient outcomes through greater predictive accuracy, interpretability, and personalized risk stratification. By leveraging diverse data sources and capturing complex, nonlinear relationships, ML and AI models address many of the inherent limitations of traditional risk prediction tools. However, the widespread clinical adoption of these technologies remains contingent upon resolving critical challenges related to generalizability, accessibility, and transparency. While AI algorithms have demonstrated substantial promise in transforming CVD risk assessment, the number of ML models successfully implemented in clinical practice remains limited. A key obstacle lies in defining clear criteria for determining the point at which these models provide meaningful clinical benefits, a gap that continues to hinder their real-world applicability. Additionally, the inherent complexity of ML-based risk prediction tools often translates into increased computational demands, time constraints, and higher costs, which can compromise their usability in resource-limited clinical settings. This necessitates a balanced approach that optimizes predictive accuracy while maintaining operational feasibility to ensure seamless integration into healthcare workflows.
To address these challenges, future research must focus on improving the discriminatory power of ML models to distinguish between patients with unique risk factor combinations, accurately stratifying absolute CVD risk levels. Contemporary and emerging risk factors, including social determinants of health, behavioral data, and environmental exposures, should be rigorously evaluated for their incremental value in risk prediction. Furthermore, the integration of high-dimensional data from multi-omics approaches—such as genomics, proteomics, lipidomics, and metabolomics—alongside advanced imaging and clinical parameters will be critical in enhancing the precision and dynamic nature of CVD risk models. Despite their complexity, ML models do not always outperform conventional approaches like logistic regression or clinician-led predictions in certain contexts, as highlighted in systematic reviews. This observation underscores the importance of adopting a measured and pragmatic perspective, recognizing that risk assessment is inherently probabilistic and subject to assumptions, data quality, and analytic frameworks. Addressing these limitations requires robust external validation of ML models across heterogeneous and geographically diverse populations to ensure reliability, reproducibility, and applicability. Moreover, current geographical imbalances, low reproducibility rates, insufficient reporting quality, and a lack of standardized assessment frameworks must be systematically overcome to enable the development of unbiased and equitable ML models.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

D.-I.K. declares no conflict of interest. T.T. is Guest Editor of the Special Issue ‘Advancements in Cardiovascular Epidemiology: Integrating Predictive Modeling into Public Health’.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ANN: Artificial Neural Networks
ARIC: Atherosclerosis Risk in Communities
ASCVD: Atherosclerotic Cardiovascular Disease
AUC: Area Under the Curve
AUROC: Area Under the Receiver Operating Characteristic curve
AusDiab: Australian Diabetes, Obesity, and Lifestyle Study
A-LSTM: Attention-based Bidirectional Long-Short Term Memory
Bi-LSTM: Bidirectional Long-Short Term Memory
BMI: Body Mass Index
CAC: Coronary Artery Calcification
CARDIA: Coronary Artery Risk Development in Young Adults
CCTA: Coronary Computed Tomography Angiography
CHS: Cardiovascular Health Study
CI: Confidence interval
cIMT: Carotid Intima-Media Thickness
CMR: Cardiac Magnetic Resonance
Cox PH: Cox Proportional Hazards
Cox PH-TWI: Two-Way Interactions Cox Proportional Hazards
CT: Computed Tomography
CUS: Carotid Ultrasound
CVDs: Cardiovascular Diseases
C-index: Concordance index
DAGs: Diacylglycerides
DL: Deep Learning
ECG: Electrocardiographic
EHRs: Electronic Health Records
Elnet-Cox: Elastic-net Cox
EPIC: European Prospective Investigation
FRS: Framingham Risk Score
HbA1c: Glycated Hemoglobin
IHD: Ischemic Heart Disease
IQR: Interquartile Range
LASSO: Least Absolute Shrinkage and Selection Operator
LD: Lumen Diameter
LightGBM: Light Gradient Boosting Machine
Linear SVM: Linear Support Vector Machine
LR: Logistic Regression
LRS: Lipidomic Risk Score
LUTS: Lower Urinary Tract Symptoms
MACCE: Major Adverse Cardiac and Cerebrovascular Events
MACE: Major Adverse Cardiovascular Events
MAE: Major Adverse Events
MDC-CC: Malmö Diet and Cancer-Cardiovascular Study
MI: Myocardial infarction
MINIMAR: MINimum Information for Medical AI Reporting
ML: Machine Learning
MLP: Multilayer Perceptron
MRI: Magnetic Resonance Imaging
NACE: Net Adverse Clinical Events
NLP: Natural Language Processing
NR: Not Reported
NRI: Net Reclassification Improvement
PCE: Pooled Cohort Equations
PIC: Punjab Institute of Cardiology
PICOS: Patient, Intervention, Comparison, Outcomes, and Study
RBF SVM: Radial Basis Function Support Vector Machine
RF: Random Forest
RMSE: Root Mean Square Error
SBP: Systolic Blood Pressure
SCORE: Systematic Coronary Risk Evaluation
SD: Standard Deviation
SGD-SVM: Stochastic Gradient Descent based Support Vector Machine
SHAP: SHapley Additive exPlanations
SVM: Support Vector Machine
SVMsmo: Support Vector Machines with Sequential Minimal Optimization
TAGs: Triacylglycerides
TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis
T2DM: Type 2 Diabetes Mellitus
UKBB: United Kingdom Biobank
UKPDS: United Kingdom Prospective Diabetes Study
UMLS: Unified Medical Language System
US: United States
XGBoost: EXtreme Gradient Boosting

References

  1. World Health Organization, Cardiovascular Diseases (CVDs). Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 7 September 2024).
  2. D’Agostino, R.B., Sr.; Vasan, R.S.; Pencina, M.J.; Wolf, P.A.; Cobain, M.; Massaro, J.M.; Kannel, W.B. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 2008, 117, 743–753. [Google Scholar] [CrossRef] [PubMed]
  3. Conroy, R.M.; Pyörälä, K.; Fitzgerald, A.P.; Sans, S.; Menotti, A.; De Backer, G.; De Bacquer, D.; Ducimetière, P.; Jousilahti, P.; Keil, U.; et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: The SCORE project. Eur. Heart J. 2003, 24, 987–1003. [Google Scholar] [CrossRef] [PubMed]
  4. Cooney, M.T.; Dudina, A.L.; Graham, I.M. Value and Limitations of Existing Scores for the Assessment of Cardiovascular Risk: A Review for Clinicians. J. Am. Coll. Cardiol. 2009, 54, 1209–1227. [Google Scholar] [CrossRef] [PubMed]
  5. SBaart, S.J.; Dam, V.; Scheres, L.J.J.; Damen, J.A.A.G.; Spijker, R.; Schuit, E.; Debray, T.P.A.; Fauser, B.C.J.M.; Boersma, E.; Moons, K.G.M.; et al. Cardiovascular risk prediction models for women in the general population: A systematic review. PLoS ONE 2019, 14, e0210329. [Google Scholar] [CrossRef]
  6. Hong, C.; Pencina, M.J.; Wojdyla, D.M.; Hall, J.L.; Judd, S.E.; Cary, M.; Engelhard, M.M.; Berchuck, S.; Xian, Y.; D’agostino, R.; et al. Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups. JAMA 2023, 329, 306–317. [Google Scholar] [CrossRef]
  7. Goldstein, B.A.; Navar, A.M.; Carter, R.E. Moving beyond regression techniques in cardiovascular risk prediction: Applying machine learning to address analytic challenges. Eur. Heart J. 2017, 38, 1805–1814. [Google Scholar] [CrossRef]
  8. Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [Google Scholar] [CrossRef]
  9. Kagiyama, N.; Shrestha, S.; Farjo, P.D.; Sengupta, P.P. Sengupta, Artificial Intelligence: Practical Primer for Clinical Research in Cardiovascular Disease. J. Am. Heart Assoc. Cardiovasc. Cerebrovasc. Dis. 2019, 8, 12788. [Google Scholar] [CrossRef]
  10. Nurmohamed, N.S.; Kraaijenhof, J.M.; Mayr, M.; Nicholls, S.J.; Koenig, W.; Catapano, A.L.; Stroes, E.S.G. Proteomics and lipidomics in atherosclerotic cardiovascular disease risk prediction. Eur. Heart J. 2023, 44, 1594. [Google Scholar] [CrossRef]
  11. Turchioe, M.R.; Volodarskiy, A.; Pathak, J.; Wright, D.N.; Tcheng, J.E.; Slotwiner, D. Slotwiner, Systematic review of current natural language processing methods and applications in cardiology. Heart 2022, 108, 909–916. [Google Scholar] [CrossRef]
  12. Pattarabanjird, T.; Cress, C.; Nguyen, A.; Taylor, A.; Bekiranov, S.; McNamara, C. A Machine Learning Model Utilizing a Novel SNP Shows Enhanced Prediction of Coronary Artery Disease Severity. Genes 2020, 11, 1446. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, Y.-H.; Sawan, M. Trends and Challenges of Wearable Multimodal Technologies for Stroke Risk Prediction. Sensors 2021, 21, 460. [Google Scholar] [CrossRef] [PubMed]
  14. Jaltotage, B.; Sukudom, S.; Ihdayhid, A.R.; Dwivedi, G. Enhancing Risk Stratification on Coronary Computed Tomography Angiography: The Role of Artificial Intelligence. Clin. Ther. 2023, 45, 1023–1028. [Google Scholar] [CrossRef] [PubMed]
  15. Usova, E.I.; Alieva, A.S.; Yakovlev, A.N.; Alieva, M.S.; Prokhorikhin, A.A.; Konradi, A.O.; Shlyakhto, E.V.; Magni, P.; Catapano, A.L.; Baragetti, A. Integrative Analysis of Multi-Omics and Genetic Approaches-A New Level in Atherosclerotic Cardiovascular Risk Prediction. Biomolecules 2021, 11, 1597. [Google Scholar] [CrossRef] [PubMed]
  16. Roseiro, M.; Henriques, J.; Paredes, S.; Rocha, T.; Sousa, J. An interpretable machine learning approach to estimate the influence of inflammation biomarkers on cardiovascular risk assessment. Comput. Methods Programs Biomed. 2023, 230, 107347. [Google Scholar] [CrossRef]
  17. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, 71. [Google Scholar] [CrossRef]
  18. Mheta, D.; Mashamba-Thompson, T.P. Barriers and facilitators of access to maternal services for women with disabilities: Scoping review protocol. Syst. Rev. 2017, 6, 99. [Google Scholar] [CrossRef]
  19. Lauber, C.; Gerl, M.J.; Klose, C.; Ottosson, F.; Melander, O.; Simons, K. Lipidomic risk scores are independent of polygenic risk scores and can predict incidence of diabetes and cardiovascular disease in a large population cohort. PLoS Biol. 2022, 20, e3001561. [Google Scholar] [CrossRef]
  20. Hoogeveen, R.M.; Pereira, J.P.B.; Nurmohamed, N.S.; Zampoleri, V.; Bom, M.J.; Baragetti, A.; Boekholdt, S.M.; Knaapen, P.; Khaw, K.-T.; Wareham, N.J.; et al. Improved cardiovascular risk prediction using targeted plasma proteomics in primary prevention. Eur. Heart J. 2020, 41, 3998–4007. [Google Scholar] [CrossRef]
  21. Wu, J.; Giles, C.; Dakic, A.; Beyene, H.B.; Huynh, K.; Wang, T.; Meikle, T.; Olshansky, G.; Salim, A.; Duong, T.; et al. Lipidomic Risk Score to Enhance Cardiovascular Risk Stratification for Primary Prevention. J. Am. Coll. Cardiol. 2024, 84, 434–446. [Google Scholar] [CrossRef]
  22. Atehortúa, A.; Gkontra, P.; Camacho, M.; Diaz, O.; Bulgheroni, M.; Simonetti, V.; Chadeau-Hyam, M.; Felix, J.F.; Sebert, S.; Lekadir, K. Cardiometabolic risk estimation using exposome data and machine learning. Int. J. Med. Inform. 2023, 179, 105209. [Google Scholar] [CrossRef] [PubMed]
  23. Dong, W.; Wan, E.Y.F.; Fong, D.Y.T.; Tan, K.C.; Tsui, W.W.; Hui, E.M.; Chan, K.H.; Fung, C.S.C.; Lam, C.L.K. Development and validation of 10-year risk prediction models of cardiovascular disease in Chinese type 2 diabetes mellitus patients in primary care using interpretable machine learning-based methods. Diabetes Obes. Metab. 2024, 26, 3969–3987. [Google Scholar] [CrossRef] [PubMed]
  24. An, Y.; Huang, N.; Chen, X.; Wu, F.; Wang, J. High-Risk Prediction of Cardiovascular Diseases via Attention-Based Deep Neural Networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 1093–1105. [Google Scholar] [CrossRef] [PubMed]
  25. Shen, T.-T.; Liu, C.-F.; Wu, M.-P. Implementation of a machine learning model in acute coronary syndrome and stroke risk assessment for patients with lower urinary tract symptoms. Taiwan. J. Obstet. Gynecol. 2024, 63, 518–526. [Google Scholar] [CrossRef]
  26. Yu, J.; Yang, X.; Deng, Y.; Krefman, A.E.; Pool, L.R.; Zhao, L.; Mi, X.; Ning, H.; Wilkins, J.; Lloyd-Jones, D.M.; et al. Incorporating longitudinal history of risk factors into atherosclerotic cardiovascular disease risk prediction using deep learning. Sci. Rep. 2024, 14, 2554. [Google Scholar] [CrossRef]
  27. Lip, G.Y.H.; Genaidy, A.; Estes, C. Cardiovascular disease (CVD) outcomes and associated risk factors in a medicare population without prior CVD history: An analysis using statistical and machine learning algorithms. Intern. Emerg. Med. 2023, 18, 1373–1383. [Google Scholar] [CrossRef]
  28. Yoshida, S.; Tanaka, S.; Okada, M.; Ohki, T.; Yamagishi, K.; Okuno, Y. Development and validation of ischemic heart disease and stroke prognostic models using large-scale real-world data from Japan. Environ. Health Prev. Med. 2023, 28, 16. [Google Scholar] [CrossRef]
  29. Deng, Y.; Liu, L.; Jiang, H.; Peng, Y.; Wei, Y.; Zhou, Z.; Zhong, Y.; Zhao, Y.; Yang, X.; Yu, J.; et al. Comparison of State-of-the-Art Neural Network Survival Models with the Pooled Cohort Equations for Cardiovascular Disease Risk Prediction. BMC Med. Res. Methodol. 2023, 23, 22. [Google Scholar] [CrossRef]
  30. Sajid, M.R.; Almehmadi, B.A.; Sami, W.; Alzahrani, M.K.; Muhammad, N.; Chesneau, C.; Hanif, A.; Khan, A.A.; Shahbaz, A. Development of Nonlaboratory-Based Risk Prediction Models for Cardiovascular Diseases Using Conventional and Machine Learning Approaches. Int. J. Environ. Res. Public Health 2021, 18, 12586. [Google Scholar] [CrossRef]
  31. Sajid, M.R.; Muhammad, N.; Zakaria, R.; Shahbaz, A.; Bukhari, S.A.C.; Kadry, S.; Suresh, A. Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach. Interdiscip. Sci. 2021, 13, 201–211. [Google Scholar] [CrossRef]
  32. Chakraborty, C.; Bhattacharya, M.; Pal, S.; Lee, S.-S. From machine learning to deep learning: Advances of the recent data-driven paradigm shift in medicine and healthcare. Curr. Res. Biotechnol. 2024, 7, 100164. [Google Scholar] [CrossRef]
  33. Faizal, A.S.M.; Thevarajah, T.M.; Khor, S.M.; Chang, S.-W. A review of risk prediction models in cardiovascular disease: Conventional approach vs. artificial intelligent approach. Comput. Methods Programs Biomed. 2021, 207, 106190. [Google Scholar] [CrossRef]
  34. Schuermans, A.; Pournamdari, A.B.; Lee, J.; Bhukar, R.; Ganesh, S.; Darosa, N.; Small, A.M.; Yu, Z.; Hornsby, W.; Koyama, S.; et al. Integrative proteomic analyses across common cardiac diseases yield mechanistic insights and enhanced prediction. Nat. Cardiovasc. Res. 2024, 3, 1516–1530. [Google Scholar] [CrossRef] [PubMed]
  35. Zhu, D.; Vernon, S.T.; D’Agostino, Z.; Wu, J.; Giles, C.; Chan, A.S.; Kott, K.A.; Gray, M.P.; Gholipour, A.; Tang, O.; et al. Lipidomics Profiling and Risk of Coronary Artery Disease in the BioHEART-CT Discovery Cohort. Biomolecules 2023, 13, 917. [Google Scholar] [CrossRef] [PubMed]
  36. Albrecht, V.; Müller-Reif, J.; Nordmann, T.M.; Mund, A.; Schweizer, L.; Geyer, P.E.; Niu, L.; Wang, J.; Post, F.; Oeller, M.; et al. Bridging the Gap From Proteomics Technology to Clinical Application: Highlights From the 68th Benzon Foundation Symposium. Mol. Cell. Proteom. 2024, 23, 100877. [Google Scholar] [CrossRef]
  37. Cai, Y.-Q.; Gong, D.-X.; Tang, L.-Y.; Li, H.-J.; Jing, T.-C.; Gong, M.; Hu, W.; Zhang, Z.-W.; Zhang, X.; Zhang, G.-W. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J. Med. Internet Res. 2024, 26, e47645. [Google Scholar] [CrossRef]
  38. Bernardi, F.A.; Alves, D.; Crepaldi, N.; Yamada, D.B.; Lima, V.C.; Rijo, R. Data Quality in Health Research: Integrative Literature Review. J. Med. Internet Res. 2023, 25, e41446. [Google Scholar] [CrossRef]
  39. Wilimitis, D.; Walsh, C.G. Practical Considerations and Applied Examples of Cross-Validation for Model Development and Evaluation in Health Care: Tutorial. JMIR AI 2023, 2, e49023. [Google Scholar] [CrossRef]
  40. Ramspek, C.L.; Jager, K.J.; Dekker, F.W.; Zoccali, C.; van Diepen, M. External validation of prognostic models: What, why, how, when and where? Clin. Kidney J. 2021, 14, 49–58. [Google Scholar] [CrossRef]
  41. Nasarian, E.; Alizadehsani, R.; Acharya, U.; Tsui, K.-L. Designing interpretable ML system to enhance trust in healthcare: A systematic review to proposed responsible clinician-AI-collaboration framework. Inf. Fusion 2024, 108, 102412. [Google Scholar] [CrossRef]
  42. Sedlakova, J.; Daniore, P.; Wintsch, A.H.; Wolf, M.; Stanikic, M.; Haag, C.; Sieber, C.; Schneider, G.; Staub, K.; Ettlin, D.A.; et al. Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLoS Digit. Health 2023, 2, e0000347. [Google Scholar] [CrossRef] [PubMed]
  43. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cognit. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  44. GCollins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 2015, 350, g7594. [Google Scholar] [CrossRef] [PubMed]
  45. Hernandez-Boussard, T.; Bozkurt, S.; A Ioannidis, J.P.; Shah, N.H. MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. J. Am. Med. Inform. Assoc. 2020, 27, 2011–2015. [Google Scholar] [CrossRef]
  46. Rasmy, L.; Tiryaki, F.; Zhou, Y.; Xiang, Y.; Tao, C.; Xu, H.; Zhi, D. Representation of EHR data for predictive modeling: A comparison between UMLS and other terminologies. J. Am. Med. Inform. Assoc. 2020, 27, 1593. [Google Scholar] [CrossRef]
  47. Li, Y.; Sperrin, M.; Ashcroft, D.M.; van Staa, T.P. Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: Longitudinal cohort study using cardiovascular disease as exemplar. BMJ 2020, 371, m3919. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
