AI-Driven Technology in Heart Failure Detection and Diagnosis: A Review of the Advancement in Personalized Healthcare

Udoy, Ikteder Akhand; Hassan, Omiya

doi:10.3390/sym17030469

by Ikteder Akhand Udoy¹ and Omiya Hassan²,*

¹ Department of Computing, Boise State University, Boise, ID 83725, USA
² Department of Electrical and Electronic Engineering, Boise State University, Boise, ID 83725, USA
* Author to whom correspondence should be addressed.

Symmetry 2025, 17(3), 469; https://doi.org/10.3390/sym17030469

Submission received: 14 January 2025 / Revised: 5 March 2025 / Accepted: 11 March 2025 / Published: 20 March 2025

(This article belongs to the Special Issue Asymmetric and Symmetric in Deep Computer Vision and Generative Modeling)

Abstract

Artificial intelligence (AI) is playing an increasingly prominent role in advancing heart failure detection and diagnosis, significantly furthering personalized healthcare. This review synthesizes AI-driven innovations by examining methodologies, applications, and outcomes. We investigate how machine learning algorithms, deep learning models, and neural networks are combined with diverse data sources, including electronic health records (EHRs), medical records, imaging data, and clinical notes, to enhance diagnostic accuracy. Key advancements include prediction models that leverage real-time data from wearable devices alongside state-of-the-art AI systems trained on patient data from hospitals and clinics. Notably, recent studies have reported diagnostic accuracies ranging from 86.7% to as high as 99.9%, with sensitivity and specificity values often exceeding 97%, underscoring the potential of these AI systems to substantially improve early detection and clinical decision-making. Our review further explores the impact of symmetry and asymmetry in model design: symmetric architectures such as U-Net offer computational efficiency and structured feature extraction, whereas asymmetric models improve sensitivity to rare conditions and subtle clinical patterns. Incorporating these deep learning (DL) methods into anomaly detection and disease progression modeling further reinforces their positive impact on diagnostic accuracy and patient outcomes. Finally, this review identifies challenges in current AI applications, such as data quality, algorithmic transparency, model bias, and evaluation metrics, and outlines future research directions, including generative models, hybrid architectures, and explainable AI techniques, to optimize clinical practice.

Keywords:

deep learning; personalized healthcare; machine learning; heart failure

1. Introduction

Heart failure (HF) is a pervasive and critical global health issue that affects millions worldwide, imposing significant burdens on healthcare systems [1,2]. This complex clinical syndrome arises when the heart fails to pump sufficient blood to meet the body’s needs, resulting in symptoms such as breathlessness, fatigue, and fluid retention [3]. The rising prevalence of HF is driven by aging populations and the increasing incidence of risk factors such as hypertension, diabetes, and obesity [4]. According to the World Health Organization (WHO) [5], HF contributes to approximately 9% of all cardiovascular deaths globally, and an estimated 26 million people suffer from this condition [6].

Figure 1 illustrates a significant decline in global mortality rates from 1950 to 2021, a trend that reflects, among other factors, technological advances in healthcare. Groenewegen et al. [7] noted that developing nations initially faced high mortality rates due to a dual burden of communicable and non-communicable diseases. Improvements in medical interventions and AI-driven healthcare solutions have since contributed to reducing deaths worldwide [8]. Despite these advancements, disparities persist, emphasizing the need for continued targeted healthcare policies and interventions.

Current clinical practice relies on a variety of diagnostic tools—including echocardiograms, electrocardiograms (ECGs) [10], blood tests for biomarkers such as B-type natriuretic peptide (BNP) [11], chest X-rays [12], and cardiac magnetic resonance imaging (MRI) [13,14]—to evaluate the heart’s structure and function. While these standard methods are critical for diagnosis, evaluating disease severity, and guiding treatment decisions, they have limitations in predicting disease progression and individual patient outcomes [15]. In addition, the economic burden associated with HF is substantial; for example, in the United States, annual direct and indirect expenses are projected to exceed $70 billion by 2030 [16]. High rates of hospital readmissions [17] and mortality [18] further emphasize the urgent need for innovative approaches in the diagnosis, risk stratification, and overall management of HF.

The advent of artificial intelligence (AI), including machine learning (ML) and deep learning (DL), has opened new avenues for addressing these challenges. By leveraging large, complex datasets—ranging from medical images and electronic health records (EHRs) to genomic data—AI technologies have the potential to reveal subtle patterns and offer opportunities for early detection, personalized treatment, and improved patient outcomes. Deep learning, with its multi-layered neural networks, has already been applied in various contexts such as ECG analysis, yet gaps remain. For instance, a review by Hong et al. [19] focused on deep neural networks for ECG-based disease detection (targeting conditions such as atrial fibrillation, myocardial infarction, and congestive heart failure) but did not encompass broader areas like risk prediction, anomaly detection, or personalized medicine.

Motivated by these limitations, our review aims to fill the research gap by incorporating recent studies and a wider range of AI applications in HF research. The main objectives of this manuscript are as follows:

Synthesize current research on AI-driven methodologies for HF detection and diagnosis;
Evaluate the advantages and limitations of both established and emerging techniques;
Identify gaps in existing studies, particularly in risk prediction, anomaly detection, and personalization;
Propose future directions that could enhance diagnostic precision, interpretability, and clinical adoption.

The remainder of this paper is organized as follows. In Section 2, we examine the impact of symmetry and asymmetry in deep learning model designs for heart failure detection in related works, discussing their strengths and limitations. Section 3 describes our review methodology, detailing the search strategy, data extraction process, and criteria for selecting relevant studies. In Section 4, we explore the results of various symmetric and asymmetric models and emerging generative modeling techniques in healthcare and their applications in enhancing diagnostic accuracy. Section 5 discusses the implications of our findings, addresses current challenges, and suggests directions for future research. Finally, Section 6 concludes the paper with our final thoughts on the broader use of AI in HF care.

2. Related Work

Recent advances in AI have led to significant breakthroughs in HF detection, yet the field remains dynamic with ongoing challenges and opportunities. This section reviews key contributions, emphasizing model design, generative modeling techniques, and the integration of AI in clinical workflows.

2.1. Model Architectures: Symmetry and Asymmetry

A central theme in recent literature is the design of deep learning models that effectively capture the complexities of cardiovascular data. Symmetric neural networks, such as U-Net, utilize an encoder–decoder structure that maintains a balanced flow of information. This architecture is particularly effective for tasks like echocardiographic segmentation, where consistent feature extraction and reconstruction facilitate the precise delineation of cardiac chambers [20]. However, the uniformity of symmetric models can limit their ability to handle data imbalances and detect rare or subtle abnormalities.

To address these shortcomings, researchers have developed asymmetric neural networks that incorporate specialized mechanisms such as dual-attention modules [21] and multi-scale feature fusion. These approaches enhance sensitivity to subtle diagnostic patterns and rare conditions—such as pulmonary congestion and ventricular asymmetry—thereby improving overall diagnostic precision and model generalization [20].

2.2. Overview of AI Applications in Heart Failure Detection

The reviewed literature also reveals diverse AI applications that underscore the transformative potential of these technologies in clinical practice:

Diagnostic Tools: AI-driven models have been applied to analyze ECG data, medical imaging, and EHRs, providing highly accurate detection and classification of HF. For instance, studies have demonstrated the efficacy of CNNs in processing echocardiograms and X-ray images, while RNNs have been integrated to capture temporal dependencies in ECG signals.
Risk Stratification and Personalized Medicine: Several studies emphasize the use of AI for risk prediction, offering the potential to tailor treatments to individual patients. However, previous reviews, such as that of Hong et al. [19], have largely overlooked this aspect, pointing to a significant research gap.
Advanced Modeling Approaches: The evolution from symmetric to asymmetric model designs and generative modeling techniques has improved diagnostic precision, though challenges in data quality, interpretability, and model bias persist.

Collectively, these studies map the current methodological landscape while highlighting the need for further research to address existing gaps. In particular, the literature points to the need for more robust models that integrate multimodal data and incorporate explainability, as well as for lightweight, generalizable frameworks suitable for clinical deployment.

3. Methodology

3.1. Search Strategy, Data Gathering and Extraction

A comprehensive literature review was conducted to identify relevant studies on heart failure (HF) and the application of artificial intelligence (AI), machine learning (ML), and deep learning (DL). As shown in Figure 2, we selected research papers focusing on cardiovascular disease detection and diagnosis using deep learning and AI models, and we excluded studies unrelated to heart disease as well as reviews focusing on different healthcare systems. The sources searched were Google Scholar, the American Heart Association (AHA), the European Journal of Heart Failure, IEEE Xplore, and Researcher.Life. These repositories were selected for their extensive coverage of medical and technological research and for their access to a wide range of peer-reviewed journals and conference proceedings pertinent to the topic. Free-text keywords were used to ensure a thorough search. The main search terms included “Heart Failure”, “Cardiac Failure”, “Congestive Heart Failure”, “Artificial Intelligence”, “AI”, “Machine Learning”, “ML”, “Deep Learning”, “Prevention”, “Risk Prediction”, “Early Detection”, “Epidemiology” OR “Risk Factors”, “Pathophysiology”, and “Mechanisms”.
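
The review lists its search terms but not how they were combined. As a hypothetical illustration only, the sketch below groups the terms into three concept sets (the grouping is our assumption), joins synonyms within a group with OR, and joins the groups with AND, a common way to structure database queries.

```python
# Hypothetical grouping of the review's search terms into concept sets;
# the review itself does not state how the terms were combined.
CONCEPTS = {
    "condition": ["Heart Failure", "Cardiac Failure", "Congestive Heart Failure"],
    "method": ["Artificial Intelligence", "AI", "Machine Learning", "ML", "Deep Learning"],
    "focus": [
        "Prevention", "Risk Prediction", "Early Detection", "Epidemiology",
        "Risk Factors", "Pathophysiology", "Mechanisms",
    ],
}


def or_group(terms):
    """Quote each term and join synonyms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concepts):
    """Require at least one term from every concept group by joining groups with AND."""
    return " AND ".join(or_group(terms) for terms in concepts.values())


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Exact operator syntax differs between databases, so a string like this would need adapting per source; dropping the "focus" group gives a broader query.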

The inclusion criteria are as follows:

Peer-reviewed articles published in English;
Studies published between 2014 and 2025 to ensure contemporary relevance;
Research involving human subjects, focusing on heart failure and using AI, ML, or deep learning;
Articles addressing epidemiology, risk factors, pathophysiology, preventive strategies, and technological advances in heart failure.

The exclusion criteria are as follows:

Non-English publications;
Studies involving animal models without direct human data;
Reviews, editorials, and opinion pieces, unless they provided substantial insights or were foundational papers;
Studies not focusing on the application of AI, ML, or deep learning in heart failure.
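
The criteria above can be expressed as a single screening predicate. The sketch below is illustrative: the record fields are hypothetical names, and the "substantial insight or foundational" exception for reviews is modelled as a flag set by a human reviewer, since it is a judgment call rather than a mechanical test.

```python
from dataclasses import dataclass


# Hypothetical record schema for screening; field names are illustrative.
@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str
    peer_reviewed: bool
    human_data: bool      # False for animal-only studies
    is_review: bool       # reviews, editorials, and opinion pieces
    foundational: bool    # reviewer judgment: substantial insight or foundational
    topics: frozenset = frozenset()


AI_TOPICS = frozenset({"artificial intelligence", "machine learning", "deep learning"})


def passes_screening(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria to one record."""
    if not r.peer_reviewed or r.language != "English":
        return False
    if not 2014 <= r.year <= 2025:
        return False
    if not r.human_data:
        return False
    if r.is_review and not r.foundational:
        return False
    # Must concern heart failure and use at least one of AI, ML, or DL.
    return "heart failure" in r.topics and bool(r.topics & AI_TOPICS)
```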

Figure 3 is a word cloud of the most common words in the research articles covered by this study; larger words appear more frequently. The most prominent terms are “Heart”, “Learning”, “Failure”, “Deep”, and “Data”. To generate it, we parsed the relevant papers with Python (version 3.9.13) to identify the most common words, excluded common stopwords (e.g., the, et, al), and visualized the resulting terms with an online word cloud generator. After trying several visualizers, we selected one with a simple layout and an appealing color scheme, and we converted the resulting word cloud image from SVG to PNG format.

Table 1 summarizes the top 10 most frequently occurring words extracted from a corpus of 17 research articles. In addition, Figure 2 illustrates the workflow employed in this study. The process involved identifying and retrieving relevant research papers, extracting text using a Python script implemented on Google Colab, computing word frequencies, and finally generating a word cloud visualization via WordClouds.com. The prominence of terms such as “Heart”, “Failure”, “Deep”, “Learning”, and “Data” indicates a substantial focus on the application of deep learning (DL) and convolutional neural network (CNN) models for heart failure detection and diagnosis, as reflected in the top-cited literature within this field.
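
The word-frequency step of this workflow can be sketched in a few lines of Python. This is not the review's Colab script: the tokenizer (lowercase alphabetic runs) and every stopword beyond "the", "et", and "al" are our assumptions.

```python
import re
from collections import Counter

# "the", "et", and "al" come from the review; the other entries are assumed.
STOPWORDS = {"the", "et", "al", "and", "of", "in", "to", "a", "for", "with", "is", "on", "by"}


def top_words(texts, k=10, stopwords=STOPWORDS):
    """Return the k most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common(k)
```

Calling `top_words(paper_texts, k=10)` on the extracted article texts would yield the kind of (word, count) pairs summarized in a top-10 table.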

3.2. Selection of Study Criteria

We selected research papers based on the novelty of their approaches using deep learning (DL) models, various neural networks, and machine learning models.

As summarized in Figure 4, current AI and ML algorithms have demonstrated remarkable accuracy in diagnosing heart failure, predicting disease progression, and identifying patients at high risk of adverse events. For instance, AI-powered models can rapidly analyze EHRs to predict heart failure onset, facilitating timely intervention [22]. Moreover, deep learning (DL) techniques applied to imaging data, such as echocardiograms and cardiac MRIs, can enhance the interpretation of cardiac function and structure, providing critical insights for clinicians. AI-driven decision support systems are also changing the management of heart failure by integrating patient-specific data to recommend optimized treatment plans [23]. These systems can continuously learn from new data, adapt to the evolving characteristics of heart failure, and improve their predictive capabilities over time. Integrating AI into remote monitoring and wearable devices further enables real-time tracking of patient health status, allowing proactive management and reducing the likelihood of hospital readmission. The central part of Figure 4 illustrates the convergence of AI and heart failure treatment, with overlapping areas indicating essential applications of AI in this field. AI technologies, including machine learning (ML), deep learning (DL), natural language processing (NLP), and predictive analytics, are increasingly used in diagnostics, personalized treatment, monitoring, and predictive modeling for heart failure, where they improve the precision and effectiveness of management by providing data-driven insights and recommendations. Figure 4 also details various AI models and heart failure detection and management strategies. Machine learning models such as random forest [24,25], support vector machines (SVM) [26], and gradient boosting machines [27] are highlighted for their roles in predictive analytics and risk stratification.
Deep learning models, particularly convolutional neural networks (CNNs) for imaging and recurrent neural networks (RNNs) for time-series data such as ECGs, are noted for their accuracy in interpreting complex medical data [28]. Additionally, NLP is used to analyze clinical notes and patient histories, further supporting comprehensive patient care. The bottom-right section of Figure 4 outlines practical applications of AI in cardiology, including image analysis for MRI [29] and CT scans [30], predictive analytics for patient risk stratification [31], and decision support systems for treatment planning [32]. Figure 4 also identifies critical gaps that must be addressed to fully leverage AI in heart failure management. Data quality and integration issues, such as inconsistent data formats and a lack of comprehensive datasets, pose significant obstacles. Model interpretability remains a further challenge: the black-box [33] nature of AI models necessitates the development of explainable AI to ensure transparency in clinical settings. Regulatory and ethical challenges, including patient data privacy and regulatory approval [34], are also significant concerns. Finally, implementation barriers such as the high cost of AI systems and the need for clinician training and acceptance must be overcome to facilitate widespread adoption. Taken together, Figure 4 underscores the transformative potential of AI in cardiology, particularly in heart failure management.

3.3. Methodological Insights on Deep Learning Model in Cardiology

3.3.1. Implementation of the EchoNet-Dynamic Experiment

In the study by Ouyang et al. [35], the EchoNet-Dynamic model, designed for beat-to-beat assessment of cardiac function, was implemented as a deep learning framework tailored to echocardiogram video analysis. The study used a dataset of 10,030 apical four-chamber echocardiogram videos obtained from routine clinical practice at Stanford Medicine. Each video captured several cardiac cycles from a demographically diverse patient population. Preprocessing included extracting individual frames from the videos, standardizing resolution and frame rate, and applying data augmentation techniques, such as random cropping, rotation, and flipping, to improve generalization. The architecture of EchoNet-Dynamic had three main components. First, a convolutional neural network (CNN) with atrous (dilated) convolutions performed frame-level semantic segmentation of the left ventricle; dilated convolutions enlarge the receptive field, capturing wider spatial context without adding parameters. This segmentation network was trained with a weakly supervised approach, leveraging expert annotations to delineate the left ventricle accurately. Second, a 3D CNN with residual connections and spatiotemporal convolutions predicted ejection fraction from the segmented cardiac cycles. The spatiotemporal convolutions allowed the model to interpret the motion of the heart across video frames, which is essential for precise ejection fraction estimation, while the residual connections mitigated the vanishing gradient problem, allowing a deeper network to be trained effectively. Lastly, an ensemble strategy integrated the segmentation outcomes with clip-level predictions, enabling a beat-to-beat analysis of cardiac function.
The model was trained using stochastic gradient descent (SGD) with momentum to minimize a combined loss function comprising Dice loss for segmentation accuracy and the mean squared error (MSE) for ejection fraction prediction. Hyperparameters were tuned carefully, with an initial learning rate of 0.001, a batch size of 32, and 50 training epochs; early stopping based on validation loss was used to prevent overfitting. Performance was evaluated on an internal test set of 1277 echocardiogram videos from Stanford Medicine and an external test set of 2895 videos from 1267 patients at Cedars-Sinai Medical Center, using the Dice similarity coefficient for segmentation accuracy, mean absolute error (MAE) for ejection fraction prediction, and area under the curve (AUC) for classifying heart failure with reduced ejection fraction. The model was applied to the external dataset without fine-tuning, demonstrating robustness across different healthcare systems.
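
To make the quantities in this description concrete, the sketch below computes ejection fraction from end-diastolic and end-systolic volumes (its standard clinical definition), the Dice similarity coefficient for binary masks, a combined Dice-plus-MSE loss, and a beat-averaged EF. The relative weighting of the two loss terms and the plain averaging across beats are assumptions for illustration, not details taken from the EchoNet-Dynamic code.

```python
def ejection_fraction(edv, esv):
    """EF (%) = (EDV - ESV) / EDV * 100 from end-diastolic and end-systolic volumes."""
    if edv <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv - esv) / edv * 100.0


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A and B| / (|A| + |B|) for flattened binary (0/1) masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def combined_loss(pred_masks, true_masks, pred_ef, true_ef, ef_weight=1.0):
    """Mean Dice loss over masks plus weighted MSE over EF values.

    ef_weight is a hypothetical balancing factor, not a value from the study.
    """
    dice_loss = sum(1.0 - dice_coefficient(p, t)
                    for p, t in zip(pred_masks, true_masks)) / len(pred_masks)
    mse = sum((p - t) ** 2 for p, t in zip(pred_ef, true_ef)) / len(pred_ef)
    return dice_loss + ef_weight * mse


def beat_averaged_ef(per_beat_ef):
    """Average per-beat EF estimates into a single study-level value."""
    return sum(per_beat_ef) / len(per_beat_ef)
```

Because EF is measured in percentage points while Dice loss lies in [0, 1], the squared EF error would dominate the sum unless `ef_weight` is small or EF is rescaled to [0, 1]; exposing the weight as a parameter makes that trade-off explicit.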

3.3.2. Multi-Input Neural Network Model

Huang et al. [38] present an approach to detecting congestive heart failure (CHF) that uses deep learning to build a robust and practical detection system from electrocardiogram (ECG) signals. Their aim is a model that accurately classifies CHF from short-term ECG data, making the system suitable for real-world applications. The proposed method employs a multi-input neural network that combines time-domain and frequency-domain features extracted from ECG signals. Preprocessing divides the ECG signals into 7 min segments and then extracts RR intervals, which significantly reduces computational complexity and focuses the model on heart rate variability (HRV) features. By integrating these features, the model captures critical information related to CHF, improving its detection accuracy.
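
As an illustration of what RR-interval preprocessing makes available, the sketch below splits an RR series into 7 min windows and computes standard time-domain HRV features (mean RR, SDNN, RMSSD, pNN50). Huang et al. also use frequency-domain features, which are omitted here, and the specific features shown are common HRV measures chosen by us, not necessarily the inputs of their network.

```python
import math


def split_by_duration(rr_ms, window_s=420):
    """Split an RR series (ms) into consecutive windows of about window_s seconds.

    420 s corresponds to the 7 min segments; a trailing partial window is dropped.
    """
    windows, current, elapsed = [], [], 0.0
    for rr in rr_ms:
        current.append(rr)
        elapsed += rr / 1000.0
        if elapsed >= window_s:
            windows.append(current)
            current, elapsed = [], 0.0
    return windows


def hrv_time_domain(rr_ms):
    """Common time-domain HRV features from RR intervals in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of all RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # RMSSD: root mean square of successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd,
            "pnn50": pnn50, "mean_hr_bpm": 60000.0 / mean_rr}
```

Each window from `split_by_duration` can be passed to `hrv_time_domain` to produce one compact feature vector per segment, which is far smaller than the raw ECG samples it summarizes.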

3.3.3. Machine Learning and End-to-End Deep Learning

Gjoreski et al. [39] present a comprehensive study on the detection of chronic heart failure (CHF) using a combination of classic machine learning (ML) techniques and end-to-end deep learning (DL) approaches. Their research leverages heart sound recordings to develop an effective detection system. The dataset comprises recordings from 947 subjects, sourced from six publicly available datasets from the PhysioNet Challenge, as well as a newly collected CHF-specific dataset. These recordings were made using the 3M Littmann Electronic Stethoscope Model 3200 (3M Health Care, Maplewood, MN, USA), ensuring high-quality data acquisition. The methodology integrates ML and DL techniques to exploit their respective strengths. The classic ML approach extracts expert-defined features from the heart sound recordings and uses them to train a classification model. In contrast, the DL approach employs an end-to-end framework that learns from spectro-temporal representations of the heart sound signals: by operating on spectrograms, the DL model can capture detailed information in the heart sounds that purely time-domain analyses might overlook.
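
The spectro-temporal input used by the DL branch can be illustrated with a short-time Fourier transform. The sketch below uses a naive DFT over Hann-windowed frames to stay dependency-free; the frame length, hop size, window, and log compression are our assumptions, not Gjoreski et al.'s settings.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from Hann-windowed frames via a naive DFT.

    Returns one list per frame holding frame_len // 2 + 1 magnitudes
    (non-negative frequency bins). The naive DFT costs O(frame_len**2)
    per frame, which is fine for a sketch but slow for real recordings.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def log_spectrogram(frames, floor=1e-10):
    """Log-compress magnitudes, a common scaling before feeding a CNN."""
    return [[math.log(max(m, floor)) for m in frame] for frame in frames]
```

The result is a time-by-frequency grid that a 2D CNN can treat like an image. For real recordings, an FFT-based routine such as `scipy.signal.stft` would replace the naive DFT.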

3.3.4. CNN-Based Classifier

In the study by Nirschl et al. [40], the primary goal was to develop and evaluate a deep learning classifier capable of detecting clinical heart failure from whole-slide images (WSIs) of hematoxylin and eosin (H&E) stained cardiac tissue. The study addressed the substantial variability in the manual interpretation of endomyocardial biopsy (EMB) results, aiming to improve diagnostic consistency and accuracy. The authors used a convolutional neural network (CNN) to analyze WSIs from 209 patients, with data from 104 patients used for training and from 105 patients reserved for independent testing. The CNN architecture was adapted from previous work by Janowczyk and Madabhushi [41] and processed images to predict the likelihood of heart failure from histopathological features. The dataset included patients diagnosed with end-stage heart failure and non-failing controls. The CNN's performance was compared with a traditional feature-engineering approach that coupled WND-CHARM with a random forest classifier: WND-CHARM extracted 4059 features per image, of which the top 20 were used by the random forest model.
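
The review does not say how the top 20 WND-CHARM features were chosen. One plausible criterion, sketched below purely as an assumption, is to rank each feature column by a two-class Fisher score (squared difference of class means over the sum of class variances) and keep the highest-scoring columns.

```python
def fisher_score(values_pos, values_neg):
    """Two-class Fisher score: (mean_pos - mean_neg)**2 / (var_pos + var_neg)."""
    def mean(v):
        return sum(v) / len(v)

    def variance(v, m):
        return sum((x - m) ** 2 for x in v) / len(v)

    m_pos, m_neg = mean(values_pos), mean(values_neg)
    denom = variance(values_pos, m_pos) + variance(values_neg, m_neg)
    return (m_pos - m_neg) ** 2 / denom if denom > 0 else 0.0


def top_k_features(rows, labels, k=20):
    """Indices of the k feature columns with the highest Fisher scores.

    rows: per-image feature vectors (e.g., hand-crafted features); labels: 1 = heart failure, 0 = control.
    """
    scores = []
    for j in range(len(rows[0])):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores.append((fisher_score(pos, neg), j))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [j for _, j in scores[:k]]
```

Whatever criterion is used, scores should be computed on the training patients only, so that the held-out test set does not influence which features the random forest sees.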

3.3.5. Deep Learning Framework for Feature Rearrangement

Wang et al. [42] propose a deep learning framework called the feature rearrangement-based deep learning system (FRDLS). Its core innovations are a feature rearrangement strategy and a specialized convolutional layer, the feature rearrangement convolutional layer (FReaConv). The rearrangement step optimizes the order of the input features, using chi-square tests to determine each feature's significance and position. The model also uses the focal loss function to handle the class imbalance common in medical datasets. The dataset was collected from the electronic health records (EHRs) of Shanghai Shuguang Hospital and covers 10,198 patients from March 2009 to April 2016. It includes in-hospital, 30-day, and 1-year mortality outcomes, providing a comprehensive basis for evaluating the model's performance across different time frames.
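
The two ingredients named here can be sketched independently. Below, features are ranked by a Pearson chi-square statistic against the outcome (assuming binary features; continuous EHR variables would first need discretization), and the binary focal loss FL(p_t) = -α_t (1 - p_t)^γ log(p_t) is computed for one prediction. The FReaConv layer itself is not reproduced, and the default γ = 2 is a common choice from the focal loss literature rather than a value reported by Wang et al.

```python
import math


def chi_square_2x2(feature, labels):
    """Pearson chi-square statistic for a binary feature versus a binary outcome."""
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        table[f][y] += 1
    n = len(labels)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            row_total = table[i][0] + table[i][1]
            col_total = table[0][j] + table[1][j]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Return feature indices ordered from most to least associated with the outcome."""
    return sorted(range(len(columns)),
                  key=lambda j: chi_square_2x2(columns[j], labels),
                  reverse=True)


def focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Binary focal loss -alpha_t * (1 - p_t)**gamma * log(p_t) for one prediction.

    p is the predicted probability of the positive class and y is 0 or 1.
    With gamma=0 and alpha=None it reduces to ordinary binary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

The (1 - p_t)^γ factor shrinks the loss of confidently correct examples, so the many easy majority-class cases (e.g., survivors) contribute less to training than the rarer, harder mortality cases.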

3.3.6. Modified VGG16-Based Model

The study by Matsumoto et al. [43] focuses on diagnosing heart failure from chest X-ray images using deep learning. The researchers used a dataset of 952 chest X-ray images obtained from the “ChestX-ray8” database [44], published by the National Institutes of Health (NIH). Two cardiologists relabeled these images to ensure accurate categorization into “normal” and “heart failure” classes; 260 images were labeled “normal” and 378 “heart failure”, and the rest were discarded because of incorrect labeling. The model used transfer learning, which transfers knowledge from a model pre-trained on one task to a new but related task. Specifically, the convolutional layers of the VGG16 model [45], pre-trained on the ImageNet [46] dataset, were reused and frozen, and only the final two layers, which acted as the classifier, were trained on the relabeled chest X-ray dataset. Training used data augmentation to improve robustness: images were resized, cropped, and rotated, but left-right flipping and shear deformation were avoided to preserve the anatomical orientation critical for medical interpretation. The model was implemented in Keras (version 2.2.5) with a TensorFlow (version 1.15) backend and trained on a graphics processing unit (GPU) on the Google Colaboratory platform for 150 epochs, using binary cross-entropy as the loss function and SGD as the optimizer.
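
The augmentation policy described here (rotate and crop, but never flip or shear) can be sketched without a deep learning framework. The pure-Python version below is an illustration under our own assumptions (nearest-neighbour rotation, zero fill, a ±10° range); it is not Matsumoto et al.'s Keras pipeline, and the resizing step is omitted.

```python
import math
import random


def rotate_nearest(img, degrees):
    """Rotate a 2D list-of-lists image about its centre (nearest neighbour, zero fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def augment(img, crop_h, crop_w, max_degrees=10.0, seed=None):
    """Small random rotation followed by a random crop.

    Deliberately no left-right flip (it would place the heart on the wrong
    side of the chest) and no shear; max_degrees is an assumed value.
    """
    rng = random.Random(seed)
    rotated = rotate_nearest(img, rng.uniform(-max_degrees, max_degrees))
    return random_crop(rotated, crop_h, crop_w, rng)
```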

3.3.7. CNN and RNN-Based Fusion Model

The model proposed by Ning et al. [36] uses a CNN to capture local dependencies and extract robust features from the ECG signal, followed by an RNN that processes these features to capture temporal dependencies and sequence patterns. This hybrid architecture is designed to improve CHF detection accuracy by combining the complementary strengths of CNNs and RNNs. Specifically, the model includes multiple convolutional layers followed by max-pooling layers that reduce the dimensionality of the feature maps. These feature maps are then fed into an RNN layer, which learns the temporal characteristics of the ECG signals. The model is trained with the Adam optimizer at a learning rate of 10⁻⁴ for 30,000 iterations and has 84,630 weights in total, a relatively small parameter count by deep learning standards.
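
The data flow of such a CNN-to-RNN hybrid can be sketched with toy, single-channel components: a convolution and max-pooling stage produces a shorter sequence of local features, and a recurrent unit reads that sequence in order. All weights below are hypothetical and untrained, and Ning et al.'s actual layer sizes and RNN type are not reproduced.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing remainder shorter than size is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def rnn_last_state(sequence, w_in, w_rec, bias=0.0):
    """Single-unit Elman RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)."""
    h = 0.0
    for x_t in sequence:
        h = math.tanh(w_in * x_t + w_rec * h + bias)
    return h


def cnn_rnn_probability(ecg, kernel, w_in, w_rec, w_out, b_out=0.0):
    """CNN stage extracts local features, RNN stage summarises their order,
    and a logistic output maps the final hidden state to a CHF probability."""
    local_features = max_pool(relu(conv1d(ecg, kernel)))
    h = rnn_last_state(local_features, w_in, w_rec)
    return 1.0 / (1.0 + math.exp(-(w_out * h + b_out)))
```

Unlike pooling alone, the recurrent state depends on the order of the pooled features, which is what lets the RNN stage model temporal patterns across heartbeats.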

3.3.8. Deep Learning Model with Residual Blocks

Kanani et al. [37] used a deep learning model consisting of six residual blocks, each containing two convolutional layers, ReLU activation layers, and a max-pooling layer. The model applies 1D convolution along the time axis, with 64 filters in each convolutional layer, so that it can extract a rich set of features from the ECG signals and classify them accurately. The final layers are two fully connected layers with 32 neurons each and a softmax layer that outputs the class probabilities. The model was trained with the Adam optimizer (learning rate 0.001, β₁ = 0.9, β₂ = 0.999), with the learning rate decaying exponentially by a factor of 0.75 every 10,000 iterations. The study used the MIT-BIH Arrhythmia Dataset, which contains five heartbeat classes: normal (N), supraventricular (S), ventricular (V), fusion (F), and unknown (Q). The signals, originally sampled at 360 Hz, were downsampled to 125 Hz for consistency. Several time-series augmentation techniques, including amplifying the signals and stretching or squeezing them along the time axis, expanded the dataset to 547,230 samples, of which 80% were used for training and 20% for testing.
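
Several numeric details in this description are easy to make concrete: the decaying learning rate, the 360 Hz to 125 Hz resampling, and the amplitude and time-axis augmentations. The sketch below is illustrative; whether the decay was stepped or continuous, and how resampling was actually performed, are assumptions.

```python
def lr_at(step, base_lr=1e-3, decay=0.75, every=10_000, staircase=True):
    """Exponential decay: base_lr * decay ** (step / every).

    staircase=True applies the factor once per completed 10,000 iterations;
    whether the original schedule was stepped or continuous is an assumption.
    """
    exponent = step // every if staircase else step / every
    return base_lr * decay ** exponent


def resample_linear(signal, src_hz, dst_hz):
    """Resample a 1D signal by linear interpolation (e.g., 360 Hz -> 125 Hz).

    No anti-aliasing filter is applied here; a production pipeline should
    low-pass filter before downsampling.
    """
    n_out = int(len(signal) * dst_hz / src_hz)
    out = []
    for i in range(n_out):
        pos = i * src_hz / dst_hz
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def amplify(signal, factor):
    """Amplitude-scaling augmentation."""
    return [factor * s for s in signal]


def stretch(signal, factor):
    """Time-axis stretching (factor > 1) or squeezing (factor < 1) by resampling."""
    return resample_linear(signal, 1.0, factor)
```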

3.3.9. CNN Based on Contextual Data

Liu et al. [47] present a deep learning approach that predicts heart failure readmission from clinical notes using convolutional neural networks (CNNs). The study aims to improve both the predictive power and the interpretability of readmission models by using unstructured data from electronic health records (EHRs), specifically discharge summary notes. Although CNNs are best known for image analysis, they can also be applied to text by convolving over sequences of word embeddings, which capture contextual information within the clinical notes. These embeddings are pre-trained on PubMed abstracts and full-text articles, making them well suited to medical text. The CNN automatically generates feature maps from the text, which are then used to predict readmissions. The methodology proceeds in several steps. The study uses the MIMIC-III database, which includes detailed clinical and billing data for over 58,000 hospital admissions. Heart failure admissions are identified using specific ICD-9 codes and then labeled according to whether they are followed by any readmission (general readmission) or by a readmission within 30 days. The data are split into training and test sets with equal numbers of positive (readmission) and negative (non-readmission) samples to address class imbalance. The CNN, which uses multiple convolutional filters to capture different features from the text, is trained on the discharge summary notes. Model performance is evaluated using precision, recall, and F1 score, and a random forest model trained on the same data with TF-IDF features serves as a baseline for comparison.
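
The random forest baseline relies on TF-IDF features, and the dataset is balanced between readmitted and non-readmitted admissions. Both steps are sketched below under stated assumptions: the unsmoothed TF-IDF variant and balancing by random downsampling of the majority class are our choices, not necessarily Liu et al.'s.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """TF-IDF weights for tokenized documents (lists of tokens).

    tf = count / document length and idf = log(N / df); this unsmoothed
    variant is an assumption, since libraries differ in their smoothing.
    Documents must be non-empty.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({term: (c / len(doc)) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights


def balance_by_downsampling(examples, labels, seed=0):
    """Randomly drop majority-class examples so both classes have equal size."""
    rng = random.Random(seed)
    pos = [e for e, y in zip(examples, labels) if y == 1]
    neg = [e for e, y in zip(examples, labels) if y == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)
```

A term that appears in every discharge summary gets an idf of zero, so boilerplate phrasing shared by all notes carries no weight in the baseline, whereas the CNN can still use such words in context.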

3.3.10. 3D CNN with Frame-Level Segmentation

The study by Duffy et al. [48] aims to improve the accuracy and throughput of clinical phenotyping for left ventricular hypertrophy (LVH) using a deep learning (DL) workflow. This approach targets limitations of current methods, such as under-recognition, measurement error, and variability in distinguishing the causes of increased left ventricular wall thickness. The researchers developed an end-to-end DL workflow with two main components: a frame-level semantic segmentation model that quantifies left ventricular wall thickness and a three-dimensional convolutional neural network (3D CNN) with residual connections that predicts the cause of LVH. The algorithm was trained and tested on a large dataset of parasternal long-axis and apical four-chamber echocardiogram videos from Stanford Healthcare, Cedars-Sinai Medical Center (CSMC), and the Unity Imaging Collaborative. The cohort comprised 23,745 patients, with data collected between 1 January 2008 and 31 December 2020: 12,001 Stanford and 1309 CSMC patients contributed parasternal long-axis videos, and 8084 Stanford and 2351 CSMC patients contributed apical four-chamber videos. The model measured left ventricular dimensions accurately, with a mean absolute error (MAE) of 1.2 mm for interventricular septal thickness, 2.4 mm for LV diameter, and 1.4 mm for posterior wall thickness. For classification, it achieved an area under the curve (AUC) of 0.83 for cardiac amyloidosis and 0.98 for hypertrophic cardiomyopathy. It maintained strong performance on external datasets, with R-squared values of 0.96 (domestic) and 0.90 (international) for ventricular parameter quantification.
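
The three metrics reported here (MAE, R-squared, and AUC) can be computed as sketched below. This is a generic illustration, not the study's evaluation code; in particular, some papers report R-squared as the squared Pearson correlation rather than the coefficient of determination used here, and the review does not specify which definition applies.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the measurement (e.g., mm)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Read this way, an AUC of 0.98 for hypertrophic cardiomyopathy means a randomly chosen affected patient is scored above a randomly chosen unaffected one about 98% of the time.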

3.3.11. Deep Learning Algorithm on Echocardiography

The study by Narang et al. [49] developed and tested a novel deep learning (DL) algorithm designed to guide novice users in acquiring echocardiographic images of diagnostic quality. The AI-driven software, named Caption Guidance, was trained on more than 5 million examples of ultrasonographic probe movements and their corresponding image-quality outcomes. The DL algorithm uses convolutional neural networks to provide real-time, prescriptive guidance, helping operators obtain anatomically correct transthoracic echocardiographic (TTE) images from standard transducer positions. It estimates image quality, geometric positioning, and the corrective manipulations needed to optimize the image, without additional sensors or trackers. The study was a prospective, multicenter diagnostic trial conducted at two academic hospitals: Northwestern Memorial Hospital in Chicago, Illinois, and the Minneapolis Heart Institute in Minnesota. Eight nurses with no prior experience in echocardiography each scanned 30 patients scheduled for clinically indicated echocardiograms, for a total of 240 patient scans covering a range of body mass indexes (BMIs) and cardiac conditions to support generalizability. The patients had a mean age of 61 years, and 57.9% were male. The nurses’ scans were compared with those obtained by experienced sonographers using the same echocardiographic equipment but without AI guidance.

3.4. Impact of Symmetry and Asymmetry in Model Design on Heart Failure Detection

The architectural design of machine learning models, particularly the choice between symmetrical and asymmetrical structures, strongly influences their effectiveness in detecting heart failure and other cardiovascular abnormalities. Symmetrical models, such as the U-Net [50] architecture shown in Figure 5, rely on mirrored contracting and expansive pathways to process input data while maintaining spatial correspondence [20]. This design is particularly suited to tasks like echocardiographic segmentation, where the uniform flow of information supports the accurate localization of anatomical features such as the boundaries of cardiac chambers. Symmetry in these architectures supports the efficient processing of well-structured data and fosters consistent feature hierarchies across the network. However, this uniformity can limit their capacity to handle heterogeneous datasets or subtle abnormalities that deviate from expected patterns.

Asymmetrical models such as the one shown in Figure 6 address these limitations by introducing flexibility in feature extraction and prioritization, allowing the network to focus on clinically significant irregularities. For instance, Liu et al. [21] demonstrated the effectiveness of dual-attention mechanisms in asymmetrical architectures for detecting anomalies in chest X-rays, such as cardiomegaly or pulmonary congestion. By dynamically weighting regions of interest, these models enhance sensitivity to subtle diagnostic markers. Similarly, in ECG analysis, Valehi et al. [51] utilized asymmetrical signal processing techniques to identify early signs of cardiac dysfunction, emphasizing variations in left–right ventricular features. In imaging studies, McCracken et al. [52] highlighted that ventricular volume asymmetry serves as a predictive biomarker for clinical outcomes, showcasing the diagnostic value of asymmetrical modeling approaches.
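Dynamic weighting of regions of interest can be illustrated with a generic soft spatial-attention step: each image region receives a relevance score, the scores are softmax-normalized into weights, and the region features are pooled with those weights. This is a simplified sketch, not the dual-attention module of Liu et al. [21]; in a real network the scores come from learned layers, whereas here they are supplied by hand.

```python
import math

def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(region_features, region_scores):
    """Pool per-region feature vectors using softmax-normalized attention scores.

    region_features: one feature vector per image region.
    region_scores: one relevance score per region (learned in a real model).
    Returns (attention weights, attention-pooled feature vector).
    """
    weights = softmax(region_scores)
    dim = len(region_features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, region_features)) for d in range(dim)]
    return weights, pooled

# Three toy regions; the third is scored as most relevant.
weights, pooled = spatial_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                                    [0.0, 0.0, math.log(2.0)])
```

With these scores the third region gets twice the weight of each of the others (0.5 versus 0.25), so its features dominate the pooled vector; this is the sense in which attention "amplifies" a suspicious region.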

Moreover, asymmetry plays a vital role in addressing imbalanced datasets, a common challenge in medical applications. Techniques like weighted cross-entropy and focal loss are frequently employed in asymmetrical models to mitigate the impact of data imbalance, helping ensure that rare pathological conditions receive adequate weight during training [53]. Integrating symmetrical efficiency with asymmetrical sensitivity provides a comprehensive framework for heart failure detection. Such a hybrid approach can improve both diagnostic accuracy and the generalizability of models across diverse patient populations, making them more robust tools for clinical application. Table 2 shows key comparisons of the use of symmetric and asymmetric models in the detection of heart failure.
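The two imbalance-aware losses named above can be written for a single binary prediction as follows. The focal-loss form and its defaults (alpha = 0.25, gamma = 2) follow the original focal-loss formulation from object detection; the class weights in the weighted cross-entropy are placeholders that would normally be set from class frequencies, and the probabilities are toy values.

```python
import math

def weighted_bce(p, y, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Binary cross-entropy with separate weights for positive and negative examples.

    p: predicted probability of the positive (e.g. heart failure) class; y: 0 or 1.
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: the (1 - p_t)**gamma factor down-weights easy examples."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = focal_loss(0.1, 0)  # confidently correct: loss is tiny
hard_positive = focal_loss(0.1, 1)  # missed rare case: loss stays large
```

Weighted cross-entropy raises the cost of errors on the rare class directly, whereas focal loss shrinks the contribution of the many easy (mostly majority-class) examples so that hard cases dominate the gradient.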

3.5. Methodological Insights on Symmetry in Computer Vision Models

Symmetry and asymmetry also play a central role in shaping the methodologies of computer vision models, particularly in feature extraction and optimization. Symmetrical architectures like U-Net exploit their mirrored design to capture hierarchical features while restoring the input resolution at the output [20]. The contracting path progressively downsamples the input, moving from low-level features such as edges and textures to higher-level semantic context, while the expansive path upsamples these features and merges them with the corresponding encoder features through skip connections, making such architectures highly effective for tasks like cardiac segmentation. The spatial correspondence maintained between encoder and decoder layers ensures the accurate localization of anatomical features, which is critical for reliable medical imaging.
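The mirrored resolution bookkeeping of a U-Net-style network can be traced without any deep learning framework. This sketch assumes "same"-padded convolutions and 2×2 pooling and upsampling, so only pooling and upsampling change the spatial size; the original U-Net used unpadded convolutions and cropped its skip connections, which changes the exact numbers.

```python
def unet_resolutions(input_size, depth):
    """Spatial size at each level of a symmetric U-Net-style encoder/decoder.

    Returns (encoder sizes from input to bottleneck, decoder sizes from bottleneck to output).
    """
    encoder = [input_size]
    for _ in range(depth):
        if encoder[-1] % 2:
            raise ValueError("input size must be divisible by 2**depth")
        encoder.append(encoder[-1] // 2)  # 2x2 max pooling halves the size
    decoder = []
    size = encoder[-1]                     # bottleneck
    for level in range(depth - 1, -1, -1):
        size *= 2                          # 2x2 up-convolution doubles the size
        skip = encoder[level]              # mirrored encoder level
        if size != skip:
            raise AssertionError("skip connection would join maps of different sizes")
        decoder.append(size)               # concatenating equal-size maps is now valid
    return encoder, decoder

encoder, decoder = unet_resolutions(256, 4)
# encoder: [256, 128, 64, 32, 16]; decoder: [32, 64, 128, 256]
```

Each decoder level pairs with the encoder level of the same size, which is the spatial correspondence that lets skip connections carry fine localization detail to the output.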

While symmetrical models provide a strong foundation, they are often insufficient for addressing the variability and complexity of real-world medical data. Asymmetrical methodologies augment these models with mechanisms that focus on nuanced features. As employed by Liu et al. [21], dual-attention modules selectively amplify diagnostic signals in specific regions, improving the detection of subtle pathologies like asymmetric ventricular dilation. Additionally, multi-scale feature fusion has been incorporated into asymmetrical architectures to combine fine-grained and global information, enhancing the model’s robustness to variations in resolution and scale. These mechanisms allow asymmetrical models to excel in identifying rare and complex conditions.
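Multi-scale feature fusion can take several forms; one simple version upsamples a coarse, more global feature map to the resolution of a fine-grained map and adds the two element-wise. The sketch below shows that variant on plain 2D lists, with nearest-neighbour upsampling as an assumed choice (learned up-convolutions and channel concatenation are common alternatives).

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [[row[c // factor] for c in range(len(row) * factor)]
            for row in grid for _ in range(factor)]

def fuse_multiscale(fine, coarse):
    """Fuse a fine-resolution map with a coarser, more global map by upsample-and-add."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[f + u for f, u in zip(fine_row, up_row)] for fine_row, up_row in zip(fine, up)]

fine = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # local detail
coarse = [[1, 2], [3, 4]]                                        # global context
fused = fuse_multiscale(fine, coarse)
```

Every fused cell now carries both its local value and the context of the coarse region it falls in.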

Optimization strategies further demonstrate the importance of asymmetry. Traditional symmetric models rely on standard loss functions like cross-entropy, which may not adequately address the challenges of imbalanced datasets. In contrast, asymmetrical loss functions, such as focal loss and weighted cross-entropy, prioritize rare classes, improving sensitivity to underrepresented conditions [53]. Data augmentation strategies, including elastic transformations, flipping, and cropping, also play a critical role by simulating asymmetries inherent in real-world datasets and enriching the training data.
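Flipping and cropping, two of the augmentations mentioned above, are sketched below on an image stored as a list of rows; elastic transformations are omitted for brevity. The 50% flip probability and the crop size are illustrative assumptions.

```python
import random

def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, height, width, rng):
    """Take a random height x width window from a 2D image."""
    top = rng.randint(0, len(image) - height)     # randint includes both endpoints
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, crop_hw, seed=None):
    """Randomly flip (with probability 0.5), then randomly crop to crop_hw."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_hw[0], crop_hw[1], rng)

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sample = augment(image, (2, 2), seed=0)
```

For cardiac images, a horizontal flip swaps the apparent left and right sides of the anatomy, so whether it is a safe augmentation depends on the imaging view and the task.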

Integrating symmetrical designs for efficient feature extraction with asymmetrical enhancements in attention mechanisms, optimization strategies, and data augmentation leads to a balanced methodological framework. This hybrid approach allows models to achieve high diagnostic accuracy and generalization, positioning them as powerful tools for the early detection and management of heart failure and other cardiovascular diseases.

3.6. Reinforcement Learning in Cardiology

Recent advancements in reinforcement learning (RL) have demonstrated significant potential in optimizing diagnostic and therapeutic processes in cardiology. RL frameworks, such as the one shown in Figure 7, leverage reward-based mechanisms to iteratively refine decision-making policies, which is especially beneficial in addressing challenges such as data imbalance and local optima in deep learning models. For instance, Mirzaee Moghaddam Kasmaee et al. [54] developed a novel deep learning framework that integrates ensemble learning with RL to diagnose myocarditis using cardiac magnetic resonance images. In their approach, the RL component formulates the diagnostic process as a sequential decision-making task, where the model is incentivized to correctly identify rare pathological cases, thereby enhancing sensitivity and overall diagnostic accuracy. Extending this paradigm to heart failure detection, RL-based methods offer promising avenues for real-time risk stratification, personalized treatment planning, and the dynamic adjustment of clinical decision thresholds. Such integration of RL with advanced imaging and deep learning techniques could significantly improve the efficiency and precision of cardiovascular diagnostics.
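One common way to make an RL classifier "incentivized to correctly identify rare pathological cases" is an asymmetric reward: each image is a step, the agent’s action is the predicted label, and decisions on the minority class are rewarded or penalized more heavily than decisions on the majority class. The function below sketches such a reward; the ±1 versus ±ratio scheme is an assumption drawn from the imbalanced-classification RL literature, not necessarily the reward used by Mirzaee Moghaddam Kasmaee et al. [54].

```python
def imbalance_reward(action, label, minority_label, imbalance_ratio):
    """Reward for one classification step in an RL formulation of diagnosis.

    Correct/incorrect decisions on the rare (minority) class earn +1/-1, while
    decisions on the common class earn only +ratio/-ratio, where
    ratio = n_minority / n_majority < 1. The agent therefore gains more from
    catching rare pathology than from labelling common cases.
    """
    scale = 1.0 if label == minority_label else imbalance_ratio
    return scale if action == label else -scale

# Hypothetical setting: label 1 = myocarditis (rare), 0 = normal, 1 case per 10 normals.
ratio = 0.1
caught_case = imbalance_reward(1, 1, minority_label=1, imbalance_ratio=ratio)   # +1.0
missed_case = imbalance_reward(0, 1, minority_label=1, imbalance_ratio=ratio)   # -1.0
correct_normal = imbalance_reward(0, 0, minority_label=1, imbalance_ratio=ratio)  # +0.1
false_alarm = imbalance_reward(1, 0, minority_label=1, imbalance_ratio=ratio)   # -0.1
```

Because the reward magnitudes roughly offset the class frequencies, a policy that ignores the rare class no longer maximizes total reward.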

3.7. Natural Language Processing in Cardiology

Advancements in natural language processing (NLP) have transformed the analysis of unstructured clinical data, thereby enhancing patient management in cardiology. For instance, Adejumo et al. [55] developed a deep learning-based NLP framework that automatically extracts New York Heart Association (NYHA) classifications and heart failure (HF) symptom descriptions from clinical notes; the workflow of the model is shown in Figure 8. This approach improves the consistency of functional status assessments and facilitates real-time monitoring and quality improvement in HF management. Additionally, NLP techniques have been effectively applied to identify heart failure with reduced ejection fraction (HFrEF) from discharge summaries, achieving high performance in external validations [56]. Integrating these NLP models into clinical workflows offers a scalable way to process large volumes of textual data, thereby supporting evidence-based decision making and optimizing patient outcomes in cardiovascular care.
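As a point of contrast with Adejumo et al.’s deep learning framework, the simplest way to pull NYHA classes out of notes is a rule-based pattern match. The sketch below is such a baseline, a swapped-in technique rather than the authors’ model; the regular expression and its coverage are our assumptions, and it ignores negation, hedging, and symptom-only descriptions, which is precisely where learned NLP models help.

```python
import re

ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4}

# Matches e.g. "NYHA class III", "NYHA II", "nyha functional class 4" (case-insensitive).
# Longer numerals come first in the alternation so "IV" is not read as "I".
NYHA_PATTERN = re.compile(
    r"\bNYHA\s+(?:functional\s+)?(?:class\s+)?(IV|III|II|I|[1-4])\b",
    re.IGNORECASE,
)

def extract_nyha(note):
    """Return all NYHA classes (1-4) mentioned in a clinical note, in order of appearance."""
    classes = []
    for match in NYHA_PATTERN.finditer(note):
        token = match.group(1).lower()
        classes.append(int(token) if token.isdigit() else ROMAN[token])
    return classes

note = "HFrEF, NYHA class III on admission; previously NYHA II. Now nyha functional class 4."
found = extract_nyha(note)  # [3, 2, 4]
```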

Figure 7. Workflow of reinforcement learning model [57].

Figure 8. Natural language processing model workflow [55].

4. Results

In this section, we summarize the key findings reported in the reviewed research papers and articles.

The EchoNet-Dynamic model of Ouyang et al. [35] shows robust performance across several key tasks in cardiac function assessment on both internal and external datasets. For left ventricle segmentation, the model achieved a Dice similarity coefficient of 0.92, indicating precise delineation of the cardiac structure from echocardiogram videos. This performance was consistent across the internal test set from Stanford Medicine and the external test set from Cedars-Sinai Medical Center, highlighting the model’s ability to generalize across different patient populations and clinical settings. For ejection fraction estimation, the model exhibited an MAE of 4.1% on the internal test set and 6.0% on the external test set, underscoring its robustness and potential for clinical application. These results are particularly notable given the variability often observed in manual ejection fraction assessments, suggesting that EchoNet-Dynamic could reduce the inter-observer variability typically seen in clinical practice. Furthermore, the model excelled in classifying heart failure with reduced ejection fraction, achieving an AUC of 0.97 on the internal and 0.96 on the external test set. These high AUC values indicate that the model was highly effective at distinguishing patients with reduced ejection fraction from those without, a critical aspect of cardiac diagnostics. The model’s consistency across datasets, particularly its strong generalization to an external healthcare system without fine-tuning, emphasizes its potential for widespread adoption in clinical practice.
These findings represent a significant advancement in the application of artificial intelligence to echocardiography, offering a reliable tool for improving the accuracy and consistency of cardiac assessments.
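The Dice coefficient and the ejection fraction underlying these results are simple to compute. The sketch uses the textbook definitions, Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks and EF = (EDV − ESV) / EDV × 100 from end-diastolic and end-systolic volumes; EchoNet-Dynamic itself predicts EF from video with a learned model, and the masks and volumes here are toy values.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flattened lists of 0/1)."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / total if total else 1.0  # two empty masks agree perfectly

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

overlap = dice([1, 1, 1, 0], [1, 1, 0, 0])  # 0.8
ef = ejection_fraction(120.0, 60.0)         # 50.0 %
```

Note that an EF error such as the reported MAE of 4.1% is measured in percentage points of this quantity.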

The fused CNN- and RNN-based deep learning model of Li et al. [58] performed notably well. It achieved an accuracy of 97.6%, a sensitivity of 96.3%, a specificity of 97.4%, and a positive predictivity of 97.1%. These metrics highlight the model’s reliability and effectiveness in staging heart failure, surpassing traditional machine learning models such as multilayer perceptrons (MLP), random forests (RF), classification and regression trees (CART), and support vector machines (SVM). The use of 10-fold cross-validation further supports the robustness and generalizability of these results.
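The staging metrics reported above all derive from a binary confusion matrix (for a multi-stage problem they are typically computed one class versus the rest), and 10-fold cross-validation partitions the data into ten disjoint test folds. Both are sketched below with hypothetical counts; the interleaved, unshuffled fold assignment is a simplification, since in practice folds are usually shuffled and often stratified by class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), specificity and positive predictive value."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),           # positive predictivity
    }

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)  # hypothetical counts
folds = list(k_fold_indices(25, 10))
```

Every sample appears in exactly one test fold, so the ten fold-level metrics together cover the whole dataset.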

The multi-input neural network model designed by Huang et al. [38] demonstrated high performance in detecting CHF, with an accuracy of 93.76% in the training phase and 86.74% in the testing phase. These results underscore the effectiveness of their approach in identifying CHF from short-term ECG recordings. The use of RR intervals, the basis of heart rate variability (HRV) analysis, further strengthens the model’s capability to detect CHF.
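RR intervals are the times between successive R peaks of the ECG. Given R-peak sample indices from an upstream detector (assumed here; Pan–Tompkins is a classic choice) and the sampling rate, they follow directly, as sketched below. The 250 Hz rate and peak positions are invented for illustration, and Huang et al.’s exact preprocessing may differ.

```python
def rr_intervals_ms(r_peak_samples, fs_hz):
    """Convert R-peak sample indices into RR intervals in milliseconds."""
    return [1000.0 * (b - a) / fs_hz for a, b in zip(r_peak_samples, r_peak_samples[1:])]

def heart_rate_bpm(rr_ms):
    """Instantaneous heart rate (beats per minute) for each RR interval."""
    return [60000.0 / rr for rr in rr_ms]

peaks = [0, 200, 410, 600]        # hypothetical R-peak positions at 250 Hz
rr = rr_intervals_ms(peaks, 250)  # [800.0, 840.0, 760.0]
hr = heart_rate_bpm(rr)           # roughly [75.0, 71.4, 78.9]
```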

The hybrid ML-DL model of Gjoreski et al. (2020) [39] demonstrated strong performance. Its aggregated accuracy was 92.9%, with a PhysioNet Challenge score of 89.3, a significant improvement over baseline methods. The system could also differentiate between the compensated and decompensated phases of CHF, reaching an accuracy of 93.2% with a decision tree classifier; this differentiation is crucial for clinical decision making and patient management.

The CNN in Nirschl et al. [40] achieved a sensitivity of 99% and a specificity of 94% on the test set, outperforming both a traditional feature-engineering method and two expert pathologists who reviewed the same test images. The pathologists achieved an accuracy of 75% at the patient level, indicating that the CNN outperformed them by nearly 20%. The dataset consisted of 2299 regions of interest (ROIs) extracted from whole-slide images (WSIs), split between failing (1034 ROIs) and non-failing (1265 ROIs) cohorts. The study’s findings underscore the potential of deep learning models to aid the objective assessment of cardiac histopathology, reducing inter-rater variability and improving diagnostic accuracy.

The experimental results of Wang et al. [42] show that the proposed FRDLS framework significantly outperforms traditional machine learning methods in predicting heart failure mortality. The system achieved the highest accuracy and area under the curve (AUC) for predicting in-hospital, 30-day, and 1-year mortality, indicating robust predictive capability that makes it a valuable tool for early detection and intervention in heart failure. The framework also identifies the top 12 clinical features most predictive of heart failure mortality, aiding clinicians in treatment planning and research.

In Matsumoto et al. [43], the dataset was split into training, validation, and test sets in a ratio of approximately 9:1:1. The model achieved an accuracy of 82% on the test set, with a sensitivity of 75% and a specificity of 94.4%. The learning curves indicated good convergence, with training and validation accuracies reaching 93.9% and 92%, respectively, and log loss values of 17.6% and 23.8%. To interpret the deep learning model’s decisions, the authors employed gradient-weighted class activation maps (Grad-CAMs). These heatmaps highlighted the regions of the chest X-rays that the model considered most important for classification, revealing that the model focused on the lung and heart fields, much as human radiologists might approach the task.
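Grad-CAM weights each feature map of a convolutional layer by the spatial average of the class score’s gradient with respect to that map, sums the weighted maps, and keeps the positive part. The sketch below implements that combination step on plain lists; it assumes the activations and gradients have already been obtained from a framework’s automatic differentiation, and it omits the final upsampling of the heatmap to image size. The two 2×2 channels are toy values.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    activations, gradients: lists of K feature maps (2D lists of equal H x W), where
    gradients[k] holds d(class score)/d(activations[k]), assumed precomputed by autograd.
    """
    heatmap = None
    for act, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        alpha = sum(cells) / len(cells)  # global-average-pooled gradient = channel weight
        weighted = [[alpha * a for a in row] for row in act]
        if heatmap is None:
            heatmap = weighted
        else:
            heatmap = [[h + w for h, w in zip(h_row, w_row)]
                       for h_row, w_row in zip(heatmap, weighted)]
    return [[max(0.0, v) for v in row] for row in heatmap]  # ReLU keeps positive evidence

acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
cam = grad_cam(acts, grads)  # [[0.5, 0.0], [0.0, 0.5]]
```

The second channel has a negative average gradient, so it suppresses the off-diagonal cells; after the ReLU only the regions that push the class score up remain highlighted.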

The CNN model in Liu et al. [47] significantly outperformed the random forest model. The CNN achieved an F1 score of 0.756 for general readmission prediction and 0.733 for 30-day readmission prediction, whereas the random forest model achieved F1 scores of 0.674 and 0.656 for the same tasks. These results highlight the effectiveness of CNNs in capturing the nuanced information in clinical notes. The study also introduces a chi-square test-based method for feature interpretation, which identifies key terms associated with readmissions.

The study by Duffy et al. [48] highlighted the deep learning model’s ability to accurately detect cardiac amyloidosis and hypertrophic cardiomyopathy, showcasing its potential for routine clinical use. The automated workflow provided reproducible and precise measurements, surpassing the capabilities of traditional human interpretation. This automation facilitates the early and accurate identification of LVH and its underlying causes, which can significantly impact patient care by enabling timely and appropriate interventions. When fine-tuned using the training split of the Unity Imaging Collaborative dataset, the deep learning algorithm showed improved performance on the Unity Imaging Collaborative validation split, with an overall $R^{2}$ of 0.92 and MAEs of 1.7 mm (95% CI, 1.5–2.0 mm) for IVS thickness, 2.9 mm (95% CI, 2.4–3.3 mm) for LVID, and 2.3 mm (95% CI, 1.9–2.7 mm) for LVPW thickness, indicating a data shift and potential variations in practice across institutions and continents. On the CSMC external test dataset, the algorithm showed robust prediction accuracy, with an overall $R^{2}$ of 0.96 and MAEs of 1.7 mm (95% CI, 1.6–1.8 mm) for IVS thickness, 3.8 mm (95% CI, 3.5–4.0 mm) for LVID, and 1.8 mm (95% CI, 1.7–2.0 mm) for LVPW thickness with beat-to-beat evaluation.

The primary endpoints evaluated in the study by Narang et al. [49] were the diagnostic quality of the images concerning left ventricular (LV) size and function, right ventricular (RV) size, and the presence of a nontrivial pericardial effusion. The results were promising, with diagnostic quality achieved in 98.8% of cases for LV size and function and pericardial effusion and in 92.5% of cases for RV size. For secondary endpoints, which included additional clinical parameters such as RV function and valve assessments, the scans performed by nurses were not significantly different from those performed by sonographers for most parameters. This study demonstrates the potential of AI to substantially expand the reach of echocardiography, particularly in settings with limited access to trained sonographers. The DL algorithm’s ability to guide novices in acquiring high-quality diagnostic images could be particularly beneficial in emergency departments, intensive care units, and rural or underserved areas. The software’s compatibility with multiple ultrasonography vendors further enhances its utility and scalability.

The hybrid deep learning model proposed by Ning et al. [36] outperforms traditional methods. In their experiments, the model detects CHF accurately, with an AUC close to 1, indicating near-perfect separation of CHF from normal sinus rhythm (NSR). The model’s efficacy is further supported by statistical tests showing significant differences (p < 0.05) between CHF and NSR subjects in time-domain, frequency-domain, and nonlinear indices. The hybrid approach improves accuracy and also enhances the model’s stability and convergence speed during training.

The model proposed by Kanani et al. [37] was evaluated on the original and augmented datasets, and the augmented dataset significantly improved performance. Key results were as follows: (i) the initial model with the original dataset achieved an F1 score of 0.89; (ii) the initial model with the augmented dataset, 0.98; (iii) the proposed model with the original dataset, 0.90; and (iv) the proposed model with the augmented dataset, 0.98. While the F1 score improvement from the original to the augmented dataset is evident, the proposed model’s main advantage is its faster convergence and greater training stability, particularly with the augmented dataset, which makes it more robust and reliable for ECG signal classification.

Table 3 summarizes key details (models, data, and results) of the reviewed studies, while Table 4 and Table 5 present our opinions on the reviewed papers.
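Several of the results above are reported as an AUC, including the near-1 value of Ning et al. [36]. The ROC AUC equals the probability that a randomly chosen positive recording receives a higher score than a randomly chosen negative one, which the short sketch below computes directly by comparing all positive–negative pairs, with ties counting one half. The scores are toy values; for large datasets a rank-based or library implementation would be preferable.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels: 1 = positive (e.g. CHF), 0 = negative (e.g. NSR). O(P*N) pairwise version.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])  # 5 of 6 pairs ordered correctly
```

An AUC near 1 therefore means that almost every CHF recording is scored above almost every NSR recording, independent of any particular decision threshold.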

5. Discussion

In this section, we discuss several of the research papers and articles mentioned throughout this review, focusing on what they did, their advantages and the challenges they faced, directions for future research, and our own recommendations.

The development and evaluation of EchoNet-Dynamic show how AI technologies can be leveraged to enhance the accuracy and consistency of cardiac function assessments. The model’s strong performance across multiple tasks, such as left ventricle segmentation, ejection fraction prediction, and heart failure classification, demonstrates its capability not only to meet but, in some cases, to exceed the diagnostic accuracy of human experts. This success highlights the transformative impact that artificial intelligence can have in echocardiography, particularly in reducing variability and improving the reliability of clinical measurements. One of the most compelling aspects of EchoNet-Dynamic is its generalizability. The model maintained high performance levels when applied to an external dataset from Cedars-Sinai Medical Center without requiring any fine-tuning. This suggests the model is robust enough to be deployed across different healthcare systems, a crucial requirement for widespread clinical adoption. The ability to generalize across diverse patient populations and settings is a key strength, indicating that EchoNet-Dynamic could be integrated into various clinical workflows with minimal adaptation. However, the study also highlights several challenges and areas for future improvement. One major issue is the “black-box” nature of deep learning models, which limits clinicians’ understanding of how specific predictions are made. Enhancing the interpretability of models like EchoNet-Dynamic is essential for gaining clinician trust and ensuring that AI tools are used effectively. Future research should focus on developing methods to make the model’s decision-making process more transparent, possibly by integrating explainable AI techniques. 
Future work could explore the integration of EchoNet-Dynamic into clinical trials, providing a more rigorous evaluation of its impact on patient outcomes and its utility in everyday clinical practice. Expanding the training dataset to include more diverse populations and rare cardiac conditions could improve the model’s accuracy and applicability. Its ability to provide accurate, consistent, and generalizable assessments of cardiac function marks a promising step forward in using AI for cardiovascular diagnostics. As research continues to address challenges related to interpretability and real-world application, models like EchoNet-Dynamic could play an increasingly central role in improving cardiac care.

A key aspect of Gjoreski et al.’s [39] approach is using an overlapping-sliding window technique for data segmentation, which enhances the model’s robustness in noisy environments. This method allows the system to maintain high accuracy even with imperfect data, making it more applicable in real-world clinical settings. The combined ML-DL approach improves detection accuracy and offers a practical solution for continuously monitoring CHF patients, potentially reducing hospital readmissions and improving overall patient outcomes. Gjoreski et al.’s [39] study underscores the transformative potential of combining machine learning and deep learning techniques in cardiac care. Their work illustrates how leveraging multiple analytical frameworks can lead to superior diagnostic tools, enhancing the early detection and management of chronic heart failure. This approach represents a significant advancement in the application of artificial intelligence in healthcare, highlighting the importance of integrating diverse methodologies to achieve optimal performance. The study’s reliance on heart sound recordings, which are highly susceptible to noise and artifacts, presents a significant challenge. While the overlapping sliding window technique helps mitigate this, maintaining data quality in real-world conditions remains problematic. Additionally, combining ML and DL techniques increases the model’s complexity, potentially leading to higher computational costs and implementation difficulties [61], especially in resource-limited settings. The classic ML approach’s dependence on expert-defined features may limit the model’s ability to discover novel patterns in the data [62]. Although end-to-end DL models excel at feature discovery, their effectiveness depends heavily on the quality and quantity of training data [63]. 
Furthermore, while multiple datasets are used, the study lacks a thorough discussion of the diversity of these datasets in terms of demographic and clinical characteristics, which could affect the model’s generalizability to different patient populations. The authors note that their system requires additional refinement and validation with larger, more diverse datasets. They acknowledge that the spectrogram-based DL approach might need further optimization to handle various real-world conditions effectively.
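An overlapping sliding-window segmentation of the kind Gjoreski et al. use can be sketched as follows. The window and step sizes and the averaging of per-window probabilities into a recording-level decision are illustrative assumptions; the paper’s exact aggregation rule may differ.

```python
def sliding_windows(signal, window, step):
    """Split a 1D signal into overlapping windows of `window` samples, advancing by `step`.

    step < window gives overlap (step = window // 2 is 50% overlap); a trailing
    remainder shorter than `window` is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]

def aggregate_predictions(window_probs, threshold=0.5):
    """Recording-level decision from per-window probabilities by simple averaging."""
    mean_prob = sum(window_probs) / len(window_probs)
    return mean_prob, mean_prob >= threshold

windows = sliding_windows(list(range(10)), window=4, step=2)  # 4 windows, 50% overlap
mean_prob, is_chf = aggregate_predictions([0.25, 0.75, 1.0])   # one noisy low window
```

Overlap means each sample is seen by several windows, so a burst of noise corrupts only a few windows and is diluted when the window-level predictions are combined.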

One of the advantages of Huang et al.’s [38] method is its practicality. By focusing on short-term ECG segments and leveraging RR interval features, the system can be implemented in various healthcare settings, from clinics to remote monitoring systems. This makes it a valuable tool for the early detection and continuous monitoring of congestive heart failure (CHF), potentially reducing the need for frequent hospital visits and improving patient outcomes. Huang et al. [38] contribute significantly to the field of CHF detection with their multi-input deep learning network model. Their research highlights the importance of combining time-domain and frequency-domain features in a deep learning framework, demonstrating high accuracy and practical applicability. This study advances the current state of CHF detection and provides a foundation for future research and development of AI-driven cardiac care solutions. However, focusing on short-term ECG segments (7 min) may not capture the long-term trends and variations essential for comprehensive CHF monitoring. Additionally, the complexity of the multi-input model could hinder its interpretability [64], making it challenging for clinicians to understand and trust the model’s predictions [65]. This issue of interpretability is typical of deep learning models but is particularly significant in clinical settings. The study also does not provide detailed information about the dataset’s diversity and size, which is crucial for assessing the model’s generalizability across different patient populations and conditions. Furthermore, like Li et al. [58], Huang et al. [38] do not address the practical aspects of integrating their model into existing healthcare systems, including real-time processing and computational resource requirements. The authors mention the need for further validation with larger datasets to confirm the model’s effectiveness and reliability. 
They also highlight the potential requirement for integrating additional clinical features to improve the model’s predictive capabilities.

An essential strength of the study by Li et al. [58] is its integration of clinical features with ECG-derived features, which enhances the model’s ability to make accurate predictions from comprehensive patient data. This holistic approach improves staging accuracy and offers the potential for personalized treatment plans and better patient management [66,67]. However, the data used in the study may lack diversity, potentially limiting the model’s generalizability to other populations and conditions. The authors do not provide detailed information on whether the dataset covers a wide range of demographic variables such as age, gender, and comorbidity, which is essential for broad applicability. Second, the study does not discuss the practical implementation of the model in real-world clinical settings: integration with existing healthcare systems, computational requirements, and real-time processing capabilities are critical for deployment but are not addressed. Moreover, while the study compares the proposed deep learning model with traditional machine learning models, it would benefit from comparisons with other state-of-the-art deep learning models to clarify its relative advantages and limitations. Lastly, comprehensive clinical validation is lacking. Although the model achieves high accuracy in a controlled setting, its performance in real-world clinical environments must be validated to ensure reliability and effectiveness in routine practice. The authors acknowledge the need for further validation with larger and more diverse datasets and suggest integrating more clinical features to enhance the model’s predictive power.

Although the dataset in [40] includes 209 patients, the model’s generalizability might be limited because the cohort comes from a single institution; broader multi-center studies would strengthen the findings. The model’s reliance on WSIs of H&E-stained tissue may also face challenges from varied staining techniques and tissue preparation practices across laboratories, potentially affecting performance. Integrating such a model into clinical workflows requires extensive validation and regulatory approval [68], and the transition from research to practical application involves numerous steps not addressed in the study.

The research paper [43] presents a robust approach to diagnosing heart failure by applying deep learning to chest X-ray images. Transfer learning from the well-established VGG16 [45] model allowed the researchers to achieve high accuracy even with a relatively small dataset. This methodology is particularly beneficial because it leverages features pre-learned on a vast dataset (ImageNet [46]) to enhance performance on a specific medical imaging task. However, the study has several limitations. Although enlarged by relabeling and data augmentation, the dataset remains relatively small for deep learning applications, which typically benefit from larger datasets; this could limit the generalizability of the model [69] to broader clinical settings. The study also did not differentiate between the various causes of cardiomegaly, nor distinguish heart failure from non-cardiac conditions such as pneumonia. Future research should include a more diverse range of clinical conditions to improve the model’s diagnostic accuracy and clinical utility. Moreover, the ChestX-ray8 dataset from Wang et al. [44] used in this study provides limited patient information, such as age and gender, which are crucial factors in clinical diagnosis. Relying solely on chest X-ray images without integrating other clinical data may reduce diagnostic accuracy in real-world scenarios. Future work could address this by incorporating comprehensive patient data, including echocardiograms and laboratory results, to build a more nuanced and clinically relevant diagnostic tool.

Although the dataset from Shanghai Shuguang Hospital used in Wang et al.’s research [42] is extensive, it is geographically and demographically limited to a specific population. This limitation may affect the generalizability of the model [70] to other populations with different demographics, comorbidities, and healthcare practices. Future studies should consider using more diverse datasets from multiple regions to enhance the robustness and applicability of the model. Deep learning models, particularly those involving complex architectures like convolutional neural networks, often operate as “black boxes”. While the study identifies the top 12 clinical features contributing to heart failure prediction, it does not provide detailed interpretability mechanisms to show how these features influence the model’s decisions. More transparent models or interpretability techniques such as SHAP (SHapley Additive exPlanations) values [71,72] could improve trust and usability among clinicians. The study primarily uses static EHR data for prediction. However, heart failure is a chronic condition with temporal progression. Incorporating longitudinal data analysis [73], which accounts for changes in a patient’s health status over time, could improve prediction accuracy and provide more meaningful insights into disease progression. While accuracy and AUC are important metrics, the study could benefit from a more comprehensive evaluation framework. Metrics such as precision, recall, F1-score, and calibration plots could provide a deeper understanding of the model’s performance [74], particularly in handling the class imbalance inherent in medical datasets.
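A calibration (reliability) plot groups predictions by predicted probability and compares each group’s mean prediction with its observed event rate; the expected calibration error (ECE) summarizes the gaps in a single number. The sketch below uses ten equal-width bins, a common but arbitrary choice, and toy predictions rather than any data from Wang et al. [42].

```python
def reliability_bins(probs, labels, n_bins=10):
    """Points of a reliability plot: (mean predicted prob, observed event rate, count).

    Predictions are grouped into equal-width probability bins; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    points = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            points.append((mean_p, rate, len(b)))
    return points

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean absolute gap between predicted probability and observed rate."""
    n = len(probs)
    return sum(cnt / n * abs(mean_p - rate)
               for mean_p, rate, cnt in reliability_bins(probs, labels, n_bins))

# Toy mortality predictions: the high-risk group is over-confident (0.9 predicted, 0.5 observed).
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0])  # 0.25
```

A model can have a high AUC and still be poorly calibrated, which matters whenever predicted probabilities are read as absolute risks.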

Liu et al. [47] demonstrate that deep learning models, specifically CNNs, can effectively use unstructured clinical notes to predict hospital readmissions. Their method helps reveal clinical insights embedded in the notes, such as mentions of specific medications or treatments that are more prevalent in readmission cases. The approach improves prediction accuracy and provides interpretable results that can help clinicians understand the factors contributing to readmissions. This research suggests that integrating advanced NLP techniques with deep learning can substantially enhance predictive modeling of healthcare outcomes [75]. Despite these advances, common limitations such as data specificity, limited generalizability, and computational complexity persist. Future research should improve model generalizability across diverse datasets, optimize computational efficiency, integrate multiple data sources, and validate models in real-world clinical settings.
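One common way a CNN over clinical notes can produce interpretable output is max-over-time pooling: each convolutional filter keeps only its strongest response, and the n-gram that produced that response can be shown to a clinician. The pure-Python sketch below illustrates only this mechanism under stated assumptions (a hypothetical five-word vocabulary, hand-set two-dimensional embeddings, and a single width-2 filter); it is not the architecture of Liu et al. [47], where embeddings and filters would be learned from data.

```python
# Hypothetical 2-D word embeddings: dimension 0 loosely encodes "medication",
# dimension 1 "symptom". A real model would learn these vectors from data.
EMBEDDINGS = {
    "patient": [0.0, 0.1],
    "discharged": [0.1, 0.0],
    "on": [0.0, 0.0],
    "furosemide": [1.0, 0.2],
    "dyspnea": [0.2, 1.0],
}
UNKNOWN = [0.0, 0.0]  # out-of-vocabulary tokens map to a zero vector


def conv_max_pool(tokens, kernel, bias=0.0):
    """Apply one width-k convolutional filter with ReLU, then max-over-time pooling.

    kernel holds k weight vectors, one per position in the sliding window.
    Returns (pooled activation, n-gram that produced it), so the filter's
    contribution can be traced back to a phrase in the note.
    """
    k = len(kernel)
    vectors = [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]
    best_score, best_ngram = 0.0, None
    for start in range(len(tokens) - k + 1):
        score = bias
        for w, x in zip(kernel, vectors[start:start + k]):
            score += sum(wi * xi for wi, xi in zip(w, x))
        score = max(0.0, score)  # ReLU
        if best_ngram is None or score > best_score:
            best_score, best_ngram = score, tokens[start:start + k]
    return best_score, best_ngram


# Hypothetical filter that responds to "medication followed by symptom" bigrams.
kernel = [[1.0, 0.0], [0.0, 1.0]]
note = "patient discharged on furosemide dyspnea persists"
score, ngram = conv_max_pool(note.split(), kernel)
print(score, ngram)  # 2.0 ['furosemide', 'dyspnea']
```

In a full model, many such filters feed a classifier, and ranking the pooled n-grams by their contribution to the readmission score is one way to surface phrases such as medication mentions.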

The dataset used to train and evaluate the model in Ning et al.’s study [36] consists of RR interval sequences extracted from ECG signals, with the original data extensively augmented to enhance the robustness and generalization of the model. Both time-domain and frequency-domain methods are used for feature extraction. Time-domain features include the standard deviation of normal-to-normal intervals (SDNN) and the root mean square of successive differences (RMSSD). Frequency-domain features are derived using the discrete Fourier transform to estimate the power spectral density of the RR intervals, focusing on the low-frequency (LF) and high-frequency (HF) components. Nonlinear methods are also applied to capture the complex physiological characteristics of the cardiovascular system. The hybrid deep learning algorithm proposed in this study combines the strengths of CNNs and RNNs to provide a robust solution for automatically detecting congestive heart failure (CHF). Integrating extensive feature extraction with advanced deep learning architectures yields a model that substantially improves detection accuracy and reliability, making it a valuable tool for monitoring and diagnosing heart conditions in the Internet of Medical Things. One of the primary criticisms of the study is the limited duration of the ECG data used for CHF detection. The study primarily used ECG segments of 5 min and 1 min, which, while sufficient for specific clinical demands, do not address the need for ultra-short-term ECG analysis. In real-world clinical settings, there is a pressing need for models that deliver accurate diagnoses from shorter ECG segments, reducing the burden on clinicians and improving patient throughput. Another notable limitation is that noise was excluded during the extraction of RR intervals.
In practical scenarios, ECG signals are often contaminated with various types of noise, especially when collected using wearable devices in non-clinical environments. The paper mentions plans to add different proportions of noise to standard signals for future model validation, but the current model’s ability to generalize to noisy data remains unverified. The study also used data only from patients with grade 3 and grade 4 CHF, which raises questions about its applicability to earlier stages of the disease, even though early detection of CHF is crucial for preventing disease progression and improving patient outcomes. Furthermore, the study’s sample size is relatively small, with fewer than 20 samples. The researchers addressed this by segmenting the original recordings into 5 min and 1 min segments, thereby increasing the amount of data for training and testing the deep learning model. However, segments drawn from the same recording are strongly correlated and are not equivalent to independent samples from a larger, more diverse dataset; if segments from one patient appear in both the training and test sets, performance estimates may be inflated.
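The time- and frequency-domain features described for Ning et al.’s pipeline [36] can be sketched in standard-library Python as follows. This is an illustration rather than the authors’ code: the 4 Hz resampling rate, linear interpolation, and the conventional LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) band limits are assumptions, and a plain O(N²) DFT periodogram stands in for whatever spectral estimator the study actually used.

```python
import cmath
import math
import statistics


def sdnn(rr_ms):
    """SDNN: sample standard deviation of normal-to-normal (NN) intervals, in ms."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """RMSSD: root mean square of successive NN-interval differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def resample_tachogram(rr_ms, fs=4.0):
    """Linearly interpolate the RR series (placed at beat times) onto an even grid at fs Hz."""
    times, t = [], 0.0
    for rr in rr_ms:
        t += rr / 1000.0  # time of the beat that ends this interval, in seconds
        times.append(t)
    n = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for k in range(n):
        g = times[0] + k / fs
        while j < len(times) - 2 and times[j + 1] < g:
            j += 1
        frac = (g - times[j]) / (times[j + 1] - times[j])
        out.append(rr_ms[j] + frac * (rr_ms[j + 1] - rr_ms[j]))
    return out


def lf_hf_power(rr_ms, fs=4.0, lf=(0.04, 0.15), hf=(0.15, 0.40)):
    """LF and HF band power (ms^2) from a plain DFT periodogram of the mean-removed tachogram.

    fs and the band limits are common conventions assumed here, not values taken from [36].
    """
    x = resample_tachogram(rr_ms, fs)
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # remove the DC component
    n = len(x)
    lf_power = hf_power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if freq >= hf[1]:
            break  # nothing above the HF band is needed
        coeff = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        power = 2.0 * abs(coeff) ** 2 / n ** 2  # one-sided power in this frequency bin
        if lf[0] <= freq < lf[1]:
            lf_power += power
        elif hf[0] <= freq < hf[1]:
            hf_power += power
    return lf_power, hf_power


rr = [800, 810, 790, 800]
print(round(sdnn(rr), 2), round(rmssd(rr), 2))  # 8.16 14.14

# Synthetic RR series modulated at 0.25 Hz (a breathing-like rhythm in the HF band).
rr_sim, t = [], 0.0
for _ in range(300):
    r = 800 + 40 * math.sin(2 * math.pi * 0.25 * t)
    rr_sim.append(r)
    t += r / 1000.0
lf, hf = lf_hf_power(rr_sim)
print(f"LF = {lf:.1f} ms^2, HF = {hf:.1f} ms^2, LF/HF = {lf / hf:.3f}")
```

For long recordings, FFT-based and windowed (e.g., Welch) spectral estimates are preferable; the explicit DFT here only keeps the sketch dependency-free.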

A commonality across these studies is the use of advanced neural network architectures to analyze complex cardiovascular data. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are frequently employed because of their proficiency with image and sequential data, respectively. For instance, Narang et al. [49] and Duffy et al. [48] used CNNs to guide novice users in acquiring echocardiographic images and to automatically measure left ventricular (LV) dimensions, respectively, demonstrating high diagnostic accuracy. Similarly, the work by Chen et al. leveraged RNNs to analyze heart sound recordings, achieving notable performance in diagnosing chronic heart failure. Another significant similarity is the emphasis on large and diverse datasets for training and validating these AI models; the studies consistently highlight the importance of extensive datasets for the generalizability and robustness of their models. For example, Duffy et al. [48] used a dataset of over 23,000 echocardiogram videos from multiple institutions, and Liu et al. incorporated a wide range of clinical and imaging data to train their predictive models. Such data improve the models’ performance and help them transfer to different clinical settings. Moreover, these papers underscore the potential of AI to streamline and enhance clinical workflows: by automating time-consuming and repetitive tasks, AI enables clinicians to focus on more critical aspects of patient care. The studies by Wang et al. [42] and Gjoreski et al. [39] highlight how AI can assist in real-time decision making during echocardiographic assessments and automate the identification of heart failure phenotypes from imaging data, thereby reducing clinician workload and improving diagnostic efficiency. The pursuit of personalized medicine is another recurring theme.
Several studies aim to leverage AI to provide tailored treatment strategies based on individual patient data. For instance, Huang et al. [38] and Li et al. [58] focus on using AI to predict patient-specific responses to heart failure treatments, enabling more effective and personalized therapeutic interventions. This individualized approach is expected to improve patient outcomes and optimize resource utilization in healthcare systems. The accuracy and performance metrics reported in these studies reflect significant advances in AI algorithms and their practical applicability in cardiology. Most studies report high accuracy, sensitivity, and specificity, indicating the reliability of AI in diagnosing and predicting heart failure. The consistent performance across metrics and datasets suggests that these technologies are maturing toward clinical deployment, although extensive clinical validation is still required before routine use.

Lastly, across the studies reviewed here, AI-based risk models outperformed traditional risk scores, including the Framingham and MAGGIC risk scores. Specifically, AI models achieved AUCs of 0.8 to 1.0, sensitivities of 75% to 97%, and specificities of 90% to 100%. In contrast, the Framingham risk score, as detailed by D’Agostino et al. [76], demonstrated an AUC of 0.75, a sensitivity of 70%, and a specificity of 68%. Similarly, the MAGGIC risk score, as reported by Pocock et al. [77], showed an AUC of 0.73, a sensitivity of 68%, and a specificity of 70%. Because these figures come from different cohorts and endpoints, the comparison is indirect; nevertheless, they suggest that deep learning models may provide more precise risk stratification for adverse cardiovascular outcomes. Furthermore, studies have underscored the potential of integrating artificial intelligence with established risk assessment tools to enhance predictive accuracy in cardiology [78]. This integration is particularly promising because conventional models, while robust, often rely on a limited set of clinical parameters and may not capture the full complexity of cardiovascular risk. In a complementary approach, de Bakker [79] introduced a longitudinal, biomarker-based methodology for cardiovascular risk assessment, emphasizing the importance of incorporating emerging biomarkers for enhanced predictive accuracy. Together, these studies underscore the potential benefits of combining novel AI-based methodologies with established risk scores to improve clinical decision making and patient outcomes.
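AUC, sensitivity, and specificity, the three metrics used in this comparison, can be computed as follows. The sketch uses the rank-based (Mann–Whitney) formulation of AUC, the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as one half. The scores and the 0.5 threshold are invented for illustration and do not reproduce any reviewed study.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as P(score_pos > score_neg), counting ties as 0.5 (O(n_pos * n_neg))."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc_mann_whitney(y_true, scores))         # 11/12, about 0.917
print(sensitivity_specificity(y_true, scores))  # sensitivity 2/3, specificity 0.75
```

Because AUC is threshold-free while sensitivity and specificity depend on the chosen cutoff, such comparisons are most informative when each model’s operating threshold is reported alongside its AUC.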

6. Conclusions

In this review, we explored the role of AI models in heart failure detection and diagnosis, highlighting significant advancements in improving diagnostic accuracy and enhancing predictive capabilities. AI-driven approaches—including deep learning, machine learning, and complex statistical models—have demonstrated superior performance in identifying HF biomarkers and predicting patient outcomes compared to traditional methods. These models leverage large, heterogeneous datasets to offer personalized and timely clinical insights, marking a pivotal shift towards personalized healthcare.

Nonetheless, several limitations persist. Despite the high accuracies reported across studies, many models, such as EchoNet-Dynamic [35] and the hybrid CNN + RNN model [36], are constrained by imbalanced datasets, data redundancy, and limited demographic diversity, which may impair their generalizability to broader patient populations. Data privacy remains a major concern [34,80], and the “black-box” nature of many AI models limits interpretability and clinician trust. Moreover, integrating these technologies into clinical practice requires extensive validation to ensure safety and efficacy [81].
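One practical safeguard against data redundancy, for example when long recordings are cut into many short segments as in the hybrid CNN + RNN study [36], is to split data by patient rather than by segment, so that no patient contributes segments to both the training and test sets. The sketch below assumes that each segment carries a hypothetical patient identifier and uses a deterministic hash to assign patients to folds; this is one of several reasonable choices, and grouped cross-validation utilities in common machine learning libraries serve the same purpose.

```python
import hashlib


def segment_recording(patient_id, samples, segment_len):
    """Cut one recording into non-overlapping fixed-length segments tagged with the patient ID."""
    return [
        (patient_id, samples[i:i + segment_len])
        for i in range(0, len(samples) - segment_len + 1, segment_len)
    ]


def patient_fold(patient_id, n_folds=5):
    """Map a patient ID to a fold; a SHA-256 digest is stable across runs, unlike str hash()."""
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def split_by_patient(segments, test_fold=0, n_folds=5):
    """Keep every segment of a given patient on the same side of the train/test split."""
    train, test = [], []
    for patient_id, segment in segments:
        target = test if patient_fold(patient_id, n_folds) == test_fold else train
        target.append((patient_id, segment))
    return train, test


# Hypothetical data: 10 patients, each with a 600-sample placeholder signal cut into 60-sample segments.
segments = []
for pid in range(10):
    segments.extend(segment_recording(f"patient-{pid}", list(range(600)), 60))
train, test = split_by_patient(segments)
print(len(segments), len(train), len(test))
```

A segment-level random split of the same data would almost certainly place segments from every patient on both sides, which is the leakage this design avoids.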

To address these gaps, we propose several actionable suggestions for improvement:

Enhance Explainability: Integrate generative models with explainability frameworks (e.g., SHAP values) to demystify model decisions and build clinician trust.
Develop Lightweight Architectures: Focus on creating computationally efficient models that can be deployed in resource-constrained environments.
Leverage Hybrid Approaches: Combine transformers with VAEs or integrate multi-modal data fusion techniques to capture complex interactions among diverse clinical data sources.
Improve Dataset Diversity: Expand and diversify training datasets to cover various demographics, clinical conditions, and imaging protocols, thereby enhancing model generalization.
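To illustrate what SHAP-style attributions compute, the sketch below evaluates exact Shapley values for a small, hypothetical risk model by enumerating every subset of features and replacing “absent” features with baseline values. This brute-force definition is feasible only for a handful of features; practical SHAP implementations rely on faster approximations and model-specific algorithms. The feature names, weights, and baseline are invented for illustration and carry no clinical meaning.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature absent from a coalition takes its baseline value. Cost grows as 2**n.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return model(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk model on standardized inputs, including one interaction term.
FEATURES = ["age", "ejection_fraction", "creatinine"]


def risk_score(z):
    age, ef, creat = z
    return 0.5 * age - 0.8 * ef + 0.3 * creat + 0.4 * age * creat


phi = shapley_values(risk_score, [1.0, -1.0, 2.0], [0.0, 0.0, 0.0])
print({name: round(v, 3) for name, v in zip(FEATURES, phi)})
# {'age': 0.9, 'ejection_fraction': 0.8, 'creatinine': 1.0}
```

The attributions sum to model(x) - model(baseline), here 2.7, and the interaction term is split equally between age and creatinine; this additivity is what makes Shapley values attractive for explaining individual predictions to clinicians.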

The challenges investigated in this review, from data quality and integration issues to model interpretability and regulatory hurdles, are critical for advancing AI in heart failure research. By outlining concrete solutions and actionable future research directions, we underscore the need for interdisciplinary collaboration to overcome these obstacles. In summary, while AI models have already shown promising results in enhancing HF detection and management, future work should prioritize transparent, scalable, and clinically validated approaches to fully realize AI’s potential in personalized healthcare.

Author Contributions

Conceptualization, I.A.U. and O.H.; Methodology, I.A.U.; Software, I.A.U.; Validation, I.A.U. and O.H.; Formal analysis, I.A.U.; Investigation, I.A.U.; Resources, O.H.; Data curation, I.A.U.; Writing—original draft preparation, I.A.U.; Writing—review and editing, I.A.U. and O.H.; Visualization, I.A.U.; Supervision, O.H.; Project administration, O.H.; Funding acquisition, O.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the articles mentioned in this study.

Acknowledgments

This paper did not employ GenAI tools for generating images, designing tables, or conducting any formal content review. All research, arguments, and data interpretations remain the work of the authors. Instead, GenAI such as ChatGPT’s o1 mini was used solely to check grammar, improve sentence structure, and brainstorm potential titles, with all suggestions being carefully evaluated before inclusion. The tool did not replace human judgment, nor did it alter any factual or theoretical elements of the paper. These decisions ensure that the integrity and originality of the content remain intact, while still benefiting from a streamlined writing process that respects ethical standards for academic work.

Conflicts of Interest

The authors declare no conflict of interest.

References

Ogunniyi, M.; Commodore-Mensah, Y.; Ferdinand, K. Race, ethnicity, hypertension, and heart disease: JACC focus seminar 1/9. J. Am. Coll. Cardiol. 2021, 78, 2460–2470.
Sapna, F.; Raveena, F.; Chandio, M.; Bai, K.; Sayyar, M.; Varrassi, G.; Khatri, M.; Kumar, S.; Mohamad, T. Advancements in heart failure management: A comprehensive narrative review of emerging therapies. Cureus 2023, 15, e46486.
Boehme, A.; Esenwa, C.; Elkind, M. Stroke risk factors, genetics, and prevention. Circ. Res. 2017, 120, 472–495.
Bozkurt, B.; Aguilar, D.; Deswal, A.; Dunbar, S.; Francis, G.; Horwich, T.; Jessup, M.; Kosiborod, M.; Pritchett, A.; Ramasubbu, K.; et al. Contributory risk and management of comorbidities of hypertension, obesity, diabetes mellitus, hyperlipidemia, and metabolic syndrome in chronic heart failure: A scientific statement from the American Heart Association. Circulation 2016, 134, e535–e578.
World Health Organization. Global Health Estimates 2020: Deaths by Cause, Age, Sex, by Country, 2000–2019. 2020. Available online: https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates (accessed on 5 March 2025).
Savarese, G.; Becher, P.; Lund, L.; Seferovic, P.; Rosano, G.; Coats, A. Global burden of heart failure: A comprehensive and updated review of epidemiology. Cardiovasc. Res. 2022, 118, 3272–3287.
Groenewegen, A.; Rutten, F.; Mosterd, A.; Hoes, A. Epidemiology of heart failure. Eur. J. Heart Fail. 2020, 22, 1342–1356.
Ponikowski, P.; Anker, S.; AlHabib, K.; Cowie, M.; Force, T.; Hu, S.; Jaarsma, T.; Krum, H.; Rastogi, V.; Rohde, L.; et al. Heart failure: Preventing disease and death worldwide. ESC Heart Fail. 2014, 1, 4–25.
Dattani, S.; Samborska, V.; Ritchie, H.; Roser, M. Cardiovascular Diseases. 2023. Available online: https://ourworldindata.org/cardiovascular-diseases (accessed on 5 March 2025).
Nazar, W.; Nazar, K.; Daniłowicz-Szymanowicz, L. Machine Learning and Deep Learning Methods for Fast and Accurate Assessment of Transthoracic Echocardiogram Image Quality. Life 2024, 14, 761.
Sandau, K.E.; Funk, M.; Auerbach, A.; Barsness, G.; Blum, K.; Cvach, M.; Lampert, R.; May, J.; McDaniel, G.; Perez, M.; et al. Update to practice standards for electrocardiographic monitoring in hospital settings: A scientific statement from the American Heart Association. Circulation 2017, 136, e273–e344.
Sharma, S.; Guleria, K. A systematic literature review on deep learning approaches for pneumonia detection using chest X-ray images. Multimed. Tools Appl. 2024, 83, 24101–24151.
Dayarathna, S.; Islam, K.T.; Uribe, S.; Yang, G.; Hayat, M.; Chen, Z. Deep learning based synthesis of MRI, CT and PET: Review and analysis. Med. Image Anal. 2024, 92, 103046.
Counseller, Q.; Aboelkassem, Y. Recent technologies in cardiac imaging. Front. Med. Technol. 2023, 4, 984492.
Moons, K.; Altman, D.; Reitsma, J.; Ioannidis, J.; Macaskill, P.; Steyerberg, E.; Vickers, A.; Ransohoff, D.; Collins, G. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): Explanation and elaboration. Ann. Intern. Med. 2015, 162, W1–W73.
Maddox, T.M.; Januzzi, J.L., Jr.; Allen, L.A.; Breathett, K.; Butler, J.; Davis, L.L.; Fonarow, G.C.; Ibrahim, N.E.; Lindenfeld, J.; Masoudi, F.A.; et al. 2021 update to the 2017 ACC expert consensus decision pathway for optimization of heart failure treatment: Answers to 10 pivotal issues about heart failure with reduced ejection fraction: A report of the American College of Cardiology Solution Set Oversight Committee. J. Am. Coll. Cardiol. 2021, 77, 772–810.
Santos, P.M.; Freire, R.B.; Fernández, A.E.; Sobrino, J.L.B.; Pérez, C.F.; Somoza, F.J.E.; Miguel, C.M.; Vilacosta, I. In-hospital mortality and readmissions for heart failure in Spain. A Study of Index Episodes and 30-Day and 1-year Cardiac Readmissions. Rev. Esp. Cardiol. 2019, 72, 998–1004.
Fischer, C.; Steyerberg, E.W.; Fonarow, G.C.; Ganiats, T.G.; Lingsma, H.F. A systematic review and meta-analysis on the association between quality of hospital care and readmission rates in patients with heart failure. Am. Heart J. 2015, 170, 1005–1017.
Hong, S.; Zhou, Y.; Shang, J.; Xiao, C.; Sun, J. Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Comput. Biol. Med. 2020, 122, 103801.
Madani, A.; Ong, J.R.; Tibrewal, A.; Mofrad, M.R. Deep echocardiography: Data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease. NPJ Digit. Med. 2018, 1, 59.
Liu, D.; Lu, S.; Zhang, L.; Liu, Y. Anomaly detection in chest X-rays based on dual-attention mechanism and multi-scale feature fusion. Symmetry 2023, 15, 668.
Jin, B.; Che, C.; Liu, Z.; Zhang, S.; Yin, X.; Wei, X. Predicting the risk of heart failure with EHR sequential data modeling. IEEE Access 2018, 6, 9256–9261.
Schreibmann, E.; Dhabaan, A.; Elder, E.; Fox, T. Patient-specific quality assurance method for VMAT treatment delivery. Med. Phys. 2009, 36, 4530–4535.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Désir, C.; Bernard, S.; Petitjean, C.; Heutte, L. A random forest based approach for one class classification in medical imaging. In Proceedings of the Machine Learning in Medical Imaging: Third International Workshop, MLMI 2012, Held in Conjunction with MICCAI 2012, Nice, France, 1 October 2012; Revised Selected Papers 3. Springer: Berlin/Heidelberg, Germany, 2012; pp. 250–257.
Geweid, G.; Abdallah, M. A new automatic identification method of heart failure using improved support vector machine based on duality optimization technique. IEEE Access 2019, 7, 149595–149611.
Yongcharoenchaiyasit, K.; Arwatchananukul, S.; Temdee, P.; Prasad, R. Gradient Boosting Based Model for Elderly Heart Failure, Aortic Stenosis, and Dementia Classification. IEEE Access 2023, 11, 48677–48696.
Yasmin, F.; Shah, S.; Naeem, A.; Shujauddin, S.; Jabeen, A.; Kazmi, S.; Siddiqui, S.; Kumar, P.; Salman, S.; Hassan, S.; et al. Artificial intelligence in the diagnosis and detection of heart failure: The past, present, and future. Rev. Cardiovasc. Med. 2021, 22, 1095–1113.
Asaduzzaman, M.; Alom, M.K.; Karim, M.E. ALZENET: Deep Learning-Based Early Prediction of Alzheimer’s Disease through Magnetic Resonance Imaging Analysis. Telemat. Inform. Rep. 2025, 17, 100189.
Sahu, H.P.; Kashyap, R. FINE_DENSEIGANET: Automatic medical image classification in chest CT scan using Hybrid Deep Learning Framework. Int. J. Image Graph. 2025, 25, 2550004.
Borzooei, S.; Briganti, G.; Golparian, M.; Lechien, J.R.; Tarokhian, A. Machine learning for risk stratification of thyroid cancer patients: A 15-year cohort study. Eur. Arch. Oto-Rhino-Laryngol. 2024, 281, 2095–2104.
Du, W.; Bi, W.; Liu, Y.; Zhu, Z.; Tai, Y.; Luo, E. Machine learning-based decision support system for orthognathic diagnosis and treatment planning. BMC Oral Health 2024, 24, 286.
Pedreschi, D.; Giannotti, F.; Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F. Meaningful explanations of black box AI decision systems. In Proceedings of the AAAI Conference on Artificial Intelligence 2019, Honolulu, HI, USA, 27 January–7 February 2019; Volume 33, pp. 9780–9784.
Kaissis, G.; Makowski, M.; Rückert, D.; Braren, R. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2020, 2, 305–311.
Ouyang, D.; He, B.; Ghorbani, A.; Yuan, N.; Ebinger, J.; Langlotz, C.; Heidenreich, P.; Harrington, R.; Liang, D.; Ashley, E.; et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020, 580, 252–256.
Ning, W.; Li, S.; Wei, D.; Guo, L.; Chen, H. Automatic detection of congestive heart failure based on a hybrid deep learning algorithm in the internet of medical things. IEEE Internet Things J. 2020, 8, 12550–12558.
Kanani, P.; Padole, M. ECG heartbeat arrhythmia classification using time-series augmented signals and deep learning approach. Procedia Comput. Sci. 2020, 171, 524–531.
Huang, S.H.; Chuang, B.L.; Lin, Y.H.; Hung, C.S.; Ma, H.P. A Congestive Heart Failure Detection System via Multi-Input Deep Learning Networks. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Honolulu, HI, USA, 9–13 December 2019; pp. 1–6.
Gjoreski, M.; Gradišek, A.; Budna, B.; Gams, M.; Poglajen, G. Machine Learning and End-to-End Deep Learning for the Detection of Chronic Heart Failure From Heart Sounds. IEEE Access 2020, 8, 20313–20324.
Nirschl, J.; Janowczyk, A.; Peyster, E.; Frank, R.; Margulies, K.; Feldman, M.; Madabhushi, A. A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue. PLoS ONE 2018, 13, e0192726.
Nirschl, J.J.; Janowczyk, A.; Peyster, E.G.; Frank, R.; Margulies, K.B.; Feldman, M.D.; Madabhushi, A. Deep learning tissue segmentation in cardiac histopathology images. In Deep Learning for Medical Image Analysis; Elsevier: Amsterdam, The Netherlands, 2017; pp. 179–195.
Wang, Z.; Zhu, Y.; Li, D.; Yin, Y.; Zhang, J. Feature rearrangement based deep learning system for predicting heart failure mortality. Comput. Methods Programs Biomed. 2020, 191, 105383.
Matsumoto, T.; Kodera, S.; Shinohara, H.; Ieki, H.; Yamaguchi, T.; Higashikuni, Y.; Kiyosue, A.; Ito, K.; Ando, J.; Takimoto, E.; et al. Diagnosing heart failure from chest X-ray images using deep learning. Int. Heart J. 2020, 61, 781–786.
Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
Liu, X.; Chen, Y.; Bae, J.; Li, H.; Johnston, J.; Sanger, T. Predicting heart failure readmission from clinical notes using deep learning. In Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 12–21 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2642–2648.
Duffy, G.; Cheng, P.; Yuan, N.; He, B.; Kwan, A.; Shun-Shin, M.; Alexander, K.; Ebinger, J.; Lungren, M.; Rader, F.; et al. High-throughput precision phenotyping of left ventricular hypertrophy with cardiovascular deep learning. JAMA Cardiol. 2022, 7, 386–395.
Narang, A.; Bae, R.; Hong, H.; Thomas, Y.; Surette, S.; Cadieu, C.; Chaudhry, A.; Martin, R.; McCarthy, P.; Rubenson, D.; et al. Utility of a deep-learning algorithm to guide novices to acquire echocardiograms for limited diagnostic use. JAMA Cardiol. 2021, 6, 624–632.
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
Valehi, A.; Razi, A.; Chen, J. Smart heart monitoring: Early prediction of heart problems through predictive analysis of ECG signals. IEEE Access 2019, 7, 457–465.
McCracken, C.; Szabo, L.; Abdulelah, Z.A.; Condurache, D.G.; Vago, H.; Nichols, T.E.; Petersen, S.E.; Neubauer, S.; Raisi-Estabragh, Z. Ventricular volume asymmetry as a novel imaging biomarker for disease discrimination and outcome prediction. Eur. Heart J. 2024, 4, e059.
Ramírez-Vélez, R.; García-Hermoso, A. Effects of exercise training on Fetuin-a in obese, type 2 diabetes and cardiovascular disease in adults and elderly: A systematic review and Meta-analysis. Lipids Health Dis. 2019, 18, 1–11.
Kasmaee, A.M.M.; Ataei, A.; Moravvej, S.V.; Alizadehsani, R.; Gorriz, J.M.; Zhang, Y.D.; Tan, R.S.; Acharya, U.R. ELRL-MD: A deep learning approach for myocarditis diagnosis using cardiac magnetic resonance images with ensemble and reinforcement learning integration. Physiol. Meas. 2024, 45, 055011.
Adejumo, P.; Thangaraj, P.M.; Dhingra, L.S.; Aminorroaya, A.; Zhou, X.; Brandt, C.; Xu, H.; Krumholz, H.M.; Khera, R. Natural language processing of clinical documentation to assess functional status in patients with heart failure. JAMA Netw. Open 2024, 7, e2443925.
Nargesi, A.A.; Adejumo, P.; Dhingra, L.S.; Rosand, B.; Hengartner, A.; Coppi, A.; Benigeri, S.; Sen, S.; Ahmad, T.; Nadkarni, G.N.; et al. Automated identification of heart failure with reduced ejection fraction using deep learning-based natural language processing. Heart Fail. 2025, 13, 75–87.
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An introduction, 1st ed.; MIT Press: Cambridge, MA, USA, 1998; Volume 1.
Li, D.; Li, X.; Zhao, J.; Bai, X. Automatic staging model of heart failure based on deep learning. Biomed. Signal Process. Control 2019, 52, 77–83.
Sridevi, S.; Murugesan, S.; Sakthivel, V. Computer-aided decision support system for symmetry-based prenatal congenital heart defects. Adv. Mach. Vis. Syst. 2021, 2, 45–60.
Tang, W.H.W.; Tong, W.; Shrestha, K. Differential effects of arginine methylation on diastolic dysfunction and disease progression in patients with chronic systolic heart failure. Eur. Heart J. 2008, 29, 2506–2513.
Sarker, I.H. Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput. Sci. 2021, 2, 420.
Ragab, A.; El Koujok, M.; Ghezzaz, H.; Amazouz, M.; Ouali, M.S.; Yacout, S. Deep understanding in industrial processes by complementing human expertise with interpretable patterns of machine learning. Expert Syst. Appl. 2019, 122, 388–405.
Munappy, A.; Bosch, J.; Olsson, H.; Arpteg, A.; Brinne, B. Data management for production quality deep learning models: Challenges and solutions. J. Syst. Softw. 2022, 191, 111359.
Casella, B.; Riviera, W.; Aldinucci, M.; Menegaz, G. MiFL: Multi-Input Neural Networks in Federated Learning. Authorea Preprints 2023.
De Bois, M.; El Yacoubi, M.; Ammi, M. Enhancing the interpretability of deep models in healthcare through attention: Application to glucose forecasting for diabetic people. Int. J. Pattern Recognit. Artif. Intell. 2021, 35, 2160006.
Snyderman, R. Personalized health care: From theory to practice. Biotechnol. J. 2012, 7, 973–979.
Chintala, S. AI-Driven Personalised Treatment Plans: The Future of Precision Medicine. Mach. Intell. Res. 2023, 17, 9718–9728.
Morrison, T.; Pathmanathan, P.; Adwan, M.; Margerrison, E. Advancing regulatory science with computational modeling for medical devices at the FDA’s Office of Science and Engineering Laboratories. Front. Med. 2018, 5, 241.
Krois, J.; Garcia Cantu, A.; Chaurasia, A.; Patil, R.; Chaudhari, P.K.; Gaudin, R.; Gehrung, S.; Schwendicke, F. Generalizability of deep learning models for dental image analysis. Sci. Rep. 2021, 11, 6102.
Chen, C.; Bai, W.; Davies, R.; Bhuva, A.; Manisty, C.; Augusto, J.; Moon, J.; Aung, N.; Lee, A.; Sanghvi, M.; et al. Improving the generalizability of convolutional neural network-based segmentation on CMR images. Front. Cardiovasc. Med. 2020, 7, 105.
Miranda, E.; Adiarto, S.; Bhatti, F.; Zakiyyah, A.; Aryuni, M.; Bernando, C. Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive Explanations Approach. Healthc. Inform. Res. 2023, 29, 228.
Assegie, T.A. Evaluation of Local Interpretable Model-Agnostic Explanation and Shapley Additive Explanation for Chronic Heart Disease Detection. Proc. Eng. Technol. Innov. 2023, 23, 48–59.
Hedeker, D.; Gibbons, R. Longitudinal Data Analysis, 1st ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006; p. 336.
Yacouby, R.; Axman, D. Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Online, 20 November 2020; pp. 79–91.
Ratner, B. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017; p. 324.
D’Agostino, R.B., Sr.; Vasan, R.S.; Pencina, M.J.; Wolf, P.A.; Cobain, M.; Massaro, J.M.; Kannel, W.B. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 2008, 117, 743–753.
Pocock, S.J.; Ariti, C.A.; McMurray, J.J.; Maggioni, A.; Køber, L.; Squire, I.B.; Swedberg, K.; Dobson, J.; Poppe, K.K.; Whalley, G.A.; et al. Predicting survival in heart failure: A risk score based on 39 372 patients from 30 studies. Eur. Heart J. 2013, 34, 1404–1413.
Krittanawong, C.; Zhang, H.; Wang, Z.; Aydar, M.; Kitai, T. Artificial intelligence in precision cardiovascular medicine. J. Am. Coll. Cardiol. 2017, 69, 2657–2664.
de Bakker, M. Circulating Biomarkers for Dynamic Cardiovascular Risk Assessment: A Precision Medicine Approach. Ph.D. Thesis, Erasmus University Rotterdam, Rotterdam, The Netherlands, 2024.
Lotan, E.; Tschider, C.; Sodickson, D.; Caplan, A.; Bruno, M.; Zhang, B.; Lui, Y. Medical imaging and privacy in the era of artificial intelligence: Myth, fallacy, and the future. J. Am. Coll. Radiol. 2020, 17, 1159–1162.
Kelly, C.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019, 17, 195.

Figure 1. Death rate from 1950 to 2021 [9].

Figure 2. Flowchart of the word cloud generation process.

Figure 3. Word cloud of the most used words in the reviewed research papers. Larger words indicate higher frequency of occurrence.

Figure 4. Artificial intelligence and heart treatment [35,36,37].

Figure 5. A simple symmetrical model, U-Net architecture [50].

Figure 6. An example of an asymmetrical model, CNN-LSTM hybrid model [37].

Table 1. Top 10 most frequent words in the reviewed research papers.

Word	Count
Heart	691
Failure	540
Learning	481
Deep	452
Data	427
Features	365
CHF	332
CNN	305
Patients	286
Neural	273
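Table 1 is the output of the word-frequency step behind the word cloud in Figures 2 and 3. The review does not publish its tokenization or stopword rules, so the following Python sketch assumes simple lowercase letter tokens and a small hypothetical stopword list (`STOPWORDS` and `top_words` are illustrative names). It shows only the counting step, not the authors' pipeline.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stopword list for illustration; the review does not publish the one it used.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
    "of", "on", "the", "to", "with",
})


def top_words(texts: Iterable[str], n: int = 10,
              stopwords: frozenset = STOPWORDS) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword tokens across all texts.

    Text is lowercased and split into runs of ASCII letters, so "CHF" and
    "chf" are counted together, and digits or hyphens act as separators.
    """
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(tok for tok in tokens if tok not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy "abstracts", purely illustrative.
    abstracts = [
        "Deep learning for heart failure detection.",
        "Heart failure detection with CNN features.",
    ]
    for word, count in top_words(abstracts, n=3):
        print(word, count)
```

The exact counts in a table like Table 1 depend on such choices: lowercasing merges "Heart" and "heart", and treating "heart failure" as one phrase rather than two words would change the totals.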

Table 2. Comparison of symmetric and asymmetric model designs in heart failure detection.

Aspect	Symmetric Models	Asymmetric Models
Architecture	Uniform, structured (e.g., U-Net)	Specialized, task-specific
Strengths	Efficient processing, handles balanced data	Captures rare conditions, robust for imbalanced datasets
Weaknesses	May overlook subtle anomalies or rare cases	Computationally complex, less generalizable
Applications	Standard echocardiographic and structured data analysis	Multimodal data integration, anomaly detection in imbalanced datasets
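Table 2 characterizes U-Net (Figure 5) as a uniform, structured architecture. Its symmetry is easiest to see by tracing feature-map sizes: each encoder level has a mirror-image decoder level, joined by a skip connection. The Python sketch below assumes the layout of the original U-Net paper (Ronneberger et al., 2015): a 572 × 572 input, four 2 × 2 max-pooling steps, and two unpadded 3 × 3 convolutions per level. The function names are hypothetical, and the configuration is illustrative rather than the one used in any study reviewed here.

```python
from typing import List, Tuple


def conv_block(size: int, n_convs: int = 2, kernel: int = 3) -> int:
    """Apply n unpadded ('valid') k x k convolutions; each trims k - 1 pixels per axis."""
    for _ in range(n_convs):
        size -= kernel - 1
    return size


def trace_unet(input_size: int = 572, depth: int = 4) -> Tuple[int, List[Tuple[int, int]], int]:
    """Trace one spatial dimension through a U-Net-style encoder-decoder.

    Assumes the original U-Net layout: two valid 3x3 convolutions per level,
    2x2 max pooling on the way down, 2x2 up-convolution on the way up, and a
    center crop of the encoder map before each skip concatenation.
    Returns (bottleneck size, [(encoder size, cropped size), ...], output size).
    """
    skips: List[int] = []
    size = input_size
    # Contracting path (encoder): convolve, keep the map for the skip, then pool.
    for _ in range(depth):
        size = conv_block(size)
        if size % 2:
            raise ValueError(f"odd feature-map size {size}; choose another input_size")
        skips.append(size)
        size //= 2                  # 2x2 max pooling halves the resolution
    bottleneck = size = conv_block(size)
    crops: List[Tuple[int, int]] = []
    # Expanding path (decoder): mirrors the encoder level by level.
    for skip in reversed(skips):
        size *= 2                   # 2x2 up-convolution doubles the resolution
        crops.append((skip, size))  # encoder map is center-cropped to this size
        size = conv_block(size)
    return bottleneck, crops, size


if __name__ == "__main__":
    bottleneck, crops, out = trace_unet(572)
    print(bottleneck, out)   # 28 388
    print(crops)             # [(64, 56), (136, 104), (280, 200), (568, 392)]
```

With unpadded convolutions each encoder map is larger than its decoder counterpart (568 versus 392 at the top level) and must be cropped; many later implementations pad the convolutions so that paired maps match exactly.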

Table 3. Summary of reviewed papers on AI in heart failure detection.

Authors	Model/Strategy	Outcome	Data	Reported Performance
Li et al. [58]	CNN + RNN	Predict cardiovascular disease risk	Medical records	Accuracy: 97.6%, Sensitivity: 96.3%, Specificity: 97.4%
Gjoreski et al. [39]	Machine learning (ML) and end-to-end deep learning (DL)	Diagnose chronic heart failure	Heart sound recordings	Accuracy: 93.2%
Narang et al. [49]	CNN	Guide novice echocardiography users	Echocardiography videos	Diagnostic quality: 98.8% of cases for LV size and function and pericardial effusion; 92.5% of cases for RV size
Duffy et al. [48]	3D CNN	Measure LV dimensions	Echocardiogram videos	R² = 0.96; MAE: 1.7 mm (95% CI, 1.6–1.8 mm) for IVS thickness, 3.8 mm (95% CI, 3.5–4.0 mm) for LVID, 1.8 mm (95% CI, 1.7–2.0 mm) for LVPW thickness
Liu et al. [47]	CNN + LSTM	Predict heart failure readmission	Clinical and imaging data	F1 score: 0.756 (general readmission), 0.733 (30-day readmission)
Huang et al. [38]	CNN	Detect congestive heart failure (CHF)	Electronic health records	Accuracy: 86.74%, Sensitivity: 88.03%, Specificity: 85.45%
Wang et al. [42]	Ensemble model	Predict heart failure mortality	Clinical data	F1 score: 22.28–38.41%; AUC: 76.15–88.81% (range across different methods)
Kanani et al. [37]	6 residual blocks with 1D convolution	Improve classification accuracy of ECG signals	MIT-BIH Arrhythmia Dataset	F1 score: 0.98 (with augmentation)
Ning et al. [36]	Hybrid model (CNN + RNN)	Improve CHF detection from ECG signals	RR interval sequences from ECG signals	Accuracy: 99.93%, Sensitivity: 99.85%, Specificity: 100%, AUC close to 1
Matsumoto et al. [43]	Modified VGG16	Detect heart failure from X-ray images	ChestX-ray8 database	Accuracy: 82%, Sensitivity: 75%, Specificity: 94.4%
Ouyang et al. [35]	EchoNet-Dynamic	Segment the left ventricle, predict ejection fraction (EF), and classify heart failure	10,030 echocardiogram videos from Stanford Medicine and external datasets	Segmentation Dice coefficient: 0.92; EF MAE: 4.1% (internal), 6.0% (external); AUC: 0.97 (internal), 0.96 (external)

Table 4. Comments on reviewed papers on AI in heart failure detection.

Authors	Name of the Paper	Dataset	Comments
Li et al. [58]	Automatic staging model of heart failure based on deep learning	Medical records	Limited by the quality and completeness of medical records; the high reported accuracy may indicate overfitting
Gjoreski et al. [39]	Machine Learning and End-to-End Deep Learning for the Detection of Chronic Heart Failure From Heart Sounds	Heart sound recordings	Small sample size; potential lack of generalizability across diverse populations
Narang et al. [49]	Utility of a deep-learning algorithm to guide novices to acquire echocardiograms for limited diagnostic use	Echocardiography videos	Dependency on video quality, may require high computational resources for real-time guidance
Duffy et al. [48]	High-throughput precision phenotyping of left ventricular hypertrophy with cardiovascular deep learning	Echocardiogram videos	Limited external validation, performance may vary with different imaging protocols
Liu et al. [47]	Predicting heart failure readmission from clinical notes using deep learning	Clinical and imaging data	Complexity of integrating multi-modal data, possible overfitting due to high accuracy metrics
Huang et al. [38]	A Congestive Heart Failure Detection System via Multi-input Deep Learning Networks	Electronic health records	Potential bias in EHR data; longitudinal validation needed to ensure model robustness
Wang et al. [42]	Feature rearrangement-based deep learning system for predicting heart failure mortality	Clinical data	High variability in F1 scores, may require large and diverse datasets for better generalizability
Kanani et al. [37]	ECG heartbeat arrhythmia classification using time-series augmented signals and deep learning approach	MIT-BIH Arrhythmia Dataset	Augmented data may not reflect real-world signals; the high performance metrics may indicate overfitting
Ning et al. [36]	Automatic detection of congestive heart failure based on a hybrid deep learning algorithm in the Internet of Medical Things	RR interval sequences from ECG signals	High AUC may suggest overfitting, limited by the quality of ECG signal preprocessing
Matsumoto et al. [43]	Diagnosing Heart Failure from Chest X-Ray Images Using Deep Learning	ChestX-ray8 database	Needs a larger dataset; images with ambiguous radiolucency were prone to misdiagnosis; does not differentiate between cardiac and non-cardiac diseases
Ouyang et al. [35]	Video-based AI for Beat-to-Beat Assessment of Cardiac Function	10,030 echocardiogram videos from Stanford Medicine and external datasets	The model demonstrated strong generalizability and robust performance across different datasets, highlighting its clinical potential
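Several comments in Table 4 attribute very high reported metrics to possible overfitting. One widely used safeguard, shown here as a generic sketch rather than as the procedure of any cited study, is subject-wise splitting: all segments from a given patient (for example, RR-interval windows or individual heartbeats) go either to the training set or to the test set, never both. The function name, record format, and toy data below are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, object]  # (patient_id, sample), e.g. one RR-interval window


def split_by_patient(records: Sequence[Record], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split records so that every patient lands entirely in train or in test.

    Splitting at the segment level instead can place windows from the same
    recording on both sides and inflate held-out performance.
    """
    patients = sorted({pid for pid, _ in records})
    rng = random.Random(seed)            # fixed seed for a reproducible split
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 5 segments each.
    data = [(f"p{i}", f"segment{j}") for i in range(10) for j in range(5)]
    train, test = split_by_patient(data, test_fraction=0.2, seed=1)
    print(len(train), len(test))                                  # 40 10
    print({pid for pid, _ in train} & {pid for pid, _ in test})   # set()
```

Cross-validation can apply the same idea by assigning folds per patient rather than per segment.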

Table 5. Comments on reviewed papers on asymmetric and symmetric models in healthcare.

Authors	Name of the Paper	Dataset	Comments
Madani et al. [20]	Deep Echocardiography: Data-efficient supervised and semi-supervised deep learning	Echocardiography datasets from hospitals	Symmetric models for cardiac diagnosis; limited for rare conditions.
Liu et al. [21]	Anomaly detection in chest X-rays based on dual-attention mechanism	Chest X-ray datasets	Asymmetric models with attention mechanisms for subtle anomalies.
Sridevi et al. [59]	Computer-aided decision support system for prenatal congenital heart defects	Prenatal ultrasound datasets	Explored symmetry for congenital heart defect detection.
Valehi et al. [51]	Smart heart monitoring: Early prediction of heart problems	ECG signal datasets	Focused on asymmetry in ECG signal analysis for early detection.
McCracken et al. [52]	Ventricular volume asymmetry as a novel imaging biomarker	Cardiac imaging datasets	Demonstrated ventricular asymmetry as a predictive biomarker.
Tang et al. [60]	Differential effects of arginine methylation on diastolic dysfunction	Clinical biomarker datasets	Analyzed metabolic asymmetry for heart failure progression.
Ramírez-Vélez et al. [53]	Effects of exercise training on Fetuin-A in cardiovascular disease	Exercise intervention datasets	Meta-analysis on biomarker asymmetries in cardiovascular research.


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Udoy, I.A.; Hassan, O. AI-Driven Technology in Heart Failure Detection and Diagnosis: A Review of the Advancement in Personalized Healthcare. Symmetry 2025, 17, 469. https://doi.org/10.3390/sym17030469

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
