Article

Enhancing Left Ventricular Segmentation in Echocardiograms Through GAN-Based Synthetic Data Augmentation and MultiResUNet Architecture

by Vikas Kumar 1,2, Nitin Mohan Sharma 1,2, Prasant K. Mahapatra 1,2,*, Neeti Dogra 3, Lalit Maurya 4,5,*, Fahad Ahmad 4,5, Neelam Dahiya 3 and Prashant Panda 3

1 CSIR-Central Scientific Instruments Organisation (CSIR-CSIO), Chandigarh 160030, India
2 Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
3 Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
4 School of Computing, University of Portsmouth, Portsmouth PO1 3HE, UK
5 Portsmouth Artificial Intelligence and Data Science Centre (PAIDS), University of Portsmouth, Portsmouth PO1 3HE, UK
* Authors to whom correspondence should be addressed.
Diagnostics 2025, 15(6), 663; https://doi.org/10.3390/diagnostics15060663
Submission received: 9 January 2025 / Revised: 23 February 2025 / Accepted: 26 February 2025 / Published: 9 March 2025
(This article belongs to the Special Issue Artificial Intelligence in Cardiovascular Diseases (2024))

Abstract:
Background: Accurate segmentation of the left ventricle in echocardiograms is crucial for the diagnosis and monitoring of cardiovascular diseases. However, this process is hindered by the limited availability of high-quality annotated datasets and the inherent complexities of echocardiogram images. Traditional methods often struggle to generalize across varying image qualities and conditions, necessitating a more robust solution. Objectives: This study aims to enhance left ventricular segmentation in echocardiograms by developing a framework that integrates Generative Adversarial Networks (GANs) for synthetic data augmentation with a MultiResUNet architecture, providing a more accurate and reliable segmentation method. Methods: We propose a GAN-based framework that generates synthetic echocardiogram images and their corresponding segmentation masks, augmenting the available training data. The synthetic data, along with real echocardiograms from the EchoNet-Dynamic dataset, were used to train the MultiResUNet architecture. MultiResUNet incorporates multi-resolution blocks, residual connections, and attention mechanisms to effectively capture fine details at multiple scales. Additional enhancements include atrous spatial pyramid pooling (ASPP) and scaled exponential linear units (SELUs) to further improve segmentation accuracy. Results: The proposed approach significantly outperforms existing methods, achieving a Dice Similarity Coefficient of 95.68% and an Intersection over Union (IoU) of 91.62%. This represents improvements of 2.58% in Dice and 4.84% in IoU over previous segmentation techniques, demonstrating the effectiveness of GAN-based augmentation in overcoming data scarcity and improving segmentation performance. Conclusions: The integration of GAN-generated synthetic data and the MultiResUNet architecture provides a robust and accurate solution for left ventricular segmentation in echocardiograms. This approach has the potential to enhance clinical decision-making in cardiovascular medicine by improving the accuracy of automated diagnostic tools, even in the presence of limited and complex training data.

1. Introduction

Cardiovascular disease (CVD) is a significant global concern and the leading cause of death worldwide [1]. It is closely linked with multiple factors, including the left ventricular ejection fraction (LVEF). LVEF quantifies the relative difference between the left ventricular end-diastolic volume (EDV) and end-systolic volume (ESV), with a lower LVEF indicating a worse prognosis [2]. This makes LVEF a crucial metric for assessing the function of the heart [3]. Among the various heart imaging techniques, echocardiography is preferred for its quick imaging capabilities, lack of ionizing radiation, and ability to offer an immediate view of the heart in motion. In echocardiographic assessments, precisely measuring the dimensions and function of the heart is essential [4]. To compute LVEF and other vital parameters, the inner heart lining is manually outlined, a process that is labour-intensive, slow, and prone to variability. There is therefore an evident demand for automating echocardiographic evaluations, focusing on the following objectives: auto-detection of end-diastolic frames (EDFs) and end-systolic frames (ESFs) during the cardiac cycle, automatic segmentation of the left ventricle, and automated calculation of left ventricular EDV and ESV.
Conventional methods for segmenting the left ventricle have utilized various approaches, such as thresholding, edge detection, region growing, template matching, and machine learning techniques. For example, Goshtasby et al. [5] developed an intensity thresholding algorithm to extract the endocardium. Leclerc et al. [6] employed a structured random forest algorithm to precisely segment the myocardium and left ventricle by using contextual information at various scales. Belous et al. [7] proposed a fully automated segmentation technique using deep learning within a Bayesian nonparametric framework, leveraging a dynamic statistical shape model built from weighted training shape subsets. Some studies have pursued segmentation-based methods for predicting the left ventricular volume: Cousty et al. [8] segmented the left ventricular myocardium using a watershed algorithm and evaluated the relationship between left ventricular ejection fraction (LVEF) and myocardial mass. Other studies focus on directly predicting the LV volume without segmentation. Afshin et al. [9] predicted LV volume directly by combining statistical feature analysis with a support vector machine, although the approach suffered from a lack of accuracy and stability. Wang et al. [10] proposed a method using random regression forests that directly predicted ventricular volume from image statistics. However, these traditional techniques rely on manually designed features and hand-tuned parameters.
Recently, deep learning has witnessed rapid growth and has been extensively applied across a range of medical imaging areas, particularly in echocardiography [11,12]. Moradi et al. [13] introduced MFP-Unet (multi-feature pyramid U-Network) for left ventricle segmentation, together with a technique for estimating LVEF by detecting the long axis and ventricle area through the use of a smallest enclosing triangle. Furthermore, Liu et al. [14] unveiled a deep pyramid local attention network (PLA-Net) aimed at enhancing feature representation by effectively capturing information from adjacent contexts, both compact and sparse. Additionally, Guo et al. [15] incorporated a channel attention mechanism and introduced two segmentation networks: one focused on segmenting the left ventricle and the other targeted at segmenting the apical triangle.
EchoNet-Dynamic was introduced by Ouyang et al. [16] in 2020, representing the most extensive dynamic echocardiography database at the time, containing 10,030 videos. They proposed a novel beat-to-beat evaluation method utilizing the DeepLab v3 architecture, which showed promising outcomes in crucial tasks like segmenting the left ventricle and estimating LVEF [16]. Moreover, Reynaud et al. [17] introduced a residual auto-encoder network, leveraging the Transformer architecture to directly predict LVEF, achieving a mean absolute error (MAE) of 5.95%. Nevertheless, directly predicting LVEF might not fully integrate into clinical workflows [18], where LVEF estimation is conventionally performed after manually outlining the left ventricular contour. Deep learning approaches have shown promise in analysing echocardiograms but face two major obstacles: the poor quality of clinical echocardiograms and a lack of large-scale studies with dynamic data. Issues such as poor contrast, motion blur, incomplete edges, respiration, and tangential views complicate accurate left ventricle segmentation. Additionally, the shortage of high-quality annotated datasets limits the development of effective algorithms. Synthetic data has emerged as a valuable resource for enhancing deep learning models and data augmentation, allowing for the creation of diverse datasets. Combining real images with synthetic images produced by Generative Adversarial Networks (GANs) is a successful strategy for addressing the scarcity of medical images when training medical algorithms. This research introduces a method for enhancing echocardiogram segmentation using GANs for image augmentation together with an automated analysis process. Rule-based systems [19] rely on predefined heuristics and expert knowledge, making them interpretable but less adaptable to complex image variations. In contrast, GANs generate realistic synthetic images, improving segmentation accuracy. A notable innovation is the MultiResUNet architecture, which refines segmentation by modifying the traditional U-Net framework. MultiResUNet uses an Inception-style block instead of dual convolutional layers, allowing detailed extraction of spatial characteristics. It processes encoder features with convolutional layers before integrating them with decoder features, unlike the conventional direct concatenation method.
The paper highlights the improved accuracy in left ventricular segmentation through GAN-based image augmentation and the MultiResUNet framework, validated using the EchoNet-Dynamic dataset and a synthetic dataset created with GAN technology. Key contributions include an automated deep learning technique for echocardiogram analysis, a workflow utilizing GANs to produce synthetic echocardiograms with accurate labels, and the MultiResUNet framework that enhances segmentation accuracy and efficiency.

2. Related Work

A key advancement in LV segmentation came with the successful adoption of deep learning methodologies, which enabled the extraction of features at multiple scales for a variety of tasks. Table 1 summarizes the effectiveness of earlier deep learning models applied to segmenting the left ventricle in echocardiography images.
Among the notable techniques listed, a CETUS-based study employed an active contour method with mathematical fitting, achieving a remarkable Dice coefficient of 0.937. Meanwhile, a UCSF (University of California San Francisco) study utilized a convolutional neural network (CNN) within the conventional U-Net framework, demonstrating a commendable Intersection over Union (IoU) score of 0.891. Another CETUS study took a different approach, combining an active snake technique with a CNN encoder, yielding modified Dice coefficients for the end-diastole (ED) and end-systole (ES) phases. Additionally, a study on a dataset of 1500 videos employed a U-Net-based CNN supplemented by Kalman filtering, reporting Dice coefficients of 0.870 and 0.860. EchoNet-Dynamic and CAMUS approaches also showcased impressive performance, with Dice coefficients ranging from 0.903 to 0.951, utilizing various CNN architectures and design elements such as residual blocks and Transformer encoder bridges. Furthermore, EchoNet-Dynamic variants introduced strategies such as auto-encoders and attention mechanisms, resulting in Dice coefficients exceeding 0.91.

3. Data

The EchoNet-Dynamic dataset, accessed on 10 February 2023 via https://echonet.github.io/dynamic/index.html, contains 10,030 2D echocardiography videos from individual patients, captured from the apical four-chamber (A4C) perspective. The dataset is divided into 7465 training videos, 1288 validation videos, and 1277 testing videos.
Each video, processed into a 112 × 112-pixel clip, includes frames capturing the heart’s end-systolic (ES) and end-diastolic (ED) phases. This dataset provides detailed 2D coordinate pairs mapping the left ventricle (LV) volume and shape, used as inputs for model training and evaluation. The dataset’s 10,030 videos at 112 × 112 resolution are deconstructed into individual frames with predefined masks, crucial for training and evaluating the newly introduced model. This strategy of transforming videos into frames, guided by existing masks, follows the methodology outlined by Ouyang et al. [16].
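For illustration, a minimal sketch of this video-to-frame step is shown below, using OpenCV; the function name and the frame-selection logic are assumptions rather than the authors’ released pipeline.

```python
# Hypothetical sketch: extract labelled frames (e.g., ED and ES) from a
# 112x112 EchoNet-Dynamic clip. Assumes OpenCV; not the authors' code.
import cv2
import numpy as np

def extract_frames(video_path, frame_indices):
    cap = cv2.VideoCapture(video_path)
    frames = []
    for idx in frame_indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # seek to the annotated frame
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return np.stack(frames)                     # shape: (n_frames, 112, 112)
```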

4. Methodology

We followed a comprehensive methodology to achieve accurate left ventricle segmentation from echocardiogram data; Figure 1 illustrates the steps employed. We began by collecting and curating echocardiogram videos from the EchoNet-Dynamic dataset, ensuring diverse representations of cardiac dynamics and pathologies. We extracted individual frames from these videos, forming a sizable collection of images for subsequent analysis. Each extracted frame carries an expert-annotated label, and these labelled frames serve as the inputs for the subsequent stages.
We applied a Generative Adversarial Network (GAN) architecture to augment the dataset and improve its diversity. This allowed us to generate synthetic images closely resembling real echocardiogram frames, each associated with an appropriate label. We then applied a deep learning model, MultiResUNet, to perform left ventricle segmentation, training the architecture on both the real and synthetic image datasets. The segmentation results included pixel-wise delineation of the LV boundaries in the echocardiogram images. To evaluate segmentation accuracy, we compared our findings with the ground-truth annotations: Dice similarity coefficients and Intersection over Union were calculated to assess the degree of overlap between the segmented regions and the manually annotated ground-truth masks. Accuracy metrics were likewise computed by contrasting the algorithmic outputs with the expert annotations, quantifying model performance.

4.1. Data Augmentation with GAN

Generative Adversarial Networks (GANs) [27] involve two components: a generator and a discriminator. The generator aims to create samples that mimic the training data, while the discriminator evaluates these samples to distinguish between real and fake ones. The generator’s goal is to produce data indistinguishable from authentic data, matching the training data distribution.
Figure 2 illustrates the standard GAN setup. The generator samples a random point z from the latent space and produces G(z). The discriminator evaluates G(z) alongside a real sample, rating each as authentic (1) or fake (0). These evaluations assess both components’ performance. The generator aims to minimize log(1 − D(G(z))), making generated images indistinguishable from real ones (driving D(G(z)) → 1). Conversely, the discriminator aims to maximize log(D(x)) + log(1 − D(G(z))), enhancing its ability to differentiate real samples (D(x)) from generated ones (D(G(z))). This concurrent training sharpens the discriminator’s skills.
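A minimal PyTorch sketch of this minimax game is given below; the generator and discriminator definitions are omitted, the discriminator is assumed to output probabilities, and the alternating update is the standard scheme rather than the authors’ exact training code.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=100):
    """One alternating update; D is assumed to end in a sigmoid."""
    z = torch.randn(real.size(0), z_dim, device=real.device)

    # Discriminator: maximize log D(x) + log(1 - D(G(z)))
    opt_d.zero_grad()
    d_real, d_fake = D(real), D(G(z).detach())
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generator: minimize log(1 - D(G(z))), implemented here in the common
    # non-saturating form (maximize log D(G(z)))
    opt_g.zero_grad()
    d_fake = D(G(z))
    loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```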

4.2. Segmentation Architecture

The MultiResUNet is an advanced U-Net architecture with an encoder and a decoder. The encoder encodes the input data, while the decoder reconstructs images by merging feature maps from the encoder. The MultiResUNet framework is distinguished by its dual-component structure, comprising the MultiRes block and the residual path (ResPath) [28]. The MultiRes block uses parallel convolutions (Figure 3) to enhance spatial feature extraction at varied scales, balancing computational demand and precision. The proposed model enhances MultiResUNet by adding ASPP (atrous spatial pyramid pooling) blocks and attention mechanisms in each MultiRes block within the decoder. Transposed convolution plays a critical role in the decoder stage, where it is used to upsample feature maps back to the original resolution. The proposed architecture consists of the following:

4.2.1. MultiRes Block

The MultiRes block, aimed at addressing scale variation in object segmentation, borrows its concept from the Inception network. This block notionally utilizes three sizes of convolutional kernels: 3 × 3, 5 × 5, and 7 × 7. We incorporated the technique developed by Ibtehaz and Rahman [28], in which the 5 × 5 and 7 × 7 kernels are substituted with chained 3 × 3 kernels with varying filter counts. Three distinct convolutional blocks are merged to capture spatial features at various scales. The number of filters in a MultiRes block is controlled by the parameter W, which multiplies the filters at each stage. Initial filters for the successive layers are set at 32, 64, 128, and 256. Filter values for the MultiRes kernels are determined by coefficients: W/6 for the first 3 × 3 kernel, W/3 for the second, and W/2 for the third. For instance, in MultiRes block 1 with W = 32, the filter values are 5.33, 10.67, and 16. Additionally, 1 × 1 convolutional layers in MultiRes blocks enhance spatial understanding, as shown in Figure 4.
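The sketch below illustrates this filter arithmetic in PyTorch: chained 3 × 3 convolutions with roughly W/6, W/3, and W/2 filters, concatenated and added to a 1 × 1 shortcut. Exact normalization, padding, and rounding follow Ibtehaz and Rahman [28] only approximately.

```python
import torch
import torch.nn as nn

class MultiResBlock(nn.Module):
    def __init__(self, in_ch, W):
        super().__init__()
        f1, f2, f3 = int(W / 6), int(W / 3), int(W / 2)   # 5, 10, 16 for W = 32
        self.conv3 = nn.Conv2d(in_ch, f1, 3, padding=1)   # 3x3 receptive field
        self.conv5 = nn.Conv2d(f1, f2, 3, padding=1)      # stacked -> ~5x5 field
        self.conv7 = nn.Conv2d(f2, f3, 3, padding=1)      # stacked -> ~7x7 field
        self.shortcut = nn.Conv2d(in_ch, f1 + f2 + f3, 1) # 1x1 residual projection
        self.act = nn.SELU()

    def forward(self, x):
        a = self.act(self.conv3(x))
        b = self.act(self.conv5(a))
        c = self.act(self.conv7(b))
        return self.act(torch.cat([a, b, c], dim=1) + self.shortcut(x))
```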

4.2.2. ResPath

Ibtehaz and Rahman [28] proposed a strategy addressing the potential semantic disconnect caused by traditional skip connections linking encoder and decoder. They suggested replacing these conventional skip connections with a residual pathway that processes encoder feature maps before combining them with the decoder’s. In this approach, the encoder’s output passes through a convolutional layer with residual connections. The convolutional layer employs a 3 × 3 filter, whereas the residual connections use a 1 × 1 filter, as depicted in Figure 4. Residual connections maintain gradient flow and stabilize training, whereas the 1 × 1 convolutions aid in dimensionality reduction and feature transformation.
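A minimal sketch of one ResPath stage under these assumptions (a 3 × 3 main branch plus a 1 × 1 residual shortcut, stacked along the skip connection) might look as follows.

```python
import torch.nn as nn

class ResPathStage(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 main branch
        self.shortcut = nn.Conv2d(channels, channels, 1)         # 1x1 residual
        self.act = nn.SELU()

    def forward(self, x):
        return self.act(self.conv(x) + self.shortcut(x))

def make_respath(channels, n_stages):
    # MultiResUNet stacks more such stages on the shallower skip connections
    return nn.Sequential(*[ResPathStage(channels) for _ in range(n_stages)])
```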

4.2.3. SELU Activation Function

The original MultiResUNet architecture uses the Rectified Linear Unit (ReLU) as its activation function. Although ReLU enables fast computation for positive inputs, it outputs zero for non-positive inputs, potentially hindering neuron learning [29]. To mitigate this, we adopt the Scaled Exponential Linear Unit (SELU) as the activation function, defined by the following equation:
$$\mathrm{SELU}(x) = \lambda \begin{cases} x & \text{if } x > 0,\\ \alpha\left(e^{x} - 1\right) & \text{if } x \le 0, \end{cases}$$
where $\alpha \approx 1.6732632423$ and the scale factor $\lambda \approx 1.0507009873$.
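As a quick sanity check, the definition above can be verified against PyTorch’s built-in implementation, which uses the same fixed constants:

```python
import torch

alpha, lam = 1.6732632423543772, 1.0507009873554805
x = torch.linspace(-3.0, 3.0, 7)
manual = torch.where(x > 0, lam * x, lam * alpha * (torch.exp(x) - 1))
print(torch.allclose(torch.nn.SELU()(x), manual))  # True
```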

4.2.4. ASPP

The incorporation of ASPP facilitates the extraction of features across multiple scales. Atrous convolution enables precise control over the field of view, essential to capture information at multiple scales. Consistent with prior work [30], the ASPP module is applied in this study as an integral component of the bridge uniting the encoder and decoder. The governing equation for atrous convolution is:
$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$$
where $y[i]$ is the output feature map, $x[i]$ is the input, $w[k]$ is the convolution kernel, $r$ is the atrous rate, and $k$ is the kernel index. ASPP consists of a 1 × 1 convolution for fine spatial details, three 3 × 3 atrous convolutions with increasing dilation rates (e.g., 6, 12, 18), and a global average pooling branch for capturing global context. These features are concatenated and processed via another 1 × 1 convolution, followed by batch normalization and activation. ASPP was chosen over alternatives such as fully connected CRFs, PSPNet, or DeepLab variants without ASPP because of its effective multi-scale context capture, improved performance in semantic segmentation, and computational efficiency. Unlike traditional convolutions, ASPP enhances dense prediction tasks by addressing fixed receptive field limitations while remaining more efficient than CRFs and other iterative post-processing techniques. In this context, it addresses the size disparity between systolic and diastolic segmentation targets by offering multi-scale information.
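A hedged PyTorch sketch of such an ASPP bridge is shown below; the channel counts and the SELU after the fusing convolution are assumptions chosen for consistency with the rest of the proposed architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.b0 = nn.Conv2d(in_ch, out_ch, 1)            # fine spatial detail
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.pool_conv = nn.Conv2d(in_ch, out_ch, 1)     # global-context branch
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (2 + len(rates)), out_ch, 1),
            nn.BatchNorm2d(out_ch),
            nn.SELU(),
        )

    def forward(self, x):
        feats = [self.b0(x)] + [conv(x) for conv in self.atrous]
        g = self.pool_conv(F.adaptive_avg_pool2d(x, 1))  # global average pooling
        feats.append(F.interpolate(g, size=x.shape[2:], mode="bilinear",
                                   align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```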

4.2.5. Attention

For segmentation tasks using U-Net, Jha et al. [30] incorporated attention blocks into the decoder that connect to the encoder. Such an architecture allowed the encoder to encapsulate all relevant data from the polyp image into a fixed-dimension vector. A significant benefit of employing an attention mechanism is its flexibility with different input sizes and its ability to boost model performance by concentrating on essential parts of the feature map. Consistent with this methodology, our research adds an attention block to the decoder, connected to the encoder, thereby enhancing the model’s ability to accurately identify the area of the left ventricle.
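A minimal sketch of such an attention gate on the skip connection, in the spirit of additive attention gates, is given below; the paper’s exact block may differ in detail, and the encoder and decoder maps are assumed to share a spatial size.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.w_enc = nn.Conv2d(enc_ch, inter_ch, 1)  # projects skip features
        self.w_dec = nn.Conv2d(dec_ch, inter_ch, 1)  # projects gating features
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())

    def forward(self, enc, dec):
        # Attention coefficients emphasise LV-relevant regions of the skip map
        a = self.psi(torch.relu(self.w_enc(enc) + self.w_dec(dec)))
        return enc * a
```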

4.3. Evaluation Metrics

Metrics for evaluating echocardiogram segmentation are crucial for assessing the accuracy of automated image analysis algorithms. The Dice coefficient and Intersection over Union (IoU) are commonly used metrics in this field. The Dice coefficient evaluates spatial overlap between the algorithm’s output and the ground truth, ranging from 0 to 1, where higher scores indicate better alignment. IoU measures the overlap proportion relative to the combined area of predicted and actual segmentations, ranging from 0 to 1. These metrics are vital in echocardiogram analysis, helping to develop precise and clinically relevant segmentation algorithms for cardiac diagnostics and research. Both metrics provide quantitative insights into segmentation performance, aiding in refining automated systems for tasks like left ventricle and atrium segmentation and myocardium delineation.

5. Experiment and Results

5.1. GAN Architecture

The GAN model’s generator uses convolutional transpose layers to upsample input noise vectors and noisy labels, generating synthetic echocardiogram images and masks. The network employs LeakyReLU activation functions, dropout layers for improved learning, and injected noise to encourage diversity in the generated samples. The discriminator, designed to assess the realism of the synthesized images and masks, consists of convolutional layers forming a deep hierarchical feature extractor. It evaluates image realism and assigns anatomical labels, guided by binary cross-entropy and cross-entropy loss functions. Key hyperparameters include a noise vector size of 100 for sample diversity and learning rates of 0.00002 for both the generator and discriminator, with Adam optimization. Dropout layers with a 0.5 rate introduce regularization to mitigate overfitting. The discriminator’s loss combines cross-entropy loss for anatomical label prediction and binary cross-entropy loss for image realism; the generator’s loss likewise includes binary cross-entropy loss to ensure realistic sample generation and cross-entropy loss to align generated labels with the noisy input labels. Training spans 200 epochs, using the Adam optimizer for efficient convergence. The echocardiogram images and corresponding masks generated with this GAN architecture exhibited notable quality (Figure 5). Training began with a discriminator loss of 175.3207 and a generator loss of 86.8098 during the first epoch; as training progressed, model performance improved. By epoch 200, the discriminator loss had fallen to 132.4445, while the generator loss had decreased to 61.8812. This loss reduction suggests that the GAN had learned to generate more convincing and anatomically accurate echocardiogram images and masks.
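The setup implied by these hyperparameters might look like the following sketch; the network bodies are placeholders, not the authors’ architecture.

```python
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(nn.Linear(z_dim, 256), nn.LeakyReLU(0.2), nn.Dropout(0.5),
                  nn.Linear(256, 112 * 112))        # placeholder generator body
D = nn.Sequential(nn.Linear(112 * 112, 256), nn.LeakyReLU(0.2), nn.Dropout(0.5),
                  nn.Linear(256, 1))                # placeholder discriminator body

opt_g = torch.optim.Adam(G.parameters(), lr=2e-5)   # learning rate 0.00002
opt_d = torch.optim.Adam(D.parameters(), lr=2e-5)
adv_loss = nn.BCEWithLogitsLoss()                   # image-realism term
cls_loss = nn.CrossEntropyLoss()                    # anatomical-label term
```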
Furthermore, masks predicted by the GAN for generated images indicated that the GAN successfully incorporated anatomical information into the generated images. The training generated a dataset of 20,000 echocardiogram images and their corresponding masks. These synthetic echocardiograms and masks were subjected to rigorous inspection, and the findings indicated that they exhibited significantly improved quality compared to the original dataset, capturing the intricate anatomical details and textures characteristic of echocardiogram images.

5.2. Segmentation Architecture (MultiResUNet)

5.2.1. Dataset

Our dataset comprises 10,030 echocardiogram images with corresponding masks, meticulously sourced from the EchoNet-Dynamic database, ensuring the availability of high-quality ground-truth annotations. Additionally, we introduced a novel aspect to our research by generating an additional 10,000 synthetic echo images, complete with corresponding masks, using a Generative Adversarial Network (GAN) architecture. After GAN-based synthetic augmentation, the dataset size doubled, leading to an increased distribution of 14,930 training, 2576 validation, and 2524 test samples. Table 2 shows the dataset size before and after GAN augmentation.

5.2.2. Training Process

The training process for the echocardiogram left ventricle segmentation model involved a carefully selected set of hyperparameters and strategies. The Adam optimizer, chosen for its adaptive learning rate capabilities, was used with a learning rate of 1 × 10⁻³ and a batch size of 32. We incorporated dropout layers with a rate of 0.5 in both the GAN and MultiResUNet architectures for regularization. To enhance training data diversity, we applied traditional augmentation techniques such as rotation, flipping, and scaling. Additionally, we used early stopping based on validation loss and performed cross-validation to improve generalization. The training set was shuffled before each epoch to expose the model to diverse samples and avoid biases. The loss function combined Binary Cross-Entropy Logit Loss (BCE Logit Loss) and Dice Loss: BCE Logit Loss guided accurate pixel-wise classification, while Dice Loss encouraged precise boundary delineation. Training spanned 50 epochs, chosen to balance convergence and prevent overfitting, with model checkpoints saved every 5 epochs. Training on a CUDA-enabled GPU leveraged hardware acceleration for faster training times and efficient resource utilization.
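A minimal sketch of the combined loss, assuming sigmoid logits and NCHW binary masks, is:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1e-6):
    """BCE-with-logits for pixel-wise classification plus a soft Dice term."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    return bce + (1 - dice).mean()
```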

5.2.3. Evaluation Metrics

We adopted the Dice coefficient index and intersection over union (IoU) to measure our model’s performance. These metrics were computed by comparing the predicted LV region (S) to the ground truth LV segmentation results provided by human experts from the EchoNet-Dynamic dataset ( S E ). The equations for IoU and Dice coefficients are:
$$\mathrm{IoU} = \frac{\text{Area of } (S \cap S_E)}{\text{Area of } (S \cup S_E)}, \qquad \text{Dice Coefficient} = \frac{2\,|S \cap S_E|}{P(S) + P(S_E)}$$
where $P(S)$ and $P(S_E)$ represent the probabilities that a pixel belongs to the predicted segmentation and the expert-annotated ground truth, respectively. A higher Dice coefficient value indicates better alignment with the ground truth.
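For binary LV masks, these metrics reduce to simple set-overlap counts; a minimal NumPy sketch (assuming thresholded boolean masks) is:

```python
import numpy as np

def dice_iou(pred, gt, eps=1e-6):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou
```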

5.3. Implementation Results of Segmentation Architecture

The results of our study employing the MultiResUNet architecture for left ventricular segmentation in echocardiograms have been highly promising and demonstrate the model’s substantial potential for clinical applications:

5.3.1. Training Phase

The Dice coefficient, accuracy, and IoU steadily increased during training, showing consistent improvement. Precision, recall, and F1 scores remained high, indicating a balanced trade-off between true and false positives and negatives. To assess model functionality, we closely tracked the training loss and the IoU and Dice scores at each validation step; the loss in particular improved substantially during training. The model demonstrates remarkable progress across 70 epochs, with the Dice coefficient steadily increasing to an impressive 0.9607. This signifies the model’s ability to capture the intricate details of the left ventricle, yielding an accuracy of 99.27%, precision of 88.64%, recall of 89.57%, F1 score of 89.10%, and IoU of 92.85%. Figure 6 (left plots) provides the training graphs of the Dice coefficient and IoU with respect to the epochs.

5.3.2. Validation Phase

Moving on to the validation results, the MultiResUNet architecture maintains its high performance, with a Dice coefficient of 0.9289 and an accuracy of 0.9862 in the final epoch, underscoring the model’s generalization capability on unseen data. The accuracy of 98.62%, precision of 87.90%, recall of 94.28%, F1 score of 90.98%, and IoU of 86.83% also remain consistently high, suggesting that the model’s segmentation predictions are well balanced and robust. The validation results corroborated the model’s robustness, showcasing consistent and positive trends in the evaluation metrics. Figure 6 (right plots) provides the validation graphs of the Dice coefficient and IoU with respect to the epochs.

5.3.3. Testing Phase

The most crucial evaluation comes from the testing phase, where the model achieves strong results. The Dice coefficient of 0.9568 indicates a high overlap between the predicted and ground-truth left ventricular regions. The accuracy of 99.76%, precision of 98.98%, recall of 98.60%, F1 score of 98.79%, and IoU of 91.62% are also high, showcasing the model’s ability to accurately delineate the left ventricle in echocardiograms. These results are especially critical in a medical context, where precise segmentation can aid in diagnosing heart conditions and guiding treatment decisions. Figure 7 shows the ROC curves for all the test images, with the AUC value close to 1 for most of them.

5.3.4. Comparison with Other Methods

The standard deep learning models were trained on the EchoNet-Dynamic dataset and compared with the proposed approach. Table 3 presents a comparative analysis of various deep learning architectures applied to left ventricular segmentation in echocardiograms. The evaluation includes key metrics such as the Dice coefficient, Jaccard index, precision, accuracy, F1 score, and area error ratio. These metrics assess the effectiveness and efficiency of each model in segmenting the left ventricle from echocardiographic images. The results highlight the proposed approach’s superior performance in achieving higher accuracy and precision, demonstrating its potential for improving diagnostic accuracy in echocardiographic analysis.

5.3.5. 2D Projection with LOF Anomaly Detection

The t-SNE visualization in Figure 8a shows the 2D projection of high-dimensional image features before and after GAN-based augmentation. In this plot, blue points correspond to the original dataset, while red points represent GAN-generated synthetic images. The strong overlap between these points indicates that the GAN effectively captures the distribution of the real data without introducing significant shifts. Additionally, the dense central region suggests that the generated samples align closely with the original dataset, reinforcing the model’s ability to learn essential patterns. However, a few outliers, represented by scattered red points at the periphery, hint at minor variations in the GAN-generated images, potentially due to mode collapse. To analyse this further, we applied Local Outlier Factor (LOF) anomaly detection to highlight deviations in the dataset. In Figure 8b, black ‘x’ markers represent outliers detected by LOF, primarily concentrated at the edges. These anomalies appear in both real and GAN-generated samples, suggesting that the dataset naturally contains outliers rather than the GAN exclusively producing them. To refine the synthetic dataset, we applied LOF-based filtering, removing detected anomalies before training a MultiResUNet + ASPP + Attention model. This filtering resulted in a slight improvement in performance, as evidenced by the increased Dice coefficient, IoU, and F1 score shown in Table 4. The results confirm that LOF-based anomaly detection enhances GAN-augmented data quality, improving model generalization.
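A hedged scikit-learn sketch of this analysis is given below; how the image features are extracted (e.g., from a pretrained encoder) is an assumption not specified here.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import LocalOutlierFactor

def project_and_filter(real_feats, synth_feats, n_neighbors=20):
    X = np.vstack([real_feats, synth_feats])      # (n_samples, n_features)
    emb = TSNE(n_components=2).fit_transform(X)   # 2D projection (cf. Figure 8a)
    labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X)
    keep = labels[len(real_feats):] == 1          # -1 marks outliers (Figure 8b)
    return emb, keep                              # keep: mask over synthetic set
```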

6. Discussion

The high training Dice score (96.07%) shows effective learning, while the slightly lower validation score (92.89%) indicates good generalization with room for improvement. The high test score (95.68%) confirms excellent generalization to new data, highlighting the model’s clinical potential. The analysis of the training, validation, and test results reveals a notable trend in the performance of the proposed model for left ventricular segmentation in echocardiograms: the test Dice coefficient and IoU exceed the validation results and approach the training results, indicating that the model generalizes well to previously unseen data rather than merely memorizing the training set. We also performed 10-fold cross-validation of the proposed model, and Figure 9 shows box plots of each metric. These results indicate that the model performs consistently across different training subsets, as evidenced by low standard deviations across all metrics. The close agreement with our testing-phase metrics (Dice: 95.68%, IoU: 91.62%) further confirms robust generalization with minimal bias. While slight variations exist due to dataset composition, the low variance across folds ensures that the model is not overly dependent on any specific training subset, reinforcing its real-world reliability.
Table 5 comprehensively compares various deep learning architectures applied to the same dataset, EchoNet-Dynamic, highlighting their performance in terms of the Intersection over Union (IoU) and Dice coefficient. Notably, in this competitive landscape, the MultiResUNet model proposed in this research emerges as a standout performer. With IoU and Dice similarity coefficient scores of 91.62% and 95.68%, respectively, it surpasses the results obtained by previous state-of-the-art models, including DeepLabV3, TransBridge, Trans U-net, the Swin Transformer, the Segformer Network, and MAEF-Net.
Among the notable models, Liao et al. [11] exhibited commendable results utilizing innovative approaches like the Swin Transformer, K-Net, and Segformer Network. Their contributions, with Dice coefficients ranging from 92.79% to 92.92%, signify the effectiveness of these sophisticated architectures in accurately segmenting the left ventricular region. Additionally, Zeng et al. [31] presented MAEF-Net, achieving an impressive Dice coefficient of 93.10%, further demonstrating the continual advancements in cardiac image segmentation. However, it is crucial to note that while these models achieved remarkable results, they did not incorporate synthetic data augmentation.
This study not only illustrates the potential of innovative architectures but also accentuates the transformative impact of data augmentation techniques, particularly in cardiac image analysis, paving the way for more accurate and reliable clinical applications. Figure 10 shows the predicted left ventricular segmentation versus the expert-annotated LV segmentation.
Our work not only introduces the MultiResUNet architecture but also harnesses the power of GANs to generate synthetic data, effectively doubling the size of the dataset. This approach has led to a significant boost in segmentation accuracy, indicating the substantial impact of data augmentation in overcoming dataset size and diversity limitations.
Additionally, integrating GAN-generated synthetic data significantly enhanced the performance of our segmentation model. We conducted a comprehensive evaluation to assess the quality and utility of the synthetic echocardiogram images and masks generated by the GAN, involving both qualitative and quantitative analyses. Qualitatively, the synthetic images were subjected to rigorous inspection by clinical experts, who confirmed that the images exhibited high anatomical fidelity and closely resembled real echocardiograms. Quantitatively, the segmentation performance of the MultiResUNet model trained with both real and synthetic data was compared to a model trained exclusively with real data. The results showed a substantial improvement in segmentation accuracy, with the model achieving a Dice similarity coefficient of 95.68% and an IoU of 91.62%. Furthermore, integrating LOF filtering with the GAN approach led to a slight additional improvement in performance, as evidenced by Table 4. Such synthetic data generation can help address the limitations of small and insufficiently diverse training datasets.
Despite the excellent outcomes attained by the MultiResUNet model and the integration of GAN-based synthetic data augmentation, several limitations must be acknowledged. Firstly, although comprehensive, the reliance on the EchoNet-Dynamic dataset may not fully capture the diversity of real-world clinical scenarios. The dataset provides LV segmentation in a single frame and would therefore be a poor predictor of EF in patients with arrhythmias, who require EF to be averaged over 5 beats. This limitation could impact the model’s generalizability when applied to different populations or imaging conditions.

7. Conclusions and Future Work

This work advances left ventricular segmentation in cardiac images with the MultiResUNet architecture. Evaluated on the EchoNet-Dynamic dataset, the model achieved a Dice coefficient of 95.68% and an IoU of 91.62%, showing strong generalization to unseen data. GAN-generated synthetic data, which doubled the dataset size, significantly enhanced model performance, underscoring the value of synthetic data in medical imaging. Compared to models like DeepLabV3, ResNet, and Trans U-net, MultiResUNet achieved superior accuracy, setting a new standard in left ventricular segmentation. Overall, this research significantly advances cardiac image analysis, offering accurate segmentation and promising improvements in clinical decision-making and patient care.
Future work may focus on refining the MultiResUNet architecture, improving segmentation accuracy, and evaluating its adaptability to other cardiac imaging modalities such as MRI and 3D echocardiography. Integrating multi-modal information, such as ECGs and patient history, with cardiac image data can enhance the model’s understanding of cardiac anatomy and function, leading to more accurate clinical assessments.

Author Contributions

Conceptualization, V.K.; methodology, V.K. and N.M.S.; software, L.M. and V.K.; validation, V.K., N.M.S. and P.K.M.; formal analysis, V.K. and N.D. (Neelam Dahiya); investigation, N.M.S., P.K.M. and N.D. (Neelam Dahiya); resources, P.K.M. and N.D. (Neeti Dogra); data curation, N.D. (Neeti Dogra), V.K. and N.M.S.; writing—original draft preparation, V.K. and N.M.S.; writing—review and editing, P.K.M., N.D. (Neelam Dahiya), L.M. and F.A.; visualization, P.P. and N.D. (Neelam Dahiya); supervision, P.K.M. and N.D. (Neeti Dogra); project administration, P.K.M.; funding acquisition, N.D. (Neeti Dogra), F.A. and P.K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The EchoNet-Dynamic dataset is publicly available at https://echonet.github.io/dynamic/index.html (accessed on 25 February 2024).

Acknowledgments

The authors would like to thank the Director of CSIR-CSIO for providing the necessary infrastructure and the staff of PGIMER for their support. The author, Vikas Kumar, acknowledges the UGC, India, for the fellowship.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Deng, K.; Meng, Y.; Gao, D.; Bridge, J.; Shen, Y.; Lip, G.; Zhao, Y.; Zheng, Y. Transbridge: A lightweight transformer for left ventricle segmentation in echocardiography. In Proceedings of the Simplifying Medical Ultrasound: Second International Workshop, ASMUS 2021, Strasbourg, France, 27 September 2021; Held in Conjunction with MICCAI 2021, Proceedings 2. Springer: Berlin/Heidelberg, Germany, 2021; pp. 63–72. [Google Scholar]
  2. Ouyang, D.; He, B.; Ghorbani, A.; Lungren, M.P.; Ashley, E.A.; Liang, D.H.; Zou, J.Y. Echonet-dynamic: A large new cardiac motion video data resource for medical machine learning. In Proceedings of the NeurIPS ML4H Workshop, Vancouver, BC, Canada, 13 December 2019; pp. 1–11. [Google Scholar]
  3. Barbosa, D.; Friboulet, D.; D’hooge, J.; Bernard, O. Fast tracking of the left ventricle using global anatomical affine optical flow and local recursive block matching. MIDAS J. 2014, 10, 17–24. [Google Scholar] [CrossRef]
  4. Noble, J.A.; Boukerroui, D. Ultrasound image segmentation: A survey. IEEE Trans. Med. Imaging 2006, 25, 987–1010. [Google Scholar] [CrossRef] [PubMed]
  5. Goshtasby, A.; Turner, D.A. Segmentation of cardiac cine MR images for extraction of right and left ventricular chambers. IEEE Trans. Med. Imaging 1995, 14, 56–64. [Google Scholar] [CrossRef] [PubMed]
  6. Leclerc, S.; Grenier, T.; Espinosa, F.; Bernard, O. A fully automatic and multi-structural segmentation of the left ventricle and the myocardium on highly heterogeneous 2D echocardiographic data. In Proceedings of the 2017 IEEE International Ultrasonics Symposium (IUS), Washington, DC, USA, 6–9 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–4. [Google Scholar]
  7. Belous, G.; Busch, A.; Rowlands, D.; Gao, Y. Segmentation of the left ventricle in echocardiography using contextual shape model. In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; IEEE: New York, NY, USA, 2016; pp. 1–7. [Google Scholar]
  8. Cousty, J.; Najman, L.; Couprie, M.; Clément-Guinaudeau, S.; Goissen, T.; Garot, J. Segmentation of 4D cardiac MRI: Automated method based on spatio-temporal watershed cuts. Image Vis. Comput. 2010, 28, 1229–1243. [Google Scholar] [CrossRef]
  9. Afshin, M.; Ayed, I.B.; Punithakumar, K.; Law, M.; Islam, A.; Goela, A.; Peters, T.; Li, S. Regional assessment of cardiac left ventricular myocardial function via MRI statistical features. IEEE Trans. Med. Imaging 2013, 33, 481–494. [Google Scholar] [CrossRef] [PubMed]
  10. Wang, Z.; Salah, M.B.; Gu, B.; Islam, A.; Goela, A.; Li, S. Direct estimation of cardiac biventricular volumes with an adapted bayesian formulation. IEEE Trans. Biomed. Eng. 2014, 61, 1251–1260. [Google Scholar] [CrossRef] [PubMed]
  11. Liao, M.; Lian, Y.; Yao, Y.; Chen, L.; Gao, F.; Xu, L.; Huang, X.; Feng, X.; Guo, S. Left Ventricle Segmentation in Echocardiography with Transformer. Diagnostics 2023, 13, 2365. [Google Scholar] [CrossRef] [PubMed]
  12. Olivetti, N.; Sacilotto, L.; Moleta, D.B.; França, L.A.d.; Capeline, L.S.; Wulkan, F.; Wu, T.C.; Pessente, G.D.; Carvalho, M.L.P.d.; Hachul, D.T.; et al. Enhancing Arrhythmogenic Right Ventricular Cardiomyopathy Detection and Risk Stratification: Insights from Advanced Echocardiographic Techniques. Diagnostics 2024, 14, 150. [Google Scholar] [CrossRef] [PubMed]
  13. Moradi, S.; Oghli, M.G.; Alizadehasl, A.; Shiri, I.; Oveisi, N.; Oveisi, M.; Maleki, M.; Dhooge, J. MFP-Unet: A novel deep learning based approach for left ventricle segmentation in echocardiography. Phys. Medica 2019, 67, 58–69. [Google Scholar] [CrossRef] [PubMed]
  14. Liu, F.; Wang, K.; Liu, D.; Yang, X.; Tian, J. Deep pyramid local attention neural network for cardiac structure segmentation in two-dimensional echocardiography. Med. Image Anal. 2021, 67, 101873. [Google Scholar] [CrossRef] [PubMed]
  15. Guo, L.; Lei, B.; Chen, W.; Du, J.; Frangi, A.F.; Qin, J.; Zhao, C.; Shi, P.; Xia, B.; Wang, T. Dual attention enhancement feature fusion network for segmentation and quantitative analysis of paediatric echocardiography. Med. Image Anal. 2021, 71, 102042. [Google Scholar] [CrossRef] [PubMed]
  16. Ouyang, D.; He, B.; Ghorbani, A.; Yuan, N.; Ebinger, J.; Langlotz, C.P.; Heidenreich, P.A.; Harrington, R.A.; Liang, D.H.; Ashley, E.A.; et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020, 580, 252–256. [Google Scholar] [CrossRef]
  17. Reynaud, H.; Vlontzos, A.; Hou, B.; Beqiri, A.; Leeson, P.; Kainz, B. Ultrasound video transformers for cardiac ejection fraction estimation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part VI 24. Springer: Berlin/Heidelberg, Germany, 2021; pp. 495–505. [Google Scholar]
  18. Tsang, W.; Salgo, I.S.; Medvedofsky, D.; Takeuchi, M.; Prater, D.; Weinert, L.; Yamat, M.; Mor-Avi, V.; Patel, A.R.; Lang, R.M. Transthoracic 3D echocardiographic left heart chamber quantification using an automated adaptive analytics algorithm. JACC Cardiovasc. Imaging 2016, 9, 769–782. [Google Scholar] [CrossRef] [PubMed]
  19. Dong, T.; Sunderland, N.; Nightingale, A.; Fudulu, D.P.; Chan, J.; Zhai, B.; Freitas, A.; Caputo, M.; Dimagli, A.; Mires, S.; et al. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioengineering 2023, 10, 1307. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, J.; Gajjala, S.; Agrawal, P.; Tison, G.H.; Hallock, L.A.; Beussink-Nelson, L.; Lassen, M.H.; Fan, E.; Aras, M.A.; Jordan, C.; et al. Fully automated echocardiogram interpretation in clinical practice: Feasibility and diagnostic accuracy. Circulation 2018, 138, 1623–1635. [Google Scholar] [CrossRef]
  21. Dong, S.; Luo, G.; Sun, G.; Wang, K.; Zhang, H. A left ventricular segmentation method on 3D echocardiography using deep learning and snake. In Proceedings of the 2016 Computing in Cardiology Conference (CinC), Vancouver, BC, Canada, 11–14 September 2016; IEEE: New York, NY, USA, 2016; pp. 473–476. [Google Scholar]
  22. Østvik, A. Automatic Analysis in Echocardiography Using Machine Learning; NTNU Open: Gjøvik, Norway, 2021. [Google Scholar]
  23. Oktay, O.; Ferrante, E.; Kamnitsas, K.; Heinrich, M.; Bai, W.; Caballero, J.; Cook, S.A.; De Marvao, A.; Dawes, T.; O‘Regan, D.P.; et al. Anatomically constrained neural networks (ACNNs): Application to cardiac image enhancement and segmentation. IEEE Trans. Med. Imaging 2017, 37, 384–395. [Google Scholar] [CrossRef] [PubMed]
  24. Amer, A.; Ye, X.; Janan, F. ResDUnet: A deep learning-based left ventricle segmentation method for echocardiography. IEEE Access 2021, 9, 159755–159763. [Google Scholar] [CrossRef]
  25. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
  26. Zeng, Y.; Tsui, P.H.; Pang, K.; Bin, G.; Li, J.; Lv, K.; Wu, X.; Wu, S.; Zhou, Z. MAEF-Net: Multi-attention efficient feature fusion network for left ventricular segmentation and quantitative analysis in two-dimensional echocardiography. Ultrasonics 2023, 127, 106855. [Google Scholar] [CrossRef] [PubMed]
  27. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 2672–2680. [Google Scholar]
  28. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87. [Google Scholar] [CrossRef] [PubMed]
  29. Yang, Q.; Li, Y.; Zhang, M.; Wang, T.; Yan, F.; Xie, C. Automatic segmentation of COVID-19 CT images using improved MultiResUNet. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: New York, NY, USA, 2020; pp. 1614–1618. [Google Scholar]
  30. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Johansen, D.; De Lange, T.; Halvorsen, P.; Johansen, H.D. Resunet++: An advanced architecture for medical image segmentation. In Proceedings of the 2019 IEEE international symposium on multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; IEEE: New York, NY, USA, 2019; pp. 225–2255. [Google Scholar]
  31. Zeng, Y.; Tsui, P.H.; Wu, W.; Zhou, Z.; Wu, S. MAEF-Net: Multi-attention efficient feature fusion network for deep learning segmentation. In Proceedings of the 2021 IEEE International Ultrasonics Symposium (IUS), Xi’an, China, 11–16 September 2021; IEEE: New York, NY, USA, 2021; pp. 1–4. [Google Scholar]
Figure 1. Procedural framework of the methodology.
Figure 2. GAN Architecture.
Figure 3. The architecture of the MultiResUNet Model.
Figure 4. ResPath Block Structure.
Figure 5. Synthetic images with corresponding masks generated through the GAN.
Figure 6. Dice Coefficient vs. Epoch and IoU vs. Epoch Graph during training (left two plots) and validation (right two plots), respectively.
Figure 7. ROC curve for all test images.
Figure 8. (a) 2D projection plot of high-dimensional image features before and after the GAN-based augmentation; (b) Local Outlier Factor (LOF) anomaly detection. The blue points correspond to the original dataset, while the red points represent the synthetic images generated by the GAN. The black markers represent outliers detected using LOF.
Figure 9. Box plots of 10-fold cross-validation results of the proposed approach.
Figure 10. MultiResUNet result vs. Expert Annotated LV Segmentation. Green represents true-positive pixels, red indicates false-positive pixels, and blue highlights false-negative pixels.
Figure 10. MultiResUNet result vs. Expert Annotated LV Segmentation. Green represents true-positive pixels, red indicates false-positive pixels, and blue highlights false-negative pixels.
Diagnostics 15 00663 g010
Table 1. Deep learning-based studies.

| Bibliography | Dataset | Methods | Achieved Results |
|---|---|---|---|
| [3] | CETUS (MICCAI Challenge Dataset) | Utilized an active contour method with mathematical fitting | Dice coefficient of 0.937 |
| [20] | UCSF | Employed a CNN within the conventional U-Net framework comprising 23 layers | IoU score of 0.891 |
| [21] | CETUS (MICCAI Challenge Dataset) | Implemented an active snake technique enhanced by a CNN encoder acting as a locator | Modified Dice coefficients of 0.112 (ED) and 0.160 (ES) |
| [22] | 1500 videos | Utilized a CNN model with U-Net architecture and supplementary training involving Kalman filtering | Dice coefficients of 0.870 (CNN) and 0.860 (KF) |
| [23] | CETUS (MICCAI Challenge Dataset) | Employed a CNN incorporating an autoencoder architecture to align with the structure of the LV | Dice coefficients of 0.912 (ED) and 0.873 (ES) |
| [16] | EchoNet-Dynamic | Developed a CNN using the DeepLab v3 architecture and atrous convolutions | Dice coefficients of 0.927 (ED) and 0.903 (ES) |
| [24] | CAMUS dataset | Created a CNN with a combination of residual blocks and U-Net-based encoder-decoder architecture | Dice coefficient of 0.951 |
| [1] | EchoNet-Dynamic | CNN with Transformer architecture connecting encoder and decoder | Dice coefficient of 0.916 |
| [25] | EchoNet-Dynamic (screened) | U-Net architecture with Transformer | Dice coefficient of 0.925 |
| [26] | EchoNet-Dynamic (screened) | EASPP module and channel-spatial dual attention mechanism with CNN | Dice: 0.931 (LV) |
Table 2. Dataset before and after GAN augmentation.

| Dataset | Training | Validation | Testing |
|---|---|---|---|
| Original | 7465 | 1288 | 1277 |
| After GAN | 14,930 | 2576 | 2524 |
Table 3. A comparative analysis of various deep learning architectures applied to left ventricular segmentation in echocardiograms.

| Method | Dice Coefficient | Jaccard Index (IoU) | Precision | Accuracy | F1-Score | Area Error Ratio | Other Notes |
|---|---|---|---|---|---|---|---|
| UNet | 0.89 | 0.81 | 0.88 | 0.91 | 0.88 | 0.15 | Strong baseline, sensitive to noise |
| UNet++ | 0.91 | 0.84 | 0.90 | 0.93 | 0.91 | 0.12 | Improved multi-scale segmentation |
| Attention-UNet | 0.92 | 0.85 | 0.91 | 0.94 | 0.92 | 0.11 | Better edge refinement |
| ResUNet | 0.90 | 0.83 | 0.89 | 0.92 | 0.90 | 0.13 | Efficient with residual connections |
| R50-AttnUNet | 0.93 | 0.87 | 0.92 | 0.95 | 0.93 | 0.10 | Uses EMA for precision |
| DeepLabv3+ | 0.94 | 0.88 | 0.93 | 0.96 | 0.94 | 0.09 | Excellent for large datasets |
| YOLO-based | 0.92 | 0.85 | 0.91 | 0.94 | 0.92 | 0.11 | Optimized for speed |
| MultiResUNet | 0.91 | 0.86 | 0.89 | 0.98 | 0.90 | 0.02 | Ablation 1 |
| MultiResUNet + ASPP + Attention (without GAN) | 0.92 | 0.87 | 0.91 | 0.98 | 0.91 | 0.02 | Ablation 2 |
| MultiResUNet + ASPP + Attention + GAN (proposed approach) | 0.96 | 0.92 | 0.99 | 0.99 | 0.98 | 0.02 | Optimized for handling noise and variability in echocardiogram data |
Table 4. Effect of LOF anomaly filtering on the proposed model’s performance.

| Model | Dice Coefficient | Jaccard Index (IoU) | Precision | Accuracy | F1-Score |
|---|---|---|---|---|---|
| Proposed Model (with GAN) | 0.9568 | 0.9162 | 0.9898 | 0.9976 | 0.9879 |
| Proposed Model (with GAN + LOF) | 0.9582 | 0.9185 | 0.9901 | 0.9978 | 0.9883 |
Table 5. Comparison of models for LV segmentation using EchoNet-Dynamic. NA (not applicable) indicates that the study did not evaluate the IoU.

| Authors | Methods | Year | Dataset | IoU | Dice |
|---|---|---|---|---|---|
| Ouyang et al. [16] | DeepLabV3 and ResNet | 2020 | EchoNet-Dynamic | NA | 91.50 |
| Deng et al. [1] | TransBridge | 2021 | EchoNet-Dynamic | NA | 91.64 |
| Chen et al. [25] | Trans U-net | 2021 | EchoNet-Dynamic | NA | 92.54 |
| Liao et al. [11] | Swin Transformer and K-Net | 2023 | EchoNet-Dynamic | 86.78 | 92.92 |
| Liao et al. [11] | Segformer Network | 2023 | EchoNet-Dynamic | 86.57 | 92.79 |
| Zeng et al. [31] | MAEF-Net | 2023 | EchoNet-Dynamic | NA | 93.10 |
| Proposed | MultiResUNet | 2024 | EchoNet-Dynamic and synthetic dataset | 91.62 | 95.68 |