1. Introduction
According to the Centers for Disease Control and Prevention (CDC) in the United States [
1], one in four individuals with heart failure die each year, with most cases linked to dysfunction in the left ventricle (LV). The LV plays a crucial role in distributing oxygenated blood throughout the body via the aortic valve. Any malfunction in this process can lead to severe complications within the circulatory system and other organs. As reported by Berman et al. [
1], heart failure often results from compromised LV function, typically due to structural changes in the ventricular wall or the inability of the LV to fill or eject blood effectively. Patients with cardiac disease frequently experience symptoms such as dyspnea, fatigue, and fluid retention, which can further progress to ischemia, muscle disease, pulmonary congestion, and elevated heart rate. To assess ventricular function, a range of diagnostic methods is currently available, including physical examination, X-rays, electrocardiography (ECG), magnetic resonance imaging (MRI), and echocardiography (ultrasound). Among these, echocardiography provides valuable insights into both systolic and diastolic LV function, ventricular morphology, and conditions such as aneurysms, along with mitral, tricuspid, aortic, and pulmonary valve function [
2]. Due to its non-invasive nature and excellent cost–benefit ratio, echocardiography is widely used in clinical practice for evaluating ventricular function [
2].
However, accurately defining the LV contour and shape remains a critical challenge for diagnosing heart failure. Computational methods have emerged to support cardiologists in producing more precise and efficient diagnoses. Currently, deep-learning-based approaches, particularly semantic segmentation with convolutional neural networks (CNNs), have shown promising results. Nevertheless, these methods can produce anatomically inconsistent or noisy LV contours, including implausible segmentations with irregular or disconnected regions that do not correspond to the expected LV morphology. Such artifacts, commonly referred to as blobs in medical image analysis, result from pixel-level misclassification errors inherent in semantic segmentation approaches and may significantly compromise clinical reliability. In this work, we address these limitations by presenting a new method for LV segmentation, whose novelty lies in three key contributions. First, we introduce ShapeNet, a specialized ensemble of CNNs that directly predicts both the pose parameters (rotation, translation, and scale) and the shape deformation parameters of a statistical shape model, eliminating the blob artifacts produced by pixel misclassification in semantic segmentation methods. Second, we develop an improved active shape model (ASM) that optimizes a global objective function based on concatenated gray-level profiles, demonstrating superior capture range and robustness compared to traditional ASM approaches. Third, our fully automatic pipeline combines these components to generate anatomically plausible contours without manual initialization, a significant advantage over semi-automatic methods like BEASM [
3]. This integrated approach maintains the flexibility of data-driven deep learning, while utilizing the anatomical validity offered by shape models, as evidenced by our consistent performance across both CAMUS and independent EchoNet datasets.
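As an illustration of the blob artifacts described above, the following minimal sketch counts the connected foreground regions in a binary segmentation mask; a plausible LV mask should contain exactly one. The function names, the 4-connectivity choice, and the toy masks are assumptions made for this example and are not part of the proposed pipeline.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected foreground regions in a binary mask (list of rows)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New region found: flood-fill it with a breadth-first search.
                regions += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


def has_blobs(mask):
    """A mask with more than one foreground region contains stray blobs."""
    return count_regions(mask) > 1


# Toy masks (illustrative only): one clean region, and one with a stray blob.
clean_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
noisy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
```

A contour generated from a shape model is a single closed curve by construction, so it cannot fail this check, whereas a pixel-wise classifier can.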
5. Discussion
This study demonstrated that the proposed ShapeNet + ASM method achieved robust and competitive segmentation performance for left ventricle (LV) contours in echocardiography. Our results, evaluated across the CAMUS and EchoNet datasets, highlighted the method’s capability to generate statistically valid and anatomically accurate LV contours. The use of two datasets allowed us to employ the majority of the available images in the CAMUS dataset for training, to maximize the number of training patterns. We initially reserved a limited number of CAMUS images to test our algorithm. To further test the accuracy of our proposal, we used an independent unseen dataset, EchoNet Dynamic, which comprises a total of 417 images spanning both systole and diastole. As expected, the accuracy of ventricle segmentation was higher for the small CAMUS test set, and slightly lower for the independent EchoNet test set (
Table 2,
Table 3 and
Table 5), which provides more representative values of the accuracy that can be expected during clinical use.
The approach implemented in ShapeNet, where the parameters of a statistical shape model of the organ of interest are predicted with a convolutional neural network, imposes constraints that contribute to the explainability of the final segmentation results. All shapes produced were statistically valid organ shapes. As observed in
Figure 11 and
Figure 12, the results were always smooth ventricle shapes located close to the LV in the echocardiogram, with scale and rotation close to those of the expert annotation. However, in some cases, the predicted values of the deformation vector
b were not as accurate, which was reflected in a higher Hausdorff distance (see
Table 2).
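To make the roles of the pose parameters and the deformation vector b concrete, the sketch below generates a contour from a point distribution model as x = T(mean + P b), where T applies scale, rotation, and translation. Limiting each b_i to ±3√λ_i is the usual plausibility bound for such models and is an assumption here, as are the function names and the toy square shape; this section does not state how ShapeNet bounds b.

```python
import math


def clamp_shape_params(b, eigenvalues, k=3.0):
    """Limit each b_i to +/- k*sqrt(lambda_i) (assumed plausibility bound)."""
    clamped = []
    for b_i, lam in zip(b, eigenvalues):
        limit = k * math.sqrt(lam)
        clamped.append(max(-limit, min(limit, b_i)))
    return clamped


def generate_contour(mean_shape, modes, b, scale, theta, tx, ty):
    """Build contour points from shape parameters b and a pose.

    mean_shape: flat list [x0, y0, x1, y1, ...] in model coordinates.
    modes: list of deformation modes, each the same length as mean_shape.
    """
    # Shape deformation: mean + sum_i b_i * mode_i.
    shape = list(mean_shape)
    for b_i, mode in zip(b, modes):
        for j, m in enumerate(mode):
            shape[j] += b_i * m
    # Pose: rotate by theta, scale, then translate into image coordinates.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    contour = []
    for i in range(0, len(shape), 2):
        x, y = shape[i], shape[i + 1]
        contour.append((scale * (cos_t * x - sin_t * y) + tx,
                        scale * (sin_t * x + cos_t * y) + ty))
    return contour


# Toy model (illustrative only): a square with one horizontal-stretch mode.
mean_shape = [-1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
modes = [[-1.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0]]
eigenvalues = [0.04]

b = clamp_shape_params([1.0], eigenvalues)  # 1.0 exceeds the bound and is clamped
contour = generate_contour(mean_shape, modes, b, scale=2.0, theta=0.0, tx=10.0, ty=20.0)
```

Because every output is a bounded combination of learned modes, an inaccurate b yields a smooth but slightly misfitting contour rather than an implausible one, which is consistent with the higher Hausdorff distances noted above.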
Our proposed ASM was used to improve the accuracy of the final segmentation of the LV. This ASM proved to be more accurate than the original ASM reported in [
30]. In our capture range tests (see
Figure 10), the improved ASM produced smaller mean values for the Hausdorff distance. Additionally, in some cases, when the initialization pose values were far from the ventricle contour, the original ASM failed to converge, causing run-time errors when the LV model grew outside the image.
Table 1 shows that, for the improved ASM, the number of run-time errors was exceptionally low compared to the ASM reported in [
30]. The improvements in accuracy and robustness of our ASM are most likely due to the use of all the gray-level profiles concatenated in a single vector, together with the objective function. These provide the means to evaluate the image fit of the whole ventricle contour, instead of the local point-by-point adjustment performed in the original ASM. This was reflected in the final segmentation of the LV for the EchoNet dataset, as seen in
Table 5 and in
Figure 13 and
Figure 14.
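The idea of scoring a whole contour at once can be sketched as follows. The exact objective function is not given in this section, so the sum of squared differences to a mean training profile vector used here is an assumption, as are the nearest-pixel sampling, the clamping of samples to the image border, and all function names.

```python
def sample_profile(image, point, normal, half_len):
    """Sample gray levels along the normal at one landmark (nearest pixel).

    Coordinates are clamped to the image so sampling never leaves the array.
    """
    rows, cols = len(image), len(image[0])
    px, py = point
    nx, ny = normal
    values = []
    for k in range(-half_len, half_len + 1):
        x = min(max(int(round(px + k * nx)), 0), cols - 1)
        y = min(max(int(round(py + k * ny)), 0), rows - 1)
        values.append(float(image[y][x]))
    return values


def concatenated_profiles(image, points, normals, half_len):
    """Concatenate the profiles of all landmarks into a single vector."""
    vector = []
    for point, normal in zip(points, normals):
        vector.extend(sample_profile(image, point, normal, half_len))
    return vector


def global_cost(image, points, normals, mean_vector, half_len):
    """Assumed objective: squared distance to the mean training vector."""
    g = concatenated_profiles(image, points, normals, half_len)
    return sum((a - m) ** 2 for a, m in zip(g, mean_vector))


# Toy image (illustrative only): dark region on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
normals = [(1, 0), (1, 0)]
mean_vector = [0.0, 10.0, 10.0] * 2  # dark-to-bright edge expected at each landmark

on_edge = [(2, 1), (2, 3)]   # landmarks placed on the edge
off_edge = [(3, 1), (3, 3)]  # landmarks shifted one pixel to the right

cost_on = global_cost(image, on_edge, normals, mean_vector, half_len=1)
cost_off = global_cost(image, off_edge, normals, mean_vector, half_len=1)
```

In such a scheme a single scalar rates the fit of the entire contour, so an optimizer over pose and shape parameters can trade off errors between landmarks instead of moving each point independently.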
Table 4 presents a comparison of our results against the methods proposed in [
3], specifically the BEASM approaches, the fully automatic BEASM (BEASM-fully) and the semi-automatic BEASM (BEASM-semi), which is manually initialized at three points: two at the LV base and one at the apex, alongside the ACNN method [
26], which incorporates shape constraints. ShapeNet + improved ASM, being a fully automatic method, demonstrated competitive performance. Although BEASM-semi and ACNN achieved slightly higher Dice scores, the results in
Table 4 indicate that our method achieved the lowest Hausdorff distance for both systole and diastole phases, demonstrating improved characterization of the LV contour over the methods listed in
Table 4.
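The two metrics compared above capture different properties: the Dice coefficient measures region overlap, while the Hausdorff distance reports the worst contour deviation. A minimal sketch of both, using their standard definitions, is given below; the function names and toy inputs are assumptions for illustration and do not reproduce the evaluation code used in this work.

```python
import math


def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            intersection += 1 if (a and b) else 0
    if size_a + size_b == 0:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two sets of contour points."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy inputs (illustrative only).
mask_expert = [[1, 1, 0], [1, 1, 0]]
mask_pred = [[1, 1, 0], [1, 0, 0]]
contour_expert = [(0, 0), (1, 0)]
contour_pred = [(0, 0), (4, 0)]

dice_value = dice(mask_expert, mask_pred)
hd_value = hausdorff(contour_expert, contour_pred)
```

Because the Hausdorff distance is driven by the single worst point, a method can have a slightly lower Dice score and still achieve the lowest Hausdorff distance, as observed for our method in the comparison above.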
The comparison with our in-house U-Net highlights that our method achieved competitive Dice scores and lower Hausdorff distances under the same training conditions (see
Table 5). Specifically, the statistical analysis in
Table 6 shows that for Dice scores, our method exhibited statistically significant improvements over the in-house U-Net. For the Hausdorff distance, the results reveal large effect sizes for both systole and diastole, meaning a substantial reduction in spatial errors was achieved by our approach. These large effect sizes support the conclusion that our method produced a statistically valid LV contour closely aligned with expert annotations, demonstrating the robustness of our approach, even with a limited training set and the challenges of ultrasound imaging, particularly the presence of speckle noise. The added advantage of statistical shape constraints in our method reduced misclassified regions more effectively than traditional semantic segmentation, as depicted in
Figure 15. While our method may not always have reached peak precision, it consistently achieved better segmentation performance on an entirely new dataset (EchoNet), as demonstrated in
Table 5.
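For readers unfamiliar with effect sizes, the sketch below computes Cohen's d with a pooled standard deviation, one common measure for which |d| ≥ 0.8 is conventionally called large. This section does not state which effect size measure was used, so the choice of Cohen's d is an assumption, and the Hausdorff distance values are invented solely to illustrate the calculation.

```python
import math
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


# Invented Hausdorff distances in pixels (NOT results from this study).
hd_unet = [12.0, 14.0, 16.0]
hd_ours = [6.0, 8.0, 10.0]

effect = cohens_d(hd_unet, hd_ours)
is_large = abs(effect) >= 0.8  # Cohen's conventional threshold for a large effect
```

Unlike a p-value, the effect size does not shrink or grow with the number of test images, which makes it a useful complement when the test sets differ in size.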
Figure 16 shows histograms of the Dice coefficient and Hausdorff distance values obtained with the ShapeNet + improved ASM approach for systole and diastole on the EchoNet dataset. For the Dice coefficient, 75.84% and 77.61% of the cases for systole and diastole, respectively, fell within the range of 0.8 to 1. Similarly, for the Hausdorff distance, the majority of values were concentrated between 0 and 10 pixels: 68.11% and 66.66% for systole and diastole, respectively. The concentration of values in these ranges demonstrates that our algorithm performed well and that the results were consistent, even with a completely new dataset, as was the case with EchoNet, in contrast to the methods shown in
Table 4, which were trained and tested on images from the same dataset.
Despite the strengths of the proposed method, some limitations should be acknowledged. The CAMUS dataset, although it was augmented to 4800 images, remains relatively small for deep learning applications, and this may have affected the model’s generalization. In addition, the ShapeNet + improved ASM automatic initialization introduces a dependency that may reduce accuracy if there are significant deviations in initial conditions, such as patient positioning or heart orientation. The effectiveness of the improved ASM remains dependent on ShapeNet’s initialization quality. While ShapeNet’s ensemble architecture and the PDM’s shape constraints help mitigate this dependency, extreme imaging artifacts or anatomical anomalies may still lead to suboptimal ASM convergence. This limitation was evidenced in the EchoNet HD variance (10.21 px), where acoustic shadowing occasionally compromised initialization (
Figure 12, bottom right). Nevertheless, our experimental results demonstrated that this combined approach remains clinically viable. ShapeNet’s parameter regression achieved Dice scores above 0.65, even in challenging cases (
Table 2), providing sufficiently accurate initialization for ASM refinement. Furthermore, the ASM’s global objective function effectively corrected local errors when initialization was imperfect, as demonstrated by its superior performance compared to the original ASM (
Figure 10).
On the other hand, the model’s reliance on a convolutional neural network ensemble and active shape models demands substantial computational resources for both training and inference. Each network in the ShapeNet ensemble was trained for a maximum of 50 epochs with early stopping (validation patience = 40 iterations). With our hardware configuration (NVIDIA Tesla K40c/T4), the average training time per epoch was approximately 20 min for each specialized CNN (rotation, translation, scale, and shape parameters). For comparison, our in-house U-Net implementation reached early stopping at 38 epochs, with longer epoch durations of 41 min on average. Although we tried to keep the model lightweight, this requirement may limit accessibility and real-time application in clinical settings where modern GPUs are not available. To assess this issue, we conducted performance tests on a current gaming computer with the following specifications: 24 GB of RAM, an 8th generation Core i7 processor, and an NVIDIA GeForce GTX 1060 Max-Q with 6 GB of memory, achieving offline segmentation of 207 LV images in an average of 18 min (5.27 s per image).
Concerning the morphology of the ventricle, although ShapeNet generally performs well, segmentation accuracy may be compromised in the systolic phase by the increased left ventricular deformation during contraction, and in the diastolic phase by the greater deformation observed there, as indicated by higher Hausdorff distances in some cases. Nevertheless, our method showed enhanced robustness with limited training data, offering a statistically grounded alternative to traditional segmentation models.
Future work will involve exploring the incorporation of additional datasets to further enhance the model’s inference capabilities. Additionally, we plan to investigate techniques such as model pruning and knowledge distillation to reduce computational demands, while preserving the performance advantages of the ensemble.
6. Conclusions
In this paper, we proposed a new scheme for automatic LV segmentation in echocardiography images. The proposal consists of two stages: ShapeNet, which is an ensemble of CNNs to predict pose and shape parameters; and an improved ASM, which is initialized with the parameters estimated by the ensemble of neural networks, in order to fine-tune the LV contours for improved segmentation. Our study demonstrated that integrating ShapeNet with an improved ASM enhanced LV segmentation accuracy and effectively prevented blob artifacts commonly found in semantic segmentation. Our algorithm was tested on two different datasets, CAMUS and EchoNet, providing an overall Dice coefficient of 0.83 and a Hausdorff distance of 7.36 pixels for both systole and diastole.
A major strength of our approach is its ability to automatically generate statistically valid shapes, offering a new perspective on the utilization of convolutional neural networks (CNNs) in medical imaging. When combined with the improved ASM, which outperformed traditional ASM techniques, our method provided competitive and accurate fitting of the LV contour compared to existing shape-based methods, as evidenced by higher Dice coefficients and lower Hausdorff distances in ultrasound images. Notably, we demonstrated that the ShapeNet + ASM approach was more robust with a limited training set than traditional semantic segmentation methods such as U-Net under the same conditions. Despite these strengths, limitations should be considered, including the need for substantial computational resources, the limited availability of training images, and the complexity of ultrasound images, due to speckle noise and heart morphology variations in both systolic and diastolic phases.
The ShapeNet and improved ASM methodologies presented here offer a promising alternative to semantic segmentation CNNs in medical image analysis. This approach, based on statistically valid shape adjustment, holds potential for broad applications in automated medical image analysis and clinical decision-making.