Cascaded Dual Stage U-Net with Texture-Aware Feature Fusion for Unified Segmentation and Classification in Echo-Cardiogram Images
Abstract
1. Introduction
Contributions of This Work
2. Background and Related Works
3. Research Methodology
3.1. Data Collection
- This study uses publicly available, reputable medical image repositories, such as The Cancer Imaging Archive (TCIA), BraTS (Brain Tumour Segmentation), and ISIC (International Skin Imaging Collaboration), together with specialized repositories of labeled MRI, CT, and dermoscopic images. These datasets cover a range of pathological conditions (gliomas, skin lesions, lung nodules) and support multiclass classification and segmentation tasks.
- Each dataset includes ground-truth annotations from medical professionals, which are essential for supervised learning. These annotations serve as segmentation masks delineating regions of interest (ROIs), such as tumor boundaries or lesion areas, and enable pixel-wise classification. The datasets are organized to keep classes balanced, since class imbalance can otherwise degrade model performance.
- Preprocessing is a crucial part of the data collection pipeline. It includes resizing images to a standard input size compatible with the cascaded dual-stage U-Net architecture, reducing noise, and applying augmentations such as rotation, flipping, contrast enhancement, and histogram equalization. These steps improve the texture quality of the medical images, which is particularly useful for the framework’s texture-focused deep learning component.
3.2. Data Pre-Processing
- Intensity normalization is applied to standardize pixel values in each image, typically to the [0, 1] range. With contrast and brightness consistent across modalities, the model can focus on structural and textural features rather than lighting variations. This normalization is especially beneficial for multimodal datasets, such as MRI, CT, and dermoscopy images.
- Image registration aligns images from different modalities or time points into a common coordinate system. This spatial alignment enables precise pixel-level comparisons, which are essential for tasks such as segmentation, where anatomical consistency is critical.
- Data augmentation techniques increase dataset diversity and improve generalization. These methods include geometric transformations such as rotation, horizontal and vertical flips, scaling, translation, and elastic deformation. These augmentations mimic anatomical and pathological variations, helping reduce overfitting by exposing the model to a broader range of examples.
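
The normalization and augmentation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: min-max scaling is assumed for the [0, 1] normalization, the augmentation is restricted to flips and 90-degree rotations for simplicity, and the function names `normalize_intensity` and `augment` are hypothetical.

```python
import numpy as np

def normalize_intensity(image):
    """Min-max scale pixel values to the [0, 1] range (assumed scheme)."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def augment(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to image and mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = int(rng.integers(0, 4))  # number of quarter turns
    return np.rot90(image, k).copy(), np.rot90(mask, k).copy()
```

Applying identical geometric transforms to the image and its mask keeps the pixel-wise annotation aligned with the augmented frame.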
3.3. Segmentation Using Cascaded Dual-Stage U-Net
- Coarse Segmentation U-Net (U1): The first stage learns global spatial context and identifies the region of interest (ROI). Because of its reduced resolution, U1 is well-suited to capturing large-scale anatomical structures and is less sensitive to local noise. The output of this step is an intermediate coarse segmentation mask (Ŝ(c)). This mask is used for ROI localization, multitask loss computation, and regularization of the refined segmentation network.
- Refined Segmentation U-Net (U2): The coarse segmentation mask from U1 is upsampled and concatenated with the original-resolution image (H × W). This combined representation is fed into the second U-Net, which operates at full resolution. By leveraging both the original image details and the spatial guidance from the coarse mask, U2 refines boundary delineation and recovers delicate anatomical structures. The result is the final refined segmentation mask (Ŝ), representing the definitive segmentation for further investigation. This cascaded design jointly addresses global localization and fine-scale refinement, overcoming the limitations of single-stage segmentation architectures.
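
The data flow between U1 and U2 described above can be sketched as below. This is only an illustration of the cascade wiring: the two networks are passed in as callables, simple thresholding stands in for trained U-Nets, and the names `cascade`, `toy_coarse`, `toy_refine`, and the downsampling factor of 2 are assumptions rather than details taken from the paper.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a (H, W) image; H and W must be divisible by factor."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(mask, factor):
    """Nearest-neighbour upsampling of a (h, w) mask by an integer factor."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)

def cascade(image, coarse_net, refine_net, factor=2):
    """Coarse stage on the reduced image, then refinement at full resolution."""
    coarse_mask = coarse_net(downsample(image, factor))  # coarse mask, low resolution
    guidance = upsample(coarse_mask, factor)             # back to H x W
    stacked = np.stack([image, guidance], axis=0)        # 2-channel input for U2
    refined_mask = refine_net(stacked)                   # refined mask, full resolution
    return coarse_mask, refined_mask

# Placeholder "networks": thresholding stands in for trained U-Nets.
def toy_coarse(x):
    return (x > 0.5).astype(np.float32)

def toy_refine(x):
    return ((x[0] > 0.5) & (x[1] > 0.5)).astype(np.float32)
```

Returning both masks mirrors the text: the coarse mask remains available for loss computation and regularization, while the refined mask is the final output.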
3.3.1. Texture-Aware Feature Fusion
3.3.2. Classification Network
3.3.3. Implementation Details
U-Net Architecture Details
3.3.4. Training Protocol
3.3.5. Baseline Methods and Fairness of Comparison
3.4. Feature Extraction Using Colour Co-Occurrence Matrix (CCM)
3.4.1. Contrast Quantifies the Luminance Difference Between Adjacent Pixels
3.4.2. Entropy Quantifies the Unpredictability or Complexity of Textures
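
The two descriptors named in Sections 3.4.1 and 3.4.2 can be illustrated with a small sketch. It assumes a single image channel with values in [0, 1], 8 quantization levels, and a horizontal neighbour offset; the paper's exact CCM configuration is not given in this text, so these choices and the function names are hypothetical. A colour variant would presumably repeat the same computation across channels or channel pairs.

```python
import numpy as np

def cooccurrence_matrix(channel, levels=8, offset=(0, 1)):
    """Normalized co-occurrence matrix of a 2D array with values in [0, 1]."""
    q = np.minimum((channel * levels).astype(np.int64), levels - 1)  # quantize
    dr, dc = offset
    h, w = q.shape
    # Reference pixels and their neighbours at the given (row, col) offset.
    ref = q[max(0, -dr):h - max(0, dr), max(0, -dc):w - max(0, dc)]
    nbr = q[max(0, dr):h - max(0, -dr), max(0, dc):w - max(0, -dc)]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def contrast(p):
    """Sum of (i - j)^2 * P(i, j): large when adjacent pixels differ strongly."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))

def entropy(p):
    """Shannon entropy (bits) of the co-occurrence distribution."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))
```

A perfectly uniform patch yields zero contrast and zero entropy, while a patch of alternating dark and bright columns yields high contrast, which matches the intuitive reading of the two section titles.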
3.5. Classification Using Convolutional Neural Network
3.6. Mathematical Formulation
3.6.1. Cascaded Dual-Stage U-Net Segmentation
3.6.2. Texture Feature Extraction and Fusion
3.6.3. Classification
3.6.4. Loss Functions
3.6.5. Multi-Task Loss
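
As a minimal sketch of how a multi-task objective of this kind can be assembled, the snippet below combines a soft Dice loss on the coarse and refined masks with a cross-entropy term for classification. The specific loss terms, the weights `w_coarse`, `w_refined`, and `w_cls`, and all function names are assumptions for illustration; the formulation actually used in the paper is not reproduced in this text.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), with eps for stability."""
    inter = np.sum(pred * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return float(-np.log(probs[label] + eps))

def multitask_loss(coarse_pred, coarse_gt, refined_pred, refined_gt,
                   class_probs, label,
                   w_coarse=0.5, w_refined=1.0, w_cls=1.0):
    """Weighted sum of segmentation and classification losses (weights assumed)."""
    return (w_coarse * dice_loss(coarse_pred, coarse_gt)
            + w_refined * dice_loss(refined_pred, refined_gt)
            + w_cls * cross_entropy(class_probs, label))
```

Including a term for the coarse mask reflects the statement in Section 3.3 that the intermediate mask takes part in the multitask loss computation.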
3.7. Dataset Description
4. Results and Discussion
4.1. Explicit Dataset Description
4.2. Performance Evaluation and Comparison
4.3. Ablation Study
Generalizability and Clinical Deployment Considerations
4.4. Comparative Study with State-of-the-Art Methods
4.5. Robustness Analysis Under Noise Interference
Quantitative Performance Under Noise
- Cascaded dual-stage refinement: The first U-Net operates at coarse resolution and is therefore less sensitive to high-frequency noise; the second U-Net uses its spatial guidance to refine boundaries and limit error propagation.
- Edge-preserving preprocessing: Median and Gaussian filtering reduce noise while preserving diagnostically relevant boundaries, which is particularly useful for speckle-prone ultrasound images.
- Texture-aware CCM fusion: Second-order statistical texture descriptors are relatively robust to moderate noise, which adds robustness when the deep features are partially corrupted.
- Multi-task optimization: Joint segmentation–classification learning enforces consistent anatomical representations, serving as an implicit regularizer under noisy conditions.
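- Figure (segmentation robustness): Dice coefficient (%) on the Y-axis against noise intensity on the X-axis, plotted for each noise model evaluated.
- Figure (classification robustness): classification accuracy (%) on the Y-axis against noise intensity on the X-axis, plotted for each noise type evaluated.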
- Clean Image. The original input image is noise-free, shows clear intensity contrast, and has well-defined object boundaries. It serves as the reference input for evaluating the effect of noise corruption.
- Noisy Image (High Noise). In the second image, high noise causes severe visual degradation. The noise obscures boundary information and reduces image contrast, reflecting difficult clinical acquisition conditions such as low signal-to-noise ratios or sensor artifacts.
- Predicted Segmentation. The segmentation produced by the proposed framework when applied to the noisy image. Although the output is not free of noise effects, it preserves the shape, location, and boundary consistency of the target structure, demonstrating robustness to noise perturbations.
- Ground Truth. The corresponding expert-annotated segmentation mask serves as the reference. The close visual correspondence between the predicted segmentation and the ground truth suggests that the model preserves discriminative structural details even under adverse imaging conditions.
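
The three noise types reported in the noise table (Gaussian with σ, speckle with v, salt and pepper with p) can be simulated along the following lines. This is a sketch under assumptions: images are taken to be scaled to [0, 1], σ is read as a standard deviation, v as the variance of multiplicative noise, and p as the fraction of corrupted pixels. The function names are hypothetical, and the paper's exact noise generation procedure is not stated in this text.

```python
import numpy as np

def add_gaussian(image, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def add_speckle(image, variance, rng):
    """Multiplicative noise: I + I * n, with n ~ N(0, variance)."""
    n = rng.normal(0.0, np.sqrt(variance), image.shape)
    return np.clip(image + image * n, 0.0, 1.0)

def add_salt_pepper(image, p, rng):
    """Set a fraction p of pixels to 0 or 1 with equal probability."""
    out = image.copy()
    u = rng.random(image.shape)
    out[u < p / 2] = 0.0                 # pepper
    out[(u >= p / 2) & (u < p)] = 1.0    # salt
    return out
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the corrupted test images reproducible across runs.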

4.6. Statistical Significance and Stability Analysis
5. Conclusions and Future Scope
Limitations and Clinical Feasibility
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Hariobulesu, P.; Shaik, F. Enhanced multi-grade diabetic retinopathy detection and classification via ensembled deep learning model from retinal fundus images. Expert Syst. Appl. 2025, 285, 128116.
2. Giri, J.; Sathish, T.; Sheikh, T.; Sunheriya, N.; Giri, P.; Chadge, R.; Mahatme, C.; Parthiban, A. Automatic liver segmentation using U-Net deep learning architecture for additive manufacturing. Interactions 2024, 245, 90.
3. Kumthekar, A.; Reddy, G.R. An integrated deep learning framework of U-Net and inception module for cloud detection of remote sensing images. Arab. J. Geosci. 2021, 14, 1900.
4. Kumar, S.; Choudhary, S.; Jain, A.; Singh, K.; Ahmadian, A.; Bajuri, M.Y. Brain tumour classification using a deep neural network and transfer learning. Brain Topogr. 2023, 36, 305–318.
5. Salam, A.A.; Akram, M.U.; Yousaf, M.H.; Rao, B. DermaTransNet: Where Transformer Attention Meets U-Net for Skin Image Segmentation. IEEE Access 2025, 13, 64305–64329.
6. Zhao, X.; Zhang, P.; Song, F.; Fan, G.; Sun, Y.; Wang, Y.; Tian, Z.; Zhang, L.; Zhang, G. D2A U-Net: Automatic segmentation of COVID-19 CT slices based on dual attention and hybrid dilated convolution. Comput. Biol. Med. 2021, 135, 104526.
7. Ilhan, A.; Sekeroglu, B.; Abiyev, R. Brain tumour segmentation in MRI images using nonparametric localization and enhancement methods with U-net. Int. J. Comput. Assist. Radiol. Surg. 2022, 17, 589–600.
8. Hoang, L.; Lee, S.-H.; Lee, E.-J.; Kwon, K.-R. Multiclass skin lesion classification using a novel lightweight deep learning framework for smart healthcare. Appl. Sci. 2022, 12, 2677.
9. Shiny, K.V. Brain tumour segmentation and classification using optimised U-Net. Imaging Sci. J. 2024, 72, 204–219.
10. Fakheri, S.; Yamaghani, M.; Nourbakhsh, A. A DenseNet-based deep learning framework for automated brain tumour classification. Healthcraft. Front. 2024, 2, 188–202.
11. Liu, Y.; Feng, Y.; Cheng, J.; Zhan, H.; Zhu, Z. ManbaDiff: Mamab-Enhanced Diffusion Model for 3D Medical Image Segmentation. IEEE Trans. Image Process. 2025, 34, 5761–5775.
12. Liu, J.; Yang, H.; Zhou, H.-Y.; Yu, L.; Liang, Y.; Yu, Y.; Zhang, S.; Zheng, H.; Wang, S. Swin-UMamba: Adapting Mamba-Based Vision Foundation Models for Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 44, 3898–3908.
13. Lumetti, L.; Pipoli, V.; Marchesini, K.; Ficarra, E.; Grana, C.; Bolelli, F. Taming Mambas for 3D Medical Image Segmentation. IEEE Access 2025, 13, 89748–89759.
14. Vafaeezadeh, M.; Behnam, H.; Gifani, P. Ultrasound Image Analysis with Vision Transformers—Review. Diagnostics 2024, 14, 542.
15. Huang, K.-C.; Lin, C.-E.; Lin, D.S.-H.; Lin, T.; Wu, C.-K.; Jeng, G.-S.; Lin, L.-Y.; Lin, L.-C. Video Transformer for Segmentation of Echocardiography Images in Myocardial Strain Measurement. J. Imaging Inform. Med. 2025.
16. Zhu, Z.; Zhang, Z.; Qi, G.; Li, Y.; Yang, P.; Liu, Y. Probability Map-Guided Network for 3D Volumetric Medical Image Segmentation. IEEE Trans. Image Process. 2025, 34, 7222–7234.
17. Zhu, Z.; He, X.; Qi, G.; Li, Y.; Cong, B.; Liu, Y. Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI. Inf. Fusion 2023, 91, 376–387.
18. Tran, T.-H.; Vu, D.H.; Tran, D.H.; Do, K.L.; Nguyen, P.T.; Nguyen, V.T.; Nguyen, L.T.; Ho, N.K.; Vu, H.; Dao, V.H. DCS-UNet: Dual-path framework for segmentation of reflux esophagitis lesions from endoscopic images with U-Net-based segmentation and colour/texture analysis. Vietnam J. Comput. Sci. 2023, 10, 217–242.
19. Inan, N.G.; Kocadağlı, O.; Yıldırım, D.; Meşe, İ.; Kovan, Ö. Multi-class classification of thyroid nodules from automatic segmented ultrasound images: Hybrid ResNet based UNet convolutional neural network approach. Comput. Methods Programs Biomed. 2024, 243, 107921.
20. Garia, L.; Muthusamy, H. Dual-Tree Complex Wavelet Pooling and Attention-Based Modified U-Net Architecture for Automated Breast Thermogram Segmentation and Classification. J. Imaging Inform. Med. 2025, 38, 887–901.
21. Dabass, M.; Dabass, J.; Vashisth, S.; Vig, R. A hybrid U-Net model with attention and advanced convolutional learning modules for simultaneous gland segmentation and cancer grade prediction in colorectal histopathological images. Intell. Med. 2023, 7, 100094.
22. Naveena, T.; Jerine, S. DOTHE based image enhancement and segmentation using U-Net for effective prediction of human skin cancer. Multimed. Tools Appl. 2024, 83, 75147–75169.
23. Brady, L.; Wang, Y.-N.; Rombokas, E.; Ledoux, W.R. Comparison of texture-based classification and deep learning for plantar soft tissue histology segmentation. Comput. Biol. Med. 2021, 134, 104491.
24. Salih, F.A.A.; Mohammed, S.T.; Tofiq, T.A.; Mohammed, H.J. An Effective Computer-aided diagnosis Technique for Alzheimer’s Disease Classification using U-net-based Deep Learning. UHD J. Sci. Technol. 2025, 9, 34–43.
25. Madhu, G.; Bonasi, A.M.; Kautish, S.; Almazyad, A.S.; Mohamed, A.W.; Werner, F.; Hosseinzadeh, M.; Shokouhifar, M. UCapsNet: A Two-Stage Deep Learning Model Using U-Net and Capsule Network for Breast Cancer Segmentation and Classification in Ultrasound Imaging. Cancers 2024, 16, 3777.
26. Kim, S.; Jin, P.; Chen, C.; Kim, K.; Lyu, Z.; Ren, H.; Li, Q. MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography. arXiv 2024, arXiv:2309.13539.
27. Holste, G.; Oikonomou, E.K.; Mortazavi, B.J.; Wang, Z.; Khera, R. Efficient Deep Learning-based Automated Diagnosis from Echocardiography with Contrastive Self-Supervised Learning (EchoCLR). Commun. Med. 2024, 4, 33.
28. Ferreira, D.L. Self-supervised segmentation for cardiac ultrasound. Nat. Commun. 2025, 16, 4070.
29. El-Taraboulsi, J.; Cabrera, C.P.; Roney, C.; Aung, N. Deep neural network architectures for cardiac image segmentation. Artif. Intell. Life Sci. 2023, 4, 100083.
30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
31. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Rueckert, D. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
32. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Zhou, Y. TransUNet: Transformers Make Strong Encoder for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
33. Seetharam, K.; Thyagaturu, H.; Ferreira, G.L.; Patel, A.; Patel, C.; Elahi, A.; Pachulski, R.; Shah, J.; Mir, P.; Thodimela, A.; et al. Broadening Perspectives of Artificial Intelligence in Echocardiography. Cardiol. Ther. 2024, 13, 267–279.

| Preprocessing Step | Clinical Justification | CCM Impact |
|---|---|---|
| Normalization | Reduces scanner variability | Preserves relative texture contrast |
| Registration | Aligns anatomy across views | Stabilises pixel co-occurrence |
| Noise filtering | Suppresses speckle | Prevents false texture entropy |
| Rotation/flip | Simulates probe angle | Maintains spatial texture patterns |
| Avoid elastic warp | Prevents anatomy distortion | Protects CCM statistics |

| View | Dataset Path | No. of Samples | Spatial Resolution | Data Type | Description |
|---|---|---|---|---|---|
| 2CH | /train/2ch/frames | 900 | 384 × 384 × 1 | Float32 | Preprocessed grayscale two-chamber (2CH) echocardiographic frames |
| 2CH | /train/2ch/masks | 900 | 384 × 384 × 1 | Int32 | Expert-annotated segmentation masks corresponding to 2CH frames |
| 4CH | /train/4ch/frames | 900 | 384 × 384 × 1 | Float32 | Preprocessed grayscale four-chamber (4CH) echocardiographic frames |
| 4CH | /train/4ch/masks | 900 | 384 × 384 × 1 | Int32 | Expert-annotated segmentation masks corresponding to 4CH frames |

| Configuration | MV Dice (%) | MV IoU (%) | AV Dice (%) | AV IoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| Single U-Net (Baseline) | 72.3 | 65.8 | 70.1 | 64.0 | 95.1 | 90.6 |
| Separate Training | 73.7 | 67.2 | 71.6 | 65.5 | 96.5 | 92.9 |
| Without Feature Fusion | 75.1 | 68.9 | 73.0 | 67.0 | 96.2 | 92.0 |
| Without Texture Features | 76.5 | 70.2 | 74.1 | 68.1 | 96.5 | 92.7 |
| Proposed (Cascaded dual-stage U-Net + CCM) | 81.3 | 75.5 | 78.0 | 72.3 | 98.2 | 95.6 |

| Sl. No. | Cardiac Structure | Normal Reference Values | Abnormal Severity Classification |
|---|---|---|---|
| 1 | Mitral Valve | Valve area: 4.0–6.0 cm²; mean pressure gradient: <2 mmHg | Mild stenosis: 1.5–2.0 cm²; moderate stenosis: 1.0–1.5 cm²; severe stenosis: <1.0 cm² |
| 2 | Aortic Valve | Valve area: 3.0–4.0 cm²; peak velocity (Vmax): <2.0 m/s; mean pressure gradient: <10 mmHg | Mild stenosis: 1.5–2.0 cm²; moderate stenosis: 1.0–1.5 cm²; severe stenosis: <1.0 cm² |
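
The valve-area thresholds in the table above can be expressed as a simple lookup. This is an illustrative sketch only: the function name `stenosis_grade` is hypothetical, shared boundary values (1.0 and 1.5 cm²) are assigned to the milder grade as an assumption because the table lists them in both ranges, and areas above 2.0 cm² are returned as "ungraded" because the table does not grade the interval between 2.0 cm² and the normal range.

```python
def stenosis_grade(valve_area_cm2):
    """Map a valve area in cm^2 to the severity labels listed in the table."""
    if valve_area_cm2 < 1.0:
        return "severe"
    if valve_area_cm2 < 1.5:
        return "moderate"
    if valve_area_cm2 <= 2.0:
        return "mild"
    return "ungraded"  # above 2.0 cm^2: no stenosis grade given in the table
```

For example, `stenosis_grade(1.2)` returns `"moderate"` and `stenosis_grade(0.8)` returns `"severe"`; the same area thresholds apply to both valves in the table.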

| Training → Testing Dataset | MV Dice (%) | MV IoU (%) | AV Dice (%) | AV IoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| CAMUS → Current Dataset | 74.5 | 68.2 | 71.0 | 65.0 | 96.2 | 91.8 |
| PROVAR → Current Dataset | 76.8 | 70.4 | 73.2 | 67.5 | 96.7 | 92.9 |
| Martins → Current Dataset | 75.9 | 69.5 | 72.6 | 66.8 | 96.5 | 92.5 |
| Current Dataset (In-domain) | 81.3 | 75.5 | 78.0 | 72.3 | 98.2 | 95.6 |

| Method/Dataset | MV Dice (%) | MV IoU (%) | AV Dice (%) | AV IoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| U-Net [30] | 71.8 | 65.8 | 70.1 | 64.0 | 95.2 | 90.1 |
| Attention U-Net [31] | 73.6 | 67.2 | 71.6 | 65.5 | 96.0 | 91.7 |
| DenseNet–U-Net Hybrid [10] | 75.2 | 68.9 | 73.0 | 67.0 | 96.5 | 92.5 |
| TransUNet [32] | 76.4 | 70.2 | 74.1 | 68.1 | 96.9 | 93.3 |
| Proposed (Cascaded dual-stage U-Net + CCM, Current Dataset) | 82.9 | 75.5 | 78.0 | 72.3 | 98.5 | 96.2 |
| Proposed (Cascaded dual-stage U-Net + CCM, CAMUS) | 78.5 | 71.0 | 74.5 | 68.2 | 97.1 | 93.8 |
| Proposed (Cascaded dual-stage U-Net + CCM, PROVAR) | 76.9 | 70.4 | 73.2 | 67.5 | 96.7 | 93.0 |
| Proposed (Cascaded dual-stage U-Net + CCM, Martins) | 75.2 | 69.5 | 72.6 | 66.8 | 96.0 | 92.4 |
| Swin-UNet | 77.1 | 71.0 | 74.8 | 68.6 | 97.0 | 93.9 |
| UNETR | 76.5 | 70.3 | 74.0 | 68.0 | 96.8 | 93.2 |

| Noise Type | Noise Level | Dice (%) | IoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|---|
| No Noise | – | 82.9 | 75.5 | 98.5 | 96.2 |
| Gaussian | σ = 0.01 | 81.8 | 74.6 | 97.9 | 95.6 |
| Gaussian | σ = 0.03 | 80.2 | 72.8 | 97.1 | 94.7 |
| Gaussian | σ = 0.05 | 78.6 | 71.1 | 96.3 | 93.8 |
| Speckle | v = 0.02 | 81.4 | 74.2 | 97.6 | 95.2 |
| Speckle | v = 0.05 | 79.5 | 72.0 | 96.8 | 94.1 |
| Speckle | v = 0.10 | 77.8 | 70.3 | 96.0 | 93.2 |
| Salt & Pepper | p = 0.01 | 80.9 | 73.8 | 97.4 | 95.0 |
| Salt & Pepper | p = 0.03 | 79.0 | 71.9 | 96.6 | 94.0 |
| Salt & Pepper | p = 0.05 | 77.2 | 70.0 | 95.8 | 92.9 |

| Method | Dice (%) | IoU (%) | Accuracy (%) | F1-Score (%) |
|---|---|---|---|---|
| U-Net | 71.8 ± 1.9 | 65.8 ± 1.7 | 95.2 ± 0.8 | 90.1 ± 1.2 |
| Attention U-Net | 73.6 ± 1.6 | 67.2 ± 1.5 | 96.0 ± 0.6 | 91.7 ± 1.0 |
| Dense U-Net Hybrid | 75.2 ± 1.4 | 68.9 ± 1.3 | 96.5 ± 0.5 | 92.5 ± 0.9 |
| TransUNet | 76.4 ± 1.3 | 70.2 ± 1.2 | 96.9 ± 0.4 | 93.3 ± 0.8 |
| Proposed Method | 82.9 ± 0.9 | 75.5 ± 0.8 | 98.5 ± 0.3 | 96.2 ± 0.5 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jagadish, A.N.; Manjunath, R.; Krishnamurthy, I. Cascaded Dual Stage U-Net with Texture-Aware Feature Fusion for Unified Segmentation and Classification in Echo-Cardiogram Images. Informatics 2026, 13, 84. https://doi.org/10.3390/informatics13060084

