This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

A Dual-Stage Multimodal Alignment Approach for Robust Breast Cancer Diagnosis via Visual–Textual Computing

by
Ramazan Ozgur Dogan
Department of Artificial Intelligence Engineering, Faculty of Computer and Information Sciences, Trabzon University, Trabzon 61080, Türkiye
Appl. Sci. 2026, 16(12), 5934; https://doi.org/10.3390/app16125934
Submission received: 19 May 2026 / Revised: 9 June 2026 / Accepted: 10 June 2026 / Published: 11 June 2026

Abstract

Manual classification of breast cancer is resource-intensive, slow, and subject to inter-observer variability, motivating automated deep learning solutions. Most current methods rely on unimodal imaging data and struggle with domain generalization (DG) across varied clinical environments. We propose a Dual-Stage Multimodal Alignment approach that integrates breast ultrasound (US) imagery with clinical text reports to improve diagnostic stability. The method proceeds in two stages: (1) Local Correlation Alignment (LCA), which aligns fine-grained visual features with textual embeddings to capture localized lesion attributes, and (2) Global Attention Alignment (GAA), which applies multi-head self-attention to the joint visual–textual sequence to encourage domain-invariant representations. We evaluate the approach on a harmonized, leakage-free repository of 6880 images aggregated from six public US datasets (BUS-CoT, BrEaST, BUS-BRA, BUS-UCLM, BLUI, BUSI) under three protocols: independent benchmarking on BUS-CoT, pooled cross-dataset evaluation, and zero-shot domain generalization on unseen unimodal target domains. On the BUS-CoT benchmark, the 198M-parameter model reaches 0.8177 accuracy and 0.8852 AUC, on par with the 7-billion-parameter Qwen2.5-VL-7B with chain-of-thought reasoning (0.8064 accuracy, 0.8354 AUC) while using roughly 1/35 the parameter count. In the pooled setting, it is competitive with single-domain state-of-the-art methods on individual subsets (e.g., 0.9576 AUC on BUSI, 0.8741 accuracy on BUS-BRA). Under zero-shot transfer without clinical text, per-domain AUC ranges from 0.7360 to 0.8060 across four unseen targets, providing a lower bound under cross-scanner shift. These results indicate that task-specific multimodal alignment can rival large vision-language models in breast US diagnosis at a fraction of the parameter count.
Keywords: breast cancer; deep learning; domain generalization; dual-stage alignment; multimodal learning; ultrasound imaging
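
The two stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the abstract does not specify the LCA formulation, so the cosine-similarity map and residual fusion below are assumptions, as are the function names, tensor shapes, head count, and the random projection matrices that stand in for learned weights. Only the use of multi-head self-attention over the joint visual–textual sequence in the second stage comes from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_correlation_alignment(visual, text):
    """Stage 1 (assumed form): correlate each image patch with each text token.

    visual: (num_patches, dim), text: (num_tokens, dim).
    Returns text-aware patch features of shape (num_patches, dim).
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    corr = v @ t.T                     # cosine similarities, (num_patches, num_tokens)
    weights = softmax(corr, axis=1)    # each patch spreads weight over the tokens
    return visual + weights @ text     # residual fusion (an assumption)


def global_attention_alignment(visual, text, num_heads, rng):
    """Stage 2: multi-head self-attention over the joint visual-textual sequence."""
    seq = np.concatenate([visual, text], axis=0)   # (n, dim)
    n, dim = seq.shape
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    head_dim = dim // num_heads
    # Random projections stand in for learned query/key/value/output weights.
    wq, wk, wv, wo = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))

    def split(x):
        # (n, dim) -> (heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(seq @ wq), split(seq @ wk), split(seq @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)                    # merge heads
    return out @ wo, attn


rng = np.random.default_rng(0)
visual = rng.standard_normal((49, 64))   # hypothetical 7x7 grid of patch features
text = rng.standard_normal((12, 64))     # hypothetical 12 report-token embeddings

aligned = local_correlation_alignment(visual, text)
fused, attn = global_attention_alignment(aligned, text, num_heads=8, rng=rng)
pooled = fused.mean(axis=0)              # joint vector a classifier head could consume
```

In this sketch the first stage works at the patch-to-token level, while the second lets every position in the concatenated sequence attend to every other, which is one plausible reading of "local" versus "global" alignment. How the actual model handles the text-free zero-shot setting is not stated in the abstract and is not modeled here.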
