Article

Clinically Focused Computer-Aided Diagnosis for Breast Cancer Using SE and CBAM with Multi-Head Attention

1 Department of Surgery, Elazig Fethi Sekin City Hospital, Elazig 23300, Turkey
2 Department of Software Engineering, Malatya Turgut Ozal University, Malatya 44210, Turkey
3 Department of Computer Engineering, Malatya Turgut Ozal University, Malatya 44210, Turkey
* Author to whom correspondence should be addressed.
Tomography 2025, 11(12), 138; https://doi.org/10.3390/tomography11120138
Submission received: 24 October 2025 / Revised: 6 December 2025 / Accepted: 8 December 2025 / Published: 10 December 2025
(This article belongs to the Special Issue Imaging in Cancer Diagnosis)

Simple Summary

In this study, a deep learning-based model was developed that can classify benign, malignant, and normal breast tissues from ultrasound images with high accuracy, thus outperforming the methods commonly used in the literature. The model combines Squeeze-and-Excitation blocks, which emphasize channel-level feature importance, and Convolutional Block Attention Module attention mechanisms, which focus on spatial information, with the Multi-Head Attention structure to learn both local textural features and global contextual relationships effectively. In experimental analyses, the developed model achieved 96.03% accuracy on an ultrasound dataset and 99.55% accuracy on a second dataset of histopathological images.

Abstract

Background/Objectives: Breast cancer is one of the most common malignancies in women worldwide. Early diagnosis and accurate classification in breast cancer detection are among the most critical factors determining treatment success and patient survival. In this study, a deep learning-based model was developed that can classify benign, malignant, and normal breast tissues from ultrasound images with high accuracy and achieve better results than the methods commonly used in the literature. Methods: The proposed model was trained on a dataset of breast ultrasound images, and its classification performance was evaluated. The model is designed to effectively learn both local textural features and global contextual relationships by combining Squeeze-and-Excitation (SE) blocks, which emphasize channel-level feature importance, and Convolutional Block Attention Module (CBAM) attention mechanisms, which focus on spatial information, with the Multi-Head Attention (MHA) structure. The model's performance is compared with three commonly used convolutional neural networks (CNNs) and three Vision Transformer (ViT) architectures. Results: The developed model achieved an accuracy rate of 96.03% in experimental analyses, outperforming both the six compared models and similar studies in the literature. Additionally, the proposed model was tested on a second dataset consisting of histopathological images and achieved an average accuracy of 99.55%. The results demonstrate that the model can effectively learn meaningful spatial and contextual information from ultrasound data and distinguish different tissue types with high accuracy. Conclusions: This study demonstrates the potential of deep learning-based approaches in breast ultrasound-based computer-aided diagnostic systems, providing a reliable, fast, and accurate decision support tool for early diagnosis. The results obtained with the proposed model suggest that it can significantly contribute to patient management by improving diagnostic accuracy in clinical applications.

1. Introduction

Breast cancer remains one of the most prevalent malignancies among women and continues to represent a major global health concern [1,2,3]. Many systems have been developed for the early diagnosis of this disease, which affects millions of people worldwide [4,5]. The main goal of these systems is to classify diseases accurately and implement appropriate treatment methods in a timely manner [5]. Accurate classification at the early-diagnosis stage is crucial for improving treatment outcomes, survival, and quality of life [2].
Breast ultrasonography (USG), an imaging method used in the diagnosis of breast diseases, has both advantages and limitations [6]. Its limitations include variability in radiologist assessments, difficulties in evaluating certain lesions, and operator dependency [7]. These limitations can affect the sensitivity and specificity of diagnosis [6]. Breast USG, which is known to be operator-dependent, has also been questioned with respect to its diagnostic reliability and its effectiveness in treatment planning [8].
At this point, the need for decision-support systems that can assist clinicians in guiding treatment is increasing [9]. Artificial intelligence, which has become a part of everyday life in recent years, is being used increasingly effectively in medical diagnosis and treatment [10,11]. Owing to these methods, more objective diagnoses and higher accuracy rates have been achieved with systems developed from patient datasets [12]. Deep learning-based systems have yielded more effective results than machine-learning models in breast cancer screening and diagnosis [13]. Although many studies in the literature distinguish between benign and malignant breast lesions, normal breast tissue is often not included in the classification. Failing to make this distinction can lead to diagnostic errors, and confusing these tissue types paves the way for incorrect decisions [14].
This study presents a new deep learning model designed to classify breast ultrasound images into normal, benign, and malignant breast tissues for the identification of breast lesions. The potential contribution of this model to the clinical diagnosis and treatment of breast diseases was evaluated. Precise identification and classification of lesion characteristics before surgical treatment will contribute significantly to diagnosis and treatment planning. More accurate lesion recognition reduces the number of negative biopsies and unnecessary surgeries. This artificial intelligence-based system aims to improve the quality of treatment and to enable early and accurate diagnosis in patients with suspected breast cancer.
There are studies in the literature on the prevalence of breast cancer in women. Rahman et al. stated that breast cancer is the most common invasive cancer in women and noted that it has the second-highest mortality rate worldwide. Their study aimed to detect breast cancer using architectures such as U-Net and YOLO, applied a data augmentation step after preprocessing, and achieved an accuracy of 93% [15]. Shah et al. stated that breast cancer is one of the most dangerous and common cancer types and therefore emphasized the importance of detecting it with artificial intelligence-based methods. Their study favored CNN architectures and developed an ensemble deep learning model combining the outputs of EfficientNet, AlexNet, ResNet, and DenseNet, achieving an accuracy of 94.6% [16]. Kormpos et al. used transfer learning-based methods for automatic breast cancer detection on ultrasound images and obtained results with several models; the highest accuracy, 93.1%, was achieved with a NasNet-based architecture [17]. Wang et al. adopted a different approach, achieving high accuracy in classifying breast lesions using multimodal breast ultrasound data. They reported that they used four different ultrasound images together for the first time and developed a reinforcement learning-based model for this purpose, achieving an accuracy of 95.4% [18]. Rashid et al. used the CNN-based ResNet architecture for feature extraction, optimized the obtained features with metaheuristic optimization algorithms, and classified the optimized feature maps with an SVM classifier, achieving an accuracy of 94.4% with their proposed hybrid model [19].
The proposed SE-CBAM-MHA-based hybrid architecture offers high accuracy and clinical reliability in lesion classification of breast ultrasound images, thanks to the high discrimination provided by channel and spatial attention mechanisms, as well as the contextual awareness and interpretability advantages offered by global context information. The originality and contribution of the proposed model can be described as follows.
  • The proposed model offers a multi-layer hybrid structure by combining CNN-based local feature extraction with Transformer-based global context modeling.
  • Thanks to the SE and CBAM-based attention mechanism, which simultaneously calculates channel and spatial importance levels, small but critical morphological differences can be captured in low-contrast and heterogeneous ultrasound images.
  • The integration of the Multi-Head Attention layer modeled long-range relationships between the lesion and surrounding tissues, ensuring the inclusion of contextual information in the classification decision.
  • The use of Focal Loss in malignant classes with low sample sizes resulted in a significant increase in model sensitivity and F1 score in challenging examples.
  • By providing unique methodological contributions to contextual relationship modeling, which classical CNN models lack, and by emphasizing channel and spatial importance, a new framework for breast ultrasound image analysis has been introduced to the literature.
  • The developed model achieved 96.03% accuracy on the ultrasound image dataset and 99.55% accuracy on the histopathological image dataset. These values will make significant contributions to the literature in the classification of breast cancer ultrasound and histopathological images.
The second section of the article presents the materials and methods, the third section presents the experimental results, and the fourth section presents the discussion. The study ends with the conclusions.

2. Materials and Methods

2.1. Dataset

The dataset used in this study was obtained from the Kaggle platform [14]. It was collected from 600 female patients aged between 25 and 60. The images are in .png format, and data augmentation was applied to classes with low image counts before training the models. The classes in the dataset are Benign (437 images), Malignant (420 images), and Normal (399 images). Sample images from the dataset are presented in Figure 1.
Benign: Most benign breast lesions do not carry a risk of malignancy, but some require periodic follow-up because they can exhibit radiological features similar to malignant tissue. Fibroadenoma, fibrocystic changes, cystic structures, non-neoplastic changes, lipomatous and papillomatous structures are included in this group. Excision may be considered for suspicious lesions in patients with suspected increased risk of malignancy and lesions requiring follow-up.
Malignant: Most breast malignancies are the histopathological subtypes of invasive ductal carcinoma and invasive lobular carcinoma. On breast ultrasound, these lesions may exhibit findings such as irregular borders, hypoechoic heterogeneous echoes, acoustic shadowing, spiculated contours, and increased blood flow. Early diagnosis and accurate treatment planning are crucial for this group.
Normal: Breast imaging of patients in this group reveals normal glandular and fibroglandular parenchyma along with fatty tissue. Breast ultrasound reveals homogeneous and evenly distributed parenchymal tissue. Since this group does not have any pathological features, no follow-up breast USG is required.
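The paper does not specify its preprocessing or augmentation pipeline; the following minimal sketch, assuming a PyTorch/torchvision implementation and a hypothetical benign/malignant/normal folder layout, illustrates how such a three-class ultrasound dataset could be loaded, resized, and augmented.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative transforms only; the exact augmentation operations used in the study
# are not reported, so these are assumptions.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),        # images are rescaled to 224 x 224 before training
    transforms.RandomHorizontalFlip(),    # example augmentation for the smaller classes
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# Hypothetical directory layout: data/train/{benign,malignant,normal}/*.png
train_set = datasets.ImageFolder("data/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
print(train_set.classes)  # ['benign', 'malignant', 'normal']
```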

2.2. Proposed Model and Other Models Used in the Study

The SE-CBAM-MHA-based model developed in this study aims to automatically classify breast tissues obtained from ultrasound images into three different classes: benign, malignant, and normal. The resulting structure of the proposed model is shown in Figure 2.
The architecture of the proposed model is designed as a hybrid structure consisting of four primary components: convolutional feature extraction layers, SE-CBAM blocks containing channel and spatial attention mechanisms, a multi-head attention (MHA) layer inspired by the Transformer architecture, and fully connected layers used for final classification. This structure significantly improves classification performance by capturing both local texture patterns and large-scale contextual relationships.
The normalization process, initially applied on a pixel-by-pixel basis, is given in Equation (1).
$I_{norm}(x, y) = \frac{I(x, y) - \mu}{\sigma}$  (1)
Here, $I(x, y)$ represents the original pixel value, $I_{norm}(x, y)$ the normalized value, and $\mu$ and $\sigma$ the mean and standard deviation of the image intensities.
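As a quick illustration of Equation (1), the sketch below applies zero-mean, unit-variance normalization to a single image array; the small epsilon is an assumption added to guard against constant images.

```python
import numpy as np

def normalize(image: np.ndarray) -> np.ndarray:
    """Per-image zero-mean, unit-variance normalization (Equation (1))."""
    mu, sigma = image.mean(), image.std()
    return (image - mu) / (sigma + 1e-8)  # epsilon avoids division by zero for flat images
```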
The first convolutional layers of the model are used to extract low-level textural features. Equation (2) is used to calculate the convolution process. The feature maps obtained in this layer support the learning of meaningful representations in deeper layers.
$F_{i,j}^{(k)} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I_{(i+m)(j+n)} \, W_{mn}^{(k)} + b^{(k)}$  (2)
Here, $I$ represents the pixel value, $F_{i,j}^{(k)}$ the output feature for the $k$-th filter, $W^{(k)}$ the filter kernel, and $b^{(k)}$ the bias term.
The Squeeze-and-Excitation (SE) block, a key component of the model, summarizes the information carried by each channel using a global average pooling process and learns channel importance weights based on this information. First, the global compression process is defined by Equation (3).
$z_c = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j)$  (3)
Here, $z_c$ represents the scalar value obtained by global average pooling of the $c$-th channel, $H$ the height of the feature map, $W$ the width of the feature map, and $F_c(i, j)$ the activation value of the $c$-th channel at position $(i, j)$.
The channel excitation process is then performed using Equation (4). In this step, the channel representations obtained by global average pooling are rescaled by passing them through two fully connected layers. The channel size is reduced in the first fully connected layer, while the second layer expands it back to its original size. The Sigmoid Linear Unit (SiLU) activation function is chosen for the nonlinear transformation between the two layers. The SiLU function is defined as shown in Equation (5). SiLU provides a smoother activation compared to ReLU, allowing low-value inputs to carry some information, thus contributing to the preservation of fine textural details, especially in low-contrast ultrasound images. The sigmoid function applied at the output generates an importance coefficient for each channel by compressing the channel weights to the 0–1 range.
$s_c = \sigma\left(W_2 \, \delta(W_1 z)\right)$  (4)
$s_c$ is the importance coefficient calculated for the $c$-th channel, $W_1$ is the weight matrix of the first fully connected layer, $W_2$ is the weight matrix of the second fully connected layer, $\delta$ is the activation function, and $\sigma$ denotes the sigmoid function.
$\delta(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$  (5)
$\delta(x)$ is the SiLU activation output, $x$ is the input value, and $\sigma(x)$ is the sigmoid function.
The rescaling process is defined by Equation (6). In this step, the contribution of each channel is dynamically adjusted by applying the channel importance coefficients obtained in the previous steps to the original feature maps.
$\hat{F}_c = s_c \cdot F_c$  (6)
Here, $\hat{F}_c$ represents the rescaled output feature of the $c$-th channel and $F_c$ the original feature map of the $c$-th channel.
Through this process, each channel is dynamically strengthened or weakened based on the information it carries. The activation of channels with high lesion-specific discriminative power is increased, while channels carrying unimportant information or noise are suppressed. Thus, the model builds a stronger representation by focusing on fewer but more meaningful features, thereby improving classification performance.
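The following sketch shows one possible PyTorch realization of the SE block in Equations (3)-(6); the reduction ratio and layer sizes are assumptions, since the paper does not report them.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block following Equations (3)-(6), with SiLU between the FC layers."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # squeeze the channel dimension
        self.fc2 = nn.Linear(channels // reduction, channels)  # expand it back
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))                               # Eq. (3): global average pooling
        s = torch.sigmoid(self.fc2(self.act(self.fc1(z))))   # Eq. (4): channel weights in [0, 1]
        return x * s.view(b, c, 1, 1)                        # Eq. (6): rescale each channel
```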
Following the SE block, integrated CBAM also implements the spatial attention mechanism along with channel-based attention. The features extracted through averaging and max pooling are combined, as given by Equation (7).
$M = \left[\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)\right]$  (7)
Here, $M$ is the combined feature map, $\mathrm{AvgPool}(F)$ is the average-pooled descriptor, and $\mathrm{MaxPool}(F)$ is the max-pooled descriptor.
This stage provides both the average representation and the maximum representation as input to the spatial attention mechanism. The goal here is to make the model sensitive not only to the average structure but also to dominant local differences. This enables the more effective capture of critical regions, such as lesion borders, in ultrasound images.
The second step of the spatial attention mechanism within CBAM, attention map generation, is calculated as shown in Equation (8).
$S = \sigma\left(f^{7 \times 7}(M)\right)$  (8)
$S$ is the spatial attention map and $f^{7 \times 7}$ is the $7 \times 7$ convolution filter.
This stage determines which spatial regions in the image are more important. Clinically critical regions in ultrasound images, such as the irregular edges of malignant lesions, heterogeneous echo patterns, or microcalcification foci, receive higher importance coefficients through this attention map. Thus, the model focuses on diagnostically significant regions when making classification decisions, rather than attending equally to the entire image.
The rescaled features are then calculated as expressed in Equation (9).
$F' = F \otimes S$  (9)
Here, $F'$ represents the rescaled feature map with spatial attention applied, $F$ the original feature map, and $\otimes$ the element-wise multiplication operation.
This process enables the more effective capture of distinctive anatomical features, such as microcalcifications, edge irregularities, or heterogeneous echo patterns surrounding the lesion. Thanks to CBAM, the model evaluates the location and structure of the lesion in a manner similar to a human observer, thereby making classification decisions more reliable.
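A minimal sketch of the CBAM spatial attention path in Equations (7)-(9) is given below, again assuming a PyTorch implementation; the 7 × 7 kernel follows the text, while the remaining details are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM spatial attention following Equations (7)-(9)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)      # channel-wise average pooling
        max_map, _ = x.max(dim=1, keepdim=True)    # channel-wise max pooling
        m = torch.cat([avg_map, max_map], dim=1)   # Eq. (7): concatenated descriptor (B, 2, H, W)
        s = torch.sigmoid(self.conv(m))            # Eq. (8): 7x7 convolution + sigmoid
        return x * s                               # Eq. (9): element-wise rescaling
```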
A multi-headed attention (MHA) layer, inspired by the Transformer architecture, was integrated into the model to learn deep contextual relationships. While classical convolutional networks can successfully capture local features, they are insufficient in modeling long-range dependencies. The basic structure of the attention mechanism incorporated into the model is defined in Equation (10), where the input features are re-represented by three fundamental matrices. These fundamental matrices are defined as Query (Q), Key (K), and Value (V). Here, Q is the vector used by the model to query the relationship of a feature at a given location to other locations. K is a reference vector representing the features of each location and helps determine which locations are important for attention. V represents the information contained in the relevant location and is used to create the output by weighting it according to attention scores.
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$  (10)
Here, $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $d_k$ is the dimensionality of the key vectors, and $\mathrm{Attention}(Q, K, V)$ is the resulting attention output.
Through this process, the model learns the relationship between each spatial location and other locations, and assigns weights to each feature based on its importance within the context. In terms of breast ultrasound images, this mechanism considers not only the local structure of the lesion but also its relationship with surrounding tissues.
The multi-head version is as given in Equation (11). Here, multiple query-key-value groups are calculated in parallel, and each head learns different relationships and feature subspaces.
$\mathrm{MHA}(Q, K, V) = \left[\mathrm{head}_1; \ldots; \mathrm{head}_h\right] W^{O}$  (11)
Here, $h$ is the number of attention heads used, $\mathrm{head}_i$ is the output of the $i$-th attention head, and $W^{O}$ is the output projection matrix.
Thanks to this structure, the model can make more accurate predictions by evaluating not only local features but also contextual information across the entire image.
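The sketch below shows one way the multi-head attention of Equations (10)-(11) could be applied to convolutional feature maps in PyTorch, treating each spatial position as a token; the number of heads and the use of nn.MultiheadAttention are assumptions, since the paper does not report these implementation details.

```python
import torch
import torch.nn as nn

class SpatialMHA(nn.Module):
    """Multi-head self-attention over flattened spatial positions (Equations (10)-(11))."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=channels, num_heads=num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C): one token per spatial location
        out, _ = self.mha(tokens, tokens, tokens)   # self-attention: Q = K = V = tokens
        return out.transpose(1, 2).view(b, c, h, w) # restore the spatial layout
```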
The extracted feature maps are reduced to a single vector using the global average pooling process. This process is given by Equation (12). At this stage, the high-dimensional feature map is reduced to a one-dimensional vector by averaging over the spatial dimensions of each channel. Thus, the model’s output is transformed into a more compact and meaningful representation for the classifier layer.
$g_c = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j)$  (12)
Here, $g_c$ is the scalar value obtained by global average pooling of the $c$-th channel, $H$ represents the height of the feature map, $W$ the width of the feature map, and $F_c(i, j)$ the activation value of the $c$-th channel at position $(i, j)$.
After this stage, the classification process is carried out using the SoftMax function, as given in Equation (13).
$P(y = c \mid x) = \frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}}$  (13)
$P(y = c \mid x)$ is the probability that the input sample $x$ belongs to the $c$-th class, $C$ is the total number of classes, $z_c$ is the logit value of the $c$-th class obtained from the fully connected layer, and $e$ is the base of the natural logarithm.
Using the Softmax function, the model's output is converted into a normalized probability value between 0 and 1 for each class, with the sum of these probabilities equal to 1. This grounds the decision-making process on a probabilistic basis and allows the model's confidence level to be assessed. The AdamW algorithm and a cosine annealing learning rate scheduler were employed for model optimization, and the Focal Loss function was selected to mitigate class imbalance and the adverse impact of difficult examples. The Focal Loss function is defined in Equation (14).
$L = -\alpha (1 - p_t)^{\gamma} \log(p_t)$  (14)
$L$ is the Focal Loss value, $\alpha$ is the class weighting parameter, $p_t$ is the estimated probability of the correct class, $\gamma$ is the focusing parameter, and $\log(p_t)$ is the logarithm of the probability of the correct class. Focal Loss provides a more effective error signal than the classic Cross-Entropy loss by shifting the learning focus from easy examples to hard ones, so that training effort is not wasted on examples the model already classifies confidently. Throughout the training process, the model was evaluated using basic classification metrics, including accuracy, precision, sensitivity, and F1-score. A confusion matrix was also used to better analyze performance differences between classes.
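A compact multi-class focal loss matching Equation (14) is sketched below; the α and γ values are assumptions, since the paper does not report the parameters it used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Multi-class focal loss following Equation (14)."""
    def __init__(self, alpha: float = 1.0, gamma: float = 2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # log-probability of the true class for each sample
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()                                            # probability of the true class
        return (-self.alpha * (1.0 - pt) ** self.gamma * log_pt).mean()
```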
To objectively evaluate the model’s performance compared to the proposed architecture, pre-trained deep learning models commonly used in the literature—ConvNeXt-Tiny [20], ViT-B/16, ViT-B/32 [21], ResNet50 [22], EfficientNet-B0 [23], and DenseNet121 [24]—were retrained and tested using the fine-tuning method on the relevant dataset. Classification results obtained from these models were analyzed in terms of basic performance metrics, including accuracy, precision, sensitivity, and F1-score, and a detailed comparison was conducted with the proposed method. This comparison aims to quantify the performance improvements and unique contributions of the developed model compared to existing deep learning approaches.
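For the comparison models, fine-tuning typically amounts to loading ImageNet weights and replacing the classification head with a three-class output; the snippet below sketches this for ResNet50 under that assumption, since the paper does not detail which layers were frozen or retrained.

```python
import torch.nn as nn
from torchvision import models

# Hypothetical fine-tuning setup for one comparison backbone (ResNet50).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Linear(backbone.fc.in_features, 3)  # benign / malignant / normal
```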

3. Experimental Results

The results of this study, conducted for breast cancer detection, were obtained in a Python 3.12 environment. The images were first rescaled to 224 × 224 for use in the model. AdamW was used as the optimization algorithm, and the learning rate was set to 3 × 10⁻⁴. The training and testing phases of the models were performed in the Google Colab environment using an NVIDIA T4 GPU. The results of the developed model were compared with a total of six models accepted in the literature: three ViT-based and three CNN-based models. Accuracy, Precision, Recall, and F1-score metrics were used to evaluate the performance of the models. The data were split into 80% training and 20% testing sets for the proposed model and the comparison models. The proposed model was run for 500 epochs. The training and validation accuracy curves and training and validation loss curves obtained for the model developed for breast cancer detection are presented in Figure 3.
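The optimizer and scheduler settings reported above (AdamW, learning rate 3 × 10⁻⁴, cosine annealing, 500 epochs) could be wired up as in the sketch below; `model`, `criterion`, and `train_loader` are placeholders, and the weight decay value is an assumption.

```python
import torch

# Placeholders: `model`, `criterion` (e.g., the focal loss sketched earlier), and `train_loader`.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

for epoch in range(500):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine annealing of the learning rate once per epoch
```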
An examination of Figure 3 reveals that the model has a strong learning ability in terms of overall performance. The steady increase in accuracy rates throughout the training process indicates that the model effectively learned from the data. Furthermore, despite the high number of epochs, no significant drop in validation performance was observed, indicating that the model achieved stability. Training and validation losses, starting from high values at the beginning, decreased rapidly and dropped to very low levels from the 100th epoch onward. The performance evaluation metrics obtained in the proposed model are presented in Table 1 on a class basis.
When Table 1 is examined, the highest F1-score, 96.43%, is observed for the Malignant class. To compare the performance of the proposed model, results were also obtained from six current CNN and ViT architectures; the performance metrics obtained for this comparison are presented in Table 2.
Model performance metrics are reported to two decimal places. When Table 2 is examined, the highest accuracy, 96.03%, was obtained by the proposed model. The closest competitor is DenseNet121, with an accuracy of 94.84%, while the lowest accuracy, 91.67%, was obtained by the ConvNeXt-Tiny model.
The performance of the proposed model was also tested on a second dataset, which consists of benign and malignant histopathological images. This Breast Cancer Dataset contains 5000 images per class [25]. The proposed model was run on this dataset for 100 epochs using k-fold cross-validation, which divides the dataset into equal parts and tests the model on a different fold each time. This increases the reliability of the performance metrics and helps prevent the model from overfitting on limited datasets. The k value was set to 5. The 5-fold cross-validation process steps are shown in Figure 4.
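A minimal sketch of the 5-fold protocol is given below; `image_paths`, `labels`, and `train_and_evaluate` are hypothetical placeholders for the data arrays and the per-fold training routine.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_accuracies = []
for fold, (train_idx, val_idx) in enumerate(skf.split(image_paths, labels), start=1):
    acc = train_and_evaluate(train_idx, val_idx)   # hypothetical per-fold training/evaluation
    fold_accuracies.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.2f}")
print("Mean accuracy:", np.mean(fold_accuracies))
```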
The training and validation accuracy graphs of the developed model are shown in Figure 5, and the loss graph is shown in Figure 6.
Examining Figure 5 and Figure 6, the accuracy curve is stable and high, while the loss curve stabilizes at a correspondingly low value. The confusion matrix obtained from the model trained via cross-validation is shown in Figure 7.
In the classification performed using histopathological images, the proposed model also achieved successful results on the second dataset. Examining the confusion matrix in Figure 7, the proposed model correctly predicted 4970 of the 5000 benign images and misclassified only 15 of the 5000 malignant images. The accuracy values for each fold are given in Table 3.
Table 3 shows the accuracy values for each fold after the cross-validation technique, followed by the mean accuracy value. The proposed model’s mean accuracy was 99.55%. Class-based performance metrics of the proposed model are presented in Table 4.
The average accuracy calculated at the end of cross-validation is 99.55%, with a standard deviation of ±0.05.

4. Discussion

Breast ultrasound is widely used because it does not expose the patient to radiation, is applicable even to patients at risk, and is superior to other imaging methods for evaluating dense breast parenchyma [26]. However, the variable results of breast ultrasound, depending on the operator, and its inadequacy in identifying some lesions are among its drawbacks [27]. The aim is to mitigate these negative aspects using clinical decision support suggestions provided by artificial intelligence-supported systems. This will contribute to more accurate decisions regarding diagnosis and treatment management [28].
The high performance achieved by the model developed in our study offers many clinical benefits. Accurate discrimination between normal breast tissue, benign lesions, and malignant lesions will assist clinicians in follow-up decisions and in medical or surgical treatment decisions. It offers many benefits, including reducing diagnostic biopsies, preventing unnecessary surgeries, diagnosing malignant cases before they reach advanced stages, and enabling early treatment planning. Many medicolegal situations can arise during the diagnosis and treatment of breast cancer. Artificial intelligence systems can support clinician decision-making processes and multidisciplinary tumor boards where complex cases are evaluated [29]. AI-supported breast USG could be a new means of increasing consistency between radiologists and surgeons regarding lesions that may be interpreted differently. The classification outputs of these models could be utilized when planning patients before surgical or medical treatment [30].
The use of artificial intelligence models for breast disease diagnosis has many benefits. In rural areas, where radiologists and clinicians are scarce, the availability of breast ultrasound contributes to early diagnosis, which is one of the most critical aspects of breast cancer care. Artificial intelligence-assisted imaging methods will help clinicians make accurate and early diagnoses in areas where breast specialists are scarce [31,32]. Training artificial intelligence models only on datasets from densely populated urban areas limits their generalizability. Comprehensive datasets, along with attention to ethical and data privacy issues, will further support the development of artificial intelligence models for medical care [33]. The development of artificial intelligence modules for the diagnosis and treatment of breast diseases should account for differences in the diagnosis and treatment of patient groups of different ages and ethnicities. Datasets containing all clinical and radiological data used in the diagnostic phase will enhance the development and usability of these modules [34]. Such a transformation will lead to earlier diagnosis for many patients and to treatment strategies that avoid unnecessary procedures. In this study, a hybrid model was developed for breast cancer detection and tested on two datasets. Furthermore, the performance of the proposed model is compared with similar studies in the literature in Table 5.
An examination of Table 2, Table 3 and Table 5 reveals that the proposed model produces better results than comparable methods and similar studies. Despite the high accuracy achieved, the proposed hybrid model has some limitations. First, the model was trained on a public dataset consisting of a limited number of breast ultrasound images from a single center. The histopathological image dataset used to test the model is also publicly available. The single-center origin of the dataset, the inclusion of patients from the same population, and the low number of rare cases limit the generalizability of the study, and its applicability to demographics not represented in the dataset should be questioned. The exclusion of demographic, clinical, and imaging data, as well as other imaging modalities, is a further limitation of our study. Systems that combine demographic, clinical, and imaging modalities would make significant contributions to the diagnosis and treatment of breast cancer. Furthermore, although the image annotations in the dataset were based on expert opinion, labeling errors or class imbalances are potential factors that could affect model performance.
Future work aims to increase the generalization ability of the model by testing it on larger and more diverse datasets obtained from different institutions and devices. It is important to build datasets that incorporate clinical and demographic information such as age, race, region, and blood values, taking into account factors that may make images of the same disease category appear different. This would allow for more objective model training and classification. Furthermore, by involving multiple centers, federated learning models will be developed to prevent data leakage between centers. With increasing communication speeds between regions and centers and the adoption of 6G technology, future models may employ quantum-based approaches for real-time classification, and real-time artificial intelligence techniques that can be used intraoperatively will also be considered where necessary. Finally, developing methods to increase the explainability of the model is another goal.

5. Conclusions

In this study, a novel deep learning-based model was developed that integrates SE blocks, CBAM, and MHA mechanisms. The proposed model was developed to classify breast tissues as benign, malignant, or normal based on ultrasound images. The proposed model produced successful results by effectively capturing both local textural patterns and global contextual dependencies. The proposed model achieved a classification accuracy of 96.03%, outperforming similar studies in the literature, as well as commonly used CNN and ViT-based architectures, and state-of-the-art architectures. The obtained results highlight the potential of the developed model as a reliable decision support tool in breast cancer diagnosis. By significantly improving diagnostic accuracy and reducing human error, clinicians involved in breast disease management can achieve earlier and more reliable diagnoses. Furthermore, the proposed model can facilitate more precise treatment planning and has the potential to improve patient outcomes. Our future work will focus on expanding dataset diversity, incorporating multimodal imaging data, and validating the model in real-world clinical workflows to strengthen its applicability in clinical decision-making processes further.

Author Contributions

Conceptualization, Z.O., M.K. and M.Y.; methodology, Z.O., M.K. and M.Y.; software, M.K. and M.Y.; validation, Z.O., M.K. and M.Y.; formal analysis, Z.O., M.K. and M.Y.; investigation, Z.O., M.K. and M.Y.; resources, Z.O., M.K. and M.Y.; data curation, Z.O., M.K. and M.Y.; writing—original draft preparation, Z.O., M.K. and M.Y.; writing—review and editing, Z.O., M.K. and M.Y.; visualization, Z.O., M.K. and M.Y.; supervision, Z.O., M.K. and M.Y.; project administration, Z.O., M.K. and M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were used in this study.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

References

  1. Shang, C.; Xu, D. Epidemiology of Breast Cancer. Oncologie 2022, 24, 649–663. [Google Scholar] [CrossRef]
  2. Tang, Z.; Wei, Y.; Liang, Y.; Zhu, X.; Tang, J.; Sun, Y.; Zhuang, Q. Breast cancer burden among young women from 1990 to 2021: A global, regional, and national perspective. Eur. J. Cancer Prev. 2025, 34, 130–139. [Google Scholar] [CrossRef] [PubMed]
  3. Kim, J.; Harper, A.; McCormack, V.; Sung, H.; Houssami, N.; Morgan, E.; Mutebi, M.; Garvey, G.; Soerjomataram, I.; Fidler-Benaoudia, M.M. Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nat. Med. 2025, 31, 1154–1162. [Google Scholar] [CrossRef] [PubMed]
  4. Wang, L. Early diagnosis of breast cancer. Sensors 2017, 17, 1572. [Google Scholar] [CrossRef]
  5. Ginsburg, O.; Yip, C.H.; Brooks, A.; Cabanes, A.; Caleffi, M.; Dunstan Yataco, J.A.; Gyawali, B.; McCormack, V.; McLaughlin de Anderson, M.; Mehrotra, R. Breast cancer early detection: A phased approach to implementation. Cancer 2020, 126, 2379–2393. [Google Scholar] [CrossRef]
  6. Sood, R.; Rositch, A.F.; Shakoor, D.; Ambinder, E.; Pool, K.-L.; Pollack, E.; Mollura, D.J.; Mullen, L.A.; Harvey, S.C. Ultrasound for breast cancer detection globally: A systematic review and meta-analysis. J. Glob. Oncol. 2019, 5, 1–17. [Google Scholar] [CrossRef]
  7. Abdullah, N.; Mesurolle, B.; El-Khoury, M.; Kao, E. Breast imaging reporting and data system lexicon for US: Interobserver agreement for assessment of breast masses. Radiology 2009, 252, 665–672. [Google Scholar] [CrossRef]
  8. Iacob, R.; Iacob, E.R.; Stoicescu, E.R.; Ghenciu, D.M.; Cocolea, D.M.; Constantinescu, A.; Ghenciu, L.A.; Manolescu, D.L. Evaluating the role of breast ultrasound in early detection of breast cancer in low-and middle-income countries: A comprehensive narrative review. Bioengineering 2024, 11, 262. [Google Scholar] [CrossRef]
  9. Van der Velden, B.H.; Kuijf, H.J.; Gilhuijs, K.G.; Viergever, M.A. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 2022, 79, 102470. [Google Scholar] [CrossRef]
  10. Khalid, A.; Mehmood, A.; Alabrah, A.; Alkhamees, B.F.; Amin, F.; AlSalman, H.; Choi, G.S. Breast cancer detection and prevention using machine learning. Diagnostics 2023, 13, 3113. [Google Scholar] [CrossRef]
  11. Jiang, X.; Hu, Z.; Wang, S.; Zhang, Y. Deep learning for medical image-based cancer diagnosis. Cancers 2023, 15, 3608. [Google Scholar] [CrossRef]
  12. Munir, K.; Elahi, H.; Ayub, A.; Frezza, F.; Rizzi, A. Cancer diagnosis using deep learning: A bibliographic review. Cancers 2019, 11, 1235. [Google Scholar] [CrossRef]
  13. Yala, A.; Lehman, C.; Schuster, T.; Portnoi, T.; Barzilay, R. A deep learning mammography-based model for improved breast cancer risk prediction. Radiology 2019, 292, 60–66. [Google Scholar] [CrossRef]
  14. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef]
  15. Rahman, M.M.; Jahangir, M.Z.B.; Rahman, A.; Akter, M.; Nasim, M.A.A.; Gupta, K.D.; George, R. Breast cancer detection and localizing the mass area using deep learning. Big Data Cogn. Comput. 2024, 8, 80. [Google Scholar] [CrossRef]
  16. Shah, D.; Khan, M.A.U.; Abrar, M.; Tahir, M. Optimizing breast cancer detection with an ensemble deep learning approach. Int. J. Intell. Syst. 2024, 2024, 5564649. [Google Scholar] [CrossRef]
  17. Kormpos, C.; Zantalis, F.; Katsoulis, S.; Koulouras, G. Evaluating Deep Learning Architectures for Breast Tumor Classification and Ultrasound Image Detection Using Transfer Learning. Big Data Cogn. Comput. 2025, 9, 111. [Google Scholar] [CrossRef]
  18. Wang, J.; Miao, J.; Yang, X.; Li, R.; Zhou, G.; Huang, Y.; Lin, Z.; Xue, W.; Jia, X.; Zhou, J. Auto-weighting for breast cancer classification in multimodal ultrasound. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 190–199. [Google Scholar]
  19. Rashid, T.A.; Majidpour, J.; Thinakaran, R.; Batumalay, M.; Dewi, D.A.; Hassan, B.A.; Dadgar, H.; Arabi, H. NSGA-II-DL: Metaheuristic optimal feature selection with deep learning framework for HER2 classification in breast cancer. IEEE Access 2024, 12, 38885–38898. [Google Scholar] [CrossRef]
  20. Khalfaoui-Hassani, I.; Pellegrini, T.; Masquelier, T. Dilated convolution with learnable spacings. arXiv 2021, arXiv:2112.03740. [Google Scholar]
  21. Beal, J.; Kim, E.; Tzeng, E.; Park, D.H.; Zhai, A.; Kislyuk, D. Toward transformer-based object detection. arXiv 2020, arXiv:2012.09958. [Google Scholar] [CrossRef]
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  23. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  24. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  25. Walid, D. Breast Cancer Dataset. Available online: https://www.kaggle.com/datasets/djaidwalid/breast-cancer-dataset/data (accessed on 26 November 2025).
  26. Berg, W.A.; Bandos, A.I.; Mendelson, E.B.; Lehrer, D.; Jong, R.A.; Pisano, E.D. Ultrasound as the primary screening test for breast cancer: Analysis from ACRIN 6666. J. Natl. Cancer Inst. 2016, 108, djv367. [Google Scholar] [CrossRef]
  27. Chan, Y.S.; Hung, W.K.; Yuen, L.W.; Chan, H.Y.Y.; Chu, C.W.W.; Cheung, P.S.Y. Comparison of Characteristics of Breast Cancer Detected through Different Imaging Modalities in a Large Cohort of Hong Kong Chinese Women: Implication of Imaging Choice on Upcoming Local Screening Program. Breast J. 2022, 2022, 3882936. [Google Scholar] [CrossRef]
  28. Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Bin Saleh, K.; Badreldin, H.A. Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ. 2023, 23, 689. [Google Scholar] [CrossRef] [PubMed]
  29. Zabaleta, J.; Aguinagalde, B.; Lopez, I.; Fernandez-Monge, A.; Lizarbe, J.A.; Mainer, M.; Ferrer-Bonsoms, J.A.; De Assas, M. Utility of artificial intelligence for decision making in thoracic multidisciplinary tumor boards. J. Clin. Med. 2025, 14, 399. [Google Scholar] [CrossRef] [PubMed]
  30. Shamir, S.B.; Sasson, A.L.; Margolies, L.R.; Mendelson, D.S. New frontiers in breast cancer imaging: The rise of AI. Bioengineering 2024, 11, 451. [Google Scholar] [CrossRef] [PubMed]
  31. Roadevin, C.; Hill, H. AI interventions in cancer screening: Balancing equity and cost-effectiveness. J. Med. Ethics 2025. [Google Scholar] [CrossRef]
  32. Dankwa-Mullan, I. Health equity and ethical considerations in using artificial intelligence in public health and medicine. Prev. Chronic Dis. 2024, 21, E64. [Google Scholar] [CrossRef]
  33. Arora, A.; Alderman, J.E.; Palmer, J.; Ganapathi, S.; Laws, E.; Mccradden, M.D.; Oakden-Rayner, L.; Pfohl, S.R.; Ghassemi, M.; Mckay, F. The value of standards for health datasets in artificial intelligence-based applications. Nat. Med. 2023, 29, 2929–2938. [Google Scholar] [CrossRef]
  34. You, C.; Shen, Y.; Sun, S.; Zhou, J.; Li, J.; Su, G.; Michalopoulou, E.; Peng, W.; Gu, Y.; Guo, W. Artificial intelligence in breast imaging: Current situation and clinical challenges. In Exploration; Wiley Online Library: Hoboken, NJ, USA, 2023; p. 20230007. [Google Scholar]
Figure 1. Sample images from the dataset for each class.
Figure 2. Proposed method.
Figure 3. Accuracy and loss curves.
Figure 4. The 5-fold cross-validation process steps.
Figure 5. Best-fold accuracy curves for training and validation of the proposed model.
Figure 6. Best-fold loss curves for training and validation of the proposed model.
Figure 7. Confusion matrix of the proposed model for the second dataset.
Table 1. Class-based performance results of the proposed model (%).

Classes     Precision   Recall   F1-Score
Benign      98.80       93.18    95.91
Malignant   96.43       96.43    96.43
Normal      92.94       98.75    95.76
Table 2. Model performance comparison results (%).

Model             Accuracy   Precision   Recall   F1-Score
ConvNeXt-Tiny     91.67      92.00       91.77    91.71
ViT-B/16          92.46      92.60       92.59    92.53
ResNet50          92.86      93.06       92.94    92.91
ViT-B/32          93.25      93.37       93.38    93.29
EfficientNet-B0   93.65      93.73       93.78    93.70
DenseNet121       94.84      94.94       94.99    94.85
Proposed Model    96.03      96.15       96.03    96.03
Table 3. Accuracy rate of the proposed model per fold for the second dataset (%).

           Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Mean
Accuracy   99.48    99.72    99.61    99.58    99.42    99.55
Table 4. Cross-validation performance metrics of the proposed model for the second dataset (%).

Classes     Precision   Recall   F1-Score   Number of Images
Benign      99.70       99.40    99.55      5000
Malignant   99.40       99.70    99.55      5000
Table 5. Literature review.

Paper                 Year   Methods                                                         Performance
Rahman et al. [15]    2024   U-Net and YOLO                                                  Acc: 93%
Shah et al. [16]      2024   EfficientNet, AlexNet, ResNet and DenseNet based hybrid model   Acc: 94.6%
Kormpos et al. [17]   2025   NasNet                                                          Acc: 93.1%
Wang et al. [18]      2020   Reinforcement learning-based model                              Acc: 95.4%
Rashid et al. [19]    2024   CNN and metaheuristic-based hybrid model                        Acc: 94.4%
Proposed Model        2025   SE-CBAM-MHA based hybrid model                                  Dataset 1: Acc: 96.03%; Dataset 2: Acc: 99.55%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

