1. Introduction
Breast tumors are a prevalent health concern that significantly impacts women's well-being and quality of life. As a result, regular breast screening and diagnosis play a crucial role in formulating effective treatment plans and improving survival rates. Owing to its flexibility and convenience, ultrasound imaging has become a conventional modality for breast tumor screening. In recent years, many deep learning methods based on ultrasound images have been proposed for breast tumor segmentation. However, complex ultrasound patterns still pose the following challenges: (1) blurred boundaries caused by low contrast between the foreground and background; and (2) segmentation disruption due to speckle noise (as illustrated in Figure 1).
The impressive non-linear learning capability of the Fully Convolutional Network (FCN) and U-Net has led to significant successes in medical image segmentation [1,2]. Motivated by this, many deep learning approaches have emerged for segmenting breast tumors from ultrasound images. In 2018, Almajalid et al. [3] were the first to systematically evaluate the impact of different FCN variants on breast tumor segmentation, achieving segmentation results that outperformed traditional methods. AAU-Net [4] replaces the conventional convolution block with a hybrid adaptive attention module, enhancing feature extraction across diverse receptive fields. NU-Net [5] utilizes sub-networks of varying depths with shared weights to attain robust representations of breast tumors.
The Transformer has garnered scholarly attention for its attention mechanism and complete elimination of convolution. Subsequent investigations [6,7,8] have explored the integration of transformer structures into image recognition. Notably, ViT [9] stands out as the pioneering work applying a pure transformer to image classification, substantiating the viability of transformer architectures for computer vision tasks. In the realm of medical image segmentation, the efficacy of vision transformers has been demonstrated by PVT-CASCADE [10] and DuAT [11]. DuAT proposes a Dual-Aggregation Transformer Network to address the challenge of capturing both global and local spatial features, while PVT-CASCADE introduces an attention-based decoder that leverages the multi-stage feature representation of the vision transformer. These advancements underscore the transformative impact of transformer architectures in the medical imaging field.
To further address the issue of blurred tumor boundaries, two optimization strategies have been widely adopted: expanding the receptive field and applying attention mechanisms. Dilated convolution is a common strategy for expanding the receptive field. For example, Hu et al. [12] obtained a large receptive field for breast tumors by using dilated convolutions in deeper network layers. In terms of attention mechanisms, Lee et al. [13] proposed a channel attention module to further improve the performance of U-Net for breast tumor segmentation, and Yan et al. [14] proposed an attention-enhanced U-Net with hybrid dilated convolution, merging dilated convolutions with an attention mechanism. Although these methods have made progress, their fine-to-coarse optimization paradigm struggles to capture prominent object regions in deeper convolutional layers, where object regions and boundaries are the two crucial features distinguishing normal tissue from breast tumors. Thus, we propose an iteratively enhanced Boundary Refinement Module (BRM) based on a global map, emphasizing a coarse-to-fine pattern. Our motivation arises from clinical practice, where clinicians first approximate the location of a breast tumor and then meticulously extract its silhouette mask based on local features. In the NSBR-Net model, we adopt a two-step approach: first predicting the coarse region and then implicitly modeling the boundaries using axial reverse attention. This strategy offers two advantages: better learning ability and improved generalization capability.
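As a rough illustration of this coarse-to-fine strategy, the sketch below shows how a coarse prediction can gate encoder features through reverse attention before a small convolutional head predicts a boundary-sharpening residual. This is a minimal PyTorch sketch of the general idea only; the channel sizes and module layout are illustrative assumptions, and the axial variant of reverse attention used in the actual BRM is described in Section 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseBoundaryRefinement(nn.Module):
    """Illustrative coarse-to-fine refinement: a coarse tumor map gates the
    features through reverse attention, and a small conv head predicts a
    residual that sharpens the boundary. Channel sizes are assumptions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor, coarse_logits: torch.Tensor) -> torch.Tensor:
        # Resize the coarse prediction to the feature resolution.
        coarse = F.interpolate(coarse_logits, size=feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Reverse attention: emphasize the regions the coarse map does not
        # cover (background and boundary), i.e. 1 - sigmoid(coarse).
        reverse = 1.0 - torch.sigmoid(coarse)
        gated = feat * reverse
        # Predict a residual correction and add it to the coarse logits.
        return coarse + self.refine(gated)
```

In practice, such a module would be applied at several decoder stages, with each refined map supervised against the ground truth, in line with the iterative refinement and deep supervision described later.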
In ultrasound imaging, the inherent speckle noise tends to degrade image quality and complicates the distinction between breast tissue and noise artifacts [15], making accurate tumor detection more challenging. Moreover, speckle noise significantly impacts segmentation accuracy by propagating across convolutional layers at different scales. Current methods primarily leverage deep supervision to build refined networks [16], exploring neighboring decisions to correct potential errors induced by speckle noise. However, we propose addressing the influence of noise from a more fundamental perspective by introducing “frequency”. In an intriguing experiment, we examined how the network’s performance varies when high-frequency information (detail and noise) [17] is removed in deeper layers. We used the mainstream U-Net [18] to evaluate the impact of high frequencies on breast tumor segmentation on the BUSI test set (ultrasound images, including both benign and malignant breast tumors) [19]. Building upon [20], we applied multiple pooling operations to the last two stages of the U-Net architecture to filter out the high-frequency information and keep only the low-frequency information. As shown in Figure 2, we observed a substantial improvement in model performance when the network contained only low-frequency information, indicating that speckle noise within high-frequency information disrupts spatial consistency. To address this phenomenon, we introduce a Noise Suppression Module (NSM) that decouples high- and low-frequency information in feature maps and denoises the high-frequency components with a Gaussian filter. Following the principles of prior works, NSBR-Net also incorporates a deep supervision mechanism.
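To make the frequency argument concrete, the following sketch shows one simple way to split a feature map into low- and high-frequency components with average pooling and to attenuate speckle-like noise in the high-frequency residual with a Gaussian filter. It is a minimal PyTorch sketch under stated assumptions; the pool size, kernel size, and sigma are illustrative choices, not the NSM’s actual configuration.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF  # provides gaussian_blur

def suppress_high_frequency_noise(feat: torch.Tensor,
                                  pool_size: int = 2,
                                  kernel_size: int = 3,
                                  sigma: float = 1.0) -> torch.Tensor:
    """Split a (B, C, H, W) feature map into low- and high-frequency parts via
    average pooling, denoise the high-frequency residual with a Gaussian
    filter, and recombine. All hyperparameters here are illustrative."""
    # Low-frequency component: blur by pooling, then upsample back.
    low = F.avg_pool2d(feat, kernel_size=pool_size)
    low = F.interpolate(low, size=feat.shape[-2:], mode="bilinear",
                        align_corners=False)
    # High-frequency component: the residual (fine detail plus speckle noise).
    high = feat - low
    # Gaussian smoothing of the residual attenuates speckle-like noise while
    # keeping most of the boundary detail.
    high_denoised = TF.gaussian_blur(high,
                                     kernel_size=[kernel_size, kernel_size],
                                     sigma=[sigma, sigma])
    # Keeping only `low` roughly corresponds to the low-frequency-only
    # ablation discussed above (Figure 2).
    return low + high_denoised
```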
Our method, built upon a transformer-based encoder, incorporates the BRM and NSM for breast tumor segmentation in ultrasound images. Its efficacy was validated through extensive experiments on breast ultrasound datasets, showing significant improvements over existing methods. Our contributions include the following:
We present a novel breast tumor segmentation framework, termed NSBR-Net. Unlike existing CNN-based methods, we adopt the pyramid vision transformer as an encoder to extract more robust features.
To support our framework, we introduce two simple modules. Specifically, NSM is utilized to suppress speckle noise within high-frequency information, while BRM performs boundary refinement based on coarse regions.
Comparative experiments against state-of-the-art medical image segmentation models demonstrate the superior efficacy of our method on breast ultrasound datasets.
This paper is structured as follows: Section 2 outlines the dataset, method (including the NSM and BRM), loss function, and experimental settings (including evaluation protocols and implementation details). Section 3 presents a qualitative and quantitative comparison of different methods and the corresponding analyses. Section 4 discusses the results, limitations, and future research directions, while Section 5 is dedicated to the conclusion.
4. Discussion
Breast cancer is a prevalent gynecological disease that poses a significant threat to women’s health. With the development of deep learning, intelligent analysis based on ultrasound imaging is increasingly used in clinical pre-screening and is becoming the mainstream trend. However, breast tumor segmentation suffers from the inherent limitations of ultrasound images, including the presence of speckle noise and the indistinct boundaries of malignant tumors.
Speckle noise is inevitable in ultrasound images, causing strong interference during neural network training, thereby reducing the model’s generalization ability. Simultaneously, the blurred boundaries of malignant tumors also lead to decreased model accuracy, which is detrimental to intelligent diagnosis. To address these issues, we discussed how to effectively suppress speckle noise from a frequency perspective and designed a coarse-to-fine paradigm, namely NSBR-Net, which shows outstanding segmentation performance and brings new insights into ultrasound image analysis.
The proposed breast tumor segmentation model exhibited superior accuracy over competing algorithms, achieving mDice, mIoU, and MAE scores of 81.83%, 73.50%, and 3.55% on the BUSI dataset and 81.48%, 73.08%, and 1.74% on Dataset B, respectively. Compared to the state-of-the-art transformer-based method, mDice improved by 3.67% on Dataset B and 2.30% on BUSI in the testing of malignant tumors, which holds great significance for the clinical diagnosis of cancer. Moreover, as can be seen in Figure 6, NSBR-Net showed no false positives, unlike other segmentation models, which are clearly affected by speckle noise. We attribute this capability to our NSM, which filters out noise information from coarse-grained feature mappings while preserving detailed boundary information.
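For readers who wish to reproduce the reported numbers, the sketch below computes Dice, IoU, and MAE for a single predicted mask. The 0.5 threshold, epsilon smoothing, and per-image averaging are assumed conventions and may differ from the exact evaluation protocol described in Section 2.

```python
import numpy as np

def dice_iou_mae(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Compute Dice, IoU, and MAE for one predicted mask.

    pred: predicted probabilities in [0, 1]; gt: binary ground-truth mask.
    The 0.5 threshold and epsilon smoothing are illustrative conventions."""
    pred_bin = (pred >= 0.5).astype(np.float64)
    gt_bin = (gt >= 0.5).astype(np.float64)
    inter = (pred_bin * gt_bin).sum()
    union = pred_bin.sum() + gt_bin.sum() - inter
    dice = (2.0 * inter + eps) / (pred_bin.sum() + gt_bin.sum() + eps)
    iou = (inter + eps) / (union + eps)
    # MAE is measured on the raw probability map, before thresholding.
    mae = np.abs(pred - gt_bin).mean()
    return dice, iou, mae
```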
Considering the significant individual differences among breast tumors (shown in Figure 6), one of our future research directions will focus on developing appropriate data augmentation algorithms to expand the sample space and further enhance the generalization capability of our model. Since our model is designed for two-dimensional ultrasound images, another future direction is to extend the methodology to three-dimensional images. Furthermore, considering the requirement for real-time performance in clinical diagnostic assistance, enhancing the computational efficiency of the model is also a crucial direction for future research. Therefore, we plan to conduct in-depth evaluations and optimizations of the model’s computational performance in future studies.
Compared to ultrasound, mammography is irreplaceable due to its ability to clearly detect tiny calcifications within breast tissue. Thus, mammography is also a commonly employed method for breast cancer screening, with a wealth of AI-related research conducted in this area, including machine learning methods [39] and deep learning methods [40]. Digital Breast Tomosynthesis (DBT), which significantly mitigates the missed detections caused by overlapping fibroglandular tissue in mammography, has also emerged as a widely adopted new technology. By combining deep learning methods with other breast cancer screening techniques, our future endeavors aim to broaden the scope of our research, facilitating more comprehensive and accurate diagnostic tools for breast pathology assessment.