Article

Lightweight Mamba Model for 3D Tumor Segmentation in Automated Breast Ultrasounds

1 Department of Metabiohealth, Institute for Cross-Disciplinary Studies, Sungkyunkwan University, Suwon 16419, Republic of Korea
2 Computer Vision Lab, CAIDAS & IFI, University of Würzburg, 97070 Würzburg, Germany
3 Department of Precision Medicine, School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
4 Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
5 Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea
6 Personalized Cancer Immunotherapy Research Center, School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
7 Department of Family Medicine, Kangbuk Samsung Hospital, School of Medicine, Sungkyunkwan University, 29 Saemunan-ro, Jongno-gu, Seoul 03181, Republic of Korea
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(16), 2553; https://doi.org/10.3390/math13162553
Submission received: 10 July 2025 / Revised: 4 August 2025 / Accepted: 6 August 2025 / Published: 9 August 2025

Abstract

Background: Recently, the adoption of AI-based technologies has been accelerating in the field of medical image analysis. For the early diagnosis and treatment planning of breast cancer, Automated Breast Ultrasound (ABUS) has emerged as a safe and non-invasive imaging method, especially for women with dense breasts. However, the high computational cost arising from the large size and complexity of 3D ABUS data remains a major challenge. Methods: In this study, we propose a novel model based on the Mamba state–space model architecture for 3D tumor segmentation in ABUS images. The model uses Mamba blocks to effectively capture the volumetric spatial features of tumors and integrates a Deeper Atrous Spatial Pyramid Pooling (DASPP) module to extract multiscale contextual information from lesions of different sizes. Results: On the TDSC-2023 ABUS dataset, the proposed model achieved a Dice Similarity Coefficient (DSC) of 0.8062 and an Intersection over Union (IoU) of 0.6831, using only 3.08 million parameters. Conclusions: These results show that the proposed model improves tumor segmentation performance in ABUS, offering both diagnostic precision and computational efficiency. The reduced computational footprint suggests strong potential for real-world medical applications, where accurate early diagnosis can reduce costs and improve patient survival.

1. Introduction

Breast cancer is one of the most commonly diagnosed malignancies among women worldwide, with approximately 2.3 million new cases reported in 2022 alone [1]. It is estimated that 1 in 20 women will be diagnosed with breast cancer during their lifetime. In 2020, breast cancer accounted for approximately 685,000 deaths among women globally, representing about 16% of all female cancer-related deaths. The disease imposes a substantial burden not only due to its high incidence but also in terms of mortality and disability-adjusted life years (DALYs) [2]. These figures are projected to continue rising through 2040 [3,4].
A comprehensive analysis of data from 204 countries between 1990 and 2021 revealed that disparities in early detection and access to treatment between high-income and low-income countries have significantly contributed to a widening gap in breast cancer mortality rates [5]. Particularly in low-resource settings, delayed diagnosis and insufficient treatment infrastructure remain major challenges. Furthermore, a recent global assessment focusing on women of reproductive age (15–49 years) reported a 66.8% increase in breast cancer-related DALYs over the past three decades, underscoring the disease’s profound impact on social and economic productivity [6].
Breast cancer is one of the most representative malignancies for which early detection can significantly improve survival rates [7]. Especially when identified during the subclinical phase—before lesions become clinically evident—early diagnosis can reduce the invasiveness of treatment and dramatically improve patient prognosis [8]. Beyond its clinical utility, early detection is increasingly recognized as a public health priority. Phased implementation strategies for early detection have been shown to significantly reduce breast cancer mortality, particularly in low- and middle-income countries [9]. However, in women with dense breast tissue, conventional mammography alone may have limitations in early-stage detection. Evidence suggests that supplemental ultrasound can serve as an effective adjunct diagnostic tool in these cases [10,11]. Accordingly, establishing technical and institutional infrastructure to support early detection should be considered a policy priority in healthcare planning [12].
Recent advancements in medical imaging technology have led to the adoption of three-dimensional automated breast ultrasound (3D ABUS) systems, which have been reported to significantly improve lesion detection, particularly in women with dense breast tissue [13]. Compared to conventional handheld ultrasound (HHUS), 3D ABUS enables standardized image acquisition, thereby reducing inter-observer variability and enhancing diagnostic reproducibility [14]. Alongside hardware developments, artificial intelligence (AI)-based techniques for automatic lesion segmentation in 3D ABUS images have also progressed rapidly [15,16]. Early studies primarily relied on traditional machine learning methods utilizing handcrafted features [17]. However, with advances in deep learning technologies, particularly the introduction of U-Net-based architectures, segmentation performance has improved substantially [18,19]. More recently, transformer-based architectures and lightweight feature fusion strategies have further improved the performance in medical image segmentation tasks [20,21,22]. Despite these achievements, current deep learning-based segmentation models for 3D ABUS often require a large number of parameters to achieve high accuracy. This results in increased computational costs and poses challenges for practical deployment in clinical settings [23,24]. Consequently, there is a growing demand for lightweight models that can significantly reduce model complexity while maintaining or even enhancing segmentation performance.
The main contributions of this study are summarized as follows:
  • The proposed model, LightSegMamba, was designed to deliver high segmentation performance while reducing the number of parameters by approximately 95.43% compared to the original SegMamba.
  • To effectively learn multi-scale contextual information, we designed a Deeper Atrous Spatial Pyramid Pooling (DASPP) module that applies atrous depthwise separable convolutions in parallel.
  • To capture structural dependencies along the three anatomical directions (axial, coronal, and sagittal) in ABUS images, we incorporated a Tri-Oriented Mamba (ToM) module.
  • Furthermore, skip connections between the encoder and decoder were employed to preserve low-level features such as tumor boundaries and locations, while integrating them with high-level semantic information for precise segmentation.
Ultimately, the proposed model demonstrates superior segmentation performance with significantly reduced computational cost compared to conventional 3D segmentation models and other complex architectures. By enabling accurate localization of breast cancer lesions from 3D ABUS images, the model reduces reliance on high-cost diagnostic modalities such as MRI, offering a cost-effective and scalable solution that can contribute to reducing overall healthcare expenditures.

2. Related Works

2.1. ABUS Image Analysis

Automated three-dimensional breast ultrasound (ABUS) is gaining traction as an adjunctive imaging modality for breast cancer detection in dense breasts. However, the complexity and time-consuming interpretation of large volumes of three-dimensional image data remain a challenge. To address these issues, there is a growing need for computer-aided diagnosis (CAD) systems.
Zhou et al. [25] developed an integrated model of V-Net and 3D Mask R-CNN using a cross-model attention mechanism and a hybrid loss function (Dice Loss + Focal Loss). The dataset comprised 170 3D ABUS volumes (with expert-validated segmentation masks) from 107 patients. The model achieved a DSC of 64.57%, an IoU of 53.39%, a recall of 64.43%, and a precision of 74.51%.
Cao et al. [18] developed the D2U-Net model, which applies an Uncertainty Focus Loss to a U-Net structure combining dilated convolutions and densely connected blocks. A total of 170 3D ABUS volumes from 107 patients were split into training and test datasets. The analysis yielded a DSC of 69.02% and an IoU of 56.61%.
Fayyaz et al. [26] developed Dual-Path U-Net, a sophisticated deep learning model that incorporates dual paths to enhance the feature extraction and segmentation accuracy of complex 3D ABUS images. The study utilized a dataset comprising 50 masses, including 38 malignant and 12 benign cases. The Dual-Path U-Net demonstrated an average Dice coefficient of 0.82, indicating a substantial degree of success in the segmentation process.
Pan et al. [27] developed a novel automatic tumor segmentation method called SC-FCN-BLSTM. This method integrates a bidirectional long short-term memory (BLSTM) and a spatial-channel attention (SC-attention) module into a fully convolutional neural network (FCN) to capture continuity between slices. The proposed methodology was validated on a private ABUS dataset consisting of 170 volumes from 124 patients with 3636 labeled 2D slices, achieving a Dice similarity coefficient (DSC) of 0.8178, a recall of 0.8067, and a precision of 0.8292.

2.2. State–Space Model

A state–space model (SSM) is a mathematical representation of the state and output of a system, and has been used primarily in the field of control theory. SSMs have recently gained traction in deep learning due to their ability to effectively handle long sequences of data while having linear computational complexity. The state–space model is represented as follows:
$$ h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) $$

where $h(t) \in \mathbb{R}^{N}$, $x(t) \in \mathbb{R}^{L}$, and $y(t) \in \mathbb{R}^{L}$ denote the latent state, input signal, and output signal, respectively. The matrix $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $B \in \mathbb{R}^{N \times L}$ is the input matrix, and $C \in \mathbb{R}^{L \times N}$ is the output matrix. To leverage continuous-time SSMs for deep learning, the discretized expression using the zero-order hold (ZOH) method is as follows:

$$ h_t = \bar{A} h_{t-1} + \bar{B} x_t, \qquad y_t = C h_t $$

where $\bar{A} = \exp(\Delta A)$, $\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B$, and $\Delta$ denotes the discretization time step.
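To make the ZOH discretization concrete, the following minimal PyTorch sketch computes $\bar{A}$ and $\bar{B}$ for an illustrative toy system and runs one recurrence step; the matrix sizes and values are arbitrary and not taken from the paper.

```python
import torch

# Minimal ZOH discretization sketch (illustrative shapes, not the paper's code).
N, L = 4, 2                       # state size N, input/output size L
A = -torch.eye(N)                 # stable toy state-transition matrix
B = torch.randn(N, L)
delta = 0.1                       # discretization step Δ

A_bar = torch.matrix_exp(delta * A)                                          # Ā = exp(ΔA)
B_bar = torch.linalg.inv(delta * A) @ (A_bar - torch.eye(N)) @ (delta * B)   # B̄ = (ΔA)^{-1}(exp(ΔA) - I)ΔB

# One recurrence step: h_t = Ā h_{t-1} + B̄ x_t
h = torch.zeros(N)
x_t = torch.randn(L)
h = A_bar @ h + B_bar @ x_t
```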
The recently proposed Mamba [28] is a state–space model that introduces a selective scan mechanism, which dynamically selects and compresses information at each point in the sequence. This effectively solved the problem of long-range dependencies while maintaining the linear computational complexity, which is an advantage of state–space models. The selective scan mechanism is represented as follows:
$$ \Delta, B, C = \mathrm{Linear}(x), \qquad \Delta = \mathrm{softplus}(\Delta) $$

where $x$ is the input sequence and $\mathrm{Linear}$ denotes the linear transformation that generates the dynamic $\Delta$, $B$, and $C$ values from the input $x$. The softplus activation is defined as $\mathrm{softplus}(x) = \ln(1 + e^{x})$.
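The following fragment sketches how input-dependent $\Delta$, $B$, and $C$ could be produced by a single linear projection followed by a softplus on $\Delta$, as described above; the projection layout and dimensions are illustrative assumptions rather than Mamba's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of input-dependent SSM parameters (layout is hypothetical).
d_model, d_state = 64, 16
proj = nn.Linear(d_model, 1 + 2 * d_state)       # produces (Δ, B, C) for every token

x = torch.randn(8, 100, d_model)                 # (batch, sequence length, channels)
delta, B, C = proj(x).split([1, d_state, d_state], dim=-1)
delta = F.softplus(delta)                        # Δ = softplus(Δ), keeps the step positive
```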
Following the development of the Mamba architecture, researchers have been actively applying Mamba to various medical imaging applications [29,30,31,32,33,34,35,36,37,38,39].
In particular, Wang et al. [38] proposed the S3-Mamba model, which combines a Mamba-based structure with an Enhanced Visual State Space (EnVSS) block designed to be sensitive to small lesions, and a Tensor-based Cross-feature Multi-scale Attention (TCMA) module that integrates input, mid-feature, and boundary information at multiple scales. Furthermore, a regularized curriculum learning strategy was implemented, with the focus being on large lesions in the early stages of learning, and with a gradual transition to smaller lesions. The proposed model was evaluated on three datasets, including ISIC-2018, CVC-ClinicDB, and a private dataset for lymph node lesions, and achieved good segmentation performance.
Zhang et al. [39] proposed an edge-interaction Mamba network (EiMamba-UNet) to precisely segment ambiguous boundary regions in brain tumor MRI by introducing an interaction mechanism between edge and interior information into a Mamba-based structure. Their study used the Brain Tumor Segmentation (BraTS) 2020 and 2021 datasets to segment the Enhancing Tumor (ET), Tumor Core (TC), and Whole Tumor (WT) regions from multi-modal MRI (T1, T1ce, T2, FLAIR). On the BraTS2020 validation set, EiMamba-UNet achieved Dice coefficients (DSC) of 77.99% for ET, 91.07% for WT, and 84.49% for TC, with an average of 84.52%; on the BraTS2021 training set, it achieved 86.18% for ET, 93.74% for WT, and 93.39% for TC, with an average of 91.10%, demonstrating both fine-structure discrimination and boundary-recognition precision.
As such, Mamba has recently gained traction in the field of medical image segmentation and classification due to its efficient structure that reduces the consumption of computing resources while maintaining high performance. Despite its excellent performance in analyzing medical images such as CT and MRI, it has not been well studied in the field of ABUS, where data are limited. To the best of our knowledge, this is the first study to apply the Mamba architecture to ABUS tumor segmentation. In this study, we propose a lightweight yet high-performance tumor segmentation model based on Mamba that leverages its powerful long-range dependency modeling capability to effectively capture the contextual information of tumors in ABUS images.

2.3. TDSC-ABUS Dataset

In this study, we utilized the TDSC-ABUS dataset (Tumor Detection, Segmentation, and Classification Dataset on Automated 3D Breast Ultrasound), which provides breast ultrasound image data for research purposes [40]. Figure 1 shows a sample image from the TDSC-2023 dataset. This dataset serves as a benchmark for developing algorithms related to tumor detection, segmentation, and classification based on 3D ABUS images. The dataset was collected from approximately 200 patient cases at Harbin Medical University Cancer Hospital and was released as part of the MICCAI 2023 Challenge and for further research use.
The dataset comprises high-resolution 3D images with dimensions ranging from 843 × 546 × 270 to 865 × 682 × 354 pixels. The in-plane spacing is 0.200 mm and 0.073 mm, and the inter-slice spacing is approximately 0.475674 mm, enabling voxel-level annotations. Each image was manually annotated by ten experienced radiologists using Seg3D software. The annotators were divided into two groups of five, with cross-review procedures implemented to ensure annotation reliability. Tumor boundaries were manually delineated, each case was labeled as either benign or malignant, and 3D bounding boxes were generated from the segmentation masks. All data were anonymized to fully remove any patient-identifiable information. In this study, we divided the dataset into training, validation, and testing subsets at a ratio of 70:20:10.

2.4. Pre-Processing

In this study, we performed tumor-centered 3D cropping and data augmentation. We cropped all 3D ABUS images to a minimum size of (128, 128, 128), centering them on the tumor. The cropping was executed using a box-shift crop. Figure 2 is a visualization of the box-shift crop that adjusts the center point if the crop box extends beyond the image range. This allowed the model to focus on the lesion while preserving as much of the original ABUS image as possible.
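A minimal sketch of such a tumor-centered box-shift crop is given below; the function name, centroid input, and shift rule are our own illustrative assumptions rather than the authors' code.

```python
import numpy as np

def box_shift_crop(volume: np.ndarray, center, size=(128, 128, 128)):
    """Crop a fixed-size box centered on the tumor; if the box spills over the
    volume border, shift its center inward instead of padding (a sketch of the
    box-shift crop described in the text, not the authors' exact code)."""
    starts = []
    for c, s, dim in zip(center, size, volume.shape):
        start = int(round(c - s / 2))
        start = max(0, min(start, dim - s))   # shift the box back inside the volume
        starts.append(start)
    z, y, x = starts
    d, h, w = size
    return volume[z:z + d, y:y + h, x:x + w]

# Usage: crop a (128, 128, 128) patch around a tumor centroid near the border.
vol = np.zeros((270, 546, 843), dtype=np.float32)
patch = box_shift_crop(vol, center=(20, 500, 830))
print(patch.shape)  # (128, 128, 128)
```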
To improve the generalization performance and prevent overfitting, we applied a variety of data augmentation techniques to the training dataset. A total of 12 augmentation methods were employed, including spatial transformations (such as rotation and scaling), intensity-based adjustments (such as brightness, contrast, and gamma correction), noise addition, simulated resolution degradation, and mirroring. These techniques were randomly applied during training based on predefined probabilities, as summarized in Table 1. Figure 3 shows a visualization of each data augmentation technique.
The augmentation strategy was designed based on the SegMamba framework, which has demonstrated effectiveness in prior work. Our approach leverages a probability-based augmentation strategy that reflects diverse tumor sizes and imaging characteristics, a method proven in previous studies to enhance model robustness and generalizability [41,42].
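The probability-based strategy can be sketched as follows; the transform functions and the subset of probabilities shown are illustrative stand-ins for the twelve augmentations summarized in Table 1.

```python
import random
import numpy as np

# Hypothetical transform functions standing in for the augmentations in Table 1.
def add_gaussian_noise(vol):   return vol + np.random.normal(0, 0.05, vol.shape).astype(vol.dtype)
def scale_brightness(vol):     return vol * np.random.uniform(0.75, 1.25)
def apply_gamma(vol):          return np.clip(vol, 0, None) ** np.random.uniform(0.7, 1.5)

# (probability, transform) pairs mirroring Table 1's probability-based strategy.
AUGMENTATIONS = [
    (0.10, add_gaussian_noise),
    (0.15, scale_brightness),
    (0.30, apply_gamma),
]

def augment(volume: np.ndarray) -> np.ndarray:
    """Apply each transform independently with its predefined probability."""
    for p, transform in AUGMENTATIONS:
        if random.random() < p:
            volume = transform(volume)
    return volume
```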

3. Methods

3.1. Baseline Models

The 3D U-Net: U-Net [43] model is one of the most widely used models developed for the segmentation of two-dimensional medical images. To train on 3D medical images, traditional U-Net architectures often require slicing the volumetric data into 2D planes. However, this process inevitably leads to the loss of contextual spatial information from adjacent slices, which may hinder the model’s ability to effectively capture inter-slice correlations. The 3D U-Net [44] model addresses this limitation by employing 3D convolutions, allowing the network to directly process 3D inputs. The 3D U-Net architecture follows an encoder–decoder structure. In the encoder path, 3D convolution and 3D pooling operations are employed to progressively reduce the spatial resolution while extracting hierarchical features. In the decoder path, spatial resolution is gradually restored using 3D transposed convolutions. Skip connections between corresponding encoder and decoder layers are incorporated to mitigate information loss and preserve fine-grained spatial details.
The 3D DeepLab: The DeepLab [45] model is a segmentation model that uses a deep convolutional neural network (DeepCNN) as an encoder. When a DeepCNN is used directly for segmentation, frequent downsampling (strided convolution and pooling) reduces spatial resolution and loses object boundary information, which also gives rise to a multi-scale problem: objects of various sizes are not recognized reliably. DeepLab addresses this multi-scale issue by introducing atrous convolution, a technique that adjusts the spacing between filter taps to obtain a wide receptive field without increasing parameters or computation. DeepLab captures objects at various scales by applying the Atrous Spatial Pyramid Pooling (ASPP) structure, which applies atrous convolutions with different rates in parallel. DeepLab uses a DeepCNN encoder to extract feature maps and incrementally reconstructs them in the decoder; at the same time, the feature map is passed through the ASPP module, upsampled, and combined with intermediate decoder layers to effectively learn image boundary information, thereby successfully applying the DeepCNN structure to the segmentation task.
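The parameter-free widening of the receptive field provided by atrous convolution can be verified with a small 3D example (illustrative channel counts and dilation rate):

```python
import torch
import torch.nn as nn

# A 3x3x3 convolution with dilation 6 covers a 13x13x13 neighborhood
# (effective kernel = dilation * (k - 1) + 1) with the same 27-tap parameter count.
conv_dense  = nn.Conv3d(16, 16, kernel_size=3, padding=1, dilation=1)
conv_atrous = nn.Conv3d(16, 16, kernel_size=3, padding=6, dilation=6)

x = torch.randn(1, 16, 32, 32, 32)
assert conv_dense(x).shape == conv_atrous(x).shape            # same output size
assert sum(p.numel() for p in conv_dense.parameters()) == \
       sum(p.numel() for p in conv_atrous.parameters())       # same parameter count
```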
SegMamba: SegMamba [29] is a recently developed model for 3D medical image segmentation. Although Transformer-based deep learning models perform well in natural language processing, they suffer from high computational cost because the complexity of self-attention grows quadratically with input size. The Mamba [28] model, built on a state–space model to address this cost, has attracted attention in natural language processing for its linear computational and memory efficiency. SegMamba is designed to effectively learn the global and local features of 3D images by combining Mamba with a U-Net structure. To handle high-dimensional medical images, SegMamba extends the original Mamba, which models global dependencies in a single direction, with a Tri-Oriented Spatial Mamba (ToM) module that learns global structural information along the x-, y-, and z-axes of the image. Figure 4 shows the ToM module. In addition, a Gated Spatial Convolution (GSC) module was designed to minimize the spatial information lost when flattening the volume into one-dimensional sequences for the Mamba layers.
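The tri-oriented scanning idea can be sketched as follows; the sequence model is a simple placeholder standing in for a Mamba block, and the fusion by summation is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class TriOrientedScan(nn.Module):
    """Schematic of tri-oriented scanning (forward, reverse, inter-slice).
    `seq_model` is a stand-in for the Mamba block used in SegMamba."""
    def __init__(self, channels: int):
        super().__init__()
        self.seq_model = nn.Linear(channels, channels)   # placeholder sequence model

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        fwd = x.flatten(2).transpose(1, 2)                         # (B, D*H*W, C)
        out = self.seq_model(fwd)                                  # forward scan
        out = out + self.seq_model(fwd.flip(1)).flip(1)            # reverse scan
        # inter-slice scan: reorder so the depth (slice) axis varies fastest
        inter = x.permute(0, 3, 4, 2, 1).reshape(b, h * w * d, c)
        inter = self.seq_model(inter)
        inter = inter.reshape(b, h, w, d, c).permute(0, 3, 1, 2, 4).reshape(b, d * h * w, c)
        out = out + inter                                          # fuse the three scans
        return out.transpose(1, 2).reshape(b, c, d, h, w)

y = TriOrientedScan(32)(torch.randn(1, 32, 8, 8, 8))
print(y.shape)   # torch.Size([1, 32, 8, 8, 8])
```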

3.2. Proposed Model

In this study, a novel LightSegMamba is proposed, which is capable of effectively detecting breast tumors of different sizes in ABUS images while substantially reducing the number of parameters. The structure of the proposed model is shown in Figure 5.
The proposed model was designed with an encoder–decoder structure. The encoder consists of a stem and two DASPPMamba blocks. The stem layer is a convolution with a kernel size of 7 × 7 × 7, padding of 3 × 3 × 3, and stride of 2 × 2 × 2. Given an input image of dimensions D × H × W, the stem layer extracts a feature map of size 48 × D/2 × H/2 × W/2. Subsequently, DASPPMamba and downsampling are applied repeatedly to extract the features of each layer; at each step, the number of channels is doubled and the spatial size is halved. The last 3D convolution of the encoder doubles the channels again, producing a feature map of size 192 × D/4 × H/4 × W/4. The decoder applies transposed convolution at each upsampling step to incrementally restore spatial resolution. Through skip connections, it combines the corresponding feature maps from the encoder, preserving low-level spatial information, such as the tumor's location and boundaries, while integrating high-level semantic features that reflect the tumor's shape and characteristics. In the final output layer, a 1 × 1 × 1 convolution transforms the restored feature map into the final segmentation mask.
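A skeleton of the described encoder path is sketched below; the DASPPMamba blocks are replaced by identity placeholders and the downsampling kernel size is an assumption, so only the channel and resolution bookkeeping follows the text.

```python
import torch
import torch.nn as nn

class EncoderSkeleton(nn.Module):
    """Skeleton of the described encoder: stem -> two stages -> bottleneck conv.
    Block internals are placeholders; channel/stride settings follow the text."""
    def __init__(self, in_ch=1, base=48):
        super().__init__()
        self.stem = nn.Conv3d(in_ch, base, kernel_size=7, stride=2, padding=3)     # 48 x D/2 x H/2 x W/2
        self.stage1 = nn.Identity()                                                # DASPPMamba block (placeholder)
        self.down1 = nn.Conv3d(base, base * 2, kernel_size=2, stride=2)            # 96 x D/4 x H/4 x W/4
        self.stage2 = nn.Identity()                                                # DASPPMamba block (placeholder)
        self.bottleneck = nn.Conv3d(base * 2, base * 4, kernel_size=3, padding=1)  # 192 channels

    def forward(self, x):
        f1 = self.stage1(self.stem(x))        # skip-connection feature 1
        f2 = self.stage2(self.down1(f1))      # skip-connection feature 2
        return self.bottleneck(f2), (f1, f2)

out, skips = EncoderSkeleton()(torch.randn(1, 1, 128, 128, 128))
print(out.shape)   # torch.Size([1, 192, 32, 32, 32])
```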

3.2.1. DASPP Mamba

In this study, we implemented the DASPP Mamba structure for effective 3D ABUS image feature extraction. The 3D DASPP module is designed to effectively capture tumors of different sizes along with tumor outline information, and the ToM module from the SegMamba structure, which scans the image in the forward, backward, and inter-slice directions, is adopted so that the model can efficiently learn the structural information of the tumor. In addition, skip connections are introduced to minimize input feature loss and improve training stability. Finally, the features recovered from Mamba are enriched with an MLP composed of multilayer 3D convolutions. Figure 4 shows the ToM module and Figure 6 shows the 3D DASPP module.

3.2.2. 3D DASPP

To enable Mamba to effectively capture tumor features and boundary information across various sizes, we incorporated the DASPP module. Figure 6 shows the DASPP module applied to the model. The proposed Deeper Atrous Spatial Pyramid Pooling (DASPP) module consists of several core components for learning diverse spatial features in three-dimensional ABUS images. First, global average pooling (GAP) is applied to capture global contextual information, followed by a 1 × 1 × 1 DWS Conv to extract global features. Second, a 1 × 1 × 1 DWS Conv extracts fine-grained local information while maintaining spatial resolution. Third, 3 × 3 × 3 DWS Convs with dilation rates of 3, 6, and 9 are applied in parallel to ensure a large receptive field and integrate contextual information at different scales. Fourth, skip connections are introduced to preserve the original input features without loss and to stabilize the learning process. The multi-scale feature maps extracted from the parallel paths are concatenated and the number of channels is adjusted through a 1 × 1 × 1 DWS Conv, after which the skip connection restores the original input features. The resulting spatial features are then passed to the Tri-Oriented Spatial Mamba (ToM) module, which models structural information in the three-dimensional image along the forward, backward, and inter-slice directions.
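A sketch of the described 3D DASPP module, built from depthwise separable convolutions, is shown below; channel counts and the exact fusion layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DWSConv3d(nn.Module):
    """Depthwise separable 3D convolution: per-channel (depthwise) conv
    followed by a 1x1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, k=3, dilation=1):
        super().__init__()
        pad = dilation * (k - 1) // 2
        self.depthwise = nn.Conv3d(in_ch, in_ch, k, padding=pad, dilation=dilation, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DASPP3D(nn.Module):
    """Sketch of the described 3D DASPP: GAP branch, 1x1x1 branch, three dilated
    3x3x3 branches (rates 3/6/9), fusion by a 1x1x1 DWS conv, plus a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.gap_branch = DWSConv3d(ch, ch, k=1)
        self.local_branch = DWSConv3d(ch, ch, k=1)
        self.atrous = nn.ModuleList([DWSConv3d(ch, ch, k=3, dilation=r) for r in (3, 6, 9)])
        self.fuse = DWSConv3d(5 * ch, ch, k=1)

    def forward(self, x):
        size = x.shape[2:]
        g = F.interpolate(self.gap_branch(F.adaptive_avg_pool3d(x, 1)), size=size, mode="trilinear")
        branches = [g, self.local_branch(x)] + [m(x) for m in self.atrous]
        return x + self.fuse(torch.cat(branches, dim=1))   # skip connection preserves the input

y = DASPP3D(48)(torch.randn(1, 48, 16, 16, 16))
print(y.shape)   # torch.Size([1, 48, 16, 16, 16])
```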

4. Results

4.1. Implementation Details

Our model was implemented using PyTorch 2.5.0 and CUDA 12.4 and trained on an NVIDIA GeForce RTX 3090 Ti GPU. For training, we applied a random crop of size 128 × 128 × 128 to each volume and used a batch size of 2. In all experiments, we used a binary cross-entropy (BCE) loss function, a polynomial learning rate scheduler, and an SGD optimizer with Nesterov momentum. The data were randomly divided into 70% training, 20% validation, and 10% testing for the baseline model comparison and ablation studies, and into 80% training, 10% validation, and 10% testing for the Mamba model comparison.

4.1.1. Loss Function

Binary cross-entropy (BCE) is widely used in binary classification problems as a loss function that measures the difference between a model’s predicted probability and the true label. In the field of deep learning-based medical image analysis, it plays an important role in evaluating the performance of a model. The formula follows:
$$ \mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \Bigr] $$

where $N$ denotes the total number of samples, $y_i$ represents the ground-truth label of the $i$-th sample, and $p_i$ is the predicted probability that the sample belongs to the positive class.
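In PyTorch, the voxel-wise BCE loss can be computed as follows; we use the numerically stable logits variant, which is an implementation assumption rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

# Voxel-wise binary cross-entropy on a 3D segmentation output.
# BCEWithLogitsLoss applies the sigmoid internally for numerical stability.
criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(2, 1, 128, 128, 128)             # raw model outputs
target = torch.randint(0, 2, logits.shape).float()    # binary tumor mask
loss = criterion(logits, target)                      # mean over all voxels
```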

4.1.2. Optimizer

We trained the model using Stochastic Gradient Descent (SGD) with Nesterov momentum. The parameter update is given by:
$$ v_{t+1} = \mu v_t - \eta \nabla L(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1} $$

where $\theta_t$ denotes the parameters at the current step, $v_t$ is the momentum term, $\eta$ is the learning rate, $\mu$ is the momentum coefficient, and $\nabla L(\cdot)$ is the gradient of the loss function.
We also employed a polynomial learning rate scheduler:
$$ \eta_t = \eta_0 \left(1 - \frac{t}{T}\right)^{\text{power}} $$

where $T$ is the total number of epochs and power is the exponent of the decay curve. Polynomial decay starts with a relatively large learning rate for rapid exploration and gradually reduces it as training progresses, leading to stable convergence.
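A minimal sketch of this optimizer and scheduler configuration is shown below; the initial learning rate, momentum coefficient, decay power, and epoch count are illustrative values, not the paper's settings.

```python
import torch

model = torch.nn.Conv3d(1, 1, 3, padding=1)   # placeholder model

# SGD with Nesterov momentum, as described in Section 4.1.2
# (lr, momentum, power, and epoch count below are illustrative values).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.99, nesterov=True)

# Polynomial decay: lr_t = lr_0 * (1 - t / T) ** power
total_epochs, power = 300, 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: (1 - epoch / total_epochs) ** power
)

for epoch in range(total_epochs):
    # ... one training epoch would run here ...
    scheduler.step()
```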

4.2. Evaluation Metric

We used various metrics to evaluate the performance of the proposed model to effectively segment tumors in ABUS images. In particular, we focused on the Dice coefficient (DSC) for accurate segmentation of tumors, and additionally utilized metrics such as intersection over union (IoU), pixel-based precision, and recall. We also compared the number of parameters to evaluate the efficiency of the model.

4.2.1. Dice Coefficient (DSC)

Dice is the most frequently used evaluation metric in image segmentation. It measures the similarity between the model's prediction and the ground truth, taking values between 0 and 1, with higher values indicating better performance. The formula follows:

$$ \mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|} $$
where A is the model’s predicted area and B is the ground truth area.

4.2.2. Intersection over Union (IoU)

The intersection over union (IoU), also called the Jaccard index, is a popular metric used to evaluate accuracy in object detection and image segmentation tasks. The IoU is similar to the Dice coefficient, but is calculated differently and generally provides a more rigorous evaluation than the Dice coefficient, especially since it uses the union as the denominator. The formula follows:
$$ \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} $$
where A is the model’s predicted area and B is the ground truth area.

4.2.3. Precision

Precision measures the proportion of samples predicted as positive that are actually positive. It takes values between 0 and 1, with values closer to 1 indicating that the model's positive predictions are accurate. In our study, we calculated the per-voxel precision. The formula follows:

$$ \mathrm{Precision} = \frac{TP}{TP + FP} $$
where TP represents True Positive, and FP represents False Positive.

4.2.4. Recall

Recall, also known as sensitivity, measures the proportion of truly positive samples that the model correctly predicts as positive. It takes values between 0 and 1, with values closer to 1 indicating that the model finds true positives more completely. In our study, we calculated the per-voxel recall. The formula follows:

$$ \mathrm{Recall} = \frac{TP}{TP + FN} $$
where TP represents True Positive and FN represents False Negative.
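The four metrics can be computed per voxel from binary masks as in the following sketch, which follows the formulas above.

```python
import torch

def segmentation_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """Per-voxel DSC, IoU, precision, and recall for binary masks (0/1 tensors)."""
    pred, target = pred.float().flatten(), target.float().flatten()
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    dsc = 2 * tp / (2 * tp + fp + fn + eps)     # 2|A∩B| / (|A| + |B|)
    iou = tp / (tp + fp + fn + eps)             # |A∩B| / |A∪B|
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dsc, iou, precision, recall

pred = (torch.rand(1, 128, 128, 128) > 0.5).long()
gt = (torch.rand(1, 128, 128, 128) > 0.5).long()
print(segmentation_metrics(pred, gt))
```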

4.3. Comparison with Base Model

In this study, we compared and evaluated the performance of seven models for ABUS medical image segmentation—the baseline model, its adapted models, and the SegMamba model—and analyzed their performance based on DSC, IoU, precision, recall, and the number of model parameters. Table 2 shows the performance on the test data.
SegMamba performed the second best with DSC (0.7898) and IoU (0.6633). This means that the ability to effectively learn the global dependence of images in the state–space model performs very strongly in complex 3D image segmentation problems.
On the other hand, 3D U-Net showed weak performance, with a DSC of 0.7007 and an IoU of 0.5478, indicating that the traditional U-Net-based model struggles to capture image features in 3D segmentation. The model with an SE block added to 3D U-Net achieved slightly improved performance, with a DSC of 0.7607 and an IoU of 0.6245, despite having fewer parameters than 3D U-Net, but this was still not enough to reach high performance.
SegResNet [42], a model developed for 3D MRI segmentation, performed well on DSC (0.7728) and IoU (0.6434), but performed slightly worse on all metrics compared to SegMamba. This shows that the Mamba structure can better understand and learn the relationship between the tumor and surrounding tissues, allowing for more precise segmentation within complex 3D structures.
For the DeepLab model, we applied ResNet [47]-based encoders of different depths to analyze the effect on performance. The DeepLab model based on ResNet152 showed low performance, with a DSC of 0.6798 and an IoU of 0.5281, even though it has about 2.4 times as many parameters as the SegMamba model. This is interpreted as the generalization performance of very deep models such as ResNet152 decreasing as the number of parameters increases. Of note is the comparison between the ResNet152-based and ResNet50-based DeepLab models: despite ResNet152 being a deeper network, the ResNet50-based model performed better, which shows that the negative impact of frequent downsampling in DeepCNNs is still present in ABUS images. The DeepLab model based on ResNet34 had the shallowest encoder but showed good segmentation performance, with a DSC of 0.7624 and an IoU of 0.6299. This indicates that, in segmentation tasks, greater depth or more parameters do not necessarily lead to better performance; rather, the structure of the model determines whether it can extract the important information needed to learn and generalize.
The proposed LightSegMamba model demonstrated the best segmentation performance among all models examined, with a DSC of 0.7985 and an IoU of 0.6724. Compared to the state–space model-based SegMamba, it improved on all metrics while reducing the number of parameters by 95.43% (from 67.36M to 3.08M). This demonstrates the efficacy of integrating DASPP and ToM in terms of both compactness and accuracy.
LightSegMamba is also able to recognize not only the center of the tumor, but also the surrounding thin bulges in great detail. As shown in Figure 7, LightSegMamba tends to predict more conservatively than SegMamba. Consequently, this approach yielded relatively higher precision. This can be attributed to LightSegMamba’s DASPPMamba module, which allows for a wider receptive field and more accurate boundary prediction.
Figure 7 presents a qualitative comparison of the tumor masks predicted by the different segmentation models on ABUS images against the ground truth (GT), exemplified by four patients. Each column shows, respectively, the original image, the ground-truth mask, and the results of the different segmentation models. The 3D U-Net model effectively captured the main structure of the tumor, but the predicted mask often included unnecessary regions or showed blurred boundaries around the tumor; it also missed features in complex structures, as in the example in row 3, and frequently over-segmented background tissue as tumor. While SegResNet captured tumors well overall, it performed somewhat poorly on complex patterns, as shown in rows 2 and 3, and there is still room for improvement in fine region segmentation. The DeepLab series of models demonstrated performance variations depending on the backbone (ResNet34, ResNet50, ResNet152) and output stride. ResNet50 (OS = 16) and ResNet152 (OS = 16) segmented the core of the tumor relatively accurately, but tended to capture complex boundaries or thinly spread distal regions poorly, and some predictions were larger than the actual tumors. Overall, the SegMamba and LightSegMamba models reliably detected tumor regions; in particular, they accurately captured fine protrusions at the tumor boundary, preserving fine shape even with complex boundary structures.

4.4. Comparison with Mamba Model

We evaluated the performance of the LightSegMamba model by conducting a comparison experiment with Mamba-family models. To enhance the learning of diverse patterns, we partitioned the dataset in an 80:10:10 ratio for training and evaluation and verified the balance between compactness and performance of the proposed model. For comparison, the LightSegMamba, SegMamba, nnMamba [36], and LightM-UNet [37] models were evaluated in terms of DSC, IoU, precision, recall, and the number of model parameters. Table 3 shows the performance on the test data using the 80:10:10 split.
LightSegMamba achieved the best performance in DSC (0.8062) and IoU (0.6831), while demonstrating impressive efficiency with only 3.08M parameters. SegMamba also performs well in DSC (0.8022) and IoU (0.6775), but is relatively resource intensive with 67.36M parameters. Both models applied the Tri-Oriented Mamba (ToM) module to effectively integrate various tumor patterns and spatial information in three dimensions to maintain good segmentation performance.
nnMamba is a Mamba model for 3D medical image segmentation with 15.94M parameters; it was trained with a learning rate of 0.001 for stable convergence and optimal performance. This model achieved the highest recall (0.8922), indicating strength in sensitivity, but performed relatively poorly on DSC and IoU. On the other hand, LightM-UNet, a very lightweight Mamba model with only 0.38M parameters, achieved somewhat lower overall segmentation accuracy (DSC 0.6710) than the other models. Overall, LightSegMamba successfully achieved a balance between compactness and high performance, demonstrating that effective 3D tumor segmentation is possible even under limited computational resources.

5. Ablation Study

To evaluate the effectiveness of each component of the LightSegMamba architecture, we performed an ablation study focusing on the DASPP module, the Mamba (ToM) module, and the feature extraction stage. The structural diagram of the DASPP block-removed model is shown in Figure 8, and Figure 9 shows the architecture in which the Mamba (ToM) block is removed. Table 4 shows the results of removing the DASPP module or the Mamba (ToM) module, and Table 5 shows the learning results by stage.
As shown in Table 4, removing the DASPP block reduces the DSC to 0.7654 and the IoU to 0.6331, a substantial decrease. The model with the ToM block removed achieves a DSC of 0.7835 and an IoU of 0.6525, so both ablation settings show a decline in performance. The larger decline observed when the DASPP block is removed indicates that it plays a pivotal role in enhancing segmentation accuracy.
To find the optimal structure of the model, we trained it by gradually increasing the stage, where stage refers to the downsampling and DASPP Mamba blocks in Figure 5. Table 5 shows the variation in segmentation performance and the number of parameters for LightSegMamba with two, three, and four feature extraction stages. The two-stage structure has a DSC of 0.7985 and an IoU of 0.6724. With a total of 3.08M parameters, it is a very efficient structure. Scaling to a three-stage structure maintains performance at DSC 0.7958 and IoU 0.6712 but increases the number of parameters to 12.09M, about four times more. The four-stage structure’s performance drops slightly to DSC 0.7924 and IoU 0.6662. Its number of parameters reaches 47.85M, significantly consuming computational resources.
The performance improvement tends to decrease as the number of parameters increases. In particular, the four-stage structure shows a performance decrease despite the rapid increase in computation, which may be an inefficient choice for real-world applications. In conclusion, the two-stage structure is evaluated as the optimal structure that satisfies both lightweight and accuracy. This shows that the proposed LightSegMamba has the potential to be effectively applied in clinical environments with limited computational resources or in edge device-based medical image analysis systems.

6. Discussion

6.1. Contributions

In this study, we propose the LightSegMamba model for tumor segmentation in Automated Breast Ultrasound (ABUS) images. Experimental results using recently released ABUS data show that the proposed model significantly reduces the number of parameters by about 95.43% (67.36M → 3.08M) compared to SegMamba, an existing high-performance Mamba-based model, while maintaining a high segmentation performance of DSC 0.8062 and IoU 0.6831, demonstrating excellent performance even under the difficulty of 3D ultrasound image segmentation and limited datasets. This suggests that the proposed model can be utilized in clinical settings with limited computing resources by balancing computational efficiency and accuracy.
In particular, existing Mamba-based segmentation models have the limitation of not fully reflecting the volumetric structural information of the tumor because they scan the image in one direction. By introducing the three-way Mamba structure (ToM), the proposed LightSegMamba model captures the morphological characteristics and boundary information of the tumor more comprehensively in three dimensions, which contributes to the overall performance improvement.
In addition, the combination with the DASPP module laid the foundation for effective segmentation of tumors of various sizes and shapes, and achieved a balanced model structure that satisfies both lightweight and accuracy, particularly important given the limited data available in the ABUS domain. In this respect, LightSegMamba is a meaningful example of the practical clinical applicability of ABUS imaging.
The significance of our model lies in its pioneering role in demonstrating the applicability of Mamba structures in ABUS images. This domain has not yet been significantly impacted by advancements in deep learning. Traditional CNN or Transformer-based models tended to perform poorly, or to have high computational complexity or memory requirements, but Mamba-based approaches provide a better balance between efficiency and performance.

6.2. Limitations and Future Directions

However, there are some limitations to this study. First, the dataset consists of a total of 200 images, which were divided into training, validation, and test sets. This limited data size may not fully reflect the complex tumor characteristics of 3D ultrasound images and may limit the generalization ability of the model. Second, due to computing resource constraints, training was performed with a batch size of 2, which may have constrained training stability and optimization. Third, we did not convert the volumes to isotropic voxels because isotropic conversion tended to severely distort tumor images; the differing resolutions along different axes may hinder model training and complicate future integration with other modalities or the use of other ABUS datasets. In future research, it is essential not only to enhance performance but also to improve the interpretability of the model through validation with a broader range of clinical data. Further studies should investigate the model's applicability across diverse clinical scenarios, and additional optimization should be pursued to enable integration into real-time diagnostic support systems.

7. Conclusions

In this study, we propose the lightweight LightSegMamba model for tumor segmentation in automated breast ultrasound (ABUS) images. The proposed model integrates the three-way Mamba structure (ToM) and the DASPP module to effectively capture 3D tumor structure and efficiently exploit multi-scale contextual information. Furthermore, its light weight greatly improves real-time applicability and computational resource efficiency. Despite having only 3.08M parameters, LightSegMamba achieved excellent performance on the TDSC-2023 dataset, with a DSC of 0.8062, an IoU of 0.6831, a precision of 0.8032, and a recall of 0.8332. This study demonstrates that a Mamba-based model, which maintains high accuracy despite its lightweight structure, can be effectively utilized for tumor segmentation in ABUS images. This suggests significant promise for the future development of medical image analysis and healthcare artificial intelligence systems, contributing to real-time diagnostic support, clinical applicability, and improved diagnostic efficiency for medical staff. Subsequent efforts will include validating the model's generalizability on diverse real-world medical datasets and ensuring its stability and reliability for clinical implementation.

Author Contributions

Conceptualization, model development, data collection, and original draft preparation, J.K. (JongNam Kim); experiments on baseline models and manuscript editing, J.K. (Jun Kim) and F.A.D.; supervision, review, final approval, and funding acquisition for the manuscript, Z.A. and S.W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) funded by the Ministry of Education (MOE, Korea) and National Research Foundation of Korea (NRF). This work was also supported by National Research Foundation (NRF) grants funded by the Ministry of Science and ICT (MSIT) and Ministry of Education (MOE), Republic of Korea (NRF[2021-R1-I1A2(059735)]; RS[2024-0040(5650)]; RS[2024-0044(0881)]; RS[2019-II19(0421)]; and RS[2025-2544(3209)]).

Data Availability Statement

The dataset used in this study is publicly available at https://tdsc-abus2023.grand-challenge.org/, accessed on 1 March 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ABUS: Automated Breast Ultrasound
DASPP: Deeper Atrous Spatial Pyramid Pooling
DSC: Dice Similarity Coefficient
IoU: Intersection over Union
DeepCNN: Deep Convolutional Neural Network
ToM: Tri-Oriented Spatial Mamba
DWS Conv: Depthwise Separable Convolution
GAP: Global Average Pooling

References

  1. Kim, J.; Harper, A.; McCormack, V.; Sung, H.; Houssami, N.; Morgan, E.; Mutebi, M.; Garvey, G.; Soerjomataram, I.; Fidler-Benaoudia, M.M. Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nat. Med. 2025, 31, 1154–1162. [Google Scholar] [CrossRef]
  2. Huang, J.; Chan, P.S.; Lok, V.; Chen, X.; Ding, H.; Jin, Y.; Yuan, J.; Lao, X.q.; Zheng, Z.J.; Wong, M.C. Global incidence and mortality of breast cancer: A trend analysis. Aging 2021, 13, 5748. [Google Scholar] [CrossRef]
  3. Arnold, M.; Morgan, E.; Rumgay, H.; Mafra, A.; Singh, D.; Laversanne, M.; Vignat, J.; Gralow, J.R.; Cardoso, F.; Siesling, S.; et al. Current and future burden of breast cancer: Global statistics for 2020 and 2040. Breast 2022, 66, 15–23. [Google Scholar] [CrossRef]
  4. Hu, F.; Yang, H.; Qiu, L.; Wang, X.; Ren, Z.; Wei, S.; Zhou, H.; Chen, Y.; Hu, H. Innovation networks in the advanced medical equipment industry: Supporting regional digital health systems from a local–national perspective. Front. Public Health 2025, 13, 1635475. [Google Scholar] [CrossRef]
  5. Sha, R.; Kong, X.m.; Li, X.y.; Wang, Y.b. Global burden of breast cancer and attributable risk factors in 204 countries and territories, from 1990 to 2021: Results from the Global Burden of Disease Study 2021. Biomark. Res. 2024, 12, 87. [Google Scholar] [CrossRef]
  6. Cai, Y.; Dai, F.; Ye, Y.; Qian, J. The global burden of breast cancer among women of reproductive age: A comprehensive analysis. Sci. Rep. 2025, 15, 9347. [Google Scholar] [CrossRef]
  7. Pang, J.; Ding, N.; Liu, X.; He, X.; Zhou, W.; Xie, H.; Feng, J.; Li, Y.; He, Y.; Wang, S.; et al. Prognostic value of the baseline systemic immune-inflammation index in HER2-positive metastatic breast cancer: Exploratory analysis of two prospective trials. Ann. Surg. Oncol. 2025, 32, 750–759. [Google Scholar] [CrossRef] [PubMed]
  8. Gjesvik, J.; Moshina, N.; Lee, C.I.; Miglioretti, D.L.; Hofvind, S. Artificial intelligence algorithm for subclinical breast cancer detection. JAMA Netw. Open 2024, 7, e2437402. [Google Scholar] [CrossRef]
  9. Ginsburg, O.; Yip, C.H.; Brooks, A.; Cabanes, A.; Caleffi, M.; Dunstan Yataco, J.A.; Gyawali, B.; McCormack, V.; McLaughlin de Anderson, M.; Mehrotra, R.; et al. Breast cancer early detection: A phased approach to implementation. Cancer 2020, 126, 2379–2393. [Google Scholar] [CrossRef] [PubMed]
  10. Nothacker, M.; Duda, V.; Hahn, M.; Warm, M.; Degenhardt, F.; Madjar, H.; Weinbrenner, S.; Albert, U.S. Early detection of breast cancer: Benefits and risks of supplemental breast ultrasound in asymptomatic women with mammographically dense breast tissue. A systematic review. BMC Cancer 2009, 9, 335. [Google Scholar] [CrossRef] [PubMed]
  11. Jiang, Z.; Chen, Z.; Xu, Y.; Li, H.; Li, Y.; Peng, L.; Shan, H.; Liu, X.; Wu, H.; Wu, L.; et al. Low-frequency ultrasound sensitive Piezo1 channels regulate keloid-related characteristics of fibroblasts. Adv. Sci. 2024, 11, 2305489. [Google Scholar] [CrossRef]
  12. Richie, R.C.; Swanson, J.O. Breast cancer: A review of the literature. J. Insur. Med. 2003, 35, 85–101. [Google Scholar]
  13. Huss, R. Digital Medicine: Bringing Digital Solutions to Medical Practice; Jenny Stanford Publishing: Singapore, 2023. [Google Scholar]
  14. Hejduk, P.; Marcon, M.; Unkelbach, J.; Ciritsis, A.; Rossi, C.; Borkowski, K.; Boss, A. Fully automatic classification of automated breast ultrasound (ABUS) imaging according to BI-RADS using a deep convolutional neural network. Eur. Radiol. 2022, 32, 4868–4878. [Google Scholar] [CrossRef]
  15. Chen, W.; Liu, Y.; Wang, C.; Zhu, J.; Li, G.; Liu, C.L.; Lin, L. Cross-Modal Causal Representation Learning for Radiology Report Generation. IEEE Trans. Image Process. 2025, 34, 2970–2985. [Google Scholar] [CrossRef] [PubMed]
  16. Luan, S.; Yu, X.; Lei, S.; Ma, C.; Wang, X.; Xue, X.; Ding, Y.; Ma, T.; Zhu, B. Deep learning for fast super-resolution ultrasound microvessel imaging. Phys. Med. Biol. 2023, 68, 245023. [Google Scholar] [CrossRef] [PubMed]
  17. Agarwal, R.; Diaz, O.; Lladó, X.; Gubern-Mérida, A.; Vilanova, J.C.; Martí, R. Lesion segmentation in automated 3D breast ultrasound: Volumetric analysis. Ultrason. Imaging 2018, 40, 97–112. [Google Scholar] [CrossRef]
  18. Cao, X.; Chen, H.; Li, Y.; Peng, Y.; Wang, S.; Cheng, L. Dilated densely connected U-Net with uncertainty focus loss for 3D ABUS mass segmentation. Comput. Methods Programs Biomed. 2021, 209, 106313. [Google Scholar] [CrossRef]
  19. Li, Z.; Jiang, S.; Xiang, F.; Li, C.; Li, S.; Gao, T.; He, K.; Chen, J.; Zhang, J.; Zhang, J. White patchy skin lesion classification using feature enhancement and interaction transformer module. Biomed. Signal Process. Control 2025, 107, 107819. [Google Scholar] [CrossRef]
  20. Song, W.; Wang, X.; Guo, Y.; Li, S.; Xia, B.; Hao, A. Centerformer: A novel cluster center enhanced transformer for unconstrained dental plaque segmentation. IEEE Trans. Multimed. 2024, 26, 10965–10978. [Google Scholar] [CrossRef]
  21. Yin, L.; Wang, L.; Lu, S.; Wang, R.; Ren, H.; AlSanad, A.; AlQahtani, S.A.; Yin, Z.; Li, X.; Zheng, W. AFBNet: A Lightweight Adaptive Feature Fusion Module for Super-Resolution Algorithms. CMES-Comput. Model. Eng. Sci. 2024, 140, 2315–2347. [Google Scholar] [CrossRef]
  22. Wang, W.; Yuan, X.; Wu, X.; Liu, Y. Fast image dehazing method based on linear transformation. IEEE Trans. Multimed. 2017, 19, 1142–1155. [Google Scholar] [CrossRef]
  23. Cao, X.; Chen, H.; Li, Y.; Peng, Y.; Zhou, Y.; Cheng, L.; Liu, T.; Shen, D. Auto-DenseUNet: Searchable neural network architecture for mass segmentation in 3D automated breast ultrasound. Med Image Anal. 2022, 82, 102589. [Google Scholar] [CrossRef]
  24. Chen, X.; Jing, R. Video super resolution based on deformable 3D convolutional group fusion. Sci. Rep. 2025, 15, 9050. [Google Scholar] [CrossRef]
  25. Zhou, Y.; Chen, H.; Li, Y.; Cao, X.; Wang, S.; Shen, D. Cross-model attention-guided tumor segmentation for 3D automated breast ultrasound (ABUS) images. IEEE J. Biomed. Health Informatics 2021, 26, 301–311. [Google Scholar] [CrossRef] [PubMed]
  26. Fayyaz, H.; Kozegar, E.; Tan, T.; Soryani, M. Mass segmentation in automated 3-D Breast ultrasound using dual-path U-net. arXiv 2021, arXiv:2109.08330. [Google Scholar] [CrossRef]
  27. Pan, P.; Chen, H.; Li, Y.; Cai, N.; Cheng, L.; Wang, S. Tumor segmentation in automated whole breast ultrasound using bidirectional LSTM neural network and attention mechanism. Ultrasonics 2021, 110, 106271. [Google Scholar] [CrossRef]
  28. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar] [CrossRef]
  29. Xing, Z.; Ye, T.; Yang, Y.; Liu, G.; Zhu, L. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; Springer: Cham, Switzerland, 2024; pp. 578–588. [Google Scholar]
  30. Zhang, M.; Yu, Y.; Jin, S.; Gu, L.; Ling, T.; Tao, X. VM-UNET-V2: Rethinking vision mamba UNet for medical image segmentation. In Proceedings of the International Symposium on Bioinformatics Research and Applications, Kunming, China, 19–21 July 2024; Springer: Singapore, 2024; pp. 335–346. [Google Scholar]
  31. Wang, J.; Chen, J.; Chen, D.; Wu, J. LKM-UNet: Large kernel vision mamba unet for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; Springer: Cham, Switzerland, 2024; pp. 360–370. [Google Scholar]
  32. Wu, R.; Liu, Y.; Liang, P.; Chang, Q. H-vmunet: High-order vision mamba unet for medical image segmentation. Neurocomputing 2025, 624, 129447. [Google Scholar] [CrossRef]
  33. Li, G.; Huang, Q.; Wang, W.; Liu, L. Selective and multi-scale fusion Mamba for medical image segmentation. Expert Syst. Appl. 2025, 261, 125518. [Google Scholar] [CrossRef]
  34. Zhong, X.; Lu, G.; Li, H. Vision Mamba and xLSTM-UNet for medical image segmentation. Sci. Rep. 2025, 15, 8163. [Google Scholar] [CrossRef] [PubMed]
  35. Liu, J.; Yang, H.; Zhou, H.Y.; Yu, L.; Liang, Y.; Yu, Y.; Zhang, S.; Zheng, H.; Wang, S. Swin-UMamba†: Adapting Mamba-based vision foundation models for medical image segmentation. IEEE Trans. Med. Imaging 2024. [Google Scholar] [CrossRef]
  36. Gong, H.; Kang, L.; Wang, Y.; Wang, Y.; Wan, X.; Wu, X.; Li, H. nnmamba: 3D biomedical image segmentation, classification and landmark detection with state space model. In Proceedings of the ISBI, Houston, TX, USA, 14–17 April 2025. [Google Scholar]
  37. Liao, W.; Zhu, Y.; Wang, X.; Pan, C.; Wang, Y.; Ma, L. LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation. arXiv 2024, arXiv:2403.05246. [Google Scholar]
  38. Wang, G.; Li, Y.; Chen, W.; Ding, M.; Cheah, W.P.; Qu, R.; Ren, J.; Shen, L. S3-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 7655–7664. [Google Scholar]
  39. Zhang, M.; Sun, Q.; Han, Y.; Zhang, J. Edge-interaction Mamba Network for MRI Brain Tumor Segmentation. In Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar]
  40. Luo, G.; Xu, M.; Chen, H.; Liang, X.; Tao, X.; Ni, D.; Jeong, H.; Kim, C.; Stock, R.; Baumgartner, M.; et al. Tumor Detection, Segmentation and Classification Challenge on Automated 3D Breast Ultrasound: The TDSC-ABUS Challenge. arXiv 2025, arXiv:2501.15588. [Google Scholar]
  41. Isensee, F.; Jäger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. Automated design of deep learning methods for biomedical image segmentation. arXiv 2019, arXiv:1904.08128. [Google Scholar]
  42. Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. In Proceedings of the International MICCAI Brainlesion Workshop, Granada, Spain, 16 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 311–320. [Google Scholar] [CrossRef]
  43. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  44. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer: Cham, Switzerland, 2016; pp. 424–432. [Google Scholar]
  45. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar] [CrossRef]
  46. Emara, T.; Abd El Munim, H.E.; Abbas, H.M. Liteseg: A novel lightweight convnet for semantic segmentation. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA), Perth, Australia, 2–4 December 2019; pp. 1–7. [Google Scholar]
  47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. TDSC-2023 image examples.
Figure 2. Visualization of box-shift crop. To preserve the original image as much as possible, the boundaries of the crop box that extend beyond the image are adjusted inward.
Figure 3. Visualization of data augmentation techniques (sliced examples).
Figure 4. The tri-oriented Mamba (ToM) module architecture proposed in SegMamba [29]. Unlike a conventional single-direction Mamba, it processes the three-dimensional input features along three directions (forward, reverse, and inter-slice), efficiently modeling long-range dependencies across the entire volume and thereby handling complex 3D medical images more effectively.
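A minimal sketch of the tri-oriented scanning idea from Figure 4 is given below. It assumes the `mamba_ssm` package's `Mamba` layer; the class name `ToMBlock`, the shared Mamba weights across scans, and the summation-based fusion are our assumptions for illustration, not the SegMamba implementation.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba_ssm package is installed (CUDA required)

class ToMBlock(nn.Module):
    """Illustrative tri-oriented Mamba block: forward, reverse, and
    inter-slice scans over a flattened 3D feature map (not the official code)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mamba = Mamba(d_model=dim, d_state=16, d_conv=4, expand=2)

    def forward(self, x):                       # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, D*H*W, C), depth-slowest ordering
        seq = self.norm(seq)

        fwd = self.mamba(seq)                   # forward scan
        rev = self.mamba(seq.flip(1)).flip(1)   # reverse scan
        # Inter-slice scan: reorder so the depth (slice) axis varies fastest.
        inter = x.permute(0, 3, 4, 2, 1).reshape(b, h * w * d, c)
        inter = self.mamba(self.norm(inter))
        inter = inter.reshape(b, h, w, d, c).permute(0, 4, 3, 1, 2)
        inter = inter.flatten(2).transpose(1, 2)

        out = fwd + rev + inter                 # fuse the three scan directions
        return out.transpose(1, 2).reshape(b, c, d, h, w)

# Usage (on a CUDA device): ToMBlock(48).cuda()(torch.randn(1, 48, 16, 32, 32, device="cuda"))
```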
Figure 5. Architecture of the proposed LightSegMamba. The input image is processed by the Stem block and then downsampled by DASPPMamba modules to extract features at multiple scales. The extracted features are integrated through upsampling and skip connections to produce the final segmentation result. Each DASPPMamba module comprises three elements: DASPP, LayerNorm, and Tri-oriented Mamba (ToM).
Figure 6. Illustration of the DASPP module, which extends the 2D DASPP [46] to 3D. It uses 3D depthwise-separable (DWS) convolutions to extract multi-scale volumetric features and to integrate local and global contextual information. The module comprises six paths: a 1 × 1 × 1 DWS Conv, 3 × 3 × 3 DWS Convs with varying dilation rates, and global average pooling; their outputs are merged and passed through a final 1 × 1 × 1 DWS Conv. In addition, a skip connection preserves the initial input features in the output.
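The sketch below shows one way the six-path 3D DASPP layout in Figure 6 could be realized in PyTorch. The channel widths, dilation rates, and normalization choices are illustrative assumptions; the figure only specifies the path structure, and the class name `DASPP3D` is ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dws_conv3d(cin, cout, k=3, dilation=1):
    """3D depthwise-separable conv: per-channel (grouped) spatial conv + 1x1x1 pointwise."""
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv3d(cin, cin, k, padding=pad, dilation=dilation, groups=cin, bias=False),
        nn.Conv3d(cin, cout, 1, bias=False),
        nn.InstanceNorm3d(cout),
        nn.ReLU(inplace=True),
    )

class DASPP3D(nn.Module):
    """Illustrative 3D DASPP: a 1x1x1 path, dilated 3x3x3 DWS paths, a global
    average pooling path, 1x1x1 fusion, and a residual skip connection."""
    def __init__(self, channels, mid=32, rates=(2, 4, 6, 8)):
        super().__init__()
        self.point = dws_conv3d(channels, mid, k=1)
        self.branches = nn.ModuleList([dws_conv3d(channels, mid, 3, r) for r in rates])
        self.gap = nn.Sequential(nn.AdaptiveAvgPool3d(1),
                                 nn.Conv3d(channels, mid, 1, bias=False),
                                 nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(nn.Conv3d(mid * (len(rates) + 2), channels, 1, bias=False),
                                  nn.InstanceNorm3d(channels),
                                  nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[2:]
        feats = [self.point(x)] + [b(x) for b in self.branches]
        feats.append(F.interpolate(self.gap(x), size=size, mode="trilinear",
                                   align_corners=False))
        return self.fuse(torch.cat(feats, dim=1)) + x  # skip connection preserves the input

# Example: y = DASPP3D(32)(torch.randn(1, 32, 16, 32, 32)); y has the same shape as the input.
```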
Figure 7. Visual comparison of model predictions and the ground truth (GT). From left to right: Image, GT, 3D U-Net, SEBlock + 3D U-Net, SegResNet, DeepLabV3 (ResNet152, OS = 16), DeepLabV3 (ResNet50, OS = 16), DeepLabV3 (ResNet34, OS = 8), LightSegMamba, SegMamba.
Figure 8. Structure of the model after removing the DASPP block.
Figure 9. Structure of the model after removing the Mamba block.
Table 1. Summary of data augmentation transformations.

Transformation Type | Probability | Value/Setting | Function
Rotation | 20% | ±30° | Rotates image along X-, Y-, and Z-axes
Scaling | 20% | 0.7–1.4× | Adjusts image size (zoom in/out, with padding if shrinking)
Gaussian Noise Transform | 10% | – | Adds Gaussian noise to the image
Gaussian Blur Transform | 20% | sigma 0.5–1.0 | Applies Gaussian blur
Brightness Multiplicative Transform | 15% | 0.75–1.25× | Adjusts image brightness
Contrast Augmentation Transform | 15% | – | Adjusts image contrast
Simulate Low Resolution Transform | 25% | zoom 0.5–1.0 | Simulates low-resolution degradation
Gamma Invert Transform | 10% | gamma 0.7–1.5 | Applies per-channel gamma adjustment after image inversion
Gamma Transform | 30% | gamma 0.7–1.5 | Applies per-channel gamma adjustment
Mirror Transform | 50% | – | Mirrors the image along X-, Y-, and Z-axes
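The transform names and probabilities in Table 1 can be expressed as a pipeline configuration; the sketch below assumes the batchgenerators library (commonly used in nnU-Net-style pipelines). The paper does not state which implementation was used, the patch size is a placeholder of our own, and argument names may differ between library versions.

```python
import numpy as np
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.transforms.spatial_transforms import SpatialTransform, MirrorTransform
from batchgenerators.transforms.noise_transforms import GaussianNoiseTransform, GaussianBlurTransform
from batchgenerators.transforms.color_transforms import (
    BrightnessMultiplicativeTransform, ContrastAugmentationTransform, GammaTransform)
from batchgenerators.transforms.resample_transforms import SimulateLowResolutionTransform

deg30 = 30.0 / 180.0 * np.pi        # +/- 30 degrees in radians
patch_size = (64, 64, 64)           # placeholder; the paper's crop size may differ

train_transforms = Compose([
    # Rotation (+/-30 deg, p=0.2) and scaling (0.7-1.4x, p=0.2)
    SpatialTransform(patch_size, do_elastic_deform=False,
                     do_rotation=True, angle_x=(-deg30, deg30),
                     angle_y=(-deg30, deg30), angle_z=(-deg30, deg30),
                     do_scale=True, scale=(0.7, 1.4), random_crop=False,
                     p_rot_per_sample=0.2, p_scale_per_sample=0.2),
    GaussianNoiseTransform(p_per_sample=0.10),
    GaussianBlurTransform(blur_sigma=(0.5, 1.0), p_per_sample=0.20),
    BrightnessMultiplicativeTransform(multiplier_range=(0.75, 1.25), p_per_sample=0.15),
    ContrastAugmentationTransform(p_per_sample=0.15),
    SimulateLowResolutionTransform(zoom_range=(0.5, 1.0), p_per_sample=0.25),
    GammaTransform(gamma_range=(0.7, 1.5), invert_image=True, per_channel=True, p_per_sample=0.10),
    GammaTransform(gamma_range=(0.7, 1.5), invert_image=False, per_channel=True, p_per_sample=0.30),
    MirrorTransform(axes=(0, 1, 2)),  # each spatial axis flipped with probability 0.5
])

# Usage: out = train_transforms(data=images, seg=labels)  with arrays of shape (B, C, X, Y, Z)
```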
Table 2. Performance comparison of segmentation models based on DSC, IoU, precision, recall, and the number of parameters. Bold values indicate the best performance for each metric.

Model | DSC | IoU | Precision | Recall | Parameters
SegMamba | 0.7898 | 0.6633 | 0.7892 | 0.8283 | 67.36M
3D U-Net | 0.7007 | 0.5478 | 0.6203 | 0.8527 | 4.77M
SegResNet | 0.7728 | 0.6434 | 0.7786 | 0.8111 | 4.70M
SEBlock + 3D U-Net | 0.7607 | 0.6245 | 0.8115 | 0.7565 | 1.40M
DeepLabV3 (ResNet34, OS = 8) | 0.7624 | 0.6299 | 0.8110 | 0.7604 | 74.68M
DeepLabV3 (ResNet50, OS = 16) | 0.6932 | 0.5388 | 0.6704 | 0.7585 | 90.00M
DeepLabV3 (ResNet152, OS = 16) | 0.6798 | 0.5281 | 0.7380 | 0.6781 | 161.21M
LightSegMamba | 0.7985 | 0.6724 | 0.7932 | 0.8336 | 3.08M
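For reference, the two overlap metrics reported in Tables 2–5 can be computed from binary masks as follows. This is a minimal NumPy sketch; thresholding and case-averaging conventions may differ from the evaluation code actually used in the paper.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice Similarity Coefficient and Intersection over Union for binary 3D masks.
    DSC = 2|P∩G| / (|P| + |G|),  IoU = |P∩G| / |P∪G|."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dsc = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dsc), float(iou)

# Example with random volumes (replace with a model prediction and its ground truth).
pred = np.random.rand(64, 64, 64) > 0.5
gt = np.random.rand(64, 64, 64) > 0.5
print(dice_and_iou(pred, gt))
```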
Table 3. Performance comparison of Mamba models based on DSC, IoU, precision, recall, and the number of parameters. Bold values indicate the best performance for each metric.

Mamba Model | DSC | IoU | Precision | Recall | Parameters
LightSegMamba | 0.8062 | 0.6831 | 0.8032 | 0.8332 | 3.08M
SegMamba | 0.8022 | 0.6775 | 0.8069 | 0.8232 | 67.36M
nnMamba | 0.7481 | 0.6086 | 0.6683 | 0.8922 | 15.55M
LightM-UNet | 0.6710 | 0.5225 | 0.6959 | 0.7046 | 1.87M
Table 4. Comparison results of DASPP block and Mamba block removal. The full model is shown in the first row. Bold values indicate the best performance for each metric.

Ablation Model | DSC | IoU | Precision | Recall | Parameters
Basic | 0.7985 | 0.6724 | 0.7932 | 0.8336 | 3.08M
DASPP Removed | 0.7654 | 0.6331 | 0.8076 | 0.8150 | 2.94M
ToM Removed | 0.7835 | 0.6525 | 0.7738 | 0.8239 | 2.83M
Table 5. Comparison results according to the number of feature extraction stages. Bold values indicate the best performance for each metric.

Model | DSC | IoU | Precision | Recall | Parameters
2-stage | 0.7985 | 0.6724 | 0.7932 | 0.8336 | 3.08M
3-stage | 0.7958 | 0.6712 | 0.8076 | 0.8150 | 12.09M
4-stage | 0.7924 | 0.6662 | 0.8056 | 0.8128 | 47.85M
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
