Applied Sciences
  • Article
  • Open Access

5 December 2024

SCENet: Small Kernel Convolution with Effective Receptive Field Network for Brain Tumor Segmentation

1 College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China
2 College of Information Science and Engineering, Hohai University, Nanjing 210098, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Applications of Computer Vision and Image Processing in Medicine

Abstract

Brain tumors are serious conditions that can cause great trauma to patients, endangering their health and even leading to disability or death. Accurate preoperative diagnosis is therefore particularly important. Deep-learning-based brain tumor segmentation plays an important role in the preoperative treatment planning process and has achieved good performance. However, one remaining challenge is an insufficient ability to extract features with a large receptive field in the encoder layers and to guide the selection of deep semantic information in the decoder layers. We propose a small kernel convolution with effective receptive field network (SCENet) based on UNet, which involves a small kernel convolution with effective receptive field shuffle module (SCER) and a channel spatial attention module (CSAM). The SCER module utilizes the inherent properties of stacked convolutions to obtain an effective receptive field and improve the ability to extract features with a large receptive field. The CSAM in the decoder layers preserves more detailed features and captures clearer contours of the segmented image by calculating channel and spatial weights. An ASPP module is introduced into the bottleneck layer to enlarge the receptive field and capture multi-scale detailed features. Furthermore, extensive experiments were performed to evaluate the performance of our model on the BraTS2021 dataset. SCENet achieved dice coefficient scores of 91.67%, 87.70%, and 83.35% for whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively. The results show that the proposed model achieves state-of-the-art performance compared with more than twelve benchmarks.

1. Introduction

A brain tumor is an area of injury or disease within the brain, which can range from small to large, from few to many, and from relatively harmless to life threatening. Common intracranial tumors include gliomas, metastases, and meningiomas. Glioma is the most common primary central nervous system tumor [1]. According to the heterogeneity and biological behavior of tumor cells, as well as the presence or absence of vascular proliferation in the tumor body, gliomas are classified into four grades by the WHO. Grades 1–2 represent low-grade malignancy, while grades 3–4 indicate highly malignant tumors [2]. Traditional imaging diagnosis mainly grades gliomas based on their morphology, signal characteristics, degree of enhancement of tumor parenchyma, presence or absence of cystic necrosis, and degree of peritumoral edema [3]. Low-grade gliomas have uniform signals, clear boundaries, and mostly lack enhancement. High-grade brain tumors are typically associated with vasogenic edema, mass effect, and intravenous contrast enhancement [4]. Although there has been significant progress in surgery, radiation therapy, and chemotherapy in recent years, the prognosis still needs to be improved. The purpose of treatment is to remove as much of the tumor as possible while minimizing damage to normal brain tissue, prolonging the patient’s life and preserving their quality of life as much as possible. Therefore, segmenting the subregions of the tumor and determining the degree of tumor invasion into surrounding structures before surgery can be helpful for selecting treatment plans and formulating the best surgical plan [5]. Magnetic Resonance Imaging (MRI) [6] is currently the preferred method for brain tumor imaging due to its excellent soft tissue resolution, high signal-to-noise ratio, and multi-directional imaging. In the BraTS 2021 dataset, it includes four modalities: T1-weighted (T1), T1-enhanced contrast (T1ce), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR) [7]. The four modalities are usually input simultaneously into the network to support effective analysis, that is, multimodal brain tumor segmentation.
The simple procedure of manual and semi-automatic segmentation can be seen in Figure 1. Typically, radiologists begin by importing brain tumor imaging data. They then examine the data across the axial, sagittal, and coronal planes to scrutinize the image details. Ultimately, they utilize the region of interest (ROI) for calibration and measurement. Compared to traditional manual and semi-automatic diagnostic methods used by radiologists, brain tumor segmentation methods based on artificial intelligence are characterized by fast segmentation speed, high diagnostic accuracy, and a reduced missed diagnosis rate. UNet [8] is widely used for medical image processing; it fully captures the feature information of images with a special symmetrical structure and uses efficient restoration techniques to segment the lesion site. In order to fully utilize depth information, a 3D UNet, which performs well in the field of medical image segmentation, was proposed by Özgün Çiçek et al. [9] and has been extensively researched.
Figure 1. The simple procedure of manual and semi-automatic segmentation of medical images.
The feature extraction in UNet is mainly undertaken by combining two convolutions, but each convolution has a receptive field that is too small. This is a common challenge; the insufficient extraction of features with a large receptive field can lead to suboptimal segmentation results. The attention mechanism can effectively compensate for the insufficient extraction of features by the two convolutions in UNet. Recently, one of the most popular ideas in UNet and 3D UNet research has been to apply the attention mechanism after the two convolutions to extract more image features in the encoder blocks. Although some attention mechanisms improve the performance of brain tumor segmentation, further improvement is needed. To improve brain tumor image segmentation, we propose SCENet, which builds an effective receptive field in the encoder layers and applies channel and spatial weighting in the decoder layers.
The main contributions of our work are as follows:
(1)
We proposed the Small Kernel Convolution with Effective Receptive Field Network (SCENet) based on UNet, which can effectively improve brain tumor segmentation performance.
(2)
We designed an SCER block to enhance the effectiveness and efficiency of extracting features with an effective receptive field in the encoder layers.
(3)
We used a CSAM attention mechanism to select the more important detailed features in the decoder layers.
(4)
The ASPP module was introduced to the bottleneck layer to enlarge the receptive field in order to capture more detailed features.
The rest of this paper is laid out as follows: The related work is described in Section 2. The materials and methods and details of SCENet are presented in Section 3. Subsequently, in Section 4, the comparison results and ablation experiments are presented and analyzed in detail. Section 5 introduces limitations and future perspectives. Finally, Section 6 reveals the conclusions.

3. Methodology

3.1. Network Architecture

The overall architecture of our method is depicted in Figure 2. On the left branch, Stage 1, Stage 2, Stage 3, and Stage 4 form the encoder, which is mainly responsible for extracting image features. In the right section, Decoder 1, Decoder 2, Decoder 3, and Decoder 4 gradually restore the detailed information of the image. Each encoder stage contains two convolutions, two SCER blocks, and a downsampling block. The evolving normalization–activation layers (EVO Norm) [39] are applied after each convolution, which has a kernel of 3 × 3 × 3 and a stride of 1. The SCER blocks take advantage of small kernel convolutions with an effective receptive field to fuse local semantic information with features that have a large receptive field. Layer normalization (LN) [40] followed by a convolution with kernel 2 × 2 × 2 and stride of 2 serves as the downsampling block of each encoder stage.
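As an illustrative PyTorch sketch (not the authors' released implementation), one encoder stage could be organized as follows; the SCERBlock placeholder and the GroupNorm-based stand-ins for EVO Norm and layer normalization are assumptions:

```python
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    """One encoder stage: two 3x3x3 convolutions, two SCER blocks, and a downsampling block."""
    def __init__(self, in_ch, out_ch, scer_block=nn.Identity):
        super().__init__()
        # Two 3x3x3 convolutions with stride 1; GroupNorm + SiLU stands in for EVO Norm here.
        self.convs = nn.Sequential(
            nn.Conv3d(in_ch, in_ch, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(8, in_ch), nn.SiLU(),
            nn.Conv3d(in_ch, in_ch, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(8, in_ch), nn.SiLU(),
        )
        # Two SCER blocks (Section 3.2); nn.Identity is only a placeholder.
        self.scer = nn.Sequential(scer_block(), scer_block())
        # Downsampling block: layer-norm-like normalization, then a 2x2x2 convolution with stride 2.
        self.down = nn.Sequential(
            nn.GroupNorm(1, in_ch),
            nn.Conv3d(in_ch, out_ch, kernel_size=2, stride=2),
        )

    def forward(self, x):
        skip = self.scer(self.convs(x))   # sent to the matching decoder via the skip connection
        return self.down(skip), skip

# usage: (1, 32, 32, 32, 32) -> downsampled (1, 64, 16, 16, 16) plus the skip tensor
stage = EncoderStage(32, 64)
down, skip = stage(torch.randn(1, 32, 32, 32, 32))
```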
Figure 2. An illustration of the proposed SCENet for brain tumor image segmentation.
The decoder contains two convolutions with kernel 3 × 3 × 3 and stride of 1, a CSAM block, a trilinear interpolation operation, and a convolution with kernel 1 × 1 × 1 and stride of 1. In the decoder, the convolution with kernel 1 × 1 × 1 and stride of 1 is used to improve the non-linear expression ability of the network, and EVO Norm is applied after it. A trilinear interpolation operation gradually restores the image size by a factor of two. To focus on important features and ignore unimportant ones, a channel and spatial attention mechanism, called CSAM, is applied after the trilinear interpolation operation. The last operation of each decoder uses two convolutions with kernel 3 × 3 × 3 and stride of 1 to recover segmentation details.
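A corresponding decoder stage, again as a hedged sketch (the CSAM placeholder and the channel bookkeeping are assumptions, and normalization layers are omitted), might look like this:

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoder stage: 1x1x1 convolution, trilinear upsampling by 2, CSAM attention,
    then two 3x3x3 convolutions that recover segmentation details."""
    def __init__(self, in_ch, out_ch, csam=None):
        super().__init__()
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=1)
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        self.csam = csam                                    # CSAM module from Section 3.3
        self.refine = nn.Sequential(
            nn.Conv3d(out_ch * 2, out_ch, kernel_size=3, padding=1),  # *2: CSAM concatenates its inputs
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, x, skip):
        x = self.up(self.pointwise(x))                      # restore spatial size by a factor of two
        # Without a CSAM instance, fall back to a plain concatenation with the skip feature.
        x = self.csam(skip, x) if self.csam is not None else torch.cat([skip, x], dim=1)
        return self.refine(x)
```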
To further reduce the number of parameters, a 1 × 1 × 1 convolution and EVO Norm were added to the skip connections between the encoder and the corresponding decoder, which halves the number of channels. There are two CSAM modules and an Atrous Spatial Pyramid Pooling (ASPP) [41] module in the bottleneck layer. ASPP with atrous rates of 1, 2, 3, and 5 is applied to capture multi-scale contextual image information. At the end of the network, a convolution with kernel 1 × 1 × 1 and a stride of 1 serves as the classifier. The dimensions of the input image are 4 × 128 × 128 × 128, and the classifier layer maps the deep features to 3 × 128 × 128 × 128.
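The bottleneck ASPP with rates 1, 2, 3, and 5 can be sketched as below; this is a minimal variant (no image-pooling branch or normalization), so details may differ from the DeepLab-style module actually used:

```python
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    """Minimal 3D Atrous Spatial Pyramid Pooling with atrous rates 1, 2, 3, and 5."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 3, 5)):
        super().__init__()
        # One dilated 3x3x3 convolution per rate; padding=rate keeps the spatial size unchanged.
        self.branches = nn.ModuleList(
            [nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates]
        )
        # A 1x1x1 convolution fuses the concatenated multi-scale branches.
        self.fuse = nn.Conv3d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# usage on a bottleneck feature map
aspp = ASPP3D(256, 256)
y = aspp(torch.randn(1, 256, 8, 8, 8))   # -> (1, 256, 8, 8, 8)
```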

3.2. Small Kernel Convolution with Effective Receptive Field Shuffle Module (SCER)

It is widely acknowledged that the larger the receptive field, the larger the range of the original image a unit can come into contact with, which also means that it may capture more features with a large receptive field and higher-level semantic features. Conversely, a smaller receptive field indicates that the features tend to be more localized and detailed. In UNet, there are already two 3 × 3 × 3 convolutions, but they lack the large receptive field needed to extract such features. Convolutional operations are the fundamental units of UNet and can extract effective features without additional operations, so stacking more convolutional kernels to obtain a large receptive field is undoubtedly a fast and simple method. It is a fact that larger convolutional kernels, such as 7 × 7 × 7, bring greater receptive fields, but they also bring a larger number of parameters. Karen Simonyan et al. [42] and Christian Szegedy et al. [43] suggested that the receptive field of one convolution with kernel 5 × 5 × 5 is similar to that of a stack of two 3 × 3 × 3 convolution layers, and the receptive field of one convolution with kernel 7 × 7 × 7 is close to that of a stack of three 3 × 3 × 3 convolution layers. However, the difference in their parameter counts is huge. Taking 2D as an example, one convolution with kernel 7 × 7 has 49C² parameters, where C is the channel number, whereas a stack of three 3 × 3 convolutional layers has only 27C² parameters [42]. To further reduce the number of parameters brought by the large receptive field, we introduced the idea of the lightweight ShuffleNetV2 [44], which mainly aims to improve the effect while reducing the number of parameters.
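In 3D, the same reasoning gives 343C² parameters for one 7 × 7 × 7 convolution versus 3 × 27C² = 81C² for three stacked 3 × 3 × 3 convolutions (ignoring biases). A quick PyTorch check of these counts; the channel number C = 64 is an arbitrary choice for illustration:

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

C = 64
big = nn.Conv3d(C, C, kernel_size=7, padding=3, bias=False)            # one 7x7x7 convolution
stack = nn.Sequential(*[nn.Conv3d(C, C, kernel_size=3, padding=1, bias=False) for _ in range(3)])

print(n_params(big))    # 343 * C^2 = 1,404,928
print(n_params(stack))  #  81 * C^2 =   331,776
```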
A diagram of the SCER block is shown in Figure 3. Firstly, in order to reduce the number of parameters in the network, the input feature map is evenly divided into two branches along the channel dimension. The upper branch extracts features with an effective receptive field, using a stack of three 3 × 3 × 3 convolution layers to enlarge the receptive field, while the lower branch preserves the original semantic features. Next, a channel shuffling operation ensures the interaction of information between the two branches. The SCER block can be represented as follows:
$$Z = \mathrm{shuffle}\Big(\mathrm{concat}\Big(\mathrm{split}\big(x, \tfrac{C}{2}\big),\ \mathrm{Conv}_{3\times3\times3}\big(\mathrm{Conv}_{3\times3\times3}\big(\mathrm{Conv}_{3\times3\times3}\big(\mathrm{split}\big(x, \tfrac{C}{2}\big)\big)\big)\big)\Big)\Big)$$
where $x$ denotes the input feature map; split is the split operation with parameter $\tfrac{C}{2}$, which divides the feature map evenly along the channel dimension; $\mathrm{Conv}_{3\times3\times3}$ is a convolution with kernel 3 × 3 × 3 and stride of 1; and shuffle denotes the channel shuffling operation.
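A minimal PyTorch sketch of the SCER block under the equation above (normalization and activation layers, e.g. EVO Norm, are omitted here and would follow each convolution in practice):

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave channels from the two branches (ShuffleNet V2-style shuffle)."""
    b, c, d, h, w = x.shape
    return x.view(b, groups, c // groups, d, h, w).transpose(1, 2).reshape(b, c, d, h, w)

class SCERBlock(nn.Module):
    """Split channels in half, run three stacked 3x3x3 convolutions on one branch to build an
    effective 7x7x7 receptive field, keep the other branch unchanged, concatenate, then shuffle."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.stack = nn.Sequential(
            nn.Conv3d(half, half, kernel_size=3, stride=1, padding=1),
            nn.Conv3d(half, half, kernel_size=3, stride=1, padding=1),
            nn.Conv3d(half, half, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        identity, branch = torch.chunk(x, 2, dim=1)    # split(x, C/2)
        branch = self.stack(branch)                    # three stacked 3x3x3 convolutions
        return channel_shuffle(torch.cat([identity, branch], dim=1))

# usage
block = SCERBlock(32)
y = block(torch.randn(1, 32, 16, 16, 16))              # output keeps the input shape
```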
Figure 3. An illustration of building blocks of SCER Block.

3.3. Channel Spatial Attention Module (CSAM)

A diagram of the CSAM block is shown in Figure 4, which includes two parts: a channel attention and a spatial attention mechanism. The upper part is channel attention, and the lower part is the spatial attention mechanism. The channel attention involves global average pooling (GAP), convolution with kernel size 3 × 3 × 3, and the Sigmoid Activation Function, which enable the model to automatically learn the importance of each channel and adjust the representation of input data based on the contribution of each channel. The spatial attention incorporates average pooling (AP), max pooling (MP), convolution with kernel size 7 × 7 × 7 and Sigmoid Activation Function, which can help the model better focus on the features of different regions in the image to improve the representation ability and performance of the model. The CSAM is an attention module that combines channel attention and spatial attention mechanisms to improve the representation ability of image features, and it achieves global perception and the importance adjustment of image features by simultaneously focusing on feature information in both channel and spatial dimensions.
Figure 4. An illustration of building blocks of CSAM block.
Compared to the CBAM [45] attention mechanism, CSAM has two inputs: one is the information transmitted by the encoder, and the other is the information transmitted by the previous layer of the decoder. The advantage of the CSAM module is that it integrates the shallow information transmitted by the encoder to the corresponding decoder with the deep information transmitted from the previous layer, and it guides the selection of detailed features from the previous layer through spatial and channel attention mechanisms. The channel attention can be represented as follows:
$$Z_{c} = \mathrm{Sigmoid}\big(\mathrm{Conv}_{3\times3\times3}\big(\mathrm{GAP}\big(Z_{in}^{L} + Z_{in}^{L-1}\big)\big)\big) \times Z_{in}^{L}$$
where $Z_{in}^{L}$ denotes the shallow information transmitted by the encoder to the corresponding decoder, $Z_{in}^{L-1}$ represents the detailed features transmitted from the previous layer, GAP is global average pooling, $\mathrm{Conv}_{3\times3\times3}$ indicates a convolution with kernel size 3 × 3 × 3, and Sigmoid is the sigmoid activation function.
The spatial attention can be described as follows:
$$Z_{s} = \mathrm{Sigmoid}\big(\mathrm{Conv}_{7\times7\times7}\big(\mathrm{concat}\big(\mathrm{AP}\big(Z_{in}^{L} + Z_{in}^{L-1}\big),\ \mathrm{MP}\big(Z_{in}^{L} + Z_{in}^{L-1}\big)\big)\big)\big) \times Z_{in}^{L-1}$$
where AP denotes average pooling, MP represents max pooling, and concat is the concatenation operation.
Our proposed CSAM block can be depicted as follows:
$$Z = \mathrm{concat}\big(Z_{c} + Z_{s},\ Z_{in}^{L-1}\big)$$
where concat denotes the concatenation operation.
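A sketch of CSAM that follows the three equations above; the channel-wise mean/max used for AP and MP mirror CBAM's convention and are an assumption, as is the omission of normalization layers:

```python
import torch
import torch.nn as nn

class CSAM(nn.Module):
    """Channel and spatial attention driven by the sum of the encoder skip feature
    (Z_in^L) and the upsampled decoder feature (Z_in^{L-1})."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool3d(1)                       # GAP
        self.conv_c = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv_s = nn.Conv3d(2, 1, kernel_size=7, padding=3)  # acts on stacked AP/MP maps
        self.sigmoid = nn.Sigmoid()

    def forward(self, z_enc, z_dec):                   # z_enc = Z_in^L, z_dec = Z_in^{L-1}
        s = z_enc + z_dec
        # Channel attention: Z_c = Sigmoid(Conv3(GAP(s))) * Z_in^L
        z_c = self.sigmoid(self.conv_c(self.gap(s))) * z_enc
        # Spatial attention: Z_s = Sigmoid(Conv7(concat(AP(s), MP(s)))) * Z_in^{L-1}
        ap = s.mean(dim=1, keepdim=True)
        mp = s.amax(dim=1, keepdim=True)
        z_s = self.sigmoid(self.conv_s(torch.cat([ap, mp], dim=1))) * z_dec
        # Output: concat(Z_c + Z_s, Z_in^{L-1})
        return torch.cat([z_c + z_s, z_dec], dim=1)

# usage: two inputs with matching shapes; the output has twice as many channels
csam = CSAM(32)
out = csam(torch.randn(1, 32, 16, 16, 16), torch.randn(1, 32, 16, 16, 16))
```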

4. Experiments and Results

4.1. Datasets and Preprocessing

The BraTS dataset, from the Brain Tumor Segmentation Challenge, is a publicly available medical imaging resource that serves as a foundation for the research and development of brain tumor segmentation algorithms. It is a collection of MRI images from a multitude of patients with brain tumors, sourced from various medical centers. The widely used BraTS 2021 challenge provides 1251 cases for training and 219 for online validation [7,46,47]. The 1251 training cases include ground truth annotations provided by certified neuroradiologists, while the ground truth for the 219 validation cases remains undisclosed to the public, with evaluation results accessible solely through the online validation process. We used 80% of the 1251 training cases for training and the remaining 20% for offline validation. Additionally, we submitted our model for evaluation on the Synapse platform, which can be accessed at https://www.synapse.org/#platform (accessed on 29 November 2024).
To facilitate the accurate segmentation of brain tumor images by our network, we first loaded the BraTS 2021 dataset into our program during the preprocessing phase. We employed SimpleITK [48] and MONAI for image processing, using the Z-score normalization method to standardize each image. Subsequently, we minimized the background while ensuring the inclusion of the entire brain and randomly downsampled the image from an initial size of 240 × 240 × 155 to a reduced size of 128 × 128 × 128. We clipped all intensity values to the 1st and 99th percentiles of the non-zero voxel distribution within the volume. We also applied several data augmentation techniques, including channel rescaling within the range of 0.9 to 1.1, channel intensity shifting between −0.1 and 0.1, the addition of Gaussian noise with a mean of 0 and a standard deviation of 0.1, channel dropping, and random flipping with a probability of 80% along each spatial axis. We employed a strategic training regimen for our model during the training phase, and following model optimization, we resized the images back to their original dimensions. Ultimately, we submitted our results to the official platform for assessment.
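A NumPy sketch of the normalization, clipping, and augmentation steps described above (the exact ordering and the per-channel handling in the real SimpleITK/MONAI pipeline may differ):

```python
import numpy as np

def normalize_modality(volume, low=1.0, high=99.0):
    """Clip to the 1st/99th percentiles of non-zero voxels, then apply Z-score normalization."""
    nonzero = volume[volume > 0]
    lo, hi = np.percentile(nonzero, [low, high])
    volume = np.clip(volume, lo, hi)
    return (volume - nonzero.mean()) / (nonzero.std() + 1e-8)

def augment(image, label, rng=None):
    """Intensity rescale (0.9-1.1), intensity shift (+/-0.1), Gaussian noise (std 0.1),
    and random flips with probability 0.8 along each spatial axis."""
    if rng is None:
        rng = np.random.default_rng()
    image = image * rng.uniform(0.9, 1.1) + rng.uniform(-0.1, 0.1)   # applied per channel in practice
    image = image + rng.normal(0.0, 0.1, size=image.shape)           # additive Gaussian noise
    for axis in (-3, -2, -1):                                        # the three spatial axes
        if rng.random() < 0.8:
            image, label = np.flip(image, axis=axis), np.flip(label, axis=axis)
    return image.copy(), label.copy()
```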

4.2. Implementation Details

Our network was implemented using Python 3.8.10 and PyTorch 1.11.0. For the training phase, we utilized a single NVIDIA RTX A5000 graphics card (PNY, Parsippany, NJ, USA) equipped with 24 GB of memory, in conjunction with an AMD EPYC 7551P processor (AMD, Santa Clara, CA, USA). As detailed in Table 1, we initiated training with a learning rate of 1 × 10−4 and a batch size of 1. Throughout the training process, we employed the Ranger [49] optimizer to fine-tune our network. Additionally, we incorporated the standard Dice loss [50] into our network architecture. The dimensions of both input and output data were consistently set to 128 × 128 × 128.
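As a hedged illustration of the loss setup, a standard soft Dice loss over the three output channels could look as follows; the paper's exact variant and the Ranger optimizer configuration are not reproduced here, and Adam is shown only as a stand-in:

```python
import torch
import torch.nn as nn

class SoftDiceLoss(nn.Module):
    """Soft Dice loss averaged over the WT/TC/ET output channels."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        probs = torch.sigmoid(logits)                 # (B, 3, D, H, W)
        dims = (2, 3, 4)
        inter = (probs * target).sum(dims)
        denom = probs.sum(dims) + target.sum(dims)
        dice = (2 * inter + self.eps) / (denom + self.eps)
        return 1.0 - dice.mean()

# training configuration sketch (Adam stands in for the Ranger optimizer used in the paper)
# model = SCENet(...)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss_fn = SoftDiceLoss()
```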
Table 1. Model parameter configuration.

4.3. Evaluation Metrics

Quantitative and qualitative assessments were conducted using established evaluation metrics, including the Dice Similarity Coefficient (Dice) score and the Hausdorff distance (HD) [51,52]. The Dice score serves as a metric for the similarity between two sets. In the context of image segmentation, it quantifies the resemblance between the segmentation outcomes predicted by the network and the manual annotations, and is expressed as follows:
$$\mathrm{Dice} = \frac{2TP}{2TP + FP + FN}$$
where TP, FP, and FN denote true-positive cases, false-positive cases, and false-negative cases, respectively. The Hausdorff distance (HD) signifies the maximum distance between the boundary of the predicted segmentation and the actual region boundary. A lower HD value indicates a smaller error in the predicted boundary segmentation, which corresponds to higher quality. The HD can be mathematically represented as follows:
$$D(P, T) = \max\left\{ \sup_{t \in T} \inf_{p \in P} d(t, p),\ \sup_{p \in P} \inf_{t \in T} d(t, p) \right\}$$
where t and p represent points on the real region boundary T and the predicted segmentation region boundary P, respectively; d(·,·) represents the distance between t and p; sup denotes the supremum and inf denotes the infimum.
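A small sketch of both metrics for binary masks; note that it computes the plain symmetric Hausdorff distance on full voxel sets, whereas the BraTS evaluation typically reports the 95th-percentile variant (HD95) on region boundaries:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(pred, target):
    """Dice = 2TP / (2TP + FP + FN), since pred.sum() = TP + FP and target.sum() = TP + FN."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    return 2 * tp / (pred.sum() + target.sum() + 1e-8)

def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the voxel coordinate sets of two binary masks."""
    p = np.argwhere(pred)     # predicted region voxel coordinates
    t = np.argwhere(target)   # ground-truth region voxel coordinates
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])
```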

4.4. Comparison with Other Methods

We compared twelve advanced models to evaluate the advantages of the proposed model. In the evaluation, four indicators are used: whole tumor (WT), tumor core (TC), enhancing tumor (ET), and average dice value (AVG). The numbers of compared networks are 2, 2, 3, and 5 for 2024, 2023, 2022, and classic networks, respectively. The five classic networks are 3D UNet, Att-UNet, UNETR, TransBTS, and VT-UNet. Eight of the compared architectures are variants based on the basic UNet, and four are Transformer-based structures. Table 2 and Figure 5 show the online validation results on BraTS2021. Our WT, TC, ET, and AVG scores are 91.67%, 87.70%, 83.35%, and 87.57%, respectively.
Table 2. The online validation results for the comparison of different methods on BraTS 2021.
Figure 5. Comparison of the dice results of different segmentation methods.
The HD values, shown in Table 2 and Figure 6, are 5.34, 8.03, and 19.41 for the three tumor subregions (WT, TC, and ET), respectively. Compared with the 3D U-Net baseline, our WT, TC, ET, and average dice results increased by 3.65%, 11.53%, 7.15%, and 7.44%, respectively. Compared with VT-UNet, a pure Transformer architecture, our WT, TC, ET, and average dice results increased by 0.01%, 3.29%, 2.6%, and 1.96%, respectively. Compared with UNETR, a CNN + Transformer hybrid, our WT, TC, ET, and average dice values increased by 0.78%, 3.97%, 2.42%, and 2.39%, respectively. The results show that our network improved markedly on all targets. Although not all indicators are optimal, the average dice of the network is the highest among all comparison results, which also demonstrates that the proposed network performs better in brain tumor segmentation tasks.
Figure 6. Comparison of HD results of different segmentation methods.
Figure 7 illustrates the visualization outcomes of the SCENet model when applied to the BraTS 2021 dataset, showcasing five randomly selected cases. The cases with identifiers 00631, 00446, 00586, 00618, and 00625 in Figure 7 were segmented by SCENet. The combined green, yellow, and red regions, the union of red and yellow, and the yellow regions correspond to whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively. Generally speaking, SCENet’s segmentation results closely match the labeled ground truth. When compared to the networks based on 3D UNet or Transformers presented in Table 2, our model’s performance is superior, and our network also outperforms other UNet-based architectures. Collectively, our architecture and its constituent modules have achieved outstanding results on the BraTS 2021 dataset, laying a solid foundation for future research.
Figure 7. Visualization results of medical cases. The union of green, yellow and red, the union of red and yellow, and the yellow labels represent WT, TC, and ET, respectively.

4.5. Ablation Experiments

4.5.1. Ablation Study of Each Module in SCENet

We conducted ablation experiments to verify the effect of different modules in this architecture. Experiment (Expt) A represents the baseline without any modules. The SCER module, CSAM attention mechanism, and ASPP module are used in Experiments B, C, and D, respectively. The different combinations of SCER module, CSAM attention mechanism, and ASPP module are tested in Experiments E, F, and G, respectively.
The WT value of the combination of SCER and CSAM is the highest, as can be seen in Figure 8 and Table 3. From the results, it can be seen that the effect is not significant when using CSAM or ASPP alone, but there is a significant improvement when both are used in the network. This indicates that increasing the multi-scale receptive field in the bottleneck layer is beneficial, but without effective feature selection after the input to the decoder, it is difficult to achieve improved performance. When SCER, CSAM, and ASPP are all used, the average dice is the best, at 87.57%. These results show that our network and all of its modules can be effectively applied to brain tumor segmentation tasks.
Figure 8. The result of ablation study of each module in SCENet.
Table 3. The result of ablation study of each module in SCENet. The symbol "√" indicates that it has been selected for use in the network.

4.5.2. Ablation Study of the Number of Stacking Convolution Layers and Kernel Size in the SCER Module

To verify the effectiveness of replacing a convolution of kernel size 7 × 7 × 7 with a stack of three convolution layers of kernel size 3 × 3 × 3 in the SCER module, we conducted three sets of experiments on the SCER.
The results are shown in Table 4 and Figure 9. It is generally believed that a convolution with kernel size 7 × 7 × 7 has a large receptive field and should therefore perform best; however, except for the best ET result, its other indicators are not as good as those of the other experiments. A large receptive field does indeed improve the segmentation of small target areas, but the large number of parameters leads to a decrease in the final result. A stack of three convolutions with kernel size 3 × 3 × 3 and a single convolution with kernel size 7 × 7 × 7 have the same receptive field, but the stacked version has far fewer parameters, and its results are the best among these three experiments. We also reduced the stack by one convolution on the basis of Experiment C; as Experiment B in Table 4 shows, the effect did not improve, which confirms that stacking three 3 × 3 × 3 convolutional layers is the best choice in the proposed network.
Table 4. The results of ablation experiments for the number of stacking convolution layers and kernel size in the SCER module.
Figure 9. The results of ablation experiments for the number of stacking convolution layers and kernel size in the SCER module.

4.5.3. Comparative Experiment of the SCER Module of SCENet with the Shuffle Block of ShuffleNet V2

The SCER module and the shuffle block of ShuffleNet V2 are indeed somewhat similar, but there are still differences in their structures.
To demonstrate that the SCER module does indeed perform better than the shuffle block in the SCENet network, we conducted two sets of experiments. From Figure 10 and Table 5, the WT, TC, ET, and average dice results of the SCER module are 91.15%, 86.40%, 82.17%, and 86.57%, respectively, compared with 90.73%, 84.99%, 82.39%, and 86.04% for the shuffle block. This result shows that our module outperforms the shuffle block in brain tumor segmentation tasks.
Figure 10. The results of the comparative experiment of the SCER module with the shuffle block.
Table 5. The results of the comparative experiment of the SCER module with the shuffle block.

5. Limitations and Future Perspectives

At present, deep learning stands as a prominent area of research. The advancement of attention mechanisms and the refinement of algorithms have significantly improved the quality of brain tumor MRI images. Nonetheless, there is a dearth of experiments that translate these advancements into clinical practice. Moving forward, the emphasis of research should pivot towards clinical applications to develop algorithms that are more grounded in real-world utility.
Bioinformatics stands as a crucial field of advancement, exerting a substantial impact on the etiological analysis and prognostic evaluation of brain tumors. The seamless integration of medical imaging, clinical diagnostics, and bioinformatics can significantly enhance the efficacy of brain tumor treatments [56].

6. Conclusions

In this study, we propose SCENet based on UNet, which integrates an SCER block and a CSAM block. The SCER block takes advantage of stacking three small kernel convolutions to form an effective receptive field, which greatly improves feature extraction ability. The CSAM module achieves a good segmentation performance by combining spatial and channel attention mechanisms to guide the selection of deep semantic information. The ASPP module is introduced into the bottleneck layer to obtain richer semantic features. In addition to comparing our results with the classic 3D UNet, we also compared them with networks based on UNet or Transformers. Our results yielded WT, TC, ET, and average dice scores of 91.67%, 87.70%, 83.35%, and 87.57%, respectively. We also conducted ablation experiments on the SCER module, CSAM module, and ASPP, which demonstrated the effectiveness of our modules. By comparing a 7 × 7 × 7 convolution with three stacked 3 × 3 × 3 convolutions in the SCER block, it can be seen that although their receptive fields are similar, the stack of three 3 × 3 × 3 convolutions yielded better results in this network. The combination experiment of ASPP and CSAM shows that multi-scale receptive fields in the bottleneck layer are beneficial, but the improvement is not significant without effective feature selection in the decoder. By comparing the performance of the SCER block in SCENet with the shuffle block of ShuffleNet V2 in this architecture, the results show that the proposed module is superior to the shuffle block. Furthermore, quantitative and qualitative experiments demonstrated the accuracy of SCENet in brain tumor segmentation tasks. Our architecture and the proposed modules can provide effective ideas for subsequent research. We believe that the encouraging results obtained with SCENet will inspire further research into brain tumor segmentation.

Author Contributions

Conceptualization, N.C. and B.G.; methodology, B.G.; software, P.Y.; data curation, R.Z.; writing—original draft, B.G.; writing—review and editing, N.C. and B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jiangsu Provincial Key Research and Development Program (BE2020714). The APC was funded by Cao, N.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Datasets released to the public were analyzed in this study. The BraTS2021 dataset can be found through the following link: https://www.med.upenn.edu/cbica/brats2021/#Data2 (accessed on 15 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AG: Attention Gate
AP: Average Pooling
ASPP: Atrous Spatial Pyramid Pooling
AVG: Average Dice Value
CNN: Convolutional Neural Network
CSAM: Channel Spatial Attention Module
Dice: Dice Similarity Coefficient
ET: Enhancing Tumor
EVO Norm: Evolving Normalization–Activation Layers
Expt: Experiment
FCN: Fully Convolutional Networks
FLAIR: Fluid-Attenuated Inversion Recovery
GAP: Global Average Pooling
HD: Hausdorff Distance
LN: Layer Normalization
MP: Max Pooling
ROI: Region of Interest
SCENet: Small Kernel Convolution with Effective Receptive Field Network
SCER: Small Kernel Convolution with Effective Receptive Field Shuffle Module
SGE: Spatial Group-Wise Enhance
T1: T1-Weighted
T1ce: T1-Enhanced Contrast
T2: T2-Weighted
TC: Tumor Core
ViT: Vision Transformer
VT-UNet: Volumetric Transformer
WHO: World Health Organization
WT: Whole Tumor

References

  1. Ibebuike, K.; Ouma, J.; Gopal, R. Meningiomas among intracranial neoplasms in Johannesburg, South Africa: Prevalence, clinical observations and review of the literature. Afr. Health Sci. 2013, 13, 118–121. [Google Scholar] [CrossRef] [PubMed]
  2. Herholz, K.; Langen, K.-J.; Schiepers, C.; Mountz, J.M. Brain tumors. Semin. Nucl. Med. 2012, 42, 356–370. [Google Scholar] [CrossRef] [PubMed]
  3. Wu, J.; Su, R.; Qiu, D.; Cheng, X.; Li, L.; Huang, C.; Mu, Q. Analysis of DWI in the classification of glioma pathology and its therapeutic application in clinical surgery: A case-control study. Transl. Cancer Res. 2022, 11, 805–812. [Google Scholar] [CrossRef]
  4. Chen, J.; Qi, X.; Zhang, M.; Zhang, J.; Han, T.; Wang, C.; Cai, C. Review on neuroimaging in pediatric-type diffuse low-grade gliomas. Front. Pediatr. 2023, 11, 1149646. [Google Scholar] [CrossRef] [PubMed]
  5. Verma, N.; Cowperthwaite, M.C.; Burnett, M.G.; Markey, M.K.J. Differentiating tumor recurrence from treatment necrosis: A review of neuro-oncologic imaging strategies. Neuro-Oncology 2013, 15, 515–534. [Google Scholar] [CrossRef]
  6. Bauer, S.; Wiest, R.; Nolte, L.-P.; Reyes, M. A survey of MRI-based medical image analysis for brain tumor studies. Phys. Med. Biol. 2013, 58, R97. [Google Scholar] [CrossRef]
  7. Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.S.; Freymann, J.B.; Farahani, K.; Davatzikos, C. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 2017, 4, 170117. [Google Scholar] [CrossRef]
  8. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  9. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; pp. 424–432. [Google Scholar]
  10. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  11. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  12. Norman, B.; Pedoia, V.; Majumdar, S. Use of 2D U-Net convolutional neural networks for automated cartilage and meniscus segmentation of knee MR imaging data to determine relaxometry and morphometry. Radiology 2018, 288, 177–185. [Google Scholar] [CrossRef]
  13. Sevastopolsky, A. Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network. Pattern Recognit. Image Anal. 2017, 27, 618–624. [Google Scholar] [CrossRef]
  14. Roy, A.G.; Conjeti, S.; Karri, S.P.K.; Sheet, D.; Katouzian, A.; Wachinger, C.; Navab, N. ReLayNet: Retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks. Biomed. Opt. Express. 2017, 8, 3627–3642. [Google Scholar] [CrossRef] [PubMed]
  15. Skourt, B.A.; El Hassani, A.; Majda, A. Lung CT image segmentation using deep neural networks. Procedia Comput. Sci. 2018, 127, 109–113. [Google Scholar] [CrossRef]
  16. Chen, C.; Liu, X.; Ding, M.; Zheng, J.; Li, J. 3D dilated multi-fiber network for real-time brain tumor segmentation in MRI. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, 13–17 October 2019; pp. 184–192. [Google Scholar]
  17. Raza, R.; Bajwa, U.I.; Mehmood, Y.; Anwar, M.W.; Jamal, M.H. dResU-Net: 3D deep residual U-Net based brain tumor segmentation from multimodal MRI. Biomed. Signal Process. Control 2023, 79, 103861. [Google Scholar] [CrossRef]
  18. Ahmad, P.; Qamar, S.; Shen, L.; Rizvi, S.Q.A.; Ali, A.; Chetty, G. Ms unet: Multi-scale 3d unet for brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Singapore, 18 September 2022; pp. 30–41. [Google Scholar]
  19. Gammoudi, I.; Ghozi, R.; Mahjoub, M.A. An Innovative Approach to Multimodal Brain Tumor Segmentation: The Residual Convolution Gated Neural Network and 3D UNet Integration. Trait. Signal 2024, 41, 141–151. [Google Scholar] [CrossRef]
  20. Soni, V.; Singh, N.K.; Singh, R.K.; Tomar, D.S. Multiencoder-based federated intelligent deep learning model for brain tumor segmentation. IMA 2023, 34, e22981. [Google Scholar] [CrossRef]
  21. Olisah, C.C. SEDNet: Shallow Encoder-Decoder Network for Brain Tumor Segmentation. arXiv 2024, arXiv:2401.13403. [Google Scholar]
  22. Chen, R.; Lin, Y.; Ren, Y.; Deng, H.; Cui, W.; Liu, W. An efficient brain tumor segmentation model based on group normalization and 3D U-Net. IMA 2024, 34, e23072. [Google Scholar] [CrossRef]
  23. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; Mcdonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  24. Liu, D.; Sheng, N.; He, T.; Wang, W.; Zhang, J.; Zhang, J. SGEResU-Net for brain tumor segmentation. Math. Biosci. Eng. 2022, 19, 5576–5590. [Google Scholar] [CrossRef] [PubMed]
  25. Tian, W.W.; Li, D.W.; Lv, M.Y.; Huang, P. Axial attention convolutional neural network for brain tumor segmentation with multi-modality mri scans. Brain Sci. 2023, 13, 12. [Google Scholar] [CrossRef]
  26. Zhang, L.; Lan, C.; Fu, L.; Mao, X.; Zhang, M. Segmentation of brain tumor MRI image based on improved attention module Unet network. SIViP 2023, 17, 2277–2285. [Google Scholar] [CrossRef]
  27. Liu, D.; Sheng, N.; Han, Y.; Hou, Y.; Liu, B.; Zhang, J. SCAU-net: 3D self-calibrated attention U-Net for brain tumor segmentation. Neural Comput. 2023, 35, 23973–23985. [Google Scholar] [CrossRef]
  28. Zeeshan Aslam, M.; Raza, B.; Faheem, M.; Raza, A. AML-Net: Attention-based multi-scale lightweight model for brain tumor segmentation in internet of medical things. CAAI Trans. Intell. Technol. 2024; early view. [Google Scholar] [CrossRef]
  29. Kharaji, M.; Abbasi, H.; Orouskhani, Y.; Shomalzadeh, M.; Kazemi, F.; Orouskhani, M. Brain Tumor Segmentation with Advanced nnU-Net: Pediatrics and Adults Tumors. Neurosci. Inform. 2024, 4, 100156. [Google Scholar] [CrossRef]
  30. Pang, B.; Chen, L.; Tao, Q.; Wang, E.; Yu, Y. GA-UNet: A Lightweight Ghost and Attention U-Net for Medical Image Segmentation. J. Imaging Inform. Med. 2024, 37, 1874–1888. [Google Scholar] [CrossRef] [PubMed]
  31. Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance cheap operation with long-range attention. Proc. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar]
  32. Peiris, H.; Hayat, M.; Chen, Z.; Egan, G.; Harandi, M. A robust volumetric transformer for accurate 3D tumor segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 162–172. [Google Scholar]
  33. Jia, Q.; Shu, H. Bitr-unet: A cnn-transformer combined network for mri brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual Event, 27 September 2021; pp. 3–14. [Google Scholar]
  34. Wang, W.; Chen, C.; Ding, M.; Yu, H.; Zha, S.; Li, J. TransBTS: Multimodal brain tumor segmentation using transformer. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021; pp. 109–119. [Google Scholar]
  35. Sun, Q.; Fang, N.; Liu, Z.; Zhao, L.; Wen, Y.; Lin, H. HybridCTrm: Bridging CNN and transformer for multimodal brain image segmentation. J. Healthc. Eng. 2021, 2021, 7467261. [Google Scholar] [CrossRef] [PubMed]
  36. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584. [Google Scholar]
  37. Cai, Y.; Long, Y.; Han, Z.; Liu, M.; Zheng, Y.; Yang, W.; Chen, L. Swin Unet3D: A three-dimensional medical image segmentation network combining vision transformer and convolution. BMC Med. Inform. Decis. Mak. 2023, 23, 33. [Google Scholar] [CrossRef] [PubMed]
  38. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2022, arXiv:2010.11929. [Google Scholar]
  39. Liu, H.; Brock, A.; Simonyan, K.; Le, Q. Evolving normalization-activation layers. Adv. Neural Inf. Process. Syst. 2020, 33, 13539–13550. [Google Scholar]
  40. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
  41. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  42. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  43. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  44. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  45. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  46. Baid, U.; Ghodasara, S.; Mohan, S.; Bilello, M.; Calabrese, E.; Colak, E.; Farahani, K.; Kalpathy-Cramer, J.; Kitamura, F.C.; Pati, S. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv 2021, arXiv:2107.02314. [Google Scholar]
  47. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024. [Google Scholar] [CrossRef]
  48. Lowekamp, B.C.; Chen, D.T.; Ibáñez, L.; Blezek, D. The design of SimpleITK. Front. Neuroinformatics 2013, 7, 45. [Google Scholar] [CrossRef]
  49. Wright, L.; Demeure, N. Ranger21: A synergistic deep learning optimizer. arXiv 2021, arXiv:2106.13731. [Google Scholar]
  50. Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
  51. Kim, I.S.; McLean, W. Computing the Hausdorff distance between two sets of parametric curves. Commun. Korean Math. Soc. 2013, 28, 833–850. [Google Scholar] [CrossRef]
  52. Aydin, O.U.; Taha, A.A.; Hilbert, A.; Khalil, A.A.; Galinovic, I.; Fiebach, J.B.; Frey, D.; Madai, V.I. On the usage of average Hausdorff distance for segmentation performance assessment: Hidden error when used for ranking. Eur. Radiol. Exp. 2021, 5, 4. [Google Scholar] [CrossRef]
  53. Wu, Q.; Pei, Y.; Cheng, Z.; Hu, X.; Wang, C. SDS-Net: A lightweight 3D convolutional neural network with multi-branch attention for multimodal brain tumor accurate segmentation. Math. Biosci. Eng. 2023, 20, 17384–17406. [Google Scholar] [CrossRef] [PubMed]
  54. Håversen, A.H.; Bavirisetti, D.P.; Kiss, G.H.; Lindseth, F. QT-UNet: A self-supervised self-querying all-Transformer U-Net for 3D segmentation. IEEE Access 2024, 12, 62664–62676. [Google Scholar] [CrossRef]
  55. Akbar, A.S.; Fatichah, C.; Suciati, N.; Za’in, C. Yaru3DFPN: A lightweight modified 3D UNet with feature pyramid network and combine thresholding for brain tumor segmentation. Neural Comput. Appl. 2024, 36, 7529–7544. [Google Scholar] [CrossRef]
  56. Papacocea, S.I.; Vrinceanu, D.; Dumitru, M.; Manole, F.; Serboiu, C.; Papacocea, M.T. Molecular Profile as an Outcome Predictor in Glioblastoma along with MRI Features and Surgical Resection: A Scoping Review. Int. J. Mol. Sci. 2024, 25, 9714. [Google Scholar] [CrossRef] [PubMed]
