1. Introduction
A brain tumor is caused by abnormal cell growth in the human brain. Currently, the incidence of malignant brain tumors is relatively high, which has a great impact on humans and society [1]. To diagnose this disease, a brain tumor is delineated into sub-regions through high-quality image processing. The dominant malignant brain tumor is the histological glioma, and its sub-regions are the tumor core, the enhancing core, and the whole tumor [2,3]. Most existing brain tumor segmentation studies focus on gliomas, the most common brain tumors in adults, of which there are two types: high-grade glioma (HGG) and low-grade glioma (LGG). HGG tumors behave malignantly, as they grow rapidly and damage brain tissue. Patients affected by HGG tumors require surgery, as they rarely survive for more than two years. Active treatment of LGG tumors can extend life expectancy [4].
Brain tumors can be monitored and analyzed with tools such as Magnetic Resonance Imaging (MRI). MRI employs four different modalities to visualize the brain: T1-weighted, T2-weighted, post-contrast T1-weighted, and Fluid-Attenuated Inversion Recovery (FLAIR). The different information provided by these modalities is complementary, which supports robust brain tumor segmentation.
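As a concrete illustration of how this complementary information is typically exploited, the four modality volumes are commonly stacked into one multi-channel input so a network sees them jointly. The sketch below assumes hypothetical file names and BraTS-style one-file-per-modality data; nibabel is one common library for reading such volumes.

```python
import numpy as np
import nibabel as nib  # common library for reading MRI volumes (.nii files)

# Hypothetical file names; BraTS-style data ships one volume per modality.
modalities = ["t1.nii.gz", "t2.nii.gz", "t1ce.nii.gz", "flair.nii.gz"]

# Load each modality and stack along a channel axis so that a network
# can exploit the complementary information of all four jointly.
volumes = [nib.load(path).get_fdata() for path in modalities]
x = np.stack(volumes, axis=0)  # shape: (4, H, W, D)
```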
Since it is difficult to segment brain tumors manually, considerable effort is being devoted to developing methods for automatically segmenting brain tumor regions. It is crucial to separate and interpret tumors in the medical field, and a clear understanding is essential. With advances in medical image processing, finding tumors using machine learning has become more reliable and sophisticated than in the past [5]. From a decision-making point of view, it is important that medical experts can trust the algorithm's predictions. In the fields of bioinformatics [6,7,8] and medical imaging [9,10], deep learning algorithms have obtained impressive results. In recent times, real-life applications of soft computing techniques in different fields have shown that deep learning can have a positive impact on human lives [11,12,13,14,15,16,17,18,19].
The most common deep learning-based methods in the field of medical image segmentation are U-Net [20] and the Fully Convolutional Network (FCN) [21]. Among them, U-Net has proved to be the most reliable technique in terms of performance. The U-Net architecture has a symmetrical U-shaped structure, where the left side performs the encoder task and the right side performs the decoder task. Another characteristic of this architecture is that the encoder feature maps are concatenated with the corresponding decoder layers. This allows the resulting feature map to contain both low-level and high-level features. Furthermore, model performance is improved by integrating features from different levels while preserving location information.
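A minimal sketch of this skip-connection idea in PyTorch is shown below. It is an illustration, not the exact U-Net or BU-Net implementation, and it assumes padded 3x3 convolutions: one decoder level upsamples its input and concatenates the encoder feature map produced at the same resolution.

```python
import torch
import torch.nn as nn

# One U-Net decoder level: `enc_feat` comes from the encoder at the same
# resolution; concatenating it with the upsampled decoder feature gives the
# layer both low-level detail (location) and high-level semantics.
class UpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, enc_feat):
        x = self.up(x)                       # recover spatial resolution
        x = torch.cat([x, enc_feat], dim=1)  # skip connection: encoder -> decoder
        return self.conv(x)
```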
Three-dimensional (3D) segmentation based on full MRI volumes [22] and two-dimensional (2D) segmentation based on slices [23] are the main approaches for brain tumor segmentation. In the case of MRI-based 3D segmentation, there are few labeled training data [3,24,25], and it is difficult to increase the amount of data. In particular, the enormous number of network parameters and the associated memory requirements make it hard to train 3D models.
Havaei et al. [26] proposed a multipath convolutional neural network (CNN) to segment the brain tumor region on 2D slices of MRI images. In addition, they used a two-phase training procedure to deal with the unbalanced classes of the input data. Shen et al. developed a boundary-aware FCN to improve segmentation performance [27]. Later, Kamnitsas et al. [22] developed a 3D network called DeepMedic, which extracts multi-scale feature maps and incorporates them locally and globally using a two-path architecture.
On the other hand, training data for 2D segmentation are 155 times more abundant (each 3D MRI volume contains 155 2D slices), so 2D segmentation has drawn attention recently [28]. In particular, patch-based 2D models such as that of Pereira et al. [23] and FCN-based (fully convolutional network) 2D models [21] such as U-Net [20] are the two representative kinds of 2D brain tumor segmentation. A patch-based model classifies the patch surrounding each pixel to determine which class that pixel belongs to. The pipeline of a patch-based model is generally composed of three main steps: pre-processing, classification using CNNs, and post-processing, which takes time and cannot be carried out end-to-end [28], as the sketch below illustrates.
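The following is a hedged sketch of the core loop of such a pipeline (names like `classify_patch` are hypothetical): each pixel's label comes from one CNN prediction on its surrounding patch, which is exactly why the approach is slow and not end-to-end.

```python
import numpy as np

# Illustrative patch-based segmentation: crop a patch around every pixel and
# run one classifier call per pixel. Pre- and post-processing (normalization,
# connected-component cleanup, etc.) would be separate stages around this.
def patch_based_segmentation(image, classify_patch, patch_size=33):
    half = patch_size // 2
    padded = np.pad(image, half, mode="reflect")
    labels = np.zeros(image.shape, dtype=np.int64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + patch_size, j:j + patch_size]
            labels[i, j] = classify_patch(patch)  # hypothetical CNN call
    return labels
```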
In 2018, Wang et al. proposed a technique to capture long-range dependencies along the spatial dimension [29]. For this purpose, a weighted sum of all responses was taken to generate a spatial feature response. In another study, the network learned long-range context with the help of location-sensitive non-local (NL) operations [30]. Interest in network architectures such as FCN and U-Net has grown in recent years. Among them, U-Net is the most widely used architecture due to its high performance. In a recent reputable publication, the U-Net architecture was identified as a generic solution for research problems related to biomedical image data [31]. However, U-Net was developed for binary-class segmentation, and its output resolution is smaller than the input resolution because it does not use convolutions with padding (see the check after this paragraph). Therefore, U-Net cannot be directly applied when the output resolution must match that of the input. Furthermore, the U-Net architecture gradually recovers the downsampled image, and low-level features from shallow layers are shared with the deep layers. This direct information bridge introduces distortion, which affects the final prediction. An effective information bridge between shallow and deep layers, however, can enhance local features, which may improve brain tumor segmentation performance. W-Net is another architecture resembling U-Net, which uses a two-stage U-Net. However, W-Net has a large number of trainable parameters, which makes the model difficult to train.
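The resolution point can be verified with a quick PyTorch check: an unpadded 3x3 convolution, as used in the original U-Net, shrinks each spatial dimension by two, whereas padding preserves it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 240, 240)                      # e.g., one BraTS-sized slice
unpadded = nn.Conv2d(1, 16, kernel_size=3)           # padding=0, original U-Net style
padded = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # preserves resolution
print(unpadded(x).shape)  # torch.Size([1, 16, 238, 238])
print(padded(x).shape)    # torch.Size([1, 16, 240, 240])
```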
Keeping in mind the limitations of the baseline model, in this paper we propose a network named BU-Net. For brain image segmentation, BU-Net introduces two modules that are embedded in a U-Net architecture: residual extended skip (RES) and wide context (WC), which were inspired by Inception-Net [32] and DeepLab [33], respectively. The contributions made by BU-Net are the following:
Both new modules in BU-Net help to capture contextual information along with aggregation of global features.
The residual extended skip (RES) module converts low-level features into middle-level features.
This is useful when scale-invariant features are needed, which is important for brain tumor segmentation, as the cancerous regions vary from case to case.
The RES module increases the valid receptive field, which remains a problem in previous techniques, where the theoretical receptive field is always dominant.
Two combined loss functions are used to tackle the large differences in the percentage of pixels occupied by each class (a hedged sketch of one such combination follows this list).
BU-Net exhibits promising results when compared with existing state-of-the-art brain tumor segmentation techniques.
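As a hedged sketch of combining two losses against class imbalance, the pairing below uses cross-entropy plus Dice loss. This particular pairing is an assumption for illustration, not necessarily the exact pair used by BU-Net; the point is that the Dice term is largely insensitive to class frequency, compensating the frequency-sensitive cross-entropy term.

```python
import torch
import torch.nn.functional as F

# Combined loss sketch: logits (N, C, H, W), target (N, H, W) int64 labels.
def combined_loss(logits, target, eps=1e-6, alpha=0.5):
    ce = F.cross_entropy(logits, target)  # frequency-sensitive term

    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    dice_loss = 1 - dice.mean()           # largely insensitive to class frequency

    return alpha * ce + (1 - alpha) * dice_loss
```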
4. Results and Discussion
We carried out both quantitative and qualitative analyses. The quantitative analysis discusses the performance evaluation in terms of numbers, while the qualitative analysis discusses the visual quality of the results.
For evaluating the performance of BU-Net, we used the Dice score as the figure of merit. The Dice score is used by previous state-of-the-art techniques, which allows a direct quantitative comparison between the existing state of the art and the proposed BU-Net architecture. The Dice score measures the similarity between two sets P and Q and can be mathematically expressed as

$$\mathrm{Dice}(P, Q) = \frac{2\,|P \cap Q|}{|P| + |Q|},$$

where $|P|$ and $|Q|$ represent the cardinalities of the sets P and Q, respectively.
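A minimal implementation of this metric for binary masks, matching the formula above, might look as follows (the small epsilon guarding against empty masks is an implementation convenience):

```python
import numpy as np

# Dice score for binary masks P and Q: twice the overlap divided by the
# sum of the set sizes.
def dice_score(p, q, eps=1e-6):
    p, q = p.astype(bool), q.astype(bool)
    intersection = np.logical_and(p, q).sum()
    return (2.0 * intersection + eps) / (p.sum() + q.sum() + eps)

# Example: |P ∩ Q| = 1, |P| = 2, |Q| = 1 gives 2*1 / (2+1) ≈ 0.667.
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(dice_score(a, b))
```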
First, the proposed model was evaluated on the BraTS 2017 HGG dataset, which contains 210 cases. Of these, 80% were used for training and the remaining 20% for testing, following the splits defined by the BraTS challenge.
Table 2 shows the results achieved by BU-Net compared with existing techniques. All architectures used the same cost functions, optimizers, and other co-factors. BU-Net obtained gains of 7%, 6.6%, and 8.5% over its baseline U-Net for whole, core, and enhancing tumor segmentation, respectively. In terms of Dice score, the proposed model also outperformed the four existing state-of-the-art techniques that hold the best segmentation performance for HGG data.
For further evaluation of BU-Net, results were obtained on the complete BraTS 2017 dataset. In this experiment, 228 MRI scans were used for training, and the remaining 57 MRI scans were used for testing.
Table 3 presents the results attained by BU-Net and compares them with the best existing techniques. For the enhancing tumor and core tumor, the best results in the literature are achieved by ResU-Net; in comparison, our proposed model shows performance increases of 0.3% and 0.5%, respectively. The best results reported in the literature for the whole tumor are by NovelNet; even in this case, the proposed BU-Net shows an improvement of 1.6%. The difference between the performance of the state-of-the-art techniques and BU-Net demonstrates that the proposed model can effectively identify small tumor regions.
The third experiment was carried out on the BraTS 2018 dataset, which contains 285 training samples and 66 testing samples.
Table 4 compares the results of BU-Net with other state-of-the-art techniques on the BraTS 2018 validation dataset. The BU-Net architecture achieved Dice scores of 0.901, 0.837, and 0.788 for whole, core, and enhancing tumor, respectively. The proposed model exhibited better performance than both its baseline U-Net architecture and the other existing state-of-the-art techniques. The better performance of BU-Net indicates a high intersection over union, meaning the model can identify the majority of the area for every tumor type.
In research problems associated with segmentation, qualitative analysis is as important as quantitative analysis. For this reason, we also carried out a qualitative analysis.
Figure 5 illustrates the visual comparison between the ground truth and the regions segmented by BU-Net and U-Net for four different cases. As can be seen, the regions predicted by BU-Net show high agreement with the ground truth. The visual quality of U-Net and BU-Net can also be compared directly: the U-Net predictions contain many unwanted regions falsely segmented as edema, whereas BU-Net shows a high resemblance to the ground truth. When identifying the necrosis region, the U-Net architecture appears unable to identify the whole region, whereas BU-Net covers most of the necrosis region. The high resemblance between the ground truth and the proposed architecture speaks to the high quality of the BU-Net architecture.