BU-Net: Brain Tumor Segmentation Using Modified U-Net Architecture

The semantic segmentation of a brain tumor is of paramount importance for its treatment and prevention. Researchers have recently proposed various neural network-based architectures to improve the segmentation of brain tumor sub-regions. Brain tumor segmentation remains a challenging area of research, and its performance still needs improvement. This paper proposes a 2D image segmentation method, BU-Net, to contribute to brain tumor segmentation research. Residual extended skip (RES) and wide context (WC) modules are used, together with a customized loss function, in the baseline U-Net architecture. These modifications find more diverse features by increasing the valid receptive field. Contextual information is extracted by aggregating features, which improves segmentation performance. The proposed BU-Net was evaluated on the high-grade glioma (HGG) dataset of the BraTS 2017 Challenge and on the test datasets of the BraTS 2017 and 2018 Challenges. The three major labels to be segmented were tumor core (TC), whole tumor (WT), and enhancing core (EC). The Dice score was used for quantitative performance comparison. The proposed BU-Net outperformed existing state-of-the-art techniques. The high-performing BU-Net can benefit researchers in bioinformatics and medicine.


Introduction
A brain tumor is caused by abnormal cell growth in the human brain. The incidence of malignant brain tumors is currently relatively high, which has a great impact on individuals and society [1]. To diagnose the disease, the brain tumor is subdivided into sub-regions through high-quality image processing. Histologically, the dominant malignant brain tumor is glioma, whose sub-regions are the tumor core, enhancing core, and whole tumor [2,3]. Most existing brain tumor segmentation studies focus on gliomas, the most common brain tumors in adults. There are two types of glioma: high-grade glioma (HGG) and low-grade glioma (LGG). HGG tumors are malignant: they grow rapidly and damage brain tissue. Patients with HGG tumors require surgery, as they typically do not survive for more than 2 years. Active treatment of LGG tumors can extend life expectancy [4].
Brain tumors are mainly monitored and analyzed with Magnetic Resonance Imaging (MRI). MRI uses four different modalities to visualize the brain: T1-weighted, T2-weighted, post-contrast T1-weighted, and FLAIR. The information from these modalities is complementary, which enables robust brain tumor segmentation.
Because manual segmentation of brain tumors is difficult, much effort is being devoted to developing methods that segment brain tumor regions automatically. Accurately delineating and interpreting tumors is crucial in clinical practice. With advances in medical image processing, machine learning-based tumor detection has become more reliable and sophisticated than in the past [5]. From a clinical decision-making point of view, it is important that medical experts can trust an algorithm's predictions. Deep learning algorithms have obtained impressive results in bioinformatics [6][7][8] and medical imaging [9,10]. Recent real-life applications of soft computing techniques in different fields have shown that deep learning can have a positive impact on human lives [11][12][13][14][15][16][17][18][19].
The most common deep learning-based methods for medical image segmentation are U-Net [20] and the Fully Convolutional Network (FCN) [21]. Of the two, U-Net has proved the more reliable in terms of performance. The U-Net architecture has a symmetric U-shaped structure: the left side performs encoding and the right side performs decoding. In addition, the output of each encoder layer is concatenated with the corresponding decoder layer, so the resulting feature maps contain both low-level and high-level features. Integrating features from different levels while preserving location information further improves model performance.
The two main approaches to brain tumor segmentation are 3-dimensional (3D) segmentation of whole MRI volumes [22] and 2-dimensional (2D) segmentation of individual slices [23]. Few labeled training volumes are available for MRI-based 3D segmentation [3,24,25], and it is difficult to increase the amount of data. In particular, the enormous number of network parameters and the memory requirements make 3D models hard to train.
Havaei et al. [26] proposed a multipath convolutional neural network (CNN) to segment brain tumor regions on 2D slices of MRI images. They also used a two-phase training procedure to deal with class imbalance in the input data. Shen et al. developed a boundary-aware FCN to improve segmentation performance [27]. Later, Kamnitsas et al. [22] developed a 3D network called DeepMedic. It extracts multi-scale feature maps and combines local and global information using a two-path architecture.
On the other hand, 2D segmentation offers 155 times as many training samples, because each 3D MRI volume contains 155 2D slices. For this reason, 2D segmentation has drawn attention recently [28]. Patch-based 2D models, such as that of Pereira et al. [23], and FCN-based 2D models [21], such as U-Net [20], are the two representative kinds of 2D brain tumor segmentation models. A patch-based model determines the class of each pixel by classifying the patch surrounding it. Its pipeline generally consists of three main steps: pre-processing, classification using CNNs, and post-processing. This pipeline is time-consuming and cannot be carried out end-to-end [28].
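As a rough illustration of why 2D training data are more plentiful, the following NumPy sketch turns co-registered 3D volumes of the four MRI modalities into a stack of 2D multi-channel slices. The helper `volumes_to_slices` is hypothetical. It assumes each volume is an array of shape (H, W, D) with the slice axis last, where D = 155 for BraTS volumes.

```python
import numpy as np


def volumes_to_slices(modalities):
    """Convert co-registered 3D MRI volumes into 2D multi-channel slices.

    modalities: sequence of arrays, one per MRI modality (e.g. T1, T1ce, T2,
    FLAIR), each of shape (H, W, D) with the slice axis last (an assumption).
    Returns an array of shape (D, H, W, M): one 2D training sample per slice,
    with the M modalities stacked as channels.
    """
    vols = [np.asarray(v) for v in modalities]
    if any(v.ndim != 3 or v.shape != vols[0].shape for v in vols):
        raise ValueError("all modalities must be 3D arrays of identical shape")
    stacked = np.stack(vols, axis=-1)  # (H, W, D, M)
    return np.moveaxis(stacked, 2, 0)  # (D, H, W, M)
```

With 285 training volumes of 155 slices each, this yields 285 × 155 = 44,175 2D samples instead of 285 3D samples.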
In 2018, Wang et al. proposed a technique to capture long-range dependencies along the spatial dimensions [29]. In their approach, a spatial feature response is generated by taking a weighted sum of all responses. In another study, the network learned long-range context with the help of a location-sensitive non-local (NL) operation [30]. Interest in network architectures such as FCN and U-Net has increased in recent years. Among them, U-Net is the most widely used architecture due to its high performance. A recent high-profile publication described U-Net as a generic solution for research problems involving biomedical image data [31]. However, U-Net was developed for binary segmentation. Because it does not use padded convolutions, its output resolution is smaller than its input resolution. U-Net therefore cannot be applied directly when the output must have the same resolution as the input. Further, U-Net gradually recovers the downsampled image, and low-level features from shallow layers are passed directly to the deep layers. This direct information bridge distorts the information and affects the final prediction. In contrast, an effective information bridge between shallow and deep layers can enhance local features, which may improve brain tumor segmentation performance. W-Net is another architecture resembling U-Net; it uses a two-stage U-Net. However, W-Net has a large number of trainable parameters, which makes the model difficult to train.
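The effect of unpadded convolutions on output size can be checked with a short calculation. The sketch assumes the original U-Net layout:

- two 3 × 3 convolutions per level;
- 2 × 2 max-pooling;
- 2 × 2 up-convolutions;
- four pooling steps.

With unpadded ("valid") convolutions, it reproduces the 572 × 572 input to 388 × 388 output mapping of the original U-Net. With padded ("same") convolutions, as used in BU-Net, the output matches the input. The function name and the default depth of 4 are illustrative assumptions, not BU-Net's documented configuration.

```python
def unet_output_size(n, depth=4, padded=False):
    """Spatial output size of a U-Net-style network for an n x n input.

    Each level applies two 3x3 convolutions; without padding ('valid') each
    one trims a 1-pixel border (2 pixels per dimension). 2x2 max-pooling
    halves the size and each 2x2 up-convolution doubles it.
    """
    trim = 0 if padded else 2  # pixels lost per 3x3 convolution
    for level in range(depth):  # contracting path
        n -= 2 * trim
        if n <= 0 or n % 2:
            raise ValueError(f"size {n} cannot be max-pooled at level {level}")
        n //= 2
    n -= 2 * trim  # two bottleneck convolutions
    for _ in range(depth):  # expanding path: upsample, then two convolutions
        n = 2 * n - 2 * trim
    return n
```

For example, `unet_output_size(572)` gives 388, while `unet_output_size(256, padded=True)` gives 256. A 256 × 256 input cannot even pass through the unpadded version, because an odd-sized feature map appears before the third pooling step.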
Keeping in mind the limitations of the baseline model, in this paper we propose a network named BU-Net for brain image segmentation. BU-Net embeds two modules in a U-Net architecture: residual extended skip (RES) and wide context (WC). These modules were inspired by Inception [32] and DeepLab [33], respectively. The main contributions of BU-Net are as follows:

•
Both new modules in BU-Net extract contextual information and aggregate global features.

•
Residual extended skip (RES) converts the low-level features to middle-level features.

•
RES yields scale-invariant features, which are important for brain tumor segmentation because cancer regions vary in size from case to case.

•
The RES module increases the valid receptive field. This addresses a problem of previous techniques, in which the theoretical receptive field always dominates the valid one.

•
A combination of two loss functions is used to tackle the large difference in the percentage of pixels occupied by each class.
BU-Net has exhibited promising results when compared with existing state-of-the-art brain tumor segmentation techniques.

Datasets
In this section, we describe the publicly available benchmark datasets used in this study. The proposed BU-Net model was evaluated on two benchmark datasets, BraTS 2017 and BraTS 2018. The BraTS 2017 training dataset consists of images collected from 285 glioma patients: 210 HGG cases and 75 LGG cases. The BraTS 2017 validation dataset contains images of 46 patients of unknown grade. The ground truths of the training data were labeled by experts. The labels of the validation dataset are not publicly available, so results on it can only be obtained from the online BraTS evaluation server. The data are labeled with four main classes:

•
Enhancing tumor.

•
Necrosis and non-enhancing tumor.

•
Edema.

•
Healthy tissue.

Methodology
In this section, we first discuss the preprocessing applied to every input image. We then describe the proposed BU-Net along with its two modules, RES and WC, which are included to improve performance.

Image Preprocessing
One of the weaknesses of deep learning models is that they are sensitive to noise, so data preprocessing is an important step before images are given to the network. For this purpose, the N4ITK algorithm [34], a bias field correction technique, is applied to all images to make their intensities homogeneous. Many different algorithms have been used in the literature to preprocess input images. However, most studies suggest that N4ITK is the most reliable for brain image preprocessing [35,36]. The N4ITK algorithm corrects the bias field of MRI data. Moreover, as in [26], the top 1% and bottom 1% of intensities are discarded. As a final step, all images are normalized to zero mean and unit variance.
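The intensity steps after bias correction can be sketched in NumPy as follows. This is a minimal sketch under two assumptions:

- "Discarding" the top and bottom 1% means clipping to the 1st and 99th percentiles.
- Statistics are computed over the whole slice. Many pipelines use brain voxels only.

N4ITK bias correction itself is not reimplemented here; SimpleITK, for example, provides it as `N4BiasFieldCorrectionImageFilter`. The function name `preprocess_slice` is hypothetical.

```python
import numpy as np


def preprocess_slice(img, low_pct=1.0, high_pct=99.0, eps=1e-8):
    """Clip extreme intensities, then normalize to zero mean and unit variance.

    Assumes bias field correction (e.g. N4ITK) has already been applied.
    """
    img = np.asarray(img, dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])  # 1st and 99th percentiles
    clipped = np.clip(img, lo, hi)  # suppress the top/bottom 1% of intensities
    return (clipped - clipped.mean()) / (clipped.std() + eps)  # z-score
```

Clipping before normalizing matters: a single very bright outlier would otherwise inflate the standard deviation and compress every other intensity toward zero.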

Proposed BU-Net
In the baseline architecture, no contextual information is shared between shallow and deep layers. A module is needed that creates an information bridge between shallow and deep layers so that the network's local and global features can be enhanced. Figure 2 shows the overall architecture of the proposed BU-Net, which includes RES blocks and a WC block. The architecture takes input images with a resolution of 256 × 256 and outputs images of the same dimensions. The left part of the model acts as an encoder and the right part acts as a decoder. BU-Net uses padded convolution layers, so the output has the same spatial size as the input.
The encoder and decoder of the network are divided into blocks. On the encoder side, every block consists of two convolution layers along with a single max-pooling layer and a dropout layer. Every block on the decoder side starts with a Conv2DTranspose layer applied to the output of the previous block. The output of the Conv2DTranspose layer is concatenated with the output of the associated RES block. Dropout is applied to the concatenated output, followed by two convolution layers. The last block of the decoder includes an additional convolution layer with six filters of size 1 × 1. The encoder side performs the contraction of the image, and the decoder side performs the expansion. For the transition from the encoder to the decoder, the architecture uses a wide context block. All convolution layers of BU-Net are followed by batch normalization and a ReLU activation function. The exception is the last convolution layer, which uses a sigmoid activation function. The ReLU and sigmoid activation functions are defined as $\mathrm{ReLU}(x) = \max(0, x)$ and $\sigma(x) = \frac{1}{1 + e^{-x}}$.
BU-Net is implemented in the Keras framework [37]. The dropout ratio was set by hyper-parameter tuning: a range of dropout ratios was tested, and 0.3 performed best for the network. The Adam optimizer was used with the customized loss function. The learning rate was set to 0.01, with a momentum (first-moment decay rate $\beta_1$) of 0.9. The batch size was 16. Instead of a fixed maximum number of training iterations, early stopping based on the validation loss with a patience of 10 was used.
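The early-stopping rule can be sketched in plain Python. The helper below is hypothetical. It mirrors the usual patience logic, as in the Keras `EarlyStopping` callback, on a precomputed list of per-epoch validation losses. It assumes patience is counted in epochs and that any decrease larger than `min_delta` counts as an improvement.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops at the first epoch where the loss has failed to improve on
    the best value by more than `min_delta` for `patience` consecutive epochs.
    If that never happens, training runs to the last epoch.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

In Keras itself, this behavior corresponds to `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`. The optimizer settings would presumably map to `tf.keras.optimizers.Adam(learning_rate=0.01, beta_1=0.9)`, since $\beta_1$ plays the role of momentum in Adam.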
The RES block, the wide context block, and the customized loss function are described in the following subsections.

Residual Extended Skip (RES)
Figure 3 shows the architecture of the residual extended skip (RES) block. The input to the block is fed to five parallel branches. Each of the first four branches applies two convolution layers: the first uses an N × 1 filter and the second a 1 × N filter. We use these two cascaded convolution layers instead of a single N × N convolution layer because the factorized pair requires fewer parameters, which benefits the overall architecture. Moreover, our experiments showed that the cascaded layers, despite having fewer parameters, perform similarly to a single convolution layer with more parameters. The fifth branch is a skip connection that forwards the input unchanged. The outputs of all five branches are summed into a single output. Three successive convolution layers, with filter sizes of 3 × 3, 3 × 3, and 1 × 1, are then applied to the sum.
The RES block generates middle-level features from low-level features, which helps to limit information degradation. Because cancer regions vary greatly in size, RES aggregates context at multiple scales, which makes it scale-invariant. RES also increases the valid receptive field, which allows BU-Net to produce better segmentations.

Wide Context (WC)
Figure 4 shows the architecture of the wide context (WC) block. The input to WC is fed to two parallel branches, each with two convolution layers. In the first branch, the two layers use N × 1 and 1 × N filters, respectively. The second branch reverses the order, applying a 1 × N filter followed by an N × 1 filter. We observed that changing the order changes the extracted features and that both orders contribute to the final result, so together the two branches form a richer feature set. The outputs of both branches are summed to form the output of WC.
Like RES, WC extracts contextual information, which is important for distinguishing between the sub-classes of cancer. It also aggregates features at the transition from encoder to decoder, which leads to better reconstruction of the segmented regions.
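The following NumPy sketch makes the data flow of the RES and WC blocks concrete. It implements untrained forward passes on a single (H, W, C) feature map, plus a parameter count for the factorized N × 1 / 1 × N convolutions. It is an illustration under stated assumptions, not the authors' Keras code:

- The per-branch kernel sizes N (3, 5, 7, 9 for RES; 3 for WC) are hypothetical.
- The weight-factory argument `make_w` and all helper names are hypothetical.
- Batch normalization and biases are omitted from the forward pass.
- Every convolution keeps the channel count C, so that branch outputs can be summed with the identity skip.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, 'same'-padded 2D convolution (cross-correlation, as in Keras).

    x: (H, W, C_in) feature map; w: (kh, kw, C_in, C_out) kernel; no bias.
    """
    kh, kw = w.shape[:2]
    top, left = (kh - 1) // 2, (kw - 1) // 2
    xp = np.pad(x, ((top, kh - 1 - top), (left, kw - 1 - left), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]  # (H, W, C_in) @ (C_in, C_out)
    return out


def conv_relu(x, w):
    # Convolution followed by ReLU; batch normalization is omitted in this sketch.
    return np.maximum(conv2d_same(x, w), 0.0)


def res_block(x, make_w, sizes=(3, 5, 7, 9)):
    """RES: four factorized branches plus an identity skip, summed,
    then 3x3 -> 3x3 -> 1x1 convolutions. `sizes` (N per branch) is hypothetical."""
    c = x.shape[-1]
    total = x.copy()  # fifth branch: input forwarded unchanged
    for n in sizes:
        b = conv_relu(x, make_w(n, 1, c, c))  # N x 1
        b = conv_relu(b, make_w(1, n, c, c))  # then 1 x N
        total = total + b
    y = conv_relu(total, make_w(3, 3, c, c))
    y = conv_relu(y, make_w(3, 3, c, c))
    return conv_relu(y, make_w(1, 1, c, c))


def wc_block(x, make_w, n=3):
    """WC: (N x 1 -> 1 x N) and (1 x N -> N x 1) branches, summed. n is hypothetical."""
    c = x.shape[-1]
    a = conv_relu(conv_relu(x, make_w(n, 1, c, c)), make_w(1, n, c, c))
    b = conv_relu(conv_relu(x, make_w(1, n, c, c)), make_w(n, 1, c, c))
    return a + b


def conv_params(kh, kw, c_in, c_out):
    """Trainable weights plus biases of one convolution layer."""
    return kh * kw * c_in * c_out + c_out
```

Consider C = 64 input and output channels. A single 7 × 7 convolution has `conv_params(7, 7, 64, 64)` = 200,768 parameters, whereas the 7 × 1 plus 1 × 7 pair has 57,472. This illustrates the parameter savings described for the cascaded layers in RES and WC.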

Customized Loss Function
One of the challenges of brain tumor segmentation is class imbalance. For reference, Table 1 shows the distribution of the classes in the BraTS training data. Healthy tissue covers 98.46% of the total area of a brain tumor MRI, the edema region covers 1.02%, and the enhancing tumor region covers 0.29%. The smallest share belongs to the non-enhancing tumor, at only 0.23%. This large difference severely affects segmentation performance. To address this problem, BU-Net uses a combined loss function that sums the weighted cross-entropy (WCE) and the Dice loss coefficient (DLC). In these loss terms, $N$ is the total number of labels and $w_j$ is the weight assigned to label $j$. Further, $p_j$ denotes the predicted binary pixel value of the segmented image, and $g_j$ denotes the corresponding ground-truth binary pixel value. The total loss function is thus $L = L_{WCE} + L_{DLC}$. The two terms serve different objectives. The DLC term maximizes the overlap between the ground-truth and predicted segmented regions regardless of class. The WCE term classifies the tissue cells according to their class.
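The text defines only the symbols of the two loss terms, so the sketch below assumes common forms for both:

- WCE is the pixel-averaged $-\sum_j w_j g_j \log p_j$ over one-hot labels.
- DLC is the soft Dice loss $1 - 2\sum p g / (\sum p + \sum g)$, pooled over pixels and labels.

The inverse-frequency weighting built from the Table 1 percentages is likewise a hypothetical choice for $w_j$, since the text does not state how the weights were set. All function names are illustrative.

```python
import numpy as np


def weighted_cross_entropy(p, g, w, eps=1e-7):
    """p, g: (num_pixels, N) predicted probabilities and one-hot ground truth;
    w: (N,) class weights. Returns the pixel mean of -sum_j w_j g_j log p_j."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(w * g * np.log(p), axis=-1)))


def dice_loss(p, g, eps=1e-7):
    """Soft Dice loss 1 - 2 sum(p g) / (sum p + sum g), pooled over pixels and labels."""
    inter = np.sum(p * g)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps))


def combined_loss(p, g, w):
    # Total loss L = L_WCE + L_DLC, as stated in the text.
    return weighted_cross_entropy(p, g, w) + dice_loss(p, g)


def inverse_frequency_weights(percentages):
    """Hypothetical weighting: w_j proportional to 1 / class frequency, summing to 1."""
    inv = 1.0 / np.asarray(percentages, dtype=float)
    return inv / inv.sum()
```

With the Table 1 shares (healthy 98.46%, edema 1.02%, enhancing 0.29%, non-enhancing 0.23%), `inverse_frequency_weights` gives the non-enhancing tumor the largest weight and healthy tissue the smallest. Rare classes therefore contribute more to the WCE term, while the Dice term is largely insensitive to the number of background pixels.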

Results and Discussion
We carried out quantitative and qualitative analyses. The quantitative analysis evaluates performance numerically, and the qualitative analysis examines the visual quality of the results.
To evaluate the performance of BU-Net, we used the Dice score as the figure of merit. Previous state-of-the-art techniques also report the Dice score, which allows a direct quantitative comparison between them and the proposed BU-Net architecture. The Dice score measures the similarity between sets P and Q and is expressed as $D(P, Q) = \frac{2|P \cap Q|}{|P| + |Q|}$, where $|P|$ and $|Q|$ are the cardinalities of sets P and Q, respectively. First, the proposed model was evaluated on the BraTS 2017 HGG dataset, which has 210 cases. Of these cases, 80% were used for training and the remaining 20% for testing. The training and testing cases are defined by the BraTS challenge. Table 2 compares the results achieved by BU-Net with those of existing techniques. All the architectures used similar cost functions, optimizers, and other settings. Compared with its baseline U-Net, BU-Net obtained gains of 7%, 6.6%, and 8.5% for whole, core, and enhancing tumor segmentation, respectively. In terms of Dice score, the proposed model also outperformed four existing state-of-the-art techniques that hold the best segmentation performance on HGG data.
For further evaluation, results were obtained on the whole BraTS 2017 dataset. In this experiment, 228 MRI scans were used for training and the remaining 57 for testing. Table 3 shows the results obtained by BU-Net and compares them with the best existing techniques. For enhancing tumor and tumor core, the best results in the literature are achieved by ResU-Net. Compared with ResU-Net, our proposed model shows performance increases of 0.3% and 0.5% for enhancing tumor and tumor core, respectively. The best results reported in the literature for the whole tumor are those of NovelNet. In this case as well, the proposed BU-Net shows an improvement of 1.6%.
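The Dice score $D(P, Q) = 2|P \cap Q| / (|P| + |Q|)$ can be computed per evaluated region with a short NumPy sketch. The sketch assumes the BraTS 2017/2018 label convention:

- 1 = necrosis and non-enhancing tumor;
- 2 = edema;
- 4 = enhancing tumor.

Under this convention, the whole tumor (WT) is the union of all three labels, the tumor core (TC) is labels 1 and 4, and the enhancing core (EC) is label 4. The function names are illustrative. So is the convention of scoring two empty masks as 1.0, which is not necessarily the rule used by the official BraTS evaluation.

```python
import numpy as np

# Assumed BraTS label values grouped into the three evaluated regions.
REGIONS = {
    "WT": (1, 2, 4),  # whole tumor
    "TC": (1, 4),     # tumor core
    "EC": (4,),       # enhancing core
}


def dice(pred_mask, true_mask):
    """Dice score 2|P & Q| / (|P| + |Q|) of two binary masks."""
    p = np.asarray(pred_mask, dtype=bool)
    q = np.asarray(true_mask, dtype=bool)
    denom = p.sum() + q.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as a perfect match (a convention)
    return float(2.0 * np.logical_and(p, q).sum() / denom)


def region_dice(pred_labels, true_labels):
    """Dice score per region for integer label maps of the same shape."""
    return {name: dice(np.isin(pred_labels, labels), np.isin(true_labels, labels))
            for name, labels in REGIONS.items()}
```

Because the regions are nested (EC ⊂ TC ⊂ WT), a single mislabeled enhancing-tumor pixel lowers all three scores. The same error weighs most heavily on EC, whose region is smallest.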
The performance gap between the state-of-the-art techniques and BU-Net indicates that the proposed model can effectively identify small tumor regions. The third experiment was carried out on the BraTS 2018 dataset, which contains 285 training samples and 66 testing samples. Table 4 compares the results of BU-Net and other state-of-the-art techniques on the BraTS 2018 validation dataset. BU-Net achieved Dice scores of 0.901, 0.837, and 0.788 for the whole, core, and enhancing tumor, respectively. The proposed model performed better than both the baseline U-Net architecture and the other existing state-of-the-art techniques. The higher Dice scores of BU-Net indicate a high overlap with the ground truth: the model identifies the majority of the area of every tumor type.
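Dice and intersection over union (IoU, or Jaccard index $J$) are distinct metrics. They are, however, monotonically related, so a higher Dice score always corresponds to a higher IoU. Write $I = |P \cap Q|$ and use $|P \cup Q| = |P| + |Q| - I$. This gives the conversion below. Applied to the reported BraTS 2018 Dice scores, it yields the approximate IoU values shown, which are derived arithmetically here and were not measured in the experiments.

```latex
D = \frac{2I}{|P| + |Q|} = \frac{2I}{|P \cup Q| + I} = \frac{2J}{1 + J}
\quad\Longleftrightarrow\quad
J = \frac{I}{|P \cup Q|} = \frac{D}{2 - D}

J_{\mathrm{WT}} = \frac{0.901}{1.099} \approx 0.820, \qquad
J_{\mathrm{TC}} = \frac{0.837}{1.163} \approx 0.720, \qquad
J_{\mathrm{EC}} = \frac{0.788}{1.212} \approx 0.650
```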
In segmentation research, qualitative analysis is as important as quantitative analysis, so we also carried out a qualitative analysis. Figure 5 shows a visual comparison between the ground truth and the regions segmented by BU-Net and U-Net for four different cases. As can be seen, the regions predicted by BU-Net closely resemble the ground truth. The visual quality of the U-Net and BU-Net predictions can also be compared directly. The U-Net prediction contains many regions falsely segmented as edema, whereas the BU-Net prediction closely resembles the ground truth. For the necrosis region, U-Net seems unable to identify the whole region, whereas BU-Net covers most of it. The close resemblance between the ground truth and the predictions of the proposed architecture indicates the high quality of BU-Net.

Conclusions
Brain tumor segmentation, which aims to identify tumors by segmenting them with artificial intelligence models, is difficult because of the complexity of brain MRI images. We propose BU-Net to segment and classify brain tumor regions. For accurate segmentation of brain tumors, BU-Net modifies the encoder-decoder architecture. It introduces two new blocks, residual extended skip (RES) and wide context (WC), into the existing U-Net architecture. Special attention is given to the contextual features of the MRI scans, which have proved beneficial for segmenting tumor regions. The RES block increases the valid receptive field, which improves overall performance. The proposed BU-Net architecture was evaluated on the BraTS 2017 and 2018 datasets. It exhibited improvements over the baseline U-Net architecture and other existing efficient segmentation models. The proposed model is a framework for brain lesion segmentation and contributes towards the precise segmentation of brain lesion regions. However, 2D U-Net suffers from information loss compared with 3D U-Net: because BU-Net processes slices independently, it loses local details and context information between different slices. In the future, the authors intend to explore 3D networks to improve segmentation performance.