RMU-Net: A Novel Residual Mobile U-Net Model for Brain Tumor Segmentation from MR Images

Gliomas are the most aggressive form of brain tumor and, when high grade, lead to a short life expectancy. Early detection of glioma is important for saving patients' lives. MRI is a commonly used approach for brain tumor evaluation. However, the massive amount of data produced by MRI prevents manual segmentation in a reasonable time, restricting the use of accurate quantitative measurements in clinical practice. An automatic and reliable method that can segment tumors accurately is therefore required. To achieve end-to-end brain tumor segmentation, a hybrid deep learning model, RMU-Net, is proposed. The architecture of MobileNetV2 is modified by adding residual blocks to learn in-depth features. This modified MobileNetV2 is used as the encoder in the proposed network, and the upsampling layers of U-Net are used as the decoder. The proposed model has been validated on the BraTS 2020, BraTS 2019, and BraTS 2018 datasets. RMU-Net achieved dice coefficient scores for WT, TC, and ET of 91.35%, 88.13%, and 83.26% on the BraTS 2020 dataset, 91.76%, 91.23%, and 83.19% on the BraTS 2019 dataset, and 90.80%, 86.75%, and 79.36% on the BraTS 2018 dataset, respectively. The proposed method outperforms previous methods while requiring less computational cost and time.


Introduction
A brain tumor is a mass of abnormal cells that grows within the brain. These abnormal cells can cause death if not detected in the early stages. Brain tumors are classified into two main types: benign, which are not cancerous, and malignant, which are a severe type of cancer. Gliomas are the most common kind of brain tumor in adults and are categorized into two grades: high-grade gliomas (HGG) proliferate rapidly, while low-grade gliomas (LGG) are slowly growing tumors [1]. Primary tumors start within the brain, arising from its cells and infiltrating the brain and the nervous system [2]. Secondary tumors begin in another part of the body and metastasize to the brain [3]. Tumors like meningiomas can be easily segmented, while glioblastomas and gliomas are challenging to find and localize due to high contrast and diffusion. In addition, their appearance varies in size, shape, and form, which makes them more challenging to detect. A study concluded that 25 people out of 100,000 have tumors, of which 33% were critical [4].
Medical images such as X-rays, CT images, and MRI are used for the diagnosis of diseases. Magnetic Resonance Imaging has been widely used to detect and treat brain

• To introduce a novel end-to-end brain tumor segmentation technique that performs classification at the pixel level.
• To extract discriminant features for a unique approach that gives adequate results in a clinical setting.
• To develop an efficient network that reduces the computational cost while maintaining high accuracy.
• To adapt MobileNetV2 [11] as the encoder of the U-Net architecture and identify the resulting changes in performance.

Materials and Methods
Deep learning has modernized machine learning and digital image processing. The power of deep learning methods lies in their ability to extract stable, discriminative, and useful semantic features from images. The word "deep" refers to the addition of more layers to increase the network size. Deep learning has driven advances in many fields, such as medical image analysis [12], security applications [13], and agriculture [14]. Convolutional neural networks (CNNs) are the most frequently used technique for image-based problems. In the proposed CNN model, various combinations of layers are used. A threshold-based deep learning model was proposed in [15], in which a multi-level neural network was used to diagnose glaucoma from fundus images. The dataset was collected locally and pre-processed using adaptive histogram equalization to remove noise. Two deep learning models were used: one, called detection-net, for glaucoma detection, and another for classifying images as affected or not affected by glaucoma. The network performed well compared to previous approaches. Ghulam Ali et al. [16] combined a deep LSTM with IoT sensors to detect the availability of parking slots. The Birmingham parking dataset was used to evaluate the model. Three experiments were performed based on different regions and times. The model outperformed state-of-the-art methods.
Well-known convolutional network models include U-Net [8], AlexNet [17], ResNet [18], VGG16 [19], DenseNet [20], and Inception [21]. This research aims to segment LGG (low-grade glioma) brain tumors from MRI images using three models. The first model is MobileNetV2 [11], released by Google, which belongs to a family of neural network architectures designed for machines with limited computational power, such as mobile devices. The complete architecture of MobileNetV2 is shown in Figure 1.
It provides promising accuracy while requiring less memory and computational power, which makes it a high-speed network for image processing tasks. MobileNetV2 is a lightweight convolutional neural network, attractive for two reasons: first, its number of trainable parameters is smaller than that of a network built from traditional convolutions; second, the smaller number of parameters reduces the computational cost.
In the second network, the standard encoder-decoder architecture of U-Net [8] was maintained, but the encoder was replaced with MobileNetV2. The upsampling part of U-Net is used as the decoder. The architecture of this MU-Net, with its encoder and decoder parts, is shown in Figure 2. Features are extracted from the input data with MobileNetV2 and passed to the decoder of MU-Net for the segmentation task.
The third model takes inspiration from the ResNet [18] deep learning model, in which a residual learning framework was used to train deeper models. In this model, residual blocks are added to the network architecture of MobileNetV2, as shown in Figure 3. In deeper networks the gradient can become very small; the residual connections counteract this, so the training error decreases as layers are added. This modified network is used as the encoder in the final proposed model, named RMU-Net. The main difference between this model and the standard U-Net architecture is its more complex system of skip connections. Residual blocks are also present in the encoder of RMU-Net, propagating information from deeper parts of the encoder up to the topmost layers. The features from the encoder are given to the decoder directly, without any additional connection. Moreover, deep supervision is also present; it is placed along the first skip connection.
The benefit of this approach with RMU-Net is that the blocks along the first concatenation produce full-resolution segmentation maps, consisting of upsampled feature data from the deeper layers of the encoder.
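The parameter reduction attributed to MobileNetV2 above comes from replacing standard convolutions with depth-wise separable ones. The following is a minimal arithmetic sketch of that saving, not the authors' code; the layer shape (3 × 3 kernel, 32 input channels, 64 output channels) is a hypothetical example and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
standard = standard_conv_params(3, 32, 64)         # 18432 weights
separable = depthwise_separable_params(3, 32, 64)  # 288 + 2048 = 2336 weights
print(standard, separable)
```

For this example layer the separable form needs roughly one eighth of the weights, which illustrates why an encoder built from such layers stays small.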
The goal of segmentation is to simplify or change the representation of an image into something meaningful and easier to analyze. The proposed network works in two phases; first, the network is trained to target three tumor classes: enhancing tumor (ET), whole tumor (WT), and tumor core (TC). The input of the network is an MRI image together with its corresponding label mask covering all three classes. All the image pixels are assigned to one of the three classes. The output of the network is three predicted masks, one for each class. The results are evaluated by comparing the actual mask and the predicted mask of each class using the dice coefficient score.
The BraTS 2020 dataset [22][23][24][25] is used in this research to evaluate the performance of the proposed network. There are 369 training, 125 validation, and 169 test multi-modal brain MR studies. T1-weighted (T1), post-contrast T1-weighted (T1ce), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR) sequences are included in each study, as shown in Figure 4. The size of all the MR images is 240 × 240 × 155. In addition, experts annotated the enhancing tumor (ET), the peritumoral edema (ED), and the necrotic and non-enhancing tumor core (NET) for each study. The annotations for the training studies are public, whereas the annotations for the validation and test studies are withheld for online evaluation and the final segmentation competition.

BraTS 2019
The BraTS 2019 dataset [22][23][24] consists of 259 HGG and 76 LGG MRI scans. The ground truth of all the images was created manually using the same annotation protocol. The annotations, approved by experienced neuroradiologists [25], contain the enhancing tumor (ET, label 4), the peritumoral edema (ED, label 2), and the necrotic or non-enhancing tumor core (NCR/NET, label 1). Figure 5 shows sample images from the BraTS 2019 dataset.
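The annotation labels (1, 2, 4) are not the same thing as the three evaluated regions (WT, TC, ET). The sketch below shows one way to derive binary region masks from a label map. It assumes the usual BraTS convention (WT = labels 1, 2, and 4; TC = labels 1 and 4; ET = label 4), which the text does not spell out; the function name and the toy label map are hypothetical.

```python
def brats_region_masks(label_map):
    """Turn a 2D BraTS label map into binary WT, TC, and ET masks."""
    # Assumed BraTS convention; not stated explicitly in the paper.
    region_labels = {
        "WT": {1, 2, 4},  # whole tumor: every tumor label
        "TC": {1, 4},     # tumor core: NCR/NET plus enhancing tumor
        "ET": {4},        # enhancing tumor only
    }
    return {
        name: [[1 if value in labels else 0 for value in row] for row in label_map]
        for name, labels in region_labels.items()
    }


# Toy 2 x 3 label map, made up for illustration (0 is background).
toy = [[0, 1, 2],
       [4, 2, 0]]
masks = brats_region_masks(toy)
print(masks["WT"], masks["TC"], masks["ET"])
```

Because the regions are nested, a pixel can belong to several masks at once, which is why the network output is described as three separate predicted masks.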

BraTS 2018
The BraTS 2018 challenge training dataset [22][23][24] consists of 210 HGG and 75 LGG scans. The validation dataset includes 66 different MRI scans. All MRI volumes of the BraTS 2018 dataset have a dimension of 240 × 240 × 155. The MRI volumes were segmented manually by one to four raters, and experienced neuroradiologists approved their annotations. Each tumor was segmented into edema, necrosis and non-enhancing tumor, and active/enhancing tumor. Sample images from the BraTS 2018 dataset are shown in Figure 6.

Evaluation Metrics
An essential part of evaluating a neural network's success is comparing segmented images with their ground truth to determine segmentation accuracy. The dice similarity coefficient (DSC) [26] is the most common evaluation measure for comparing a segmented image and its ground truth. It compares two sets, Q1 and Q2, by normalizing the size of their intersection over the average of their sizes:
\[ \mathrm{DSC}(Q_1, Q_2) = \frac{2\,|Q_1 \cap Q_2|}{|Q_1| + |Q_2|} \]
The Jaccard similarity coefficient (Jaccard) [26] is also an evaluation measure for segmentation methods. It measures the match between the two sets Q1 and Q2 by normalizing the size of their intersection over the size of their union:
\[ \mathrm{Jaccard}(Q_1, Q_2) = \frac{|Q_1 \cap Q_2|}{|Q_1 \cup Q_2|} \]
Sensitivity and specificity are statistical decision theory metrics. With TP, TN, FP, and FN denoting the numbers of true positive, true negative, false positive, and false negative pixels, they are determined using the following equations, respectively:
\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP} \]
We used the Jaccard score, dice coefficient score, sensitivity, and specificity to evaluate the performance of the proposed network.
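As an illustration of the four metrics named above, the following dependency-free sketch computes them for a pair of flattened binary masks. It is not the evaluation code used in the study; the function names and the toy masks are hypothetical, and reporting 1.0 for an undefined ratio is an assumption made here for convenience.

```python
def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) for two equally sized flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn


def _ratio(num, den):
    # Assumption: an undefined ratio (zero denominator) is reported as 1.0.
    return num / den if den else 1.0


def segmentation_scores(pred, truth):
    """Compute dice, Jaccard, sensitivity, and specificity for binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),  # 2|A∩B| / (|A| + |B|)
        "jaccard": _ratio(tp, tp + fp + fn),       # |A∩B| / |A∪B|
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy masks, made up for illustration: TP=2, FP=1, FN=1, TN=2.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
scores = segmentation_scores(predicted, reference)
print(scores)
```

In practice the scores would be computed once per region (WT, TC, ET) and then averaged over subjects.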

Model Training
After normalizing, cropping, and resampling the images, the next step was training the model to extract the multiclass tumor segments automatically. Samples were processed one by one rather than in batches due to the data's dimensionality. The training dataset is divided using an 80-20 train-test split. All three network models are trained for 200 epochs with a learning rate of 0.0001. The networks are trained with Adam [27], an adaptive first-order gradient-based optimization algorithm. The minibatch size is 16 image crops. We also use early stopping: training is terminated if there is no improvement on the validation data for ten epochs. We multiply the learning rate by a factor of 0.4 when the validation loss has not improved for five epochs. Unless otherwise specified, we use cross-entropy as the default loss function. Training takes 3 h 23 min for MobileNetV2, 2 h 57 min for MU-Net, and 2 h 47 min for the proposed model. The test speeds of MobileNetV2, MU-Net, and the proposed network are 3.5, 3.2, and 2.8 s per subject, respectively.
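The interaction between the learning-rate decay (factor 0.4, patience five epochs) and early stopping (patience ten epochs) can be sketched in a framework-independent way. The class below is a hypothetical illustration rather than the training code used here; it assumes that both rules monitor the validation loss and that the decay counter restarts after each decay.

```python
class PlateauController:
    """Track validation loss, decay the learning rate, and signal early stopping."""

    def __init__(self, lr=1e-4, factor=0.4, lr_patience=5, stop_patience=10):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.epochs_since_best = 0   # drives early stopping
        self.epochs_since_decay = 0  # drives learning-rate decay

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_since_best = 0
            self.epochs_since_decay = 0
        else:
            self.epochs_since_best += 1
            self.epochs_since_decay += 1
            if self.epochs_since_decay >= self.lr_patience:
                self.lr *= self.factor       # assumed: decay, then restart the counter
                self.epochs_since_decay = 0
        return self.epochs_since_best >= self.stop_patience


# Made-up validation losses: two improving epochs, then a plateau.
controller = PlateauController()
losses = [1.0, 0.9] + [0.95] * 10
for epoch, loss in enumerate(losses, start=1):
    if controller.step(loss):
        print(f"stopping at epoch {epoch}, learning rate {controller.lr:.2e}")
        break
```

Under these assumptions the learning rate is reduced twice during a ten-epoch plateau before training is stopped.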

Results
In this section, the performance of the proposed models is discussed. Several experiments were conducted to identify the improvements in the final model. A detailed description of the experiments is presented, including a summary of the investigations conducted in the research. After the best model configuration was selected, the results on the BraTS 2020, BraTS 2019, and BraTS 2018 datasets were obtained.

Pre-Processing
The first step in any data-driven study is to pre-process the raw images. First, the images of all three datasets are resized to 224 × 224 to serve as input to MobileNetV2. In every dataset, each subject contains four images with annotated masks. All the images are given to the networks by considering each image separately for the ET, TC, and WT classes.
First, the MobileNetV2 model discussed in the previous section is trained on the BraTS datasets. The results of the model are presented in Tables 1 and 2. MobileNetV2 achieves a lower dice coefficient score. However, its computational cost is much smaller, with 4.6 million trainable parameters and a model size of 53 MB. To increase the dice score, a hybrid deep learning model, MU-Net, is used, in which MobileNetV2 serves as the encoder for feature extraction. MobileNetV2 is a lightweight neural network that reduces the number of trainable parameters. The decoder part of U-Net is used for tumor segmentation. The results of this model are shown in Tables 3 and 4. MU-Net improves the results while keeping the number of parameters small.

Data Augmentation
To generate extra input samples for model training, data augmentation techniques are employed to create synthetic examples of real-world data. As stated in [28], the objective of data augmentation for datasets with limited data is to produce a more robust dataset for the model during training. This is generally helpful for training models on tasks where data are scarce, such as biomedical image segmentation. The original U-Net [8] proposal also made use of data augmentation techniques in this regard. The different types of augmentation used in this study are discussed below.

Scaling and Rotation
Deep neural network models can learn important deep features from a scaled version of the training set. The scaling operation G can be performed in different directions, with G_x and G_y representing the scaling factors for the X and Y directions. Because tumor sizes differ, scaling can generate viable augmented images for training. Scaling is combined with cropping to maintain the dimensions of the input image. Cropping can also restrict the image to only those parts that are necessary.
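A minimal sketch of scaling combined with cropping is given below. It is an illustration under stated assumptions, not the augmentation pipeline of this study: the function name is hypothetical, scaling is performed about the image center, nearest-neighbor sampling is used, and positions that fall outside the source image are filled with zeros.

```python
def scale_keep_size(image, gx, gy):
    """Scale a 2D image by (gx, gy) about its center while keeping its size.

    Factors above 1 zoom in and implicitly crop the borders; factors below 1
    shrink the content and leave a zero-filled margin.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            src_y = round(cy + (y - cy) / gy)
            src_x = round(cx + (x - cx) / gx)
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out


# Toy 4 x 4 image whose pixel value encodes its position (4 * row + column).
toy = [[4 * r + c for c in range(4)] for r in range(4)]
zoomed = scale_keep_size(toy, gx=2.0, gy=2.0)
print(zoomed)
```

With both factors set to 2, the output keeps the 4 × 4 shape but shows only the central 2 × 2 region of the input, enlarged. The same transform would have to be applied to the label mask so that image and annotation stay aligned.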

Flip and Rotation
Random flipping produces a mirror reflection of the original image along an axis. Natural images can usually be flipped along the horizontal axis but not the vertical axis, because the up and down components of an image are not always "interchangeable". A similar property applies to MRI brain images: in the axial plane the brain contains two hemispheres and can be considered anatomically symmetrical in most circumstances. Flipping along the horizontal axis swaps the left hemisphere with the right hemisphere and vice versa. Rotating an image by an angle around the central pixel can also be helpful. After rotation, appropriate interpolation is used to fit the original image size. In the rotation operation Z, zero padding is frequently applied to the missing pixels.
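The flip and the zero-padded rotation described above can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the function names are hypothetical, and nearest-neighbor sampling stands in for whatever interpolation was actually applied.

```python
import math


def flip_left_right(image):
    """Mirror a 2D image so that its left and right halves swap."""
    return [row[::-1] for row in image]


def rotate_zero_pad(image, angle_deg):
    """Rotate a 2D image about its center; uncovered pixels are set to zero."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            # Inverse rotation maps each output pixel back to its source location.
            src_x = round(cx + cos_a * dx + sin_a * dy)
            src_y = round(cy - sin_a * dx + cos_a * dy)
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out


# Toy 3 x 3 image, made up for illustration.
toy = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
mirrored = flip_left_right(toy)
half_turn = rotate_zero_pad(toy, 180)
print(mirrored, half_turn)
```

As with scaling, the label mask has to receive the same flip or rotation as the image.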
An ablation study was conducted to assess whether data augmentation was beneficial to the final model predictions, comparing two separate training runs. The results are presented in Tables 5 and 6. From the scores obtained, one can observe how data augmentation was beneficial across all the evaluation criteria and greatly improved the model's brain tumor segmentation capabilities.

Encoder Features with Residual Blocks
The second model is a more compact version of the proposed MU-Net model. This model follows the standard U-Net model closely with a minor change: residual blocks are added to the architecture of MobileNetV2, and the encoder of U-Net is replaced with this modified MobileNetV2 architecture. Two experiments were conducted with this model, one trained with augmentation and one without. The results are compared in Tables 7 and 8. The model trained with augmentation gave better results than the model trained without it.

Using Dropout Regularization
Dropout regularization is commonly used in CNNs to reduce the possibility of the model overfitting the training data. Overfitting causes the model to learn only features specific to the training data rather than generalize to new, unseen samples. In this experiment, we used the dropout value of 0.3 from the original online repository, with the results shown in Tables 9 and 10. The results show no substantial improvement in the model's predictions with this dropout value. For this reason, we decided not to use dropout regularization going forward. Because of experiment prioritization, only a single dropout test with a value of 0.3 was run. Additional testing with other dropout values is therefore encouraged, as it may yield more positive results.
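For readers unfamiliar with the mechanism, the sketch below shows what a dropout value of 0.3 means at the level of individual activations. It uses the "inverted dropout" formulation common in deep learning frameworks, which is an assumption here since the text does not say which variant was used; the function name is hypothetical.

```python
import random


def inverted_dropout(activations, rate=0.3, training=True, rng=random):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Seeded generator so that repeated runs give the same mask.
dropped = inverted_dropout([1.0] * 10, rate=0.3, rng=random.Random(0))
print(dropped)
```

With a rate of 0.3, about 30% of the activations are zeroed in each training step on average, and the rest are scaled by 1/0.7.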

Comparison of RMU-Net with other Deep Learning Segmentation Models
The proposed model is tested on an industrial machine, and its computational cost is compared with that of existing systems. RMU-Net is a deep, lightweight neural network built on depth-wise convolution. The depth-wise convolution used in the network is much faster than standard convolution. On the central processing unit (CPU) platform, separable convolution is generally quicker than traditional convolution. On both GPU and CPU platforms, the proposed RMU-Net performs well in segmentation time. However, the number of parameters affects the amount of computing resources used and the time it takes to train. RMU-Net's run time is also assessed on various hardware platforms, including two GPU platforms (GTX 1080Ti and GTX 745), a CPU platform (Intel i7), and an embedded platform. RMU-Net performs well on the GPU platform, taking 2 h 47 min. However, the CPU takes 47 h, while the embedded system takes 32 h to complete the process. The proposed technique is a lighter-weight model with a limited number of parameters and a small model size. The results obtained from different segmentation models are shown in Table 11, which includes a brief explanation, the trainable parameters, and the models' sizes.
In [29], Lucas Fidon introduced a 3D U-Net model to segment brain tumors. The author used the same model as before but experimented with non-standard loss functions such as the Wasserstein loss. Ranger, a non-standard optimizer, was adopted for optimization. Ranger is a more generalized version of Adam that works well with small batches and noisy labels. To find the best results, three deep learning models were trained using different optimizers. The BraTS 2020 dataset was used to test the model. The model had dice scores of 88.9%, 84.1%, and 81.4% for the whole tumor, tumor core, and enhanced tumor, respectively, and Hausdorff distances of 6.4, 19.4, and 15.8 for the same regions.
Yixin Wang et al. [30] proposed another automatic brain tumor segmentation approach based on modality-pairing learning. To extract complex information from several modalities, different layer connections were used. An average ensemble of all the models was used to obtain results, along with post-processing methods. The model performed well on the BraTS 2020 dataset, with dice scores of 89.1%, 84.2%, and 81.6% for the whole tumor, tumor core, and enhanced tumor, respectively. Haozhe Jia et al. [31] used H2NF-Net for the segmentation of brain tumors from multi-modal MRI images. To separate the distinct parts of the tumor, the authors employed single and cascaded networks and concatenated their predictions to reach the final segmentation result. The BraTS 2020 training and validation datasets were used to train and evaluate the model. By integrating the single and cascaded networks, the model attained dice scores of 78.75%, 91.29%, and 85.46% for the enhanced tumor, whole tumor, and tumor core, respectively, and Hausdorff distances of 26.57, 4.18, and 4.97.
A modified nnU-Net was proposed in [32] for the segmentation of brain tumors with data augmentation, post-processing, and region-based training. The model showed improved results with several minor modifications and achieved first place in the BraTS 2020 challenge. The dice scores of the model were 88.95%, 85.06%, and 82.03% for the whole tumor, tumor core, and enhanced tumor, respectively. Wenbo Zhang et al. [33] used a multi-encoder framework for brain tumor segmentation. In addition, the authors created a new loss function called categorical loss and assigned various weights to different segmented regions. The model was evaluated using the BraTS 2020 dataset. The method achieved dice scores of 70.24%, 88.26%, and 73.86% for the whole tumor, tumor core, and enhanced tumor. A deep neural network architecture for brain tumor segmentation that cascades three deep learning models was proposed in [34]. The output feature map of each stage was used as input to the next stage. The dataset used for this study was the publicly available BraTS 2020 dataset. The model achieved dice scores of 88.58%, 82.97%, and 79% for the whole tumor, tumor core, and enhanced tumor. Another modified U-Net architecture was proposed by Parvez Ahmad et al. [35] for automatic brain tumor segmentation. The authors extract multi-contextual features by using dense connections between encoder and decoder. In addition, local and global information was extracted with residual inception blocks. The authors validated the model on the BraTS 2020 dataset. The dice scores for the whole tumor, tumor core, and enhanced tumor were 89.12%, 84.74%, and 79.12%, respectively.
Henry et al. [36] trained multiple U-Net-like models with stochastic weight averaging and deep supervision on the multi-modal BraTS 2020 training dataset to make the process automated and standardized. Two different models were trained separately, and feature maps from both models were concatenated. The BraTS 2020 test dataset was used for testing the model, which achieved dice scores of 81%, 91%, and 95% for the enhanced tumor, whole tumor, and tumor core. Carlo Russo [37] used input data transformed into spherical space to extract better features than standard feature extraction methods. The spherical coordinate transformation was used as pre-processing to improve the accuracy of brain tumor segmentation on the BraTS 2020 dataset. The model achieved dice scores for the whole tumor, tumor core, and enhanced tumor of 86.87%, 80.66%, and 78.98%. In [38], the authors trained a two-dimensional network for the three-dimensional segmentation of a brain tumor. EfficientNet was used as the encoder, and the model achieved dice scores of 69.59%, 80.86%, and 75.20% for the enhanced tumor, whole tumor, and tumor core.
A multi-step deep neural network [39] was proposed that exploits the hierarchical structure of the brain tumor and segments its substructures. Deep supervision along with data augmentation techniques was used to overcome vanishing gradients and overfitting. The model was evaluated on the BraTS 2019 dataset, with dice scores of 88.6%, 81.3%, and 77.1% for the whole tumor, tumor core, and enhanced tumor. Wang et al. [40] proposed a 3D U-Net based deep learning model using brain-wise normalization and a patching method for brain tumor segmentation. The model was tested on the BraTS 2019 challenge dataset. The dice scores for the enhanced tumor, tumor core, and whole tumor are 77.8%, 79.8%, and 85.5%. In [41], a CNN model was trained on high-contrast images to improve the segmentation results of the sub-regions. A Generative Adversarial Network is used for synthesizing high-contrast images. The experiments were conducted on the BraTS 2019 dataset, showing that the high-contrast images give higher segmentation accuracy. The dice scores on the synthetic images are 76.65%, 89.65%, and 79.01% for the ET, WT, and TC, respectively.
An automated three-dimensional deep model [42] was proposed for the segmentation of gliomas in 3D pre-operative MRI scans; the model segments the tumor and its subregions. One deep learning model learns the local features of the input data, and another model extracts the global features from the whole image. The outputs of both models are ensembled to develop a more accurate learning process. The model is trained on the BraTS 2019 dataset and gives promising segmentation results. In [43], 3D semantic segmentation with a convolutional neural network and an encoder-decoder architecture is used to improve the segmentation results. The method is evaluated on the BraTS 2019 dataset, on which it achieved dice scores for the ET, WT, and TC classes of 82.6%, 88.2%, and 83.7%, respectively. The segmentation results on the testing dataset were 0.82, 0.72, and 0.70 for the whole tumor, tumor core, and enhanced tumor.
A two-step approach [44] for brain tumor segmentation was proposed using two different 3D U-Net models. The first model locates the tumor, and the second model segments the detected tumor into subregions. The segmentation results for the ET, WT, and TC classes are 62.1, 84.4, and 72.8, respectively. An automated 2D brain tumor segmentation method was proposed in [45]. The network architecture used in this work was a modified U-Net architecture for improving the segmentation results. To address the class imbalance problem, weighted cross-entropy and the generalized dice score were used as loss functions. The proposed segmentation system was tested on the BraTS 2018 dataset, on which it achieved dice scores of 78.3%, 86.8%, and 80.5%, respectively. Another modified 3D U-Net architecture [46] was introduced with an augmentation technique to handle MRI input data. The quality of the tumor segmentation was enhanced with context obtained from models of the same network. A cascade of CNN networks [47] for the segmentation of brain tumors from MRI images was introduced as a trade-off between computational cost and model complexity. Experiments with the BraTS dataset showed that the model achieved dice scores for WT, ET, and TC of 90.5%, 78.6%, and 83.8%, respectively.
A similar model was proposed by AMADEUS et al. [49], who introduced an approach that can be used on low-resource devices such as mobile devices. Two different models were introduced, both of which used three convolutional layers to decrease the computational cost. Batch normalization, residual layers, and depthwise separable convolution layers were used to preserve the features and reduce the number of operations. These models were tested on ImageNet, CIFAR 10, CIFAR 100, and some other datasets. The input size was 32 × 32, with a total of 907,449 trainable parameters. In contrast, the proposed model applies the modified architecture of MobileNetV2 with additional residual blocks as an encoder, integrated with the U-Net decoder for the segmentation task. The input size is 224 × 224, with a total of 4.6 million trainable parameters.
The model proposed in this study uses an encoder-decoder architecture for brain tumor segmentation. The encoder uses the modified MobileNetV2 to extract features from MRI images. These feature maps are given as input to the decoder part of U-Net [8] for the segmentation task. The model is evaluated using the BraTS 2020 dataset. The model contains 4.6M parameters and has a size of 53 MB. Experiments show that the model achieved dice coefficient scores for WT, TC, and ET of 91.35%, 88.13%, and 83.26% on BraTS 2020, 91.76%, 91.23%, and 83.19% on BraTS 2019, and 90.80%, 86.75%, and 79.36% on BraTS 2018, respectively. Thus, RMU-Net is an improved method for brain tumor segmentation with fewer computational parameters while maintaining high accuracy.

Inference Time of Android Application
All the deep learning models trained in this work are tested in an Android application. The mobile device used runs Android 11 with MIUI 12 and has an octa-core CPU, an Adreno 618 GPU, and 8 GB of RAM. The models are compared by their inference time for a prediction. The results of the proposed models are shown in Figure 7. The time taken by each model for a single prediction is represented by the blue bar. The results show that the proposed network is faster on the Android platform than the other models.
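The measurement pattern behind such a comparison can be sketched as follows. The actual timings were taken inside an Android application; this Python snippet only illustrates how a single-prediction time might be measured, and `time_single_prediction` and `dummy_predict` are hypothetical stand-ins for the real harness and model call.

```python
import statistics
import time


def time_single_prediction(predict, sample, warmup=3, repeats=20):
    """Return the median wall-clock time in seconds of one predict(sample) call."""
    for _ in range(warmup):
        predict(sample)  # warm-up calls are excluded from the measurement
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_predict(sample):
    """Stand-in for a real model call; it only doubles each input value."""
    return [value * 2 for value in sample]


elapsed = time_single_prediction(dummy_predict, list(range(1000)))
print(f"median time per prediction: {elapsed:.6f} s")
```

Using the median over several repeats, after a few warm-up calls, makes the figure less sensitive to one-off delays such as model loading.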

Conclusions
Brain tumor segmentation is an essential step in diagnostic procedures. Accurate segmentation not only makes medical diagnosis easier but also greatly improves the subject's chances of survival. In this research, an efficient deep learning model for brain tumor segmentation, RMU-Net, is proposed, inspired by MobileNetV2 and U-Net. RMU-Net is evaluated on the BraTS 2020, BraTS 2019, and BraTS 2018 datasets. Compared with other deep learning models, RMU-Net has fewer parameters and achieved dice coefficient scores for WT, TC, and ET of 91.35%, 88.13%, and 83.26% on BraTS 2020, 91.76%, 91.23%, and 83.19% on BraTS 2019, and 90.80%, 86.75%, and 79.36% on BraTS 2018, respectively. However, training RMU-Net requires a large amount of manually annotated brain tumor data; therefore, developing weakly supervised and unsupervised brain tumor segmentation methods will be the direction of future research.