1. Introduction
Brain tumors are among the most dangerous cancers in the world. A tumor appears when a certain type of brain cell, known as a malignant cell, begins to grow out of control. Over the past 30 years, the number of patients diagnosed with brain cancer has increased significantly, affecting many people throughout the world and leading to a high risk of mortality, with 241,037 cases in 2018. In 2012, eight million people died from cancer, taking all types of cancer together, while in 2013, six million people worldwide died of brain tumors.
It is very important to diagnose brain cancer at an early stage, as this allows for therapy and enhances the rate of survival. Brain tumor treatment options depend on the location, type and size of the tumor and may involve radiotherapy, surgery, chemotherapy or a combination of these options. Medical imaging is used to verify the presence and show certain characteristics of different types of brain tumors. There is a multitude of medical imaging modalities, including magnetic resonance imaging (MRI) and computerized tomography (CT), which are the most common ones used to explore brain cancer.
The efficient classification and segmentation of tumors from surrounding brain tissues is a crucial task. In fact, an essential step is to exclude normal tissues by segmentation and extract more relevant characteristics of lesions for a better diagnosis. However, segmentation is a difficult task due to the wide variations in size, texture, shape and location of brain lesions.
For clinical diagnosis, appropriate classification and segmentation of medical images are necessary. Because tumor segmentation in MRI images is a complicated process, several algorithms and methods have been presented for manual, semi-automated and fully automated tumor segmentation. Manual segmentation performed by a radiologist is considered the gold standard. However, expert segmentation is not very precise and is subject to inter-observer variability. It is also time-consuming, as it involves visualizing spatial and temporal profiles and thus examining many enhanced datasets and pixel profiles while determining the lesion boundary. The best solution offered by computer vision is therefore to employ fully automated systems using machine learning techniques.
Therefore, numerous machine learning approaches have been applied effectively to recognize brain tumors. The most popular and well-known supervised classifiers used to classify gliomas are random forests (RFs) and support vector machines (SVMs). Lefkovits et al. [1] built a model using an RF classifier after extracting first-order operators (mean, standard deviation, maximum, minimum, median, Sobel, gradient), higher-order operators (Laplacian, difference of Gaussians, entropy, curvatures, kurtosis, skewness), texture features (Gabor filter) and spatial context features; all these features were then analyzed with an appropriate attribute selection procedure to retain the most important variables. Szabo et al. [2] proposed a method to segment low-grade gliomas in MRI images; they extracted 104 morphological and Gabor wavelet features, employing an RF as a classifier and neighborhood-based post-processing for output regularization. Zhang et al. [3] presented a method divided into three main steps: pre-processing and feature generation (minima, maxima, average, median, gradient, Gabor wavelet features); training an RF to separate normal pixels from positive ones; and post-processing with a morphological phase to regularize the shape of detected lesions. Bahadure et al. [4] proposed a method that combined the Berkeley wavelet transform, to convert the spatial form into the temporal domain frequency, with an SVM classifier. Ayachi et al. [5] transformed the segmentation problem into a classification problem; they classified pixels into normal and abnormal ones based on several intensity and texture features and employed an SVM as the classification algorithm. Kwon et al. [6] proposed a spatial probability map for each tissue type, in which all the different tissues in a patient's brain are segmented. Menze et al. [7] also used spatial regularization with a generative probabilistic model, in which a healthy brain atlas and a latent brain tumor atlas were combined to segment brain tumors in a series of image sequences. Jayachandran et al. [8] classified MRI images as normal and abnormal using a fuzzy logic-based hybrid kernel SVM. A classification study of tumors using Gabor wavelet analysis was conducted by Liu et al. [9]; Gabor filters were used to extract the features, and an SVM classifier was adopted to classify the tumor.
Deep learning stands out as an ideal solution, since it can extract more prominent features from the whole image than manually defined features can.
The most frequently adopted segmentation approaches based on deep learning require masked images representing the expected result. Certainly, these labels help to guide the learning process in the segmentation task. On the other hand, their preparation remains time-consuming, and the expert's subjectivity presents another problem. To overcome these problems, we propose a tumor segmentation approach based on a CNN architecture that does not use masked images. After predicting the existence of a tumor with a CNN architecture trained without image-form labels, using instead labels in the form of two numbers (0 or 1), we construct an image from a combination of the gradients of the last feature layer. We compute the gradient of each image filter extracted from the last layer and store the mean and the maximum of each one in two different vectors; we then multiply those vectors with all the filters component by component (component1 × filter1, component2 × filter2, …, component32 × filter32) to obtain the mask, apply a color map and finally post-process to generate the segmented image.
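This construction can be read as a Grad-CAM-style weighting of the last convolution layer. The following minimal sketch illustrates the idea in Keras/TensorFlow; the layer name, the 'tumor' output index and the way the mean- and max-weighted maps are combined (here, summed) are our own assumptions, not the exact implementation.

```python
import numpy as np
import tensorflow as tf

def gradient_mask(model, image, last_conv_name="conv2d_last"):
    """Build a tumor heatmap from the gradients of the last conv layer."""
    # Map the input to the last convolution features and the class scores.
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        features, scores = grad_model(image[np.newaxis, ...])
        tumor_score = scores[:, 1]                   # assumed 'tumor' neuron
    grads = tape.gradient(tumor_score, features)[0]  # (H, W, 32) gradients
    fmaps = features[0]                              # (H, W, 32) filters
    # Mean and maximum of each filter's gradient, stored in two vectors.
    mean_vec = tf.reduce_mean(grads, axis=(0, 1))    # (32,)
    max_vec = tf.reduce_max(grads, axis=(0, 1))      # (32,)
    # Multiply the vectors with the filters component by component
    # (component1 x filter1, ..., component32 x filter32) and combine.
    mask = tf.reduce_sum(fmaps * mean_vec + fmaps * max_vec, axis=-1)
    mask = tf.nn.relu(mask)
    return (mask / (tf.reduce_max(mask) + 1e-8)).numpy()  # heatmap in [0, 1]
```

A color map and the post-processing described in Section 4.2 would then be applied to this heatmap.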
This article is organized as follows. Related work is provided in Section 2. The proposed method is described in Section 3. The experimental setup is introduced in Section 4. We summarize our results and then discuss them in Section 5. Finally, after the conclusion in Section 6, we present our plans for future research.
2. Related Work
Research in the field of tumor segmentation is still active. Recently, deep learning has proven its performance in medical image analysis and retrieval [10,11]. Pixel-based segmentation is a new trend in deep learning methods [12].
The methods cited in this section all use CNNs. Lyksborg et al. [13] proposed a binary CNN to identify the complete tumor, with a cellular automaton smoothing the segmentation before a multi-class CNN discriminates the sub-regions of the tumor. Pereira et al. [14] employed an automatic segmentation method in which they investigated the use of intensity normalization as a pre-processing step; although not common in CNN-based segmentation methods, it proved, together with data augmentation, to be very effective for brain tumor segmentation in MRI images.
In addition, Havaei et al. [15] presented a novel CNN architecture that exploits both local features and more global contextual features simultaneously; they explored a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN. Moreover, Madhupriya et al. [16] used a CNN and a probabilistic neural network based on a comparison of various models; they devised an architecture with both 3 × 3 and 7 × 7 kernels in an overlapped manner and built a cascaded architecture. Zhao et al. [17] proposed a method that integrates a fully convolutional neural network (FCNN) and conditional random fields (CRFs), rather than adopting CRFs as a post-processing step of the FCNN. A cascade of FCNNs was proposed by Wang et al. [18] to segment multi-modal MRI images into hierarchical regions: whole tumor, tumor core and enhancing tumor. The cascade decomposes the multi-class segmentation problem into a sequence of three binary segmentations; the networks consist of multiple layers of anisotropic and dilated convolution filters and are combined with multi-view fusion to reduce false positives. The method proposed by Zhao et al. [19] segments image slices using deep learning models that integrate FCNNs and CRFs as recurrent neural networks from the axial, coronal and sagittal views, respectively, fusing the segmentation results obtained in the three views. Dong et al. [20] developed a novel 2D fully convolutional segmentation network based on the U-Net [21] architecture; to boost the segmentation accuracy, a comprehensive data augmentation technique was used in this work, and a 'soft' Dice-based loss function was applied. Sajid et al. [22] proposed a hybrid convolutional neural network (HCNN) architecture that uses a patch-based approach and takes both local and contextual information into account when predicting the output label; the proposed network deals with overfitting by utilizing a dropout regularizer alongside batch normalization. Meanwhile, Thaha et al. [23] developed an automatic segmentation method with skull stripping and image enhancement used in pre-processing and an HCNN used for segmentation, with the loss function optimized by the Bat algorithm. Concerning 3D CNN methods, Kamnitsas et al. [24] used an 11-layers-deep multi-scale 3D CNN; the architecture consists of two parallel convolutional pathways that process the input at multiple scales to achieve a large receptive field for the final classification while keeping the computational cost low. Mengqiao et al. [25] proposed an approach based on a 22-layers-deep three-dimensional CNN; they used several cascaded convolution layers with small kernels to build a deeper architecture.
Methods that use autoencoder architectures include that of Myronenko [26], who proposed an encoder–decoder-based CNN architecture with an additional branch at the encoder endpoint to reconstruct the original image, similar to an autoencoder. The motivation for the autoencoder branch was to add guidance and regularization to the encoder part, since the training dataset size is limited; a variational autoencoder approach was used to better cluster the features of the encoder endpoint. A novel architecture named the residual cyclic unpaired encoder–decoder network (RescueNet) was proposed by Nema et al. [27] for brain tumor segmentation; they trained RescueNet with unpaired generative adversarial networks and used a scale-invariant post-processing algorithm to enhance the accuracy.
Table 1 shows the performance results of the related works.
4. Experimental Setup
4.1. Dataset
BraTS is a brain tumor image segmentation challenge. It is organized in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). Most of the state-of-the-art brain tumor segmentation methods have been evaluated on this benchmark.
The proposed method was tested and evaluated on the BraTS 2017 dataset; the training set contained scans from 210 high-grade glioma (HGG) and 75 low-grade glioma (LGG) patients.
Multimodal MRI data were available for every patient in the BraTS 2017 dataset, and four MRI scanning sequences were performed for each patient: T1, T1c, T2 and FLAIR. The BraTS 2017 validation and testing sets contained images from 46 and 146 patients, respectively, with brain tumors of unknown grade.
For each patient, the T1, T2 and FLAIR images were co-registered to the T1c data, which had the finest spatial resolution, and then resampled and interpolated to an image size of 240 × 240 × 155.
The ground truth for the training set was obtained only from manual segmentations provided by experts.
4.2. Implementation Details
The algorithm was implemented using the Keras library in Python. Keras is a high-level library for implementing neural networks; it can run on top of either the Theano or TensorFlow framework and supports both GPU and CPU processing.
Hyper-parameters were tuned using grid search, and the parameters with which the model performed best on the validation data were selected. First, to read the BraTS 2017 MRI images, which are in the NIfTI format, we used SimpleITK, an open-source multi-dimensional image analysis library in Python for image registration and segmentation.
Next, we chose the FLAIR modality for each image, cropping each image to a size of (192, 152, 3) instead of the original (240, 240, 3); we also chose slice number 90 of the 155 slices.
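As an illustration, a minimal sketch of this acquisition step is given below. SimpleITK returns the BraTS volume as a (155, 240, 240) array; the crop offsets and the file name are our own assumptions, since the exact crop window is not stated above.

```python
import numpy as np
import SimpleITK as sitk

def load_flair_slice(path, slice_index=90):
    """Read a BraTS NIfTI volume and return one cropped, 3-channel axial slice."""
    volume = sitk.GetArrayFromImage(sitk.ReadImage(path))  # (155, 240, 240)
    slice_2d = volume[slice_index]                         # slice number 90
    cropped = slice_2d[24:216, 44:196]                     # 240 x 240 -> 192 x 152
    return np.stack([cropped] * 3, axis=-1)                # shape (192, 152, 3)

# Hypothetical file name following the BraTS 2017 naming convention:
flair = load_flair_slice("Brats17_TCIA_101_1_flair.nii.gz")
```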
After the image acquisition step and before training our CNN model, we augmented (Section 3.2) and pre-processed (Section 3.1) our data for better performance.
For the classification task, we assigned a 'tumor' or 'not tumor' label to each image based on the ground truth, in order to simplify the subsequent segmentation task and avoid segmenting images that contain no tumor. We therefore focused on this part and tried to eliminate as many false positives as possible, so as to obtain a good result during segmentation.
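A sketch of this labeling rule, assuming the expert mask is loaded as a NumPy array whose tumor pixels are non-zero:

```python
import numpy as np

def classification_label(ground_truth_slice):
    # 1 = 'tumor' if the expert mask contains any tumor pixel, otherwise 0.
    return int(np.any(ground_truth_slice > 0))
```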
The training dataset was divided randomly into training and testing sets with a 70:30 ratio. The convolution layer kernels were initialized randomly with bias values set to zero, and the stride for all max pooling and convolution layers was set to two and one, respectively, to produce translation-invariant feature maps. The best parameters for the proposed method are shown in Figure 4 and Table 3. The loss function used for our model was binary cross-entropy, computed as the following average:

$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right],$$

where $\hat{y}_i$ is the i-th scalar value in the model output, $y_i$ is the corresponding target value, and $N$ (the output size) is the number of scalar values in the model output.
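For reference, this average can be written directly in NumPy (a didactic sketch; in practice the built-in Keras binary cross-entropy loss was used):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over the scalar values of the model output."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

print(binary_crossentropy(np.array([0.0, 1.0]), np.array([0.1, 0.8])))  # ~0.164
```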
To minimize this loss function, we used the Adam optimizer with an initial learning rate $\alpha_0$ and progressively decreased it according to

$$\alpha = \alpha_0 \left(1 - \frac{e}{N_e}\right)^{0.9},$$

where $e$ is the epoch counter and $N_e$ is the total number of epochs; in our case, the maximum number of epochs was 45, and the batch size in every epoch was 20.
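In Keras, such a schedule can be wired in with a LearningRateScheduler callback. In the sketch below, the initial rate of 1e-4 and the decay form are our assumptions, since the exact values were not preserved above:

```python
from tensorflow import keras

N_EPOCHS = 45        # total number of epochs N_e
ALPHA_0 = 1e-4       # assumed initial learning rate alpha_0 (not stated above)

def decayed_lr(epoch, lr):
    # alpha = alpha_0 * (1 - e / N_e) ** 0.9, with e the epoch counter.
    return ALPHA_0 * (1.0 - epoch / N_EPOCHS) ** 0.9

optimizer = keras.optimizers.Adam(learning_rate=ALPHA_0)
lr_schedule = keras.callbacks.LearningRateScheduler(decayed_lr)
# Pass to training: model.fit(x, y, epochs=N_EPOCHS, batch_size=20,
#                             callbacks=[lr_schedule])
```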
In the segmentation part, precisely in the post-processing step shown in Figure 6, we configured some parameters in the thresholding and opening steps.
Since no fixed threshold suits every image, owing to the variation in pixel intensities, we chose several thresholds and tested the performance of each one by calculating the similarity coefficient between the tumor segment produced by the current threshold and the ground truth, in order to choose the best threshold.
We tested more than 22 thresholds for each grayscale image.
For the opening stage, we used a small 3 × 3 kernel to delete insignificant pixels without affecting the tumor.
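A minimal sketch of this per-image threshold search and opening, assuming the heatmap and ground truth are 2D NumPy arrays and using OpenCV for the morphology (the threshold grid itself is illustrative):

```python
import cv2
import numpy as np

def postprocess(heatmap, ground_truth, thresholds=np.linspace(0.05, 0.95, 22)):
    """Keep the threshold whose binary mask best matches the ground truth
    (Dice), then clean the result with a 3 x 3 morphological opening."""
    gt = (ground_truth > 0).astype(np.uint8)
    best_mask, best_dice = None, -1.0
    for t in thresholds:
        mask = (heatmap >= t).astype(np.uint8)
        inter = float(np.sum(mask & gt))
        dice = 2.0 * inter / (mask.sum() + gt.sum() + 1e-8)
        if dice > best_dice:
            best_dice, best_mask = dice, mask
    kernel = np.ones((3, 3), np.uint8)   # small kernel: removes isolated pixels
    return cv2.morphologyEx(best_mask, cv2.MORPH_OPEN, kernel)
```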
4.3. Performance Evaluation
The experimental results were evaluated using different types of performance indicators: precision, recall and accuracy for the classification task and the Dice similarity coefficient (DSC) for the segmentation task. In the following, TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, respectively.
Precision: the percentage of predicted positive results that are relevant, defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$

Recall: the percentage of the total relevant results correctly classified by the proposed algorithm, defined as

$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$

Accuracy: formally, accuracy has the following definition:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

The DSC represents the overlap of the predicted segmentation with the manually segmented output label and is computed as

$$\mathrm{DSC} = \frac{2\,|G \cap S|}{|G| + |S|},$$

where G and S stand for the output label and the predicted segmentation, respectively.
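These four indicators can be computed from binary masks as follows (a self-contained sketch consistent with the definitions above):

```python
import numpy as np

def evaluate(pred, target):
    """Precision, recall, accuracy and DSC for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    tn = np.sum(~pred & ~target)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)   # equals 2|G ∩ S| / (|G| + |S|)
    return precision, recall, accuracy, dice
```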
6. Conclusions
In this article, we proposed an approach based on a CNN architecture to simultaneously predict and segment a cerebral tumor. In this process, an MRI image was pre-processed and augmented using normalization and data augmentation techniques. The MRI image was then classified as a tumor or non-tumor brain image by a CNN model with two neurons in the output layer; for this task, we used the ground truth to label the images as tumor or non-tumor. Segmentation was applied to the images containing a tumor, using the features extracted from the last convolution layer of our CNN architecture and their gradients. Finally, we applied post-processing to improve our results.
The strength of our approach is that it dispenses with the intervention of a specialist to manually locate the tumor pixel by pixel, which is a complex and time-consuming task.
Our method solves these problems by using the features extracted from the CNN architecture, independently of a ground truth delineated manually by specialists. The experimental results show good performance and significant results compared with existing methods; the compared metrics were precision, recall and accuracy for the binary classification and the Dice coefficient score for the segmentation task.
Future work will be devoted to improving these results and using deeper architectures to increase the performance of the segmentation output.