1. Introduction
Brain tumor segmentation from MR images (MRIs) is an important step toward clinical assessment, determining treatment strategies, and performing further tumor tissue analysis. Many automatic methods have been successfully used for tumor segmentation. However, most of these methods require tumor annotations by medical experts, which is a time-consuming process; moreover, manual annotation is prone to intra- and inter-observer variability [1,2]. Recently, deep learning methods have drawn much attention for tumor segmentation when a large training dataset is available. Among these methods, U-Net [3] and its variants [4,5] were the most frequently reported, owing to their good performance on medical image segmentation. Wang et al. [6] proposed brain-wise normalization and two patching strategies for training a 3D U-Net. Kim et al. [7] introduced a two-step setup for the segmentation task, wherein an initial segmentation map obtained from 2D U-Nets is used, together with the MRIs, by a 3D U-Net to produce the final segmentation map. Shi et al. [8] used an increased number of channels in their proposed U-Net, which is capable of extracting rich and diverse features from multi-modality scans. Other deep learning methods, such as CNNs [9,10,11], were also shown to be useful. For example, Sun et al. [12] proposed a computationally efficient custom-designed CNN with a reduced number of parameters. Das et al. [13] used 3D CNNs in a cascaded format, first extracting the whole tumor, followed by the tumor core and then the enhancing tumor core. Shan et al. [14] proposed a lightweight 3D CNN with improved depth and used multi-channel convolution kernels of different sizes to aggregate features. Ramin et al. [15] used a cascaded CNN to speed up learning. However, these deep learning approaches typically require fully annotated tumors for training, and manually annotating tumors in training datasets is a time-consuming process.
There exist many successful studies on non-medical images in computer vision where information has been acquired from weakly annotated images, e.g., bounding boxes [16,17,18] and image-level and point-level labels [19,20,21], among many others. Rectangular bounding boxes were used for object detection and tracking based on Riemannian manifold learning of dynamic visual objects [22,23]. In medical applications, however, such approaches have yet to be fully exploited. Zhang et al. [24] proposed a semi-supervised method that exploits information from unlabeled data by estimating segmentation uncertainty in predictions, and Luo et al. [25] used a dual-task deep network to predict a segmentation map and geometry-aware level set labels. Ali et al. proposed the use of rectangular [26] and elliptical [27] bounding-box tumor regions for tumor classification. Pavlov et al. [28] used ResNet50 for segmentation with both tumor ground truth and image-level annotations. Zhu et al. [29] developed a segmentation method guided by image-level class labels on 3D cryo-ET images. Xu et al. [30] suggested a method called “3D-BoxSup” that uses 3D bounding box labels for MRI brain tumor segmentation, with relatively low performance (dice score = 0.62 on the MICCAI’17 dataset). This was probably because 3D models require more training data and because bounding boxes alone are not sufficient to estimate irregular tumor shapes. It is worth noting that although bounding box areas are widely used for training machine learning/deep learning networks for object tracking in computer vision, they are rarely used for medical MR image segmentation. Possible reasons are that MR images are very different from natural visual images, and that the lack of shared medical expertise widens the gap between the medical research and computer vision communities.
Motivated by the above issues, we propose to perform tumor segmentation by training the deep network on tumor ellipse box areas instead of MRIs with annotated tumors. The main aims of this study are 1) to investigate whether the paradigm of brain tumor segmentation, based primarily on using large numbers of ellipse box areas for tumors in MR images plus a small number of annotated tumor patients, is feasible, and 2) to answer the question of what price one needs to pay when replacing annotated MRIs with ellipse box areas for training the network in brain tumor segmentation. Because U-Net has demonstrated excellent performance for medical image segmentation, a multi-stream U-Net (an extension of U-Net) is employed in our case studies, wherein combined features from multiple MRI modalities are explored. The main contributions of this paper are as follows.
We study the feasibility of training the deep network for brain tumor (glioma) segmentation on 2D ellipse box areas plus a small number of annotated tumors.
We use a multi-stream U-Net for our experiments, which is an extended version of the conventional U-Net.
We conduct studies on two scenarios: (a) if the training dataset is large/moderate, learning is conducted by pre-training on a large amount of foreground (FG) and background (BG) ellipse areas, followed by refined training on a small number of annotated tumor patients (<20); and (b) if the training dataset is small, learning is conducted in a fashion similar to transfer learning.
We evaluate the performance of the proposed approach and compare the performance with the same network trained entirely by using annotated MRIs.
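To make the multi-stream idea concrete, the sketch below shows one way a multi-stream U-Net can be organized in PyTorch: each MRI modality gets its own encoder stream, the streams are fused by channel concatenation, and a shared decoder with skip connections produces the segmentation map. This is a minimal illustration with hypothetical layer sizes, depth, and fusion points; the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MultiStreamUNet(nn.Module):
    """Illustrative multi-stream U-Net: one encoder stream per modality,
    fusion by channel concatenation, shared decoder with skip connections."""
    def __init__(self, n_modalities=4, base=16, n_classes=2):
        super().__init__()
        self.enc = nn.ModuleList(
            [conv_block(1, base) for _ in range(n_modalities)])
        self.deep = nn.ModuleList(
            [conv_block(base, 2 * base) for _ in range(n_modalities)])
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(n_modalities * 2 * base, 4 * base)
        self.up = nn.ConvTranspose2d(4 * base, 2 * base, 2, stride=2)
        self.dec = conv_block(2 * base + n_modalities * base, 2 * base)
        self.head = nn.Conv2d(2 * base, n_classes, 1)

    def forward(self, x):  # x: (B, n_modalities, H, W), H and W even
        skips, deeps = [], []
        for m in range(len(self.enc)):
            f = self.enc[m](x[:, m:m + 1])            # per-modality features
            skips.append(f)
            deeps.append(self.deep[m](self.pool(f)))  # downsampled features
        b = self.bottleneck(torch.cat(deeps, dim=1))  # fuse modality streams
        u = self.up(b)                                # back to full resolution
        d = self.dec(torch.cat([u] + skips, dim=1))   # skip connections
        return self.head(d)                           # per-pixel class logits
```

Feeding a batch of four-modality slices, e.g. `MultiStreamUNet()(torch.randn(1, 4, 32, 32))`, yields per-pixel class logits of shape `(1, 2, 32, 32)`.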
The remainder of the paper is organized as follows. Section 2 describes the proposed method in detail, including the framework for case studies, the FG–BG ellipse area definition, the multi-stream U-Net, training strategies for large/medium and small datasets, and several other issues. Section 3 gives experimental results, performance evaluation, and comparison, followed by the conclusions in Section 4.
4. Conclusions
Many medical datasets lack annotated tumors because tumor annotation is a time-consuming process for medical experts. We conducted a feasibility study on two datasets (with the glioma tumor type) by using ellipse box tumor areas for initial training on the majority of the training data, followed by refined training using annotated tumor MRIs from a small number of patients (<20). Experiments have shown good tumor segmentation results, evaluated purely on tumor areas, in terms of dice score (0.8407, 0.9104) and average accuracy (83.88%, 88.47%) for the MICCAI and US datasets, respectively, which demonstrates that the proposed approach is feasible when using a large amount of unannotated MRI data. Compared with the same network trained exclusively on annotated data, the proposed approach shows a small decrease in performance (a decrease in dice score of (0.0594, 0.0159) and in accuracy of (8.78%, 2.61%) for the MICCAI and US test sets). The proposed method thus provides an alternative approach, trading a small decrease in performance for savings in time and manual labor for medical doctors. Future work will be conducted on more datasets.
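For reference, the dice score reported above measures the overlap 2|A∩B| / (|A| + |B|) between a predicted and a ground-truth binary mask. A minimal NumPy sketch follows; the function name and smoothing constant are illustrative, not taken from the paper.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary segmentation masks.

    pred, target: array-like binary masks of the same shape.
    eps: small smoothing constant so two empty masks score 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

For example, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one foreground pixel out of two each, giving a dice score of 0.5; identical masks give 1.0.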