Intelligent Glioma Grading Based on Deep Transfer Learning of MRI Radiomic Features


Introduction
Diffuse gliomas, the most common primary central nervous system (CNS) neoplasms, are composed of tumor cells that display glial differentiation. In the World Health Organization (WHO) classification of tumors of the CNS [1,2], diffuse gliomas are graded according to their degree of malignancy into WHO grades 2 to 4. Patients with lower-grade diffuse gliomas (grades 2 and 3) have more favorable prognoses [2,3]. In contrast, glioblastoma multiforme (GBM) is the most aggressive tumor type (WHO grade 4), with a dismal prognosis despite advances in various aspects of its clinical management [4]. Since therapeutic strategies differ among grades [5], distinguishing the grades of diffuse gliomas is a critical issue in clinical settings. Determining the tumor grade relies on pathological features including mitotic activity, cytological atypia, neoangiogenesis, and tumor necrosis. However, since the definitions are semiquantitative and subjective [6,7], histopathological analyses sometimes result in ambiguous glioma grading. Moreover, previous reports revealed that the heterogeneous expression of cellular features may result in misgrading in up to one-third of cases with unguided surgical tissue sampling [7][8][9][10][11].
With the development of medical imaging technologies, magnetic resonance (MR) imaging (MRI) is now the most commonly used modality to evaluate the malignancy of brain tumors [12,13]. Growing evidence has revealed the feasibility of using MRI features to probe underlying pathological subtypes, suggesting their potential application in differentiating tumor molecular profiles based on imaging traits [14]. In addition to conventional sequences, several physiological MR sequences, including diffusion-weighted imaging (DWI), perfusion-weighted imaging (PWI), and MR spectroscopy (MRS), have also been applied to find physiologically meaningful signals within the tumor, thus helping to evaluate heterogeneous patterns of cell composition within tumor tissues and to noninvasively differentiate gliomas of different degrees of malignancy [15][16][17][18][19]. A previous study proposed that MRI scans are highly specific for diagnosing brain stem gliomas and can even replace risky biopsies before radiotherapy in most cases [20]. Making full use of MRI information is therefore important for improving the clinical care of CNS gliomas.
Various computer-aided diagnosis (CAD) systems have been used to extract relevant diagnostic image features from X-ray radiography, ultrasonography, and MRI to evaluate tumor types, grades, and subsequent treatments [21][22][23][24]. Certain image features, including intensity, morphology, and textural features, are handcrafted according to clinical experience. For example, shape features are designed from the clinical observation of experienced radiologists that malignant tumors are aggressive and have irregular shapes. The implementation of such handcrafted features not only depends on the interpretive skills of physicians and computer scientists but is also limited by the experts' available knowledge.
Regarding image interpretation, deep learning, a recent artificial intelligence technique, uses learned convolutions to automatically extract enormous numbers of edge and object features and thereby recognize the underlying characteristic patterns in images [25][26][27]. Human intervention with domain-specific knowledge is thus minimized, because the convolution layers learn hierarchical feature representations. Accordingly, deep convolutional neural networks (DCNNs) have been successfully applied to object recognition in natural images after being trained on a large amount of training data [28]. However, the use of DCNNs in clinical decision making may be restricted by the limited availability of private medical data. Nevertheless, recent studies have used DCNNs or machine learning techniques to classify gliomas. Yang et al. used a DCNN to classify 113 gliomas and achieved good accuracy [29]. Lotan used machine learning techniques to classify tumors based on features extracted from image segmentation [30].
This study addressed the issue of limited data by using transfer learning to transfer pretrained weights obtained from millions of natural images, i.e., ImageNet, in order to acquire substantial image features [25,27] and accelerate the training process [31]. In addition, data augmentation was applied to increase the quantity and diversity of the training data [27]. Conventional augmentations, including translation, flipping, cropping, scaling, and rotation, are sampling methods in general use. For the glioma MR images, AutoAugment, which automatically searches for an optimal augmentation policy on one dataset that can then be transferred to other datasets [32], was also implemented to generate a customized augmented dataset. With the proposed transfer learning and data augmentation, the approach behind the developed CAD system may be applicable to various medical image diagnostic problems.

MRI Database
The Cancer Imaging Archive (TCIA), funded by the National Cancer Institute, is an open-access database containing brain MR images that complies with all applicable laws, regulations, and policies to protect human subjects, including all necessary approvals, human subject assurances, informed consent documents, and institutional review board approvals [33]. The MRI data acquired from TCIA were generated at five institutions before any operative procedure: Henry Ford Hospital, Thomas Jefferson University Hospital, Case Western Hospital, Emory University, and Fondazione IRCCS Istituto Neurologico C. Besta. The dataset comprised 30 grade 2, 43 grade 3, and 57 grade 4 gliomas; representative examples are shown in Figure 1 to illustrate the tumor appearances in MR images.

Image Analysis
A board-certified neuroradiologist (K.H., with 13 years of experience) blinded to the grading information selected the most representative 2D image from the contrast-enhanced axial T1-weighted MR images (T1WIs). The intensity distributions among images were normalized to an 8-bit gray-level depth (0-255). After normalization, contrast-enhanced tumor areas were delineated using OsiriX software (Pixmeo, Geneva, Switzerland). Pixels enclosed by the delineated tumor contour formed the input region of interest for the subsequent tissue characterization. Figure 2 shows the tumor areas of the examples in Figure 1. Before the dataset was fed into the DCNN, all images were resized to 227 × 227 pixels as a standard preprocessing step.
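The intensity normalization and region-of-interest selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which normalization was used, so linear min-max scaling is assumed here, and the function names `rescale_to_8bit` and `apply_roi` are hypothetical. Resizing to 227 × 227 pixels is omitted because it would require an imaging library.

```python
def rescale_to_8bit(image):
    """Linearly map raw intensities to integers in [0, 255].

    Assumption: min-max scaling; the original text only states that
    intensities were normalized to 8 bits.
    """
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Constant image: avoid division by zero and return all zeros.
        return [[0 for _ in row] for row in image]
    return [[int(round((value - lo) * 255 / (hi - lo))) for value in row]
            for row in image]


def apply_roi(image, mask, background=0):
    """Keep pixels where the mask is truthy; set all others to `background`."""
    return [
        [value if inside else background
         for value, inside in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Toy 2 x 2 "image" and a hypothetical tumor mask (1 = inside the contour).
raw = [[0, 200], [400, 1000]]
mask = [[1, 0], [1, 1]]

scaled = rescale_to_8bit(raw)
roi = apply_roi(scaled, mask)
print(scaled)
print(roi)
```

In practice the mask would come from the contour drawn by the neuroradiologist; here it is written by hand purely to show the masking step.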

Transfer Learning
DCNNs based on hierarchical convolution layers have overcome various classic computer vision challenges with substantial improvements in areas such as image classification [28], image segmentation [34], and object recognition [35]. AlexNet was the first deep neural network to achieve dramatically improved accuracy compared to previous methodologies in the ImageNet large-scale visual recognition challenge (ILSVRC) [28]. Conventional image classification methods use handcrafted image features to quantify intuitive and easily observed characteristic patterns for differentiation. As the amount of data increases, however, the data become more diverse than humans can anticipate when designing image features. In contrast, AlexNet, as a DCNN, exploits data diversity and extracts arbitrary image features, from edges to objects, to establish a model for a specified classification task. Training such a model for a specific task requires an enormous dataset. Nevertheless, collecting specific image data of sufficient quantity and quality is challenging, and training on millions of images is also time-consuming. Alternatively, high performance can be retained by transfer learning, which transfers knowledge learned about object compositions from an enormous dataset such as ImageNet to a specific task with a smaller amount of data [25,27,31].
In transfer learning [36], the internal layers of the original network are regarded as feature extractors, while the final layers used to learn specific features of the source task are replaced by adaptation layers trained on the target task (Figure 3). The target task in this study was the grading of three levels of gliomas on brain MR images. The final fully connected layer of the pretrained AlexNet DCNN model, which distinguishes 1000 object classes, was therefore replaced by a fully connected layer with three outputs, i.e., grades 2, 3, and 4, followed by a classification layer.
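The replaced layer can be written down compactly. The notation below is introduced here for illustration and is not taken from the original text: h denotes the activations of the last retained fully connected layer (4096 units in the standard AlexNet), W and b are the newly initialized parameters of the three-output layer, and a softmax is assumed for the subsequent classification layer.

```latex
z = W h + b, \qquad W \in \mathbb{R}^{3 \times 4096}, \quad b \in \mathbb{R}^{3},
\qquad
P(\text{grade } k \mid x) = \frac{\exp(z_k)}{\sum_{j \in \{2,3,4\}} \exp(z_j)},
\quad k \in \{2, 3, 4\}.
```

Under this reading, only W and b start from scratch, whereas all earlier layers start from the weights learned on ImageNet.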

Data Augmentation
Transfer learning uses features extracted from big data for pretraining. These features originate from the source task, such as object recognition in ImageNet, and might not exhaustively describe the target task. To capture more of the diversity and characteristics of the target images, AutoAugment [32] was also applied to enhance the generalization ability and reduce overfitting. AutoAugment has already been used to explore the CIFAR-10, SVHN, and ImageNet datasets to automatically define the best augmentation policies for these imaging data, that is, the most appropriate combinations of image operations to generate more data. In the experiment, the policy for SVHN was transferred to the target task, i.e., examining brain MR images. The ImageNet policy was not used because it focuses on color transformations, which do not suit the collected grayscale images, and the CIFAR-10 policy was not used because it focuses on translation, which caused the target to exceed the image boundary and left only half of the target visible. The SVHN policy consists of 25 subpolicies, each of which consists of two operations applied to an image, and each operation is associated with two hyperparameters: the probability of applying the operation and the magnitude of the operation. For example, the operation ShearX(Y) has a magnitude range of [−0.3, 0.3], which is discretized into 10 values, so (ShearX, 0.9, 7) is applied with a probability of 0.9 and, when applied, has a magnitude of 7 out of 10. Unlike in the original policy, the probabilities of all subpolicies were ignored in the experiment so that image generation was deterministic. Each subpolicy was therefore expanded into three variants: operation 1 alone, operation 2 alone, and operations 1 and 2 together. Single operations that were repeated across subpolicies were then removed. The resulting 56 subpolicies expanded the dataset 57-fold. Figure 4 presents augmented images for examples of grade 4 gliomas.
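The expansion of probabilistic subpolicies into a fixed set of deterministic transforms can be sketched as below. This is a minimal illustration, not the implementation used in the study: the three-entry `toy_policy` is hypothetical and far smaller than the 25-subpolicy SVHN policy, the function names are invented for this example, and the linear mapping from a magnitude level to a shear amount is an assumption about how the 10 discrete levels cover the range up to 0.3.

```python
def expand_subpolicies(subpolicies):
    """Expand (op1, op2) subpolicies into deduplicated deterministic transforms.

    Each operation is a (name, magnitude_level) tuple; probabilities are
    dropped. Each returned transform is a tuple holding one or two operations.
    """
    transforms = []
    seen = set()
    for op1, op2 in subpolicies:
        # Three variants per subpolicy: op1 alone, op2 alone, op1 then op2.
        for candidate in ((op1,), (op2,), (op1, op2)):
            if candidate not in seen:  # drop repeated single operations
                seen.add(candidate)
                transforms.append(candidate)
    return transforms


def shear_magnitude(level, max_shear=0.3, num_levels=10):
    """Map a discrete level to a shear amount (assumed linear spacing)."""
    return max_shear * level / num_levels


# Hypothetical toy policy; the second entry repeats ("Invert", 3).
toy_policy = [
    (("ShearX", 7), ("Invert", 3)),
    (("ShearY", 8), ("Invert", 3)),
    (("Equalize", 0), ("Rotate", 3)),
]

transforms = expand_subpolicies(toy_policy)
fold_increase = len(transforms) + 1  # augmented variants plus the original image
print(len(transforms), fold_increase)
print(shear_magnitude(7))
```

The toy policy yields 8 transforms (3 pairs plus 5 distinct single operations) and hence a 9-fold dataset, in the same way that 56 transforms give the 57-fold expansion reported in the text.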

Ten-Fold Cross-Validation
Ten-fold cross-validation was used for model validation and assessment. The image dataset was randomly partitioned into 10 equal subsamples. In each cross-validation round, one of the subsamples was held out as a test set and the others were used as a training set. The process was repeated 10 times so that every subsample was used exactly once as a test set. The mean and standard deviation (SD) of the 10 test results were calculated as an estimate of the model accuracy. Ten-fold cross-validation is widely used to evaluate the generalization ability of models trained on limited datasets [37].
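The partitioning scheme described above can be sketched in a few lines. This is an illustrative sketch rather than the code used in the study: the function names and the fixed random seed are assumptions, and the sample SD is assumed because the text does not state which SD formula was used.

```python
import random
import statistics


def k_fold_splits(n_items, n_folds=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold CV."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random partition, reproducible
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k, test in enumerate(folds):
        # Every fold except the k-th one goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        splits.append((train, test))
    return splits


def summarize(fold_accuracies):
    """Mean and sample SD of the per-fold test accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


splits = k_fold_splits(130)  # 130 = 30 + 43 + 57 original images
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints: 10 117 13
```

The sketch partitions the 130 original images. The text does not state whether augmentation was applied before or after partitioning, so that step is left out here.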

Results
In the training process, a low learning rate can lead to time-consuming training and slow convergence, whereas an excessively high learning rate might cause a suboptimal result or divergence. An initial learning rate of 0.001 and a maximum of 20 epochs were adopted to achieve a training accuracy of nearly 100%, as shown in Figure 5, which plots the accuracies of the training set (blue) and validation set (black) together with the loss (orange). As the figure shows, the network converged at the fifth epoch and was stopped at the eighth epoch because the loss no longer decreased. With biopsy-proven results as the reference standard, the performance of the prediction model was assessed by the accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (Az), which illustrates the tradeoffs between sensitivity and specificity. These metrics were computed using SPSS software (version 16 for Windows; SPSS, Chicago, IL, USA). In the 10-fold cross-validation, the transferred DCNN achieved a mean accuracy of 97.9% with an SD of ±1% and a mean Az of 0.9991 ± 0, as illustrated in Figure 6. In detail, the classifier differentiating grade 2 gliomas from the others achieved an accuracy of 98.7%, a sensitivity of 96.9%, and a specificity of 99.2%. The classifier differentiating grade 3 gliomas from the others achieved an accuracy of 98.3%, a sensitivity of 96.8%, and a specificity of 99.0%. The classifier differentiating grade 4 gliomas from the others achieved an accuracy of 98.7%, a sensitivity of 99.1%, and a specificity of 98.3%. In comparison, the DCNN without pretrained features, i.e., trained on the glioma images only, achieved a mean accuracy of only 61.42% with an SD of ±7% and a mean Az of 0.8222 ± 0.07. The DCNN without augmentation performed worst, with a mean accuracy of 59.85%, an SD of ±16%, and a mean Az of 0.7896 ± 0.18.
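The per-grade figures above are one-vs-rest metrics, which can be derived from a three-class confusion matrix as sketched below. The study computed its metrics in SPSS; this Python sketch only illustrates the definitions, and the confusion matrix `toy` contains hypothetical counts (chosen so that the rows sum to the class sizes 30, 43, and 57), not results from the study.

```python
def one_vs_rest_metrics(confusion, positive):
    """Accuracy, sensitivity, and specificity for one class against the rest.

    confusion[i][j] is the number of cases of true class i predicted as j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[positive][positive]
    fn = sum(confusion[positive]) - tp
    fp = sum(confusion[i][positive] for i in range(n_classes)) - tp
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts; rows and columns are ordered grade 2, grade 3, grade 4.
toy = [
    [28, 2, 0],
    [1, 40, 2],
    [0, 1, 56],
]

for index, grade in enumerate((2, 3, 4)):
    print(grade, one_vs_rest_metrics(toy, index))
```

For the toy matrix, the grade 2 classifier has 28 true positives, 2 false negatives, 1 false positive, and 99 true negatives.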

Discussion
Machine learning uses statistical analyses to combine various features for automatic classification. As a state-of-the-art technique, deep learning inherits this methodology and exploits computational power to automatically extract any possible features from a large dataset. Deep learning architectures such as the DCNN have been applied to a variety of classification tasks, including object detection and classification in natural images and medical images. DCNN architectures use multiple nonlinear transformations to model high-level visual abstractions of image data. Inspired by the mechanism of biological nervous systems, multiple convolutional layers build up representations from pixels through regions to objects to thoroughly analyze an image's composition. Based on the tissue appearance on a brain MR image, the proposed CAD system uses a transferred DCNN to establish a malignancy evaluation model that provides more objective and accurate diagnostic suggestions for grading gliomas. In this study, using the DCNN to classify grade 2, 3, and 4 gliomas achieved an almost perfect performance, with a mean accuracy of 97.9%, an SD of ±1%, and a mean Az of 0.9991 ± 0.
A previous study [38] used well-known handcrafted MRI features, including intensity and textural features, and achieved an accuracy of only 88%, a sensitivity of 82%, a specificity of 90%, and an Az of 0.89. Regarding the classifier type, conventional artificial neural networks in the same study [38] had a lower diagnostic performance: an accuracy of 84%, a sensitivity of 79%, and a specificity of 86%. Yang et al. [29] used 113 gliomas and convolutional neural networks with and without transfer learning. Their results demonstrated that transfer learning can improve performance, which is consistent with our results. Lotan used automatic segmentation methods to obtain the tumor contours and areas, which serve the same role as the delineation in this study, and the subsequent classification likewise relied on machine learning techniques [30]. Another study used diagnostic information from multiple modalities to achieve better performance [39].
Kermany et al. [31] applied a transfer learning model to classify optical coherence tomographic images and achieved an accuracy, sensitivity, and specificity all exceeding 93% and an Az of 0.988. They suggested that training on medical images with pretrained models may produce more accurate models in a much shorter time than training a model from scratch. Even with a limited amount of data, the transferred DCNN can achieve performance comparable to or even better than that of human experts. Similar results were shown in this study, as the transferred DCNN performed much better than the DCNN without pretrained features and the DCNN without augmentation. The difference was likely caused by the number of training samples. The original dataset without augmentation contained only 30, 43, and 57 images in the three classes, respectively, which is very few even for transfer learning; by comparison, Kermany et al. used 1000 images in each class to train a limited model. Through AutoAugment, as described in this study, the dataset was expanded 57-fold and therefore contained over 1000 images in each class. The augmented dataset was sufficient for transfer learning but still insufficient for training a deep network from scratch, so the DCNNs lacking either pretrained features or augmentation performed worse than the transferred DCNN.
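The class sizes after augmentation follow directly from the figures given in the text. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Original class sizes reported in the text.
class_counts = {"grade 2": 30, "grade 3": 43, "grade 4": 57}

# 57-fold expansion: each original image plus its 56 augmented variants.
fold_increase = 57

augmented = {grade: n * fold_increase for grade, n in class_counts.items()}
total = sum(augmented.values())

print(augmented)  # {'grade 2': 1710, 'grade 3': 2451, 'grade 4': 3249}
print(total)      # 7410
```

Each class therefore exceeds 1000 images after augmentation, even though the smallest class starts from only 30 originals.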
Based on the success of transfer learning, collecting high-quality image data, with precise labeling and description of the tumor location, boundaries, and features, may become more important than designing sophisticated image features. However, collecting a substantial amount of such medical data is very challenging. The limited data used in this experiment may not sufficiently represent the overall diversity of tumor appearances. In Taiwan, only about 600 patients are diagnosed with diffuse malignant gliomas every year, of which only about 240 cases are GBM. As a result, collecting large amounts of data with patient-informed consent is a challenge, and data augmentation is likely to become a necessary tool for applying deep learning to uncommon diseases. To date, the experimental results have demonstrated that transfer learning combined with data augmentation is suitable for scarce medical images. When the number of cases is limited, generalization is in question. In this situation, data augmentation combined with transfer learning can fit a small dataset for a challenging classification task, provided that the dataset has enough diversity. In the future, after more data on gliomas are collected, this system is likely to be practical in clinical use.
Another limitation is that, although contrast-enhanced T1WIs provided critical information for differentiating grades of gliomas in the DCNN, the correlation between the prediction model and actual biological tissues remains underexplored. Key clinical determinants in grading gliomas are necrosis and angiogenesis. Whether they are related to the established DCNN model is the next topic to be explored. Meanwhile, further investigation of other sequences, including fluid-attenuated inversion recovery, T2-weighted images, DWI, and MRS, is necessary.

Conclusions
Using hierarchical feature representations learned by transferred convolutional neural networks, the proposed CAD system, which combines a transferred DCNN with data augmentation, achieved a mean accuracy of 97.9% with an SD of ±1% and a mean Az of 0.9991 ± 0. This accuracy in distinguishing different grades of gliomas is promising for supporting radiologists in clinical practice.

Figure 3. Parameter transfer in the proposed deep convolutional neural network.

Figure 4. Augmented images for examples of grade 4 gliomas.

Figure 5. Training and validation learning processes.

Figure 6. Test accuracy illustrated by tradeoffs between the sensitivity and specificity.