Development of a Fully Automated Glioma-Grading Pipeline Using Post-Contrast T1-Weighted Images Combined with Cloud-Based 3D Convolutional Neural Network

Featured Application: The proposed grading pipeline, which combines a cloud-based pretrained 3D CNN and our original 3D CNN, is useful for the early treatment of patients and the prediction of their prognosis.

Abstract: Glioma is the most common type of brain tumor, and its grade influences its treatment policy and prognosis. Therefore, artificial-intelligence-based tumor-grading methods have been studied. However, in most studies, two-dimensional (2D) analysis and manual tumor-region extraction were performed. Additionally, deep learning research that uses medical images faces difficulties in collecting image data and preparing hardware, which hinders its widespread use. Therefore, we developed a 3D convolutional neural network (3D CNN) pipeline that realizes a fully automated glioma-grading system by using the pretrained Clara segmentation model provided by NVIDIA and our original classification model. In this method, the brain tumor region is extracted using the Clara segmentation model, and the volume of interest (VOI) created from this extracted region is fed to a grading 3D CNN and classified as grade II, III, or IV. In an evaluation using 46 regions, the grading accuracy for all tumors was 91.3%, which is comparable to that of methods using multiple sequences. The proposed scheme may enable the creation of a fully automated glioma-grading pipeline from a single sequence by combining a pretrained 3D CNN and our original 3D CNN.


Introduction
Glioma is a primary brain tumor and the most common type of brain tumor. The grade of a glioma is given as an index of its malignancy, and it significantly influences the treatment policy and prognosis [1]. In clinical practice, the treatment policy for grade IV glioma differs from that of the other grades because it progresses rapidly and has a poor prognosis. Therefore, accurate grading leads to appropriate early treatment. Nevertheless, a definitive grading diagnosis cannot be made without the pathological examination of the removed tumor tissue. Therefore, before surgery, neurologists estimate the tumor grade using magnetic resonance imaging (MRI) findings, such as the presence or absence of the ring enhancement effect. However, these characteristics vary among patients, which makes diagnosis difficult [2][3][4][5]. Therefore, many researchers are trying to solve this problem of low accuracy using convolutional neural networks (CNNs), one of the most successful image-analysis technologies. Yang et al. [6] classified the grade of glioma using a fine-tuned GoogLeNet. Furthermore, Abd-Ellah et al. [7] proposed a glioma detection and grading system using a parallel deep CNN. In addition to gliomas, computer-aided diagnosis (CAD) systems for brain tumors using CNNs are being developed.
Abd El Kader et al. [8] proposed a differential deep CNN model to classify MR brain images as abnormal or normal, and Díaz-Pernas et al. [9] proposed a multiscale CNN model for region extraction and classification of three types of brain tumors, including glioma, in post-contrast T1-weighted images. However, in these studies, 3D MR images were analyzed on a slice-by-slice basis. A more accurate analysis should be possible using 3D images because tumors grow in multiple directions. Additionally, when tumor regions are extracted manually, variations in these regions influence the grading accuracy. These problems can be solved by applying automated tumor-region extraction and grading to 3D MR images. In fact, Chen et al. [10] developed an automatic CAD system for gliomas that combines automatic segmentation and radiomics. In training and evaluation on the Multimodal Brain Tumor Segmentation Challenge 2015 (BraTS2015) dataset, a grading accuracy of 91.3% was observed. Furthermore, Zhuge et al. [11] proposed an automated glioma-grading system using a 3D U-Net and a 3D CNN. In training and evaluation on the BraTS2018 dataset, a grading accuracy of 97.1% was observed.
Nevertheless, many of these studies, such as the method proposed by Chen et al. [10], require multi-sequence MR images as input, which reduces the number of usable cases when specific sequences are missing and increases the computational cost by enlarging the input data size. In addition, a large dataset and a high computational cost are required to train a deep learning network [12]. However, in the case of medical images, the amount of available data is limited owing to several challenges, including ethical problems and a lack of cooperation among hospitals [13,14]. Additionally, not everyone can install a high-performance machine that can withstand a substantial computational load [15,16]. Fine-tuning a model pretrained on natural images reduces both the amount of medical image data required and the computational cost [17,18]. However, such models are difficult to adapt to 3D images, such as brain MR images.
NVIDIA is attempting to solve this problem by providing models trained on various medical images in the Clara project [19]. A grading system can be easily developed using the project's brain tumor segmentation model for single-sequence MR images. Therefore, in this study, we developed a fully automated glioma-grading pipeline for post-contrast T1-weighted images using two 3D CNNs: the pretrained model for tumor-region extraction and our original model for grading.
The outline of the proposed method is depicted in Figure 1. First, a post-contrast T1-weighted image is fed to a 3D CNN for brain-tumor-region extraction. The extracted tumor region is then fed to another 3D CNN for grading, and thus the glioma grade is obtained. Thus far, it has been difficult to develop a fully automated grading pipeline that extracts and grades glioma regions from 3D images owing to problems such as computational cost. In contrast, the proposed method achieves segmentation and grading at a low computational cost by using a pretrained 3D CNN for the segmentation task, which is computationally expensive to train. Furthermore, although it uses only one type of MR image, the proposed method achieves performance equal to or better than that of methods using multiple sequences, which are computationally expensive.
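For illustration, the two stages can be written as a short driver function. The following is a minimal sketch only: the Keras-style predict interface, the channel/class layout (index 1 = grade IV), and the I/O shapes are assumptions, and crop_voi is a hypothetical helper (a sketch of it appears in the Tumor Grading section).

```python
import numpy as np

def grading_pipeline(t1ce, seg_model, grading_model, crop_voi):
    """Sketch of the proposed two-stage pipeline.
    t1ce: post-contrast T1-weighted volume, e.g., 240 x 240 x 155 voxels.
    seg_model: pretrained Clara segmentation 3D CNN (Keras-style, assumed).
    grading_model: our original grading 3D CNN.
    crop_voi: helper that cuts the cubic VOI around the extracted region.
    """
    # Stage 1: extract the brain tumor region (batch/channel axes are assumptions).
    prob = seg_model.predict(t1ce[np.newaxis, ..., np.newaxis])
    tumor_mask = prob[0, ..., 0] > 0.5
    # Stage 2: crop the VOI around the tumor and classify it.
    voi = crop_voi(t1ce, tumor_mask)
    probs = grading_model.predict(voi[np.newaxis, ..., np.newaxis])[0]
    return "grade IV" if np.argmax(probs) == 1 else "grade II/III"
```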

Image Dataset
We used the training dataset of the Multimodal Brain Tumor Segmentation Challenge 2018 (BraTS2018) [20][21][22]. This dataset includes MR images of 285 patients with pathologically proven WHO grade II/III or grade IV glioma (grades II and III: 75 cases; grade IV: 210 cases). The dataset does not contain detailed information on the glioma subtype; only the grade is provided. Each case includes four sequences (a T1-weighted image, a post-contrast T1-weighted image, a T2-weighted image, and a FLAIR image) and a ground truth. The ground truth comprises three labels (manually revised by neuroradiologists): the contrast-enhanced region, the necrotic and non-enhanced region, and the edema region. The union of the contrast-enhanced region and the necrotic and non-enhanced region was used as the ground truth of the brain tumor region, so as to target only the substantial tumor region. Additionally, the image size was adjusted to 240 × 240 × 155 voxels with 1 mm³ per voxel, and parts other than the brain parenchyma, such as the skull, were removed from the images. The details of the dataset are provided in [20][21][22].
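As an illustration, merging the labels into the "substantial tumor" ground truth takes a few lines. The numeric label values (1 = necrotic/non-enhancing core, 2 = edema, 4 = contrast-enhancing tumor) follow the published BraTS convention, and the file name is a hypothetical placeholder.

```python
import numpy as np
import nibabel as nib  # a common reader for the BraTS NIfTI files

# BraTS label convention: 1 = necrotic/non-enhancing core,
# 2 = peritumoral edema, 4 = contrast-enhancing tumor.
seg = nib.load("BraTS18_case_seg.nii.gz").get_fdata().astype(int)

# Keep only the substantial tumor region (enhancing + necrotic/non-enhancing),
# excluding the edema label.
tumor_gt = np.isin(seg, (1, 4))
```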

Region Extraction
For extraction of the brain tumor region, we used a 3D CNN provided by the Clara project developed by NVIDIA [19]. In the Clara project, NVIDIA creates and provides trained models for various tasks to popularize artificial intelligence (AI) in medicine; these models are trained on high-performance NVIDIA machines, such as the NVIDIA DGX-1 server. In this study, we used a model that was trained on the previously mentioned BraTS2018 dataset.
The voxel value x of the training data was normalized (Z-score) using Equation (1), where µ and σ denote the mean and standard deviation of the voxel values, respectively [23]:

z = (x − µ) / σ. (1)
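Equation (1) in code (a minimal sketch; whether the statistics are computed over the whole volume or only over brain voxels is an assumption):

```python
import numpy as np

def zscore(volume: np.ndarray) -> np.ndarray:
    """Z-score normalization of Equation (1): z = (x - mu) / sigma."""
    mu, sigma = volume.mean(), volume.std()
    return (volume - mu) / sigma
```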
Furthermore, the data were augmented to prevent overfitting. For data augmentation, the voxel values were randomly shifted by −0.1 to 0.1 times the standard deviation of the voxel values in the image, and the scale of the image was randomly changed between 90% and 110% of the original. Additionally, the images were randomly flipped around the X-, Y-, and Z-axes. During training, regions of 224 × 224 × 128 voxels, which sufficiently contained the brain parenchyma and tumor regions, were cropped from the images to reduce the data size and fed to the 3D CNN for extracting the brain tumor region.
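These augmentation steps can be sketched with NumPy as follows. Reading "scale" as intensity scaling and the flip probability of 0.5 are assumptions.

```python
import numpy as np

def augment(vol, mask, crop=(224, 224, 128), rng=None):
    """One random augmentation pass over an (X, Y, Z) volume and its label mask."""
    rng = rng or np.random.default_rng()
    # Intensity shift by -0.1..0.1 times the standard deviation,
    # and intensity scaling between 90% and 110%.
    vol = vol + rng.uniform(-0.1, 0.1) * vol.std()
    vol = vol * rng.uniform(0.9, 1.1)
    # Random flips around the X-, Y-, and Z-axes (probability is an assumption).
    for axis in range(3):
        if rng.random() < 0.5:
            vol, mask = np.flip(vol, axis), np.flip(mask, axis)
    # Random crop to 224 x 224 x 128 voxels.
    starts = [rng.integers(0, s - c + 1) for s, c in zip(vol.shape, crop)]
    window = tuple(slice(st, st + c) for st, c in zip(starts, crop))
    return vol[window], mask[window]
```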
The architecture of the 3D CNN used for region extraction is shown in Figure 2. The network comprises ResNet-like blocks, each of which comprises two sets of a group-normalization layer, a rectified linear unit (ReLU) function, and a convolutional layer. Because of limited memory, the batch size was set to 1, and group normalization was adopted. In the group-normalization layer, the data obtained from the previous layer are normalized and fed to the convolutional layer through the ReLU function. In the convolutional layer, convolution is performed using filters with a kernel size of 3 × 3 × 3. The number of epochs was 460, and Adam was adopted as the optimization algorithm [24]. The initial learning rate α₀ was 1 × 10⁻⁴, and the learning rate α was decayed according to Equation (2):

α = α₀ (1 − e/N_e)^0.9, (2)

where e denotes the epoch counter and N_e is the total number of epochs. This network is roughly divided into three parts: an encoder, a decoder, and a variational autoencoder (VAE). The ResNet-like blocks are connected in multiple layers. In the encoder part, every time two blocks are passed, the kernel size remains the same, the number of image features is doubled, and the image size is halved by setting the stride to 2. At its end, the encoder branches into the decoder and VAE parts; in the decoder part, the extracted image features are halved and the image size is doubled for upsampling. Finally, the tumor region is extracted with the same image size as that of the input image. The VAE part is a network that reconstructs the image using the extracted image features and a Gaussian distribution; it regularizes the encoder during training. Skip connections are provided within the encoder and between the encoder and decoder to facilitate backpropagation during training. An NVIDIA Tesla V100 32 GB GPU was used for training this model, which was implemented on the TensorFlow platform.
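A minimal sketch of one ResNet-like block and the learning-rate decay of Equation (2) follows. It assumes tf.keras.layers.GroupNormalization (available in TensorFlow 2.11 and later) and that the input already has the target number of channels; the group count of 8 is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet_like_block(x, filters, groups=8):
    """One block of Figure 2: two sets of GroupNorm -> ReLU -> 3x3x3 Conv3D,
    wrapped in a skip connection. Assumes x already has `filters` channels."""
    shortcut = x
    for _ in range(2):
        x = layers.GroupNormalization(groups=groups)(x)
        x = layers.ReLU()(x)
        x = layers.Conv3D(filters, kernel_size=3, padding="same")(x)
    return layers.Add()([shortcut, x])

def poly_lr(epoch, total_epochs=460, initial_lr=1e-4, power=0.9):
    """Decay of Equation (2): alpha = alpha_0 * (1 - e / N_e)^power."""
    return initial_lr * (1.0 - epoch / total_epochs) ** power

# Usage sketch: apply the schedule once per epoch.
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=poly_lr(0)), ...)
# model.fit(..., callbacks=[tf.keras.callbacks.LearningRateScheduler(poly_lr)])
```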

Tumor Grading
The tumors were graded using our original 3D CNN model. In this model, grading was performed on a volume of interest (VOI) that was cropped according to the size of the tumor region. The VOIs were cubic, and the side length was set to twice the maximum side of the circumscribed rectangle (bounding box) of the tumor in order to include information on the surrounding brain parenchyma. Data augmentation was performed to prevent overfitting when training the 3D CNN for grading. All the VOIs were laterally flipped, and lateral and cephalocaudal rotations were performed between −10° and 10°. Because the number of cases differed between grade II/III and grade IV tumors, the rotation interval was set to 5° for grades II and III and 10° for grade IV to balance the training data. Consequently, the numbers of training images were 3200 VOIs for grades II and III and 3402 VOIs for grade IV. Additionally, the images were distorted by randomly displacing the coordinates of the eight vertices of each VOI along the lateral, cephalocaudal, and anteroposterior directions; the maximum expansion and contraction distances were 10% of the matrix size on each side of the VOIs. The VOIs were then resized to 64 × 64 × 64 voxels and, after normalization, fed to the 3D CNN for grading. The voxel values were normalized by dividing each voxel value of a VOI by the maximum voxel value over all VOIs in the dataset. Here, the maximum value was calculated from denoised VOIs obtained by applying a median filter to the original images.
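The VOI cropping and normalization can be sketched as follows. Boundary handling (tumors near the edge of the volume, where padding would be needed) and the median-filter size are assumptions, and padding is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import median_filter, zoom

def crop_voi(volume, mask, out_size=64):
    """Cut a cubic VOI whose side is twice the longest side of the tumor's
    bounding box, centered on the tumor, then resample to 64^3 voxels."""
    idx = np.argwhere(mask)
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    side = 2 * int((hi - lo).max())          # twice the max bounding-box side
    center = (lo + hi) // 2
    start = np.clip(center - side // 2, 0, np.array(volume.shape) - side)
    voi = volume[tuple(slice(s, s + side) for s in start)]
    return zoom(voi, out_size / side)        # resample to 64 x 64 x 64

def dataset_max(vois, filter_size=3):
    """Dataset-wide maximum over median-filtered VOIs (filter size assumed)."""
    return max(median_filter(v, size=filter_size).max() for v in vois)

# Usage sketch: voi_normalized = crop_voi(vol, mask) / dataset_max(all_vois)
```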

The architecture of the 3D CNN model used for grading is shown in Figure 3. This model is a 3D extension of ResNet and comprises residual blocks [25]. Each residual block comprises three sets of a batch-normalization layer, a ReLU function, and a convolutional layer. Additionally, the skip connections in the residual blocks reduce the vanishing-gradient problem. Convolution is performed in the convolutional layers using filters with a kernel size of 1 × 1 × 1 or 3 × 3 × 3 and a stride of 1, and image features are thereby extracted. A filter with a kernel size of 1 × 1 × 1 is used to reduce the dimension, and the subsequent filters with kernel sizes of 3 × 3 × 3 and 1 × 1 × 1 extract features and restore the dimension. This bottleneck architecture prevents the degradation of computational efficiency caused by multi-layered structures [26]. The first residual block has max pooling just before the first convolution, halving the image size. Furthermore, every time three or four residual blocks are passed, the stride is changed to 2 and the image size is halved. Finally, in the fully connected layer, these image features are integrated, and the grading result is obtained using the softmax function. The batch size was 16, the number of epochs was 30, and SGD (learning rate = 1 × 10⁻⁴, momentum = 0.9) was adopted as the optimization algorithm [27]. During training, 20% of the training dataset was randomly selected for validation. An NVIDIA GeForce RTX 2080 Ti 11 GB GPU was used to train this model, which was implemented using the Keras and TensorFlow platforms.
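A minimal Keras sketch of one bottleneck residual block of Figure 3, together with the training configuration reported above, is shown below; the channel widths (4x expansion) and the projection shortcut are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=1):
    """Bottleneck block of Figure 3: three sets of BatchNorm -> ReLU -> Conv3D
    with kernel sizes 1x1x1 (reduce), 3x3x3, and 1x1x1 (restore)."""
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv3D(filters, 1, strides=stride)(y)      # reduce dimension
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv3D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv3D(4 * filters, 1)(y)                  # restore dimension
    if stride != 1 or shortcut.shape[-1] != 4 * filters:  # projection shortcut
        shortcut = layers.Conv3D(4 * filters, 1, strides=stride)(x)
    return layers.Add()([shortcut, y])

# Training configuration reported in the text:
# model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=16, epochs=30, validation_split=0.2)
```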

Evaluation Metrics
To determine the usefulness of this method, an evaluation was performed using the holdout method on 42 cases (grades II and III: 12 cases; grade IV: 30 cases) from the BraTS2018 dataset that were not used for training. The tumor region was extracted from these 42 cases; 4 of the cases had 2 lesions each. Therefore, grading was performed on 13 VOIs of grade II/III and 33 VOIs of grade IV, for a total of 46 VOIs.
The Dice similarity coefficient (DSC) was used to evaluate the tumor-region-extraction accuracy [28]. It quantitatively evaluates the similarity between two sets and is expressed by Equation (3), where A and B denote the tumor-region extraction result and the ground truth, respectively:

DSC = 2|A ∩ B| / (|A| + |B|). (3)
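Equation (3) in code (a minimal sketch for binary masks):

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient of Equation (3): 2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```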
In addition, accuracy, sensitivity, and specificity were used to evaluate the grading performance. They are expressed by Equations (4)-(6):

Accuracy = (VOIs classified correctly) / (all VOIs), (4)

Sensitivity = (grade IV VOIs classified correctly) / (all grade IV VOIs), (5)

Specificity = (grade II and III VOIs classified correctly) / (all grade II and III VOIs). (6)

Furthermore, the receiver operating characteristic (ROC) curve was created by varying the classification threshold on the predicted probability of grade II/III versus grade IV from 0 to 1, and the area under the curve (AUC) was calculated [29].
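Equations (4)-(6) and the AUC in code; treating grade IV as the positive class with label 1 is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def grading_metrics(y_true, y_prob, threshold=0.5):
    """y_true: 1 for grade IV, 0 for grade II/III; y_prob: predicted
    probability of grade IV. Implements Equations (4)-(6) and the AUC."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    accuracy = (y_pred == y_true).mean()              # Equation (4)
    sensitivity = y_pred[y_true == 1].mean()          # Equation (5)
    specificity = 1.0 - y_pred[y_true == 0].mean()    # Equation (6)
    auc = roc_auc_score(y_true, y_prob)               # sweeps the threshold 0..1
    return accuracy, sensitivity, specificity, auc
```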


Evaluation Results
In the evaluation of automated tumor-region extraction, tumors were detected in all of the evaluation cases. The average DSC between the tumor-region extraction results and the ground-truth images was 0.839. Figures 4 and 5 show examples with high and low similarity indices, respectively, in the extraction results obtained using the proposed method. No training time was needed because a pretrained model was used as the 3D CNN for tumor-region extraction; for reference, the training time during development at NVIDIA was 2.87 days for 460 epochs, and the inference time was approximately 4 s per case. Regarding the grading performance, the accuracy and AUC were observed to be 91.3% and 0.927, respectively. The confusion matrix of the grading results and the processing results are presented in Tables 1 and 2, respectively.


Discussion
The DSC value for tumor-region extraction was 0.839, and the extraction was highly accurate even with a single sequence. Additionally, because the matrix size of each VOI was defined as twice the tumor diameter on the basis of this output, VOIs that sufficiently contained the tumor region and the surrounding brain parenchyma could be created. Additionally, as shown in Figure 7, some cases with the ring enhancement effect, a typical characteristic of high-grade glioma, were correctly extracted. However, in cases without the ring enhancement effect, also shown in Figure 7, the peripheral edema region was incorrectly extracted as the tumor region, and thus the similarity index tended to be low. This suggests that the 3D CNN used for tumor-region extraction was mainly trained to recognize the ring enhancement effect as the edge of the tumor region. The pretrained model of the Clara project was trained on MR images of glioma patients, and parameter tuning had already been performed. By using this model, the large amount of time required for training and parameter tuning could be eliminated.
Regarding tumor grading, the accuracy, sensitivity, and specificity were 91.3%, 100%, and 69.2%, respectively. Table 3 compares our approach with other approaches in terms of methods and results. From Table 3, it is observed that the method of Yang et al. classified the grade by processing one slice of a single sequence, and its accuracy of 96.8% was higher than that of our proposed method. However, they used a different database from that of the other studies, which makes simple comparison difficult. Furthermore, theirs was not a fully automated grading system because tumor detection and region extraction were performed manually. We therefore focused on studies adopting fully automated 3D processing, including tumor-region extraction and grading, similar to our study. The sensitivity of our method (100%) was higher than that of the method proposed by Zhuge et al. (94.7%), showing that all high-grade tumors were correctly classified. Therefore, grade IV tumors could be classified preoperatively with 100% sensitivity by this method, which may lead to early treatment of patients and prediction of their prognosis. In addition, the accuracy of our method (91.3%) was lower than that of the method proposed by Zhuge et al. (97.1%) but comparable to that of Chen et al. (91.3%). While these two methods adopted multi-sequence processing, our method uses a single sequence. These results confirm that a single sequence can be used for grading with an accuracy comparable to that of multiple sequences. Therefore, this method can provide a highly accurate grading pipeline while avoiding both the loss of cases caused by a missing sequence and the associated increase in computational cost. However, four VOIs (31%) of grades II and III were misclassified, and cases with the ring enhancement effect tended to be misclassified as grade IV (see Figure 7). This may be because the ring enhancement effect is a typical characteristic of high-grade glioma. However, grade II and III tumors without the ring enhancement effect were also misclassified as grade IV; therefore, features other than the ring enhancement effect might also have contributed to the incorrect classifications. Furthermore, multiple sequences are used for preoperative grading in actual clinical settings, and a definitive postoperative diagnosis is made by obtaining multilateral information from both MRI and pathological examinations. Nevertheless, our evaluation confirmed that grading can be performed with high accuracy using this method with only post-contrast T1-weighted images. In both the tumor-region extraction and grading tasks, the small number of grade II and III cases might have decreased the accuracy. Therefore, the limitation of this study is the small number of grade II and III cases. Accordingly, we must evaluate more cases in the future.


Conclusions
We developed a fully automated glioma-grading pipeline using the segmentation model of the NVIDIA Clara project and our original 3D CNN model. A fully automated grading system using 3D image analysis was thereby realized to analyze 3D MR images accurately. The two contributions of our study are the incorporation of a pretrained tumor-region-extraction model into a grading pipeline and processing using only a single sequence. Extracting the tumor region from a 3D image normally requires a large number of medical images and a high-performance machine that can withstand a substantial computational cost; using the pretrained model of the Clara project solved these problems. In fact, in this study, we were able to incorporate a high-performance tumor-region-extraction model without spending the time required for training and parameter tuning, and we could concentrate on the grading part. Furthermore, only images of a single sequence were required for grading, and the classification accuracy was comparable to that of previous studies using multiple sequences. This reduced the computational cost and increased the number of applicable patients. These results indicate that a fully automated glioma-grading pipeline may be created from a single sequence by combining a cloud-based pretrained 3D CNN and our original 3D CNN. In the future, we plan to conduct external validation using databases other than BraTS2018 and, ultimately, to apply this method to the prediction of IDH status (mutant or wild type).

Data Availability Statement: Publicly available datasets were analyzed in this study. The dataset can be found at the BraTS 2018 website: https://www.med.upenn.edu/sbia/brats2018/data.html (accessed on 31 May 2021).

Conflicts of Interest:
The authors declare no conflict of interest.