Prediction of Glioma Grades Using Deep Learning with Wavelet Radiomic Features

Abstract: Gliomas are the most common primary brain tumors. They are classified into four grades (Grade I-IV) according to the guidelines of the World Health Organization (WHO). The accurate grading of gliomas has clinical significance for planning prognostic treatments, pre-diagnosis, monitoring, and administration of chemotherapy. The purpose of this study is to develop a deep learning-based classification method for brain tumor glioma grades using radiomic features with a deep neural network (DNN). The classifier was combined with the discrete wavelet transform (DWT), a powerful feature extraction tool. This study primarily focuses on the four main aspects of the radiomic workflow, namely tumor segmentation, feature extraction, analysis, and classification. We evaluated data from 121 patients with brain tumors (Grade II, n = 77; Grade III, n = 44) from The Cancer Imaging Archive, and 744 radiomic features were obtained by applying low sub-band and high sub-band 3D wavelet transform filters to the 3D tumor images. Quantitative values were statistically analyzed with Mann-Whitney U tests, and 126 radiomic features with significant statistical properties were selected across eight different wavelet filters. Classification performances of the 3D wavelet transform filter groups were measured in terms of accuracy, sensitivity, F1 score, and specificity using the deep learning classifier model. The proposed model was highly effective in grading gliomas, with 96.15% accuracy, 94.12% precision, 100% recall, 96.97% F1 score, and 98.75% area under the ROC curve (AUC). As a result, deep learning and feature selection techniques with wavelet transform filters can be accurately applied using the proposed method in glioma grade classification.


Introduction
Gliomas are primary malignant tumors that are common in the brain, with a high relapse rate and high mortality. According to data from the American Cancer Society, 23,880 people were diagnosed with malignant brain and spinal cord tumors in 2018, and 70% of those diagnosed with malignant tumors died [1]. Gliomas are usually classified in a range of grades from I to IV. According to the classification of the World Health Organization (WHO), gliomas can be subdivided by their malignancy from Grade II (lower grade) to Grade IV (high grade) [2]. Gliomas, hypophysis (pituitary) tumors, and meningiomas are among the primary brain tumors [3]. The WHO categorization includes low-grade gliomas (LGGs), diffuse low-grade (Grade II) and medium-grade gliomas (Grade III), and tumors with highly variable behaviors whose textural structures are unpredictable. In addition, according to the WHO, low-grade gliomas are infiltrative neoplasms that usually comprise low- and medium-grade gliomas (Grade II and Grade III) [4]. Grade II and Grade III brain tumors can be of the astrocytoma, oligoastrocytoma, or oligodendroglioma types. Since these tumor types can fall into both the Grade II and Grade III groups, examining brain tumor types in the right classes will facilitate the treatment of brain cancers. Astrocytomas can be

Related Work
In this section, current studies on brain cancer diagnosis and classification using deep learning and machine learning techniques are reviewed. Ramteke and Monali [17] used the nearest-neighbors classifier as a classification algorithm for the statistical tissue properties of normal and malignant brain magnetic resonance imaging (MRI) findings, achieving an 80% classification rate. Similarly, Gadpayle and Mahajani [18] classified normal and malignant brain MR images according to tissue properties and achieved 72.5% accuracy with a neural network classifier. Ghosh and Bandyopadhyay [19], using the fuzzy C-means clustering algorithm with patient MRI images, detected different tumor types in the brain and other areas related to the brain with 89.2% accuracy. Abidin et al. [20] used the AdaBoost classification algorithm to detect metastasis and glioblastoma tumors and obtained 0.71 accuracy. George et al. evaluated normal and abnormal brain tumors with the C4.5 decision tree and multilayer perceptron (MLP) machine learning algorithms according to their radiomic shape features [21]. Bahadure et al. [22] examined accuracy, sensitivity, and specificity using wavelet segmentation and feature extraction methods for brain MRI images. Nabizadeh et al. used the Gabor wavelet transform to extract features of tumor areas in MR images and compared the performances of different classifiers [23]. Hsieh et al. classified glioblastoma and low-grade gliomas with an accuracy of 0.88 using a logistic regression algorithm with brain tumor radiomic features [13]. There are many alternative classification methods to be investigated in radiographic analysis, such as logistic regression, the naive Bayes classifier, nearest neighbors, and decision trees [24].

Materials and Methods
We designed a model that aims to classify tumor grades correctly. The workflow consists of five steps: tumor segmentation, feature extraction, statistical analysis, feature selection, and classification. The brain tumor regions were delineated interactively as regions of interest (ROIs).
The size of the ROI should be large enough to accurately capture the tissue information, thus revealing statistical significance. It should also be noted that the size of the ROI may depend on the MRI acquisition parameters: a 200 × 200 pixel ROI in an image with a resolution of 2.5 × 2.5 mm² is not equivalent to one in an image of 0.7 × 0.7 mm². Generally, the tumor region occupies a very small proportion of a brain MRI image, which makes detection of the tumor difficult. For this purpose, the probable tumor region was enclosed manually in a convex ROI. In practice, a polygonal region no more concave than a rectangle is delineated manually. As a result of this process, the part outside the ROI is ignored, so higher success is achieved. For texture analysis, ROIs for each MR imaging set of each patient were manually drawn slice-by-slice in the axial plane for each of the available sequences.
In the next step, ROIs were segmented with the GrowCut segmentation algorithm [25]. Then, 3D wavelet transform filters were used to extract radiomic features of tissue properties from the multispectral layers. Quantitative parameters of textural features and first-order features were obtained using the radiomic feature extraction method. The statistical significance of the data was tested with the non-parametric Mann-Whitney U test. The radiomic features resulting from the feature extraction process were classified by the DNN. Figure 1 shows the block diagram of the proposed system.

Dataset
The Cancer Imaging Archive (TCIA) is a popular worldwide portal that provides full open access to medical images for cancer research. MRI data of the patients in this study were obtained from TCIA (http://cancerimagingarchive.net/) of the National Cancer Institute [26]. TCIA is a database that allows the use of MRI medical images of various types of cancer in academic studies and research.
All materials and images included in the LGG-1p/19q deletion dataset have been used in accordance with the rules, guidelines, and licensing policies regarding patient protection [27,28]. Data from 121 patients with Grade II and Grade III brain tumors, proven by biopsy results, were used in the study. T1-weighted, T2-weighted, and Fluid-Attenuated Inversion Recovery (FLAIR) sequences of each patient were examined. The choice of MRI sequence for radiomic feature determination depends on the application. Contrast-enhanced T1-weighted images have been used for tissue analysis and segmentation in the literature [29], and T2-weighted images have been used to classify benign and malignant tumors [30]. The tissue properties of each patient's MRI images are different, so no definitive assessment can be made as to which MRI sequence is better. T2-weighted and FLAIR images were used in this study. Patients with Grade II (n = 77) and Grade III (n = 44) tumors from the LGG-1p/19q deletion dataset were used. Examples of the original Grade II and Grade III gliomas used in the study are shown in Figure 2.
Figure 2. Gliomas of Grade II (a) and Grade III (b) as they appeared on brain magnetic resonance images from The Cancer Imaging Archive (TCIA).


GrowCut Segmentation
Image segmentation is the process of dividing an image into meaningful regions where different properties are labeled for each pixel. The methods designed for image segmentation and their performances vary depending on the image type, application method, size, and color intensity. Automatic image segmentation is one of the most difficult operations of image processing. Various algorithms based on MRI have been proposed for automatic glioma segmentation. For example, a segmentation method using fuzzy clustering techniques [31], a morphological edge detection method [32], a segmentation method based on tumor tissue pixel densities [33], and a graph-based segmentation method for glioblastoma tumors [34] have been tried. However, many algorithms, such as neural network methods, morphological methods, clustering methods, and Gaussian models, have also been used for the segmentation of brain tissues. The main purpose of these algorithms is to identify the tumor area quickly, steadily, consistently, and accurately.
The GrowCut segmentation method used here was introduced to the literature by Vezhnevets and Konouchine [25] as a cellular automata algorithm, with a label in each cell, developed for the multilabel segmentation of 2D or 3D images. The GrowCut algorithm automatically grows the labeled regions from the user-supplied input scribbles. Tagging was started from the core (foreground) and background pixels that we drew manually with the paint effect tool, and the process is complete when the algorithm has tagged all the pixels in the manually drawn ROI. Labeling is done by weighting each pixel against its neighbors: a neighbor whose weighted strength is larger than that of the given pixel passes its label on. Since all pixels must be visited in each iteration of the GrowCut algorithm, it is time-consuming; to limit this, the ROI was delineated manually, with support from an expert radiologist during this process. Regions left exposed as a result of the segmentation were filled manually. Automated tumor segmentation was performed with the GrowCut algorithm after plotting the axial, sagittal, and coronal positions of the tumor images according to the segmentation information in the dataset and establishing the ROI. Figure 3 shows the segmentation procedures.
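The labeling rule described above, in which a neighbor whose attenuated strength exceeds a pixel's own strength overwrites that pixel's label, can be sketched as a minimal 2D cellular automaton. This is an illustrative NumPy implementation, not the 3D Slicer GrowCut module used in the study; the function name `growcut` and the 4-neighborhood are our simplifications.

```python
import numpy as np

def growcut(image, labels, strengths, max_iter=200):
    """Minimal 2D GrowCut: labeled cells 'attack' their 4-neighbors, and a
    neighbor whose attenuated strength exceeds a cell's own strength
    overwrites that cell's label (label 0 = unlabeled)."""
    img = np.asarray(image, dtype=float)
    max_diff = float(img.max() - img.min()) or 1.0
    lab = labels.copy()
    st = np.asarray(strengths, dtype=float).copy()
    h, w = img.shape
    for _ in range(max_iter):
        new_lab, new_st = lab.copy(), st.copy()
        changed = False
        for y in range(h):
            for x in range(w):
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and lab[ny, nx] != 0:
                        # Monotonically decreasing attenuation g in [0, 1]:
                        # similar intensities attack with nearly full strength.
                        g = 1.0 - abs(img[ny, nx] - img[y, x]) / max_diff
                        attack = g * st[ny, nx]
                        if attack > new_st[y, x]:
                            new_lab[y, x] = lab[ny, nx]
                            new_st[y, x] = attack
                            changed = True
        lab, st = new_lab, new_st
        if not changed:  # stable configuration reached
            break
    return lab
```

Seeding a dark and a bright region with one scribble each is enough for the two labels to grow out and stop at the intensity boundary, which is why manual seeds plus a tight ROI keep the iteration count low.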


Wavelet Transform
The wavelet transform is an important algorithm used in the analysis of MR images [35]. Here, it was applied as a discrete wavelet transform with a total of eight filters: four high-pass and four low-pass. The discrete wavelet transform works on the principle of sub-band coding and, owing to its features and characteristics, is a good technique for image processing using mother wavelets. Wavelet decomposition extracts an image's radiomic features from each sub-band so that the discriminative aspects of these data can be detected in visual analysis. This model trades geometric localization between narrow high-pass and wide low-pass filters and is widely preferred [36]. In previous studies [37,38], 2D wavelets were used for filtering images, but in the present work, the 3D wavelet transform is used for extracting radiomic features. The fine tissue texture of brain tumors also benefits the classification results of the high-pass filters. Radiomic features can be determined on the original brain tumor image or on filtered versions; many filters are available for this, including wavelet and Laplacian of Gaussian (LoG) filters as well as square root, logarithm, square, and exponential filters. In addition to the original image, we also used wavelet-transformed images for extracting texture features. In our 3D wavelet technique, sub-volumes are created along the three spatial axes (x, y, z), resulting in an eight-piece image volume. The 3D volume is first filtered along the x-axis, yielding a high-pass and a low-pass image, H(x, y, z) and L(x, y, z). This process is then repeated along the y-axis, producing four sub-bands (HL, HH, LL, LH). Along the third axis (z), these four volumes are filtered into a total of eight sub-bands. Thus, high-pass sub-bands (HLL, HLH, HHL, HHH) and low-pass sub-bands (LLL, LLH, LHL, LHH) are formed.
Often, fine texture is obtained from the details (i.e., high-pass filters), while coarse texture is obtained from the approximations (i.e., low-pass filters). The decomposition process, known as a wavelet analysis filter bank, is performed by convolving the mother wavelets with a single down-sampling step in each direction [39]. We tried various filter families for wavelet decomposition, including Daubechies, Symlets, Coiflets, and Haar. Haar is a square-shaped function made up of two coefficients, a mother wavelet with both orthogonality and symmetry properties; its biggest drawback is that it is not smooth [40]. Daubechies wavelets were developed from Haar wavelets; they have a larger number of coefficients, but they are not symmetric [41]. More symmetrical mother wavelets such as Symlets and Biorthogonal wavelets were developed over time; their common feature is that, although they have less orthogonality, they have higher smoothness [42]. The Daubechies wavelet transform is implemented through a series of decompositions, like the Haar transform; the only difference is that the filter length is more than two, so it is more local and smoother [43]. Coiflet wavelets were developed based on Daubechies wavelets [44]. Although there are many mother wavelets, the dependence of radiomic prognosis prediction (in terms of patient survival) on the choice of mother wavelet has still not been investigated [45]. The Daubechies wavelet filter was used in this study. The feature extraction process was then carried out to determine the wavelet filter that achieved high accuracy in the classification of Grade II and Grade III tumors.
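The eight-sub-band construction described above can be illustrated with a single-level 3D decomposition in NumPy. The Haar basis is used here only for brevity (the study used a Daubechies filter), and `haar_decompose_3d` is our illustrative name, not part of any library API.

```python
import numpy as np

def haar_decompose_3d(vol):
    """One-level 3D Haar DWT: filter and down-sample along x, then y, then z,
    yielding the eight sub-bands LLL ... HHH (even dimensions assumed)."""
    def split(a, axis):
        a = np.moveaxis(a, axis, 0)
        lo = (a[0::2] + a[1::2]) / np.sqrt(2)  # low-pass / approximation
        hi = (a[0::2] - a[1::2]) / np.sqrt(2)  # high-pass / detail
        return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

    bands = {"": np.asarray(vol, dtype=float)}
    for axis in range(3):  # x, then y, then z
        bands = {name + tag: sub
                 for name, b in bands.items()
                 for tag, sub in zip("LH", split(b, axis))}
    return bands  # keys "LLL" ... "HHH"; first letter is the x-axis filter
```

Because the Haar basis is orthonormal, the total energy of the eight sub-bands equals that of the input volume, which is a convenient sanity check on any such decomposition.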

Radiomic Feature Extraction
Radiomics is the process of quantitatively identifying the properties of the tumor volume. Radiomic features are examined in two categories: semantic (size, shape, location, etc.) and agnostic (first-order features, texture, wavelet, etc.) [46]. The agnostic radiomic features of the tumor images were extracted using the open-source PyRadiomics Python package [47], which enables the processing and extraction of radiomic features from medical image data.

We examined the effectiveness of the 3D wavelet radiomic features (LLH, LHL, LHH, LLL, HLL, HLH, HHL, HHH) belonging to six different matrices in brain cancer grade estimation: the first-order (Global) matrix and the GLCM, GLDM, GLRLM, GLSZM, and NGTDM texture matrices. A total of 144 first-order features and 600 texture features, i.e., 744 3D wavelet features in total, were extracted with 3D Slicer. We then applied a statistical analysis method to select the strongest features.
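For illustration, a few first-order features can be computed directly from the segmented ROI voxels. The Energy and Total Energy definitions below follow the usual PyRadiomics formulas (with the optional intensity shift c = 0), but the function itself is a sketch, not the PyRadiomics API.

```python
import numpy as np

def first_order_features(roi_voxels, voxel_volume=1.0):
    """A few first-order radiomic features over the segmented ROI voxels."""
    x = np.asarray(roi_voxels, dtype=float).ravel()
    energy = float((x ** 2).sum())
    return {
        "Energy": energy,                      # sum of squared intensities
        "TotalEnergy": voxel_volume * energy,  # Energy scaled by voxel volume
        "Mean": float(x.mean()),
        "Range": float(x.max() - x.min()),
    }
```

In the study these quantities are computed per wavelet sub-band, so a single ROI yields one such feature set for each of the eight filtered volumes.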

Statistical Analysis and Feature Selection
The presence of too many features in a study extends the computation time, while the absence of a semantic relationship between the features reduces classification accuracy. For this reason, feature selection was applied. There are many methods, such as Principal Component Analysis (PCA) and sequential forward selection (SFS), for extracting and selecting the most suitable feature set. PCA is a mathematical data mining method that identifies the components of the data carrying the most information and reduces the data accordingly. PCA determines the most fundamental factors based on the relationships between variables, but the purpose of our study is to determine the variables that differentiate the Grade II and Grade III groups from each other. Therefore, it is important that 126 of the 744 variables reveal significant differences between Grade II and Grade III; in extracting the radiomic features, the aim was not to condense the properties into a single variable. Statistical analysis of the radiomic feature data was therefore performed with the Statistical Package for the Social Sciences (SPSS) [50], software that is widely used in medical studies as well as in the social sciences. Texture feature extraction is based on statistical distributions. We analyzed first-order features (Global) and texture features (GLCM, GLDM, GLRLM, GLSZM, and NGTDM). We performed univariate analysis based on the Mann-Whitney U test of significance for comparisons between grades and multiscale texture types (i.e., LLL, HLL, LHL, HHL, LLH, HLH, LHH, and HHH), since the data of the 121 patients with Grade II (n = 77) and Grade III (n = 44) tumors were not normally distributed. Missing values and outliers were checked with the SPSS program, and kurtosis and skewness values were examined to assess whether the data had a normal distribution [51].

According to the literature, data are considered normally distributed if these values are in the range of ±3 or ±2 [52]. Accordingly, the data showed no normal distribution, missing values, or outliers. In line with these results, the statistical significance of the data was tested with the non-parametric Mann-Whitney U test, and we identified the radiomic features that showed significant differences between Grade II and Grade III. Spearman correlation analysis was used because the variables were obtained on proportional or interval scales but did not follow a normal distribution. The p-value indicates the probability of error when a statistically significant difference is found in a comparison. According to the Holm-Bonferroni method, a correlation value of 0.3 to 0.4 indicates a medium correlation, and p < 0.05 is considered statistically significant. A total of 126 radiomic features with p < 0.05 were found statistically significant among the agnostic radiomic features. Their distribution over the eight wavelet filters is as follows: in the four low sub-bands, LLH has 18, LHL 19, LHH 14, and LLL 20 radiomic features; in the four high sub-bands, HLL has 11, HLH 19, HHL 13, and HHH 12 radiomic features.
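The selection step, keeping the features whose Mann-Whitney U test yields p < 0.05 between the two grade groups, might be sketched as follows. The study performed the test in SPSS; this SciPy version is an illustrative equivalent and does not apply the Holm-Bonferroni adjustment mentioned above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def select_features(X_grade2, X_grade3, alpha=0.05):
    """Return indices of feature columns whose Mann-Whitney U test shows a
    significant Grade II vs. Grade III difference (p < alpha).
    Rows are patients; columns are radiomic features."""
    selected = []
    for j in range(X_grade2.shape[1]):
        _, p = mannwhitneyu(X_grade2[:, j], X_grade3[:, j],
                            alternative="two-sided")
        if p < alpha:
            selected.append(j)
    return selected
```

A feature whose distribution is clearly shifted between the groups survives the test, while a feature with identical distributions in both groups is discarded.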

Deep Neural Network (DNN) Model
Deep learning extends traditional neural networks by adding hidden layers to network architectures between input and output layers while modeling more complex and nonlinear situations [53]. This situation has attracted the attention of researchers with its performance in many areas such as image processing, classification, and medical image analysis. Deep learning has been used frequently in recent years in medical imaging and computer-aided radiological research. The increasing availability of medical image data as well as the increasing data processing speeds of computers have had a major impact on this [54].
There are various DL architectures, among which Convolutional Neural Networks (CNNs) have been widely used in recent years. A CNN architecture does not require feature extraction before implementation. On the other hand, training a CNN from scratch is time-consuming and difficult, and before the model is ready it needs a very large dataset for compilation and training [55]. Therefore, it is not appropriate to apply a CNN to every dataset. Our proposed methodology is based on a DNN learning architecture for classifying brain tumors as Grade II or Grade III, with radiomic features obtained in different categories as input. Such an approach achieves high success in many problems, including image processing [56], video processing [57], and speech recognition [58]. As with the model applied in this study, using radiomic features in the same architecture with deep learning algorithms can improve outcome estimation.
High classification results have recently been obtained in deep learning studies with fewer than 100 patients [59-61]. Considering all these criteria, this study was designed with the open-source H2O Python module, a feedforward DNN trained using backpropagation [62]. The most important parameters in the H2O deep learning module are the number of epochs, the number of hidden layers, and the number of neurons in each hidden layer. For nonlinear problems, it has been suggested to start with two hidden layers. The larger the hidden layers, the easier learning will be, but after a certain point the success rate will probably decrease [63]. For example, Liu and Chen showed in their study that a 400 × 400 × 400 network structure reached lower error rates than similar network structures [64].
Different parameters and layers were used to increase network robustness. The deep learning architecture used comprises two hidden layers of 200 neurons each (200 × 200). The activation function used in the hidden layers is the Rectified Linear Unit (ReLU), while the activation function used in the output layer is Softmax.
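The forward pass of such a network (126 selected features in, two 200-unit ReLU hidden layers, a softmax over the two grades) can be sketched in NumPy. The study used H2O's feedforward implementation, so the He-style initialization and the function names here are illustrative assumptions, not the trained model.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def init_params(sizes=(126, 200, 200, 2), seed=0):
    """Random weights for a 126 -> 200 -> 200 -> 2 network (He-style init)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(X, params):
    """Forward pass: ReLU hidden layers, softmax over the two grade classes."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)
```

Each row of the output is a probability distribution over Grade II and Grade III, which is what the softmax output layer of the H2O model also produces.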
The DNN hyperparameters are shown in Table 1. Stochastic gradient descent (SGD) was used as the optimizer, with an epsilon value of 10⁻⁸. Moreover, elastic net regularization was performed to prevent over-fitting. Five-fold cross-validation was performed within the training framework, the Bernoulli distribution process was applied to the dataset, and the values were calculated. The performance of both groups was measured using ROC curves and the accuracy, sensitivity, specificity, and area under the ROC curve (AUC) values. The selected radiomic features of each transform band were classified, and the accuracy levels for Grade II and Grade III tumors were examined comparatively according to the wavelet groups. The computational complexity of the classification method is presented in Table 2.
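All of these metrics follow directly from the four counts of a binary confusion matrix, as the sketch below shows. The counts in the usage note are hypothetical and chosen only for illustration, not taken from the study's confusion matrices.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity), specificity, and F1 score
    from the four counts of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity / true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": 2 * precision * recall / (precision + recall),
    }
```

For instance, hypothetical counts TP = 16, FP = 1, FN = 0, TN = 9 yield 96.15% accuracy, 94.12% precision, 100% recall, and 96.97% F1, figures of the same form as those reported for the proposed model.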

Data Exploration
Grade II and Grade III MRI brain tumors of 121 patients were segmented with the GrowCut algorithm, and 744 radiomic features of the 3D images were detected in the low sub-band and high sub-band filters with the wavelet transform method. Since the data of the 121 patients were not normally distributed, the statistical significance of the data was tested with the non-parametric Mann-Whitney U test. As a result, 126 radiomic features with significant statistical properties were determined.

The feature distribution over the eight sub-band groups is as follows: 18 radiomic features for LLH, 19 for LHL, 14 for LHH, 20 for LLL, 11 for HLL, 19 for HLH, 13 for HHL, and 12 for HHH were selected with the non-parametric Mann-Whitney U test. In total, 126 radiomic features were selected, namely first-order features (n = 14) and texture features (n = 112).
First-order features show that the Energy feature creates a statistically significant difference between Grade II and Grade III. In the HLL, HHL, LLH, LHH, and HHH filter groups, the radiomic features of the GLCM matrix group did not show a statistically significant difference between Grade II and Grade III. Radiomic features belonging to the GLSZM matrix group in the LLL and HLH filter groups showed the highest statistically significant difference between Grade II and Grade III.
Among the 3D wavelet low-pass filter groups (LLH, LHL, LHH, LLL), the maximum statistically significant difference belonged to the LLL filter group, with 20 radiomic features. Among the high-pass filter groups, the maximum belonged to the HLH filter group, with 19 radiomic features. The statistically significant differences of the selected wavelet first-order features and texture features of the Grade II and Grade III tumor images, as evaluated by the Mann-Whitney U test, are provided in Appendix A.
First-order features for LLH show that Energy (p = 0.004) and Total Energy (p = 0.007) create a statistically significant difference between Grade II and Grade III. There was no significant difference between GLCM features and tumor grades for LLH.
We found 5 statistically significantly different features for LLH in GLDM, 5 features for LLH in GLRLM, 5 features for LLH in GLSZM, and 1 feature for LLH in NGTDM between Grade II and Grade III (p < 0.05).
For the LLH GLSZM and LHL GLSZM groups, Gray Level Non-Uniformity differed the most significantly (p = 0.000) between Grade II and Grade III. Two statistically significantly different features for LHL GLCM, 5 features for LHL GLDM, 4 features for LHL GLRLM, 5 features for LHL GLSZM, and 1 feature for LHL NGTDM were determined between Grade II and Grade III (p < 0.05).
Coarseness, Energy, and Total Energy differed significantly between Grades II and III (p < 0.005) in all wavelet filter groups (LLH, LHL, LHH, LLL, HLL, HLH, HHL, HHH). Only one radiomic feature, Dependence Non-Uniformity (p = 0.000), differed significantly between Grade II and Grade III in all filter groups. Correlation is frequently used in the statistical data analysis of radiomic features of medical images. In this study, the correlation between selected radiomic features belonging to each wavelet filter group is examined. Figure 4 shows the correlation graphs of wavelet filter groups. Correlation matrices of wavelet groups were obtained with Python.
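The Spearman correlation matrices mentioned here can be reproduced with SciPy. A minimal sketch, assuming a feature matrix `X` with patients as rows and selected radiomic features as columns:

```python
import numpy as np
from scipy.stats import spearmanr

def spearman_matrix(X):
    """Pairwise Spearman rank correlations between the feature columns of X;
    with three or more columns SciPy returns the full correlation matrix."""
    rho, _ = spearmanr(X)
    return rho
```

Spearman's rho depends only on ranks, which is why it was preferred over Pearson correlation for these non-normally distributed features: any monotone relationship between two features scores ±1 regardless of its shape.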
A statistically significant and strong relationship was observed between the GLDM Dependence Non-Uniformity (w3-HLL) and the GLSZM Gray Level Non-Uniformity (w8-HLL) features at r = 0.9 and p < 0.001. There was a negative statistical relationship between the NGTDM Coarseness (w11-HLL) and Gray Level Non-Uniformity (w6-HLL) in the w-HLL group at r = −0.5 and p < 0.001.
At the same time, the radiomic features of the HHH group revealed a statistically significant and strong relationship between first-order Total Energy and GLSZM Size Zone Non-Uniformity features (r = 0.95).
A statistically high correlation (r = 0.90) was observed between GLSZM Low Gray Level Zone Emphasis and GLRLM Long Run Low Gray Level Emphasis, GLDM Low Gray Level Emphasis, and GLRLM Low Gray Level Run Emphasis.
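The pairwise correlations reported above can be reproduced with a plain Pearson coefficient. The sketch below uses stdlib Python and hypothetical feature vectors (a real analysis would typically use pandas.DataFrame.corr to build the full matrices shown in Figure 4):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two feature vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values for two strongly related radiomic features.
dep_nu = [3.1, 4.0, 5.2, 6.1, 7.3]   # e.g. GLDM Dependence Non-Uniformity
glnu   = [6.0, 8.1, 10.2, 12.4, 14.5]  # e.g. GLSZM Gray Level Non-Uniformity
print(round(pearson_r(dep_nu, glnu), 3))  # close to 1: strong positive relation
```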

Performance Evaluation
Classification was performed with the H2O supervised learning model; 60% of the dataset was randomly reserved for training, 20% for validation, and 20% for testing. Dataset splitting was done before the training phase. Table 3 shows the number of patients in each group.

Table 3. Training, validation, and testing statistics.

Groups       Grade II  Grade III  Total
Training         49        25       74
Validation       12         9       21
Testing          16        10       26

The classification was implemented in an open-source Python environment. After the training phase, the DNN model was tested on a dataset that had never been used in training to provide an unbiased assessment. Figure 5 shows the confusion matrices resulting from classification in all wavelet sub-bands.

Accuracy: Classification accuracy, the most commonly used measure of classification performance, measures the overall effectiveness of the classifier.

Accuracy = (tp + tn) / (tp + tn + fp + fn) (1)
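A minimal sketch of the 60/20/20 random split described above, using only the stdlib; the proportional rounding here is an assumption (the paper's exact counts in Table 3, 74/21/26, indicate its rounding or stratification differed slightly):

```python
import random

def split_patients(ids, seed=42):
    """Shuffle patient IDs and split them 60/20/20 into train/val/test."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n = len(ids)
    n_train = int(round(n * 0.6))
    n_val = int(round(n * 0.2))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# 121 patients, as in the study's cohort.
train, val, test = split_patients(range(121))
print(len(train), len(val), len(test))
```

Splitting before training, as the text notes, keeps the test patients entirely unseen during model fitting.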
Precision: Precision describes the agreement of a set of results among themselves and is usually expressed in terms of the deviation of the results from their arithmetic mean. In classification, it is the fraction of predicted positives that are truly positive.

Precision = tp / (tp + fp) (2)

Recall: Recall measures the proportion of actual positive cases that the classifier correctly identifies; it is also known as sensitivity.

Recall = tp / (tp + fn) (3)
F1 Score: In order to make a sound decision about classifier performance, results other than classification accuracy should also be evaluated. The F1 score calculated for this purpose measures the agreement between the positive labels in the data and those assigned by the classifier.

F1 score = 2tp / (2tp + fp + fn) (4)

The classification success of the model was evaluated with the precision, recall, and F1 metrics, which were obtained from the confusion matrix. Table 5 shows the classification performance statistics. When the test results were examined, an accuracy of 96.15% was obtained for w-HHH, compared to 84.62% for w-HHL, 73.08% for w-HLH, 80.77% for w-HLL, 84.62% for w-LHH, 76.92% for w-LHL, 84.62% for w-LLH, and 80.77% for w-LLL. Thus, w-HHH has higher accuracy (96.15%) than all other 3D wavelet transform bands. The area under the ROC curve (AUC) is also generally examined [65]. When the AUC values are analyzed, w-HHH has the highest AUC value, 98.75%. Figure 6 shows the ROC curve for the w-HHH filter classification.
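Equations (1)-(4) can be checked directly in code. The confusion-matrix counts below are not given in the text; they are inferred from the reported w-HHH results (26 test patients with 94.12% precision and 100% recall imply tp = 16, fp = 1, fn = 0, tn = 9), so treat them as an assumption:

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    following Equations (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1

# Counts inferred from the reported w-HHH test results (an assumption,
# not stated directly in the paper).
acc, prec, rec, f1 = metrics(tp=16, fp=1, tn=9, fn=0)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")  # 96.15% 94.12% 100.00% 96.97%
```

The outputs reproduce the reported w-HHH figures exactly, which supports the inferred counts.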


Discussion
Deep learning and radiomic applications are used together in medical image processing. Accurate analysis and application of these two key methods has the potential to revolutionize radiology altogether and open up a new field in medical imaging. To accomplish this, the right combination of radiomic analysis and deep learning method is very important.
The numerical values obtained from the segmentation and preprocessing of the image affect the classification findings. One of the most important steps in determining tumor grades is radiomic feature extraction and the radiomic feature selection methods applied when creating a model. In a deep convolutional encoder-decoder study, accuracy was calculated by extracting 90 radiomic features from 83 brain sub-regions using only 2D slice images [66]. In our study, however, 3D image information was utilized, taking the z-axis into account. The number of radiomic features we obtained is high, and the tumor regions were segmented in three dimensions; using 3D images in feature extraction provided an important advantage. Additionally, the purpose of feature extraction is to find as many features as possible by describing the collected data through different operations. In previous glioma grading studies [67][68][69], the well-known ROC method was used to analyze the relationship between parametric values and glioma grades.
Just by looking at that information, it is very difficult to determine which parameters and properties are best suited to glioma grading. Some radiomic features selected in previous studies were thought to help separate different grades of gliomas, while others were not significantly correlated with glioma grade [70,71]. Therefore, it is more useful to identify the most effective method by trying comprehensive parametric combinations in different groups rather than relying on a single parameter. Many researchers in the literature have carried out segmentation studies. Rundo et al. [72] studied gross tumor volume (GTV) segmentation. During the treatment planning phase, the GTV is often delineated by experienced neurosurgeons and radiologists on MR images using purely manual segmentation procedures; they proposed a semi-automatic seeded image segmentation method instead. Their GTVCUT segmentation approach gave successful results when heterogeneous tumors containing diffuse internal necrotic material or cysts were processed. To improve brain tumor segmentation, Sompong and Wongthanavasu proposed a framework consisting of two paradigms, image transformation and segmentation [73]. Their study focused on the segmentation of edema regions in T2w MRI images. Because the Tumor-Cut algorithm faces a robustness problem that leads to insufficient segmentation during seed growth, the GrowCut algorithm was used for segmentation in their work. Rundo et al. also proposed a fully automated multimodal segmentation approach to separate the Biological Target Volume (BTV) and Gross Target Volume (GTV) from PET and MRI images of 19 metastatic brain tumors. Their experimental results showed that the GTV and BTV segmentations were statistically related (Spearman rank correlation coefficient: 0.898) but did not have a high degree of similarity (mean Membrane Similarity Coefficient: 61.87 ± 14.64).
On the other hand, only MRI images were used for the segmentation process in our study [74]. Accordingly, the most effective and significant parameters related to glioma grades were determined across the eight low- and high-sub-band wavelet groups, and different accuracy values were obtained in each group. In a study comparing the GrowCut algorithm with the GraphCut, GrabCut, and Random Walker algorithms, it was reported that GrowCut segmented images faster and with higher quality [25]. Some of the basic features of the GrowCut algorithm were important factors in choosing it for segmentation: (1) natural handling of images, (2) multi-label segmentation, (3) ease of use and application, and (4) support for user input [75]. In this study, 3D Slicer, a free open-source software platform for biomedical research, and the GrowCut algorithm integrated with it were used to segment brain tumors.
This increased the originality and validity of the present study. Different parameters show statistical significance in each group, which indicates that no single parameter is effective in glioma grading on its own. It is now known that texture features, and especially quantitative features, are important variables in the field of radiomics. Kharrat et al. [76] used the 2D wavelet transform and a spatial gray-level dependency matrix for feature extraction. In this study, the 3D wavelet transform is used and the classification results of radiomic features obtained from eight different wavelet groups are evaluated.
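The eight wavelet groups arise from applying a low-pass (L) or high-pass (H) filter along each of the three image axes. The sketch below is a minimal, unnormalized Haar illustration (a real 3D implementation, e.g. PyWavelets, uses normalized filters on full volumes), showing where the sub-band names come from and what one filtering step does to a 1D signal:

```python
from itertools import product

def haar_step(signal):
    """One unnormalized Haar step: pairwise averages (L) and differences (H)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return low, high

# Choosing L or H independently along each of the three axes yields
# the eight 3D sub-bands used in the study.
subbands = [''.join(p) for p in product('LH', repeat=3)]
print(subbands)  # ['LLL', 'LLH', 'LHL', 'LHH', 'HLL', 'HLH', 'HHL', 'HHH']
print(haar_step([10, 12, 7, 3]))  # ([11.0, 5.0], [-1.0, 2.0])
```

The low sub-band keeps smooth intensity structure, while the high sub-bands capture the local texture variation that many of the discriminative radiomic features above are computed from.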
The purpose of feature selection is to minimize the number of extracted features in the most meaningful way with the right methods. These features should firmly define the concepts hidden in the data. An over-fitted selection can give excellent results on the training data, but when new data are presented to the algorithm, the performance of the model decreases; this is an important problem in machine learning. Our results showed that the First-order Energy, GLDM Dependence Non-Uniformity, GLSZM Gray Level Non-Uniformity, and NGTDM Coarseness features of the 3D wavelet filters had high power in differentiating between Grade II and Grade III in each MRI sequence. Welch et al. identified and used the most stable radiomic features (First-order Energy, Shape-Compactness, Texture-Gray Level Non-Uniformity (GLNU), and Wavelet HLH-GLNU) in their study [77].
In addition, it is very difficult for the human eye to detect and quantify some statistical features, such as wavelet and first-order features [37]. Although automated tools significantly aid the process of identifying the ROI and reduce variability among radiologists, they are not yet used fully and efficiently. For this reason, the expert-defined ROIs in the dataset were tested here with different automatic segmentation tools, and high accuracy was achieved. In the classification of low-grade and high-grade glioma, the authors of [78] stated that GLCM Correlation has the highest discriminative power among texture features. In our study, the Correlation, Idn, and Idmn GLCM features also revealed significant differences between Grade II and Grade III. The radiomic features obtained from radiomic analysis have a very important place in classification. In another study on brain tumors, the authors [79] used the DWT but performed wavelet feature selection with PCA, a mathematical operation that converts a set of correlated variables into a smaller number of uncorrelated variables called principal components. In our study, the radiomic feature extraction process and the similarity between the Grade II and Grade III groups for each feature were examined with the Mann-Whitney U nonparametric test. With the model we applied, the number of radiomic features was reduced according to their statistical significance, and in this way high accuracy was obtained in classification. In studies conducted with the PCA method, by contrast, the algorithm restricts the radiomic features and collects them into a fixed set of components, and it is difficult to classify Grade II and Grade III gliomas with high accuracy since their radiomic features are similar.
Most of the methods developed in the literature perform classification without using a sufficient number of radiomic features. In a recent work, Dong et al. selected and classified only 3 of a total of 321 radiomic features, reaching accuracy values between 0.70 and 0.76 with a total of 5 classifiers [80]. Huang et al., studying 576 brain metastases in 161 patients, obtained 107 radiomic features and determined 8 of them with SPSS [81]. In another study, radiomic feature selection was applied and the radiomic features of Alzheimer's patients were classified from brain images with accuracies of 91.5%, 83.1%, and 85.9%. As seen in these studies, the number of radiomic features used affects the accuracy rate. In our study, a total of 144 first-order features and 600 texture features, i.e., 744 3D wavelet features in all, were extracted with 3D Slicer, which is a very high number considering the literature.
Cho et al. [15] chose five radiomic features for glioma grading, and their three classifiers showed an average AUC value of 0.9400 for the training groups and 0.9030 for the test groups (logistic regression: 0.9010; support vector machine: 0.8866; random forest: 0.9213). In another study, 408 brain metastases from 87 patients were examined, 440 radiomic features were extracted, and the AUC value was calculated with a random forest classifier [82]. In our study, a very high AUC of 98.75% was achieved in the HHH wavelet group. Successful performance in radiomics depends on the type of data used. Support vector machines and logistic regression are useful for dividing cohorts into two groups, such as good and bad prognosis, but they cannot provide detailed information when there are more variables. Another method commonly used in medical imaging in recent years is deep learning: in recent research, deep learning algorithms and radiomics are used together in medical image analysis, such as in image acquisition, segmentation, and classification. Increasing computing capacity and the growing volume of medical imaging data make this possible.
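The AUC values compared above can be computed directly from classifier scores via the rank (Mann-Whitney) formulation, without tracing the full ROC curve; the scores below are hypothetical:

```python
def roc_auc(pos_scores, neg_scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    randomly chosen positive case scores above a randomly chosen negative one."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores for Grade III (positive) and Grade II cases.
grade3 = [0.9, 0.8, 0.75, 0.6]
grade2 = [0.7, 0.4, 0.3, 0.2, 0.1]
print(roc_auc(grade3, grade2))  # 0.95: one Grade II case outranks one Grade III
```

An AUC of 1.0 means the scores separate the two grades perfectly; 0.5 means the classifier ranks them no better than chance.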
Khawaldeh et al. [83] classified the grades of glioma tumors using convolutional neural networks (CNNs). Their results showed reasonable performance in characterizing medical brain images, with an accuracy of 91.16%. It is not always possible to obtain high accuracy when too many features are extracted: in a deep learning-based study of brain tumor segmentation and survival prediction in glioma cases using a CNN architecture, 4524 radiomic features were extracted and only 61% accuracy was achieved with the selected strong features [84]. We classified brain tumors with a DNN with high accuracy using the radiomic features that we obtained; the correct determination of the radiomic features and the harmony of the model parameters contributed significantly to these results. Sajjad et al. [85] used the VGG-19 CNN architecture to classify brain tumor grades (I, II, III, and IV) and achieved 0.90 accuracy and 0.96 average precision. CNNs are disadvantageous for large images, such as 256 × 256, when a large number of filters must be processed [86]. In our study, eight sub-band wavelet filters were used, and radiomic features were extracted from each one separately. A DNN can learn features automatically and has been applied successfully in computer vision [87]; for this reason, a DNN was used in model selection. Zia et al. [88] used the discrete wavelet transform for feature extraction, principal component analysis for feature selection, and a support vector machine for classification, classifying three glioma grades using 92 MR images; their method achieved a highest accuracy of 88.26%, a highest sensitivity of 92.23%, and a highest specificity of 93.93%. Rathi and Palani applied "tumor or non-tumor" classification with a deep learning classifier and achieved 83% accuracy [89]. Although that distinction is easier to make, the accuracy obtained was insufficient, which indicates the shortcomings of the applied method.
In another study, the results showed reasonably good performance, with a maximum classification accuracy of 92.86% for the Wndchrm DNN classifier [90]. Wndchrm is open-source software that can be used for classification, with feature extraction and selection processes, in the biomedical field. The results obtained with this software are lower than those obtained with our feature selection method. The growth of open-source artificial intelligence libraries is increasing the number of deep learning applications in the field of medical imaging; however, interpreting these deep learning systems correctly and running the correct methods on datasets requires significant expertise. While grading brain tumors, we were not limited to a single radiomic feature category: the first-order features, the morphological and wavelet features, and the features of the matrix groups were all examined. This provided a comprehensive acquisition of the macro- and micro-level features of the 3D tumor area and contributed to the high accuracy achieved. More features and higher accuracy were obtained compared to other studies in the literature that examine different categories of radiomic features at the same time. The adopted network structure classified Grade II and Grade III tumors with high accuracy using the wavelet filters. The DNN architecture, the statistical analysis process, and the segmentation method used in this study were of great importance to this result.
In addition, when we recognize that most of these features, such as the wavelet and first-order features, cannot be identified and detected by the human eye, the medical contribution of this study is better understood. The method applied in this study achieved high success with the w-HHH radiomic features, with an accuracy of 96.15%. As a result, the use of wavelet-based radiomic features in classification with deep learning methods has led to higher quantitative results.

Conclusions and Future Work
In this study, a DNN-based architecture is proposed for brain tumor classification. Glioma grades can be accurately determined by a combination of high-dimensional 3D imaging features, an advanced feature selection method, and a deep learning classifier. The proposed model was highly effective in grading gliomas, with 96.15% accuracy, 94.12% precision, 100% recall, a 96.97% F1 score, and a 98.75% area under the ROC curve on the wavelet filters. The DNN model was used to identify the wavelet filter with the highest accuracy in the classification of Grade II and Grade III tumors.
We believe that the method applied in our study can contribute to highly efficient computer-aided diagnostic systems for gliomas. Magnetic resonance imaging maintains its effectiveness in clinical approach, as it is a non-invasive technique in this patient group. Deep learning and radiomic analysis methods will become an indispensable part of clinical support systems over time and will become markers of tumor grades of medical images in the future.
In future studies, automatic ROI detection and segmentation will be applied directly to the image with Mask R-CNN, and tumors will then be classified using state-of-the-art pre-trained transfer learning models.