Automated Differentiation of Atypical Parkinsonian Syndromes Using Brain Iron Patterns in Susceptibility Weighted Imaging

In recent studies, iron overload has been reported in atypical parkinsonian syndromes. The topographic patterns of iron distribution in deep brain nuclei vary by each subtype of parkinsonian syndrome, which is affected by underlying disease pathologies. In this study, we developed a novel framework that automatically analyzes the disease-specific patterns of iron accumulation using susceptibility weighted imaging (SWI). We constructed various machine learning models that can classify diseases using radiomic features extracted from SWI, representing distinctive iron distribution patterns for each disorder. Since radiomic features are sensitive to the region of interest, we used a combination of T1-weighted MRI and SWI to improve the segmentation of deep brain nuclei. Radiomics was applied to SWI from 34 patients with a parkinsonian variant of multiple system atrophy, 21 patients with cerebellar variant multiple system atrophy, 17 patients with progressive supranuclear palsy, and 56 patients with Parkinson’s disease. The machine learning classifiers that learn the radiomic features extracted from iron-reflected segmentation results produced an average area under receiver operating characteristic curve (AUC) of 0.8607 on the training data and 0.8489 on the testing data, which is superior to the conventional classifier with segmentation using only T1-weighted images. Our radiomic model based on the hybrid images is a promising tool for automatically differentiating atypical parkinsonian syndromes.


Introduction
In neurodegenerative disease, abnormal neuronal cells die rapidly in parts of the nervous system or the entire brain, resulting in loss of brain function, including cognitive and motor abilities. Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease. It is accompanied by motor symptoms such as bradykinesia, tremor, and gait disturbance, which make daily activities difficult, and by many nonmotor symptoms such as cognitive impairment, depression, autonomic dysfunction, and sleep disturbance. Atypical parkinsonian syndromes (APSs), comprising progressive supranuclear palsy (PSP) and a parkinsonian variant of multiple system atrophy (MSA-P), are degenerative diseases that share parkinsonian symptoms and signs with PD [1] but show additional symptoms and different rates of functional deterioration and prognosis [2]. Therefore, methods for distinguishing between PD and APS have clinical significance.
One of the main pathogenic mechanisms of PD is iron accumulation in the substantia nigra, associated with the degeneration of dopaminergic neurons and the accumulation of misfolded proteins [3]. According to recent pathological studies, each parkinsonian syndrome has unique topographic patterns of iron distribution in deep brain nuclei, which are influenced by underlying disease pathologies [4,5].
Many studies have used advanced magnetic resonance imaging (MRI) to detect the physiological mechanisms underlying PD and to distinguish APS from PD, for example with resting-state functional MRI (fMRI) [6] or diffusion MRI [7], but these approaches are not easy to apply in general clinical practice because they are time consuming and do not guarantee consistent results [8]. In addition, various studies using other modalities, including PET and SPECT, can achieve significant diagnostic relevance with respect to imaging of PD and APS [9,10]. However, these modalities also have disadvantages, such as radiation exposure of CT [11], and the clinical application of PET is hindered by limited access and high examination costs [12]. These common advanced neuroimaging techniques are summarized in Table 1.

Table 1. An overview of the common neuroimaging modalities (DTI, PET, SPECT, and SWI), role of modality, and potential of differentiating PD and APS.

Neuroimaging Modality | Role of Modality | Potential of Differentiating PD and APS
Diffusion-tensor image (DTI) [7] | Detect characteristics such as fractional anisotropy (FA) and mean diffusivity (MD) | Decreased FA and/or increased MD in the substantia nigra, the corpus callosum, the frontal lobes, the cingulum, and the temporal cortex
Positron emission tomography (PET) [9] | Measure amyloid pathology, tau pathology, α-synuclein pathology, and metabolic activity by measuring changes in glucose consumption | PD-related spatial covariance pattern may involve increased pallidothalamic and pontine activity associated with decreased metabolism in supplementary motor area, premotor cortex, and parietal association areas
Single photon emission computed tomography (SPECT) [12] | Measure dopamine transporter (DAT) density, dopamine D2 receptor, and metabolic activity by measuring changes in cerebral blood flow | Decreased striatal presynaptic DAT binding contralateral to parkinsonian symptomatology with greater reduction in posterior putamen than in anterior putamen or caudate nucleus
Susceptibility weighted image (SWI) [13] | Visualize iron-related contents sensitively | Substantia nigra pars compacta, globus pallidus internus, the putamen, and the red nucleus have been described as regions with increased iron concentration

Susceptibility weighted imaging (SWI), a type of iron-sensitive MRI, is frequently used to detect disease-specific patterns of uneven and localized iron concentration in brain regions [13]. Figure 1 shows sample SWI axial slices of MSA-P, MSA-C, PSP, and PD. Increases in iron-related signals in the anterior and medial aspects of the globus pallidus on SWI are highly specific markers of PSP. For MSA-P, a significant accumulation of iron is present in the lateral aspect of the globus pallidus adjacent to the putamen. In addition, posterolateral putaminal hypointensity and a lateral-to-medial gradient appear consistently in MSA-P SWI [14].
However, assessing putaminal hypointensity by focusing only on signal intensity, without accounting for the distributional pattern, fails to differentiate MSA-P from PD [15]. A generic and age-related sign of physiological mineralization is slit-like hypointensity along the lateral margin of the putamen or evenly distributed hypointensity throughout the putamen [16]. Therefore, finding a distinctive pattern that distinguishes parkinsonian syndromes apart from nonspecific and age-related signs is challenging. Radiomic features provide advanced quantification and classification methodologies based on machine learning algorithms, making it possible to analyze the regional iron heterogeneity in deep brain nuclei without an expert radiologist. Radiomics can extract textural features that express the relationship with neighboring voxels, allowing us to analyze the regional iron deposition in the subcortical structures. It is well suited to SWI, where the raw signal cannot be used directly because the SWI intensities of non-paramagnetic materials, such as white matter (WM) and cerebrospinal fluid (CSF), are modified through the filtered phase mask to emphasize the susceptibility in the image. There is considerable interest in the potential of radiomics for non-invasive biomarkers in different organs and pathologies, including neurodegenerative diseases [17].
Since radiomics is sensitive to changes in image intensities, accurate and robust segmentation of deep gray matter (DGM) nuclei is required. Although manual inspection of images to judge a lesion or disease progression is highly accurate when performed by an expert radiologist, it is time consuming and costly when large numbers of patients must be diagnosed. To overcome these problems, several automated segmentation tools based on T1-weighted (T1w) images have been developed, including FreeSurfer [18,19], the FMRIB Software Library (FSL) integrated registration and segmentation tool (FIRST) [20], and others [21]. These techniques have been applied in multiple brain imaging studies for examining volume and shape changes in subcortical brain regions that may be linked to normal aging or neurodegenerative disorders. DGM segmentation is scan-rescan reliable on the same scanning platform and between separate scanning platforms, indicating that these tools may be used in large-scale longitudinal and multisite studies [22,23]. However, when atlas-based tools use only T1w images, the segmentation results tend to be inaccurate [24] and do not represent the patient's hallmarks, because the spatial correspondence of subcortical structures between an abnormal brain and a standard atlas is poor and the contrast of DGM in T1w images is insufficient [25]. Therefore, it is necessary to develop a segmentation method that better reflects the distinctive features of each disease using a modality other than T1w.
In this paper, we propose a novel framework that uses SWI to automatically analyze the disease-specific patterns of iron accumulation. Our contributions are listed below:
• We proposed a fully automatic framework for the analysis of iron deposition patterns in SWI.
• We developed a segmentation method that reflects the contrast of iron accumulation better than conventional methods by using a hybrid contrast image, which is created by processing and combining T1w and SWI.
• We designed machine learning classifiers trained using texture-representing features extracted by our segmentation method.
• We demonstrated the improved performance of the machine learning classifier for differentiating APS using our segmentation framework.
The remainder of the paper is organized as follows. In Section 2, we propose an automated framework for SWI segmentation and the radiomic learning model, including hybrid image generation, DGM segmentation, radiomic feature extraction and selection, and machine learning classifier validation. Experimental results are presented in Section 3, wherein the proposed algorithm is validated using the patient datasets. Finally, the main conclusions and discussions are made in Section 4.

Materials and Methods
In this section, we describe the details of our framework that automatically differentiates APS using brain iron patterns in SWI. Figure 2 presents the overall framework of the proposed method. First, a DGM mask that exploits the advantages of both T1w and SWI was obtained by optimally combining preprocessed and registered images. The radiomic features were then extracted from the brain regions of interest (ROI) while adjusting the distance between neighboring voxels. Thereafter, a machine learning feature selection algorithm was applied to select meaningful features that distinguish the diseases. Finally, various machine learning classifiers were trained and tested using the selected features.

Figure 2. Overall flowchart of combining T1w and SWI, SWI segmentation, feature extraction and selection, and disease classification. We create a hybrid image combining T1w and SWI for iron-reflected DGM segmentation, extract texture representative features, and classify parkinsonian disorders with the significant features selected using various machine learning algorithms.

Patients
A total of 34 MSA-P, 21 MSA-C, 17 PSP, and 56 PD patients were enrolled from the Pusan University Yangsan Hospital. The following clinical diagnostic criteria were fulfilled by the patients: PSP diagnosed according to the Litvan criteria [26], MSA according to clinical consensus criteria [27], and PD according to the UK Brain Bank criteria [28]. Movement Disorder Society (MDS) PSP criteria were retrospectively applied to all consecutive patients with PSP. Twelve patients were classified as probable PSP Richardson's syndrome (PSP-RS) and five were classified as probable PSP with predominant Parkinsonism (PSP-P). Subjects with microvascular lesions discovered from brain MRI were excluded. The Hoehn and Yahr (H&Y) stage and motor examination part of the Unified Parkinson's Disease Rating Scale (UPDRS III) were used to measure disease severity and motor symptoms. Written and informed consent was obtained from all subjects participating in the study, which was approved by the Pusan National University Institutional Review Board, in accordance with the guidelines of the Helsinki Declaration.

Data Preprocessing and SWI Registration
We performed SWI postprocessing as the first step. Magnitude images, high-pass filtered phase images, and the processed SWI data were reconstructed automatically on a workstation (Syngo, Siemens Medical Solution) in DICOM format for analysis. Then, we created an initial T1w segmentation mask through FreeSurfer reconstruction, to be used later when creating the hybrid contrast (HC) image. Non-parametric non-uniform intensity normalization (N4ITK) bias field correction [29] and intensity normalization were applied. The intensity normalization scaled the T1w signal intensity to a predefined mean value of 110 in the white matter (WM).
Subsequently, the SWI images were registered to the T1w images using an affine transform. Since the T1w and SWI images of the same subject share the same anatomy and differ only by head motion between scans, the two images were successfully aligned using affine registration. These preprocessing steps precede the weight calculation and image combination shown in Figure 3.
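As an illustration of the intensity normalization step, the following minimal sketch rescales a flattened image so that the mean inside a white-matter mask equals 110. It assumes a simple global rescaling; the function name, the toy voxel values, and the mask are hypothetical, and the actual pipeline (for example, the normalization performed inside FreeSurfer) may use a more elaborate scheme.

```python
def normalize_to_wm_mean(voxels, wm_mask, target_mean=110.0):
    """Globally rescale intensities so the white-matter mean equals target_mean.

    voxels  : flat list of image intensities (hypothetical toy data)
    wm_mask : flat list of booleans marking white-matter voxels
    """
    wm_values = [v for v, is_wm in zip(voxels, wm_mask) if is_wm]
    if not wm_values:
        raise ValueError("white-matter mask is empty")
    wm_mean = sum(wm_values) / len(wm_values)
    if wm_mean == 0:
        raise ValueError("white-matter mean is zero; cannot rescale")
    scale = target_mean / wm_mean
    return [v * scale for v in voxels]


# Toy example: two of the four voxels belong to white matter.
voxels = [40.0, 200.0, 90.0, 240.0]
wm_mask = [False, True, False, True]
normalized = normalize_to_wm_mean(voxels, wm_mask)
print(normalized)
```

A single global scale factor keeps the relative contrast between tissues unchanged, which is why this kind of normalization can precede the weight calculation without altering the ratios between regional means.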

SWI Segmentation Using Hybrid Contrast Image
To obtain segmentation results reflecting iron-related signals, we used the T1w and SWI images simultaneously and merged them into a single hybrid contrast (HC) image [30]. SWI provides superior contrast for iron-rich structures, while T1w images have greater contrast in the curvature of the complicated gyri and sulci that principally drive registration. Using the HC image therefore yields a DGM segmentation that reflects iron content more than using T1w alone, and better reflects disease hallmarks such as nuclei atrophy [31] caused by iron deposition.
The HC image is defined by linearly combining the T1w and SWI images:

$$\mathrm{HC} = w_1 \cdot \mathrm{T1w} + w_2 \cdot \mathrm{SWI}$$

where $w_1$ and $w_2$ are weighting coefficients for T1w and SWI, respectively. We adjusted the weighting coefficients $w_1$ and $w_2$ to make HC as close as possible to the reference, the Montreal Neurological Institute (MNI) template. We employed the MNI template's contrast as the target for the coefficient optimization because it has a typical T1w contrast with outstanding DGM structural delineation. The optimized values of the weighting coefficients $w_1^*$, $w_2^*$ can be obtained by minimizing the squared difference of the mean signal intensities in the target brain regions between the HC and the MNI template:

$$(w_1^*, w_2^*) = \underset{w_1, w_2}{\arg\min} \left[ \left( w_1 I^{T1w}_{put} + w_2 I^{SWI}_{put} - I^{MNI}_{put} \right)^2 + \left( w_1 I^{T1w}_{pall} + w_2 I^{SWI}_{pall} - I^{MNI}_{pall} \right)^2 \right]$$

where $I^{T1w}_{put}$, $I^{SWI}_{put}$, and $I^{MNI}_{put}$ are the mean values of the T1w, SWI, and MNI template images in the putamen region, respectively, and $I^{T1w}_{pall}$, $I^{SWI}_{pall}$, and $I^{MNI}_{pall}$ are the mean values of the T1w, SWI, and MNI template images in the globus pallidus region, respectively. We chose the putamen and globus pallidus as the target regions because of their high-contrast signals over broad areas.

Figure 3. Flowchart of making a deep gray matter (DGM) mask using the T1w and SWI images. T1w and SWI were preprocessed through normalization, bias correction, and registration. The merging weight coefficients were calculated from the initial DGM mask obtained using only T1w segmentation, and a hybrid contrast image (HC) was created as a result. The DGM mask was obtained by registering the HC to the MNI atlas space using non-linear registration. The final mask was obtained by applying inverse warping to the original coordinates.
Then, we used Advanced Normalization Tools (ANTs) to register the HC to the MNI template by computing an initial affine registration followed by a non-linear registration employing a non-rigid diffeomorphic registration scheme [32]. ANTs produced the most consistent and reliable registration results in a comparison of 14 different registration methods [33]. The segmentation results from the MNI space were inversely warped to the individual T1w image space. The overall procedure of SWI segmentation is shown in Figure 3.
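The weighting coefficients of the hybrid contrast image can be illustrated with a small sketch. With two target regions (putamen and globus pallidus) and two unknowns, the squared-difference objective reaches zero whenever the 2x2 linear system is non-singular, so the sketch below simply solves that system with Cramer's rule. The function names and all regional mean values are hypothetical and chosen only for illustration; this is not the authors' released implementation.

```python
def solve_hc_weights(t1_put, swi_put, mni_put, t1_pall, swi_pall, mni_pall):
    """Solve w1*T1w + w2*SWI = MNI for the putamen and pallidum regional means.

    Returns (w1, w2). Raises if the two regions do not constrain the weights
    independently (singular system).
    """
    det = t1_put * swi_pall - swi_put * t1_pall
    if abs(det) < 1e-12:
        raise ValueError("regional means are linearly dependent; weights undefined")
    w1 = (mni_put * swi_pall - swi_put * mni_pall) / det
    w2 = (t1_put * mni_pall - mni_put * t1_pall) / det
    return w1, w2


def hybrid_contrast(t1_voxels, swi_voxels, w1, w2):
    """Voxel-wise linear combination of co-registered T1w and SWI intensities."""
    return [w1 * a + w2 * b for a, b in zip(t1_voxels, swi_voxels)]


# Hypothetical regional means (arbitrary intensity units).
w1, w2 = solve_hc_weights(
    t1_put=80.0, swi_put=60.0, mni_put=55.0,
    t1_pall=90.0, swi_pall=40.0, mni_pall=55.0,
)
hc = hybrid_contrast([80.0, 90.0], [60.0, 40.0], w1, w2)
print(w1, w2, hc)
```

If more than two target regions were used, the same objective would become an overdetermined least-squares problem and would need a general solver rather than this closed form.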

Feature Extraction and Selection
Radiomic features were extracted from the segmented DGM region of the SWI images automatically computed in Section 2.4. The radiomic features included 19 first-order statistical features, 10 2D shape-based features, 16 3D shape-based features and the following texture-based features: 72 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 15 neighboring gray-tone difference matrix (NGTDM) features, and 14 gray-level dependence matrix (GLDM) features [34], as shown in Figure 2. These matrices represent the relationship with the surrounding voxels according to the kernel for each voxel. For example, the (i, j)th element of the GLCM represents the number of times the combination of levels i and j occur in two voxels in the image, which are separated by a distance of δ pixels along the angle θ.
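To make the GLCM definition concrete, the sketch below counts co-occurring gray-level pairs in a small 2D image for a given offset and derives one simple feature from the counts. It is a simplified, non-symmetric, single-direction illustration with hypothetical function names and toy data; radiomics packages typically work in 3D, aggregate over several angles, and may symmetrize the matrix, so the values here are not expected to match any package output.

```python
def glcm(image, levels, dx, dy):
    """Count pairs (i, j) where level j lies at offset (dx, dy) from level i.

    image  : 2D list of integer gray levels in range(levels)
    dx, dy : column and row offset; e.g. distance 1 along 0 degrees is (1, 0),
             distance 4 along 0 degrees is (4, 0)
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def joint_average(matrix):
    """Mean gray level of the first voxel of each pair, weighted by pair frequency."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("no voxel pairs at this offset")
    return sum(i * count for i, row in enumerate(matrix) for count in row) / total


# Toy 3x3 image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
counts = glcm(image, levels=3, dx=1, dy=0)
print(counts, joint_average(counts))
```

Increasing the offset (for example to four or seven voxels) makes the matrix describe coarser spatial relationships, at the cost of fewer valid pairs inside a small region.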
In addition to the default features of the Python radiomics package [35], we added GLCM and NGTDM features computed with the distance to neighboring voxels set to four and seven voxels. Since SWI does not provide quantitative measurements of susceptibility, we excluded the signal-based features, such as the minimum, maximum, mean, median, 10th percentile, and 90th percentile of intensity, the gray-level range, and others, and focused on the texture-based features. Only the selected optimal features were used (see below).
Among the sub-cortical structures in the DGM, we chose the putamen to extract radiomic features because comparing them is easy owing to the putamen's large size and high contrast. It shows a large difference between the mask segmented from the T1w-only and the proposed methods. In addition, the radiomic results extracted from the putamen showed the best performance in disease classification using machine learning [8].
Next, to avoid overfitting the learning model, feature selection was performed before applying machine learning algorithms [36]. We employed the Fisher score algorithm, a filter-based method for supervised feature selection, to rank the radiomic features. It scores each feature independently according to the Fisher criterion. We selected the 10 top-ranked features based on the Fisher score and used them to classify the data with machine learning.
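The ranking step can be sketched as follows, assuming the commonly used form of the Fisher score: for each feature, the class-size-weighted squared distance of class means from the overall mean, divided by the class-size-weighted within-class variance. The function names and toy data are hypothetical, and the exact variant used in the released code may differ.

```python
def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (list of rows)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for k in classes:
            values = [v for v, label in zip(column, y) if label == k]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        if within == 0:
            # Degenerate feature: constant within every class.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_features(X, y, k=10):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [9.0, 4.0], [10.0, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
best = top_k_features(X, y, k=1)
print(scores, best)
```

Because each feature is scored independently, this filter is cheap to compute but cannot detect redundancy between highly correlated features.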
The total dataset was divided into training and testing sets at a 7:3 ratio. In the training sets, features were selected, and 10 classifiers were constructed with 3-fold cross-validation. To evaluate the performance of the classifiers for differentiation of APS, the area under the receiver operating characteristic curve (AUC), balanced accuracy (bAcc), sensitivity (Sen), specificity (Spe), and accuracy (Acc) were measured. The latter four are defined by:

$$\mathrm{Sen} = \frac{TP}{TP + FN}, \quad \mathrm{Spe} = \frac{TN}{TN + FP}, \quad \mathrm{bAcc} = \frac{\mathrm{Sen} + \mathrm{Spe}}{2}, \quad \mathrm{Acc} = \frac{TP + TN}{TP + FN + TN + FP}$$

where TP denotes the number of the actual positives that are correctly classified as positives, FN denotes the number of the actual positives that are wrongly classified as negatives, TN denotes the number of the actual negatives that are correctly classified as negatives, and FP denotes the number of the actual negatives that are wrongly classified as positives.
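The four count-based metrics can be computed directly from the confusion-matrix entries using their standard definitions. The sketch below uses a hypothetical function name and made-up counts purely for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, balanced accuracy, and accuracy from counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sen = tp / (tp + fn)          # true positive rate
    spe = tn / (tn + fp)          # true negative rate
    return {
        "Sen": sen,
        "Spe": spe,
        "bAcc": (sen + spe) / 2,  # mean of the two class-wise rates
        "Acc": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for one binary comparison.
metrics = classification_metrics(tp=8, fn=2, tn=15, fp=5)
print(metrics)
```

Balanced accuracy is informative here because the disease groups differ in size, so plain accuracy can be dominated by the larger group.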
The AUC metric is defined as the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR, equivalent to sensitivity) against the false positive rate (FPR, equivalent to 1 − specificity) with varying thresholds. For statistical evaluation, the performance metrics were obtained over 100 random re-partitions of the training and testing sets and averaged. The source code is available on GitHub: https://github.com/KimYunSoo/classify_radiomic (accessed on 22 January 2022).
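As a small illustration of the AUC, the sketch below uses the equivalent pairwise (Mann-Whitney) formulation: the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function names, labels, and scores are hypothetical; an averaging helper is included only to indicate how per-split values could be summarized over repeated random partitions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of class scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def mean(values):
    """Average of per-split metric values, e.g. over repeated random splits."""
    return sum(values) / len(values)


# Hypothetical classifier scores for six test subjects.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(labels, scores)
print(auc, mean([auc, 1.0]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of this size; rank-based implementations are preferable for large datasets.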

Demographic Characteristics
The demographic and clinical characteristics of the subject groups are listed in Table 2. There were no significant differences between subject groups in terms of gender distribution. Age was higher in the PSP group than in the other groups. There was no discernible difference in disease duration between MSA-P, MSA-C, PSP, and PD. The disease severity measured using the UPDRS and H&Y scores was greater in the PSP and MSA groups than in the PD group, and MMSE was lower in the PSP and MSA groups than in the PD group (p < 0.001).

Figure 4 shows an example of axial slices around the DGM area in the T1w, SWI, and HC images. The DGM contrast is weak and the cortex contrast is clear in the T1w image, while the trend is opposite for SWI. In contrast, the HC image shows clear, high contrast for both the DGM and the cortex. Figure 5 shows that the proposed approach produces segmentation results that better represent the hypointensity indicating iron concentration in putamen SWI images. HC segmentation masks that use both T1w and SWI simultaneously reflect more hallmarks of parkinsonian disorders, such as iron accumulation and the resulting putamen atrophy, than T1w-only masks.

Table 3 shows the 10 most significant features selected from SWI for the differentiation of MSA-P and PD when using HC and T1w-only (by FreeSurfer, FS) segmentation masks, together with their mean values. Autocorrelation7, SumAverage4, JointAverage4, SumAverage7, JointAverage7, and Imc24 in GLCM and HighGrayLevelEmphasis in GLDM were selected with both HC and T1w-only segmentation. Imc24 is the correlation between the probability distribution of intensity and occurrence number, quantifying the complexity of the texture, at a neighboring voxel distance of 4. JointAverage7 and SumAverage7 (JointAverage4 and SumAverage4) measure the relationship between occurrences of pairs with lower or higher intensity values at a neighboring voxel distance of 7 (4, respectively). These indicate that the number of pairs of lower or higher intensities helps to differentiate between diseases. HighGrayLevelEmphasis in GLDM measures the distribution of the higher gray-level values, with a higher value indicating a greater concentration of high gray-level values in the volume. In addition, in comparisons with other disease groups, as shown in Tables A1-A5, ClusterShade4 and MCC4 were also found to be common to HC and T1w-only. ClusterShade4 is a metric of the skewness and uniformity of the GLCM at a neighboring voxel distance of 4 [49]. MCC4 is the maximal correlation coefficient at a neighboring voxel distance of 4, which also assesses the complexity of the texture. These features represent how dependent and uniform the distributions are.

The significant features selected only when using the HC mask include Autocorrelation4 in GLCM and HighGrayLevelRunEmphasis in GLRLM. Autocorrelation4 quantifies the magnitude of texture coarseness at a neighboring voxel distance of 4; therefore, it operates more effectively in the HC segmentation mask, as clusters of similar intensities appear better in the HC mask than in the T1w-only mask, which includes regions that are not iron-deposited. HighGrayLevelRunEmphasis in GLRLM measures the distribution of the higher gray-level values. RunEntropy and ShortRunHighGrayLevelEmphasis in GLRLM are also common when using HC masks in other disease group comparisons. RunEntropy is a metric that evaluates the uncertainty and randomness in the distribution of run lengths and gray levels. Therefore, the heterogeneity in the texture patterns measured by RunEntropy is helpful in classifying each disorder. ShortRunHighGrayLevelEmphasis assesses the distribution of the high gray-level values and their joint distribution with shorter run lengths in GLRLM. The feature indicates how concentrated hyperintensities in SWI are, which is significant for distinguishing each subtype of parkinsonian disorder.

Feature Extraction and Selection Results
The significant features selected using T1w-only segmentation include GrayLevelNonUniformity in GLRLM and DependenceVariance in GLDM. GrayLevelNonUniformity measures the similarity of the gray-level intensity values in the SWI image. DependenceVariance measures the variance in dependence size in the image. Moreover, in other disorder comparison cases, LargeDependenceHighGrayLevelEmphasis in GLDM and Strength in NGTDM were frequently selected features using the T1w-only mask. LargeDependenceHighGrayLevelEmphasis in GLDM measures the joint distribution of large dependence with higher gray-level values. Strength in NGTDM measures how easily defined and visible the primitives in the image are. These all work mainly in the T1w-only mask, where hypo- and hyper-intensity clusters appear together, because the T1w-only mask is likely to include regions without iron deposition (see Figure 5).

Table 4 lists the training and testing area under the receiver operating characteristic curve (AUC) of the RBF SVM classifier employing features from the T1w-only and HC masks. The SVM with RBF kernel that learns the radiomic features extracted from iron-reflected segmentation results produced an average AUC of 0.8607 in training and 0.8489 in testing. Classifiers trained on radiomic features from the T1w-only mask had an average AUC of 0.7570 in training and 0.7866 in testing. The classifier model trained with features extracted using the HC mask thus shows better performance than the T1w-only mask-based SVM classifier. The RBF SVM classifier receiver operating characteristic (ROC) curves for each disease comparison are shown in Figures A1-A6. For the other classification algorithms, it was likewise confirmed that the proposed method improves performance compared to the T1w-only method, in the same way as for the RBF SVM.

SVM Results
The balanced accuracy, sensitivity, and specificity of the RBF SVM classifier using features from T1w-only masks and HC masks are listed in Table 5. The machine learning classifier that learns the SWI-reflected radiomic features produced an average balanced accuracy of 0.7666 for the training cohort and 0.7992 for the testing cohort. The classifier model trained on radiomic features extracted from T1w-only segmentation masks achieved 0.6557 in training and 0.7620 in testing. The classifier that was trained on the radiomic features extracted by the proposed method achieved an average accuracy of 0.8000 in training and 0.8059 in testing, as shown in Table 6. Conventional T1w-only segmentation classifiers had an accuracy of 0.7352 in training and 0.7653 in testing.
The AUC, balanced accuracy, sensitivity, specificity, and accuracy of all other classifiers are listed in Tables A6-A32. Similar to the RBF SVM, in other classifier models, the AUC, balanced accuracy, and accuracy increased when HC masks reflecting iron-related signal were used.

Discussion and Conclusions
In this paper, we proposed a novel framework that automatically analyzes the disease-specific patterns of iron deposition using SWI. With this framework, raw data can be input directly and processed automatically without any human intervention, and the resulting disease classification can be applied to diagnosis.
Atypical parkinsonian syndromes, such as MSA-P, MSA-C, and PSP, can be mistaken for PD, especially in the early stages of the disease, because both APS and PD present with parkinsonism. Therefore, it is critical to distinguish between PD and APS; nevertheless, it remains difficult to discriminate between these neurodegenerative disorders with conventional MRI.
We demonstrated that in individuals with abnormal brain anatomy, the commonly used T1w-only segmentation pipeline produces erroneous subcortical segmentation. The goal of this study was to overcome this issue by modifying the conventional pipeline that incorporates nonlinear registration and by using a dedicated hybrid image contrast created by combining standard T1w images with SWI. By using the HC, which is a combination of the T1w and SWI, for the DGM segmentation, it is possible to identify iron deposition automatically without manual segmentation by expert radiologists, as was done in the past. We have visually shown that putamen segmentation performance was improved by using both the T1w and SWI.
We conducted a qualitative assessment of the visual delineation of our segmentation framework results. If there is a manual segmentation mask by an expert, it can be used as the gold standard, and objective and quantitative evaluation can be performed through metrics such as the dice coefficient. However, manual segmentation performed by experts is costly and time consuming. Some studies have used visual ratings as metrics [50].
Another goal of the present study was to create a machine learning classifier that can distinguish APS from PD using image texture-based features derived from basal nuclei on SWI. Different iron deposition patterns for each disease were compared by extracting quantified radiomic features. The distinction between the subtypes of parkinsonian disorder was better exposed by the features retrieved with the SWI-reflected mask. When classifying diseases using various machine learning algorithms, it was confirmed that the performance of the classifier improved when it was trained on features extracted with the HC mask.
We recognize the lack of pathological confirmation for diagnosis and phenotypic categorization, which remain the gold standard for the diagnosis of PSP. However, we selected patients with the typical clinical characteristics of MSA, PSP and PD, and assessed these patients over several years.
We used the texture features of the signal intensity contrast to train the machine learning classifiers. Since SWI does not represent a quantified value of iron content, the quantitative values of iron deposition were not measured. We used only texture features because we intended to classify disorders by analyzing the image patterns of each disease and not to create a reference point or threshold with a quantified number.
Although we did not directly compare the quantitative values, we indirectly demonstrated the improvement in segmentation through the improved performance of the machine learning classifier.
In future work, we will validate the proposed framework more clinically using R2*. In addition, we will aim to apply our hybrid approach of brain tissue segmentation in other PET-MRI modalities.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures A1-A6 show the receiver operating characteristic (ROC) curves for each differentiating disease case. As in the RBF SVM, in other algorithms, the classifiers that learned the features extracted using HC masks performed better overall in terms of the AUC, balanced accuracy, and accuracy compared to the models trained using the conventional T1w-only masks.

Table A3 lists the mean values of the features with HC and T1w-only masks when comparing MSA-C and PD.

Appendix C

Appendix C.1
Tables A6-A8 list the results of the classifier trained with k-nearest neighbor (kNN).

Appendix C.2
Comparison of the linear support vector machine (linSVM) classifier is given in Tables A9-A11.

Appendix C.3
Tables A12-A14 list the results of the Gaussian process (GP) based classifier.

Appendix C.4
Tables A15-A17 list the performances of the classifier that learned radiomic features based on random forest (RF).

Appendix C.5
Decision tree (DT) classifier results are listed in Tables A18-A20.

Appendix C.6
The performances of the classifier trained with multi-layer perceptron (MLP), also known as Neural Net (NN), are listed in Tables A21-A23.

Appendix C.7
The results of the classifier trained based on AdaBoost (ADA) are listed in Tables A24-A26.

Appendix C.8
Results of the classifier using Gaussian naïve Bayes (GNB) are listed in Tables A27-A29.

Appendix C.9
Tables A30-A32 list the results of the quadratic discriminant analysis (QDA) classifier.