Classification of Alzheimer's Disease Using Dual-Phase 18F-Florbetaben Image with Rank-Based Feature Selection and Machine Learning

Abstract: 18F-florbetaben (FBB) positron emission tomography is a representative imaging test that visualizes amyloid deposition in the brain. Compared to delay-phase FBB (dFBB), early-phase FBB shows patterns related to glucose metabolism seen in 18F-fluorodeoxyglucose perfusion images. The purpose of this study is to show that classification accuracy is higher when using dual-phase FBB (dual FBB) rather than dFBB quantitative analysis alone, using machine learning, and to find an optimal machine learning model for dual FBB quantitative analysis data. The key features of our method are (1) a feature ranking method for each phase of FBB with a cross-validated F1 score and (2) a quantitative diagnostic model based on machine learning methods. We compared four classification models: support vector machine, naïve Bayes, logistic regression, and random forest (RF). In composite standardized uptake value ratio (SUVR), RF achieved the best performance (F1: 78.06%) with dual FBB, which was 4.83% higher than the result with dFBB. In conclusion, regardless of the quantitative analysis method, dual FBB yields higher classification accuracy than dFBB alone. RF is the machine learning model that best classifies dual FBB, and the regions with the greatest influence on dual FBB classification are the frontal and temporal lobes.


Introduction
Dementia is one of the leading causes of death worldwide due to population growth and aging [1]. Alzheimer's disease (AD) accounts for the majority of dementia cases and is a progressive neurodegenerative disease that causes memory loss, cognitive impairment, behavioral changes, and death [2]. One of the characteristics of Alzheimer's dementia is the formation of amyloid plaques due to the deposition of abnormal neuropathological proteins in the early stages [3]. Accordingly, amyloid biomarkers detect the deposition of amyloid plaques in the brain and are used as a major diagnostic tool for the clinical diagnosis and prediction of Alzheimer's dementia. However, the exact pathogenesis of Alzheimer's dementia is very complex, and the disease can develop even without protein deposition. To compensate for this, an 18F-fluorodeoxyglucose (FDG) test, which checks blood flow to the brain, is performed to confirm Alzheimer's dementia. Unfortunately, performing multiple such tests involves an economic and physical cost to the patient, and undergoing multiple tests causes anxiety in patients and treatment delays [4].

Subjects
This study included subjects with dual FBB images who underwent FBB testing between April 2016 and December 2021 in the Dong-A University cohort. A total of 645 subjects underwent FBB testing during this period. We included 336 subjects, excluding those with neurological, medical, or psychiatric disorders and cases in which dual FBB images were not obtained or were damaged. The 336 subjects were classified according to their diagnoses into 188 patients with Alzheimer's dementia, 111 patients with a diagnosis of mild cognitive impairment (MCI), and 37 healthy controls (HCs). Because the proportions of the three groups were very disproportionate, the 37 patients with Alzheimer's dementia and 37 patients with MCI most similar to the HC subjects in sex and age were selected (Figure 1).
As a result, 37 subjects with Alzheimer's dementia, 37 subjects with MCI, and 37 HC subjects were selected ( Figure 2, Table 1). Each phase of the FBB image was confirmed by a nuclear medicine physician after collection, to ensure that the Aβ distribution labels were accurate. We classified subjects with Alzheimer's dementia and MCI into an "AD patient group" and HC subjects into a "control group". As a result, the subjects participating in the experiment were divided into two classes: AD patient group and control group.
The Dong-A University Hospital Institutional Review Board (DAUHIRB) reviewed this study with the members who participated in the Institutional Review Board Membership List and finally approved the study protocol (DAUHIRB-17-108).


Image Acquisition
All PET examinations were performed using a Biograph 40m CT Flow PET/CT scanner (Siemens Healthcare, Knoxville, TN, USA). Images obtained through scanning were reconstructed using Ultra HD-PET (TrueX-TOF). All images were obtained from the skull vertex to the skull base. eFBB images were acquired for 0-20 min after intravenous injection of FBB of 370 MBq in all subjects. dFBB images were acquired 90-110 min after the injection. Spiral CT was performed with a rotation time of 0.5 s at 100 kVP and 228 mA without an intravenous contrast medium.

Image Preprocessing
Both the eFBB and dFBB images underwent the same preprocessing. The eFBB image was acquired by averaging frames of 1.5-6 min, which is a time range with a high correlation with FDG, considering the initial noise and the minimum image length between 0-20 min ( Figure S1). The program used for the pretreatment process was the PMOD software (version 3.613, PMOD Technologies Ltd., Zurich, Switzerland). For the pretreatment process, we referred to the paper of Yoon et al. [14].
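As an illustration of the frame-averaging step, the sketch below averages the dynamic frames whose start times fall in the 1.5-6 min window; the array layout, timing grid, and function name are our assumptions for illustration, not PMOD's actual pipeline.

```python
import numpy as np

def average_early_frames(dynamic_pet, frame_starts_min, t_start=1.5, t_end=6.0):
    """Average the frames whose start time lies in [t_start, t_end) minutes.

    dynamic_pet: 4-D array (frames x z x y x x); frame_starts_min: start time
    of each frame in minutes post-injection. Both are illustrative assumptions.
    """
    frame_starts_min = np.asarray(frame_starts_min)
    mask = (frame_starts_min >= t_start) & (frame_starts_min < t_end)
    if not mask.any():
        raise ValueError("no frames fall inside the requested window")
    return dynamic_pet[mask].mean(axis=0)

# Toy example: 8 one-minute frames of a 2x2x2 volume, frame i filled with i.
frames = np.stack([np.full((2, 2, 2), float(i)) for i in range(8)])
starts = [0, 1, 2, 3, 4, 5, 6, 7]  # minutes post-injection
efbb = average_early_frames(frames, starts)  # averages frames 2, 3, 4, 5
```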

SUVR acquisition
SUVR was obtained for 10 brain areas using the "view" program of PMOD. These 10 brain areas were averaged from the 67 areas in the AAL-merged volume-of-interest template provided by PMOD (frontal cortex (r/l), temporal cortex (r/l), parietal cortex (r/l), occipital cortex (r/l), anterior cingulate cortex (r/l), posterior cingulate cortex (r/l), caudate cortex (r/l), putamen (r/l), thalamus (r/l), precuneus (r/l), and cerebellar cortex (r/l)) (Figure 3). In the experiment, a composite SUVR [15] and regional SUVRs were used. The composite SUVR is the average SUVR of six regions (frontal cortex (r/l), temporal cortex (r/l), parietal cortex (r/l), occipital cortex (r/l), anterior cingulate cortex (r/l), and posterior cingulate cortex (r/l)). A regional SUVR is the SUVR of each region considered in the composite SUVR.
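The composite SUVR definition above can be expressed as a short sketch; the region keys and SUVR values below are illustrative, not measurements from the study.

```python
# The six bilateral cortical regions whose SUVRs are averaged into the
# composite SUVR, as described in the text.
COMPOSITE_REGIONS = [
    "frontal", "temporal", "parietal",
    "occipital", "anterior_cingulate", "posterior_cingulate",
]

def composite_suvr(regional_suvr):
    """Average the six regional SUVRs that define the composite SUVR."""
    return sum(regional_suvr[r] for r in COMPOSITE_REGIONS) / len(COMPOSITE_REGIONS)

# Made-up regional SUVRs for one subject.
example = {
    "frontal": 1.40, "temporal": 1.35, "parietal": 1.30,
    "occipital": 1.20, "anterior_cingulate": 1.50, "posterior_cingulate": 1.45,
}
print(round(composite_suvr(example), 3))
```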

Experiment
Figure 4 presents an overview of our proposed framework to demonstrate the feasibility of using a dual FBB. The experimental process consisted of the following four steps:

1. Feature ranking methods were applied to the preprocessed data.
2. The feature subset was determined by a cumulative feature search with 5-fold cross validation.
3. As a series of model selection procedures, the hyperparameters, preprocessing methods, and types of predictive model were reconsidered without the test set.
4. The best model was tested, and the feature distribution was observed to test our hypotheses.
Figure 4. Overview of the proposed framework to demonstrate the feasibility of AD classification with dual FBB. eFBB: early-phase FBB; dFBB: delay-phase FBB; dual FBB: dual-phase FBB; CV: cross validation; F_e,i: best feature combination of eFBB; F_d,j: best feature combination of dFBB. F_e,i and F_d,j are used to filter the raw data, and the filtered data are used as independent variables for the predictive models in model selection and evaluation.
In the above experimental procedure, we observed the major cortical brain areas through frequency analysis and the effect of the extracted features on patient classification, as well as the performance of the classification model. We split the data into training and testing sets (8:2). In particular, we conducted a feature ranking method with a cross-validated F1 score for feature selection and compared four representative ML methods to observe the classification performance of the selected regional SUVRs and composite SUVR from each phase of FBB.


Feature Selection and Aggregation for Dual-Phase FBB
In our experiment, we used ranking-based feature selection, which individually evaluates each feature and sorts the features by their evaluated scores [16]. We adopted a one-way F-test [17] and the Gini score [18] estimated by random forest to measure the quality of the feature subset. We scored each numerical regional SUVR with the p-value from the one-way F-test and the Gini score in the training set. Subsequently, we evaluated the performance of the feature subsets created by cumulatively adding individual features sorted according to their scores, using five-fold cross validation in the training set. Finally, the top feature combinations selected from each phase of the FBB were aggregated as follows:

C = F_e,i ∪ F_d,j (i, j = 1, 2, ..., 10). (1)

Here, F_e,i and F_d,j are the top feature combinations for each phase of the FBB; C is the feature set aggregated from the dual FBB and is used to extract the input variables for the predictive models. That is, C includes i + j feature combinations, and these combinations are used to create the predictive models. In the comparison model with only a single-phase FBB (single FBB), without aggregating C, either F_e,i or F_d,j is used to make the respective predictive models.
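A minimal sketch of the rank-based cumulative feature search and the phase-level aggregation of Equation (1). Synthetic data stands in for the regional SUVRs of the two phases, and a random forest serves as the cross-validated scorer; all names (`F_e`, `F_d`, `C`) mirror the text's symbols but the data are invented.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score

def top_feature_combination(X, y, max_features=10):
    """Rank features by one-way ANOVA F-test, then grow the subset
    cumulatively and keep the prefix with the best 5-fold CV F1 score."""
    f_scores, _ = f_classif(X, y)
    order = np.argsort(f_scores)[::-1][:max_features]
    best_prefix, best_f1 = order[:1], -np.inf
    for k in range(1, len(order) + 1):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        f1 = cross_val_score(clf, X[:, order[:k]], y, cv=5,
                             scoring="f1_weighted").mean()
        if f1 > best_f1:
            best_prefix, best_f1 = order[:k], f1
    return set(best_prefix.tolist())

# Synthetic stand-ins for the eFBB and dFBB regional SUVR matrices.
X_e, y = make_classification(n_samples=90, n_features=10, random_state=1)
X_d, _ = make_classification(n_samples=90, n_features=10, random_state=2)

F_e = top_feature_combination(X_e, y)  # best combination from eFBB
F_d = top_feature_combination(X_d, y)  # best combination from dFBB
# Equation (1): union of the phase-wise combinations, tagged by phase.
C = {("e", f) for f in F_e} | {("d", f) for f in F_d}
print(len(C) == len(F_e) + len(F_d))
```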

Evaluation for Classification Model and Selected Feature Distribution
As shown in Figure 4, in this experiment, we attempted to show improved patient classification performance of dual FBB compared to single FBB and the brain regions contributing to the classification task through our proposed framework. Therefore, we observed three types of validation items using only the test set.
First, the validation of the patient classification performance of the trained predictive model was considered as follows: representative metrics were used to measure the classification performance. A weighted F1 score was used to overcome data imbalance issues that make a classifier biased toward a majority class [15]. We used the weighted F1 score to validate the quality of the selected feature combinations and evaluated the generalization performance of the predictive model for classifying the AD patient group and the control group.
In the model evaluation, we calculated the area under the curve (AUC) score as well. The receiver operating characteristic (ROC) curve is a graph showing the performance of the classification model at all classification thresholds. The area under the ROC curve, AUC, is used when the distribution for each class is different [19].
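Both metrics are available in scikit-learn; below is a toy sketch with made-up labels and scores (1 = AD patient group, 0 = control group), not results from the study.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Invented example labels, hard predictions, and classifier scores.
y_true  = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.95, 0.1]

# Weighted F1 averages per-class F1 scores by class support, which keeps
# the metric from being dominated by the majority class.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
# AUC summarizes the ROC curve across all classification thresholds.
auc = roc_auc_score(y_true, y_score)
print(round(weighted_f1, 4), round(auc, 4))
```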
Second, we considered the validation of the best feature set submitted by the proposed framework from the training set. Because we iteratively experimented with different seeds, we can simply present the frequencies of the regions suggested by the framework with a histogram. The histogram represents the importance of cortical brain regions in patient classification. We can observe the importance of each cortical region in eFBB imaging, which has rarely been covered in previous studies, or in dFBB imaging.
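The frequency analysis can be sketched with a simple counter over repeated runs; the selected-region lists below are invented for illustration, not the study's actual selections.

```python
from collections import Counter

# Hypothetical feature sets chosen by the framework across three seeds.
runs = [
    ["frontal", "temporal"],
    ["frontal", "posterior_cingulate"],
    ["temporal", "frontal", "parietal"],
]

# Count how often each cortical region was selected; the counts are what
# the histogram of region importance would display.
freq = Counter(region for selected in runs for region in selected)
print(freq.most_common(2))
```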

Machine Learning Methods for Classifying AD Patient Group and Control Group
We considered several representative classifiers to handle the simple or complex feature spaces visited during the feature selection step. We built a support vector machine (SVM) [20], naïve Bayes (NB) [21], logistic regression (LR) [22], and random forest (RF) [23] as feature quality assessment functions and AD patient classifiers. The kernel of the SVM was set to linear. Radial basis function and polynomial kernels were also tested in our internal experiments, but the results are not presented because there were no meaningful differences. The RF was trained with 100 estimators, and the Gini impurity [18] was used to measure the quality of each split. The hyperparameters of all models were heuristically determined; there were no specific hyperparameter settings for LR and NB.
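Under these settings, the four models can be instantiated in scikit-learn roughly as follows; the choice of GaussianNB as the NB variant and the LR iteration cap are our assumptions, since the text leaves them unspecified.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def build_classifiers(random_state=0):
    """The four classifiers compared in the study: linear-kernel SVM,
    RF with 100 estimators and Gini impurity, and LR and NB with
    default settings."""
    return {
        "SVM": SVC(kernel="linear", random_state=random_state),
        "NB": GaussianNB(),
        "LR": LogisticRegression(max_iter=1000, random_state=random_state),
        "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                     random_state=random_state),
    }

models = build_classifiers()
print(sorted(models))
```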

Experimental Machine Learning Tool
Python 3.6 was used to conduct the remaining experimental procedures for feature selection and model evaluation, with version 1.0.2 of scikit-learn. The experimental tool was implemented and tested on Linux Ubuntu 18.04 LTS with an Intel® Xeon® CPU at 2.20 GHz, 12.68 GB of system memory, and no GPU.

Statistical Analysis
To provide statistical evidence for the experimental results and the reproducibility of this work, we used statistical tests to observe significant differences in the performance distributions of the predictive models with each phase of FBB. The best predictive models with regional or composite SUVR were engaged in 100 repeated tests to estimate their F1 score distributions. We used the Kolmogorov-Smirnov test [24] to check the normality of the F1 score distributions. Furthermore, we performed the Kruskal-Wallis test and post-hoc analysis to evaluate the superiority of dual FBB over single FBB. All tests were two-sided at a significance level of p < 0.01. Statistical analysis was performed using IBM SPSS Statistics version 23 (Chicago, IL, USA).

Results
In the comparison among all phases of FBB, the predictive model with dual FBB was the best, followed by the models with dFBB. The performance of the prediction model with dual FBB was 4.83% higher than that with dFBB. In regional SUVR, RF (ACC: 78.52%, F1: 78.54%, AUC: 0.8456) ranked by one-way ANOVA p-value was the best model for dual FBB, followed by NB (ACC: 76.13%, F1: 76.58%, AUC: 0.8486).
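The normality check and group comparison described in the statistical analysis can be sketched with SciPy; the F1-score arrays below are synthetic stand-ins generated to resemble the reported summary statistics, not the study's actual distributions.

```python
import numpy as np
from scipy import stats

# Synthetic F1-score distributions for the 100 repeated tests per phase.
rng = np.random.default_rng(0)
f1_efbb = rng.normal(59.5, 8.4, 100)
f1_dfbb = rng.normal(71.2, 8.0, 100)
f1_dual = rng.normal(78.5, 7.6, 100)

# Kolmogorov-Smirnov test of each sample against a normal distribution
# fitted to it (normality screen; True means normality is not rejected
# at p < 0.01).
for scores in (f1_efbb, f1_dfbb, f1_dual):
    ks = stats.kstest(scores, "norm", args=(scores.mean(), scores.std()))
    print(ks.pvalue > 0.01)

# Kruskal-Wallis test for a difference among the three phase groups.
h, p = stats.kruskal(f1_efbb, f1_dfbb, f1_dual)
print(p < 0.01)
```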

Frequency-Based Analysis for Feature Selection
We observed the frequency of feature selection ranked based on cross-validated F1 scores. In Figure 5, it can be seen that dFBB has a higher performance than eFBB, and dual FBB has a higher performance than dFBB in both the composite SUVR and regional SUVR. In addition, it can be seen that the performance difference between eFBB, dFBB, and dual FBB is statistically significant. There was no normality in the F1 score distribution; therefore, the Kruskal-Wallis test was used to test for significant differences between the three groups. The three groups were significantly different (p < 0.0001), and it was confirmed through post-hoc testing that there were significant differences between eFBB and dFBB, dFBB and dual FBB, and eFBB and dual FBB.
Table 3 presents the summary statistics from Figure 5: the minimum, maximum, median, mean, and standard deviation (SD) of the F1 scores obtained in the 100 experiments. In eFBB, the F1 score distribution had a minimum of 33.66%, median of 60.19%, mean of 59.51%, maximum of 78.63%, and SD of 8.4251. In dFBB, the distribution had a minimum of 40.05%, median of 70.92%, mean of 71.24%, maximum of 82.26%, and SD of 7.9509. In dual FBB, the distribution had a minimum of 49.89%, median of 78.52%, mean of 78.54%, maximum of 95.70%, and SD of 7.6105. In all statistics except the SD, the F1 score of dual FBB was higher than those of eFBB and dFBB; eFBB had the highest SD, and dual FBB had the lowest.
From Figure 6 and Table 4, among the six regions constituting the composite SUVR, the areas influential in classifying the AD patient group and the control group were identified. In eFBB, the order of regions influencing classification was frontal (32.65%), temporal (24.4%), posterior cingulate (16.49%), anterior cingulate (13.06%), occipital (7.9%), and parietal (5.5%). In dFBB, the order was temporal (33.45%), frontal (29.09%), parietal (18.91%), occipital (8.73%), anterior cingulate (8.36%), and posterior cingulate (1.45%).

Tables 5 and 6 list the frequency distributions according to the number of selected features in each phase; in other words, each region from Table 4 was observed in detail. With one selected feature, eFBB most frequently selected frontal region features and dFBB temporal region features. With two selected features, frontal and temporal region features were the most observed in both eFBB and dFBB. With three and four selected features, the frequencies of posterior cingulate and anterior cingulate region features increased in eFBB, while parietal and occipital region features increased in dFBB. We also compared the execution time of the individual feature evaluation methods: the one-way ANOVA p-value (23 ms for eFBB, 18 ms for dFBB) was faster than the Gini score of RF (322 ms for eFBB, 279 ms for dFBB).

Discussion
In this study, the ML classification models showed higher classification performance with dual FBB than with dFBB. This is because eFBB and dFBB have complementary properties as biomarkers [25]: dFBB provides information on amyloid deposition, and eFBB provides information on neurodegeneration [5][6][7]. Some patients with AD have at least one important neurodegenerative marker, such as reduced FDG metabolism, but no detectable amyloid deposition [26]. Dual FBB is more likely to detect cognitive impairment at follow-up because it identifies markers of both amyloid deposition and neurodegeneration [26,27]. In addition, dual FBB has advantages such as cost savings, reduced radiation exposure, improved convenience for patients and caregivers, and reduced social costs [28].
In this regard, preceding studies that classified using multimodal data are as follows. Grueso et al. (2021) [29] reviewed 116 studies published between 2010 and May 2021 that applied machine learning to neuroimaging data, together with other variables, to predict the progression from MCI to AD. Most of the studies used data such as MRI and PET, and multimodal data achieved better classification accuracy than a single image type. SVM, the most used algorithm, showed an average accuracy of 75.4%, while the convolutional neural network averaged 78.5%, higher than SVM. El-Sappagh et al. (2021) [30] classified AD, MCI, and HC using RF with multimodal data (including MRI images, FDG PET, and cognitive scores), obtaining a cross-validation accuracy of 93.95% and an F1 score of 93.94%. Lin et al. (2022) [11] classified subjective cognitive decline (SCD) and HC using SVM on multimodal data including MRI images and genetic information. With MRI images alone, the classification accuracy was 79.49% for SCD and 83.13% for HC; with multimodal data, it improved to 85.36% for SCD and 82.52% for HC. Kumari et al. (2022) [31] classified multimodal data including FDG and PiB PET and cognitive test data using an adaptive hyperparameter-tuning random forest ensemble classifier, performing binary classification between pairs of the three classes (AD, MCI, and HC). Binary classification of AD versus HC showed an accuracy of 100%; MCI versus HC showed an accuracy of 91% and specificity of 100%; and AD versus MCI showed an accuracy of 95%, specificity of 100%, and sensitivity of 80%.
Recent studies have used ML algorithms to classify images of Alzheimer's dementia, MCI, and HC with very high accuracy. In general, classification accuracy was increased by adding data sources rather than relying on a single image, that is, by using multimodal data. When multimodal data are used, classification accuracy increases, but so does the number of tests required, and these additional tests are costly, time consuming, and delay treatment. Our study uses dual FBB to achieve the classification-performance benefit of multimodal data with a single test.
In addition, this study evaluated the influence of each brain region selected as a feature on the classification of eFBB and dFBB. These regional SUVRs constitute the composite SUVR, and the frontal and temporal regions are important in both the early and delayed amyloid phases. The priorities of the remaining regions differed between dFBB and eFBB, meaning that eFBB and dFBB focus on different brain regions. In eFBB, the posterior cingulate had the third-highest influence, but in dFBB, it had the lowest influence among the six regions. According to previous studies, Aβ accumulation has the greatest effect on the temporal and parietal lobes in dFBB [32].
With the recognition that a single biomarker cannot provide diagnostic certainty when considering comorbidities and potential overlapping pathologies [33] and the increasing availability of medical artificial intelligence, the number of studies using multimodal data has increased [11][12][13]. However, to the best of our knowledge, no study has quantitatively analyzed dual-phase amyloid PET with ML models. Most studies using dual-phase amyloid have confirmed the time range of early amyloid PET, which showed the highest correlation with FDG [5][6][7][34][35][36]. A classification study using dual-phase amyloid confirmed that readers could compare dual-phase amyloid PET and delayed amyloid PET [7]. Therefore, to the best of our knowledge, this paper is the first to classify amyloid PET images through ML using quantitative values of dual FBB images.
The image of each phase was preprocessed so that each feature element represents a specific brain region, and a rank-based cumulative feature search, which evaluates the quality of feature elements individually, was applied to the feature subsets representing each phase. In the case of the ANOVA method for individual feature scores, only the main effect of each feature is considered. For dual FBB analysis, the two feature sets were aggregated at the input level so that the classification model could analyze the features of both phases simultaneously. The method used in this study was designed to find subsets by evaluating the quality of individual features and arranging them by rank. One limitation of this method is therefore that it does not consider latent attributes, such as those brought about by feature interactions or nonlinear kernel mappings, that appear only in subsets of the original features. It is thus best used with sufficient preprocessing, such as adding interaction effects and additional feature engineering, and the individual features should be sufficiently reviewed by the practitioner.
Computer-aided diagnosis and treatment expert systems using multimodal data have been proposed to manage various chronic diseases owing to the increase in the aging population. Cai et al. provided a detailed summary of how to combine different related knowledge to make an important decision in a multimodal data-driven approach for a smart healthcare system [37]. Multimodal fusion is largely divided into three types: early, medium-term, and late fusion. Early fusion refers to an array of methods that directly combine features before inputting them to a predictive system [38]. This method can train a predictive system using various types of knowledge and can lead to a model that makes comprehensive decisions on multimodal data. In medium-term fusion, the model receives multiple data and individually processes each data point within the model, which is mainly implemented by neural network-based algorithms, and then combines each feature set [39]. Finally, late fusion trains a universal model for each data source and directly integrates all the inferences made by individual models.
In this study, to achieve the task of classifying patient groups from HC, only the early fusion method was considered. In the case of the medium-term fusion method, it is important to create a good latent representation by a neural-network-based model, and to build a reliable model, a sufficient dataset is required. In terms of the advantages of late fusion, it is possible to avoid model complexity and training difficulty owing to higher dimensions in feature-level fusion approaches, combine models without retraining, and scale up easily [40].
In our study, late fusion was not considered because the input features were obtained through sufficient image preprocessing steps, and we expected that those features for each phase would only express the state of each phase for a representative brain region. However, we will consider the medium-term or late fusion method for complex data analysis, such as multiple modalities or imaging data, as well as an additional dataset in future studies.
This study has the following limitations and considerations for future work. First, the amount of data used in our experiment may be insufficient, or the sample biased, for the model's performance to generalize. Future research will need to secure a sufficient number of samples and to obtain separate external or held-out data for validation.
Second, for feature selection we considered only a filter-based method, specifically feature ranking, despite the many opportunities to use diverse feature selection techniques in medical data analysis [16]. One benefit of feature ranking methods is that they are independent of the learning algorithm and require less computation time than wrapper methods. We expected these properties to help keep our learning model from overfitting the limited number of samples and to reveal the regions most significant for distinguishing the AD patient group from the control group. Our experiment could be reproduced with an additional or external dataset.
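A minimal sketch of such a filter-based feature ranking, scoring each feature by the cross-validated F1 of a simple classifier and keeping the top-ranked ones. The synthetic data, classifier choice, and cutoff `k` are assumptions for illustration, not the study's exact procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for regional SUVR features and diagnostic labels.
X, y = make_classification(n_samples=100, n_features=12, n_informative=4,
                           random_state=0)

# Rank each feature independently by cross-validated F1 score.
scores = [
    cross_val_score(LogisticRegression(), X[:, [j]], y, cv=5, scoring="f1").mean()
    for j in range(X.shape[1])
]

# Keep the top-k features; the classifier is then trained on this subset.
k = 4
top_k = np.argsort(scores)[::-1][:k]
X_selected = X[:, top_k]
```

Because each feature is scored independently of the final learning algorithm, the ranking is cheap to compute and the selected indices can also be inspected to see which regions drive the classification.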
Third, because only limited information (the quantitative analysis results of dual FBB) was used and classification was performed after feature selection, the changes across the brain as a whole cannot be fully explained. In this study, after extensive preprocessing, we divided the cortex into several representative compartments to extract handcrafted features and attempted to explain how each cortical region contributes to patient classification. In reality, however, changes in beta-amyloid deposition appear both locally and globally across the cortical regions. In the future, we propose to reduce the time consumed by the quantitative analysis process and to analyze the entire brain by using the amyloid PET brain images directly for data analysis. We could also apply a local explanation method, such as local interpretable model-agnostic explanations [41] or Shapley additive explanations [42], which describe the classifier's behavior for each individual sample.
Finally, in this study we performed complex preprocessing steps to extract regional SUVRs from dual FBB and applied four representative ML techniques to the handcrafted features. Deep learning, which has recently been in the spotlight, has achieved excellent results in AD-related medical image analysis by learning a task-appropriate feature representation directly from the input data, without a feature engineering procedure [43][44][45][46]. As an extension of this study, we will perform an analysis using a convolutional neural network once sufficient data are available.
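The evaluation applied to each of the ML models can be sketched as a standard train/test split with F1 and AUC metrics, as reported for RF. The synthetic data and split parameters below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the handcrafted SUVR feature matrix and labels.
X, y = make_classification(n_samples=150, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Fit the classifier and report F1 on hard predictions, AUC on probabilities.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
f1 = f1_score(y_te, rf.predict(X_te))
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
```

Swapping `RandomForestClassifier` for an SVM, naïve Bayes, or logistic regression estimator yields the same comparison across the four models.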
Additionally, we propose analyzing multimodal data, including cognitive and movement tests, together with the dual-phase amyloid PET image data.

Conclusions
In this study, using dual FBB yielded higher classification accuracy than using dFBB for classifying the AD patient group and the control group. For dual FBB, classification accuracy was highest with RF among the ML classification models: RF achieved an accuracy of 78.21% (F1 score: 78.06%, AUC: 0.8724) with the composite SUVR and an accuracy of 78.52% (F1 score: 78.54%, AUC: 0.8456) with the regional SUVR. The frontal and temporal lobes were confirmed to be important areas in both the early and delayed phases: of the selected features, 32.64% came from the frontal lobe and 24.39% from the temporal lobe in eFBB, and 33.45% from the temporal lobe and 29.09% from the frontal lobe in dFBB.
Although ML aids the quantitative analysis of dual amyloid PET by reducing subjectivity and ambiguity, this study still has limitations. Future studies will need to improve classification accuracy for use in a clinical environment by incorporating multimodal data alongside the dual imaging data.

Data Availability Statement:
The data used for this study are available upon request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.