Potential Applications of Artificial Intelligence in Clinical Trials for Alzheimer’s Disease

Clinical trials for Alzheimer’s disease (AD) face multiple challenges, such as high screen failure rates and the difficulty of evenly allocating heterogeneous participants. Artificial intelligence (AI), which has become a potent tool of modern science with the expansion in the volume, variety, and velocity of biological data, offers promising potential to address these issues in AD clinical trials. In this review, we introduce the current status of AD clinical trials and the topic of machine learning. We then comprehensively review the potential applications of AI at key steps of AD clinical trials: the prediction of protein and MRI AD biomarkers for prescreening during eligibility assessment, and the stratification of AD subjects into likely rapid and slow progressors during randomization. Finally, we discuss challenges, recent developments, and the future outlook for integrating AI into AD clinical trials.


Introduction
Recent advances in understanding the neurobiology of Alzheimer's disease (AD) have revealed that the disease processes leading to symptomatic and functional neurodegeneration begin 15-20 years before the onset of dementia [1,2]. AD is pathologically characterized by the aggregation of beta-amyloid (Aβ) plaques and of hyperphosphorylated tau in the form of neurofibrillary tangles (NFTs). The amyloid cascade hypothesis posits that Aβ triggers a downstream sequence of events, including the development of NFTs, cortical atrophy, cognitive impairment, and loss of activities of daily living [3][4][5]. These AD biomarkers appear in the predementia stages, including normal cognition (NC) and mild cognitive impairment (MCI). Thus, previous clinical trials have focused on developing Aβ-targeting diagnostic and therapeutic methods. A growing number of clinical trials also target tau and NFTs, as tau pathology correlates more closely with cognitive decline than Aβ [6].
Despite the stagnancy in AD drug development in the 18 years since memantine was launched in 2003, a recent clinical trial of Biogen's aducanumab demonstrated a statistically significant reduction in Aβ plaques [7,8]. The US Food and Drug Administration approved aducanumab for AD treatment through the Accelerated Approval pathway, which is expected to serve as an impetus for global AD clinical trial efforts. AD clinical trials involve two notable steps: eligibility assessment and randomization. During eligibility assessment, recruited participants are screened for enrollment or exclusion; during randomization, selected participants are allocated into intervention and control groups. However, these steps present several challenges. Principally, AD clinical trials have a high screen failure rate, which can be attributed to stringent screening criteria, such as Aβ PET positivity. Secondary and tertiary prevention trials for AD have average screen failure rates of 88% and 44%, respectively, meaning that considerable work is needed to recruit even one eligible subject through expensive and time-consuming screening procedures [9].

Eligibility Assessment
Screening is an important process in AD clinical trials to ascertain that selected participants are only those with AD pathology. Clinical diagnosis of AD follows the 1984 NINCDS-ADRDA Work Group criteria [17] or the 2011 NIA-AA guidelines [18]. Recent studies have shown that 15-25% of clinically diagnosed AD patients had incompatible amyloid positron emission tomography (PET) or cerebrospinal fluid (CSF) findings [19,20]. Additionally, the increasing tendency of AD clinical trials to target preclinical stages, where cognition and function are normal, has underscored the importance of biomarker-guided screening. However, AD clinical trials have a high screen failure rate and a correspondingly low recruitment rate, as only one-third of asymptomatic older adults are Aβ+ [21]. Therefore, AI-based prescreening algorithms could help reduce the screen failure rate by classifying the recruited population into high- and low-likelihood groups, with the former undergoing screening procedures for validation and the latter being excluded from the trial (Figure 1). This consideration applies to clinical trials for both disease-modifying therapies (DMTs) and symptomatic treatments.

To evaluate AI-based synthetic PET images, several measurements have been suggested. Maximum mean discrepancy (MMD) measures the distance between the real and synthetic PET data distributions [14]. The structural similarity metric (SSIM) assesses the diversity of generated results by finding similarities within pixels of real and synthetic images [15]. Peak signal-to-noise ratio (PSNR) compares real and synthetic images using the ratio between the maximum possible intensity value and the mean squared error [16].
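As a concrete illustration, PSNR and MMD can each be computed in a few lines. This is a generic sketch, not the cited papers' exact implementations: `max_val` is assumed to be the maximum possible image intensity, and the RBF-kernel bandwidth `gamma` is an arbitrary choice.

```python
import numpy as np

def psnr(real, synth, max_val=1.0):
    """Peak signal-to-noise ratio: ratio of max possible intensity to MSE, in dB."""
    mse = np.mean((real - synth) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def mmd_rbf(x, y, gamma=1.0):
    """Squared maximum mean discrepancy between two sample sets, RBF kernel."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# A synthetic image off by 0.1 everywhere has MSE 0.01 -> PSNR 20 dB.
real = np.zeros((8, 8))
synth = np.full((8, 8), 0.1)
print(round(psnr(real, synth), 2))  # 20.0
```

Note that MMD compares whole distributions of images (or image features), while PSNR and SSIM compare one synthetic image against its real counterpart.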

Figure 1. Diagram of eligibility assessment in AD clinical trials. AI applications in eligibility assessment would prescreen recruited subjects to identify high-likelihood and low-likelihood groups. AI algorithms would be used to classify individuals based on predicted protein (Aβ and tau) biomarkers and/or MRI biomarkers. The high-likelihood group would be selected for further screening, and the low-likelihood group would be excluded, thereby leading to lower screen failure.

Protein Biomarkers for AD
Although causal mechanisms remain unclear, Aβ and tau proteinopathies are defining features of AD as a unique disease [22]. AI prescreening algorithms can reduce the challenges of PET and CSF testing, such as high costs and participants' fear of radiation exposure, by selecting the subset of individuals who are likely to be Aβ or tau positive. Therefore, in AI research for AD clinical trials, many studies have aimed to predict amyloidosis in subjects with MCI [23][24][25], while others have focused on preclinical stages, before neurodegeneration becomes too substantial [26,27] (Table 2).

Studies have suggested neuroimaging modalities as good predictors of Aβ+ status. One group applied the least absolute shrinkage and selection operator (LASSO) regression method to predict Aβ+ status in 440 aMCI subjects [28]. Radiomics features were extracted from MRI images, with the hippocampus and precuneus as regions of interest (ROIs), and were used alone or in combination with baseline non-imaging predictors. Combining the T1 and T2 features (AUC: 0.75) improved the prediction over models using only T1 (AUC: 0.71) or T2 features [30]. In tensor-based morphometry, a Jacobian determinant map encodes the local volume difference between a reference and a target image [31]. One such classifier (AUC: 0.87) relied on the local tissue change between Aβ+ and Aβ− NC subjects for classification. While MRI is the most popular modality, some studies have used other modalities: diffusion tensor imaging (DTI), which assesses the structural integrity of white matter tracts, could complement MRI measures of brain atrophy [32]. A multiple-kernel SVM (MK-SVM) achieved an accuracy of 66-68% in discriminating MCI Aβ+ from MCI Aβ− and 67-74% for MCI Aβ+ vs. NC.
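For intuition, LASSO's L1 penalty drives the coefficients of uninformative predictors to exactly zero, which is what makes it attractive for radiomics feature selection. Below is a minimal coordinate-descent sketch on toy data; it is not the cited study's pipeline, and the feature setup and penalty value are invented for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam=0.1, n_iter=200):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]      # residual with feature j removed
            rho = X[:, j] @ r / n               # correlation of feature j with residual
            z = (X[:, j] ** 2).sum() / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft-thresholding
    return w

rng = np.random.default_rng(0)
signal = rng.normal(size=100)                   # informative "radiomics" feature
noise = rng.normal(size=100)                    # irrelevant feature
X = np.column_stack([signal, noise])
y = 2.0 * signal                                # outcome depends only on the first feature
w = lasso_cd(X, y)
# w[0] stays large while w[1] is shrunk to (near) zero
```

The soft-thresholding step is the entire mechanism: any feature whose correlation with the residual falls below `lam` is zeroed out.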
Another research focus is assessing Aβ status with non-imaging variables, which could reduce costs and save time. One group developed stepwise hierarchical regression models to investigate the relationship between individual cognitive measures and Aβ in 41 aMCI subjects [33]. Story recall was highly accurate in predicting Aβ burden (AUC: 0.86) and accounted for variance effects of age, education, hippocampal volume, and global cognition. A recent study evaluated the positive predictive value (PPV) of demographic, APOE, and cognitive information in predicting amyloid pathology in older NC subjects [34]. The random forest (RF) model estimated a PPV of 0.65, which would reduce the number of subjects undergoing biomarker screening from 2451 to 1539 in a clinical trial aiming to recruit 1000 Aβ+ subjects. Another study developed the Preclinical Amyloid Sensitive Composite (PASC) to detect cognitive differences between Aβ+ and Aβ− NC subjects [35]. The Multiple Indicator Multiple Cause (MIMIC) model was used to compare the latent means in the cognitive domains of the two groups, and multivariate analysis of covariance (MANCOVA) was performed to examine score differences on NP tests of episodic memory and executive function. The PASC scores were calculated using principal component analysis (PCA) to obtain a weight for each test score and achieved an AUC of 0.764 when combined with demographic measures.
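The recruitment arithmetic behind such PPV estimates is simple: the expected number of subjects who must undergo confirmatory biomarker screening is the recruitment target divided by the fraction expected to screen positive. A small sketch (0.65 is the study's reported PPV; the other rate is illustrative):

```python
import math

def subjects_to_screen(target_positives, positive_rate):
    """Expected number of subjects to screen to yield the target number of positives."""
    return math.ceil(target_positives / positive_rate)

# With an AI prescreen whose PPV is 0.65, a trial needing 1000 Abeta+ subjects
# screens ~1539 people rather than the larger number implied by base prevalence.
print(subjects_to_screen(1000, 0.65))  # 1539
```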
A global effort is underway to establish trial-ready registries, such as the Brain Health Registry (www.brainhealthregistry.org, accessed on 4 February 2022), to facilitate AD trial recruitment. Accordingly, many algorithms aim to evaluate online information for prescreening [36,37]. For instance, Extreme Gradient Boosting (XGBoost) [38] is a tree-based ensemble technique in which each new tree is fitted to the residual errors (gradients) of the trees built so far. An XGBoost model achieved AUCs of 0.60 to 0.74, depending on the combination of feature vectors, such as demographics, APOE genotype, and cognitive and functional measures, from the Trial-Ready Cohort in Preclinical/Prodromal Alzheimer's Disease (TRC-PAD) [39].
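The gradient-boosting mechanism can be sketched in a few lines: each round fits a small tree (here, a depth-1 stump on a single feature) to the current residuals, which are the negative gradients of the squared loss. This toy regressor illustrates the idea only; XGBoost adds regularization, second-order gradients, and many engineering refinements.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on a 1-D feature."""
    best = None
    for t in np.unique(x)[:-1]:                 # last value would leave an empty side
        left, right = residual[x <= t], residual[x > t]
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = ((residual - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    """Each round fits a stump to the residuals (negative gradient of squared loss)."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, lv, rv)  # shrunken additive update
    return pred

x = np.arange(10.0)
y = (x >= 5).astype(float)                      # step-function target
pred = gradient_boost(x, y)                     # converges to y geometrically
```

With the learning rate `lr` below 1, each stump corrects only part of the remaining error, which is what makes boosting robust in practice.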
Many AI-guided diagnostic efforts have focused on Aβ, given the evidence supporting the amyloid cascade hypothesis [3] and the numerous clinical trials of anti-amyloid agents [40][41][42]. However, recent findings suggest that tau pathology is more intimately linked to AD-related cognitive impairment than Aβ pathology, suggesting tantalizing potential for clinical trials targeting tau [43,44]. One study classified tau positivity in 64 prodromal AD patients using GBM and RF algorithms [45]. A combination of demographic variables, MCI diagnosis information, NP test scores, APOE genotype, and cortical thickness yielded the highest performance for GBM (AUC: 0.86) and RF (AUC: 0.82). The relative feature importance, calculated through MDA or the Gini index, showed that the most important features for classifying tau positivity were the cortical thickness of the parietal and occipital lobes and the delayed word recall test score. Another study used a multi-class convolutional neural network (MC-CNN) to predict A/T/N staging in 2000+ ADNI cases with known A/T/N status based on structural MRI alone [46]; it predicted "A" at an overall accuracy of 88% and "T" at 89%. AI algorithms can also help clinical trials address the challenge of missing PET imaging data by predicting abnormal protein aggregation in the scan. One group designed a 3-dimensional CNN architecture to complete missing PET patterns with MRI data from ADNI [47]. Deep CNNs are multi-layer models capable of capturing nonlinear mappings between inputs and outputs [48]. The proposed architecture achieved AUCs of 0.69 for MCI vs. NC, 0.68 for progressive MCI (pMCI, MCI subjects who progress to AD) vs. stable MCI (sMCI, MCI subjects who remain stable), and 0.89 for AD vs. NC, outperforming other missing-data estimation methods, such as the KNN and Zero methods. Interestingly, the predictive performance improved when PET and MRI data were used together (MCI vs. NC: 0.76; pMCI vs. sMCI: 0.68; AD vs. NC: 0.93).
Efforts to reconstruct AI-based synthetic images have especially focused on the generative adversarial network (GAN). In the adversarial training process of a GAN, the Generator produces fake images, and the Discriminator distinguishes real images from fake ones until the Generator produces images that the Discriminator can no longer tell apart [49]. One study reconstructed plausible PET images from a Gaussian noise distribution (2048-dimensional noise), reporting an MMD of 1.78 and an SSIM of 0.53 [50]. Another study created synthetic PET images of patients in the NC, MCI, and AD stages using a deep convolutional GAN (DCGAN) [16]. DCGAN improves on the original GAN by ensuring a more stable training stage through measures such as learnable upsampling and downsampling [51]. It achieved a PSNR of 32.83 and a mean SSIM of 77.48. Moreover, a 2D-CNN model using the axial, coronal, and sagittal slices of the synthesized PET images classified NC and AD with an overall accuracy of 71.45%. Additionally, one study used a 3D cycle-consistent GAN (cGAN) to learn bidirectional mappings between PET and MRI scans [52]. A Landmark-based Multi-modal Multi-Instance Learning (LM3IL) network was then developed to learn and fuse discriminative features of MRI and PET for AD diagnosis. The mean PSNR of the synthetic PET images from the 3D-cGAN model was 24.49, and the LM3IL method achieved an accuracy of 92.50% (Sens: 89.94%; Spec: 94.53%) for AD vs. NC and 79.06% (Sens: 55.26%; Spec: 82.85%) for pMCI vs. sMCI, outperforming a single-modality variant of LM3IL that used only MRI.
Limitations of PET and CSF as general population-screening tools have prompted the search for alternative ways to predict disease progression from more accessible tissues, such as blood. From the Australian Imaging, Biomarkers and Lifestyle (AIBL) cohort, 176 blood analytes and two ratios (Innogenetics Aβ1-40/Aβ1-42 and Mehta Aβ1-40/Aβ1-42) were considered, along with age, gender, APOE genotype, and years of education, in variable selection and model generation to predict continuous standardized uptake value ratio (SUVR) values through RF analysis [53]. The model achieved an AUC of 0.88 (Sens: 0.80; Spec: 0.82). Meanwhile, blood metabolites have also garnered considerable interest as a potential molecular fingerprint of disease progression [54,55]. Deep learning (DL) and XGBoost algorithms trained on metabolite data from 242 NC and 115 AD subjects produced AUCs of 0.85 and 0.88, respectively. By comparison, CSF measures of amyloid, p-tau, and t-tau using XGBoost achieved AUCs of 0.78, 0.83, and 0.87, respectively, which highlights the potential of blood-based biomarkers.

MRI Biomarkers
Within the A/T/N framework, neurodegeneration reflects downstream effects of molecular AD pathology and closely correlates with cognitive and functional decline [56,57]. Among the many neuroimaging modalities, structural MRI (sMRI) is widely used as a surrogate marker for neurodegeneration due to its relative availability, low cost, and good diagnostic accuracy [58][59][60][61]. Therefore, many studies have used AI to capture the spatial patterns of atrophy in MRI data and strengthen the link between neurodegeneration and AD-related changes, as shown in Table 3. MRI can reveal anatomical differences between AD and NC to classify subjects into different stages of AD [62][63][64]. It can also detect MCI subjects who will convert to AD, based on the temporal link between MRI abnormalities and the onset of cognitive impairment [65][66][67]. MCI subjects who convert to AD during the study are often categorized as pMCI, while those who remain MCI or revert to NC are categorized as sMCI.
The hippocampus, one of the earliest structures to degenerate, is a good indicator for detecting AD progression in stages before initial clinical expression [68,69]. One study modeled the shape of the hippocampus using spherical harmonics (SPHARM) coefficients, which were then used as features in a radial basis function kernel SVM (RBF-SVM) classifier [70]. It discriminated 25 elderly NC controls from 23 AD subjects with 94% accuracy (Sens: 96%; Spec: 92%) and from 23 aMCI subjects with 83% accuracy (Sens: 83%; Spec: 84%). Another study proposed a fully automatic segmentation method for the hippocampus using anatomical and probabilistic information [71]. A KNN algorithm assigned each of 605 ADNI subjects to the group whose mean was closest to the participant's hippocampal volume; 76% of AD subjects and 71% of MCI subjects were correctly classified with respect to NC controls. One group compared the classification performance of SVM, artificial neural network (ANN), and Naïve Bayes (NB) classifiers [72]. Naïve Bayes classifiers are Bayes' theorem-based classifiers with strong independence assumptions that are robust to irrelevant attributes [73]. In that study, the hippocampus was segmented into seven subfields using an atlas-based automatic algorithm built on Markov random fields.

Considerable research has also focused on the relationship between in vivo cortical thickness measurements and AD neuropathology in asymptomatic subjects [74]. One study identified specific patterns of cortical atrophy at four time periods to subdivide pMCI subjects based on "time to conversion" [75]. It built four stratified linear discriminant analysis (LDA) classifiers and increased classification performance by avoiding double dipping. When compared to sMCI, 80.9% accuracy was achieved with the pMCI6 classifier (conversion to AD within 6 months), 74.5% with pMCI12, 73.0% with pMCI24, and 77.3% with pMCI36.
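The group-mean assignment used in the hippocampal volume study above is essentially a nearest-centroid rule. A toy one-dimensional sketch (the volumes and labels below are invented, not ADNI data):

```python
import numpy as np

def nearest_centroid(train_x, train_y, test_x):
    """Assign each test volume to the class whose training mean is closest."""
    classes = sorted(set(train_y))
    centroids = {c: np.mean([x for x, lab in zip(train_x, train_y) if lab == c])
                 for c in classes}
    return [min(classes, key=lambda c: abs(x - centroids[c])) for x in test_x]

# Illustrative hippocampal volumes (cm^3): AD subjects atrophied, NC preserved.
volumes = [2.0, 2.2, 3.0, 3.2]
labels = ["AD", "AD", "NC", "NC"]
print(nearest_centroid(volumes, labels, [2.1, 3.1]))  # ['AD', 'NC']
```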
Another study developed a spatial frequency representation of cortical thickness data for classification based on incremental learning [76]. Cortical thickness data were mapped onto a spatial frequency domain with the manifold harmonic transform from the surface of the cortex. The PCA-LDA classifier discriminated NC from AD (Sens: 82%; Spec: 93%) and pMCI from sMCI (Sens: 63%; Spec: 76%). Another group improved classification based on cortical thickness features by combining SVM and AdaBoost [77]. AdaBoost iteratively increases the weights of misclassified samples and decreases the weights of correctly classified ones to combine multiple "weak classifiers" into a single "strong classifier." The proposed method discriminated AD and NC with 84.38% accuracy, 4-10% higher than classical methods, such as SVM, LDA, and the Gaussian mixture model (GMM). Beyond regional analysis, inter-regional covariation of cortical thickness has also been suggested for prognostic applications [78].
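The AdaBoost reweighting loop described above can be sketched with threshold stumps on a single feature (e.g., one cortical thickness value). This is the generic textbook version, not the cited study's SVM-AdaBoost hybrid; the data are invented, and labels are coded as ±1.

```python
import numpy as np

def adaboost(x, y, n_rounds=5):
    """AdaBoost with 1-D threshold stumps; labels y must be in {-1, +1}."""
    n = len(x)
    w = np.full(n, 1.0 / n)                     # sample weights
    ensemble = []                               # (alpha, threshold, sign)
    for _ in range(n_rounds):
        best = None
        for t in x:                             # candidate thresholds
            for s in (1, -1):
                guess = s * np.where(x <= t, 1, -1)
                err = w[guess != y].sum()       # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        guess = s * np.where(x <= t, 1, -1)
        w = w * np.exp(-alpha * y * guess)      # upweight misclassified samples
        w = w / w.sum()
        ensemble.append((alpha, t, s))
    return ensemble

def ada_predict(ensemble, x):
    score = sum(a * s * np.where(x <= t, 1, -1) for a, t, s in ensemble)
    return np.sign(score)

x = np.array([1.8, 2.0, 2.1, 2.9, 3.0, 3.2])   # toy cortical thickness values
y = np.array([1, 1, 1, -1, -1, -1])            # +1 = AD, -1 = NC (illustrative)
pred = ada_predict(adaboost(x, y), x)          # separates the toy data exactly
```

The weight update `w * exp(-alpha * y * guess)` is the whole trick: samples the current stump gets wrong receive larger weights, so the next stump concentrates on them.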
Other research has focused on merging different biomarkers. Manual hippocampal volume measurement and automated global and regional volume measures were combined for orthogonal partial least squares to latent structures (OPLS) analysis [79]. The combination of volume measures for AD vs. NC (Sens: 90%; Spec: 94%) resulted in higher sensitivity and specificity than hippocampal volume alone (Sens: 87%; Spec: 90%). NC vs. AD classification can also be performed through simultaneous patch-based segmentation [80]. That study segmented and graded the hippocampus and entorhinal cortex and used hippocampal and entorhinal volumes and grades (the similarity of the patch surrounding each voxel), as well as their combinations, to find atrophic patterns. With LDA and quadratic discriminant analysis (QDA) as classifiers, hippocampal measures had more discriminating power than entorhinal measures, and 90% accuracy was achieved (rising to 93% after adding subjects' ages). Another study employed a greedy score-based feature selection technique to select important feature vectors (cortical thickness, surface area, folding indices, curvature indices, and volume) [81]. A regularized extreme learning machine (RELM) classifier, a learning algorithm implemented without iteratively tuning the artificial hidden nodes, achieved an accuracy of 57.56-61.20% for multi-class (AD, MCI, and NC) differentiation, higher than SVM (52.63-57.40%) and the import vector machine (IVM) (54.90-55.50%).
Conventional ML methods, such as SVM, rely on laborious brain segmentation that requires complex image preprocessing [82]. This challenge is addressed by DL approaches [83], which discover intricate structure in data without requiring prior feature selection or extensive preprocessing. One study developed 3D-CNNs whose first layer used filters learned with autoencoders [84]. The 3D-CNNs outperformed their 2D counterparts in 3-way (AD, MCI, and NC) and binary classifications by 4-10%. Another study employed GoogLeNet [85] and Residual Network (ResNet) [86], the winners of ILSVRC in 2014 and 2015, respectively, for 4-way classification of AD, MCI, late MCI (lMCI), and NC [87]; both GoogLeNet (99%) and ResNet (98%) achieved high accuracy. A recent study also proposed a deeply supervised adaptable 3D-CNN (DSA-3D-CNN) with transfer learning for AD diagnosis [88]. The proposed classifier was built by stacking pre-trained 3D convolutional autoencoding layers followed by fully connected layers, which are fine-tuned for task-specific classification. It achieved accuracies from 94.2% to 100% in the target domains (AD vs. MCI vs. NC, AD + MCI vs. NC, AD vs. NC, AD vs. MCI, and MCI vs. NC).
Ensemble-based classifiers [89] integrate the individual decisions of multiple models to classify a new sample by voting. A framework ensembling three deep CNNs, each with slightly different configurations, was developed using the OASIS database [90]. In contrast to many existing approaches that focus on binary classification, the ensemble system classifies individuals into NC, very mild, mild, and moderate AD with an average precision of 94%. An ensemble of 3D densely connected convolutional networks (3D-DenseNets) was also proposed for AD and MCI diagnosis [91]. Dense connections were introduced to maximize information flow and improve feature utilization; with the dense connection mechanism, fewer feature increments are added at each layer, which reduces the number of parameters. It reached an accuracy of 0.9477 for NC vs. MCI vs. AD. Most DL research uses CNNs, which effectively learn hierarchically layered features [92]. Nevertheless, an early work in this area used a deep belief network (DBN) [93] for AD vs. NC classification [94]; manifold learning was performed to reduce the dimensionality of 3D MRI images from ADNI by discovering patterns of similarity and variability. Another work proposed THS-GAN, in which Tensor-train decomposition, Higher-order pooling, and Semi-supervised learning were employed in a GAN model to assess MCI and AD using ADNI data [95]. The tensor-train decomposition is applied to all layers in the Generator and the Discriminator, reducing the number of parameters. Higher-order pooling, compared to first-order pooling, leverages the second-order statistics of the holistic MRI images, effectively capturing long-range dependencies between slices in different directions. Moreover, the model is designed in a semi-supervised manner to take advantage of both labeled and unlabeled MR images. With optimal hyperparameter settings, THS-GAN reached accuracies of 95.92% (AD vs. NC), 89.29% (MCI vs. NC), and 85.71% (AD vs. MCI). While MRI is widely used as a surrogate marker for neurodegeneration, multimodal approaches combining other modalities, such as computed tomography (CT) and single-photon emission computed tomography (SPECT), where available, can improve predictive performance [96].
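The voting rule behind such ensembles is straightforward: each trained model casts a class prediction for a given scan, and the plurality wins. A minimal sketch (the class names are illustrative):

```python
from collections import Counter

def majority_vote(model_predictions):
    """Combine per-model class predictions for one sample by plurality vote."""
    return Counter(model_predictions).most_common(1)[0][0]

# Three CNN variants disagree on one scan; the ensemble returns the plurality class.
print(majority_vote(["very mild AD", "NC", "very mild AD"]))  # very mild AD
```

In practice, soft voting (averaging the models' class probabilities) is a common alternative that uses each model's confidence rather than only its top label.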

Randomization
Primary outcomes of AD clinical trials are often the absence of clinical progression (measured by scales such as the Clinical Dementia Rating (CDR)) or of cognitive deterioration (measured by NP test scores). However, many longitudinal studies have identified fast and slow AD progressors characterized by heterogeneous rates of cognitive and functional decline [97][98][99][100]. Randomization in clinical trials does not always allocate an equal proportion of rapid and slow progressors to the control and intervention groups [101,102]. As Figure 2 shows, if rapid progressors are predominantly assigned to the intervention group and slow progressors to the control group, the reported treatment effect would suggest the treatment had no significant impact, even if it was, in fact, clinically efficacious. Conversely, if slow progressors were predominantly assigned to the intervention group and rapid progressors to the control group, the trial would overestimate the clinical efficacy. Therefore, an even allocation of rapid and slow decliners to the intervention and control groups is desirable to reduce bias in treatment assignment and avoid the two aforementioned extreme scenarios, which could explain some of the failures [103] and successes [104] of AD clinical trials over the last two decades.

Figure 2. Diagram of randomization in AD clinical trials. These three potential scenarios of randomization illustrate how an uneven allocation of rapid or slow progressors (oversampling or undersampling) into I and C can obscure the true treatment effect. Thus, AI applications in randomization can classify trial subjects into these clusters and help trials achieve even allocation. I, Intervention group (dotted line); C, Control group (solid line).
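The allocation problem illustrated in Figure 2 is commonly handled by stratified (blocked) randomization: once subjects are labeled rapid or slow (e.g., by an AI classifier), each stratum is shuffled and split evenly between arms. A minimal sketch; the subject IDs and labels are invented.

```python
import random

def stratified_allocation(subjects, stratum_of, seed=0):
    """Shuffle each stratum and split it evenly between intervention and control."""
    rng = random.Random(seed)
    arms = {"intervention": [], "control": []}
    for stratum in sorted(set(stratum_of.values())):
        members = [s for s in subjects if stratum_of[s] == stratum]
        rng.shuffle(members)                     # randomize within the stratum
        half = len(members) // 2
        arms["intervention"] += members[:half]
        arms["control"] += members[half:]
    return arms

subjects = [f"S{i:02d}" for i in range(20)]
stratum_of = {s: ("rapid" if i < 10 else "slow") for i, s in enumerate(subjects)}
arms = stratified_allocation(subjects, stratum_of)
# Each arm now contains exactly 5 rapid and 5 slow progressors.
```

Because randomization happens within each stratum, neither arm can be oversampled with rapid (or slow) progressors, which is precisely the failure mode Figure 2 warns about.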
For a reliable observation of the intervention effect, many studies have focused on predicting rapid progression using multimodal biomarkers (Table 4). A recent study performed a multivariable LR analysis to classify 124 Aβ+ MCI subjects, with rapid progressors defined as those who converted to AD within 3 years of follow-up [105]. Univariate logistic analysis of rapid and slow progressors showed no significant differences in demographic measures, but the biomarker characteristics of the two groups differed significantly. Two separate analyses were conducted for CSF p-tau and CSF t-tau due to multicollinearity between the variables, and the models achieved AUCs of 0.901 and 0.907, respectively. The LR models suggested that MCI status, APOE4 status, corrected hippocampal volume (HV), [18F] fluorodeoxyglucose (FDG) PET SUVR, and CSF t-tau/p-tau were associated with fast AD progression; these variables were then used to construct nomograms in which a specific point score corresponds to each variable based on the beta coefficients of the regression analyses.
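The point assignment in such nomograms is typically a rescaling of each predictor's regression contribution, with the largest absolute contribution mapped to 100 points. A hypothetical sketch, where the coefficients and predictor values are invented, not those of the cited model:

```python
def nomogram_points(betas, values):
    """Rescale each beta * value contribution so the largest maps to 100 points."""
    contrib = {k: betas[k] * values[k] for k in betas}
    top = max(abs(v) for v in contrib.values())
    return {k: round(100.0 * abs(v) / top, 1) for k, v in contrib.items()}

# Invented coefficients for illustration only.
betas = {"APOE4": 1.2, "hippocampal_volume": -0.8, "FDG_SUVR": -0.6}
values = {"APOE4": 1, "hippocampal_volume": 1, "FDG_SUVR": 1}
print(nomogram_points(betas, values))
# {'APOE4': 100.0, 'hippocampal_volume': 66.7, 'FDG_SUVR': 50.0}
```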
Another study used a DL model to differentiate rapid vs. slow progression in 321 ADNI subjects with baseline AT(N) biomarkers [106]. Rapid and slow progressors were first identified by applying an unsupervised time-series technique based on dynamic time warping (DTW) [107] with Ward's linkage-based agglomerative clustering [108], which allows shape-based clustering of dynamic, time-varying observations. These progression phenotypes were then used to train a parameter-efficient network model.

Clustering algorithms are also a popular approach for identifying homogeneous clusters of rapid and slow progressors. A multi-layer clustering (MLC) algorithm was proposed to identify clusters of rapid and slow progressors among 562 ADNI MCI subjects [109]. The MLC model consisted of two steps: (1) example similarity tables were computed for each data layer, and (2) an agglomerative bottom-up procedure used these tables to find the optimal clustering solution. A subgroup discovery technique identified the best classifiers (clinical test cut-offs on the Alzheimer's Disease Assessment Scale (ADAS), MMSE, and Rey's Auditory Verbal Learning Test (RAVLT)), which achieved high sensitivity (75.0-98.4%) and specificity (70.0-90.0%). Findings showed that fast progressors had twofold greater brain atrophy and converted to AD at five times the rate of slow progressors. A hierarchical agglomerative clustering method was applied to MRIs of a cohort of 751 MCI, 282 AD, and 428 NC subjects [110]. The group preprocessed the MRIs into gray matter density maps and regressed out age, gender, and years of education to render the maps comparable. The hierarchical clustering of MRIs discovered clusters of rapid and slow MCI progressors based on striking heterogeneities in brain atrophy patterns. Rapid progressors showed a higher degree of atrophy in the medial temporal lobe and cerebellum, while slow progressors manifested more atrophy in the frontal cortex.
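Dynamic time warping, the distance underlying the shape-based clustering above, aligns two trajectories by allowing local stretching and compression of the time axis. A standard dynamic-programming sketch (the cognitive-score sequences are illustrative):

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignment moves
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Two decline curves with the same shape but different pacing align at zero cost.
print(dtw_distance([30, 28, 24, 20], [30, 30, 28, 24, 24, 20]))  # 0.0
```

This pacing-invariance is why DTW-based clustering groups subjects by the shape of their decline (rapid vs. slow) rather than by when each measurement happened to be taken.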
Some groups have developed predictive algorithms without the use of PET and CSF. An ML classifier [76] trained with incremental learning was adopted to distinguish rapid and slow progressors in a longitudinal AD cohort [111]. The study used PCA for dimension reduction of the 27 high-resolution 3T brain MRIs and then used LDA to find coordinate axes that maximally separated the groups. It demonstrated that slow progressors showed discriminative patterns localized around the prefrontal and temporal cortices, while rapid progressors showed patterns across most of the prefrontal, inferior parietal, and temporal cortices. Another study used ADAS-Cog and MMSE scores, laboratory tests, and demographic information to train a Conditional Restricted Boltzmann Machine (CRBM) to forecast individual AD progression [112]. A CRBM [113] is a probabilistic neural network that learns the joint probability distribution of its features. Differences between rapid and slow progressors, quantified using the absolute value of Cohen's d-statistic, showed that, while the majority of baseline features were not associated with rapid AD progression, strong associations were found with cognitive tests based on recall and word recognition. For instance, subjects with poor performance on the ADAS word recall tended to progress more rapidly. One group also reported using an unsupervised separation algorithm based on the genetic algorithm technique to uncover two distinct rates of AD progression characterized by the functional assessment staging (FAST) procedure [114].
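The PCA-then-LDA pipeline from the first study above is straightforward to sketch. Here the high-dimensional MRI features are replaced with random stand-in data, with an injected group difference purely for illustration.

```python
# Sketch of a PCA + LDA pipeline in the spirit of [111]: reduce the MRI
# feature space, then find the axis that maximally separates rapid and
# slow progressors. All data are synthetic stand-ins for voxel- or
# region-level MRI measurements.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n, p = 60, 500                       # subjects x imaging features
X = rng.normal(0, 1, (n, p))
y = rng.integers(0, 2, n)            # 1 = rapid, 0 = slow (synthetic labels)
X[y == 1, :20] += 1.5                # inject a group difference into 20 features

# PCA compresses the feature space; LDA then finds the discriminative axis
# in the reduced space.
clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
clf.fit(X, y)
print(f"Training accuracy: {clf.score(X, y):.2f}")
```

In the published setting, the LDA loadings projected back through the PCA basis are what reveal which cortical regions carry the discriminative atrophy patterns.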
Most research has investigated heterogeneous progression based on clinical and functional measures, which are the primary outcomes in clinical trials. However, there are certain advantages to predicting rapid vs. slow biomarker progression. Firstly, AD clinical trials utilize biomarkers as secondary outcomes. Secondly, knowledge of rapid and slow biomarker progression is important to correctly estimate the true effect size of anti-amyloid and other AD treatments: an uneven allocation of rapid and slow Aβ progressors could either dilute or exaggerate the clinical efficacy of the treatment. One group combined baseline clinical, genetic, and imaging features from 610 unique ADNI subjects to identify those at the highest risk of rapid Aβ deposition [115]. They used a CNN based on the ResNet architecture [116] to extract important baseline amyloid PET image features, which were combined with eight clinical, demographic, and genetic markers to predict future SUVR values using a gradient-boosted decision tree (GBDT) algorithm. The combined model achieved an RMSE of 0.0339, which outperformed multivariate linear regression (0.0382) and GBDT without imaging features (0.0355). Moreover, the proposed method predicted the highest percentage of fastest Aβ progressors (37.7%), superior to other selection methods, such as selecting Aβ+ cases with at least one APOE ε4 allele (15.7%).
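The two-stage design, imaging features fused with tabular markers in a gradient-boosted tree, can be sketched as follows. The "CNN embedding" here is a random placeholder for the ResNet-derived PET features, and the simulated SUVR target is purely illustrative.

```python
# Sketch of the fusion design in [115]: image-derived features (random
# stand-in for a ResNet embedding of baseline amyloid PET) concatenated
# with clinical/genetic markers and fed to a gradient-boosted tree
# regressor predicting future SUVR. All data are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n = 300
img_emb = rng.normal(0, 1, (n, 16))   # placeholder CNN image embedding
clinical = rng.normal(0, 1, (n, 8))   # 8 clinical/demographic/genetic markers

# Simulated future SUVR depends on both modalities plus small noise.
suvr = 1.2 + 0.05 * img_emb[:, 0] + 0.03 * clinical[:, 0] + rng.normal(0, 0.02, n)

X = np.hstack([img_emb, clinical])    # modality fusion by concatenation
model = GradientBoostingRegressor(random_state=0).fit(X[:250], suvr[:250])
rmse = mean_squared_error(suvr[250:], model.predict(X[250:])) ** 0.5
print(f"Held-out RMSE: {rmse:.4f}")
```

Ranking subjects by the predicted SUVR change would then surface the candidate fastest Aβ progressors for enrichment, mirroring the study's selection comparison.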

Challenges and Future Directions
AI systems built on large-scale data have facilitated the development of disease prediction methods that can potentially reduce the screen failure rate of clinical trials [39]. Furthermore, identifying suitable participants during trial recruitment reduces associated expenses and accelerates drug development [117]. However, it is important to acknowledge several challenges to the application of AI in clinical trials.
Advanced AI models derived from high-quality databases often show good predictive performance; additional information from explainable and transparent AI technology might further the understanding of biomedical data and improve their application in clinical trials. A common form of visible machine learning, such as a graph neural network, can provide structural connections between different medical entities (e.g., diseases, drugs, and proteins). For example, GNNExplainer identifies a small set of important variables and genetic pathways that contribute to human disease [118]. Identification of disease mechanisms through the multiscale interactome has facilitated efficacious and safe therapeutic development. In addition, earlier access to drug candidates could help reduce the time and expenditure of the prescreening process in clinical trials. Thus, developing explainable and transparent AI systems would substantially benefit both the speed and efficiency of clinical trials and drug discovery.

Another challenge is the limited generalizability that arises from the lack of external validation, which reduces confidence in the predictive power of AI algorithms. External and internal validations are crucial for the development of a reliable algorithm. Internal validation methods, such as the bootstrap and cross-validation, quantify the algorithm's optimism and provide information about the degree of overfitting, whereas external validation uses independently derived data to ensure generalizability. Review studies, however, have demonstrated that a substantial number of studies performed no validation at all, or performed only one of the two [119]. Internal validation methods can also be limited by small sample sizes: a minimum of roughly 300 subjects is generally recommended for internal validation, but some data types in AD clinical trials, such as brain imaging measurements, are limited by the associated costs and time [120].
For external validation, over-reliance on a single cohort population and the unavailability of similar but distinct cohort populations remain challenges. These shortcomings limit the clinical relevance of AI despite its promising computational results.
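The bootstrap optimism idea mentioned above can be sketched in a few lines: the gap between a model's apparent (training-sample) performance and the performance of bootstrap-refit models on the original sample estimates how much the apparent figure overstates true performance. The data below are synthetic stand-ins for trial biomarkers.

```python
# Sketch of Harrell-style bootstrap optimism estimation for internal
# validation. All data are synthetic; a real analysis would use the
# trial's baseline biomarkers and outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n, p = 200, 10
X = rng.normal(0, 1, (n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # only feature 0 is informative

model = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

# For each bootstrap sample: refit, score on the bootstrap sample and on
# the original data; the average gap is the optimism estimate.
optimisms = []
for _ in range(50):
    idx = rng.integers(0, n, n)
    m = LogisticRegression().fit(X[idx], y[idx])
    boot_auc = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    orig_auc = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimisms.append(boot_auc - orig_auc)

corrected = apparent - np.mean(optimisms)
print(f"Apparent AUC {apparent:.3f}, optimism-corrected {corrected:.3f}")
```

The optimism-corrected estimate is still an internal validation figure; only evaluation on an independently collected cohort speaks to generalizability.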
Further studies need to focus on improving the efficiency and effectiveness of AI techniques for AD clinical trials. Firstly, AI technologies such as visible neural networks could map the inner workings of AI models onto complex, hierarchical biological systems [117,121]. AI models can be enriched with biological knowledge, which includes multilevel interactions composed of sequences, protein complexes, cells, tissues, organs, and organisms. In contrast to current deep learning schemes that model the entire system at once, this approach models how various AD-related entities interact with each other at different levels to develop a multiscale interactome for AD drug candidates. Moreover, these models could leverage genetic and genomic data to identify genetic determinants of AD and guide therapies with individuals' genomic profiles, enabling precision medicine and personalized treatment. Secondly, rigorous external validations in other populations with greater diversity are necessary to assess generalizability and reproducibility. Finally, given that ML efficiency increases with the quantity and quality of data, the integration of genomics, proteomics, and other omics data into AD clinical research could help investigate the molecular pathways of AD, with potential implications for novel diagnostic biomarkers and precision medicine [122].

Conclusions
Clinical trials for AD face the challenges of a high screen failure rate and the even allocation of a heterogeneous subject population. Many recent works have investigated potential applications of AI to address these challenges, particularly in the steps of eligibility assessment and randomization. The prediction of protein and MRI AD biomarkers in the prescreening process could drastically reduce the high screen failure rate. Additionally, AI-based stratification of the AD subject population into rapid and slow progressors can guide the even allocation of heterogeneous participants into intervention and control groups during randomization. AI algorithms have not yet been integrated into AD clinical trials, largely owing to limited explainability and insufficient external and internal validation. However, integrating biological knowledge to develop the multiscale interactome, together with rigorous external validation for generalizability and reproducibility, could pave the way toward novel diagnostic biomarkers and precision medicine.