1. Introduction
Dementia is characterized clinically by a gradual decline in multiple cognitive domains, including language, memory, executive, and visuospatial functions, which leads to an inability to perform instrumental and/or basic activities of daily living and ultimately to death. Alzheimer’s disease (AD) accounts for up to 80% of all dementia diagnoses [
1] and is characterized pathologically by extracellular amyloid plaques, intraneuronal neurofibrillary tangles, and neurodegeneration [
2]. Mild cognitive impairment (MCI) usually precedes Alzheimer’s disease [
3], and individuals with MCI have an increased risk of dementia with age [
4]. Imaging biomarkers for AD and MCI include both positron emission tomography (PET) (e.g., using amyloid and tau tracers) and magnetic resonance imaging (MRI) methods, the latter of which are used to assess structural, microstructural, and other pathophysiological characteristics of AD [
5].
MR-based diffusion tensor imaging (DTI) reports on microstructural properties of white matter (WM), and the derived metrics have been applied extensively as neuroimaging biomarkers to study a range of clinical conditions [
6], including AD and MCI [
7]. DTI-derived metrics, such as fractional anisotropy (FA) and axial and radial diffusivities (AxD and RxD, respectively), are indicative of water diffusion around WM tracts, which is radially restricted by the myelin sheath surrounding axons. In the context of MCI and AD, DTI metrics have demonstrated microstructural abnormalities in several WM areas, including the cingulum, fornix, corpus callosum (CC), and uncinate fasciculus (UF), in addition to temporal, occipital, and frontal WM [
8,
9,
10]. Cognitive scores, including those relating to memory and executive function, have been found to correlate with DTI-derived metrics [
11], suggesting a microstructural component to cognitive changes. Additionally, DTI has been shown to be sensitive to WM degeneration in the early stages of AD, including MCI [
12]. It is important to note, however, that pathologically proven direct correlations between WM changes and DTI metrics remain elusive.
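To make the metric definitions concrete, the following minimal sketch (not part of the study pipeline) computes FA, AxD, RxD, and mean diffusivity from the eigenvalues of a single diffusion tensor using the standard definitions. The function name `dti_scalars` and the example tensor values are hypothetical and chosen only for illustration.

```python
import numpy as np


def dti_scalars(tensor):
    """Return (FA, AxD, RxD, MD) for one symmetric 3x3 diffusion tensor."""
    # Eigenvalues of a symmetric matrix, sorted so that l1 >= l2 >= l3.
    evals = np.linalg.eigvalsh(np.asarray(tensor, dtype=float))[::-1]
    l1, l2, l3 = evals

    md = evals.mean()        # mean diffusivity
    axd = l1                 # axial diffusivity: along the principal axis
    rxd = (l2 + l3) / 2.0    # radial diffusivity: mean of the two minor axes

    # FA = sqrt(3/2) * ||evals - MD|| / ||evals||, defined as 0 for a null tensor.
    norm = np.sqrt((evals ** 2).sum())
    fa = 0.0 if norm == 0 else float(np.sqrt(1.5 * ((evals - md) ** 2).sum()) / norm)
    return fa, axd, rxd, md


# Hypothetical, WM-like tensor in mm^2/s (illustrative values only).
example = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
fa, axd, rxd, md = dti_scalars(example)
```

In this formulation, FA combines all three eigenvalues into a single index, which is why it is described as a composite measure, whereas AxD and RxD separate diffusion along and across the principal axis.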
Although DTI is widely used, the results can be sensitive to data acquisition, pre-processing, and analysis. With recent improvements in scanner hardware and image acceleration, DTI acquisitions now typically include a large number of DTI directions (>30) and/or multiple shells (as a function of b-value). Permutations of DTI analysis include various image coregistration/normalization procedures [
13], and the choice of regional analysis, voxel-based analysis, or skeletonized analysis (also called tract-based spatial statistics (TBSS)) [
14]. Methodological differences can lead to varying results, even within the same cohort of subjects [
13]. This is true across neurodegenerative disease states, including Parkinson’s disease [
15].
Another important factor in DTI analysis is the fitting procedure used to estimate the diffusion tensor [
13]. Linear least squares (LLS) regression is the most basic model used in diffusion MRI for the estimation of diffusion parameters. Higher accuracy can be obtained by using weighted linear least squares (WLLS) regression, but this method is slower than LLS. Non-linear least squares regression (NLLS) and other more robust estimators, such as robust estimation of tensors by outlier rejection (RESTORE) [
16] or informed RESTORE (iRESTORE) [
17] methods, can also be implemented. While software tools are freely available to perform these model fits, to the best of our knowledge no study has systematically investigated the impact of differing fitting methodologies on the resulting WM integrity metrics in AD and MCI subjects. Indeed, over-fitting is a common error that is likely present in many reported studies that have been validated neither internally nor externally with independent cohort/fitting analyses.
In this study, we examined group-level differences in WM integrity, as quantified by FA, across AD, MCI, and healthy control (HC) subjects obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (
https://ida.loni.usc.edu/ accessed on 5 February 2021). Three different DTI acquisitions from two scanner vendors (Siemens 30, General Electric (GE) 48, and Siemens 54 directions) were compared using nine commonly utilized fitting procedures with voxel-based analysis (VBA). Correlations between FA and cognitive scores, including the Montreal Cognitive Assessment (MoCA) [
18], the Mini-Mental State Examination (MMSE) [
19], and the Alzheimer’s Disease Assessment Scale (ADAS) [
20] were also assessed. The intraclass correlation coefficient (ICC) [
21] was used to evaluate the consistency across fitting algorithms. The aim of this study is to evaluate the impact of different dMRI acquisitions and DTI fitting procedures on FA measures in the context of AD. These findings may be relevant in the interpretation of microstructural changes associated with Alzheimer’s pathology, and more broadly in other neurodegenerative pathologies.
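The specific ICC form is not restated in this section. As an illustrative assumption, the sketch below implements the two-way, single-measure consistency form, ICC(3,1) in the Shrout and Fleiss notation, with measurement targets (e.g., voxels or regions) as rows and fitting algorithms as columns. The function name `icc_consistency` and the example values are hypothetical.

```python
import numpy as np


def icc_consistency(scores):
    """ICC(3,1): two-way mixed effects, single measure, consistency.

    scores: array of shape (n_targets, k_methods), e.g. FA values for
    n voxels/regions (rows) estimated by k fitting algorithms (columns).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()   # between targets
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()   # between methods
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                      # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Hypothetical FA values: three "fits" that differ only by a constant offset.
base = np.array([0.30, 0.42, 0.55, 0.61])
fa_by_fit = np.column_stack([base, base + 0.01, base + 0.02])
icc = icc_consistency(fa_by_fit)
```

Because the consistency form ignores systematic offsets between columns, fits that differ only by a constant shift still yield an ICC close to 1; an absolute-agreement form would penalize such shifts.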
4. Discussion
In this study, we analyzed WM microstructural differences using FA maps across three aging populations (HC, MCI, and AD) and compared the differences using three DTI acquisitions (from two vendors) and nine DTI fitting algorithms. Due to the different number of subjects within each group, the effect size (reported as partial eta squared, ηp², from ANCOVA) was used to compare the magnitude of differences across acquisitions, analyses, and groups [
27,
28]. Through this analysis, differences across groups were observed using all acquisitions and fits. However, differences were also observed across acquisitions and fitting methods. For example, fewer significant clusters of group-wise difference were found with SI54 compared with GE48 and SI30, where the latter generally showed larger and more clusters. Within each acquisition, consistency across all fitting algorithms was high, although results of AFNI–LLS were the least similar relative to the other fitting algorithms, followed by CAMINO–RESTORE.
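For reference, the partial eta squared effect size can be written in terms of ANCOVA sums of squares or, equivalently, in terms of the F statistic and its degrees of freedom. The following is a minimal sketch of both standard forms; the function names and the numeric inputs are hypothetical and are not values from this study.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using the F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical ANCOVA output for a three-group comparison (illustrative only).
ss_effect, df_effect = 2.0, 2
ss_error, df_error = 8.0, 40
f_value = (ss_effect / df_effect) / (ss_error / df_error)

eta_from_ss = partial_eta_squared(ss_effect, ss_error)
eta_from_f = partial_eta_squared_from_f(f_value, df_effect, df_error)
```

Both forms express the group effect as a proportion of the effect-plus-error variability, which is the property that makes the measure comparable across acquisitions with different numbers of subjects.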
Distinguishing between HC and MCI has important implications for early disease detection, particularly with the development of potential disease-modifying therapies that are likely to be most effective in the early stages of the disease. In terms of the ability to distinguish between groups, it is important to note that differences between AD and MCI groups were not found with SI54 and that differences between HC and MCI were not found with SI30, suggesting that the ability to detect group-wise differences may depend on the acquisition (including the number of directions). The SI30 acquisition included only 30 diffusion-encoding directions, which is close to the lower “limit” of orientations for robust anisotropy estimation [
33]. More specifically, previous simulations have demonstrated that at least 20 unique sampling orientations are needed for robust anisotropy estimation, and at least 30 sampling orientations are required for robust estimation of tensor orientation. Schemes with fewer sampling orientations may introduce bias and spurious correlations between tensor orientation and apparent diffusion characteristics. In general, a higher number of diffusion directions will yield more robust DTI-derived metrics. Even so, for the comparison between HC and MCI, only minimal differences were detected using the other acquisitions (GE48: left CCG; SI54: splenium of CC and fornix). Moreover, because the three acquisitions were not acquired in the same cohort, the impact of different individual study populations cannot be discounted.
Variability related to different fitting procedures of DTI data can be traced to the underlying regression methods. The LLS regression is the most basic and generally the fastest model used in diffusion MRI for the estimation of diffusion parameters. This method implicitly assumes that the errors in the log-transformed signal have the same variance for all measurements, which is generally not the case; therefore, it does not appropriately weight the contribution of each measurement. On the other hand, WLLS assigns a weight according to how much the original noise variation is affected by the logarithmic transform of the data. This fitting algorithm is slightly slower than LLS but is more precise. In NLLS algorithms, the estimation of the tensor is performed directly from the signal, and iterative regression is used to minimize the error between predicted and observed signal intensity. Therefore, these methods are slower and more computationally expensive compared to LLS and WLLS. LLS, WLLS, and NLLS fits only take into account the signal variability produced by thermal noise, but signal variability is also influenced by physiological noise, which varies both spatially and temporally. Physiological noise can be associated with subject motion, cardiac pulsation, respiration, and/or system instabilities. Physiological noise does not have a known parametric distribution and is usually addressed statistically by including different robust estimators, such as in the RESTORE method. In this study, we found good agreement across all fits, although both CAMINO–RESTORE and AFNI–LLS had less similarity compared with the other fits. Within one regression type (e.g., linear), differences across software tools could be attributed to different default choices of the underlying fitting algorithms, initial guesses, upper and lower bounds, and sensitivity to local minima. For higher consistency, we recommend using the default option in AFNI, which is NLLS.
On the other hand, as CAMINO–RESTORE reduces the effect of physiological noise, the use of this method could improve tensor estimation, particularly in aging populations that may exhibit higher degrees of physiological noise [
34].
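The difference between the LLS and WLLS regressions described above can be illustrated with a minimal single-voxel sketch based on the log-linearized tensor model, ln S = ln S0 − b gᵀDg. This is not the implementation used by any of the software tools compared in this study: the function names are hypothetical, and the WLLS weights are assumed to be the squared signals predicted by an initial LLS fit, which is one common choice. NLLS and RESTORE are not shown.

```python
import numpy as np


def design_matrix(bvals, bvecs):
    """Rows map [ln S0, Dxx, Dyy, Dzz, Dxy, Dxz, Dyz] to ln S."""
    b = np.asarray(bvals, dtype=float)
    g = np.asarray(bvecs, dtype=float)
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    return np.column_stack([
        np.ones_like(b),
        -b * gx * gx, -b * gy * gy, -b * gz * gz,
        -2 * b * gx * gy, -2 * b * gx * gz, -2 * b * gy * gz,
    ])


def fit_lls(signal, X):
    """Ordinary least squares on the log signal (equal weights)."""
    beta, *_ = np.linalg.lstsq(X, np.log(signal), rcond=None)
    return beta


def fit_wlls(signal, X):
    """Weighted least squares on the log signal.

    Assumption: weights are the squared LLS-predicted signals, because the
    variance of ln S is approximately sigma^2 / S^2 after the log transform.
    """
    y = np.log(signal)
    s_pred = np.exp(X @ fit_lls(signal, X))
    # Scaling rows by sqrt(weight) = s_pred turns WLS into an ordinary LS problem.
    beta, *_ = np.linalg.lstsq(X * s_pred[:, None], y * s_pred, rcond=None)
    return beta


def to_tensor(beta):
    """Assemble the symmetric 3x3 tensor from the fitted parameters."""
    _, dxx, dyy, dzz, dxy, dxz, dyz = beta
    return np.array([[dxx, dxy, dxz], [dxy, dyy, dyz], [dxz, dyz, dzz]])
```

Both functions take the measured signal of one voxel (all values must be positive) and the design matrix built from the b-values and unit gradient directions. On noise-free data the two fits coincide; they diverge when noise is present, because low-signal measurements are down-weighted only in the weighted fit.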
Many studies have used DTI analysis in the assessment of AD (and in some cases MCI) cohorts [
8]. However, no previous studies to our knowledge have compared results obtained from different DTI fitting procedures and from different acquisitions in the context of this disease. In healthy controls and patients with Tourette syndrome, Maximov et al. demonstrated that the agreement and reliability of TBSS results depended on the applied DTI fitting algorithm [
29], while Bergamino et al. showed differences in DTI results in the analysis of depression when different DTI fits and analyses were employed [
35]. Both groups recommended analyzing DTI data with different fitting algorithms to obtain more robust and accurate results in the absence of a ground truth [
36]. Correspondingly, we found robust differences across groups primarily inside the CC and the fornix when all results were combined. More specifically, differences between AD and HC were found in the CCG, forceps minor and major, CC and fornix, while differences between AD and MCI were found in the forceps major, CC, fornix, and ACR. Interestingly, no differences were found between HC and MCI using FA, consistent with other studies leveraging ADNI data [
37]. However, other DTI-based biomarkers such as axial and radial diffusivities may be more sensitive to the early subtle changes associated with MCI than composite biomarkers like FA. We also found two clusters where FA in AD was higher than in HC in the right CST and in the right posterior limb of IC, which may be the result of a loss of crossing fibers due to AD-related neurodegeneration [
38].
The fornix is the major output tract of the hippocampus and thus plays a critical role in memory function. Given its anatomical and functional importance, fornix pathology has been implicated in both MCI and AD [
39]. In a cross-sectional study, Mielke et al. found that AD patients had lower FA in the fornix than both HC and MCI subjects [
40]. Lower FA values, coupled with higher mean diffusivity, were also observed in the fornix and splenium in AD compared with MCI and HC cohorts [
41]. In the present study, we found large clusters of reduced FA in the fornix for AD compared to HC (covering approximately 89% of the fornix) and MCI (covering approximately 83%). This may indicate that the fornix is a critical WM area involved in AD pathology, though definitive post-mortem studies directly comparing DTI and pathological changes remain elusive.
The role of the CC in AD is less consistent than that of the fornix. For instance, several authors did not observe any changes in DTI-related metrics in the CC in AD subjects [
42,
43,
44], while other studies have shown reduced FA in AD subjects in the posterior regions of the CC [
45,
46] or in the anterior region of the CC [
45,
47]. Additionally, Xie et al. found lower FA in the genu and anterior body of the CC [
48], while Preti et al. found differences in the CC between AD and both HC and MCI cohorts [
49]. Combining all fits and all acquisitions, we found lower FA values in AD compared to HC in the genu, body, and splenium of the CC (all clusters covered about 38% of the CC) and similar but smaller clusters for the AD–MCI comparison (about 5% of the CC). This may mean that WM microstructural changes in the CC occur later in the pathological cascade.
There are several limitations in this study. First, we only used single-shell DTI acquisitions available from ADNI, and thus, we were not able to evaluate different fitting algorithms with multi-shell data, which may yield improved microstructural metrics. While ADNI is continuously expanding, at the time of data analysis, only one multi-shell acquisition was available (with 126 directions), and with very few AD subjects. Second, the data available in ADNI for this study were not matched across groups and acquisitions; for example, there were fewer AD subjects relative to HC and MCI, especially for SI54. As a primary goal of ADNI is to develop biomarkers for the earliest phases of AD, higher numbers of HC and MCI can be expected. ANCOVA may not be appropriate when variances are heterogeneous across groups, particularly in the presence of unbalanced sample sizes; however, when sample sizes are unequal and variance is not heterogeneous, as verified in the present study, ANCOVA remains the recommended test [
26]. Additionally, to account for unequal sample sizes across acquisitions, we provide the main results as effect sizes, which reflect the magnitude of the differences. Finally, this study only analyzed FA, which is the most commonly used DTI index [
15]. Other DTI-related metrics, such as radial, axial, and mean diffusivity, may also be of interest, and work is ongoing to assess the impact of different fitting methods on those parameters.
In conclusion, we found differences in the FA maps derived from different dMRI acquisitions and from different DTI fitting methods in the analysis of AD and MCI subjects, compared to HC. These differences were more consistent across fits as the number of directions increased, suggesting that aging studies could be improved with more dMRI directions and that acquisitions with fewer than 30 directions should be avoided. In terms of the fits, we found that AFNI–LLS had the least similarity compared with the other algorithms, while all other fitting algorithms produced highly similar results, suggesting flexibility for end-users to choose between LLS, WLLS, and NLLS. CAMINO–RESTORE also produced less similar results, but this may be due to compensation for physiological effects. By combining all acquisitions and all fits, we observed differences between AD and HC and between AD and MCI, particularly in the fornix and CC, suggesting robust changes in FA in these regions in AD. Researchers should be aware that potential differences in the results related to the choice of DTI fit type may have implications for comparisons across studies with varying methodologies, particularly in the context of subtle WM changes associated with early AD pathology. Overall, these results show that identifying the most robust DTI analysis methods, including the choice of fitting algorithm and software tool, is a critical step to provide more reliable DTI-based neuroimaging biomarkers for assessing microstructural changes in AD.