A Systematic Review and Meta-Analysis of Cerebrospinal Fluid Amyloid and Tau Levels Identifies Mild Cognitive Impairment Patients Progressing to Alzheimer’s Disease

Reported levels of amyloid-beta and tau in human cerebrospinal fluid (CSF) were evaluated to determine whether these biochemical markers can predict the transition from Mild Cognitive Impairment (MCI) to Alzheimer’s disease (AD). A systematic review of the literature in PubMed and Web of Science (April 2021) was performed by a single researcher to identify studies reporting immunologically based (xMAP or ELISA) measures of CSF analytes Aβ(1-42) and/or P-tau and/or T-tau in clinical studies with at least two timepoints and a statement of diagnostic criteria. Of 1137 screened publications, 22 met the inclusion criteria for CSF Aβ(1-42) measures, 20 studies included T-tau, and 17 included P-tau. Six meta-analyses were conducted to compare the analytes for healthy controls (HC) versus progressive MCI (MCI_AD) and for non-progressive MCI (Stable_MCI) versus MCI_AD; effect sizes were determined using random effects models. The heterogeneity of effect sizes across studies was confirmed with very high significance (p < 0.0001) for all meta-analyses except HC versus MCI_AD T-tau (p < 0.05) and P-tau (non-significant). Standardized mean difference (SMD) was highly significant (p < 0.0001) for all comparisons (Stable_MCI versus MCI_AD: SMD [95%-CI] Aβ(1-42) = 1.19 [0.96,1.42]; T-tau = −1.03 [−1.24,−0.82]; P-tau = −1.03 [−1.47,−0.59]; HC versus MCI_AD: SMD Aβ(1-42) = 1.73 [1.39,2.07]; T-tau = −1.13 [−1.33,−0.93]; P-tau = −1.10 [−1.23,−0.96]). The follow-up interval in longitudinal evaluations was a critical factor in clinical study design, and the Aβ(1–42)/P-tau ratio most robustly differentiated progressive from non-progressive MCI. The value of amyloid-beta and tau as markers of patient outcome is supported by these findings.


Alzheimer's Disease in Context
As our global societies evolve, it is well-documented that the average age of the human population is increasing both locally and globally [1]. Age is a significant risk factor for cognitive impairment and dementia [2,3], so increased incidence of these conditions is being seen in association with an ageing population.
Alzheimer's Disease (AD) is the most common cause of dementia. Clinically observable characteristics of this disorder include memory loss, decline in cognitive function, and changes in behavioral patterns. Further, AD is identified as the greatest cause of death without an effective disease-modifying therapy [4,5]. Efforts to develop drugs to treat AD have had a high failure rate [6].
According to the analysis of the 2000 census and subsequent population projections [7], there were 4.5 million AD patients in the United States (US) in 2000, where 1.8 million of them were ≥85 years old. Hebert et al. estimated that by 2040 there would be 11.0-12.8 million patients (by middle-series or high-series estimation respectively) with over 50%

Overview of Study
In the following work, a systematic review and quantitative meta-analyses were performed to test relationships between these three potential biomarkers in CSF (Aβ, T-tau, and P-tau) and the evolution of AD in longitudinal evaluations of levels relative to baseline, using previously published experimental data. The primary focus of the analysis was the period in which a patient transitions from MCI to AD, where it is critical to identify the main biomarker characteristics that differentiate patients who have a stable form of MCI from those who progress to a confirmed diagnosis of AD. We report highly statistically significant differences (p < 0.0001) in the standardized mean difference for all six meta-analyses performed in this study. We confirm that MCI patients who were stable tended to have slightly higher Aβ(1-42) levels than healthy controls, and that levels were significantly lower in MCI patients who progressed to develop Alzheimer's disease. The opposite was observed for P-tau and T-tau levels, where MCI patients progressing to develop Alzheimer's disease exhibited the highest levels compared to non-progressive MCI and healthy controls. The data suggest that, of these markers, the Aβ(1-42)/P-tau ratio gives the most robust indicator of a patient transitioning from MCI to AD, and the follow-up period for longitudinal evaluations is identified as especially critical to clinical study design for this purpose.

Search Strategy
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The study is registered in INPLASY; registration number 202270020. The systematic review, and subsequent meta-analyses, were performed as described in Figure 1 and Appendix A respectively, to include papers extracted from the PubMed and Web of Science databases, where these were published in English and dated 1994 onwards (to include works incorporating the association of APP with Alzheimer disease).
The review specifically focuses on longitudinal (instead of cross-sectional) studies that included measurements repeated after baseline at one or more timepoints, so that levels can be tracked in a cohort. The clinical diagnoses reported according to recognized criteria were accepted for the purposes of this analysis, rather than requiring validation in the form of neuropathological assessment at autopsy. Further, this study only focuses on Alzheimer's Disease, so progression from MCI to disorders other than Alzheimer's (including Parkinson's related disorders) is excluded.
The following search query terms were used in PubMed ('Title') and Web of Science ('Topic') respectively, with the search being completed in April 2021: ((((Alzheimer) OR (AD) OR (MCI) OR (mild cognitive impairment)) AND ((CSF) OR (Cerebrospinal fluid)) AND ((biomarker) OR (iron) OR (metal)) AND ((longitudinal) OR (follow-up))) NOT (Review)) NOT (Parkinson).
Figure 1. … [28], using the PubMed and Web of Science databases as detailed in the main text, for records published in or after 1994, with the record identification (data extraction) completed in April 2021. * For reports excluded for multiple reasons, only the primary exclusion criterion is counted here (i.e., each excluded report is only counted once). The primary reasons for exclusion of reports were as follows. Reason 1: Diagnostic outcome information was insufficient (did not explicitly consider the transition from Mild Cognitive Impairment to Alzheimer's disease); Reason 2: Data parameters prevented inclusion (e.g., an incomplete dataset for the purpose of this study, or because data could not be converted into mean ± standard deviation at baseline such as where ratios between markers were reported, or because the population sampled was too small <30); Reason 3: Analysis in a report replicated that in one or more other reports meeting the inclusion criteria (e.g., where there were multiple studies evaluating the ADNI dataset, or the report was a review of other studies).

Inclusion and Exclusion Criteria
Studies measuring CSF levels of Aβ and/or tau (T-tau and/or P-tau) were included. The studies also had to report the values for these analytes at baseline (initial visit) and at one or more subsequent time points, for the healthy controls and for the patients, along with their diagnostic status (MCI, AD, or another form of dementia). To be included, the studies also needed to state which criteria were used for the diagnosis of MCI and AD. For CSF analyte levels, the study needed to document not only the average values but also the standard deviation or interquartile range for each measurement. The study population also needed to be stated for each diagnostic group. Studies that did not meet these criteria were excluded: for example, Zetterberg et al. [29] was excluded because analyte concentration data were reported with 95% confidence intervals and therefore did not include the statistical information needed for the present analysis.
Where multiple independent studies were performed with a cohort, all studies except for the latest available study were excluded on the basis that the latest study should contain the most up-to-date information and methods. (Notably, for the ADNI dataset, papers except for Spencer et al.'s 2019 study [30] were excluded. For example, Cui et al. [31] was excluded because this study used the same dataset from ADNI as Spencer et al. subsequently did in 2019). The methods used to assess cognitive function (e.g., MMSE) were not taken into account in the inclusion/exclusion criteria, but the analytical methods for the measurement of the analytes were examined, and for the example of the Malmö University Hospital's dataset, both xMAP and ELISA methods were used; after evaluation it was determined that both should be retained in our analysis [32,33].
Studies of MCI patients progressing to forms of dementia that were not explicitly classified as AD were excluded. Studies that did not include measures of CSF T-tau, P-tau, or Aβ(1-42) were also excluded. Additionally, studies were excluded where the data were reported as the confidence interval of the median, because those data could not be transformed into the mean and standard deviation.
After application of the exclusion and inclusion criteria, 23 studies in total were included. Of these, 22 provided measures of amyloid in CSF, 20 provided measures of T-tau, and 17 provided measures of P-tau. Of the 17 studies providing P-tau measures, 13 explicitly stated that this was P-tau181, and in the remaining four studies the measurement of P-tau181 was indicated or implicit from the date of the report [34], choice of assay [35], or the literature cited in connection with the P-tau measure [36,37].

Statistical Analysis
The format of the data, as provided in the source papers, sometimes gave direct access to the average and standard deviation values for the analytes, but in other cases it was in the form of the median and accompanying range, or median with first and third quartile values. In the latter cases, the method previously published by D. Luo and X. Wan [38,39] was used to estimate the average and standard deviation to align the datasets to enable a meta-analysis. Where the source data were available as the median and inter-quartile range, they were assumed to follow a normal distribution with standard deviation equal to the inter-quartile range divided by 1.35.
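The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [38,39]: the standard deviation uses the large-sample normal approximation given in the text (inter-quartile range divided by 1.35), and the mean uses a simple three-point average of the quartiles and median, which is an assumption made here for illustration. The function name and the input values are hypothetical and are not taken from any included study.

```python
from statistics import NormalDist

# Width of the inter-quartile range of a standard normal distribution,
# 2 * z(0.75), which is approximately 1.349 and is rounded to 1.35 in the text.
IQR_WIDTH_NORMAL = 2 * NormalDist().inv_cdf(0.75)


def mean_sd_from_quartiles(q1, median, q3):
    """Approximate the mean and SD from median and quartiles, assuming normality.

    Hypothetical helper: the three-point mean is an illustrative assumption,
    and the SD is the inter-quartile range divided by 1.35 as stated in the text.
    """
    mean = (q1 + median + q3) / 3.0
    sd = (q3 - q1) / 1.35
    return mean, sd


# Hypothetical concentrations (arbitrary units), not data from the review.
mean, sd = mean_sd_from_quartiles(400.0, 500.0, 670.0)
```

The constant `IQR_WIDTH_NORMAL` is included only to show where the divisor 1.35 comes from; sample-size-dependent refinements of the estimators are not reproduced here.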
Hedges' g values were calculated to determine the effect size for the datasets included in this study, using a random-effects model. In this context, the Hedges' g value was defined as positive where patients with progressive MCI had a higher baseline CSF measurement than the group with which they were compared (non-progressive MCI or healthy control) for each selected biomarker. Funnel plots were used to detect publication or other biases. Heterogeneity across studies was primarily assessed by Higgins' I².
Full details of the meta-analysis are set out in Appendix A.
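For orientation, the quantities named above (Hedges' g, a random-effects pooled estimate, and Higgins' I²) can be computed as in the sketch below. The pooling step uses the DerSimonian-Laird estimator of between-study variance, which is an assumption made for illustration; the estimator and software actually used are described in Appendix A and are not reproduced here. Function names are hypothetical, and the sign of g simply follows the order in which the two groups are passed.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (group 1 minus group 2) and its variance."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects(effects, variances):
    """Pooled effect, 95% CI, Cochran's Q and I^2 (%), using DerSimonian-Laird (assumed)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2


# Hypothetical summary statistics (mean, SD, n per group), not data from the review.
studies = [(10.0, 2.0, 20, 8.0, 2.0, 20), (11.0, 3.0, 40, 8.5, 2.5, 35), (9.0, 2.0, 30, 8.0, 2.2, 30)]
gs, vs = zip(*(hedges_g(*s) for s in studies))
pooled, ci, q, i2 = random_effects(gs, vs)
```

Because the random-effects weights add the between-study variance to each study's own variance, very large studies dominate the pooled estimate less than they would under a fixed-effect model.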

Use of Diagnostic Criteria to Define MCI and AD
The majority of the studies included in the systematic review exclusively used Petersen's criteria for MCI for diagnostic purposes. For the period of this review (since 1994), we note the evolution of Petersen's criteria from 1995 to 2011 as summarized in Figure 2 [40][41][42].
Biomedicines 2022, 10, x FOR PEER REVIEW
In the evolution of the diagnostic criteria for MCI, Petersen's 2011 criteria noted that MCI could be non-amnestic or amnestic [42]. Under the early criteria for diagnosing MCI, the term "MCI" was equivalent to "amnestic MCI" in the later definition. Overall, however, it appears that the diagnostic criteria have been generally consistent and are not expected to introduce a significant systematic error in the context of the present analysis. Herukka and co-workers used the CDR (clinical dementia rating) score, where patients were diagnosed as MCI if they had a CDR score of 0.5 and were performing below the age-adjusted norms in at least one cognitive domain (memory, language, attention, and executive function or global function) [34]. Brys and co-workers used CDR score and GDS (Global Deterioration Scale) for the diagnosis of MCI [43].
In determining the diagnostic criteria for AD, all the research included in this study used at least one of the following criteria: NINCDS-ADRDA, DSM-IV, or DSM-III-R. Hansson and Buchhave combined DSM-III-R with NINCDS-ADRDA, as Buchhave's work was an extension of the follow-up of Hansson's [32,44].

Methods to Determine Analyte Concentration in CSF
Two major types of CSF analysis method are reported in the studies identified in the systematic review: xMAP (multi-analyte profiling) and ELISA (enzyme-linked immunosorbent assay). In presenting the results graphically, the findings were sorted first by analysis method (xMAP or ELISA) and then by year of publication. Both methods provide reported values as an absolute value (the mass of analyte per unit volume of CSF), and as longitudinal change is the parameter of interest, both xMAP and ELISA data are included here as reported in the source papers.
In comparing the studies performed by xMAP and ELISA, there is, by observation, evidence of systematic differences between the concentrations detected with the two different analytical methods. The advantage of using xMAP for CSF analyte measurement is that it enables simultaneous tracking of multiple biomarkers. Compared to the classical single-biomarker ELISA technique, xMAP reduces working time and still achieves correlations for P-tau and T-tau but not for Aβ42 [12,59]. However, it does present a problem for the comparison of absolute values between studies. In examining published data, we found moderate correlations between ELISA and xMAP measurements [59] of Aβ1-42 (r = 0.47) and stronger correlations for tau (r = 0.87 for P-tau, r = 0.96 for T-tau). Shaw and co-workers reported that concentrations measured by ELISA equate to ~200% of Aβ1-42, ~400% of T-tau, and ~125% of P-tau as measured by xMAP [60]. The same trend is observed in the data we present in Figures 3-5, where the greatest difference in reported concentrations for xMAP versus ELISA measurements is evident for T-tau (Figure 4). We compared the xMAP analytical approach in these T-tau measures. Spencer and co-workers [30] used data from ADNI, where the CSF sample analysis is described according to the published ADNI protocol [30]. For Hertze's analysis of baseline CSF, the patients were from Malmö University Hospital [47]. The systematic difference between xMAP technology and ELISA measurement was adjusted for by some (e.g., Bjerke, Mattsson, Hertze, Figure 3) but not necessarily by others (e.g., Palmqvist, Spencer, Figure 3), leading to a large offset in the values reported by xMAP for Aβ1-42. Indeed, in the study by Palmqvist, the effect of xMAP on absolute concentrations is not discussed [48].
While there is no accurate conversion factor to equate xMAP and ELISA data, the important factor in this present study is the relative change in each analyte within a group, so here it is deemed valid to retain both the xMAP and ELISA data for the analyte comparisons between healthy controls (HC), non-progressive MCI (Stable_MCI), and MCI that progresses to AD (MCI_AD).

Aβ1-42/T-Tau and Aβ1-42/P-Tau Ratio
As shown in Figure 6, a number of studies report ratios of Aβ42/T-tau and Aβ42/P-tau. Data in this format cannot be used in the following meta-analyses, except where there was access to the separate datasets for the three analytes, so the ratio information shown here is retained for discussion purposes but not included in subsequent calculations.
Figure 6. The ratio of Aβ1-42 to (a) T-tau and (b) P-tau in CSF, using data for mean and standard deviation (upper bound shown as error bar) taken directly from the sub-set of papers in the systematic review that included this information [32,37,44,48,54,55,57,61].

Follow-Up Duration
There are marked differences in study duration for the different studies included in this work. The length of follow-up was converted into the format mean ± standard deviation using the method described in Section 2.3. In several papers, only the maximum length of follow-up was provided, which prevented the follow-up duration from being converted into the required format.
There is no standard follow-up duration for such studies, and this constrains analysis and interpretation. For example, it is unclear whether the patients are studied over a sufficiently long period that they might progress to fully develop the syndrome of AD. Some reports in the field do not even record the follow-up duration. The effect of follow-up period on the prediction accuracy and threshold value is considered in detail in the Discussion.

Meta Analysis of Studies Investigating the Association between Levels of Amyloid Beta and Tau in CSF, and Progression to Alzheimer's Disease
We conducted meta-analyses comparing non-progressive MCI (stable MCI) with progressive MCI (MCI_AD) for each of the three analytes (amyloid beta 1-42, 'amyloid'; T-tau, and P-tau), and did likewise to compare healthy controls (HC) with MCI_AD. The key results from these six separate meta-analyses are summarized in Table 1. For each of the comparisons, a separate meta-analysis was conducted for the three different analytes, and these are described in full in Appendix A. Effect sizes were determined using random effects models, ensuring somewhat balanced weights across studies despite the inclusion of individual studies with much larger sample sizes than all others and considering differences in the populations of the individual studies. Roughly symmetric funnel plots confirmed that there is no clear evidence of bias in any of the comparisons. The heterogeneity of effect sizes across studies was confirmed with very high significance for all three meta-analyses comparing Stable_MCI versus MCI_AD (p < 0.0001), as well as for amyloid in the HC versus MCI_AD comparison (p < 0.0001), and a statistically significant difference was also observed for T-tau for HC versus MCI_AD (p < 0.05), although not for P-tau. Hence, as justified in Appendix A, random effects are used for all but the last comparison. Note that the small number of studies in the last condition means that we can only draw limited conclusions from this.
Effect sizes are given in standard units. In most of the comparisons, the absolute magnitude of the effect is between 1 and 1.2 standard units, except for amyloid in HC vs. MCI_AD, where it is even higher at 1.73. The direction of the effects also confirms the trends reported earlier.
Concerning the values of the three analytes with respect to baseline values, we observed that CSF Aβ and tau levels were differentiated by patient outcome. With either xMAP or ELISA, MCI patients who did not progress to AD (non-progressive MCI) tended to have slightly higher Aβ(1-42) levels than healthy controls (p-value < 0.001). Levels were significantly lower in MCI patients who progressed to AD (progressive MCI). The opposite, consistent with prior observations, was observed for P-tau and T-tau, where progressive MCI patients exhibited the highest levels compared to non-progressive MCI and healthy controls.
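As a consistency check on the reported effect sizes, the two-sided p-value implied by each SMD and its 95% confidence interval can be recovered as sketched below. The SMD and interval values are those reported in the abstract of this paper; the calculation itself is an illustration that assumes a symmetric normal-approximation interval (half-width of 1.96 standard errors), which may differ from the exact method in Appendix A. The function and dictionary names are hypothetical.

```python
import math

# SMD and 95% CI bounds as reported in the abstract: (SMD, lower, upper).
REPORTED = {
    "Stable_MCI vs MCI_AD, Abeta(1-42)": (1.19, 0.96, 1.42),
    "Stable_MCI vs MCI_AD, T-tau": (-1.03, -1.24, -0.82),
    "Stable_MCI vs MCI_AD, P-tau": (-1.03, -1.47, -0.59),
    "HC vs MCI_AD, Abeta(1-42)": (1.73, 1.39, 2.07),
    "HC vs MCI_AD, T-tau": (-1.13, -1.33, -0.93),
    "HC vs MCI_AD, P-tau": (-1.10, -1.23, -0.96),
}


def two_sided_p(smd, lower, upper):
    """Two-sided p-value implied by an estimate and its 95% CI (normal approximation assumed)."""
    se = (upper - lower) / (2 * 1.96)  # CI half-width = 1.96 * SE
    z = smd / se
    return math.erfc(abs(z) / math.sqrt(2))


p_values = {name: two_sided_p(*vals) for name, vals in REPORTED.items()}
```

Under this approximation, each of the six intervals corresponds to a p-value below 0.0001, in line with the significance level stated for the SMD in all comparisons.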

Limitations of Current Study
A constraint of the data available is that the raw data did not enable analyte trajectories to be plotted as a function of time for individual patients. The individual studies used as source papers for this analysis had constrained experimental designs, which justifies our decision to undertake separate meta-analyses, each comparing two conditions. In principle, we might alternatively have performed a meta-analysis with three conditions (Stable_MCI, MCI_AD, and HC) using analysis frameworks developed for multiple treatment designs. In practice, this would have significantly reduced the number of studies that could be incorporated into the meta-analysis, and the priority here was to make the fullest possible use of the available data. On this occasion, the p-values are so small that the statistically significant differences between the groups would be preserved after any standard adjustment for multiple testing (e.g., Bonferroni).
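The multiple-testing argument can be illustrated with a short sketch. It applies a Bonferroni adjustment to the upper bound p < 0.0001 stated for the SMD in each of the six meta-analyses; the exact p-values are not given in the text, so the inputs below are bounds rather than observed values, and the function name is hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Upper bound reported for the SMD in each of the six meta-analyses.
p_upper_bounds = [1e-4] * 6
adjusted = bonferroni(p_upper_bounds)
```

With six tests, an unadjusted bound of 0.0001 becomes an adjusted bound of 0.0006, which remains well below the conventional 0.05 threshold.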
Historically it has been observed that CSF Aβ1-42 concentrations are lower in AD patients than in healthy controls and MCI patients. In the papers systematically reviewed for this study, all progressive MCI (MCI_AD) patient cohorts have lower average baseline Aβ1-42 concentration than the non-progressive (stable MCI) patients. However, a limitation was that there was no way of predicting clinically, at the time of the baseline measurement, whether MCI patients would progress to develop AD. This points to a fundamental drawback: determining whether a specific patient meets the criteria for AD is entirely based on clinical diagnosis. This creates scope for circular reasoning, where the accuracy of early prediction or diagnosis based on CSF biomarkers is totally dependent on the clinical diagnostic criteria used.
A further limitation of this analysis is that the participants in the systematically reviewed studies were selected for those studies, and it remains to be tested whether these participants were truly representative of wider (global) populations. Access to CSF samples is limited to those who were referred to take part in these clinical studies, typically because of concern about cognitive impairment, so access to the referral process introduces bias.
Lastly, over the period evaluated (since 1994), P-tau measures defaulted to being of P-tau181. By comparison, P-tau217 is a novel candidate biomarker, so it could not be incorporated into this analysis. Of the included studies reporting P-tau measures, 13 explicitly evaluated P-tau181, and it is an assumption (as detailed in Section 2.2) that for the four studies where this was not explicitly stated only P-tau181 residues could have been evaluated. Both P-tau181 and P-tau217 are currently being shown to have strong potential to differentiate patients with AD from other neurodegenerative disorders [62] and the relationship between P-tau181, P-tau217, and amyloid burden is also being examined [27]. In many respects, changes in P-tau217 levels appear to parallel P-tau181 changes, and P-tau181 remains a valuable marker [27]; it will, however, be interesting to see how P-tau217 performs (including the Aβ1-42/P-tau217 ratio) in future meta-analyses for the purpose of predicting progression from MCI to AD.

The Effect of Follow-Up Length
Within some of the papers included in this systematic review, healthy control (HC) groups are omitted. Our observation from the results in this review is that the CSF Aβ1-42 concentration for non-progressive MCI (Stable MCI) is more similar to HC, with higher values than reported for progressive MCI (MCI_AD).
In Table 2, we show an analysis of data concerning the values of the three CSF analytes for patients in a longitudinal study initiated by Hansson and extended by Buchhave et al. [32,44]. From the data summarized in Table 2a,b, we extrapolated the values (Table 2c) for those patients who had been identified as stable MCI at early follow-up, but who transpired to have progressed to Alzheimer's disease or another form of dementia at a later follow-up. The diagnostic trajectories for these cohorts are illustrated in Figure 13.
Table 2. Summary of published findings from (a) Hansson et al. [32] and (b) Buchhave et al. [44] to determine the analyte values in patients who progress to dementia after early and late follow-up, respectively, (c) our calculated mean values of the three CSF analytes for the 15 patients who were initially classified as non-progressive (stable MCI) but who in fact were reported as having progressed at late follow-up.
Figure 13. … Table 2 (derived from Hansson et al. [32] and Buchhave et al. [44]).
The key point here is that 15 patients who were classified as MCI according to Petersen's criteria in Hansson's early follow-up [32] had been classified as having dementia in Buchhave's late follow-up [41,44]. They are henceforth referred to as late-progressive MCI patients. Of these 15 diagnosed with dementia, 11 were confirmed with AD. From our analysis included in Table 2, it can be observed that the baseline CSF Aβ1-42 concentrations for these 15 patients were closer to the values for progressive MCI (MCI_AD) at the early follow-up, and for P-tau they were closer to non-progressive MCI (stable MCI).
In context with the systematic review results from baseline CSF measurements in this study, our analysis of the longitudinal studies by Hansson and Buchhave supports the conclusion that individuals with an MCI diagnosis and lower Aβ1-42 concentrations are more likely to develop AD. A longer follow-up period would increase the chance of identifying those MCI patients at risk of developing AD; we hypothesize that a longer follow-up interval would be associated with greater separation in the Aβ1-42 concentrations for non-progressive versus progressive MCI patients.
We note from the analysis presented in Table 2 that at baseline, the average P-tau concentration in these 15 late-progressive MCI patients is closer to that for non-progressive MCI patients. For T-tau, the group average concentration at baseline in these 15 patients is approximately mid-way between the values for the non-progressive and progressive subgroups. It will be interesting to see if this finding is reproduced in larger cohorts in future studies.
For the studies in this systematic review that quantified the follow-up interval for non-progressive MCI patients, the values for the three analytes are summarized in Figure 14. Each study included here followed a similar protocol with a baseline measurement and regular assessment of cognitive status, usually at 6-month intervals. Importantly, the trends in these results from the overall cohort are consistent with the conclusions drawn from the longitudinal study detailed in Table 2 and Figure 13.
In conclusion, all six meta-analyses showed statistically significant SMD (p < 0.0001), and reported values for the Aβ(1-42)/P-tau ratio indicated this as the most robust indicator of a patient transitioning from MCI to AD. The follow-up period for longitudinal evaluations was identified as especially critical to clinical study design, and based on the evidence in this analysis, extending this follow-up period should lead to greater separation in the analyte values for progressive versus non-progressive MCI patients. While the roles of amyloid-beta and tau continue to be debated, their value as markers of patient outcome is supported by these findings.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1: the accompanying PRISMA checklist, and Table S1 containing sheet_Amyloid_3groups.csv, sheet_Ttau_3groups.csv and sheet_Ptau_3groups.csv.

Appendix A.1. Meta-Analysis Approach
We use the methods detailed in Part II, Chapter 2 and Part III, Chapter 5.1 of 'Meta-Analysis with R' by Schwarzer, Carpenter, and Rücker (Springer, Cham, first edition), which follows the Cochrane Handbook for Systematic Reviews of Interventions [64].

Effect Measure
The analysis focusses on two comparisons: Stable_MCI (non-progressive mild cognitive impairment throughout the period of clinical follow-up) compared with MCI_AD patients (those with MCI who will progress to a diagnosis of Alzheimer's disease during clinical follow-up), and MCI_AD compared with HC (healthy control). All measurements of interest are continuous. To accommodate different measurement technologies or analytical protocols potentially resulting in incompatible scales across the studies, a standardized mean difference is chosen to measure the effect. We use Hedges' g, which is based on pooled sample variance and very similar to Cohen's d but more appropriate to the group sizes in the present analysis.
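As an illustration of the effect measure described above, the following R sketch computes Cohen's d and Hedges' g from group summary statistics. The function name `hedges_g`, the input values, and the large-sample variance approximation are assumptions made for this illustration; they are not taken from the analysis scripts, and meta-analysis packages may use slightly different variance formulas.

```r
# Hedges' g for two independent groups, computed from summary statistics.
# Sketch only: the variance of d uses a common large-sample approximation.
hedges_g <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  # Pooled standard deviation across both groups
  s_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  d <- (m1 - m2) / s_pooled                # Cohen's d
  J <- 1 - 3 / (4 * df - 1)                # small-sample correction factor
  g <- J * d                               # Hedges' g
  var_d <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
  se_g <- J * sqrt(var_d)
  list(d = d, g = g, se = se_g,
       ci = g + c(-1, 1) * qnorm(0.975) * se_g)
}

# Hypothetical values for illustration (not from any included study)
res <- hedges_g(m1 = 600, sd1 = 150, n1 = 40,
                m2 = 450, sd2 = 150, n2 = 40)
print(res)
```

The correction factor J is always below 1, so g is slightly smaller in magnitude than d; the difference shrinks as the group sizes grow.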

Models and Significance Tests for Overall Effects
We conduct the meta-analyses using both a fixed effects model and a random effects model. The fixed effects model assumes that the individual studies included in the meta-analysis are sampled from the same population, so each study's observed effect equals the common effect size plus an error term. To account for differences in precision, weights inversely proportional to the individual studies' variances are used in the construction of the overall effect estimator. The random effects model assumes that the individual studies' effects are normally distributed with variance τ². While the fixed effects model attributes differences between observed effects entirely to sampling error, the random effects model attributes some of them to true differences between effect sizes across the studies. Significance tests for the overall effects are based on inverse variance methods.
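The two models can be contrasted with a short base R sketch of inverse variance pooling. The function name `pool_iv`, the example effect sizes, and the choice of the DerSimonian-Laird estimator for τ² are assumptions made for illustration; the estimator used in the actual analysis is not stated here.

```r
# Inverse variance pooling under fixed and random effects models.
# te: study effect sizes (e.g. SMDs); se: their standard errors.
pool_iv <- function(te, se) {
  k <- length(te)
  w_f <- 1 / se^2                          # fixed effect weights
  est_f <- sum(w_f * te) / sum(w_f)
  Q <- sum(w_f * (te - est_f)^2)           # weighted sum of squares
  C <- sum(w_f) - sum(w_f^2) / sum(w_f)
  tau2 <- max(0, (Q - (k - 1)) / C)        # DerSimonian-Laird (assumed)
  w_r <- 1 / (se^2 + tau2)                 # random effects weights
  est_r <- sum(w_r * te) / sum(w_r)
  overall <- function(est, w) {
    se_est <- sqrt(1 / sum(w))
    c(estimate = est,
      lower = est - qnorm(0.975) * se_est,
      upper = est + qnorm(0.975) * se_est,
      p = 2 * pnorm(-abs(est / se_est)))   # z-test of the overall effect
  }
  list(fixed = overall(est_f, w_f),
       random = overall(est_r, w_r),
       tau2 = tau2,
       weights = data.frame(fixed = 100 * w_f / sum(w_f),
                            random = 100 * w_r / sum(w_r)))
}

# Hypothetical data: study 1 is much more precise than the others
te <- c(0.8, 1.2, 1.5, 0.9, 2.0)
se <- c(0.10, 0.30, 0.35, 0.25, 0.40)
fit <- pool_iv(te, se)
print(round(fit$weights, 1))
```

Because τ² is added to every study's variance, the random effects weights are more even than the fixed effect weights, and the random effects confidence interval is wider.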

Measurements and Tests of Between-Study Heterogeneity
In the context of this study, due to inter-lab differences, systematic differences between the study populations, and changes in diagnostic standards over time, a random effects model seems the right choice. To confirm this, we estimate measures of heterogeneity: the weighted sum of squares about the fixed effect estimate, Q, and the related quantities I² and H. Finally, we perform a heterogeneity test using the fact that Q follows a χ² distribution under the null hypothesis τ² = 0. Higgins et al. [65] suggest guidelines for the interpretation of I², classifying 25% as low, 50% as medium, and 75% as high heterogeneity. While high heterogeneity indicates dissimilarity between the individual studies, the meta-analysis can still be valid. The size of the individual studies is relevant here, as smaller studies have higher variation and, given that they have been published, bear the risk of overestimating the effect size. While a fixed effects model tends to assign relatively lower weights to smaller studies due to their higher variability, a random effects model will weight the studies more equally. It has been recommended by Israel et al. [66] to take into account the clinical context and to compare the results of fixed and random effects model analyses.
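The heterogeneity measures named above can be computed directly from the study effect sizes and standard errors. The following base R sketch uses the standard definitions H = sqrt(Q/(k − 1)) and I² = (Q − (k − 1))/Q, truncated at zero; the function name `heterogeneity` and the input values are hypothetical.

```r
# Heterogeneity statistics for k studies.
# te: study effect sizes; se: their standard errors.
heterogeneity <- function(te, se) {
  w <- 1 / se^2
  fixed <- sum(w * te) / sum(w)            # fixed effect estimate
  df <- length(te) - 1
  Q <- sum(w * (te - fixed)^2)             # weighted sum of squares
  H <- sqrt(Q / df)
  I2 <- 100 * max(0, (Q - df) / Q)         # in percent, truncated at 0
  # Under the null hypothesis tau^2 = 0, Q follows a chi-squared
  # distribution with k - 1 degrees of freedom
  p <- pchisq(Q, df = df, lower.tail = FALSE)
  c(Q = Q, df = df, p = p, H = H, I2 = I2)
}

# Hypothetical data for illustration
te <- c(0.8, 1.2, 1.5, 0.9, 2.0)
se <- c(0.10, 0.30, 0.35, 0.25, 0.40)
h <- heterogeneity(te, se)
print(round(h, 3))
```

With these illustrative inputs, I² falls between the 50% and 75% guideline values, and the test rejects homogeneity at the 5% level.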

Normality Assumptions
By default, confidence intervals for the individual studies are calculated from means and SDs under a normality assumption. According to the Cochrane Handbook (Section 10.5.3 in [64]), Altman's rule of thumb can be used to check normality based on a lower bound in conjunction with the mean and SD.
The confidence interval and p-value for a significance test of the overall effect size in a meta-analysis are also based on a normality assumption. Given the moderate number of studies, formal inference about normality, including checks for heavy tails, is not possible, but visual inspection of the distributions shows approximate normality.
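The rule of thumb mentioned above can be sketched as follows. As commonly stated, the lowest possible value is subtracted from the mean and the result is divided by the SD; a ratio below 2 suggests skew, and a ratio below 1 is taken as strong evidence of skew. The function name `skew_check`, the thresholds as coded, and the input values are assumptions for illustration and should be checked against the handbook text.

```r
# Rough skewness check from summary statistics.
# m: group means; s: group SDs; lower: lowest possible value
# (0 for concentrations).
skew_check <- function(m, s, lower = 0) {
  ratio <- (m - lower) / s
  flag <- ifelse(ratio < 1, "strong evidence of skew",
          ifelse(ratio < 2, "possible skew", "no indication of skew"))
  data.frame(mean = m, sd = s, ratio = round(ratio, 2), flag = flag,
             stringsAsFactors = FALSE)
}

# Hypothetical group summaries (not from any included study)
out <- skew_check(m = c(520, 90, 60), s = c(180, 55, 70))
print(out)
```

A flagged group does not invalidate the analysis by itself; it only indicates that the normal approximation for that study's confidence interval deserves a closer look.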

Bias Analysis
Funnel plots are used to visualize the relationship between the effect size and a measure of its precision, here the standard error. They can give evidence of bias, in particular publication bias. Filled circles represent estimated effect sizes and their precision (standard error) for each individual study. Also shown are fixed effect estimates (vertical dashed line) with 95% confidence interval limits (diagonal dashed lines) and the random effects estimate (vertical dotted line). Smaller studies (with larger standard errors) are expected to scatter more than larger studies, giving the funnel plot's characteristic triangular-shaped scatter. Publication bias, which filters out studies that do not show the desired result, tends to lead to an asymmetric plot. Typically, it is smaller studies that are not included, leaving the lower right or the lower left area of the plot empty.

Summary Plot of the Meta-Analysis Results
Forest plots provide a graphical and numerical summary of the results of a meta-analysis and have become a common part of the Cochrane review framework. They list the individual studies with their sample sizes, study means, and SDs, and the meta-analysis effect size with its confidence interval, both numerically and visually, together with the random and fixed effect model estimates. Heterogeneity measures are also included.

Appendix A.2. Amyloid
Summary measures for amyloid (concentration of Aβ1-42 in CSF) from all studies are saved in the Supplementary Materials file sheet_Amyloid_3groups.csv for all conditions (stable MCI (MCI_St), MCI progressing to Alzheimer's disease (MCI_AD), and healthy control (HC)). After reading the data, we conduct separate meta-analyses comparing MCI_AD to each of the other two conditions.
We note very unbalanced weights in the fixed effect model, which assigns study 2 (Mattsson 2009) a weight of 30.4%, while all others have at most 8%. As discussed in Appendix A.1.2, this is expected due to the much larger size of study 2 compared with the other studies in this analysis. The random effects model still assigns the highest weight to study 2, but at 5.7% it is only slightly higher than the other studies' weights, which range from 3.0% to 5.3%. Still having the highest weight, it is only 6.6% followed by 10 other studies weighing at least 5%. While study 2's effect size is at the lowest end of the spectrum, it is not an outlier, and the random effects model's estimate is very similar to the fixed effect model's estimate, providing evidence for the validity of the meta-analysis despite the heterogeneity.

Normality Assumptions
The distribution of the effect sizes for stable MCI (MCI_St) versus MCI_AD is calculated as follows and shown in Figure A1:
hist(meta$TE, xlim=c(0,4), breaks=12, cex.main=0.9, main="Histogram of standardised mean difference", xlab="Standardised mean difference (SMD)")
Figure A1. The distribution of the SMD of stable MCI (MCI_St) versus MCI_AD is slightly skewed and shows an outlier, but is otherwise approximately normal.

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A2

We note very unbalanced weights in the fixed effect model, with study 2 (Mattsson 2009) assigned a weight of 44.8%, while all others are under 11%. However, the random effects model gives more similar weights to the studies. The effect of study 2 is at the lower end of the spectrum, but it is not an outlier, and the effect size estimates of the fixed and random effects models are quite close together, so we can argue, as in Appendix A.2.1, that the heterogeneity does not undermine the meta-analysis approach.

Normality Assumptions
The distribution of the effect sizes for HC versus MCI_AD is calculated as follows and shown in Figure A3:
hist(meta$TE, xlim=c(0,4), breaks=12, cex.main=0.9, main="Histogram of standardised mean difference", xlab="Standardised mean difference (SMD)")
Figure A3. The distribution of the SMD of HC versus MCI_AD is slightly skewed but still approximately normal.
Biomedicines 2022, 10, x FOR PEER REVIEW 26 of 40

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A4:
See forest plot file, Figure 8 in the main manuscript, calculated as follows:
pdf(file="forest_Amyloid_HCvsMCI_AD.pdf", paper="a4r")
forest(meta, xlab="Standardised difference in mean response (HC - MCI_AD)",

As in the other comparisons, study 2 has a much higher weight (30.3%) than any of the other studies (at most 9%). However, the random effects model assigns more similar weights. The effect of study 2 is within the range of the individual studies, and the effect size estimates of the fixed and random effects models are nearly the same, so we can argue, as in Appendix A.2.1, that the heterogeneity does not undermine the meta-analysis approach.

Normality Assumptions
The distribution of the effect sizes for stable MCI versus MCI_AD is calculated as follows and shown in Figure A5:
hist(as.numeric(meta$TE, na.rm=TRUE), xlim=c(-3,0), breaks=12, cex.main=0.9, main="Histogram of standardised mean difference", xlab="Standardised mean difference (SMD)")
Figure A5. The distribution of the SMD of stable MCI versus MCI_AD is slightly skewed and shows an outlier, but is still within an acceptable range of a normal distribution.

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A6:

We note very unbalanced weights in the fixed effect model, with study 2 (Mattsson 2009) assigned a weight of 47.3%, while all others are under 11%. The random effects model gives more similar weights to all studies. The effect of study 2 is within the range of the individual studies, and the effect size estimates of the fixed and random effects models are nearly the same, so we can argue, as in Appendix A.2.1, that the heterogeneity does not undermine the meta-analysis approach.

Normality Assumptions
The distribution of the effect sizes for HC versus MCI_AD is calculated as follows and shown in Figure A7:
hist(as.numeric(meta$TE, na.rm=TRUE), xlim=c(-3,0), breaks=12, cex.main=0.9, main="Histogram of standardised mean difference", xlab="Standardised mean difference (SMD)")
Figure A7. The distribution of the SMD of HC versus MCI_AD is not too dissimilar to a normal distribution, considering the limitations of a small sample.

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A8:
funnel(meta)
Figure A8. The studies are still well balanced on either side of the effect estimates, so the funnel plot does not reveal an obvious presence of publication or other bias for HC versus MCI_AD.

As in the other comparisons, study 2 has a much higher weight (29.1%), while all others are below 9%. However, the random effects model assigns more similar weights. The effect of study 2 is within the range of the individual studies, and the effect size estimates of the fixed and random effects models are still not far apart, so we can argue, as in Appendix A.2.1, that the heterogeneity does not undermine the meta-analysis approach.

Normality Assumptions
The distribution of the effect sizes for stable MCI (MCI_St) versus MCI_AD is calculated as follows and shown in Figure A9.
Figure A9. The distribution of the SMD of stable MCI (MCI_St) versus MCI_AD is asymmetric, but not too far from a normal distribution.

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A10:
funnel(meta)

See separately saved forest plot file, Figure 11 in the main manuscript, plotted as follows:
pdf(file="forest_Ptau_MCI_StvsMCI_AD.pdf", paper="a4r")
The test for heterogeneity is not significant, and the fixed effect and random effects models show the same statistically significant effect (−1.10 [−1.23; −0.96], p < 0.0001) in the standardized mean difference.
We note very unbalanced weights in both the fixed effect and random effects models, with study 2 dominating at a weight of 55.9%, while all others are no more than 13%. While less extreme, such unbalanced weights were also observed in the other comparisons. In this case, the random effects model does not make a difference. The fact that the effects in the individual studies are not too dissimilar adds some robustness to the meta-analysis approach. There are, however, some limitations due to the large number of missing values, as this comparison was only included in some of the individual studies.

Normality Assumptions
The distribution of the effect sizes for HC versus MCI_AD is calculated as follows and shown in Figure A11:

Bias Analysis
The funnel plot is calculated as follows and shown in Figure A12:
