MACE for Diagnosis of Dementia and MCI: Examining Cut-Offs and Predictive Values

The definition of test cut-offs is a critical determinant of many paired and unitary measures of diagnostic or screening test accuracy, such as sensitivity and specificity, positive and negative predictive values, and correct classification accuracy. Revision of test cut-offs from those defined in index studies is frowned upon as a potential source of bias, yet this stance seemingly accepts any biases present in the index study, for example those related to sampling. Data from a large pragmatic test accuracy study examining the Mini-Addenbrooke’s Cognitive Examination (MACE) were interrogated to determine optimal test cut-offs for the diagnosis of dementia and mild cognitive impairment (MCI) using either the maximal Youden index or the maximal correct classification accuracy. Receiver operating characteristic (ROC) and precision recall (PR) curves for dementia and MCI were also plotted, and MACE predictive values across a range of disease prevalences were calculated. Optimal cut-offs were found to be a point lower than those defined in the index study. MACE had good metrics for the area under the ROC curve and for the effect size (Cohen’s d) for both dementia and MCI diagnosis, but PR curves suggested superior performance for MCI diagnosis. MACE had high negative predictive value at all prevalences, suggesting that a MACE test score above either cut-off excludes dementia and MCI in any setting.


Introduction
The Mini-Addenbrooke's Cognitive Examination (MACE) is a shortened version of the Addenbrooke's Cognitive Examination-Revised (ACE-R) and ACE-III, developed by Mokken scaling analysis of these longer instruments [1]. MACE comprises tests of attention, memory (7-item name and address), verbal fluency, clock drawing and memory recall (score range 0-30, impaired to normal), and takes 5-10 min to administer.
In the index MACE study, two cut-off points were identified in the cohort examined (n = 242; Alzheimer's disease 28, behavioural variant frontotemporal dementia 23, primary progressive aphasia 82, corticobasal syndrome 21, controls 78): ≤25/30 had high sensitivity (0.85) and high specificity (0.87); and ≤21/30 had high specificity (1.00), and hence an abnormal score was almost certain to have come from a dementia patient. MACE was found to be more sensitive than the Mini-Mental State Examination (MMSE) and less likely to have ceiling effects [1].
The general applicability of these MACE cut-offs for the diagnosis of dementia and mild cognitive impairment (MCI) has not been widely examined. A Spanish translation administered to a cohort of mixed dementia patients and controls (n = 175) with relatively low educational experience found that a cut-off between 16/30 and 17/30 had optimal sensitivity (0.867) and specificity (0.870) for dementia diagnosis [2].
MACE was adopted in the author's practice, based in a dedicated Cognitive Function Clinic located at a regional neuroscience centre in the northwest United Kingdom in June 2014 [3], and has been routinely used since then. Access to a large dataset has provided the opportunity to examine a variety of parameters not hitherto examined. The aims of this study were:
• To determine optimal MACE cut-off points for the diagnosis of dementia and MCI, with the anticipation that cut-offs optimising sensitivity would also minimise false negative rate (FNR) and optimise negative predictive value (NPV) whilst cut-offs optimising specificity would minimise false positive rate (FPR) and optimise positive predictive value (PPV);
• To plot receiver operating characteristic (ROC) and precision recall (PR) curves for dementia and MCI, and to calculate areas under the ROC curves and Q* index (a measure of diagnostic value);
• To calculate MACE effect sizes (Cohen's d) for diagnosis of dementia and MCI;
• To calculate MACE predictive values across a range of disease prevalences.

Methods
The sample comprised consecutive new patient referrals administered the MACE over the period June 2014 to December 2018 (inclusive), and incorporated data reported in previous studies [3][4][5]. Other than those with a pre-existing diagnosis of dementia, there were no exclusion criteria. As previously detailed [3][4][5], criterion diagnosis of dementia or mild cognitive impairment was by judgement of an experienced clinician using standard diagnostic criteria (DSM-IV; Petersen); in those without evidence of cognitive impairment, a diagnosis of subjective memory complaint (SMC) was made. MACE scores were not used to make criterion diagnoses to avoid review bias. Subjects gave informed consent, and the study protocol was approved by the institute's committee on human research. MACE scores were plotted against diagnosis, and the Pearson 2 skewness coefficient (Sk2) was used to assess skew, where:
$$\mathrm{Sk}_2 = \frac{3\,(\text{mean} - \text{median})}{\text{standard deviation}}$$
with values lying between -1 and +1 deemed acceptable for the elimination of floor or ceiling effects [6].
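Pearson's second skewness coefficient is conventionally defined as three times the difference between mean and median, divided by the standard deviation. The following is a minimal Python sketch of that calculation. The scores are hypothetical and are not study data, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, median, stdev


def pearson_sk2(scores):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        raise ValueError("standard deviation is zero; skewness is undefined")
    return 3 * (mean(scores) - median(scores)) / sd


# Hypothetical MACE-like scores (range 0-30), bunched towards the top of the scale.
example_scores = [30, 29, 29, 28, 27, 27, 26, 25, 24, 22, 20, 17, 13, 8]
sk2 = pearson_sk2(example_scores)

# A negative value indicates that the mean lies below the median (negative skew).
print(f"Sk2 = {sk2:.2f}; within -1..+1: {-1 <= sk2 <= 1}")
```

A perfectly symmetric sample (for example, 1 to 5) returns zero, since its mean and median coincide.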
Two methods to optimise test cut-offs were examined [10,11], namely maximising either correct classification accuracy (Acc) or the Youden index (Sens + Spec − 1).
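The two optimisation rules can be illustrated with a short sweep over all possible cut-offs. This is a sketch under stated assumptions rather than the study's own analysis: the scores and diagnoses below are hypothetical, a score at or below the cut-off is treated as test-positive, and the function name `accuracy_measures` is introduced here for illustration only.

```python
def accuracy_measures(scores, diseased, cutoff):
    """Sens, Spec, Youden index and Acc when score <= cutoff is test-positive."""
    tp = fp = fn = tn = 0
    for score, has_disease in zip(scores, diseased):
        test_positive = score <= cutoff
        if test_positive and has_disease:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_disease:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "cutoff": cutoff,
        "sens": sens,
        "spec": spec,
        "youden": sens + spec - 1,
        "acc": (tp + tn) / len(scores),
    }


# Hypothetical data: 6 patients with the target diagnosis, 14 without.
dementia_scores = [10, 13, 16, 19, 22, 24]
other_scores = [21, 21, 23, 23, 25, 25, 26, 26, 27, 27, 28, 28, 29, 30]
scores = dementia_scores + other_scores
diseased = [True] * len(dementia_scores) + [False] * len(other_scores)

rows = [accuracy_measures(scores, diseased, c) for c in range(0, 31)]

# max() keeps the first maximum it meets, so ties resolve to the lowest cut-off.
best_youden = max(rows, key=lambda r: r["youden"])
best_acc = max(rows, key=lambda r: r["acc"])

print("Maximal Youden index at cut-off <=", best_youden["cutoff"])
print("Maximal Acc at cut-off <=", best_acc["cutoff"])
```

In this toy sample the two rules select different cut-offs. When the target diagnosis is the minority class, Acc is dominated by true negatives and so tends to favour a lower, more specific cut-off than the Youden index.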
ROC curves were plotted (false positive rate versus sensitivity) and areas under the curve (AUC ROC) were calculated and categorised according to the scale of Metz [16]. The Q* index, a measure of diagnostic value [17], was determined as the point in ROC space where the anti-diagonal intersected the ROC curve (i.e., where Sens = Spec). Precision recall (PR) curves [18] were also plotted, as these have been recommended over ROC curves when analysing highly skewed datasets [19]. The F measure, or F1 score (the harmonic mean of precision and sensitivity), was also calculated as a global measure of accuracy [20].
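The quantities described above can be sketched in a few lines of Python. Several assumptions apply: the data are hypothetical, the area under the ROC curve is estimated by the trapezoidal rule (the text does not state the method used), and Q* is approximated by the empirical ROC point at which Sens and Spec are closest, rather than by interpolating the exact intersection with the anti-diagonal. The function names are illustrative.

```python
def roc_and_pr_points(scores, diseased, cutoffs=range(-1, 31)):
    """Sweep cut-offs (score <= cutoff is test-positive) and collect curve points.

    Returns (roc, pr): roc holds (FPR, Sens) pairs, pr holds (recall, precision) pairs.
    """
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    roc, pr = [], []
    for c in cutoffs:
        tp = sum(1 for s, d in zip(scores, diseased) if d and s <= c)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s <= c)
        roc.append((fp / n_neg, tp / n_pos))
        if tp + fp > 0:  # precision is undefined when nothing tests positive
            pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_auc(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def q_star_estimate(roc):
    """Approximate Q*: the empirical ROC point where Sens is closest to Spec."""
    fpr, sens = min(roc, key=lambda p: abs(p[1] - (1 - p[0])))
    return (sens + (1 - fpr)) / 2


def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall (sensitivity)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical data: 6 patients with the target diagnosis, 14 without.
scores = [10, 13, 16, 19, 22, 24] + [21, 21, 23, 23, 25, 25, 26, 26, 27, 27, 28, 28, 29, 30]
diseased = [True] * 6 + [False] * 14

roc, pr = roc_and_pr_points(scores, diseased)
auc = trapezoid_auc(roc)
q_star = q_star_estimate(roc)
best_f1 = max(f_measure(precision, recall) for recall, precision in pr)

print(f"AUC ROC = {auc:.3f}, Q* (approx.) = {q_star:.3f}, max F1 = {best_f1:.3f}")
```

The PR points depend on the proportion of affected patients in the sample, whereas the ROC points do not. This is one reason the two curves can rank the same test differently in skewed datasets.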
Effect sizes (Cohen's d) were calculated as the difference of the means of diagnostic groups divided by the weighted pooled standard deviations of the groups [21]. Cohen's d values were categorised according to Sawilowsky's extension of Cohen's rules of thumb [22].
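As an illustration of this calculation, the sketch below computes Cohen's d with a pooled standard deviation weighted by group size, and assigns the descriptors of Sawilowsky's extended scale (0.01 very small, 0.2 small, 0.5 medium, 0.8 large, 1.2 very large, 2.0 huge). The group scores are hypothetical, and the helper names are introduced here for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference of group means divided by the pooled (weighted) SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def sawilowsky_category(d):
    """Descriptor for |d| on Sawilowsky's extension of Cohen's rules of thumb."""
    thresholds = [
        (2.0, "huge"),
        (1.2, "very large"),
        (0.8, "large"),
        (0.5, "medium"),
        (0.2, "small"),
        (0.01, "very small"),
    ]
    for lower_bound, label in thresholds:
        if abs(d) >= lower_bound:
            return label
    return "negligible"  # below the smallest threshold; label is an assumption


# Hypothetical MACE scores for two diagnostic groups.
no_dementia = [28, 27, 26, 25, 24, 23, 22]
dementia = [20, 18, 17, 15, 14, 12]
d = cohens_d(no_dementia, dementia)
print(f"d = {d:.2f} ({sawilowsky_category(d)})")
```

On this scale, the effect sizes reported in this study (1.74 for dementia and 1.13 for MCI) fall in the "very large" and "large" bands, respectively.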
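Predictive values across a range of disease prevalence (Prev) were calculated from observed sensitivity and specificity at the maximum Youden index for dementia and MCI, specifically at prevalence rates of 5%, 10%, 20% and 40% using the standard formulae:
$$\mathrm{PPV} = \frac{\mathrm{Sens} \times \mathrm{Prev}}{\mathrm{Sens} \times \mathrm{Prev} + (1 - \mathrm{Spec}) \times (1 - \mathrm{Prev})}$$
$$\mathrm{NPV} = \frac{\mathrm{Spec} \times (1 - \mathrm{Prev})}{\mathrm{Spec} \times (1 - \mathrm{Prev}) + (1 - \mathrm{Sens}) \times \mathrm{Prev}}$$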
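The dependence of predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The sensitivity and specificity inputs are the rounded figures reported in the Results at the maximal Youden index (dementia 0.91 and 0.71; MCI 0.90 and 0.57). Because of this rounding, the output should be regarded as approximate and may differ slightly from tabulated values. The function name `predictive_values` is illustrative.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


# Rounded Sens and Spec at the maximal Youden index, as reported in the Results.
tests = {"dementia": (0.91, 0.71), "MCI": (0.90, 0.57)}
prevalences = (0.05, 0.10, 0.20, 0.40)

table = {}
for diagnosis, (sens, spec) in tests.items():
    for prev in prevalences:
        table[(diagnosis, prev)] = predictive_values(sens, spec, prev)
        ppv, npv = table[(diagnosis, prev)]
        print(f"{diagnosis:8s} Prev = {prev:.2f}  PPV = {ppv:.2f}  NPV = {npv:.2f}")
```

As prevalence rises, PPV increases and NPV falls; with high sensitivity, NPV remains high even at a prevalence of 40%.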

Results
A total of 755 patients were assessed with MACE (F:M = 352:403, 47% female; median age 60 years), of whom 114 were diagnosed with dementia (prevalence = 0.15) and 222 with MCI (prevalence = 0.29). The distribution of MACE scores by diagnosis, as seen in Figure 1, showed the anticipated unimodal negative skew (to the right, i.e., higher test scores = better performance). The Pearson 2 skewness coefficient (Sk2) was −0.48, suggesting that the population sampled was not from a normal distribution, although, as the value was between −1 and +1, the presence of floor or ceiling effects was probably excluded.
For the diagnosis of dementia (Tables 1 and 2), looking at all MACE cut-off values, the optimal cut-off determined by maximal Youden index was ≤20/30 (sensitivity 0.91, specificity 0.71). By maximal correct classification accuracy the optimal cut-off was ≤14/30 (sensitivity 0.59, specificity 0.92). This latter cut-off also had the maximal values of PSI and LDM (Table 3), the latter also found at ≤15/30, which was also the cut-off for the maximal value of SUI and F measure.
For the diagnosis of MCI (Tables 4 and 5), looking at all MACE cut-off values, the optimal cut-off determined by maximal Youden index was ≤24/30 (sensitivity 0.90, specificity 0.57). By maximal correct classification accuracy the optimal cut-off was ≤19/30 (sensitivity 0.47, specificity 0.88). Both cut-offs coincided with the maximal values of LDM (Table 6), whereas maximum PSI and F measure were at the same cut-off as maximal Youden index (cf. diagnosis of dementia), whilst maximal SUI was at ≤22/30 and ≤21/30.
Effect sizes (Cohen's d) were 1.74 for dementia and 1.13 for MCI, hence very large and large, respectively [22].
PPV and NPV for MACE calculated at prevalence rates of 5%, 10%, 20% and 40% using the sensitivity and specificity figures at the maximum Youden index showed high NPV (≥0.9) at all prevalences examined, but with less-impressive figures for PPV, optimal at higher disease prevalences (Table 7).

Discussion
This study of a large cohort of patients examined with the Mini-Addenbrooke's Cognitive Examination suggested that the optimal test cut-offs differ slightly from those suggested in the index study, being a point lower for both high-sensitivity (≤24/30 vs. ≤25/30) and high-specificity (≤20/30 vs. ≤21/30) cut-offs. Even allowing for the objections raised to changing test cut-offs because of risk of bias [23], these findings suggest a possible need to revise cut-offs when MACE is used in general cognitive clinics, as well as to adjust them for patient educational level [2].
ROC curves suggested that MACE had adequate accuracy for the diagnosis of dementia and MCI, a finding corroborated by the measure of effect size (Cohen's d). The Q* index for dementia (0.80) was comparable to that found for other cognitive screening instruments [24], but the Q* index for MCI was lower (0.73). However, PR curves suggested better MACE performance (i.e., distinguishable classification performance) for the diagnosis of MCI than for dementia, consistent with findings in previous comparative studies of MACE with the Montreal Cognitive Assessment (MoCA), a test designed specifically for MCI diagnosis (equivalent MACE performance) [25,26], and with Free-Cog (superior MACE performance) [5]. The use of PR curves [18] in diagnostic test accuracy studies is worth emphasizing (to the author's knowledge this is the first such use for dementia test accuracy studies), since these avoid some of the "optimism" of ROC curves (resulting from their combining test accuracy over a range of thresholds which may be both clinically relevant and clinically nonsensical) [27]. PR curves are more informative than ROC curves for skewed datasets [19]. Area under the PR curve may be calculated, although this is not straightforward [28], and visual interpretation may be adequate to denote better classification performance (as for ROC curves).
The comparison of NNS [9] and NNSU [15] in this study was also instructive, demonstrating some of the difficulties in working with reciprocals (multiplicative inverses). By definition, the identification index (II) ranges from −1 to +1, and hence NNS ranges from −∞ to +∞ [9]. As II approaches 0, values of NNS are inflated (e.g., Table 3, cut-off ≤24/30; Table 6, cut-off ≤27/30), and when II is negative then NNS also has a negative value. The latter finding is problematic from the clinical standpoint: "number needed to" metrics were originally designed to appeal intuitively at the individual level, so negative values (representing non-individuals?) are meaningless. The construction of SUI is such that by definition its range is from 0 to 2, and hence NNSU ranges from ∞ (no screening value) to 0.5 (perfect screening utility) [15], avoiding the problems encountered with II and NNS.
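The behaviour of these reciprocal metrics can be shown numerically. The sketch below assumes only what the stated ranges imply, namely that NNS is the reciprocal of II and NNSU the reciprocal of SUI. The index values are hypothetical, and the helper name `number_needed` is introduced here for illustration.

```python
import math


def number_needed(index_value):
    """Reciprocal of an index; returned as infinity when the index is zero."""
    return math.inf if index_value == 0 else 1 / index_value


# II can lie anywhere between -1 and +1, so NNS can be inflated or negative.
for ii in (1.0, 0.5, 0.01, 0.0, -0.01, -0.5):
    print(f"II  = {ii:+.2f}  ->  NNS  = {number_needed(ii):+.1f}")

# SUI lies between 0 and 2, so NNSU is never negative and never below 0.5.
for sui in (2.0, 1.0, 0.5, 0.1, 0.0):
    print(f"SUI = {sui:.2f}   ->  NNSU = {number_needed(sui):.1f}")
```

A small change in II around zero moves NNS from a very large positive number to a very large negative one, whereas NNSU moves only in one direction, towards infinity, as SUI falls.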
The high NPV (≥0.9) at all disease prevalences examined (0.05 to 0.4) suggests that a MACE test score above either cut-off excludes dementia and MCI in any setting. PPV was less impressive, but improved with increasing disease prevalence, suggesting a case-finding role for MACE in dedicated cognitive and memory clinics.
In summary, in addition to being quick, easy to use and score, and acceptable to patients [1,3,4], MACE, when used with appropriate cut-offs, is a sensitive test for the identification of cognitive impairment, and scores above the cut-offs are useful for excluding dementia and MCI.