Diagnostic Performance of the Magnetic Resonance Parkinsonism Index in Differentiating Progressive Supranuclear Palsy from Parkinson’s Disease: An Updated Systematic Review and Meta-Analysis

Progressive supranuclear palsy (PSP) and Parkinson’s disease (PD) are difficult to differentiate, especially in the early stages. We aimed to investigate the diagnostic performance of the magnetic resonance parkinsonism index (MRPI) in differentiating PSP from PD. A systematic literature search of PubMed-MEDLINE and EMBASE was performed to identify original articles evaluating the diagnostic performance of the MRPI in differentiating PSP from PD published up to 20 February 2021. The pooled sensitivity, specificity, and 95% CI were calculated using the bivariate random-effects model. The area under the curve (AUC) was calculated using a hierarchical summary receiver operating characteristic (HSROC) model. Meta-regression was performed to explore the sources of heterogeneity. A total of 14 original articles involving 484 PSP patients and 1243 PD patients were included. In all studies, T1-weighted images were used to calculate the MRPI. Among the 14 studies, nine used 3D T1-weighted images. The pooled sensitivity and specificity of the MRPI in differentiating PSP from PD were 96% (95% CI, 87–99%) and 98% (95% CI, 91–100%), respectively. The area under the HSROC curve was 0.99 (95% CI, 0.98–1.00). Heterogeneity was present (sensitivity: I² = 97.29%; specificity: I² = 98.82%). Meta-regression showed an association between the magnetic field strength and heterogeneity. Studies using 3 T MRI showed significantly higher sensitivity (100%) and specificity (100%) than studies using 1.5 T MRI (sensitivity of 98% and specificity of 97%) (p < 0.01). Thus, the MRPI could accurately differentiate PSP from PD and support the implementation of appropriate management strategies for patients with PSP.


Introduction
Progressive supranuclear palsy (PSP) and Parkinson's disease (PD) are difficult to differentiate, especially in the early stages, because supranuclear vertical gaze palsy, a characteristic symptom of PSP, does not appear in the early stages of the disease [1][2][3]. In addition, as supranuclear vertical gaze palsy is characteristic of the most common phenotype of PSP, Richardson's syndrome (PSP-RS), it may remain absent in PSP-Parkinsonism Predominant (PSP-P) [4][5][6][7]. Magnetic resonance imaging (MRI) has been widely used to differentiate PSP from PD, with studies employing various protocols, including conventional MRI, diffusion-weighted MRI [8,9], susceptibility-weighted MRI [10], and functional MRI [11].
Quantitative measurement with conventional MRI has been shown to be a useful method for differentiating PSP from PD. Various quantitative measures, including the midbrain to pons ratio [12], area of the midbrain [13], volume of the superior cerebellar peduncle [14], and magnetic resonance parkinsonism index (MRPI), have been evaluated. Among these measures, the MRPI (the pons area to midbrain area ratio multiplied by the middle cerebellar peduncle width to superior cerebellar peduncle width ratio) has been found to accurately differentiate PSP from PD in several articles [15][16][17][18][19][20][21]. Moreover, the MRPI may be particularly beneficial in the early differential diagnosis of PSP-P, given that many other neuroimaging methods, such as perfusion SPECT, do not provide sufficient differentiation [22]. For this reason, several studies on the MRPI have begun to report results according to the PSP phenotype [15,19,23]. A meta-analysis by Zhang et al. [24] reported a pooled sensitivity and specificity of 98% and 99%, respectively, for the MRPI. However, the accuracy of this result is compromised because the authors searched for articles only in PubMed, the control groups were heterogeneous, no meta-regression analysis was performed, and a hierarchical summary receiver operating characteristic (HSROC) model was not used. Therefore, we aimed to perform an updated systematic review and meta-analysis of the diagnostic performance of the MRPI for the differentiation of PSP from PD.
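As an illustration of the MRPI definition given above, the following minimal sketch computes the index from four measurements. The function name, parameter names, units, and input values are hypothetical and chosen only for easy arithmetic; they are not taken from any included study.

```python
def mrpi(pons_area_mm2, midbrain_area_mm2, mcp_width_mm, scp_width_mm):
    """Return (pons area / midbrain area) * (MCP width / SCP width).

    MCP = middle cerebellar peduncle, SCP = superior cerebellar peduncle.
    Parameter names and units are assumptions made for this sketch.
    """
    measurements = (pons_area_mm2, midbrain_area_mm2, mcp_width_mm, scp_width_mm)
    if min(measurements) <= 0:
        raise ValueError("all measurements must be positive")
    area_ratio = pons_area_mm2 / midbrain_area_mm2
    width_ratio = mcp_width_mm / scp_width_mm
    return area_ratio * width_ratio


# Hypothetical measurements, for illustration only.
example_value = mrpi(pons_area_mm2=500.0, midbrain_area_mm2=100.0,
                     mcp_width_mm=9.0, scp_width_mm=3.0)
print(example_value)  # 15.0
```

Because the index is a product of two ratios, a smaller midbrain area or a narrower superior cerebellar peduncle each raises the value.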

Materials and Methods
This study was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [25].

Literature Search
A systematic literature search of PubMed-MEDLINE and EMBASE was performed to identify original articles evaluating the diagnostic performance of the MRPI for the differentiation of PSP from PD published up to February 20, 2021. The search terms were as follows: (("progressive supranuclear palsy ") OR (PSP)) AND ((Parkinson disease) OR (parkinsonism)) AND (("magnetic resonance imaging") OR ("MR imaging") OR ("MRI")). No additional filters were applied.

Eligibility Criteria
To investigate the diagnostic performance of the MRPI for the differentiation of PSP from PD, studies were included if all of the following criteria were met: (1) patients with PSP or PD; (2) patients assessed with the MRPI using T1-weighted MR images; (3) reference standard: clinical diagnosis based on the criteria of each disease, e.g., PD [26,27] or PSP [3,28]; and (4) sufficient information for the reconstruction of 2 × 2 tables to investigate the diagnostic performance of the MRPI for the differentiation of PSP from PD.
Studies were excluded if they were (1) review articles; (2) case reports or case series with fewer than 10 patients; (3) conference abstracts; (4) editorials, chapters, and notes; (5) studies with a partially overlapping cohort; or (6) studies with incomplete data for the reconstruction of 2 × 2 tables. For studies with a partially overlapping cohort, those with the largest population were selected.

Data Extraction and Quality Assessment
A standardized form was used to extract the following information from the selected studies: (1) Study characteristics: author, institution, duration of patient recruitment, study design, consecutive or non-consecutive enrollment, and reference standard; (2) Demographic and clinical characteristics: total number of patients with PSP or PD, number of patients with PSP, mean age with standard deviation (SD), and male to female ratio; (3) Technical characteristics of MRI: magnetic field strength, vendor, scanner, MR sequences, and number and experience of the reader(s).
Quality assessment of the selected studies was performed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [29]. The literature search, selection based on eligibility criteria, data extraction, and quality assessment were independently conducted by two reviewers (S.K. and C.H.S.; 2 and 10 years of experience in diagnostic radiology, respectively).

Data Synthesis and Analysis
For each study, we reconstructed 2 × 2 tables. The primary outcome of our study was the diagnostic performance of the MRPI for the differentiation of PSP from PD. To evaluate the diagnostic performance of the MRPI, the pooled sensitivity, specificity, and 95% CI were calculated using the bivariate random-effects and HSROC models, and forest plots were constructed [30][31][32][33]. An HSROC curve with 95% confidence and prediction regions was also plotted.
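The per-study quantities that enter the pooled analysis can be sketched as follows. This shows only how sensitivity and specificity follow from one reconstructed 2 × 2 table, not the bivariate random-effects or HSROC model itself. The function name and the counts are hypothetical, and a "positive" is assumed to mean PSP according to the reference standard.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) for one 2 x 2 table.

    tp: PSP patients classified as PSP by the index test
    fn: PSP patients classified as PD
    fp: PD patients classified as PSP
    tn: PD patients classified as PD
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("table needs at least one PSP and one PD patient")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=18, fn=2, fp=3, tn=47)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# sensitivity=0.90, specificity=0.94
```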
To assess the heterogeneity among the studies, we used the following tests: (1) Cochran's Q test, with p < 0.05 indicating the presence of heterogeneity; (2) the Higgins inconsistency index (I²) test, with a value >50% indicating the presence of heterogeneity [33,34]; (3) visual assessment of the difference between the 95% confidence region and prediction region in the HSROC curve (a large difference indicating heterogeneity); (4) visual assessment of the coupled forest plots to assess the presence of a threshold effect, i.e., a positive correlation between sensitivity and false positive rate among the selected studies; and (5) Spearman correlation coefficient analysis, with a value >0.6 indicating a threshold effect [35]. Publication bias was evaluated using Deeks' funnel plot, and statistical significance was assessed using Deeks' asymmetry test [36,37]. Meta-regression analysis was performed to explore the sources of heterogeneity. The magnetic field strength (1.5 T vs. 3 T) was considered as a covariate in the bivariate meta-regression model. Statistical analyses were conducted by one of the authors (C.H.S.; 7 years of experience in performing systematic reviews and meta-analyses) using the "metandi" and "midas" modules in Stata 15.0 (StataCorp, College Station, TX, USA) and the "meta" package in R version 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria). A value of p < 0.05 was taken to indicate statistical significance.
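The threshold-effect check in item (5) can be sketched as a Spearman rank correlation between per-study sensitivity and false positive rate. This is a plain standard-library illustration, not the Stata or R routines used in the analysis; the function names and the five data points are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


# Hypothetical per-study values, for illustration only.
sensitivities = [0.80, 0.85, 0.90, 0.95, 1.00]
false_positive_rates = [0.20, 0.15, 0.10, 0.05, 0.00]

rho = spearman(sensitivities, false_positive_rates)
suggests_threshold_effect = rho > 0.6  # cut-off stated in the text
print(f"rho={rho:.2f}, threshold effect suggested: {suggests_threshold_effect}")
```

In this made-up example sensitivity rises as the false positive rate falls, so the coefficient is negative and the >0.6 criterion is not met.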

Quality Assessment
The results of quality assessment based on QUADAS-2 criteria are shown in Figure 2. Overall, the quality of the studies was considered high. In the patient selection domain, five studies had an unclear risk of bias because of their non-consecutive enrollment [16,17,55-57]. The remaining studies had a low risk of bias [15,18-21,23,54,58,59], and all of the included studies had a low concern regarding applicability [15-21,23,54-59]. In the index test domain, three studies had an unclear risk of bias because it was unclear whether the MRPI was calculated blinded to the reference standard [54,55,58]. One study had an unclear concern regarding applicability because the MRI protocols used for calculating the MRPI were different from those used in other studies [18]. In the reference standard domain, two studies had an unclear risk of bias and unclear concern regarding applicability because of the lack of sufficient information about the diagnosis of PSP or PD [55,57]. In the flow and timing domain, all of the included studies had an unclear risk of bias because there was no information about the interval between the index test and reference standard [15-21,23,54-59].

Diagnostic Performance of the MRPI
The sensitivity and specificity of the MRPI in differentiating PSP from PD were available in all 14 studies. The sensitivity and specificity of the studies ranged from 66% to 100% and 68% to 100%, respectively. The cut-off value for the MRPI ranged from 8.98 to 19.42. In addition, the sensitivity and specificity of the studies using 1.5 T scanners ranged from 82% to 100% and 76% to 100%, respectively [15,16,20,55,59]. The sensitivity and specificity of the studies using 3 T scanners ranged from 78% to 100% and 82% to 100%, respectively [17,19,57]. The cut-off value for the MRPI using 1.5 T and 3 T scanners ranged from 10.67 to 19.42 and 13.37 to 13.88, respectively.
The pooled sensitivity and specificity for the diagnostic performance of the MRPI in differentiating PSP from PD were 96% (95% CI, 87-99%) and 98% (95% CI, 91-100%), respectively (Figure 3). The area under the HSROC curve was 0.99 (95% CI, 0.98-1.00), which indicated high diagnostic performance (Figure 4). Cochran's Q test showed that heterogeneity was present among the selected studies (sensitivity: Q = 480.21, p < 0.01; specificity: Q = 1101.73, p < 0.01). In addition, the Higgins I² test showed that heterogeneity was present (sensitivity: I² = 97.29%; specificity: I² = 98.82%). There was a large difference between the 95% prediction region and the 95% confidence region, indicating a high possibility of heterogeneity among the selected studies. The coupled forest plots indicated no threshold effect, and the Spearman correlation coefficient between sensitivity and false positive rate was −0.841 (95% CI, −0.949 to −0.562), also indicating a low likelihood of a threshold effect. Deeks' funnel plot showed a low possibility of publication bias (p = 0.59) (Figure 5).
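The reported inconsistency values can be cross-checked against the reported Q statistics using the standard Higgins relation, I² = 100% × (Q − df) / Q. The sketch below assumes df = 13, i.e., that all 14 studies contributed to each Q statistic; the function name is hypothetical.

```python
def i_squared(q, n_studies):
    """Higgins inconsistency index (%) from Cochran's Q.

    Uses I^2 = 100 * (Q - df) / Q with df = n_studies - 1,
    truncated at 0 when Q is smaller than df.
    """
    if n_studies < 2:
        raise ValueError("at least two studies are required")
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q values as reported in the text; df = 13 is an assumption (14 studies).
print(round(i_squared(480.21, 14), 2))   # 97.29 (sensitivity)
print(round(i_squared(1101.73, 14), 2))  # 98.82 (specificity)
```

Both results agree with the I² values reported above.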

Discussion
We investigated the diagnostic performance of the MRPI for the differentiation of PSP from PD using the bivariate random-effects and HSROC models. Our updated meta-analysis demonstrated the excellent diagnostic performance of the MRPI in differentiating PSP from PD. The pooled sensitivity was 96% (95% CI, 87-99%), the pooled specificity was 98% (95% CI, 91-100%), and the area under the HSROC curve was 0.99 (95% CI, 0.98-1.00). Heterogeneity was present among the selected studies, and meta-regression showed significantly higher sensitivity and specificity with 3 T MRI than with 1.5 T MRI. Therefore, the MRPI may have great potential to accurately differentiate PSP from PD and could help with the implementation of appropriate management strategies for patients with PSP.
Several studies have evaluated the diagnostic performance of MRI for the differentiation of atypical parkinsonism from PD using various measurement methods and techniques, e.g., measurement of the midbrain area, the pons area to midbrain area ratio, or the MRPI, and voxel-based morphometry using a supervised machine learning algorithm [13,43,49]. Our updated meta-analysis focused on 14 articles that used the MRPI specifically for the differentiation of PSP from PD. The main source of heterogeneity was the magnetic field strength; however, the sensitivity and specificity of each subgroup were still high (all values were 97% or higher). Therefore, our study demonstrated that the MRPI could be used to differentiate PSP from PD.
The introduction of the MRPI facilitated the differentiation of atypical parkinsonism from PD, but PSP-P remained difficult to differentiate from PD with the MRPI [7,15]. MRPI 2.0 was introduced to distinguish not only PSP-RS but also PSP-P from PD, and several recent studies have evaluated it [19,23,58]. Notably, Quattrone et al. [19] reported that both the MRPI and MRPI 2.0 had excellent diagnostic performance in differentiating PSP-RS from PD, but MRPI 2.0 outperformed the MRPI in distinguishing PSP-P from PD. As there were few studies on MRPI 2.0 [19,23,58], we focused on the MRPI. A pooled analysis of MRPI 2.0 will be necessary once more studies become available. Furthermore, there have been attempts to differentiate atypical parkinsonism from PD using automated volumetry or machine learning algorithms [17,19,43,58]. Nigro et al. [17] demonstrated that automated measurement of the MRPI showed good performance in comparison with manual measurement. In addition, Salvatore et al. [43] suggested that a machine learning algorithm can allow the differentiation of PSP from PD. As described above, several notable studies using various measurement methods or techniques have been performed; however, subgroup analysis was not possible because of the paucity of data. Further studies should be conducted to address this paucity.
Although Zhang et al. [24] previously performed a systematic review and meta-analysis, that study had several limitations. First, their search strategy was inadequate; they searched for articles only in PubMed. In comparison, our study included articles from both PubMed and EMBASE. Second, Zhang et al. included articles that differentiated PSP patients from healthy controls; our study excluded 12 of 34 articles that did not differentiate PSP from PD, i.e., articles that differentiated PSP patients from non-PSP patients including multiple system atrophy or healthy controls. Our study focused on the differentiation of PSP from PD. Third, our study used the HSROC curve to evaluate heterogeneity and the accuracy of the MRPI, whereas Zhang et al. used only the summary receiver operating characteristic curve. Finally, their study did not perform meta-regression analysis, whereas our study identified a major source of heterogeneity through meta-regression analysis.
There are several limitations in our meta-analysis. First, there was heterogeneity among the selected studies. We performed meta-regression analysis to address this problem. In addition, a potential source of the heterogeneity could be that the progression stages of PD and PSP differed between studies. Second, although several recent studies on the MRPI report results according to the PSP phenotype, we did not divide the PSP group into subgroups of patients with the two major phenotypes. Further studies on the performance of the MRPI according to the PSP phenotype will be needed. Third, the age of the data was a limitation. The selected 14 articles included studies published before 2017 that were based on old criteria for PSP diagnosis. Fourth, because the number of selected studies was small, we could not perform subgroup analysis. Last, slice thickness and the use of 3D images may have affected our meta-regression, which revealed higher sensitivity and specificity with 3 T MRI than with 1.5 T MRI. Further studies should be conducted with standardized patient groups and protocols.
In conclusion, our meta-analysis demonstrated that the MRPI may have great potential to accurately differentiate PSP from PD and could help with the implementation of appropriate management strategies for patients with PSP.