Causal Forest Machine Learning Analysis of Parkinson’s Disease in Resting-State Functional Magnetic Resonance Imaging
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Set
2.2. Image Pre-Processing
2.3. Generation of Time Series and Feature Extraction
2.4. Feature Selection with Causal Forest
2.5. Classification
2.6. PD Detection Performance Assessment
2.7. Multiple Correspondence Analysis
3. Results
3.1. Parkinson Disease Detection Performance Assessment
3.2. Identification of the Most Relevant Characteristics for PD Detection
3.3. Brain Regions with the Highest Average Contribution to Causal Forest in Four Populations
3.4. Association between Brain Regions and Populations
4. Discussion
4.1. Data Set
4.2. Selection of the Most Relevant Causal Characteristics in Each Group for PD Detection
4.3. PD Detection
4.4. Associations between Brain Regions and Groups of Participants
4.5. Limitations
4.6. Parkinsonism vs. Parkinson as Future Work
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
CF | Causal Forest |
EPI | Echo Planar Image |
FCP | 1000 Functional Connectomes Project |
fMRI | Functional Magnetic Resonance Imaging |
HP | Healthcare Professionals |
HYS | Hoehn and Yahr scale |
LASSO | Least Absolute Shrinkage and Selection Operator |
MCA | Multiple Correspondence Analysis |
ML | Machine Learning |
PD | Parkinson’s Disease |
PPMI | Parkinson’s Progression Markers Initiative |
RF | Random Forest |
ROI | Region of Interest |
rs-fMRI | Resting-State Functional Magnetic Resonance Imaging |
TR | Repetition Time |
UPDRS | Unified Parkinson’s Disease Rating Scale |
WFSS | Wrapper Feature Subset Selection |
XGBoost | Extreme Gradient Boosting |
References
1. Braak, H.; Ghebremedhin, E.; Rüb, U.; Bratzke, H.; Del Tredici, K. Stages in the development of Parkinson’s disease-related pathology. Cell Tissue Res. 2004, 318, 121–134.
2. Moustafa, A.A.; Chakravarthy, S.; Phillips, J.R.; Gupta, A.; Keri, S.; Polner, B.; Frank, M.J.; Jahanshahi, M. Motor symptoms in Parkinson’s disease: A unified framework. Neurosci. Biobehav. Rev. 2016, 68, 727–740.
3. Ryman, S.G.; Poston, K.L. MRI biomarkers of motor and non-motor symptoms in Parkinson’s disease. Park. Relat. Disord. 2020, 73, 85–93.
4. Tahmasian, M.; Bettray, L.M.; van Eimeren, T.; Drzezga, A.; Timmermann, L.; Eickhoff, C.R.; Eickhoff, S.B.; Eggers, C. A systematic review on the applications of resting-state fMRI in Parkinson’s disease: Does dopamine replacement therapy play a role? Cortex 2015, 73, 80–105.
5. Nawaz, A.; Rehman, A.U.; Ali, T.M.; Hayat, Z.; Khaleeq, U.; Zaman, U.; Ali, A.R. A Comprehensive Literature Review of Application of Artificial Intelligence in Functional Magnetic Resonance Imaging for Disease Diagnosis. Appl. Artif. Intell. 2021, 35, 1420–1438.
6. Khosla, M.; Jamison, K.; Ngo, G.H.; Kuceyeski, A.; Sabuncu, M.R. Machine learning in resting-state fMRI analysis. Magn. Reson. Imaging 2019, 64, 101–121.
7. van Timmeren, J.E.; Cester, D.; Tanadini-Lang, S.; Alkadhi, H.; Baessler, B. Radiomics in medical imaging—“How-to” guide and critical reflection. Insights Imaging 2020, 11, 91.
8. Calesella, F.; Testolin, A.; De Filippo De Grazia, M.; Zorzi, M. A comparison of feature extraction methods for prediction of neuropsychological scores from functional connectivity data of stroke patients. Brain Inform. 2021, 8, 8.
9. Pospelov, N.; Tetereva, A.; Martynova, O.; Anokhin, K. The Laplacian eigenmaps dimensionality reduction of fMRI data for discovering stimulus-induced changes in the resting-state brain activity. Neuroimage Rep. 2021, 1, 100035.
10. Shi, D.; Yao, X.; Li, Y.; Zhang, H.; Wang, G.; Wang, S.; Ren, K. Classification of Parkinson’s disease using a region-of-interest- and resting-state functional magnetic resonance imaging-based radiomics approach. Brain Imaging Behav. 2022, 16, 2150–2163.
11. Liu, G.; Lu, W.; Qiu, J.; Shi, L. Identifying individuals with attention-deficit/hyperactivity disorder based on multisite resting-state functional magnetic resonance imaging: A radiomics analysis. Hum. Brain Mapp. 2023, 44, 3433–3445.
12. Cao, X.; Wang, X.; Xue, C.; Zhang, S.; Huang, Q.; Liu, W. A Radiomics Approach to Predicting Parkinson’s Disease by Incorporating Whole-Brain Functional Activity and Gray Matter Structure. Front. Neurosci. 2020, 14, 751.
13. Zhang, X.; Cao, X.; Xue, C.; Zheng, J.; Zhang, S.; Huang, Q.; Liu, W. Aberrant functional connectivity and activity in Parkinson’s disease and comorbidity with depression based on radiomic analysis. Brain Behav. 2021, 11, e02103.
14. Shi, D.; Zhang, H.; Wang, G.; Wang, S.; Yao, X.; Li, Y.; Guo, Q.; Zheng, S.; Ren, K. Machine learning for detecting Parkinson’s disease by resting-state functional magnetic resonance imaging: A multicenter radiomics analysis. Front. Aging Neurosci. 2022, 14, 806828.
15. Shi, C.; Zhang, J.; Wu, X. An fMRI feature selection method based on a minimum spanning tree for identifying patients with autism. Symmetry 2020, 12, 1995.
16. Yu, K.; Liu, L.; Li, J. A unified view of causal and non-causal feature selection. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 15, 63.
17. Athey, S.; Tibshirani, J.; Wager, S. Generalized random forests. Ann. Stat. 2019, 47, 1179–1203.
18. Venkatasubramaniam, A.; Mateen, B.A.; Shields, B.M.; Hattersley, A.T.; Jones, A.G.; Vollmer, S.J.; Dennis, J.M. Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: An application for type 2 diabetes precision medicine. BMC Med. Inform. Decis. Mak. 2023, 23, 110.
19. Suk, Y.; Kang, H.; Kim, J.S. Random forests approach for causal inference with clustered observational data. Multivar. Behav. Res. 2021, 56, 829–852.
20. Gulen, H.; Jens, C.; Page, T.B. Balancing External vs. Internal Validity: An Application of Causal Forest in Finance. October 2022. Available online: https://ssrn.com/abstract=3583685 (accessed on 2 June 2024).
21. Clark, A.G.; Foster, M.; Prifling, B.; Walkinshaw, N.; Hierons, R.M.; Schmidt, V.; Turner, R.D. Testing Causality in Scientific Modelling Software. ACM Trans. Softw. Eng. Methodol. 2023, 33, 10.
22. Michael J. Fox Foundation for Parkinson’s Research. Data Resources. 2018. Available online: https://www.michaeljfox.org/data-resources (accessed on 2 June 2024).
23. Neuroimaging Tools and Resources Collaboratory. Neuroimaging Data Repository. 2023. Available online: https://www.nitrc.org/xnat/index.php (accessed on 2 June 2024).
24. Esteban, O.; Markiewicz, C.J.; Blair, R.W.; Moodie, C.A.; Isik, A.I.; Erramuzpe, A.; Kent, J.D.; Goncalves, M.; DuPre, E.; Snyder, M.; et al. fMRIPrep: A robust preprocessing pipeline for functional MRI. Nat. Methods 2019, 16, 111–116.
25. The fMRIPrep Developers. fMRIPrep: A Robust Preprocessing Pipeline for fMRI Data. 2023. Available online: https://fmriprep.org/en/stable/ (accessed on 2 June 2024).
26. Schaefer, A.; Kong, R.; Gordon, E.M.; Laumann, T.O.; Zuo, X.N.; Holmes, A.J.; Eickhoff, S.B.; Yeo, B.T.T. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 2018, 28, 3095–3114.
27. Gohel, S.R.; Biswal, B.B. Functional integration between brain regions at rest occurs in multiple-frequency bands. Brain Connect. 2015, 5, 23–34.
28. Jawadekar, N.; Kezios, K.; Odden, M.C.; Stingone, J.A.; Calonico, S.; Rudolph, K.; Hazzouri, A.Z.A. Practical Guide to Honest Causal Forests for Identifying Heterogeneous Treatment Effects. Am. J. Epidemiol. 2023, 192, 1155–1165.
29. Spirtes, P. Introduction to causal inference. J. Mach. Learn. Res. 2010, 11, 1643–1662.
30. Ho, T.K. Random Decision Forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, San Jose, CA, USA, 14–16 August 1995; pp. 278–282.
31. Ho, T.K. The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
32. Roobaert, D.; Karakoulas, G.; Chawla, N.V. Information gain, correlation and support vector machines. Stud. Fuzziness Soft Comput. 2006, 207, 463–470.
33. Oprescu, M.; Syrgkanis, V.; Battocchi, K.; Hei, M.; Lewis, G. EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Available online: https://cpb-us-w2.wpmucdn.com/sites.coecis.cornell.edu/dist/a/238/files/2019/12/Id_112_final.pdf (accessed on 2 June 2024).
34. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
35. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R; Springer Texts in Statistics; Springer: New York, NY, USA, 2013.
36. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; pp. 1–13.
37. Budholiya, K.; Shrivastava, S.K.; Sharma, V. An Optimized XGBoost Based Diagnostic System for Effective Prediction of Heart Disease. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 4514–4523.
38. Farzipour, A.; Elmi, R.; Nasiri, H. Detection of Monkeypox Cases Based on Symptoms Using XGBoost and Shapley Additive Explanations Methods. Diagnostics 2023, 13, 2391.
39. Guan, X.; Du, Y.; Ma, R.; Teng, N.; Ou, S.; Zhao, H.; Li, X. Construction of the XGBoost model for early lung cancer prediction based on metabolic indices. BMC Med. Inform. Decis. Mak. 2023, 23, 107.
40. Dutschmann, T.M.; Kinzel, L.; ter Laak, A.; Baumann, K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J. Cheminform. 2023, 15, 832–844.
41. Soares-Costa, P.; Correia-Santos, N.; Cunha, P.; Cotter, J.; Sousa, N. The use of multiple correspondence analysis to explore associations between categories of qualitative variables in healthy ageing. J. Aging Res. 2013, 2013, 302163.
42. Ayele, D.; Zewotir, T.; Mwambi, H. Multiple correspondence analysis as a tool for analysis of large health surveys in African settings. Afr. Health Sci. 2014, 14, 1036–1045.
43. Alhuzali, T.; Beh, E.; Stojanovski, E. Multiple correspondence analysis as a tool for examining Nobel Prize data from 1901 to 2018. PLoS ONE 2022, 14, e0265929.
44. Shulman, L.M.; Gruber-Baldini, A.L.; Anderson, K.E.; Fishman, P.S.; Reich, S.G.; Weiner, W.J. The clinically important difference on the unified Parkinson’s disease rating scale. Arch. Neurol. 2010, 67, 64–70.
45. Kazeminejad, A.; Golbabaei, S.; Soltanian-Zadeh, H. Graph theoretical metrics and machine learning for diagnosis of Parkinson’s disease using rs-fMRI. In Proceedings of the 2017 Artificial Intelligence and Signal Processing Conference (AISP), Shiraz, Iran, 25–27 October 2017; pp. 134–139.
46. Guo, X.; Tinaz, S.; Dvornek, N.C. Characterization of Early Stage Parkinson’s Disease From Resting-State fMRI Data Using a Long Short-Term Memory Network. Front. Neuroimaging 2022, 1, 952084.
47. Nguyen, K.P.; Raval, V.; Treacher, A.; Mellema, C.; Yu, F.; Pinho, M.C.; Subramaniam, R.M.; Richard, B.; Dewey, J.; Montillo, A. Predicting Parkinson’s disease trajectory using clinical and neuroimaging baseline measures. Park. Relat. Disord. 2021, 84, 44–51.
48. Rubbert, C.; Mathys, C.; Jockwitz, C.; Hartmann, C.J.; Eickhoff, S.B.; Hoffstaedter, F.; Caspers, S.; Eickhoff, C.R.; Sigl, B.; Teichert, N.A.; et al. Machine-learning identifies Parkinson’s disease patients based on resting-state between-network functional connectivity. Br. J. Radiol. 2019, 92, 20180886.
49. Haq, N.F.; Cai, J.; Yu, T.; McKeown, M.J.; Wang, Z.J. Parkinson’s Disease Detection from fMRI-Derived Brainstem Regional Functional Connectivity Networks; Springer: Berlin/Heidelberg, Germany, 2020; pp. 33–43.
50. Ram, A. Analysis, Identification and Prediction of Parkinson’s disease sub-types and progression through Machine Learning. Open Access Libr. J. 2024, 11, 1–15.
Data Set | PD Female | PD Male | Control Female | Control Male | Age (years) | TR (s) |
---|---|---|---|---|---|---|
PPMI | 70 | 147 | 4 | 18 | 38–77 | 2.4 |
1000 Functional Connectomes Project | 0 | 0 | 103 | 101 | 19–65 | 2–2.3 |
Total | 70 | 147 | 107 | 119 | 19–77 | 2–2.4 |
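As a quick consistency check on the data-set table, the counts can be tallied per column and per class. The sketch below only re-adds the numbers listed in the table; the dictionary keys and the helper name `column_totals` are illustrative choices, not names from the study.

```python
# Counts transcribed from the data-set table (PPMI and the
# 1000 Functional Connectomes Project, abbreviated FCP).
# The key names are hypothetical labels chosen for this sketch.
counts = {
    "PPMI": {"pd_female": 70, "pd_male": 147, "control_female": 4, "control_male": 18},
    "FCP": {"pd_female": 0, "pd_male": 0, "control_female": 103, "control_male": 101},
}


def column_totals(table):
    """Sum every column across the data sets in the table."""
    totals = {}
    for row in table.values():
        for key, value in row.items():
            totals[key] = totals.get(key, 0) + value
    return totals


totals = column_totals(counts)
n_pd = totals["pd_female"] + totals["pd_male"]
n_control = totals["control_female"] + totals["control_male"]
pd_fraction = n_pd / (n_pd + n_control)

print(totals)
print(n_pd, n_control, round(pd_fraction, 3))
```

The column sums match the Total row (70, 147, 107, 119), and the two classes come out close to balanced (217 PD versus 226 control), which is worth keeping in mind when reading accuracy figures.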
Dataset and Classifier | CF Reduction | Number of Features | PD Detection Accuracy |
---|---|---|---|
Mixed XGBoost | 98% | 192 | 0.976 |
Mixed LR | 96% | 26 | 0.976 |
Female XGBoost | 99% | 445 | 0.957 |
Female LR | 96% | 55 | 0.956 |
Male XGBoost | 96% | 16 | 0.930 |
Male LR | 96% | 37 | 0.965 |
Family of Features | CF Female Control | CF Male Control | CF Female PD | CF Male PD |
---|---|---|---|---|
Voxel gray intensity | 59.21% | 31.41% | 43.05% | 31.5% |
Frequency | 35.43% | 65.52% | 56.53% | 64.82% |
Connectivity | 5.36% | 3.07% | 0.41% | 3.67% |
Author and Year | Dataset | Framework | Performance |
---|---|---|---|
Kazeminejad et al., 2017 [45] | Resting state fMRI data from 18 healthy controls and 19 patients | Global graph theoretical metrics were extracted and used as features to a support vector machine classifier | Accuracy of 95% with leave-one-out cross-validation |
Guo et al., 2022 [46] | 84 subjects (56 in stage 2 and 28 in stage 1) from PPMI data set | Long short-term memory (LSTM) network to characterize early stages of PD | Accuracy of 71.63% with 10-fold stratified cross-validation |
Nguyen et al., 2021 [47] | PPMI data from 82 PD subjects are used to predict clinical severity and progression at 1 year, 2 years, and 4 years | High and low progression are classified with Gradient Boosting, ElasticNet and SVM | Positive predictive values up to 71% and negative predictive values up to 84% |
Cao et al., 2020 [12] | Private data from 50 healthy controls and 70 PD patients | 6664 features are extracted and fed to an SVM | Accuracy 100% |
Rubbert et al., 2019 [48] | Private data from 42 PD patients and 47 controls | Boosted Logistic Regression models were trained with correlation matrices | Accuracy 76.2%, sensitivity 81%, specificity 72.7% |
Haq et al., 2020 [49] | University of British Columbia data from 15 control participants and 17 PD patients | Graph theoretic features were extracted and fed to an SVM classifier | Sensitivity 94% |
Proposed method | PPMI data from 46 male and 22 female patients. PPMI data from 18 male and 4 female controls. Data from the 1000 Functional Connectomes Project dataset: 28 male and 18 female controls | Logistic Classifier and XGBoost, gray intensity and frequency features, feature selection | Accuracy 95.7% (women), 96.5% (men) and 97.6% (mixed). Precision 96.7% (women), 97.4% (men) and 97.4% (mixed). Recall 96.6% (women), 96.1% (men) and 97.7% (mixed). F1 score 95.1% (women), 96.1% (men) and 97.5% (mixed) |
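For reference, the four figures in the last row are read here against the standard confusion-matrix definitions, with the fourth taken to be the F1 score. Treating PD as the positive class is an assumption of this sketch; the table itself does not state it. TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{align}
```

When such metrics are averaged over cross-validation folds, the averaged F1 need not equal the F1 computed from the averaged precision and recall.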
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Solana-Lavalle, G.; Cusimano, M.D.; Steeves, T.; Rosas-Romero, R.; Tyrrell, P.N. Causal Forest Machine Learning Analysis of Parkinson’s Disease in Resting-State Functional Magnetic Resonance Imaging. Tomography 2024, 10, 894-911. https://doi.org/10.3390/tomography10060068