Evaluation of Facebook as a Longitudinal Data Source for Parkinson’s Disease Insights
Abstract
1. Introduction
2. Materials and Methods
2.1. Participants and Data Collection
2.2. PD-Related Post Identification
2.2.1. Text Extraction from JSON Files
2.2.2. PD-Related Term Dictionary and Search Strategy
2.2.3. Development of Ground-Truth Dataset
2.2.4. Classifier Development
2.3. Analyses
3. Results
3.1. Dataset Characteristics and Participant Engagement
3.1.1. Recruitment and Demographics
3.1.2. Comparison of Facebook Users and Non-Users
3.1.3. Facebook Account History
3.2. Identification of PD-Related Posts
3.2.1. Ground-Truth Dataset
3.2.2. Classifier Performance
3.3. PD-Related Facebook Activity
3.3.1. Prevalence of PD-Related Posts by Diagnosis Group and Timeframe
3.3.2. Within-Group Changes in PD-Related Posting After Diagnosis
3.3.3. Between-Group Comparisons Before and After Diagnosis
3.3.4. Thematic Shifts in PD Discourse over Time
4. Discussion
4.1. Principal Findings
4.1.1. Recruitment Feasibility and Platform Representation
4.1.2. Longitudinal Engagement and Health-Related Disclosures
4.1.3. Early Signals and the Limits of Specificity
4.1.4. Content Evolution and Behavioral Patterns
4.1.5. Methodological Considerations and Future Directions
4.2. Ethical Considerations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
API | Application Programming Interface |
AP | Atypical Parkinsonism |
AUC | Area Under the Curve |
CI | Confidence Interval |
ET | Essential Tremor |
HIPAA | Health Insurance Portability and Accountability Act |
JSON | JavaScript Object Notation |
KNN | K-Nearest Neighbor |
MDS-UPDRS | Movement Disorder Society-Unified Parkinson’s Disease Rating Scale |
PD | Parkinson’s Disease |
PwPD | People with Parkinson’s Disease |
ROC | Receiver Operating Characteristic |
SVM | Support Vector Machine |
TF-IDF | Term Frequency–Inverse Document Frequency |
Appendix A
Appendix B
- Naïve Bayes: α = 0.1, class_prior = None, fit_prior = True, force_alpha = True;
- Random Forest: 100 estimators, entropy criterion, max features = sqrt, min_samples_leaf = 2, random_state = 42;
- XGBoost: binary logistic objective, learning_rate = 0.1, max_depth = 3, subsample = 0.8, colsample_bytree = 0.8, n_estimators = 100, tree_method = hist, random_state = 42;
- Decision Tree: Gini criterion, max_depth = 30, min_samples_leaf = 2, splitter = best, random_state = 42;
- SVM: linear kernel, C = 1, probability = True, decision_function_shape = ovr;
- AdaBoost: SAMME algorithm, 100 estimators, learning_rate = 1, random_state = 42;
- KNN: Euclidean metric, n_neighbors = 8, weights = uniform;
- Ensemble: soft voting over the KNN, SVM, Random Forest, AdaBoost, Naïve Bayes, Decision Tree, and XGBoost models.
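Soft voting, as named for the ensemble above, averages each base model's predicted class probabilities and selects the class with the highest mean. The following is a minimal pure-Python sketch of that rule, not the authors' code: the function name `soft_vote`, the equal weighting, and the probability values are illustrative assumptions. In scikit-learn, the equivalent behavior is available through `VotingClassifier(voting="soft")`.

```python
from typing import Dict, List, Sequence


def soft_vote(probas: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-class probabilities across models (equal weights assumed)."""
    n_models = len(probas)
    n_classes = len(next(iter(probas.values())))
    return [
        sum(p[c] for p in probas.values()) / n_models
        for c in range(n_classes)
    ]


# Hypothetical probabilities [P(not PD-related), P(PD-related)] for one post.
# These numbers are made up for illustration only.
example = {
    "naive_bayes": [0.20, 0.80],
    "svm": [0.35, 0.65],
    "random_forest": [0.55, 0.45],
    "knn": [0.50, 0.50],
}

mean_proba = soft_vote(example)
# Index of the class with the highest averaged probability.
predicted_class = max(range(len(mean_proba)), key=mean_proba.__getitem__)
print(mean_proba, predicted_class)
```

Because probabilities rather than hard labels are averaged, a confident model can outweigh several uncertain ones, which is one reason soft voting requires every base model to expose probability estimates (for example, `probability = True` on the SVM).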
Variable | Parkinson’s Disease (N = 38) | Essential Tremor (N = 4) | Atypical Parkinsonism (N = 3) | Caregivers (N = 15) | Total (N = 60) |
---|---|---|---|---|---|
Age in years, mean (SD) | 69.2 (10.1) | 54.1 (25.1) | 68.6 (15.8) | 62.6 (13.8) | 66.5 (13.0) |
Women, N (%) | 13 (34%) | 0 (0%) | 1 (33%) | 12 (80%) | 26 (43%) |
Race, N (%) | | | | | |
African American/Black | 6 (16%) | 0 (0%) | 1 (33%) | 1 (7%) | 8 (13%) |
Asian | 0 (0%) | 0 (0%) | 0 (0%) | 2 (13%) | 2 (3%) |
White | 32 (84%) | 4 (100%) | 1 (33%) | 11 (73%) | 48 (80%) |
More Than One Race | 0 (0%) | 0 (0%) | 1 (33%) | 1 (7%) | 2 (3%) |
Ethnicity, N (%) | | | | | |
Hispanic or Latino | 0 (0%) | 0 (0%) | 1 (33%) | 0 (0%) | 1 (2%) |
Not Hispanic or Latino | 35 (92%) | 4 (100%) | 2 (67%) | 15 (100%) | 56 (93%) |
Unknown/Not Reported | 3 (8%) | 0 (0%) | 0 (0%) | 0 (0%) | 3 (5%) |
Completed Education, N (%) | | | | | |
High School | 3 (8%) | 1 (25%) | 1 (33%) | 1 (7%) | 6 (10%) |
Junior College | 5 (13%) | 0 (0%) | 0 (0%) | 2 (13%) | 7 (12%) |
College | 16 (42%) | 3 (75%) | 0 (0%) | 5 (33%) | 24 (40%) |
Graduate Degree | 14 (37%) | 0 (0%) | 2 (67%) | 7 (47%) | 23 (38%) |
Shared Facebook Data, N (%) | 30 (79%) | 3 (75%) | 1 (33%) | 12 (80%) | 46 (77%) |
Disease Duration in years, mean (SD) | 8.6 (4.7) | 14.7 (6.8) | 5.2 (0.7) | - | - |
MDS-UPDRS Part 1 | | | | | |
Mean (SD) | 14.0 (7.5) | 4.5 (3.1) | 10.0 (9.6) | - | - |
N-Miss | 2 | 0 | 0 | - | - |
MDS-UPDRS Part 2 | | | | | |
Mean (SD) | 13.8 (9.7) | 2.5 (3.0) | 18.3 (17.0) | - | - |
N-Miss | 2 | 0 | 0 | - | - |
MDS-UPDRS Part 4 | | | | | |
Mean (SD) | 6.2 (4.0) | - | 5.3 (6.8) | - | - |
N-Miss | 2 | - | 0 | - | - |
Model | Recall (95% CI) | Precision (95% CI) | F1-Score (95% CI) |
---|---|---|---|
Naïve Bayes | 0.86 (0.84–0.88) | 0.89 (0.87–0.90) | 0.87 (0.85–0.89) |
Ensemble | 0.84 (0.82–0.86) | 0.90 (0.88–0.91) | 0.86 (0.84–0.88) |
SVM | 0.84 (0.82–0.86) | 0.86 (0.84–0.88) | 0.85 (0.83–0.87) |
Decision Tree | 0.81 (0.78–0.83) | 0.83 (0.81–0.85) | 0.81 (0.79–0.84) |
Random Forest | 0.79 (0.77–0.81) | 0.87 (0.85–0.89) | 0.81 (0.78–0.83) |
XGBoost | 0.79 (0.77–0.81) | 0.87 (0.85–0.89) | 0.81 (0.78–0.83) |
KNN | 0.77 (0.75–0.79) | 0.83 (0.81–0.85) | 0.79 (0.76–0.81) |
AdaBoost | 0.72 (0.70–0.74) | 0.85 (0.83–0.87) | 0.73 (0.70–0.75) |
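For readers unfamiliar with the metrics in the table above, the sketch below shows how recall, precision, and F1 follow from true and predicted labels, and how a 95% interval can be attached to each. The percentile bootstrap used here is an assumption: the table does not state how its confidence intervals were derived, nor whether the scores are class-averaged. The function names, the toy labels, and the resample count are all hypothetical.

```python
import random
from typing import List, Tuple


def prf(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, metric_index, n_boot=1000, seed=42):
    """Percentile bootstrap 95% CI (0 = precision, 1 = recall, 2 = F1)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample post indices with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        sample = prf([y_true[i] for i in idx], [y_pred[i] for i in idx])
        stats.append(sample[metric_index])
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Toy labels (made up): 50 PD-related posts, 50 unrelated posts.
y_true = [1] * 50 + [0] * 50
# 43 true positives, 7 misses, 5 false positives, 45 true negatives.
y_pred = [1] * 43 + [0] * 7 + [1] * 5 + [0] * 45

precision, recall, f1 = prf(y_true, y_pred)
recall_lo, recall_hi = bootstrap_ci(y_true, y_pred, metric_index=1)
print(f"recall = {recall:.2f} (95% CI {recall_lo:.2f}-{recall_hi:.2f})")
```

With only 100 toy labels the interval is much wider than those in the table, which is expected: interval width shrinks as the evaluation set grows.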
Group | Overall: N | Overall: Percent ± STD | Before Diagnosis: N | Before Diagnosis: Percent ± STD | After Diagnosis: N | After Diagnosis: Percent ± STD |
---|---|---|---|---|---|---|
PD | 29 | 3.6% ± 6.6% | 26 | 1.7% ± 2.6% | 29 | 4.0% ± 7.1% |
ET | 3 | 0.8% ± 0.1% | 3 | 1.0% ± 0.1% | 3 | 0.7% ± 0.1% |
AP | 1 | 5.1% ± NA | 1 | 9.3% ± NA | 1 | 4.9% ± NA |
CG | 12 | 2.6% ± 5.6% | 7 | 1.1% ± 1.1% | 9 | 1.0% ± 1.1% |
Excluding Exercise-Related Posts | | | | | | |
PD | 29 | 2.2% ± 3.6% | 26 | 1.2% ± 2.5% | 29 | 2.2% ± 4.0% |
ET | 3 | 0.5% ± 0.1% | 3 | 0.2% ± 0.3% | 3 | 0.5% ± 0.0% |
AP | 1 | 3.6% ± NA | 1 | 0.3% ± NA | 1 | 3.8% ± NA |
CG | 12 | 1.7% ± 4.0% | 7 | 0.8% ± 1.0% | 9 | 0.4% ± 0.6% |
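The percentages in the table above can be read as per-participant shares of PD-related posts, summarized as a group mean and standard deviation. The sketch below illustrates one way such a summary could be computed; it is not the authors' pipeline. The data layout, the participant IDs, the dates, the choice to count the diagnosis date itself as "after", and the use of the sample standard deviation are all assumptions.

```python
from datetime import date
from statistics import mean, stdev
from typing import List, Optional, Tuple

Post = Tuple[date, bool]  # (post date, classified as PD-related?)


def percent_pd(posts: List[Post]) -> Optional[float]:
    """Share of PD-related posts in percent, or None if there are no posts."""
    if not posts:
        return None
    return 100.0 * sum(1 for _, is_pd in posts if is_pd) / len(posts)


def before_after(posts: List[Post], diagnosis: date):
    """Split posts at the diagnosis date and compute both percentages."""
    before = [p for p in posts if p[0] < diagnosis]
    after = [p for p in posts if p[0] >= diagnosis]
    return percent_pd(before), percent_pd(after)


# Hypothetical participants: (diagnosis date, posts). All values are made up.
participants = {
    "P01": (
        date(2018, 6, 1),
        [(date(2017, m, 1), False) for m in range(1, 5)]
        + [(date(2019, 1, 1), True)]
        + [(date(2019, m, 1), False) for m in range(2, 5)],
    ),
    # P02 has no posts before diagnosis, so it drops out of the "before" N.
    "P02": (
        date(2016, 3, 1),
        [(date(2020, 1, 1), True)]
        + [(date(2020, m, 1), False) for m in range(2, 6)],
    ),
}

results = [before_after(posts, dx) for dx, posts in participants.values()]
before_vals = [b for b, _ in results if b is not None]
after_vals = [a for _, a in results if a is not None]

print(f"before: N = {len(before_vals)}, mean = {mean(before_vals):.1f}%")
print(f"after:  N = {len(after_vals)}, "
      f"mean = {mean(after_vals):.1f}% ± {stdev(after_vals):.1f}%")
```

Participants without any posts in a given period contribute no percentage for it, which is one plausible reason the N values can differ between the "before" and "after" columns. A group with a single participant has no defined standard deviation, consistent with the NA entries in the AP rows.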
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Powell, J.M.; Cao, C.; Means, K.; Lakamana, S.; Sarker, A.; Mckay, J.L. Evaluation of Facebook as a Longitudinal Data Source for Parkinson’s Disease Insights. J. Clin. Med. 2025, 14, 4093. https://doi.org/10.3390/jcm14124093