Sentiment Analysis and Text Mining of Questionnaires to Support Telemonitoring Programs
Abstract
1. Introduction
- to increase patients' self-management skills, whether they have a chronic condition, are recovering or in rehabilitation after surgery, are in follow-up after a long hospitalization, or are being treated for depression or other mental health conditions;
- to improve patient surveillance and medication adherence by remote monitoring of their health status; and
- to reduce healthcare costs by optimizing doctors' work, reducing the number of emergency room visits, and lowering the average number of hospital admissions per patient.
2. Materials and Methods
2.1. Preliminary Information
2.2. Experimental Protocol and Dataset Description
2.3. Administration of Questionnaire
2.4. System Architecture
- Data collection: Only completed survey data are automatically downloaded and cleaned to eliminate redundant information and duplicates. If the same respondent submitted two surveys in the same administration epoch, only the most recent one is kept.
- Data integration: Collected data are joined with adherence data according to a common schema. All the steps, typical of an ETL (Extract, Transform, Load) approach, have been implemented with the Pandas library [22]. The integrated data table may be downloaded as a comma-separated values (CSV) file.
- Data analysis: Collected and integrated data encompass both structured information and free texts, resulting from patients’ answers to the open-ended TDO questions. Therefore, this level implements different analysis pipelines, depending on the integrated data and analysis type.
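The collection and integration steps can be sketched as follows. This is a dependency-free illustration of the logic only: the authors report using Pandas, where the analogous operations would be `DataFrame.drop_duplicates` and `DataFrame.merge`. The field names (`patient_id`, `epoch`, `submitted_at`, `adh_score`), the join key, and the sample records are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical survey records; field names are assumptions, not the authors' schema.
surveys = [
    {"patient_id": "p01", "epoch": 1, "submitted_at": "2020-01-10T09:00", "completed": True, "q1": "Molto utile"},
    {"patient_id": "p01", "epoch": 1, "submitted_at": "2020-01-12T18:30", "completed": True, "q1": "Utile, ma impegnativo"},
    {"patient_id": "p02", "epoch": 1, "submitted_at": "2020-01-11T11:15", "completed": False, "q1": ""},
    {"patient_id": "p02", "epoch": 2, "submitted_at": "2020-03-02T08:45", "completed": True, "q1": "Mi sento seguito"},
]
# Hypothetical adherence scores keyed by (patient_id, epoch).
adherence = {("p01", 1): 42.7, ("p02", 2): 38.1}


def collect(records):
    """Keep completed surveys only; among duplicates keep the most recent one."""
    latest = {}
    for rec in records:
        if not rec["completed"]:
            continue
        key = (rec["patient_id"], rec["epoch"])
        ts = datetime.fromisoformat(rec["submitted_at"])
        if key not in latest or ts > datetime.fromisoformat(latest[key]["submitted_at"]):
            latest[key] = rec
    return [latest[k] for k in sorted(latest)]


def integrate(records, adh):
    """Join survey rows with adherence scores on (patient_id, epoch)."""
    return [
        {**rec, "adh_score": adh.get((rec["patient_id"], rec["epoch"]))}
        for rec in records
    ]


def to_csv(rows):
    """Serialize the integrated table as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


table = integrate(collect(surveys), adherence)
csv_text = to_csv(table)
```

Keeping the deduplication key identical to the join key means each row of the integrated table corresponds to exactly one respondent at one administration epoch.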
2.4.1. Data Analysis Pipeline
- Text preprocessing: This step applies standard NLP (Natural Language Processing) techniques, i.e., tokenization, stop word removal, and lemmatization. It was carried out with spaCy (https://spacy.io/), a popular Python library for NLP that also provides preprocessing algorithms for the Italian language.
- Feature extraction: Each open-ended free-text answer was assigned a polarity score in the range [-1, 1] through the VADER [23] lexicon-based method, adapted to the Italian language, and this score was used as a numerical feature.
- Statistical hypothesis testing: Data were sorted by respondent and survey submission date and, for each open-ended question, the sequence of assigned polarity scores was modeled as a time series, as was the sequence of Adh-scores. The Augmented Dickey–Fuller test [24] was used to check for stationarity, while the Granger-causality test [25] was used to assess whether directed causal interactions exist between the polarity scores of the free-text answers and adherence.
- Data visualization: To provide useful insights and summarize patient answers, different visualization techniques were used. In particular, preprocessed free-text answers are visualized through word clouds, while a graph shows the time-series of polarity scores at different submission epochs.
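As a rough illustration of the sorting and time-series step, the sketch below groups scored answers by respondent, orders them by submission date, and pairs each lagged polarity value with a later Adh-score. The record layout and all values are hypothetical, and the stationarity and Granger-causality tests themselves are not reproduced here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical scored answers: (respondent, submission date, Q1 polarity, Adh-score).
rows = [
    ("p02", date(2020, 3, 2), 0.10, 38.1),
    ("p01", date(2020, 1, 12), 0.45, 42.7),
    ("p01", date(2020, 5, 20), -0.20, 47.4),
    ("p01", date(2020, 3, 15), 0.30, 41.8),
    ("p02", date(2020, 1, 11), 0.60, 40.0),
]


def build_series(records):
    """Return {respondent: (polarity series, adherence series)} in date order."""
    grouped = defaultdict(list)
    for respondent, when, polarity, adh in records:
        grouped[respondent].append((when, polarity, adh))
    series = {}
    for respondent, items in grouped.items():
        items.sort(key=lambda item: item[0])  # chronological order
        series[respondent] = (
            [polarity for _, polarity, _ in items],
            [adh for _, _, adh in items],
        )
    return series


def lagged_pairs(x, y, lag):
    """Pair x[t - lag] with y[t], the alignment a lagged causality test relies on."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return list(zip(x[:-lag], y[lag:]))


series = build_series(rows)
polarity_p01, adh_p01 = series["p01"]
pairs = lagged_pairs(polarity_p01, adh_p01, lag=1)
```

Each additional lag shortens the usable series by one observation, which is worth keeping in mind when respondents have submitted only a handful of surveys.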
2.4.2. Sentiment Polarity Extraction
- Punctuation: The exclamation mark (!) is a valence booster, i.e., it increases sentiment intensity without affecting sentiment orientation.
- Capitalization: Uppercase words that appear among lowercase words are treated as sentiment intensifiers, without affecting sentiment orientation.
- Degree modifiers: Certain nouns, adjectives, adverbs, and idioms act as intensifiers or down-toners, which increase or decrease sentiment intensity.
- Contrastive particles: The “but” conjunction between two sentences shifts sentiment polarity in favor of the second part of the text.
- Negation: Negations reverse the polarity orientation of the terms they refer to. Inspecting the trigram that precedes a sentiment-laden term makes it possible to identify negations of that specific term.
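A toy scorer can make the five heuristics concrete. The sketch below is neither the VADER implementation nor the authors' Italian adaptation: the tiny English lexicon, the booster weights, the contrast weights, the negation factor, and the final squashing of the raw sum into [-1, 1] are all made-up illustrative choices.

```python
import math
import re

# All values below are illustrative assumptions, not a real sentiment lexicon.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "tiring": -1.5}
BOOSTERS = {"very": 0.3, "really": 0.3, "slightly": -0.3}  # degree modifiers
NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.75
CAPS_BOOST = 0.7
EXCLAMATION_BOOST = 0.3


def score_clause(tokens):
    """Sum word valences in one clause, applying caps, modifier and negation rules."""
    mixed_case = any(t.isupper() for t in tokens) and not all(t.isupper() for t in tokens)
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token.lower())
        if valence is None:
            continue
        sign = 1.0 if valence > 0 else -1.0
        # Capitalization: an uppercase word among lowercase ones is intensified.
        if mixed_case and token.isupper():
            valence += sign * CAPS_BOOST
        # Inspect the trigram (up to three tokens) preceding the sentiment word.
        window = [t.lower() for t in tokens[max(0, i - 3):i]]
        # Degree modifiers raise or lower intensity without changing orientation.
        for word in window:
            valence += sign * BOOSTERS.get(word, 0.0)
        # A negation in the preceding trigram reverses the orientation.
        if any(word in NEGATIONS for word in window):
            valence *= NEGATION_FACTOR
        total += valence
    return total


def polarity(text):
    """Return a toy polarity score in [-1, 1] for a short English text."""
    # Contrastive particle: weight the part after "but" more than the part before.
    parts = re.split(r"\bbut\b", text, maxsplit=1, flags=re.IGNORECASE)
    weights = [0.5, 1.5] if len(parts) == 2 else [1.0]
    raw = sum(w * score_clause(re.findall(r"[A-Za-z']+", p)) for w, p in zip(weights, parts))
    # Punctuation: each "!" (capped at three) boosts intensity, keeping the sign.
    if raw != 0:
        raw += math.copysign(min(text.count("!"), 3) * EXCLAMATION_BOOST, raw)
    return raw / math.sqrt(raw * raw + 15)
```

Under these made-up weights, `polarity("good but tiring")` comes out negative because the clause after "but" dominates, while `polarity("not good")` flips the sign of `polarity("good")` and slightly dampens its magnitude.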
2.4.3. Granger-Causality Testing
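For reference, the standard bivariate formulation of the test can be written as follows, where $x_t$ stands for the polarity series, $y_t$ for the Adh-score series, $p$ for the number of lags, and $T$ for the number of observations used in the regressions. The notation is an assumption made for illustration and may differ from the one used by the authors.

```latex
\begin{align}
  \text{restricted:}\quad   y_t &= \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \varepsilon_t \\
  \text{unrestricted:}\quad y_t &= \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i}
                                   + \sum_{j=1}^{p} \beta_j\, x_{t-j} + u_t \\
  H_0 &: \beta_1 = \dots = \beta_p = 0 \\
  F &= \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\end{align}
```

Here $\mathrm{RSS}_r$ and $\mathrm{RSS}_u$ are the residual sums of squares of the restricted and unrestricted models. Rejecting $H_0$ indicates that past values of $x$ improve the prediction of $y$ beyond what past values of $y$ alone provide. This is a statement about predictive precedence, not about causation in the everyday sense.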
3. Results and Discussion
3.1. Exploratory Data Analysis
- Q1 - “What do you think about telemedicine?”
- Q2 - “Since you joined the telemonitoring program, what has improved the quality of your life?”
3.2. Testing Granger-Causality
3.3. Discussion
- better assistance to the cystic fibrosis patient, both clinical and psychological (quality of life and self-management of the disease); and
- improvement of the quality of the telemonitoring program.
4. Conclusions
- the design of a self-hosted web-based survey instrument built on top of LimeSurvey for the management of online inquiries over time;
- an analysis pipeline that exploits sentiment analysis techniques to infer a sentiment polarity score for each open-ended answer and uses it as a numerical feature (to the best of our knowledge, this is the first time that this kind of approach has been proposed); and
- the validation of both the survey instrument and the analysis pipeline, which were applied to collect and analyze 169 TDO survey responses sent by 38 patients enrolled in a home telemonitoring program provided by the Cystic Fibrosis Unit at the “Bambino Gesù” Children's Hospital in Rome, Italy.
Author Contributions
Funding
Conflicts of Interest
Appendix A
References
- Ryu, S. Telemedicine: Opportunities and Developments in Member States: Report on the Second Global Survey on eHealth 2009 (Global Observatory for eHealth Series, Volume 2). Healthc. Inf. Res. 2012, 18, 153.
- White, L.A.E.; Krousel-Wood, M.A.; Mather, F. Technology meets healthcare: Distance learning and telehealth. Ochsner J. 2001, 3, 22–29.
- Suter, P.; Suter, W.N.; Johnston, D. Theory-based telehealth and patient empowerment. Popul. Health Manag. 2011, 14, 87–92.
- Nielsen, M.K.; Johannessen, H. Patient empowerment and involvement in telemedicine. J. Nurs. Educ. Pract. 2019, 9, 54–58.
- Delgoshaei, B.; Mobinizadeh, M.; Mojdekar, R.; Afzal, E.; Arabloo, J.; Mohamadi, E. Telemedicine: A systematic review of economic evaluations. Med J. Islam. Repub. Iran (MJIRI) 2017, 31, 754–761.
- Armaignac, D.L.; Saxena, A.; Rubens, M.; Valle, C.A.; Williams, L.M.S.; Veledar, E.; Gidel, L.T. Impact of Telemedicine on Mortality, Length of Stay, and Cost Among Patients in Progressive Care Units: Experience From a Large Healthcare System. Crit. Care Med. 2018, 46, 728.
- Polisena, J.; Tran, K.; Cimon, K.; Hutton, B.; McGill, S.; Palmer, K. Home telehealth for diabetes management: A systematic review and meta-analysis. Diabetes Obes. Metab. 2009, 11, 913–930.
- Gorst, S.L.; Armitage, C.J.; Brownsell, S.; Hawley, M.S. Home telehealth uptake and continued use among heart failure and chronic obstructive pulmonary disease patients: A systematic review. Ann. Behav. Med. 2014, 48, 323–336.
- Tagliente, I.; Solvoll, T.; Murgia, F.; Bella, S. Telemonitoring in cystic fibrosis: A 4-year assessment and simulation for the next 6 years. Interact. J. Med. Res. 2016, 5, e11.
- Sibley, C.D.; Rabin, H.; Surette, M.G. Cystic fibrosis: A polymicrobial infectious disease. Future Microbiol. 2006, 1, 53–61.
- Bella, S.; Murgia, F.; Tozzi, A.; Cotognini, C.; Lucidi, V. Five years of telemedicine in cystic fibrosis disease. La Clinica Terapeutica 2009, 160, 457–460.
- Crombie, I.K. Research in Health Care: Design, Conduct and Interpretation of Health Services Research; John Wiley & Sons: Hoboken, NJ, USA, 1996.
- Allery, L.A. Design and use questionnaires for research in medical education. Educ. Prim. Care 2016, 27, 234–238.
- Popping, R. Analyzing open-ended questions by means of text analysis procedures. Bull. Sociol. Methodol. De Méthodologie Sociol. 2015, 128, 23–39.
- Schuman, H.; Presser, S. The open and closed question. Am. Sociol. Rev. 1979, 44, 692–712.
- Schwarz, N. Self-reports: How the questions shape the answers. Am. Psychol. 1999, 54, 93.
- Friborg, O.; Rosenvinge, J.H. A comparison of open-ended and closed questions in the prediction of mental health. Qual. Quant. 2013, 47, 1397–1411.
- Zucco, C.; Calabrese, B.; Agapito, G.; Guzzi, P.H.; Cannataro, M. Sentiment analysis for mining texts and social networks data: Methods and tools. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, e1333.
- Murgia, F.; Cilli, M.; Renzetti, E.; Majo, F.; Soldi, D.; Lucidi, V.; Bella, F.; Bella, S. Remote telematic control in cystic fibrosis. La Clinica Terapeutica 2011, 162, e121–e124.
- Zucco, C.; Bella, S.; Paglia, C.; Tabarini, P.; Cannataro, M. Predicting Abandonment in Telehomecare Programs Using Sentiment Analysis: A System Proposal. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018, Madrid, Spain, 3–6 December 2018; Zheng, H.J., Callejas, Z., Griol, D., Wang, H., Hu, X., Schmidt, H.H.H.W., Baumbach, J., Dickerson, J., Zhang, L., Eds.; IEEE Computer Society: Washington, DC, USA, 2018; pp. 1734–1739.
- Team, L.; Carsten, S. LimeSurvey: An open source survey tool. LimeSurvey Project 2012. Available online: https://www.limesurvey.org/en/ (accessed on 25 November 2020).
- McKinney, W. Pandas: A Foundational Python library for Data Analysis and Statistics. Python High Perform. Sci. Comput. 2011, 14. Available online: https://www.dlr.de/sc/portaldata/15/resources/dokumente/pyhpc2011/submissions/pyhpc2011_submission_9.pdf (accessed on 25 November 2020).
- Hutto, C.J.; Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; pp. 1–4.
- Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
- Granger, C.W. Testing for causality: A personal viewpoint. J. Econ. Dyn. Control 1980, 2, 329–352.
- Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015. Technical Report. 2015. Available online: https://repositories.lib.utexas.edu/handle/2152/31333 (accessed on 25 November 2020).
- Bradley, M.M.; Lang, P.J. Affective Norms for English Words (ANEW): Instruction Manual and Affective Ratings; Technical Report for C-1; The Center for Research in Psychophysiology, University of Florida: Gainesville, FL, USA, 1999.
- Basile, V.; Nissim, M. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Atlanta, Georgia, 14 June 2013; Association for Computational Linguistics: Atlanta, Georgia, 2013; pp. 100–107.
- Pianta, E.; Bentivogli, L.; Girardi, C. MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, 21–25 January 2002; pp. 293–302.
Patients | n. | Percentage |
---|---|---|
Enrolled | 78 | |
Active | 46 | 58.97 |
Drop-out | 32 | 41.03 |
Drop-Out Patients | n. | % of Total | % of Dropout |
---|---|---|---|
poor adherence | 16 | 20.51 | 50 |
died | 6 | 7.69 | 18.75 |
other | 10 | 12.82 | 31.25 |
Statistic | Epoch_1 | Epoch_2 | Epoch_3 | Epoch_4 | Epoch_5 |
---|---|---|---|---|---|
mean | 42.67 | 41.77 | 38.67 | 47.39 | 51.45 |
std | 57.25 | 57.70 | 59.64 | 59.54 | 51.82 |
min | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
max | 384.62 | 365.76 | 293.55 | 260.96 | 229.50 |
Lag | Q1 ⟶ Adh-Score, F-test | Q1 ⟶ Adh-Score, Chi2 test | Q1 ⟶ Adh-Score, Likelihood-ratio | Adh-Score ⟶ Q1, F-test | Adh-Score ⟶ Q1, Chi2 test | Adh-Score ⟶ Q1, Likelihood-ratio |
---|---|---|---|---|---|---|
1 | 0.0368 | 0.0339 | 0.0350 | 0.4076 | 0.4027 | 0.4031 |
2 | 0.0998 | 0.0908 | 0.0936 | 0.6667 | 0.6585 | 0.6592 |
3 | 0.1063 | 0.0917 | 0.0963 | 0.6628 | 0.6479 | 0.6496 |
4 | 0.2032 | 0.1760 | 0.1833 | 0.8200 | 0.8061 | 0.8064 |
Lag | Q2 ⟶ Adh-Score, F-test | Q2 ⟶ Adh-Score, Chi2 test | Q2 ⟶ Adh-Score, Likelihood-ratio | Adh-Score ⟶ Q2, F-test | Adh-Score ⟶ Q2, Chi2 test | Adh-Score ⟶ Q2, Likelihood-ratio |
---|---|---|---|---|---|---|
1 | 0.0020 | 0.0016 | 0.0028 | 0.9970 | 0.9969 | 0.9969 |
2 | 0.0174 | 0.0141 | 0.0155 | 0.7455 | 0.7390 | 0.7394 |
3 | 0.0547 | 0.0446 | 0.0482 | 0.8235 | 0.8148 | 0.9154 |
4 | 0.0865 | 0.0685 | 0.0744 | 0.7566 | 0.7387 | 0.7406 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zucco, C.; Paglia, C.; Graziano, S.; Bella, S.; Cannataro, M. Sentiment Analysis and Text Mining of Questionnaires to Support Telemonitoring Programs. Information 2020, 11, 550. https://doi.org/10.3390/info11120550