Investigating Machine Learning & Natural Language Processing Techniques Applied for Predicting Depression Disorder from Online Support Forums: A Systematic Literature Review
Abstract
1. Introduction
- To review the existing problems and solutions in depression identification in online support forums (OSF) using computational techniques, and to compare them.
- To assess the strengths and limitations of existing techniques and to discuss emerging computational techniques for online social media text analytics.
- To discuss the main open issues the research community faces when dealing with health-related textual information, and the computational approaches available.
2. Methods
2.1. Search Strategy
2.2. Study Selection Criteria
2.3. Data Extraction
3. Results
3.1. How Are Online Mental Support Forums Getting Involved with Depression Disorder?
3.2. How Have Depression-Related Textual Clues Been Extracted from Forum Data?
3.3. What Kind of Machine Learning Approaches Have Been Considered in Order to Classify Depression Disorder from Forum Data?
4. Discussion
4.1. Open Issues and Limitations
- Post validation: forum users often lack detailed knowledge of depression disorder, so they are frequently confronted with misleading information or misdiagnose themselves. Identifying relevant symptoms from the textual content of online forums therefore requires a good-quality analyser that can detect variations in behavioural and emotional patterns.
- Semantics-based understanding of the post: forum posts tend to be lengthy, and some contain contradictory semantics that obscure the overall meaning of the post, making it hard for machines to identify depression disorder accurately.
- Severity assessments: depression disorder has different severity levels and treatments vary accordingly, so unique textual characteristics need to be identified for each stage. Only after identifying the degree of severity can a forum moderator or clinician provide reliable recommendations.
- Anonymous vs. normal user identification: due to social stigma, most depressed users of OSF try to remain anonymous. Access to data linked to their other social media profiles would enable a more accurate diagnosis of their depression.
- Technological barrier: this review focuses only on questions asked in OSF, which by definition covers only people who have internet access and meet the technological requirements. Many people who suffer from depression still have no access to such technology platforms.
- Cultural and language barrier: people in different cultures communicate in different languages. As a result, they express their emotions, symptoms of disease and sensitive information through terms and expressions unique to their own languages and cultures.
4.2. Implications for Future Research
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Dosani, S.; Harding, C.; Wilson, S. Online Groups and Patient Forums. Curr. Psychiatry Rep. 2014, 16, 1–6.
- Lloyd-Williams, M. Difficulties in diagnosing and treating depression in the terminally ill cancer patient. Postgrad. Med. J. 2000, 76, 555–558.
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5); American Psychiatric Association: Arlington, VA, USA, 2013; Volume 5.
- Beck, A.T.; Ward, C.H.; Mendelson, M.; Mock, J.; Erbaugh, J. An Inventory for Measuring Depression. Arch. Gen. Psychiatry 1961, 4, 561–571.
- DSM-5 Criteria: Major Depressive Disorder Treatment of Major Depressive Disorder. 2018. Available online: Medicaidmentalhealth.org (accessed on 3 January 2021).
- ICD-10 Version: 2016. Available online: https://icd.who.int/browse10/2016/en#/F32.8 (accessed on 3 January 2021).
- Lovibond, S.H.; Lovibond, P.F. Manual for the Depression Anxiety Stress Scales, 2nd ed.; Psychology Foundation: Sydney, Australia, 1995.
- Andrews, G.; Issakidis, C.; Carter, G. Shortfall in mental health service utilisation. Br. J. Psychiatry 2001, 179, 417–425.
- Islam, M.R.; Kabir, M.A.; Ahmed, A.; Kamal, A.R.M.; Wang, H.; Ulhaq, A. Depression detection from social network data using machine learning techniques. Health Inf. Sci. Syst. 2018, 6, 1–12.
- Shrestha, A.; Spezzano, F. Detecting depressed users in online forums. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019, Vancouver, BC, Canada, 27–30 August 2019; pp. 945–951.
- Sagar-Ouriaghli, I.; Godfrey, E.; Bridge, L.; Meade, L.; Brown, J.S.L. Improving Mental Health Service Utilization Among Men: A Systematic Review and Synthesis of Behavior Change Techniques Within Interventions Targeting Help-Seeking. Am. J. Mens. Health 2019, 13.
- Shrestha, A.; Serra, E.; Spezzano, F. Multi-modal social and psycho-linguistic embedding via recurrent neural networks to identify depressed users in online forums. Netw. Model. Anal. Health Inform. Bioinforma. 2020, 9, 1–11.
- Bujnowska-Fedak, M.M.; Waligóra, J.; Mastalerz-Migas, A. The internet as a source of health information and services. In Advances in Experimental Medicine and Biology; Springer: Cham, Switzerland, 2019; Volume 1211, pp. 1–16.
- Prescott, J.; Hanley, T.; Ujhelyi, K. Peer Communication in Online Mental Health Forums for Young People: Directional and Nondirectional Support. JMIR Ment. Health 2017, 4, e29.
- Vydiswaran, V.G.V.; Reddy, M. Identifying peer experts in online health forums. BMC Med. Inform. Decis. Mak. 2019, 19, 41–49.
- Lyons, M.; Aksayli, N.D.; Brewer, G. Mental distress and language use: Linguistic analysis of discussion forum posts. Comput. Hum. Behav. 2018, 87, 207–211.
- Bobicev, V.; Sokolova, M.; Oakes, M. What Goes Around Comes Around: Learning Sentiments in Online Medical Forums. Cognit. Comput. 2015, 7, 609–621.
- Guntuku, S.C.; Preotiuc-Pietro, D.; Eichstaedt, J.C.; Ungar, L.H. What Twitter Profile and Posted Images Reveal About Depression and Anxiety. In Proceedings of the International AAAI Conference on Web and Social Media 2019, Munich, Germany, 11–14 June 2019.
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009, 6, e1000097.
- Yates, A.; Cohan, A.; Goharian, N. Depression and self-harm risk assessment in online forums. In Proceedings of the EMNLP 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Copenhagen, Denmark, 9–11 September 2017; pp. 2968–2978.
- Ríssola, E.A.; Losada, D.E.; Crestani, F. Discovering latent depression patterns in online social media. CEUR Workshop Proc. 2019, 2441, 13–16.
- Coppersmith, G.; Dredze, M.; Harman, C. Quantifying Mental Health Signals in Twitter. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Baltimore, MD, USA, 27 June 2014; pp. 51–60.
- Henderson, C.; Evans-Lacko, S.; Thornicroft, G. Mental illness stigma, help seeking, and public health programs. Am. J. Public Health 2013, 103, 777–780.
- Lally, J.; Conghaile, A.O.; Quigley, S.; Bainbridge, E.; McDonald, C. Stigma of mental illness and help-seeking intention in university students. Psychiatrist 2013, 37, 253–260.
- De Choudhury, M.; Counts, S.; Horvitz, E.J.; Hoff, A. Characterizing and predicting postpartum depression from shared facebook data. In Proceedings of the ACM Conference on Computer supported cooperative work & social computing, Baltimore, MD, USA, 15–19 February 2014; pp. 625–637.
- Guntuku, S.C.; Yaden, D.B.; Kern, M.L.; Ungar, L.H.; Eichstaedt, J.C. Detecting depression and mental illness on social media: An integrative review. Curr. Opin. Behav. Sci. 2017, 18, 43–49.
- Szlyk, H.; Deng, J.; Xu, C.; Krauss, M.J.; Cavazos-Rehg, P.A. Leveraging social media to explore the barriers to treatment among individuals with depressive symptoms. Depress. Anxiety 2020, 37, 458–465.
- Jeri-Yabar, A.; Sanchez-Carbonel, A.; Tito, K.; Ramirez-delCastillo, J.; Torres-Alcantara, A.; Denegri, D.; Carreazo, Y. Association between social media use (Twitter, Instagram, Facebook) and depressive symptoms: Are Twitter users at higher risk? Int. J. Soc. Psychiatry 2019, 65, 14–19.
- Moreno, M.A.; Jelenchick, L.A.; Egan, K.G.; Cox, E.; Young, H.; Gannon, K.E.; Becker, T. Feeling bad on facebook: Depression disclosures by college students on a social networking site. Depress. Anxiety 2011, 28, 447–455.
- Eichstaedt, J.C.; Smith, R.J.; Merchant, R.M.; Ungar, L.H.; Crutchley, P.; Preoţiuc-Pietro, D.; Asch, D.A.; Schwartz, H.A. Facebook language predicts depression in medical records. Proc. Natl. Acad. Sci. USA 2018, 115, 11203–11208.
- Pedersen, T. Screening twitter users for depression and PTSD with lexical depression lists. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, CO, USA, 5 June 2015; pp. 46–53.
- Sasso, M.P.; Giovanetti, A.K.; Schied, A.L.; Burke, H.H.; Haeffel, G.J. #Sad: Twitter Content Predicts Changes in Cognitive Vulnerability and Depressive Symptoms. Cognit. Ther. Res. 2019, 43, 657–665.
- Tsugawa, S.; Kikuchi, Y.; Kishino, F.; Nakajima, K.; Itoh, Y.; Ohsaki, H. Recognizing depression from twitter activity. In Proceedings of the Conference on Human Factors in Computing Systems, Seoul, Korea, 18–23 April 2015; pp. 3187–3196.
- Coppersmith, G.; Dredze, M.; Harman, C.; Hollingshead, K.; Mitchell, M. CLPsych 2015 Shared Task: Depression and PTSD on Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, CO, USA, 5 June 2015; pp. 31–39.
- Madani, A.; Boumahdi, F.; Boukenaoui, A. USDB at eRisk 2020: Deep learning models to measure the Severity of the Signs of Depression using Reddit Posts. In Proceedings of the Working Notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020; pp. 22–25.
- Zirikly, A.; Resnik, P.; Uzuner, O.; Hollingshead, K. CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, Minneapolis, MN, USA, 6 June 2019; pp. 24–33.
- Pirina, I.; Çöltekin, Ç. Identifying Depression on Reddit: The Effect of Training Data. In Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task, Brussels, Belgium, 31 October 2018; pp. 9–12.
- Tadesse, M.M.; Lin, H.; Xu, B.; Yang, L. Detection of depression-related posts in reddit social media forum. IEEE Access 2019, 7, 44883–44893.
- Arumae, K.; Qi, G.J.; Liu, F. A study of question effectiveness using Reddit “Ask me Anything” threads. In Proceedings of the FLAIRS 2017, 30th International Florida Artificial Intelligence Research Society Conference, Marco Island, FL, USA, 22–24 May 2017; pp. 26–31.
- Bandaragoda, T.R.; De Silva, D.; Alahakoon, D.; Ranasinghe, W.; Bolton, D. Text Mining for Personalized Knowledge Extraction From Online Support Groups. J. Assoc. Inf. Sci. Technol. 2018, 69, 1446–1459.
- Farnood, A.; Johnston, B.; Mair, F.S. An analysis of the diagnostic accuracy and peer-to-peer health information provided on online health forums for heart failure. J. Adv. Nurs. 2021, 1–14.
- Sudau, F.; Friede, T.; Grabowski, J.; Koschack, J.; Makedonski, P.; Himmel, W. Sources of information and behavioral patterns in online health forums: Observational study. J. Med. Internet Res. 2014, 16, e2875.
- Cohan, A.; Young, S.; Goharian, N. Triaging Mental Health Forum Posts. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, CA, USA, 16 June 2016; pp. 143–147.
- Yoo, M.; Lee, S.; Ha, T. Semantic network analysis for understanding user experiences of bipolar and depressive disorders on Reddit. Inf. Process. Manag. 2019, 56, 1565–1575.
- Wolohan, J.T.; Hiraga, M.; Mukherjee, A.; Sayyed, Z.A. Detecting Linguistic Traces of Depression in Topic-Restricted Text: Attending to Self-Stigmatized Depression with NLP. In Proceedings of the First International Workshop on Language Cognition and Computational Model, Santa Fe, NM, USA, 20 August 2018; pp. 11–21.
- Tadesse, M.M.; Lin, H.; Xu, B.; Yang, L. Detection of Suicide Ideation in Social Media Forums Using Deep Learning. Algorithms 2020, 13, 7.
- Cacheda, F.; Fernandez, D.; Novoa, F.J.; Carneiro, V. Early detection of depression: Social network analysis and random forest techniques. J. Med. Internet Res. 2019, 21, e12554.
- Gaur, M.; Sheth, A.; Kursuncu, U.; Daniulaityte, R.; Pathak, J.; Alambo, A.; Thirunarayan, K. “Let me tell you about your mental health!” Contextualized classification of reddit posts to DSM-5 for web-based intervention. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 753–762.
- Gaikar, M.; Chavan, J.; Indore, K.; Shedge, R. Depression Detection and Prevention System by Analysing Tweets. SSRN Electron. J. 2019, 12, 1–6.
- Burdisso, S.G.; Errecalde, M.; Montes-y-Gómez, M. Using text classification to estimate the depression level of reddit users. J. Comput. Sci. Technol. 2021, 21, 1–10.
- Havigerová, J.M.; Haviger, J.; Kučera, D.; Hoffmannová, P. Text-based detection of the risk of depression. Front. Psychol. 2019, 10, 1–11.
- Löwe, B.; Kroenke, K.; Herzog, W.; Gräfe, K. Measuring depression outcome with a brief self-report instrument: Sensitivity to change of the Patient Health Questionnaire (PHQ-9). J. Affect. Disord. 2004, 81, 61–66.
- Alghamdi, N.S.; Hosni Mahmoud, H.A.; Abraham, A.; Alanazi, A.; García-Hernández, L. Predicting Depression Symptoms in an Arabic Psychological Forum. IEEE Access 2020, 8, 57317–57334.
- Choudhury, M.D.; Scott, G.M.; Counts, S. Predicting Depression via Social Media. Compr. Child Adolesc. Nurs. 2013, 36, 168–169.
- Nguyen, T.; Phung, D.; Dao, B.; Venkatesh, S.; Berk, M. Affective and content analysis of online depression communities. IEEE Trans. Affect. Comput. 2014, 5, 217–226.
- Farías-Anzaldúa, A.A.; Montes-y-Gómez, M.; Pastor López-Monroy, A.; González-Gurrola, L.C. UACH-INAOE participation at eRisk2017. In Proceedings of the Conference and Labs of the Evaluation Forum CLEF, Thessaloniki, Greece, 22–25 September 2017; Volume 1866.
- Meng, C. Detecting Depression on Reddit with Textual Data. School of Information and Library Science of the University of North Carolina, Chapel Hill, North Carolina. 2020. Available online: https://cdr.lib.unc.edu/concern/masters_papers/jd473337j (accessed on 14 October 2021).
- Stankevich, M.; Latyshev, A.; Kuminskaya, E.; Smirnov, I.; Grigoriev, O. Depression detection from social media texts. CEUR Workshop Proc. 2019, 2523, 279–289.
- Trotzek, M.; Koitka, S.; Friedrich, C.M. Utilizing Neural Networks and Linguistic Metadata for Early Detection of Depression Indications in Text Sequences. IEEE Trans. Knowl. Data Eng. 2020, 32, 588–601.
- Rao, G.; Zhang, Y.; Zhang, L.; Cong, Q.; Feng, Z. MGL-CNN: A Hierarchical Posts Representations Model for Identifying Depressed Individuals in Online Forums. IEEE Access 2020, 8, 32395–32403.
- Burdisso, S.G.; Errecalde, M.; Montes-y-Gómez, M. A text classification framework for simple and effective early depression detection over social media streams. Expert Syst. Appl. 2019, 133, 182–197.
- Paul, S.; Kalyani, J.S.; Basu, T. Early detection of signs of anorexia and depression over social media using effective machine learning frameworks. In Proceedings of the Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum, Avignon, France, 10–14 September 2018; Volume 2125.
- Shah, F.M.; Ahmed, F.; Saha Joy, S.K.; Ahmed, S.; Sadek, S.; Shil, R.; Kabir, M.H. Early Depression Detection from Social Network Using Deep Learning Techniques. In Proceedings of the IEEE Region 10 Symposium TENSYMP 2020, Dhaka, Bangladesh, 5–7 June 2020; pp. 823–826.
- Maupomé, D.; Armstrong, M.D.; Rancourt, F.; Soulas, T.; Meurs, M.-J. Early Detection of Signs of Pathological Gambling, Self-Harm and Depression through Topic Extraction and Neural Networks. In Proceedings of the Working Notes of CLEF 2021—Conference and Labs of the Evaluation Forum, Bucharest, Romania, 21–24 September 2021.
- Mali, A.; Sedamkar, R.R. Prediction of depression using Machine Learning and NLP approach. In Proceedings of the e-Conference on Data Science and Intelligent Computing 2020, Mumbai, India, 27–28 November 2020; pp. 46–50.
- Ireland, M.E.; Schler, J.; Gecht, G.; Niederhoffer, K.G. Profiling Depression in Neutral Reddit Posts. GOOD Workshop KDD’20 2020, 2020, 1–6.
- Zainab, R. Detecting and Explaining Depression in Social Media Text with Machine Learning. GOOD Workshop KDD’20 2020, 2020, 1–4.
- Trifan, A.; Antunes, R.; Matos, S.; Oliveira, J.L. Understanding depression from psycholinguistic patterns in social media texts. In Advances in Information Retrieval; Springer: Cham, Switzerland, 2020; pp. 402–409.
- Fatima, I.; Abbasi, B.U.D.; Khan, S.; Al-Saeed, M.; Ahmad, H.F.; Mumtaz, R. Prediction of postpartum depression using machine learning techniques from social media text. Expert Syst. 2019, 36, 1–13.
- Chiong, R.; Budhi, G.S.; Dhakal, S.; Chiong, F. A textual-based featuring approach for depression detection using machine learning classifiers and social media texts. Comput. Biol. Med. 2021, 135, 104499.
- Zhang, T.; Cho, J.H.D.; Zhai, C. Understanding User Intents in Online Health Forums. IEEE J. Biomed. Health Inform. 2015, 19, 1392–1398.
- Carrillo-de-Albornoz, J.; Aker, A.; Kurtic, E.; Plaza, L. Beyond opinion classification: Extracting facts, opinions and experiences from health forums. PLoS ONE 2019, 14, e0209961.
- Mowery, D.; Bryan, C.; Conway, M. Towards Developing an Annotation Scheme for Depressive Disorder Symptoms: A Preliminary Study using Twitter Data. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; Association for Computational Linguistics, Denver, CO, USA, 5 June 2015; pp. 89–98.
- Moran, M. APA Advocacy Wins Coverage of DSM Codes in 12 States, D.C. Psychiatr. News 2016, 51, 1.
- Wang, X.; Zhang, C.; Ji, Y.; Sun, L.; Wu, L.; Bao, Z. A depression detection model based on sentiment analysis in micro-blog social network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, Proceedings of the PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Gold Coast, QLD, Australia, 14–17 April 2013; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7867, pp. 201–213.
- Dhawan, V.; Zanini, N. Big Data and Social Media Analytics. Res. Matters A Camb. Assess. Publ. 2014, 18, 36–41.
- Nanomi Arachchige, I.A.; Weerasinghe, R.; Jayasuriya, V.H. A Dataset for Research on Modelling Depression Severity in Online Forum Data. In Proceedings of the Student Research Workshop associated with RANLP-2021, Varna, Bulgaria, 1–3 September 2021; pp. 144–153.
Dataset | Authors and Year | Post/User Counts | Types of Features | NLP Methods |
---|---|---|---|---|
CLPsych 2016 | Arman Cohan, Sydney Young and Nazli Goharian (2016) [43] | 1188 posts | Linguistic, Contextual Features, Textual Statistics | LIWC, BOW, N-gram, LDA |
CLPsych 2017 | Anu Shrestha, Francesca Spezzano (2019) [12] | 147,619 posts | Linguistic Features, Punctuation Features, Summary Features, Network Features, Reciprocity, Clustering Coefficient, Post Embedding | TF-IDF, LIWC, WD |
CLEF eRisk 2017 | Esteban A. Ríssola, David E. Losada, Fabio Crestani (2019) [21] | 531,453 forum posts | Semantic Proximity | LSA |
CLEF eRisk 2017 | Alan A. Farías-Anzaldúa, Manuel Montes-y-Gómez, A. Pastor López-Monroy, Luis C. González-Gurrola (2017) [56] | 531,453 forum posts | Post-level features, User-level features | N-gram, BOW |
CLEF eRisk 2017 | Chenlu Meng (2020) [57] | 531,453 forum posts | Linguistic features | BOW, TF-IDF, WD, LIWC |
CLEF eRisk 2017 | Maxim Stankevich, Andrey Latyshev, Evgenia Kuminskaya, Ivan Smirnov, and Oleg Grigoriev (2018) [58] | 531,453 forum posts | Linguistic features, Stylometric features (text length, lexicon size), Morphological features | TF-IDF, GloVe, N-gram, POS tags |
CLEF eRisk 2017 | Marcel Trotzek, Sven Koitka, and Christoph M. Friedrich (2018) [59] | 531,453 forum posts | Word and Grammar Usage, Readability, Emotions and Sentiment, Metadata Feature Summary | LIWC, GloVe, fastText |
CLEF eRisk 2017 | Guozheng Rao, Yue Zhang, Li Zhang, Qing Cong and Zhiyong Feng (2020) [60] | 531,453 forum posts | Post-level operation, User-level operation | BOW, TF-IDF, LIWC, WD |
CLEF eRisk 2017 | Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez (2019) [61] | 531,453 forum posts | Linguistic features | WD |
CLEF eRisk 2018 | Sayanta Paul, Jandhyala Sree Kalyani and Tanmay Basu (2018) [62] | 1,076,582 posts | Linguistic features | BOW, UML |
CLEF eRisk 2019 | Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez (2019) [61] | 531,453 posts | Word polarity, Mutual information, Semantic Similarity | LIWC, GloVe |
CLEF eRisk 2019 | Faisal Muhammad Shah, Farzad Ahmed, Sajib Kumar Saha Joy, Sifat Ahmed, Samir Sadek, Md. Hasanul Kabir (2020) [63] | 531,453 posts | Word embedding techniques, Metadata features | GloVe, fastText, Word2Vec |
CLEF eRisk 2020 | Amina Madani, Fatima Boumahdi, Anfel Boukenaoui, Mohamed Chaouki Kritli, and Hamza Hentabli (2020) [35] | 35,562 posts | Linguistic features | WD |
CLEF eRisk 2021 | Diego Maupomé, Maxime D. Armstrong, Fanny Rancourt, Thomas Soulas and Marie-Jean Meurs (2021) [64] | 2348 users | Topic Modelling, Authorship decision | WD |
Reddit dataset | Minjoo Yoo, Sangwon Lee, Taehyun Ha (2018) [44] | 5409 posts | Word class, Emotions, Speech features | LIWC, TF-IDF |
Reddit dataset | JT Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed (2018) [45] | 23,583 posts | Linguistic features, Sentiment features | LIWC, N-gram, TF-IDF |
Reddit dataset | Michael M. Tadesse, Hongfei Lin, Bo Xu, and Liang Yang (2019) [38] | 10,000 posts | Linguistic features | LIWC, LDA, N-gram |
Reddit dataset | Fidel Cacheda, Diego Fernandez, Francisco J Novoa, Victor Carneiro (2019) [47] | 500,000 posts | Semantic Similarity Features, Writing Features, Textual Similarity Features, Subject Behaviour | LSA, LIWC |
Reddit dataset | Inna Pirina, Çağrı Çöltekin (2018) [37] | 10,000 posts | Linguistic features | N-gram |
Reddit dataset | Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Jyotishman Pathak (2018) [48] | 8043 users | Statistical Characteristics, Topical Analysis, Entropy Analysis | WD, TF-IDF |
Reddit dataset | Amrat Mali, RR Sedamkar (2020) [65] | 13,321 posts | Psycholinguistic features | TF-IDF, BOW, LIWC |
Reddit dataset | Molly E. Ireland, Jonathan Schler, Gilad Gecht, Kate G. Niederhoffer (2020) [66] | 303,649 posts | Linguistic features | BOW, N-gram, LIWC, POS tags |
Reddit dataset | Rida Zainab, Rajarathnam Chandramouli (2020) [67] | 20,000 posts | Linguistic features | BOW, TF-IDF, TTR, BI, HS |
Reddit dataset | Alina Trifan, Rui Antunes, Sérgio Matos, and José Luís Oliveira (2020) [68] | 9210 users | Absolutist Words, Analysis of Lexical Categories, Self-related Speech, Posts Length | TF-IDF, WD |
Reddit dataset | Iram Fatima, Burhan Ud Din Abbasi, Sharifullah Khan, Majed Al-Saeed, Hafiz Farooq Ahmad, Rafia Mumtaz (2019) [69] | 1588 posts | Linguistic features | LIWC |
Reddit dataset | Raymond Chiong, Gregorius Satia Budhi, Sandeep Dhakal and Fabian Chiong (2021) [70] | 50,000 posts | Linguistic features | BOW, POS tags |
MedHelp dataset | Thomas Zhang, Jason H.D. Cho, Chengxiang Zhai (2014) [71] | 1200 posts | Word Features, Pattern Features | TF-IDF, POS tags |
eDiseases dataset | Jorge Carrillo-de-Albornoz, Ahmet Aker, Emina Kurtic, Laura Plaza (2019) [72] | 1029 posts | Lexical, Syntactic, Network-based, Sentiment-based, Semantic features | TF-IDF, BOW, Word2Vec |
Online forum dataset (www.mentalhealthforum.net, www.psychforums.com, www.beyondblue.org) (accessed on 7 October 2021) | Minna Lyons, Nazli Deniz Aksayli and Gayle Brewer (2018) [16] | 463 posts | Linguistic analysis | LIWC |
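
The NLP Methods column in the table above names bag-of-words (BOW), N-gram and TF-IDF representations. As a rough illustration of what these representations compute, here is a minimal sketch in plain Python. It assumes a simple regular-expression tokeniser and the unsmoothed weighting tf × log(N / df); the function names and example posts are hypothetical, invented for illustration, and are not taken from any of the reviewed studies, which may use different tokenisation and weighting variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Bag-of-words: raw term counts, with word order discarded."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Compute one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy posts, invented for illustration only.
posts = [
    "I feel tired and hopeless all the time",
    "I feel great after the gym today",
    "hopeless nights and no sleep",
]
tokenized = [tokenize(p) for p in posts]
bow = bag_of_words(tokenized[0])   # term counts for the first post
bigrams = ngrams(tokenized[0], 2)  # 2-grams of the first post
weights = tf_idf(tokenized)        # TF-IDF weights for every post
```

In this toy corpus, a term that occurs in only one post (such as "tired") receives a higher TF-IDF weight than a term shared by two posts (such as "hopeless"), which is the intuition behind using TF-IDF rather than raw counts as classifier input.
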
Author & Year | Dataset (Type) | Algorithms Used | Findings |
---|---|---|---|
Arman Cohan, Sydney Young and Nazli Goharian (2016) [43] | CLPsych 2016 | SVM, RF, AB, LR | Model for identifying depression and self-harm posts on the ReachOut forum based on lexical, contextual, topic and statistical features. |
Esteban A. Ríssola, David E. Losada, Fabio Crestani (2019) [21] | CLEF eRisk 2017 | SVM | Model to measure the semantic proximity between textual posts and a set of words with topical relevance to depression. |
Minjoo Yoo, Sangwon Lee, Taehyun Ha (2018) [44] | Reddit data | Semantic network analysis | Analysis of word usage in posts and comments in online communities for bipolar and depressive disorders, to understand how people perceive these mental disorders and share their experiences. |
JT Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed (2018) [45] | Reddit data | SVM | Linguistic patterns of depressed Reddit users are consistent with popular depression batteries. |
Michael M. Tadesse, Hongfei Lin, Bo Xu, and Liang Yang (2019) [38] | Reddit data | SVM, MLP | Model to detect factors that may reveal depression attitudes in Reddit users’ posts. |
Fidel Cacheda, Diego Fernandez, Francisco J Novoa, Victor Carneiro (2019) [47] | Reddit data | RF | Model to distinguish depressed from non-depressed subjects based on features defined from textual, semantic, and writing similarities. |
Thomas Zhang, Jason H.D. Cho, Chengxiang Zhai (2014) [71] | MedHelp | SVM | Machine learning approach to identifying user intents from original thread posts in online health forums related to depression. |
Anu Shrestha, Francesca Spezzano (2019) [12] | CLPsych 2017 | KNN, LR | Model to detect depressed users in online forums using network features and linguistic features. |
Jorge Carrillo-de-Albornoz, Ahmet Aker, Emina Kurtic, Laura Plaza (2019) [72] | eDiseases dataset | SVM | Automatic classifier to predict patient-generated contents based on lexical, syntactic, semantic, network-based and emotional properties of texts related to depression. |
Inna Pirina, Çağrı Çöltekin (2018) [37] | Reddit data | SVM | Identification of sources for successful detection of depression from social media text (by employing different datasets). |
Sayanta Paul, Jandhyala Sree Kalyani and Tanmay Basu (2018) [62] | CLEF eRisk 2018 | AB, LR, RF, SVM, RNN | Frameworks for early identification of anorexia or depression from individual documents using state-of-the-art classifiers. |
Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Jyotishman Pathak (2018) [48] | Reddit data | RF | Algorithm that analyses subreddits and utilises curated medical knowledge bases to quantify their relationship to DSM-5 categories. |
Alan A. Farías-Anzaldúa, Manuel Montes-y-Gómez, A. Pastor López-Monroy, Luis C. González-Gurrola (2017) [56] | CLEF eRisk 2017 | NB | Approach to identify depression through online publications, which is based on individual posts to characterise users’ behaviour and analyse users from a higher point of view. |
Diego Maupomé, Maxime D. Armstrong, Fanny Rancourt, Thomas Soulas and Marie-Jean Meurs (2021) [64] | CLEF eRisk 2021 | NN | Addresses Task 3, designed to measure the severity of the signs of depression through BDI scores, based on similarity measurement. |
Amrat Mali, RR Sedamkar (2020) [65] | Reddit data | NB, KNN | Topic model to identify hidden topics that act as depression triggering points. |
Minna Lyons, Nazli Deniz Aksayli and Gayle Brewer (2018) [16] | Online forums | Statistical Analysis | Model to identify distinctive linguistic patterns displayed by those experiencing mental health difficulties in interactive online communication. |
Molly E. Ireland, Jonathan Schler, Gilad Gecht, Kate G. Niederhoffer (2020) [66] | Reddit data | CNN | Implemented an array of machine learning and regression models to classify posts from neutral Reddit forums as written by depressed users (self-identified) or random controls. |
Amina Madani, Fatima Boumahdi, Anfel Boukenaoui, Mohamed Chaouki Kritli, and Hamza Hentabli (2020) [35] | CLEF eRisk 2020 | CNN, Bi-LSTM | Model to measure the severity of the signs of depression from a thread of user posts. |
Sergio G. Burdisso, Marcelo Errecalde, and Manuel Montes-y-Gómez (2021) [50] | CLEF eRisk 2019 | SVM, GPT | Machine learning model to demonstrate that language usage can provide strong evidence for detecting depressive people and is connected with the severity level of depression. |
Chenlu Meng (2020) [57] | CLEF eRisk 2017 | LR, SVM | Supervised classifier to act as a preliminary screening method before depression diagnosis. |
Maxim Stankevich, Andrey Latyshev, Evgenia Kuminskaya, Ivan Smirnov, and Oleg Grigoriev (2018) [58] | CLEF eRisk 2017 | SVM, CNN | Automatic detection of depression signs from textual messages of users of the Russian social network VKontakte. |
Marcel Trotzek, Sven Koitka, Christoph M. Friedrich (2018) [59] | CLEF eRisk 2017 | LR | Model for early detection of depression using machine learning, based on messages on a social platform. |
Raymond Chiong, Gregorius Satia Budhi, Sandeep Dhakal and Fabian Chiong (2021) [70] | Reddit, Twitter, Victoria’s diary | AB, RF, GB, BP | Generalised approaches using ML methods and social media texts can be effectively used to detect signs of depression. |
Rida Zainab, Rajarathnam Chandramouli (2020) [67] | Reddit data | LR, RF | Machine learning and explainable AI analysis of depression and non-depression Reddit text data in the English and Urdu languages. |
Faisal Muhammad Shah, Farzad Ahmed, Sajib Kumar Saha Joy, Sifat Ahmed, Samir Sadek, Md. Hasanul Kabir (2020) [63] | CLEF eRisk 2019 | Bi-LSTM | Model to classify depressed users while also reducing the time needed to predict the state of the users. |
Guozheng Rao, Yue Zhang, Li Zhang, Qing Cong and Zhiyong Feng (2020) [60] | Reddit and CLEF eRisk 2017 | MGL-CNN, SVM, MNB, LSTM | Post representation models for identifying depressed individuals, which were more accurate and efficient than general early depression detection models. |
Alina Trifan, Rui Antunes, Sérgio Matos, and José Luís Oliveira (2020) [68] | Reddit data | SVM, PAC, MNB, SGD | A model based on hand-crafted psycholinguistic features as possible improvements to standard classification approaches for depressed online personas. |
Iram Fatima, Burhan Ud Din Abbasi, Sharifullah Khan, Majed Al-Saeed, Hafiz Farooq Ahmad, Rafia Mumtaz (2019) [69] | Reddit data | MLP, SVM, LR | A model that combines text-based features and machine learning techniques to classify depressive and non-depressive posts and then further identify posts representing characteristics of postpartum depression. |
Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez (2019) [61] | CLEF eRisk 2017 | KNN, LR, SVM, NB | SS3 (incremental, early classification and explainability), a novel text classifier that can be used as a framework to build systems for early risk detection (ERD). |
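
Several of the studies in the table above list Naive Bayes variants (NB, MNB) among their algorithms. To make concrete how such a classifier turns word counts into a post-level label, the following is a minimal sketch of multinomial Naive Bayes with Laplace smoothing in plain Python. The class name, the labels "depression" and "control", and the four training sentences are hypothetical and invented for illustration; this is not the implementation or data of any reviewed study, and a toy model of this kind has no diagnostic value.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed default of 1)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data, invented for illustration only.
train_texts = [
    "i feel hopeless and empty",
    "i cannot sleep and feel worthless",
    "had a great day at the park",
    "excited about the new job",
]
train_labels = ["depression", "depression", "control", "control"]

model = MultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("feel hopeless and worthless")
```

The other algorithms in the table (for example SVM, LR or RF) would consume the same kind of count or TF-IDF features; Naive Bayes is used here only because it is short enough to write out without an external library.
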
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nanomi Arachchige, I.A.; Sandanapitchai, P.; Weerasinghe, R. Investigating Machine Learning & Natural Language Processing Techniques Applied for Predicting Depression Disorder from Online Support Forums: A Systematic Literature Review. Information 2021, 12, 444. https://doi.org/10.3390/info12110444