Open Access Article

Data-Driven Lexical Normalization for Medical Social Media

1 Leiden Institute for Advanced Computer Science, Leiden University, 2333 CA Leiden, The Netherlands
2 Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the Social Media Mining for Health Applications (SMM4H) workshop at ACL 2019.
Multimodal Technologies Interact. 2019, 3(3), 60; https://doi.org/10.3390/mti3030060
Received: 30 June 2019 / Revised: 9 August 2019 / Accepted: 13 August 2019 / Published: 20 August 2019
(This article belongs to the Special Issue Text Mining in Complex Domains)
In the medical domain, user-generated social media text is increasingly used as a valuable
complementary knowledge source to scientific medical literature. The extraction of this knowledge is
complicated by colloquial language use and misspellings. However, lexical normalization of such
data has not been addressed effectively. This paper presents a data-driven lexical normalization
pipeline with a novel spelling correction module for medical social media. Our method significantly
outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63
despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection
and correction in a medical patient forum.
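The abstract reports mistake detection as an F1 score "despite extreme imbalance in the data". The toy Python sketch below shows why F1 is the informative metric when misspellings are rare. All counts are hypothetical, chosen only to mimic a corpus where about 1% of tokens are mistakes. They are not the paper's data. The helper names `precision_recall_f1` and `accuracy` are illustrative and are not taken from the authors' code.

```python
"""Toy illustration: why F1, not accuracy, suits rare-mistake detection.

All counts are hypothetical and are not the paper's data.
"""


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class ("is a mistake")."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all tokens (mistakes and correct words) labeled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical corpus: 10,000 tokens, of which 100 (1%) are misspelled.
# Baseline that never flags anything: every mistake becomes a false negative.
base = dict(tp=0, fp=0, fn=100, tn=9_900)
# Hypothetical detector: flags 80 tokens, 50 of which are real mistakes.
det = dict(tp=50, fp=30, fn=50, tn=9_870)

if __name__ == "__main__":
    for name, c in [("never-flag baseline", base), ("toy detector", det)]:
        p, r, f = precision_recall_f1(c["tp"], c["fp"], c["fn"])
        print(f"{name}: accuracy={accuracy(**c):.3f} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Both systems score about 0.99 accuracy, because the correctly spelled majority dominates. F1 tells them apart: it is 0 for the baseline and roughly 0.56 for the toy detector.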
Keywords: spelling correction; social media; health; natural language processing; lexical normalization
MDPI and ACS Style

Dirkson, A.; Verberne, S.; Sarker, A.; Kraaij, W. Data-Driven Lexical Normalization for Medical Social Media. Multimodal Technologies Interact. 2019, 3, 60.
