Data-Driven Lexical Normalization for Medical Social Media
complementary knowledge source to scientific medical literature. The extraction of this knowledge is
complicated by colloquial language use and misspellings. However, lexical normalization of such
data has not been addressed effectively. This paper presents a data-driven lexical normalization
pipeline with a novel spelling correction module for medical social media. Our method significantly
outperforms state-of-the-art spelling correction methods and can detect spelling mistakes with an F1 score of 0.63
despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection
and correction in a medical patient forum.
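Mistake detection is reported as an F1 score rather than accuracy because misspelled tokens are rare. The minimal sketch below uses hypothetical toy data that does not come from the paper's corpus. It shows that when only 1% of tokens are misspellings, a detector that never flags anything still reaches 99% accuracy but scores an F1 of 0. F1 is computed here for the positive class (a token is a mistake), which is an assumption about how the metric is defined.

```python
from typing import Sequence


def accuracy(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for the positive class (True = token is a spelling mistake)."""
    tp = sum(g and p for g, p in zip(gold, pred))          # mistakes correctly flagged
    fp = sum((not g) and p for g, p in zip(gold, pred))    # correct tokens wrongly flagged
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # mistakes that were missed
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1,000 tokens, of which only 10 are misspelled.
gold = [True] * 10 + [False] * 990

# Detector A never flags anything.
never_flag = [False] * 1000

# Detector B finds 5 of the 10 mistakes and raises 5 false alarms.
some_flags = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985

print(accuracy(gold, never_flag), f1_score(gold, never_flag))  # 0.99 0.0
print(accuracy(gold, some_flags), f1_score(gold, some_flags))  # 0.99 0.5
```

Both detectors have the same accuracy. Only F1 shows that detector B actually finds some of the mistakes.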
Dirkson, A.; Verberne, S.; Sarker, A.; Kraaij, W. Data-Driven Lexical Normalization for Medical Social Media. Multimodal Technologies Interact. 2019, 3, 60.
Dirkson A, Verberne S, Sarker A, Kraaij W. Data-Driven Lexical Normalization for Medical Social Media. Multimodal Technologies and Interaction. 2019; 3(3):60.