Open Access Article
Appl. Sci. 2017, 7(8), 846

Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Department of Computer Science and Engineering, Lehigh University, 19 Memorial Dr. West, Bethlehem, PA 18015, USA
Author to whom correspondence should be addressed.
Received: 16 July 2017 / Revised: 10 August 2017 / Accepted: 11 August 2017 / Published: 17 August 2017
(This article belongs to the Special Issue Smart Healthcare)

Twitter is a popular source for monitoring healthcare information and public disease. However, tweets contain much noise: the presence of appropriate keywords in a tweet does not guarantee that the tweet is truly health-related. Thus, traditional keyword-based classification is largely ineffective. Algorithms for word embeddings have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words (CBOW) model, and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its contexts. To simplify the computation, the context is represented by the average of all words inside the context window. However, not all words in the context window contribute equally to the prediction of the target word. Greedily incorporating every word in the window limits the contribution of the useful semantic words and brings noisy or irrelevant words into the learning process. While existing word embedding algorithms also try to learn a weighted CBOW model, their weights are based on pre-defined syntactic rules and ignore the task for which the embedding is learned. We propose learning weights based on the words' relative importance in the classification task. Our intuition is that such learned weights place more emphasis on words that contribute comparatively more to the later task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results demonstrate that embeddings learned by the proposed algorithms outperform existing techniques, with a relative accuracy improvement of over 9%.
Keywords: word embedding; healthcare; classification
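
The abstract does not give formulas, so the following is only a minimal sketch of the general idea it describes: replacing the plain CBOW context average with a weighted average whose weights reflect each word's importance for the classification task. It assumes the textbook chi-square statistic over a 2x2 word/class contingency table and a normalized weighted sum; the function names `chi_square` and `weighted_context` are hypothetical, and the paper's two algorithms may differ in detail.

```python
import numpy as np


def chi_square(n11, n10, n01, n00):
    """Textbook chi-square statistic for a 2x2 word/class table (assumed form).

    n11: tweets in the class that contain the word
    n10: tweets outside the class that contain the word
    n01: tweets in the class that do not contain the word
    n00: tweets outside the class that do not contain the word
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def weighted_context(vectors, weights):
    """Weighted average of the context word vectors.

    With equal weights this reduces to the plain CBOW context average.
    """
    vectors = np.asarray(vectors, dtype=float)   # shape: (window, dim)
    weights = np.asarray(weights, dtype=float)   # shape: (window,)
    total = weights.sum()
    if total == 0:
        # No informative word in the window: fall back to the plain average.
        return vectors.mean(axis=0)
    # Broadcast each weight across its word vector, then normalize.
    return (weights[:, None] * vectors).sum(axis=0) / total


if __name__ == "__main__":
    # Toy counts (invented for illustration only, not from the paper).
    w_informative = chi_square(10, 0, 0, 10)  # word perfectly aligned with the class
    w_neutral = chi_square(5, 5, 5, 5)        # word independent of the class
    context = [[1.0, 0.0], [0.0, 1.0]]
    print(weighted_context(context, [w_informative, w_neutral]))
```

In this sketch a word that is independent of the class receives weight zero and drops out of the context representation, which is one way to read the abstract's point about limiting noisy or irrelevant words.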

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Kuang, S.; Davison, B.D. Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification. Appl. Sci. 2017, 7, 846.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.