
A Method of Short Text Representation Based on the Feature Probability Embedded Vector

Wanting Zhou, Hanbin Wang, Hongguang Sun and Tieli Sun

1 School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
2 Department of General Computer, College of Humanities and Science of Northeast Normal University, Changchun 130117, China
* Authors to whom correspondence should be addressed.
Sensors 2019, 19(17), 3728; https://doi.org/10.3390/s19173728
Received: 4 July 2019 / Revised: 24 August 2019 / Accepted: 26 August 2019 / Published: 28 August 2019
Abstract: Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which can lead to a lack of semantic information as well as high dimensionality and high sparsity. A popular way to address these problems is to use deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to use the word embedding technique Word2Vec to obtain word vectors and then combine these with the feature-weighting scheme TF-IDF and the topic model LDA. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model but also reduces the dimensionality of the document vectors. In addition, it addresses the insufficient semantic information, high dimensionality, and high sparsity of BoW. We apply the proposed method to the task of text categorization and verify its validity.
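To make the idea concrete, the following is a minimal sketch of the general approach the abstract describes: averaging Word2Vec word vectors weighted by TF-IDF and concatenating the result with an LDA topic distribution to form a fixed-length, low-dimensional document vector. It uses gensim's Word2Vec, TfidfModel, and LdaModel; all function names, parameters, and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, TfidfModel, Word2Vec

# Toy corpus of tokenized short texts (illustrative only).
docs = [["short", "text", "representation"],
        ["word", "embedding", "and", "topic", "model"],
        ["feature", "weighting", "for", "short", "text"]]

# 1. Word embeddings via Word2Vec.
w2v = Word2Vec(docs, vector_size=50, min_count=1, seed=1)

# 2. TF-IDF weights over a bag-of-words corpus.
dictionary = Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]
tfidf = TfidfModel(bow_corpus)

# 3. Per-document topic distributions via LDA.
lda = LdaModel(bow_corpus, num_topics=3, id2word=dictionary, random_state=1)

def doc_vector(doc_idx):
    """TF-IDF-weighted mean of word vectors, concatenated with LDA topics."""
    weights = dict(tfidf[bow_corpus[doc_idx]])  # term id -> tf-idf weight
    vecs, ws = [], []
    for term_id, w in weights.items():
        vecs.append(w2v.wv[dictionary[term_id]])
        ws.append(w)
    emb = np.average(vecs, axis=0, weights=ws)  # weighted word embedding
    topics = np.zeros(lda.num_topics)
    for t, p in lda.get_document_topics(bow_corpus[doc_idx],
                                        minimum_probability=0.0):
        topics[t] = p
    return np.concatenate([emb, topics])  # fixed-length document vector

print(doc_vector(0).shape)  # (53,) = 50 embedding dims + 3 topic dims
```

Note how the resulting vector is dense and of fixed size (embedding dimensions plus topic count), in contrast to a BoW vector whose length equals the vocabulary size, which is the dimensionality and sparsity reduction the abstract refers to.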
Keywords: word embedding; latent Dirichlet allocation; feature weighting; text representation