Open Access Article

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

by Jie Hu 1,2, Shaobo Li 1,3,*, Yong Yao 3, Liya Yu 3, Guanci Yang 1 and Jianjun Hu 2,3,*
1 Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China
2 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
3 School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
* Authors to whom correspondence should be addressed.
Entropy 2018, 20(2), 104; https://doi.org/10.3390/e20020104
Received: 8 November 2017 / Revised: 24 January 2018 / Accepted: 30 January 2018 / Published: 2 February 2018
(This article belongs to the Special Issue Entropy-based Data Mining)
Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on extracting representative keywords from the main text. Most existing keyword extraction algorithms rely on a discrete bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for evaluating keyword extraction, based on information gain and on cross-validated Support Vector Machine (SVM) classification, which are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and a homemade patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that our proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.
Keywords: keyword extraction; information gain; patent classification; deep learning
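As a rough illustration of how information gain can score a candidate keyword when no human-annotated keywords exist, the sketch below computes the textbook information gain of a binary "keyword occurs in document" feature with respect to class labels. It is not necessarily the exact measure defined in the paper; the function names and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, keyword):
    """Information gain of the feature 'keyword occurs in document'.

    docs   -- list of token sets, one per document (hypothetical toy input)
    labels -- list of class labels aligned with docs
    """
    with_kw = [lab for doc, lab in zip(docs, labels) if keyword in doc]
    without_kw = [lab for doc, lab in zip(docs, labels) if keyword not in doc]
    n = len(labels)
    # Class entropy after splitting on keyword presence, weighted by split size.
    conditional = (len(with_kw) / n) * entropy(with_kw) \
        + (len(without_kw) / n) * entropy(without_kw)
    return entropy(labels) - conditional


# Toy corpus (invented for illustration): two "lidar" and two "radar" documents.
docs = [
    {"lidar", "laser", "scan"},
    {"lidar", "point", "cloud"},
    {"radar", "antenna"},
    {"radar", "signal"},
]
labels = ["lidar", "lidar", "radar", "radar"]

print(information_gain(docs, labels, "lidar"))   # 1.0: perfectly separates the classes
print(information_gain(docs, labels, "signal"))  # lower: appears in only one document
```

A keyword that splits the corpus cleanly along class boundaries receives the maximum gain, while a term occurring in a single document receives much less, which is the intuition behind ranking extracted keywords by their usefulness for classification.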
MDPI and ACS Style

Hu, J.; Li, S.; Yao, Y.; Yu, L.; Yang, G.; Hu, J. Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification. Entropy 2018, 20, 104.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
