Open Access Article

Semantic Unsupervised Automatic Keyphrases Extraction by Integrating Word Embedding with Clustering Methods

Institute for Applied Mathematics and Information Technologies “Enrico Magenes” (IMATI), National Research Council—CNR, Via Bassini, 15, 20133 Milan, Italy
Multimodal Technologies Interact. 2020, 4(2), 30; https://doi.org/10.3390/mti4020030
Received: 31 March 2020 / Revised: 4 June 2020 / Accepted: 17 June 2020 / Published: 19 June 2020
The web increasingly produces massive volumes of text, either standalone or associated with images, videos, and photographs, together with metadata that is indispensable for finding and retrieving them. Keywords and keyphrases that characterize the semantic content of documents should be extracted from, and/or associated with, those documents, either automatically or manually. This paper presents a novel method for the automatic, unsupervised extraction of keywords/keyphrases from texts in both English and Italian. The main feature of this approach is the integration of two methods that have given interesting results: word embedding models, such as Word2Vec or GloVe, which capture the semantics of words and their context, and clustering algorithms, which identify the essence of the terms and choose the most significant one(s) to represent the contents of a text. The paper presents the datasets used, the implemented method, and the results obtained. These results are discussed and compared with those obtained in previous experiments using TextRank, Rapid Automatic Keyword Extraction (RAKE), and TF-IDF.
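The abstract does not specify the clustering algorithm, how candidate terms are selected, or how the representative term of each cluster is chosen. The following is therefore only a minimal sketch of the general idea (embed candidate terms, cluster the vectors, keep one term per cluster), written under explicit assumptions. The three-dimensional vectors in `TOY_EMBEDDINGS` are invented stand-ins for real Word2Vec or GloVe embeddings. The clustering step is plain k-means on cosine similarity with deterministic farthest-first seeding, and the "most significant" term is assumed to be the one closest to its cluster centroid. The helper names `cosine`, `kmeans`, and `extract_keyphrases` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical toy vectors standing in for Word2Vec/GloVe embeddings of
# candidate terms; real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "keyword": (0.9, 0.1, 0.0),
    "keyphrase": (0.85, 0.2, 0.05),
    "extraction": (0.7, 0.25, 0.1),
    "word2vec": (0.1, 0.9, 0.1),
    "glove": (0.05, 0.85, 0.2),
    "embedding": (0.2, 0.8, 0.15),
    "italian": (0.05, 0.1, 0.95),
    "english": (0.1, 0.05, 0.9),
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Assumed clustering step: k-means on cosine similarity, farthest-first seeding."""
    # Seed with the first vector, then repeatedly add the vector whose best
    # similarity to the seeds chosen so far is lowest.
    centroids: List[List[float]] = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each non-empty cluster's centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def extract_keyphrases(embeddings: Dict[str, Vector], k: int) -> List[str]:
    """Cluster the candidate terms and keep the term nearest each centroid (assumed rule)."""
    terms = list(embeddings)
    labels, centroids = kmeans([embeddings[t] for t in terms], k)
    keyphrases: List[str] = []
    for j in range(k):
        members = [t for t, lab in zip(terms, labels) if lab == j]
        if members:
            keyphrases.append(max(members, key=lambda t: cosine(embeddings[t], centroids[j])))
    return keyphrases


if __name__ == "__main__":
    print(extract_keyphrases(TOY_EMBEDDINGS, k=3))  # ['keyphrase', 'italian', 'word2vec']
```

With `k=3`, the toy terms fall into three semantic groups (keyword extraction, embedding models, languages), and the sketch returns one representative per group. Choosing the centroid-nearest term is just one plausible reading of "the most significant" term. A frequency- or position-weighted score would be an equally reasonable assumption.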
Keywords: unsupervised automatic keyword extraction; clustering algorithms; word embedding models; Italian datasets; information retrieval; evaluation; word2vec; GloVe
MDPI and ACS Style

Gagliardi, I.; Artese, M.T. Semantic Unsupervised Automatic Keyphrases Extraction by Integrating Word Embedding with Clustering Methods. Multimodal Technologies Interact. 2020, 4, 30.

