Open Access Article

Using the Outlier Detection Task to Evaluate Distributional Semantic Models

Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), Campus Vida, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Galiza, Spain
Mach. Learn. Knowl. Extr. 2019, 1(1), 211-223; https://doi.org/10.3390/make1010013
Received: 2 August 2018 / Revised: 16 November 2018 / Accepted: 19 November 2018 / Published: 22 November 2018
(This article belongs to the Special Issue Language Processing and Knowledge Extraction)
In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as the training corpus, we observed that embeddings outperform count-based representations when contexts are defined as bags of words. However, there are no sharp differences between the two types of models when word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than bag-of-words models on this specific task. Similar experiments carried out for Portuguese yielded similar results. The test datasets we created for the outlier detection task in English and Portuguese are freely available.
Keywords: distributional semantics; dependency analysis; evaluation; word similarity
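
The abstract does not spell out how an outlier is scored, so the following is only a minimal sketch of one common formulation of the task, not necessarily the procedure used in the article: given a set of words in which all but one belong to the same semantic class, the predicted outlier is the word with the lowest mean cosine similarity to the others. The vectors in `toy_vectors` and the function names are hypothetical and serve only as illustration; in practice the vectors would come from a trained embedding or count-based model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def detect_outlier(vectors):
    """Return (word, scores), where word has the lowest mean similarity
    to all other words in the set. Assumed scoring, for illustration."""
    scores = {}
    for word, vec in vectors.items():
        others = [v for w, v in vectors.items() if w != word]
        scores[word] = sum(cosine(vec, o) for o in others) / len(others)
    return min(scores, key=scores.get), scores


# Hypothetical toy vectors: three fruits and one intended outlier.
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "cherry": [0.85, 0.15, 0.05],
    "hammer": [0.1, 0.9, 0.3],
}

outlier, scores = detect_outlier(toy_vectors)
print(outlier)
```

A model is then evaluated by how often the word it ranks lowest coincides with the annotated outlier across all test sets.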
MDPI and ACS Style

Gamallo, P. Using the Outlier Detection Task to Evaluate Distributional Semantic Models. Mach. Learn. Knowl. Extr. 2019, 1, 211-223.
