Open Access Article

Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language

1 Department of Computer Science, West Bengal State University, Kolkata 700126, India
2 Department of Computer Science, University of Minnesota Duluth, Duluth, MN 55812, USA
3 Department of Computer Science & Engineering, Jadavpur University, Kolkata 700032, India
4 Linguistic Research Unit, Indian Statistical Institute, Kolkata 700108, India
5 Department of Bengali, West Bengal State University, Kolkata 700126, India
* Author to whom correspondence should be addressed.
Informatics 2019, 6(2), 19; https://doi.org/10.3390/informatics6020019
Received: 17 February 2019 / Revised: 14 April 2019 / Accepted: 20 April 2019 / Published: 5 May 2019
Semantic similarity is a long-standing problem in natural language processing (NLP). It is of great interest because understanding it offers insight into how human beings comprehend meaning and make associations between words. From the viewpoint of machine understanding, however, and particularly for under-resourced languages, it poses a different problem altogether. This paper explores semantic similarity in Bangla, a less-resourced language. To improve the situation for such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English, with substantially better results. Two semantic similarity approaches were explored in Bangla, namely the path-based model and the distributional model, and their cross-lingual counterparts were synthesized using the English WordNet and English corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs, which were annotated by five expert raters. The correlation scores between the four metrics and the human evaluation scores demonstrate the marked enhancement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
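As a rough illustration of the cross-lingual path-based idea, the sketch below translates each Bangla word into English and then scores the pair on an English taxonomy. It is not the authors' implementation: the toy taxonomy (standing in for the English WordNet), the romanized lexicon, the function names, and the scoring formula 1 / (1 + shortest path length), which is one common path-based measure, are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy is-a hierarchy standing in for the English WordNet.
# These entries are illustrative only and are not real WordNet data.
TOY_TAXONOMY = {
    "entity": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "vehicle": ["car"],
}

# Hypothetical Bangla-to-English lexicon (romanized Bangla keys).
TOY_LEXICON = {"kukur": "dog", "biral": "cat", "gari": "car"}


def build_adjacency(taxonomy):
    """Turn parent -> children lists into an undirected adjacency map."""
    adjacency = {}
    for parent, children in taxonomy.items():
        for child in children:
            adjacency.setdefault(parent, set()).add(child)
            adjacency.setdefault(child, set()).add(parent)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def path_similarity(adjacency, concept_a, concept_b):
    """Assumed path-based score: 1 / (1 + shortest path length)."""
    distance = shortest_path_length(adjacency, concept_a, concept_b)
    return None if distance is None else 1.0 / (1.0 + distance)


def cross_lingual_path_similarity(word_a, word_b, lexicon, adjacency):
    """Translate both words to English, then score them on the taxonomy."""
    english_a = lexicon.get(word_a)
    english_b = lexicon.get(word_b)
    if english_a is None or english_b is None:
        return None  # no translation available for at least one word
    return path_similarity(adjacency, english_a, english_b)


ADJ = build_adjacency(TOY_TAXONOMY)
print(cross_lingual_path_similarity("kukur", "biral", TOY_LEXICON, ADJ))
print(cross_lingual_path_similarity("kukur", "gari", TOY_LEXICON, ADJ))
```

In an evaluation like the one described, scores of this kind would be computed for every word pair in the dataset and then correlated with the human ratings. The distributional (Word2Vec) counterpart would replace the taxonomy lookup with a cosine similarity between word vectors.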
Keywords: semantic similarity; Word2Vec; translation; low-resource languages; WordNet
MDPI and ACS Style

Pandit, R.; Sengupta, S.; Naskar, S.K.; Dash, N.S.; Sardar, M.M. Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language. Informatics 2019, 6, 19.
