Article

Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets

1 Language Technology Group, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
2 Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar 6000, Ethiopia
3 International Institute of Information Technology, Bangalore 560100, India
4 College of Informatics, University of Gondar, Gondar 6200, Ethiopia
* Author to whom correspondence should be addressed.
Academic Editors: Massimo Esposito, Giovanni Luca Masala, Aniello Minutolo and Marco Pota
Future Internet 2021, 13(11), 275; https://doi.org/10.3390/fi13110275
Received: 11 October 2021 / Revised: 24 October 2021 / Accepted: 25 October 2021 / Published: 27 October 2021
(This article belongs to the Special Issue Natural Language Engineering: Methods, Tasks and Applications)
The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even if texts are abundant for low-resource languages, very few semantic models are publicly available. Most of the publicly available pre-trained models are built as multilingual versions of semantic models, which do not fit the needs of low-resource languages well. We introduce different semantic models for Amharic, a morphologically complex Ethio-Semitic language. After investigating the publicly available pre-trained semantic models, we fine-tune two pre-trained models and train seven new models. The models include Word2Vec embeddings, distributional thesaurus (DT), BERT-like contextual embeddings, and DT embeddings obtained via network embedding algorithms. Moreover, we employ these models for different NLP tasks and study their impact. We find that newly trained models perform better than pre-trained multilingual models. Furthermore, models based on contextual embeddings from FLAIR and RoBERTa perform better than Word2Vec models for the NER and POS tagging tasks, while DT-based network embeddings are well suited to the sentiment classification task. We publicly release all the semantic models and machine learning components, along with several benchmark datasets for NER, POS tagging, and sentiment classification, as well as Amharic versions of WordSim353 and SimLex999.
Keywords: datasets; neural networks; semantic models; Amharic NLP; low-resource language; text tagging
MDPI and ACS Style

Yimam, S.M.; Ayele, A.A.; Venkatesh, G.; Gashaw, I.; Biemann, C. Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets. Future Internet 2021, 13, 275. https://doi.org/10.3390/fi13110275
