Article
Peer-Review Record

Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets

Future Internet 2021, 13(11), 275; https://doi.org/10.3390/fi13110275
by Seid Muhie Yimam 1,*, Abinew Ali Ayele 1,2, Gopalakrishnan Venkatesh 3, Ibrahim Gashaw 4 and Chris Biemann 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 11 October 2021 / Revised: 24 October 2021 / Accepted: 25 October 2021 / Published: 27 October 2021
(This article belongs to the Special Issue Natural Language Engineering: Methods, Tasks and Applications)

Round 1

Reviewer 1 Report

In the paper, the authors discussed different semantic models for Amharic. In general, it is a good approach, but in my opinion, the paper in its current form is not ready for publication. Some issues that should be addressed:
1) Discuss in more detail the latest achievements in using machine learning for semantic models. 
In general, the bibliography used is outdated. I must ask the authors to rewrite the whole part to reflect the current state of knowledge (mainly 2019-2021).
2) Add pseudocode for better understanding and analysis.
3) Add formal, mathematical models.
4) Discuss the differences between solutions. 
5) Add a table with pros/cons for all analyzed solutions that would be very helpful for readers.
6) All models should be explained in more detail; discuss how you trained these models, their architectures, and coefficients.
7) Make a proper comparison with the state of the art.

Author Response

Comments and Suggestions for Authors

Thank you very much for taking the time to review our paper. We have addressed the comments and uploaded the revised version. One of the issues we would like to clarify here is that comparing results with state-of-the-art models is very difficult for Amharic, for several reasons. 1) Most of the research is conducted as part of Master’s thesis requirements, and the resources are usually not published publicly. One has to contact the authors to obtain the resources, and in many cases this is not possible. 2) Related to the first issue, even when some of the resources are obtained, they are not benchmark datasets, which makes comparing results uninformative. Our paper fills these two gaps by releasing the resources publicly and preparing a benchmark so that further research can compare results when a different approach is employed.

In the paper, the authors discussed different semantic models for Amharic. In general, it is a good approach, but in my opinion, the paper in its current form is not ready for publication. Some issues that should be addressed:

1) Discuss in more detail the latest achievements in using machine learning for semantic models. 

Thank you for your comment. It is a good idea to discuss some of the recent works in other languages (mainly English) to show the success of semantic models (word embedding models). We have now covered some of the works that integrate word embedding models into NLP tasks in the “Introduction” section.

In general, the bibliography used is outdated. I must ask the authors to rewrite the whole part to reflect the current state of knowledge (mainly 2019-2021).

We have included recent works where appropriate. However, for Amharic NLP tasks in particular, we have to cover the existing older works, as scientific papers on Amharic are generally scarce.

2) Add pseudocode for better understanding and analysis.

We have included the word2vec, transformer, and BERT/Flair model architectures, which show how the attention-based contextualized word representations are built; see Figures 1-4.

3) Add formal, mathematical models.

We have included the word2vec, transformer, and BERT/Flair model architectures, which show how the attention-based contextualized word representations are built; see Figures 1-4.
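As a complement to the architecture figures referenced here, the core operation behind transformer/BERT-style contextualized representations (scaled dot-product attention) can be sketched in a few lines. This is a generic, illustrative NumPy sketch, not the authors' implementation; all variable names are hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Similarity of every query to every key, scaled by sqrt(d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (numerically stabilized) turns similarities into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of the value vectors,
    # i.e., a contextualized representation of the corresponding token
    return weights @ V

# Toy self-attention over 3 "tokens" with 4-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4)
```

In a real transformer, Q, K, and V are learned linear projections of the token embeddings, and this operation is applied independently per attention head before the results are concatenated.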

4) Discuss the differences between solutions. 

The discussion section (Section 4) provides a detailed explanation of the different results.

5) Add a table with pros/cons for all analyzed solutions that would be very helpful for readers.

The different training parameters are indicated in Table 2.

6) All models should be explained in more detail; discuss how you trained these models, their architectures, and coefficients.

The different training parameters are indicated in Table 2.

7) Make a proper comparison with the state of the art.

This is the first work to publicly release benchmark datasets and resources that make state-of-the-art comparisons possible. We have included the state-of-the-art results for the POS tagging (Table 7) and word similarity (Table 6) tasks, but they are not directly comparable: the wordSim datasets are in different languages (ours is the first for Amharic), and the POS tagging results are not based on benchmark datasets. We are releasing these resources to establish a benchmark for future research.

Reviewer 2 Report

The study reports on a comprehensive survey of semantic models for Amharic. Amharic is a worthwhile use case for rarely spoken languages. Most language models are optimized for English but show poor performance for rarely spoken languages like Danish, Greek, or (in this case) Amharic.

The survey focuses on the NLP tasks of

  • top-n similar words
  • word relatedness
  • Part-of-speech tagging
  • Named entity recognition
  • Sentiment analysis

For rarely spoken languages, language models do not exist for all of these NLP tasks. In these cases, the authors trained additional models based on their own collected data sets:

  • news portals
  • social media texts
  • general web corpus crawled by an Amharic web crawler

While the overall study is well written and understandable, it sadly has some shortcomings regarding methodology and benchmarking.

First of all, the mix of a survey study with self-trained language models for particular NLP tasks is quite confusing for the reader. It remains widely unclear throughout the paper which models are new and which models are the 'baseline' models. At least this must be documented much more clearly. However, from my point of view, it is a suboptimal study design to mix survey parts with model training parts.

The same is true for the language corpus used. For some tasks, existing Amharic corpora are used. For other NLP tasks, further corpora have been crawled using their own specific technological solutions. The question remains whether these corpora are comparable. The paper says little about these crawling steps, so the study design is hardly repeatable in this particular aspect. The resulting threats to validity are not discussed at all by the authors. The authors should therefore add a threats-to-validity discussion section to give the reader a more accurate impression.

Overall, it is an interesting and very comprehensive study of a rarely spoken language. This alone provides value. However, the authors should clearly separate the survey and model comparison/rating aspects of their paper. It must become clear which models are contributions of the authors and which models are used as baseline models.

The model building process for their own models could be presented much better and in more detail.

The authors could also improve the presentation of the comparison of model accuracies.

It is a bit astonishing that this study is presented in Future Internet and not in a more NLP-focused venue.

Author Response

Comments and Suggestions for Authors

Thank you very much for taking the time to review our paper. We are glad that the contributions we have made to address important NLP issues for low-resource languages are recognized. Your comments helped us address some issues we had easily overlooked. We have addressed the comments and uploaded the revised version. One of the issues we would like to clarify here is that comparing results with state-of-the-art models is very difficult for Amharic, for several reasons. 1) Most of the research is conducted as part of Master’s thesis requirements, and the resources are usually not published publicly. One has to contact the authors to obtain the resources, and in many cases this is not possible. 2) Related to the first issue, even when some of the resources are obtained, they are not benchmark datasets, which makes comparing results uninformative. Our paper fills these two gaps by releasing the resources publicly and preparing a benchmark so that further research can compare results when a different approach is employed.

The study reports on a comprehensive survey of semantic models for Amharic. Amharic is a worthwhile use case for rarely spoken languages. Most language models are optimized for English but show poor performance for rarely spoken languages like Danish, Greek, or (in this case) Amharic.

The survey focuses on the NLP tasks of

  • top-n similar words
  • word relatedness
  • Part-of-speech tagging
  • Named entity recognition
  • Sentiment analysis

For rarely spoken languages, language models do not exist for all of these NLP tasks. In these cases, the authors trained additional models based on their own collected data sets:

  • news portals
  • social media texts
  • general web corpus crawled by an Amharic web crawler

While the overall study is well written and understandable, it sadly has some shortcomings regarding methodology and benchmarking.

First of all, the mix of a survey study with self-trained language models for particular NLP tasks is quite confusing for the reader. It remains widely unclear throughout the paper which models are new and which models are the 'baseline' models. At least this must be documented much more clearly. However, from my point of view, it is a suboptimal study design to mix survey parts with model training parts.

Thank you for indicating this point. We have now included the required information in Table 1, expanding our contribution section.

The same is true for the language corpus used. For some tasks, existing Amharic corpora are used. For other NLP tasks, further corpora have been crawled using their own specific technological solutions. The question remains whether these corpora are comparable. The paper says little about these crawling steps, so the study design is hardly repeatable in this particular aspect. The resulting threats to validity are not discussed at all by the authors. The authors should therefore add a threats-to-validity discussion section to give the reader a more accurate impression.

Thank you for your comment. We have provided details of datasets, models, and tools in Table 1. The table explicitly shows the contributions and resources that are currently available. The dataset collection strategies are presented in Section 1.1.2.

Overall, it is an interesting and very comprehensive study of a rarely spoken language. This alone provides value. However, the authors should clearly separate the survey and model comparison/rating aspects of their paper. It must become clear which models are contributions of the authors and which models are used as baseline models.

We have summarized in Table 1 the models we have built and the models that are publicly available. fastText, multFlair, and XLMR are the publicly available models. Since these models are trained in a multilingual setup, they are usually built with smaller and noisier datasets. We have indicated this on page 2, lines 41-46.

The model building process for their own models could be presented much better and in more detail.

The different training parameters for each of the models are included in Table 2. We have also discussed each model, both static and contextual, in Sections 2.1-2.4.

The authors could also improve the presentation of the comparison of model accuracies.

It is a bit astonishing that this study is presented in Future Internet and not in a more NLP-focused venue.

Thank you for the suggestion; however, we are submitting to the Future Internet Special Issue “Natural Language Engineering: Methods, Tasks and Applications”, where one of the topics of interest is “Low-resource natural language processing”.

Round 2

Reviewer 1 Report

Accept

Reviewer 2 Report

I am fine with the revision.
