Peer-Review Record

Human Evaluation of English–Irish Transformer-Based NMT

Information 2022, 13(7), 309; https://doi.org/10.3390/info13070309
by Séamus Lankford 1,2,*, Haithem Afli 2 and Andy Way 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 6 May 2022 / Revised: 20 June 2022 / Accepted: 21 June 2022 / Published: 25 June 2022
(This article belongs to the Special Issue Frontiers in Machine Translation)

Round 1

Reviewer 1 Report

First of all, thank you for inviting me to review the paper “Human Evaluation of English–Irish Transformer-based NMT”. The paper presents a human evaluation of English–Irish machine translation, comparing an RNN-based model with a Transformer-based NMT model; the Transformer model significantly reduces both accuracy and fluency errors compared with the RNN-based model.

Some recommendations are provided:

Abstract – must be improved. Acronyms should be given in full at first occurrence, e.g., Neural Machine Translation (NMT), for better consistency and clarity. The subword model should be clarified, since this is the main finding. The abstract could also be more concise and clear.

Introduction

Subword model types must be clarified, and subword models should be compared with character-based models, since many studies also refer to that method. The “low-resource scenario” should also be expanded on.

Background

Lines 82-83 - Experiments in MT tasks show such models are better in quality due to greater parallelization while requiring significantly less time to train. – citation needed

Could also expand this section to include previous studies using subword models, RNN and NMT

 

The “low-resource scenario” should also be expanded – only one instance? Reference [28].

Tables – please check: table numbers and title headings should be above the actual tables (if I am not mistaken).

Line 165- Nuances, such as how gender or cases are handled, are not uncovered by this approach – limitation of the study?

Any other studies using similar data analysis method?

Lines 239 to 240 -  the validity of the findings from our research, it is important to check the level of agreement between our annotators. – citation needed

 

Were you able to include inter-annotator agreement results within the discussion?

Environmental impact was included in the results – please elaborate on its significance.

Limitations of the study?

Author Response

Thank you for taking the time to review the manuscript. All of the points are well made. Due to time constraints, I have managed to update most, but not all of the items in the list. I hope the revised version is acceptable.

Abstract – must be improved. Acronyms should be given in full at first occurrence, e.g., Neural Machine Translation (NMT), for better consistency and clarity. The subword model should be clarified, since this is the main finding. The abstract could also be more concise and clear.

  • The abstract has now been rewritten. Acronyms have been expanded and the content has been simplified for greater clarity.

Introduction

Subword model types must be clarified, and subword models should be compared with character-based models, since many studies also refer to that method. The “low-resource scenario” should also be expanded on.

  • A comparison with character-based models is now included. Parts of the introduction have also been rewritten.

Background

Lines 82-83 - Experiments in MT tasks show such models are better in quality due to greater parallelization while requiring significantly less time to train. – citation needed

  • An appropriate citation is now included.

Tables – please check: table numbers and title headings should be above the actual tables (if I am not mistaken).

  • Correct. The format for tables is to have the caption at the top of the table. All tables have now been updated with the correct format.

Lines 239 to 240 -  the validity of the findings from our research, it is important to check the level of agreement between our annotators. – citation needed

  • An appropriate citation is now included.

Were you able to include inter-annotator agreement results within the discussion?

Environmental impact was included in the results – please elaborate on its significance.

  • A separate paper is being planned for this area. The significance of the environmental results is simply to track CO2 emissions during model development.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper is well-written and the topic is still hot and interesting for the community. However, I have some questions/comments that the authors need to address as a minor revision.

1) I'd like to see the results based on "accuracy" and "perplexity" as well. It would be good to add these to the list of your results.

2) A fair comparison between your results and the state-of-the-art is highly recommended for the revised version.

3) Having a separate "Related Work" section to cover the majority of the previous work is highly recommended.

4) The following items are suggested to be added to the list of your references:

4-1) Dual Learning for Machine Translation. He et al., 2016. Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS).

4-2) Augmenting Neural Machine Translation through Round-Trip Training Approach. Ahmadnia and Dorr 2019. Open Computer Science (De Gruyter). 9(1):268-278.

5) Applying the proposed method to other architectures than Transformer is recommended.

I'd be happy to review the revised version.

Author Response

Thank you for taking the time to review the manuscript. All of the points are well made. Due to time constraints, I have managed to update most, but not all of the items in the list. I hope the revised version is acceptable.

In particular: 

1) I'd like to see the results based on "accuracy" and "perplexity" as well. It would be good to add these to the list of your results.

- The perplexity and accuracy graphs (Figures 3 and 4) highlight the models' performance using both accuracy and PPL.
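As background for readers, perplexity is the exponential of the mean negative log-probability a model assigns to the reference tokens (lower is better). A minimal sketch, not taken from the paper, with a hypothetical `perplexity` helper over per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Corpus-level perplexity: exp of the mean negative
    log-probability the model assigns to each reference token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.5 to every token has perplexity ~2.0:
print(perplexity([0.5, 0.5, 0.5]))
```

In practice toolkits such as OpenNMT report validation perplexity directly from the training cross-entropy, which is the same quantity computed over the model's log-probabilities.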

2) A fair comparison between your results and the state-of-the-art is highly recommended for the revised version.

- This was the subject of a previous paper of ours, in which our Transformer model won the Shared Task: "Machine Translation in the Covid domain: an English-Irish case study for LoResMT 2021".

4) The following items are suggested to be added to the list of your references:

- Both works are now cited

5) Applying the proposed method to other architectures than Transformer is recommended.

- The proposed method was applied to the RNN and Transformer architectures.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thank you for letting me review the revised version of this paper, the authors have revised the paper according to the recommendations. 

Just a minor part the author was not able to address.

Were you able to include inter-annotator agreement results within the discussion?

Author Response

Thanks for reviewing the manuscript again.  I think the Overleaf change control tool might not have picked up a new section which I had added. 

Please see, below, a copy of the section which was added to the discussion. Thank you for pointing it out in the initial version since such an addition is important in validating the results.

I hope this addresses the query.

"

6.1. Inter-annotator reliability
In Cohen’s original article [39], the interpretation of specific κ scores is clearly outlined. A value of 0 represents no agreement, 0.01–0.20 none to slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement. The literature [53] recommends a minimum of 80% agreement for good inter-annotator agreement. As illustrated in Table 4, there is almost perfect agreement between the annotators when evaluating output from the NMT models. In the case of the RNN outputs, there is disagreement in the mistranslation category but agreement in all other categories. Given these scores, we have a high degree of confidence in our human evaluation of both the RNN and NMT outputs.

"

Reviewer 2 Report

Great job! The current version is now qualified for publication. Congratulations to the authors for their hard work.

Author Response

Thanks for reviewing the manuscript on both occasions. All updates are now in place so I trust the review report is ready for signing.

If anything else is needed, please let me know. Thanks, again.
