Peer-Review Record

Online Mongolian Handwriting Recognition Based on Encoder–Decoder Structure with Language Model

Electronics 2023, 12(20), 4194; https://doi.org/10.3390/electronics12204194
by Daoerji Fan *, Yuxin Sun, Zhixin Wang and Yanjun Peng
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 19 September 2023 / Revised: 7 October 2023 / Accepted: 8 October 2023 / Published: 10 October 2023
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

Round 1

Reviewer 1 Report

This paper is devoted to training the proposed approach, which incorporates a character-level pretrained language model into an encoder–decoder structure, on an online Mongolian handwriting dataset.
The title reflects the essence of the paper.
The abstract reflects the main goals, contributions and results obtained by the authors. The introduction highlights the unique and complex features of Mongolian handwriting. The related work is covered at a good level.

The main contribution of this paper is the proposed Fusion Model, which incorporates the pretrained character-level LM into the Seq2Seq+AM architecture that is entirely based on the model proposed in reference [4]. Depending on the interaction between the LM and the decoder, three fusion models are introduced: the Former Fusion, the Latter Fusion and the Complete Fusion Model. Each model is well explained.
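For readers unfamiliar with LM–decoder fusion, the general idea can be illustrated by a generic shallow-fusion step that interpolates the decoder's and the character-level LM's log-probabilities. This is only an illustrative sketch under that assumption; the function name and the `lm_weight` parameter are hypothetical and it is not the authors' Former, Latter or Complete Fusion formulation.

```python
import torch.nn.functional as F

def fused_next_char_scores(decoder_logits, lm_logits, lm_weight=0.3):
    """Generic shallow fusion for one decoding step: interpolate the
    decoder's and the character-level LM's log-probabilities.
    Both inputs have shape (batch, vocab_size)."""
    log_p_dec = F.log_softmax(decoder_logits, dim=-1)
    log_p_lm = F.log_softmax(lm_logits, dim=-1)
    # The next character is then chosen by argmax or beam search over these scores.
    return log_p_dec + lm_weight * log_p_lm
```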

However, there are some shortcomings that should be better explained:

- On page 10, it is written "Four layers of GRU network were selected", whereas it is three layers!

- The first evaluation metric, "average char error count" (ACEC), is by its definition the same as the "character error rate" (CER). In Table 3, train_ACEC and test_ACEC correspond to what were termed train_CER and test_CER in Table 6 of the original paper in reference [4]; what is the reason for using another term for CER?

- Moreover, the CER should be presented, like the WER, as a percentage value for comparison purposes.

- It is known that, in handwriting recognition models, the WER is three to four times higher than the CER and is proportional to it. But in all the proposed models the CER is about two times higher than the WER! Some explanation is needed on this point (the conventional CER/WER computation is sketched after this list for reference).
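For reference, the conventional definitions behind these two metrics are the edit (Levenshtein) distance between the recognized text and the ground truth, normalized by the reference length, computed over characters for CER and over word tokens for WER. A minimal sketch of this convention follows; the helper names are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (r != h))    # substitution
    return dp[-1]

def cer(reference, hypothesis):
    """Character error rate as a percentage of the reference length."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / max(1, len(reference))

def wer(reference, hypothesis):
    """Word error rate: the same computation over whitespace-separated word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / max(1, len(ref))
```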

As mentioned in the conclusion, the proposed model improves the results, but it has not yet reached the level required for practical end-to-end applications.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

1. Paper summary: This paper presents a Mongolian handwriting recognition model, which is very interesting. The authors apply Seq2Seq learning and an attention mechanism as the basic deep neural network architecture. Fusion methods are then applied to enhance the network's performance. The results show that the trained fusion model produces the best results among the published baseline models, and the absolute word error rate is acceptable.

2. Strength: The paper is well written in English with reasonably arranged sections. The proposed model architecture is described with clear figures and detailed mathematical formulas. The experimental settings are described in enough detail and are hence convincing. Finally, the proposed model achieves the best performance among the published baseline models, making the results reliable.

As indicated, this is a very interesting paper applying deep learning to a novel research area. I have not read another paper applying deep learning approaches to the handwriting of cultural or rare languages. I believe that readers will be very interested in this paper.

3. Weakness: The authors should provide a bit more introduction to the basic ideas and fundamental formulas of the attention mechanism. The AM is applied to the Seq2Seq layer outputs, but the authors do not include details about the fundamentals of attention learning. Adding them would help readers who are not very familiar with attention-based methods.
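For context, a typical additive (Bahdanau-style) attention formulation over encoder outputs $h_j$ and the previous decoder state $s_{t-1}$ is shown below; this is the standard textbook form, not necessarily the exact variant used in the manuscript:

$$
e_{t,j} = v^{\top} \tanh\!\left(W_s s_{t-1} + W_h h_j\right), \qquad
\alpha_{t,j} = \frac{\exp(e_{t,j})}{\sum_{k}\exp(e_{t,k})}, \qquad
c_t = \sum_{j} \alpha_{t,j}\, h_j,
$$

where the context vector $c_t$ is fed to the decoder together with the previously emitted character.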

4. Conclusion: In general, this paper applies deep learning to a novel research topic, which is very interesting. The neural network structures are described in enough detail, and the results are competitive and convincing. Hence, I recommend accepting this paper after the minor update indicated above.

The English expression is fine. I did not find any serious grammar errors or typos.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

According to the authors, Mongolian online handwriting recognition is a complex task due to the script's intricate characters and extensive vocabulary. This article proposes a novel approach that integrates a pre-trained language model into the Seq2Seq+AM model to enhance recognition accuracy. The approach comprises three fusion models (former, latter, and complete fusion), which, according to the authors, show substantial improvements over the baseline model. The complete fusion model, combined with synchronized language model parameters, achieved the best results, significantly reducing the character and word error rates.

The article is very well written in terms of readability, clarity and scientific content. The structure is also quite complete, and, from a general point of view, the article is very balanced.

There are only two types of issues, in my view, that should be corrected by the authors:

- The various equations presented throughout the article should each be referenced in the text;

- The first appearance of certain algorithms and technologies should be accompanied by a reference (e.g., RMSprop, page 10, line 288).

One of the questions I raised when reading the article was why a grid search strategy is used to tune the hyperparameters of the language models when more efficient methods are currently available. However, I saw that the authors identified this issue as a limitation for future work.
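To make the efficiency point concrete, grid search enumerates every combination of the hyperparameter grid, whereas random search (or Bayesian optimization) evaluates only a fixed budget of sampled configurations. The sketch below uses hypothetical hyperparameter names and values, not those of the paper.

```python
import itertools
import random

# Hypothetical LM hyperparameter space; names and values are illustrative only.
space = {
    "hidden_size":   [128, 256, 512],
    "num_layers":    [1, 2, 3],
    "learning_rate": [1e-3, 5e-4, 1e-4],
}

# Grid search: every combination is trained and evaluated (3 * 3 * 3 = 27 runs).
grid_configs = [dict(zip(space, values)) for values in itertools.product(*space.values())]

# Random search: only a fixed budget of sampled configurations is evaluated,
# which often reaches a comparable optimum with far fewer runs.
budget = 8
random_configs = [{name: random.choice(choices) for name, choices in space.items()}
                  for _ in range(budget)]
```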

The theme and approach are not original for other languages, but they may be original for the Mongolian language and for the task of online handwriting recognition. Considering the specificity of the Mongolian writing style, this article can be a good contribution to the state of the art. To this extent, my suggestion is to approve the article for publication.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
