# Applying a Hybrid Sequential Model to Chinese Sentence Correction


## Abstract


## 1. Introduction

## 2. Existing Work

#### 2.1. Sequence to Sequence (Seq2Seq)

#### 2.2. Recurrent Neural Network

#### 2.3. Long Short-Term Memory (LSTM)

#### 2.4. Gated Recurrent Unit (GRU)

#### 2.5. Transformer

#### 2.6. BERT

## 3. Proposed Method

#### 3.1. Preprocessing

#### 3.2. Vocabulary

#### 3.3. Tokenizer

#### 3.4. Embedding Layer

#### 3.5. Language Model

#### 3.6. Encoder Properties

#### 3.7. Decoder Properties

#### 3.8. Analysis

#### 3.9. Hybrid Architecture, BERT-RNN

#### 3.10. BERT-LSTM

#### 3.11. BERT-GRU

#### 3.12. Training Methods

#### 3.13. Greedy Decoding

#### 3.14. Beam Search

## 4. Experiment

#### 4.1. Dataset

#### 4.2. Environment

#### 4.3. Experiment Settings

#### 4.4. Vocabulary Setup

#### 4.5. Evaluation Metric

#### 4.6. Model Performance Comparison

#### 4.7. Experiment Comparison

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 6.** Bidirectional Encoder Representations from Transformers (BERT)-Recurrent Neural Network (RNN).

| Phase | Parallelizable | Direction |
|---|---|---|
| Training | Depends on model | Depends on model |
| Inference | Depends on model | Depends on model |

| | RNN | Transformer |
|---|---|---|
| Direction | Uni-Direction | Bi-Direction |
| Parallelizable | False | True |
| Performance | Low | High |
| Computational Cost | Low | High |

| Phase | Parallelizable | Direction |
|---|---|---|
| Training | Depends on training method and model | Uni-Direction |
| Inference | False | Uni-Direction |

| | RNN | Transformer |
|---|---|---|
| Direction | Uni-Direction | Uni-Direction |
| Parallelizable | False | False |
| Performance | Low | High |
| Computational Cost | Low | High |

| Length | Train Set | Test Set |
|---|---|---|
| 25 | 686,130 | 49,157 |
| 50 | 1,056,324 | 69,446 |
| 128 | 1,093,564 | 71,653 |
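The result tables that follow report BLEU scores for each model. As a minimal, illustrative sketch only (not the paper's evaluation code, which is not shown here), unsmoothed sentence-level BLEU with modified n-gram precision and a brevity penalty can be computed in pure Python; for Chinese, the token list would typically be a list of characters:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipped (modified) n-gram matches: each candidate n-gram counts
        # at most as often as it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(1, sum(cand.values()))
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(1, len(candidate)))
    return bp * geo_mean
```

A perfect match scores 1.0; a corpus-level variant (as in the original BLEU formulation) would aggregate clipped counts over all sentence pairs before taking the geometric mean.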

| Model | BLEU (Beam Search) | Inference Speed (Beam Search) | BLEU (Greedy Decoding) | Inference Speed (Greedy Decoding) | Training Speed |
|---|---|---|---|---|---|
| GRU-GRU | 0.7591 | 447.246 | 0.7553 | 793.454 | 221.48 |
| LSTM-LSTM | 0.7549 | 425.297 | 0.7490 | 692.05 | 197.106 |
| TRANS-TRANS | 0.7645 | 115.955 | 0.7596 | 410.753 | 362.085 |
| BERT-TRANS | 0.7667 | 112.603 | 0.7625 | 403.235 | 359.804 |
| BERT-GRU | 0.7676 | 535.172 | 0.7627 | 1038.811 | 277.962 |
| BERT-LSTM | 0.7669 | 504.521 | 0.7617 | 952.377 | 261.673 |

| Model | BLEU (Beam Search) | Inference Speed (Beam Search) | BLEU (Greedy Decoding) | Inference Speed (Greedy Decoding) | Training Speed |
|---|---|---|---|---|---|
| GRU-GRU | 0.7975 | 214.261 | 0.7953 | 417.462 | 132.446 |
| LSTM-LSTM | 0.7970 | 200.268 | 0.7945 | 363.304 | 114.758 |
| TRANS-TRANS | 0.7995 | 30.857 | 0.7966 | 149.292 | 264.928 |
| BERT-TRANS | 0.7997 | 33.657 | 0.7959 | 141.639 | 265.218 |
| BERT-GRU | 0.8021 | 272.526 | 0.7986 | 579.366 | 176.705 |
| BERT-LSTM | 0.8017 | 243.564 | 0.7982 | 533.153 | 162.79 |

| Model | BLEU (Beam Search) | Inference Speed (Beam Search) | BLEU (Greedy Decoding) | Inference Speed (Greedy Decoding) | Training Speed |
|---|---|---|---|---|---|
| GRU-GRU | 0.8042 | 111.094 | 0.8022 | 233.648 | 55.635 |
| LSTM-LSTM | 0.8034 | 100.455 | 0.8010 | 203.333 | 47.137 |
| TRANS-TRANS | 0.8068 | 10.91 | 0.8041 | 68.936 | 142.064 |
| BERT-TRANS | 0.8069 | 12.362 | 0.8039 | 70.803 | 142.21 |
| BERT-GRU | 0.8092 | 152.276 | 0.8059 | 391.5 | 76.775 |
| BERT-LSTM | 0.809 | 130.146 | 0.8057 | 359.848 | 69.633 |
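The result tables compare beam search against greedy decoding as described in Sections 3.13 and 3.14. The trade-off can be sketched with a toy, hypothetical next-token distribution (not the paper's models): greedy decoding commits to the locally best token at each step, while beam search keeps the `beam_width` highest-scoring partial sequences and can recover a globally better path.

```python
import math

# Hypothetical conditional distribution P(next | previous token) over a
# tiny vocabulary, chosen so that greedy and beam search disagree.
probs = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A":   {"x": 0.3, "y": 0.3, "</s>": 0.4},
    "B":   {"x": 0.9, "</s>": 0.1},
    "x":   {"</s>": 1.0},
    "y":   {"</s>": 1.0},
}

def greedy_decode(start="<s>", max_len=5):
    """Pick the single most probable token at every step."""
    seq, tok = [], start
    for _ in range(max_len):
        tok = max(probs[tok], key=probs[tok].get)
        if tok == "</s>":
            break
        seq.append(tok)
    return seq

def beam_decode(start="<s>", beam_width=2, max_len=5):
    """Keep the beam_width best hypotheses by summed log-probability."""
    beams = [(0.0, [start], False)]  # (log-prob, tokens, finished?)
    for _ in range(max_len):
        candidates = []
        for lp, toks, done in beams:
            if done:
                candidates.append((lp, toks, True))
                continue
            for nxt, p in probs[toks[-1]].items():
                candidates.append((lp + math.log(p), toks + [nxt],
                                   nxt == "</s>"))
        beams = sorted(candidates, key=lambda b: -b[0])[:beam_width]
        if all(done for _, _, done in beams):
            break
    _, toks, _ = beams[0]
    return [t for t in toks if t not in ("<s>", "</s>")]
```

Here greedy decoding emits `["A"]` (path probability 0.5 × 0.4 = 0.2), while beam search with width 2 finds `["B", "x"]` (0.4 × 0.9 = 0.36), illustrating why beam search in the tables above tends to score higher BLEU at the cost of slower inference.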


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chen, J.W.; Sigalingging, X.K.; Leu, J.-S.; Takada, J.-I.
Applying a Hybrid Sequential Model to Chinese Sentence Correction. *Symmetry* **2020**, *12*, 1939.
https://doi.org/10.3390/sym12121939
