# Applying a Hybrid Sequential Model to Chinese Sentence Correction


## Abstract


## 1. Introduction

## 2. Existing Work

#### 2.1. Sequence to Sequence (Seq2Seq)

#### 2.2. Recurrent Neural Network

#### 2.3. Long Short-Term Memory (LSTM)

#### 2.4. Gated Recurrent Unit (GRU)

#### 2.5. Transformer

#### 2.6. BERT

## 3. Proposed Method

#### 3.1. Preprocessing

#### 3.2. Vocabulary

#### 3.3. Tokenizer

#### 3.4. Embedding Layer

#### 3.5. Language Model

#### 3.6. Encoder Properties

#### 3.7. Decoder Properties

#### 3.8. Analysis

#### 3.9. Hybrid Architecture, BERT-RNN

#### 3.10. BERT-LSTM

#### 3.11. BERT-GRU

#### 3.12. Training Methods

#### 3.13. Greedy Decoding

#### 3.14. Beam Search

## 4. Experiment

#### 4.1. Dataset

#### 4.2. Environment

#### 4.3. Experiment Settings

#### 4.4. Vocabulary Setup

#### 4.5. Evaluation Metric

#### 4.6. Model Performance Comparison

#### 4.7. Experiment Comparison

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 6.** Bidirectional Encoder Representations from Transformers (BERT)-Recurrent Neural Network (RNN).

Phase | Parallelizable | Direction |
---|---|---|
Training | Depends on model | Depends on model |
Inference | Depends on model | Depends on model |

Property | RNN | Transformer |
---|---|---|
Direction | Unidirectional | Bidirectional |
Parallelizable | No | Yes |
Performance | Low | High |
Computational Cost | Low | High |
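For a Transformer, the Direction row comes down to the self-attention mask. The following is a minimal sketch, assuming standard masked self-attention; the function names `full_mask` and `causal_mask` are hypothetical and not taken from the paper. It builds the all-true mask a bidirectional encoder such as BERT uses and the lower-triangular mask of a left-to-right (unidirectional) model.

```python
from typing import List


def full_mask(n: int) -> List[List[bool]]:
    """Bidirectional mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[bool]]:
    """Unidirectional (left-to-right) mask: position i sees only positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # '1' = attention allowed, '.' = attention blocked
    for row in causal_mask(4):
        print("".join("1" if allowed else "." for allowed in row))
```

Running the script prints the lower triangle `1...`, `11..`, `111.`, `1111`: each position can use only the tokens to its left, whereas `full_mask` lets every position use the whole sentence.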

Phase | Parallelizable | Direction |
---|---|---|
Training | Depends on training method and model | Unidirectional |
Inference | No | Unidirectional |

Property | RNN | Transformer |
---|---|---|
Direction | Unidirectional | Unidirectional |
Parallelizable | No | No |
Performance | Low | High |
Computational Cost | Low | High |
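The "Inference: No" entry for decoders follows from autoregressive generation: each predicted token is fed back as input for the next step, so time steps cannot run in parallel. The following is a minimal greedy-decoding sketch, assuming a hypothetical `step_fn` that maps the current prefix to next-token scores. It stands in for either an RNN or a Transformer decoder and is not the paper's implementation; `toy_step` is an invented scoring table used only for illustration.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[Sequence[int]], List[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding: at every step keep only the single highest-scoring token."""
    output = [bos_id]
    for _ in range(max_len):
        scores = step_fn(output)  # depends on every token produced so far
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == eos_id:
            break
    return output[1:]  # drop the BOS token


def toy_step(prefix: Sequence[int]) -> List[float]:
    # Hypothetical scores over a 4-token vocabulary (0 = BOS, 3 = EOS).
    script = [[0.0, 0.9, 0.1, 0.0], [0.0, 0.2, 0.7, 0.1], [0.0, 0.1, 0.1, 0.8]]
    return script[min(len(prefix) - 1, len(script) - 1)]


if __name__ == "__main__":
    print(greedy_decode(toy_step, bos_id=0, eos_id=3, max_len=10))  # [1, 2, 3]
```

The loop body cannot start step t+1 before step t has produced its token, which is why decoder inference is sequential regardless of whether the decoder is recurrent.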

Length | Train Set | Test Set |
---|---|---|
25 | 686,130 | 49,157 |
50 | 1,056,324 | 69,446 |
128 | 1,093,564 | 71,653 |

Model | BLEU Score (BS) | Inference Speed (BS) | BLEU Score (GD) | Inference Speed (GD) | Training Speed |
---|---|---|---|---|---|
GRU-GRU | 0.7591 | 447.246 | 0.7553 | 793.454 | 221.48 |
LSTM-LSTM | 0.7549 | 425.297 | 0.7490 | 692.05 | 197.106 |
TRANS-TRANS | 0.7645 | 115.955 | 0.7596 | 410.753 | 362.085 |
BERT-TRANS | 0.7667 | 112.603 | 0.7625 | 403.235 | 359.804 |
BERT-GRU | 0.7676 | 535.172 | 0.7627 | 1038.811 | 277.962 |
BERT-LSTM | 0.7669 | 504.521 | 0.7617 | 952.377 | 261.673 |
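In these results, beam search (BS) gives a slightly higher BLEU score than greedy decoding (GD) for every model, while its inference-speed figures are lower. The following is a minimal beam-search sketch that shows where the extra cost comes from: each step expands `beam_width` hypotheses instead of one. It assumes a hypothetical `step_fn` returning next-token probabilities and ranks hypotheses by summed log-probability with no length normalization. The paper's beam width and scoring rule are not stated here, and `TOY` is an invented example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hyp = Tuple[float, List[int]]  # (summed log-probability, token sequence)


def beam_search(
    step_fn: Callable[[Sequence[int]], List[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
    beam_width: int = 3,
) -> List[int]:
    """Keep the beam_width best partial hypotheses at every decoding step."""
    beams: List[Hyp] = [(0.0, [bos_id])]
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for logp, seq in beams:  # one decoder call per live hypothesis
            for tok, p in enumerate(step_fn(seq)):
                if p > 0.0:
                    candidates.append((logp + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, seq in candidates:
            if seq[-1] == eos_id:
                finished.append((logp, seq))  # completed hypothesis
            else:
                beams.append((logp, seq))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return best[1][1:]  # drop the BOS token


# Hypothetical probabilities: 0 = BOS, 3 = EOS; greedy would pick token 1 first.
TOY = {
    (0,): [0.0, 0.6, 0.4, 0.0, 0.0],
    (0, 1): [0.0, 0.0, 0.0, 0.55, 0.45],
    (0, 2): [0.0, 0.0, 0.0, 1.0, 0.0],
    (0, 1, 4): [0.0, 0.0, 0.0, 1.0, 0.0],
}


def toy_step(prefix: Sequence[int]) -> List[float]:
    return TOY[tuple(prefix)]


if __name__ == "__main__":
    print(beam_search(toy_step, 0, 3, max_len=10, beam_width=1))  # [1, 3]
    print(beam_search(toy_step, 0, 3, max_len=10, beam_width=2))  # [2, 3]
```

With a width of 1 the search commits to the locally best first token and ends with probability 0.6 × 0.55 = 0.33. A width of 2 keeps the runner-up and finds the sequence with probability 0.4, at the cost of more decoder calls per step.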

Model | BLEU Score (BS) | Inference Speed (BS) | BLEU Score (GD) | Inference Speed (GD) | Training Speed |
---|---|---|---|---|---|
GRU-GRU | 0.7975 | 214.261 | 0.7953 | 417.462 | 132.446 |
LSTM-LSTM | 0.7970 | 200.268 | 0.7945 | 363.304 | 114.758 |
TRANS-TRANS | 0.7995 | 30.857 | 0.7966 | 149.292 | 264.928 |
BERT-TRANS | 0.7997 | 33.657 | 0.7959 | 141.639 | 265.218 |
BERT-GRU | 0.8021 | 272.526 | 0.7986 | 579.366 | 176.705 |
BERT-LSTM | 0.8017 | 243.564 | 0.7982 | 533.153 | 162.79 |

Model | BLEU Score (BS) | Inference Speed (BS) | BLEU Score (GD) | Inference Speed (GD) | Training Speed |
---|---|---|---|---|---|
GRU-GRU | 0.8042 | 111.094 | 0.8022 | 233.648 | 55.635 |
LSTM-LSTM | 0.8034 | 100.455 | 0.8010 | 203.333 | 47.137 |
TRANS-TRANS | 0.8068 | 10.91 | 0.8041 | 68.936 | 142.064 |
BERT-TRANS | 0.8069 | 12.362 | 0.8039 | 70.803 | 142.21 |
BERT-GRU | 0.8092 | 152.276 | 0.8059 | 391.5 | 76.775 |
BERT-LSTM | 0.809 | 130.146 | 0.8057 | 359.848 | 69.633 |
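The results tables report BLEU scores. The following is a minimal corpus-level BLEU sketch following the standard definition: clipped n-gram precisions up to 4-grams, combined by a uniform geometric mean and multiplied by a brevity penalty. Treating each Chinese character as one token, using a single reference per sentence and applying no smoothing are all assumptions. The paper's exact tokenization and BLEU implementation are not given here.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(
    hypotheses: List[Sequence[str]],
    references: List[Sequence[str]],
    max_n: int = 4,
) -> float:
    """Corpus BLEU with one reference per hypothesis, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:  # some precision is zero, so the geometric mean is zero
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = list("今天天气很好")  # character-level tokens
    print(corpus_bleu([list("今天天气很好")], [ref]))  # 1.0
    print(corpus_bleu([list("今天天气")], [ref]))  # exp(-0.5), about 0.6065
```

In the second call every n-gram of the shorter hypothesis appears in the reference, so all four precisions are 1. The score is therefore driven entirely by the brevity penalty exp(1 - 6/4).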

