Korean Grammatical Error Correction Based on Transformer with Copying Mechanisms and Grammatical Noise Implantation Methods

Grammatical Error Correction (GEC) is the task of detecting and correcting various grammatical errors in texts. Previous approaches to GEC have used various mechanisms including rules, statistics, and their combinations. Recently, the performance of English GEC has been drastically enhanced by the vigorous application of deep neural networks and pretrained language models. Following the promising results of the English GEC tasks, we apply the Transformer with Copying Mechanism to the Korean GEC task and introduce novel and effective noising methods for constructing Korean GEC datasets. Our comparative experiments showed that the proposed system outperforms two commercial grammar checkers and other NMT-based models.


Introduction
Grammatical Error Correction (GEC), as shown in Figure 1, is the task of automatically detecting and correcting various types of grammatical errors and typos in texts. It typically covers all the textual mistakes and errors, including morphological, lexical, syntactic, and semantic irregularities, that can appear in texts [1]. Until now, almost all previous approaches to GEC for Korean have used rule-based methods, in which all the target error patterns and the corresponding correction logic must be recognized in advance and consistently expanded [2]. However, rule-based mechanisms have the disadvantage that they require a great deal of manual labor to collect the error patterns and correction logic. Furthermore, they are unlikely to promptly reflect radical changes in the current linguistic environment, such as the rise of newly coined words and the natural extinction of old-fashioned words and syntactic rules [1].
To address these limitations, many researchers are now attempting to apply Neural Machine Translation (NMT) models to GEC, because they are well suited to the task of translating grammatically incorrect sentences into correct ones. The NMT-based models have two advantages. First, their neural encoder-decoder mechanism effectively encodes the various grammatical errors in the training data and generates the corresponding corrected texts based on the encoded information [3]. In addition, their error handling coverage is much broader than that of the conventional methods: thanks to the generalization ability of the mechanism, they can handle even infrequent and rare error patterns [1]. These strengths have led to remarkable performance improvements in recent English GEC tasks, showing the promise of the approach as a future research direction [3].
In this paper, we introduce an effective Korean Grammatical Error Correction model based on the Transformer equipped with the Copying Mechanism, together with various noising methods for automatically generating a training set. The Transformer is a model derived from "Attention is all you need," a paper published by Google in 2017. It follows the encoder-decoder structure of existing seq2seq models but, as the title of the paper suggests, is implemented with attention only [4]. It has been shown that during GEC, about 80% of the input text remains unchanged and only about 20% is recognized as erroneous, so that the system changes its lexical and syntactic structure. The Copying Mechanism can cope effectively with this phenomenon by enhancing the preservation capability of the Transformer [5,6]. Following the promising results of the English GEC task, we apply the Transformer with Copying Mechanism to the Korean GEC task and introduce novel and effective noising methods for building Korean GEC datasets. Because there is currently no officially released GEC parallel corpus for Korean, the model was trained and tested only on data generated by the noising methods. Our contributions are summarized as follows:
• We introduce a novel approach to creating Korean GEC datasets by implanting various realistic grammatical errors that appear in Korean texts into original correct sentences, which makes it possible to create Korean parallel corpora for GEC effectively.

• We implemented a Transformer-based Korean GEC engine equipped with the Copying Mechanism and a realistic grammatical error detection and correction rule set for many errors that cannot be handled by the main model.
• We showed that the proposed system drastically outperforms two commercial GEC engines in various aspects.
• We analyze the results by comparing the performance with other NMT-based models.

Related Work
Recently, many studies have been conducted on grammatical error correction models based on neural machine translation [1]. The early stages of the research on NMT-based GEC mainly focused on LSTM-based encoder-decoder models [7]. The introduction of attention mechanisms into sequence-to-sequence models [8] improved the performance of GEC [9].
With the Transformer [4] actively exploited in many NLP areas, recent NMT-based GEC approaches are adopting the Transformer instead of the traditional RNN-based encoder-decoder models and achieving competitive and promising performance compared to the conventional architectures [10,11]. The Copying Mechanism, introduced in machine translation to preserve unknown and special words appearing in source sentences [5], was applied to GEC models and showed improved performance in ACL BEA 2019 [12].
Current studies of NMT-based GEC for Korean suffer severely from the lack of the necessary parallel corpora, which, unlike in English GEC, makes it very difficult to develop and improve systems. Recently, grammatical noise implantation methods have been facilitating the automatic construction of parallel corpora for Korean GEC, but there is still no systematic and effective approach to noising models specialized for the Korean language. Several recent initial attempts build parallel corpora and use the Transformer for Korean GEC [3,13]. Chinese, another language of the East Asian cultural region, faces a similar lack of corpora for GEC training. Zhao and Wang [14] addressed this shortage by applying a dynamic masking technique as the method of adding noise.

Methods
We introduce four noising methods for automatically generating a training dataset and a Korean GEC model based on the Transformer with Copying Mechanism, as shown in Figure 2. The training dataset is created by four noise generation methods: Grapheme to Phoneme, Heuristic-based, Word spacing, and Heterograph noising rules. The Grapheme to Phoneme noising rules automatically generate Korean spelling errors by applying the Korean pronunciation rules. The Heuristic-based noising rules automatically generate grammatical errors that Koreans often make. The Word spacing noising rules generate spacing errors through the ChatSpace model, and the Heterograph noising rules generate grammatical errors by converting a word into another form that is similarly pronounced.
We train Sequence to Sequence (Seq2Seq) [15], Seq2Seq with Attention [8], Transformer [4], and Transformer with Copying Mechanism [5] models on the dataset to compare and analyze their performance. Seq2Seq is an end-to-end model based on a recurrent neural network (RNN) and has an encoder-decoder structure. The encoder transforms the input sequence into a context vector using an RNN-based model, and the decoder converts this context vector into an output sequence. However, the RNN-based Seq2Seq model has the disadvantage that some information is lost in the process of converting an input sequence into a vector. The Seq2Seq with Attention model tries to solve this problem by using attention, but does not solve it completely. Unlike the Seq2Seq model, the Transformer is a machine translation model that uses only self-attention, without an RNN-based model, and it is widely used in the GEC task. The Transformer with Copying Mechanism improves performance by adding a Copying Mechanism to the Transformer, so that the generation mechanism and the copy mechanism for words in the input sentence can each be trained.
Our main model is the Transformer with Copying Mechanism, so we focus on that model.

Grapheme to Phoneme Noising Rules
The complicated pronunciation rules of the Korean language lead to a clear difference between written texts and their pronunciations. This phenomenon causes various lexical errors when writing Korean sentences. One of the pronunciation rules that causes errors is the "linking sound rule." The linking sound rule is a phonological phenomenon in which the ending sound of the preceding syllable becomes the first sound of the following syllable when a syllable that ends with a consonant is followed by a formal morpheme that begins with a vowel [16]. Many people make mistakes by confusing the correct spelling of words and sentences with their pronunciation, especially the pronunciation produced by the linking sound rule, as shown in Table 1. "오랜만에" is a Korean adverb that means "after a long time," but "오랜마네" is a non-existent word and has no meaning. The noise rules were constructed by using the Grapheme to Phoneme module for Korean (G2PK) [17], which can automatically generate Korean spelling errors by applying the above pronunciation rules. Table 2 shows a Korean sentence generated by G2PK, in which the correct word "밥을" is romanized as "bab-eul" and the incorrect (noised) word "바블," artificially generated by G2PK, is romanized as "babeul." Words marked in blue in Table 2 are non-existent words; the same marking is used in all tables.
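The linking sound rule described above can be illustrated with a small, self-contained sketch. It does not call G2PK and is not the authors' implementation; it hand-codes only the simplest case (a single final consonant moving onto a following syllable that starts with the silent "ㅇ") using standard Unicode Hangul arithmetic. The function name `linking_noise` and the set `LINKABLE` are hypothetical, and other pronunciation rules that G2PK applies (for example palatalization or compound finals) are deliberately ignored.

```python
# Minimal sketch of the linking sound rule only (assumption: not the G2PK library).
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals
LINKABLE = set("ㄱㄲㄴㄷㄹㅁㅂㅅㅆㅈㅊㅋㅌㅍ")  # hypothetical: single finals that can move
SILENT = CHO.index("ㅇ")  # initial "ㅇ" is silent, so the next syllable starts with a vowel


def is_syllable(ch):
    """True for precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


def decompose(ch):
    """Split a syllable into (initial, medial, final) indices."""
    code = ord(ch) - 0xAC00
    return code // 588, (code % 588) // 28, code % 28


def compose(cho, jung, jong):
    """Build a syllable from (initial, medial, final) indices."""
    return chr(0xAC00 + (cho * 21 + jung) * 28 + jong)


def linking_noise(text):
    """Move a final consonant onto a following vowel-initial syllable."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if not (is_syllable(chars[i]) and is_syllable(chars[i + 1])):
            continue
        c1, v1, f1 = decompose(chars[i])
        c2, v2, f2 = decompose(chars[i + 1])
        final = JONG[f1]
        if c2 == SILENT and final in LINKABLE:
            chars[i] = compose(c1, v1, 0)                     # drop the final
            chars[i + 1] = compose(CHO.index(final), v2, f2)  # reuse it as the initial
    return "".join(chars)


print(linking_noise("밥을"))
print(linking_noise("오랜만에"))
```

Applied to the two words discussed above, the sketch turns the correct spellings into the noised forms cited in the text; a (noised, original) pair built this way is one training example.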

Heuristic-Based Noising Rules
The Korean language is morphologically agglutinative, and a word is composed of its component morphemes. Moreover, a single syllable typically consists of an initial consonant, a medial vowel, and a final consonant, which complicates the entire language system even more. These complications cause many native speakers of Korean to make various mistakes in writing. Because of Korea's history of Japanese colonization, some Koreans use Japanese grammar and words without knowing whether they are grammatical errors. In addition, Korean has many borrowed words that are written in Korean following the English pronunciation as it is; a comparable example in English is "tsunami," borrowed from Japanese. Furthermore, like other languages, Korean is changing continuously: newly coined words are created, and its grammatical system is modified to reflect the current linguistic environment.
In this paper, to reflect this situation, we construct rules for the grammar and spelling errors that Koreans often make. Examples of grammatical errors that Koreans easily commit were collected from Internet materials such as the Korean language regulations published by the National Institute of the Korean Language, cases of misuse by broadcasters and newspaper companies, newspaper articles, and Wikipedia. The collected cases were categorized and organized into about 120 rules. Of these, spelling errors were generated by constructing an error dictionary, and grammatical errors were generated in the original sentence through regular expressions and Python code.
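As a rough illustration of the error-dictionary approach described above, the following sketch implants spelling errors with a compiled regular expression. It is an assumption-laden toy, not the authors' rule set: `ERROR_DICT` holds only the two correct/incorrect pairs that this paper itself discusses ("깨끗이"/"깨끗히" and "완전히"/"완전이"), and the function name `implant_heuristic_noise` is hypothetical. The Hangul example sentence is our transcription of the romanized sentence "naneun jib-eul kkaekkeus-i cheongsohaessda."

```python
import re

# Hypothetical two-entry error dictionary: correct form -> frequent misspelling.
# The paper's actual dictionary (about 120 rules) is not reproduced here.
ERROR_DICT = {
    "깨끗이": "깨끗히",
    "완전히": "완전이",
}

# One alternation pattern over all correct forms; re.escape guards special characters.
PATTERN = re.compile("|".join(re.escape(word) for word in ERROR_DICT))


def implant_heuristic_noise(sentence):
    """Return a (noised, original) pair; the pair is identical if no rule matches."""
    noised = PATTERN.sub(lambda m: ERROR_DICT[m.group(0)], sentence)
    return noised, sentence


noised, original = implant_heuristic_noise("나는 집을 깨끗이 청소했다.")
print(noised)
print(original)
```

Keeping the rules in a plain dictionary makes it easy to add entries; rules that depend on grammatical context would need their own regular expressions rather than a fixed string mapping.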

Type Sentence and Meaning
Original Sentence naneun jib-eul kkaekkeus-i cheongsohaessda. Meaning I cleaned my house cleanly.

Noised Sentence
Korean naneun olaenman-e chinguleul mannattda. Meaning I met a friend after a long time.

Word Spacing Noising Rules
In Korean, the rules of word spacing are complicated, so even college students with a higher education often get them wrong [18]. To deal with word spacing errors, we also generate word spacing noise by using ChatSpace [19]. ChatSpace is an automatic Korean word spacing package, but its output is imperfect in practice, as the underlined part of Table 4 shows.

Type Sentence and Meaning
Input Sentence Naneun geuleolsu eobsji. Meaning I cannot dothat.
Sensors 2021, 21, 2658
We exploit the imperfect behavior of ChatSpace. First, an input sentence is passed through the ChatSpace model with all spaces removed. ChatSpace then restores the word spacing of the input and makes some mistakes in the process. We treat these mistakes as word spacing noise.
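The procedure above (strip the spaces, re-space with an imperfect model, keep the mistakes) can be sketched as follows. The ChatSpace API itself is not shown in this paper, so the sketch takes any callable `spacer` as a placeholder for the real model; `toy_spacer` is a deliberately crude stand-in that only exists to make the example runnable, and both names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def spacing_noise(sentence: str, spacer: Callable[[str], str]) -> Optional[Tuple[str, str]]:
    """Return (noised, original) if the spacer mis-spaces the sentence, else None."""
    stripped = sentence.replace(" ", "")   # step 1: remove all spaces
    respaced = spacer(stripped)            # step 2: let the model restore spacing
    if respaced == sentence:               # perfect restoration gives no noise
        return None
    return respaced, sentence              # step 3: keep the mistake as noise


def toy_spacer(text: str) -> str:
    """Stand-in for a real spacing model: inserts a space every four characters."""
    return " ".join(text[i:i + 4] for i in range(0, len(text), 4))


print(spacing_noise("ab cd ef", toy_spacer))
print(spacing_noise("abcd ef", toy_spacer))
```

In an actual pipeline, `spacer` would wrap the spacing model's prediction call; only sentences for which the model's output differs from the original contribute spacing noise.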

Heterograph Noising Rules
Heterographs are words that have the same or similar pronunciation but different spellings. For example, in English, "pair" and "pear" are heterographs: both are pronounced "peə(r)" but are spelled differently. In this paper, heterographs are limited to syllable units rather than words.
In order to generate a heterograph error, the syllables with the same or a similar phonetic symbol, as shown in Table 5, were classified by medial and final with reference to the Roman pronunciation notation. In the case of the medial, syllables with [a] added to the phonetic symbol were judged to have a similar phonetic symbol, and in the case of the final, syllables with the same phonetic symbol or a repeated phonetic symbol were judged to have similar phonetic symbols. As can be seen in Table 6, a grammatical error is generated by replacing "ㄱ[k]" and "ㅆ[tt]" at each final position in "먹-[meok-]" and "-었-[-eott-]" with "ㄲ[kk]" and "ㅅ[t]."
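The final-consonant substitution described for Table 6 can be sketched with Unicode Hangul arithmetic. This is only an illustration under stated assumptions: `FINAL_SWAPS` contains just the two pairs named in the text ("ㄱ" to "ㄲ" and "ㅆ" to "ㅅ"), the function names are hypothetical, and every matching syllable is replaced, whereas a real noising pipeline would presumably apply the rule selectively. Medial (vowel) substitutions are not covered.

```python
# Final consonants in Unicode order; index 0 means "no final consonant".
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical swap table holding only the two pairs mentioned for Table 6.
FINAL_SWAPS = {"ㄱ": "ㄲ", "ㅆ": "ㅅ"}


def swap_final(syllable):
    """Replace the final consonant of one Hangul syllable if a swap is defined."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code <= 11171:          # not a precomposed Hangul syllable
        return syllable
    final = JONG[code % 28]
    if final not in FINAL_SWAPS:
        return syllable
    new_final = JONG.index(FINAL_SWAPS[final])
    return chr(0xAC00 + code - code % 28 + new_final)


def heterograph_noise(text):
    """Apply the final-consonant swap to every syllable in the text."""
    return "".join(swap_final(ch) for ch in text)


print(heterograph_noise("먹었다"))
```

Because the swapped finals are pronounced the same or almost the same as the originals, the noised string sounds like the correct word while being misspelled.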

Transformer
Our system is based on the attention-based Transformer architecture, which has an encoder and a decoder as its basic modules. Each encoder layer consists of a multi-head self-attention sub-layer followed by a position-wise feed-forward sub-layer, along with residual connections and layer normalization [4]. Unlike the encoder, each decoder layer consists of three sub-layers: two are the same as the encoder's sub-layers, and the third computes multi-head attention over the output of the encoder. The Transformer input embedding combines a positional embedding with the token embedding of each token in the input sequence.

Copying Mechanism
The Copying Mechanism has proven to be effective for text summarization and semantic parsing [5]. It is added to the end of the Transformer. The output probability distribution of the Copying Mechanism is a mixture of $p^{gen}_t$ and $p^{copy}_t$:

$$p_t(w) = (1 - \alpha^{copy}_t)\, p^{gen}_t(w) + \alpha^{copy}_t\, p^{copy}_t(w)$$

$p^{gen}_t$ is the distribution generated by the decoder. $p^{copy}_t$ is the copy distribution, defined by a copy attention layer that assigns a distribution over the tokens that appear in the input sentence. $\alpha^{copy}_t$, which plays the most important role in the Copying Mechanism and is defined at each decoding step, is a balance factor that decides whether to reflect the distribution of the input sentence or the distribution generated by the Transformer. It is calculated from the copy scores $A_t^T$, which are the output of the copy attention, and the value $V$ of the copy attention's hidden state.
As shown in the formula above, if the value of $\alpha^{copy}_t$ is greater than 0.5, the final distribution reflects the copy distribution more, and if it is less than 0.5, it reflects the generation distribution more. The final distribution determines the output: a word with a high probability becomes the word in the output sentence [5]. The final architecture of our GEC model is shown in Figure 3.
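The mixing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the trained model: the balance factor is passed in as a number (`alpha_copy`) instead of being computed from learned weights, the copy scores are made-up values, and the names `mix_distributions` and `source_ids` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_distributions(p_gen, copy_scores, source_ids, alpha_copy):
    """Blend the generation and copy distributions for one decoding step.

    p_gen:       probabilities over the whole vocabulary (sums to 1)
    copy_scores: one copy-attention score per source token
    source_ids:  vocabulary id of each source token
    alpha_copy:  balance factor in [0, 1]; higher favors copying
    """
    attention = softmax(copy_scores)
    p_copy = [0.0] * len(p_gen)
    for token_id, weight in zip(source_ids, attention):
        p_copy[token_id] += weight  # repeated source tokens accumulate mass
    return [(1 - alpha_copy) * g + alpha_copy * c for g, c in zip(p_gen, p_copy)]


# Toy vocabulary of five ids; the source sentence contains ids 3, 1, 3.
p_gen = [0.1, 0.2, 0.3, 0.2, 0.2]
final = mix_distributions(p_gen, [2.0, 0.0, 2.0], [3, 1, 3], alpha_copy=0.8)
print(final.index(max(final)))
```

With a balance factor above 0.5, the source token that receives the most copy attention dominates the final distribution even though the generation distribution alone would have preferred a different word, which is the preservation behavior the text describes.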

Experiments and Discussion
In this paper, we conducted a Korean GEC experiment comparing the performance of two commercial GEC engines and NMT-based GEC models. The commercial GEC engines are Py-Hanspell (Naver API) [20] and Hanspell (Kakao API) [21], which are provided by the most widely used portal sites in Korea and are currently available for free as beta services. The performance was measured with Precision, Recall, F0.5-score, BLEU [22], and GLEU [23].

Data
By applying the noising rules described above, we constructed a parallel dataset for Korean GEC from the AI-Hub Korean-English parallel corpus [24] released by NIA. The corpus includes 1.1 million Korean-English literary-style sentence pairs and 500 K colloquial sentence pairs. Table 7 shows the detailed information of the dataset. In total, it contains 1,600,000 sentences from various domains such as news articles, web pages, formal documents, and even daily conversations, which reflects broad linguistic aspects. We applied the grammatical noise implantation rules to the corpus and generated a large set of sentence pairs for Korean GEC. For the experiments, we generated 6,409,672 pairs of noise-implanted sentences and original ones. Each noise method was applied to the original sentence. In addition, sentences for which no error was generated, because no noise rule applied to the original sentence, were also included in the dataset. The reason is that users do not submit only incorrect sentences to a spell checker, and when the model receives a correct sentence, it has to return it unchanged. A total of 4,486,756 pairs were used for the training set, and 640,956 and 1,281,960 pairs were used for the development set and test set, respectively (roughly 70%, 10%, and 20% of the data).
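One possible reading of the pair construction described above is sketched below: every noising method is applied to each original sentence, and a sentence that no method changes is kept as an identical (input, target) pair so that the model also learns to return correct input unchanged. The exact bookkeeping used for the paper's dataset is not specified, so this is an assumption; the function name `build_pairs` and the two toy English noisers are purely illustrative.

```python
def build_pairs(sentences, noisers):
    """Build (source, target) training pairs from clean sentences.

    Each noiser is a function str -> str. A sentence that no noiser
    changes is emitted once as an identity pair.
    """
    pairs = []
    for original in sentences:
        changed = False
        for noiser in noisers:
            noised = noiser(original)
            if noised != original:
                pairs.append((noised, original))
                changed = True
        if not changed:
            pairs.append((original, original))  # correct input must stay as it is
    return pairs


# Toy stand-ins for the four Korean noising methods (hypothetical).
toy_noisers = [
    lambda s: s.replace("their", "thier"),
    lambda s: s.replace("a lot", "alot"),
]

for source, target in build_pairs(["their cat sleeps a lot", "no rule here"], toy_noisers):
    print(source, "->", target)
```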

Model and Parameters
Our GEC model uses a typical configuration of the Transformer with Copying Mechanism in that all the input tokens are embedded and encoded by the conventional positional encoding mechanism. As shown in Table 8, we use a 4096-dimensional position-wise feed-forward layer. In addition, both the token embedding size and the hidden size are 512. For the Copying Mechanism, we apply a single layer with eight attention heads. The Adam optimizer was used in training. The batch size during training was set to 100, and the dropout ratio and the label smoothing value were both set to 0.1. We trained our own tokenizer by using SentencePiece [25], with the size of the source (encoder) and target (decoder) vocabulary set to 30,000. To prevent overfitting, early stopping was applied when the performance on the validation data did not improve for three epochs during training.
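For reference, the settings listed above can be gathered into a single configuration, and the early stopping rule can be written as a short function. This is a sketch only: the dictionary keys are hypothetical names rather than the options of any particular toolkit, and the stopping function assumes that a higher validation score is better and that "no improvement" means not exceeding the best score seen so far.

```python
# Hyperparameters as stated in the text; the key names are illustrative.
CONFIG = {
    "ffn_dim": 4096,
    "embedding_dim": 512,
    "hidden_dim": 512,
    "copy_attention_layers": 1,
    "copy_attention_heads": 8,
    "optimizer": "adam",
    "batch_size": 100,
    "dropout": 0.1,
    "label_smoothing": 0.1,
    "vocab_size": 30000,
    "early_stopping_patience": 3,
}


def early_stopping_epoch(validation_scores, patience=CONFIG["early_stopping_patience"]):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs fail to beat the
    best validation score observed so far.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(validation_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


print(early_stopping_epoch([0.50, 0.60, 0.59, 0.60, 0.58, 0.70]))
```

In the example, the best score is reached at epoch 2, the next three epochs do not exceed it, and training would stop before the later, better epoch is ever seen; this trade-off is inherent to patience-based stopping.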

Evaluation Metrics
To evaluate the model's performance, GLEU (Generalized Language Evaluation) [23], BLEU (Bilingual Evaluation Understudy) [22], and F0.5 scores were used. BLEU, which is often used to evaluate machine translation models, measures performance by calculating the similarity between the system's predictions and the reference data. In this paper, BLEU-1 through BLEU-4 were calculated and evaluated. BLEU can be used regardless of language and is fast to compute; a higher score means better performance. The GLEU metric is a variant of BLEU proposed for evaluating grammatical error correction using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors [23]. As with BLEU, a higher GLEU score means better performance.
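To make the n-gram overlap idea behind BLEU concrete, the sketch below computes a simplified sentence-level BLEU-N: the geometric mean of clipped n-gram precisions for n = 1..N, multiplied by a brevity penalty. It is not the evaluation script used in the experiments: it assumes whitespace tokenization and a single reference, applies no smoothing (any zero precision gives a score of 0), and works per sentence rather than over a corpus.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU with one reference and no smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)


print(bleu("a b c d", "a b c d", max_n=4))
print(bleu("a b c d", "a b c e", max_n=2))
```

Calling the function with `max_n` from 1 to 4 gives the BLEU-1 through BLEU-4 variants of this simplified score; production evaluations normally use an established implementation with corpus-level statistics.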
F0.5 is an evaluation measure that emphasizes precision over recall. In the GEC task, recall is the proportion of sentences with grammatical errors that the system actually corrected, that is, the number of correct corrections divided by the total number of erroneous sentences. Precision is the proportion of the system's corrections that are right, that is, the number of correct corrections divided by the total number of sentences that the system corrected. In the GEC task, finding the erroneous part and correcting it are both important, but using F0.5 places more weight on whether the erroneous part is corrected properly.
Table 9 shows the comparative results of the proposed system and the other models using both BLEU and GLEU scores. The bold text in Table 9 indicates the best performance in the experiment. As can be seen from Table 9, the model presented in this study outperforms the other NMT-based models and the two commercial grammatical error correctors. In particular, the NMT-based models are ahead of the two commercial grammar services in both GLEU and BLEU scores. In addition, on every score, our model outperforms the Seq2Seq, Seq2Seq with Attention, and Transformer models. Table 10 shows the detailed evaluation results of the systems in terms of precision, recall, and F0.5 scores on the test data mentioned earlier; the bold text marks the best-performing model in the experiment. Our grammatical noise implantation method mainly reflects typical and frequently committed grammatical errors that any conventional grammar checking and correcting system should handle effectively. Therefore, the comparison using the test set seems to be fair and objective. Table 10 shows results similar to the BLEU and GLEU evaluation.
The model presented in this study outperforms the other NMT-based models and the two commercial grammar error correctors, and the difference in performance from the commercial grammar services is large. In addition, the model using the Copying Mechanism shows higher Precision, Recall, and F0.5 than the plain Transformer. These results confirm that the Transformer with Copying Mechanism achieved the highest Korean grammar correction performance. The Seq2Seq model using the existing Bi-LSTM showed lower performance than the models capable of parallel processing. This is because the Seq2Seq model using Bi-LSTM tends to forget the data at the beginning of the input, so that performance decreases as the length of the sentence increases. In the model to which attention was applied, this problem was partially solved, but the results of this experiment show that it was still not completely overcome. Unlike the Seq2Seq model, the Transformer addresses this disadvantage by using self-attention rather than an RNN-based model, and the experimental results show that it achieves high performance in correcting Korean grammatical errors. However, since the Transformer approaches the problem from the point of view of generating the entire sentence, it is at a disadvantage when a word simply has to be copied as it is. The Transformer with Copying Mechanism performed better than the plain Transformer because the generating part and the copying part can be trained separately. Table 11 shows the outputs of the three systems used in the experiment for an input sentence with various grammatical errors, including a pronunciation-related error, a contextual error, and a word spacing error.
In the sentence, the pronunciation-related error is shown in italics, the contextual error in boldface, and the word spacing error with underlining; non-existent words are marked in blue. While Py-Hanspell (Naver API) could detect and correct the second word spacing error, it fails to handle all the others. In particular, Py-Hanspell (Naver API) incorrectly revised the first word spacing error by suggesting an overly spaced token. Hanspell (Kakao API), for its part, fails to handle any of the errors in the sentence. In contrast, our system successfully detects and corrects all the errors in the sentence. In particular, our system could detect and correct the contextual error by revising the word "가리켰다 (pointed to)," which is lexically correct but semantically inappropriate, to "가르쳤다 (taught)."
Table 11. Error Correction Results by Four Systems.
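For reference, the precision, recall, and F0.5 scores discussed above follow the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 0.5 weights precision more heavily than recall. The sketch below computes them from raw counts; the counts in the example are invented for illustration and are not results from this paper, and how corrections are counted (per sentence or per edit) is left to the caller.

```python
def precision_recall_fbeta(true_positives, false_positives, false_negatives, beta=0.5):
    """Return (precision, recall, F_beta) from raw counts.

    true_positives:  errors that were corrected properly
    false_positives: corrections that were wrong or unnecessary
    false_negatives: errors that were left uncorrected
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    beta_sq = beta * beta
    f_beta = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_beta


# Invented counts: 8 proper corrections, 2 wrong corrections, 8 missed errors.
p, r, f = precision_recall_fbeta(8, 2, 8)
print(round(p, 3), round(r, 3), round(f, 3))
```

With precision 0.8 and recall 0.5, F0.5 comes out closer to the precision than F1 would, which is why the measure suits a task where an unnecessary change to a correct sentence is costly.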

Meaning
In class, my teacher taught math.

Table 12 shows that the Transformer with Copying Mechanism model corrects the grammatical errors constructed in this paper. The words highlighted in Table 12 follow the same marking as in Table 11.
The first example is the result of correcting grammatical errors generated by the G2PK noise method. G2PK noise is a grammatical error caused by a phonological phenomenon. In this example, the erroneous sentence is created by changing "들이 [deul-i]" to "드리 [deuli]." When "들이" is pronounced in Korean, it sounds like "드리" because of the phonological phenomenon. Our model corrected the grammatical errors generated by the G2PK noise method and also corrected the spacing errors.

Input
Meaning Then, the Chinese have to deulitupy, which is a bit difficult.

Predict
Korean geuleomyeon jung-gug salamdeul-i tupyoleul haejwoya haneunde daso eolyeobda. Meaning Then, the Chinese people have to vote, which is a bit difficult.

Correct
Korean geuleomyeon jung-gug salamdeul-i tupyoleul haejwoya haneunde daso eolyeobda. Meaning Then, the Chinese people have to vote, which is a bit difficult.

Input
Meaning Always wash Noah thoroughly with chanmullokwi and dry thoroughly before cooking.

Predict
Meaning Always wash quinoa thoroughly with cold water and dry thoroughly before cooking.

Correct
Meaning Always wash quinoa thoroughly with cold water and dry thoroughly before cooking.
The second example is the result of correcting grammatical errors generated by the heuristic-based noise method. The heuristic-based noise method is based on rules built by investigating grammatical errors that Koreans often make. In this example, grammatical errors were generated by changing "깨끗이" and "완전히" to "깨끗히" and "완전이." "깨끗이" means "cleanly," and "완전히 [wanjeonhi]" means "completely." Some Koreans write these two words as "깨끗히" and "완전이 [wanjeon-i]." These two words are not in th[…]. Our model corrected the two heuristic-based grammatical errors […], and the spacing error was also fixed.
The third example is the result of correcting grammatical errors created by the heterograph-based noise method. The heterograph-based noise method produces errors by converting a word into another form that is similarly pronounced. For example, a grammatical error was created by changing "예상하다 [yesanghada]" and "못했다 [mothaetda]" to "얘상하다 [yaesanghada]" and "뫃햏다 [mothaetda]." "예상하다" is a verb meaning "predict," and "못했다" is a past-tense auxiliary verb meaning "couldn't." Our model restored the two words to the correct form, and, as in the other examples, the spacing was also corrected at the same time.
The machine translation task creates a sentence in a different language from the input sentence. In contrast, grammatical error correction changes only the words that contain grammatical errors, and most of the other words are output exactly as in the input. Therefore, applying a machine translation model to a grammar correction task can replace words without errors with new ones. Because of this problem, a Copying Mechanism that can copy words without errors is more suitable for grammatical error correction. This can be seen in Table 13. In the input sentence, the grammatically correct input word "경기도 [Gyeonggi-do]" (one of the provinces of Korea, the one surrounding Seoul) was not generated by the Transformer model. However, the Transformer with Copying Mechanism model produces the same word as in the input sentence. In other words, the Transformer model with the Copying Mechanism is more suitable for grammatical error correction.
Meaning A special credit card is released to make Gyeonggido a green city surrounded by trees and forests.