Article

Show Me All Writing Errors: A Two-Phased Grammatical Error Corrector for Romanian

by Mihai-Cristian Tudose 1, Stefan Ruseti 1 and Mihai Dascalu 1,2,*
1 Faculty of Automatic Control and Computers, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
2 Academy of Romanian Scientists, Str. Ilfov, Nr. 3, 050044 Bucharest, Romania
* Author to whom correspondence should be addressed.
Information 2025, 16(3), 242; https://doi.org/10.3390/info16030242
Submission received: 28 January 2025 / Revised: 5 March 2025 / Accepted: 13 March 2025 / Published: 18 March 2025

Abstract: Nowadays, grammatical error correction (GEC) plays a significant role in writing, since even native speakers often face challenges with proficient writing. This research focuses on developing a methodology to correct grammatical errors in Romanian, a less-resourced language for which there are currently no up-to-date GEC solutions. Our main contributions include an open-source synthetic dataset of 345,403 Romanian sentences, a manually curated dataset of 3054 social media comments, a two-phased GEC approach, and a comparison with several Romanian models, including RoMistral and RoLlama3, as well as LanguageTool, GPT-4o mini, and GPT-4o. We use the synthetic dataset to fine-tune our models, while relying on two real-life datasets with genuine human mistakes (i.e., CNA and RoComments) to evaluate performance. Building an artificial dataset was necessary because of the scarcity of real-life mistake datasets, whereas introducing RoComments, a new genuine dataset, is motivated by the need to cover errors made by native speakers in social media comments. We also introduce a two-phased approach, where we first identify the location of erroneous tokens in the sentence; next, the erroneous tokens are replaced by an encoder–decoder model. Our approach achieved an F0.5 of 0.57 on CNA and 0.64 on RoComments, surpassing LanguageTool by a considerable margin, as well as end-to-end versions based on Flan-T5 and mT0 in most setups. While our two-phased method did not outperform GPT-4o, arguably due to its smaller size and lower language exposure, it obtained on-par results with GPT-4o mini and achieved higher performance than all Romanian LLMs.

1. Introduction

With the increasing reliance on written communication in both professional and everyday contexts, the ability to express ideas clearly and accurately is increasingly valued, making error correction essential for improving overall language competence. As such, grammatical error correction (GEC) is an essential task that can help people who may not be familiar with the rules of a language or who have language learning difficulties and need to improve their communication skills. It is thus an important educational resource that assists the general population in improving their writing skills.
In addition to the obvious correction support, GEC has many usage scenarios. Jesús-Ortiz and Calvo-Ferrer [1] presented how native language (L1) patterns can make second-language (L2) acquisition more cumbersome, especially in areas where the two languages differ structurally. For instance, when considering L1 (Spanish) and L2 (English) learners, errors arise since Spanish speakers are used to a system where grammatical gender is tied to the noun (possessum) rather than the possessor; in contrast, possessive determiners depend on the possessor’s gender in English when the possessor is a singular third person.
GEC tasks cannot be solved efficiently through transfer learning because the underlying rules are language-specific; it is challenging to design a set of applicable rules that covers common language aspects across multiple languages. Rei and Yannakoudakis [2] explored improving error detection within a sentence by incorporating various features such as frequency, error type (e.g., incorrect verb form), the first language of the learner, part-of-speech (POS) tags, and grammatical relations. Their experiments on CoNLL-14 [3] and FCE [4] showed that word frequency and first language identification do not significantly improve error detection performance. In contrast, including error types, POS tags, and grammatical relations leads to consistent improvements.
We now focus on the language targeted by this study, namely Romanian, which presents challenges in terms of agreement and diacritics. Romanian is also a limited-resource language compared to widely spoken languages such as English. There is no advanced GEC solution for Romanian, apart from the limited grammatical corrections in Microsoft Word or LanguageTool [5], a popular open-source grammar and style checker with Romanian support. As such, developing an autonomous tool capable of correcting user inputs in Romanian is highly beneficial.
Our GEC strategy is divided into two phases. In the first phase, grammatical error detection (GED), we identify incorrect tokens within a sentence. In the second phase, centered on GEC, we rely on an encoder–decoder model to correct the highlighted errors by replacing the erroneous tokens with appropriate suggestions. Compared to traditional end-to-end neural models, our approach provides advantages mainly in terms of explainability because it isolates the mistaken words and corrects them using the surrounding context. In addition, we ensure modularity by separating the two phases, mimicking human behavior; as such, each component can be improved separately over time.
In addition to the previous method, we developed a synthetic dataset specifically for training purposes. For testing, we consider two smaller datasets containing real-life mistakes: CNA [6], in an updated and curated version, and RoComments, a newly introduced dataset of erroneous human comments from social media.
The main contributions of this paper include the following:
  • RoTexts, an open-source synthetic dataset of 345,403 Romanian sentences with automatically generated errors;
  • RoComments, a manually curated dataset of 3054 social media comments with genuine human mistakes, together with an updated and curated version of CNA;
  • A two-phased GEC approach that first detects erroneous tokens and then corrects them with an encoder–decoder model;
  • A comparison with several Romanian models, including RoMistral and RoLlama3, as well as LanguageTool, GPT-4o mini, and GPT-4o.

2. State-of-the-Art

This section comprises four subsections. The first focuses on similar existing datasets and the corresponding reported performance metrics. Next, we present methods similar to our two-phased approach, followed by studies on other low-resource languages and, finally, a study specifically targeting Romanian GEC.

2.1. Existing Datasets

Several datasets for grammatical error correction have been introduced throughout the years, especially for English. This necessity arose from the wide diversity of encountered errors. The most commonly used datasets are CoNLL-14 [3], First Certificate in English (FCE), NUS Corpus of Learner English (NUCLE) [7], Lang-8 [8], and The Hanyu Shuiping Kaoshi (HSK) [9].
CoNLL-14 [3] is a shared task whose goal is to identify and correct the mistakes made by English learners in their essays. It incorporates 28 types of errors, including missing verbs, subject–verb agreement, and acronyms. According to Omelianchuk et al. [10], the highest performance on this task was obtained by ensembling the top seven models used in their experiments (Chat-LLaMa-2-13B-FT, UL2-20B, Chat-LLaMa-2-7B-FT, EditScorer, T5-11B, CTC-Copy, and GECToR-2024) combined with GRECO (a grammaticality scorer for re-ranking corrections) and GPT-rerank, with an F0.5 score of 72.8%, precision of 83.9%, and recall of 47.5%.
FCE [4] is a corpus extracted from the Cambridge Learner Corpus, consisting of samples from people who took the Cambridge ESOL First Certificate between 2000 and 2001. Yuan and Briscoe [11] used a two-staged approach involving a bidirectional RNN encoder and an attention-based decoder, achieving an F0.5 score of 53.49%.
Lang-8 [8] is a Japanese website where language learners write texts that native speakers correct. It offers assistance in many languages, including Japanese, English, and Mandarin. The dataset extracted from the website comprises user sentences and their corrected versions, where available. Rothe et al. [12] present various GEC models for German and Russian, used for inference on cLang-8, a cleaned version of Lang-8. The state-of-the-art on cLang-8 is a gT5 XXL model (a 13B-parameter model derived from mT5 [13], the multilingual version of T5 [14]), which achieved F0.5 scores of 75.96% for German and 51.62% for Russian.
HSK [9] is a manually annotated corpus based on the Chinese proficiency test (HSK), designed for foreigners learning to write Chinese at a proficient level as a second language. After analyzing a sample of 100 sentences from the HSK corpus, the authors found spelling, word selection, redundant words, missing words, word order, redundant and missing constituents, and other sentence-level errors. These findings show how difficult it can be to master Chinese at a proficient level. Lin et al. [15] proposed multiple methods for CGEC (Chinese grammatical error correction) that use two datasets in their experiments: NLPCC and HSK. They conducted experiments across multiple methods, training primarily on NLPCC with and without fine-tuning on HSK. Their results show that Transformer-based models generally have higher F0.5 scores than other architectures, and that incorporating HSK generally improves F0.5 scores across all model architectures. For example, the Transformer model achieved an F0.5 score of 24.42% without fine-tuning on HSK, whereas, with fine-tuning, it reached 28.25%.

2.2. Similar Two-Phased Approaches for GEC

Chen et al. [16] outline a two-phased method for GEC, consisting of erroneous span detection (ESD) and erroneous span correction (ESC). ESD relies on a binary sequence tagging model that identifies incorrect sentence spans. If a sentence contains no errors, no action is taken; otherwise, the erroneous spans are annotated and corrected using the seq2seq model (ESC). This method achieved results comparable to traditional seq2seq models on both English and Chinese GEC benchmarks while reducing inference time by over 50%. Additionally, the models involved were fine-tuned on synthetic data, which improved performance. Although conceptually similar to our approach, their model used a BERT encoder for detection and a seq2seq Transformer model trained from scratch on their English and Chinese data.
Qiu and Qu [17] presented a Chinese grammatical error correction method consisting of a two-phased model that combines a spelling check model and a seq2seq GEC model. The spelling check model focuses on correcting non-word spelling errors specific to Chinese. The authors also used an iterative checking approach to improve the corrections. Their model outperformed previous systems on the NLPCC 2018 test set with an F0.5 score of 31.01%.

2.3. GEC for Low-Resourced Languages

In the context of grammatical error correction (GEC), addressing low-resource languages is crucial to ensure the accessibility and effectiveness of linguistic tools on a global scale. Many of these languages, such as Romanian, encounter challenges, including non-standard orthography, a lack of annotated corpora, and variations in dialects [18]. Research has been conducted on various low-resource languages, including Zarma [18], Bengali [19], Ukrainian [20], and Indonesian [21], each employing different methodologies and evaluation metrics beyond the traditional F0.5 score.
Keita et al. [18] studied Zarma, a language spoken by over five million people in West Africa. They tested rule-based methods, LLMs, and MT models. Their best model, M2M100 MT, achieved 95.82% detection and 78.90% suggestion accuracy, with native speakers rating corrections 3.0/5.0. Maity et al. [19] introduced a Bengali grammatical error explanation (GEE) system that corrects errors and provides explanations. The system generates a corrected sentence, categorizes errors, and explains corrections. Various LLMs were compared to human experts. GPT-4 Turbo outperformed other models but struggled with nuanced errors like word order and spelling, especially in short sentences with multiple errors. Human experts consistently performed better, highlighting the need for human oversight in Bengali GEC.
According to Lytvyn et al. [20], the Ukrainian language’s morphological complexity and the small size of its dataset (around one thousand texts) require combining different grammatical error correction (GEC) methods. Their study used pre-trained models such as RoBERTa and mT5, and the authors reported performance using BLEU and METEOR scores. In BLEU, the Ukrainian RoBERTa encoder–decoder scored 69.7%, mT5 achieved 90.8%, and M2M100 reached 84.7%. In terms of METEOR, Ukrainian RoBERTa scored 87.6%, mT5 95.6%, and M2M100 92.5%, demonstrating strong performance across the models.
Musyafa et al. [21] presented an automatic Indonesian grammatical error correction (IGEC) model based on Transformers, incorporating a copy mechanism to handle unknown words. The study evaluated several GEC models, e.g., BRNN (bidirectional recurrent neural network), CNN (convolutional neural network), and SAN (self-attention network), using precision, recall, F1, and BLEU scores. The SAN-GEC model outperformed the others with a precision of 65.53, a recall of 47.81, an F1 score of 55.28, and a BLEU score of 59.91.

2.4. Previous Studies for Romanian

Romanian may at first appear problematic due to its specific characteristics: it uses diacritics (e.g., ă, â, î, ș, and ț) whose omission can alter the meaning of a sentence. Moreover, agreement problems (subject–verb, noun–adjective, and noun–pronoun) are challenging due to Romanian’s morphological complexity; like other Romance languages, it has rich inflection (i.e., gender, number, and case).
To our knowledge, only one previous GEC study centered on Romanian exists [6]. The authors introduced an adapted version of the ERRANT framework [22] for Romanian and a real-life dataset of mistakes, CNA, consisting of 10K pairs of incorrect and corrected sentences. Their most effective model involved pretraining a large Transformer model on artificially generated data derived from Wikipedia edits and achieved an F0.5 score of 53.76%.

3. Datasets

In this section, we describe the datasets used in our experiments, including one newly introduced artificial dataset (RoTexts), an updated and curated version of CNA [6], and a new dataset of corrected comments (RoComments). Table 1 presents statistics about each dataset. In addition, Table 2 includes the error distribution across corpora according to ERRANT adapted for Romanian. Dataset-specific insights are detailed in the corresponding subsections.

3.1. Synthetic Dataset

RoTexts is a synthetic dataset created from a combination of multiple publicly available Romanian texts [23] from various domains. It reproduces frequent Romanian mistakes, such as the erroneous form of nouns and missing diacritics. The sentences were specifically chosen to include diacritics, which are essential and distinctive to the Romanian language, from the following categories present in a curated source (https://github.com/aleris/ReadME-RoTex-Corpus-Builder?tab=readme-ov-file#sources, accessed on 24 January 2025): biblior, biblioteca-digitala-ase, carti-bune-gratis, historica-cluj, litera-net, and ziarul-lumina.
Our assumption was that all original sentences were correct. However, we found a small subset of sentences containing mistakes, which we corrected manually (e.g., adding the missing comma before “dar”). A randomization approach was then employed to alter all 345,403 correct sentences. We assigned a probability to each predefined error category: diacritics (0.1), spelling (0.05), agreement (0.25), noun declination (0.25), orthographic errors (0.1), punctuation (0.05), and no modification (0.2). We then randomly selected the modification type based on these probabilities and applied a corresponding custom rule for each error.
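To make the randomization step concrete, the snippet below sketches one way to implement the weighted sampling described above; the probabilities are those listed in the text, while only the diacritics rule is spelled out and the remaining rules are placeholders rather than our actual implementation.

```python
import random

# Diacritic stripping is shown concretely; the other rules are placeholders
# for the custom rules described in the text.
DIACRITICS = str.maketrans("ăâîșțĂÂÎȘȚ", "aaistAAIST")

def drop_diacritics(sentence: str) -> str:
    return sentence.translate(DIACRITICS)

def identity(sentence: str) -> str:  # placeholder for rules not sketched here
    return sentence

RULES = {
    "diacritics": (0.10, drop_diacritics),
    "spelling": (0.05, identity),
    "agreement": (0.25, identity),
    "noun_declination": (0.25, identity),
    "orthographic": (0.10, identity),
    "punctuation": (0.05, identity),
    "none": (0.20, identity),  # leave the sentence unchanged
}

def corrupt(sentence: str) -> str:
    """Sample one error category according to its probability and apply it."""
    names = list(RULES)
    weights = [RULES[name][0] for name in names]
    chosen = random.choices(names, weights=weights, k=1)[0]
    return RULES[chosen][1](sentence)

print(corrupt("Această propoziție conține diacritice."))
```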
First, we analyzed common patterns that Romanian native speakers often confuse. As such, we included errors related to spelling, noun form (e.g., adding or removing a suffix depending on noun gender and number), missing diacritics, and disagreement between auxiliary terms and verbs. Apart from these, we targeted specific words and replaced them with their most frequent incorrect form.
Second, we identified a subset of specific words and phrases commonly mistaken by speakers, drawing on the work of Bucur [24]. For instance, many people use “vroiam” instead of “voiam”. Moreover, when forming the negative of a verb in the imperative, people often write a contracted form of the verb after the adverb “nu” instead of the verb’s infinitive form [25]. Furthermore, to create disagreement errors, we searched for auxiliaries preceding the verb and changed their form based on a dictionary with auxiliary terms as keys (e.g., “s-a făcut” changes into “s-au făcut”, where “a” and “au” are the auxiliaries).
Third, another frequent mistake consists of modifying the genitive-marking article (e.g., “al” before a noun). According to Mititelu et al. [26], “al” belongs to the determiner (DET) part of speech. Using the head property in spaCy (version 3.7.2), we examined the grammatical number of the word the article agrees with. If it is singular, we randomly replace the article with one of the possible forms from both singular and plural, regardless of gender (in this case, “ai”, “a”, “ale”). If it is plural, we again randomly replace it with forms from both singular and plural, regardless of gender (for instance, “al” or “a”). As per Table 2, spelling (SPELL), incorrect noun forms (NOUN:FORM), and OTHER are the most common error types in RoTexts.
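The following is a simplified sketch of the genitive-article rule, assuming the ro_core_news_sm spaCy pipeline; the head and morphological-number inspection mirrors the description above, while the replacement logic is condensed into a single branch.

```python
import random
import spacy

nlp = spacy.load("ro_core_news_sm")  # Romanian pipeline; model name assumed

GENITIVE_ARTICLES = {"al", "a", "ai", "ale"}

def corrupt_genitive_article(sentence: str) -> str:
    """Replace a genitive-marking article with a random other form,
    regardless of gender, to create a disagreement error."""
    doc = nlp(sentence)
    parts = []
    for token in doc:
        if token.pos_ == "DET" and token.lower_ in GENITIVE_ARTICLES:
            # token.head is the word the article agrees with; its number
            # can be inspected with token.head.morph.get("Number").
            alternatives = sorted(GENITIVE_ARTICLES - {token.lower_})
            parts.append(random.choice(alternatives))
        else:
            parts.append(token.text)
        parts.append(token.whitespace_)
    return "".join(parts)
```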

3.2. CNA

CNA [6] is a Romanian dataset comprising real-world pairs of incorrect and correct sentences gathered from radio and TV broadcasts. Out of 10,092 pairs overall, 2700 contain sentences without a verb. It is manually split into three partitions: train (7063), dev (1514), and test (1515). We first reviewed the dataset to ensure its accuracy and discovered unfinished sentences, missing diacritics, and incorrect words kept in both source and target sentences; all these errors were manually corrected. The updated version is available at https://huggingface.co/datasets/upb-nlp/gec_ro_cna (accessed on 24 January 2025). According to Table 2, the most common error types in CNA are spelling (SPELL), other (OTHER), and punctuation (PUNCT). Diacritics errors are included under spelling: an edit is categorized as spelling whenever more than half of the characters of the compared words overlap.
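The half-overlap criterion can be read as the character-level check below; this is a sketch of our reading of the rule, not the adapted ERRANT implementation itself.

```python
from difflib import SequenceMatcher

def char_overlap(a: str, b: str) -> float:
    """Fraction of characters shared between the two compared words."""
    matcher = SequenceMatcher(None, a.lower(), b.lower())
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))

def is_spelling_edit(original: str, corrected: str) -> bool:
    # e.g., char_overlap("exemlpu", "exemplu") > 0.5, so the edit is SPELL
    return char_overlap(original, corrected) > 0.5
```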

3.3. RoComments

We introduce RoComments, a new dataset comprising 3054 pairs of incorrect and correct sentences. RoComments focuses on real human language and covers informal language with typos encountered in online communication. RoComments is sourced from popular comments on Romanian trending videos, Romanian posts on Reddit, and a subset of well-known, frequently confused Romanian sentences. The comments from the platforms were manually corrected, whereas the sentences from the media websites already contained the correction. Table 1 shows that this dataset has a greater spread of erroneous tokens (21%) compared to RoTexts (7%) and CNA (14%); this was expected since we cannot control the percentage of genuinely mistaken words within a sentence. The dataset was annotated by the main author of this study. The comments were extracted without collecting any information about their users.
The primary source of YouTube comments (around 89%) was music videos that were in the Romanian trending category at the time of selection, together with a small subset of Romanian YouTubers with a large number of subscribers. Reddit posts were the second source of selection; we explored various topics posted on popular Romanian subreddits such as “CasualRO” and “Romania”. We also included a small subset of sentences frequently mistaken in various situations, suggested by different websites and classified as other: https://adevarul.ro/showbiz/tv/greseli-gramaticale-la-tv-de-la-esece-1596951.html (accessed on 24 January 2025), https://www.viata-libera.ro/campanii-vlg/128645-televiziunile-cu-cele-mai-frecvente-greseli-de-vorbire (accessed on 24 January 2025), and https://brainly.ro/tema/3728371 (accessed on 24 January 2025). The sources are distributed as follows: YouTube (88.64%), Reddit (10.54%), and others (0.82%).
The Python library youtube-comment-downloader (https://pypi.org/project/youtube-comment-downloader/, accessed on 24 January 2025) was used to extract comments from videos in JSON format. Before correcting the comments, all emojis were removed, and we selected only sentences whose length was between 10 and 25 tokens and whose words were mostly Romanian. Additionally, the white spaces preceding the punctuation marks at the end of the sentence were removed.
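A sketch of this filtering step is shown below; the emoji pattern is a rough approximation, and the check that most words are Romanian is only indicated by a comment, as the exact heuristics are not detailed here.

```python
import re
from typing import Optional

EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF]")  # rough emoji ranges

def preprocess_comment(comment: str) -> Optional[str]:
    """Return a cleaned comment, or None when the selection rules reject it."""
    text = EMOJI.sub("", comment).strip()
    # remove white space before a sentence-final punctuation mark
    text = re.sub(r"\s+([.!?])$", r"\1", text)
    tokens = text.split()
    if not 10 <= len(tokens) <= 25:
        return None  # keep only sentences of 10 to 25 tokens
    # a check that the majority of tokens are Romanian words would go here
    return text
```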
A correction methodology with clear rules was developed to ensure the quality of the manually corrected sentences. First, sentences containing non-Romanian words were excluded. If the comments were grammatically correct, they were included, with diacritics added where necessary. Regionalisms specific to certain parts of the country were replaced with their modern equivalents. Abbreviations, even if commonly used, were not accepted. The following instructions were used to create the curated dataset:
  • The sentences that contain a mix of Romanian and English words are not taken into consideration;
  • If a word is found in the Romanian National Dictionary (https://dexonline.ro/, accessed on 16 May 2024) but is not the best fit in a specific context, it remains unmodified;
  • Regional words are replaced with their modern variants (e.g., “îs” becomes “sunt”; “io” becomes “eu”);
  • Pronominal adjectives are also accepted in their restricted version (e.g., “ăsta” remains “ăsta”, with no need to modify into its expansion form, acesta);
  • Diacritics are added where required;
  • Words like “pan’ ” or “tre’ ” will remain unchanged, but diacritics will be added (see the fifth rule). For instance, if the word “pan’ ” is found in this form, it will change to “pân’ ”;
  • If you encounter situations where words are not contracted but are separated by a space, they should be connected. For example, the form “IA I” should be turned into “IA-I”;
  • Pay attention to the construction of “să fi” versus “să fii”;
  • Be careful with the word “alții”. As per Destepti.ro [27], use one “i” when it precedes a noun that agrees with it. Use two “i’s” when it appears alone, not near a noun;
  • Abbreviations of words are not accepted, such as “pt” for “pentru”, except for address formulas (e.g., dvs);
  • If words are not articulated, they should be articulated (apostrophes are not added). For example, “omu” becomes “omul”, not “omu’ ”;
  • An excess of letters is not accepted, such as “muuuult” for “mult” or “buuuun” for “bun”;
  • All words must follow the Romanian National Dictionary. No words from verbal use are accepted. For example, “clip” becomes “videoclip”.

4. Method

In this section, we outline our proposed two-phased method. The main reason for our approach is to maintain control over the sentence structure. Instead of correcting the entire sentence, our focus is directed specifically toward the erroneous part. The architecture is presented in Figure 1.

4.1. Detection Phase

This is the first phase, targeting the identification of errors in a given sentence. At this point, we fine-tuned a BERT encoder (https://huggingface.co/upb-nlp/RoGEC-robert-large, accessed on 24 January 2025) to detect the erroneous tokens within the sentence. We tailored this model to our specific task of error detection by fine-tuning it on a dataset of sentences that contained both correct and erroneous tokens. We considered the edit distance at the token level to extract the minimum number of edits between the initial sentence and its corrected version to prepare the data in a binary format (i.e., marking the presence of an error). When the modification required inserting new tokens, we annotated the tokens surrounding the insertion point as errors. This helped the model detect the presence of inserted tokens as part of the error detection process.
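As an illustration of this labeling step, the sketch below derives binary GED labels from an alignment of the source and corrected token sequences; we use difflib's longest-common-subsequence alignment as a stand-in for the token-level edit distance described above.

```python
from difflib import SequenceMatcher

def error_labels(source: list[str], target: list[str]) -> list[int]:
    """Binary labels over source tokens: 1 marks a token touched by an edit.
    Insertions in the target mark the tokens around the insertion point."""
    labels = [0] * len(source)
    for op, i1, i2, j1, j2 in SequenceMatcher(None, source, target).get_opcodes():
        if op in ("replace", "delete"):
            for i in range(i1, i2):
                labels[i] = 1
        elif op == "insert":
            if i1 > 0:
                labels[i1 - 1] = 1
            if i1 < len(labels):
                labels[i1] = 1
    return labels

# error_labels("an pierdut datorită lui".split(),
#              "am pierdut din cauza lui".split()) -> [1, 0, 1, 0]
```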
To convert the raw outputs of the RoBERT-Large encoder into meaningful probabilities, we applied a softmax function to the logits produced by the model and extracted the probability of the erroneous class. We experimented with threshold values ranging from 0.3 to 0.7 in increments of 0.1 to empirically choose the most effective decision boundary. The threshold that yielded the highest F0.5 score for the two-phased method was 0.5, with an F0.5 score of 66% on the validation set.
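A minimal inference sketch for this phase is given below, assuming the released checkpoint loads as a token-classification model with the erroneous class at index 1; the returned flags are at the subword level.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL = "upb-nlp/RoGEC-robert-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL)

def detect_errors(sentence: str, threshold: float = 0.5) -> list[bool]:
    """Flag subword tokens whose erroneous-class probability exceeds the threshold."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits             # shape: (1, seq_len, 2)
    probs = torch.softmax(logits, dim=-1)[0, :, 1]  # erroneous-class probability
    return (probs > threshold).tolist()
```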
During the training process, we used a learning rate of 2 × 10⁻⁵, a weight decay of 0.005, and a batch size of 64; these hyperparameters are commonly used when fine-tuning Transformer-based models for classification tasks and represent a balanced choice between memory efficiency and computational speed.
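With the HuggingFace Trainer API, these hyperparameters would translate into a configuration along the following lines; the output directory and epoch count are illustrative, as they are not specified in the text.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="rogec-detector",       # illustrative name
    learning_rate=2e-5,
    weight_decay=0.005,
    per_device_train_batch_size=64,
    num_train_epochs=3,                # assumed; not stated in the text
)
```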
This phase represents a robust foundation for detecting erroneous tokens within sentences by combining the fine-tuning of a transformer-based encoder for binary classification at the token level, leveraging edit distance for precise error annotations, and systematically optimizing decision thresholds.

4.2. Correction Phase

The second phase relies on a sequence-to-sequence model for error correction (https://huggingface.co/upb-nlp/RoGEC-decoder-mt0-xl, accessed on 24 January 2025). Specifically, we fine-tuned an encoder–decoder model as a corrector, marking the mistaken sequence at the word level between two enclosing tags (see Figure 1). This tagging approach provides explicit error localization and enables the model to focus on refining incorrect segments while preserving the correct portions of the input.
We used mT0-XL [28] as the corrector because it is a multilingual seq2seq model that performed better in our experiments than FLAN-T5 [29]; the increase in performance was significant, similar to the gap reported for the end-to-end corrector baselines in the Results section. We settled on the XL version for computational efficiency, ensuring optimal performance within our available resources. Furthermore, the mT0 model offers robust generalization across diverse linguistic structures, making it well-suited for error correction tasks, particularly in multilingual settings.
Furthermore, during inference, we used a beam width of 4 to generate more accurate corrections. Beam search mitigates issues related to greedy decoding by exploring multiple candidate outputs, thus improving the likelihood of selecting the most contextually appropriate correction. During training, we used a learning rate of 2 × 10⁻⁵, a weight decay of 0.005, a batch size of 16, and 4 gradient accumulation steps. The batch size of 16 was selected based on GPU memory constraints. These hyperparameter choices align with established best practices in fine-tuning encoder–decoder architectures for sequence generation tasks, ensuring a balance between performance and computational feasibility.
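An inference sketch for the correction phase is shown below; the `<err> ... </err>` markers are a hypothetical rendering of the enclosing tags, as the exact markers used during fine-tuning may differ.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "upb-nlp/RoGEC-decoder-mt0-xl"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def correct(tagged_sentence: str) -> str:
    """Generate a correction with beam search (beam width 4)."""
    inputs = tokenizer(tagged_sentence, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical tag format around the flagged span:
# correct("an pierdut <err> datorită </err> lui")
```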
Overall, this phase is central to correcting the previously identified erroneous text by leveraging a multilingual pre-trained sequence-to-sequence model combined with explicit error tagging and enhanced decoding strategies.

4.3. Performance Evaluation

Lists of text edits are used to evaluate the performance of the correction phase: one list captures the differences between the initial sentence and the generated one, whereas the second marks all differences versus the ground truth (i.e., initial versus human correction). True positives, false negatives, and false positives are computed over the elements of the two lists. All scores were computed using the ERRANT tool, which tracks the error position, the original words, and the modifications required for correction. Table 3 presents an example in which the first list consists of the edit “datorită” → “din cauza”, while the second list also includes “an” → “am”. As such, we have one true positive for “datorită” correctly replaced with “din cauza”, zero false positives, and one false negative generated by the uncorrected typo “an”.
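Treating each list as a set of edits, the scoring described above reduces to the familiar precision/recall computation with the F0.5 formula; the sketch below reproduces the Table 3 example.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

system_edits = {("datorită", "din cauza")}
gold_edits = {("datorită", "din cauza"), ("an", "am")}

tp = len(system_edits & gold_edits)   # 1
fp = len(system_edits - gold_edits)   # 0
fn = len(gold_edits - system_edits)   # 1
print(f_beta(tp, fp, fn))             # ~0.83 for this single sentence
```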

5. Results

In this section, we analyze the results of both the error detection and the correction phases. Table 4 presents the F0.5 scores obtained on the CNA and RoComments datasets.
We first compared our proposed two-phased solution with LanguageTool [5], an open-source grammar and spell-checking tool with support for various languages, including Romanian. Our approach outperforms this baseline, achieving an F0.5 score of 57% on CNA and 64% on RoComments.
Next, we compared our approach to two Transformer architectures used as end-to-end correctors, Flan-T5-XL and mT0-XL, which take the synthetic incorrect sentences as input and generate their corrected versions as output. During the training process, we used a learning rate of 2 × 10⁻⁵, a weight decay of 0.005, a batch size of 4, and 4 gradient accumulation steps. As per Table 4, mT0 outperformed Flan-T5 in reworking the initial sentences. Specifically, mT0 achieved an F0.5 score of 50% on the CNA dataset and 78% on the RoComments dataset, whereas Flan-T5 obtained F0.5 scores of 35% on CNA and 69% on RoComments. Our approach outperforms both models on the CNA dataset, achieving an F0.5 score of 57%, compared to 35% for Flan-T5 and 50% for mT0. On the other hand, our approach does not surpass Flan-T5 and mT0 on the RoComments dataset.
All LLM-based baselines consider a zero-shot approach in which the models must rewrite all sentences from scratch. Two prompts, in Romanian and English, were used in all experiments (see Table 5) to guide the models in correcting the test sentences. For the Romanian LLMs (i.e., RoMistral and RoLlama 3), we used the Romanian version of the prompt, since these models were only exposed to Romanian data during their fine-tuning. Our best model outperforms the RoMistral corrector by a margin of 6% on RoComments and 17% on CNA.
The GPT-4o model from OpenAI, which we were unable to surpass, outperforms our method with an F0.5 score of 73% on CNA and 84% on RoComments, versus our 57% and 64%, respectively. This superior performance is likely due to its extensive pre-training on a large volume of high-quality text and exposure to a wide range of GEC datasets. Despite this performance gap, it is worth noting that the performance of our best open-source models is closely aligned with GPT-4o mini, indicating that we remain competitive with smaller yet high-performing versions.

6. Discussion

In this section, we discuss the results of our experiments and present an iterative checking approach followed by a detailed error analysis.
Our approach surpassed most baseline models, and its lower performance when compared to GPT-4o is justifiable due to several factors. One of the primary reasons is the significantly larger size of the GPT-4o model compared to our open-source models. Size typically correlates with a greater capacity for processing and understanding complex tasks. Furthermore, GPT-4o was most likely exposed to extensive GEC datasets and human corrections in multiple languages. This disparity in data exposure and model scale provides a clear rationale for the differences in performance. Nevertheless, our main advantages are the model’s small scale and its open-source release, which enables easy deployment and inference on CPUs.

6.1. Iterative Rechecking

After examining the differences between the synthetic dataset and the human mistakes in the two test datasets, we concluded that the test datasets contained significantly more mistakes per sentence compared to the generated one. Because of this, we decided to iteratively run our two-phased pipeline to observe whether further errors can be found and corrected. After setting the threshold to 0.5 based on the validation set using the same methodology, we ran three iterations on the test set. As observed in Table 6, additional iterations resulted in a 1% improvement in the CNA dataset after the first iteration, after which performance stabilized. In contrast, for the RoComments dataset, further iterations did not improve the score; instead, it decreased during the second iteration before stabilizing. As such, the detection phase may need further refinement to improve precision and reduce error propagation across iterations.
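Schematically, the iterative rechecking amounts to the loop below, which reuses the detect_errors and correct sketches from Section 4; insert_error_tags is a hypothetical helper, and the mapping from subword flags back to word-level tags is glossed over.

```python
def insert_error_tags(tokens: list[str], flags: list[bool]) -> str:
    """Wrap each flagged token in the (assumed) <err> ... </err> markers."""
    return " ".join(f"<err> {tok} </err>" if flag else tok
                    for tok, flag in zip(tokens, flags))

def iterative_correct(sentence: str, max_iterations: int = 3) -> str:
    """Re-run detect-then-correct (threshold 0.5) until no token is flagged
    or the iteration budget is exhausted."""
    for _ in range(max_iterations):
        tokens = sentence.split()
        flags = detect_errors(sentence, threshold=0.5)[: len(tokens)]
        if not any(flags):
            break
        sentence = correct(insert_error_tags(tokens, flags))
    return sentence
```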

6.2. Error Analysis

Table 7 presents F0.5 scores on the CNA dataset for each category type according to ERRANT. Our model works best on determiners (DET and DET:FORM) and performs strongly on verb forms (VERB:FORM), adjective forms (ADJ:FORM), and noun forms (NOUN:FORM), as well as spelling errors (SPELL). However, it has problems tackling adverbs (ADV), pronoun forms (PRON:FORM), and word order (WO) errors; these error types were not covered by our automated generation procedure, so our models were never exposed to them. In addition, Table 8 presents a set of examples generated with our two-phased approach. All examples are drawn from the CNA dataset and include explanations of addressed or overlooked corrections.
Both our two-phased model and the zero-shot LLM approaches achieved high F0.5 scores on determiners (DET and DET:FORM), verb forms (VERB:FORM), and adjective forms (ADJ:FORM). The consistently high scores across all systems suggest that these categories are well-represented and straightforward to address. Additionally, all three approaches effectively address noun inflection errors (NOUN:FORM), which are common in Romanian due to the language’s rich inflectional morphology. Strong performance is also achieved on spelling errors (SPELL) across all models, with GPT-4o achieving the highest score.
Punctuation (PUNCT) and other miscellaneous errors (OTHER) show moderate performance, with GPT-4o having partial success in handling punctuation but not at the level of previous categories. Word order (WO) and pronoun errors (PRON, PRON:FORM) remain largely unaddressed. This can be attributed to the fact that the model has not been explicitly trained to handle such errors. The comparatively low scores in some categories outline the model’s limitations, especially in handling complex syntactic structures and less frequent error types.

7. Limitations

When manually inspecting errors encountered across datasets, we observed differences, as the artificial dataset does not cover all the mistakes that occur in the real-life datasets (e.g., mistakes involving the phrase “ca și”). Additionally, RoComments has a higher rate of erroneous tokens compared to CNA and RoTexts, as its comments contain multiple mistakes each. Moreover, the initially correct texts that we automatically altered are predominantly narrative and differ considerably from the dialog messages prevalent in the other two datasets.
On the other hand, we introduced orthographic errors (e.g., lowercasing a proper name or starting a sentence with a lowercase letter instead of a capital). However, we cannot identify them in the first phase of our two-phased approach because the tokenizer used by the encoder model is case-insensitive. One potential solution could be to explore using the cased encoder model developed by Dumitrescu et al. [30].

8. Conclusions and Future Work

Our two-phased method, comprising grammatical error detection (GED) and grammatical error correction (GEC), offers better control over the correction process, as it does not require rewriting the entire sentence from scratch. This saves time and also minimizes the risk of introducing new errors by maintaining the flow of the sentence.
We have introduced a wide range of language resources, including an artificial dataset with a large spectrum of generated errors, a curated version of the existing dataset CNA, and a new dataset (RoComments) consisting of comments provided by native speakers from YouTube and Reddit platforms.
Our model surpassed LanguageTool, Romanian LLMs in zero-shot setups, and end-to-end correctors by a large margin, with only one exception (mT0 on RoComments). GPT-4o outperformed our two-phased method, most likely due to its extensive pre-training on a large collection of high-quality texts and exposure to a wide range of GEC datasets in multiple languages. However, our results are on par with GPT-4o mini, supporting our smaller, open-source, yet high-performing alternative.
In terms of future work, we strive to extend the range of covered errors in the synthetic data generation procedure, especially the ERRANT categories not covered in Table 7. Given the difference in register, we will also consider transcribed dialogues to which we will apply automated transformations in order to obtain samples more similar to RoComments. Furthermore, we plan to expand our approach to other low-resource languages and develop a multilingual GEC model for broader applicability.

Author Contributions

Conceptualization, M.-C.T., S.R. and M.D.; methodology, M.-C.T., S.R. and M.D.; software, M.-C.T.; validation, S.R. and M.D.; formal analysis, M.-C.T., S.R. and M.D.; investigation, S.R.; resources, M.-C.T.; data curation, M.-C.T.; writing—original draft preparation, M.-C.T.; writing—review and editing, S.R. and M.D.; visualization, M.-C.T.; supervision, S.R. and M.D.; project administration, M.D.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the project “Romanian Hub for Artificial Intelligence—HRIA”, Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS no. 334906.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the presented datasets are publicly available on HuggingFace. RoTexts can be found at https://huggingface.co/datasets/upb-nlp/gec-ro-texts (accessed on 24 January 2025), CNA at https://huggingface.co/datasets/upb-nlp/gec_ro_cna (accessed on 24 January 2025), and RoComments at https://huggingface.co/datasets/upb-nlp/gec-ro-comments (accessed on 24 January 2025).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BERT: Bidirectional Encoder Representations from Transformers
CGEC: Chinese Grammatical Error Correction
CNA: The National Audiovisual Council (Romanian abbreviation)
ERRANT: ERRor ANnotation Toolkit
ESD: Erroneous Span Detection
ESC: Erroneous Span Correction
FCE: First Certificate in English
GEC: Grammatical Error Correction
GED: Grammatical Error Detection
LLM: Large Language Model
NLPCC: Natural Language Processing and Chinese Computing
NUCLE: NUS Corpus of Learner English
T5: Text-to-Text Transfer Transformer

References

  1. Jesús-Ortiz, E.; Calvo-Ferrer, J.R. His or Her? Errors in Possessive Determiners Made by L2-English Native Spanish Speakers. Languages 2023, 8, 278. [Google Scholar] [CrossRef]
  2. Rei, M.; Yannakoudakis, H. Auxiliary Objectives for Neural Error Detection Models. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, Copenhagen, Denmark, 8 September 2017; pp. 33–43. [Google Scholar]
  3. Ng, H.T.; Wu, S.M.; Briscoe, T.; Hadiwinoto, C.; Susanto, R.H.; Bryant, C. The CoNLL-2014 Shared Task on Grammatical Error Correction. In Proceedings of the 18th Conference on Computational Natural Language Learning: Shared Task, Baltimore, MD, USA, 26–27 June 2014; Ng, H.T., Wu, S.M., Briscoe, T., Hadiwinoto, C., Susanto, R.H., Bryant, C., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1–14. [Google Scholar] [CrossRef]
  4. Yannakoudakis, H.; Briscoe, T.; Medlock, B. A New Dataset and Method for Automatically Grading ESOL Texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011. [Google Scholar]
  5. Naber, D. A Rule-Based Style and Grammar Checker. 2003. Available online: https://www.researchgate.net/publication/239556866_A_Rule-Based_Style_and_Grammar_Checker (accessed on 24 January 2025).
  6. Cotet, T.M.; Ruseti, S.; Dascalu, M. Neural grammatical error correction for romanian. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 625–631. [Google Scholar]
  7. Dahlmeier, D.; Ng, H.T.; Wu, S.M. Building a large annotated corpus of learner English: The NUS corpus of learner English. In Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications, Atlanta, GA, USA, 13 June 2013; pp. 22–31. [Google Scholar]
  8. Mizumoto, T.; Matsumoto, Y. Discriminative reranking for grammatical error correction with statistical machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1133–1138. [Google Scholar]
  9. Zhang, B. Features and functions of the HSK dynamic composition corpus. Int. Chin. Lang. Educ. 2009, 4, 71–79. [Google Scholar]
  10. Omelianchuk, K.; Liubonko, A.; Skurzhanskyi, O.; Chernodub, A.; Korniienko, O.; Samokhin, I. Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), Mexico City, Mexico, 20 June 2024; pp. 17–33. [Google Scholar]
  11. Yuan, Z.; Briscoe, T. Grammatical error correction using neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; Knight, K., Nenkova, A., Rambow, O., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 380–386. [Google Scholar] [CrossRef]
  12. Rothe, S.; Mallinson, J.; Malmi, E.; Krause, S.; Severyn, A. A Simple Recipe for Multilingual Grammatical Error Correction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Online, 1–6 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 702–707. [Google Scholar] [CrossRef]
  13. Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 483–498. [Google Scholar]
  14. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  15. Lin, N.; Fu, Y.; Lin, X. A New Evaluation Method: Evaluation Data and Metrics for Chinese Grammar Error Correction. arXiv 2023, arXiv:2205.00217. [Google Scholar] [CrossRef]
  16. Chen, M.; Ge, T.; Zhang, X.; Wei, F.; Zhou, M. Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 7162–7169. [Google Scholar] [CrossRef]
  17. Qiu, Z.; Qu, Y. A two-stage model for Chinese grammatical error correction. IEEE Access 2019, 7, 146772–146777. [Google Scholar] [CrossRef]
  18. Keita, M.K.; Homan, C.; Hamani, S.A.; Bremang, A.; Zampieri, M.; Alfari, H.A.; Ibrahim, E.A.; Owusu, D. Grammatical Error Correction for Low-Resource Languages: The Case of Zarma. arXiv 2024, arXiv:2410.15539. [Google Scholar]
  19. Maity, S.; Deroy, A.; Sarkar, S. How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors? arXiv 2024, arXiv:2406.00039. [Google Scholar]
  20. Lytvyn, V.; Pukach, P.; Vysotska, V.; Vovk, M.; Kholodna, N. Identification and Correction of Grammatical Errors in Ukrainian Texts Based on Machine Learning Technology. Mathematics 2023, 11, 904. [Google Scholar] [CrossRef]
  21. Musyafa, A.; Gao, Y.; Solyman, A.; Wu, C.; Khan, S. Automatic correction of indonesian grammatical errors based on transformer. Appl. Sci. 2022, 12, 10380. [Google Scholar] [CrossRef]
  22. Bryant, C.; Felice, M.; Briscoe, T. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Barzilay, R., Kan, M.Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 793–805. [Google Scholar] [CrossRef]
  23. RoTex. RoTex Corpus Builder. 2019. Available online: https://github.com/aleris/ReadME-RoTex-Corpus-Builder (accessed on 16 May 2024).
  24. Bucur, C. Cele mai Frecvente Greșeli de Gramatică din Limba Română. 2018. Available online: https://life.ro/cele-mai-frecvente-greseli-de-gramatica-din-limba-romana/ (accessed on 16 May 2024).
  25. SpotMedia. Cum Scriem Corect: Nu Face Sau nu fă? Cum Folosim Negația la Imperativ. 2021. Available online: https://spotmedia.ro/stiri/educatie/cum-scriem-corect-nu-face-sau-nu-fa-cum-folosim-negatia-la-imperativ (accessed on 16 May 2024).
  26. Mititelu, V.B.; Irimia, E.; Perez, C.A.; Ion, R.; Simionescu, R.; Popel, M. UD Romanian RRT. Available online: https://universaldependencies.org/treebanks/ro_rrt/index.html (accessed on 21 May 2024).
  27. Destepti.ro. Cum Este Corect–Alţii Sau Alţi? 2014. Available online: https://destepti.ro/cum-este-corect-altii-sau-alti/ (accessed on 16 May 2024).
  28. Muennighoff, N.; Wang, T.; Sutawika, L.; Roberts, A.; Biderman, S.; Le Scao, T.; Bari, M.S.; Shen, S.; Yong, Z.X.; Schoelkopf, H.; et al. Crosslingual Generalization through Multitask Finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
  29. Chung, H.W.; Hou, L.; Longpre, S.; Zoph, B.; Tay, Y.; Fedus, W.; Li, E.; Wang, X.; Dehghani, M.; Brahma, S.; et al. Scaling Instruction-Finetuned Language Models. arXiv 2022, arXiv:2210.11416. [Google Scholar] [CrossRef]
  30. Dumitrescu, S.; Avram, A.M.; Pyysalo, S. The birth of Romanian BERT. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 4324–4328. [Google Scholar] [CrossRef]
Figure 1. Exemplification of the two-phased method.
Table 1. Dataset statistics (#—count; M—mean; SD—standard deviation).
| Dataset | # Sentences | # Non-Err. Tokens | # Err. Tokens | Error Rate M (SD) |
| RoTexts | 345,403 | 7,653,579 | 1,001,592 | 12% (7%) |
| CNA | 10,047 | 169,586 | 24,312 | 13% (14%) |
| RoComments | 3054 | 30,651 | 15,847 | 34% (21%) |
Table 2. Error distribution across corpora according to ERRANT.
| Error Type | Error Description | RoTexts (%) | CNA (%) | RoComments (%) |
| ADJ:FORM | Adjective form | 0.30 | 0.93 | 0.80 |
| ADP | Adposition (pre/post positions) | 3.32 | 8.03 | 5.45 |
| DET:FORM | Determiner form | 0.21 | 2.41 | 0.04 |
| MORPH | Morphology | 1.82 | 1.93 | 2.92 |
| NOUN | Noun | 0.57 | 3.35 | 2.58 |
| NOUN:FORM | Noun form | 29.06 | 8.16 | 2.92 |
| ORTH | Orthography | 0.35 | 6.45 | 1.04 |
| OTHER | Other | 12.34 | 19.15 | 28.58 |
| PUNCT | Punctuation | 0.47 | 11.31 | 13.04 |
| SPELL | Spelling | 46.33 | 22.29 | 37.50 |
| VERB | Verb | 0.49 | 1.79 | 2.71 |
| VERB:FORM | Verb form | 4.40 | 5.25 | 2.13 |
Table 3. Example for performance assessment (red marks errors).
| Type | Sentence | Explanation |
| Example | an pierdut datorită lui | “an” is a typo for “am”, and “datorită” is improperly used instead of “din cauza”. |
| Generated | an pierdut din cauza lui | Partially corrects the original by using “din cauza” instead of “datorită”, but retains the typo “an” instead of “am”. |
| Ground truth | am pierdut din cauza lui | eng. I lost because of him. |
Table 4. Results (bold marks the best three results per dataset).
| Dataset | Model | F0.5 |
| CNA | Two-Phased approach | 0.57 |
| CNA | LangTool | 0.20 |
| CNA | RoMistral | 0.40 |
| CNA | RoLlama 3 | 0.35 |
| CNA | Flan-T5 end-to-end | 0.35 |
| CNA | mT0 end-to-end | 0.50 |
| CNA | GPT-4o mini | 0.62 |
| CNA | GPT-4o | 0.73 |
| RoComments | Two-Phased approach | 0.64 |
| RoComments | LangTool | 0.47 |
| RoComments | RoMistral | 0.58 |
| RoComments | RoLlama 3 | 0.53 |
| RoComments | Flan-T5 end-to-end | 0.69 |
| RoComments | mT0 end-to-end | 0.78 |
| RoComments | GPT-4o mini | 0.82 |
| RoComments | GPT-4o | 0.84 |
Table 5. Prompts for English (GPT-4o) and Romanian (RoMistral and RoLlama).
| Language | Prompt |
| English | You are a Romanian teacher eager to correct individual sentences written by your students in a public notebook throughout the school year. Please correct the following n sentences in Romanian without offering details about your judgment. |
| Romanian | Ești un profesor de limba română dornic să corecteze propozițiile scrise individual de elevii tăi într-un caiet public pe parcursul anului școlar. Te rog să corectezi următoarele n propoziții în limba română fără a oferi detalii despre judecata ta. |
Table 6. Performance of the two-phased model across iterations with the threshold set to 0.5.
| Dataset | Iteration | F0.5 |
| CNA | 1 | 0.57 |
| CNA | 2 | 0.58 |
| CNA | 3 | 0.58 |
| RoComments | 1 | 0.64 |
| RoComments | 2 | 0.61 |
| RoComments | 3 | 0.61 |
Table 7. F0.5 scores for various categories on CNA (green—good performance >=50%; red—inferior performance).
| Category | Two-Phased | mT0 | GPT-4o |
| ADJ | 0.00 | 0.59 | 0.41 |
| ADJ:FORM | 0.79 | 0.83 | 0.70 |
| ADP | 0.41 | 0.57 | 0.50 |
| ADV | 0.33 | 0.53 | 0.48 |
| CCONJ | 0.00 | 0.00 | 0.38 |
| DET | 0.82 | 0.77 | 0.80 |
| DET:FORM | 0.86 | 0.85 | 0.87 |
| MORPH | 0.78 | 0.75 | 0.74 |
| NOUN | 0.36 | 0.31 | 0.52 |
| NOUN:FORM | 0.78 | 0.69 | 0.85 |
| ORTH | 0.56 | 0.64 | 0.88 |
| OTHER | 0.48 | 0.57 | 0.73 |
| PUNCT | 0.48 | 0.39 | 0.56 |
| PRON | 0.00 | 0.00 | 0.16 |
| PRON:FORM | 0.29 | 0.00 | 0.70 |
| SPELL | 0.75 | 0.69 | 0.87 |
| VERB | 0.53 | 0.33 | 0.56 |
| VERB:FORM | 0.81 | 0.66 | 0.80 |
| WO | 0.00 | 0.00 | 0.36 |
Table 8. Examples of partial corrections generated with our two-phased method (red marks erroneous corrections; bracketed notes indicate omissions highlighted in the original table).
| Original Sentence | Generated Sentence | Correct Sentence | Explanation |
| În Java, alfabetul jawi este f olosit și de catre limbile sundaneză, unde este denumit pegon. | În Java, alfabetul jawi este folosit și de către limbile sundaneză, unde este denumit pegon. | În Java, alfabetul jawi este folosit și de către limbile sudaneză, unde este denumit pegon. (eng. In Java, the Jawi alphabet is also used by the Sundanese language, where it is called Pegon.) | Two parts of the same word, f and olosit, were unified, whereas the word sundaneză remained unchanged. |
| Deasemenea, capacitatea militariilor români a scăzut din cauza lipsei de combustibil, precum și de formare. | De asemenea, capacitatea militariilor români a scăzut din cauza lipsei de combustibil, precum și de formare. | De asemenea, capacitatea militarilor români a scăzut din cauza lipsei de combustibil, precum și de formare. (eng. Additionally, the capability of Romanian soldiers has declined due to a lack of fuel as well as training.) | The structure Deasemenea was corrected, whereas the word militariilor remained incorrectly written, with a double i in the middle. |
| Re-înregistrarea mărci se poate face cu cel mult 6 luni înainte ca aceasta să nu mai fie valabilă. | Re-înregistrarea mărcii se poate face cu cel mult 6 luni înainte ca aceasta să nu mai fie valabilă. | Reînregistrarea mărcii se poate face cu cel mult 6 luni înainte ca aceasta să nu mai fie valabilă. (eng. The trademark can be re-registered up to 6 months before it expires.) | The word mărci was articulated correctly, but the hyphen in Re-înregistrarea was not removed. |
| În această perioadă neam concentrat mai mult pe deschiderea școli și lucrările necesare în acest scop. | În această perioadă [missing comma] ne-am concentrat mai mult pe deschiderea școlii și pe lucrările necesare în acest scop. | În această perioadă, ne-am concentrat mai mult pe deschiderea școlii și lucrările necesare în acest scop. (eng. During this period, we focused more on opening the school and the necessary work required for this purpose.) | The comma is missing after the word perioadă, but the hyphen within the word neam was added correctly. |
| “În curând v-a fi anunțat și programul complet” a declarat promotorul Eduard Irimia. | “În curând va fi anunțat și programul complet“ [missing comma] a declarat promotorul eduard irimia. | “În curând va fi anunțat și programul complet”, a declarat promotorul Eduard Irimia. (eng. “The full schedule will be announced soon,” stated promoter Eduard Irimia.) | The auxiliary a was removed despite not being an error. |
| Spune-a “oh frate salveazămă”. | Spune-a “[missing interjection oh] frate, salvează-mă”. | Spunea “oh, frate, salvează-mă”. (eng. He said, “Oh, brother, save me”.) | The interjection oh was removed despite not being an error. |
| Lucrăriile de construcție a fost conduse de arh. | Lucrării de construcție au fost conduse de arh. | Lucrările de construcție au fost conduse de arh. (eng. The construction work was led by the architect.) | The auxiliary a was replaced with its correct contextual form, au, but the word Lucrăriile was changed into an incorrect articulated form. |